Google spokesperson admits that the Gemini AI demo video and its voice prompts were not recorded in real time

Thanks to reader Coje_He for the tip! December 9: According to reports on Friday from Bloomberg, Tom's Hardware and other outlets, a Google spokesperson admitted in an interview that the demo video for Gemini, the large language model Google released recently, was not recorded in real time.

At first glance, the video appears to be a single continuous take in which the Gemini model finds a paper ball hidden under a specified plastic cup and recognizes a connect-the-dots drawing as a crab. However, a Google spokesperson told Bloomberg that the demo was "pieced together" using still image frames from the footage and text prompts, and that Gemini was only responding to those text prompts and still images. Likewise, the voice interaction between the user and Gemini in the video was dubbed in afterwards.

The speech, drawings, displayed objects and even magic tricks performed by the people in the video appear to have been staged specifically for the demo. In the video's description on its official YouTube channel, Google also added a note saying that "for demonstration purposes, delays have been reduced and Gemini output has been shortened for brevity." This means that each Gemini response actually took longer than the video suggests.

In addition, Oriol Vinyals, vice president of research and head of deep learning at Google DeepMind, offered further explanation: the video illustrates what a multimodal user experience built with Gemini could look like, and is meant to inspire developers. All the user prompts and outputs in the video are real, he said, and were shortened for brevity. The model shown in the video is Gemini Ultra.

According to previous reports, Google has claimed that Gemini Ultra's performance exceeds current state-of-the-art results on 30 of 32 widely used academic benchmarks, which are among the most commonly used tests in the LLM field.

Among them, Gemini Ultra became the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), with a score of 90.0%. The test combines 57 subjects, including mathematics, physics, history, law, medicine and ethics.

Gemini Ultra also achieved a leading score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks across different domains that require deliberate reasoning.

Early next year, Google also plans to launch the new Bard Advanced, which will give users access to its best model and capabilities, namely Gemini Ultra.

