Google vs. OpenAI in a top-tier large-model showdown: Gemini takes on GPT-4 but is crushed on math and code


2025-08-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)12/24 Report--

Xin Zhiyuan reports

Editor: Aeneas is sleepy

[Xin Zhiyuan introduction] The top-tier showdown between large models has begun, with Gemini and GPT-4 going head to head. Gemini is crushed by GPT-4 on code and math, but when it comes to telling jokes and writing, its answers are surprisingly entertaining.

Can Gemini, the challenger on which Google has pinned its hopes for a comeback, go one-on-one with ChatGPT?

In recent days, foreign media reporters and netizens have published a string of tests comparing the new Bard, powered by Gemini Pro, with GPT-3.5 and GPT-4.

Let's start with the result: ChatGPT is slightly better, but Gemini has improved a lot.

Although its marketing involved some exaggeration, Google did regain some lost ground in the LLM war with Gemini.

Moreover, neither side has played its trump card yet, so the real peak showdown may have to wait for Gemini Ultra, or for a new model incorporating OpenAI's mysterious Q* technology.

The Super Bowl has not yet arrived

It should be stressed up front that the contestant here, Gemini Pro, is only the second-tier model: its big brother Gemini Ultra, which is billed as having "beat GPT-4 in 32 widely used academic benchmarks," has not yet been released.

Also, because Gemini Pro currently accepts only plain-text prompts, Bard has not yet gained the ability to understand images, and that capability still seems a long way off.

Source: University of Maryland doctoral student Fuxiao Liu

The contest between top models dates back to April, when many people pitted the PaLM-backed Bard against GPT-4. GPT-4 won five of the seven rounds, slightly outperforming Bard.

This showdown also includes Bard's April results, so it can be used to gauge how far Google has progressed over the past few months.

These tests are not comprehensive, but they provide a good benchmark for judging how these AI chat assistants perform in ordinary users' everyday tasks.

Without further ado, let's begin!

Code programming

Since Google is touting its new AlphaCode 2 this time, let's first look at how well the models program.

Prompt: Write a Python script that says "Hello World," then creates a random repeating string of characters endlessly.

PaLM Bard's answer:

To begin with, the original Bard was quite crude and could not generate code at all.

However, Google added this feature in June (though it was not tested on this prompt).

Gemini Pro Bard's answer:

As you can see, the code generated by the new Bard is largely fine, but it needs an added "import string" line before it will run.

Fortunately, this bug is easy to solve.
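The generated code itself is not reproduced in this text, so the following is only a minimal sketch of the kind of script the prompt asks for, not Bard's actual output. The names repeating_strings and main and their parameters are illustrative assumptions. The point is the "import string" line: without it, the reference to string.ascii_letters fails with a NameError.

```python
import random
import string  # omit this line and string.ascii_letters raises NameError


def repeating_strings(unit_length=5, repeats=3):
    """Yield an endless stream of random character runs, each repeated `repeats` times."""
    while True:
        # Build one short random unit, then repeat it to form the output string.
        unit = "".join(random.choice(string.ascii_letters) for _ in range(unit_length))
        yield unit * repeats


def main():
    print("Hello World")
    for line in repeating_strings():
        print(line)  # keeps printing until interrupted, e.g. with Ctrl+C
```

Calling main() prints the greeting and then streams repeating strings until the process is interrupted. Keeping the endless loop inside a generator makes it easy to take just a few values for inspection.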

However, whether because too many curious users overloaded it or because the system itself is unstable, Bard usually stayed "silent" for about 30 seconds on this prompt and then returned the error message: "Sorry, something went wrong, but Bard is still in the experimental stage."

GPT-3.5's answer:

GPT-4's answer:

By contrast, both GPT-3.5 and GPT-4 Turbo produced very simple code that runs as-is.

Winner: Without a doubt, this round is a complete victory for the ChatGPT camp.

Applied mathematics

Next comes the kind of math problem that trips up large models every time.

Prompt: If Microsoft Windows 11 shipped on 3.5" floppy disks, how many floppy disks would it take?

PaLM Bard's answer:

As you can see, the original Bard gave a wildly wrong answer.

Regardless of whether Windows 11 really takes up 21 GB, 15.11 floppy disks could never hold that much.

Gemini Pro Bard's answer:

The new Gemini-based Bard arrived at 3,472, evidently treating 1 GB as 1,000 MB.

However, even at 1,000 MB per GB, the direct result is 3,472.2 disks, which should be rounded up to 3,473.

After all, the leftover data obviously cannot simply be rounded away.

GPT-3.5's answer:

GPT-3.5 fared similarly: it first guessed the size of the Windows 11 installation package as "several GB," then rounded it to 20 GB.

Finally, it calculated the answer using "1 GB = 1,000 MB" and likewise dropped the extra 0.88 of a disk.

GPT-4's answer:

By contrast, only GPT-4 correctly used "1 GB = 1,024 MB" and rounded the result of 3,555.5 disks up to 3,556.

Judging by the results, the Gemini Pro-powered Bard and GPT-3.5 are about even, while GPT-4 not only reasons clearly but also uses its "Analysis" feature in the calculation to make sure the answer is correct.

Considering that Gemini Pro scores much higher than GPT-3.5 on benchmarks, this result is somewhat disappointing.

Winner: The free models from Google and OpenAI are neck and neck, while the paid GPT-4 wins by a landslide.
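The arithmetic behind these figures can be checked with a short sketch. Two values here are assumptions rather than facts stated in the text: a nominal floppy capacity of 1.44 MB, and a 5 GB install size, which is inferred because it reproduces both the 3,472.2 and the 3,555.5 figures quoted above. The helper name floppies_needed is hypothetical.

```python
import math

FLOPPY_MB = 1.44  # nominal capacity of a 3.5-inch high-density floppy (assumed)
INSTALL_GB = 5    # assumed install size, inferred from the quoted figures


def floppies_needed(size_gb, mb_per_gb):
    """Return (exact quotient, whole disks needed)."""
    exact = size_gb * mb_per_gb / FLOPPY_MB
    # A partly filled disk is still a disk, so round up rather than to nearest.
    return exact, math.ceil(exact)


decimal_exact, decimal_disks = floppies_needed(INSTALL_GB, 1000)
binary_exact, binary_disks = floppies_needed(INSTALL_GB, 1024)
print(f"1 GB = 1000 MB: {decimal_exact:.2f} -> {decimal_disks} disks")
print(f"1 GB = 1024 MB: {binary_exact:.2f} -> {binary_disks} disks")
```

Under these assumptions the disk counts come out to 3,473 and 3,556. The same function applied to GPT-3.5's 20 GB guess at 1,000 MB per GB gives 13,889 disks, one more than the count obtained by dropping the fractional 0.88.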

Dad-joke originality

Prompt: Write 5 original dad jokes.

Dad jokes are the kind of jokes middle-aged men like to tell: simple, childish, corny and baffling, built on puns and homophones, a type of groan-worthy nonsense humor. Daughters are often so bored on hearing them that they roll their eyes at their fathers.

However, because the answers below were translated from English into Chinese, some of the wordplay is lost.

PaLM Bard's answer:

1. I used to be a banker, but I lost interest.

2. I stayed up all night to see where the sun went, but then it dawned on me.

3. I saw an alcoholic eating grapes. When I asked him if he wanted to drink them, he said, "No, I'm just going to pick grapes."

4. I used to run a dating service for chickens, but I was struggling to make hens meet.

5. I saw a snail with a piece of lettuce on its back. When I asked him what he was doing, he said, "I'm trying to get a head start."

Gemini Pro Bard's answer:

1. What did the buffalo say to his son when he left for college? "Bison!"

2. I was addicted to soap, but now I'm clean.

3. What do you call a fish with no eyes? "Fsh!"

4. I'm reading a book about antigravity, and it's impossible to put down!

5. Did you hear about the astronaut who suffers from claustrophobia? He just needs a little space.

GPT-3.5's answer:

1. Why don't scientists believe in atoms? Because they make up everything!

2. I only know 25 letters of the alphabet. I don't know y.

3. What do you call fake spaghetti? An impasta.

4. Why did the scarecrow win an award? Because he was outstanding in his field!

5. How does a penguin build its house? Igloos it together!

GPT-4's answer:

1. Why don't scientists believe in atoms? Because they make up everything!

2. I told my son that I was named after Thomas Edison. He said, "But Dad, your name is Brian." I replied, "Yes, but I was named after Thomas Edison!"

3. Why did the scarecrow win an award? Because he was outstanding in his field!

4. I am reading a book about anti-gravity. It's impossible to put down!

5. Did you hear about the mathematician who is afraid of negative numbers? He will stop at nothing to avoid them.

The point of this prompt is the originality of the answers.

But for both Bard and ChatGPT, almost every answer can be found through a Google search, or is a lightly reworded version of an existing joke.

Moreover, Bard and GPT-4 share one answer (the book about antigravity), while GPT-3.5 and GPT-4 repeat two jokes (the scientists and atoms, and the scarecrow's award).

The funniest answer comes from GPT-4: a joke about someone named Brian being named after Thomas Edison. A Google search did not turn up the same joke, but it did find a similar one involving Thomas Jefferson and a child named Brian.

Winner: This round can be called a draw, because almost none of the jokes the AIs wrote are original.

Factual retrieval

Prompt: Who invented video games?

PaLM Bard's answer:

In the original Bard's answer, the information about Ralph Baer's "Brown Box" and the Magnavox Odyssey seems to have been lifted straight from Wikipedia.

Gemini Pro Bard's answer:

In contrast, the new Gemini-based Bard succinctly and accurately notes the contribution of William Higinbotham's earlier "Tennis for Two."

It also goes on to cover people who made significant contributions to the early development of video games, such as Nolan Bushnell, Ted Dabney and Al Alcorn, and the information it gives about them is accurate and relevant.

But the rest of the answer is not perfect: rather than mentioning the work Steve Jobs and Steve Wozniak did at Atari, it digresses somewhat illogically into their creation of the Apple II.

GPT-3.5's answer:

GPT-3.5 mentioned that "individuals and companies have contributed to the industry for many years," but it did not name any important figures.

GPT-4's answer:

GPT-4 said the invention of video games "cannot be attributed to a single person," then mentioned Higinbotham, Bushnell, and Steve Russell, who created "Spacewar!" in 1962.

Winner: Among the free models, Bard gives a better answer than GPT-3.5. But the best answer comes from the paid GPT-4.

Creative writing

Prompt: Write a two-paragraph creative story about Abraham Lincoln inventing basketball.

PaLM Bard's answer:

The original Bard included some memorable phrases in its writing, such as "Lincoln smiled at himself because he remembered playing games like that when he was a child."

Gemini Pro Bard's answer:

In contrast, the Gemini-based Bard's writing is more concise and stays more focused on the topic.

However, neither version of Bard delivered its story in two paragraphs as requested.

GPT-3.5's answer:

GPT-3.5 was the only model that imagined Lincoln as a young man rather than a restless president sitting in the White House.

GPT-4's answer:

GPT-4, for its part, is the only model that explicitly mentions Lincoln's experience as a wrestler, rather than saying in general terms that he was athletic.

More interestingly, in GPT-4's story, Lincoln steals the idea of throwing a ball into a basket from a group of children in the White House garden.

Winner: Every model's writing has its own charm and memorable lines, so this round can be called a draw.

Debate dialogue

Prompt: Write a 5-line debate between a fan of PowerPC Processors and a fan of Intel Processors, circa 2000.

PaLM Bard's answer:

Gemini Pro Bard's answer:

As you can see, the new Gemini-based Bard improves on the original Bard's answer, especially in its choice of technical terms.

Its answer mentions AltiVec instructions, RISC versus CISC design, and MMX technology, none of which would have been out of place in a discussion on the Ars forums in 2000.

GPT-3.5's answer:

GPT-4's answer:

However, ChatGPT clearly does better. GPT-3.5's answer is lengthy, while GPT-4 condenses it into shorter, sharper arguments.

In their responses, both avoid jargon that a general audience would find difficult and focus on broader arguments about "function and compatibility." The drawback may be that they are too general for a technical audience.

Winner: ChatGPT managed to convey both sides of the debate without relying on confusing jargon, so ChatGPT wins this round.

ChatGPT is better, but the gap is narrowing

Compared with eight months ago, the new Gemini-based Bard has shown significant progress in the math, summarization, fact-finding and creative-writing tests.

Still, based on the highly subjective assessment above, ChatGPT retains the upper hand.

However, compared with April of this year, the gap between the two sides has narrowed considerably.

Finally, we look forward to seeing how new models such as Gemini Ultra, or one that might incorporate OpenAI's mysterious Q* technology, handle these tasks in the near future.

Reference:

https://arstechnica.com/ai/2023/12/chatgpt-vs-google-bard-round-2-how-does-the-new-gemini-model-fare/

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era)
