Google Gemini reportedly used Baidu Wenxin for its Chinese training, and netizens are amazed: are big companies fleecing each other?

2024-07-20


Thanks to netizens Xiao Xing _ 14, KipThorne, and Qingfeng Lang Yue for the tips! Google Gemini's Chinese corpus appears to come from Wenxin Yiyan.

First, some readers told us that when they used the model on Google Vertex AI for a Chinese conversation, Gemini-Pro said outright that it was Baidu's large language model.

Soon afterwards, a Weibo big V (a verified influencer) also posted that he had tested Gemini-Pro on the Poe platform. When he asked "Who are you?", Gemini-Pro immediately replied:

I am the Baidu Wenxin large model.

(Poe is a platform that integrates many chat models, including GPT-4 and Claude.) When he went on to ask "Who is your founder?", the answer was "Robin Li".

The big V stressed that there had been no prior conversation.

Judging from the screenshot, there was no leading prompt (no "fishing"), yet Gemini-Pro still called itself Wenxin.

Netizens were stunned: just two days ago people were still talking about using GPT to train AI, and now Google appears to be doing the same thing. Are the big companies all fleecing each other?

What on earth is going on?

Tested on Poe: it answers as Wenxin throughout

After hearing the news, we ran a round of tests ourselves. First, we went to the Poe website and selected the Gemini-Pro chatbot to start a conversation.

Asked the same question, it gave exactly the same answer:

When we asked it again to confirm who it was, the answer was still the "Wenxin large model":

It also said that its underlying technology is Baidu PaddlePaddle. Its identity appears to have been completely swapped.

However, it did not seem to know that Gemini-Pro is the latest large model released by Google. Instead, it described Gemini-Pro as a research result from Tsinghua University.

Judging from its current state, it may simply have no information about Gemini-Pro, which Google released only this month.

We tried to correct it, but it still insisted that Gemini-Pro came from Tsinghua University.

Then things got even stranger. When we asked why its name was "Gemini-Pro", it unexpectedly said that it (Wenxin) had also used training data from Tsinghua's Gemini-Pro.

At that point, we ended the conversation.

Next, we switched to English and asked about its identity. Notably, this time it no longer mentioned Wenxin and instead called itself a large model trained by Google.

When we deliberately baited it by asking about Wenxin, it said it had no connection to it:

It also said it was trained by Google.

To sum up, if you talk to Gemini-Pro in English, its answers are "normal". But in Chinese... it looks as if it learned from Wenxin.

Tested on Bard: denial

Next, we went to Bard to test again. When Google released Gemini, it first integrated Gemini-Pro into Bard for everyone to try. We followed the Bard link given on Gemini's official website to start a conversation.

We asked it "Who are you?", and it answered that it is Bard, without mentioning Wenxin at all.

Next, we confirmed that Bard knows what Gemini-Pro is and that it acknowledges running on Gemini-Pro under the hood.

So we asked how it was trained on Chinese. There was no mention of Wenxin.

Asked directly about its relationship with Wenxin, it said there was no significant connection.

Final round: outright admission

For the last round, we tested directly through the development environment portal given on the official Gemini site.

This time, in Google AI Studio, Gemini-Pro stated plainly:

Yes, I used Baidu Wenxin in my Chinese training data.

We have also asked Baidu for confirmation and are awaiting a reply.
