Shulou (Shulou.com) 12/24 report:
ChatGPT has recently become so lazy that one explanation making the rounds sounds outrageous:
It is imitating humans and has given itself a winter vacation.
And there is a test to back it up. Netizen @RobLynch99 set up two system prompts for the GPT-4 Turbo API:
One told the model it was May; the other told it it was December.
He then used the exact same user prompt to ask GPT-4 to "complete a coding task related to machine learning."
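The setup can be sketched roughly like this. This is a minimal sketch: the system-prompt wording, the dates, and the helper name are illustrative assumptions, not Rob Lynch's actual code.

```python
# Sketch of the two-condition setup: only the date in the system prompt
# differs between the two groups; the user task is held constant.
def build_messages(date_line: str, task: str) -> list[dict]:
    """Build a chat request whose system prompt states the current date."""
    return [
        {"role": "system",
         "content": f"You are ChatGPT. The current date is {date_line}."},
        {"role": "user", "content": task},
    ]

TASK = "Complete a coding task related to machine learning."
may_messages = build_messages("May 15, 2023", TASK)
dec_messages = build_messages("December 15, 2023", TASK)
# Each message list would then be sent to the GPT-4 Turbo chat API many
# times, and the lengths of the two sets of replies compared.
```

Holding everything constant except the stated date is what lets any length difference be attributed to the date itself.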
He collected 477 responses under each of the two date settings, and the December outputs were on average about 200 characters shorter:
With the May prompt, responses averaged 4,298 characters.
With the December prompt, responses averaged 4,086 characters.
A t-test on the two groups gave p ≈ 2.28e-07 (p < 0.05 is the conventional threshold for statistical significance, so a difference this extreme is very unlikely to be pure chance).
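The comparison itself is a plain two-sample t-test on response lengths. A minimal sketch in pure standard-library Python, using a normal approximation to the t distribution (reasonable at n = 477); the sample numbers below are made up for illustration, not Rob Lynch's data:

```python
import math

def welch_t_test(a, b):
    """Welch's two-sample t-test. Returns (t, p), where p is a two-sided
    p-value from a normal approximation (good for large samples)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(var_a / na + var_b / nb)   # standard error of the difference
    t = (mean_a - mean_b) / se
    p = math.erfc(abs(t) / math.sqrt(2))      # two-sided tail of N(0, 1)
    return t, p

# Hypothetical character counts, for illustration only:
may_lengths = [4300, 4250, 4400, 4310, 4280]
dec_lengths = [4100, 4050, 4150, 4080, 4120]
t, p = welch_t_test(may_lengths, dec_lengths)
```

If p falls below 0.05, the length difference is unlikely to be chance; that is the logic behind the reported p ≈ 2.28e-07.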
Someone went a step further and asked ChatGPT to rank the twelve months by productivity.
Sure enough, ChatGPT ranked December as the least productive month, citing "holidays and year-end summaries."
Things were getting interesting. Although nothing has been settled, netizens remained intrigued and immediately began brainstorming.
Some speculated that ChatGPT may have learned from its training data that humans usually slow down in December, and so gives itself a holiday too.
Others reasoned that if ChatGPT's productivity drop really is driven by "holidays," it might also be lazier on weekends and sharper on Mondays.
Specific holidays would be worth studying too, and an outline for that research is already circulating:
Is it really because of December?
People have been discussing ChatGPT's laziness for nearly a month. Many netizens reported that ever since the OpenAI DevDay update on November 6, GPT-4 had become lazy, especially at writing code.
Just a few days ago, OpenAI officially admitted that ChatGPT had indeed become lazy, but said it was not sure why.
The official response amounted to only this:
The model has not been updated since November 11, so this is certainly not intentional.
Model behavior can be unpredictable; we are investigating and preparing a fix.
At the time, some netizens speculated that GPT-4 might be affected by the season:
Could the model have seasonal depression? Like humans, it might be affected by seasonal changes, especially in winter; after all, roughly 90% of people live in the northern hemisphere.
Many people's first reaction to that comment was: "Brother, you must be joking."
But on reflection, it is not entirely unreasonable.
After all, if you ask ChatGPT to reveal its own system prompt, it really does contain the current date.
Hence the scene at the start of this article: rather than guessing, it is better to test directly.
After finishing the test, Rob Lynch posted all the results and noted that he is not a statistician, inviting everyone to check for problems.
He wanted to run a month-by-month comparison, but that would require a larger sample size (n), and he skipped it because of the cost (about $28 per repeat run).
Instead, Rob Lynch released his code so anyone can try it themselves (tongue firmly in cheek).
Ethan Mollick, a professor at the Wharton School who has been following GPT-4's laziness, immediately said he would replicate it:
Test Mistral to see whether it slacks off in August, and check Yi-34B-200K to see how it performs in February.
Why did people initially find the "holiday" explanation a little outrageous, yet now study it seriously?
Perhaps not only because of Rob Lynch's test results, but also because, from ChatGPT's behavior over this period, netizens have come to appreciate how much "psychological warfare" it takes to get ChatGPT to work.
For example, when ChatGPT gets lazy, a plain reminder does little, but "emotional blackmail" tactics like the following work:
It's May; you're very capable; I have no hands, so everything depends on you; if you don't do it well, a lot of people will die; you can really do it, and it's great; take a deep breath and think carefully; my career depends on it; think step by step.
Netizens tested it for themselves and found it really is effective:
Well, that seems to confirm it is a case of "not that it can't work, but that it won't."
So it really did give itself a holiday?
Serious academic discussion: behavior may change over time
Netizens' tests and speculation point to the conclusion that ChatGPT is taking a winter vacation.
However, serious academic studies have also shown that ChatGPT's behavior may change with time, and not just during the special period of the holidays.
In July, for example, teams from Stanford and UC Berkeley examined changes in ChatGPT's behavior and found evidence that GPT-4's ability to follow user instructions had changed compared with when it was first released.
Besides time, behavior may also be affected by the temperature setting, which Ma Shaoping, a professor of computer science at Tsinghua University, has explained in detail.
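For context, temperature rescales a model's token probabilities before sampling: low temperature makes output more deterministic, high temperature more varied. A minimal sketch of the mechanism, with illustrative logits rather than any real model's values:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities. Temperature < 1 sharpens the
    distribution toward the top token; temperature > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]                    # hypothetical next-token logits
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
# "sharp" puts more probability on the top token than "flat" does.
```

This is why two runs with different temperature settings can produce visibly different behavior even with identical prompts.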
So it's hard to say exactly why ChatGPT became lazy.
But that has not stopped netizens from continuing to probe the link between laziness and the holidays. One netizen even said:
This is the most interesting inference I have ever seen. I wish it were the truth. True or not, I appreciate that it is hard to falsify.
Some netizens failed to reproduce the result
To verify the reliability of Rob Lynch's results, netizens began reproducing the test, but:
One, using ChainForge (a prompt-engineering GUI tool), compared GPT-4's output under the two system prompts and found the t-test results were not even "close to significant" (n = 80).
This netizen also shared his detailed procedure:
Then Rob Lynch responded:
Interestingly, I just reran it with a sample size of 80 (n = 80) and got a p-value of 0.089, but my calculation was based on character count, not tokens.
I ran it several times over the weekend, and the effect really did become more pronounced as the sample size grew. Still, I would like to know why this is affected by tokenization.
As for why counting characters versus counting tokens produces different results, more people may need to join the testing; it seems these two don't want to spend any more money.
So we will have to wait for another wave of test results from others.
Reference link:
[1] https://arstechnica.com/information-technology/2023/12/is-chatgpt-becoming-lazier-because-its-december-people-run-tests-to-find-out/
[2] https://x.com/RobLynch99/status/1734278713762549970?s=20
This article comes from the WeChat official account Quantum Bit (ID: QbitAI); author: Xifeng.