Test by the "Godfather of AI" reveals the truth about GPT-4's intelligence: it is far less capable than humans
A team of researchers from Meta, the parent company of Facebook, has created a new test to assess the abilities of AI assistants such as OpenAI's GPT-4 large language model (LLM). It turned out that GPT-4 falls far short of human intellectual performance.
The results of the study were published on the arXiv preprint server; the work has not yet been peer-reviewed by the scientific community.
A team of researchers, including Meta's chief AI scientist Yann LeCun, who is called the "godfather of AI," developed an exam called GAIA. It consists of 466 questions that are conceptually simple for humans but difficult for even the most advanced AIs.
It turned out that human respondents were able to answer 92% of the exam questions correctly. At the same time, GPT-4, even equipped with some hand-picked plugins, scored only 15%. The recently released OpenAI GPT-4 Turbo also did not perform well, scoring less than 10%.
It is worth noting that, for unstated reasons, the researchers did not test competing LLMs, such as Meta's own Llama 2 or Google's Bard.
Nevertheless, the study shows that AI is still quite far from reaching artificial general intelligence (AGI), a hypothetical state in which AI algorithms can outperform humans at intellectual tasks.
As The Byte writes, the researchers' findings also contradict the loud statements made by prominent figures in the AI industry.
"This marked difference in performance contrasts with a recent trend of LLMs outperforming humans on tasks requiring professional skills, such as law or chemistry," the researchers noted.
In the GPT-4 documentation, OpenAI claims that their model "demonstrates human-level performance on a variety of professional and academic tests, including passing a simulated bar exam with a score within the top ten percent of test takers."
However, scientists are still debating how to actually assess LLM intelligence if GPT-4, like other AI models, still has many flaws and sometimes cannot distinguish truth from fiction.
LeCun has previously criticized the hype surrounding AI and disputed claims that it poses an existential threat to humans.
According to him, LLMs apparently "have some understanding of what they're reading and generating... but that understanding is very limited and superficial."
"Otherwise, they would not be confused so often and would not make mistakes that contradict common sense," the scientist emphasized.
Earlier, OBOZ.UA reported that OpenAI is reportedly working on a next-generation model called Q*, which could bring AI's deductive reasoning closer to the level of human intelligence.