
Six of the most advanced AI models failed research-level math tests: the best result turned out to be only 2%

Inna Vasilyuk, News

Mathematicians developed new problems to test the reasoning skills of six of the most advanced artificial intelligence models. However, the models failed almost all of the tests.

Modern AI models struggle with research-level math problems. Even the most advanced AI systems solved only 2% of the hundreds of problems they were given, LiveScience writes.

According to the Epoch AI research institute, mathematicians with doctoral degrees usually need hours or days to solve problems of this kind, and the most advanced AI models got fewer than 2% of them right.

A number of AI tests have been developed over the past decade, and in many cases, AI models easily pass these tests, scientists say. For example, in the standard MMLU (Measuring Massive Multitask Language Understanding) test, modern AI models answer 98% of math problems correctly.

Most of these tests are aimed at checking the ability of artificial intelligence to perform high school and college-level math, writes Elliot Glazer, a mathematician at Epoch AI.

However, a new set of tests called FrontierMath is aimed at a higher level of reasoning. Epoch AI developed the questions with the help of math professors. According to the developers, the tests cover a wide range of subfields, from number theory to algebraic geometry.

The scientists' findings show that artificial intelligence models currently lack research-level mathematical reasoning. However, as AI develops, these benchmark tests will provide a way to see whether the models' reasoning abilities are deepening.

"By regularly evaluating state-of-the-art models and collaborating with the AI research community, we aim to deepen our understanding of AI’s capabilities and limitations," the team of scientists said.
