Grok 3: Elon Musk's AI accused of cheating on benchmarks

Igor Babuschkin, co-founder of xAI, Elon Musk's AI company, is defending the company's model against accusations from a competitor. The dispute centers on a graph showing Grok 3's performance on AIME 2025, a set of mathematics problems whose validity as a benchmark some experts question but which is still widely used as a reference.

Grok 3 accused of cheating on benchmarks

The graph, published on xAI's blog, shows two versions of the model, Grok 3 Reasoning Beta and Grok 3 Mini Reasoning, outperforming o3-mini-high, OpenAI's best model. But according to employees of OpenAI, the company led by Sam Altman that recently clashed with Elon Musk, the graph omits o3-mini-high's score at cons@64, short for consensus@64.

The consensus@64 measure gives a model 64 attempts at each problem in a test and keeps, as the final answer, the one that comes up most often. This tends to raise a model's score. Leaving it out of a comparison creates a misleading impression that one model is superior to another.
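To make the difference between the two scoring methods concrete, here is a minimal Python sketch of majority-vote scoring as described above. The function names (`consensus_answer`, `score_at_1`, `score_cons_at_k`) and the toy answers are illustrative assumptions; they are not taken from xAI's or OpenAI's evaluation code, and real benchmark harnesses may break ties or normalize answers differently.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the answer that appears most often among the sampled attempts."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    # most_common(1) yields [(answer, count)]; on a tie, the answer
    # encountered first wins (an assumption of this sketch).
    return Counter(samples).most_common(1)[0][0]


def score_at_1(attempts_per_problem, references):
    """Fraction of problems solved when only the first attempt counts (@1)."""
    correct = sum(
        attempts[0] == ref
        for attempts, ref in zip(attempts_per_problem, references)
    )
    return correct / len(references)


def score_cons_at_k(attempts_per_problem, references, k=64):
    """Fraction of problems solved when the majority answer over the
    first k attempts counts (cons@k)."""
    correct = sum(
        consensus_answer(attempts[:k]) == ref
        for attempts, ref in zip(attempts_per_problem, references)
    )
    return correct / len(references)


# Hypothetical toy data: 2 problems, 4 sampled answers each.
attempts = [
    ["204", "113", "204", "204"],  # first attempt already correct
    ["17", "25", "25", "25"],      # first attempt wrong, majority correct
]
references = ["204", "25"]

at_1 = score_at_1(attempts, references)
cons_at_4 = score_cons_at_k(attempts, references, k=4)
print(f"@1 = {at_1:.2f}, cons@4 = {cons_at_4:.2f}")
```

On this toy data the same model scores 0.50 at @1 and 1.00 at cons@4, which is why comparing one model's consensus score with another model's first-attempt score is not a like-for-like comparison.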

The @1 scores of Grok 3 Reasoning Beta and Grok 3 Mini Reasoning, that is, the scores from their first attempt at each problem, are actually lower than o3-mini-high's. Grok 3 Reasoning Beta also scores slightly below OpenAI's o1 model in its medium configuration. Yet xAI presents Grok 3 as “the most intelligent AI in the world”.

Igor Babuschkin defended the company on X, arguing that OpenAI has also published potentially misleading results in the past, albeit in comparisons among its own models. A user has since created a more accurate graph showing the cons@64 performance of almost all the models.

Nathan Lambert, an AI researcher, points out that a crucial metric has not been disclosed: the computational and financial cost each model requires to reach its best score. In his view, current benchmarks have limits and do a poor job of communicating the strengths, and especially the weaknesses, of AI models.
