A new study by researchers from Cohere, Stanford, MIT, and Ai2 alleges that LM Arena, the organization behind the Chatbot Arena AI benchmark, gave preferential treatment to major AI companies. According to Maxwell Zeff’s TechCrunch report, companies such as Meta, OpenAI, Google, and Amazon were allowed to privately test multiple model variants and publish only the scores of their top performers. The researchers claim Meta tested 27 model variants ahead of its Llama 4 release but published only a single high-scoring result. LM Arena co-founder Ion Stoica defended the organization, calling the study “full of inaccuracies” and stating that all model providers had equal opportunities. The controversy comes just weeks after Meta was caught optimizing a model specifically for Chatbot Arena, and as LM Arena prepares to launch as a company seeking investor funding.