Scale AI has published its first rankings of large language models, evaluating their performance in four specific application areas: coding, instruction following, mathematics, and multilingual tasks. OpenAI’s GPT models ranked first in three of the four areas (coding, multilingual, and instruction following), while Anthropic’s Claude 3 Opus ranked first in the remaining one (mathematics).