OpenAI’s o3 model scores 25% on advanced mathematics test

OpenAI’s new language model o3 has achieved a 25% success rate on FrontierMath, a challenging mathematics dataset. The result, discussed in a blog post by the Xena Project, shows both the progress and the limits of AI’s mathematical capabilities. The dataset consists of hundreds of complex mathematical problems whose answers are numerical and can therefore be verified automatically. According to Fields Medalist Terence Tao, these problems are “extremely challenging” and typically require domain expertise to solve. The achievement is significant, but Epoch AI’s Elliot Glazer clarified that about 25% of the problems are at undergraduate or International Mathematical Olympiad level, so the score does not by itself show that o3 solved the research-level problems. The results suggest that while AI is making progress in mathematical problem-solving, it remains far from matching human experts in advanced mathematics, particularly in proving theorems and providing understandable explanations.
