OpenAI funded math benchmark before achieving record results

OpenAI has been revealed as the financial backer of FrontierMath, a significant AI mathematics benchmark, only after announcing its own record-breaking performance on it. According to Matthias Bastian’s report in The Decoder, OpenAI’s new o3 model solved 25.2 percent of the benchmark’s complex mathematical problems, far surpassing the 2 percent achieved by previous models. Epoch AI, the benchmark’s developer, had been contractually prevented from disclosing OpenAI’s involvement until o3’s announcement in December 2024. Tamay Besiroglu of Epoch AI acknowledged that the organization should have been more transparent about the partnership, particularly with the more than 60 mathematicians who created the test problems. While OpenAI received access to many benchmark problems before the announcement, Epoch AI maintained a private set for independent testing.
