New AI math benchmark exposes limitations in advanced reasoning
The FrontierMath benchmark, developed by Epoch AI, presents hundreds of challenging math problems that require deep reasoning and creativity to solve. Despite the growing power of AI models such as GPT-4o and Gemini 1.5 Pro, these models solve fewer than 2% of the problems, even with extensive support, according to Epoch AI. The benchmark was created …