Tech companies develop new AI testing methods as models outgrow existing benchmarks
Leading AI companies are creating new ways to evaluate increasingly sophisticated models as current testing methods prove inadequate. According to Cristina Criddle’s report in the Financial Times, companies including OpenAI, Microsoft, Meta, and Anthropic are developing internal benchmarks because their latest systems score above 90% accuracy on existing public tests. Meta’s generative AI …