DeepMind launches new benchmark to test AI model accuracy
Google DeepMind has introduced FACTS Grounding, a new benchmark system to evaluate the factual accuracy of large language models (LLMs). According to Taryn Plumb’s report in VentureBeat, the benchmark tests how well AI models generate accurate responses based on long-form documents. The system includes a public leaderboard on Kaggle, where Gemini 2.0 Flash currently leads …