New benchmark reveals leading AI models confidently produce false information
A new benchmark called Phare has found that leading large language models (LLMs) frequently generate false information with high confidence, particularly when responding to misinformation-related queries. The research, conducted by Giskard with partners including Google DeepMind, evaluated top models from eight AI labs across multiple languages. The Phare benchmark focuses on four critical domains: hallucination, bias and …