A new study reveals significant differences in how well popular AI chatbots understand and analyze written content. Geoffrey A. Fowler from The Washington Post tested five major AI tools on their ability to comprehend literature, legal documents, scientific papers, and political speeches.
The competition involved ChatGPT, Claude, Copilot, Meta AI (Llama), and Gemini answering 115 questions about four different types of documents. Expert judges in each field, including bestselling author Chris Bohjalian and cardiologist Eric Topol, evaluated the responses.
Claude emerged as the overall winner with a score of 69.9 out of 100, narrowly beating ChatGPT at 68.4. The remaining chatbots scored significantly lower, with Gemini at 49.7, Copilot at 49.0, and Llama at 45.0.
Performance varied dramatically by subject area. ChatGPT excelled at analyzing political speeches and literature, while Claude performed best with legal contracts and scientific research. All bots except Claude made factual errors or “hallucinated” information.
The study revealed consistent weaknesses across platforms. AI summaries frequently omitted important details and emphasized positive aspects of documents while glossing over negative ones. Literature proved the most challenging category, with some responses showing poor comprehension of key plot elements.
Despite some impressive analytical capabilities, none of the AI tools scored above 70 out of 100 overall. Legal expert Sterling Miller cautioned that AI cannot replace professional expertise, calling it only an "okay" fallback when professional help is unavailable.