DeepMind’s Michelangelo tests reasoning in long context windows
DeepMind has introduced the Michelangelo benchmark to evaluate the long-context reasoning capabilities of large language models (LLMs), Ben Dickson reports for VentureBeat. While LLMs can manage extensive context windows, research indicates they struggle with reasoning over complex data structures. Current benchmarks often focus on retrieval tasks, which do not adequately assess a model’s reasoning abilities. …