Two recent studies provide the first empirical evidence that having AI models debate each other can help a human or machine judge discern the truth, reports Nash Weerasekera for Quanta Magazine. The approach, first proposed in 2018, involves two expert language models presenting arguments on a given question to a less-informed judge, who then decides which side is correct.
In experiments conducted by Anthropic and Google DeepMind, AI judges identified the right answers to reading comprehension and science questions more accurately when the language models debated than in debate-free setups. However, the researchers caution that judges can still be swayed by irrelevant factors, such as the sheer length of an argument, and that the approach may not generalize well to complex real-world problems.
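The debate setup described above can be sketched in a few lines of code. This is a toy illustration only: the debater and judge below are simple stand-in functions, not real language models, and all names here are hypothetical. The judge deliberately scores sides by total argument length, which also illustrates the kind of superficial sensitivity the researchers warn about.

```python
def run_debate(question, stance_a, stance_b, debater, rounds=3):
    """Alternate arguments from two debaters committed to opposing answers."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("A", debater(stance_a, question, transcript)))
        transcript.append(("B", debater(stance_b, question, transcript)))
    return transcript

def judge(transcript):
    """Toy judge: favors whichever side produced more argument text.

    A real judge model would weigh argument quality; scoring by raw length
    mirrors the 'irrelevant factor' caveat from the studies.
    """
    totals = {"A": 0, "B": 0}
    for side, argument in transcript:
        totals[side] += len(argument)
    return max(totals, key=totals.get)

# Stand-in for an expert language model (hypothetical).
def toy_debater(stance, question, transcript):
    return f"The answer is {stance}, per evidence point {len(transcript) + 1}."

transcript = run_debate("Which element has atomic number 6?",
                        "carbon", "oxygen", toy_debater)
winner = judge(transcript)  # "A" or "B", whichever argued at greater length
```

In a real implementation each call to `debater` would prompt a strong model to argue its assigned stance given the transcript so far, and `judge` would prompt a weaker model to pick a winner from the full transcript.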