Large language models like those behind ChatGPT sometimes generate false information (“hallucinations”) because the evaluation systems used to grade them reward guessing over admitting uncertainty. In an official post, OpenAI reports that this incentive structure is a fundamental challenge for all current AI models.
Hallucinations can occur even with seemingly simple questions. For example, a chatbot gave three different incorrect answers when asked for the title of a researcher’s dissertation. According to OpenAI, these errors persist because current evaluation methods create the wrong incentives for the models.
The company compares the situation to a multiple-choice test. A student who guesses at random might still get the answer right, while leaving it blank guarantees a zero. Similarly, AI models are often graded only on accuracy, the percentage of questions they answer correctly. This encourages them to guess rather than state that they do not know the answer.
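To make that incentive concrete, here is a minimal Python sketch (not code from OpenAI’s post) comparing the expected score of guessing versus abstaining under accuracy-only grading; the 10% chance of a lucky guess is an arbitrary assumed value.

```python
# Illustrative sketch: expected score per question under accuracy-only grading.
# The 10% chance of a lucky guess is an arbitrary assumption.

def expected_accuracy_score(p_lucky_guess: float) -> dict:
    """Expected points when graded purely on accuracy:
    1 point for a correct answer, 0 for anything else."""
    return {
        "guess": p_lucky_guess * 1.0,  # occasionally rewarded
        "abstain": 0.0,                # "I don't know" never scores
    }

print(expected_accuracy_score(p_lucky_guess=0.10))
# {'guess': 0.1, 'abstain': 0.0} -> guessing always looks at least as good
```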
To illustrate the problem, OpenAI provides data from an evaluation. An older model achieved a slightly higher accuracy rate of 24% compared to a newer model’s 22%. However, the older model’s error rate was 75%, while the newer model made errors in only 26% of cases. The newer model achieved this by abstaining from answering 52% of the time, admitting its uncertainty.
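Plugging the reported figures into a few lines of Python makes the trade-off visible; the older model’s abstention rate and the “accuracy when answering” values are derived here from the percentages above rather than reported directly.

```python
# Reported evaluation figures (percent) as summarized above; the older model's
# abstention rate is inferred as the remainder (100 - 24 - 75).
models = {
    "older model": {"accuracy": 24, "error": 75, "abstain": 1},
    "newer model": {"accuracy": 22, "error": 26, "abstain": 52},
}

for name, m in models.items():
    attempted = m["accuracy"] + m["error"]          # share of questions actually answered
    acc_when_answering = 100 * m["accuracy"] / attempted
    print(f"{name}: accuracy={m['accuracy']}%, error={m['error']}%, "
          f"abstain={m['abstain']}%, accuracy when answering≈{acc_when_answering:.0f}%")

# Accuracy alone ranks the older model higher (24% vs 22%), even though it
# produces wrong answers on roughly three times as many questions (75% vs 26%).
```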
OpenAI argues that this shows how strategic guessing can improve accuracy scores while significantly increasing the rate of hallucinations. Because accuracy-only metrics dominate leaderboards, developers are motivated to build models that guess rather than abstain.
As a solution, OpenAI suggests a change in how models are graded. The company proposes penalizing confident errors more than expressions of uncertainty. This would involve updating the widely used, accuracy-based evaluations to discourage guessing. If scoreboards reward models for admitting their limits, developers will be more likely to adopt techniques that reduce hallucinations.
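A minimal sketch of such a grading rule, with an assumed penalty of one point per confident error (not a scoring scheme published by OpenAI), shows why it changes the incentive.

```python
# Illustrative "negative marking" sketch: confident errors cost points,
# abstaining scores zero. The penalty value is an assumption for illustration.

def expected_score(p_correct: float, wrong_penalty: float = 1.0) -> dict:
    """Expected points per question for answering vs. abstaining."""
    return {
        "answer": p_correct * 1.0 - (1 - p_correct) * wrong_penalty,
        "abstain": 0.0,
    }

for p in (0.1, 0.5, 0.9):
    s = expected_score(p)
    better = "answer" if s["answer"] > s["abstain"] else "abstain"
    print(f"confidence {p:.0%}: answer={s['answer']:+.2f}, abstain=0.00 -> {better}")
```

Under this rule, answering only pays off when the model’s confidence exceeds the break-even point set by the penalty, so admitting uncertainty becomes the rational strategy on questions it is likely to get wrong.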
The company also explains how these factual inaccuracies originate. Models learn by predicting the next word across vast amounts of text. Consistent patterns such as spelling and grammar appear so often that they are easy to learn this way. Arbitrary facts, however, follow no predictable pattern, so next-word prediction gives the model little to go on, and this is where errors tend to arise.
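A toy word-frequency count, using an invented miniature corpus that is nothing like real training data, hints at the difference.

```python
from collections import Counter, defaultdict

# Toy "corpus": grammatical patterns repeat consistently, while an arbitrary
# fact (a made-up dissertation year) appears only once.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the bird sat on the fence . "
    "the dissertation was finished in 1987 ."
).split()

# Count which word follows each word (a crude next-word predictor).
next_word = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word[prev][nxt] += 1

print(next_word["sat"])  # Counter({'on': 3})   -> consistent pattern, easy to learn
print(next_word["in"])   # Counter({'1987': 1}) -> one-off fact; any year would fit the pattern
```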
OpenAI concludes by addressing several common misconceptions. The company states that hallucinations are not inevitable, because models can be designed to abstain when uncertain. It also argues that hallucinations are no mystery: they follow from the statistical mechanisms of next-word prediction and the evaluation incentives described above, which is what makes them understandable and reducible.