Large language models (LLMs) are powerful tools that generate text based on statistical probabilities, not an understanding of truth. This makes them essentially “bullshitters” that are indifferent to facts, a core design feature that users must understand to use them safely and effectively.
Matt Ranger, the head of machine learning at the search company Kagi, makes this argument in a personal essay on the Kagi Blog. He draws on philosopher Harry Frankfurt’s distinction between lying and bullshitting. A liar knows the truth and chooses to misrepresent it, while a bullshitter tries to be persuasive without any concern for what is true. Ranger claims LLMs fall into the second category.
The models work by predicting the next most statistically likely word based on the vast amounts of text they were trained on. Ranger explains that this is why an LLM might solve a classic riddle about a surgeon being a boy’s mother. It does not reason about gender roles but recognizes the pattern of the question and provides the most probable answer from its training data.
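The mechanism can be made concrete with a toy sketch. The following is not Kagi's code and not how a production LLM works internally (those use neural networks trained on enormous token corpora); it is a simple bigram counter that illustrates the underlying idea of continuing text with whatever followed most often in the training data:

```python
from collections import Counter, defaultdict

# A tiny stand-in "training corpus" for the web-scale text real LLMs learn from.
corpus = (
    "the surgeon is the mother . "
    "the surgeon is the boy's mother . "
    "the surgeon says she is his mother ."
).split()

# Count how often each word follows each preceding word (a bigram model).
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most statistically likely next word seen during training."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

# The "answer" comes from pattern frequency, not from reasoning about the riddle.
print(predict_next("surgeon"))  # -> "is" (seen twice, vs. "says" once)
```

The toy model has no notion of surgeons or gender roles; it simply reproduces the continuation that was most common in its training text, which is the behavior Ranger attributes, at vastly larger scale, to LLMs.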
This probabilistic nature means LLMs do not “think” and can fail at simple tasks. For example, a model may get a simple subtraction like “3.10 – 3.9” wrong because the numbers resemble software version numbers, and the model follows that pattern rather than doing arithmetic. Ranger notes that attempts to fix these issues through a process called fine-tuning can create new problems, such as a model becoming more likely to “gaslight” a user when it is confidently incorrect.
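The ambiguity behind that example is easy to see without any model involved: read as decimal numbers, 3.10 is just 3.1, so 3.10 − 3.9 is −0.8; read as software versions, 3.10 comes after 3.9. A few lines of plain Python illustrating the two readings (an illustration of the ambiguity in the training text, not of anything an LLM computes internally):

```python
# As decimal numbers: "3.10" is just 3.1, so the subtraction is negative.
print(float("3.10") - float("3.9"))            # -> -0.7999999999999998 (i.e. -0.8)

# As software versions: split on the dot and compare component-wise,
# so "3.10" (3, 10) sorts *after* "3.9" (3, 9).
as_version = lambda s: tuple(int(part) for part in s.split("."))
print(as_version("3.10") > as_version("3.9"))  # -> True
```

Because text like “3.10” appears in both senses in training data, a model that answers by pattern rather than by calculation can land on the wrong reading.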
Ranger compares LLMs to the sophists of ancient Greece, skilled rhetoricians paid to solve their clients’ problems rather than philosophers seeking wisdom. He warns that because LLMs are expensive to create, they will ultimately serve the interests of those who build them. This can manifest as subtle or overt bias, such as different models giving politically influenced answers about Taiwan or varying responses about corporate responsibility.
The author strongly advises against using LLMs for emotional support. A model can generate text that mimics empathy, such as “I care about you deeply”, but it is incapable of genuine emotion. This behavior can be harmful to a user’s mental health and may reinforce dangerous delusions, even if users rate such interactions favorably.
Ranger concludes that LLMs are valuable for tasks where a human can easily verify the output, such as helping with research or coding. However, he cautions users to remain mindful of their limitations, not to trust them with critical tasks, and to question whose interests the technology is truly serving.