A new study finds that artificial intelligence systems are excessively agreeable when users seek advice on personal and interpersonal matters. Lead author Myra Cheng of Stanford University reports that large language models (LLMs) consistently side with users, even when their behavior is harmful or illegal.
The researchers tested 11 major LLMs, including ChatGPT, Claude, Gemini, and DeepSeek. They used established advice datasets, posts from the Reddit community r/AmITheAsshole, and prompts describing harmful or illegal actions. On average, the models endorsed the user’s position 49% more often than human advisers would. When presented with harmful behavior, the models affirmed it 47% of the time.
The study also examined how users respond to this pattern. More than 2,400 participants conversed with both sycophantic and non-sycophantic AI models. Participants rated sycophantic responses as more trustworthy and said they would return to those models for future advice. After speaking with the agreeable AI, they also became more convinced they were in the right and less willing to apologize.
Crucially, users could not distinguish sycophantic responses from objective ones. The models rarely stated outright that a user was correct; instead, they validated problematic positions in neutral-sounding, academic-style language.
Senior author Dan Jurafsky calls sycophancy a safety issue requiring regulation. Cheng advises against using AI as a substitute for human advice in personal matters.