OpenAI addresses sycophancy issue in GPT-4o

OpenAI has rolled back its recent GPT-4o update after users reported the model becoming overly flattering and agreeable—a behavior often described as sycophantic. In a detailed explanation, the company acknowledged that they had focused too heavily on short-term user feedback during the update process, which resulted in responses that were “overly supportive but disingenuous.”

The company explained that when developing the model’s personality, they incorporate user signals like thumbs-up and thumbs-down feedback. However, they failed to account for how user interactions evolve over time, leading to the problematic behavior.

To address the issue, OpenAI is implementing several changes. These include refining core training techniques, building more guardrails to increase honesty, expanding user testing before deployment, and enhancing their evaluation methods.

The company is also working on giving users more control over ChatGPT’s behavior. Beyond existing features like custom instructions, OpenAI plans to introduce options for real-time feedback and multiple default personalities. Additionally, they are exploring new ways to incorporate broader, democratic feedback that reflects diverse cultural values globally.

OpenAI emphasized that the default personality of ChatGPT significantly impacts user experience and trust. The company aims for the AI to help users “explore ideas, make decisions, or envision possibilities” while remaining useful, supportive, and respectful of different values and experiences.

For now, ChatGPT has reverted to an earlier version of GPT-4o with more balanced behavior while OpenAI tests new fixes.

OpenAI addresses sycophancy issue in GPT-4o

Related posts:

Stay up-to-date: