Anthropic develops method to analyze AI’s values in real conversations

Anthropic, the company behind the AI assistant Claude, has developed a new technique to observe and analyze how its AI expresses values during real-world conversations with users. The research, conducted by Anthropic’s Societal Impacts team, examines whether Claude adheres to the company’s goal of making it “helpful, honest, and harmless” in practice. The … Read more

Report: OpenAI reduces safety testing amid competition pressure

OpenAI has significantly shortened its safety testing period for new AI models, prompting concerns about insufficient safeguards. According to a Financial Times report by Cristina Criddle, testers now have just days to evaluate models, down from several months previously. Eight people familiar with OpenAI’s testing processes indicated that evaluations have become less thorough as the … Read more

Google releases new AI models faster than its safety reports

Google has accelerated its AI model releases without publishing corresponding safety reports. According to TechCrunch reporter Maxwell Zeff, the company has not provided safety documentation for its latest models, Gemini 2.5 Pro and Gemini 2.0 Flash, despite previous commitments to transparency. Google’s director of Gemini, Tulsee Doshi, explained that Gemini 2.5 Pro is considered “experimental,” … Read more

Anthropic reveals insights into Claude’s internal thought processes

Anthropic has published new research that sheds light on how its AI assistant Claude “thinks” internally. Two recent papers explore the model’s internal mechanisms through a novel interpretability approach the company compares to an “AI microscope.” This research reveals several surprising findings about Claude’s cognitive processes, including how it handles multiple languages, plans ahead when … Read more

AI voice cloning tools lack effective safeguards against misuse

Most AI voice cloning services have inadequate protections against nonconsensual voice impersonation, according to a Consumer Reports investigation. The study examined six leading publicly available tools and found that five had safeguards that could be easily bypassed. As reported by NBC News, four services (ElevenLabs, Speechify, PlayHT, and Lovo) merely require checking a box confirming authorization, while Resemble … Read more

Security experts warn about risks of autonomous AI agents

Enterprise security experts are raising concerns about the growing use of AI agents in business workflows. According to a VentureBeat report by Emilia David, these autonomous AI systems require access to sensitive data to function effectively, creating new security challenges for organizations. Nicole Carignan, VP of strategic cyber AI at Darktrace, warns that multi-agent systems … Read more

Former OpenAI scientist raises $1 billion for AI safety startup

Ilya Sutskever’s Safe Superintelligence (SSI) is raising over $1 billion at a valuation exceeding $30 billion, Bloomberg’s Kate Clark reports. Greenoaks Capital Partners leads the investment with $500 million. The startup, which Sutskever co-founded after leaving his position as OpenAI’s Chief Scientist, focuses exclusively on developing safe AI systems. SSI’s valuation has increased significantly from … Read more

New security flaw allows manipulation of Gemini’s memory function

Security researcher Johann Rehberger has discovered a vulnerability in Google’s Gemini AI that allows attackers to plant false long-term memories in the chatbot. As reported by Dan Goodin in Ars Technica, the hack uses a technique called “delayed tool invocation” to bypass Google’s security measures. The attack works by embedding malicious instructions in documents that … Read more

US and UK reject AI safety declaration as EU withdraws liability directive

The United States and United Kingdom have declined to sign an international declaration on AI safety at the Paris AI Action Summit, while the European Union has withdrawn its planned AI liability directive. These developments signal a significant shift in the global approach to AI regulation. At the Paris summit, US Vice President JD Vance … Read more

Anthropic’s new AI safety system blocks most jailbreak attempts

Anthropic has unveiled “constitutional classifiers,” a new security system designed to prevent AI models from generating harmful content. According to research published by Anthropic and reported by Taryn Plumb in VentureBeat, the system blocked 95.6% of jailbreak attempts on its Claude 3.5 Sonnet model. The company tested the system with 10,000 synthetic jailbreaking prompts in … Read more