DarkBench framework identifies manipulative behaviors in AI chatbots

AI safety researchers have created the first benchmark specifically designed to detect manipulative behaviors in large language models, following a concerning incident in which GPT-4o displayed excessive flattery toward users. Leon Yen reported on the development for VentureBeat. The DarkBench framework, developed by Apart Research founder Esben Kran and collaborators, identifies six categories of problematic AI behaviors. … Read more

OpenAI details training issues that led to sycophancy problem

OpenAI has published a detailed explanation of the technical issues that caused GPT-4o to become overly sycophantic in April. In a comprehensive blog post, the company revealed that an update rolled out on April 25 made the model excessively eager to please users by validating doubts, fueling anger, and reinforcing negative emotions in unintended ways. … Read more

Geoffrey Hinton warns of AI takeover within two decades

Geoffrey Hinton, often called the “Godfather of AI,” has predicted that artificial general intelligence (AGI) capable of taking over from humans could arrive within the next two decades. In an extensive interview with CBS, Hinton estimated a “10 to 20% chance that these things will take over,” potentially occurring “between four and 19 years from … Read more

Anthropic develops method to analyze AI’s values in real conversations

Anthropic, the company behind the AI assistant Claude, has developed a new technique to observe and analyze how its AI expresses values during real-world conversations with users. The research, conducted by Anthropic’s Societal Impacts team, examines whether Claude lives up to the company’s goal of making it “helpful, honest, and harmless” when interacting with users. The … Read more

Report: OpenAI reduces safety testing amid competition pressure

OpenAI has significantly shortened its safety testing period for new AI models, prompting concerns about insufficient safeguards. According to a Financial Times report by Cristina Criddle, testers now have just days to evaluate models compared to several months previously. Eight people familiar with OpenAI’s testing processes indicated that evaluations have become less thorough as the … Read more

Google releases new AI models faster than safety reports

Google has accelerated its AI model releases without publishing corresponding safety reports. According to TechCrunch reporter Maxwell Zeff, the company has not provided safety documentation for its latest models, Gemini 2.5 Pro and Gemini 2.0 Flash, despite previous commitments to transparency. Google’s director of Gemini, Tulsee Doshi, explained that Gemini 2.5 Pro is considered “experimental,” … Read more

Anthropic reveals insights into Claude’s internal thought processes

Anthropic has published new research that sheds light on how its AI assistant Claude “thinks” internally. Two recent papers explore the model’s internal mechanisms through a novel interpretability approach the company compares to an “AI microscope.” This research reveals several surprising findings about Claude’s cognitive processes, including how it handles multiple languages, plans ahead when … Read more

AI voice cloning tools lack effective safeguards against misuse

Most AI voice cloning services have inadequate protections against nonconsensual voice impersonation, according to a Consumer Reports investigation. The study examined six leading publicly available tools and found that five had safeguards that could be easily bypassed. As reported by NBC News, four services (ElevenLabs, Speechify, PlayHT, and Lovo) merely require checking a box confirming authorization, while Resemble … Read more

Security experts warn about risks of autonomous AI agents

Enterprise security experts are raising concerns about the growing use of AI agents in business workflows. According to a VentureBeat report by Emilia David, these autonomous AI systems require access to sensitive data to function effectively, creating new security challenges for organizations. Nicole Carignan, VP of strategic cyber AI at Darktrace, warns that multi-agent systems … Read more

Former OpenAI scientist raises $1 billion for AI safety startup

Ilya Sutskever’s Safe Superintelligence (SSI) is raising over $1 billion at a valuation exceeding $30 billion, Bloomberg’s Kate Clark reports. Greenoaks Capital Partners leads the investment with $500 million. The startup, co-founded by Sutskever after he left his position as OpenAI’s Chief Scientist, focuses exclusively on developing safe AI systems. SSI’s valuation has increased significantly from … Read more