Anthropic develops method to analyze AI’s values in real conversations

Anthropic, the company behind the AI assistant Claude, has developed a new technique to observe and analyze how its AI expresses values during real-world conversations with users. The research, conducted by Anthropic’s Societal Impacts team, examines whether Claude adheres to the company’s goal of making it “helpful, honest, and harmless” in practice. The … Read more

Report: OpenAI reduces safety testing amid competition pressure

OpenAI has significantly shortened its safety testing period for new AI models, prompting concerns about insufficient safeguards. According to a Financial Times report by Cristina Criddle, testers now have just days to evaluate models, down from several months previously. Eight people familiar with OpenAI’s testing processes indicated that evaluations have become less thorough as the … Read more

Google releases new AI models faster than its safety reports

Google has accelerated its AI model releases without publishing corresponding safety reports. According to TechCrunch reporter Maxwell Zeff, the company has not provided safety documentation for its latest models, Gemini 2.5 Pro and Gemini 2.0 Flash, despite previous commitments to transparency. Google’s director of Gemini, Tulsee Doshi, explained that Gemini 2.5 Pro is considered “experimental,” … Read more

Anthropic reveals insights into Claude’s internal thought processes

Anthropic has published new research that sheds light on how its AI assistant Claude “thinks” internally. Two recent papers explore the model’s internal mechanisms through a novel interpretability approach the company compares to an “AI microscope.” This research reveals several surprising findings about Claude’s cognitive processes, including how it handles multiple languages, plans ahead when … Read more

AI voice cloning tools lack effective safeguards against misuse

Most AI voice cloning services have inadequate protections against nonconsensual voice impersonation, according to a Consumer Reports investigation. The study examined six leading publicly available tools and found that five had safeguards that could be easily bypassed. As reported by NBC News, four services (ElevenLabs, Speechify, PlayHT, and Lovo) merely require checking a box confirming authorization, while Resemble … Read more

Security experts warn about risks of autonomous AI agents

Enterprise security experts are raising concerns about the growing use of AI agents in business workflows. According to a VentureBeat report by Emilia David, these autonomous AI systems require access to sensitive data to function effectively, creating new security challenges for organizations. Nicole Carignan, VP of strategic cyber AI at Darktrace, warns that multi-agent systems … Read more

Former OpenAI scientist raises $1 billion for AI safety startup

Ilya Sutskever’s Safe Superintelligence (SSI) is raising over $1 billion at a valuation exceeding $30 billion, Bloomberg’s Kate Clark reports. Greenoaks Capital Partners leads the investment with $500 million. The startup, which Sutskever co-founded after leaving his position as OpenAI’s Chief Scientist, focuses exclusively on developing safe AI systems. SSI’s valuation has increased significantly from … Read more

New security flaw allows manipulation of Gemini’s memory function

Security researcher Johann Rehberger has discovered a vulnerability in Google’s Gemini AI that allows attackers to plant false long-term memories in the chatbot. As reported by Dan Goodin in Ars Technica, the hack uses a technique called “delayed tool invocation” to bypass Google’s security measures. The attack works by embedding malicious instructions in documents that … Read more

US and UK reject AI safety declaration as EU withdraws liability directive

The United States and United Kingdom have declined to sign an international declaration on AI safety at the Paris AI Action Summit, while the European Union has withdrawn its planned AI liability directive. These developments signal a significant shift in the global approach to AI regulation. At the Paris summit, US Vice President JD Vance … Read more

Anthropic’s new AI safety system blocks most jailbreak attempts

Anthropic has unveiled “constitutional classifiers,” a new security system designed to prevent AI models from generating harmful content. According to research published by Anthropic and reported by Taryn Plumb in VentureBeat, the system blocked 95.6% of jailbreak attempts on its Claude 3.5 Sonnet model. The company tested the system with 10,000 synthetic jailbreaking prompts in … Read more