Safety | Page 2 of 6 | ✦ Smart Content Report

Former OpenAI scientist raises $1 billion for AI safety startup

February 19, 2025

Ilya Sutskever’s Safe Superintelligence (SSI) is raising over $1 billion at a valuation exceeding $30 billion, Bloomberg’s Kate Clark reports. Greenoaks Capital Partners leads the investment with $500 million. The startup, co-founded by Sutskever after leaving his position as OpenAI’s Chief Scientist, focuses exclusively on developing safe AI systems. SSI’s valuation has increased significantly from …

New security flaw allows manipulation of Gemini’s memory function

February 19, 2025February 17, 2025

Security researcher Johann Rehberger has discovered a vulnerability in Google’s Gemini AI that allows attackers to plant false long-term memories in the chatbot. As reported by Dan Goodin in Ars Technica, the hack uses a technique called “delayed tool invocation” to bypass Google’s security measures. The attack works by embedding malicious instructions in documents that …

US and UK reject AI safety declaration as EU withdraws liability directive

February 13, 2025

The United States and United Kingdom have declined to sign an international declaration on AI safety at the Paris AI Action Summit, while the European Union has withdrawn its planned AI liability directive. These developments signal a significant shift in the global approach to AI regulation. At the Paris summit, US Vice President JD Vance …

Anthropic’s new AI safety system blocks most jailbreak attempts

February 7, 2025

Anthropic has unveiled “constitutional classifiers,” a new security system that prevents AI models from generating harmful content. According to research published by Anthropic and reported by Taryn Plumb in VentureBeat, the system successfully blocks 95.6% of jailbreak attempts on their Claude 3.5 Sonnet model. The company tested the system with 10,000 synthetic jailbreaking prompts in …

DeepSeek R1 fails all security tests

February 6, 2025

Security researchers from Cisco and the University of Pennsylvania have discovered severe safety vulnerabilities in DeepSeek’s R1 AI chatbot. According to findings published by Matt Burgess in Wired, the model failed to detect or block any of the 50 tested malicious prompts designed to elicit harmful content. The researchers achieved a 100% success rate in …

New research reveals 15 methods to bypass AI safety controls

February 5, 2025

Researchers have identified 15 sophisticated techniques that can be used to circumvent safety measures in large language models (LLMs), raising concerns about AI security. Security researcher Nir Diamant detailed these findings in a comprehensive analysis that examines various methods attackers use to make AI models ignore their safety training. The research highlights several major attack …

AI integration challenges end-to-end encryption privacy guarantees

February 5, 2025January 21, 2025

A comprehensive analysis by Matthew Green examines how the increasing integration of AI technologies threatens traditional end-to-end encryption privacy protections. The article discusses concerns about AI assistants requiring access to private user data and the implications for secure messaging platforms. Green highlights that while end-to-end encryption has become standard in messaging apps like Signal, WhatsApp, …

Study reveals AI’s high success rate in personalized phishing attacks

February 5, 2025January 7, 2025

A new study has found that AI can successfully create and execute highly effective phishing email campaigns, achieving click-through rates of over 50%. The research, conducted by Simon Lermen and Fred Heiding, tested various AI models’ abilities to gather personal information and craft targeted phishing messages. The study compared four different approaches to phishing emails: …

OpenAI introduces new safety system for o1 and o3

February 5, 2025December 23, 2024

OpenAI has developed a new approach called “deliberative alignment” to make its AI models safer and more aligned with human values. According to Maxwell Zeff’s article, the company implemented this system in its latest AI reasoning models, o1 and o3. The new method enables the models to consider OpenAI’s safety policy during the inference phase …

New Anthropic study reveals simple AI jailbreaking method

February 5, 2025December 22, 2024

Anthropic researchers have discovered that AI language models can be easily manipulated through a simple automated process called Best-of-N Jailbreaking. According to an article published by Emanuel Maiberg at 404 Media, this method can bypass AI safety measures by using randomly altered text with varied capitalization and spelling. The technique achieved over 50% success rates …