Safety | Page 6 of 6 | ✦ Smart Content Report

OpenAI insiders warn of dangerous corporate culture

February 5, 2025June 14, 2024

In an open letter, current and former OpenAI employees warn of a “reckless” development in the race for supremacy in artificial intelligence. They call for sweeping changes in the AI industry, including more transparency and better protection for whistleblowers. The signatories criticize a culture of secrecy and profit at any cost at OpenAI. The company …

California plans strict safety rules for AI

February 5, 2025June 14, 2024

California wants to implement strict safety rules for artificial intelligence, including a “kill switch” and reporting requirements for developers. Critics warn of barriers to innovation, excessive bureaucracy, and negative impacts on open source models that could weaken the state’s technology sector.

Inspect helps to assess AI safety

February 5, 2025May 17, 2024

The UK’s AI Safety Institute releases Inspect, an open source toolset designed to simplify the safety assessment of AI models. Inspect can be used to test the capabilities of AI models, such as core knowledge and reasoning.

New guide for secure AI systems

February 5, 2025April 19, 2024

The NSA, in collaboration with international partners, is releasing a guide to best practices for the secure deployment and operation of AI systems. The Cybersecurity Information Sheet is aimed primarily at operators of national security systems and companies in the defense industry, but is also relevant to other organizations. Source: Hacker News

Snapchat adds watermark to Snap AI images

February 5, 2025April 19, 2024

Snapchat is also focusing on greater transparency and stricter guidelines for the use of AI. Any image generated with Snap AI will be tagged with a new watermark. Source: TechCrunch

Vectorview evaluates performance and security

February 5, 2025April 5, 2024

Vectorview helps to evaluate the performance and security of language models. Targeted testing with real-world scenarios is supposed to detect and prevent unintended behavior that is often missed by generic benchmarks. Sources: TechCrunch, Y Combinator

Jailbreak with ASCII trick

February 5, 2025March 22, 2024

Researchers from Washington and Chicago have developed “ArtPrompt“, a new method to bypass security measures in language models. Using this method, chatbots such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2 can be tricked into responding to requests they are supposed to reject using ASCII art prompts. This includes advice on how to make bombs and …