Microsoft exec explains AI safety approach and AGI limitations

Microsoft’s chief product officer for responsible AI, Sarah Bird, detailed the company’s strategy for safe AI development in an interview with Financial Times reporter Cristina Criddle. Bird emphasized that while generative AI has transformative potential, artificial general intelligence (AGI) still lacks fundamental capabilities and is not a priority for Microsoft. The company focuses instead on augmenting …

Read more

Cryptomining code found in Ultralytics AI software versions

Security researchers discovered malicious code in two versions of Ultralytics’ YOLO AI model that installed cryptocurrency mining software on users’ devices. According to Bill Toulas from Bleeping Computer, versions 8.3.41 and 8.3.42 of the popular computer vision software were compromised through a supply chain attack. Ultralytics CEO Glenn Jocher confirmed that the affected versions have …
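Until a compromised release is pulled, one practical precaution is to refuse the known-bad versions outright. A minimal sketch (the version numbers come from the report above; the helper function itself is illustrative, not part of any official tooling):

```python
# Illustrative sketch: refuse known-compromised ultralytics releases.
# The bad versions (8.3.41, 8.3.42) are those named in the report;
# the check itself is a hypothetical example, not Ultralytics tooling.
COMPROMISED_VERSIONS = {"8.3.41", "8.3.42"}

def is_compromised(version: str) -> bool:
    """Return True if the given ultralytics version is a known-bad release."""
    return version.strip() in COMPROMISED_VERSIONS

# In practice, the installed version could be read via
# importlib.metadata.version("ultralytics") and the install aborted on a match.
```

Equivalently, a dependency pin such as `ultralytics!=8.3.41,!=8.3.42` in a requirements file blocks those releases at install time.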

Read more

How Anthropic tests AI models for potential security threats

Anthropic’s Frontier Red Team, a specialized safety testing unit, has conducted extensive evaluations of the company’s latest AI model Claude 3.5 Sonnet to assess its potential dangers. As reported by Sam Schechner in The Wall Street Journal, the team led by Logan Graham runs thousands of tests to check the AI’s capabilities in areas like …

Read more

Privacy concerns arise over Apple’s AI features and settings

A recent iOS update has sparked debate about Apple’s artificial intelligence features and their privacy implications. Security journalist Spencer Ackerman, known for his work on the NSA documents with The Guardian, raised concerns about default settings in iOS 18.1 and Apple Intelligence’s data handling practices. While Ackerman worried about data being uploaded to cloud-based AI …

Read more

Study reveals visual prompt injection vulnerabilities in GPT-4V

A recent study by Lakera’s team demonstrates how GPT-4V can be manipulated through visual prompt injection attacks. As detailed by author Daniel Timbrell in his article, these attacks involve embedding text instructions within images to make AI models ignore their original programming or perform unintended actions. During Lakera’s internal hackathon, researchers successfully tested several methods, …
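A common, admittedly incomplete, mitigation for this class of attack is to scan text extracted from an image (e.g. via OCR) for instruction-like phrases before it ever reaches the model. The sketch below is purely illustrative; the pattern list and function are hypothetical and not Lakera's method:

```python
import re

# Hypothetical, naive filter: flag OCR-extracted text that resembles an
# embedded instruction ("ignore previous instructions", etc.).
# Real visual prompt injections can easily evade simple pattern lists.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (the )?(system|above) prompt",
    r"you are now",
]

def looks_like_injection(ocr_text: str) -> bool:
    """Return True if the extracted text matches any instruction-like pattern."""
    text = ocr_text.lower()
    return any(re.search(p, text) for p in SUSPICIOUS_PATTERNS)
```

Such a filter is only a first line of defense; the study's point is precisely that models act on image-borne instructions that slip past surface checks.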

Read more

AI-generated images raise concerns about research integrity

AI tools that can generate realistic images are becoming a significant concern for research integrity specialists. The ease with which these tools can create fake scientific figures that are hard to distinguish from real ones raises fears of an increasingly untrustworthy scientific literature, Nature reports. Companies like Proofig and Imagetwin are developing AI-based solutions to …

Read more

Patronus AI launches API to prevent AI hallucinations in real-time

Patronus AI, a San Francisco startup, has launched a self-serve API that detects and prevents AI failures, such as hallucinations and unsafe responses, in real time. According to CEO Anand Kannappan in an interview with VentureBeat, the platform introduces several innovations, including “judge evaluators” that allow companies to create custom rules in plain English and Lynx, …

Read more

Anthropic calls for targeted AI regulation to prevent catastrophic risks

AI startup Anthropic, known for its AI assistant Claude, is urging governments to act on AI policy within the next 18 months to mitigate the growing risks posed by increasingly powerful AI systems. In a post on its official website, the company argues that narrowly targeted regulation can help realize the benefits of AI while preventing …

Read more

Claude Computer Use enables remote code execution via prompt injection

Anthropic’s recently released Claude Computer Use feature allows Claude to control a computer by taking screenshots, running bash commands, and more. However, this also introduces severe prompt injection risks, as Claude could be exploited to run malicious code autonomously. The author of a post on “Embrace the Red” demonstrated this by crafting a malicious webpage that …
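One defense this finding suggests is restricting which web content an autonomous agent may fetch in the first place. A minimal allowlist sketch, where the domain list and function name are hypothetical examples rather than anything Anthropic ships:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: an agent harness only fetches pages from domains
# an operator has vetted, reducing exposure to attacker-crafted webpages
# like the one described in the post above.
ALLOWED_DOMAINS = {"docs.example.com", "internal.example.com"}

def may_fetch(url: str) -> bool:
    """Return True only if the URL's host is on the vetted allowlist."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_DOMAINS
```

An allowlist does not stop injection from a vetted-but-compromised page, so it complements, rather than replaces, sandboxing and human review of agent actions.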

Read more

Apple opens up Private Cloud Compute for security research

Apple has opened its Private Cloud Compute (PCC) system to security researchers, according to a post on its Security Research Blog. PCC is designed to meet compute-intensive requests for Apple Intelligence while maintaining privacy by bringing Apple’s device security model to the cloud. The company is now making available a security guide, virtual research environment, …

Read more