Patronus AI launches API to prevent AI hallucinations in real time

Patronus AI, a San Francisco startup, has launched a self-serve API that detects and prevents AI failures, such as hallucinations and unsafe responses, in real time. According to CEO Anand Kannappan in an interview with VentureBeat, the platform introduces several innovations, including “judge evaluators” that let companies define custom rules in plain English, and Lynx, …

Read more

Anthropic calls for targeted AI regulation to prevent catastrophic risks

AI startup Anthropic, known for its AI assistant Claude, is urging governments to take action on AI policy within the next 18 months to mitigate the growing risks posed by increasingly powerful AI systems. In a post on its official website, the company argues that narrowly targeted regulation can help realize the benefits of AI while preventing …

Read more

Claude Computer Use enables remote code execution via prompt injection

Anthropic’s recently released Claude Computer Use feature allows Claude to control a computer by taking screenshots, running bash commands, and more. However, this also introduces severe prompt injection risks, as Claude could be exploited to run malicious code autonomously. A post on “Embrace the Red” demonstrated this: the author crafted a malicious webpage that …

Read more

Apple opens up Private Cloud Compute for security research

Apple has opened its Private Cloud Compute (PCC) system to security researchers, according to a post on its Security Research Blog. PCC is designed to handle compute-intensive requests from Apple Intelligence while maintaining privacy by bringing Apple’s device security model to the cloud. The company is now making available a security guide, virtual research environment, …

Read more

Anthropic tests its AI models for sabotage capabilities

Anthropic has developed new security assessments for AI models that test their capacity for sabotage. In a blog post, the company describes four types of tests: “human decision sabotage,” “code sabotage,” “sandbagging,” and “undermining oversight.” In human decision sabotage, the models try to trick people into making the wrong decisions without arousing suspicion. Code sabotage …

Read more

Endor Labs scores open-source AI models

Endor Labs has launched a new platform to score over 900,000 open-source AI models available on Hugging Face, focusing on security, activity, quality, and popularity. This initiative aims to address concerns regarding the trustworthiness and security of AI models, which often have complex dependencies and vulnerabilities, reports VentureBeat. Developers can query the platform about model …

Read more

Galileo evaluates AI models for business use

Galileo, an AI startup led by Vikram Chatterji, has raised $45 million in a Series B funding round, bringing its total funding to $68 million since its founding three years ago. The company focuses on evaluating AI models to ensure they function effectively and do not generate incorrect information or leak sensitive data, reports Forbes. Its product suite includes …

Read more

Test reveals compliance problems in leading AI models

A new tool for checking compliance with the EU AI Act has revealed weaknesses in leading AI models. As Martin Coulter reports for Reuters, some models from major tech companies are performing poorly in areas such as cybersecurity and discriminatory output. The “Large Language Model Checker” developed by LatticeFlow AI evaluates AI models across dozens …

Read more

Former CISO of Palantir joins OpenAI

Dane Stuckey, former CISO of Palantir, is joining OpenAI as its new CISO. According to Kyle Wiggers from TechCrunch, he will work alongside OpenAI’s head of security, Matt Knight. Stuckey announced the move on X/Twitter on Tuesday evening, emphasizing the importance of security to OpenAI’s mission. He started at Palantir in 2014 in information …

Read more

Anthropic updates AI safety policy

Anthropic has updated its AI safety policy to prevent misuse, reports VentureBeat author Michael Nuñez. The new “Capability Thresholds” define benchmarks for risky capabilities of AI models, such as bioweapons development or autonomous AI research. If a model reaches such a threshold, additional safeguards are triggered. The revised policy also sets out …

Read more