Claude Mythos: Anthropic restricts its most capable AI model over cybersecurity risks

Anthropic has introduced a new AI model it considers too dangerous to release publicly. The model, called Claude Mythos Preview, can autonomously find and exploit security vulnerabilities in software. Instead of making it widely available, Anthropic is sharing access with a coalition of more than 40 organizations as part of an initiative called Project Glasswing.

The initiative includes major technology and finance companies such as Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. Anthropic is committing up to $100 million in usage credits for the model across the effort, along with $4 million in donations to open-source security organizations.

What the model found

According to Anthropic, Claude Mythos Preview identified thousands of previously unknown security flaws, known as zero-day vulnerabilities, across every major operating system and web browser. The company says the model found these largely without human guidance.

Three examples Anthropic cited illustrate the scope of the findings:

  • A 27-year-old flaw in OpenBSD, an operating system widely used to run firewalls and critical internet infrastructure, which allowed a remote attacker to crash any machine running it simply by connecting to it.
  • A 16-year-old vulnerability in FFmpeg, a video processing library used across the internet, in a line of code that automated testing tools had examined five million times without catching the problem.
  • Several vulnerabilities in the Linux kernel, the software running most of the world’s servers, which the model chained together to escalate from ordinary user access to full control of a machine.

Anthropic says all three have since been patched. For thousands of other vulnerabilities still in the remediation process, the company is publishing cryptographic hashes of the details as a form of accountability, with plans to release the specifics once fixes are in place.
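This kind of hash-based commitment can be sketched in a few lines. The snippet below is a minimal illustration of the general commit-and-reveal idea, not Anthropic's actual scheme; the report text and salting approach are assumptions for the example.

```python
import hashlib
import secrets

def commit(report: str) -> tuple[str, str]:
    """Publish the digest now; keep the report and salt private."""
    # A random salt prevents outsiders from brute-forcing short or
    # predictable report texts against the published digest.
    salt = secrets.token_hex(16)
    digest = hashlib.sha256((salt + report).encode()).hexdigest()
    return digest, salt

def reveal_matches(digest: str, salt: str, report: str) -> bool:
    """After a fix ships, release the report and salt; anyone can verify
    the disclosure matches what was committed to earlier."""
    return hashlib.sha256((salt + report).encode()).hexdigest() == digest

# Hypothetical vulnerability report used only for illustration.
report = "Heap overflow in parse_header(), fixed in release 4.2.1"
digest, salt = commit(report)
assert reveal_matches(digest, salt, report)
assert not reveal_matches(digest, salt, "A different report")
```

Publishing the digest proves, at reveal time, that the details were fixed in advance and not edited after the fact.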

A deliberate head start for defenders

Anthropic’s stated rationale for restricting the model is that its capabilities could cause serious harm if accessible to malicious actors. At the same time, the company argues that withholding it entirely would cede the advantage to adversaries who may develop similar tools independently.

Logan Graham, who leads Anthropic’s team that tests models for dangerous capabilities, described the initiative as “the starting point for what we think will be an industry change point, or reckoning, with what needs to happen now.” Jared Kaplan, Anthropic’s chief science officer, said the goal is “both to raise awareness and to give good actors a head start on the process of securing open-source and private infrastructure and code.”

Anthropic notes that Mythos Preview was not specifically trained for cybersecurity. Its capabilities in this area emerged from broader improvements in coding and reasoning. The company says similar capabilities are likely to appear in other frontier models soon, from Anthropic and competitors alike.

Partners who tested the model ahead of the announcement report notable results. Microsoft said the model showed substantial improvements over previous versions on its internal security benchmark. CrowdStrike’s chief technology officer, Elia Zaitsev, stated that “the window between a vulnerability being discovered and being exploited by an adversary has collapsed — what once took months now happens in minutes with AI.”

Managing the volume of findings

One practical challenge the initiative faces is the sheer scale of vulnerabilities being discovered. Flooding open-source maintainers, many of whom are unpaid volunteers, with thousands of unverified bug reports could do more harm than good. Anthropic says it has built a triage pipeline to address this. Every finding is reviewed internally, and high-severity reports are validated by contracted professional security researchers before being sent to maintainers. The company says it coordinates with maintainers on a pace they can manage and aims to include a candidate patch with each report.

After the research preview period, Claude Mythos Preview will be available to participants at $25 per million input tokens and $125 per million output tokens, accessible through Anthropic’s API and the cloud platforms of Amazon, Google, and Microsoft.
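At those rates, per-run costs are easy to estimate. A quick sketch using the published prices; the token counts below are invented for illustration:

```python
# Announced preview rates: $25 per million input tokens,
# $125 per million output tokens.
INPUT_RATE = 25 / 1_000_000
OUTPUT_RATE = 125 / 1_000_000

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single run at the announced preview rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical audit: 2M tokens of source code in, 200k tokens of
# findings out -> $50 + $25 = $75.
cost = run_cost(2_000_000, 200_000)
print(f"${cost:.2f}")  # prints "$75.00"
```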

The initiative’s name draws from the glasswing butterfly, which uses transparent wings to blend into its surroundings. Kaplan said the name reflects how critical vulnerabilities have long existed in plain view inside complex software systems, invisible only because finding them required expertise and effort that AI is now beginning to replace.

Sources: Anthropic, Anthropic Red Team, VentureBeat, TechCrunch, New York Times
