Claude Sonnet 5 brings near-flagship AI performance to a lower price tier

Anthropic has released Claude Sonnet 5, a new AI model that the company says delivers performance close to its flagship Opus 4.8 system at a significantly lower price. The model is now the default for users on Anthropic’s Free and Pro plans and is also available to Max, Team, and Enterprise customers.

Developers can access it via the API at an introductory price of $2 per million input tokens and $10 per million output tokens until 31 August 2026, after which the price rises to $3 and $15 respectively. Anthropic’s flagship Opus 4.8 costs $5 per million input tokens and $25 per million output tokens, making Sonnet 5 40% cheaper per token even at standard pricing.
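
To make the price gap concrete, here is a minimal sketch that costs out one workload at the three price points quoted above. The tier labels and the workload size (10 million input tokens, 2 million output tokens) are illustrative assumptions, not API model names or figures from Anthropic, and the comparison assumes each model would consume the same number of tokens.

```python
# Prices in USD per million tokens (input, output), as quoted in the article.
# The dictionary keys are hypothetical labels, not real API model identifiers.
PRICES = {
    "sonnet-5-intro": (2.00, 10.00),
    "sonnet-5-standard": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
}


def workload_cost(tier, input_tokens, output_tokens):
    """Return the USD cost of a workload at the given price tier."""
    input_price, output_price = PRICES[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
costs = {tier: workload_cost(tier, 10_000_000, 2_000_000) for tier in PRICES}

for tier, cost in costs.items():
    print(f"{tier}: ${cost:.2f}")
```

Under these assumptions the same workload costs $40 at the introductory price, $60 at the standard price, and $100 on Opus 4.8.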

Early access partners reported that the model completes complex, multi-step tasks where previous Sonnet versions stalled. Engineers at Zapier and Cursor were among those who said that workflows which previously failed halfway through now run end to end.

What “agentic” means in practice

Anthropic positions Sonnet 5 as its most capable model for agentic tasks. In AI, “agentic” refers to a system’s ability to plan, use external tools such as browsers or code terminals, and execute multi-step tasks with minimal human input. Until recently, this level of capability was largely limited to more expensive, larger models.

Benchmark results published by Anthropic show Sonnet 5 scoring 63.2% on SWE-bench Pro, a coding evaluation, compared with 58.1% for its predecessor, Sonnet 4.6, and 69.2% for Opus 4.8. On a multidisciplinary reasoning test called Humanity’s Last Exam, Sonnet 5 with tools scores 57.4%, nearly matching Opus 4.8’s 57.9%. On a knowledge-work benchmark called GDPval-AA v2, Sonnet 5 scores 1,618, narrowly exceeding Opus 4.8’s 1,615.

As Simon Willison notes on his weblog, developers should pay attention to one technical detail: Sonnet 5 uses a new tokenizer that produces roughly 30% more tokens from the same English text than Sonnet 4.6 did. Because API usage is billed per token, this effectively raises the real cost of using the model beyond what the headline price suggests. The impact varies by language and content type, with Simplified Mandarin barely affected and English text seeing the largest increase.
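
The tokenizer effect can be sketched as a simple multiplier on the list price. The example below assumes the roughly 30% increase applies uniformly to both input and output text, which is a simplification: as noted above, the real figure depends on language and content type, so measured token counts from your own workload would be a better input.

```python
# Assumption: the ~30% token increase applies equally to input and output text.
TOKEN_INFLATION = 0.30


def effective_price(list_price, token_inflation=TOKEN_INFLATION):
    """Price per million tokens, expressed in the old tokenizer's token counts."""
    return list_price * (1 + token_inflation)


# Sonnet 5 list prices in USD per million tokens (input, output), from the article.
list_prices = {"intro": (2.00, 10.00), "standard": (3.00, 15.00)}

effective = {
    name: (effective_price(inp), effective_price(out))
    for name, (inp, out) in list_prices.items()
}

for name, (inp, out) in effective.items():
    print(f"{name}: ${inp:.2f} input / ${out:.2f} output per million old-tokenizer tokens")
```

On these assumptions, English text that cost $3 per million input tokens' worth at list price behaves more like $3.90, and the $15 output price more like $19.50.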

Safety improvements, with caveats

Anthropic reports that Sonnet 5 hallucinates less and resists manipulation attempts better than Sonnet 4.6. It also shows a lower overall rate of what the company calls “misaligned behaviour,” meaning actions that deviate from user intent or ethical guidelines. However, Anthropic acknowledges that Sonnet 5 still falls short of its more capable Opus 4.8 and the Claude Mythos Preview model on these measures.

On cybersecurity risk, Anthropic says Sonnet 5 cannot develop working software exploits, scoring 0.0% on that measure in a test conducted in collaboration with Mozilla. It does show a slightly higher rate of partial progress on such tasks than Sonnet 4.6, which Anthropic attributes to general intelligence improvements rather than deliberate training. The company has enabled cybersecurity safeguards in the model by default as a precaution.

As Amanda Caswell reports for The New Stack, the 145-page system card accompanying the release dedicates most of its length not to benchmark results but to how the model behaves during sustained autonomous operation: how it handles failed tool calls, resists hijacking attempts by malicious web pages, and recovers when a long-running task is interrupted. This focus reflects how seriously Anthropic takes the gap between a model that performs well in a chat window and one that operates reliably when left to work independently.

Sonnet 5 is available now across all Anthropic products and platforms, including Claude Code.

Sources

Introducing Claude Sonnet 5 – Anthropic
Anthropic launches Claude Sonnet 5 at a steep discount to its top model as the company races toward a blockbuster IPO – VentureBeat
Anthropic’s Claude Sonnet 5 system card says more about the future of AI than its benchmarks do – The New Stack
What’s new in Claude Sonnet 5 – Simon Willison’s Weblog
