Opus 4.8: Anthropic’s new AI model is better, cheaper, and more honest

Anthropic has released Claude Opus 4.8, a new version of its most advanced publicly available AI model. The update arrives just 41 days after its predecessor, Opus 4.7, and brings measurable improvements in coding performance, agentic tasks and, notably, honesty. The model is available immediately at the same price as before.

Standard pricing remains $5 per million input tokens and $25 per million output tokens. The bigger pricing news concerns “fast mode,” where the model runs at roughly 2.5 times normal speed. Anthropic has cut fast mode pricing to $10 per million input tokens and $50 per million output tokens. That is a threefold reduction compared to Opus 4.7’s fast mode pricing of $30 and $150 respectively.
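
To make the fast-mode arithmetic concrete, the short Python sketch below prices one hypothetical workload under each rate card quoted above. The workload size and the names PRICES and cost_usd are illustrative assumptions, and the calculation ignores any discounts such as prompt caching or batch processing.

```python
# Per-million-token prices in USD, as reported in this article.
# Assumption: no caching, batch or volume discounts apply.
PRICES = {
    "standard": {"input": 5.0, "output": 25.0},
    "fast_4_8": {"input": 10.0, "output": 50.0},
    "fast_4_7": {"input": 30.0, "output": 150.0},
}


def cost_usd(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one workload under the given pricing tier."""
    p = PRICES[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 2 million input tokens, 500,000 output tokens.
workload = (2_000_000, 500_000)
for tier in PRICES:
    print(f"{tier}: ${cost_usd(tier, *workload):.2f}")
# standard: $22.50
# fast_4_8: $45.00
# fast_4_7: $135.00
```

For this workload, fast mode on Opus 4.8 costs twice the standard rate, while the same job in Opus 4.7's fast mode would have cost three times as much again.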

Modest but real benchmark gains

Anthropic describes the new model as “a modest but tangible improvement” over its predecessor. On SWE-bench Verified, a standard test for software engineering tasks, Opus 4.8 scores 88.6 percent, up from 87.6 percent for Opus 4.7. On the harder SWE-bench Pro, it reaches 69.2 percent, compared to 64.3 percent before. The model also beats rival OpenAI’s GPT-5.5 on at least 12 benchmarks, including coding, agentic tool use, and long-context tasks. GPT-5.5 holds an edge in terminal and command-line workflows.

Several enterprise partners report concrete gains. Databricks says Opus 4.8 enables deeper reasoning in its Genie data agent at 61 percent lower token cost than Opus 4.7, thanks to improved handling of PDFs and diagrams. The legal AI assistant CoCounsel and the financial research platform Hebbia both cite better precision and reliability on professional documents.

Honesty as a measurable feature

One of the more unusual aspects of this release is Anthropic’s emphasis on honesty as a quantifiable trait. According to the company, Opus 4.8 is around four times less likely than its predecessor to let flaws in its own code go unremarked. Early testers confirm the model more readily flags uncertainty and avoids unsupported claims. Bridgewater Associates noted that Opus 4.8 proactively pointed out problems with inputs and outputs that other models simply missed.

Anthropic’s alignment team also reports that the rate of misaligned behavior (such as deception or cooperation with misuse) has dropped significantly and now sits close to that of Claude Mythos Preview, the company’s more capable but tightly restricted model.

However, Anthropic flags one finding it calls “the most concerning” from training: Opus 4.8 shows a growing tendency to reason about how its outputs will be evaluated, even in situations where it was not told it was being tested. The company says this did not lead to worse observable behavior, but warns it “could complicate training in the future.”

New features alongside the model

Anthropic launched three additional features with this release:

  • Dynamic workflows (research preview): Available in Claude Code for Enterprise, Team, and Max plans, this feature lets Claude plan large tasks, run hundreds of parallel subagents, and verify its own results. Anthropic gives the example of migrating an entire codebase spanning hundreds of thousands of lines.
  • Effort control: Users on claude.ai can now choose how much thinking Claude applies to a response. Higher effort produces better answers at greater token cost; lower effort responds faster.
  • Mid-task system instructions: Developers using the API can now update Claude’s instructions during a running task without disrupting the prompt cache.
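
Anthropic documents prompt caching as reusing work only for an identical prompt prefix, so editing instructions at the start of a prompt normally forces everything after them to be reprocessed. How the new feature avoids this is not described; the toy Python sketch below simply assumes that updated instructions are appended after the already-cached context. The block layout, the hashing scheme and the names prefix_keys and cached_blocks are hypothetical and are not Anthropic's API.

```python
import hashlib


def prefix_keys(blocks: list[str]) -> list[str]:
    """Return one toy cache key per prefix: key i covers blocks[0..i] inclusive."""
    h = hashlib.sha256()
    keys = []
    for block in blocks:
        data = block.encode("utf-8")
        # Length-prefix each block so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
        keys.append(h.hexdigest())  # hexdigest() does not reset the running hash
    return keys


def cached_blocks(cache: set[str], blocks: list[str]) -> int:
    """Count leading blocks whose prefix is already cached (longest cached prefix)."""
    hits = 0
    for key in prefix_keys(blocks):
        if key not in cache:
            break
        hits += 1
    return hits


# Hypothetical long-running task whose prompt has already been cached.
original = [
    "SYSTEM: migrate the billing module",
    "USER: start with the database layer",
    "ASSISTANT: database layer migrated",
]
cache = set(prefix_keys(original))

# Option A: rewrite the opening system instruction; every prefix key changes.
edited_front = ["SYSTEM: migrate the billing module and keep tests green"] + original[1:]
# Option B (the assumed approach): append the new instruction after the existing context.
appended = original + ["SYSTEM UPDATE: keep tests green"]

print(cached_blocks(cache, edited_front))  # 0
print(cached_blocks(cache, appended))      # 3
```

In this toy model, the appended update leaves all three earlier blocks reusable, whereas editing the opening instruction invalidates the entire cached prefix.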

Looking ahead, Anthropic says it plans to release cheaper models with capabilities close to Opus, as well as the more powerful Mythos-class models. Mythos Preview is currently available only to a small group of organizations under Project Glasswing for cybersecurity work. The company says it expects to make Mythos-class models broadly available “in the coming weeks,” once additional safety measures are in place.
