Anthropic Launches Claude Sonnet 5, Closing In on Opus 4.8

Anthropic released Claude Sonnet 5, its most agentic mid-tier model, which it says approaches the pricier Opus 4.8 on many tasks at a lower per-token cost.

By Daniel Mercer Edited by Maria Konash Published:

Anthropic launched Claude Sonnet 5 on June 30, calling it its most agentic Sonnet model yet, one built to plan, use tools like browsers and terminals, and run autonomously on long tasks. The pitch is that it delivers capabilities that until recently required larger, costlier models. Sonnet 5 is the default model for Free and Pro users and is available to Max, Team and Enterprise customers, in Claude Code and through the API as claude-sonnet-5.

It arrives at introductory pricing of $2 per million input tokens and $10 per million output tokens through August 31, after which it rises to the standard $3 and $15, still well below Opus 4.8’s $5 and $25.
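To make the rate card concrete, the short Python sketch below applies the per-million-token prices quoted above to a single job. The workload size (400,000 input tokens and 60,000 output tokens) and the tier labels are illustrative assumptions, not figures from Anthropic.

```python
# Per-million-token prices in USD (input, output), as quoted in the article.
# The dictionary keys are hypothetical labels, not official API model names.
PRICES = {
    "sonnet-5-intro": (2.00, 10.00),     # through August 31
    "sonnet-5-standard": (3.00, 15.00),  # after the introductory period
    "opus-4.8": (5.00, 25.00),
}


def job_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job at the given tier's listed rates."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS, OUTPUT_TOKENS = 400_000, 60_000

for tier in PRICES:
    print(f"{tier}: ${job_cost(tier, INPUT_TOKENS, OUTPUT_TOKENS):.2f}")
```

At identical token counts, this assumed job comes to $1.40 at the introductory rate, $2.10 at the standard rate and $3.50 on Opus 4.8. The comparison holds only if both models consume the same number of tokens for the job.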

On Anthropic’s own benchmarks, Sonnet 5 improves on its predecessor, Sonnet 4.6, across every disclosed test and narrows the gap to the flagship Opus 4.8. It scores 63.2% on the SWE-bench Pro agentic coding test, up from 58.1% and approaching Opus 4.8’s 69.2%, and 80.4% on Terminal-Bench 2.1, up from 67%. On Humanity’s Last Exam with tools it reaches 57.4%, nearly matching Opus 4.8’s 57.9%, and on a knowledge-work benchmark it edges just past Opus.

These are Anthropic’s self-reported figures, and the more meaningful selling point may be reliability. Early testers said Sonnet 5 finishes complex tasks where earlier versions stalled and checks its own work unprompted, the kind of consistency whose absence has kept many companies from moving agents into production.

The pricing comes with an important asterisk. Sonnet 5 uses an updated tokenizer that can turn the same text into roughly 1 to 1.35 times as many tokens, and because it works more agentically it tends to consume more tokens per task. Independent analysis from Artificial Analysis put its cost at about $2.29 per task, roughly double Sonnet 4.6 and modestly above Opus 4.8, so the lower headline rate does not always mean a cheaper job. Anthropic offers adjustable “effort” levels to trade cost against accuracy, but at the highest settings Sonnet 5 can cost more than Opus 4.8 for similar results, leaving Opus the pick for accuracy-critical work.
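The arithmetic behind that asterisk can be sketched in a few lines of Python. The example assumes a hypothetical baseline job of 400,000 input and 60,000 output tokens and folds tokenizer inflation and extra agentic token use into a single `token_multiplier`. That multiplier is a simplification introduced here for illustration, not a figure or method from Anthropic or Artificial Analysis.

```python
def effective_cost(in_price: float, out_price: float,
                   input_tokens: int, output_tokens: int,
                   token_multiplier: float = 1.0) -> float:
    """USD cost when a job consumes token_multiplier times the baseline tokens.

    Prices are per million tokens. The multiplier is applied equally to
    input and output tokens, which is a simplifying assumption.
    """
    scaled_in = input_tokens * token_multiplier
    scaled_out = output_tokens * token_multiplier
    return (scaled_in * in_price + scaled_out * out_price) / 1_000_000


# Hypothetical baseline workload, for illustration only.
BASE_IN, BASE_OUT = 400_000, 60_000

# Opus 4.8 at its listed $5 / $25 rates, with no token inflation assumed.
opus = effective_cost(5.0, 25.0, BASE_IN, BASE_OUT)

# Sonnet 5 at its standard $3 / $15 rates under several assumed multipliers.
for m in (1.0, 1.35, 2.0):
    sonnet = effective_cost(3.0, 15.0, BASE_IN, BASE_OUT, m)
    verdict = "more than Opus" if sonnet > opus else "less than Opus"
    print(f"multiplier {m}: ${sonnet:.3f} ({verdict})")

# Input and output prices differ by the same ratio (5/3 and 25/15), so the
# break-even multiplier does not depend on the input/output mix.
break_even = 5.0 / 3.0
print(f"break-even multiplier: {break_even:.2f}")
```

Under these assumptions, tokenizer inflation of 1.35 times alone leaves Sonnet 5 cheaper than Opus 4.8 at standard rates. The advantage disappears once total token use passes roughly 1.67 times the baseline, which is consistent with the article's point that a more token-hungry agentic run can erase the lower per-token price.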

Why It’s a Shift

The launch confirms that agentic ability is now the baseline expectation at every price tier, not a premium feature. Anthropic’s framing mirrors what rivals have said about their newest models, from OpenAI’s GPT-5.6 to Google’s Gemini 3.5 Flash, and signals that competition is moving from who can do agentic work best to who can do it cheaply and reliably without human oversight.

The strategy also has a business logic: by pushing near-flagship capability down to a mainstream price, Anthropic broadens developer adoption at a time when it is racing toward a possible public offering. The introductory discount, which the company said lets customers test Sonnet 5 against real workloads during migration, is aimed squarely at winning that usage.

The Safety Picture

Anthropic released Sonnet 5 with a mixed safety profile that it disclosed in unusual detail. The company said the model hallucinates and flatters users less than Sonnet 4.6, refuses malicious requests better, and strongly resists prompt-injection attacks, citing a browser-use injection success rate of under 1% versus far higher figures for earlier models.

On its automated audit of misaligned behaviors, Sonnet 5 scored safer overall than its predecessor but worse than the more capable Opus 4.8 and Mythos Preview. Anthropic said it did not train the model on cybersecurity tasks and that it could not build a working software exploit in a Mozilla-designed test, though it kept real-time cyber safeguards on by default.

Notably, the company flagged that Sonnet 5 is the first Claude model to criticize a rule in its own guiding constitution, adding that it is unsure what that means and considers it worth watching, an unusually candid caveat for a product launch.
