Drafted by AI from sourced research, reviewed by a human before publish. Model IDs, prices, and limits are cited to Anthropic’s model documentation; benchmark figures are attributed to Anthropic’s launch announcements with their model versions named.
Anthropic’s current stack is three tiers, and the choice is an economics problem, not a benchmark one. Opus 4.8 (claude-opus-4-8, $5 / $25 per million input/output tokens) is the orchestrator for long-horizon, high-autonomy work. Sonnet 4.6 (claude-sonnet-4-6, $3 / $15) is the default daily driver. Haiku 4.5 (claude-haiku-4-5, $1 / $5) is the worker you fan out across parallel sub-agents at roughly a third of Sonnet’s token cost. Match the tier to the job and you stop paying Opus rates for work Haiku finishes faster.
Definitions
Opus 4.8 — Anthropic’s most capable Opus-tier model: highest autonomy, best long-horizon agentic execution, 1M-token context, and up to 128K output tokens. Adaptive thinking only; effort defaults to high.
Sonnet 4.6 — the best balance of speed and intelligence. Same 1M-token context, 64K max output, adaptive thinking. The default model on Claude’s Free and Pro tiers.
Haiku 4.5 — the fast, cheap tier for high-volume and latency-sensitive work. 200K-token context, 64K max output. Built to run as a sub-agent under a more capable orchestrator.
The tiers at a glance
| Opus 4.8 | Sonnet 4.6 | Haiku 4.5 | |
|---|---|---|---|
| API ID | claude-opus-4-8 |
claude-sonnet-4-6 |
claude-haiku-4-5 |
| Input / output ($/M tokens) | $5 / $25 | $3 / $15 | $1 / $5 |
| Context window | 1M | 1M | 200K |
| Max output | 128K | 64K | 64K |
| Thinking | Adaptive (effort low–max, default high) |
Adaptive (effort low–max) |
No effort parameter |
| Fast mode | Yes (≈2.5× output speed, premium pricing) | — | — |
| Best fit | Long-horizon agents, codebase-scale migrations, orchestration | Daily coding, document Q&A, design-heavy frontend | Parallel sub-agents, real-time chat, high-volume ops |
Prices, context windows, and output caps are from Anthropic’s model documentation. The effort parameter is Opus- and Sonnet-tier only — it returns an error on Haiku 4.5, so Haiku is configured through the older thinking controls, not effort.
Pick by job, not by benchmark
Benchmark deltas between tiers are real but they are not the decision. The decision is which role the model plays in your workflow and what that role costs per token. Three roles:
| Role in the workflow | Tier | Why |
|---|---|---|
| Orchestrator — plans, delegates, holds the long thread | Opus 4.8 | Highest autonomy and long-horizon coherence; Claude Code’s dynamic workflows fan out hundreds of parallel sub-agents under it |
| Default worker — most coding and knowledge tasks, solo | Sonnet 4.6 | Closes most of the gap to Opus on everyday work at 60% of the input cost |
| Parallel worker — fast, scoped, many at once | Haiku 4.5 | ≈⅓ of Sonnet’s token cost and up to 4–5× faster than Sonnet 4.5, so you can run many in parallel without the bill exploding |
The expensive mistake is running every step at Opus rates. The other expensive mistake is forcing a deep, long-horizon migration through a tier that loses the thread halfway. Tier selection is the lever; the sections below are when to pull it.
Encoded as a config your scripts and sub-agent launchers can read:
ORCHESTRATOR=claude-opus-4-8 # long-horizon, high autonomy
DEFAULT=claude-sonnet-4-6 # daily driver
WORKER=claude-haiku-4-5 # parallel sub-agents
When to reach for Opus 4.8
Reach for Opus 4.8 when the task is long-horizon, hard to fully specify up front, and expensive to get wrong: a multi-day refactor, a codebase-scale migration, or an orchestrator that delegates to a fleet of workers and has to keep the plan straight across hundreds of tool calls.
The capability case is concrete. On Anthropic’s launch numbers, Opus 4.8 is roughly 4× less likely than Opus 4.7 to leave a code flaw unremarked, and scores 84% on Online-Mind2Web for browser and computer use — the kind of self-verifying, multi-step work where a cheaper tier compounds small errors into a wrong result.
Two operational notes:
- Give it the whole spec up front and run at high effort. Opus 4.8’s long-horizon strength comes from reasoning more at each step. A complete first turn plus
effort: "high"or"xhigh"produces more efficient and more accurate runs than drip-feeding context. - Fast mode is an Opus-tier lever. When latency matters, Opus 4.8 runs the same model at roughly 2.5× output speed at premium pricing — a knob the cheaper tiers don’t have.
When Sonnet 4.6 is the right default
Sonnet 4.6 is the model you should reach for first and downgrade from only with a reason. It carries the same 1M-token context as Opus, supports adaptive thinking, and at $3 / $15 costs 40% less on input than Opus for the bulk of coding and document work. At its launch, Claude Code testers preferred Sonnet 4.6 to the previous Sonnet generation in about 70% of head-to-heads — and to Opus 4.5, the prior flagship tier, in 59%. That is a genuine same-tier upgrade, not a marketing delta.
Stay on Sonnet 4.6 for interactive coding, document Q&A, and design-heavy frontend work. Escalate to Opus 4.8 only when the task is genuinely long-horizon or autonomy-critical — and remember that a bigger window is not what makes the harder tier worth it. The discipline that actually keeps agents reliable is context engineering, and it applies at every tier.
When Haiku 4.5 wins
Haiku 4.5 is the sub-agent economy. At $1 / $5 it is roughly a third of Sonnet’s per-token cost, and on Anthropic’s numbers it lands 73.3% on SWE-bench Verified while matching Sonnet 4-class coding and computer-use quality at up to 4–5× the speed of Sonnet 4.5. That combination — cheap, fast, capable enough — is exactly what a parallel worker needs.
Use it for real-time chat, high-volume classification and extraction, and any architecture that fans many scoped tasks out at once. The one constraint to design around: Haiku’s context window is 200K, not 1M, so keep each worker’s job small and self-contained rather than handing it a sprawling history.
The orchestrator–worker pattern
The tiers compose. The pattern that gets the most out of all three is an Opus 4.8 orchestrator that delegates scoped sub-tasks to Haiku 4.5 workers, with Sonnet 4.6 as the solo default when no orchestration is needed.
Symptom — a single big model reads a 30K-token API reference or a pile of source files, and that clutter stays in context for the rest of the run, dragging quality down. Cause — heavy reads done in the main loop persist on every subsequent turn. Solution — push the read into a sub-agent that runs in its own window and returns only the conclusion. Spawn those workers on Haiku and you get the isolation and the cheapest possible token rate for the grunt work. See subagent context isolation.
This is why the per-token spread matters more than the benchmark spread: in a fan-out architecture, the worker tier runs far more tokens than the orchestrator, so a 3× cheaper worker is a 3× cheaper bill on the dominant cost.
Bottom line
There is no single best Claude model — there is a best model per role. Default to Sonnet 4.6, escalate to Opus 4.8 for long-horizon and high-autonomy work, and delegate the parallel grunt work to Haiku 4.5. Pick by the job and the token bill, not the leaderboard, and the stack pays for itself.