Skip to content
AGENTIC_ENG
Claude

Pick the right Claude tier for agent workflows

AUTHOR:
Bartłomiej Krupa
PUBLISHED:
2026.06.30
EST_READ:
7 min

Drafted by AI from sourced research, reviewed by a human before publish. Model IDs, prices, and limits are cited to Anthropic’s model documentation; benchmark figures are attributed to Anthropic’s launch announcements with their model versions named.

Anthropic’s current stack is three tiers, and the choice is an economics problem, not a benchmark one. Opus 4.8 (claude-opus-4-8, $5 / $25 per million input/output tokens) is the orchestrator for long-horizon, high-autonomy work. Sonnet 4.6 (claude-sonnet-4-6, $3 / $15) is the default daily driver. Haiku 4.5 (claude-haiku-4-5, $1 / $5) is the worker you fan out across parallel sub-agents at roughly a third of Sonnet’s token cost. Match the tier to the job and you stop paying Opus rates for work Haiku finishes faster.

Definitions

Opus 4.8 — Anthropic’s most capable Opus-tier model: highest autonomy, best long-horizon agentic execution, 1M-token context, and up to 128K output tokens. Adaptive thinking only; effort defaults to high.

Sonnet 4.6 — the best balance of speed and intelligence. Same 1M-token context, 64K max output, adaptive thinking. The default model on Claude’s Free and Pro tiers.

Haiku 4.5 — the fast, cheap tier for high-volume and latency-sensitive work. 200K-token context, 64K max output. Built to run as a sub-agent under a more capable orchestrator.

The tiers at a glance

Opus 4.8 Sonnet 4.6 Haiku 4.5
API ID claude-opus-4-8 claude-sonnet-4-6 claude-haiku-4-5
Input / output ($/M tokens) $5 / $25 $3 / $15 $1 / $5
Context window 1M 1M 200K
Max output 128K 64K 64K
Thinking Adaptive (effort lowmax, default high) Adaptive (effort lowmax) No effort parameter
Fast mode Yes (≈2.5× output speed, premium pricing)
Best fit Long-horizon agents, codebase-scale migrations, orchestration Daily coding, document Q&A, design-heavy frontend Parallel sub-agents, real-time chat, high-volume ops

Prices, context windows, and output caps are from Anthropic’s model documentation. The effort parameter is Opus- and Sonnet-tier only — it returns an error on Haiku 4.5, so Haiku is configured through the older thinking controls, not effort.

Pick by job, not by benchmark

Benchmark deltas between tiers are real but they are not the decision. The decision is which role the model plays in your workflow and what that role costs per token. Three roles:

Role in the workflow Tier Why
Orchestrator — plans, delegates, holds the long thread Opus 4.8 Highest autonomy and long-horizon coherence; Claude Code’s dynamic workflows fan out hundreds of parallel sub-agents under it
Default worker — most coding and knowledge tasks, solo Sonnet 4.6 Closes most of the gap to Opus on everyday work at 60% of the input cost
Parallel worker — fast, scoped, many at once Haiku 4.5 ≈⅓ of Sonnet’s token cost and up to 4–5× faster than Sonnet 4.5, so you can run many in parallel without the bill exploding

The expensive mistake is running every step at Opus rates. The other expensive mistake is forcing a deep, long-horizon migration through a tier that loses the thread halfway. Tier selection is the lever; the sections below are when to pull it.

Encoded as a config your scripts and sub-agent launchers can read:

ORCHESTRATOR=claude-opus-4-8   # long-horizon, high autonomy
DEFAULT=claude-sonnet-4-6      # daily driver
WORKER=claude-haiku-4-5        # parallel sub-agents

When to reach for Opus 4.8

Reach for Opus 4.8 when the task is long-horizon, hard to fully specify up front, and expensive to get wrong: a multi-day refactor, a codebase-scale migration, or an orchestrator that delegates to a fleet of workers and has to keep the plan straight across hundreds of tool calls.

The capability case is concrete. On Anthropic’s launch numbers, Opus 4.8 is roughly 4× less likely than Opus 4.7 to leave a code flaw unremarked, and scores 84% on Online-Mind2Web for browser and computer use — the kind of self-verifying, multi-step work where a cheaper tier compounds small errors into a wrong result.

Two operational notes:

  • Give it the whole spec up front and run at high effort. Opus 4.8’s long-horizon strength comes from reasoning more at each step. A complete first turn plus effort: "high" or "xhigh" produces more efficient and more accurate runs than drip-feeding context.
  • Fast mode is an Opus-tier lever. When latency matters, Opus 4.8 runs the same model at roughly 2.5× output speed at premium pricing — a knob the cheaper tiers don’t have.

When Sonnet 4.6 is the right default

Sonnet 4.6 is the model you should reach for first and downgrade from only with a reason. It carries the same 1M-token context as Opus, supports adaptive thinking, and at $3 / $15 costs 40% less on input than Opus for the bulk of coding and document work. At its launch, Claude Code testers preferred Sonnet 4.6 to the previous Sonnet generation in about 70% of head-to-heads — and to Opus 4.5, the prior flagship tier, in 59%. That is a genuine same-tier upgrade, not a marketing delta.

Stay on Sonnet 4.6 for interactive coding, document Q&A, and design-heavy frontend work. Escalate to Opus 4.8 only when the task is genuinely long-horizon or autonomy-critical — and remember that a bigger window is not what makes the harder tier worth it. The discipline that actually keeps agents reliable is context engineering, and it applies at every tier.

When Haiku 4.5 wins

Haiku 4.5 is the sub-agent economy. At $1 / $5 it is roughly a third of Sonnet’s per-token cost, and on Anthropic’s numbers it lands 73.3% on SWE-bench Verified while matching Sonnet 4-class coding and computer-use quality at up to 4–5× the speed of Sonnet 4.5. That combination — cheap, fast, capable enough — is exactly what a parallel worker needs.

Use it for real-time chat, high-volume classification and extraction, and any architecture that fans many scoped tasks out at once. The one constraint to design around: Haiku’s context window is 200K, not 1M, so keep each worker’s job small and self-contained rather than handing it a sprawling history.

The orchestrator–worker pattern

The tiers compose. The pattern that gets the most out of all three is an Opus 4.8 orchestrator that delegates scoped sub-tasks to Haiku 4.5 workers, with Sonnet 4.6 as the solo default when no orchestration is needed.

Symptom — a single big model reads a 30K-token API reference or a pile of source files, and that clutter stays in context for the rest of the run, dragging quality down. Cause — heavy reads done in the main loop persist on every subsequent turn. Solution — push the read into a sub-agent that runs in its own window and returns only the conclusion. Spawn those workers on Haiku and you get the isolation and the cheapest possible token rate for the grunt work. See subagent context isolation.

This is why the per-token spread matters more than the benchmark spread: in a fan-out architecture, the worker tier runs far more tokens than the orchestrator, so a 3× cheaper worker is a 3× cheaper bill on the dominant cost.

Bottom line

There is no single best Claude model — there is a best model per role. Default to Sonnet 4.6, escalate to Opus 4.8 for long-horizon and high-autonomy work, and delegate the parallel grunt work to Haiku 4.5. Pick by the job and the token bill, not the leaderboard, and the stack pays for itself.