Anthropic released Claude Opus 4.8 on May 28, 2026, positioning it as a quality upgrade over Opus 4.7 with no change to pricing, context window, or model class.1 The practical implication for teams already using Opus 4.7 via the direct API: the same budget buys a more capable model, and the per-token bill stays the same.
What is Opus 4.8
Opus 4.8 is Anthropic’s current flagship, accessible via the API model ID claude-opus-4-8.2 It carries a 1M token context window (200k on Microsoft Foundry), supports up to 128k output tokens (up to 300k via Batch API beta header), and has a knowledge cutoff of January 2026.2
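These limits lend themselves to a simple pre-flight check before a request is sent. The sketch below is illustrative only: the function name `check_limits`, the platform keys, and the `batch_beta` flag are hypothetical labels, not part of any Anthropic SDK. It also assumes that input tokens are checked against the context window and requested output against the output cap separately, which the figures above do not spell out.

```python
# Limits as quoted in this article; the dictionary keys are hypothetical labels,
# not values accepted by any API.
CONTEXT_WINDOW = {"anthropic_api": 1_000_000, "microsoft_foundry": 200_000}
MAX_OUTPUT = {"standard": 128_000, "batch_beta": 300_000}


def check_limits(input_tokens, max_output_tokens,
                 platform="anthropic_api", batch_beta=False):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    window = CONTEXT_WINDOW[platform]
    output_cap = MAX_OUTPUT["batch_beta" if batch_beta else "standard"]

    # Assumption: input is compared to the context window on its own.
    if input_tokens > window:
        problems.append(
            f"input {input_tokens} exceeds {window}-token context window on {platform}"
        )
    if max_output_tokens > output_cap:
        problems.append(
            f"output {max_output_tokens} exceeds {output_cap}-token cap"
        )
    return problems


if __name__ == "__main__":
    # A 500k-token prompt fits the direct API but not the Foundry path.
    print(check_limits(500_000, 64_000))
    print(check_limits(500_000, 64_000, platform="microsoft_foundry"))
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit at once. Treat the numbers as a snapshot of this article and confirm them against the current models overview before relying on them.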
The model ships in two modes. Standard mode bills at $5 per million input tokens and $25 per million output tokens, matching Opus 4.7’s price exactly.1 Fast mode bills at $10 per million input and $50 per million output, delivers roughly 2.5x the throughput, and is, according to Anthropic, three times cheaper than previous fast modes at the same speed tier.1
How Opus 4.8 compares to Opus 4.7 on benchmarks
Anthropic published head-to-head numbers across seven evaluations.1 The table below covers each one, with competitor figures where published:
| Benchmark | Opus 4.8 | Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Pro (agentic coding) | 69.2% | 64.3% | 58.6% | 54.2% |
| Terminal-Bench 2.1 | 74.6% | 66.1% | 78.2% | 70.3% |
| Humanity’s Last Exam, no tools | 49.8% | 46.9% | 41.4% | 44.4% |
| Humanity’s Last Exam, with tools | 57.9% | 54.7% | 52.2% | 51.4% |
| OSWorld-Verified (computer use) | 83.4% | 82.8% | 78.7% | 76.2% |
| GDPval-AA (knowledge work, score) | 1890 | 1753 | 1769 | 1314 |
| Finance Agent v2 | 53.9% | 51.5% | 51.8% | 43.0% |
Two results warrant attention. On SWE-Bench Pro, Opus 4.8’s 69.2% leads every listed competitor by a wide margin; GPT-5.5 reaches 58.6% and Gemini 3.1 Pro 54.2%.1 On Terminal-Bench 2.1, Opus 4.8 scores 74.6% vs. Opus 4.7’s 66.1%, but GPT-5.5 leads this benchmark at 78.2%.1 Opus 4.8 does not top every benchmark.
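For readers who want the gaps rather than the raw scores, the following sketch computes the Opus 4.8 minus Opus 4.7 difference and the leading model for each row. The numbers are copied from the table above; the names `SCORES` and `summarize` are illustrative. Differences are in percentage points except for GDPval-AA, which is a raw score.

```python
# Scores copied from the benchmark table in this article.
MODELS = ("Opus 4.8", "Opus 4.7", "GPT-5.5", "Gemini 3.1 Pro")
SCORES = {
    "SWE-Bench Pro": (69.2, 64.3, 58.6, 54.2),
    "Terminal-Bench 2.1": (74.6, 66.1, 78.2, 70.3),
    "Humanity's Last Exam, no tools": (49.8, 46.9, 41.4, 44.4),
    "Humanity's Last Exam, with tools": (57.9, 54.7, 52.2, 51.4),
    "OSWorld-Verified": (83.4, 82.8, 78.7, 76.2),
    "GDPval-AA": (1890, 1753, 1769, 1314),
    "Finance Agent v2": (53.9, 51.5, 51.8, 43.0),
}


def summarize(scores):
    """Return (benchmark, Opus 4.8 minus Opus 4.7, leading model) per row."""
    rows = []
    for bench, values in scores.items():
        by_model = dict(zip(MODELS, values))
        delta = round(by_model["Opus 4.8"] - by_model["Opus 4.7"], 1)
        leader = max(by_model, key=by_model.get)
        rows.append((bench, delta, leader))
    return rows


if __name__ == "__main__":
    for bench, delta, leader in summarize(SCORES):
        print(f"{bench}: {delta:+} vs Opus 4.7, leader: {leader}")
```

On these figures, the largest gain over Opus 4.7 is on Terminal-Bench 2.1 (8.5 points), which is also the only row where another model leads.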
Anthropic also published an Online-Mind2Web browser agent figure of 84% for Opus 4.8, without a corresponding Opus 4.7 figure in the same release.1
What did not change
Price, context window, and output length are identical to Opus 4.7. The standard $5/$25 per million token rate is unchanged.1 The 1M-token context window and 128k output limit carry forward.2 Teams that budgeted against Opus 4.7 for agentic workflows can substitute Opus 4.8 without adjusting cost models.
The model class is also unchanged. Opus 4.8 remains in the Opus tier, above Sonnet and Haiku in capability and above both in per-token cost.
Why Anthropic’s quality claims matter for code reliability
Anthropic states Opus 4.8 is four times less likely than Opus 4.7 to allow flaws in code.1 This claim covers the model’s behavior when reviewing or producing code containing potential vulnerabilities, not just its benchmark ranking. In agentic coding contexts, where the model may complete hundreds of file edits in a single session without human review of each change, a fourfold reduction in code-flaw pass-through, if it holds in practice, could substantially change how much oversight a safe workflow requires.
Anthropic also describes Opus 4.8 as more likely to flag uncertainties, less likely to make unsupported claims, showing sharper judgment, and able to work independently for longer.1 These properties are relevant to multi-step agentic tasks where the model decides how to proceed without constant human prompting.
What shipped alongside Opus 4.8
Three related items were released with or around the Opus 4.8 launch.1
Dynamic workflows (Claude Code research preview): Claude Code can run hundreds of parallel subagents in a single session, enabling workloads that previously required orchestrating separate API calls.
Effort control: Users can choose how much effort Claude applies to a task. The default is high on all surfaces.2
Claude Mythos Preview: An invitation-only research preview for defensive cybersecurity, associated with Project Glasswing.2
What Opus 4.8 costs relative to the alternatives
At standard API rates, the cost comparison is straightforward:
| Model | Input (per M tokens) | Output (per M tokens) |
|---|---|---|
| Opus 4.8 (standard) | $5.00 | $25.00 |
| Opus 4.8 (fast mode) | $10.00 | $50.00 |
| Opus 4.7 | $5.00 | $25.00 |
Standard Opus 4.8 and Opus 4.7 are priced identically.1 Fast mode doubles both the input and the output cost in exchange for roughly 2.5x throughput. Whether that tradeoff makes sense depends on whether your workflow is latency-bound or cost-bound. For batch jobs that can queue overnight, standard mode preserves the same cost floor as Opus 4.7. For real-time agentic loops where turn latency matters, fast mode costs twice as much per token; if the 2.5x throughput figure carries over directly to turn latency, each turn completes in roughly 40% of the time.
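To make the tradeoff concrete, here is a minimal cost sketch using the rates from the table above. The `Rate` class, the `RATES` keys, and `request_cost` are hypothetical names for illustration, and the token counts in the example are made up rather than measured from any workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# Rates as listed in this article's pricing table.
RATES = {
    "standard": Rate(5.00, 25.00),
    "fast": Rate(10.00, 50.00),
}


def request_cost(mode, input_tokens, output_tokens):
    """Return the USD cost of one request at the given mode's rates."""
    rate = RATES[mode]
    total = input_tokens * rate.input_per_m + output_tokens * rate.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic turn: 200k tokens in, 20k tokens out.
    for mode in RATES:
        print(mode, request_cost(mode, 200_000, 20_000))
```

For that hypothetical turn, standard mode comes to $1.50 and fast mode to $3.00. Because both rates double, the fast-mode premium is always exactly 2x regardless of the input-to-output mix, so the decision reduces to whether the speedup is worth doubling the bill.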
The fast mode price also deserves context: Anthropic calls it three times cheaper than it was for previous models at equivalent speed.1 Teams that evaluated earlier fast tiers and rejected them on cost grounds should re-check the current numbers.
Frequently Asked Questions
Does Opus 4.8 replace Opus 4.7, or do both remain available?
Anthropic has not announced a deprecation timeline for Opus 4.7 as of the May 28, 2026 release. Both model IDs may be callable via the API simultaneously. Teams should check the models overview page for current availability.2
Is there a published SWE-bench Verified figure for Opus 4.8?
No. Anthropic’s Opus 4.8 release uses SWE-Bench Pro (69.2%) as its headline coding benchmark. There is no published SWE-bench Verified figure for Opus 4.8 in the release materials.1
How does the 1M context window change on Microsoft Foundry?
On Microsoft Foundry, the context window is 200k tokens rather than 1M.2 The reduction applies only to the Foundry deployment path; direct Anthropic API access retains the full 1M context.
What does the Batch API beta header extend for output tokens?
The standard max output is 128k tokens. With the Batch API beta header, output can extend up to 300k tokens per request.2 This applies to batch workloads, not synchronous API calls.
Where does Opus 4.8 rank on Terminal-Bench 2.1 vs. competitors?
Opus 4.8 scores 74.6% on Terminal-Bench 2.1, above Opus 4.7 (66.1%) and Gemini 3.1 Pro (70.3%), but below GPT-5.5 (78.2%).1 GPT-5.5 leads this particular benchmark.