Opus 4.8 Batch API: 1M Context, 300k Output, and Team Cost Controls

Q: What is the maximum output tokens for Opus 4.8?

The standard maximum is 128k tokens. Using the Batch API with the beta header raises the ceiling to 300k tokens per request. The extended output is not available on synchronous API calls.

Q: What is the knowledge cutoff date for Opus 4.8?

January 2026. The model has no native knowledge of events, publications, or API changes after that date. Retrieval augmentation is the standard method for bridging the gap.

Q: Does Fable 5 support the 300k Batch API output extension?

Yes, the output-300k-2026-03-24 beta header works with Fable 5 as well as Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6. [Updated June 2026] However, Fable 5 API access was suspended globally on June 13, 2026 under a US government export-control directive and is currently unavailable. Until access is restored, Opus 4.8 via Batch API is the practical path to the 300k output ceiling. --- See also: Opus 4.8 vs Opus 4.7: What Changed and What Did Not for a detailed capability comparison, and Should Your Coding Team Upgrade to Opus 4.8 for the cost tradeoff analysis. For context window architecture and practical token budgeting, see What Can You Actually Do With a Million-Token Context Window. For context on the Fable 5 and Mythos 5 access suspension and its implications for teams planning tier upgrades, see US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide.

Anthropic released Claude Opus 4.8 on May 28, 2026, with a 1M token context window, a standard 128k max output, and a Batch API beta header that extends output to 300k tokens per request.² The knowledge cutoff is January 2026.² For teams running large document analysis pipelines or multi-step agentic jobs, these three numbers determine what you can fit in a single call, how much you get back, and whether your model has seen recent data.

What Is the Opus 4.8 Context Window

The context window for claude-opus-4-8 is 1M input tokens via the direct Anthropic API.² That figure drops to 200k tokens on Microsoft Foundry.² The reduction on Foundry reflects deployment configuration at the infrastructure layer, not a model capability difference.

One million tokens is enough to hold an entire mid-sized codebase, a year of customer support transcripts, or several hundred pages of technical documentation in a single context. The practical limit for most teams is not the token ceiling but the cost of filling it. At $5 per million input tokens (standard rate),¹ a full 1M-token context costs $5.00 per inference call before any output costs are added. That math matters when designing pipeline jobs that run hundreds of calls per day.

On Microsoft Foundry, the 200k ceiling changes batch job design. Documents that fit in a single 1M-token call on the direct API need to be split into five or more chunks on Foundry. Teams migrating between deployment paths should audit their chunking logic before switching.

How the Batch API Extends Max Output

Standard max output for Opus 4.8 is 128k tokens.² With the Batch API beta header, output can reach up to 300k tokens per request.² That extension is available only on batch requests, not on synchronous API calls.

The 300k output ceiling is relevant for workloads that produce long-form content: full code files, detailed legal or financial reports, or multi-chapter documents generated in a single call. Without the batch extension, a 128k output limit forces some of these tasks into multi-call sequences where the model writes a section, stores intermediate output, and continues on a follow-up call. With the 300k extension, many of those tasks collapse into a single Batch API request.

The tradeoff is that Batch API jobs accept asynchronous processing. Results are not returned in real time. For latency-sensitive applications, the 128k standard output is the ceiling. For offline analysis or content generation pipelines that queue overnight, the 300k extension is the practical ceiling.

When to Use Batch API vs Real-Time API

The choice between Batch API and synchronous calls is primarily a latency vs. cost tradeoff. Opus 4.8 standard pricing is $5 per million input tokens and $25 per million output tokens.¹ Fast mode runs at $10 per million input and $50 per million output, delivering roughly 2.5x throughput.¹ Batch API jobs receive a 50% discount on all tokens, reducing effective costs to $2.50 per million input and $12.50 per million output. [Updated June 2026]

Access path	Input cost (per M tokens)	Output cost (per M tokens)	Max output	Latency
Synchronous, standard	$5.00	$25.00	128k	Real-time
Synchronous, fast mode	$10.00	$50.00	128k	~2.5x faster
Batch API	$2.50	$12.50	300k (beta header)	Asynchronous

For pipelines where output volume is the bottleneck, Batch API at $2.50/$12.50 per million tokens with the 300k output extension is the strongest-cost option across the three tiers. For interactive agents or real-time coding assistants, fast mode at $10/$50 is the option that reduces turn latency while keeping the model response within a session.¹

Anthropic describes fast mode as three times cheaper than previous fast-mode models.¹ Teams that evaluated earlier fast tiers and rejected them on cost grounds should re-check the current numbers before defaulting to synchronous standard.

With the June 2026 launch of Claude Fable 5, the $10/$50 per million token rate applies to Anthropic’s top-tier model, though access was suspended globally on June 13, 2026 under a US Commerce Department export-control directive — three days after general availability.⁵ [Updated June 2026] Fable 5 is positioned as Anthropic’s most capable widely released model, sitting above the Opus tier, with the same 1M context window and 128k synchronous output as Opus 4.8.⁵ The output-300k-2026-03-24 Batch API beta header applies to Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, and Fable 5; the extension is model-agnostic for supported batch endpoints.⁵ However, given Fable 5’s ongoing suspension, Opus 4.8 via Batch API is the reliable path for teams requiring the 300k output ceiling as of June 2026.

How to Structure Large Batch Jobs with Opus 4.8

Large batch workloads hit three structural constraints: context size per call, output size per call, and rate limits across calls.

Context budgeting. At 1M tokens per call (direct API),² the question is how to fill that context to maximize per-call yield. For document analysis, the most efficient approach batches multiple documents into a single call’s context rather than one document per call, up to the token ceiling. For a pipeline analyzing 10,000 documents averaging 5,000 tokens each, a naive one-document-per-call design produces 10,000 API calls. Packing ten documents per call (50k tokens) reduces the call count by 10x while keeping each call well under the 1M limit. The output budget per call is then the relevant ceiling, not the context budget.

Output budgeting. At 300k tokens per call via the Batch API beta header,² a pipeline producing 1,000-token summaries per document can handle 300 documents per call. Combining context and output limits: a well-designed batch call for a summarization pipeline might pack 300 documents into context (at 5,000 tokens per document, 1.5M tokens, which requires chunking to stay under 1M) or pack 200 documents (1M tokens context) and retrieve summaries at up to 300k output in a single response.

The 50% Batch API discount changes the per-document cost math significantly versus synchronous standard. A summarization call processing 200 documents — 1M input tokens plus 200k output tokens — costs $10 at synchronous standard rates ($5 input + $5 output). The same call via Batch API costs $5 ($2.50 + $2.50), halving the per-document cost before any prompt caching is applied. For overnight pipelines processing tens of thousands of documents, the cost difference compounds across the full run. [Updated June 2026]

Rate limit allocation. Rate limits on the Anthropic API apply at the organization level and can be distributed across projects or teams. For mixed deployments running both Opus 4.8 (at $5/$25 synchronous, $2.50/$12.50 batch) and Sonnet-tier models at lower per-token costs, quota allocation is a cost control lever as much as a throughput control. Assigning Opus 4.8 quota specifically to high-value agentic tasks, while routing classification or extraction tasks to a lower-cost model, reduces average spend per token across the system without degrading quality where it matters. [Updated June 2026]

Quota Allocation Strategies for Mixed Opus/Sonnet Teams

Teams running Opus 4.8 alongside Sonnet-class models face a per-task routing decision on every API call. The inputs to that decision are task complexity, latency requirements, and cost tolerance.

Opus 4.8 scores 69.2% on SWE-Bench Pro¹ and is four times less likely than Opus 4.7 to allow flaws in code.¹ Those numbers justify Opus 4.8 for tasks where code correctness directly affects production systems: automated PRs, security patch review, complex multi-file refactors. For tasks where errors are cheap to catch downstream, such as first-pass code search, test skeleton generation, or inline comment generation, a Sonnet-class model reduces cost without increasing downstream risk.

As of June 2026, Claude Fable 5 was released at $10/$50 per million tokens for workloads requiring the highest capability tier, but access was suspended June 13, 2026 under a US export-control order and remains unavailable via the API as of this writing.⁵ [Updated June 2026] Teams planning Fable 5 migrations should hold until access is restored. The routing logic below covers the currently operational tiers — Opus 4.8 and Sonnet — for batch-heavy pipelines.

A tiered routing design for Opus 4.8 might look like this:

Opus 4.8 (direct API, synchronous): Multi-step agentic tasks, code review where the output is committed without human review, long-context document analysis requiring reasoning across many pages.
Opus 4.8 (Batch API, asynchronous): Overnight report generation, bulk document summarization, output-heavy tasks where 300k output per call reduces overall call count.
Sonnet-tier (synchronous): Code autocomplete, single-turn Q&A, extraction tasks with structured output schemas, classification.

The Opus 4.8 knowledge cutoff of January 2026² is a practical constraint for routing as well. Tasks that depend on events or API changes from February 2026 onward should route to a model with a more recent cutoff or be supplemented with retrieval-augmented context.

Why the Knowledge Cutoff Affects Batch Pipeline Design

Opus 4.8’s January 2026 knowledge cutoff² means the model has no native knowledge of libraries, standards, or events published after that date. For pipelines analyzing current news, recent regulatory filings, or documentation for APIs released in 2026, this creates a gap that requires explicit handling.

The standard mitigation is retrieval augmentation: prepend relevant context from a current source into the call’s context window before the model processes the task. With 1M tokens available,² there is substantial room to inject retrieved documents alongside the primary task context. A pipeline that appends a 50,000-token context block of retrieved documentation still has 950,000 tokens available for the primary task on the direct API path.

For teams running Opus 4.8 on Microsoft Foundry with its 200k token limit,² retrieval augmentation competes more directly with primary task context. A 50,000-token retrieval block consumes 25% of a 200k context. Pipeline designers on Foundry should budget retrieval context as a first-class allocation, not an afterthought.

Frequently Asked Questions

What is the maximum output tokens for Opus 4.8?

The standard maximum is 128k tokens.² Using the Batch API with the beta header raises the ceiling to 300k tokens per request.² The extended output is not available on synchronous API calls.

Does the 1M context window apply on Microsoft Foundry?

No. The context window is 200k tokens on Microsoft Foundry,² not 1M. The full 1M context is available only via direct Anthropic API access.

What is the knowledge cutoff date for Opus 4.8?

January 2026.² The model has no native knowledge of events, publications, or API changes after that date. Retrieval augmentation is the standard method for bridging the gap.

Is the Batch API output extension generally available?

The 300k output extension is available via a Batch API beta header.² Beta features may have availability constraints or terms that differ from generally available capabilities.

How does Opus 4.8 pricing compare for batch vs. real-time jobs?

Batch API requests receive a 50% discount, so Opus 4.8 batch jobs cost $2.50 per million input tokens and $12.50 per million output tokens.¹ [Updated June 2026] Standard synchronous calls are $5/$25. Fast mode for synchronous calls is $10/$50 per million tokens with approximately 2.5x speed.¹ Batch jobs cannot run in fast mode; the trade is lower latency on fast mode synchronous calls versus the 50% price reduction and higher output ceiling on batch calls.

Does upgrading from Opus 4.7 change these limits?

No. The 1M context window, 128k standard output, and 300k Batch API output ceiling are the same between Opus 4.7 and Opus 4.8.²³ The upgrade changes quality, not specs. Existing batch pipeline designs built for Opus 4.7 run without modification on Opus 4.8 by updating the model ID to claude-opus-4-8.

Does Fable 5 support the 300k Batch API output extension?

Yes, the output-300k-2026-03-24 beta header works with Fable 5 as well as Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.⁵ [Updated June 2026] However, Fable 5 API access was suspended globally on June 13, 2026 under a US government export-control directive and is currently unavailable. Until access is restored, Opus 4.8 via Batch API is the practical path to the 300k output ceiling.

See also: Opus 4.8 vs Opus 4.7: What Changed and What Did Not for a detailed capability comparison, and Should Your Coding Team Upgrade to Opus 4.8 for the cost tradeoff analysis. For context window architecture and practical token budgeting, see What Can You Actually Do With a Million-Token Context Window. For context on the Fable 5 and Mythos 5 access suspension and its implications for teams planning tier upgrades, see US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide.