Last verified: June 13, 2026

The Message Batches API lets you submit up to 100,000 Claude requests in a single call and receive results asynchronously, at exactly 50% of standard token prices. Most batches finish in under an hour. Results remain downloadable for 29 days. This page covers every verified limit, the per-tier rate limit tables, and how batch pricing stacks with prompt caching.

Pricing: 50% off standard rates

Every token processed through the Message Batches API is billed at half the standard input and output price. Output quality is the same as for synchronous requests; only the timing differs. The table below shows verified batch prices for active models.

| Model | Batch input (per MTok) | Batch output (per MTok) | Standard input (per MTok) | Standard output (per MTok) |
| --- | --- | --- | --- | --- |
| Claude Fable 5 | $5.00 | $25.00 | $10.00 | $50.00 |
| Claude Opus 4.8 | $2.50 | $12.50 | $5.00 | $25.00 |
| Claude Opus 4.7 | $2.50 | $12.50 | $5.00 | $25.00 |
| Claude Opus 4.6 | $2.50 | $12.50 | $5.00 | $25.00 |
| Claude Opus 4.5 | $2.50 | $12.50 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $1.50 | $7.50 | $3.00 | $15.00 |
| Claude Sonnet 4.5 | $1.50 | $7.50 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $0.50 | $2.50 | $1.00 | $5.00 |

Source: platform.claude.com/docs/en/build-with-claude/batch-processing
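As a rough illustration of the 50% rule, the sketch below estimates the cost of a batch job from standard per-MTok prices. The helper name `batch_cost_usd` and the token counts are hypothetical; the prices are the Claude Sonnet 4.6 figures listed above.

```python
BATCH_DISCOUNT = 0.5  # batch tokens are billed at half the standard rate


def batch_cost_usd(input_tokens, output_tokens, std_input_per_mtok, std_output_per_mtok):
    """Estimate batch cost in USD from standard per-MTok prices (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * std_input_per_mtok * BATCH_DISCOUNT
    output_cost = output_tokens / 1_000_000 * std_output_per_mtok * BATCH_DISCOUNT
    return input_cost + output_cost


# Illustrative workload: 2 MTok in, 1 MTok out on Claude Sonnet 4.6
# (standard prices: $3.00 input, $15.00 output per MTok).
cost = batch_cost_usd(2_000_000, 1_000_000, 3.00, 15.00)
print(f"${cost:.2f}")
```

For this made-up workload the batch cost is $10.50, against $21.00 at standard rates.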

Key limits at a glance

| Limit | Value |
| --- | --- |
| Maximum requests per batch | 100,000 |
| Maximum batch payload size | 256 MB |
| Typical completion time | Under 1 hour |
| Hard expiration window | 24 hours from creation |
| Result retention period | 29 days after creation |
| Zero Data Retention eligible | No |
| Results format | JSONL, streamed via results_url |
| Supported models | All active Claude models |

A batch expires if processing has not completed within 24 hours. Any individual request within that batch that did not finish is marked expired, and you are not billed for expired or errored requests. Batch results (the JSONL file) are accessible for download for 29 days after the batch was created; after that, the batch object itself is still visible but results can no longer be downloaded.
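One practical consequence of the per-batch limits is that a large workload has to be split before submission. The sketch below is one possible way to do that, not an official recipe. The helper `split_into_batches` is hypothetical, the `custom_id`/`params` request shape is assumed, "256 MB" is read as 256 × 1024 × 1024 bytes, and payload size is approximated by the JSON-encoded length of each request.

```python
import json

MAX_REQUESTS_PER_BATCH = 100_000
# Assumption: "256 MB" is interpreted as 256 * 1024 * 1024 bytes.
MAX_BATCH_BYTES = 256 * 1024 * 1024


def split_into_batches(requests, max_requests=MAX_REQUESTS_PER_BATCH, max_bytes=MAX_BATCH_BYTES):
    """Group request dicts into chunks that stay under both limits.

    Size is approximated by each request's JSON-encoded length, which is an
    assumption about how the payload limit is measured.
    """
    batches, current, current_bytes = [], [], 0
    for request in requests:
        size = len(json.dumps(request).encode("utf-8"))
        if size > max_bytes:
            raise ValueError("single request exceeds the batch payload limit")
        # Start a new chunk when adding this request would break either limit.
        if current and (len(current) >= max_requests or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(request)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Tiny limit so the behaviour is visible without 100,000 requests.
demo = [{"custom_id": f"req-{i}", "params": {"max_tokens": 16}} for i in range(5)]
chunks = split_into_batches(demo, max_requests=2)
print([len(chunk) for chunk in chunks])
```

Each chunk can then be submitted as its own batch; leaving some headroom below the byte limit is prudent, since the estimate here ignores any envelope overhead.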

Message Batches API rate limits by tier

The Message Batches API has its own rate-limit pool, shared across all models, separate from the standard Messages API limits. The “processing queue” count refers to individual batch requests (not batches) that have been submitted but not yet completed by the model.

| Tier | RPM (API calls) | Max batch requests in processing queue | Max batch requests per batch |
| --- | --- | --- | --- |
| Tier 1 | | | 100,000 |
| Tier 2 | 1,000 | 200,000 | 100,000 |
| Tier 3 | 2,000 | 300,000 | 100,000 |
| Tier 4 | 4,000 | 500,000 | 100,000 |

Source: platform.claude.com/docs/en/api/rate-limits

RPM here limits how fast you can make HTTP requests to the Batches API endpoints (create, retrieve, list, cancel). It does not limit how many individual requests inside a batch are processed per minute; that is governed by the queue cap above. If high demand causes processing to slow, more individual requests within a batch may reach the 24-hour expiration limit.
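Because the queue cap counts individual requests rather than batches, it can help to check headroom before submitting. The following is a minimal sketch using the Tier 2 to Tier 4 queue caps listed above; the function names are hypothetical, and the count of in-flight requests is assumed to be tracked by your own code.

```python
# Queue caps (individual requests) for Tiers 2-4; Tier 1 is left out of this sketch.
QUEUE_CAP_BY_TIER = {2: 200_000, 3: 300_000, 4: 500_000}


def remaining_queue_capacity(tier, requests_in_flight):
    """Return how many more individual requests fit in the processing queue."""
    cap = QUEUE_CAP_BY_TIER[tier]
    return max(0, cap - requests_in_flight)


def can_submit(tier, requests_in_flight, new_batch_size):
    """True if a new batch of `new_batch_size` requests fits under the queue cap."""
    return new_batch_size <= remaining_queue_capacity(tier, requests_in_flight)


# With 150,000 requests already queued, a full 100,000-request batch
# fits on Tier 4 but not on Tier 2.
print(can_submit(2, 150_000, 100_000), can_submit(4, 150_000, 100_000))
```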

Stacking batch pricing with prompt caching

The Batches API documentation explicitly states that the 50% batch discount and prompt caching discounts stack. Cache writes incur a one-time cost at 1.25x the base input rate (5-minute TTL) or 2x (1-hour TTL); subsequent cache reads cost 0.1x the base input rate. Because batches process asynchronously and may take longer than 5 minutes, Anthropic recommends using the 1-hour cache duration for batch requests that share large context.

The following example uses Claude Opus 4.8 (standard input: $5.00/MTok) to show what each token type costs in a batch with a 1-hour cached system prompt.

| Token type | Multiplier applied | Effective price per MTok | How calculated |
| --- | --- | --- | --- |
| Uncached input (standard) | 1x | $5.00 | Baseline |
| Uncached input (batch) | 0.5x | $2.50 | 50% batch discount |
| Cache write, 1h TTL (batch) | 2x × 0.5x = 1x | $5.00 | 2x write cost, then 50% batch |
| Cache read (batch) | 0.1x × 0.5x = 0.05x | $0.25 | 10% read cost, then 50% batch |
| Output (batch) | 0.5x of $25.00 | $12.50 | 50% batch discount on output |

In practice: if you cache a 50,000-token system prompt once and then read it across 1,000 batch requests, the cache write costs $0.25 (50K tokens at $5.00/MTok effective), while 1,000 cache reads cost $12.50 total (50M tokens at $0.25/MTok). The same 50 million tokens without caching would cost $125 in batch input (50 MTok at the $2.50/MTok batch rate). Cache hit rates on batches vary; Anthropic’s documentation notes typical rates of 30% to 98% depending on traffic patterns, since batch requests are processed concurrently rather than sequentially.
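The arithmetic in that example can be reproduced with a few lines of Python. This is only a sketch of the calculation described above: the variable and function names are illustrative, and it assumes every read hits the cache, as the paragraph does.

```python
BASE_INPUT = 5.00      # Claude Opus 4.8 standard input, USD per MTok
BATCH = 0.5            # batch discount multiplier
CACHE_WRITE_1H = 2.0   # 1-hour cache write multiplier
CACHE_READ = 0.1       # cache read multiplier


def usd(tokens, price_per_mtok):
    """Convert a token count and a per-MTok price into USD."""
    return tokens / 1_000_000 * price_per_mtok


prompt_tokens = 50_000
num_requests = 1_000

# One cache write, then one cache read per request (100% hit rate assumed).
write_cost = usd(prompt_tokens, BASE_INPUT * CACHE_WRITE_1H * BATCH)
read_cost = usd(prompt_tokens * num_requests, BASE_INPUT * CACHE_READ * BATCH)
uncached_cost = usd(prompt_tokens * num_requests, BASE_INPUT * BATCH)

print(f"cache write:   ${write_cost:.2f}")
print(f"cache reads:   ${read_cost:.2f}")
print(f"without cache: ${uncached_cost:.2f}")
```

With a lower hit rate, the reads that miss would be billed at a higher rate, so the real saving falls somewhere between these figures.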

How results come back

When the batch finishes (or the 24-hour limit is reached), a results_url property is set on the batch object. Results are in JSONL format: one JSON object per line, in any order (not necessarily matching submission order). Each result carries the custom_id you assigned, plus a result object of type succeeded, errored, canceled, or expired. Streaming the results file rather than downloading it all at once is recommended for large batches. You are not billed for errored, canceled, or expired requests.
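Since results arrive in arbitrary order, the usual approach is to index them by custom_id while reading the file line by line. The sketch below shows one way to do this with the standard library. The helper `summarize_results` is hypothetical, the sample lines are made up, and the `result` and `type` key names are assumed from the description above.

```python
import json
from collections import Counter


def summarize_results(jsonl_lines):
    """Index batch results by custom_id and count outcomes by result type."""
    by_id, counts = {}, Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        result = record["result"]
        by_id[record["custom_id"]] = result
        counts[result["type"]] += 1
    return by_id, counts


# Made-up sample lines, deliberately out of submission order.
sample = [
    '{"custom_id": "req-2", "result": {"type": "errored"}}',
    '{"custom_id": "req-1", "result": {"type": "succeeded"}}',
    '{"custom_id": "req-3", "result": {"type": "expired"}}',
]
by_id, counts = summarize_results(sample)
print(by_id["req-1"]["type"], dict(counts))
```

Because the function accepts any iterable of lines, it can consume a streamed response or an open file without loading the whole results file into memory.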

Does the Batches API count against my standard Messages API rate limits?

No. The Message Batches API has its own rate-limit pool that is tracked separately from the standard Messages API RPM, ITPM, and OTPM limits. You can use both simultaneously up to their respective limits.

What happens if my batch does not finish within 24 hours?

Any individual requests within the batch that did not complete are marked expired. You are not billed for those requests. The batch itself moves to ended status and whatever results did complete are available at the results_url.

Can I use extended thinking, tool use, or vision in a batch?

Yes. The Batches API supports vision, tool use (including server tools such as web search and code execution), system messages, multi-turn conversations, and extended thinking. The parameters not supported are stream: true, fast mode (speed), Threads parameters, and max_tokens: 0.
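A pre-submission check along these lines can catch unsupported parameters early. This is a sketch under assumptions: the helper name is hypothetical, `params` is assumed to be the per-request parameter dict, fast mode is assumed to be selected through a `speed` key, and the Threads parameters are not checked because their names are not given here.

```python
def unsupported_batch_params(params):
    """Return reasons a request's params would not be accepted in a batch."""
    problems = []
    if params.get("stream") is True:
        problems.append("stream: true is not supported")
    if "speed" in params:  # assumed key for fast mode
        problems.append("fast mode (speed) is not supported")
    if params.get("max_tokens") == 0:
        problems.append("max_tokens: 0 is not supported")
    return problems


# Illustrative request params with two unsupported settings.
print(unsupported_batch_params({"max_tokens": 0, "stream": True}))
```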

How long are batch results available for download?

Results are available for 29 days after the batch was created. After that window, the batch object remains visible in the Console and via the API, but the results file can no longer be downloaded.
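Both windows are measured from batch creation, so the two deadlines can be derived from the creation timestamp. The sketch below uses a hypothetical helper `batch_deadlines` and a made-up creation time.

```python
from datetime import datetime, timedelta, timezone

EXPIRY_WINDOW = timedelta(hours=24)    # unfinished requests expire after this
RESULT_RETENTION = timedelta(days=29)  # results are downloadable until this


def batch_deadlines(created_at):
    """Return the expiry time and the last moment results can be downloaded."""
    return {
        "expires_at": created_at + EXPIRY_WINDOW,
        "results_until": created_at + RESULT_RETENTION,
    }


# Made-up creation time for illustration.
created = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
deadlines = batch_deadlines(created)
print(deadlines["expires_at"].isoformat())
print(deadlines["results_until"].isoformat())
```

For a batch created at noon UTC on June 1, 2026, this gives an expiry of noon on June 2 and a download cutoff of noon on June 30.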

Is the Batches API eligible for Zero Data Retention?

No. The Message Batches API is explicitly excluded from Zero Data Retention (ZDR). Data is retained under the feature’s standard retention policy regardless of your organization’s ZDR settings.

Claude Message Batches API: 50% Pricing, Limits and How It Works (2026)
