Last verified: June 13, 2026
The Message Batches API lets you submit up to 100,000 Claude requests in a single call and receive results asynchronously — at exactly 50% of standard token prices. Most batches finish in under an hour. Results remain downloadable for 29 days. This page covers every verified limit, the per-tier rate limit tables, and how batch pricing stacks with prompt caching.
Every token processed through the Message Batches API is billed at half the standard input and output price. Output quality is identical to synchronous requests; only the timing differs. The table below shows verified batch prices for active models.
| Model | Batch input (per MTok) | Batch output (per MTok) | Standard input (per MTok) | Standard output (per MTok) |
|---|---|---|---|---|
| Claude Fable 5 | $5.00 | $25.00 | $10.00 | $50.00 |
| Claude Opus 4.8 | $2.50 | $12.50 | $5.00 | $25.00 |
| Claude Opus 4.7 | $2.50 | $12.50 | $5.00 | $25.00 |
| Claude Opus 4.6 | $2.50 | $12.50 | $5.00 | $25.00 |
| Claude Opus 4.5 | $2.50 | $12.50 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $1.50 | $7.50 | $3.00 | $15.00 |
| Claude Sonnet 4.5 | $1.50 | $7.50 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $0.50 | $2.50 | $1.00 | $5.00 |
Source: platform.claude.com/docs/en/build-with-claude/batch-processing
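Because the batch price is always half the standard price, a cost estimate only needs the standard prices and a 0.5 multiplier. The sketch below copies the standard prices from the pricing table above and ignores prompt caching; the function name request_cost and the use of display names as dictionary keys are illustrative choices, not API model IDs.

```python
from decimal import Decimal

# Standard (synchronous) USD prices per million tokens, as (input, output),
# copied from the pricing table above. Batch prices are derived, not stored.
STANDARD_PRICES = {
    "Claude Fable 5": (Decimal("10.00"), Decimal("50.00")),
    "Claude Opus 4.8": (Decimal("5.00"), Decimal("25.00")),
    "Claude Opus 4.7": (Decimal("5.00"), Decimal("25.00")),
    "Claude Opus 4.6": (Decimal("5.00"), Decimal("25.00")),
    "Claude Opus 4.5": (Decimal("5.00"), Decimal("25.00")),
    "Claude Sonnet 4.6": (Decimal("3.00"), Decimal("15.00")),
    "Claude Sonnet 4.5": (Decimal("3.00"), Decimal("15.00")),
    "Claude Haiku 4.5": (Decimal("1.00"), Decimal("5.00")),
}

BATCH_DISCOUNT = Decimal("0.5")  # batch tokens bill at 50% of the standard price
TOKENS_PER_MTOK = 1_000_000


def request_cost(model: str, input_tokens: int, output_tokens: int, batch: bool = True) -> Decimal:
    """Estimate the USD cost of one request from its token counts (no caching)."""
    input_price, output_price = STANDARD_PRICES[model]
    cost = (input_tokens * input_price + output_tokens * output_price) / TOKENS_PER_MTOK
    return cost * BATCH_DISCOUNT if batch else cost


# Example: 10,000 Sonnet 4.6 requests, each with 2,000 input and 500 output tokens.
total = 10_000 * request_cost("Claude Sonnet 4.6", 2_000, 500)
```

For that example workload, the batch total works out to $67.50, against $135.00 for the same requests sent synchronously. Decimal is used instead of float so that per-token prices sum without rounding drift.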
| Limit | Value |
|---|---|
| Maximum requests per batch | 100,000 |
| Maximum batch payload size | 256 MB |
| Typical completion time | Under 1 hour |
| Hard expiration window | 24 hours from creation |
| Result retention period | 29 days after creation |
| Zero Data Retention eligible | No |
| Results format | JSONL, streamed via results_url |
| Supported models | All active Claude models |
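A job larger than one batch has to be split so that every batch stays under both the 100,000-request cap and the 256 MB payload cap. The following is a minimal greedy sketch under stated assumptions: each request is a dict (its custom_id is read only for the error message), "256 MB" is read as 256,000,000 bytes, and a request's size is approximated by the UTF-8 length of its JSON encoding. The helper names request_size and chunk_requests are hypothetical, and the estimate ignores the surrounding request envelope, so leave some headroom in practice.

```python
import json
from typing import Any, Iterable, Iterator

MAX_REQUESTS_PER_BATCH = 100_000
# Assumption: "256 MB" read as 256,000,000 bytes (the smaller, safer interpretation).
MAX_BATCH_BYTES = 256_000_000


def request_size(req: dict[str, Any]) -> int:
    """Approximate a request's payload size as the UTF-8 length of its JSON encoding."""
    return len(json.dumps(req).encode("utf-8"))


def chunk_requests(
    requests: Iterable[dict[str, Any]],
    max_requests: int = MAX_REQUESTS_PER_BATCH,
    max_bytes: int = MAX_BATCH_BYTES,
) -> Iterator[list[dict[str, Any]]]:
    """Greedily pack requests into groups that stay under both batch limits."""
    batch: list[dict[str, Any]] = []
    batch_bytes = 0
    for req in requests:
        size = request_size(req)
        if size > max_bytes:
            raise ValueError(f"request {req.get('custom_id')!r} alone exceeds max_bytes")
        # Start a new group if adding this request would break either limit.
        if batch and (len(batch) >= max_requests or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(req)
        batch_bytes += size
    if batch:
        yield batch
```

Because chunk_requests is a generator, it can consume a very large request stream without holding more than one batch in memory at a time.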
A batch expires if processing has not completed within 24 hours. Any individual request within that batch that did not finish is marked expired — you are not billed for expired or errored requests. Batch results (the JSONL file) are accessible for download for 29 days after the batch was created; after that the batch object itself is still visible but results can no longer be downloaded.
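Both windows are measured from the batch's creation time, so the two deadlines follow from simple date arithmetic. This sketch assumes you have the creation time as a timezone-aware datetime and treats the 29-day boundary as exclusive; the helper names and dictionary keys are hypothetical. In practice, prefer any timestamps the API itself reports on the batch object, and use this only to see how the arithmetic works.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PROCESSING_WINDOW = timedelta(hours=24)  # unfinished requests expire after this
RESULTS_RETENTION = timedelta(days=29)   # results stay downloadable this long after creation


def batch_deadlines(created_at: datetime) -> dict[str, datetime]:
    """Compute the processing deadline and the results-download deadline."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    return {
        "expires_at": created_at + PROCESSING_WINDOW,
        "results_available_until": created_at + RESULTS_RETENTION,
    }


def can_download_results(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while the 29-day results window is still open (boundary assumed exclusive)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < batch_deadlines(created_at)["results_available_until"]
```

For a batch created at 12:00 UTC on June 1, unfinished requests expire at 12:00 UTC on June 2, and results remain downloadable until 12:00 UTC on June 30.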
The Message Batches API has its own rate-limit pool, shared across all models, separate from the standard Messages API limits. The “processing queue” count refers to individual batch requests (not batches) that have been submitted but not yet completed by the model.
| Tier | RPM (API calls) | Max batch requests in processing queue | Max batch requests per batch |
|---|---|---|---|
| Tier 1 | 50 | 100,000 | 100,000 |
| Tier 2 | 1,000 | 200,000 | 100,000 |
| Tier 3 | 2,000 | 300,000 | 100,000 |
| Tier 4 | 4,000 | 500,000 | 100,000 |
Source: platform.claude.com/docs/en/api/rate-limits
RPM here limits how fast you can make HTTP requests to the Batches API endpoints (create, retrieve, list, cancel). It does not limit how many individual requests inside a batch are processed per minute. The processing-queue cap above limits something different: how many individual batch requests you can have submitted but not yet completed at any one time. If high demand slows processing, more individual requests within a batch may reach the 24-hour expiration limit.
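Before creating a new batch, you can check it against both caps from the rate-limit table. This sketch assumes you already know how many individual batch requests are still outstanding (for example, by tracking the batches you have submitted) and that a batch which exactly reaches the queue cap is still allowed; can_submit is a hypothetical helper, not part of any SDK.

```python
# Per-tier processing-queue caps, copied from the rate-limit table above.
QUEUE_CAP = {1: 100_000, 2: 200_000, 3: 300_000, 4: 500_000}
MAX_REQUESTS_PER_BATCH = 100_000  # same for every tier


def can_submit(tier: int, requests_in_queue: int, new_batch_size: int) -> bool:
    """Check a new batch against the per-batch cap and the processing-queue cap.

    requests_in_queue counts individual batch requests (not batches) that were
    submitted earlier and have not yet completed.
    """
    if not 0 < new_batch_size <= MAX_REQUESTS_PER_BATCH:
        return False
    # Assumption: reaching the cap exactly is allowed; exceeding it is not.
    return requests_in_queue + new_batch_size <= QUEUE_CAP[tier]
```

A Tier 1 organization with an empty queue can submit one full 100,000-request batch, but must wait for some of those requests to complete before a second one fits.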
The Batches API documentation explicitly states that the 50% batch discount and prompt caching discounts stack. Cache writes incur a one-time cost at 1.25x the base input rate (5-minute TTL) or 2x (1-hour TTL); subsequent cache reads cost 0.1x the base input rate. Because batches process asynchronously and may take longer than 5 minutes, Anthropic recommends using the 1-hour cache duration for batch requests that share large context.
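The stacking rule is multiplicative: take the base input price, apply the caching multiplier for the token type, then apply the 0.5 batch discount on top. The sketch below encodes the multipliers stated in the paragraph above; the token-type labels and the function name effective_input_rate are illustrative, not API terms.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.5")

# Input-side multipliers relative to the base (standard) input price.
INPUT_MULTIPLIERS = {
    "uncached": Decimal("1"),
    "cache_write_5m": Decimal("1.25"),
    "cache_write_1h": Decimal("2"),
    "cache_read": Decimal("0.1"),
}


def effective_input_rate(base_per_mtok: Decimal, token_type: str, batch: bool = True) -> Decimal:
    """Price per MTok for an input token type; the batch discount multiplies on top."""
    rate = base_per_mtok * INPUT_MULTIPLIERS[token_type]
    return rate * BATCH_DISCOUNT if batch else rate
```

With a $5.00/MTok base, this yields $2.50 for uncached batch input, $5.00 for a 1-hour cache write in a batch, and $0.25 for a batch cache read.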
The following example uses Claude Opus 4.8 (standard input: $5.00/MTok) to show what each token type costs in a batch with a 1-hour cached system prompt.
| Token type | Multiplier applied | Effective price per MTok | How calculated |
|---|---|---|---|
| Uncached input (standard) | 1x | $5.00 | Baseline |
| Uncached input (batch) | 0.5x | $2.50 | 50% batch discount |
| Cache write (1h TTL, batch) | 2x × 0.5x = 1x | $5.00 | 2x write cost, then 50% batch |
| Cache read (batch) | 0.1x × 0.5x = 0.05x | $0.25 | 10% read cost, then 50% batch |
| Output (batch) | 0.5x (of $25.00 standard output) | $12.50 | 50% batch discount on output |
In practice: if you cache a 50,000-token system prompt once and then read it across 1,000 batch requests, the cache write costs $0.25 (50K tokens at $5.00/MTok effective), while 1,000 cache reads cost $12.50 total (50M tokens at $0.25/MTok). The same 50 million tokens without caching would cost $125 in batch input (50 MTok at the $2.50/MTok batch rate). Cache hit rates on batches vary; Anthropic’s documentation notes typical rates of 30% to 98% depending on traffic patterns, since batch requests are processed concurrently rather than sequentially.
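The arithmetic in that example can be reproduced directly. This sketch assumes the best case described in the example: one 1-hour cache write followed by a cache hit on every request. Real batch hit rates are lower, as noted above. The function name cached_vs_uncached is hypothetical.

```python
from decimal import Decimal

BATCH = Decimal("0.5")
CACHE_WRITE_1H = Decimal("2")
CACHE_READ = Decimal("0.1")


def cached_vs_uncached(prompt_tokens: int, n_requests: int, base_per_mtok: Decimal) -> tuple[Decimal, Decimal]:
    """Batch input cost of a shared prompt: (one cache write + n reads, no caching).

    Assumes every request hits the cache (best case).
    """
    mtok = Decimal(prompt_tokens) / 1_000_000
    write = mtok * base_per_mtok * CACHE_WRITE_1H * BATCH
    reads = mtok * n_requests * base_per_mtok * CACHE_READ * BATCH
    uncached = mtok * n_requests * base_per_mtok * BATCH
    return write + reads, uncached


# The example above: 50,000-token system prompt, 1,000 requests, Opus 4.8 ($5.00/MTok).
with_cache, without_cache = cached_vs_uncached(50_000, 1_000, Decimal("5.00"))
```

Here with_cache is $12.75 ($0.25 write plus $12.50 of reads) and without_cache is $125.00, roughly a tenfold saving on the shared prompt in the best case.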
When the batch finishes (or the 24-hour limit is reached), a results_url property is set on the batch object. Results are in JSONL format — one JSON object per line, in any order (not necessarily matching submission order). Each result carries the custom_id you assigned, plus a result object of type succeeded, errored, canceled, or expired. Streaming the results file rather than downloading it all at once is recommended for large batches. You are not billed for errored, canceled, or expired requests.
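Since results arrive in arbitrary order, a consumer should key them by custom_id and read the file line by line rather than loading it whole. The sketch below assumes you already have the results as an iterable of text lines (an open file object works, for example a file you saved from results_url); it relies only on the custom_id and result.type fields described above, and the helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Iterable, Iterator


def iter_results(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Parse a JSONL results stream one line at a time (no full-file load)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


def summarize(lines: Iterable[str]) -> tuple[dict[str, dict[str, Any]], Counter]:
    """Index succeeded results by custom_id and count every result type."""
    succeeded: dict[str, dict[str, Any]] = {}
    counts: Counter = Counter()
    for item in iter_results(lines):
        result = item["result"]
        counts[result["type"]] += 1  # succeeded, errored, canceled, or expired
        if result["type"] == "succeeded":
            # Results arrive in arbitrary order, so key them by custom_id.
            succeeded[item["custom_id"]] = result
    return succeeded, counts
```

Usage might look like opening a hypothetical results.jsonl with encoding="utf-8" and passing the file object to summarize. Only counts["succeeded"] corresponds to billed requests; errored, canceled, and expired entries are not billed, and their custom_id values tell you which requests to resubmit.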
Batch traffic does not count against your standard Messages API limits. The Message Batches API has its own rate-limit pool that is tracked separately from the standard Messages API RPM, ITPM, and OTPM limits, so you can use both simultaneously up to their respective limits.
If a batch reaches the 24-hour limit, any individual requests within it that did not complete are marked expired, and you are not billed for them. The batch itself moves to ended status, and whatever results did complete are available at the results_url.
The Batches API supports vision, tool use (including server tools such as web search and code execution), system messages, multi-turn conversations, and extended thinking. The parameters it does not support are stream: true, fast mode (speed), Threads parameters, and max_tokens: 0.
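A cheap pre-flight check can catch unsupported parameters before a batch is submitted. This sketch assumes each request is a dict with a custom_id and a params dict holding the Messages API parameters. It flags only what the paragraph above names with a concrete identifier (stream: true, max_tokens: 0, and the presence of a speed key); the Threads parameters are not checked because their names are not given here. It also assumes custom_id values should be unique so that results can be matched back to requests. preflight_errors is a hypothetical helper.

```python
from typing import Any


def preflight_errors(requests: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in a list of batch requests."""
    problems: list[str] = []
    seen: set[str] = set()
    for i, req in enumerate(requests):
        cid = req.get("custom_id")
        if not cid:
            problems.append(f"request {i}: missing custom_id")
        elif cid in seen:
            problems.append(f"request {i}: duplicate custom_id {cid!r}")
        else:
            seen.add(cid)
        params = req.get("params", {})
        if params.get("stream") is True:
            problems.append(f"request {i}: stream=True is not supported in batches")
        if params.get("max_tokens") == 0:
            problems.append(f"request {i}: max_tokens=0 is not supported in batches")
        if "speed" in params:  # fast mode
            problems.append(f"request {i}: fast mode (speed) is not supported in batches")
    return problems
```

An empty list means none of these known problems were found; it does not guarantee the API will accept the batch.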
Results are available for 29 days after the batch was created. After that window, the batch object remains visible in the Console and via the API, but the results file can no longer be downloaded.
The Message Batches API is explicitly excluded from Zero Data Retention (ZDR). Data is retained under the feature's standard retention policy regardless of your organization's ZDR settings.