Rate limits

To mitigate misuse and manage capacity on our API, we have implemented limits on how much an organization can use the Claude API.

About our limits

We have two types of limits:

Spend limits: Set a maximum monthly cost an organization can incur for API usage.
Rate limits: Set the maximum number of API requests and tokens an organization can use over a defined period of time.

Limits are designed to prevent API abuse while minimizing the impact on common customer usage patterns. Limits are defined by usage tier, and each tier has its own set of spend and rate limits.

Spend limits

Each usage tier has a limit on how much you can spend on the API each calendar month. Once you reach your tier's spend limit, you will have to wait until the next month to use the API again, unless you qualify for the next tier first.
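The monthly spend check can be sketched as follows. This is a minimal illustration only, assuming the limit resets at the start of each calendar month in UTC (the reset time zone is an assumption); `next_reset`, `can_spend`, and the dollar figures are hypothetical, and the API enforces the real limit on its side. Amounts are kept in integer cents to avoid floating-point rounding.

```python
from datetime import datetime, timezone


def next_reset(now: datetime) -> datetime:
    """Start of the next calendar month, assumed here to be the reset point (UTC)."""
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


def can_spend(spent_cents: int, cost_cents: int, limit_cents: int) -> bool:
    """True if a request costing `cost_cents` keeps this month's spend within the limit."""
    return spent_cents + cost_cents <= limit_cents


# Hypothetical figures: a $100.00 monthly limit with $99.50 already spent.
print(can_spend(9_950, 25, 10_000))   # True
print(can_spend(9_950, 100, 10_000))  # False
print(next_reset(datetime(2024, 12, 15, tzinfo=timezone.utc)))  # 2025-01-01 00:00:00+00:00
```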

Usage Tier	Credit Purchase Threshold	Max Credit Purchase
Tier 1	$5	$100
Tier 2	$40	$500
Tier 3	$200	$1,000
Tier 4	$400	$5,000
Monthly Invoicing	N/A	N/A
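The tier table can be encoded as data. The sketch below assumes the first dollar column is the cumulative credit purchase needed to reach each tier, a reading the table itself does not spell out; `TIERS` and `tier_for` are hypothetical names, and Monthly Invoicing is left out because it has no purchase threshold.

```python
from typing import Optional

# (tier, credit purchase threshold in USD, max credit purchase in USD),
# taken from the table above; Monthly Invoicing is omitted because its values are N/A.
TIERS = [
    ("Tier 1", 5, 100),
    ("Tier 2", 40, 500),
    ("Tier 3", 200, 1_000),
    ("Tier 4", 400, 5_000),
]


def tier_for(total_purchased_usd: int) -> Optional[str]:
    """Highest tier whose threshold is met, assuming thresholds apply to cumulative purchases."""
    current = None
    for name, threshold, _max_purchase in TIERS:
        if total_purchased_usd >= threshold:
            current = name
    return current


print(tier_for(3))    # None
print(tier_for(50))   # Tier 2
print(tier_for(400))  # Tier 4
```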

Rate limits

Our rate limits for the Messages API are measured in requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM) for each model class.
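To stay under all three per-minute limits at once, a client can track its own usage over a rolling 60-second window. This is a minimal client-side sketch, not a description of how the API enforces its limits; the `MinuteBudget` class and the limit values in the usage lines are hypothetical. Output tokens are only known after a response arrives, so they are recorded afterwards and checked against what has already been used.

```python
from collections import deque
from typing import Deque, Optional, Tuple
import time


class MinuteBudget:
    """Track requests, input tokens, and output tokens over a rolling 60 s window."""

    def __init__(self, rpm: int, itpm: int, otpm: int) -> None:
        self.rpm, self.itpm, self.otpm = rpm, itpm, otpm
        # Each entry: (timestamp, input_tokens, output_tokens) for one request.
        self.events: Deque[Tuple[float, int, int]] = deque()

    def _prune(self, now: float) -> None:
        # Drop requests older than 60 seconds; they no longer count.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def can_send(self, input_tokens: int, now: Optional[float] = None) -> bool:
        """Check whether one more request with `input_tokens` fits every budget."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        used_in = sum(e[1] for e in self.events)
        used_out = sum(e[2] for e in self.events)
        return (
            len(self.events) + 1 <= self.rpm
            and used_in + input_tokens <= self.itpm
            # Output size is unknown in advance, so only require headroom to remain.
            and used_out < self.otpm
        )

    def record(self, input_tokens: int, output_tokens: int, now: Optional[float] = None) -> None:
        """Log a completed request and its token usage."""
        now = time.monotonic() if now is None else now
        self.events.append((now, input_tokens, output_tokens))


# Hypothetical limits, with explicit timestamps for a reproducible example.
budget = MinuteBudget(rpm=2, itpm=1_000, otpm=500)
budget.record(400, 100, now=0.0)
print(budget.can_send(500, now=10.0))  # True: 2 requests, 900 input tokens
budget.record(500, 100, now=10.0)
print(budget.can_send(10, now=20.0))   # False: the RPM budget of 2 is used
print(budget.can_send(10, now=61.0))   # True: the first request has left the window
```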

If you exceed any of the rate limits, you will get a 429 error describing which rate limit was exceeded, along with a retry-after header indicating how many seconds to wait.
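A 429 response can be handled by waiting the number of seconds given in `retry-after` before retrying. The sketch below assumes a hypothetical `send` callable that returns a status code and a dictionary of lowercase header names; it falls back to exponential backoff when the header is absent, and the retry cap and fallback delays are illustrative choices, not API requirements.

```python
from typing import Callable, Dict, List, Tuple
import time

Response = Tuple[int, Dict[str, str]]  # (status code, headers with lowercase names)


def send_with_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on HTTP 429 and honoring the retry-after header."""
    status, headers = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # retry-after gives the seconds to wait; fall back to exponential backoff.
        retry_after = headers.get("retry-after")
        delay = float(retry_after) if retry_after is not None else 2.0 ** attempt
        sleep(delay)
        status, headers = send()
    return status, headers


# Fake transport: one 429 with retry-after, then success. Sleeps are recorded, not performed.
responses = iter([(429, {"retry-after": "2"}), (200, {})])
waits: List[float] = []
print(send_with_retry(lambda: next(responses), sleep=waits.append), waits)  # (200, {}) [2.0]
```

Injecting `sleep` keeps the retry logic testable without real delays; in production the default `time.sleep` is used.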

Cache-aware ITPM

For most Claude models, only uncached input tokens count towards your ITPM rate limits. Because cache reads are excluded, prompt caching can make your effective input throughput considerably higher than the nominal ITPM limit.

input_tokens (tokens after the last cache breakpoint) ✓ Count towards ITPM
cache_creation_input_tokens (tokens being written to cache) ✓ Count towards ITPM
cache_read_input_tokens (tokens read from cache) ✗ Do NOT count towards ITPM for most models
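Given the three usage fields listed above, the part of a request that counts toward ITPM can be computed directly. This sketch applies to the models where cache reads are excluded; the `itpm_tokens` and `total_input_tokens` helpers and the example numbers are hypothetical.

```python
from typing import Mapping


def itpm_tokens(usage: Mapping[str, int]) -> int:
    """Input tokens that count toward ITPM for models that exclude cache reads."""
    # Uncached tokens and cache writes count; cache_read_input_tokens does not.
    return usage.get("input_tokens", 0) + usage.get("cache_creation_input_tokens", 0)


def total_input_tokens(usage: Mapping[str, int]) -> int:
    """All input tokens the model processed, including cache reads."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )


usage = {"input_tokens": 200, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 9_800}
print(itpm_tokens(usage), total_input_tokens(usage))  # 200 10000
```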

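Example: With a 2,000,000 ITPM limit and an 80% cache hit rate, only the 20% of input tokens that are not read from cache count toward the limit, so you could effectively process 2,000,000 / 0.2 = 10,000,000 total input tokens per minute.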
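In general, the effective input throughput is the ITPM limit divided by the uncached share of input tokens. A small sketch of that arithmetic, assuming cache writes are negligible (they do count toward ITPM, which would lower the result); the function name is hypothetical, and integer percentages keep the result exact.

```python
def effective_input_tpm(itpm_limit: int, cache_hit_pct: int) -> int:
    """Total input tokens per minute when `cache_hit_pct` percent are cache reads.

    Assumes cache reads do not count toward ITPM and cache writes are negligible.
    """
    if not 0 <= cache_hit_pct < 100:
        raise ValueError("cache_hit_pct must be in [0, 100)")
    # Only the uncached share counts: itpm_limit = total * (100 - hit) / 100.
    return itpm_limit * 100 // (100 - cache_hit_pct)


print(effective_input_tpm(2_000_000, 80))  # 10000000
```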

Response headers

The API response includes headers that show you the rate limit enforced, current usage, and when the limit will be reset.

Header	Description
retry-after	The number of seconds to wait until you can retry the request.
anthropic-ratelimit-requests-limit	The maximum number of requests allowed within any rate limit period.
anthropic-ratelimit-requests-remaining	The number of requests remaining before being rate limited.
anthropic-ratelimit-tokens-limit	The maximum number of tokens allowed within any rate limit period.
anthropic-ratelimit-tokens-remaining	The number of tokens remaining before being rate limited.
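A client can read these headers after each response to decide whether to slow down. The sketch below assumes headers arrive as a plain dictionary with lowercase names; `RateLimitStatus`, `parse_rate_limit_headers`, `should_throttle`, and the 10% threshold are hypothetical choices, not part of the API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    requests_limit: Optional[int]
    requests_remaining: Optional[int]
    tokens_limit: Optional[int]
    tokens_remaining: Optional[int]
    retry_after: Optional[float]


def _int(headers: Mapping[str, str], name: str) -> Optional[int]:
    value = headers.get(name)
    return int(value) if value is not None else None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Extract the headers listed in the table above; missing headers become None."""
    retry = headers.get("retry-after")
    return RateLimitStatus(
        requests_limit=_int(headers, "anthropic-ratelimit-requests-limit"),
        requests_remaining=_int(headers, "anthropic-ratelimit-requests-remaining"),
        tokens_limit=_int(headers, "anthropic-ratelimit-tokens-limit"),
        tokens_remaining=_int(headers, "anthropic-ratelimit-tokens-remaining"),
        retry_after=float(retry) if retry is not None else None,
    )


def should_throttle(status: RateLimitStatus, fraction: float = 0.1) -> bool:
    """True if fewer than `fraction` of requests or tokens remain (hypothetical policy)."""
    for remaining, limit in (
        (status.requests_remaining, status.requests_limit),
        (status.tokens_remaining, status.tokens_limit),
    ):
        if remaining is not None and limit and remaining < fraction * limit:
            return True
    return False


# Placeholder header values for illustration.
headers = {
    "anthropic-ratelimit-requests-limit": "50",
    "anthropic-ratelimit-requests-remaining": "3",
    "anthropic-ratelimit-tokens-limit": "40000",
    "anthropic-ratelimit-tokens-remaining": "39000",
}
status = parse_rate_limit_headers(headers)
print(status.requests_remaining, should_throttle(status))  # 3 True
```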
