Rate Limits
Every agent has a per-minute request limit. Both /v1/chat and /v1/chat/stream share the same counter — a request to either endpoint counts toward your agent's allowance.
Per-plan rates
- Starter: 30 req / min
- Pro: 50 req / min
- Enterprise: Custom
Need higher limits? Enterprise plans support custom rate limits per agent. Contact your account manager or reach out via the dashboard.
Response headers
Every API response includes rate-limit headers so your application always knows where it stands:
- X-RateLimit-Limit: Maximum requests allowed per 60-second window.
- X-RateLimit-Remaining: Requests remaining in the current window.
- X-RateLimit-Reset: Seconds until the current window resets.
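As a rough illustration of reading these three headers in client code, the sketch below collects them into a small structure. The names RateLimitStatus and parse_rate_limit are hypothetical helpers, not part of the API, and the header values are sample numbers; it assumes all three headers are present and hold integers, as described above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitStatus:
    """Hypothetical container for the three rate-limit headers."""
    limit: int      # X-RateLimit-Limit: max requests per 60-second window
    remaining: int  # X-RateLimit-Remaining: requests left in this window
    reset: int      # X-RateLimit-Reset: seconds until the window resets


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    """Build a RateLimitStatus from response headers.

    Header names are matched case-insensitively. Assumes all three
    headers are present; a missing one raises KeyError.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset=int(lowered["x-ratelimit-reset"]),
    )


# Sample values only; real values come from an actual API response.
status = parse_rate_limit({
    "X-RateLimit-Limit": "30",
    "X-RateLimit-Remaining": "29",
    "X-RateLimit-Reset": "58",
})
print(status)
```

With the requests library, the same helper can be called as parse_rate_limit(res.headers), since res.headers behaves like a mapping of header names to string values.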
Example — normal response headers:
HTTP/2 200
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 29
X-RateLimit-Reset: 58
Handling 429 errors
When you exceed the rate limit, the API returns HTTP 429. The response includes the same rate-limit headers plus a standard error body:
429 response headers:
HTTP/2 429
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 42
429 response body:
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Please retry after 42 seconds."
  }
}
The X-RateLimit-Reset header tells you exactly how many seconds to wait. Always use this value instead of hard-coding a retry delay.
Retry strategy
We recommend exponential backoff with X-RateLimit-Reset as the minimum wait time. Ready-to-use examples:
Python (retry with backoff):
import time
import requests

def chat_with_retry(base_url, headers, payload, max_retries=3):
    for attempt in range(max_retries + 1):
        res = requests.post(f"{base_url}/v1/chat", headers=headers, json=payload)
        if res.status_code != 429:
            return res.json()
        if attempt == max_retries:
            break  # out of retries; don't sleep before giving up
        # X-RateLimit-Reset is the minimum wait; exponential backoff is added on top
        reset = int(res.headers.get("X-RateLimit-Reset", 60))
        wait = reset + (2 ** attempt)
        print(f"Rate limited. Retrying in {wait}s (retry {attempt + 1}/{max_retries})")
        time.sleep(wait)
    raise Exception("Rate limit exceeded after max retries")
JavaScript (retry with backoff):
async function chatWithRetry(baseUrl, headers, payload, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(`${baseUrl}/v1/chat`, {
      method: "POST",
      headers,
      body: JSON.stringify(payload),
    });
    if (res.status !== 429) return res.json();
    if (attempt === maxRetries) break; // out of retries; don't wait before giving up
    // X-RateLimit-Reset is the minimum wait; exponential backoff is added on top
    const reset = parseInt(res.headers.get("X-RateLimit-Reset") ?? "60", 10);
    const wait = (reset + 2 ** attempt) * 1000;
    console.log(`Rate limited. Retrying in ${wait / 1000}s (retry ${attempt + 1}/${maxRetries})`);
    await new Promise((r) => setTimeout(r, wait));
  }
  throw new Error("Rate limit exceeded after max retries");
}
Best practices
1. Read the headers proactively
   Check X-RateLimit-Remaining on every response. If it's approaching zero, throttle outgoing requests before you hit the limit.
2. Use the reset header, not a fixed delay
   Hard-coding sleep(60) wastes time if the window resets sooner. Always read X-RateLimit-Reset for the exact wait.
3. Queue and batch where possible
   If your application generates bursts of requests, implement a client-side queue that spaces them evenly across the 60-second window.
4. Cache responses
   If multiple users ask the same question, cache the agent's response on your side to avoid consuming your rate limit with duplicate calls.
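One possible way to space requests evenly across the window is a small client-side pacer, sketched below. RequestPacer is a hypothetical helper, not something the API provides. It assumes you already know your plan's per-minute limit (for example 30 on Starter, which gives one request every 60 / 30 = 2 seconds) and that calls come from a single thread.

```python
import time


class RequestPacer:
    """Hypothetical helper: spaces calls one interval apart.

    With limit=30 and a 60-second window, the interval is 2.0 seconds.
    Not thread-safe; intended for a single sending loop.
    """

    def __init__(self, limit: int, window_seconds: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = window_seconds / limit
        self._clock = clock    # injectable so the pacer can be tested
        self._sleep = sleep
        self._next_slot = clock()  # the first call may go out immediately

    def wait(self) -> float:
        """Block until the next send slot is free; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the following slot one interval after this one.
        self._next_slot = max(now, self._next_slot) + self.interval
        return delay
```

A sending loop would create one pacer, for example pacer = RequestPacer(limit=30), and call pacer.wait() before each request to /v1/chat or /v1/chat/stream, since both endpoints share the same counter. This only smooths bursts on the client side; it does not replace handling 429 responses, because other processes using the same agent also consume the allowance.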
Avoid the throttle entirely