Rate Limits

Every agent has a per-minute request limit. Both /v1/chat and /v1/chat/stream share the same counter — a request to either endpoint counts toward your agent's allowance.

Per-plan rates

Rate limits per agent, per 60-second window:

  • Starter: 30 req / min
  • Pro: 50 req / min
  • Enterprise: Custom

Limits apply per agent (not per account) and are shared between /v1/chat and /v1/chat/stream.

Need higher limits? Enterprise plans support custom rate limits per agent. Contact your account manager or reach out via the dashboard.

Response headers

Every API response includes rate-limit headers so your application always knows where it stands:

  • X-RateLimit-Limit: Maximum requests allowed per 60-second window.
  • X-RateLimit-Remaining: Requests remaining in the current window.
  • X-RateLimit-Reset: Seconds until the current window resets.

Example — normal response headers:

text
HTTP/2 200
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 29
X-RateLimit-Reset: 58
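
As a minimal sketch, the three headers can be read into a small structure before deciding whether to throttle. The names RateLimitStatus and parse_rate_limit are hypothetical helpers, not part of any SDK, and the sketch assumes all three headers are present and carry integer values, as in the example above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitStatus:
    """Hypothetical container for the three rate-limit headers."""
    limit: int          # X-RateLimit-Limit
    remaining: int      # X-RateLimit-Remaining
    reset_seconds: int  # X-RateLimit-Reset


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_seconds=int(lowered["x-ratelimit-reset"]),
    )


# Values taken from the example response above.
status = parse_rate_limit({
    "X-RateLimit-Limit": "30",
    "X-RateLimit-Remaining": "29",
    "X-RateLimit-Reset": "58",
})
```

With the requests library, passing res.headers to parse_rate_limit should work directly, since it behaves like a mapping of header names to values.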

Handling 429 errors

When you exceed the rate limit, the API returns HTTP 429. The response includes the same rate-limit headers plus a standard error body:

429 response headers:

text
HTTP/2 429
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 42

429 response body:

json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Please retry after 42 seconds."
  }
}

The X-RateLimit-Reset header tells you exactly how many seconds to wait. Always use this value instead of hard-coding a retry delay.

Retry strategy

We recommend exponential backoff with X-RateLimit-Reset as the minimum wait time. Ready-to-use examples:

Python — retry with backoff:

python
import time

import requests


def chat_with_retry(base_url, headers, payload, max_retries=3):
    """POST to /v1/chat, retrying on HTTP 429 up to max_retries times."""
    for attempt in range(max_retries + 1):
        res = requests.post(f"{base_url}/v1/chat", headers=headers, json=payload)

        if res.status_code != 429:
            return res.json()

        if attempt == max_retries:
            break  # out of retries; don't sleep before giving up

        # Wait at least until the window resets, plus exponential backoff.
        reset = int(res.headers.get("X-RateLimit-Reset", 60))
        wait = reset + (2 ** attempt)
        print(f"Rate limited. Retrying in {wait}s (retry {attempt + 1}/{max_retries})")
        time.sleep(wait)

    raise RuntimeError("Rate limit exceeded after max retries")

JavaScript — retry with backoff:

javascript
async function chatWithRetry(baseUrl, headers, payload, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(`${baseUrl}/v1/chat`, {
      method: "POST",
      headers,
      body: JSON.stringify(payload),
    });

    if (res.status !== 429) return res.json();
    if (attempt === maxRetries) break; // out of retries; skip the final sleep

    // Wait at least until the window resets, plus exponential backoff.
    const reset = parseInt(res.headers.get("X-RateLimit-Reset") ?? "60", 10);
    const wait = (reset + 2 ** attempt) * 1000;
    console.log(`Rate limited. Retrying in ${wait / 1000}s (retry ${attempt + 1}/${maxRetries})`);
    await new Promise((r) => setTimeout(r, wait));
  }
  throw new Error("Rate limit exceeded after max retries");
}

Best practices

  1. Read the headers proactively

    Check X-RateLimit-Remaining on every response. If it's approaching zero, throttle outgoing requests before you hit the limit.

  2. Use the reset header, not a fixed delay

    Hard-coding sleep(60) wastes time if the window resets sooner. Always read X-RateLimit-Reset for the exact wait.

  3. Queue and batch where possible

    If your application generates bursts of requests, implement a client-side queue that spaces them evenly across the 60-second window.

  4. Cache responses

    If multiple users ask the same question, cache the agent's response on your side to avoid consuming your rate limit with duplicate calls.
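
One way to sketch the client-side queue from practice 3 is a pacer that hands out evenly spaced send slots. RequestPacer and its parameters are hypothetical names, not part of the API; the sketch assumes a single-threaded client and that you supply the limit for your plan (or the value of X-RateLimit-Limit). The clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable


class RequestPacer:
    """Hypothetical helper: spaces calls so at most `limit` start per `window` seconds."""

    def __init__(
        self,
        limit: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = window / limit  # e.g. 60 / 30 = one request every 2 s
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()       # the first request may go immediately

    def wait_for_slot(self) -> float:
        """Block until the next send slot and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_slot = max(now, self._next_slot) + self.interval
        return delay
```

A caller would invoke pacer.wait_for_slot() immediately before each request. A burst of requests then drains at a steady rate instead of exhausting the window in the first few seconds.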

Avoid the throttle entirely

Pin your integration to one outbound request per user action and add a short client-side debounce. If you're polling, switch to streaming via SSE: you get a progressively updating UI from one request instead of N.
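
A client-side debounce can be as small as the sketch below. It is a leading-edge variant: the first action is sent, and repeats inside a short quiet period are dropped. Debouncer, should_send, and the 0.3-second default are hypothetical choices for illustration; tune the quiet period to your UI.

```python
import time
from typing import Callable, Optional


class Debouncer:
    """Hypothetical helper: accept one action, then ignore repeats for `quiet` seconds."""

    def __init__(self, quiet: float = 0.3, clock: Callable[[], float] = time.monotonic) -> None:
        self.quiet = quiet
        self._clock = clock
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self.quiet:
            return False  # too soon after the last accepted action; drop it
        self._last_sent = now
        return True
```

Calling should_send() in the handler for each user action, and issuing the request only when it returns True, keeps a double-click or key repeat from costing two requests.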