Rate Limits

Every agent has a per-minute request limit. Each request to /v1/chat/stream counts toward your agent's allowance.

Per-plan rates

Rate limits per agent · per 60-second window

Per-agent rate limitStarterProEnterpriseUpgrade

Starter

30 req / min

Pro

50 req / min

Enterprise

Custom

Per agent (not per account)Counts every /chat/stream request

Need higher limits? Enterprise plans support custom rate limits per agent. Contact your account manager or reach out via the dashboard.

Response headers

Every API response includes rate-limit headers so your application always knows where it stands:

X-RateLimit-LimitMaximum requests allowed per 60-second window.
X-RateLimit-RemainingRequests remaining in the current window.
X-RateLimit-ResetSeconds until the current window resets.

Example — normal response headers:

text

HTTP/2 200
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 29
X-RateLimit-Reset: 58

Handling 429 errors

When you exceed the rate limit, the API returns HTTP 429. The response includes the same rate-limit headers plus a standard error body:

429 response headers:

text

HTTP/2 429
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 42

429 response body:

json

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Please retry after 42 seconds."
  }
}

The X-RateLimit-Reset header tells you exactly how many seconds to wait. Always use this value instead of hard-coding a retry delay.

Retry strategy

We recommend exponential backoff with X-RateLimit-Reset as the minimum wait time. Ready-to-use examples:

Python — retry with backoff (streaming SSE):

python

import time, json, requests

def chat_with_retry(base_url, headers, payload, max_retries=3):
    for attempt in range(max_retries + 1):
        res = requests.post(
            f"{base_url}/v1/chat/stream", headers=headers, json=payload, stream=True
        )

        if res.status_code != 429:
            # The full answer is the concatenation of every "delta" content.
            answer = ""
            for line in res.iter_lines(decode_unicode=True):
                if not line or not line.startswith("data:"):
                    continue
                event = json.loads(line[len("data:"):].strip())
                if event["type"] == "delta":
                    answer += event["content"]
                elif event["type"] == "error":
                    raise Exception(event["message"])
                elif event["type"] == "done":
                    break
            return answer

        reset = int(res.headers.get("X-RateLimit-Reset", 60))
        wait = reset + (2 ** attempt)  # backoff on top of reset
        print(f"Rate limited. Retrying in {wait}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(wait)

    raise Exception("Rate limit exceeded after max retries")

JavaScript — retry with backoff (streaming SSE):

javascript

async function chatWithRetry(baseUrl, headers, payload, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(`${baseUrl}/v1/chat/stream`, {
      method: "POST",
      headers,
      body: JSON.stringify(payload),
    });

    if (res.status !== 429) {
      // The full answer is the concatenation of every "delta" content.
      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      let answer = "";
      for (;;) {
        const { value, done } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";
        for (const line of lines) {
          if (!line.startsWith("data:")) continue;
          const event = JSON.parse(line.slice(5).trim());
          if (event.type === "delta") answer += event.content;
          else if (event.type === "error") throw new Error(event.message);
        }
      }
      return answer;
    }

    const reset = parseInt(res.headers.get("X-RateLimit-Reset") ?? "60", 10);
    const wait = (reset + 2 ** attempt) * 1000;
    console.log(`Rate limited. Retrying in ${wait / 1000}s (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise((r) => setTimeout(r, wait));
  }
  throw new Error("Rate limit exceeded after max retries");
}

Best practices

1
Read the headers proactively
Check X-RateLimit-Remaining on every response. If it's approaching zero, throttle outgoing requests before you hit the limit.
2
Use the reset header, not a fixed delay
Hard-coding sleep(60) wastes time if the window resets sooner. Always read X-RateLimit-Reset for the exact wait.
3
Queue and batch where possible
If your application generates bursts of requests, implement a client-side queue that spaces them evenly across the 60-second window.
4
Cache responses
If multiple users ask the same question, cache the agent's response on your side to avoid consuming your rate limit with duplicate calls.

Avoid the throttle entirely

Pin your integration to one outbound request per user action and add a tiny client-side debounce. If you're polling, switch to streaming via SSE — progressive UI with one request instead of N.