Rate limits

Per-key throughput

Defaults are conservative and tuned for self-serve. Bursty workloads should enable retry-on-429 in the client; sustained higher throughput is a teams-plan conversation.

Defaults

  • 600 requests / minute per API key, shared across all models.
  • 1,000,000 tokens / minute per API key (prompt + completion combined), shared across all models.
  • 8 in-flight requests per API key. This separate concurrency cap keeps short spikes from turning into account-wide upstream throttling.
  • The request and token limits are reported on every response in the x-ratelimit-limit-requests and x-ratelimit-limit-tokens headers, with matching x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens headers. Read them to pace your client.
  • Per-key monthly spend cap is configurable in the dashboard under Account → Monthly limit. Default: unset (no cap).
  • Per-key balance cap is your account balance — calls that would exceed it return 402.
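The remaining-count headers listed above can drive simple client-side pacing. This is a minimal sketch. It assumes the headers are named x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens, mirroring the documented limit headers, and that you supply your own per-call token estimate. The helper names headroom and should_pause are hypothetical.

```python
# Assumed header names, mirroring the documented x-ratelimit-limit-* headers.
REMAINING_REQUESTS = "x-ratelimit-remaining-requests"
REMAINING_TOKENS = "x-ratelimit-remaining-tokens"


def headroom(headers):
    """Return (remaining_requests, remaining_tokens) from response headers.

    Header names are matched case-insensitively. A missing or non-numeric
    value comes back as None.
    """
    lowered = {k.lower(): v for k, v in headers.items()}

    def read(name):
        try:
            return int(lowered[name])
        except (KeyError, ValueError):
            return None

    return read(REMAINING_REQUESTS), read(REMAINING_TOKENS)


def should_pause(headers, next_request_tokens, min_requests=1):
    """True if the next call would likely exceed this minute's remaining budget.

    next_request_tokens is your own estimate of prompt + completion tokens.
    If a header is absent, that budget is not checked.
    """
    requests_left, tokens_left = headroom(headers)
    if requests_left is not None and requests_left < min_requests:
        return True
    if tokens_left is not None and tokens_left < next_request_tokens:
        return True
    return False
```

With the OpenAI Python SDK, `client.chat.completions.with_raw_response.create(...)` returns an object whose `.headers` you can pass to these helpers. Calling `.parse()` on it gives the usual completion.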

429 response shape

When you exceed either per-minute limit (requests or tokens), you get a 429 with this body:

```json
{
  "error": {
    "message": "Rate limit exceeded. Try again in a moment.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```

The response also includes a Retry-After header giving the delay in seconds. Wait at least that long before retrying.
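If you call the API over plain HTTP instead of through an SDK, two small helpers can handle the 429 shape above. This is a sketch, not a client library. The names retry_delay and error_code are hypothetical, the 60-second backoff cap is an arbitrary choice, and the fallback uses exponential backoff with full jitter when Retry-After is missing or unparseable.

```python
import json
import random


def retry_delay(status, headers, attempt, cap=60.0):
    """Seconds to wait before retrying, or None if the call should not be retried.

    status:  HTTP status code of the response
    headers: dict of response headers (names matched case-insensitively)
    attempt: 0-based retry attempt, used only for the backoff fallback
    """
    if status != 429:
        # Only 429 is treated as retryable here. A 402 means the call would
        # exceed the account balance, which waiting will not change.
        return None
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        # Prefer the server's Retry-After, documented as seconds.
        return float(lowered["retry-after"])
    except (KeyError, ValueError):
        pass
    # Fallback: exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, 2 ** attempt))


def error_code(body):
    """Extract error.code from a JSON error body, or None if it is not that shape."""
    try:
        return json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        return None
```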

Retry pattern (Python)

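```python
import os
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key=os.environ["QSP_KEY"],
    max_retries=0,  # disable the SDK's own retries so this loop is the only retry layer
)


def retry_after_seconds(err):
    """Retry-After from the 429 response, in seconds, or None if absent or non-numeric."""
    value = err.response.headers.get("retry-after")
    try:
        return float(value) if value is not None else None
    except ValueError:
        return None


def call_with_retry(messages, model="deepseek-v4-flash", max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise
            # Honor Retry-After if present, else exponential backoff with jitter.
            wait = retry_after_seconds(e)
            if wait is None:
                wait = 2 ** attempt + random.random()
            time.sleep(wait)
```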

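The OpenAI Python SDK also retries 429s on its own (2 retries by default, set with max_retries on the client). Use an explicit loop like the one above only when you want full control, and pass max_retries=0 to the client so the two retry layers don't stack.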

How to request a higher limit

If you're consistently hitting 429s on a single key, the right path depends on your scale:

  • Under $1K/mo: smooth bursts with a small client-side queue, honor Retry-After, and keep traffic on one key unless support tells you otherwise.
  • $1K/mo and up: a teams plan includes reserved per-key throughput tuned to your contract. Email raullen@machinefi.com with your rough requests-per-second target.
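For the under-$1K/mo path, one way to build the small client-side queue is an asyncio limiter. It enforces the documented 8 in-flight cap and spaces request starts so they stay under 600 requests/minute. This is a minimal sketch. The Pacer class is hypothetical, it does not track the token-per-minute budget, and one instance should be shared by all code that uses the same key.

```python
import asyncio
import time


class Pacer:
    """Client-side limiter for one API key.

    At most max_in_flight calls run at once, and call starts are spaced at
    least 60 / rpm seconds apart. Defaults mirror the documented per-key
    limits (8 in flight, 600 requests/minute).
    """

    def __init__(self, max_in_flight=8, rpm=600):
        self._sem = asyncio.Semaphore(max_in_flight)
        self._interval = 60.0 / rpm
        self._lock = asyncio.Lock()
        self._next_start = 0.0  # earliest monotonic time the next call may start

    async def run(self, make_call):
        """Await make_call() once both a concurrency slot and a start slot are free."""
        async with self._sem:
            async with self._lock:
                now = time.monotonic()
                wait = self._next_start - now
                if wait > 0:
                    await asyncio.sleep(wait)
                # Schedule the next start from the planned time, so waits don't drift.
                self._next_start = max(now, self._next_start) + self._interval
            return await make_call()
```

With the SDK, create the Pacer inside your running event loop and wrap each call, for example `await pacer.run(lambda: client.chat.completions.create(model="deepseek-v4-flash", messages=messages))` with `client` an `AsyncOpenAI` instance. Keep a retry-on-429 loop in place as well, since other clients on the same key also consume its budget.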