Rate limits
Per-key throughput
Defaults are conservative and tuned for self-serve. Bursty workloads should enable retry-on-429 in the client; sustained higher throughput is a teams-plan conversation.
Defaults
- 600 requests / minute per API key, shared across all models.
- 1,000,000 tokens / minute per API key (prompt + completion combined), shared across all models.
- 8 in-flight requests per API key. This separate concurrency cap keeps short spikes from turning into account-wide upstream throttling.
- Both limits are surfaced on every response via the
x-ratelimit-limit-requests,x-ratelimit-limit-tokens, and matching-remaining-headers — read them to pace your client. - Per-key monthly spend cap is configurable in the dashboard under
Account → Monthly limit. Default: unset (no cap). - Per-key balance cap is your account balance — calls that would exceed it return 402.
429 response shape
When you exceed the per-minute limit, you get a 429 with this body:
json
{
"error": {
"message": "Rate limit exceeded. Try again in a moment.",
"type": "rate_limit_error",
"code": "rate_limit_exceeded"
}
}The response also includes a Retry-After header (seconds) — honor it before retrying.
Retry pattern (Python)
python
import os, time, random
from openai import OpenAI, RateLimitError
client = OpenAI(
base_url="https://api.quicksilverpro.io/v1",
api_key=os.environ["QSP_KEY"],
)
def call_with_retry(messages, model="deepseek-v4-flash", max_attempts=5):
for attempt in range(max_attempts):
try:
return client.chat.completions.create(model=model, messages=messages)
except RateLimitError as e:
if attempt == max_attempts - 1:
raise
# Honor Retry-After if present, else exponential backoff with jitter.
wait = float(getattr(e, "retry_after", 0)) or (2 ** attempt) + random.random()
time.sleep(wait)The OpenAI Python SDK already does this for you if you pass max_retries to the client — the snippet above is for when you want explicit control.
How to request a higher limit
If you're consistently hitting 429s on a single key, the right path depends on your scale:
- Under $1K/mo: smooth bursts with a small client-side queue, honor
Retry-After, and keep traffic on one key unless support tells you otherwise. - $1K/mo and up: a teams plan includes reserved per-key throughput tuned to your contract. Email raullen@machinefi.com with rough rps target.