Errors

Status codes and error bodies

Error responses use exactly the same shape as OpenAI's: { error: { message, type, code } }. Status codes follow standard HTTP conventions. The table below explains what each code means and what to do about it.

Error body shape

{
  "error": {
    "message": "Insufficient credits to cover the maximum cost of this request.",
    "type": "insufficient_quota",
    "code": "insufficient_credits"
  }
}
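A minimal sketch of reading this body in Python with only the standard library. APIError and parse_error are hypothetical helper names, not part of any SDK. The fallback branch assumes that a body which isn't this JSON shape (for example, an HTML error page from an intermediate proxy) should be surfaced as raw text.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class APIError:
    status: int            # HTTP status code of the response
    message: str           # human-readable explanation
    type: Optional[str]    # e.g. "insufficient_quota"
    code: Optional[str]    # e.g. "insufficient_credits"


def parse_error(status: int, body: str) -> APIError:
    """Parse a {"error": {"message", "type", "code"}} body into an APIError."""
    try:
        err = json.loads(body)["error"]
        return APIError(
            status=status,
            message=str(err.get("message", "")),
            type=err.get("type"),
            code=err.get("code"),
        )
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the documented shape (e.g. an HTML page from a proxy): keep the raw text.
        return APIError(status=status, message=body, type=None, code=None)
```

Branching on code (e.g. err.code == "insufficient_credits") rather than on message is usually the sturdier choice, since message is human-readable text.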

Status codes

| Code | Meaning | What to do |
|------|---------|------------|
| 400 | Bad request | Malformed body, missing required fields, or invalid JSON. Check error.message for specifics; it usually names the field that failed validation. |
| 401 | Authentication failed | Missing or invalid API key. Confirm you are sending Authorization: Bearer <key> and that the key hasn't been deleted in the dashboard. |
| 402 | Insufficient credits | Your account balance can't cover the worst-case cost estimate for the request. Top up at /dashboard#credits; the launch bonus matches 100% of your first deposit, up to $50. |
| 404 | Model not found | Wrong model ID. Drop the provider prefix from OpenRouter / DeepInfra-style IDs (e.g. deepseek/deepseek-r1 → deepseek-r1). See the full list at /docs/models. |
| 429 | Rate limited | You exceeded the per-key throughput cap (default: 600 requests/min, 1M tokens/min, 8 parallel requests). Honor the Retry-After header; see /docs/rate-limits for the retry pattern. |
| 500 | Server error | Transient error on our side. Retry with exponential backoff. If it persists for more than a few minutes, check /status; if status reports OK, email hello@quicksilverpro.io so we can dig in. |
| 503 | Service temporarily degraded | A specific model is briefly unavailable. Try a different model (several models in the catalog overlap in capability) or retry shortly. Check /status for the live per-model picture. |
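A minimal, library-agnostic sketch of the retry guidance above: retry 429, 500, and 503, honor Retry-After when the response carries it, and otherwise back off exponentially with full jitter. send is a hypothetical callable you would wrap around your own HTTP client, returning (status, headers, body); max_attempts, base_delay, and max_delay are illustrative defaults, not documented limits.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Statuses treated as transient: rate limited, server error, model degraded.
RETRYABLE = {429, 500, 503}

# (status, headers, body) as returned by your own HTTP wrapper.
Response = Tuple[int, Mapping[str, str], str]


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Return Retry-After in seconds if present and numeric; the HTTP-date form is ignored."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return None
    return None


def send_with_retries(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Success, a non-retryable error, or the last attempt: hand the result back.
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, headers, body
        # Prefer the server's Retry-After (sent with 429) when present.
        delay = retry_after_seconds(headers)
        if delay is None:
            # Full jitter: uniform in [0, min(max_delay, base_delay * 2**attempt)].
            delay = random.uniform(0.0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    raise AssertionError("unreachable: the last attempt always returns")
```

The sleep parameter is injectable so the loop can be tested without waiting. For a persistent 503 you might switch to a different model instead of retrying the same one, as the table suggests; this sketch leaves that decision to the caller.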

Common gotchas

  • Provider-prefixed model IDs: model IDs copied from OpenRouter or DeepInfra code samples carry a provider prefix (e.g. deepseek/deepseek-r1). Drop the prefix when calling QSP (deepseek-r1).
  • AzureOpenAI client class: if migrating from Azure, use the plain openai.OpenAI client, not openai.AzureOpenAI. QSP doesn't use Azure's deployment-name indirection.
  • Thinking models & cost surprises: the V4 wave, Qwen 3.6, Kimi K2.6, and R1 emit a chain-of-thought trace that counts as output tokens, so even a short user message can produce 1,000+ output tokens. Pass reasoning: { enabled: false } for non-thinking chat (R1 ignores it, since it is reasoning-only).
  • SSE buffering: if streaming responses arrive in one big chunk at the end, your reverse proxy is buffering. See Streaming for the fix.
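A sketch of two of these gotchas in request-building code, assuming an OpenAI-compatible chat completions body (model, messages): strip a provider prefix from the model ID, and add reasoning: { enabled: false } when you don't want a thinking trace. normalize_model_id and build_chat_request are hypothetical names; the sketch only builds headers and a JSON body, leaving the endpoint URL and HTTP client to you.

```python
import json
from typing import Dict, List, Tuple


def normalize_model_id(model: str) -> str:
    """Drop an OpenRouter/DeepInfra-style provider prefix: 'deepseek/deepseek-r1' -> 'deepseek-r1'."""
    return model.rsplit("/", 1)[-1]


def build_chat_request(
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    thinking: bool = True,
) -> Tuple[Dict[str, str], str]:
    """Return (headers, JSON body) for an OpenAI-style chat request; endpoint not shown."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # Bearer auth, as the 401 row describes
        "Content-Type": "application/json",
    }
    body = {"model": normalize_model_id(model), "messages": messages}
    if not thinking:
        # Turns off the chain-of-thought trace on thinking models; R1 ignores it.
        body["reasoning"] = {"enabled": False}
    return headers, json.dumps(body)
```

Stripping the prefix doesn't guarantee the remainder matches a QSP model ID (other catalogs may use different casing or naming), so check the result against /docs/models.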

Reporting a bug

Email hello@quicksilverpro.io with the x-request-id response header (we tag every response), a minimal reproduction, and what you expected to happen. We usually reply within a few hours.
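Capturing x-request-id at the moment a request fails makes this report easy to assemble later. A minimal sketch; request_id and bug_report are hypothetical helpers, and the header lookup is case-insensitive because HTTP header names are.

```python
from typing import Mapping, Optional


def request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the x-request-id header value, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None


def bug_report(headers: Mapping[str, str], repro: str, expected: str) -> str:
    """Assemble the request ID, a minimal reproduction, and the expected behavior into one body."""
    rid = request_id(headers) or "(x-request-id not captured)"
    return "\n".join([
        f"x-request-id: {rid}",
        "",
        "Minimal reproduction:",
        repro,
        "",
        f"Expected: {expected}",
    ])
```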