Errors
Status codes and error bodies
The error shape matches OpenAI exactly: { error: { message, type, code } }. Status codes follow HTTP conventions. Below: what each means and what to do about it.
Error body shape
json
{
"error": {
"message": "Insufficient credits to cover the maximum cost of this request.",
"type": "insufficient_quota",
"code": "insufficient_credits"
}
}Status codes
| Code | Meaning | What to do |
|---|---|---|
| 400 | Bad request | Malformed body, missing required fields, or invalid JSON. Check the error.message for specifics; the field that failed validation is usually named. |
| 401 | Authentication failed | Missing or invalid API key. Confirm Authorization: Bearer <key> and that the key hasn't been deleted in the dashboard. |
| 402 | Insufficient credits | Account balance can't cover the worst-case cost estimate for the request. Top up at /dashboard#credits — the launch bonus matches your first deposit 100% up to $50 free. |
| 404 | Model not found | Wrong model ID. Drop the provider prefix from OpenRouter / DeepInfra-style IDs (e.g. deepseek/deepseek-r1 → deepseek-r1). Full list at /docs/models. |
| 429 | Rate limited | Exceeded the per-key throughput cap (default 600 rpm, 1M tpm, 8 parallel). Honor the Retry-After header. See /docs/rate-limits for the retry pattern. |
| 500 | Server error | Transient error on our side. Retry with exponential backoff. If it persists for more than a few minutes, check /status — and if status reports OK, please email so we can dig in. |
| 503 | Service temporarily degraded | A specific model is briefly unavailable. Try a different model -- the catalog is partly redundant on capability -- or retry shortly. Check /status for the live per-model picture. |
Common gotchas
- Provider-prefixed model IDs: if you copy a code sample from an OpenRouter / DeepInfra doc, the model ID has a provider prefix (e.g.
deepseek/deepseek-r1). Drop the prefix on QSP. - AzureOpenAI client class: if migrating from Azure, use the plain
openai.OpenAIclient, notopenai.AzureOpenAI. QSP doesn't use Azure's deployment-name indirection. - Thinking models & cost surprises: V4 wave, Qwen 3.6, Kimi K2.6, and R1 emit a chain-of-thought trace that counts as output tokens. A short user message can still produce 1000+ output tokens. Pass
reasoning: { enabled: false }if you want non-thinking chat (ignored by R1, which is reasoning-only). - SSE buffering: if streaming responses arrive in one big chunk at the end, your reverse proxy is buffering. See Streaming for the fix.
Reporting a bug
Email hello@quicksilverpro.io with the x-request-id response header (we tag every response), a minimal reproduction, and what you expected to happen. We usually reply within a few hours.