DeepSeek V3 on QuickSilver Pro
DeepSeek V3 is the production workhorse: 671B MoE (37B active), 128K context, and full OpenAI-compatible chat / tools / strict JSON. On QuickSilver Pro it's $0.16 input / $0.616 output per 1M tokens — about 16× cheaper than GPT-4o on the same text-only chat workloads, and ~20% below OpenRouter. No chain-of-thought overhead — predictable latency and token counts for production traffic.
At a glance
General chat, coding agents, tool-calling, structured JSON output — at GPT-4o quality, ~16x cheaper.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | vs QSP |
|---|---|---|---|
| QuickSilver Pro | $0.16 | $0.62 | cheapest |
| OpenRouter (deepseek/deepseek-chat-v3-0324) | $0.20 | $0.77 | 20% cheaper |
| OpenAI (GPT-4o) | $2.50 | $10.00 | 94% cheaper |
When to use
V3 is the right default for any production chat / coding workload that doesn't specifically need chain-of-thought reasoning: customer-support agents, codegen, PR-review bots, summarization, classification, structured extraction. Tool calling and json_schema strict mode work out of the box; output is short and direct, so token-budget math is predictable.
When to use something else
For genuinely hard reasoning (competition math, multi-step proofs, debugging complex concurrency), use DeepSeek R1 or V4 Pro. For the cheapest possible chat with 1M context, V4 Flash beats V3 on both axes. For closed-model strengths (vision, audio, GPT-4-class creative writing), stay on OpenAI.
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
-H "Authorization: Bearer $QSP_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-v3",
"messages": [{"role": "user", "content": "Hello!"}]
}'OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
FAQ
V4 Flash is ~74% cheaper on output ($0.16/M vs V3's $0.616/M) and bumps context to 1M tokens, but it thinks by default — every call burns reasoning tokens unless you pass `reasoning: { enabled: false }`. V3 has no thinking overhead, so per-call latency and token counts are more predictable for production traffic. New deployments default to V4 Flash; V3 keeps shipping for teams already running it in prod or who need the deterministic non-thinking shape.
Yes — both work through the official OpenAI SDK. The `tools` / `tool_choice` parameters and `response_format: { type: "json_schema", strict: true }` mode are both supported. Drop-in replacement for GPT-4 in LangChain, LlamaIndex, Aider, Cline, Cursor, or any framework that expects OpenAI tool-calling response shapes.
QuickSilver Pro lists V3 at $0.16 input / $0.616 output per 1M tokens. OpenRouter's public per-token rate is $0.20 / $0.77 — we price ~20% below on the open-source models we serve. Same OpenAI-compatible surface; migration is a base_url + key swap, with the model ID changing from OpenRouter's `deepseek/deepseek-chat-v3-0324` to QSP's public alias `deepseek-v3`.