128K context · Tool calling · JSON schema

DeepSeek V3 on QuickSilver Pro

DeepSeek V3 is the production workhorse: a 671B-parameter mixture-of-experts model (37B parameters active per token) with 128K context and full OpenAI-compatible chat, tool calling, and strict JSON output. On QuickSilver Pro it costs $0.16 input / $0.616 output per 1M tokens, about 16× cheaper than GPT-4o on the same text-only chat workloads and ~20% below OpenRouter. It does not emit chain-of-thought tokens, so latency and token counts stay predictable under production traffic.

$0.16 input · $0.62 output per 1M tokens
By Raullen Chai · Updated

At a glance

Context: 128K tokens
Input / 1M: $0.16
Output / 1M: $0.62
Thinks by default: No

General chat, coding agents, tool calling, and structured JSON output at GPT-4o-class quality, about 16× cheaper.

Pricing comparison ($/1M tokens)

Provider | Input | Output | QSP vs. provider
QuickSilver Pro | $0.16 | $0.62 | cheapest
OpenRouter (deepseek/deepseek-chat-v3-0324) | $0.20 | $0.77 | 20% cheaper
OpenAI (GPT-4o) | $2.50 | $10.00 | 94% cheaper
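To turn the per-1M rates into a budget, multiply each token volume by its rate and add the two. The shell sketch below does that with awk for a hypothetical workload of 500M input and 100M output tokens per month; the volumes are made up for illustration, and the rates are the list prices from the table.

```shell
# cost IN_TOKENS OUT_TOKENS IN_PRICE OUT_PRICE
# Prints the dollar cost (2 decimals) given $/1M-token prices.
cost() {
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.2f\n", (i / 1e6) * pi + (o / 1e6) * po }'
}

# Hypothetical monthly volumes, not measured traffic.
IN=500000000    # 500M input tokens
OUT=100000000   # 100M output tokens

echo "QuickSilver Pro: \$$(cost "$IN" "$OUT" 0.16 0.616)"
echo "OpenRouter:      \$$(cost "$IN" "$OUT" 0.20 0.77)"
echo "GPT-4o:          \$$(cost "$IN" "$OUT" 2.50 10.00)"
```

For that mix it prints $141.60 for QuickSilver Pro, $177.00 for OpenRouter, and $2250.00 for GPT-4o: exactly 20% below OpenRouter and roughly 16× below GPT-4o. Real bills also depend on your actual input/output ratio, so plug in your own volumes.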

When to use

V3 is a strong default for production chat and coding workloads that don't need chain-of-thought reasoning: customer-support agents, code generation, PR-review bots, summarization, classification, and structured extraction. Tool calling and json_schema strict mode work out of the box, and output is short and direct, so token-budget math is predictable.
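Structured extraction is one of the workloads listed above. Here is a minimal shell sketch of a strict json_schema request, assuming the OpenAI-style `response_format` shape (schema nested under `json_schema`). The `ticket_triage` schema and the ticket text are hypothetical placeholders, and the request is only sent when `QSP_KEY` is set.

```shell
# Request body: classify a support ticket into a fixed schema.
# Strict mode expects every property listed in "required" and
# "additionalProperties": false.
payload=$(cat <<'EOF'
{
  "model": "deepseek-v3",
  "messages": [
    {"role": "system", "content": "Classify the support ticket."},
    {"role": "user", "content": "I was charged twice for my March invoice."}
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "ticket_triage",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "category": {"type": "string", "enum": ["billing", "bug", "how_to", "other"]},
          "urgent": {"type": "boolean"}
        },
        "required": ["category", "urgent"],
        "additionalProperties": false
      }
    }
  }
}
EOF
)

# Only call the API when a key is configured.
if [ -n "${QSP_KEY:-}" ]; then
  curl -s https://api.quicksilverpro.io/v1/chat/completions \
    -H "Authorization: Bearer $QSP_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"
fi
```

The model's answer arrives as a JSON string in `choices[0].message.content` that should conform to the schema, so it can be parsed directly without regex cleanup.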

When to use something else

For genuinely hard reasoning (competition math, multi-step proofs, debugging complex concurrency), use DeepSeek R1 or V4 Pro. For the cheapest chat with 1M context, V4 Flash beats V3 on both price and context length, though it thinks by default. For closed-model strengths (vision, audio, GPT-4-class creative writing), stay on OpenAI.

Quickstart (curl)

curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

The endpoint is OpenAI-compatible and serves the same model as OpenRouter, so migrating means changing base_url and swapping the API key and model ID.

FAQ

V4 Flash is ~74% cheaper on output ($0.16/M vs. V3's $0.616/M) and raises context to 1M tokens, but it thinks by default: every call burns reasoning tokens unless you pass `reasoning: { enabled: false }`. V3 has no thinking overhead, so per-call latency and token counts are more predictable for production traffic. New deployments default to V4 Flash; V3 remains available for teams already running it in production or who need predictable, non-thinking output.
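A sketch of the difference described above: the same request sent to V3 as-is and to V4 Flash with thinking disabled. The `reasoning` field follows the shape given in this answer; `deepseek-v4-flash` is a hypothetical model ID used only for illustration, so check the model list for the real alias.

```shell
# V3: no reasoning flag needed, it never thinks.
v3_payload='{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Summarize: the deploy failed on step 3."}]
}'

# V4 Flash (hypothetical model ID): thinks by default,
# so the reasoning flag turns it off explicitly.
flash_payload='{
  "model": "deepseek-v4-flash",
  "reasoning": {"enabled": false},
  "messages": [{"role": "user", "content": "Summarize: the deploy failed on step 3."}]
}'

# POST one JSON body to the chat completions endpoint.
send() {
  curl -s https://api.quicksilverpro.io/v1/chat/completions \
    -H "Authorization: Bearer $QSP_KEY" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Only hit the API when a key is configured.
if [ -n "${QSP_KEY:-}" ]; then
  send "$v3_payload"
  send "$flash_payload"
fi
```

Comparing the `usage` block of the two responses is a quick way to confirm what each call was billed for before committing to one model.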

V3 supports both tool calling and strict JSON output through the official OpenAI SDK: the `tools` / `tool_choice` parameters and `response_format: { type: "json_schema", json_schema: { name, schema, strict: true } }` are supported. That makes it a drop-in replacement for GPT-4 in LangChain, LlamaIndex, Aider, Cline, Cursor, or any framework that expects OpenAI tool-calling response shapes.
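A minimal tool-calling request in the OpenAI `tools` / `tool_choice` format, written as a shell sketch. `get_order_status` is a hypothetical tool: the model only decides to call it, and your application is responsible for running it. The request is sent only when `QSP_KEY` is set.

```shell
# Declare one function tool; "auto" lets the model choose whether to call it.
payload=$(cat <<'EOF'
{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Where is order 8812?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_order_status",
      "description": "Look up the shipping status of an order.",
      "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"]
      }
    }
  }],
  "tool_choice": "auto"
}
EOF
)

# Only call the API when a key is configured.
if [ -n "${QSP_KEY:-}" ]; then
  curl -s https://api.quicksilverpro.io/v1/chat/completions \
    -H "Authorization: Bearer $QSP_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"
fi
```

If the model calls the tool, `choices[0].message.tool_calls` carries the function name and a JSON-encoded `arguments` string. Your code runs the tool and sends the result back as a message with role `"tool"` and the matching `tool_call_id`, exactly as with OpenAI.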

QuickSilver Pro lists V3 at $0.16 input / $0.616 output per 1M tokens. OpenRouter's public per-token rate is $0.20 / $0.77 — we price ~20% below on the open-source models we serve. Same OpenAI-compatible surface; migration is a base_url + key swap, with the model ID changing from OpenRouter's `deepseek/deepseek-chat-v3-0324` to QSP's public alias `deepseek-v3`.
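The swap described above, as a shell sketch: only the base URL, the API key, and the model ID differ between the two providers. `PROVIDER`, `OPENROUTER_KEY`, and `QSP_KEY` are assumed environment variable names chosen for illustration.

```shell
# Pick the provider; defaults to QuickSilver Pro.
PROVIDER="${PROVIDER:-qsp}"

if [ "$PROVIDER" = "openrouter" ]; then
  BASE_URL="https://openrouter.ai/api/v1"
  API_KEY="${OPENROUTER_KEY:-}"
  MODEL="deepseek/deepseek-chat-v3-0324"
else
  BASE_URL="https://api.quicksilverpro.io/v1"
  API_KEY="${QSP_KEY:-}"
  MODEL="deepseek-v3"
fi

# The request body is otherwise identical for both providers.
payload='{"model": "'"$MODEL"'", "messages": [{"role": "user", "content": "Hello!"}]}'

# Only send when a key for the chosen provider is configured.
if [ -n "$API_KEY" ]; then
  curl -s "$BASE_URL/chat/completions" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"
fi
```

With the OpenAI SDK the same idea applies: pass the QSP base URL and key when constructing the client and change the model string; the rest of the calling code stays as it is.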

Try DeepSeek V3 with double credits — up to $50 free
