DeepSeek V4 Flash on QuickSilver Pro
DeepSeek V4 Flash is the V4 wave's cheap chat workhorse — 1M-token context, thinks by default, output priced at $0.16/M (~74% cheaper than V3 and ~73% cheaper than GPT-4o-mini). On QuickSilver Pro it lists at $0.08 / $0.16 per million tokens, ~20% below OpenRouter's public rate. Drop into the OpenAI SDK with one line; pass `reasoning: { enabled: false }` for non-thinking V3-style chat.
At a glance
Cheap chat, coding, structured output — with a 1M context if you need it.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | vs QSP |
|---|---|---|---|
| QuickSilver Pro | $0.08 | $0.16 | cheapest |
| OpenRouter (deepseek/deepseek-v4-flash) | $0.10 | $0.20 | 20% cheaper |
| OpenAI (GPT-4o-mini) | $0.15 | $0.60 | 73% cheaper |
When to use
Default to V4 Flash for any task where you'd previously reach for V3 or GPT-4o-mini: agentic chat, code generation, summarization, classification, structured JSON output. The 1M context window means you can dump a whole repo or transcript without RAG, and the per-token price is the lowest in the catalog.
When to use something else
For genuinely hard reasoning (competitive programming, multi-step proofs, complex math), escalate to V4 Pro or DeepSeek R1. For Opus-class agentic / long-horizon planning, Kimi K2.6 is the better fit. For closed-model capabilities (vision, audio, GPT-4-class creative writing), stay on OpenAI.
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
-H "Authorization: Bearer $QSP_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-v4-flash",
"messages": [{"role": "user", "content": "Hello!"}]
}'OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
FAQ
V4 Flash is ~74% cheaper on output ($0.16/M vs V3's $0.616/M) and bumps context from 128K to 1M tokens. The trade-off is V4 Flash thinks by default — a one-token "Hi" can return ~175 reasoning tokens. For V3-style cheap chat, pass `reasoning: { enabled: false }` in the request body and you get V3 behavior at a fraction of the price.
QuickSilver Pro lists V4 Flash at $0.08 input / $0.16 output per 1M tokens — ~20% below OpenRouter's public $0.10 / $0.20 rate. Both expose the OpenAI-compatible chat completions API; migration is a base_url + key swap.
Yes — V4 Flash is an OpenAI-compatible chat completions endpoint. Set base_url=https://api.quicksilverpro.io/v1, paste your QSP key, and use model="deepseek-v4-flash". Streaming, tool calling, json_schema strict mode, and usage.cost accounting all work out of the box.