New · 1M context · Thinks by default

DeepSeek V4 Flash on QuickSilver Pro

DeepSeek V4 Flash is the low-cost chat workhorse of the V4 lineup: a 1M-token context window, thinking on by default, and output priced at $0.16/M tokens (~74% cheaper than DeepSeek V3 and ~73% cheaper than GPT-4o-mini). On QuickSilver Pro it lists at $0.08 input / $0.16 output per million tokens, ~20% below OpenRouter's public rate. Existing OpenAI SDK code works after a base_url and key swap; pass `reasoning: { enabled: false }` in the request body for non-thinking, V3-style chat.

$0.08 input · $0.16 output per 1M tokens
By Raullen Chai · Updated

At a glance

Context: 1M tokens
Input: $0.08 per 1M tokens
Output: $0.16 per 1M tokens
Thinks by default: Yes

Cheap chat, coding, structured output — with a 1M context if you need it.

Pricing comparison ($/1M tokens)

Provider | Input | Output | QuickSilver Pro savings (input / output)
QuickSilver Pro | $0.08 | $0.16 | cheapest
OpenRouter (deepseek/deepseek-v4-flash) | $0.10 | $0.20 | 20% / 20%
OpenAI (GPT-4o-mini) | $0.15 | $0.60 | 47% / 73%
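The savings figures follow directly from the listed rates. A minimal shell sketch that recomputes them and prices a single request; `savings_pct` and `request_cost` are hypothetical helper names, and `request_cost` assumes reasoning tokens are billed at the output rate, which this page does not state explicitly.

```shell
#!/bin/sh
# Listed QuickSilver Pro rates for deepseek-v4-flash, in USD per 1M tokens.
QSP_IN=0.08
QSP_OUT=0.16

# savings_pct OTHER_RATE QSP_RATE
# Prints how much cheaper the QSP rate is, as a whole-number percentage.
savings_pct() {
  awk -v other="$1" -v qsp="$2" 'BEGIN { printf "%.0f\n", (1 - qsp / other) * 100 }'
}

# request_cost INPUT_TOKENS OUTPUT_TOKENS
# Prints the USD cost of one request at the QSP rates.
# Assumption: reasoning tokens are counted as output tokens.
request_cost() {
  awk -v i="$1" -v o="$2" -v ri="$QSP_IN" -v ro="$QSP_OUT" \
    'BEGIN { printf "%.6f\n", (i * ri + o * ro) / 1000000 }'
}

savings_pct 0.20 "$QSP_OUT"    # vs OpenRouter output, prints 20
savings_pct 0.60 "$QSP_OUT"    # vs GPT-4o-mini output, prints 73
savings_pct 0.616 "$QSP_OUT"   # vs DeepSeek V3 output, prints 74
savings_pct 0.15 "$QSP_IN"     # vs GPT-4o-mini input, prints 47
request_cost 1000 175          # short prompt plus ~175 reasoning tokens, prints 0.000108
```

awk handles the math because POSIX shell arithmetic is integer-only.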

When to use

Default to V4 Flash for any task where you'd previously reach for V3 or GPT-4o-mini: agentic chat, code generation, summarization, classification, and structured JSON output. The 1M-token context window means a sizeable repo or a long transcript can often go into a single prompt without a RAG pipeline, and the per-token price is the lowest in the catalog.
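Before sending a whole repo in one prompt, a rough size check helps. A minimal sketch, assuming the common rule of thumb of ~4 bytes per token (DeepSeek's tokenizer will give a different exact count, especially for non-English text); `estimate_tokens` and `fits_in_context` are hypothetical helpers.

```shell
#!/bin/sh
# estimate_tokens DIR
# Rough token estimate for every regular file under DIR, skipping .git.
# Assumption: ~4 bytes per token; the real tokenizer count will differ.
estimate_tokens() {
  bytes=$(find "$1" -type f ! -path '*/.git/*' -exec cat {} + | wc -c | tr -d ' ')
  echo $(( bytes / 4 ))
}

# fits_in_context DIR
# Prints "fits" if the estimate is below the 1M-token window, else "too big".
fits_in_context() {
  tokens=$(estimate_tokens "$1")
  if [ "$tokens" -lt 1000000 ]; then
    echo "fits"
  else
    echo "too big"
  fi
}
```

For example, `fits_in_context ./my-repo` (path hypothetical). Leave headroom for the reply and any reasoning tokens. At $0.08 per 1M input tokens, even a prompt that fills the whole window costs about $0.08 in input.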

When to use something else

For genuinely hard reasoning (competitive programming, multi-step proofs, complex math), escalate to V4 Pro or DeepSeek R1. For Opus-class agentic work and long-horizon planning, Kimi K2.6 is the better fit. For capabilities that need a closed model (vision, audio, GPT-4-class creative writing), stay on OpenAI.

Quickstart (curl)

curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

OpenAI-compatible, serving the same model as OpenRouter's deepseek/deepseek-v4-flash. Migrating is a base_url and API key swap.
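One way to migrate without editing code is through environment variables: the official `openai` Python (v1+) and Node SDKs read `OPENAI_BASE_URL` and `OPENAI_API_KEY` when the client is constructed without explicit values. A minimal sketch; `use_qsp` is a hypothetical helper, and your code still needs to request `model="deepseek-v4-flash"`.

```shell
#!/bin/sh
# use_qsp: hypothetical helper that points OpenAI-SDK-based programs at
# QuickSilver Pro for the current shell session.
use_qsp() {
  if [ -z "${QSP_KEY:-}" ]; then
    echo "QSP_KEY is not set" >&2
    return 1
  fi
  # Read by the official openai SDKs when no base_url / api_key is passed.
  export OPENAI_BASE_URL="https://api.quicksilverpro.io/v1"
  export OPENAI_API_KEY="$QSP_KEY"
}
```

Source the file and call the helper before starting your app, e.g. `. ./qsp.sh && use_qsp && python app.py` (file names hypothetical).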

FAQ

How does V4 Flash compare to DeepSeek V3?
V4 Flash is ~74% cheaper on output ($0.16/M vs V3's $0.616/M) and raises the context window from 128K to 1M tokens. The trade-off is that V4 Flash thinks by default: even a one-word prompt like "Hi" can return ~175 reasoning tokens. For V3-style cheap chat, pass `reasoning: { enabled: false }` in the request body to get non-thinking responses at a fraction of V3's price.
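In the raw HTTP body, the switch is a top-level `reasoning` object. A minimal sketch that builds that body and sends it with curl; `flash_body` is a hypothetical helper, and the request only runs when `QSP_KEY` is set. From the OpenAI Python SDK, a non-standard field like this would likely need to go through the `extra_body` argument.

```shell
#!/bin/sh
# flash_body PROMPT
# Prints a chat request body with thinking turned off.
# Note: PROMPT is inserted without JSON escaping, so keep it free of
# double quotes and backslashes (or build the body with a JSON tool).
flash_body() {
  printf '{"model":"deepseek-v4-flash","reasoning":{"enabled":false},"messages":[{"role":"user","content":"%s"}]}\n' "$1"
}

# Only call the API when a key is available.
if [ -n "${QSP_KEY:-}" ]; then
  curl -s https://api.quicksilverpro.io/v1/chat/completions \
    -H "Authorization: Bearer $QSP_KEY" \
    -H "Content-Type: application/json" \
    -d "$(flash_body 'Hello!')"
fi
```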

How does QuickSilver Pro pricing compare to OpenRouter?
QuickSilver Pro lists V4 Flash at $0.08 input / $0.16 output per 1M tokens, ~20% below OpenRouter's public $0.10 / $0.20 rate. Both expose the OpenAI-compatible chat completions API, so migration is a base_url and key swap.

Does V4 Flash work with the OpenAI SDK?
Yes. V4 Flash is served through an OpenAI-compatible chat completions endpoint. Set base_url=https://api.quicksilverpro.io/v1, paste in your QSP key, and use model="deepseek-v4-flash". Streaming, tool calling, json_schema strict mode, and usage.cost accounting all work out of the box.
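A structured-output request uses the standard OpenAI `response_format` shape with `type: "json_schema"` and `strict: true`. A minimal sketch; the `sentiment` schema and the prompt are hypothetical, and the call only runs when `QSP_KEY` is set.

```shell
#!/bin/sh
# Hypothetical classification schema; "strict": true asks the server to
# return JSON that matches the schema exactly.
BODY=$(cat <<'EOF'
{
  "model": "deepseek-v4-flash",
  "messages": [
    {"role": "user", "content": "Classify the sentiment: I love this API."}
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "sentiment",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "label": {"type": "string", "enum": ["positive", "negative", "neutral"]}
        },
        "required": ["label"],
        "additionalProperties": false
      }
    }
  }
}
EOF
)

# Send only when a key is available; per this page, the response's usage
# object should also carry a cost field.
if [ -n "${QSP_KEY:-}" ]; then
  curl -s https://api.quicksilverpro.io/v1/chat/completions \
    -H "Authorization: Bearer $QSP_KEY" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```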

Try DeepSeek V4 Flash with double credits — up to $50 free
