DeepSeek V4 Flash on QuickSilver Pro

Name: DeepSeek V4 Flash on QuickSilver Pro
Brand: QuickSilver Pro
Price: 0.08 USD
Availability: InStock

DeepSeek V4 Flash is the V4 wave's cheap chat workhorse — 1M-token context, thinks by default, output priced at $0.16/M (~74% cheaper than V3 and ~73% cheaper than GPT-4o-mini). On QuickSilver Pro it lists at $0.08 / $0.16 per million tokens, ~20% below OpenRouter's public rate. Drop into the OpenAI SDK with one line; pass `reasoning: { enabled: false }` for non-thinking V3-style chat.

$0.08 input · $0.16 output per 1M tokens

ByRaullen Chai·Updated May 29, 2026

At a glance

Context

1M tokens

Input / 1M

$0.08

Output / 1M

$0.16

Thinks by default

Yes

Cheap chat, coding, structured output — with a 1M context if you need it.

Pricing comparison ($/1M tokens)

Provider	Input	Output	vs QSP
QuickSilver Pro	$0.08	$0.16	cheapest
OpenRouter (deepseek/deepseek-v4-flash)	$0.10	$0.20	20% cheaper
OpenAI (GPT-4o-mini)	$0.15	$0.60	73% cheaper

When to use

Default to V4 Flash for any task where you'd previously reach for V3 or GPT-4o-mini: agentic chat, code generation, summarization, classification, structured JSON output. The 1M context window means you can dump a whole repo or transcript without RAG, and the per-token price is the lowest in the catalog.

When to use something else

For genuinely hard reasoning (competitive programming, multi-step proofs, complex math), escalate to V4 Pro or DeepSeek R1. For Opus-class agentic / long-horizon planning, Kimi K2.6 is the better fit. For closed-model capabilities (vision, audio, GPT-4-class creative writing), stay on OpenAI.

Quickstart (curl)

shellGet an API key →

curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.

FAQ

How does V4 Flash compare to V3?

V4 Flash is ~74% cheaper on output ($0.16/M vs V3's $0.616/M) and bumps context from 128K to 1M tokens. The trade-off is V4 Flash thinks by default — a one-token "Hi" can return ~175 reasoning tokens. For V3-style cheap chat, pass `reasoning: { enabled: false }` in the request body and you get V3 behavior at a fraction of the price.

Why is QuickSilver Pro cheaper than OpenRouter on V4 Flash?

QuickSilver Pro lists V4 Flash at $0.08 input / $0.16 output per 1M tokens — ~20% below OpenRouter's public $0.10 / $0.20 rate. Both expose the OpenAI-compatible chat completions API; migration is a base_url + key swap.

Does it work with the OpenAI Python / Node / Swift SDK?

Yes — V4 Flash is an OpenAI-compatible chat completions endpoint. Set base_url=https://api.quicksilverpro.io/v1, paste your QSP key, and use model="deepseek-v4-flash". Streaming, tool calling, json_schema strict mode, and usage.cost accounting all work out of the box.

Try DeepSeek V4 Flash with double credits — up to $50 free

Get API Key