Gemini 2.5 Flash on QuickSilver Pro
Gemini 2.5 Flash is Google's workhorse: 1M-token context, multimodal input, and thinking by default. On QuickSilver Pro it lists at $0.255 input / $2.125 output per 1M tokens, ~15% below Vertex retail and OpenRouter's $0.30/$2.50. Thinking is always on for 2.5 Flash through QuickSilver Pro: `reasoning: { enabled: false }` is silently dropped. For non-thinking Gemini, use 2.5 Flash Lite ($0.085/$0.34).
At a glance
Coding, multimodal Q&A, long-context analysis at 1M tokens — Google's general-purpose pick.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | vs QSP |
|---|---|---|---|
| QuickSilver Pro | $0.255 | $2.125 | cheapest for 2.5 Flash |
| OpenRouter (google/gemini-2.5-flash) | $0.30 | $2.50 | ~18% more expensive |
| OpenAI (GPT-4o-mini) | $0.15 | $0.60 | different model; ~41% cheaper input, ~72% cheaper output |
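
To make the price gap concrete, here is a minimal sketch that estimates the bill for a workload at the list prices quoted on this page. The provider keys, the function names, and the 50M/10M token workload are illustrative assumptions, and the sketch ignores anything beyond plain per-token list pricing.

```python
# USD per 1M tokens (input, output), taken from the prices quoted on this page.
# The dictionary keys are labels chosen for this sketch, not API identifiers.
PRICES = {
    "quicksilver-pro": (0.255, 2.125),
    "openrouter": (0.30, 2.50),
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at list prices."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_percent(baseline: str, candidate: str,
                    input_tokens: int, output_tokens: int) -> float:
    """How much cheaper `candidate` is than `baseline`, in percent."""
    base = estimate_cost(baseline, input_tokens, output_tokens)
    cand = estimate_cost(candidate, input_tokens, output_tokens)
    return (base - cand) / base * 100


# Hypothetical monthly workload: 50M input tokens, 10M output tokens.
qsp = estimate_cost("quicksilver-pro", 50_000_000, 10_000_000)
openrouter = estimate_cost("openrouter", 50_000_000, 10_000_000)
saved = savings_percent("openrouter", "quicksilver-pro", 50_000_000, 10_000_000)
print(f"QuickSilver Pro ${qsp:.2f} vs OpenRouter ${openrouter:.2f} ({saved:.0f}% lower)")
```

Because both input and output are discounted by the same ratio, the saving stays at about 15% regardless of the input/output mix.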
When to use
Default to 2.5 Flash when you need Gemini's strengths: multimodal input (images and audio-adjacent content), a 1M-token context that fits a whole codebase or transcript, and thinking-by-default reasoning on the harder turns of an agentic loop. It is strong on coding evals (HumanEval, SWE-bench) and long-context retrieval, and a good drop-in for teams already on Gemini for other Google Cloud reasons.
When to use something else
For pure cheap high-volume chat, Gemini 2.5 Flash Lite ($0.085/$0.34) is ~3x cheaper on input and ~6x cheaper on output. For top-tier reasoning, escalate to Gemini 3.1 Pro Preview or DeepSeek R1. For non-Gemini coding at lower cost, DeepSeek V3 ($0.16/$0.616) or V4 Flash ($0.08/$0.16) beat 2.5 Flash on per-token economics.
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
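
The same request can be sketched in Python with only the standard library. The endpoint, headers, and body mirror the curl call; the function names are made up for this sketch, and the response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`), which this page implies but does not spell out.

```python
import json
import urllib.request

# Base URL taken from the curl quickstart.
BASE_URL = "https://api.quicksilverpro.io/v1"


def build_request(prompt: str, api_key: str,
                  model: str = "gemini-2.5-flash") -> urllib.request.Request:
    """Build (but do not send) a chat-completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; adjust if the payload differs.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

With the key exported as `QSP_KEY`, a call would look like `send(build_request("Hello!", os.environ["QSP_KEY"]))` after `import os`. If you already use an OpenAI client library, pointing its base URL at `https://api.quicksilverpro.io/v1` should be the only change needed.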
FAQ
2.5 Flash accepts image inputs via the standard `content` array with `image_url` parts, same as OpenAI's GPT-4o multimodal API. Multimodal requests are forwarded verbatim, with the same JSON shape on the way in and out. For image *generation* (not analysis), use gemini-2.5-flash-image instead; that is a different model SKU priced for image output.
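
As a rough illustration of that `content` array, the sketch below builds a request body with one text part and one `image_url` part in the OpenAI format. The helper names are invented for this example, and sending a local file as a base64 data URL is an assumption carried over from OpenAI's API rather than something this page confirms.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL (assumed to be accepted, as on OpenAI)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_question(question: str, image_url: str,
                   model: str = "gemini-2.5-flash") -> dict:
    """Build a chat-completions body asking a question about one image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


# Example with a placeholder URL; swap in a real, reachable image.
body = image_question("What is in this picture?", "https://example.com/cat.png")
```

The resulting dictionary is sent as the JSON body of the same `/v1/chat/completions` call used in the quickstart.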
Against DeepSeek V4 Flash, the trade-off is price versus modality. V4 Flash is cheaper: $0.08/$0.16 per 1M vs 2.5 Flash's $0.255/$2.125. Both think by default, but V4 Flash lets you suppress reasoning via `reasoning.enabled=false` (QuickSilver Pro honors it on DeepSeek models), whereas on 2.5 Flash that flag is silently dropped. So for cost-bound text workloads where you want a non-thinking path, V4 Flash wins. 2.5 Flash earns its premium for multimodal input and Google-ecosystem-aligned tasks.
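
Since the flag is dropped silently on 2.5 Flash, a client-side guard can catch the mistake before a request goes out. The following is a minimal sketch of that idea: the `reasoning` field follows the shape described on this page, while the function name, the choice to raise an error, and the model id `deepseek-v4-flash` are hypothetical, so check the catalog for the real identifier.

```python
# Models where, per this page, thinking cannot be disabled through QuickSilver Pro.
THINKING_ALWAYS_ON = {"gemini-2.5-flash"}


def build_payload(model: str, prompt: str, thinking: bool = True) -> dict:
    """Build a chat body, adding reasoning.enabled=false only where it is honored."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        if model in THINKING_ALWAYS_ON:
            # The gateway would drop the flag without warning, so fail loudly here.
            raise ValueError(
                f"{model} always thinks; pick a model that honors reasoning.enabled"
            )
        payload["reasoning"] = {"enabled": False}
    return payload


# "deepseek-v4-flash" is a placeholder id used only for illustration.
fast = build_payload("deepseek-v4-flash", "Summarize this ticket.", thinking=False)
```

Raising an error is one design choice; logging a warning and sending the request anyway would match the gateway's own behavior more closely.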
QuickSilver Pro lists 2.5 Flash at $0.255 input / $2.125 output per 1M — ~15% below Vertex retail and OpenRouter's $0.30/$2.50. You get unified billing across Gemini, DeepSeek, Qwen, and Kimi catalogs through one OpenAI-compatible key, plus referral rewards and `usage.cost` accounting.
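
For spend tracking, the `usage.cost` field can be read off each response. The sketch below assumes it is a number nested under `usage` in the response JSON; the page does not state its unit or exact shape, and the sample responses are made-up values for illustration only.

```python
from typing import Iterable, Optional


def extract_cost(response: dict) -> Optional[float]:
    """Return usage.cost from a parsed response, or None if it is absent."""
    usage = response.get("usage") or {}
    return usage.get("cost")


def total_cost(responses: Iterable[dict]) -> float:
    """Sum usage.cost over many responses, skipping those without the field."""
    costs = (extract_cost(response) for response in responses)
    return sum(cost for cost in costs if cost is not None)


# Made-up responses, trimmed to the field this sketch cares about.
sample = [
    {"usage": {"cost": 0.5}},
    {"usage": {"cost": 0.25}},
    {"usage": {}},
]
spent = total_cost(sample)
```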