Gemini 2.5 Flash Lite on QuickSilver Pro
Gemini 2.5 Flash Lite is Google's cheapest Gemini — purpose-built for high-volume short-turn workloads where Flash is overkill. On QuickSilver Pro it's $0.085 input / $0.34 output per 1M tokens, ~15% below Vertex retail and OpenRouter's $0.10/$0.40, ~43% cheaper than GPT-4o-mini on output. 1M-token context, non-thinking by default — predictable latency, predictable output length.
At a glance
Cheap high-volume chat, classification, simple summarization — at the lowest per-token rate in Google's lineup.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | vs QSP |
|---|---|---|---|
| QuickSilver Pro | $0.09 | $0.34 | cheapest |
| OpenRouter (google/gemini-2.5-flash-lite) | $0.10 | $0.40 | 15% cheaper |
| OpenAI (GPT-4o-mini) | $0.15 | $0.60 | 43% cheaper |
When to use
Reach for 2.5 Flash Lite when the task is bounded and you're optimizing cost-per-call: bulk classification, short-form summarization, simple Q&A, intent detection, content moderation, autocomplete. Non-thinking means predictable token budgets and snappy latency. 1M context fits the occasional long prompt without paying a context tax.
When to use something else
Don't pick 2.5 Flash Lite for anything that genuinely needs reasoning — competition math, multi-step planning, complex tool calls. For those use Gemini 3.1 Pro Preview (thinks deeply) or DeepSeek R1 (cheaper reasoning at $0.56/$2.00). For non-Gemini cheap chat with a 1M context, DeepSeek V4 Flash at $0.08/$0.16 is cheaper on both input and output.
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
-H "Authorization: Bearer $QSP_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash-lite",
"messages": [{"role": "user", "content": "Hello!"}]
}'OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
FAQ
Flash Lite is ~3x cheaper than Flash on both input ($0.085 vs $0.255 per 1M) and output ($0.34 vs $2.125 per 1M). The trade-off: 2.5 Flash has stronger reasoning (thinks by default — a one-line prompt can burn 100+ reasoning tokens before the visible answer), while Flash Lite ships pure non-thinking with predictable token budgets. Pick Flash for harder coding/agentic workloads; pick Flash Lite for simple-task high-volume.
Most of the time, yes. Both are OpenAI-compatible chat completions, both target the cheap-fast tier, and Flash Lite is ~43% cheaper on output ($0.34/M vs $0.60/M). Migration is a base_url + key swap with `model="gemini-2.5-flash-lite"`. Run your evals — for some tasks (creative writing, certain math), 4o-mini may still edge ahead.
QuickSilver Pro lists 2.5 Flash Lite at $0.085 input / $0.34 output per 1M tokens — ~15% below Vertex retail ($0.10/$0.40) and OpenRouter's matching $0.10/$0.40. You get unified billing across 14 models through one OpenAI-compatible key, plus `usage.cost` accounting per response. The savings compound at high-volume Lite-tier workloads where you're sending millions of short requests.