Cheapest Gemini · 1M context · Non-thinking

Gemini 2.5 Flash Lite on QuickSilver Pro

Gemini 2.5 Flash Lite is Google's cheapest Gemini, purpose-built for high-volume, short-turn workloads where Flash is overkill. On QuickSilver Pro it costs $0.085 input / $0.34 output per 1M tokens: ~15% below both Vertex retail and OpenRouter ($0.10/$0.40), and ~43% cheaper than GPT-4o-mini ($0.15/$0.60). It has a 1M-token context and is non-thinking by default, so latency and output length stay predictable.

$0.085 input · $0.34 output per 1M tokens
By Raullen Chai · Updated

At a glance

Context: 1M tokens
Input / 1M: $0.085
Output / 1M: $0.34
Thinks by default: No

Cheap high-volume chat, classification, simple summarization — at the lowest per-token rate in Google's lineup.

Pricing comparison ($/1M tokens)

Provider                                    Input    Output   QSP saving
QuickSilver Pro                             $0.085   $0.34    (cheapest)
OpenRouter (google/gemini-2.5-flash-lite)   $0.10    $0.40    15%
OpenAI (GPT-4o-mini)                        $0.15    $0.60    43%
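
To see what these rates mean for a concrete bill, here is a minimal Python sketch of the arithmetic. The rates are the ones quoted on this page ($0.085 is the unrounded QuickSilver Pro input rate); the workload of 10M requests at 200 input and 20 output tokens each is a made-up illustration, not a benchmark.

```python
# Per-1M-token rates in USD (input, output), as quoted on this page.
RATES = {
    "QuickSilver Pro": (0.085, 0.34),
    "OpenRouter": (0.10, 0.40),
    "GPT-4o-mini": (0.15, 0.60),
}


def workload_cost(provider, input_tokens, output_tokens):
    """Return the USD cost of a workload at a provider's listed rates."""
    in_rate, out_rate = RATES[provider]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def savings_pct(provider, baseline, input_tokens, output_tokens):
    """Return how much cheaper `provider` is than `baseline`, in percent."""
    base = workload_cost(baseline, input_tokens, output_tokens)
    ours = workload_cost(provider, input_tokens, output_tokens)
    return 100 * (base - ours) / base


# Hypothetical workload: 10M short requests, 200 input + 20 output tokens each.
REQUESTS = 10_000_000
TOKENS_IN = 200 * REQUESTS
TOKENS_OUT = 20 * REQUESTS

for name in RATES:
    print(f"{name}: ${workload_cost(name, TOKENS_IN, TOKENS_OUT):,.2f}")
```

Because both the input and output rates sit the same percentage below each competitor, the saving does not depend on the input/output mix of the workload.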

When to use

Reach for 2.5 Flash Lite when the task is bounded and you're optimizing cost-per-call: bulk classification, short-form summarization, simple Q&A, intent detection, content moderation, autocomplete. Non-thinking means predictable token budgets and snappy latency. 1M context fits the occasional long prompt without paying a context tax.
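
As an illustration of a bounded task, the sketch below builds an intent-detection request and normalizes the reply. The label set and helper names are hypothetical. `max_tokens` and `temperature` are standard OpenAI chat-completions fields, and it is an assumption here that the endpoint honors them and accepts a system message.

```python
# Hypothetical label set for illustration; replace with your own intents.
INTENTS = ["billing", "cancel", "support", "other"]


def build_intent_request(text, labels=INTENTS):
    """Build a chat-completions payload that asks for exactly one label."""
    system = (
        "Classify the user's message into exactly one of: "
        + ", ".join(labels)
        + ". Reply with the label only."
    )
    return {
        "model": "gemini-2.5-flash-lite",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": text},
        ],
        # Assumed to be supported: cap the reply length and reduce randomness.
        "max_tokens": 5,
        "temperature": 0,
    }


def parse_label(reply, labels=INTENTS, fallback="other"):
    """Normalize the model's reply; fall back if it is not a known label."""
    cleaned = reply.strip().lower().strip(".")
    return cleaned if cleaned in labels else fallback
```

Capping the reply at a few tokens keeps the output cost per call close to constant, which is what makes cost-per-call easy to budget for this kind of task.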

When to use something else

Don't pick 2.5 Flash Lite for anything that genuinely needs reasoning: competition math, multi-step planning, complex tool calls. For those, use Gemini 3.1 Pro Preview (thinks deeply) or DeepSeek R1 (cheaper reasoning at $0.56/$2.00). For cheap non-Gemini chat with a 1M context, DeepSeek V4 Flash at $0.08/$0.16 is cheaper on both input and output.

Quickstart (curl)

curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-lite",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
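
The same call can be sketched in Python with only the standard library. The URL, headers, and body mirror the curl example. The helper names are hypothetical, and the `choices[0].message.content` response path is the usual OpenAI chat-completions shape, assumed here because the endpoint is described as OpenAI-compatible.

```python
import json
import urllib.request

BASE_URL = "https://api.quicksilverpro.io/v1"  # taken from the curl example


def build_chat_request(prompt, api_key, base_url=BASE_URL,
                       model="gemini-2.5-flash-lite"):
    """Build (but do not send) a chat-completions POST request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs a real key in the QSP_KEY environment variable):
#   import os
#   reply = send(build_chat_request("Hello!", os.environ["QSP_KEY"]))
#   print(reply["choices"][0]["message"]["content"])  # assumed OpenAI shape
```

Keeping `base_url` as a parameter is what makes switching providers a one-argument change in client code.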

FAQ

How does Flash Lite compare to 2.5 Flash?

Flash Lite is ~3x cheaper than Flash on input ($0.085 vs $0.255 per 1M) and ~6x cheaper on output ($0.34 vs $2.125 per 1M). The trade-off: 2.5 Flash has stronger reasoning, but it thinks by default, so a one-line prompt can burn 100+ reasoning tokens before the visible answer. Flash Lite ships purely non-thinking, with predictable token budgets. Pick Flash for harder coding and agentic workloads; pick Flash Lite for simple, high-volume tasks.

Can Flash Lite replace GPT-4o-mini?

Most of the time, yes. Both expose OpenAI-compatible chat completions, both target the cheap-fast tier, and Flash Lite is ~43% cheaper on output ($0.34/M vs $0.60/M). Migration is a base_url and key swap plus `model="gemini-2.5-flash-lite"`. Run your own evals first: for some tasks (creative writing, certain math), 4o-mini may still edge ahead.

How does QuickSilver Pro's price compare with Vertex and OpenRouter?

QuickSilver Pro lists 2.5 Flash Lite at $0.085 input / $0.34 output per 1M tokens, ~15% below Vertex retail and OpenRouter (both $0.10/$0.40). You get unified billing across 14 models through one OpenAI-compatible key, plus per-response `usage.cost` accounting. The savings compound on high-volume Lite-tier workloads, where you're sending millions of short requests.
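
A rough sketch of how per-response cost accounting might be aggregated on the client side. It assumes each parsed response body carries the cost under `usage.cost` as a number in USD. The exact field shape and unit are assumptions, and the sample responses are invented for illustration.

```python
def total_cost(responses):
    """Sum usage.cost over parsed response bodies, skipping any without it."""
    total = 0.0
    for body in responses:
        # `usage` may be missing or null; treat both as "no cost reported".
        cost = (body.get("usage") or {}).get("cost")
        if cost is not None:
            total += float(cost)
    return total


# Invented sample bodies; the costs follow the $0.085 / $0.34 per-1M rates.
SAMPLE_RESPONSES = [
    {"usage": {"prompt_tokens": 200, "completion_tokens": 20, "cost": 0.0000238}},
    {"usage": {"prompt_tokens": 150, "completion_tokens": 10, "cost": 0.00001615}},
    {"usage": None},  # e.g. a response that reported no usage
]

print(f"Total spend: ${total_cost(SAMPLE_RESPONSES):.8f}")
```

Summing the reported figure rather than recomputing from token counts keeps client-side totals aligned with what the provider actually bills.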

Try Gemini 2.5 Flash Lite with double credits — up to $50 free
