Gemini 3.1 Flash Lite on QuickSilver Pro
Gemini 3.1 Flash Lite is Google's newest cost-efficient model: 1M-token context, fast, and built for high-volume, latency-sensitive workloads. It is non-thinking by default, so token budgets stay predictable. On QuickSilver Pro it lists at $0.2125 input / $1.275 output per 1M tokens, ~15% below Vertex retail ($0.25/$1.50). It is the cheapest model in the 3.x generation.
At a glance
Newest low-cost workhorse — predictable non-thinking output, 1M context, built for volume.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | QSP vs. this provider |
|---|---|---|---|
| QuickSilver Pro | $0.2125 | $1.275 | baseline |
| OpenRouter (google/gemini-3.1-flash-lite) | $0.25 | $1.50 | QSP is 15% cheaper |
| OpenAI (GPT-4o mini) | $0.15 | $0.60 | QSP is 112% more expensive (output) |
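To see what these per-token prices mean for a real bill, the sketch below turns them into a workload cost. The prices are the list prices quoted on this page; the workload (1M requests at 500 input and 100 output tokens each) and the helper name `workload_cost` are assumptions chosen for illustration only.

```python
# List prices in USD per 1M tokens, as quoted on this page: (input, output).
PRICES = {
    "QuickSilver Pro": (0.2125, 1.275),
    "OpenRouter": (0.25, 1.50),
    "OpenAI GPT-4o mini": (0.15, 0.60),
}


def workload_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Return the USD cost of a workload given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 1M requests, 500 input and 100 output tokens each.
INPUT_TOKENS = 1_000_000 * 500
OUTPUT_TOKENS = 1_000_000 * 100

costs = {
    name: workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price)
    for name, (in_price, out_price) in PRICES.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
```

Swap in your own token counts to compare providers. The ratio between the two Gemini 3.1 Flash Lite rows stays at 0.85 regardless of the input/output mix, because both prices carry the same 15% discount.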
When to use
Use 3.1 Flash Lite for high-volume, cost-sensitive work where you don't need a reasoning trace: routing and classification, extraction, summarization, simple chat, and agent sub-tasks where latency and price beat raw reasoning depth. Non-thinking by default means output tokens are predictable — easy to budget at scale.
When to use something else
For multi-step reasoning, hard coding, or analysis, step up to 3.5 Flash ($1.275/$7.65) or a Pro tier — Flash Lite trades depth for cost. If you specifically want a thinking model with 1M context at low cost, 2.5 Flash ($0.255/$2.125) reasons by default. For image generation, use the Gemini image models or FLUX.
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-flash-lite",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
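For callers who prefer Python, the same request can be sketched with the standard library alone. The endpoint, model ID, and headers mirror the curl call above; the helper names `build_chat_request` and `send` are illustrative, not part of any QuickSilver Pro SDK.

```python
import json
import urllib.request

# Endpoint and model ID are taken from the curl quickstart.
API_URL = "https://api.quicksilverpro.io/v1/chat/completions"
MODEL = "gemini-3.1-flash-lite"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat completion request."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call would look like `send(build_chat_request("Hello!", os.environ["QSP_KEY"]))`. If you already use an OpenAI client library, pointing its `base_url` at `https://api.quicksilverpro.io/v1` should be the equivalent change; that value is inferred from the curl URL above rather than stated by the page.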
FAQ
Flash Lite does not think by default. It is the non-thinking tier, so it answers directly without a reasoning trace. That keeps output token counts (and cost) predictable, which is exactly what high-volume workloads want. If you need reasoning, 2.5 Flash or 3.5 Flash think by default.
Compared with 2.5 Flash Lite, 3.1 Flash Lite is the newer generation: improved quality at a similar low-cost position. 2.5 Flash Lite ($0.085/$0.34) is still the absolute cheapest Gemini; 3.1 Flash Lite ($0.2125/$1.275) costs more but brings 3.x-generation improvements. Run both on your task. For the cheapest possible routing or classification, 2.5 Flash Lite still wins.
QuickSilver Pro lists 3.1 Flash Lite at $0.2125 input / $1.275 output per 1M tokens, ~15% below Vertex retail's $0.25/$1.50. You get one OpenAI-compatible key across 18 models, one bill, and a `usage.cost` field on every response so you can reconcile spend per request.
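Per-request reconciliation with `usage.cost` could look like the sketch below. It assumes that `usage.cost` is denominated in USD and that the `usage` object also carries the usual OpenAI-style `prompt_tokens` and `completion_tokens` counts; neither detail is confirmed on this page, and the sample responses are made up for illustration.

```python
from typing import Iterable

# List prices in USD per 1M tokens, as quoted on this page.
INPUT_PRICE = 0.2125
OUTPUT_PRICE = 1.275


def expected_cost(usage: dict) -> float:
    """Cost implied by the token counts (assumed OpenAI-style field names)."""
    return (usage["prompt_tokens"] * INPUT_PRICE
            + usage["completion_tokens"] * OUTPUT_PRICE) / 1_000_000


def reconcile(responses: Iterable[dict], tolerance: float = 1e-9):
    """Return (reported, expected, matches) summed over decoded responses."""
    usages = [r["usage"] for r in responses]
    reported = sum(float(u["cost"]) for u in usages)  # assumed to be USD
    expected = sum(expected_cost(u) for u in usages)
    return reported, expected, abs(reported - expected) <= tolerance


# Hypothetical decoded response bodies, trimmed to the `usage` object.
sample_responses = [
    {"usage": {"prompt_tokens": 1200, "completion_tokens": 300, "cost": 0.0006375}},
    {"usage": {"prompt_tokens": 800, "completion_tokens": 200, "cost": 0.000425}},
]

reported, expected, matches = reconcile(sample_responses)
print(f"reported=${reported:.7f} expected=${expected:.7f} matches={matches}")
```

Logging both figures per request makes a mismatch easy to spot, for example if a price change lands before your constants are updated.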