1M context · Cheapest 3.x · Non-thinking

Gemini 3.1 Flash Lite on QuickSilver Pro

Gemini 3.1 Flash Lite is Google's newest cost-efficient model: 1M-token context, fast, and built for high-volume, latency-sensitive workloads. It is non-thinking by default, so token budgets stay predictable. On QuickSilver Pro it lists at $0.2125 input / $1.275 output per 1M tokens, about 15% below Vertex retail ($0.25/$1.50), and it is the cheapest model in the 3.x generation.

$0.21 input · $1.27 output per 1M tokens
By Raullen Chai · Updated

At a glance

Context: 1M tokens
Input / 1M: $0.21
Output / 1M: $1.27
Thinks by default: No

Newest low-cost workhorse — predictable non-thinking output, 1M context, built for volume.

Pricing comparison ($/1M tokens)

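| Provider | Input | Output | QSP price vs. this provider |
| --- | --- | --- | --- |
| QuickSilver Pro | $0.21 | $1.27 | cheapest listed for this model |
| OpenRouter (google/gemini-3.1-flash-lite) | $0.25 | $1.50 | QSP is 15% cheaper |
| OpenAI (GPT-4o mini) | $0.15 | $0.60 | QSP is 112% more expensive (on output) |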
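
The percentages in the comparison can be reproduced from the listed prices. The sketch below is illustrative only: `pct_vs` is a hypothetical helper name, and it uses the unrounded QuickSilver Pro prices ($0.2125/$1.275) quoted elsewhere on this page. The result suggests that the 112% figure refers to output pricing.

```python
def pct_vs(qsp_price: float, other_price: float) -> float:
    """Percent difference of the QSP price relative to another provider's price.

    Negative means QSP is cheaper; positive means QSP is more expensive.
    """
    return (qsp_price - other_price) / other_price * 100


# Unrounded QuickSilver Pro prices per 1M tokens, as quoted on this page.
QSP_INPUT, QSP_OUTPUT = 0.2125, 1.275

# Comparison prices per 1M tokens, taken from the table.
for name, inp, out in [("OpenRouter", 0.25, 1.50), ("GPT-4o mini", 0.15, 0.60)]:
    print(f"{name}: input {pct_vs(QSP_INPUT, inp):+.1f}%, "
          f"output {pct_vs(QSP_OUTPUT, out):+.1f}%")
```

Against OpenRouter both figures come out at about -15%; against GPT-4o mini the gap is about +42% on input and +112% on output.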

When to use

Use 3.1 Flash Lite for high-volume, cost-sensitive work where you don't need a reasoning trace: routing and classification, extraction, summarization, simple chat, and agent sub-tasks where latency and price beat raw reasoning depth. Non-thinking by default means output tokens are predictable — easy to budget at scale.
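
Because output length is predictable, a budget can be estimated directly from token counts. This is a minimal sketch using the prices listed on this page; the workload (1M classification calls per day, 400 input and 20 output tokens each) is a made-up example, and `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
# Per-1M-token prices listed on this page for gemini-3.1-flash-lite.
INPUT_USD_PER_M = 0.2125
OUTPUT_USD_PER_M = 1.275


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for the given token totals at the listed prices."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Hypothetical workload: 1M calls/day, 400 input + 20 output tokens per call.
calls_per_day = 1_000_000
daily_cost = estimate_cost(400 * calls_per_day, 20 * calls_per_day)
print(f"${daily_cost:,.2f} per day")
```

Under these assumptions the workload comes to roughly $110 per day, with input tokens accounting for most of the bill.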

When to use something else

For multi-step reasoning, hard coding, or analysis, step up to 3.5 Flash ($1.275/$7.65) or a Pro tier — Flash Lite trades depth for cost. If you specifically want a thinking model with 1M context at low cost, 2.5 Flash ($0.255/$2.125) reasons by default. For image generation, use the Gemini image models or FLUX.

Quickstart (curl)

curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-flash-lite",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

The API is OpenAI-compatible and serves the same model as OpenRouter, so migrating is a one-line change to base_url.
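
The same request can be sketched in Python with only the standard library. The endpoint, model name, and `QSP_KEY` variable come from the curl quickstart; `build_chat_request` and `chat` are hypothetical helper names, and the response is assumed to be a JSON body in the usual OpenAI chat-completions shape. With the official OpenAI Python SDK, the `BASE_URL` constant here would correspond to the client's `base_url` argument.

```python
import json
import os
import urllib.request

# From the curl quickstart. Switching providers means changing this one line.
BASE_URL = "https://api.quicksilverpro.io/v1"


def build_chat_request(prompt: str, api_key: str,
                       model: str = "gemini-3.1-flash-lite") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str) -> dict:
    """Send the request and return the parsed JSON (needs network and QSP_KEY)."""
    request = build_chat_request(prompt, os.environ["QSP_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `chat("Hello!")` with `QSP_KEY` set should send the same request as the curl example above.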

FAQ

Flash Lite does not think by default. It is the non-thinking tier, so it answers directly without a reasoning trace. That keeps output token counts (and cost) predictable, which is exactly what high-volume workloads want. If you need reasoning, 2.5 Flash or 3.5 Flash think by default.

3.1 Flash Lite is the newer generation — improved quality at a similar low-cost position. 2.5 Flash Lite ($0.085/$0.34) is still the absolute cheapest Gemini; 3.1 Flash Lite ($0.2125/$1.275) costs more but brings 3.x-generation improvements. Run both on your task — for the cheapest possible routing/classification, 2.5 Flash Lite still wins.

QuickSilver Pro lists 3.1 Flash Lite at $0.2125 input / $1.275 output per 1M tokens — ~15% below Vertex retail's $0.25/$1.50. One OpenAI-compatible key across 18 models, one bill, and a `usage.cost` field on every response so you can reconcile spend per request.
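
Per-request reconciliation with `usage.cost` could look like the sketch below. It assumes the field is a number in USD nested under `usage` in the response JSON; the tags and sample values are invented for illustration, and `spend_by_tag` is a hypothetical helper.

```python
from collections import defaultdict


def spend_by_tag(records):
    """Sum usage.cost per tag.

    `records` is an iterable of (tag, response_dict) pairs, where each
    response_dict is a parsed chat-completions response.
    """
    totals = defaultdict(float)
    for tag, response in records:
        # Assumption: usage.cost is a numeric USD amount on every response.
        totals[tag] += response["usage"]["cost"]
    return dict(totals)


# Hypothetical responses, trimmed to the field used here.
records = [
    ("routing", {"usage": {"cost": 0.0001105}}),
    ("routing", {"usage": {"cost": 0.0001105}}),
    ("summaries", {"usage": {"cost": 0.0004675}}),
]
for tag, usd in spend_by_tag(records).items():
    print(f"{tag}: ${usd:.7f}")
```

Tagging each call at the point where it is made keeps the ledger independent of any provider dashboard, so the totals can be checked against the single bill.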
