1M context · Thinks deeply · Preview API

Gemini 3.1 Pro Preview on QuickSilver Pro

Gemini 3.1 Pro Preview is Google's flagship reasoning model — 1M-token context window, deep thinking by default, multimodal foundation. On QuickSilver Pro it lists at $1.70 input / $10.20 output per 1M tokens, ~15% below Google's Vertex retail and OpenRouter's $2/$12. This is a preview API; semantics may shift before GA.

By Raullen Chai

At a glance

Context: 1M tokens
Input / 1M: $1.70
Output / 1M: $10.20
Thinks by default: Yes

Long-context reasoning, multi-step analysis, and complex code review, at o1-class quality and ~6× lower output cost.

Pricing comparison ($/1M tokens)

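| Provider | Input | Output | vs QSP |
| --- | --- | --- | --- |
| QuickSilver Pro | $1.70 | $10.20 | cheapest |
| OpenRouter (google/gemini-3.1-pro-preview) | $2.00 | $12.00 | QSP is 15% cheaper |
| OpenAI (o1) | $15.00 | $60.00 | QSP is 83% cheaper (output) |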
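
The percentages in the last column can be rechecked from the listed prices. The sketch below is illustrative only: the `PRICES` dictionary simply copies the table, and `savings_pct` and `savings_table` are hypothetical helper names, not part of any QuickSilver Pro tooling.

```python
# USD per 1M tokens as (input, output), copied from the comparison table.
PRICES = {
    "QuickSilver Pro": (1.70, 10.20),
    "OpenRouter": (2.00, 12.00),
    "OpenAI o1": (15.00, 60.00),
}


def savings_pct(qsp_price: float, other_price: float) -> float:
    """Percent by which qsp_price undercuts other_price."""
    return (1 - qsp_price / other_price) * 100


def savings_table(baseline: str = "QuickSilver Pro") -> dict:
    """Map each other provider to (input savings %, output savings %), rounded."""
    base_in, base_out = PRICES[baseline]
    return {
        name: (
            round(savings_pct(base_in, p_in)),
            round(savings_pct(base_out, p_out)),
        )
        for name, (p_in, p_out) in PRICES.items()
        if name != baseline
    }
```

Under these prices the saving against o1 differs between input and output, so the 83% figure corresponds to the output price.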

When to use

Reach for Gemini 3.1 Pro Preview when the task genuinely benefits from chain-of-thought over a long context: large-codebase review, multi-document legal/research synthesis, complex agentic planning with rich memory, mathematical reasoning over a long problem. The 1M context fits a substantial corpus without chunking, and the thinking trace adds answer quality on hard problems. Equivalent quality tier to OpenAI o1 at a fraction of the cost.

When to use something else

Don't default to 3.1 Pro Preview for routine chat or short-prompt classification — it thinks by default, so a one-line question can burn 100-200 reasoning tokens before the visible answer. For high-volume cheap chat, Gemini 2.5 Flash Lite ($0.085/$0.34) or DeepSeek V4 Flash ($0.08/$0.16) win on cost-per-task. For non-Gemini-flavored reasoning, DeepSeek R1 ($0.56/$2.00) is ~3× cheaper on input and ~5× cheaper on output.
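
To make the cost-per-task argument concrete, here is a minimal sketch that prices a single short request. It assumes reasoning tokens are billed at the output rate (they are counted in completion tokens), and the token counts you pass in are your own estimates. The `task_cost` helper is hypothetical; the prices are the ones quoted above.

```python
# USD per 1M tokens as (input, output), as quoted in the text above.
PRICES = {
    "Gemini 3.1 Pro Preview": (1.70, 10.20),
    "Gemini 2.5 Flash Lite": (0.085, 0.34),
    "DeepSeek V4 Flash": (0.08, 0.16),
    "DeepSeek R1": (0.56, 2.00),
}


def task_cost(model: str, prompt_tokens: int, text_tokens: int,
              reasoning_tokens: int = 0) -> float:
    """Estimated USD cost of one request.

    Assumption: reasoning tokens are charged like visible output tokens.
    """
    price_in, price_out = PRICES[model]
    completion_tokens = text_tokens + reasoning_tokens
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000
```

For example, `task_cost("Gemini 3.1 Pro Preview", 20, 3, reasoning_tokens=150)` against `task_cost("Gemini 2.5 Flash Lite", 20, 3)` shows how the thinking overhead, rather than the visible answer, dominates the bill on a one-line question.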

Quickstart (curl)

curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-pro-preview",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
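
For readers who prefer Python, the same request can be sketched with only the standard library. The endpoint, headers, and body mirror the curl call above; `build_request` and `send` are illustrative names, and the response shape (`choices[0].message.content`) is assumed from the OpenAI chat-completions format rather than confirmed by this page.

```python
import json
import urllib.request

API_URL = "https://api.quicksilverpro.io/v1/chat/completions"


def build_request(api_key: str, prompt: str,
                  model: str = "gemini-3.1-pro-preview") -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl quickstart."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send(build_request(os.environ["QSP_KEY"], "Hello!"))` would perform the actual call; a generous timeout is used here because the model thinks before answering.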

FAQ

Gemini 3.1 Pro Preview is a preview API: Google labels it as such, and semantics may shift before GA. Output formats, thinkingConfig handling, and rate limits can change without notice. For prototyping and evals, ship today. For revenue-critical paths, pin to a Gemini GA model (e.g. Gemini 2.5 Flash) and treat 3.1 Pro as a behind-flag experiment until Google promotes it.
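
One way to keep the preview model behind a flag is sketched below. Everything here is an assumption for illustration: the `USE_GEMINI_PREVIEW` environment variable and the `pick_model` helper are hypothetical, and the GA model id `gemini-2.5-flash` is guessed from the naming pattern of the other ids on this page.

```python
import os

PREVIEW_MODEL = "gemini-3.1-pro-preview"
GA_MODEL = "gemini-2.5-flash"  # assumed id for the GA fallback


def pick_model(revenue_critical: bool, env=None) -> str:
    """Choose a model id for a request.

    Prototyping and eval traffic always gets the preview model.
    Revenue-critical traffic gets it only when the (hypothetical)
    USE_GEMINI_PREVIEW flag is switched on.
    """
    env = os.environ if env is None else env
    if not revenue_critical:
        return PREVIEW_MODEL
    flag = env.get("USE_GEMINI_PREVIEW", "").strip().lower()
    return PREVIEW_MODEL if flag in {"1", "true", "yes"} else GA_MODEL
```

Keeping the choice in one function means that promoting or rolling back the preview model is a configuration change rather than a code change.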

Gemini 3.1 Pro Preview thinks by default — the model emits a long reasoning trace before its final answer, and usage.completion_tokens includes both reasoning_tokens and text_tokens. A simple "what is 7+8?" returned 160 reasoning + 3 text tokens in our smoke test. Budget max_tokens accordingly: anything under ~200 tokens risks the model running out of budget on the thinking phase and returning an empty answer.
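
The budgeting advice above can be sketched as follows. This is a rough illustration, not QuickSilver Pro's documented schema: it assumes `reasoning_tokens` and `text_tokens` sit directly on the `usage` object next to `completion_tokens` (they may be nested differently in practice), and the helper names are hypothetical.

```python
# Headroom for the thinking phase, taken from the ~200-token floor in the text.
REASONING_RESERVE = 200


def budget_max_tokens(expected_answer_tokens: int,
                      reasoning_reserve: int = REASONING_RESERVE) -> int:
    """Reserve room for the reasoning trace on top of the visible answer."""
    return reasoning_reserve + expected_answer_tokens


def split_completion(usage: dict) -> tuple:
    """Return (reasoning_tokens, text_tokens) from a usage dict.

    Assumes a flat layout; a missing text_tokens field is derived
    from completion_tokens minus reasoning_tokens.
    """
    completion = usage["completion_tokens"]
    reasoning = usage.get("reasoning_tokens", 0)
    text = usage.get("text_tokens", completion - reasoning)
    return reasoning, text


def thinking_exhausted_budget(usage: dict, max_tokens: int) -> bool:
    """True when the whole budget went to reasoning and no answer text was produced."""
    _, text = split_completion(usage)
    return text == 0 and usage["completion_tokens"] >= max_tokens
```

A caller could retry with a larger `max_tokens` whenever `thinking_exhausted_budget` returns `True`, instead of surfacing the empty answer.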

Thinking cannot yet be disabled on QuickSilver Pro: `reasoning: { enabled: false }` is silently dropped on Gemini 3.1 Pro. For non-thinking Gemini, use gemini-2.5-flash-lite ($0.085/$0.34), which is genuinely non-thinking. Note that 2.5 Flash thinks by default too; only Flash Lite ships without reasoning.

QuickSilver Pro lists Gemini 3.1 Pro Preview at $1.70 input / $10.20 output per 1M tokens — about 15% below Google's Vertex retail and OpenRouter's $2/$12. You pay one bill across 14 models on a single OpenAI-compatible key, with `usage.cost` accounting per response. Switching from another provider is a base_url + key swap on the OpenAI SDK with `model="gemini-3.1-pro-preview"`.

Try Gemini 3.1 Pro Preview with double credits — up to $50 free
