Gemini 3.1 Pro Preview on QuickSilver Pro
Gemini 3.1 Pro Preview is Google's flagship reasoning model — 1M-token context window, deep thinking by default, multimodal foundation. On QuickSilver Pro it lists at $1.70 input / $10.20 output per 1M tokens, ~15% below Google's Vertex retail and OpenRouter's $2/$12. This is a preview API; semantics may shift before GA.
At a glance
Long-context reasoning, multi-step analysis, complex code review — at o1-class quality and ~6x lower output cost.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | vs QSP |
|---|---|---|---|
| QuickSilver Pro | $1.70 | $10.20 | cheapest |
| OpenRouter (google/gemini-3.1-pro-preview) | $2.00 | $12.00 | 15% cheaper |
| OpenAI (o1) | $15.00 | $60.00 | 83% cheaper |
When to use
Reach for Gemini 3.1 Pro Preview when the task genuinely benefits from chain-of-thought over a long context: large-codebase review, multi-document legal/research synthesis, complex agentic planning with rich memory, mathematical reasoning over a long problem. The 1M context fits a substantial corpus without chunking, and the thinking trace adds answer quality on hard problems. Equivalent quality tier to OpenAI o1 at a fraction of the cost.
When to use something else
Don't default to 3.1 Pro Preview for routine chat or short-prompt classification — it thinks by default, so a one-line question can burn 100-200 reasoning tokens before the visible answer. For high-volume cheap chat, Gemini 2.5 Flash Lite ($0.085/$0.34) or DeepSeek V4 Flash ($0.08/$0.16) win on cost-per-task. For non-Gemini-flavored reasoning, DeepSeek R1 ($0.56/$2.00) is ~3× cheaper on input and ~5× cheaper on output.
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
-H "Authorization: Bearer $QSP_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-3.1-pro-preview",
"messages": [{"role": "user", "content": "Hello!"}]
}'OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
FAQ
It's a preview API — Google labels it as such, and semantics may shift before GA. Output formats, thinkingConfig handling, and rate limits can change without notice. For prototyping and evals, ship today. For revenue-critical paths, pin to a Gemini GA model (e.g. Gemini 2.5 Flash) and treat 3.1 Pro as a behind-flag experiment until Google promotes it.
Gemini 3.1 Pro Preview thinks by default — the model emits a long reasoning trace before its final answer, and usage.completion_tokens includes both reasoning_tokens and text_tokens. A simple "what is 7+8?" returned 160 reasoning + 3 text tokens in our smoke test. Budget max_tokens accordingly: anything under ~200 tokens risks the model running out of budget on the thinking phase and returning an empty answer.
Not yet, on QuickSilver Pro — `reasoning: { enabled: false }` is silently dropped on Gemini 3.1 Pro. For non-thinking Gemini, use gemini-2.5-flash-lite ($0.085/$0.34) which is genuinely non-thinking. Note: 2.5 Flash thinks by default too — only Flash Lite ships without reasoning.
QuickSilver Pro lists Gemini 3.1 Pro Preview at $1.70 input / $10.20 output per 1M tokens — about 15% below Google's Vertex retail and OpenRouter's $2/$12. You pay one bill across 14 models on a single OpenAI-compatible key, with `usage.cost` accounting per response. Switching from another provider is a base_url + key swap on the OpenAI SDK with `model="gemini-3.1-pro-preview"`.