New · 1M context · Flagship · Thinks by default

Qwen 3.7 Max on QuickSilver Pro

Qwen 3.7 Max is Alibaba's May 2026 flagship: the top of the Qwen 3.7 line, tuned for hard multi-step reasoning, long-horizon agentic workflows, and the deep tool-calling tasks where you'd otherwise reach for GPT-4o or Claude Sonnet 4.5. On QuickSilver Pro it costs $2.00 input / $6.00 output per 1M tokens, which is 40% cheaper than GPT-4o on output, with a 1M-token context window (about 8× GPT-4o's 128K), one OpenAI-compatible key across 18 models, and steady-state pricing not subject to upstream promo cycles.

$2.00 input · $6.00 output per 1M tokens
By Raullen Chai · Updated

At a glance

Context: 1M tokens
Input / 1M: $2.00
Output / 1M: $6.00
Thinks by default: Yes

Flagship reasoning and long-horizon agentic work: GPT-4o-class quality at 40% lower output cost, with 8× the context.

Pricing comparison ($/1M tokens)

| Provider | Input | Output | QSP output price vs provider |
| --- | --- | --- | --- |
| QuickSilver Pro | $2.00 | $6.00 | |
| OpenRouter (qwen/qwen3.7-max) | $1.25 | $3.75 | 60% more expensive |
| OpenAI (GPT-4o) | $2.50 | $10.00 | 40% cheaper |
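
The percentages in the table follow directly from the listed output prices. A minimal Python sketch of that arithmetic is below. The function names and the example token counts (200,000 input, 4,000 output) are illustrative assumptions, not part of any QuickSilver Pro tooling; only the prices come from the table.

```python
# Prices in USD per 1M tokens, copied from the comparison table above.
PRICES = {
    "QuickSilver Pro": (2.00, 6.00),
    "OpenRouter (qwen/qwen3.7-max)": (1.25, 3.75),
    "OpenAI (GPT-4o)": (2.50, 10.00),
}


def request_cost(provider, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed per-1M-token rates."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def qsp_output_delta_pct(provider):
    """Percent by which QuickSilver Pro's output price differs from `provider`'s.

    Positive means QuickSilver Pro is more expensive, negative means cheaper.
    """
    qsp_output = PRICES["QuickSilver Pro"][1]
    other_output = PRICES[provider][1]
    return (qsp_output - other_output) / other_output * 100


if __name__ == "__main__":
    # Hypothetical long-context request: 200k tokens in, 4k tokens out.
    for name in PRICES:
        cost = request_cost(name, input_tokens=200_000, output_tokens=4_000)
        delta = qsp_output_delta_pct(name)
        print(f"{name}: ${cost:.4f} per request, QSP output delta {delta:+.0f}%")
```

The comparison only covers list prices per token; it says nothing about quality, latency, or rate limits, which need their own evaluation.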

When to use

Reach for 3.7 Max when 3.6 Plus isn't smart enough on your eval: multi-hop reasoning, agentic workflows that plan across 10+ tool calls, long-context analysis over a million-token corpus, and code-review or refactor agents that need to keep a whole repo in context. These are the same workloads you would give GPT-4o or Claude Sonnet 4.5; pick 3.7 Max for the lower output-token cost and the wider context window.

When to use something else

For routine chat, codegen, or production agents where DeepSeek V4 Pro at $0.348/$0.696 lands the same answer, V4 Pro is ~6× cheaper on input and ~8.6× cheaper on output. For pure math / theorem-style reasoning, DeepSeek R1 is still the price/quality leader. 3.7 Max thinks by default, but the QuickSilver Pro gateway sends `reasoning.enabled=false` by default to suppress the thinking trace on routine calls; pass `reasoning.enabled=true` to opt back into reasoning.

Quickstart (curl)

curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.7-max",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

The endpoint is OpenAI-compatible and serves the same model as OpenRouter, so migration is a one-line change to base_url.
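
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is an illustration under assumptions rather than an official client: `build_chat_request` and `send` are hypothetical helper names, and the response is assumed to follow the usual OpenAI-style shape (`choices[0].message.content`). The URL, headers, model ID, and `QSP_KEY` variable are taken from the curl example above.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.quicksilverpro.io/v1"  # from the curl example above


def build_chat_request(prompt, model="qwen3.7-max", api_key=None):
    """Build (but do not send) a POST request equivalent to the curl call."""
    key = api_key or os.environ.get("QSP_KEY", "")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Usage (performs a real network call, so it is left commented out):
# reply = send(build_chat_request("Hello!"))
# print(reply["choices"][0]["message"]["content"])  # assumed response shape
```

Keeping the base URL in a single constant mirrors the one-line migration idea: pointing existing OpenAI-style code at a different gateway should only require changing that value and the key.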

FAQ

On published Qwen benchmarks (MMLU-Pro, GPQA Diamond, HumanEval, Tau-bench, BFCL-v3) 3.7 Max lands within striking distance of GPT-4o on reasoning and tool-use, and noticeably ahead on long-context retrieval thanks to its 1M-token window vs GPT-4o's 128K. Validate on your own evals before committing. For the workloads 3.7 Max is built for (agentic and long-context), the per-token economics ($2.00/$6.00 vs GPT-4o's $2.50/$10.00) and the 8× context advantage are usually decisive.

3.7 Max thinks by default. To keep routine calls from billing a hidden reasoning trace, the QuickSilver Pro gateway sends `reasoning.enabled=false` by default on 3.7 Max requests. If you want the reasoning trace, pass `reasoning.enabled=true` in the request body and budget output tokens accordingly (3-5× typical chat output).
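
A request body that opts back into reasoning might look like the sketch below. Two things here are assumptions, not confirmed by this page: that the dotted name `reasoning.enabled` maps to a nested JSON object (`{"reasoning": {"enabled": true}}`), and that the reasoning trace counts against the standard OpenAI-style `max_tokens` cap. The helper name `build_body` and the default of 1,000 typical output tokens are hypothetical.

```python
def build_body(prompt, reasoning=False, typical_output_tokens=1_000, multiplier=4):
    """Return a chat-completions body, optionally opting back into reasoning.

    Assumption: `reasoning.enabled` is sent as a nested object,
    {"reasoning": {"enabled": true}}. Check the gateway docs before relying on it.
    """
    body = {
        "model": "qwen3.7-max",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": typical_output_tokens,
    }
    if reasoning:
        body["reasoning"] = {"enabled": True}
        # Budget 3-5x typical chat output for the trace; 4x is the midpoint.
        # Assumes the trace is billed and capped as output tokens.
        body["max_tokens"] = typical_output_tokens * multiplier
    return body
```

With the defaults, `build_body("Plan the refactor", reasoning=True)` raises the output cap from 1,000 to 4,000 tokens, while a plain `build_body("Hello!")` leaves the field out and relies on the gateway default of no reasoning trace.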

V4 Pro at $0.348 / $0.696 per 1M is ~6× cheaper on input and ~8.6× cheaper on output, and it matches 3.7 Max on many published reasoning benchmarks (MATH, GPQA, AIME). Reach for 3.7 Max only when V4 Pro visibly underperforms on your agentic or long-context evals; Qwen's tool-use and BFCL-v3 scores edge ahead on the agent benchmarks Alibaba published. A/B test before paying the price premium; for most production reasoning workloads V4 Pro is the better default.
