Qwen 3.6-35B-A3B on QuickSilver Pro
Qwen 3.6-35B-A3B is Alibaba's April 2026 MoE refresh: 35B parameters, 3B active per token, a 262K context window, and stronger reasoning than 3.5 at the same architectural footprint. On QuickSilver Pro it costs $0.12 input / $0.80 output per 1M tokens, about 20% below OpenRouter and roughly 12x (output) to 21x (input) cheaper than GPT-4o on the long-context RAG tasks it is built for.
At a glance
Best for: long-context RAG, document summarization, and a drop-in 3.5 upgrade with reasoning.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | QSP savings |
|---|---|---|---|
| QuickSilver Pro | $0.12 | $0.80 | baseline (cheapest) |
| OpenRouter (qwen/qwen3.6-35b-a3b) | $0.15 | $1.00 | 20% input / 20% output |
| OpenAI (GPT-4o) | $2.50 | $10.00 | 95% input / 92% output |
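To see how the per-token prices translate into a per-request bill, here is a minimal Python sketch. The prices are copied from the table; the function names and the 150K-in / 2K-out workload are hypothetical and only illustrate the arithmetic.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "QuickSilver Pro": (0.12, 0.80),
    "OpenRouter": (0.15, 1.00),
    "OpenAI GPT-4o": (2.50, 10.00),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token rates."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_vs(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction saved by using QuickSilver Pro instead of `provider`."""
    qsp = request_cost("QuickSilver Pro", input_tokens, output_tokens)
    other = request_cost(provider, input_tokens, output_tokens)
    return 1 - qsp / other


# Hypothetical workload: a 150K-token RAG prompt with a 2K-token answer.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 150_000, 2_000):.4f}")
```

Because long-context RAG requests are dominated by input tokens, the blended saving against GPT-4o on a workload like this lands near the input-side figure rather than the output-side one.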
When to use
Qwen 3.6 shines on long-document summarization, multi-document RAG over 100K+ token corpora, and Chinese-language tasks where its training data advantage is real. The 3B-active MoE means it serves cheaply at scale; the 262K context lets you concatenate a Confluence space or technical spec without chunking.
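Before sending a whole corpus in one request, it can help to estimate whether it fits. The sketch below makes several assumptions: "262K" is read as 262,144 tokens, token counts are approximated at four characters per token (a rough heuristic for English text, not the Qwen tokenizer), and the helper names and 4,096-token output reserve are hypothetical.

```python
CONTEXT_WINDOW = 262_144  # assumption: "262K" read as 2**18 tokens
CHARS_PER_TOKEN = 4       # rough heuristic, not the actual Qwen tokenizer


def estimate_tokens(text: str) -> int:
    """Crude token estimate; swap in a real tokenizer for production use."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def build_prompt(documents: list[str], question: str) -> str:
    """Concatenate whole documents ahead of the question, with simple separators."""
    parts = [f"### Document {i}\n{doc}" for i, doc in enumerate(documents, start=1)]
    return "\n\n".join(parts) + f"\n\n### Question\n{question}"


def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """True if the estimated prompt tokens plus an output budget fit the window."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW
```

If `fits_in_context` returns False, falling back to chunking or retrieval is the usual next step; the estimate is deliberately coarse, so leave headroom.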
When to use something else
For coding-agent workloads, DeepSeek V3 and V4 Flash beat it on HumanEval-style benchmarks and are cheaper. For top-tier reasoning, use R1 or V4 Pro. Qwen 3.6 thinks by default, but the QuickSilver Pro gateway sends `reasoning.enabled=false` by default to suppress the thinking trace; pass `reasoning.enabled=true` to opt back in.
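As a sketch of what opting back into reasoning might look like, the snippet below builds a request body in Python. It assumes `reasoning.enabled=true` maps to a nested JSON object (`{"reasoning": {"enabled": true}}`); the page does not spell out the exact shape, and `chat_payload` is a hypothetical helper.

```python
import json


def chat_payload(prompt: str, thinking: bool = False) -> dict:
    """Build a chat-completions body; `reasoning` is only sent when opting in."""
    payload = {
        "model": "qwen3.6-35b",  # model ID as used in the quickstart
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking:
        # Assumption: the dotted name corresponds to a nested JSON object.
        payload["reasoning"] = {"enabled": True}
    return payload


print(json.dumps(chat_payload("Prove that 17 is prime.", thinking=True), indent=2))
```

Leaving the field out relies on the gateway default, so routine chat requests need no change.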
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
  -H "Authorization: Bearer $QSP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
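The same request can be sketched in Python with only the standard library. The URL, headers, and model ID come from the curl call; `build_chat_request` and `send` are hypothetical helper names, and the response parsing assumes the OpenAI chat-completions response shape.

```python
import json
import urllib.request

BASE_URL = "https://api.quicksilverpro.io/v1"  # from the curl example


def build_chat_request(
    api_key: str, prompt: str, model: str = "qwen3.6-35b"
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl call."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (needs network access)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.load(response)
    # Assumption: the response follows the OpenAI chat-completions shape.
    return reply["choices"][0]["message"]["content"]
```

With a key in the `QSP_KEY` environment variable, `send(build_chat_request(os.environ["QSP_KEY"], "Hello!"))` would issue the call (after `import os`).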
FAQ
How does Qwen 3.6 differ from Qwen 3.5?
Same architecture (35B params, 3B active MoE) and same 262K context window, but stronger reasoning on the published evals (math, coding, long-context retrieval) and adjusted reasoning behavior: Qwen 3.6 thinks by default. On QSP both sit in the same price tier (3.6 input at $0.12/M vs 3.5's $0.111/M). Output is $0.80/M for both, so output tokens cost the same per token, including any thinking-trace tokens from 3.6.
Can I turn off the thinking trace?
Yes. Thinking is suppressed by default: the QuickSilver Pro gateway sends `reasoning.enabled=false` on Qwen 3.6 requests, so you don't pay for a thinking trace on routine chat. To opt back into reasoning, pass `reasoning.enabled=true` in the request body.
How does pricing compare with OpenRouter?
OpenRouter lists Qwen 3.6-35B-A3B at $0.15 input / $1.00 output per 1M tokens. QuickSilver Pro is $0.12 / $0.80, about 20% lower on both input and output. The API surface is the same OpenAI-compatible one, so migration is a base_url and key swap, plus dropping the `qwen/` provider prefix from the model ID.
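A minimal sketch of that migration, assuming the client settings live in a plain dict with `base_url`, `api_key`, and `model` keys (a hypothetical layout). The function only strips the `qwen/` prefix as described; the quickstart uses the shorter `qwen3.6-35b`, so the exact model ID is worth confirming before relying on the result.

```python
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
QSP_BASE_URL = "https://api.quicksilverpro.io/v1"  # from the quickstart curl


def migrate_config(config: dict, qsp_key: str) -> dict:
    """Return a copy of an OpenRouter client config pointed at QuickSilver Pro."""
    migrated = dict(config)  # leave the caller's dict untouched
    migrated["base_url"] = QSP_BASE_URL
    migrated["api_key"] = qsp_key
    # Drop the provider prefix: "qwen/qwen3.6-35b-a3b" -> "qwen3.6-35b-a3b".
    migrated["model"] = config["model"].removeprefix("qwen/")
    return migrated
```

`str.removeprefix` requires Python 3.9 or later.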