Qwen 3.5-35B-A3B on QuickSilver Pro
Qwen 3.5-35B-A3B is Alibaba's 35B MoE model with 3B active parameters per token and a 262K context window — the QSP catalog's pick for long-document RAG, multi-document summarization, and Chinese-language tasks where Qwen's training data has the edge. On QuickSilver Pro it's $0.111 input / $0.80 output per 1M tokens, ~20% below OpenRouter and ~23× cheaper than GPT-4o on input.
At a glance
Long-context RAG, document summarization, Chinese-language workloads — at a fraction of GPT-4o's cost.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | vs QSP |
|---|---|---|---|
| QuickSilver Pro | $0.11 | $0.80 | cheapest |
| OpenRouter (qwen/qwen3.5-35b-a3b) | $0.14 | $1.00 | 20% cheaper |
| OpenAI (GPT-4o) | $2.50 | $10.00 | 92% cheaper |
When to use
Qwen 3.5 shines on long-document workloads: multi-document RAG over 100K+ token corpora, technical-spec summarization, Confluence-space QA, transcript analysis. The 262K context fits a substantial corpus without chunking, and the 3B-active MoE serves cheaply at scale. Particularly strong on Chinese-language tasks where its training data advantage is real.
When to use something else
For coding-agent workloads, DeepSeek V3 / V4 Flash beats Qwen 3.5 on HumanEval-style benchmarks. For top-tier reasoning, R1 or V4 Pro. For drop-in upgrade with stronger reasoning at the same architecture, see Qwen 3.6-35B-A3B (newer, same $0.80/M output price). Qwen 3.5 is still shipping for teams running it in prod or who specifically don't want the 3.6 reasoning behavior.
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
-H "Authorization: Bearer $QSP_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-35b",
"messages": [{"role": "user", "content": "Hello!"}]
}'OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
FAQ
Yes for most new deployments — Qwen 3.6-35B-A3B has the same architecture (35B/3B-active MoE, 262K context) but stronger reasoning on the published evals and the same $0.80/M output price (3.6 input is $0.12/M vs 3.5's $0.111/M, a hair more). The catch is 3.6 thinks by default, so output token counts go up. For teams already running 3.5 in prod with predictable token budgets, 3.5 keeps shipping unchanged.
On the long-context retrieval benchmarks Alibaba published with the 3.5 release, Qwen lands in the same quality tier as GPT-4o for retrieval and summarization tasks up to 200K tokens. The price gap is dramatic — $0.111 input / $0.80 output vs GPT-4o's $2.50 / $10.00 per 1M tokens, about 23× cheaper on input and 13× on output. For RAG pipelines where the dominant cost is feeding long context, that's the entire economics.
QuickSilver Pro lists Qwen 3.5-35B-A3B at $0.111 input / $0.80 output per 1M tokens. OpenRouter's public per-token rate is $0.139 / $1.00 — ~20% below on both legs. Same OpenAI-compatible surface; migration is a base_url + key swap (drop the `qwen/` provider prefix from the model ID).