GLM 5.2 on QuickSilver Pro
GLM 5.2 is Z.ai's large-scale reasoning flagship, with a 1M-token context window tuned for long-horizon agent workflows and project-level software engineering. On QuickSilver Pro it costs $0.80 input / $3.20 output per 1M tokens, 20% below OpenRouter's $1.00 / $4.00. The model reasons by default, but the QuickSilver Pro gateway sends `reasoning.enabled=false` unless you say otherwise, which suppresses the thinking trace on routine calls; pass `reasoning.enabled=true` to opt back into reasoning.
At a glance
Long-horizon agents and project-level coding — Z.ai's reasoning flagship with a 1M-token context window.
Pricing comparison ($/1M tokens)
| Provider | Input | Output | QSP savings |
|---|---|---|---|
| QuickSilver Pro | $0.80 | $3.20 | baseline (cheapest) |
| OpenRouter (z-ai/glm-5.2) | $1.00 | $4.00 | 20% lower on QSP |
| OpenAI (GPT-4o) | $2.50 | $10.00 | 68% lower on QSP |
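To see how the per-token rates in the table translate into per-request cost, here is a minimal sketch of the arithmetic. The prices are copied from the table; the function names and the 200k/20k-token workload are hypothetical, chosen only for illustration.

```python
# Prices in USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "QuickSilver Pro": (0.80, 3.20),
    "OpenRouter (z-ai/glm-5.2)": (1.00, 4.00),
    "OpenAI (GPT-4o)": (2.50, 10.00),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the provider's listed rates."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_vs(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the percentage saved by using QuickSilver Pro instead of `provider`."""
    qsp = request_cost("QuickSilver Pro", input_tokens, output_tokens)
    other = request_cost(provider, input_tokens, output_tokens)
    return (1 - qsp / other) * 100


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 20k output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 200_000, 20_000):.4f}")
```

Because the input and output rates in each row keep the same ratio to QuickSilver Pro's, the percentage saved does not depend on the input/output mix of a given request.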
When to use
Reach for GLM 5.2 on long-horizon agent loops and repo-scale software engineering: multi-step refactors, plan-then-act agents coordinating many tool calls, and tasks that need to hold a large working set in its 1M-token context. It's a reasoning model, so it's well suited to problems where an explicit reasoning trace improves the result — pass `reasoning.enabled=true` to opt into the trace (the gateway suppresses it by default to keep routine calls cheap).
When to use something else
For routine chat, short-context codegen, or single-shot tasks, the per-token price and reasoning overhead are overkill; DeepSeek V4 Flash ($0.08/$0.16) or V4 Pro ($0.348/$0.696) handle most of those for less. For pure mathematical reasoning, consider DeepSeek R1. For agentic coding specifically, A/B test against Kimi K2.7 Code.
Quickstart (curl)
curl https://api.quicksilverpro.io/v1/chat/completions \
-H "Authorization: Bearer $QSP_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "glm-5.2",
"messages": [{"role": "user", "content": "Hello!"}]
}'

OpenAI-compatible. Same model as OpenRouter; one-line migration via base_url.
FAQ
GLM 5.2 is a reasoning model, but to keep routine calls from billing a hidden reasoning trace the QuickSilver Pro gateway sends `reasoning.enabled=false` by default on GLM 5.2 requests — so out of the box you get a direct reply. To opt into the reasoning trace, pass `reasoning: { enabled: true }` in the request body and budget output tokens accordingly.
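As a rough sketch of the opt-in described above, the helper below builds a request body with and without the `reasoning` field. The `reasoning: { enabled: true }` shape comes from this page; the helper name `build_chat_body` is hypothetical, and using the OpenAI-style `max_tokens` field as the output budget is an assumption about what the gateway accepts.

```python
import json
from typing import Optional


def build_chat_body(prompt: str, reasoning: bool = False,
                    max_tokens: Optional[int] = None) -> dict:
    """Build a chat completions body for GLM 5.2 (hypothetical helper)."""
    body = {
        "model": "glm-5.2",
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning:
        # Opt back into the reasoning trace; when this field is omitted,
        # the gateway default (reasoning disabled) applies.
        body["reasoning"] = {"enabled": True}
    if max_tokens is not None:
        # Assumption: the gateway honors the OpenAI-style max_tokens cap.
        body["max_tokens"] = max_tokens
    return body


if __name__ == "__main__":
    # The 8192-token budget is an illustrative value, not a recommendation.
    print(json.dumps(build_chat_body("Plan the refactor.", reasoning=True,
                                     max_tokens=8192), indent=2))
```

Leaving `reasoning` out of routine calls keeps them on the gateway default, so only the requests that benefit from a trace pay for the extra output tokens.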
GLM 5.2 is served through an OpenAI-compatible chat completions endpoint on QuickSilver Pro. Set base_url=https://api.quicksilverpro.io/v1, paste your QSP key, and use model="glm-5.2". Streaming, tool calling, json_schema strict mode, and usage.cost accounting all work.
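For readers who prefer to see the raw HTTP call, here is a minimal standard-library sketch of the same request as the curl quickstart. The base URL, model ID, and headers come from this page; the function names `build_request` and `send` are hypothetical, and the response shape read in the usage note below is assumed to follow the OpenAI chat completions format.

```python
import json
import urllib.request

BASE_URL = "https://api.quicksilverpro.io/v1"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for GLM 5.2."""
    body = {
        "model": "glm-5.2",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With a key in the `QSP_KEY` environment variable, `send(build_request(os.environ["QSP_KEY"], "Hello!"))` should return the decoded response; under the OpenAI-compatible format the reply text would sit at `["choices"][0]["message"]["content"]`.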
OpenRouter lists GLM 5.2 at $1.00 input / $4.00 output per 1M tokens; QuickSilver Pro is $0.80 / $3.20 — ~20% below on both legs. Same OpenAI-compatible surface; migration is a base_url + key swap, dropping the `z-ai/` provider prefix from the model ID.
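The migration described above can be sketched as a small config transform. This is an illustration only: the config keys (`base_url`, `api_key`, `model`) and both function names are hypothetical, so map them onto however your client stores its settings.

```python
QSP_BASE_URL = "https://api.quicksilverpro.io/v1"


def to_qsp_model_id(openrouter_model_id: str) -> str:
    """Drop the `z-ai/` provider prefix, e.g. 'z-ai/glm-5.2' -> 'glm-5.2'."""
    return openrouter_model_id.removeprefix("z-ai/")  # needs Python 3.9+


def migrate_config(config: dict, qsp_key: str) -> dict:
    """Return a copy of an OpenRouter-style client config pointed at QSP."""
    migrated = dict(config)  # shallow copy; the caller's dict is untouched
    migrated["base_url"] = QSP_BASE_URL
    migrated["api_key"] = qsp_key
    migrated["model"] = to_qsp_model_id(config["model"])
    return migrated


if __name__ == "__main__":
    # Hypothetical starting config; the key values are placeholders.
    old = {
        "base_url": "https://openrouter.ai/api/v1",
        "api_key": "or-placeholder",
        "model": "z-ai/glm-5.2",
    }
    print(migrate_config(old, "qsp-placeholder"))
```

Any other keys in the config are carried over unchanged, which matches the claim that only the base URL, key, and model ID need to change.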