Models

Model catalog

19 models through one OpenAI-compatible API: open-source LLMs (DeepSeek, Qwen including 3.7 Max & 3.6 Plus, Kimi), Whisper 1 for transcription, plus Google's Gemini family for multimodal chat, reasoning, and image generation. Token-priced models are listed per 1M tokens; Whisper is billed per audio minute. USD, as of May 2026. The model ID is what you pass in the request body.

At a glance

Model ID	Context	Input	Output	Notes
deepseek-v4-flash	1M	$0.08	$0.16	Cheap chat & coding. Thinks by default — pass reasoning.enabled=false for V3-style replies.
deepseek-v4-pro	1M	$0.348	$0.696	Premium reasoning with 1M-token context. Maps to o3-mini at ~6× lower output cost.
deepseek-v3	128K	$0.16	$0.616	General chat, coding, tool calling, structured output. Non-thinking by default.
deepseek-r1	128K	$0.56	$2.00	Reasoning specialist. Emits a long chain-of-thought trace; output tokens include the reasoning.
qwen3.7-max	1M	$2.00	$6.00	Qwen 3.7 flagship. Agent-centric, coding & productivity tasks. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace.
qwen3.7-plus	262K	$0.256	$1.024	Alibaba's hosted agent flagship. Long-running coding/agent loops, 262K context. ~6× cheaper on output than 3.7 Max. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace.
qwen3.6-plus	1M	$0.26	$1.56	1T-parameter MoE flagship. 1M context. Top Qwen on OpenRouter by token volume. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace.
qwen3.6-35b	262K	$0.12	$0.80	35B/3B-active MoE. Drop-in upgrade for Qwen 3.5 with stronger reasoning.
qwen3.5-35b	262K	$0.111	$0.80	Long-context RAG and summarization workhorse. Predictable, non-thinking.
kimi-k2.6	256K	$0.584	$2.79	Opus-class agentic / planning. Best fit when your eval picks Claude Opus.
kimi-k2.7-code	256K	$0.60	$2.80	Moonshot's K2 tuned for long-horizon agentic coding. Moonshot reports ~30% fewer reasoning tokens per task than K2.6.
glm-5.2	1M	$0.80	$3.20	Z.ai's large-scale reasoning flagship. 1M context, built for long-horizon agent workflows and project-level software engineering. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace.
whisper-large-v3-turbo	25 MB file limit	$0.0004/min	$0	Fast speech-to-text via /v1/audio/transcriptions. Billed by audio duration, rounded to the second.
gemini-2.5-flash	1M	$0.255	$2.125	Google multimodal chat with vision input. Thinking by default. ~15% below Google list pricing.
gemini-2.5-flash-image	1M	$0.255	$25.50	Image generation. Output rate is per 1M tokens for emitted image data.
gemini-2.5-flash-lite	1M	$0.085	$0.34	Cheapest Gemini. High-volume short-turn chat where Flash is overkill.
gemini-3-flash-preview	1M	$0.425	$2.55	Next-gen Flash with stronger reasoning. Preview API; semantics may shift before GA.
gemini-3-pro-image-preview	1M	$1.70	$102.00	Pro-grade image generation. Preview API.
gemini-3.1-pro-preview	1M	$1.70	$10.20	Google's flagship reasoning model. 1M context, thinks deeply. Preview API; semantics may shift before GA.
gemini-3.5-flash	1M	$1.275	$7.65	Next-gen Flash GA from Google. 1M context, thinks by default. Sits between 3 Flash Preview and 3.1 Pro on capability and price.
gemini-2.5-pro	1M	$1.0625	$8.50	Google's proven pro-tier reasoning model. 1M context, thinks deeply. ~15% below Vertex retail.
gemini-3.1-flash-lite	1M	$0.2125	$1.275	Newest low-cost Gemini, non-thinking. 1M context, built for high-volume, latency-sensitive traffic.

Which model should I use?

Default to deepseek-v4-flash for new projects. 1M context, lowest per-token price in the catalog, fast.
Hard reasoning (math, theorem proving, competitive problems) — use deepseek-r1. The chain-of-thought trace counts as output tokens, so expect 3–10× more output than non-thinking models.
Premium reasoning with long context — deepseek-v4-pro. Thinks by default. Maps cleanly to o3-mini at ~6× lower output cost.
Long-document RAG (200K+ tokens) — qwen3.6-35b (or 3.5 if you already have it tuned). 262K context, MoE for low per-token cost.
Opus-class agentic / planning — kimi-k2.6. Higher per-token, but the only QSP model in that quality tier.
Production traffic on an existing V3 integration — keep deepseek-v3. Stable, well-understood, non-thinking by default.
Multimodal (image input) — gemini-2.5-flash or gemini-3-flash-preview. The open-source models in the catalog are text-only.
Image generation — gemini-2.5-flash-image for standard tasks, gemini-3-pro-image-preview for pro-grade output. Both billed per 1M-token output rate on emitted image data.
Audio transcription — whisper-large-v3-turbo. OpenAI-compatible speech-to-text via /v1/audio/transcriptions, priced at $0.0002 per audio minute.
High-volume short-turn chat — gemini-2.5-flash-lite at $0.085 input / $0.34 output.

Thinking vs. non-thinking

The V4 wave (V4 Flash, V4 Pro, Kimi K2.6) and Qwen 3.6 emit a chain-of-thought trace before the final answer. A one-token "Hi" can return ~175 reasoning tokens. To get V3-style cheap chat behavior on these models, pass:

See quickstart for full code →

DeepSeek R1 ignores reasoning.enabled=false— reasoning IS the model. Use V3 or V4 Flash if you don't want the trace.

Per-model deep dives

Each model has its own page with pricing comparisons, FAQs, and quickstart code. Linked here for convenience:

Per-model deep-dive pages for the Gemini family are in progress — the catalog table above carries the canonical pricing and capability summary in the meantime.