Model catalog
19 models through one OpenAI-compatible API: open-source LLMs (DeepSeek, Qwen including 3.7 Max & 3.6 Plus, Kimi), Whisper 1 for transcription, plus Google's Gemini family for multimodal chat, reasoning, and image generation. Token-priced models are listed per 1M tokens; Whisper is billed per audio minute. USD, as of May 2026. The model ID is what you pass in the request body.
At a glance
| Model ID | Context | Input | Output | Notes |
|---|---|---|---|---|
| deepseek-v4-flash | 1M | $0.08 | $0.16 | Cheap chat & coding. Thinks by default — pass reasoning.enabled=false for V3-style replies. |
| deepseek-v4-pro | 1M | $0.348 | $0.696 | Premium reasoning with 1M-token context. Maps to o3-mini at ~6× lower output cost. |
| deepseek-v3 | 128K | $0.16 | $0.616 | General chat, coding, tool calling, structured output. Non-thinking by default. |
| deepseek-r1 | 128K | $0.56 | $2.00 | Reasoning specialist. Emits a long chain-of-thought trace; output tokens include the reasoning. |
| qwen3.7-max | 1M | $2.00 | $6.00 | Qwen 3.7 flagship. Agent-centric, coding & productivity tasks. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace. |
| qwen3.7-plus | 262K | $0.256 | $1.024 | Alibaba's hosted agent flagship. Long-running coding/agent loops, 262K context. ~6× cheaper on output than 3.7 Max. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace. |
| qwen3.6-plus | 1M | $0.26 | $1.56 | 1T-parameter MoE flagship. 1M context. Top Qwen on OpenRouter by token volume. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace. |
| qwen3.6-35b | 262K | $0.12 | $0.80 | 35B/3B-active MoE. Drop-in upgrade for Qwen 3.5 with stronger reasoning. |
| qwen3.5-35b | 262K | $0.111 | $0.80 | Long-context RAG and summarization workhorse. Predictable, non-thinking. |
| kimi-k2.6 | 256K | $0.584 | $2.79 | Opus-class agentic / planning. Best fit when your eval picks Claude Opus. |
| kimi-k2.7-code | 256K | $0.60 | $2.80 | Moonshot's K2 tuned for long-horizon agentic coding. Moonshot reports ~30% fewer reasoning tokens per task than K2.6. |
| glm-5.2 | 1M | $0.80 | $3.20 | Z.ai's large-scale reasoning flagship. 1M context, built for long-horizon agent workflows and project-level software engineering. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace. |
| whisper-large-v3-turbo | 25 MB file limit | $0.0004/min | $0 | Fast speech-to-text via /v1/audio/transcriptions. Billed by audio duration, rounded to the second. |
| gemini-2.5-flash | 1M | $0.255 | $2.125 | Google multimodal chat with vision input. Thinking by default. ~15% below Google list pricing. |
| gemini-2.5-flash-image | 1M | $0.255 | $25.50 | Image generation. Output rate is per 1M tokens for emitted image data. |
| gemini-2.5-flash-lite | 1M | $0.085 | $0.34 | Cheapest Gemini. High-volume short-turn chat where Flash is overkill. |
| gemini-3-flash-preview | 1M | $0.425 | $2.55 | Next-gen Flash with stronger reasoning. Preview API; semantics may shift before GA. |
| gemini-3-pro-image-preview | 1M | $1.70 | $102.00 | Pro-grade image generation. Preview API. |
| gemini-3.1-pro-preview | 1M | $1.70 | $10.20 | Google's flagship reasoning model. 1M context, thinks deeply. Preview API; semantics may shift before GA. |
| gemini-3.5-flash | 1M | $1.275 | $7.65 | Next-gen Flash GA from Google. 1M context, thinks by default. Sits between 3 Flash Preview and 3.1 Pro on capability and price. |
| gemini-2.5-pro | 1M | $1.0625 | $8.50 | Google's proven pro-tier reasoning model. 1M context, thinks deeply. ~15% below Vertex retail. |
| gemini-3.1-flash-lite | 1M | $0.2125 | $1.275 | Newest low-cost Gemini, non-thinking. 1M context, built for high-volume, latency-sensitive traffic. |
Which model should I use?
- Default to
deepseek-v4-flashfor new projects. 1M context, lowest per-token price in the catalog, fast. - Hard reasoning (math, theorem proving, competitive problems) — use
deepseek-r1. The chain-of-thought trace counts as output tokens, so expect 3–10× more output than non-thinking models. - Premium reasoning with long context —
deepseek-v4-pro. Thinks by default. Maps cleanly to o3-mini at ~6× lower output cost. - Long-document RAG (200K+ tokens) —
qwen3.6-35b(or 3.5 if you already have it tuned). 262K context, MoE for low per-token cost. - Opus-class agentic / planning —
kimi-k2.6. Higher per-token, but the only QSP model in that quality tier. - Production traffic on an existing V3 integration — keep
deepseek-v3. Stable, well-understood, non-thinking by default. - Multimodal (image input) —
gemini-2.5-flashorgemini-3-flash-preview. The open-source models in the catalog are text-only. - Image generation —
gemini-2.5-flash-imagefor standard tasks,gemini-3-pro-image-previewfor pro-grade output. Both billed per 1M-token output rate on emitted image data. - Audio transcription —
whisper-large-v3-turbo. OpenAI-compatible speech-to-text via/v1/audio/transcriptions, priced at $0.0002 per audio minute. - High-volume short-turn chat —
gemini-2.5-flash-liteat $0.085 input / $0.34 output.
Thinking vs. non-thinking
The V4 wave (V4 Flash, V4 Pro, Kimi K2.6) and Qwen 3.6 emit a chain-of-thought trace before the final answer. A one-token "Hi" can return ~175 reasoning tokens. To get V3-style cheap chat behavior on these models, pass:
DeepSeek R1 ignores reasoning.enabled=false— reasoning IS the model. Use V3 or V4 Flash if you don't want the trace.
Per-model deep dives
Each model has its own page with pricing comparisons, FAQs, and quickstart code. Linked here for convenience:
Per-model deep-dive pages for the Gemini family are in progress — the catalog table above carries the canonical pricing and capability summary in the meantime.