Models

Model catalog

19 models through one OpenAI-compatible API: open-source LLMs (DeepSeek, Qwen including 3.7 Max & 3.6 Plus, Kimi), Whisper 1 for transcription, plus Google's Gemini family for multimodal chat, reasoning, and image generation. Token-priced models are listed per 1M tokens; Whisper is billed per audio minute. USD, as of May 2026. The model ID is what you pass in the request body.

At a glance

Model IDContextInputOutputNotes
deepseek-v4-flash1M$0.08$0.16Cheap chat & coding. Thinks by default — pass reasoning.enabled=false for V3-style replies.
deepseek-v4-pro1M$0.348$0.696Premium reasoning with 1M-token context. Maps to o3-mini at ~6× lower output cost.
deepseek-v3128K$0.16$0.616General chat, coding, tool calling, structured output. Non-thinking by default.
deepseek-r1128K$0.56$2.00Reasoning specialist. Emits a long chain-of-thought trace; output tokens include the reasoning.
qwen3.7-max1M$2.00$6.00Qwen 3.7 flagship. Agent-centric, coding & productivity tasks. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace.
qwen3.7-plus262K$0.256$1.024Alibaba's hosted agent flagship. Long-running coding/agent loops, 262K context. ~6× cheaper on output than 3.7 Max. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace.
qwen3.6-plus1M$0.26$1.561T-parameter MoE flagship. 1M context. Top Qwen on OpenRouter by token volume. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace.
qwen3.6-35b262K$0.12$0.8035B/3B-active MoE. Drop-in upgrade for Qwen 3.5 with stronger reasoning.
qwen3.5-35b262K$0.111$0.80Long-context RAG and summarization workhorse. Predictable, non-thinking.
kimi-k2.6256K$0.584$2.79Opus-class agentic / planning. Best fit when your eval picks Claude Opus.
kimi-k2.7-code256K$0.60$2.80Moonshot's K2 tuned for long-horizon agentic coding. Moonshot reports ~30% fewer reasoning tokens per task than K2.6.
glm-5.21M$0.80$3.20Z.ai's large-scale reasoning flagship. 1M context, built for long-horizon agent workflows and project-level software engineering. Thinks by default — the gateway suppresses thinking by default; pass reasoning.enabled=true to opt into the reasoning trace.
whisper-large-v3-turbo25 MB file limit$0.0004/min$0Fast speech-to-text via /v1/audio/transcriptions. Billed by audio duration, rounded to the second.
gemini-2.5-flash1M$0.255$2.125Google multimodal chat with vision input. Thinking by default. ~15% below Google list pricing.
gemini-2.5-flash-image1M$0.255$25.50Image generation. Output rate is per 1M tokens for emitted image data.
gemini-2.5-flash-lite1M$0.085$0.34Cheapest Gemini. High-volume short-turn chat where Flash is overkill.
gemini-3-flash-preview1M$0.425$2.55Next-gen Flash with stronger reasoning. Preview API; semantics may shift before GA.
gemini-3-pro-image-preview1M$1.70$102.00Pro-grade image generation. Preview API.
gemini-3.1-pro-preview1M$1.70$10.20Google's flagship reasoning model. 1M context, thinks deeply. Preview API; semantics may shift before GA.
gemini-3.5-flash1M$1.275$7.65Next-gen Flash GA from Google. 1M context, thinks by default. Sits between 3 Flash Preview and 3.1 Pro on capability and price.
gemini-2.5-pro1M$1.0625$8.50Google's proven pro-tier reasoning model. 1M context, thinks deeply. ~15% below Vertex retail.
gemini-3.1-flash-lite1M$0.2125$1.275Newest low-cost Gemini, non-thinking. 1M context, built for high-volume, latency-sensitive traffic.

Which model should I use?

  • Default to deepseek-v4-flash for new projects. 1M context, lowest per-token price in the catalog, fast.
  • Hard reasoning (math, theorem proving, competitive problems) — use deepseek-r1. The chain-of-thought trace counts as output tokens, so expect 3–10× more output than non-thinking models.
  • Premium reasoning with long context deepseek-v4-pro. Thinks by default. Maps cleanly to o3-mini at ~6× lower output cost.
  • Long-document RAG (200K+ tokens) qwen3.6-35b (or 3.5 if you already have it tuned). 262K context, MoE for low per-token cost.
  • Opus-class agentic / planning kimi-k2.6. Higher per-token, but the only QSP model in that quality tier.
  • Production traffic on an existing V3 integration — keep deepseek-v3. Stable, well-understood, non-thinking by default.
  • Multimodal (image input) gemini-2.5-flash or gemini-3-flash-preview. The open-source models in the catalog are text-only.
  • Image generation gemini-2.5-flash-image for standard tasks, gemini-3-pro-image-preview for pro-grade output. Both billed per 1M-token output rate on emitted image data.
  • Audio transcription whisper-large-v3-turbo. OpenAI-compatible speech-to-text via /v1/audio/transcriptions, priced at $0.0002 per audio minute.
  • High-volume short-turn chat gemini-2.5-flash-lite at $0.085 input / $0.34 output.

Thinking vs. non-thinking

The V4 wave (V4 Flash, V4 Pro, Kimi K2.6) and Qwen 3.6 emit a chain-of-thought trace before the final answer. A one-token "Hi" can return ~175 reasoning tokens. To get V3-style cheap chat behavior on these models, pass:

See quickstart for full code →

DeepSeek R1 ignores reasoning.enabled=false— reasoning IS the model. Use V3 or V4 Flash if you don't want the trace.

Per-model deep dives

Each model has its own page with pricing comparisons, FAQs, and quickstart code. Linked here for convenience:

Per-model deep-dive pages for the Gemini family are in progress — the catalog table above carries the canonical pricing and capability summary in the meantime.