Badges — Powered by QuickSilver Pro

Terms — the short version

Badges

Anthropic Mythos-class flagship, 1M context

fast, low-cost, 200K context

Anthropic flagship, 1M context

balanced mid-tier, 1M context

reasoning, math, o1-equivalent

general chat, coding, tool calling

1M ctx, thinks by default, ~74% cheaper than V3

premium reasoning, 1M context

1M context, multimodal, thinking

1M context, image generation

cheapest Gemini, 1M context

pro mid-tier reasoning, 1M context

newest low-cost workhorse, 1M context

flagship reasoning, 1M context

next-gen Flash GA, 1M context

next-gen flash, 1M context

pro image generation

Z.ai reasoning flagship, 1M context

Opus-class reasoning, 256K

Agentic-coding K2, 256K context

262K long-context, RAG

262K long-context, MoE upgrade

1T-MoE flagship, 1M context, thinks by default

Qwen 3.7 flagship, 1M context, thinks by default

Qwen 3.7 agent flagship, 262K context

Launch bonus: we match your first deposit 100%, up to $50 free. Drop in to the official OpenAI SDK and start saving.

When your workload could use open-source quality, QSP is 6×–30× cheaper than Azure's closed catalog. No resource provisioning, no AAD setup.

QuickSilver Pro vs Azure OpenAI

~9% cheaper on DeepSeek R1 output, and DeepSeek V4 / Qwen 3.6 / Kimi K2.6 aren't on Bedrock yet. Drop-in OpenAI SDK — no SigV4 or AWS plumbing.

QuickSilver Pro vs AWS Bedrock

Lower list price on DeepSeek V3; R1 output ~9% cheaper (input at parity). DeepInfra's cache discount may change the math for cache-heavy prompts.

QuickSilver Pro vs DeepInfra

~32% cheaper on V3, ~75% cheaper on R1 output. OpenAI-compatible surface, same tool-calling semantics.

QuickSilver Pro vs Fireworks AI

Different categories: managed token-priced LLM API vs serverless GPU. QSP for stock open-source chat; Modal for custom models or non-LLM workloads.

QuickSilver Pro vs Modal

Managed inference vs containers you deploy on H100s. QSP is faster to ship when you don't have a GPU fleet to fill or strict data-locality requirements.

QuickSilver Pro vs NVIDIA NIM

20% cheaper on DeepSeek V4 Flash & Pro, V3, R1, Qwen 3.6 & 3.5-35B-A3B, and Kimi K2.6 at the per-token level. Same OpenAI-compatible API, two-line migration.

QuickSilver Pro vs OpenRouter

~71% cheaper on DeepSeek R1 output. Largest gap among resellers for reasoning-heavy workloads.

QuickSilver Pro vs Together AI

DeepSeek V3 is ~7× cheaper than Gemini 2.0 Pro on output. Plain OpenAI SDK — no GCP project, service-account JSON, or quota request.

QuickSilver Pro vs Vertex AI

Head-to-head pricing comparisons for QuickSilver Pro vs the competing OpenAI-compatible inference providers, plus per-model use-case guides with quickstart code.

Compare & Use — QuickSilver Pro

Compare & Use

Math, algorithms, multi-step planning. Open-source o1 alternative at 30x less.

Code generation, refactoring, tool-calling agents. $0.16 / $0.616 per 1M tokens.

262K context, 3B active MoE. RAG and long-document summarization at $0.111 input.

One quick question — it helps us know which channels actually reach developers like you.

How did you find QuickSilver Pro?

Check your inbox

Sign in with your email and password.

Sign in

Set a new password for this account, then we'll sign you in automatically.

Choose a new password

Enter your email and we'll send a link to set a new dashboard password.

Reset your password

Beta invite accepted.

This invite has already been claimed.

We'll email you a verification link to finish sign-up. {amount} free credits included.

We'll email you a verification link to finish sign-up. Top up to start — your first credit purchase is doubled, up to $50 free.

Create your account

Reset your password below — we'll email you a fresh reset link.

Account already exists.

Links are valid for 30 minutes. Enter your email below to send a fresh one.

Link expired.

We couldn't verify that link. Enter your email below to receive a new one.

Invalid link.

Stripe payment method

Keep your balance above a floor and recharge automatically with a saved card.

Auto Recharge

Credits are added to your key automatically within seconds. No refunds once applied.

Use the same email you registered with.

Friend or partner gave you a code? Apply it here to add {amount} of credits to this account. One per account.

Have a referral code?

Credits never expire. Pay only for what you use.

Buy Credits

Permanently remove your account, all API keys, and any remaining credits. This cannot be undone.

If a key was exposed (leaked to git, shared in a screenshot, posted in logs), use this to immediately revoke all keys. A fresh replacement key will be generated automatically.

Danger zone

Name

Create API key

will immediately start getting 401 errors. This can't be undone.

Delete this key?

Delete your account?

Editing {alias}. This caps the spend that can be charged to this specific key within a 30-day window; other keys are unaffected.

Monthly spend limit

All existing keys will be permanently revoked and stop working immediately. Any app using them will get 401 errors. A fresh replacement key will be generated so you're not locked out.

Your old keys have been revoked. Copy this new replacement key now — you won't see it again after closing this dialog.

All keys revoked

Revoke all API keys?

Create named keys for each app or environment. All keys share your account balance.

API Keys

Full keys are shown only once at creation time. If a key leaks, delete it and create a new one - other keys keep working.

Keep your keys secret.

Top open-source models at 20% below market.

Models

Paste these three values wherever your agent / CLI / IDE asks for an "OpenAI-compatible endpoint". Keep your key secret.

Connect your agent

Your key is provisioned but balance is zero, so the demo call below will 402 until you load credits. Launch bonus: we match your first deposit 100%, up to $50 free. Pay $5, get $10.

Top up to enable your key.

Copy this command, paste into your terminal. Costs about

Make your first call

Friends signing up with your link get {amount} in credits on their first top-up. We add the same {amount} to your balance at the same time.

Invite friends — get {amount} each

Your balance is depleted ({spent} spent so far). Add credits to keep making API calls — same pricing, same models, pay as you go.

You're out of credits

Your account activity and usage at a glance.

Overview

Agent-friendly:

V4 Flash, V4 Pro, and Kimi K2.6 think by default - pass

in the request body for V3-style cheap chat. (The Qwen 3.6/3.7 models think by default but the gateway suppresses it for you - pass reasoning.enabled=true to opt in. DeepSeek R1 is a dedicated reasoning model - don't send this flag to it.)

Available models:

Vercel AI SDK users:

Drop-in replacement for OpenAI. Change one line.

Quick Start

Spend and requests across models.

Usage

Set daily, weekly and monthly spend caps per team, per person and per project. Overspend is blocked automatically — no surprise invoices.

Central budget control

Run self-hosted open-source models for everyday work and call frontier models only when they earn it — you control the ratio.

Cheap local, capped frontier

Real-time cost dashboards broken down by user, team and model. The AI bill is no longer a black box.

Know exactly where it goes

Stop guessing where your AI budget goes. Set hard limits, see every dollar, and run cheap local models for daily work while reserving frontier models for what truly matters.

Turn AI spend into a controlled, visible line item

Something went wrong sending your request. Please try again in a moment.

Tell us about your environment and we'll scope a private deployment for your team.

Your request is on its way to our team. We'll reach out at the email you provided.

Thanks — we'll be in touch

Book a demo

Runs in your own network or private cloud, even fully air-gapped. Code and conversations never leave your perimeter.

Private deployment

Freely mix and switch local open-source and paid frontier models. The leverage stays with you.

No vendor lock-in

Upstream credentials and every usage log live on your infrastructure — encrypted and auditable.

Your keys, your logs

Private by default — built for compliance and data residency.

Your AI stays inside your boundary

Claude Code, Codex, Cursor and more work out of the box. Developers copy one line of config and go.

Works with the tools devs already use

Server-side translation lets even a self-hosted open-source model power Claude Code — no local proxy, no per-machine setup.

Any model drives any tool

An admin adds an account, the employee signs in and copies their endpoint. Done.

Zero-friction onboarding

Get your whole team productive on day one.

One endpoint, every tool, any model

Multiple admins, team-scoped permissions, and per-team control over which models and providers are allowed.

Roles, teams and access control

Connect your enterprise SSO (OIDC) and enforce TOTP two-factor for secure, compliant access.

SSO and 2-factor

Every call is traceable for internal review and security requirements.

Full audit trail

You decide who can use what.

Enterprise-grade governance

Load balancing and automatic failover across multiple upstreams — a single provider outage won't stop your team.

High availability

We handle deployment, model integration and training. Optional ongoing operations and support.

Turnkey delivery

Make it yours — your company name, logo and interface.

White-label

Production-ready, with people behind it.

Reliable, and we deliver it

Self-hosted AI gateway for enterprises. One private endpoint for Claude Code, Codex and every agent tool. Mix local open-source and frontier models with central budgets, governance and full cost visibility.

QuickSilver Pro Enterprise — Your company's private AI gateway

One private endpoint for every AI coding and agent tool. Run local open-source models alongside paid frontier models — with central budgets, governance, and full cost visibility, all inside your own network.

Your company's own AI gateway

The page you're looking for doesn't exist or has been moved.

Everything you might want to know about QuickSilver Pro — the OpenAI-compatible inference API for DeepSeek V4 Flash & Pro, V3, R1, Qwen 3.6 & 3.5-35B-A3B, and Kimi K2.6.

FAQ — QuickSilver Pro

Frequently asked questions

Plug in your monthly usage — see the cost on QSP vs every competitor.

How much would you save?

output with stable exit codes — Claude Code, Cursor, Aider can call it without parsing HTML.

Create an account, buy credits, get your API key in 30 seconds.

Start saving on inference today

Common questions

The 9 most popular open-source models — DeepSeek V4 Flash & Pro, V3, R1, Qwen 3.7 Max + 3.6 Plus + 3.6 + 3.5, Kimi K2.6 — through an OpenAI-compatible API. Cheaper than every other reseller. Change one line of code.

20% below the rest.

Open-source inference,

Customers save 20% today on a curated catalog. The gap comes from a narrow operational surface and tight engineering, not margin shaving. Phase 2 widens the gap as more of the stack moves in-house.

Today: launched on a curated catalog

We're building a self-hosted serving layer on H100/H200 using SGLang + continuous batching, EAGLE-3 speculative decoding, FP8 quantization via DeepGEMM, and SageAttention / ThunderMLA custom kernels. Target: another 30-50% below current prices on DeepSeek V3.

Next: our own inference stack on dedicated GPUs

Weights are public — we can actually run and optimize them. Closed models (GPT-4, Claude) don't expose weights, so no amount of infra work makes them cheaper. That's why our catalog is 7 open models we can verify, route, and eventually host ourselves.

Why open-source is the only way there

Per 1M tokens for text models · per image or per audio minute where noted.

Cheapest open-source inference

DeepSeek V3 for tool-calling agents →

Qwen3.5-35B-A3B for 262K RAG →

DeepSeek R1 for math & algorithms →

vs DeepInfra

vs Fireworks

vs OpenAI

vs OpenRouter

vs Together AI

OpenAI-compatible API for top open-source LLMs — Qwen 3.7 Max & 3.6 Plus (new), DeepSeek V4 Flash & Pro, V3, R1, Kimi K2.6 — 20% cheaper than OpenRouter, Together AI, Fireworks. One-line drop-in. Launch bonus: match 100% of your first credit purchase, up to $50 free.

Function calling

Best models for tool use

Python — single tool

Parallel tool calls

Streaming tool calls

Strict mode