QuickSilver Pro vs NVIDIA NIM
NIM is NVIDIA's containerized inference offering: pull a Docker image and deploy it on your own H100s/H200s, or call the hosted endpoint at build.nvidia.com. It's the right fit when you have GPU capacity to fill or strict data-locality requirements that rule out shared inference. For everyone else, QuickSilver Pro runs the same DeepSeek and Qwen weights as a managed OpenAI-compatible service: no Kubernetes, no GPU operator, no Triton configs.
At a glance
| Feature | QuickSilver Pro | NVIDIA NIM |
|---|---|---|
| Deployment model | Managed (shared) | Self-host containers or hosted on build.nvidia.com |
| API surface | OpenAI-compatible | OpenAI-compatible (NIM exposes /v1) |
| Open-source model catalog | 9 (V4 Flash + Pro, V3, R1, Qwen 3.7 Max + 3.6 Plus + 3.6 + 3.5, Kimi K2.6) | Large; varies by NIM image |
| Ops burden | None | Kubernetes / Triton / NGC pulls / driver versions |
| Cost shape | Pay per token | Pay per GPU-hour (self-host) or per token (hosted) |
| Minimum top-up | $5 | GPU reservation or NGC credit |
| Best for | Devs who want to ship today | Teams with GPU fleets or data-locality requirements |
Pricing (per million tokens, USD)
Public list prices as of May 2026.
| Model | QSP input | QSP output | NVIDIA NIM input | NVIDIA NIM output | Savings |
|---|---|---|---|---|---|
| DeepSeek R1 | $0.56 | $2.00 | ~$0.30 | ~$2.00 | QSP input higher; output comparable |
| DeepSeek V3 | $0.16 | $0.616 | varies | varies | case by case |
| DeepSeek R1, self-hosted on H100 | $0.56 | $2.00 | ~$2/hr per GPU | plus ops cost | depends on utilization |
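To turn the per-million-token prices above into a per-request figure, here is a minimal sketch. The helper name `request_cost_usd` is hypothetical and the token counts are made-up example values; the prices are the QSP DeepSeek R1 list prices from the table.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Assumed example: 2,000 prompt tokens and 1,000 completion tokens
# at the listed DeepSeek R1 prices ($0.56 input / $2.00 output).
cost = request_cost_usd(2_000, 1_000, 0.56, 2.00)
print(f"${cost:.5f}")  # $0.00312
```

Reasoning models such as R1 tend to be output-heavy, so the output price usually dominates the total.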
Migration - two lines
import os
from openai import OpenAI
# Was: OpenAI(base_url="https://integrate.api.nvidia.com/v1", ...)
client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key=os.environ["QSP_KEY"],
)
r = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hi"}],
)
FAQ
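NIM is the better choice when you already pay for GPUs that would otherwise sit idle, when data-locality rules force inference into specific regions, or when you need a custom fine-tuned model. The break-even math: an H100 at ~$2/hr serving ~200 R1 tokens/sec produces about 0.72M tokens per hour, or roughly $2.78 per million tokens at full utilization. Self-hosting therefore only undercuts QSP's $2.00 output price when batching lifts aggregate throughput well above that figure and the GPU stays busy. Below that, you're paying for idle GPUs.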
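The break-even reasoning can be sketched as a small calculator. The function names are hypothetical. Throughput is assumed to mean aggregate tokens per second across all concurrent requests, and comparing against the output price alone is a simplification. Throughput is the most sensitive input, so measure it on your own hardware and batch sizes.

```python
def self_host_cost_per_million(gpu_usd_per_hour: float,
                               tokens_per_second: float,
                               utilization: float) -> float:
    """USD per million tokens for an hourly-billed GPU.

    utilization is the fraction of billed time spent serving (0 < u <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


def break_even_utilization(gpu_usd_per_hour: float,
                           tokens_per_second: float,
                           managed_price_per_m: float) -> float:
    """Utilization at which self-hosting matches the managed price.

    A result above 1.0 means self-hosting never breaks even at that throughput.
    """
    return gpu_usd_per_hour * 1_000_000 / (tokens_per_second * 3600 * managed_price_per_m)


def required_throughput(gpu_usd_per_hour: float,
                        utilization: float,
                        managed_price_per_m: float) -> float:
    """Aggregate tokens/sec needed to match the managed price at a given utilization."""
    return gpu_usd_per_hour * 1_000_000 / (3600 * utilization * managed_price_per_m)


print(f"{self_host_cost_per_million(2.0, 200, 1.0):.2f}")  # 2.78
print(f"{break_even_utilization(2.0, 200, 2.00):.2f}")     # 1.39
print(f"{required_throughput(2.0, 0.6, 2.00):.0f}")        # 463
```

With ~$2/hr and 200 tokens/sec, the cost is about $2.78 per million tokens at full utilization. Matching a $2.00 price at 60% utilization would take roughly 463 tokens/sec of aggregate throughput.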
Hosted NIM pricing varies by model and tier. On DeepSeek R1 output, the hosted-NIM rate has historically been comparable to QSP's; QSP lists R1 at $0.56 input / $2.00 output behind an OpenAI-compatible surface. Sign-up friction is lower on QSP (paste a key) than on NGC (NVIDIA developer account plus quota).
Migrating from NIM needs very little code: NIM exposes an OpenAI-compatible /v1 endpoint, so swapping base_url and the API key covers the client setup. Same SDK, same request shape, same streaming and tool-calling behavior. The model ID changes from NIM's qualified name (e.g. deepseek-ai/deepseek-r1) to QSP's simpler ID (deepseek-r1).
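One way to keep the switch reversible is to isolate the per-provider settings, as in the sketch below. The `PROVIDERS` table, both helper functions, and the `NVIDIA_API_KEY` variable name are assumptions for illustration. The prefix-stripping rule is also an assumed convention, so check each model ID against QSP's model list.

```python
import os

# Hypothetical settings table. The QSP values mirror the migration snippet;
# the NIM URL is the hosted endpoint it replaces. NVIDIA_API_KEY is an
# assumed environment variable name.
PROVIDERS = {
    "qsp": {"base_url": "https://api.quicksilverpro.io/v1", "key_env": "QSP_KEY"},
    "nim": {"base_url": "https://integrate.api.nvidia.com/v1", "key_env": "NVIDIA_API_KEY"},
}


def to_qsp_model_id(nim_model_id: str) -> str:
    """Drop the 'org/' prefix from a qualified ID (assumed naming rule)."""
    return nim_model_id.rsplit("/", 1)[-1]


def client_kwargs(provider: str) -> dict:
    """Keyword arguments that can be passed to openai.OpenAI(**kwargs)."""
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"],
            "api_key": os.environ.get(cfg["key_env"], "")}


print(to_qsp_model_id("org/deepseek-r1"))  # deepseek-r1
print(client_kwargs("qsp")["base_url"])    # https://api.quicksilverpro.io/v1
```

With the `openai` package installed, `OpenAI(**client_kwargs("qsp"))` builds the same client as the migration snippet, and changing the provider key switches back.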
How QuickSilver Pro serves these models is not disclosed publicly. The contract with callers is OpenAI-compatible chat, tool calling, JSON schema output, and usage accounting, at the listed per-token price and uptime; how that is delivered is an implementation detail that has evolved over time and will continue to. Status and per-model latency are public at /status.