How do I migrate from NIM to QuickSilver Pro?

NIM exposes an OpenAI-compatible /v1 endpoint, so swapping base_url to https://api.quicksilverpro.io/v1 is enough. Model IDs drop the nvidia/ prefix.

Home/Compare/vs nvidia-nim

Comparison

QuickSilver Pro vs NVIDIA NIM

Q: When does NIM self-hosting beat managed QSP?

When you already pay for GPUs at sustained >60% utilization, when data-locality forces inference to specific regions, or when you need a custom finetune. Below ~60% H100 utilization, NIM self-host is more expensive per token than QSP managed.

Q: How does QSP compare to hosted NIM at build.nvidia.com?

Pricing is comparable on most shared models, with QSP at $0.56 / $2.00 for DeepSeek R1. Sign-up friction is lower on QSP (paste a key) than on NGC (developer account + quota request).

NIM is NVIDIA's containerized inference: ship a Docker image, deploy on your own H100s/H200s, or call the hosted endpoint at build.nvidia.com. It's the right fit when you have GPU capacity to fill or strict data-locality requirements that rule out shared inference. For everyone else, QuickSilver Pro runs the same DeepSeek and Qwen weights as a managed OpenAI-compatible service — no Kubernetes, no GPU operator, no Triton configs.

At a glance

Feature	QuickSilver Pro	nvidia-nim
Deployment model	Managed (shared)	Self-host containers or hosted on build.nvidia.com
API surface	OpenAI-compatible	OpenAI-compatible (NIM exposes /v1)
Open-source model catalog	9 (V4 Flash + Pro, V3, R1, Qwen 3.7 Max + 3.6 Plus + 3.6 + 3.5, Kimi K2.6)	Large; varies by NIM image
Ops burden	None	Kubernetes / Triton / NGC pulls / driver versions
Cost shape	Pay per token	Pay per GPU-hour (self-host) or per-token (hosted)
Minimum top-up	$5	GPU reservation or NGC credit
Best for	Devs who want to ship today	Teams with GPU fleets or data-locality requirements

Pricing (per million tokens, USD)

Public list prices as of May 2026.

Model	QSP input	QSP output	nvidia-nim input	nvidia-nim output	Savings
DeepSeek R1	$0.56	$2.00	~$0.30	~$2.00	input ~higher, output ~comparable
DeepSeek V3	$0.16	$0.616	varies	varies	case by case
Self-host on H100	$0.56	$2.00	~$2/hr GPU	+ ops	depends on utilization

Migration - two lines

After - QuickSilver Pro

import os
from openai import OpenAI

# Was: OpenAI(base_url="https://integrate.api.nvidia.com/v1", ...)
client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key=os.environ["QSP_KEY"],
)

r = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hi"}],
)

FAQ

When does self-hosting NIM beat managed QSP?

When you already pay for GPUs that would otherwise sit idle, when data-locality forces inference to specific regions, or when you need a custom-finetuned model. The break-even math: an H100 at ~$2/hr serves ~200 R1 tokens/sec, so the cost-per-token only beats QSP when your H100 is sustainably above ~60% utilization. Below that, you're paying for idle GPUs.

How does QSP compare to hosted NIM at build.nvidia.com?

Pricing varies by model and tier. On DeepSeek R1 output, the hosted-NIM rate has historically been comparable to QSP; QSP holds the line at $0.56 / $2.00 with a transparent OpenAI-compatible surface. Sign-up friction is lower on QSP (paste a key) than on NGC (NVIDIA developer account + quota).

Can I migrate from NIM to QSP?

Yes — NIM exposes an OpenAI-compatible /v1 endpoint, so swapping base_url is enough. Same SDK, same shape, same streaming / tool-calling behavior. The model ID changes from NIM's qualified name (e.g. nvidia/deepseek-r1) to QSP's simpler ID (deepseek-r1).

What inference stack does QSP run?

Not disclosed publicly. The contract we make to callers is OpenAI-compatible chat + tool calling + JSON schema + usage accounting, with the listed per-token price and uptime — how that's delivered is an implementation detail that's evolved over time and will continue to. Status and per-model latency are public at /status.

Try it with double credits — up to $50 free

Change two lines, save 20% instantly.

Get API Key