

Comparison

QuickSilver Pro vs Modal


Modal isn't an LLM API the way QuickSilver Pro is — it's serverless GPU compute. You ship Python code that loads a model and serves it, pay per GPU-second, and Modal handles cold starts and scaling. QSP is the opposite trade: you give up the ability to run custom models, and in exchange every call is one HTTP request to a managed OpenAI-compatible endpoint. This page exists for teams comparing both approaches — most projects pick one or run them side-by-side.

At a glance

Feature	QuickSilver Pro	Modal
Product shape	Managed inference API	Serverless GPU compute
What you bring	An API key	Python code + a model
Cost shape	Per token	Per GPU-second
Custom / finetuned models	No (curated catalog only)	Yes
Cold start	None	Seconds (model load) per scale-up
DeepSeek / Qwen / Kimi out of the box	Yes (7 LLMs)	BYO image
Setup	Sign up, paste key	modal CLI + container + GPU plumbing

Pricing (USD; QSP per million tokens, Modal per GPU time)

Public list prices as of May 2026.

Model	QSP input	QSP output	Modal cost	Modal extras	Better fit
DeepSeek R1 (managed)	$0.56	$2.00	~$2/hr H100	+ engineering	depends on traffic
Custom finetuned model	—	—	Pay per GPU-sec	BYO	Modal only
Qwen 3.6-35B (long context)	$0.12	$0.80	BYO image	+ ops	QSP for managed RAG
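The two cost shapes in these tables meter different things. A minimal sketch using the DeepSeek R1 list prices and the ~$2/hr H100 figure above; the helper names are hypothetical. Per-token billing charges for the tokens in each request, while per-GPU-second billing charges for every second a GPU is up, busy or idle.

```python
# Hypothetical helpers comparing the two billing shapes on this page.
# Defaults are the DeepSeek R1 list prices and the ~$2/hr H100 figure
# quoted above; substitute your own numbers.

QSP_R1_INPUT_USD_PER_M = 0.56
QSP_R1_OUTPUT_USD_PER_M = 2.00
MODAL_H100_USD_PER_HR = 2.00  # approximate


def qsp_cost_usd(input_tokens: int, output_tokens: int,
                 in_per_m: float = QSP_R1_INPUT_USD_PER_M,
                 out_per_m: float = QSP_R1_OUTPUT_USD_PER_M) -> float:
    """Per-token billing: pay only for the tokens a request processes."""
    return input_tokens * in_per_m / 1e6 + output_tokens * out_per_m / 1e6


def modal_cost_usd(gpu_seconds: float,
                   usd_per_hr: float = MODAL_H100_USD_PER_HR) -> float:
    """Per-GPU-second billing: pay for every second the GPU is up,
    whether or not it is producing tokens."""
    return gpu_seconds * usd_per_hr / 3600


if __name__ == "__main__":
    # One request: 1,000 prompt tokens, 500 completion tokens.
    print(f"QSP, one request:    ${qsp_cost_usd(1_000, 500):.5f}")
    # One H100 kept up for an hour, however many requests it served.
    print(f"Modal, one GPU-hour: ${modal_cost_usd(3600):.2f}")
```

A single 1,000-in / 500-out request costs about $0.0016 on QSP; an hour of H100 time costs about $2.00 on Modal whether it served one request or thousands.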

Migration - two lines

After - QuickSilver Pro

# QSP isn't a drop-in for Modal -- they're different categories.
# But if you're moving Modal-hosted LLM inference behind an
# OpenAI-compatible interface, here's the QSP equivalent:
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key=os.environ["QSP_KEY"],
)

r = client.chat.completions.create(
    model="deepseek-v3",  # pick the QSP catalog model closest to what your Modal app served
    messages=[{"role": "user", "content": "Hi"}],
)
print(r.choices[0].message.content)  # the assistant's reply text

FAQ

Should I use QSP or Modal?

If your workload runs on stock open-source LLMs (DeepSeek, Qwen, Kimi, Llama), QSP is cheaper and faster to ship: zero ops, OpenAI-compatible, per-token billing. If you need to run a custom-finetuned model, a non-LLM workload (image generation, embeddings, transcription), or anything that doesn't fit a chat-completions shape, Modal's serverless GPU is the right tool. Many teams use both: QSP for chat, Modal for custom models.
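For the "use both" setup, a minimal routing sketch: catalog models go to QSP, everything else goes to your own Modal deployment. It assumes the Modal app exposes an OpenAI-compatible server (vLLM ships one, for example), so the same `openai` client works against either base URL. The Modal URL and the catalog set are hypothetical placeholders, not real endpoints or QSP's actual model list.

```python
# Placeholder values: replace with your real endpoints and model IDs.
QSP_BASE_URL = "https://api.quicksilverpro.io/v1"
MODAL_BASE_URL = "https://your-workspace--your-llm-app.modal.run/v1"  # hypothetical

# Hypothetical subset of model IDs served by QSP; check the live catalog.
QSP_CATALOG = frozenset({"deepseek-v3"})


def base_url_for(model: str) -> str:
    """Route stock catalog models to QSP and custom models to Modal."""
    return QSP_BASE_URL if model in QSP_CATALOG else MODAL_BASE_URL


if __name__ == "__main__":
    for m in ("deepseek-v3", "acme-support-finetune"):
        print(m, "->", base_url_for(m))
```

Pass the result as `base_url` when constructing the `OpenAI` client, together with the API key that matches each backend.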

When does self-hosting on Modal beat QSP's managed price?

The math is the same as for self-hosting NVIDIA NIM. An H100 on Modal at ~$2/hr serves ~200 R1 tokens/sec, and the break-even against QSP's $2.00/M output price is roughly sustained GPU utilization above 60%. Spiky traffic loses badly: every cold start is wasted GPU-seconds, and because Modal bills per second, the GPU time spent idle during scale-up is paid for.
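A minimal sketch of that break-even arithmetic, counting output tokens only and ignoring input tokens, engineering time, and cold-start waste. The function name is hypothetical, and `tokens_per_sec` should be the aggregate (batched) output throughput you actually measure on one GPU. Plugged in literally, ~$2/hr and ~200 tokens/sec give a break-even above 100% utilization (one busy GPU-hour yields 0.72M tokens, about $2.78/M), so the ~60% figure presumably assumes batched throughput closer to 460 tokens/sec per GPU.

```python
def break_even_utilization(gpu_usd_per_hr: float,
                           tokens_per_sec: float,
                           qsp_usd_per_m_output: float) -> float:
    """Share of each billed GPU-hour that must be spent generating tokens
    for self-hosting to cost the same per output token as QSP.

    A result above 1.0 means the GPU cannot break even at that throughput.
    Counts output tokens only; ignores engineering time and cold starts.
    """
    # Millions of output tokens one GPU produces in a fully busy hour.
    m_tokens_per_busy_hr = tokens_per_sec * 3600 / 1e6
    # What those tokens would cost if bought from QSP instead.
    qsp_value_per_busy_hr = m_tokens_per_busy_hr * qsp_usd_per_m_output
    return gpu_usd_per_hr / qsp_value_per_busy_hr


if __name__ == "__main__":
    # Figures quoted on this page, taken literally.
    print(round(break_even_utilization(2.0, 200, 2.00), 2))
    # Throughput that would put break-even near the ~60% mark.
    print(round(break_even_utilization(2.0, 463, 2.00), 2))
```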

Can I run QSP's models on Modal?

The weights are open-source (DeepSeek, Qwen, and Kimi are MIT / Apache licensed), so yes, you can load them on Modal yourself. The question is whether owning that integration pays for itself: QSP serves V3 at $0.616/M output, and matching that on Modal requires sustained utilization above the break-even point plus enough engineering time to keep the deployment running, which can become someone's full-time job. For most teams, the managed price wins.
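Turned around, the same arithmetic says what throughput a self-hosted GPU would need to match QSP's V3 price. A sketch assuming the ~$2/hr H100 rate quoted on this page and counting output tokens only; the function name is hypothetical. At $0.616/M output, one GPU has to sustain roughly 900 output tokens/sec at full utilization, and about 1,500 tokens/sec if it is busy only 60% of the time.

```python
def required_tokens_per_sec(gpu_usd_per_hr: float,
                            target_usd_per_m_output: float,
                            utilization: float = 1.0) -> float:
    """Aggregate output tokens/sec a GPU must sustain while busy so that
    its cost per output token equals `target_usd_per_m_output`, given it
    is busy only `utilization` (0 < u <= 1) of each billed hour."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Tokens the hourly GPU bill would buy at the target per-token price.
    tokens_per_hr = gpu_usd_per_hr / target_usd_per_m_output * 1e6
    # Those tokens must be generated during the busy part of the hour.
    return tokens_per_hr / (3600 * utilization)


if __name__ == "__main__":
    for u in (1.0, 0.6):
        tps = required_tokens_per_sec(2.0, 0.616, u)
        print(f"{u:.0%} busy: {tps:,.0f} tok/s")
```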

What about cold starts?

QSP has none: it is a managed, shared service, so models are always warm. On Modal, a cold start (acquiring a fresh GPU and loading the LLM) takes seconds for small models and longer for 70B+ class models. For latency-sensitive workloads (interactive chat, agents), QSP is the safer default.
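Cold starts also cost money under per-second billing, because the GPU is billed while weights load and no tokens are served. A rough sketch; the 60-second load time, the scale-up count, and the GPUs-per-replica values are illustrative assumptions rather than Modal measurements, and the ~$2/hr rate is the figure quoted on this page.

```python
def cold_start_waste_usd(cold_starts_per_day: float,
                         load_seconds: float,
                         gpus_per_replica: int = 1,
                         gpu_usd_per_hr: float = 2.0,
                         days: int = 30) -> float:
    """GPU spend on cold starts alone: seconds billed while a fresh
    replica loads weights and serves no tokens."""
    wasted_gpu_seconds = (cold_starts_per_day * load_seconds
                          * gpus_per_replica * days)
    return wasted_gpu_seconds * gpu_usd_per_hr / 3600


if __name__ == "__main__":
    # Hypothetical spiky workload: 100 scale-ups a day, 60 s to load weights.
    print(f"${cold_start_waste_usd(100, 60):.2f} per month, 1 GPU per replica")
    print(f"${cold_start_waste_usd(100, 60, gpus_per_replica=4):.2f} per month, 4 GPUs per replica")
```

Larger models make both factors worse at once: they take longer to load and usually need more GPUs per replica.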

Try it with double credits — up to $50 free

Change two lines, save 20% instantly.
