QuickSilver Pro vs Google Vertex AI
Vertex AI is GCP's managed inference plane: Gemini, Claude on Vertex, Llama, plus a Model Garden of open weights. It's the right tool when GCP-native integration (IAM, BigQuery sources, Vertex Search) is load-bearing. For everyone else, QuickSilver Pro serves DeepSeek V4 Flash + Pro, V3, R1, Qwen 3.7 Max + 3.6 Plus + 3.6 + 3.5, and Kimi K2.6 through a plain OpenAI-compatible API — no GCP project, no service-account JSON, no quota requests.
At a glance
| Feature | QuickSilver Pro | vertex-ai |
|---|---|---|
| Model focus | 9 open-source LLMs (DeepSeek, Qwen, Kimi) | Gemini family, Claude on Vertex, Llama, Mistral, Model Garden |
| API surface | OpenAI-compatible (drop-in) | Vertex SDK / REST with GCP auth |
| DeepSeek V4 Pro output | $0.696 / 1M | n/a (not in Model Garden as of 2026) |
| DeepSeek R1 output | $2.00 / 1M | varies by Model Garden host |
| GCP IAM + Private Service Connect | No | Yes |
| Vertex Search / Vector Search | No | Yes |
| Setup | Sign up, paste key | GCP project + quota + service account |
Pricing (per million tokens, USD)
Public list prices as of May 2026.
| Model | QSP input | QSP output | vertex-ai input | vertex-ai output | Savings |
|---|---|---|---|---|---|
| deepseek-v4-flash vs gemini-2.0-flash | $0.08 | $0.16 | $0.075 | $0.30 | Gemini cheaper input, V4 Flash cheaper output |
| deepseek-v3 vs gemini-2.0-pro | $0.16 | $0.616 | $1.25 | $5.00 | ~8x |
| deepseek-r1 vs gemini-2.0-pro (reasoning) | $0.56 | $2.00 | $1.25 | $5.00 | ~2.5x |
| qwen3.6-35b vs gemini-2.0-flash | $0.12 | $0.80 | $0.075 | $0.30 | Gemini cheaper, Qwen has 262K context |
Migration - two lines
import os
from openai import OpenAI
# Was: aiplatform.init(project=..., location=...); GenerativeModel(...)
client = OpenAI(
base_url="https://api.quicksilverpro.io/v1",
api_key=os.environ["QSP_KEY"],
)
r = client.chat.completions.create(
model="deepseek-v3", # or qwen3.6-35b for long-context RAG
messages=[{"role": "user", "content": "Hi"}],
)FAQ
For pure-price cheap chat, Gemini 2.0 Flash is competitive with V4 Flash on input ($0.075 vs $0.08) but more expensive on output ($0.30 vs $0.16). For general chat against Gemini 2.0 Pro, DeepSeek V3 is ~8x cheaper on output. For reasoning workloads, DeepSeek R1 is ~2.5x cheaper than Gemini 2.0 Pro on output. The open-source vs closed quality gap on text-only tasks is small for most production workloads.
No — Vertex's SDK is GCP-shaped (project / location / publisher / model / endpoint). Use the openai SDK with base_url=https://api.quicksilverpro.io/v1 and a QSP key. Migration is typically one file: the OpenAI shape covers streaming, tool calling, structured output, and usage accounting without GCP IAM setup.
When GCP IAM, Private Service Connect, BigQuery-as-source RAG, Vertex Search, or sovereign-cloud regions are non-negotiable. Also when you need Gemini's vision / video / native multimodal capabilities or Claude on Vertex for the same procurement reason. QSP is for the chat / coding / reasoning slice where open-source DeepSeek / Qwen is quality-equivalent.
No. QuickSilver Pro is a curated 7-model catalog focused on the highest-quality open-source LLMs at the lowest sustainable per-token price. We don't offer per-customer fine-tunes, embeddings, or non-LLM modalities. Vertex's Model Garden is broader; QSP is deeper on a smaller surface.