What you can verify about QuickSilver Pro

The LLM API resale market has a real fraud problem: some providers quietly serve a cheaper or heavily quantized model while billing for the premium one. It works because most users can't tell: you get an answer, not a receipt for which model produced it. We think the answer is verifiability, not promises. Below are the commitments we hold ourselves to, and concrete checks you can run to hold us to them.

Our commitments

The model you request is the model you get

When you call deepseek-r1, you get DeepSeek R1 — never a cheaper model quietly swapped in to widen our margin. The model field on every response echoes back exactly the model ID you sent. We do not reroute requests to a different model behind your back.

Full-precision weights — no quantized substitutes

We serve each model at the publisher's reference precision. We do not silently drop to an 8-bit or 4-bit quantized copy and bill it as the original. Aggressive quantization is the most common way a reseller cuts GPU cost while keeping the same model name on the invoice, and it is the substitution that statistical probes catch least well, so this commitment matters most.

Published, checkable specs

Every model's context window, pricing, and reasoning behavior is documented on its model page and in a machine-readable feed at /pricing.json. There is no hidden 'house' model and no undocumented routing tier.

Transparent pricing you can audit

Per-token rates are public. Every response carries a synthetic usage.cost field computed from those published rates, so you can reconcile your spend per request without trusting a separate billing endpoint. The full catalog is one fetch away at /pricing.json.

Your prompts are not training data

We retain only token counts and timing, the minimum required for billing. Request and response content is not retained and is never used to train any model. The specifics are in our Privacy Policy and Data Processing Addendum.

A real, accountable company

QuickSilver Pro is built by the MachineFi Labs team and operated by MachineFi Inc., a Delaware corporation in Menlo Park, California — the same team behind IoTeX, public infrastructure running since 2017. Not an anonymous operator behind a payment page.

Public status and latency

Per-model uptime and latency are published live at /status. When a model degrades, you can see it — you don't have to take our word that everything is fine.

How to verify what you're getting

Don't take our word for it. None of these checks require anything beyond the OpenAI-compatible API you already use — run them against us, and against any provider you're comparing.

1. Confirm the echoed model ID
Every chat-completions response includes a model field. Confirm it matches what you requested. It's the cheapest check, and a provider doing crude substitution often leaks an inconsistent or generic value here.
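The echo check can be sketched in a few lines. This is a minimal illustration that assumes you already have the parsed JSON body of a chat-completions response as a Python dict; the function name and the sample response bodies are hypothetical.

```python
def model_echo_matches(requested: str, response: dict) -> bool:
    """Return True when the response's model field equals the requested model ID."""
    echoed = response.get("model")
    return isinstance(echoed, str) and echoed == requested


# Hypothetical response bodies, trimmed to the field this check reads.
faithful = {"model": "deepseek-r1", "choices": []}
generic = {"model": "default", "choices": []}

print(model_echo_matches("deepseek-r1", faithful))  # True
print(model_echo_matches("deepseek-r1", generic))   # False
```

A passing echo only rules out crude substitution. A provider can echo the right ID while serving something else, which is why the later checks exist.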
2. Run a distribution test against a reference
The most rigorous text-only check: sample many completions for a fixed prompt set from QuickSilver Pro and from a trusted copy of the same open-weight model (one you run yourself, or one served by another provider), then apply a two-sample statistical test (the published Model Equality Testing method, ICLR 2025). It catches a swapped model and heavy quantization, and it works on our DeepSeek, Qwen, and Kimi models precisely because their weights are open, so a reference is obtainable. When researchers ran this on 31 public Llama endpoints, 11 served output distributions that differed from the publisher's reference weights. We invite the same test on ours.
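To make the idea of a two-sample test concrete, here is a simplified sketch, not the published method's reference implementation. It assumes you have already collected two lists of completions for the same prompt, and it compares them with a maximum mean discrepancy (MMD) statistic under a character-level Hamming kernel plus a permutation test. The kernel choice, function names, and permutation count are illustrative assumptions; consult the paper for the exact procedure and sample sizes.

```python
import random


def hamming_kernel(a: str, b: str) -> int:
    """Count aligned positions where two completions share the same character."""
    return sum(1 for x, y in zip(a, b) if x == y)


def mmd_squared(gram, idx_a, idx_b) -> float:
    """Biased MMD^2 estimate between two index groups of a precomputed Gram matrix."""
    def mean_k(rows, cols):
        return sum(gram[i][j] for i in rows for j in cols) / (len(rows) * len(cols))
    return mean_k(idx_a, idx_a) + mean_k(idx_b, idx_b) - 2 * mean_k(idx_a, idx_b)


def permutation_p_value(samples_a, samples_b, n_permutations=200, seed=0) -> float:
    """Estimate a p-value for the null hypothesis that both sample sets share a distribution."""
    pooled = list(samples_a) + list(samples_b)
    n = len(samples_a)
    gram = [[hamming_kernel(x, y) for y in pooled] for x in pooled]
    indices = list(range(len(pooled)))
    observed = mmd_squared(gram, indices[:n], indices[n:])
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(indices)  # random relabelling of which sample came from which source
        if mmd_squared(gram, indices[:n], indices[n:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


# Toy data standing in for real completions from a reference and an API under test.
reference = ["the answer is 42", "the answer is 41"] * 10
candidate = ["I think it's 42!", "probably 42, yes"] * 10
print(permutation_p_value(reference, candidate) < 0.05)
```

A small p-value suggests the two sources do not produce the same distribution. A large one is not proof of equality, only a failure to detect a difference at that sample size.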
3. Fingerprint the tokenizer with glitch tokens
Glitch tokens — odd under-trained strings like SolidGoldMagikarp — break in ways specific to a model family's tokenizer. Ask the model to repeat a few; the failure pattern fingerprints the family. A model that claims to be DeepSeek but glitches like a Llama is a red flag.
4. Probe reasoning models for a real reasoning trace
Send a hard multi-step problem to a thinking model (deepseek-r1, deepseek-v4-pro). A genuine reasoning model returns a long chain-of-thought and a high usage.completion_tokens; a cheap non-reasoning model swapped in its place returns a short, direct answer.
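A rough way to automate this probe is to look at usage.completion_tokens on the response to a hard problem. The sketch below assumes a parsed response dict; the threshold of 1000 tokens is a hypothetical starting point, not a documented value, and should be calibrated against a reference run of the same prompt.

```python
def looks_like_reasoning_response(response: dict, min_completion_tokens: int = 1000) -> bool:
    """Flag whether a response spent enough completion tokens to plausibly include a reasoning trace."""
    usage = response.get("usage") or {}
    completion_tokens = usage.get("completion_tokens", 0)
    return completion_tokens >= min_completion_tokens


# Hypothetical usage blocks for the same hard multi-step prompt.
long_trace = {"usage": {"prompt_tokens": 180, "completion_tokens": 6400}}
short_answer = {"usage": {"prompt_tokens": 180, "completion_tokens": 95}}

print(looks_like_reasoning_response(long_trace))    # True
print(looks_like_reasoning_response(short_answer))  # False
```

Token counts are only a proxy, so pair this with reading the returned trace itself on a few samples.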
5. Probe the context window
Send a prompt longer than a cheaper model could accept — for example, >200K tokens to a model documented at 1M context. The real model processes it; an under-spec substitute errors or silently truncates.
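One way to detect silent truncation is to plant a marker at the start and at the end of a long prompt and ask for both back. The sketch below only builds the prompt and scores the reply; sending it is left to whatever client you already use. The marker strings, filler word, and function names are hypothetical, and word count is only a loose stand-in for token count, so confirm the real length from usage.prompt_tokens on the response.

```python
def build_context_probe(filler_words: int, head_code: str, tail_code: str) -> str:
    """Build a long prompt with one marker before the filler and one after it."""
    filler = " ".join(["lorem"] * filler_words)
    return (
        f"The first secret code is {head_code}.\n"
        f"{filler}\n"
        f"The second secret code is {tail_code}.\n"
        "Reply with both secret codes, first then second."
    )


def probe_passed(reply: str, head_code: str, tail_code: str) -> bool:
    """A reply passes only if it recovers both markers, so neither end was dropped."""
    return head_code in reply and tail_code in reply


prompt = build_context_probe(250_000, "ALPHA-7", "OMEGA-3")
print(len(prompt.split()) > 250_000)                                # True
print(probe_passed("ALPHA-7 then OMEGA-3", "ALPHA-7", "OMEGA-3"))   # True
print(probe_passed("The code is OMEGA-3", "ALPHA-7", "OMEGA-3"))    # False
```

A reply that returns only one marker, or an error at a length the documented window should accept, is the signal to investigate.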
6. Reconcile usage.cost against /pricing.json
Multiply your token counts by the per-token rates in /pricing.json and confirm the usage.cost field on each response agrees. A mismatch means either a pricing bug or an undisclosed model tier.
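The reconciliation is plain arithmetic: expected cost = prompt tokens × input rate + completion tokens × output rate. The sketch below assumes per-token rates have already been read out of /pricing.json into a dict. The key names input_per_token and output_per_token, the example rates, and the tolerance are hypothetical, so map them to the feed's actual schema and units before relying on the result.

```python
import math


def expected_cost(usage: dict, rates: dict) -> float:
    """Recompute the cost of one response from its token counts and per-token rates."""
    return (
        usage["prompt_tokens"] * rates["input_per_token"]
        + usage["completion_tokens"] * rates["output_per_token"]
    )


def cost_reconciles(usage: dict, rates: dict, rel_tol: float = 1e-6) -> bool:
    """Compare the reported usage.cost with the recomputed value, allowing for float rounding."""
    return math.isclose(usage["cost"], expected_cost(usage, rates), rel_tol=rel_tol, abs_tol=1e-9)


# Hypothetical rates (USD per token) and a hypothetical usage block.
rates = {"input_per_token": 0.5e-6, "output_per_token": 2e-6}
usage = {"prompt_tokens": 1200, "completion_tokens": 800, "cost": 0.0022}

print(cost_reconciles(usage, rates))                       # True
print(cost_reconciles({**usage, "cost": 0.0030}, rates))   # False
```

If the reported cost is rounded to a fixed number of decimals, loosen the tolerance to match rather than treating every rounding difference as a mismatch.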

⚠ One check that does NOT work: asking the model "what model are you?". Models infer their identity from the system prompt and their training data, so a model trained on GPT-4 transcripts may happily claim to be GPT-4. Self-identification proves nothing.

Be honest with yourself about the limits: no end-user text probe proves model identity with certainty, and some quantization can evade even the statistical tests, particularly when it shifts the output distribution only slightly. The only real cryptographic proof is hardware (TEE) attestation, in which the response is signed inside a secure enclave bound to the loaded model. That's provider-side work and it's on our roadmap; we'll publish here when it's live. Until then, running these probes, against us and against anyone you compare us to, makes silent substitution expensive to hide. If a check ever fails on QuickSilver Pro, tell us. We treat it as a P0.


Verify it for yourself

Top up $5, run the checks above, and decide. First deposit matched 100%, up to $50 free.
