QuickSilver Pro system status
Services
Model availability
qwen3.5-35b
deepseek-v3
deepseek-r1
Roadmap - how we become a real inference company
Now - bridge phase while our GPU capacity comes online
Live. A transitional tactic that already saves customers 20% today. Each request is dispatched to the cheapest healthy open-source backend at that instant; because we focus on only three models, the route table stays hot and verified. Downside of the bridge: system_fingerprint cannot be stable, since the backend varies per call. Stable fingerprints land with Phase 2.
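The dispatch rule above can be sketched in a few lines. This is an illustrative model only, not our routing code: the backend names, prices, and health flags are invented for the example.

```python
# Hypothetical sketch of the bridge-phase dispatch rule:
# route each request to the cheapest backend that is healthy right now.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    usd_per_mtok: float  # price per million output tokens (made up)
    healthy: bool

def route(backends: list[Backend]) -> Backend:
    """Return the cheapest healthy backend for this request."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backend for this model")
    return min(candidates, key=lambda b: b.usd_per_mtok)

backends = [
    Backend("provider-a", 0.90, healthy=True),
    Backend("provider-b", 0.75, healthy=False),  # cheapest, but down: skipped
    Backend("provider-c", 0.80, healthy=True),
]
print(route(backends).name)  # provider-c: cheapest among the *healthy* ones
```

Because the winner can change between two calls, the backend serving you (and hence system_fingerprint) varies per request during the bridge phase.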
Q2 2026 - our own inference stack on H100/H200
Planned. Self-hosted serving on dedicated GPUs using SGLang + continuous batching, EAGLE-3 speculative decoding, FP8 quantization via DeepGEMM, and SageAttention / ThunderMLA custom kernels. At that point system_fingerprint becomes stable (it changes only when we rev the stack), and repeatable-seed workflows start working properly. Target: 30-50% below current prices on DeepSeek V3.
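Why a stable fingerprint matters for repeatable-seed workflows: a cached, seeded completion is only worth replaying while the serving stack that produced it is unchanged. A minimal sketch of that check, assuming an OpenAI-style response that carries a seed and a system_fingerprint (the helper and fingerprint values are hypothetical):

```python
# Hedged sketch: a seeded result is comparable across calls only while
# system_fingerprint stays constant. This helper is illustrative, not
# part of any SDK.
def cache_valid(cached_fingerprint: str, current_fingerprint: str,
                cached_seed: int, request_seed: int) -> bool:
    # Same seed AND same serving stack -> deterministic replay is plausible.
    return (cached_fingerprint == current_fingerprint
            and cached_seed == request_seed)

print(cache_valid("fp_abc", "fp_abc", 42, 42))  # True: same stack, same seed
print(cache_valid("fp_abc", "fp_xyz", 42, 42))  # False: stack was revved
```

During the bridge phase the second case is the norm, because each call may land on a different backend; on our own stack the fingerprint changes only with a deliberate stack revision.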
H2 2026 - colocated data center + AIDC partnerships
Future. Move from rented capacity (Vast.ai) to self-owned or colocated racks. Partner with AI-datacenter operators where that makes sense. The goal: the cheapest reliable inference for open-source models on the planet - full stack, our engineering.
About this page. Service rows run client-side probes from your browser. Model rows reflect a real 1-token probe sent server-side every 3 minutes from our backend. Historical bars show the results of recent probes stored in this browser's localStorage; that history is cleared if you switch devices or clear site data.
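The server-side model probe amounts to a 1-token chat completion repeated on a timer. A minimal sketch, assuming an OpenAI-compatible request body; the status thresholds here are illustrative, not the values our monitor actually uses:

```python
# Illustrative sketch of a 1-token model probe: build the smallest
# possible chat-completion request, then classify the outcome.
import json

def build_probe(model: str) -> bytes:
    """Minimal 1-token request body for an OpenAI-compatible endpoint."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,  # one generated token is enough to prove liveness
    }).encode()

def classify(latency_s: float, ok: bool) -> str:
    """Map a probe result to a status bar value (thresholds assumed)."""
    if not ok:
        return "down"
    return "degraded" if latency_s > 5.0 else "up"

# Example status decisions (no network call is made in this sketch):
print(classify(0.8, ok=True))   # up
print(classify(7.2, ok=True))   # degraded
print(classify(1.0, ok=False))  # down
```

Sending this body every 3 minutes and recording the classification yields exactly the per-model history the bars visualize.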
Public uptime tracking began 2026-04-16. For a contractual SLA and third-party-monitored history, contact us.