DeepSeek R1 for reasoning
DeepSeek R1 is an open-source reasoning model trained with RL to emit explicit chain-of-thought. It's competitive with OpenAI o1 on AIME and MATH benchmarks, while costing ~35x less: $0.40 input / $1.70 output per 1M tokens on QuickSilver Pro vs o1's $15 / $60. For math, code challenges, and logic-heavy agent loops, R1 is the open-source default.
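A quick sanity check on that cost multiple, plugging in the per-1M-token rates listed above:

```python
# Per-1M-token prices quoted above (USD).
O1_INPUT, O1_OUTPUT = 15.00, 60.00   # OpenAI o1
R1_INPUT, R1_OUTPUT = 0.40, 1.70     # DeepSeek R1 on QuickSilver Pro

print(f"input:  {O1_INPUT / R1_INPUT:.1f}x cheaper")   # 37.5x
print(f"output: {O1_OUTPUT / R1_OUTPUT:.1f}x cheaper") # 35.3x
```

The ~35x figure tracks the output rate; on input the gap is closer to 37x.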
What R1 is good at
Math: Strong on AIME-2024, MATH-500, and Olympiad-level problems. The reasoning trace walks through derivations; the final answer appears in the content field.
Algorithms: Competitive-programming-grade code generation. LiveCodeBench and Codeforces benchmark scores rival o1. Better than V3 for novel-algorithm tasks; slower because of CoT.
Multi-step planning: Useful in agent loops where the planner needs to decompose before acting. Each planning call has explicit reasoning, which improves tool-use decisions.
When R1 is worth the extra tokens
Use R1 for: math word problems, novel algorithm design, logic puzzles, theorem proving, multi-step tool planning, hard debugging. Tasks where the reasoning step is where the model earns its keep.
Skip R1 for: factual Q&A, code completion, summarization, entity extraction, simple classification, translation. V3 is cheaper, faster, and quality is equivalent on non-reasoning tasks.
Cost calibration: processing a 2000-word essay with V3 takes ~600 output tokens ($0.42 per 1,000 essays). R1 on the same essay takes ~2,500 output tokens including the reasoning trace ($4.25 per 1,000 essays), a roughly 10x premium. Reserve R1 for when that premium buys something.
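That calibration can be reproduced in a few lines. The V3 output rate used here ($0.70 per 1M tokens) is inferred from the $0.42 figure above rather than quoted pricing, and the token counts are the estimates from the text:

```python
def output_cost(tokens_per_call: int, price_per_1m: float, calls: int = 1000) -> float:
    """Output-token cost in USD for a batch of calls (input tokens ignored)."""
    return tokens_per_call * calls * price_per_1m / 1_000_000

v3 = output_cost(600, 0.70)    # V3: no reasoning trace
r1 = output_cost(2500, 1.70)   # R1: reasoning tokens billed as output
print(f"V3 ${v3:.2f} vs R1 ${r1:.2f} per 1000 essays ({r1 / v3:.1f}x)")
```

Because reasoning tokens are billed as output, the trace length dominates R1's cost on short-answer tasks.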
Quickstart code
from openai import OpenAI

client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key="sk-qsp-...",
)

resp = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{
        "role": "user",
        "content": "A box has 12 red and 8 blue balls. Three drawn without replacement. Probability exactly two are red?",
    }],
)

# Chain-of-thought reasoning:
print(resp.choices[0].message.reasoning_content)
# Final answer:
print(resp.choices[0].message.content)
print(f"Output tokens: {resp.usage.completion_tokens}")
print(f"Cost: ${resp.usage.cost:.6f}")

FAQ
Is DeepSeek R1 as good as o1?
On published math (AIME-2024, MATH-500), coding (LiveCodeBench, Codeforces), and reasoning (GPQA Diamond) benchmarks, DeepSeek R1 is within a few points of o1 and exceeds o1-mini on most. For production use at 35x lower cost, it's the open-source equivalent.
How long are the reasoning traces?
Typical range is 500-3000 tokens. For hard problems (IMO-grade math), traces can exceed 5000 tokens. All reasoning tokens are billed as output tokens — account for this in cost projections.
Does R1 support tool calling?
R1 accepts the OpenAI tools array but is less reliable at tool calling than V3. For agent loops, use V3 as the tool-calling executor and invoke R1 only for hard planning sub-problems. This hybrid pattern gets the best of both.
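The routing half of that hybrid pattern is just a model dispatch. A minimal sketch, assuming the model IDs deepseek-r1 and deepseek-v3 and task labels of your own choosing:

```python
# Task labels where the reasoning premium pays for itself (mirrors the
# use/skip lists above); everything else goes to the cheap executor.
REASONING_TASKS = {"math", "algorithm_design", "logic", "theorem_proving",
                   "planning", "debugging"}

def pick_model(task_type: str) -> str:
    """R1 for hard planning sub-problems, V3 for tool calling and the rest."""
    return "deepseek-r1" if task_type in REASONING_TASKS else "deepseek-v3"
```

In the agent loop, the executor call that carries the tools array always goes to V3; pick_model decides when a planning step is worth R1's extra tokens.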
Can I hide the reasoning trace from users?
Yes. Ignore reasoning_content server-side and return only content. You still pay for reasoning tokens because R1 has to generate them to reach the answer — there's no cheap "skip thinking" mode.
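Server-side, hiding the trace is a one-line filter. A sketch assuming the response message is handled as a dict (the sample answer 44/95 is the correct result for the quickstart problem):

```python
def user_facing_answer(message: dict) -> str:
    """Forward only the final answer; the trace was still generated and billed."""
    return message.get("content", "")

msg = {
    "reasoning_content": "C(12,2) * C(8,1) / C(20,3) = 528/1140 = 44/95 ...",
    "content": "44/95",
}
print(user_facing_answer(msg))  # 44/95
```

Keep the reasoning_content in your server logs if you want it for debugging; just never include it in the client-facing payload.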