DeepSeek R1 for reasoning
DeepSeek R1 is an open-source reasoning model trained with RL to emit explicit chain-of-thought. It is competitive with OpenAI o1 on the AIME and MATH benchmarks while costing roughly 27-30x less: $0.56 input / $2.00 output per 1M tokens on QuickSilver Pro vs o1's $15 / $60. For math, code challenges, and logic-heavy agent loops, R1 is the open-source default.
What R1 is good at
Math: Strong on AIME-2024, MATH-500, and Olympiad-level problems. The reasoning trace walks through the derivation; the final answer appears in content.
Algorithms: Competitive-programming-grade code generation. LiveCodeBench and Codeforces benchmark scores rival o1. Better than V3 on novel-algorithm tasks, but slower because of the chain-of-thought.
Multi-step planning: Useful in agent loops where the planner needs to decompose before acting. Each planning call has explicit reasoning, which improves tool-use decisions.
When R1 is worth the extra tokens
Use R1 for: math word problems, novel algorithm design, logic puzzles, theorem proving, multi-step tool planning, hard debugging. These are tasks where the reasoning step is where the model earns its keep.
Skip R1 for: factual Q&A, code completion, summarization, entity extraction, simple classification, translation. V3 is cheaper and faster, with equivalent quality on non-reasoning tasks.
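The split above can be written as a small routing table. This is a minimal sketch, not a prescribed API: the task labels paraphrase the two lists, and `deepseek-v3` is an assumed model id for V3 that does not appear in this document (only `deepseek-r1` comes from the quickstart).

```python
# Task labels paraphrase the "Use R1 for" list; they are illustrative only.
REASONING_TASKS = {
    "math_word_problem",
    "algorithm_design",
    "logic_puzzle",
    "theorem_proving",
    "tool_planning",
    "hard_debugging",
}

R1_MODEL = "deepseek-r1"  # model id used in the quickstart
V3_MODEL = "deepseek-v3"  # assumed id; check your provider's model list


def pick_model(task_type: str) -> str:
    """Return R1 for reasoning-heavy tasks and V3 for everything else."""
    return R1_MODEL if task_type in REASONING_TASKS else V3_MODEL


print(pick_model("theorem_proving"))
print(pick_model("summarization"))
```

Defaulting to V3 keeps unknown task types on the cheaper model; flip the default if misrouting a hard task costs more than the extra tokens.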
Cost calibration: a 2000-word essay takes V3 ~600 output tokens ($0.37/1000 essays). R1 on the same essay takes ~2500 output tokens including the reasoning trace, which at $2.00 per 1M output tokens comes to $5.00/1000 essays. That is roughly a 13x premium ($5.00 / $0.37). Reserve R1 for when that premium buys something.
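The arithmetic behind those figures, as a quick check. The R1 price and the token counts are the ones quoted above; the V3 per-token price is not stated in this document, so the sketch only backs out the price implied by the $0.37 figure.

```python
R1_OUTPUT_PRICE = 2.00  # USD per 1M output tokens, from the pricing above


def batch_cost(tokens_per_call: int, price_per_million: float, calls: int = 1000) -> float:
    """Output-token cost in USD for `calls` requests."""
    return tokens_per_call * calls * price_per_million / 1_000_000


r1_cost = batch_cost(2500, R1_OUTPUT_PRICE)
v3_cost = 0.37  # USD per 1,000 essays, as quoted above

# V3 output price implied by the quoted figure (not a published price).
v3_implied_price = v3_cost / (600 * 1000) * 1_000_000

print(f"R1: ${r1_cost:.2f} per 1,000 essays")
print(f"V3 implied price: ${v3_implied_price:.2f} per 1M output tokens")
print(f"Premium: {r1_cost / v3_cost:.1f}x")
```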
Quickstart code
from openai import OpenAI

client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key="sk-qsp-...",
)

resp = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{
        "role": "user",
        "content": "A box has 12 red and 8 blue balls. Three drawn without replacement. Probability exactly two are red?",
    }],
)

# Chain-of-thought reasoning:
print(resp.choices[0].message.reasoning_content)
# Final answer:
print(resp.choices[0].message.content)
print(f"Output tokens: {resp.usage.completion_tokens}")
print(f"Cost: ${resp.usage.cost:.6f}")
FAQ
On published math (AIME-2024, MATH-500), coding (LiveCodeBench, Codeforces), and reasoning (GPQA Diamond) benchmarks, DeepSeek R1 is within a few points of o1 and exceeds o1-mini on most. For production use at 30x lower cost, it's the open-source equivalent.
Reasoning traces typically run 500-3000 tokens. For hard problems (IMO-grade math), traces can exceed 5000 tokens. All reasoning tokens are billed as output tokens, so account for them in cost projections.
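A rough per-request projection across that trace range. This is a sketch using the R1 prices quoted at the top of this page; the 200-token prompt and 150-token answer are hypothetical values chosen for illustration.

```python
INPUT_PRICE = 0.56   # USD per 1M input tokens, from the pricing above
OUTPUT_PRICE = 2.00  # USD per 1M output tokens, from the pricing above


def request_cost(input_tokens: int, answer_tokens: int, reasoning_tokens: int) -> float:
    """Cost of one R1 call; reasoning tokens are billed as output tokens."""
    output_tokens = answer_tokens + reasoning_tokens
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# Hypothetical request: 200-token prompt, 150-token final answer.
for trace in (500, 3000, 5000):
    print(f"trace={trace:>4} tokens -> ${request_cost(200, 150, trace):.6f}")
```

With these assumed sizes the trace dominates the bill, so projections should budget for the long end of the range rather than the average.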
R1 accepts the OpenAI tools array but is less reliable at tool calling than V3. For agent loops, use V3 as the tool-calling executor and invoke R1 only for hard planning sub-problems. This hybrid pattern gets the best of both.
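One way the hybrid pattern might look in code. This is a sketch under stated assumptions: `complete` is a hypothetical callable standing in for a chat-completions call, `deepseek-v3` is an assumed model id for V3, and the `is_hard` flag is left to the caller to decide.

```python
from typing import Callable

# (model, prompt) -> text. In practice this would wrap a chat-completions call.
Complete = Callable[[str, str], str]

PLANNER = "deepseek-r1"   # model id from the quickstart
EXECUTOR = "deepseek-v3"  # assumed id for V3; not given in this document


def run_step(task: str, is_hard: bool, complete: Complete) -> str:
    """Execute one agent step, calling R1 first only when the step is hard."""
    prompt = task
    if is_hard:
        # R1 produces a plan; V3 then acts on it and owns any tool calls.
        plan = complete(PLANNER, f"Write a step-by-step plan for: {task}")
        prompt = f"{task}\n\nFollow this plan:\n{plan}"
    return complete(EXECUTOR, prompt)


# Stub backend so the sketch runs without network access.
calls = []


def fake_complete(model: str, prompt: str) -> str:
    calls.append(model)
    return f"[{model}] ok"


print(run_step("Rename a file", is_hard=False, complete=fake_complete))
print(run_step("Schedule 40 jobs under constraints", is_hard=True, complete=fake_complete))
print(calls)
```

Easy steps cost one V3 call; hard steps cost one R1 call plus one V3 call, so the R1 premium is paid only where planning is needed.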
To hide the reasoning from end users, ignore reasoning_content server-side and return only content. You still pay for reasoning tokens because R1 has to generate them to reach the answer; there's no cheap "skip thinking" mode.
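A minimal sketch of that server-side filter, assuming the assistant message has already been converted to a plain dict with the same field names the quickstart reads. The sample message is made up for illustration.

```python
def public_message(message: dict) -> dict:
    """Return a copy of an assistant message without the reasoning trace."""
    return {k: v for k, v in message.items() if k != "reasoning_content"}


# Hypothetical message shape, mirroring the fields used in the quickstart.
raw = {
    "role": "assistant",
    "reasoning_content": "C(12,2) * C(8,1) / C(20,3) = 528 / 1140 ...",
    "content": "44/95",
}

print(public_message(raw))
```

The function returns a copy, so the full message (trace included) stays available for server-side logging or debugging.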