What is DeepSeek R1 good for?

DeepSeek R1 is a reasoning model trained with reinforcement learning to produce explicit chain-of-thought before answering. It excels at math (AIME, MATH), competitive programming (Codeforces), logic puzzles, formal proofs, and multi-step planning. For tasks where the answer's quality depends on the reasoning process, R1 outperforms non-reasoning models like DeepSeek V3 at the cost of 3-5x more output tokens.

How does DeepSeek R1 pricing compare to OpenAI o1?

OpenAI o1 costs $15 per million input tokens and $60 per million output tokens. DeepSeek R1 on QuickSilver Pro costs $0.56 input and $2.00 output per million tokens. For the same workload, R1 is ~27x cheaper on input and ~30x cheaper on output — with comparable math and coding benchmark performance.
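The ratios quoted above follow directly from the list prices. A minimal sketch of the arithmetic, using only the per-million-token prices stated in this answer (the variable names are illustrative):

```python
# Prices in USD per 1M tokens, as quoted in the answer above.
O1_INPUT, O1_OUTPUT = 15.00, 60.00
R1_INPUT, R1_OUTPUT = 0.56, 2.00

input_ratio = O1_INPUT / R1_INPUT     # about 26.8
output_ratio = O1_OUTPUT / R1_OUTPUT  # 30.0

print(f"Input:  {input_ratio:.1f}x cheaper")
print(f"Output: {output_ratio:.1f}x cheaper")
```

Because reasoning workloads are dominated by output tokens, the output ratio is usually the one that matters for a real bill.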

How do I access the reasoning trace?

DeepSeek R1 returns a reasoning_content field in the message object alongside content. reasoning_content holds the chain-of-thought trace; content holds the final answer. Both are billed as output tokens, so discarding reasoning_content when you only need the answer does not reduce the cost.

Is R1 overkill for simple questions?

Yes. R1 generates a long chain-of-thought even for trivial questions, which is wasted output cost. For factual Q&A, simple summarization, or casual chat, use DeepSeek V3 ($0.616 per 1M output) instead of R1 ($2.00 per 1M output). Reserve R1 for problems where the reasoning step materially changes the answer quality.


Use case

DeepSeek R1 for reasoning

DeepSeek R1 is an open-source reasoning model trained with RL to emit explicit chain-of-thought. It's competitive with OpenAI o1 on AIME and MATH benchmarks, while costing ~30x less: $0.56 input / $2.00 output per 1M tokens on QuickSilver Pro vs o1's $15 / $60. For math, code challenges, and logic-heavy agent loops, R1 is the open-source default.


What R1 is good at

Math: Strong on AIME-2024, MATH-500, and Olympiad-level problems. The reasoning trace walks through derivations; final answer appears in content.

Algorithms: Competitive-programming-grade code generation. LiveCodeBench and Codeforces benchmark scores rival o1's. Better than V3 for novel-algorithm tasks, but slower because of the chain-of-thought.

Multi-step planning: Useful in agent loops where the planner needs to decompose before acting. Each planning call has explicit reasoning, which improves tool-use decisions.

When R1 is worth the extra tokens

Use R1 for: math word problems, novel algorithm design, logic puzzles, theorem proving, multi-step tool planning, hard debugging. These are tasks where the reasoning step is where the model earns its keep.

Skip R1 for: factual Q&A, code completion, summarization, entity extraction, simple classification, translation. V3 is cheaper and faster, with equivalent quality on non-reasoning tasks.
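The use/skip split above can be expressed as a small routing function. This is only a sketch: the task labels, the function name, and the "deepseek-v3" model id are assumptions made for illustration ("deepseek-r1" is the id used in the quickstart on this page).

```python
# Hypothetical task labels derived from the "Use R1 for" list above.
REASONING_TASKS = {
    "math_word_problem",
    "algorithm_design",
    "logic_puzzle",
    "theorem_proving",
    "tool_planning",
    "hard_debugging",
}

def choose_model(task_type: str) -> str:
    """Return the R1 model id for reasoning-heavy tasks, else the V3 id."""
    # "deepseek-v3" is an assumed model id; check your provider's model list.
    return "deepseek-r1" if task_type in REASONING_TASKS else "deepseek-v3"

print(choose_model("logic_puzzle"))   # reasoning-heavy task
print(choose_model("summarization"))  # non-reasoning task
```

Defaulting to V3 for anything not explicitly listed keeps unknown task types on the cheaper model.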

Cost calibration: a 2000-word essay takes V3 ~600 output tokens ($0.37 per 1000 essays at $0.616 per 1M output tokens). R1 on the same essay takes ~2500 output tokens including the reasoning trace ($5.00 per 1000 essays at $2.00 per 1M output tokens). That is roughly a 13.5x premium, so reserve R1 for cases where the premium buys something.
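The calibration above can be reproduced with a few lines of arithmetic. A minimal sketch, assuming only output tokens are counted (input tokens are ignored) and using the output prices quoted on this page; the helper name is illustrative:

```python
def cost_per_1000(output_tokens: int, usd_per_million: float) -> float:
    """Output-token cost in USD for 1000 requests of the given length."""
    return output_tokens * 1000 * usd_per_million / 1_000_000

v3_cost = cost_per_1000(600, 0.616)  # V3: ~600 output tokens per essay
r1_cost = cost_per_1000(2500, 2.00)  # R1: ~2500 output tokens incl. reasoning trace

print(f"V3: ${v3_cost:.2f}  R1: ${r1_cost:.2f}  premium: {r1_cost / v3_cost:.1f}x")
```

Swapping in your own measured token counts gives a quick estimate of the premium for a specific workload.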

Quickstart code

python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key="sk-qsp-...",
)

resp = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{
        "role": "user",
        "content": "A box has 12 red and 8 blue balls. Three drawn without replacement. Probability exactly two are red?",
    }],
)

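# Chain-of-thought trace (reasoning_content is not part of the standard OpenAI schema):
print(resp.choices[0].message.reasoning_content)

# Final answer:
print(resp.choices[0].message.content)

# Both the trace and the answer are billed as output tokens.
print(f"Output tokens: {resp.usage.completion_tokens}")
# usage.cost is a provider extension, not a standard OpenAI usage field.
print(f"Cost: ${resp.usage.cost:.6f}")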

FAQ

Is DeepSeek R1 as good as o1?

On published math (AIME-2024, MATH-500), coding (LiveCodeBench, Codeforces), and reasoning (GPQA Diamond) benchmarks, DeepSeek R1 is within a few points of o1 and exceeds o1-mini on most. For production use at 30x lower cost, it's the open-source equivalent.

How long are the reasoning traces?

Typical range is 500-3000 tokens. For hard problems (IMO-grade math), traces can exceed 5000 tokens. All reasoning tokens are billed as output tokens — account for this in cost projections.

Does R1 support tool calling?

R1 accepts the OpenAI tools array but is less reliable at tool calling than V3. For agent loops, use V3 as the tool-calling executor and invoke R1 only for hard planning sub-problems. This hybrid pattern gets the best of both.
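The hybrid pattern described above might look like the following sketch. Everything here is hypothetical scaffolding: `call_model` stands in for whatever wrapper you write around the chat completions call, the `needs_planning` flag is an invented convention, and "deepseek-v3" is an assumed model id. A stub is used so the example runs offline.

```python
from typing import Callable

def run_step(step: dict, call_model: Callable[[str, str], str]) -> str:
    """Route one agent step: R1 for hard planning, V3 for execution."""
    if step.get("needs_planning"):
        # R1 decomposes the problem; its final answer becomes the plan text.
        plan = call_model("deepseek-r1", step["prompt"])
        # V3 then carries out the plan (this is where tool calls would happen).
        return call_model(
            "deepseek-v3",
            f"Follow this plan:\n{plan}\n\nTask: {step['prompt']}",
        )
    return call_model("deepseek-v3", step["prompt"])

# Stub model call that records which model was used.
calls = []

def fake_call(model: str, prompt: str) -> str:
    calls.append(model)
    return f"[{model}] ok"

run_step({"prompt": "Book a flight", "needs_planning": False}, fake_call)
run_step({"prompt": "Plan a 3-city trip under budget", "needs_planning": True}, fake_call)
print(calls)
```

Passing the model call in as a parameter keeps the routing logic testable without network access; in production it would wrap the real client.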

Can I hide the reasoning trace from users?

Yes. Ignore reasoning_content server-side and return only content. You still pay for reasoning tokens because R1 has to generate them to reach the answer — there's no cheap "skip thinking" mode.
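Dropping the trace server-side can be as simple as filtering one key before the message leaves your backend. A minimal sketch, assuming the assistant message has already been converted to a plain dict; the function name and the sample message are made up for illustration:

```python
def public_message(message: dict) -> dict:
    """Return a copy of an assistant message without the reasoning trace."""
    return {k: v for k, v in message.items() if k != "reasoning_content"}

# Illustrative message shape; the field values are invented for this example.
raw = {
    "role": "assistant",
    "reasoning_content": "First count the ways to pick two red balls...",
    "content": "The probability is 44/95.",
}

print(public_message(raw))
```

The original dict is left untouched, so the trace can still be logged internally for debugging while only content reaches the user.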
