Tool calling

Function calling

Pass tools to the chat-completions endpoint and the model can request that you call them. The wire format matches OpenAI's: the assistant message carries tool_calls, and you send each result back as a role: tool message.

Best models for tool use

  • DeepSeek V3 — production default for tool-calling agents. Reliable JSON args, low latency, no reasoning trace cluttering the response.
  • DeepSeek V4 Flash — also strong; thinks by default so you may want reasoning.enabled=false to keep tool selection fast.
  • Kimi K2.6 — agentic / planning workloads where tool chaining benefits from longer deliberation.
  • Avoid R1 for pure tool calling: the chain-of-thought trace means you pay for tokens you don't need.

Python — single tool

python
import os, json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key=os.environ["QSP_KEY"],
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

resp = client.chat.completions.create(
    model="deepseek-v3",
    messages=messages,
    tools=tools,
)

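msg = resp.choices[0].message
if msg.tool_calls:
    # Keep the assistant turn that requested the calls in the history.
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        # ... call your tool ...
        result = {"city": args["city"], "temp_c": 22, "conditions": "clear"}
        # Reply to each call, matched by its id.
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    # Second request: the model turns the tool output into a final answer.
    final = client.chat.completions.create(model="deepseek-v3", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    # The model answered directly without calling a tool.
    print(msg.content)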

Parallel tool calls

When the model wants to call multiple tools in one turn, it returns multiple entries in tool_calls. Resolve them in any order; reply with one role: tool message per call, each referencing the matching tool_call_id.
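
The one-reply-per-call rule can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the service's reference implementation: the tool functions, the TOOL_REGISTRY name, and the call ids are hypothetical, and the tool calls are written as plain dicts in the wire format so the snippet runs offline. With the SDK you would read the same fields by attribute access (call.id, call.function.name, call.function.arguments).

```python
import json

# Hypothetical local tool implementations, keyed by tool name.
def get_weather(city):
    return {"city": city, "temp_c": 22, "conditions": "clear"}

def get_time(city):
    return {"city": city, "local_time": "09:00"}

TOOL_REGISTRY = {"get_weather": get_weather, "get_time": get_time}

def resolve_tool_calls(tool_calls, registry=TOOL_REGISTRY):
    """Return one role: tool message per entry in tool_calls."""
    replies = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"])
            result = registry[name](**args)
        except Exception as exc:
            # Report the failure to the model instead of dropping the reply,
            # so every tool_call_id still gets an answer.
            result = {"error": f"{type(exc).__name__}: {exc}"}
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies

# An assistant turn that requested two tools at once (illustrative ids).
example_calls = [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
    {"id": "call_2", "type": "function",
     "function": {"name": "get_time", "arguments": '{"city": "Osaka"}'}},
]
replies = resolve_tool_calls(example_calls)
```

Append the assistant message first, then extend the history with replies before making the follow-up request.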

Streaming tool calls

With stream=true, tool calls arrive as deltas just like content. The function name is streamed once; arguments accumulate across chunks as JSON fragments. You typically buffer until finish_reason=tool_calls before executing.
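
A minimal sketch of that buffering, assuming the chunk layout follows the OpenAI-compatible streaming format: each tool-call delta carries an index identifying which call it belongs to, the id and name appear on the first delta, and later deltas carry only argument fragments. The helper name accumulate_tool_calls and the simulated chunks are hypothetical; chunks are plain dicts here so the example runs without a network call.

```python
import json

def accumulate_tool_calls(chunks):
    """Merge streamed tool-call deltas; parse arguments only once complete."""
    calls = {}  # delta index -> partially assembled call
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        for delta in choice["delta"].get("tool_calls") or []:
            slot = calls.setdefault(
                delta["index"], {"id": None, "name": None, "arguments": ""}
            )
            if delta.get("id"):
                slot["id"] = delta["id"]
            fn = delta.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]  # the name is streamed once
            slot["arguments"] += fn.get("arguments") or ""  # JSON fragments
        finish_reason = choice.get("finish_reason") or finish_reason

    if finish_reason != "tool_calls":
        return []  # the model answered with content, or the stream was cut off
    return [
        {
            "id": calls[i]["id"],
            "name": calls[i]["name"],
            "arguments": json.loads(calls[i]["arguments"]),
        }
        for i in sorted(calls)
    ]

# Simulated stream: one call whose arguments arrive in two fragments.
simulated_chunks = [
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_1",
        "function": {"name": "get_weather", "arguments": ""}}]},
        "finish_reason": None}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0,
        "function": {"arguments": '{"city": '}}]}, "finish_reason": None}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0,
        "function": {"arguments": '"Tokyo"}'}}]}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
calls = accumulate_tool_calls(simulated_chunks)
```

Parsing is deferred until the end because an individual fragment such as '{"city": ' is not valid JSON on its own.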

Strict mode

For tighter schemas, set strict: true inside the function object. The model is then constrained to produce arguments that match your JSON schema exactly, with no fields outside the schema. See Structured output for the equivalent on plain replies.
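
As a sketch, a strict version of the get_weather tool might look like the following. The placement of strict inside the function object comes from the text above; setting additionalProperties to false and listing every property under required follow the OpenAI strict-mode convention and are assumptions here, not confirmed requirements of this endpoint. The schema_violations helper is hypothetical: a small client-side check, not the server's enforcement.

```python
strict_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "strict": True,  # strict sits inside the function object
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            # Assumption: borrowed from OpenAI's strict-mode rules.
            "additionalProperties": False,
        },
    },
}

def schema_violations(args, tool):
    """Return (unknown_fields, missing_fields) for parsed tool arguments."""
    params = tool["function"]["parameters"]
    unknown = sorted(set(args) - set(params["properties"]))
    missing = sorted(set(params["required"]) - set(args))
    return unknown, missing

ok = schema_violations({"city": "Tokyo"}, strict_tool)
extra = schema_violations({"city": "Tokyo", "units": "c"}, strict_tool)
```

Pass [strict_tool] as the tools argument in place of the non-strict definition. A cheap local check like this can still be useful as a guard before executing a tool with side effects.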