Streaming

Server-sent events

Set stream to true in the request and the response is delivered as a stream of OpenAI-format server-sent events (SSE). The wire format and event semantics match OpenAI's, so the official OpenAI SDKs handle the parsing for you.

Python (SDK)

python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key=os.environ["QSP_KEY"],
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a haiku about gradients."}],
    stream=True,
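    # Provider-specific field: create() has no reasoning= keyword, so send it via extra_body.
    extra_body={"reasoning": {"enabled": False}},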
)

for chunk in stream:
    if not chunk.choices:  # some chunks (e.g. usage-only ones) carry no choices
        continue
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

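If you need the complete reply as well as live output, you can accumulate the deltas while printing them. The sketch below assumes only the chunk attributes the SDK exposes (choices, delta.content, finish_reason); collect_stream is a hypothetical helper name, and the demo uses stand-in objects rather than a real response.

```python
from types import SimpleNamespace  # only used for the offline demo at the bottom


def collect_stream(stream, echo=print):
    """Echo content deltas as they arrive and return (full_text, finish_reason).

    `stream` is any iterable of chat.completion.chunk objects, such as the value
    returned by client.chat.completions.create(..., stream=True).
    """
    parts = []
    finish_reason = None
    for chunk in stream:
        if not chunk.choices:  # skip chunks with no choices (e.g. usage-only)
            continue
        choice = chunk.choices[0]
        text = choice.delta.content or ""
        if text:
            echo(text, end="", flush=True)
            parts.append(text)
        if choice.finish_reason is not None:  # set on the last content chunk
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


# Offline demo with hypothetical stand-in chunks (no network involved).
def _chunk(content=None, finish_reason=None):
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=finish_reason)])


demo = [_chunk(""), _chunk("Mercury"), _chunk(" drips"), _chunk(None, "stop"), SimpleNamespace(choices=[])]
text, reason = collect_stream(demo, echo=lambda *args, **kwargs: None)
```

With a real request, pass the SDK stream directly: text, reason = collect_stream(stream).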
Node.js (SDK)

typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.quicksilverpro.io/v1",
  apiKey: process.env.QSP_KEY,
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a haiku about gradients." }],
  stream: true,
  // @ts-expect-error reasoning is provider-specific and not in the SDK's request types
  reasoning: { enabled: false },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Wire format (raw)

If you are writing your own client: each event is a line starting with data: followed by one JSON chunk, and events are separated by a blank line. The stream ends with the sentinel event data: [DONE].

text
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1715800000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1715800000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":"Mercury"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1715800000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":" drips"}}]}

...

data: [DONE]
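A minimal parser for this format, plus a bare-bones client built on the standard library, might look like the sketch below. It assumes one data: line per event (as in the sample), Bearer-token auth as used by the OpenAI SDKs, and the standard /chat/completions path under the base URL; iter_sse_data and stream_chat are hypothetical names, not part of any SDK.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.quicksilverpro.io/v1"


def iter_sse_data(lines):
    """Yield decoded JSON chunks from SSE lines, stopping at data: [DONE].

    Accepts bytes or str lines, with or without trailing newlines.
    Assumes each event carries a single data: line.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if not line or line.startswith(":"):  # blank separator or SSE comment
            continue
        if not line.startswith("data:"):
            continue  # this sketch ignores event:, id:, and retry: fields
        data = line[len("data:"):]
        if data.startswith(" "):  # SSE strips exactly one leading space
            data = data[1:]
        if data == "[DONE]":
            return
        yield json.loads(data)


def stream_chat(messages, model="deepseek-v4-flash"):
    """Hypothetical raw client: POST a streaming request and yield content deltas."""
    body = json.dumps({"model": model, "messages": messages, "stream": True}).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['QSP_KEY']}",  # assumed auth scheme
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # iterating the response yields lines
        for chunk in iter_sse_data(resp):
            for choice in chunk.get("choices", []):
                yield (choice.get("delta") or {}).get("content") or ""
```

For example, for piece in stream_chat([{"role": "user", "content": "Write a haiku about gradients."}]): print(piece, end="", flush=True).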

Thinking models

For V4 Flash, V4 Pro, Qwen 3.6, Kimi K2.6, and R1, the reasoning trace streams in a separate delta field rather than in delta.content. The field depends on the model:

  • R1 / V4 wave: delta.reasoning carries the chain-of-thought deltas and delta.content carries the visible answer.
  • For plain (non-reasoning) chat on V4-wave models, pass reasoning: { enabled: false } in the request body to suppress the trace and get V3-style replies. With the Python SDK, send it as extra_body={"reasoning": {"enabled": False}}, since create() does not accept a reasoning keyword.
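To show the trace and the answer separately, route each delta by field. This sketch assumes the R1 / V4-wave shape (delta.reasoning and delta.content) and works on chunk dicts such as those produced by a raw SSE parser; split_reasoning is a hypothetical helper and the demo chunks are made up.

```python
def split_reasoning(chunks):
    """Accumulate reasoning and answer text separately from parsed chunk dicts.

    Assumes chain-of-thought arrives in delta.reasoning and the visible answer
    in delta.content; other models may use a different trace field.
    """
    reasoning, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("reasoning"):
                reasoning.append(delta["reasoning"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)


# Hypothetical chunks in the R1 / V4-wave shape.
demo = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"index": 0, "delta": {"reasoning": "Gradients slope"}}]},
    {"choices": [{"index": 0, "delta": {"reasoning": " downhill."}}]},
    {"choices": [{"index": 0, "delta": {"content": "Mercury"}}]},
    {"choices": [{"index": 0, "delta": {"content": " drips"}}]},
]
thoughts, reply = split_reasoning(demo)
```

With the Python SDK, the extra field should be reachable as getattr(chunk.choices[0].delta, "reasoning", None), because the SDK's response models keep fields they do not declare; treat that as an assumption to verify against your SDK version.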

Buffering pitfalls

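  • If you proxy through nginx, set proxy_buffering off; in the location block that proxies to the API (the directive is valid in http, server, and location contexts, not in an upstream block). Otherwise nginx buffers the response and can hold chunks until the buffer fills or the response completes, which defeats streaming.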
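  • In Next.js, App Router route handlers can return a Response wrapping a ReadableStream with Content-Type: text/event-stream, and it streams natively. In Pages Router API routes, write events to res, and you may need to call res.flushHeaders() so the headers go out before the first chunk.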
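  • CloudFront and other CDN edges may buffer or cache SSE. Serve the stream from a path with caching disabled and send Cache-Control: no-cache. The X-Accel-Buffering: no response header additionally disables buffering in any nginx layer along the path.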
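If you relay the stream through your own Python service, the points above come down to sending anti-buffering headers and flushing each event as soon as it is produced. The standard-library sketch below is a hypothetical relay: demo_events stands in for your upstream source (for example, the chunks read from the API), and the handler simply re-emits them in the same data: format.

```python
import json
from http.server import BaseHTTPRequestHandler

# Headers that discourage caches and proxies from holding the stream.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",  # honored by nginx; other layers may ignore it
}


def format_sse(data):
    """Encode one SSE event: a data: line followed by the blank separator line."""
    payload = data if isinstance(data, str) else json.dumps(data, separators=(",", ":"))
    return f"data: {payload}\n\n".encode("utf-8")


def demo_events():
    """Hypothetical upstream: a few chunk dicts, then the [DONE] sentinel."""
    for word in ["Mercury", " drips"]:
        yield {"object": "chat.completion.chunk",
               "choices": [{"index": 0, "delta": {"content": word}}]}
    yield "[DONE]"


class SSEHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        for name, value in SSE_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()  # no Content-Length: the body ends when the connection closes
        for event in demo_events():
            self.wfile.write(format_sse(event))
            self.wfile.flush()  # push each event out immediately instead of batching
```

To try it, run ThreadingHTTPServer(("127.0.0.1", 8000), SSEHandler).serve_forever() (ThreadingHTTPServer is in http.server) and watch the events arrive with curl -N http://127.0.0.1:8000/.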