Streaming
Real-time streaming via Server-Sent Events (SSE).
How It Works
CanaryLLM uses a queue-based streaming system. All requests go through the queue first, then you stream the results.
- Submit a request with stream: true to POST /api/llm/complete
- Receive a queueId immediately
- Connect to POST /api/llm/queue/stream with the queueId to receive SSE events
Queue Lifecycle
queued → processing → completed
A task can also end with error or cancelled status.
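The lifecycle can be modelled as a small union type. This is a minimal sketch that assumes the status strings are spelled exactly as listed above; the names QueueStatus and isTerminal are illustrative and not part of the CanaryLLM API.

```typescript
// Status names taken from the lifecycle above; the type and helper names are hypothetical.
type QueueStatus = "queued" | "processing" | "completed" | "error" | "cancelled";

// Returns true once a task has reached a state it cannot leave.
function isTerminal(status: QueueStatus): boolean {
  switch (status) {
    case "completed":
    case "error":
    case "cancelled":
      return true;
    default:
      return false;
  }
}
```

A client could use such a check to decide when to stop listening for further events on a task.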
1. Submit with streaming
bash
curl -X POST https://canaryllm.canarycoders.es/api/llm/complete \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "messages": [{ "role": "user", "content": "Write a haiku" }],
    "stream": true
  }'
2. Connect to stream
bash
curl -X POST https://canaryllm.canarycoders.es/api/llm/queue/stream \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "queueId": "abc123-def456" }'
SSE Events
| Event | Description | Data |
|---|---|---|
| start | Stream started | { queueId } |
| chunk | Text or tool call chunk | { delta, finishReason, toolCallDeltas? } |
| done | Stream completed | {} |
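The event names in the table can be paired with their payloads using a small line parser. The sketch below assumes each event carries a single JSON data line, which matches the samples on this page but is not guaranteed by it; SseEvent and parseSseLines are hypothetical names.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Hypothetical helper: turns complete SSE lines into { event, data } pairs.
// Assumes one JSON "data:" line per event, as in the samples on this page.
function parseSseLines(lines: string[]): SseEvent[] {
  const events: SseEvent[] = [];
  let currentEvent = "message"; // SSE default when no "event:" line precedes the data
  for (const rawLine of lines) {
    const line = rawLine.replace(/\r$/, ""); // tolerate CRLF line endings
    const match = /^(event|data): ?(.*)$/.exec(line);
    if (!match) continue; // blank lines, comments, and other fields are ignored
    if (match[1] === "event") {
      currentEvent = match[2];
    } else {
      events.push({ event: currentEvent, data: JSON.parse(match[2]) });
      currentEvent = "message"; // reset for the next event
    }
  }
  return events;
}
```

Keeping the event name alongside the data lets a client treat done as the end of the stream instead of inferring it from an empty payload.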
typescript
event: start
data: {"queueId":"abc123-def456"}
event: chunk
data: {"delta":"Autumn ","finishReason":null}
event: chunk
data: {"delta":"leaves ","finishReason":null}
event: chunk
data: {"delta":"falling","finishReason":"stop","metadata":{...}}
event: done
data: {}
Streaming Tool Calls
When the model invokes tools during streaming, chunks carry toolCallDeltas and an empty delta instead of text. The final chunk has finishReason: "tool_calls".
typescript
event: start
data: {"queueId":"abc123-def456"}
event: chunk
data: {"delta":"","toolCallDeltas":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"get_weather","arguments":""}}],"finishReason":null}
event: chunk
data: {"delta":"","toolCallDeltas":[{"index":0,"function":{"arguments":"{\"loc"}}],"finishReason":null}
event: chunk
data: {"delta":"","toolCallDeltas":[{"index":0,"function":{"arguments":"ation\":\"Brussels\"}"}}],"finishReason":null}
event: chunk
data: {"delta":"","finishReason":"tool_calls"}
event: done
data: {}
Each entry in toolCallDeltas contains: index (tool call position), id (first chunk only), type (first chunk only), function.name (first chunk only in the sample above), and function.arguments (streamed incrementally). Concatenate the argument chunks for each index to reconstruct the full JSON.
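Reassembling the arguments can be sketched as follows. The field shapes are inferred from the sample stream above, and which fields are optional is an assumption; ToolCallDelta, ToolCall, and applyToolCallDeltas are hypothetical names.

```typescript
// Shapes inferred from the sample stream; field optionality is an assumption.
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  type: string;
  name: string;
  arguments: string; // raw JSON text, complete once finishReason is "tool_calls"
}

// Hypothetical helper: merges one chunk's toolCallDeltas into the running list, keyed by index.
function applyToolCallDeltas(calls: ToolCall[], deltas: ToolCallDelta[]): void {
  for (const delta of deltas) {
    let call = calls[delta.index];
    if (call === undefined) {
      call = { id: "", type: "", name: "", arguments: "" };
      calls[delta.index] = call;
    }
    if (delta.id !== undefined) call.id = delta.id;
    if (delta.type !== undefined) call.type = delta.type;
    const fn = delta.function;
    if (fn) {
      if (fn.name !== undefined) call.name = fn.name;
      if (fn.arguments !== undefined) call.arguments += fn.arguments; // append the fragment
    }
  }
}
```

Call it once per chunk event, then run JSON.parse on each call's arguments only after the chunk with finishReason: "tool_calls" arrives, since earlier fragments are not valid JSON on their own.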
TypeScript Example
typescript
// Uses top-level await, so run this as an ES module (Node 18+ provides a global fetch).
const BASE_URL = "https://canaryllm.canarycoders.es/api/llm";
const API_KEY = "clk_live_your_key_here";

// 1. Submit request
const { data } = await fetch(`${BASE_URL}/complete`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    provider: "openai",
    model: "gpt-4.1-mini",
    messages: [{ role: "user", content: "Write a haiku" }],
    stream: true,
  }),
}).then(r => r.json());

// 2. Stream the result
const response = await fetch(`${BASE_URL}/queue/stream`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ queueId: data.queueId }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Decode incrementally and keep any trailing partial line in the buffer
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() || "";
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const json = JSON.parse(line.slice(6));
      // Print text deltas; payloads without a delta (start, done) are skipped
      if (json.delta) process.stdout.write(json.delta);
    }
  }
}
Cancellation
Cancel a running or queued task at any time:
bash
curl -X POST https://canaryllm.canarycoders.es/api/llm/queue/cancel \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "queueId": "abc123-def456" }'
The provider request will be aborted via AbortSignal if still in progress.
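The same cancel call can be prepared from TypeScript. This sketch only mirrors the curl command above; buildCancelRequest is a hypothetical helper, not part of any CanaryLLM SDK.

```typescript
interface CancelRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the fetch arguments for the cancel endpoint shown above.
function buildCancelRequest(apiKey: string, queueId: string): CancelRequest {
  return {
    url: "https://canaryllm.canarycoders.es/api/llm/queue/cancel",
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ queueId }),
    },
  };
}
```

The result can be passed straight to fetch(url, init). The response body format is not documented here, so the sketch leaves it unparsed.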