Streaming

Real-time streaming via Server-Sent Events (SSE).

How It Works

CanaryLLM uses a queue-based streaming system. Every request is placed on the queue first; you then stream its result from the queue.

  1. Submit request with stream: true to POST /api/llm/complete
  2. Receive a queueId immediately
  3. Connect to POST /api/llm/queue/stream with the queueId to receive SSE events

Queue Lifecycle

queued → processing → completed

A task can also end with error or cancelled status.
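
The lifecycle above can be modelled as a small union type. This is a minimal sketch: the status strings come from this page, while the names QueueStatus and isTerminal are illustrative and not part of the CanaryLLM API.

```typescript
// Status values as listed in the lifecycle above.
type QueueStatus = "queued" | "processing" | "completed" | "error" | "cancelled";

// Statuses after which a task will not change state again.
const TERMINAL_STATUSES: ReadonlySet<QueueStatus> = new Set<QueueStatus>([
  "completed",
  "error",
  "cancelled",
]);

// Hypothetical helper: true once a task has reached a final state.
function isTerminal(status: QueueStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}
```

A client could use a check like isTerminal to decide when to stop waiting on a task.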

1. Submit with streaming

bash
curl -X POST https://canaryllm.canarycoders.es/api/llm/complete \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "messages": [{ "role": "user", "content": "Write a haiku" }],
    "stream": true
  }'

2. Connect to stream

bash
curl -X POST https://canaryllm.canarycoders.es/api/llm/queue/stream \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "queueId": "abc123-def456" }'

SSE Events

| Event | Description | Data |
| --- | --- | --- |
| start | Stream started | { queueId } |
| chunk | Text or tool call chunk | { delta, finishReason, toolCallDeltas? } |
| done | Stream completed | {} |

text
event: start
data: {"queueId":"abc123-def456"}

event: chunk
data: {"delta":"Autumn ","finishReason":null}

event: chunk
data: {"delta":"leaves ","finishReason":null}

event: chunk
data: {"delta":"falling","finishReason":"stop","metadata":{...}}

event: done
data: {}
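
To dispatch on the event name rather than only reading data lines, the stream text can be split into events at blank lines. The following is a rough sketch, assuming each event carries one event: field and JSON in its data: field as in the sample above; SseEvent and parseSseEvents are illustrative names, not part of any CanaryLLM SDK.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Parses a complete SSE text payload into a list of named events.
function parseSseEvents(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line; normalise CRLF line endings first.
  for (const block of text.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // Leading whitespace is dropped, which is harmless for JSON payloads.
        dataLines.push(line.slice(5).trimStart());
      }
    }
    // Blocks without data (for example trailing whitespace) are skipped.
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}

const sample = [
  "event: start",
  'data: {"queueId":"abc123-def456"}',
  "",
  "event: chunk",
  'data: {"delta":"Autumn ","finishReason":null}',
  "",
  "event: done",
  "data: {}",
  "",
].join("\n");

const parsed = parseSseEvents(sample);
```

This version expects the whole payload at once; for a live stream, buffer incoming text and parse only the blocks that are already terminated by a blank line.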

Streaming Tool Calls

When the model invokes tools during streaming, chunks carry toolCallDeltas alongside an empty delta. The final chunk has finishReason: "tool_calls".

text
event: start
data: {"queueId":"abc123-def456"}

event: chunk
data: {"delta":"","toolCallDeltas":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"get_weather","arguments":""}}],"finishReason":null}

event: chunk
data: {"delta":"","toolCallDeltas":[{"index":0,"function":{"arguments":"{\"loc"}}],"finishReason":null}

event: chunk
data: {"delta":"","toolCallDeltas":[{"index":0,"function":{"arguments":"ation\":\"Brussels\"}"}}],"finishReason":null}

event: chunk
data: {"delta":"","finishReason":"tool_calls"}

event: done
data: {}

Each entry in toolCallDeltas contains: index (tool call position), id, type, and function.name (all first chunk only), and function.arguments (streamed incrementally). Concatenate the arguments fragments that share an index to reconstruct the full JSON.
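
The reconstruction can be sketched as an accumulator keyed by index. This is a minimal illustration, assuming the delta shape shown in the events above; ToolCallDelta, ToolCall, and accumulateToolCalls are hypothetical names, not part of the CanaryLLM API.

```typescript
// Shape of one toolCallDeltas entry, as seen in the sample events above.
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  type: string;
  name: string;
  arguments: string; // raw JSON text; parse it once the stream has finished
}

// Merges the toolCallDeltas arrays of successive chunks into complete tool calls.
function accumulateToolCalls(chunks: ToolCallDelta[][]): ToolCall[] {
  const byIndex = new Map<number, ToolCall>();
  for (const deltas of chunks) {
    for (const d of deltas) {
      let call = byIndex.get(d.index);
      if (!call) {
        call = { id: "", type: "", name: "", arguments: "" };
        byIndex.set(d.index, call);
      }
      // id, type, and name arrive once; argument fragments are appended in order.
      if (d.id) call.id = d.id;
      if (d.type) call.type = d.type;
      if (d.function?.name) call.name = d.function.name;
      call.arguments += d.function?.arguments ?? "";
    }
  }
  // Return the calls ordered by their index.
  return Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([, call]) => call);
}

// The three toolCallDeltas arrays from the sample stream above.
const calls = accumulateToolCalls([
  [{ index: 0, id: "call_abc123", type: "function", function: { name: "get_weather", arguments: "" } }],
  [{ index: 0, function: { arguments: '{"loc' } }],
  [{ index: 0, function: { arguments: 'ation":"Brussels"}' } }],
]);
const parsedArgs = calls.map((c) => JSON.parse(c.arguments));
```

Parse the arguments only after the chunk with finishReason: "tool_calls" arrives, since intermediate fragments are not valid JSON on their own.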

TypeScript Example

typescript
const BASE_URL = "https://canaryllm.canarycoders.es/api/llm";
const API_KEY = "clk_live_your_key_here";

// 1. Submit request
const { data } = await fetch(`${BASE_URL}/complete`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    provider: "openai",
    model: "gpt-4.1-mini",
    messages: [{ role: "user", content: "Write a haiku" }],
    stream: true,
  }),
}).then(r => r.json());

// 2. Stream the result
const response = await fetch(`${BASE_URL}/queue/stream`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ queueId: data.queueId }),
});

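// Fail early if the stream request was rejected or has no body
if (!response.ok || !response.body) {
  throw new Error(`Stream request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Decode in streaming mode so multi-byte characters split across reads survive
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  // The last element may be a partial line; keep it for the next read
  buffer = lines.pop() || "";

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const json = JSON.parse(line.slice(6));
      // Print text deltas as they arrive (process.stdout is Node.js only)
      if (json.delta) process.stdout.write(json.delta);
    }
  }
}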

Cancellation

Cancel a running or queued task at any time:

bash
curl -X POST https://canaryllm.canarycoders.es/api/llm/queue/cancel \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "queueId": "abc123-def456" }'

The provider request will be aborted via AbortSignal if still in progress.
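
The same cancel call can be prepared from TypeScript. The sketch below only builds the request so it can be inspected without network access; the URL, headers, and body mirror the curl command above, while CancelRequest and buildCancelRequest are illustrative names. The shape of the response is not documented here, so it is not handled.

```typescript
const CANCEL_URL = "https://canaryllm.canarycoders.es/api/llm/queue/cancel";

interface CancelRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the arguments for a fetch call that cancels a task.
function buildCancelRequest(queueId: string, apiKey: string): CancelRequest {
  return {
    url: CANCEL_URL,
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ queueId }),
    },
  };
}

const cancelRequest = buildCancelRequest("abc123-def456", "clk_live_your_key_here");
// Send it with: await fetch(cancelRequest.url, cancelRequest.init);
```

If a stream for the same queueId is still open on the client, it may also make sense to stop reading from it once the cancel request has been sent.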