TypeScript SDK

The official client for Node 18+ and Bun. One typed client over the whole API, with the queue polling, SSE streaming, retries, and error typing handled for you.

Install

bash

bun add @canarycoders/canaryllm
# or: npm install @canarycoders/canaryllm

Zero runtime dependencies. Ships ESM, CommonJS, and type declarations.

First request

chat.complete submits the request and polls until the result is ready, so you get the answer back from a single await.

typescript

import CanaryLLM from "@canarycoders/canaryllm";

const client = new CanaryLLM({ apiKey: process.env.CANARYLLM_API_KEY });

const res = await client.chat.complete({
  provider: "openai",
  model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Hello, world!" }],
});

console.log(res.content, res.usage.totalTokens);

apiKey defaults to CANARYLLM_API_KEY and baseURL to the hosted gateway.
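
To make the submit-then-poll behaviour of chat.complete concrete, here is a minimal sketch of the general pattern. It is not the SDK's implementation: JobState, pollUntilDone, and fakeQueue are hypothetical names, and the interval and attempt cap are assumed values chosen so the demo runs quickly.

```typescript
// Hypothetical job shape; the real SDK's types may differ.
type JobState<T> =
  | { status: "pending" }
  | { status: "done"; result: T }
  | { status: "failed"; error: string };

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll `check` until the job settles, giving up after `maxAttempts` polls.
async function pollUntilDone<T>(
  check: () => Promise<JobState<T>>,
  intervalMs = 10,
  maxAttempts = 50,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await check();
    if (state.status === "done") return state.result;
    if (state.status === "failed") throw new Error(state.error);
    await sleep(intervalMs);
  }
  throw new Error(`job still pending after ${maxAttempts} polls`);
}

// Fake queue for the demo: reports "pending" twice, then completes.
function fakeQueue(answer: string): () => Promise<JobState<string>> {
  let calls = 0;
  return async (): Promise<JobState<string>> =>
    ++calls < 3 ? { status: "pending" } : { status: "done", result: answer };
}

pollUntilDone(fakeQueue("Hello!")).then((answer) => console.log(answer));
```

The caller sees one promise, which is why a single await is enough in the example above.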

Streaming

Events are normalized across providers: text, thinking, tool_call, usage, done. A server error mid-stream throws out of the loop as a typed error.

typescript

for await (const event of client.chat.stream({
  provider: "anthropic",
  model: "claude-sonnet-4-5",
  messages: [{ role: "user", content: "Write a haiku" }],
})) {
  if (event.type === "text") process.stdout.write(event.delta);
}
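
The loop above handles only text events. As a sketch of handling the full normalized set, the block below folds a stream into one summary object. Only type and the text event's delta appear in this page; every other field on StreamEvent is an assumption, and fakeStream is a hypothetical stand-in for client.chat.stream so the example runs offline.

```typescript
// Hypothetical event shapes. Only `type` and the text event's `delta`
// come from the documented example; the other fields are assumptions.
type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "thinking"; delta: string }
  | { type: "tool_call"; name: string; arguments: string }
  | { type: "usage"; totalTokens: number }
  | { type: "done" };

interface Collected {
  text: string;
  toolCalls: string[];
  totalTokens: number | null;
}

// Fold a normalized event stream into one summary object.
async function collect(events: AsyncIterable<StreamEvent>): Promise<Collected> {
  const out: Collected = { text: "", toolCalls: [], totalTokens: null };
  for await (const event of events) {
    switch (event.type) {
      case "text":
        out.text += event.delta;
        break;
      case "tool_call":
        out.toolCalls.push(event.name);
        break;
      case "usage":
        out.totalTokens = event.totalTokens;
        break;
      case "thinking":
        break; // reasoning output is ignored in this sketch
      case "done":
        return out;
    }
  }
  return out;
}

// Stand-in for client.chat.stream(...) so the sketch runs without a network.
async function* fakeStream(): AsyncGenerator<StreamEvent> {
  yield { type: "thinking", delta: "counting syllables" };
  yield { type: "text", delta: "Yellow " };
  yield { type: "text", delta: "canary" };
  yield { type: "usage", totalTokens: 12 };
  yield { type: "done" };
}

collect(fakeStream()).then((summary) => console.log(summary.text));
```

Because a mid-stream server error is thrown out of the loop, a real caller would wrap the call to collect in try/catch.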

Long-running jobs

Image, video, audio, and vision calls run through the queue. The default method waits for the result; the *Job / submit variant hands back the job so you can track it out of band.

typescript

// Wait inline
const img = await client.images.generate({
  provider: "openai",
  prompt: "a yellow canary on a branch",
});

// Or hold the handle
const job = await client.images.generateJob({ provider: "openai", prompt: "..." });
console.log(job.id);
const result = await job.result();

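// Cancelling the await also cancels the job server-side
const ctrl = new AbortController();
const pending = client.video.generate({ provider: "gemini", prompt: "..." }, { signal: ctrl.signal });
ctrl.abort();
// The aborted call is expected to reject, so handle it to avoid an unhandled rejection
await pending.catch(() => {});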
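
For readers unfamiliar with AbortSignal, the sketch below shows the client-side half of cancellation: how a signal can cut a pending wait short. It says nothing about how the gateway cancels the job on its side. abortableDelay and waitOrCancel are hypothetical helpers, not SDK functions.

```typescript
// Wait `ms` milliseconds, rejecting early if `signal` aborts.
function abortableDelay(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    function onAbort(): void {
      clearTimeout(timer);
      reject(new Error("aborted"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Resolves to "cancelled" if the signal fired before the wait finished.
async function waitOrCancel(
  ms: number,
  signal: AbortSignal,
): Promise<"finished" | "cancelled"> {
  try {
    await abortableDelay(ms, signal);
    return "finished";
  } catch {
    return "cancelled";
  }
}

const demoCtrl = new AbortController();
const demoWait = waitOrCancel(1000, demoCtrl.signal);
demoCtrl.abort();
demoWait.then((outcome) => console.log(outcome));
```

The same shape applies to any call that accepts a signal option: abort the controller, then handle the rejection from the pending promise.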

Embeddings

Embed text into vectors with a local model for customer-side RAG. The gateway stores nothing, so keep the vectors in your own store. For the full ingest/retrieve pipeline, see the @canarycoders/canaryllm-rag toolkit.

typescript

const { embeddings, dimensions } = await client.embeddings.create({
  provider: "lmstudio",
  model: "nomic-embed-text-v1.5",
  input: ["first chunk of text", "second chunk of text"],
});
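
Since the vectors live in your own store, retrieval is also yours to implement. The sketch below ranks stored chunks by cosine similarity against a query vector. StoredChunk, cosine, and topK are hypothetical names, and the three-dimensional vectors are toy values standing in for real embeddings.

```typescript
interface StoredChunk {
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank stored chunks against a query vector, best match first.
function topK(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return [...store]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}

// Toy store; in practice `vector` would come from embeddings.create.
const demoStore: StoredChunk[] = [
  { text: "first chunk of text", vector: [1, 0, 0] },
  { text: "second chunk of text", vector: [0, 1, 0] },
];

console.log(topK([0.9, 0.1, 0], demoStore, 1)[0].text);
```

A linear scan like this is fine for small collections; larger ones usually call for a vector index.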

Errors

Every failure is a subclass of APIError (AuthenticationError, RateLimitError, BadRequestError, and so on). Branch on .code or .status, not the message. Transient failures retry automatically with backoff.

typescript

import { RateLimitError } from "@canarycoders/canaryllm";

const messages = [{ role: "user" as const, content: "Hello, world!" }];

try {
  await client.chat.complete({ provider: "openai", model: "gpt-4.1-mini", messages });
} catch (err) {
  if (err instanceof RateLimitError) {
    console.warn("slow down", err.retryAfterMs);
  } else {
    throw err;
  }
}
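
The SDK's retry policy is not spelled out here. As a rough picture of what retrying with backoff usually involves, the sketch below doubles the delay on each attempt up to a cap and lets a server hint such as retryAfterMs take precedence. The status list, base delay, cap, and attempt limit are all assumptions, and isRetryable, backoffDelayMs, and withRetry are hypothetical helpers rather than SDK exports.

```typescript
// Assumed set of transient HTTP statuses; the SDK's real list may differ.
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Exponential backoff: baseMs * 2^attempt, capped at capMs.
// A server-provided hint, when present, is used as-is.
function backoffDelayMs(
  attempt: number,
  retryAfterMs?: number,
  baseMs = 250,
  capMs = 8000,
): number {
  if (retryAfterMs !== undefined) return retryAfterMs;
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Retry `fn` on transient failures, branching on `.status` rather than the message.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const info = err as { status?: number; retryAfterMs?: number } | null;
      const status = info?.status;
      const lastTry = attempt + 1 >= maxAttempts;
      if (lastTry || status === undefined || !isRetryable(status)) throw err;
      const delay = backoffDelayMs(attempt, info?.retryAfterMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Production retry loops often add random jitter to the delay so that many clients do not retry in lockstep; that is left out here for brevity.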

Realtime and conversational agents

The SDK runs on your backend and mints a short-lived credential. The browser opens the actual connection, so the API key never leaves the server.

typescript

import { toBrokeredCredential } from "@canarycoders/canaryllm";

// backend, inside your request handler
const session = await client.realtime.sessions.create({ kind: "voice", voice: "alloy" });
return toBrokeredCredential(session); // safe to send to the browser

On the browser, import connectRealtime from @canarycoders/canaryllm/realtime-client.

Drop-in OpenAI / Anthropic

Already using the official SDKs? Point them at the gateway and keep your code. Use the provider/modelId format for the model.

typescript

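import OpenAI from "openai";

const messages = [{ role: "user" as const, content: "Hello, world!" }];

const oa = new OpenAI(client.compat.openai());
const stream = await oa.chat.completions.create({
  model: "openai/gpt-4.1-mini",
  messages,
  stream: true,
});

// With stream: true the call resolves to an async iterable of chunks
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}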
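
If your code keeps provider and model as separate values, a small helper can convert to and from the provider/modelId format. The sketch below is an illustration only: parseModel and formatModel are hypothetical names, and splitting on the first slash is an assumption made so that model ids containing their own slashes survive the round trip.

```typescript
// Split "provider/modelId" on the first slash only, so a model id
// that itself contains slashes stays intact.
function parseModel(model: string): { provider: string; modelId: string } {
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`expected "provider/modelId", got "${model}"`);
  }
  return {
    provider: model.slice(0, slash),
    modelId: model.slice(slash + 1),
  };
}

// Inverse of parseModel.
function formatModel(provider: string, modelId: string): string {
  return `${provider}/${modelId}`;
}

console.log(parseModel("openai/gpt-4.1-mini"));
console.log(formatModel("anthropic", "claude-sonnet-4-5"));
```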

npm: @canarycoders/canaryllm

Source: github.com/CanaryCoders/canaryllm-sdk