TypeScript SDK
The official client for Node 18+ and Bun. One typed client over the whole API, with the queue polling, SSE streaming, retries, and error typing handled for you.
Install
bun add @canarycoders/canaryllm
# or: npm install @canarycoders/canaryllm
Zero runtime dependencies. Ships ESM, CommonJS, and type declarations.
First request
chat.complete submits the request and polls until the result is ready, so you get the answer back from a single await.
import CanaryLLM from "@canarycoders/canaryllm";
const client = new CanaryLLM({ apiKey: process.env.CANARYLLM_API_KEY });
const res = await client.chat.complete({
  provider: "openai",
  model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Hello, world!" }],
});
console.log(res.content, res.usage.totalTokens);
apiKey defaults to the CANARYLLM_API_KEY environment variable, and baseURL defaults to the hosted gateway.
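Conceptually, chat.complete folds a submit step and a polling loop into a single await. This page does not document the gateway's queue protocol, so the following is only a minimal sketch of that pattern. The submit and poll functions are injected, and the JobStatus shape, delays, and timeout are assumptions rather than the SDK's internals.

```typescript
// Hypothetical job status shape: the real SDK's states and fields may differ.
type JobStatus<T> =
  | { state: "pending" }
  | { state: "done"; result: T }
  | { state: "failed"; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Submit once, then poll with a growing interval until the job settles or times out.
async function submitAndPoll<T>(
  submit: () => Promise<string>, // returns a job id
  poll: (id: string) => Promise<JobStatus<T>>, // fetches the job's current status
  { initialDelayMs = 100, maxDelayMs = 2000, timeoutMs = 60_000 } = {},
): Promise<T> {
  const id = await submit();
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (Date.now() < deadline) {
    const status = await poll(id);
    if (status.state === "done") return status.result;
    if (status.state === "failed") throw new Error(`job ${id} failed: ${status.error}`);
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs); // back off so long jobs don't hammer the queue
  }
  throw new Error(`job ${id} timed out after ${timeoutMs} ms`);
}
```

Doubling the delay up to a cap keeps short jobs responsive, while long jobs need only a few polls.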
Streaming
Events are normalized across providers: text, thinking, tool_call, usage, done. A server error mid-stream throws out of the loop as a typed error.
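Because every provider emits the same event types, you only need to write a stream consumer once, against a discriminated union. The sketch below is minimal. The five type names come from this page, and text events carry delta. The other payload fields (name, args, totalTokens) are assumptions, so check the SDK's exported types before relying on them.

```typescript
// Hypothetical event shape: only the `type` names and text's `delta` come from the docs.
type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "thinking"; delta: string }
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "usage"; totalTokens: number }
  | { type: "done" };

interface Collected {
  text: string;
  toolCalls: { name: string; args: unknown }[];
  totalTokens?: number;
}

// Fold a normalized event stream into one summary, whichever provider produced it.
async function collect(events: AsyncIterable<StreamEvent>): Promise<Collected> {
  const out: Collected = { text: "", toolCalls: [] };
  for await (const event of events) {
    switch (event.type) {
      case "text":
        out.text += event.delta;
        break;
      case "thinking":
        break; // reasoning output is ignored here; keep it if your UI shows it
      case "tool_call":
        out.toolCalls.push({ name: event.name, args: event.args });
        break;
      case "usage":
        out.totalTokens = event.totalTokens;
        break;
      case "done":
        return out;
    }
  }
  return out; // stream ended without an explicit "done" event
}
```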
for await (const event of client.chat.stream({
  provider: "anthropic",
  model: "claude-sonnet-4-5",
  messages: [{ role: "user", content: "Write a haiku" }],
})) {
  if (event.type === "text") process.stdout.write(event.delta);
}
Long-running jobs
Image, video, audio, and vision calls run through the queue. The default method waits for the result. The *Job or submit variant returns a job handle instead, so you can track the job out of band.
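Cancelling a queued job means forwarding the abort to the server, not just dropping the local promise. Here is a minimal sketch of abort-aware waiting. It assumes a hypothetical QueueApi with poll and cancel methods; the SDK's own job handle and its signal option may be implemented differently.

```typescript
// Hypothetical queue client: the real SDK does not necessarily expose these methods.
interface QueueApi<T> {
  poll(id: string): Promise<{ done: boolean; result?: T }>;
  cancel(id: string): Promise<void>;
}

// Poll until the job finishes; if the signal aborts first, cancel the job server-side.
function waitForJob<T>(api: QueueApi<T>, id: string, signal?: AbortSignal, intervalMs = 500): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const onAbort = () => {
      clearTimeout(timer);
      // Propagate the cancellation so the job stops consuming compute.
      api.cancel(id).catch(() => {}).finally(() => reject(new Error(`job ${id} aborted`)));
    };
    if (signal?.aborted) return onAbort();
    signal?.addEventListener("abort", onAbort, { once: true });

    const tick = async () => {
      try {
        const status = await api.poll(id);
        if (signal?.aborted) return; // onAbort already rejected the promise
        if (status.done) {
          signal?.removeEventListener("abort", onAbort);
          resolve(status.result as T);
        } else {
          timer = setTimeout(tick, intervalMs);
        }
      } catch (err) {
        signal?.removeEventListener("abort", onAbort);
        reject(err);
      }
    };
    void tick();
  });
}
```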
// Wait inline
const img = await client.images.generate({
  provider: "openai",
  prompt: "a yellow canary on a branch",
});
// Or hold the handle
const job = await client.images.generateJob({ provider: "openai", prompt: "..." });
console.log(job.id);
const result = await job.result();
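// Aborting the signal rejects the pending call and also cancels the job server-side
const ctrl = new AbortController();
const pending = client.video.generate({ provider: "gemini", prompt: "..." }, { signal: ctrl.signal });
ctrl.abort();
await pending.catch(() => {}); // the aborted call rejects; catch it to avoid an unhandled rejection
Embeddings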
Embed text into vectors with a local model for customer-side RAG. The gateway stores nothing, so keep the vectors in your own store. For the full ingest/retrieve pipeline, see the @canarycoders/canaryllm-rag toolkit.
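The gateway keeps no vectors, so nearest-neighbour lookup happens on your side. Here is a minimal in-memory sketch using cosine similarity. It assumes each embedding is a number[] of length dimensions; for real workloads, use a vector database or the RAG toolkit.

```typescript
// A stored chunk: the vector is whatever the embeddings call returned for `text`.
interface StoredChunk { id: string; text: string; vector: number[] }

// Cosine similarity: dot product divided by the product of the vector norms.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k: fine for thousands of chunks, not for millions.
function topK(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosine(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.chunk);
}
```

With the SDK, you would store embeddings[i] next to input[i] (assuming the vectors come back in input order), embed the query the same way, and pass its vector to topK.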
const { embeddings, dimensions } = await client.embeddings.create({
  provider: "lmstudio",
  model: "nomic-embed-text-v1.5",
  input: ["first chunk of text", "second chunk of text"],
});
Errors
Every failure is a subclass of APIError (AuthenticationError, RateLimitError, BadRequestError, and so on). Branch on .code or .status, not the message. Transient failures retry automatically with backoff.
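The SDK already retries transient failures, so you rarely need to do this yourself. This sketch only illustrates the usual technique: exponential backoff with full jitter that never waits less than a server-supplied hint. The RetryOptions shape and the isTransient predicate are hypothetical. The retryAfterMs field mirrors the one read from RateLimitError in the example that follows.

```typescript
// Hypothetical retry policy; the SDK's built-in policy and defaults may differ.
interface RetryOptions {
  maxAttempts: number;
  baseMs: number;
  capMs: number;
  isTransient: (err: unknown) => boolean;
  random?: () => number; // injectable for deterministic tests
}

// Full jitter: wait a random time in [0, min(cap, base * 2^attempt)),
// but never less than the server's retry-after hint.
function backoffDelay(attempt: number, opts: RetryOptions, retryAfterMs?: number): number {
  const ceiling = Math.min(opts.capMs, opts.baseMs * 2 ** attempt);
  const jittered = Math.floor((opts.random ?? Math.random)() * ceiling);
  return Math.max(jittered, retryAfterMs ?? 0);
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt + 1 >= opts.maxAttempts || !opts.isTransient(err)) throw err;
      const hint = (err as { retryAfterMs?: number } | null)?.retryAfterMs;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt, opts, hint)));
    }
  }
}
```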
import { RateLimitError } from "@canarycoders/canaryllm";
const messages = [{ role: "user" as const, content: "Hello, world!" }];
try {
  await client.chat.complete({ provider: "openai", model: "gpt-4.1-mini", messages });
} catch (err) {
  if (err instanceof RateLimitError) {
    console.warn("slow down", err.retryAfterMs);
  } else {
    throw err;
  }
}
Realtime and conversational agents
The SDK runs on your backend and mints a short-lived credential. The browser opens the actual connection, so the API key never leaves the server.
import { toBrokeredCredential } from "@canarycoders/canaryllm";
// backend, e.g. inside a route handler (the function name is illustrative)
async function createRealtimeSession() {
  const session = await client.realtime.sessions.create({ kind: "voice", voice: "alloy" });
  return toBrokeredCredential(session); // safe to send to the browser
}
On the browser, import connectRealtime from @canarycoders/canaryllm/realtime-client.
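A brokered credential is short-lived. A browser client should reuse it while it is valid and fetch a fresh one from your backend shortly before it expires. This minimal sketch assumes a hypothetical credential shape with an expiresAt timestamp in epoch milliseconds, plus a mint callback that calls your own backend route. The real credential's fields may differ.

```typescript
// Hypothetical credential shape; check the SDK's types for the real fields.
interface BrokeredCredential { token: string; expiresAt: number } // expiresAt in epoch ms

// Reuse the current credential until `marginMs` before expiry, then mint a new one.
function credentialCache(
  mint: () => Promise<BrokeredCredential>, // e.g. a fetch to a hypothetical /api/realtime-session route
  { marginMs = 5_000, now = () => Date.now() } = {},
): () => Promise<BrokeredCredential> {
  let current: BrokeredCredential | undefined;
  let inflight: Promise<BrokeredCredential> | undefined;
  return async () => {
    if (current && current.expiresAt - marginMs > now()) return current;
    // Share one in-flight request between concurrent callers.
    if (!inflight) {
      inflight = mint().finally(() => { inflight = undefined; });
    }
    current = await inflight;
    return current;
  };
}
```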
Drop-in OpenAI / Anthropic
Already using the official SDKs? Point them at the gateway and keep your code. Use the provider/modelId format for the model.
import OpenAI from "openai";
const oa = new OpenAI(client.compat.openai());
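const stream = await oa.chat.completions.create({
  model: "openai/gpt-4.1-mini",
  messages: [{ role: "user", content: "Hello, world!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}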
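Existing code sometimes stores a provider and a bare model name separately. In that case, a small helper can build and validate the provider/modelId strings the gateway expects. This is a minimal sketch. Splitting on the first slash only is an assumption, chosen so that model IDs that themselves contain slashes stay intact.

```typescript
interface GatewayModel { provider: string; modelId: string }

// Parse "provider/modelId"; assumes only the first slash separates the provider.
function parseModel(model: string): GatewayModel {
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`expected "provider/modelId", got "${model}"`);
  }
  return { provider: model.slice(0, slash), modelId: model.slice(slash + 1) };
}

// Build the gateway model string from its parts.
function formatModel({ provider, modelId }: GatewayModel): string {
  return `${provider}/${modelId}`;
}
```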