Compatible APIs
Drop-in compatible endpoints for OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses clients: SDKs, other client libraries, and tools such as Claude Code, Codex, Cursor, and pi.dev.
Overview
CanaryLLM exposes three compatibility endpoints that let you reach any supported provider through standard client libraries. All three use the provider/model format to route each request to the right backend.
| Endpoint | Protocol | Use Case |
|---|---|---|
| POST /v1/chat/completions | OpenAI Chat Completions | OpenAI SDKs, LangChain, OpenCode, any OpenAI-compatible client |
| POST /v1/messages | Anthropic Messages | Claude Code, Anthropic SDKs, any Anthropic-compatible client |
| POST /v1/responses | OpenAI Responses | Codex CLI, Cursor, OpenAI Responses SDK |
Model Format
All three endpoints use provider/model as the model identifier. The part before the first slash selects the provider CanaryLLM routes the request to. The rest is that provider's model name, which may itself contain slashes.
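The following is a minimal sketch of how a client could split such an identifier. It assumes that only the first slash separates provider from model, which the three-segment lmstudio/unsloth/qwen3.6-27b-mlx ID in the pi.dev section relies on. ModelRef and parseModelId are hypothetical names, not part of any CanaryLLM SDK. Each identifier listed after the sketch is a valid input.

```typescript
// Hypothetical helper (not part of any CanaryLLM SDK): split "provider/model"
// on the first slash only, so the model part may itself contain slashes.
interface ModelRef {
  provider: string;
  model: string;
}

function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  // Reject IDs with no provider prefix or no model, e.g. "gpt-4.1", "/x", "openai/".
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// parseModelId("lmstudio/unsloth/qwen3.6-27b-mlx")
//   returns { provider: "lmstudio", model: "unsloth/qwen3.6-27b-mlx" }
```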
anthropic/claude-sonnet-4-5
openai/gpt-4.1
gemini/gemini-2.5-flash
xai/grok-4-1-fast-non-reasoning
vertex/gemini-2.5-pro
perplexity/sonar-pro
lmstudio/qwen2.5-coder-32b
Claude Code Integration
Use CanaryLLM as a drop-in replacement for the Anthropic API in Claude Code. This lets you route Claude Code requests through any provider supported by CanaryLLM.
Setup
Set two environment variables and start Claude Code:
export ANTHROPIC_BASE_URL=https://canaryllm.canarycoders.es
export ANTHROPIC_AUTH_TOKEN=clk_live_your_key_here
# Use any provider through CanaryLLM
claude --model anthropic/claude-sonnet-4-5
claude --model openai/gpt-4.1
claude --model gemini/gemini-2.5-flash
VS Code / Cursor Settings
If you use Claude Code in VS Code or Cursor, add these to your settings:
{
"claude-code.environmentVariables": {
"ANTHROPIC_BASE_URL": "https://canaryllm.canarycoders.es",
"ANTHROPIC_AUTH_TOKEN": "clk_live_your_key_here"
}
}
Note
Claude Code is context-heavy: use a model with a context window of at least 25K tokens for best results. When using local models via lmstudio/ or ollama/, make sure the model supports tool use (function calling).
pi.dev Integration
Pi Coding Agent (@earendil-works/pi-coding-agent) is a terminal coding agent. Register CanaryLLM as a custom OpenAI-compatible provider in Pi's models.json to route every Pi request through any provider supported by CanaryLLM.
Setup
Set the API key as an env var:
export CANARYLLM_API_KEY=clk_live_your_key_here
Then add a canary provider to ~/.pi/agent/models.json:
{
"providers": {
"canary": {
"api": "openai-completions",
"apiKey": "CANARYLLM_API_KEY",
"baseUrl": "https://canaryllm.canarycoders.es/v1",
"models": [
{
"id": "gemini/gemini-3.5-flash",
"contextWindow": 1048576,
"maxTokens": 65536,
"input": ["text", "image"],
"reasoning": false,
"cost": { "input": 0.000001875, "output": 0.00001125, "cacheRead": 0, "cacheWrite": 0 }
},
{
"id": "gemini/gemini-3.1-pro-preview",
"contextWindow": 1048576,
"maxTokens": 65536,
"input": ["text", "image"],
"reasoning": false,
"cost": { "input": 0.0000025, "output": 0.000015, "cacheRead": 0, "cacheWrite": 0 }
},
{
"id": "gemini/gemini-3-flash-preview",
"contextWindow": 1048576,
"maxTokens": 65536,
"input": ["text", "image"],
"reasoning": false,
"cost": { "input": 0.000000625, "output": 0.00000375, "cacheRead": 0, "cacheWrite": 0 }
},
{
"id": "gemini/gemini-3.1-flash-lite",
"contextWindow": 1048576,
"maxTokens": 65536,
"input": ["text", "image"],
"reasoning": false,
"cost": { "input": 0.0000003125, "output": 0.000001875, "cacheRead": 0, "cacheWrite": 0 }
},
{
"id": "gemini/gemini-2.5-pro",
"contextWindow": 1048576,
"maxTokens": 65536,
"input": ["text", "image"],
"reasoning": false,
"cost": { "input": 0.000001563, "output": 0.0000125, "cacheRead": 0, "cacheWrite": 0 }
},
{
"id": "gemini/gemini-2.5-flash",
"contextWindow": 1048576,
"maxTokens": 65536,
"input": ["text", "image"],
"reasoning": false,
"cost": { "input": 0.000000375, "output": 0.000003125, "cacheRead": 0, "cacheWrite": 0 }
},
{
"id": "gemini/gemini-2.5-flash-lite",
"contextWindow": 1048576,
"maxTokens": 65536,
"input": ["text", "image"],
"reasoning": false,
"cost": { "input": 0.000000125, "output": 0.0000005, "cacheRead": 0, "cacheWrite": 0 }
},
{
"id": "lmstudio/unsloth/qwen3.6-27b-mlx",
"contextWindow": 32768,
"maxTokens": 8192,
"input": ["text"],
"reasoning": false,
"cost": { "input": 0.000001, "output": 0.000001, "cacheRead": 0, "cacheWrite": 0 }
}
]
}
}
}
The apiKey field accepts either a literal value or the name of an environment variable that Pi resolves at runtime. Prices include CanaryLLM's markup and reflect the rates at the time of writing; query /api/public/models for the live, authoritative values.
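Here is one way to picture the literal-or-environment-variable rule. It is an assumption, not Pi's source: Pi's exact resolution order is not documented on this page, and resolveApiKey is a hypothetical name. The environment is passed in as a parameter so the function stays testable. In Node you would pass process.env.

```typescript
// Hypothetical sketch of the apiKey rule (not Pi's actual implementation):
// if an environment variable with that name is set, use its value;
// otherwise treat the configured string as the literal key.
type Env = Record<string, string | undefined>;

function resolveApiKey(configured: string, env: Env): string {
  const hasVar = Object.prototype.hasOwnProperty.call(env, configured);
  const value = hasVar ? env[configured] : undefined;
  // An unset or empty variable falls back to the literal string.
  return value ? value : configured;
}
```

Under this assumed rule, forgetting to export CANARYLLM_API_KEY would send the variable's name itself as the key, and the request would fail authentication.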
Usage
# List the canary provider's models
pi --list-models canary
# One-shot call
pi -p --model canary/gemini/gemini-2.5-pro "Refactor this function."
# Three-segment model IDs work too: pi splits on the first slash only
pi --model canary/lmstudio/unsloth/qwen3.6-27b-mlx
Note
Pi talks to CanaryLLM through the OpenAI Chat Completions endpoint. In Pi, the reasoning flag controls whether the thinking-level UI is exposed. Keep it false for all CanaryLLM models, because provider-side reasoning (Gemini thinking, Claude thinking, etc.) is handled inside CanaryLLM rather than as a separate mode in Pi.
Anthropic Messages API
POST /v1/messages — Full Anthropic Messages API compatibility with streaming and tool use.
Non-streaming
curl -X POST https://canaryllm.canarycoders.es/v1/messages \
-H "x-api-key: clk_live_your_key_here" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "anthropic/claude-sonnet-4-5",
"max_tokens": 1024,
"messages": [
{ "role": "user", "content": "What is CanaryLLM?" }
]
}'
Streaming
curl -X POST https://canaryllm.canarycoders.es/v1/messages \
-H "x-api-key: clk_live_your_key_here" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "anthropic/claude-sonnet-4-5",
"max_tokens": 1024,
"stream": true,
"messages": [
{ "role": "user", "content": "Write a haiku" }
]
}'
Streaming uses named SSE events: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop.
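To consume this stream without an SDK, a client splits the body into events and keeps the text deltas. The sketch below makes three assumptions:

- The whole body has already been read into a string, for example with await res.text() on a fetch response.
- Text deltas use Anthropic's text_delta shape.
- parseSse and collectText are hypothetical names.

A live reader would instead buffer network chunks until it sees a blank line.

```typescript
// Minimal SSE reader for a buffered /v1/messages stream.
// Events are separated by a blank line; each has an "event:" name and "data:" JSON.
interface SseEvent {
  event: string;
  data: string;
}

function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of body.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message"; // SSE default when a block has no "event:" line
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // Per the SSE spec, one optional space after "data:" is dropped.
        data.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Join the text carried by content_block_delta events whose delta is a text_delta.
// Tool input arrives as input_json_delta and is ignored here.
function collectText(events: SseEvent[]): string {
  let text = "";
  for (const { event, data } of events) {
    if (event !== "content_block_delta") continue;
    const payload = JSON.parse(data) as { delta?: { type?: string; text?: string } };
    if (payload.delta?.type === "text_delta") text += payload.delta.text ?? "";
  }
  return text;
}
```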
With Tools
curl -X POST https://canaryllm.canarycoders.es/v1/messages \
-H "x-api-key: clk_live_your_key_here" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "anthropic/claude-sonnet-4-5",
"max_tokens": 1024,
"tools": [{
"name": "get_weather",
"description": "Get weather for a location",
"input_schema": {
"type": "object",
"properties": { "location": { "type": "string" } },
"required": ["location"]
}
}],
"messages": [
{ "role": "user", "content": "What is the weather in Brussels?" }
]
}'
Parameters
| Parameter | Type | Required |
|---|---|---|
| model | "provider/model" | Yes |
| messages | Message[] | Yes |
| max_tokens | integer | Yes |
| system | string \| TextBlock[] | No |
| stream | boolean | No |
| temperature | number (0-1) | No |
| top_p | number (0-1) | No |
| stop_sequences | string[] | No |
| tools | Tool[] | No |
| tool_choice | {type: "auto"} \| {type: "any"} \| {type: "tool", name} | No |
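On this endpoint max_tokens is required and temperature is capped at 1. On /v1/chat/completions, max_tokens is optional and temperature goes up to 2. A client that targets both endpoints may therefore want to check bodies before sending them.

The sketch below is a hypothetical client-side check based only on the table above, not CanaryLLM's own validation. It returns failures in the shape shown under Error Format and covers only the required fields and the two ranges.

```typescript
// Hypothetical pre-flight check mirroring the Parameters table: model, messages
// and max_tokens are required; temperature and top_p must lie in 0-1.
interface MessagesRequest {
  model?: string;
  messages?: unknown[];
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
}

interface AnthropicError {
  type: "error";
  error: { type: "invalid_request_error"; message: string };
}

// Build an error in the Anthropic error format.
function invalid(message: string): AnthropicError {
  return { type: "error", error: { type: "invalid_request_error", message } };
}

function validateMessagesRequest(body: MessagesRequest): AnthropicError | null {
  if (typeof body.model !== "string" || body.model.indexOf("/") <= 0) {
    return invalid('model must use the "provider/model" format');
  }
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    return invalid("messages is required");
  }
  if (body.max_tokens === undefined) {
    return invalid("max_tokens is required");
  }
  if (!Number.isInteger(body.max_tokens) || body.max_tokens < 1) {
    return invalid("max_tokens must be a positive integer");
  }
  for (const key of ["temperature", "top_p"] as const) {
    const value = body[key];
    if (value !== undefined && (value < 0 || value > 1)) {
      return invalid(`${key} must be between 0 and 1`);
    }
  }
  return null; // body passes these checks
}
```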
Error Format
Errors follow the Anthropic error format:
{
"type": "error",
"error": {
"type": "invalid_request_error",
"message": "max_tokens is required"
}
}
OpenAI Chat Completions
POST /v1/chat/completions — Standard OpenAI Chat Completions API format.
Non-streaming
curl -X POST https://canaryllm.canarycoders.es/v1/chat/completions \
-H "Authorization: Bearer clk_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4.1",
"messages": [
{ "role": "user", "content": "What is CanaryLLM?" }
]
}'
Streaming
curl -X POST https://canaryllm.canarycoders.es/v1/chat/completions \
-H "Authorization: Bearer clk_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4-5",
"stream": true,
"messages": [
{ "role": "user", "content": "Write a haiku" }
]
}'
Streaming uses standard SSE with data: lines, terminated by data: [DONE].
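For a client that reads the raw stream, this framing means concatenating choices[0].delta.content from each data: line and stopping at [DONE]. The sketch below does exactly that. It assumes the body is already buffered as a string and that chunks follow the standard OpenAI chunk shape. collectChatText is a hypothetical name. With the OpenAI SDK, passing stream: true instead returns an async iterable of already-parsed chunks.

```typescript
// Minimal sketch: rebuild the assistant text from a buffered Chat Completions
// stream. Each event is a "data: <json>" line; "data: [DONE]" ends the stream.
interface ChatChunk {
  choices: { delta?: { content?: string | null } }[];
}

function collectChatText(body: string): string {
  let text = "";
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank separators and comments
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as ChatChunk;
    // Chunks without choices or content (such as a usage-only chunk) add nothing.
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```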
OpenAI SDK
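import OpenAI from "openai";
// Point the official OpenAI SDK at CanaryLLM's OpenAI-compatible base URL.
const client = new OpenAI({
  baseURL: "https://canaryllm.canarycoders.es/v1",
  apiKey: "clk_live_your_key_here",
});
// Any provider/model identifier works here, not only OpenAI models.
// Top-level await needs an ES module (for example a .mjs file).
const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
Parameters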
| Parameter | Type | Required |
|---|---|---|
| model | "provider/model" | Yes |
| messages | Message[] | Yes |
| stream | boolean | No |
| temperature | number (0-2) | No |
| max_tokens | integer | No |
| top_p | number (0-1) | No |
| stop | string \| string[] | No |
| tools | Tool[] | No |
| tool_choice | string \| object | No |
| response_format | {type: "text" \| "json_object"} | No |
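When tools are passed, the model may answer with tool_calls instead of text. The sketch below shows one round trip in the standard OpenAI shape. It assumes CanaryLLM returns tool calls exactly as the OpenAI API does. runToolCalls and the handlers map are hypothetical application code.

```typescript
// Sketch of one tool-call round trip in the OpenAI Chat Completions shape.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type Handler = (args: Record<string, unknown>) => unknown;

// Run every requested tool and produce one role "tool" message per call,
// linked back to its call through tool_call_id.
function runToolCalls(message: AssistantMessage, handlers: Record<string, Handler>): ToolMessage[] {
  return (message.tool_calls ?? []).map((call): ToolMessage => {
    const name = call.function.name;
    const handler = Object.prototype.hasOwnProperty.call(handlers, name) ? handlers[name] : undefined;
    const result = handler
      ? handler(JSON.parse(call.function.arguments) as Record<string, unknown>)
      : { error: `unknown tool: ${name}` };
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result ?? null) };
  });
}
```

The follow-up request then sends three things, in this order:

1. The original messages.
2. The assistant message containing tool_calls.
3. The returned tool messages.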
OpenAI Responses
POST /v1/responses — OpenAI Responses API compatibility. Used by Codex CLI 0.125+ (which dropped Chat Completions support), Cursor, and the OpenAI Responses SDK.
Non-streaming
curl -X POST https://canaryllm.canarycoders.es/v1/responses \
-H "Authorization: Bearer clk_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-5",
"input": "What is CanaryLLM?"
}'
Streaming
curl -X POST https://canaryllm.canarycoders.es/v1/responses \
-H "Authorization: Bearer clk_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-5",
"stream": true,
"input": [
{
"type": "message",
"role": "user",
"content": [{ "type": "input_text", "text": "Write a haiku" }]
}
],
"instructions": "Be concise."
}'
Streaming uses the canonical Responses event sequence: response.created → response.output_item.added → response.output_text.delta ×n → response.output_text.done → response.completed.
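A client that reads this stream directly only needs the text deltas and the final event. The sketch below assumes a buffered body and events whose JSON carries the type field named in the sequence above. collectResponseText is a hypothetical name. It concatenates deltas from every output item, which is enough for a single text message.

```typescript
// Minimal sketch: rebuild the output text of a buffered /v1/responses stream.
// Only the JSON "type" field of each "data:" line is used; "event:" lines are skipped.
interface ResponsesEvent {
  type: string;
  delta?: string;
}

function collectResponseText(body: string): { text: string; completed: boolean } {
  let text = "";
  let completed = false;
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue;
    const event = JSON.parse(line.slice(5).trim()) as ResponsesEvent;
    if (event.type === "response.output_text.delta") {
      text += event.delta ?? "";
    } else if (event.type === "response.completed") {
      completed = true; // last event in the sequence
    }
  }
  return { text, completed };
}
```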
Codex CLI via fish wrapper
# Set CANARYLLM_API_KEY in your shell, then:
canaryllm-codex exec 'Refactor this function for clarity.'
# Override model:
CANARYLLM_MODEL=anthropic/claude-sonnet-4-6 canaryllm-codex
The wrapper invokes codex with inline -c overrides, so no edits to ~/.codex/config.toml are needed.
Parameters
| Parameter | Type | Required |
|---|---|---|
| model | "provider/model" | Yes |
| input | string \| InputItem[] | Yes |
| instructions | string | No |
| stream | boolean | No |
| temperature | number (0-2) | No |
| top_p | number (0-1) | No |
| max_output_tokens | integer | No |
| tools | FunctionTool[] | No |
| tool_choice | "auto" \| "required" \| "none" \| {type: "function", name} | No |
| parallel_tool_calls | boolean | No |
| metadata | Record<string,string> | No |
Built-in tools (web_search, file_search, computer_use, image_generation) and previous_response_id are accepted but not yet acted on. Requests that include them do not fail, but the tools are not executed and no stored response state is applied. Reasoning items in input are tolerated and ignored. Function tools and function_call/function_call_output round-trips work end-to-end.
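The function_call / function_call_output round trip can be sketched as follows. Since previous_response_id is not yet applied, this hypothetical helper assumes the client re-sends the whole conversation on each turn:

- the earlier input,
- the model's output items (reasoning items included, since they are ignored),
- one function_call_output per function_call, matched by call_id.

nextInput, isFunctionCall and the handler map are illustrative names.

```typescript
// Sketch of a function-tool round trip for /v1/responses.
type Item = { type: string; [key: string]: unknown };
type FunctionCall = { type: "function_call"; call_id: string; name: string; arguments: string };
type FunctionCallOutput = { type: "function_call_output"; call_id: string; output: string };
type Handler = (args: Record<string, unknown>) => unknown;

// Narrow a generic output item to a well-formed function_call.
function isFunctionCall(item: Item): item is FunctionCall {
  return (
    item.type === "function_call" &&
    typeof item.call_id === "string" &&
    typeof item.name === "string" &&
    typeof item.arguments === "string"
  );
}

// Build the next request's input: prior input, model output, and tool results.
function nextInput(previousInput: Item[], output: Item[], handlers: Record<string, Handler>): Item[] {
  const results = output.filter(isFunctionCall).map((call): FunctionCallOutput => {
    const handler = Object.prototype.hasOwnProperty.call(handlers, call.name) ? handlers[call.name] : undefined;
    const result = handler
      ? handler(JSON.parse(call.arguments) as Record<string, unknown>)
      : { error: `unknown function: ${call.name}` };
    // call_id ties each output to the function_call that requested it.
    return { type: "function_call_output", call_id: call.call_id, output: JSON.stringify(result ?? null) };
  });
  return [...previousInput, ...output, ...results];
}
```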
Authentication
All three compat endpoints accept your CanaryLLM API key in either of these headers:
| Header | Format | Used By |
|---|---|---|
| Authorization | Bearer clk_live_... | OpenAI SDKs, Claude Code (ANTHROPIC_AUTH_TOKEN) |
| x-api-key | clk_live_... | Anthropic SDKs, Claude Code (ANTHROPIC_API_KEY) |
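If you call the endpoints without an SDK, the only difference between the two styles is which header carries the key. The helper below is a hypothetical sketch: canaryHeaders is not a CanaryLLM API, and the anthropic-version value is copied from the Messages curl example.

```typescript
// Hypothetical helper: request headers for either style in the table above.
// Both styles carry the same CanaryLLM API key.
type ClientStyle = "openai" | "anthropic";

function canaryHeaders(apiKey: string, style: ClientStyle): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (style === "openai") {
    headers["Authorization"] = `Bearer ${apiKey}`;
  } else {
    headers["x-api-key"] = apiKey;
    // Matches the anthropic-version header in the Messages curl example.
    headers["anthropic-version"] = "2023-06-01";
  }
  return headers;
}
```

For example, fetch("https://canaryllm.canarycoders.es/v1/chat/completions", { method: "POST", headers: canaryHeaders(key, "openai"), body: JSON.stringify(request) }) sends an OpenAI-style call.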