Compatible APIs

Drop-in compatible endpoints for OpenAI, Anthropic, and OpenAI Responses client libraries, SDKs, and tools like Claude Code, Codex, Cursor, and pi.dev.

Overview

CanaryLLM exposes three compatibility endpoints that let you use any supported provider through standard client libraries. All use the provider/model format to route requests to the right backend.

Endpoint                   Protocol          Use Case
POST /v1/chat/completions  OpenAI Chat       OpenAI SDKs, LangChain, OpenCode, any OpenAI-compatible client
POST /v1/messages          Anthropic         Claude Code, Anthropic SDKs, any Anthropic-compatible client
POST /v1/responses         OpenAI Responses  Codex CLI, Cursor, OpenAI Responses SDK

Model Format

All three endpoints use provider/model as the model identifier. The provider prefix tells CanaryLLM which backend receives the request.

text
anthropic/claude-sonnet-4-5
openai/gpt-4.1
gemini/gemini-2.5-flash
xai/grok-4-1-fast-non-reasoning
vertex/gemini-2.5-pro
perplexity/sonar-pro
lmstudio/qwen2.5-coder-32b
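A minimal sketch of how such an identifier can be split on the client side. It assumes the provider is everything before the first slash, which is consistent with IDs like lmstudio/unsloth/qwen3.6-27b-mlx but is not a documented CanaryLLM rule; parseModelId is a hypothetical helper, not part of any SDK.

```typescript
// Hypothetical helper: split "provider/model" on the first slash only,
// so model names that contain slashes stay intact.
interface ModelRef {
  provider: string;
  model: string;
}

function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  // Reject IDs with no slash, an empty provider, or an empty model part.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

console.log(parseModelId("anthropic/claude-sonnet-4-5"));
console.log(parseModelId("lmstudio/unsloth/qwen3.6-27b-mlx"));
```

Validating the identifier before sending a request gives a clearer local error than a rejected API call.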

Claude Code Integration

Use CanaryLLM as a drop-in replacement for the Anthropic API in Claude Code. This lets you route Claude Code requests through any provider supported by CanaryLLM.

Setup

Set two environment variables and start Claude Code:

bash
export ANTHROPIC_BASE_URL=https://canaryllm.canarycoders.es
export ANTHROPIC_AUTH_TOKEN=clk_live_your_key_here

# Use any provider through CanaryLLM
claude --model anthropic/claude-sonnet-4-5
claude --model openai/gpt-4.1
claude --model gemini/gemini-2.5-flash

VS Code / Cursor Settings

If you use Claude Code in VS Code or Cursor, add these to your settings:

json
{
  "claude-code.environmentVariables": {
    "ANTHROPIC_BASE_URL": "https://canaryllm.canarycoders.es",
    "ANTHROPIC_AUTH_TOKEN": "clk_live_your_key_here"
  }
}

Note

Claude Code is context-heavy. Use a model with at least 25K context length for best results. When using local models via lmstudio/ or ollama/, ensure your model supports tool use (function calling).

pi.dev Integration

Pi Coding Agent is a terminal coding agent (@earendil-works/pi-coding-agent). Register CanaryLLM as a custom OpenAI-compatible provider in its models.json and route every Pi request through any provider supported by CanaryLLM.

Setup

Set the API key as an env var:

bash
export CANARYLLM_API_KEY=clk_live_your_key_here

Then add a canary provider to ~/.pi/agent/models.json:

json
{
  "providers": {
    "canary": {
      "api": "openai-completions",
      "apiKey": "CANARYLLM_API_KEY",
      "baseUrl": "https://canaryllm.canarycoders.es/v1",
      "models": [
        {
          "id": "gemini/gemini-3.5-flash",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": true,
          "cost": { "input": 0.000001875, "output": 0.00001125, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-3.1-pro-preview",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": true,
          "cost": { "input": 0.0000025, "output": 0.000015, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-3-flash-preview",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": true,
          "cost": { "input": 0.000000625, "output": 0.00000375, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-3.1-flash-lite",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": true,
          "cost": { "input": 0.0000003125, "output": 0.000001875, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-2.5-pro",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": true,
          "cost": { "input": 0.000001563, "output": 0.0000125, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-2.5-flash",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.000000375, "output": 0.000003125, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-2.5-flash-lite",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.000000125, "output": 0.0000005, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "lmstudio/unsloth/qwen3.6-27b-mlx",
          "contextWindow": 32768,
          "maxTokens": 8192,
          "input": ["text"],
          "reasoning": false,
          "cost": { "input": 0.000001, "output": 0.000001, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}

The apiKey field accepts either a literal value or an env-var name that Pi resolves at runtime. Prices include CanaryLLM's markup and reflect the rates at the time of writing — query /api/public/models for the live, authoritative values.
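To make the "literal value or env-var name" behaviour concrete, here is a sketch of one possible resolution rule: look the configured string up as an environment variable and fall back to using it verbatim. This is an assumed rule for illustration only; Pi's actual lookup logic may differ, and resolveApiKey is a hypothetical name.

```typescript
// Assumed rule, not Pi's real implementation: if the configured string names
// a non-empty environment variable, use that variable's value; otherwise
// treat the string itself as the key.
function resolveApiKey(
  configured: string,
  env: Record<string, string | undefined>,
): string {
  const fromEnv = env[configured];
  return fromEnv !== undefined && fromEnv !== "" ? fromEnv : configured;
}

const exampleEnv = { CANARYLLM_API_KEY: "clk_live_your_key_here" };
console.log(resolveApiKey("CANARYLLM_API_KEY", exampleEnv)); // clk_live_your_key_here
console.log(resolveApiKey("clk_live_literal", exampleEnv)); // clk_live_literal
```

Passing the environment in as an argument keeps the function easy to test; in Node you would pass process.env.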

Usage

bash
# List the canary provider's models
pi --list-models canary

# One-shot call
pi -p --model canary/gemini/gemini-2.5-pro "Refactor this function."

# Three-segment model IDs work too — pi splits on the first slash only
pi --model canary/lmstudio/unsloth/qwen3.6-27b-mlx

Note

Pi uses the OpenAI Chat Completions endpoint, so the reasoning flag controls whether Pi exposes the thinking-level UI. Keep it false for all CanaryLLM models — provider-side reasoning (Gemini thinking, Claude thinking, etc.) is handled inside CanaryLLM, not as a separate mode in Pi.

Anthropic Messages API

POST /v1/messages — Full Anthropic Messages API compatibility with streaming and tool use.

Non-streaming

bash
curl -X POST https://canaryllm.canarycoders.es/v1/messages \
  -H "x-api-key: clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "What is CanaryLLM?" }
    ]
  }'

Streaming

bash
curl -X POST https://canaryllm.canarycoders.es/v1/messages \
  -H "x-api-key: clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku" }
    ]
  }'

Streaming uses named SSE events: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop.
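The sketch below shows one way a client might consume these events, assuming CanaryLLM emits the same payload shape as the Anthropic API (a content_block_delta event whose delta is a text_delta). parseSse and collectText are hypothetical helpers, and the sample stream is hand-written rather than captured from the service.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Split a raw SSE body into events. Events are separated by a blank line;
// each one carries an "event:" name and one or more "data:" lines.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const chunk of raw.split(/\r?\n\r?\n/)) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of chunk.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}

// Concatenate the text carried by content_block_delta events.
function collectText(raw: string): string {
  let text = "";
  for (const { event, data } of parseSse(raw)) {
    if (event !== "content_block_delta") continue;
    const payload = JSON.parse(data) as { delta?: { type?: string; text?: string } };
    if (payload.delta?.type === "text_delta") text += payload.delta.text ?? "";
  }
  return text;
}

// Hand-written sample stream for illustration.
const sampleStream = [
  'event: message_start\ndata: {"type":"message_start"}',
  'event: content_block_delta\ndata: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
  'event: content_block_delta\ndata: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":", world"}}',
  'event: message_stop\ndata: {"type":"message_stop"}',
].join("\n\n");

console.log(collectText(sampleStream)); // Hello, world
```

A production client would parse incrementally as bytes arrive instead of buffering the whole body.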

With Tools

bash
curl -X POST https://canaryllm.canarycoders.es/v1/messages \
  -H "x-api-key: clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [{
      "name": "get_weather",
      "description": "Get weather for a location",
      "input_schema": {
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
      }
    }],
    "messages": [
      { "role": "user", "content": "What is the weather in Brussels?" }
    ]
  }'
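When the model decides to call a tool, an Anthropic-format response contains a tool_use content block, and the client answers with a tool_result block in the next user message. The sketch below illustrates that step under the assumption that CanaryLLM returns the standard Anthropic block shapes; getWeather is a stand-in implementation and the assistant content is a hand-written example.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
}

// Stand-in implementation of the get_weather tool (hypothetical).
function getWeather(location: string): string {
  return `Sunny in ${location}`;
}

// Turn the assistant's tool_use blocks into the tool_result blocks that
// belong in the next user message.
function runTools(content: ContentBlock[]): ToolResultBlock[] {
  const results: ToolResultBlock[] = [];
  for (const block of content) {
    if (block.type !== "tool_use") continue;
    const output =
      block.name === "get_weather"
        ? getWeather(String(block.input.location))
        : `Unknown tool: ${block.name}`;
    results.push({ type: "tool_result", tool_use_id: block.id, content: output });
  }
  return results;
}

// Hand-written example of assistant content that requests a tool call.
const assistantContent: ContentBlock[] = [
  { type: "text", text: "Let me check." },
  { type: "tool_use", id: "toolu_01", name: "get_weather", input: { location: "Brussels" } },
];

const followUp = { role: "user", content: runTools(assistantContent) };
console.log(JSON.stringify(followUp, null, 2));
```

The follow-up request then contains the original user message, the assistant message with its tool_use block, and this new user message.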

Parameters

Parameter       Type                         Required
model           "provider/model"             Yes
messages        Message[]                    Yes
max_tokens      integer                      Yes
system          string | TextBlock[]         No
stream          boolean                      No
temperature     number (0-1)                 No
top_p           number (0-1)                 No
stop_sequences  string[]                     No
tools           Tool[]                       No
tool_choice     {type: "auto"|"any"|"tool"}  No

Error Format

Errors follow the Anthropic error format:

json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens is required"
  }
}
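A client can detect this shape before treating a response body as a successful message. The following is a minimal sketch; isAnthropicError and summarize are hypothetical helper names, and only the fields shown in the example above are assumed to exist.

```typescript
interface AnthropicError {
  type: "error";
  error: { type: string; message: string };
}

// Type guard: true only when the body matches the error envelope above.
function isAnthropicError(body: unknown): body is AnthropicError {
  if (typeof body !== "object" || body === null) return false;
  const b = body as { type?: unknown; error?: unknown };
  if (b.type !== "error" || typeof b.error !== "object" || b.error === null) return false;
  const e = b.error as { type?: unknown; message?: unknown };
  return typeof e.type === "string" && typeof e.message === "string";
}

// Produce a one-line summary suitable for logs.
function summarize(body: unknown): string {
  return isAnthropicError(body) ? `${body.error.type}: ${body.error.message}` : "ok";
}

const failure: unknown = JSON.parse(
  '{"type":"error","error":{"type":"invalid_request_error","message":"max_tokens is required"}}',
);
console.log(summarize(failure)); // invalid_request_error: max_tokens is required
```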

OpenAI Chat Completions

POST /v1/chat/completions — Standard OpenAI Chat Completions API format.

Non-streaming

bash
curl -X POST https://canaryllm.canarycoders.es/v1/chat/completions \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1",
    "messages": [
      { "role": "user", "content": "What is CanaryLLM?" }
    ]
  }'

Streaming

bash
curl -X POST https://canaryllm.canarycoders.es/v1/chat/completions \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku" }
    ]
  }'

Streaming uses standard SSE with data: lines, terminated by data: [DONE].
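As a rough illustration of consuming that stream, the sketch below concatenates choices[0].delta.content from each data: line and stops at data: [DONE]. It assumes the standard OpenAI chunk shape; collectChatText is a hypothetical helper and the sample chunks are hand-written.

```typescript
// Concatenate streamed content from an OpenAI-style SSE body.
function collectChatText(raw: string): string {
  let text = "";
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const data = line.slice(5).trim();
    if (data === "") continue;
    if (data === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(data) as {
      choices?: { delta?: { content?: string | null } }[];
    };
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}

// Hand-written sample chunks for illustration.
const chatStream = [
  'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
  'data: {"choices":[{"index":0,"delta":{"content":"Hi"}}]}',
  'data: {"choices":[{"index":0,"delta":{"content":" there"}}]}',
  'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
  "data: [DONE]",
].join("\n\n");

console.log(collectChatText(chatStream)); // Hi there
```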

OpenAI SDK

typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://canaryllm.canarycoders.es/v1",
  apiKey: "clk_live_your_key_here",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

Parameters

ParameterTypeRequired
model"provider/model"Yes
messagesMessage[]Yes
streambooleanNo
temperaturenumber (0-2)No
max_tokensintegerNo
top_pnumber (0-1)No
stopstring | string[]No
toolsTool[]No
tool_choicestring | objectNo
response_format{type: "text"|"json_object"}No

OpenAI Responses

POST /v1/responses — OpenAI Responses API compatibility. Used by Codex CLI 0.125+ (which dropped Chat Completions support), Cursor, and the OpenAI Responses SDK.

Non-streaming

bash
curl -X POST https://canaryllm.canarycoders.es/v1/responses \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5",
    "input": "What is CanaryLLM?"
  }'

Streaming

bash
curl -X POST https://canaryllm.canarycoders.es/v1/responses \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5",
    "stream": true,
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [{ "type": "input_text", "text": "Write a haiku" }]
      }
    ],
    "instructions": "Be concise."
  }'

Streaming uses the canonical Responses event sequence: response.created → response.output_item.added → response.output_text.delta ×n → response.output_text.done → response.completed.
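One way to fold that sequence into a final result is a small reducer over already-parsed events. The event types below are simplified to the fields this sketch needs (real events carry more, such as item and sequence identifiers), and reduceEvents is a hypothetical helper, not an SDK function.

```typescript
// Simplified event shapes: only the fields used here are modelled.
type ResponsesEvent =
  | { type: "response.created" }
  | { type: "response.output_item.added" }
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.output_text.done"; text: string }
  | { type: "response.completed" };

interface StreamState {
  text: string;
  completed: boolean;
}

function reduceEvents(events: ResponsesEvent[]): StreamState {
  const state: StreamState = { text: "", completed: false };
  for (const ev of events) {
    switch (ev.type) {
      case "response.output_text.delta":
        state.text += ev.delta; // append each incremental chunk
        break;
      case "response.output_text.done":
        state.text = ev.text; // the done event carries the full text
        break;
      case "response.completed":
        state.completed = true;
        break;
      default:
        break; // created / output_item.added need no handling here
    }
  }
  return state;
}

const haikuEvents: ResponsesEvent[] = [
  { type: "response.created" },
  { type: "response.output_item.added" },
  { type: "response.output_text.delta", delta: "An old" },
  { type: "response.output_text.delta", delta: " pond" },
  { type: "response.output_text.done", text: "An old pond" },
  { type: "response.completed" },
];

console.log(reduceEvents(haikuEvents)); // { text: 'An old pond', completed: true }
```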

Codex CLI via fish wrapper

bash
# Set CANARYLLM_API_KEY in your shell, then:
canaryllm-codex exec 'Refactor this function for clarity.'

# Override model:
CANARYLLM_MODEL=anthropic/claude-sonnet-4-6 canaryllm-codex

The wrapper invokes codex with inline -c overrides, so no edits to ~/.codex/config.toml are needed.

Parameters

Parameter            Type                                                    Required
model                "provider/model"                                        Yes
input                string | InputItem[]                                    Yes
instructions         string                                                  No
stream               boolean                                                 No
temperature          number (0-2)                                            No
top_p                number (0-1)                                            No
max_output_tokens    integer                                                 No
tools                FunctionTool[]                                          No
tool_choice          "auto" | "required" | "none" | {type:"function",name}  No
parallel_tool_calls  boolean                                                 No
metadata             Record<string,string>                                   No

Built-in tools (web_search, file_search, computer_use, image_generation) and previous_response_id state are accepted but not yet executed; they pass through silently. Reasoning items in input are tolerated and ignored. Function tools and function_call/function_call_output round-trips work end-to-end.
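Because previous_response_id state is not executed, a client that uses function tools has to resend the call history itself. The sketch below builds the input items for the follow-up request, assuming the standard Responses item shapes (function_call with JSON-encoded arguments, function_call_output keyed by call_id). answerCalls and the handler map are hypothetical.

```typescript
interface FunctionCallItem {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
}

interface FunctionCallOutputItem {
  type: "function_call_output";
  call_id: string;
  output: string;
}

type Handler = (args: Record<string, unknown>) => string;

// For each function_call in the model output, echo the call back and append
// the matching function_call_output so the next request is self-contained.
function answerCalls(
  output: { type: string }[],
  handlers: Record<string, Handler>,
): (FunctionCallItem | FunctionCallOutputItem)[] {
  const next: (FunctionCallItem | FunctionCallOutputItem)[] = [];
  for (const item of output) {
    if (item.type !== "function_call") continue;
    const call = item as FunctionCallItem;
    const handler = handlers[call.name];
    const result = handler
      ? handler(JSON.parse(call.arguments))
      : `Unknown function: ${call.name}`;
    next.push(call, { type: "function_call_output", call_id: call.call_id, output: result });
  }
  return next;
}

// Hand-written example of a model output containing one function call.
const modelOutput = [
  { type: "function_call", call_id: "call_1", name: "get_weather", arguments: '{"location":"Brussels"}' },
];

const nextInput = answerCalls(modelOutput, {
  get_weather: (args) => `Sunny in ${String(args.location)}`,
});
console.log(JSON.stringify(nextInput, null, 2));
```

The resulting items are appended to the earlier input array and sent as the input of the next POST /v1/responses call.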

Authentication

All three compat endpoints accept your CanaryLLM API key via multiple headers:

Header         Format               Used By
Authorization  Bearer clk_live_...  OpenAI SDKs
x-api-key      clk_live_...         Anthropic SDKs, Claude Code
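For hand-rolled HTTP clients, the header choice can be wrapped in a small helper. This is a sketch only; authHeaders and the AuthStyle type are hypothetical names, and either style should work on any of the three endpoints according to the table above.

```typescript
type AuthStyle = "openai" | "anthropic";

// Build request headers in the style the calling client library would use.
function authHeaders(style: AuthStyle, apiKey: string): Record<string, string> {
  const base = { "Content-Type": "application/json" };
  return style === "openai"
    ? { ...base, Authorization: `Bearer ${apiKey}` }
    : { ...base, "x-api-key": apiKey };
}

console.log(authHeaders("openai", "clk_live_your_key_here"));
console.log(authHeaders("anthropic", "clk_live_your_key_here"));
```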