Compatible APIs

Drop-in compatible endpoints for OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses clients, including SDKs and tools like Claude Code, Codex, Cursor, and pi.dev.

Overview

CanaryLLM exposes three compatibility endpoints that let you use any supported provider through standard client libraries. All use the provider/model format to route requests to the right backend.

Endpoint	Protocol	Use Case
`POST /v1/chat/completions`	OpenAI Chat	OpenAI SDKs, LangChain, OpenCode, any OpenAI-compatible client
`POST /v1/messages`	Anthropic	Claude Code, Anthropic SDKs, any Anthropic-compatible client
`POST /v1/responses`	OpenAI Responses	Codex CLI, Cursor, OpenAI Responses SDK

Model Format

All three endpoints use provider/model as the model identifier. CanaryLLM routes each request to the specified provider.

text

anthropic/claude-sonnet-4-5
openai/gpt-4.1
gemini/gemini-2.5-flash
xai/grok-4-1-fast-non-reasoning
vertex/gemini-2.5-pro
perplexity/sonar-pro
lmstudio/qwen2.5-coder-32b
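
Because a model ID can contain more than one slash (for example `lmstudio/unsloth/qwen3.6-27b-mlx` in the pi.dev section), code that needs the provider prefix should split on the first slash only. The sketch below assumes CanaryLLM reads IDs the same way; `parseModelId` is a hypothetical helper, not part of any CanaryLLM SDK.

```typescript
// Hypothetical helper: split a CanaryLLM model ID into provider and model.
// Assumes the segment before the FIRST slash names the provider and
// everything after it is passed to that provider unchanged.
interface ParsedModelId {
  provider: string;
  model: string;
}

function parseModelId(id: string): ParsedModelId {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Three-segment IDs keep the provider's own slash in the model part.
console.log(parseModelId("lmstudio/unsloth/qwen3.6-27b-mlx").model); // unsloth/qwen3.6-27b-mlx
```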

Claude Code Integration

Use CanaryLLM as a drop-in replacement for the Anthropic API in Claude Code. This lets you route Claude Code requests through any provider supported by CanaryLLM.

Setup

Set two environment variables and start Claude Code:

bash

export ANTHROPIC_BASE_URL=https://canaryllm.canarycoders.es
export ANTHROPIC_AUTH_TOKEN=clk_live_your_key_here

# Use any provider through CanaryLLM
claude --model anthropic/claude-sonnet-4-5
claude --model openai/gpt-4.1
claude --model gemini/gemini-2.5-flash
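
If you prefer not to export the variables globally, you can scope them to a single Claude Code process. The sketch below builds that environment and starts `claude` with Node's `child_process.spawn`. The `buildClaudeEnv` and `launchClaude` names are hypothetical, and it assumes `claude` is on your `PATH` and that your key is stored in a `CANARYLLM_API_KEY` variable.

```typescript
import { spawn } from "node:child_process";

const CANARY_BASE_URL = "https://canaryllm.canarycoders.es";

type Env = Record<string, string | undefined>;

// Copy the given environment and point Claude Code at CanaryLLM.
function buildClaudeEnv(apiKey: string, base: Env = process.env): Env {
  return { ...base, ANTHROPIC_BASE_URL: CANARY_BASE_URL, ANTHROPIC_AUTH_TOKEN: apiKey };
}

// Start Claude Code with a provider/model ID; stdio is shared with this terminal.
function launchClaude(model: string, apiKey: string) {
  return spawn("claude", ["--model", model], {
    env: buildClaudeEnv(apiKey),
    stdio: "inherit",
  });
}

// Example (not run here):
// launchClaude("openai/gpt-4.1", process.env.CANARYLLM_API_KEY ?? "");
```

Only the child process sees the two variables, so other tools in the same shell keep talking to their default endpoints.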

VS Code / Cursor Settings

If you use Claude Code in VS Code or Cursor, add these to your settings:

json

{
  "claude-code.environmentVariables": {
    "ANTHROPIC_BASE_URL": "https://canaryllm.canarycoders.es",
    "ANTHROPIC_AUTH_TOKEN": "clk_live_your_key_here"
  }
}

Note

Claude Code is context-heavy. Use a model with a context window of at least 25K tokens for best results. When using local models via lmstudio/ or ollama/, make sure the model supports tool use (function calling).
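
The two requirements in this note can be checked before launching Claude Code. This is a minimal sketch: the `ModelInfo` shape (a context window in tokens and a tool-support flag) is a hypothetical description you fill in yourself, not a documented CanaryLLM response format.

```typescript
// Hypothetical description of a model; fill it in from your provider's docs.
interface ModelInfo {
  id: string; // provider/model ID
  contextWindow: number; // in tokens
  supportsTools: boolean; // tool use / function calling
}

const MIN_CONTEXT_TOKENS = 25_000; // threshold from the note above

// Return a list of problems; an empty list means the model looks usable.
function claudeCodeProblems(m: ModelInfo): string[] {
  const problems: string[] = [];
  if (m.contextWindow < MIN_CONTEXT_TOKENS) {
    problems.push(`${m.id}: context window ${m.contextWindow} < ${MIN_CONTEXT_TOKENS}`);
  }
  if (!m.supportsTools) {
    problems.push(`${m.id}: no tool use support`);
  }
  return problems;
}
```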

pi.dev Integration

Pi Coding Agent (@earendil-works/pi-coding-agent) is a terminal coding agent. Register CanaryLLM as a custom OpenAI-compatible provider in Pi's models.json to route Pi requests through any provider supported by CanaryLLM.

Setup

Set the API key as an environment variable:

bash

export CANARYLLM_API_KEY=clk_live_your_key_here

Then add a canary provider to ~/.pi/agent/models.json:

json

{
  "providers": {
    "canary": {
      "api": "openai-completions",
      "apiKey": "CANARYLLM_API_KEY",
      "baseUrl": "https://canaryllm.canarycoders.es/v1",
      "models": [
        {
          "id": "gemini/gemini-3.5-flash",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.000001875, "output": 0.00001125, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-3.1-pro-preview",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.0000025, "output": 0.000015, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-3-flash-preview",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.000000625, "output": 0.00000375, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-3.1-flash-lite",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.0000003125, "output": 0.000001875, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-2.5-pro",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.000001563, "output": 0.0000125, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-2.5-flash",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.000000375, "output": 0.000003125, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "gemini/gemini-2.5-flash-lite",
          "contextWindow": 1048576,
          "maxTokens": 65536,
          "input": ["text", "image"],
          "reasoning": false,
          "cost": { "input": 0.000000125, "output": 0.0000005, "cacheRead": 0, "cacheWrite": 0 }
        },
        {
          "id": "lmstudio/unsloth/qwen3.6-27b-mlx",
          "contextWindow": 32768,
          "maxTokens": 8192,
          "input": ["text"],
          "reasoning": false,
          "cost": { "input": 0.000001, "output": 0.000001, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}

The apiKey field accepts either a literal key or the name of an environment variable, which Pi resolves at runtime. Prices include CanaryLLM's markup and reflect the rates at the time of writing; query /api/public/models for the live, authoritative values.
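
The sketch below illustrates both points. `resolveApiKey` mimics the literal-or-environment-variable behaviour described above; it is an assumption about how such resolution typically works, not Pi's actual code. `estimateCostUsd` assumes the `cost` fields are USD per token, which fits the magnitude of the values in the example config; confirm the units against `/api/public/models` before relying on it.

```typescript
interface ModelCost {
  input: number; // assumed USD per input token
  output: number; // assumed USD per output token
  cacheRead: number;
  cacheWrite: number;
}

// Use the value as an environment variable name if such a variable is set;
// otherwise treat it as the literal key (assumed behaviour, for illustration).
function resolveApiKey(
  value: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const fromEnv = env[value];
  return fromEnv !== undefined && fromEnv !== "" ? fromEnv : value;
}

// Estimate the cost of one request from token counts (cache costs ignored).
function estimateCostUsd(cost: ModelCost, inputTokens: number, outputTokens: number): number {
  return inputTokens * cost.input + outputTokens * cost.output;
}

// gemini/gemini-2.5-pro rates from the example config above.
const gemini25Pro: ModelCost = { input: 0.000001563, output: 0.0000125, cacheRead: 0, cacheWrite: 0 };
console.log(estimateCostUsd(gemini25Pro, 10_000, 2_000).toFixed(5)); // 0.04063
```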

Usage

bash

# List the canary provider's models
pi --list-models canary

# One-shot call
pi -p --model canary/gemini/gemini-2.5-pro "Refactor this function."

# Three-segment model IDs work too — pi splits on the first slash only
pi --model canary/lmstudio/unsloth/qwen3.6-27b-mlx

Note

Pi talks to CanaryLLM through the OpenAI Chat Completions endpoint, where the reasoning flag only controls whether Pi exposes its thinking-level UI. Keep it false for all CanaryLLM models: provider-side reasoning (Gemini thinking, Claude thinking, etc.) is handled inside CanaryLLM, not as a separate mode in Pi.

Anthropic Messages API

POST /v1/messages — Full Anthropic Messages API compatibility with streaming and tool use.

Non-streaming

bash

curl -X POST https://canaryllm.canarycoders.es/v1/messages \
  -H "x-api-key: clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "What is CanaryLLM?" }
    ]
  }'
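
The same non-streaming request can be sent from TypeScript with the built-in `fetch` (Node.js 18 or later). This sketch assumes your key is in `CANARYLLM_API_KEY` and that the response follows the Anthropic Messages shape, where `content` is an array of blocks and text lives in blocks of `type: "text"`. `buildMessagesRequest`, `extractText`, and `ask` are illustrative names.

```typescript
const MESSAGES_URL = "https://canaryllm.canarycoders.es/v1/messages";

interface ContentBlock {
  type: string;
  text?: string;
}

// Build the request exactly as in the curl example above.
function buildMessagesRequest(apiKey: string, model: string, prompt: string) {
  return {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "content-type": "application/json",
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model,
      max_tokens: 1024,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

// Concatenate the text blocks of a Messages API response.
function extractText(response: { content: ContentBlock[] }): string {
  return response.content
    .filter((b) => b.type === "text")
    .map((b) => b.text ?? "")
    .join("");
}

async function ask(prompt: string): Promise<string> {
  const apiKey = process.env.CANARYLLM_API_KEY ?? "";
  const res = await fetch(MESSAGES_URL, buildMessagesRequest(apiKey, "anthropic/claude-sonnet-4-5", prompt));
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  return extractText((await res.json()) as { content: ContentBlock[] });
}

// Example (not run here): ask("What is CanaryLLM?").then(console.log);
```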

Streaming

bash

curl -X POST https://canaryllm.canarycoders.es/v1/messages \
  -H "x-api-key: clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku" }
    ]
  }'

Streaming uses named SSE events: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop.
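
A minimal sketch of consuming that stream: split the body into SSE events on blank lines, read each `event:`/`data:` pair, and append the `text` of every `content_block_delta` whose `delta.type` is `text_delta`. For clarity it works on an already-buffered string; a real client would parse incrementally as bytes arrive. The function names are illustrative.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Split a buffered SSE body into events (events are blank-line separated).
function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const chunk of body.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message";
    const data: string[] = [];
    for (const line of chunk.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Collect streamed text from Anthropic-style named events.
function collectAnthropicText(body: string): string {
  let text = "";
  for (const { event, data } of parseSse(body)) {
    if (event !== "content_block_delta") continue;
    const payload = JSON.parse(data);
    if (payload.delta?.type === "text_delta") text += payload.delta.text;
  }
  return text;
}
```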

With Tools

bash

curl -X POST https://canaryllm.canarycoders.es/v1/messages \
  -H "x-api-key: clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [{
      "name": "get_weather",
      "description": "Get weather for a location",
      "input_schema": {
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
      }
    }],
    "messages": [
      { "role": "user", "content": "What is the weather in Brussels?" }
    ]
  }'
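
When the model decides to call `get_weather`, the response contains a `tool_use` content block. Following the Anthropic Messages protocol, you run the tool yourself and send its output back in a `tool_result` block that references the `tool_use` block's `id`. A minimal sketch; `lookupWeather` is a hypothetical stand-in for your own tool implementation.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };

interface Message {
  role: "user" | "assistant";
  content: string | unknown[];
}

// Hypothetical tool implementation.
function lookupWeather(location: string): string {
  return `Sunny in ${location}`;
}

// Append the assistant turn and a user turn with one tool_result per tool_use.
function toolFollowUp(history: Message[], assistantContent: Block[]): Message[] {
  const results = assistantContent
    .filter((b): b is Extract<Block, { type: "tool_use" }> => b.type === "tool_use")
    .map((b) => {
      const known = b.name === "get_weather";
      return {
        type: "tool_result",
        tool_use_id: b.id,
        content: known ? lookupWeather(String(b.input.location)) : `Unknown tool: ${b.name}`,
        is_error: !known,
      };
    });
  return [
    ...history,
    { role: "assistant", content: assistantContent },
    { role: "user", content: results },
  ];
}
```

Send the returned messages (with the same `tools` array) in the next `/v1/messages` request so the model can use the tool output in its answer.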

Parameters

Parameter	Type	Required
`model`	"provider/model"	Yes
`messages`	Message[]	Yes
`max_tokens`	integer	Yes
`system`	string | TextBlock[]	No
`stream`	boolean	No
`temperature`	number (0-1)	No
`top_p`	number (0-1)	No
`stop_sequences`	string[]	No
`tools`	Tool[]	No
`tool_choice`	{type: "auto"|"any"|"tool"}	No

Error Format

Errors follow the Anthropic error format:

json

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens is required"
  }
}
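
To fail fast before a round trip, a client can check the required fields from the parameter table locally and report problems in the same error shape. This is a hypothetical client-side sketch; apart from the `max_tokens is required` message shown above, its rules and messages are illustrative and do not reproduce CanaryLLM's own validation.

```typescript
interface AnthropicError {
  type: "error";
  error: { type: "invalid_request_error"; message: string };
}

interface MessagesParams {
  model?: string;
  messages?: unknown[];
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
}

function invalid(message: string): AnthropicError {
  return { type: "error", error: { type: "invalid_request_error", message } };
}

// Return null if the request looks valid, otherwise an Anthropic-style error.
function validateMessagesParams(p: MessagesParams): AnthropicError | null {
  if (!p.model || !p.model.includes("/")) return invalid('model must use the "provider/model" format');
  if (!Array.isArray(p.messages) || p.messages.length === 0) return invalid("messages is required");
  if (p.max_tokens === undefined) return invalid("max_tokens is required");
  if (!Number.isInteger(p.max_tokens) || p.max_tokens < 1) return invalid("max_tokens must be a positive integer");
  for (const key of ["temperature", "top_p"] as const) {
    const v = p[key];
    if (v !== undefined && (v < 0 || v > 1)) return invalid(`${key} must be between 0 and 1`);
  }
  return null;
}
```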

OpenAI Chat Completions

POST /v1/chat/completions — Standard OpenAI Chat Completions API format.

Non-streaming

bash

curl -X POST https://canaryllm.canarycoders.es/v1/chat/completions \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1",
    "messages": [
      { "role": "user", "content": "What is CanaryLLM?" }
    ]
  }'

Streaming

bash

curl -X POST https://canaryllm.canarycoders.es/v1/chat/completions \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku" }
    ]
  }'

Streaming uses standard SSE with data: lines, terminated by data: [DONE].
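
A minimal sketch of reading that stream from a buffered body: each `data:` line carries a JSON chunk whose `choices[0].delta.content` holds the next piece of text, and `[DONE]` ends the stream. The function name is illustrative; with the OpenAI SDK you would instead pass `stream: true` and iterate the result with `for await`.

```typescript
// Accumulate assistant text from an OpenAI-style Chat Completions SSE body.
function collectChatText(body: string): string {
  let text = "";
  for (const rawLine of body.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const data = line.slice(5).trim();
    if (data === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(data);
    const piece = chunk.choices?.[0]?.delta?.content;
    if (typeof piece === "string") text += piece;
  }
  return text;
}
```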

OpenAI SDK

typescript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://canaryllm.canarycoders.es/v1",
  apiKey: "clk_live_your_key_here",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

Parameters

Parameter	Type	Required
`model`	"provider/model"	Yes
`messages`	Message[]	Yes
`stream`	boolean	No
`temperature`	number (0-2)	No
`max_tokens`	integer	No
`top_p`	number (0-1)	No
`stop`	string | string[]	No
`tools`	Tool[]	No
`tool_choice`	string \| object	No
`response_format`	{type: "text"|"json_object"}	No

OpenAI Responses

POST /v1/responses — OpenAI Responses API compatibility. Used by Codex CLI 0.125+ (which dropped Chat Completions support), Cursor, and the OpenAI Responses SDK.

Non-streaming

bash

curl -X POST https://canaryllm.canarycoders.es/v1/responses \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5",
    "input": "What is CanaryLLM?"
  }'
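
In the OpenAI Responses API, a plain-string `input` is shorthand for a single user message; the streaming example uses the expanded item form instead. The sketch below normalizes either form into an item array, assuming CanaryLLM applies the same equivalence. `normalizeInput` and the `InputItem` type are illustrative and cover only text messages.

```typescript
interface InputItem {
  type: "message";
  role: "user" | "system" | "developer";
  content: Array<{ type: "input_text"; text: string }>;
}

// Expand the string shorthand into the explicit message-item form.
function normalizeInput(input: string | InputItem[]): InputItem[] {
  if (typeof input !== "string") return input;
  return [{ type: "message", role: "user", content: [{ type: "input_text", text: input }] }];
}
```

For example, `normalizeInput("What is CanaryLLM?")` yields one `message` item with role `user` and a single `input_text` part.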

Streaming

bash

curl -X POST https://canaryllm.canarycoders.es/v1/responses \
  -H "Authorization: Bearer clk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5",
    "stream": true,
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [{ "type": "input_text", "text": "Write a haiku" }]
      }
    ],
    "instructions": "Be concise."
  }'

Streaming uses the canonical Responses event sequence: response.created → response.output_item.added → response.output_text.delta (×n) → response.output_text.done → response.completed.
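
A minimal sketch of consuming that sequence from a buffered body: append the `delta` string of each `response.output_text.delta` event and stop at `response.completed`. It reads the event type from the `type` field of each JSON payload, which Responses events carry in addition to the SSE `event:` line. The function name is illustrative.

```typescript
// Accumulate output text from an OpenAI Responses-style SSE body.
function collectResponsesText(body: string): string {
  let text = "";
  for (const rawLine of body.split(/\r?\n/)) {
    if (!rawLine.startsWith("data:")) continue; // event: lines repeat the type
    const payload = JSON.parse(rawLine.slice(5).trim());
    if (payload.type === "response.output_text.delta") text += payload.delta;
    if (payload.type === "response.completed") break;
  }
  return text;
}
```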

Codex CLI via fish wrapper

bash

# Set CANARYLLM_API_KEY in your shell, then:
canaryllm-codex exec 'Refactor this function for clarity.'

# Override model:
CANARYLLM_MODEL=anthropic/claude-sonnet-4-6 canaryllm-codex

The wrapper invokes codex with inline -c overrides, so no edits to ~/.codex/config.toml are needed.
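
The wrapper's source is not shown in this section. As a rough illustration of the same inline-override idea, the sketch below assembles a `codex exec` argument list whose `-c key=value` flags define a Responses-based provider for one run, with each value passed as a quoted TOML string. The provider ID `canary` and the `buildCodexArgs` helper are hypothetical; the config keys (`model_provider`, `model_providers.<id>.base_url`, `env_key`, `wire_api`) are assumed to match Codex's `config.toml` provider fields, and it assumes `codex exec` accepts `-c` overrides, so verify both against your installed Codex version.

```typescript
// Build codex arguments that route one run through CanaryLLM's /v1/responses.
// "canary" is a hypothetical provider ID; CANARYLLM_API_KEY must be set.
function buildCodexArgs(model: string, prompt: string): string[] {
  const overrides: Record<string, string> = {
    model: model,
    model_provider: "canary",
    "model_providers.canary.name": "CanaryLLM",
    "model_providers.canary.base_url": "https://canaryllm.canarycoders.es/v1",
    "model_providers.canary.env_key": "CANARYLLM_API_KEY",
    "model_providers.canary.wire_api": "responses",
  };
  const args: string[] = ["exec"];
  for (const [key, value] of Object.entries(overrides)) {
    args.push("-c", `${key}=${JSON.stringify(value)}`); // quoted TOML string
  }
  args.push(prompt);
  return args;
}
```

The resulting array can be handed to Node's `spawn("codex", args, { stdio: "inherit" })`.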

Parameters

Parameter	Type	Required
`model`	"provider/model"	Yes
`input`	string | InputItem[]	Yes
`instructions`	string	No
`stream`	boolean	No
`temperature`	number (0-2)	No
`top_p`	number (0-1)	No
`max_output_tokens`	integer	No
`tools`	FunctionTool[]	No
`tool_choice`	"auto" | "required" | "none" | {type:"function",name}	No
`parallel_tool_calls`	boolean	No
`metadata`	Record<string,string>	No

Built-in tools (web_search, file_search, computer_use, image_generation) and previous_response_id state are accepted but not yet executed; they pass through silently. Reasoning items in input are tolerated and ignored. Function tools and function_call/function_call_output round-trips work end-to-end.
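
A function-tool round trip follows the Responses protocol: the model returns `function_call` output items carrying a `call_id`, a `name`, and JSON-encoded `arguments`; you run each function and send back `function_call_output` items with the matching `call_id`. Because `previous_response_id` is not yet applied, this sketch resends the earlier input plus the model's `function_call` items in full. `runTool` is a hypothetical dispatcher for your own functions.

```typescript
interface FunctionCallItem {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded
}

interface FunctionCallOutputItem {
  type: "function_call_output";
  call_id: string;
  output: string;
}

// Hypothetical dispatcher for your own function tools.
function runTool(name: string, args: Record<string, unknown>): string {
  if (name === "get_weather") return JSON.stringify({ location: args.location, forecast: "sunny" });
  return JSON.stringify({ error: `unknown tool ${name}` });
}

// Next request's input: prior input + the model's calls + their outputs.
function nextInput(previousInput: unknown[], output: Array<{ type: string }>): unknown[] {
  const calls = output.filter((o): o is FunctionCallItem => o.type === "function_call");
  const results: FunctionCallOutputItem[] = calls.map((c) => ({
    type: "function_call_output",
    call_id: c.call_id,
    output: runTool(c.name, JSON.parse(c.arguments)),
  }));
  return [...previousInput, ...calls, ...results];
}
```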

Authentication

All three compatibility endpoints accept your CanaryLLM API key in any of the following headers:

Header	Format	Used By
`Authorization`	Bearer clk_live_...	OpenAI SDKs, Claude Code with ANTHROPIC_AUTH_TOKEN
`x-api-key`	clk_live_...	Anthropic SDKs, Claude Code with ANTHROPIC_API_KEY