API Reference

Complete reference for all CanaryLLM endpoints.

OpenAPI Spec (YAML): import it into Postman or Swagger UI, or use it for SDK generation.

Completions

POST /api/llm/complete

Submit an LLM completion request to the queue.

Request Body

json

{
  "provider": "openai",
  "model": "gpt-4.1-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Explain quantum computing in one sentence."
    }
  ],
  "temperature": 0.7,
  "maxTokens": 1024,
  "stream": false,
  "responseFormat": "text",
  "tag": "customer-support"
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "abc123-def456",
    "status": "queued",
    "message": "Task added to queue for processing"
  }
}
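The following is a minimal Python sketch that builds and submits the request above using only the standard library. Several details are assumptions: the `CANARY_BASE_URL` environment variable, its localhost default, and the Bearer header (borrowed from the curl example under Video Upload). Check your deployment for the actual host and auth scheme. `build_completion_body` and `post_json` are hypothetical helpers, not part of an official SDK.

```python
import json
import os
import urllib.request

# Assumption: the host is not specified in this reference; override via env var.
BASE_URL = os.environ.get("CANARY_BASE_URL", "http://localhost:3000")


def build_completion_body(provider, model, messages, **options):
    """Build a /api/llm/complete body; options set to None are left out."""
    body = {"provider": provider, "model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body


def post_json(path, body, api_key):
    """POST a JSON body and return (HTTP status, decoded JSON response)."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


body = build_completion_body(
    "openai",
    "gpt-4.1-mini",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in one sentence."},
    ],
    temperature=0.7,
    maxTokens=1024,
    tag="customer-support",
)
# Against a running server:
# status, payload = post_json("/api/llm/complete", body, os.environ["CANARY_API_KEY"])
# queue_id = payload["data"]["queueId"]
```

`post_json` returns the HTTP status alongside the payload because the queue endpoints use status codes such as 202 to signal progress. Error responses (4xx/5xx) surface as `urllib.error.HTTPError`.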

Queue Management

POST /api/llm/queue/status

Check the status of a queued task.

Request Body

json

{ "queueId": "abc123-def456" }

Response

json

{
  "success": true,
  "data": {
    "queueId": "abc123-def456",
    "status": "processing",
    "position": 0,
    "createdAt": "2025-01-15T10:00:00Z",
    "startedAt": "2025-01-15T10:00:01Z"
  }
}

POST /api/llm/queue/result

Retrieve the result of a completed task. Returns HTTP 202 while the task is still processing.

Request Body

json

{ "queueId": "abc123-def456" }

Response

json

{
  "success": true,
  "data": {
    "queueId": "abc123-def456",
    "status": "completed",
    "result": {
      "content": "Quantum computing uses quantum bits...",
      "usage": {
        "inputTokens": 25,
        "outputTokens": 42,
        "totalTokens": 67
      },
      "model": "gpt-4.1-mini",
      "provider": "openai",
      "requestId": "req_789",
      "finishReason": "stop"
    }
  }
}
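The result endpoint answers HTTP 202 until the task is done, so a client usually polls it with a growing delay. The sketch below rests on a few assumptions. `fetch(path, body)` is any function returning `(http_status, json_payload)`; it is a hypothetical hook, not a CanaryLLM API. The starting interval, cap, and timeout are arbitrary. Any response other than 200 or 202 is treated as an error.

```python
import time


class TaskNotFinished(Exception):
    """Raised when polling gives up before the task completes."""


def wait_for_result(queue_id, fetch, interval=1.0, max_interval=10.0,
                    timeout=300.0, sleep=time.sleep):
    """Poll /api/llm/queue/result until it stops answering HTTP 202.

    Returns the response's `data` object (queueId, status, result).
    `fetch(path, body)` is a caller-supplied HTTP hook (hypothetical).
    """
    deadline = time.monotonic() + timeout
    while True:
        status, payload = fetch("/api/llm/queue/result", {"queueId": queue_id})
        if status == 200 and payload.get("success"):
            return payload["data"]
        if status != 202:
            raise RuntimeError(f"unexpected response {status}: {payload}")
        if time.monotonic() + interval > deadline:
            raise TaskNotFinished(queue_id)
        sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped
```

Injecting `sleep` and `fetch` keeps the loop independent of any particular HTTP client and easy to test without a server.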

POST /api/llm/queue/stream

Stream the result of a queued task via Server-Sent Events (SSE).

Request Body

json

{ "queueId": "abc123-def456" }

Response

text

event: start
data: {"queueId":"abc123-def456"}

event: chunk
data: {"delta":"Quantum ","finishReason":null}

event: chunk
data: {"delta":"computing ","finishReason":null}

event: done
data: {}

POST /api/llm/queue/cancel

Cancel a queued or processing task.

Request Body

json

{ "queueId": "abc123-def456" }

Response

json

{
  "success": true,
  "data": { "queueId": "abc123-def456", "status": "cancelled" },
  "message": "Task cancelled successfully"
}

Media Generation

POST /api/llm/generate-image

Queue an image generation request.

Request Body

json

{
  "provider": "openai",
  "prompt": "A sunset over mountains",
  "model": "gpt-image-1",
  "n": 1,
  "size": "1024x1024",
  "quality": "hd",
  "tag": "marketing"
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "img_abc123",
    "status": "queued",
    "message": "Image generation task added to queue"
  }
}

POST /api/llm/generate-video

Queue a video generation request. Available for Gemini, Vertex, and xAI.

Request Body

json

{
  "provider": "gemini",
  "prompt": "A timelapse of clouds moving over a city",
  "model": "veo-3.1-generate-preview",
  "aspectRatio": "16:9",
  "durationSeconds": 8,
  "tag": "content-creation"
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "vid_abc123",
    "status": "queued",
    "message": "Video generation task added to queue"
  }
}

POST /api/llm/generate-sound-effect

Queue a sound effect generation request. Powered by ElevenLabs.

Request Body

json

{
  "text": "thunder rolling in the distance, rain on a tin roof",
  "model": "eleven_text_to_sound_v2",
  "durationSeconds": 10,
  "promptInfluence": 0.5,
  "tag": "ambient"
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "Sound effect task added to queue"
  }
}

POST /api/llm/generate-music

Queue a music generation request. Powered by ElevenLabs.

Request Body

json

{
  "prompt": "upbeat jazz jingle, bright piano and saxophone",
  "model": "music_v1",
  "durationMs": 10000,
  "forceInstrumental": true,
  "tag": "jingle"
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "Music generation task added to queue"
  }
}

Video Upload

POST /api/llm/upload-video

Upload a video file for use in chat completions. Returns a fileId that stays valid for 1 hour. The maximum file size is 100 MB. Send the file as multipart/form-data in a "video" field.

Request Body

shell

# multipart/form-data
# Field: "video" (file)
# Accepted types: video/mp4, video/webm, video/mov, video/quicktime, video/mpeg, video/avi

curl -X POST "<base-url>/api/llm/upload-video" \
  -H "Authorization: Bearer <api-key>" \
  -F "video=@recording.mp4"

Response

json

{
  "success": true,
  "data": {
    "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "mimeType": "video/mp4",
    "sizeBytes": 15728640,
    "expiresAt": "2026-02-13T15:00:00.000Z"
  }
}
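Python's standard library has no multipart encoder, so the sketch below builds a single-file multipart/form-data body by hand and posts it to the upload endpoint. It makes three assumptions. The base URL and Bearer header follow the curl example. "100 MB" is read as 100 * 1024 * 1024 bytes. `video/mp4` is the fallback type when the extension is unrecognized. The server remains the authority on size and type limits.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Assumption: 100 MB interpreted as binary megabytes; the server enforces the real limit.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def encode_multipart(field, filename, content, mime_type):
    """Return (body bytes, Content-Type header) for a one-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_video(path, base_url, api_key):
    """Upload a local video and return the response's data object (fileId, expiresAt, ...)."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes; the endpoint accepts at most 100 MB")
    mime_type = mimetypes.guess_type(path)[0] or "video/mp4"  # assumed fallback
    with open(path, "rb") as f:
        body, content_type = encode_multipart("video", os.path.basename(path), f.read(), mime_type)
    req = urllib.request.Request(
        base_url + "/api/llm/upload-video",
        data=body,
        headers={"Content-Type": content_type, "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["data"]
```

Reading the whole file into memory keeps the encoder simple. That is tolerable up to the 100 MB cap, but a streaming encoder would be preferable for memory-constrained clients.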

Using Video in Completions

To include a video, pass the message content as an array of parts. Use inline base64 for small videos, or a fileId from the upload endpoint for larger files. Video input is supported by the Gemini and Vertex providers.

POST /api/llm/complete

Completion with video input (via fileId).

Request Body

json

{
  "provider": "gemini",
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe what happens in this video." },
        { "type": "video", "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "mimeType": "video/mp4" }
      ]
    }
  ]
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "vid_input_abc123",
    "status": "queued",
    "message": "Task added to queue for processing"
  }
}

POST /api/llm/complete

Completion with inline video (small files, base64).

Request Body

json

{
  "provider": "gemini",
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this clip?" },
        { "type": "video", "data": "<base64_encoded_video>", "mimeType": "video/mp4" }
      ]
    }
  ]
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "vid_input_def456",
    "status": "queued",
    "message": "Task added to queue for processing"
  }
}
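Choosing between the two forms can be automated: inline small files as base64 and fall back to a fileId otherwise. In this sketch, the 10 MB cutoff is a hypothetical placeholder, since this reference does not state an inline limit. `upload` is a caller-supplied function that returns a fileId, for example one that wraps a POST to /api/llm/upload-video.

```python
import base64

# Hypothetical threshold: this reference does not state an inline size limit.
INLINE_LIMIT_BYTES = 10 * 1024 * 1024


def video_part(content, mime_type, upload=None):
    """Build a video content part, inlining small payloads as base64.

    `upload(content)` is a caller-supplied function returning a fileId.
    """
    if len(content) <= INLINE_LIMIT_BYTES:
        encoded = base64.b64encode(content).decode("ascii")
        return {"type": "video", "data": encoded, "mimeType": mime_type}
    if upload is None:
        raise ValueError("file too large to inline; pass an upload function to get a fileId")
    return {"type": "video", "fileId": upload(content), "mimeType": mime_type}


def video_message(prompt, part):
    """Wrap a text prompt and a video part into one user message."""
    return {"role": "user", "content": [{"type": "text", "text": prompt}, part]}
```

Base64 encoding grows the payload by about a third, which is one reason to reserve inline videos for small clips.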

Audio (TTS / STT)

POST /api/llm/generate-audio

Queue a text-to-speech request. Supports ElevenLabs and MLX Audio (Local).

Request Body

json

// ElevenLabs (basic)
{
  "provider": "elevenlabs",
  "text": "Welcome to your interview.",
  "model": "eleven_flash_v2_5",
  "voiceId": "JBFqnCBsd6RMkjVDRZzb",
  "outputFormat": "mp3_44100_128",
  "tag": "interview"
}

// ElevenLabs (with voice settings)
{
  "provider": "elevenlabs",
  "text": "Welcome to your interview.",
  "model": "eleven_multilingual_v2",
  "voiceId": "JBFqnCBsd6RMkjVDRZzb",
  "outputFormat": "mp3_44100_192",
  "voiceSettings": {
    "stability": 0.5,
    "similarityBoost": 0.75,
    "style": 0.3,
    "useSpeakerBoost": true
  },
  "languageCode": "nl",
  "applyTextNormalization": "auto",
  "previousRequestIds": ["req_abc123"],
  "tag": "interview"
}

// MLX Audio (Local, free)
{
  "provider": "mlxaudio",
  "text": "Welkom bij het interview.",
  "model": "kokoro",
  "voiceId": "af_heart",
  "tag": "interview"
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "TTS task added to queue"
  }
}

POST /api/llm/transcribe

Queue a speech-to-text transcription request. Supports ElevenLabs Scribe and MLX Audio (Local).

Request Body

json

// ElevenLabs with diarization
{
  "provider": "elevenlabs",
  "audio": "<base64_encoded_audio>",
  "mimeType": "audio/mpeg",
  "model": "scribe_v2",
  "language": "nl",
  "tag": "interview",
  "diarize": true,
  "numSpeakers": 2,
  "timestampsGranularity": "word",
  "tagAudioEvents": true
}

// ElevenLabs (basic, no diarization)
{
  "provider": "elevenlabs",
  "audio": "<base64_encoded_audio>",
  "mimeType": "audio/mpeg",
  "model": "scribe_v2"
}

// MLX Audio (Local, free)
{
  "provider": "mlxaudio",
  "audio": "<base64_encoded_audio>",
  "mimeType": "audio/wav",
  "model": "whisper-large-v3",
  "language": "nl",
  "tag": "transcription"
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "STT task added to queue"
  }
}

// Completed task result (with diarization):
{
  "text": "Hello, how are you? I'm fine, thanks.",
  "language": "en",
  "model": "scribe_v2",
  "provider": "elevenlabs",
  "requestId": "req_abc123",
  "words": [
    { "text": "Hello,", "start": 0.08, "end": 0.54, "type": "word", "speakerId": "speaker_0" },
    { "text": "how", "start": 0.56, "end": 0.72, "type": "word", "speakerId": "speaker_0" },
    { "text": "I'm", "start": 1.2, "end": 1.4, "type": "word", "speakerId": "speaker_1" }
  ]
}
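With diarization enabled, each word carries a speakerId, so consecutive words can be merged into speaker turns for display. The sketch below assumes the word fields shown above. Entries whose type is not "word" are skipped; this is an assumption, since only "word" appears in the example.

```python
def speaker_turns(words):
    """Merge consecutive diarized words from the same speaker into turns.

    Assumes each item has text/start/end/type/speakerId as in the example;
    items whose type is not "word" are skipped.
    """
    turns = []
    for w in words:
        if w.get("type") != "word":
            continue
        speaker = w.get("speakerId")
        if turns and turns[-1]["speakerId"] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + w["text"]
            turns[-1]["end"] = w["end"]
        else:
            turns.append({"speakerId": speaker, "text": w["text"],
                          "start": w["start"], "end": w["end"]})
    return turns
```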

Embeddings

POST /api/llm/embeddings

Queue an embeddings request. Embeds a single string or an array of up to 2048 strings using a local model served by LM Studio. Content is processed transiently and never stored. Poll /api/llm/queue/result to retrieve the vectors.

Request Body

json

{
  "provider": "lmstudio",
  "model": "nomic-embed-text-v1.5",
  "input": ["first chunk of text", "second chunk of text"],
  "dimensions": 768,
  "tag": "kb:contracts"
}

Response

json

{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "Embedding task added to queue"
  }
}

The queue result contains { embeddings: number[][], model, provider, dimensions, usage }. For a synchronous, OpenAI-compatible call, use POST /v1/embeddings with a provider/model id. See the Embeddings guide for ingestion and retrieval patterns.
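The 2048-string cap means a larger corpus has to be split across several requests. Once the vectors come back, a brute-force cosine-similarity search is the simplest retrieval step. This sketch assumes `embeddings[i]` corresponds to `input[i]`, which this reference does not state explicitly.

```python
import math

MAX_INPUTS_PER_REQUEST = 2048  # per-request limit stated above


def batches(texts, size=MAX_INPUTS_PER_REQUEST):
    """Split a list of strings into request-sized batches."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```

A linear scan is fine for small collections; larger ones usually move to a vector index.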

Agents

POST /api/agents/signed-url

Generate a signed URL for a conversational AI agent.

Request Body

json

{ "agentId": "your-agent-id" }

Response

json

{
  "success": true,
  "data": {
    "signedUrl": "wss://...",
    "expiresIn": 900
  }
}

Discovery

GET /api/llm/providers

List all available providers.

Response

json

{
  "success": true,
  "data": {
    "providers": ["gemini", "vertex", "openai", "anthropic", "xai", "perplexity", "lmstudio", "elevenlabs", "mlxaudio", "ollama"]
  }
}

GET /api/llm/models?provider=openai

List all models for a specific provider.

Response

json

{
  "success": true,
  "data": {
    "provider": "openai",
    "models": [
      {
        "id": "gpt-4.1",
        "name": "GPT-4.1",
        "contextWindow": 1047576,
        "maxOutputTokens": 32768,
        "inputCostPer1k": 0.002,
        "outputCostPer1k": 0.008,
        "capabilities": ["chat"]
      }
    ]
  }
}
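The per-1k prices combine with the `usage` block of a completed task to give a rough per-request cost estimate. Suppose the 25 input and 42 output tokens from the earlier result were billed at the gpt-4.1 prices above. The estimate is then 25/1000 × 0.002 + 42/1000 × 0.008 = 0.00005 + 0.000336 = 0.000386, in whatever currency units the price fields use. The sketch assumes purely linear per-token pricing; actual billing may differ.

```python
def estimate_cost(model, usage):
    """Estimate one request's cost from a /api/llm/models entry and a result's usage block.

    Assumes linear pricing: tokens / 1000 * price per 1k, for input and output separately.
    """
    input_cost = usage["inputTokens"] / 1000 * model["inputCostPer1k"]
    output_cost = usage["outputTokens"] / 1000 * model["outputCostPer1k"]
    return input_cost + output_cost
```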

GET /api/llm/voices?provider=elevenlabs

List available voices with preview URLs.

Response

json

{
  "success": true,
  "data": {
    "provider": "elevenlabs",
    "voices": [
      {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb",
        "name": "George",
        "category": "premade",
        "preview_url": "https://storage.googleapis.com/...",
        "labels": { "accent": "British", "age": "middle-aged", "gender": "male" }
      }
    ]
  }
}

GET /api/llm/capabilities

List capabilities (image/video/audio) per provider.

Response

json

{
  "success": true,
  "data": {
    "openai": {
      "imageGeneration": true,
      "videoGeneration": false,
      "textToSpeech": false,
      "speechToText": false,
      "models": [
        { "id": "gpt-image-1", "name": "GPT Image 1", "capabilities": ["image-generation"] }
      ]
    },
    "elevenlabs": {
      "imageGeneration": false,
      "videoGeneration": false,
      "textToSpeech": true,
      "speechToText": true,
      "soundEffects": true,
      "musicGeneration": true,
      "models": [
        { "id": "eleven_flash_v2_5", "capabilities": ["text-to-speech"] },
        { "id": "scribe_v1", "capabilities": ["speech-to-text"] },
        { "id": "eleven_text_to_sound_v2", "capabilities": ["sound-effect"] },
        { "id": "music_v1", "capabilities": ["music-generation"] }
      ]
    },
    "mlxaudio": {
      "imageGeneration": false,
      "videoGeneration": false,
      "textToSpeech": true,
      "speechToText": true,
      "models": [
        { "id": "kokoro", "capabilities": ["text-to-speech"] },
        { "id": "whisper-large-v3", "capabilities": ["speech-to-text"] }
      ]
    }
  }
}
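The capability map answers routing questions, such as which providers can do text-to-speech or which models a provider offers for a given task. The sketch below takes the response's `data` object. It treats a missing flag (for example `soundEffects` on openai) as false, which is an assumption.

```python
def providers_with(capabilities, flag):
    """Return provider names whose capability map sets `flag` (e.g. "textToSpeech") to true."""
    return sorted(name for name, caps in capabilities.items() if caps.get(flag))


def models_for(capabilities, provider, capability):
    """Return a provider's model ids that list `capability` (e.g. "speech-to-text")."""
    return [m["id"] for m in capabilities[provider].get("models", [])
            if capability in m.get("capabilities", [])]
```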

Concurrency

GET /api/llm/concurrency

View concurrency limits and active requests for all providers.

Response

json

{
  "success": true,
  "data": {
    "openai": { "activeRequests": 2, "queuedRequests": 5, "limit": 10 },
    "gemini": { "activeRequests": 0, "queuedRequests": 0, "limit": 15 }
  }
}

GET /api/llm/concurrency/:provider

View concurrency status for a specific provider.

Response

json

{
  "success": true,
  "data": {
    "provider": "openai",
    "activeRequests": 2,
    "queuedRequests": 5,
    "limit": 10
  }
}
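When more than one provider can serve a request, these numbers support simple client-side routing. The heuristic in the sketch below is our own and not part of the API: prefer the most free slots (limit minus active requests), then the shorter queue. It takes the `data` object from GET /api/llm/concurrency.

```python
def headroom(status):
    """Free slots for one provider: limit minus active requests, never negative."""
    return max(status["limit"] - status["activeRequests"], 0)


def least_loaded(concurrency, candidates):
    """Pick the candidate with the most free slots, breaking ties by shorter queue."""
    available = [p for p in candidates if p in concurrency]
    if not available:
        raise ValueError("none of the candidate providers are reported")
    return max(available,
               key=lambda p: (headroom(concurrency[p]), -concurrency[p]["queuedRequests"]))
```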

Usage

GET /api/llm/usage

Get a usage summary for the current month, broken down by provider.

Response

json

{
  "success": true,
  "data": {
    "totalRequests": 1250,
    "totalTokens": 3450000,
    "cost": 12.45,
    "providers": {
      "openai": { "requests": 800, "tokens": 2100000, "cost": 8.40 },
      "gemini": { "requests": 450, "tokens": 1350000, "cost": 4.05 }
    }
  }
}

GET /api/llm/usage/monthly

Get monthly usage for the last 12 months.

Response

json

{
  "success": true,
  "data": [
    {
      "year": 2026,
      "month": 2,
      "provider": "openai",
      "totalRequests": 1250,
      "totalTokens": 3450000,
      "cost": 12.45
    }
  ]
}

GET /api/llm/usage/daily

Get daily usage for the last 30 days.

Response

json

{
  "success": true,
  "data": [
    {
      "date": "2026-02-09",
      "provider": "openai",
      "totalRequests": 42,
      "totalTokens": 125000,
      "cost": 0.50
    }
  ]
}
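Each daily entry carries a provider field, which suggests one row per provider per day. Under that assumption, per-provider totals over the 30-day window are a simple fold. The sketch below assumes the field names shown above.

```python
from collections import defaultdict


def totals_by_provider(daily):
    """Sum totalRequests, totalTokens and cost per provider over daily usage rows."""
    totals = defaultdict(lambda: {"totalRequests": 0, "totalTokens": 0, "cost": 0.0})
    for row in daily:
        t = totals[row["provider"]]
        t["totalRequests"] += row["totalRequests"]
        t["totalTokens"] += row["totalTokens"]
        t["cost"] += row["cost"]
    return dict(totals)
```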

Health

GET /api/llm/health

Health check endpoint.

Response

json

{
  "status": "healthy",
  "service": "llm",
  "timestamp": "2025-01-15T10:00:00.000Z"
}

Completion Request Parameters

Parameter	Type	Required	Description
`provider`	string	Yes	Provider to use (openai, gemini, vertex, anthropic, xai, perplexity, lmstudio, elevenlabs, mlxaudio, ollama)
`messages`	array	Yes	Array of message objects with role and content
`model`	string	No	Model ID. Uses provider default if omitted
`temperature`	number	No	Sampling temperature (0-2)
`maxTokens`	number	No	Maximum output tokens. When thinkingMode is enabled, the thinking budget is added on top, capped at the model limit
`topP`	number	No	Nucleus sampling (0-1)
`frequencyPenalty`	number	No	Frequency penalty (-2 to 2)
`presencePenalty`	number	No	Presence penalty (-2 to 2)
`stop`	string[]	No	Stop sequences
`stream`	boolean	No	Enable streaming (use /queue/stream to consume)
`responseFormat`	string	No	"text", "json", or "json_schema"
`jsonSchema`	object	No	JSON schema when responseFormat is "json_schema"
`thinkingMode`	object	No	{ enabled: boolean, budget?: number, effort?: string }. budget is the thinking token allowance, added on top of maxTokens (not subtracted from it)
`webSearch`	object	No	Web search config: { enabled, maxUses?, allowedDomains?, blockedDomains?, recencyFilter? }
`service`	string	No	Service name for usage tracking
`tag`	string	No	Optional label for usage tracking (max 100 chars)
`tools`	array	No	Array of tool definitions: { type: "function", function: { name, description?, parameters? } }
`toolChoice`	string | object	No	"auto" | "none" | "required" or { type: "function", function: { name } }
`cache`	object	No	{ enabled?: boolean, ttl?: number }

Message Roles

Role	Fields	Description
`system`	content	System instructions for the model
`user`	content	User message (string or multipart array)
`assistant`	content, toolCalls?	Model response. Contains toolCalls array when the model invokes tools
`tool`	content, toolCallId	Tool result. Must include toolCallId matching the tool call
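Several constraints in the two tables above can be checked before a request is queued. The sketch below covers those checks and computes the effective output allowance when thinkingMode is on: maxTokens plus the budget, capped at the model limit. `validate_completion` and `effective_output_tokens` are hypothetical helpers, and the server remains authoritative. Treating an enabled thinkingMode without a budget as adding 0 tokens is an assumption, since the default budget is not documented here.

```python
# Ranges taken from the parameter table.
RANGES = {
    "temperature": (0, 2),
    "topP": (0, 1),
    "frequencyPenalty": (-2, 2),
    "presencePenalty": (-2, 2),
}


def validate_completion(body):
    """Return a list of problems found in a /api/llm/complete body (empty if none)."""
    problems = []
    for key in ("provider", "messages"):
        if key not in body:
            problems.append(f"missing required field: {key}")
    for key, (lo, hi) in RANGES.items():
        if key in body and not lo <= body[key] <= hi:
            problems.append(f"{key} must be between {lo} and {hi}")
    fmt = body.get("responseFormat", "text")
    if fmt not in ("text", "json", "json_schema"):
        problems.append("responseFormat must be text, json or json_schema")
    if fmt == "json_schema" and "jsonSchema" not in body:
        problems.append("jsonSchema is required when responseFormat is json_schema")
    if len(body.get("tag", "")) > 100:
        problems.append("tag is limited to 100 characters")
    for msg in body.get("messages", []):
        if msg.get("role") == "tool" and "toolCallId" not in msg:
            problems.append("tool messages need a toolCallId")
    return problems


def effective_output_tokens(max_tokens, model_limit, thinking=None):
    """maxTokens plus any enabled thinking budget, capped at the model's output limit."""
    budget = thinking.get("budget", 0) if thinking and thinking.get("enabled") else 0
    return min(max_tokens + budget, model_limit)
```

For example, maxTokens 1024 with a 4096-token budget yields 5120 output tokens, as long as that stays under the model's maxOutputTokens.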

Tool Calling

Pass tool definitions to let the model call functions. Supported by Anthropic, OpenAI, and LM Studio.

POST /api/llm/complete

Completion request with tools.

Request Body

json

{
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "messages": [
    { "role": "user", "content": "What's the weather in Brussels?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "toolChoice": "auto"
}

Response

json

// Queue result (poll via /queue/result)
{
  "success": true,
  "data": {
    "queueId": "abc123-def456",
    "status": "completed",
    "result": {
      "content": "",
      "toolCalls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"Brussels\"}"
          }
        }
      ],
      "finishReason": "tool_calls",
      "usage": { "inputTokens": 85, "outputTokens": 32, "totalTokens": 117 },
      "model": "claude-sonnet-4-5",
      "provider": "anthropic",
      "requestId": "req_789"
    }
  }
}

Multi-turn Tool Calling Flow

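After receiving tool calls, execute each function yourself and send the results back in a follow-up request. The follow-up repeats the conversation so far: the original user message, then the assistant message with its toolCalls, then one tool message per call whose toolCallId matches that call's id. Include the same tools array: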

POST /api/llm/complete

Follow-up with tool results.

Request Body

json

{
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "messages": [
    { "role": "user", "content": "What's the weather in Brussels?" },
    {
      "role": "assistant",
      "content": "",
      "toolCalls": [{
        "id": "call_abc123",
        "type": "function",
        "function": { "name": "get_weather", "arguments": "{\"location\":\"Brussels\"}" }
      }]
    },
    {
      "role": "tool",
      "toolCallId": "call_abc123",
      "content": "{\"temperature\":18,\"condition\":\"Partly cloudy\"}"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }
  ]
}

Response

json

// Final response with natural language answer
{
  "success": true,
  "data": {
    "queueId": "abc456-def789",
    "status": "completed",
    "result": {
      "content": "The weather in Brussels is 18°C and partly cloudy.",
      "finishReason": "stop",
      "usage": { "inputTokens": 142, "outputTokens": 18, "totalTokens": 160 },
      "model": "claude-sonnet-4-5",
      "provider": "anthropic",
      "requestId": "req_012"
    }
  }
}
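Put together, a client runs a loop. It submits the request and waits for the result. While `finishReason` is `tool_calls`, it executes each requested function, appends the assistant and tool messages, and resubmits. In this sketch, `complete` is a hypothetical caller-supplied function that submits a body and returns the finished `result` object, for example by combining a POST to /api/llm/complete with polling /queue/result. `tool_impls` maps tool names to local Python functions. The `max_rounds` guard is our own addition.

```python
import json


def run_tool_loop(body, complete, tool_impls, max_rounds=5):
    """Drive a tool-calling conversation until the model returns a final answer.

    `complete(body)` is a caller-supplied (hypothetical) function returning the
    finished task's `result` object; `tool_impls` maps tool names to callables.
    """
    messages = list(body["messages"])  # copy so the caller's list is not mutated
    for _ in range(max_rounds):
        result = complete({**body, "messages": list(messages)})
        calls = result.get("toolCalls") or []
        if result.get("finishReason") != "tool_calls" or not calls:
            return result  # final natural-language answer
        # Echo the assistant turn, with its toolCalls, back into the history.
        messages.append({"role": "assistant", "content": result.get("content", ""), "toolCalls": calls})
        for call in calls:
            fn = tool_impls[call["function"]["name"]]
            # Arguments arrive as a JSON-encoded string, not an object.
            args = json.loads(call["function"]["arguments"] or "{}")
            messages.append({
                "role": "tool",
                "toolCallId": call["id"],  # must match the id of the call it answers
                "content": json.dumps(fn(**args)),
            })
    raise RuntimeError(f"no final answer after {max_rounds} rounds of tool calls")
```

Tool results are serialized with `json.dumps` because message content is a string, matching the escaped JSON in the follow-up example above.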