API Reference

Complete reference for all CanaryLLM endpoints.

OpenAPI Spec (YAML): import it into Postman or Swagger UI, or use it for SDK generation.

Completions

POST /api/llm/complete

Submit an LLM completion request to the queue.

Request Body

json
{
  "provider": "openai",
  "model": "gpt-4.1-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Explain quantum computing in one sentence."
    }
  ],
  "temperature": 0.7,
  "maxTokens": 1024,
  "stream": false,
  "responseFormat": "text",
  "tag": "customer-support"
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "abc123-def456",
    "status": "queued",
    "message": "Task added to queue for processing"
  }
}
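
A minimal Python sketch of submitting the request above, using only the standard library. `BASE_URL` is a placeholder host, and the Bearer `Authorization` header is assumed to apply here because it is the header shown in the upload-video curl example; neither is confirmed for this endpoint.

```python
import json
import urllib.request

# Assumptions: BASE_URL is a placeholder host, and the API key is sent as a
# Bearer token (the header format shown in the upload-video curl example).
BASE_URL = "https://canaryllm.example.com"


def build_completion_request(api_key, payload, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/llm/complete."""
    return urllib.request.Request(
        url=f"{base_url}/api/llm/complete",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_completion(api_key, payload):
    """Send the request and return the queueId from the response envelope."""
    request = build_completion_request(api_key, payload)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(f"Completion request was rejected: {body}")
    return body["data"]["queueId"]


payload = {
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "maxTokens": 1024,
}
# Building the request does not touch the network; submit_completion() does.
request = build_completion_request("<api-key>", payload)
```

The returned `queueId` is what the queue endpoints take as input.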

Queue Management

POST /api/llm/queue/status

Check the status of a queued task.

Request Body

json
{ "queueId": "abc123-def456" }

Response

json
{
  "success": true,
  "data": {
    "queueId": "abc123-def456",
    "status": "processing",
    "position": 0,
    "createdAt": "2025-01-15T10:00:00Z",
    "startedAt": "2025-01-15T10:00:01Z"
  }
}

POST /api/llm/queue/result

Retrieve the result of a completed task. Returns HTTP 202 if the task is still processing.

Request Body

json
{ "queueId": "abc123-def456" }

Response

json
{
  "success": true,
  "data": {
    "queueId": "abc123-def456",
    "status": "completed",
    "result": {
      "content": "Quantum computing uses quantum bits...",
      "usage": {
        "inputTokens": 25,
        "outputTokens": 42,
        "totalTokens": 67
      },
      "model": "gpt-4.1-mini",
      "provider": "openai",
      "requestId": "req_789",
      "finishReason": "stop"
    }
  }
}
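
Because this endpoint returns 202 until the task finishes, clients typically poll it. The sketch below shows one way to do that in Python. `fetch_result` is a hypothetical caller-supplied function that performs the POST and returns `(http_status, parsed_body)`; the polling interval and attempt limit are arbitrary, and the body of a 202 response is not documented here, so the code ignores it.

```python
import time


def wait_for_result(fetch_result, queue_id, interval_seconds=1.0, max_attempts=60):
    """Poll until the task leaves the 202 state, then return its result.

    fetch_result is a caller-supplied function (hypothetical) that POSTs
    {"queueId": ...} to /api/llm/queue/result and returns
    (http_status, parsed_json_body).
    """
    for _ in range(max_attempts):
        http_status, body = fetch_result(queue_id)
        if http_status == 202:
            # Documented behaviour: 202 means the task is still processing.
            time.sleep(interval_seconds)
            continue
        if http_status == 200 and body.get("success"):
            return body["data"]["result"]
        raise RuntimeError(f"Task {queue_id} failed: HTTP {http_status}, {body}")
    raise TimeoutError(f"Task {queue_id} did not finish after {max_attempts} polls")


# Stand-in for the real HTTP call: two 202 responses, then the final result.
_responses = iter([
    (202, {}),
    (202, {}),
    (200, {
        "success": True,
        "data": {
            "queueId": "abc123-def456",
            "status": "completed",
            "result": {
                "content": "Quantum computing uses quantum bits...",
                "finishReason": "stop",
            },
        },
    }),
])
result = wait_for_result(lambda queue_id: next(_responses), "abc123-def456",
                         interval_seconds=0)
```

Passing the HTTP call in as a function keeps the retry logic independent of whichever HTTP client you use.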

POST /api/llm/queue/stream

Stream the result of a queued task via Server-Sent Events (SSE).

Request Body

json
{ "queueId": "abc123-def456" }

Response

text
event: start
data: {"queueId":"abc123-def456"}

event: chunk
data: {"delta":"Quantum ","finishReason":null}

event: chunk
data: {"delta":"computing ","finishReason":null}

event: done
data: {}
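
The event stream above can be reassembled into the full text by concatenating the `delta` of each `chunk` event. The following Python sketch parses an already-received SSE payload; it assumes every `data:` line carries JSON, as in the sample, and it is not a complete SSE client (no reconnection, comments, or `id:` handling).

```python
import json


def parse_sse(stream_text):
    """Split an SSE payload into (event_name, data_dict) pairs."""
    events = []
    # Events are separated by a blank line; normalise CRLF line endings first.
    for block in stream_text.replace("\r\n", "\n").strip().split("\n\n"):
        event_name, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((event_name, json.loads("\n".join(data_lines))))
    return events


def collect_text(events):
    """Concatenate the delta of every chunk event, stopping at done."""
    parts = []
    for event_name, data in events:
        if event_name == "done":
            break
        if event_name == "chunk":
            parts.append(data.get("delta", ""))
    return "".join(parts)


sample = (
    'event: start\ndata: {"queueId":"abc123-def456"}\n\n'
    'event: chunk\ndata: {"delta":"Quantum ","finishReason":null}\n\n'
    'event: chunk\ndata: {"delta":"computing ","finishReason":null}\n\n'
    'event: done\ndata: {}\n'
)
events = parse_sse(sample)
text = collect_text(events)  # "Quantum computing "
```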

POST /api/llm/queue/cancel

Cancel a queued or processing task.

Request Body

json
{ "queueId": "abc123-def456" }

Response

json
{
  "success": true,
  "data": { "queueId": "abc123-def456", "status": "cancelled" },
  "message": "Task cancelled successfully"
}

Media Generation

POST /api/llm/generate-image

Queue an image generation request.

Request Body

json
{
  "provider": "openai",
  "prompt": "A sunset over mountains",
  "model": "gpt-image-1",
  "n": 1,
  "size": "1024x1024",
  "quality": "hd",
  "tag": "marketing"
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "img_abc123",
    "status": "queued",
    "message": "Image generation task added to queue"
  }
}

POST /api/llm/generate-video

Queue a video generation request. Available for Gemini, Vertex, and xAI.

Request Body

json
{
  "provider": "gemini",
  "prompt": "A timelapse of clouds moving over a city",
  "model": "veo-3.1-generate-preview",
  "aspectRatio": "16:9",
  "durationSeconds": 8,
  "tag": "content-creation"
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "vid_abc123",
    "status": "queued",
    "message": "Video generation task added to queue"
  }
}

POST /api/llm/generate-sound-effect

Queue a sound effect generation request. Powered by ElevenLabs.

Request Body

json
{
  "text": "thunder rolling in the distance, rain on a tin roof",
  "model": "eleven_text_to_sound_v2",
  "durationSeconds": 10,
  "promptInfluence": 0.5,
  "tag": "ambient"
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "Sound effect task added to queue"
  }
}

POST /api/llm/generate-music

Queue a music generation request. Powered by ElevenLabs.

Request Body

json
{
  "prompt": "upbeat jazz jingle, bright piano and saxophone",
  "model": "music_v1",
  "durationMs": 10000,
  "forceInstrumental": true,
  "tag": "jingle"
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "Music generation task added to queue"
  }
}

Video Upload

POST /api/llm/upload-video

Upload a video file for use in chat completions. Send it as multipart/form-data in a field named "video"; the maximum file size is 100 MB. Returns a fileId that is valid for 1 hour.

Request Body

bash
# Send as multipart/form-data with a single file field named "video".
# Accepted types: video/mp4, video/webm, video/mov, video/quicktime, video/mpeg, video/avi
# BASE_URL is a placeholder for your CanaryLLM host (not specified in this reference).

curl -X POST "$BASE_URL/api/llm/upload-video" \
  -H "Authorization: Bearer <api-key>" \
  -F "video=@recording.mp4"

Response

json
{
  "success": true,
  "data": {
    "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "mimeType": "video/mp4",
    "sizeBytes": 15728640,
    "expiresAt": "2026-02-13T15:00:00.000Z"
  }
}
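
Since the fileId expires, a client may want to check `expiresAt` before reusing it. A small Python sketch of that check follows; the 30-second safety margin is an arbitrary assumption, not something the API specifies.

```python
from datetime import datetime, timedelta, timezone


def is_file_usable(upload_data, now=None, safety_margin=timedelta(seconds=30)):
    """Return True if the uploaded file's fileId has not expired yet.

    safety_margin (an assumption) leaves room for the request to reach the server.
    """
    now = now or datetime.now(timezone.utc)
    # Replace the trailing "Z" so fromisoformat() also works before Python 3.11.
    expires_at = datetime.fromisoformat(upload_data["expiresAt"].replace("Z", "+00:00"))
    return now + safety_margin < expires_at


upload = {
    "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "expiresAt": "2026-02-13T15:00:00.000Z",
}
half_hour_before = datetime(2026, 2, 13, 14, 30, tzinfo=timezone.utc)
seconds_before = datetime(2026, 2, 13, 14, 59, 45, tzinfo=timezone.utc)

usable_early = is_file_usable(upload, now=half_hour_before)  # True
usable_late = is_file_usable(upload, now=seconds_before)     # False: inside the margin
```

If the check fails, upload the file again to obtain a fresh fileId.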

Using Video in Completions

Videos can be included in message content as multipart arrays. Use inline base64 for small videos or a fileId from the upload endpoint for larger files. Supported by Gemini and Vertex providers.

POST /api/llm/complete

Completion with video input (via fileId).

Request Body

json
{
  "provider": "gemini",
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe what happens in this video." },
        { "type": "video", "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "mimeType": "video/mp4" }
      ]
    }
  ]
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "vid_input_abc123",
    "status": "queued",
    "message": "Task added to queue for processing"
  }
}

POST /api/llm/complete

Completion with inline video (small files, base64).

Request Body

json
{
  "provider": "gemini",
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this clip?" },
        { "type": "video", "data": "<base64_encoded_video>", "mimeType": "video/mp4" }
      ]
    }
  ]
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "vid_input_def456",
    "status": "queued",
    "message": "Task added to queue for processing"
  }
}
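
Both request shapes above differ only in the video part: one carries `fileId`, the other base64 `data`. The Python sketch below builds either form. The helper names are hypothetical, and the choice of when a file is "small" enough to inline is left to the caller because this reference does not state a threshold.

```python
import base64


def video_part(mime_type, file_id=None, raw_bytes=None):
    """Build one "video" content part from either a fileId or raw bytes."""
    if (file_id is None) == (raw_bytes is None):
        raise ValueError("Pass exactly one of file_id or raw_bytes")
    part = {"type": "video", "mimeType": mime_type}
    if file_id is not None:
        part["fileId"] = file_id
    else:
        # Inline form: the video bytes are sent base64-encoded in "data".
        part["data"] = base64.b64encode(raw_bytes).decode("ascii")
    return part


def video_message(question, part):
    """Wrap a text question and a video part in a multipart user message."""
    return {"role": "user", "content": [{"type": "text", "text": question}, part]}


by_reference = video_part("video/mp4", file_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890")
inline = video_part("video/mp4", raw_bytes=b"\x00\x01\x02")  # placeholder bytes
message = video_message("Describe what happens in this video.", by_reference)
```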

Audio (TTS / STT)

POST /api/llm/generate-audio

Queue a text-to-speech request. Supports ElevenLabs and MLX Audio (Local).

Request Body

json
// ElevenLabs (basic)
{
  "provider": "elevenlabs",
  "text": "Welcome to your interview.",
  "model": "eleven_flash_v2_5",
  "voiceId": "JBFqnCBsd6RMkjVDRZzb",
  "outputFormat": "mp3_44100_128",
  "tag": "interview"
}

// ElevenLabs (with voice settings)
{
  "provider": "elevenlabs",
  "text": "Welcome to your interview.",
  "model": "eleven_multilingual_v2",
  "voiceId": "JBFqnCBsd6RMkjVDRZzb",
  "outputFormat": "mp3_44100_192",
  "voiceSettings": {
    "stability": 0.5,
    "similarityBoost": 0.75,
    "style": 0.3,
    "useSpeakerBoost": true
  },
  "languageCode": "nl",
  "applyTextNormalization": "auto",
  "previousRequestIds": ["req_abc123"],
  "tag": "interview"
}

// MLX Audio (Local, free)
{
  "provider": "mlxaudio",
  "text": "Welkom bij het interview.",
  "model": "kokoro",
  "voiceId": "af_heart",
  "tag": "interview"
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "TTS task added to queue"
  }
}

POST /api/llm/transcribe

Queue a speech-to-text transcription request. Supports ElevenLabs Scribe and MLX Audio (Local).

Request Body

json
// ElevenLabs with diarization
{
  "provider": "elevenlabs",
  "audio": "<base64_encoded_audio>",
  "mimeType": "audio/mpeg",
  "model": "scribe_v2",
  "language": "nl",
  "tag": "interview",
  "diarize": true,
  "numSpeakers": 2,
  "timestampsGranularity": "word",
  "tagAudioEvents": true
}

// ElevenLabs (basic, no diarization)
{
  "provider": "elevenlabs",
  "audio": "<base64_encoded_audio>",
  "mimeType": "audio/mpeg",
  "model": "scribe_v2"
}

// MLX Audio (Local, free)
{
  "provider": "mlxaudio",
  "audio": "<base64_encoded_audio>",
  "mimeType": "audio/wav",
  "model": "whisper-large-v3",
  "language": "nl",
  "tag": "transcription"
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "STT task added to queue"
  }
}

// Completed task result (with diarization):
{
  "text": "Hello, how are you? I'm fine, thanks.",
  "language": "en",
  "model": "scribe_v2",
  "provider": "elevenlabs",
  "requestId": "req_abc123",
  "words": [
    { "text": "Hello,", "start": 0.08, "end": 0.54, "type": "word", "speakerId": "speaker_0" },
    { "text": "how", "start": 0.56, "end": 0.72, "type": "word", "speakerId": "speaker_0" },
    { "text": "I'm", "start": 1.2, "end": 1.4, "type": "word", "speakerId": "speaker_1" }
  ]
}
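
A diarized result is a flat list of words, so reading it as a conversation means grouping consecutive words by `speakerId`. The Python sketch below does that. It assumes entries are in time order and skips any entry whose `type` is not "word"; whether other types appear in the result is not stated in this reference.

```python
def speaker_turns(words):
    """Merge consecutive words from the same speaker into turns."""
    turns = []
    for word in words:
        if word.get("type", "word") != "word":
            continue  # Assumption: non-word entries (if any) carry no spoken text.
        speaker = word.get("speakerId")
        if turns and turns[-1]["speakerId"] == speaker:
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end"] = word["end"]
        else:
            turns.append({
                "speakerId": speaker,
                "text": word["text"],
                "start": word["start"],
                "end": word["end"],
            })
    return turns


words = [
    {"text": "Hello,", "start": 0.08, "end": 0.54, "type": "word", "speakerId": "speaker_0"},
    {"text": "how", "start": 0.56, "end": 0.72, "type": "word", "speakerId": "speaker_0"},
    {"text": "I'm", "start": 1.2, "end": 1.4, "type": "word", "speakerId": "speaker_1"},
]
turns = speaker_turns(words)
```

For the three sample words this yields two turns: "Hello, how" for speaker_0 (0.08 to 0.72) and "I'm" for speaker_1 (1.2 to 1.4).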

Embeddings

POST /api/llm/embeddings

Queue an embeddings request. Embeds one string or an array of strings (max 2048) via a local model (LM Studio). Content is processed transiently and never stored. Poll /api/llm/queue/result for the vectors.

Request Body

json
{
  "provider": "lmstudio",
  "model": "nomic-embed-text-v1.5",
  "input": ["first chunk of text", "second chunk of text"],
  "dimensions": 768,
  "tag": "kb:contracts"
}

Response

json
{
  "success": true,
  "data": {
    "queueId": "llm_abc123",
    "status": "queued",
    "message": "Embedding task added to queue"
  }
}

The queue result holds { embeddings: number[][], model, provider, dimensions, usage }. For a synchronous, OpenAI-compatible call use POST /v1/embeddings with a provider/model id. See the Embeddings guide for ingestion and retrieval patterns.
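
As an illustration of consuming the `embeddings` array, the Python sketch below ranks the input chunks against a query vector by cosine similarity. The three-dimensional vectors are toy values chosen for readability (real vectors would have the requested `dimensions`), and cosine similarity is a common choice here, not one this reference prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction; treat it as unrelated.
    return dot / (norm_a * norm_b)


def rank_chunks(query_vector, result, chunks):
    """Return (score, chunk) pairs sorted by similarity, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), chunk)
        for vector, chunk in zip(result["embeddings"], chunks)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


chunks = ["first chunk of text", "second chunk of text"]
# Toy queue result; embeddings[i] is assumed to correspond to input[i].
result = {"embeddings": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "dimensions": 3}
ranked = rank_chunks([0.9, 0.1, 0.0], result, chunks)
best_score, best_chunk = ranked[0]
```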

Agents

POST /api/agents/signed-url

Generate a signed URL for a conversational AI agent.

Request Body

json
{ "agentId": "your-agent-id" }

Response

json
{
  "success": true,
  "data": {
    "signedUrl": "wss://...",
    "expiresIn": 900
  }
}

Discovery

GET /api/llm/providers

List all available providers.

Response

json
{
  "success": true,
  "data": {
    "providers": ["gemini", "vertex", "openai", "anthropic", "xai", "perplexity", "lmstudio", "elevenlabs", "mlxaudio", "ollama"]
  }
}

GET /api/llm/models?provider=openai

List all models for a specific provider.

Response

json
{
  "success": true,
  "data": {
    "provider": "openai",
    "models": [
      {
        "id": "gpt-4.1",
        "name": "GPT-4.1",
        "contextWindow": 1047576,
        "maxOutputTokens": 32768,
        "inputCostPer1k": 0.002,
        "outputCostPer1k": 0.008,
        "capabilities": ["chat"]
      }
    ]
  }
}
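
The per-1k cost fields can be combined with a result's `usage` block to estimate what a request cost. A worked Python sketch follows, assuming the two rates apply per 1,000 input and output tokens respectively, as the field names suggest; the currency is whatever the API reports, which this reference does not state. The usage numbers are illustrative.

```python
def estimate_cost(usage, model):
    """Estimate request cost from token usage and per-1k-token rates."""
    input_cost = usage["inputTokens"] / 1000 * model["inputCostPer1k"]
    output_cost = usage["outputTokens"] / 1000 * model["outputCostPer1k"]
    return input_cost + output_cost


model = {"id": "gpt-4.1", "inputCostPer1k": 0.002, "outputCostPer1k": 0.008}
usage = {"inputTokens": 25, "outputTokens": 42, "totalTokens": 67}

# 25/1000 * 0.002 = 0.00005 and 42/1000 * 0.008 = 0.000336, so about 0.000386.
cost = estimate_cost(usage, model)
```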

GET /api/llm/voices?provider=elevenlabs

List available voices with preview URLs.

Response

json
{
  "success": true,
  "data": {
    "provider": "elevenlabs",
    "voices": [
      {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb",
        "name": "George",
        "category": "premade",
        "preview_url": "https://storage.googleapis.com/...",
        "labels": { "accent": "British", "age": "middle-aged", "gender": "male" }
      }
    ]
  }
}

GET /api/llm/capabilities

List capabilities (image/video/audio) per provider.

Response

json
{
  "success": true,
  "data": {
    "openai": {
      "imageGeneration": true,
      "videoGeneration": false,
      "textToSpeech": false,
      "speechToText": false,
      "models": [
        { "id": "gpt-image-1", "name": "GPT Image 1", "capabilities": ["image-generation"] }
      ]
    },
    "elevenlabs": {
      "imageGeneration": false,
      "videoGeneration": false,
      "textToSpeech": true,
      "speechToText": true,
      "soundEffects": true,
      "musicGeneration": true,
      "models": [
        { "id": "eleven_flash_v2_5", "capabilities": ["text-to-speech"] },
        { "id": "scribe_v1", "capabilities": ["speech-to-text"] },
        { "id": "eleven_text_to_sound_v2", "capabilities": ["sound-effect"] },
        { "id": "music_v1", "capabilities": ["music-generation"] }
      ]
    },
    "mlxaudio": {
      "imageGeneration": false,
      "videoGeneration": false,
      "textToSpeech": true,
      "speechToText": true,
      "models": [
        { "id": "kokoro", "capabilities": ["text-to-speech"] },
        { "id": "whisper-large-v3", "capabilities": ["speech-to-text"] }
      ]
    }
  }
}

Concurrency

GET /api/llm/concurrency

View concurrency limits and active requests for all providers.

Response

json
{
  "success": true,
  "data": {
    "openai": { "activeRequests": 2, "queuedRequests": 5, "limit": 10 },
    "gemini": { "activeRequests": 0, "queuedRequests": 0, "limit": 15 }
  }
}
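
One possible use of this data is to route a request to whichever acceptable provider is least busy. The Python sketch below assumes that `limit - activeRequests` is the number of free slots and that a shorter queue is preferable; both are interpretations of the field names, not documented semantics.

```python
def free_slots(status):
    """Slots not currently occupied by an active request (never negative)."""
    return max(status["limit"] - status["activeRequests"], 0)


def least_loaded(concurrency, candidates):
    """Pick the candidate with the shortest queue, then the most free slots."""
    known = [name for name in candidates if name in concurrency]
    if not known:
        raise ValueError("None of the candidate providers appear in the concurrency data")
    return min(
        known,
        key=lambda name: (concurrency[name]["queuedRequests"],
                          -free_slots(concurrency[name])),
    )


concurrency = {
    "openai": {"activeRequests": 2, "queuedRequests": 5, "limit": 10},
    "gemini": {"activeRequests": 0, "queuedRequests": 0, "limit": 15},
}
choice = least_loaded(concurrency, ["openai", "gemini"])  # "gemini"
```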

GET /api/llm/concurrency/:provider

View concurrency status for a specific provider.

Response

json
{
  "success": true,
  "data": {
    "provider": "openai",
    "activeRequests": 2,
    "queuedRequests": 5,
    "limit": 10
  }
}

Usage

GET /api/llm/usage

Get current month usage summary with breakdown per provider.

Response

json
{
  "success": true,
  "data": {
    "totalRequests": 1250,
    "totalTokens": 3450000,
    "cost": 12.45,
    "providers": {
      "openai": { "requests": 800, "tokens": 2100000, "cost": 8.40 },
      "gemini": { "requests": 450, "tokens": 1350000, "cost": 4.05 }
    }
  }
}

GET /api/llm/usage/monthly

Get monthly usage for the last 12 months.

Response

json
{
  "success": true,
  "data": [
    {
      "year": 2026,
      "month": 2,
      "provider": "openai",
      "totalRequests": 1250,
      "totalTokens": 3450000,
      "cost": 12.45
    }
  ]
}

GET /api/llm/usage/daily

Get daily usage for the last 30 days.

Response

json
{
  "success": true,
  "data": [
    {
      "date": "2026-02-09",
      "provider": "openai",
      "totalRequests": 42,
      "totalTokens": 125000,
      "cost": 0.50
    }
  ]
}

Health

GET /api/llm/health

Health check endpoint.

Response

json
{
  "status": "healthy",
  "service": "llm",
  "timestamp": "2025-01-15T10:00:00.000Z"
}

Completion Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | Yes | Provider to use (openai, gemini, vertex, anthropic, xai, perplexity, lmstudio, elevenlabs, mlxaudio, ollama) |
| messages | array | Yes | Array of message objects with role and content |
| model | string | No | Model ID. Uses the provider default if omitted |
| temperature | number | No | Sampling temperature (0-2) |
| maxTokens | number | No | Maximum output tokens. With thinkingMode, the thinking budget is added on top, capped at the model limit |
| topP | number | No | Nucleus sampling (0-1) |
| frequencyPenalty | number | No | Frequency penalty (-2 to 2) |
| presencePenalty | number | No | Presence penalty (-2 to 2) |
| stop | string[] | No | Stop sequences |
| stream | boolean | No | Enable streaming (use /queue/stream to consume) |
| responseFormat | string | No | "text", "json", or "json_schema" |
| jsonSchema | object | No | JSON schema when responseFormat is "json_schema" |
| thinkingMode | object | No | { enabled: boolean, budget?: number, effort?: string }. budget is the thinking token allowance, added on top of maxTokens (not subtracted from it) |
| webSearch | object | No | Web search config: { enabled, maxUses?, allowedDomains?, blockedDomains?, recencyFilter? } |
| service | string | No | Service name for usage tracking |
| tag | string | No | Optional label for usage tracking (max 100 chars) |
| tools | array | No | Array of tool definitions: { type: "function", function: { name, description?, parameters? } } |
| toolChoice | string or object | No | "auto", "none", "required", or { type: "function", function: { name } } |
| cache | object | No | { enabled?: boolean, ttl?: number } |
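
The documented ranges can be checked client-side before a request is queued. The Python sketch below covers only the constraints listed in the table and treats the numeric ranges as inclusive, which is an assumption; the server remains the authority. It also shows the thinking-budget arithmetic: the budget is added to maxTokens and the sum is capped at the model limit.

```python
RANGES = {
    "temperature": (0, 2),
    "topP": (0, 1),
    "frequencyPenalty": (-2, 2),
    "presencePenalty": (-2, 2),
}
RESPONSE_FORMATS = {"text", "json", "json_schema"}


def validate_completion_request(body):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for field in ("provider", "messages"):
        if field not in body:
            problems.append(f"{field} is required")
    for field, (low, high) in RANGES.items():
        # Assumption: the documented bounds are inclusive.
        if field in body and not low <= body[field] <= high:
            problems.append(f"{field} must be between {low} and {high}")
    # A missing responseFormat is accepted here because the field is optional.
    if body.get("responseFormat", "text") not in RESPONSE_FORMATS:
        problems.append("responseFormat must be text, json, or json_schema")
    if len(body.get("tag", "")) > 100:
        problems.append("tag must be at most 100 characters")
    return problems


def output_token_cap(max_tokens, thinking_budget, model_limit):
    """maxTokens plus the thinking budget, capped at the model limit."""
    return min(max_tokens + thinking_budget, model_limit)


problems = validate_completion_request({
    "provider": "openai",
    "messages": [{"role": "user", "content": "Hi"}],
    "temperature": 2.5,
    "tag": "x" * 101,
})
within_limit = output_token_cap(1024, 2048, 32768)   # 1024 + 2048 = 3072
capped = output_token_cap(30000, 4096, 32768)        # 34096 is capped to 32768
```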

Message Roles

| Role | Fields | Description |
| --- | --- | --- |
| system | content | System instructions for the model |
| user | content | User message (string or multipart array) |
| assistant | content, toolCalls? | Model response. Contains a toolCalls array when the model invokes tools |
| tool | content, toolCallId | Tool result. Must include the toolCallId of the tool call it answers |

Tool Calling

Pass tool definitions to let the model call functions. Supported by Anthropic, OpenAI, and LMStudio.

POST /api/llm/complete

Completion request with tools.

Request Body

json
{
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "messages": [
    { "role": "user", "content": "What's the weather in Brussels?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "toolChoice": "auto"
}

Response

json
// Queue result (poll via /queue/result)
{
  "success": true,
  "data": {
    "queueId": "abc123-def456",
    "status": "completed",
    "result": {
      "content": "",
      "toolCalls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"Brussels\"}"
          }
        }
      ],
      "finishReason": "tool_calls",
      "usage": { "inputTokens": 85, "outputTokens": 32, "totalTokens": 117 },
      "model": "claude-sonnet-4-5",
      "provider": "anthropic",
      "requestId": "req_789"
    }
  }
}

Multi-turn Tool Calling Flow

After receiving tool calls, execute the function locally and send back the result in a follow-up request:

POST /api/llm/complete

Follow-up with tool results.

Request Body

json
{
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "messages": [
    { "role": "user", "content": "What's the weather in Brussels?" },
    {
      "role": "assistant",
      "content": "",
      "toolCalls": [{
        "id": "call_abc123",
        "type": "function",
        "function": { "name": "get_weather", "arguments": "{\"location\":\"Brussels\"}" }
      }]
    },
    {
      "role": "tool",
      "toolCallId": "call_abc123",
      "content": "{\"temperature\":18,\"condition\":\"Partly cloudy\"}"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }
  ]
}

Response

json
// Final response with natural language answer
{
  "success": true,
  "data": {
    "queueId": "abc456-def789",
    "status": "completed",
    "result": {
      "content": "The weather in Brussels is 18°C and partly cloudy.",
      "finishReason": "stop",
      "usage": { "inputTokens": 142, "outputTokens": 18, "totalTokens": 160 },
      "model": "claude-sonnet-4-5",
      "provider": "anthropic",
      "requestId": "req_012"
    }
  }
}
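
The step between the two requests above (run each tool call locally, then append the assistant turn and the tool results to the conversation) can be sketched in Python as follows. `build_follow_up_messages` and the `handlers` mapping are hypothetical names, and `get_weather` is a stub standing in for a real lookup.

```python
import json


def build_follow_up_messages(messages, result, handlers):
    """Append the assistant turn and one tool message per tool call."""
    tool_calls = result.get("toolCalls", [])
    follow_up = list(messages)
    follow_up.append({
        "role": "assistant",
        "content": result.get("content", ""),
        "toolCalls": tool_calls,
    })
    for call in tool_calls:
        function = call["function"]
        # Arguments arrive as a JSON-encoded string, not as an object.
        arguments = json.loads(function["arguments"])
        output = handlers[function["name"]](**arguments)
        follow_up.append({
            "role": "tool",
            "toolCallId": call["id"],
            "content": json.dumps(output, separators=(",", ":")),
        })
    return follow_up


def get_weather(location):
    # Hypothetical stub; a real handler would query a weather service.
    return {"temperature": 18, "condition": "Partly cloudy"}


first_result = {
    "content": "",
    "toolCalls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Brussels\"}"},
    }],
    "finishReason": "tool_calls",
}
messages = [{"role": "user", "content": "What's the weather in Brussels?"}]
follow_up = build_follow_up_messages(messages, first_result, {"get_weather": get_weather})
```

The resulting list matches the `messages` array of the follow-up request; send it together with the same `tools` definitions.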