API Reference
Complete reference for all CanaryLLM endpoints.
Completions
/api/llm/complete
Submit an LLM completion request to the queue.
Request Body
{
"provider": "openai",
"model": "gpt-4.1-mini",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Explain quantum computing in one sentence."
}
],
"temperature": 0.7,
"maxTokens": 1024,
"stream": false,
"responseFormat": "text",
"tag": "customer-support"
}
Response
{
"success": true,
"data": {
"queueId": "abc123-def456",
"status": "queued",
"message": "Task added to queue for processing"
}
}
Queue Management
/api/llm/queue/status
Check the status of a queued task.
Request Body
{ "queueId": "abc123-def456" }
Response
{
"success": true,
"data": {
"queueId": "abc123-def456",
"status": "processing",
"position": 0,
"createdAt": "2025-01-15T10:00:00Z",
"startedAt": "2025-01-15T10:00:01Z"
}
}
/api/llm/queue/result
Retrieve the result of a completed task. Returns 202 if still processing.
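Since this endpoint answers 202 until the task finishes, clients usually poll it. The Python sketch below assumes the queue endpoints accept a POST with a JSON body and a bearer token (the same `Authorization` header as the upload example), and that the base URL is yours to supply; `make_http_fetch` and `wait_for_result` are hypothetical helpers, not part of CanaryLLM. The transport is injected so the polling logic can be exercised without a server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Tuple

# A fetch function takes (path, JSON body) and returns (HTTP status, parsed JSON body).
Fetch = Callable[[str, Dict[str, Any]], Tuple[int, Dict[str, Any]]]


def make_http_fetch(base_url: str, api_key: str) -> Fetch:
    """Hypothetical transport: POSTs JSON with a bearer token (method and auth are assumptions)."""
    def fetch(path: str, body: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {api_key}"},
            method="POST",
        )
        # urlopen raises urllib.error.HTTPError for 4xx/5xx; a 202 is returned normally.
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return resp.status, json.loads(raw) if raw else {}
    return fetch


def wait_for_result(fetch: Fetch, queue_id: str, interval: float = 1.0,
                    timeout: float = 120.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll /api/llm/queue/result until it stops answering 202, then return its data object."""
    deadline = clock() + timeout
    while True:
        status, body = fetch("/api/llm/queue/result", {"queueId": queue_id})
        if status == 200:
            return body["data"]
        if status != 202:
            raise RuntimeError(f"unexpected HTTP {status} for {queue_id}: {body}")
        if clock() >= deadline:
            raise TimeoutError(f"{queue_id} still processing after {timeout} s")
        sleep(interval)


# Usage (placeholders, not real values):
# fetch = make_http_fetch("https://<host>", "<api-key>")
# data = wait_for_result(fetch, "abc123-def456")
# print(data["result"]["content"])
```

A fixed interval keeps the sketch short; a real client might back off between polls or switch to /api/llm/queue/stream.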
Request Body
{ "queueId": "abc123-def456" }
Response
{
"success": true,
"data": {
"queueId": "abc123-def456",
"status": "completed",
"result": {
"content": "Quantum computing uses quantum bits...",
"usage": {
"inputTokens": 25,
"outputTokens": 42,
"totalTokens": 67
},
"model": "gpt-4.1-mini",
"provider": "openai",
"requestId": "req_789",
"finishReason": "stop"
}
}
}
/api/llm/queue/stream
Stream the result of a queued task via Server-Sent Events (SSE).
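The stream delivers `start`, `chunk`, and `done` events with JSON `data` payloads, as in the example response. The sketch below only parses text lines you have already read (how the connection is opened is not covered here). Standard SSE ends each event with a blank line, which the example does not show, so the parser also starts a new event whenever an `event:` field arrives. `parse_sse` and `collect_text` are illustrative names.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event name, parsed JSON data) pairs from Server-Sent Events lines."""
    event = "message"          # SSE default when no event: field is given
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            if data_lines:     # tolerate events not separated by a blank line
                yield event, json.loads("\n".join(data_lines))
                data_lines = []
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":       # a blank line ends the current event
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        # other lines (":" comments, id:, retry:) are ignored in this sketch
    if data_lines:
        yield event, json.loads("\n".join(data_lines))


def collect_text(events: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Concatenate chunk deltas until the done event arrives."""
    parts: List[str] = []
    for name, data in events:
        if name == "chunk":
            parts.append(data.get("delta") or "")
        elif name == "done":
            break
    return "".join(parts)
```

Feeding the example response through `parse_sse` and `collect_text` yields the text "Quantum computing ".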
Request Body
{ "queueId": "abc123-def456" }
Response
event: start
data: {"queueId":"abc123-def456"}
event: chunk
data: {"delta":"Quantum ","finishReason":null}
event: chunk
data: {"delta":"computing ","finishReason":null}
event: done
data: {}
/api/llm/queue/cancel
Cancel a queued or processing task.
Request Body
{ "queueId": "abc123-def456" }
Response
{
"success": true,
"data": { "queueId": "abc123-def456", "status": "cancelled" },
"message": "Task cancelled successfully"
}
Media Generation
/api/llm/generate-image
Queue an image generation request.
Request Body
{
"provider": "openai",
"prompt": "A sunset over mountains",
"model": "gpt-image-1",
"n": 1,
"size": "1024x1024",
"quality": "hd",
"tag": "marketing"
}
Response
{
"success": true,
"data": {
"queueId": "img_abc123",
"status": "queued",
"message": "Image generation task added to queue"
}
}
/api/llm/generate-video
Queue a video generation request. Available for Gemini, Vertex, and xAI.
Request Body
{
"provider": "gemini",
"prompt": "A timelapse of clouds moving over a city",
"model": "veo-3.1-generate-preview",
"aspectRatio": "16:9",
"durationSeconds": 8,
"tag": "content-creation"
}
Response
{
"success": true,
"data": {
"queueId": "vid_abc123",
"status": "queued",
"message": "Video generation task added to queue"
}
}
/api/llm/generate-sound-effect
Queue a sound effect generation request. Powered by ElevenLabs.
Request Body
{
"text": "thunder rolling in the distance, rain on a tin roof",
"model": "eleven_text_to_sound_v2",
"durationSeconds": 10,
"promptInfluence": 0.5,
"tag": "ambient"
}
Response
{
"success": true,
"data": {
"queueId": "llm_abc123",
"status": "queued",
"message": "Sound effect task added to queue"
}
}
/api/llm/generate-music
Queue a music generation request. Powered by ElevenLabs.
Request Body
{
"prompt": "upbeat jazz jingle, bright piano and saxophone",
"model": "music_v1",
"durationMs": 10000,
"forceInstrumental": true,
"tag": "jingle"
}
Response
{
"success": true,
"data": {
"queueId": "llm_abc123",
"status": "queued",
"message": "Music generation task added to queue"
}
}
Video Upload
/api/llm/upload-video
Upload a video file for use in chat completions. Returns a fileId that is valid for 1 hour. Maximum file size is 100 MB. Send the file as multipart/form-data in a field named video.
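A client can check these limits before uploading and avoid reusing an expired fileId. A minimal sketch, assuming "100MB" means 100 MiB (the server's exact cutoff may differ) and that expiresAt is an ISO 8601 UTC timestamp as in the example response; `upload_problems` and `file_id_usable` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Accepted MIME types as listed for this endpoint.
ACCEPTED_VIDEO_TYPES = {
    "video/mp4", "video/webm", "video/mov",
    "video/quicktime", "video/mpeg", "video/avi",
}
# Assumption: "100MB" is read as 100 MiB.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def upload_problems(size_bytes: int, mime_type: str) -> List[str]:
    """Return reasons an upload would likely be rejected (empty list if none)."""
    problems: List[str] = []
    if mime_type not in ACCEPTED_VIDEO_TYPES:
        problems.append(f"unsupported MIME type {mime_type!r}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"{size_bytes} bytes exceeds {MAX_UPLOAD_BYTES} bytes")
    return problems


def file_id_usable(expires_at: str, now: Optional[datetime] = None) -> bool:
    """True while expiresAt (ISO 8601 with a 'Z' UTC suffix) is still in the future."""
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    current = now if now is not None else datetime.now(timezone.utc)
    return current < expiry
```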
Request Body
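# multipart/form-data
# Field: "video" (file)
# Accepted types: video/mp4, video/webm, video/mov, video/quicktime, video/mpeg, video/avi
# ";type=video/mp4" sets the part's Content-Type explicitly; curl may otherwise
# send a generic application/octet-stream type for .mp4 files
curl -X POST "https://<host>/api/llm/upload-video" \
  -H "Authorization: Bearer <api-key>" \
  -F "video=@recording.mp4;type=video/mp4"
Response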
{
"success": true,
"data": {
"fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"mimeType": "video/mp4",
"sizeBytes": 15728640,
"expiresAt": "2026-02-13T15:00:00.000Z"
}
}
Using Video in Completions
A user message's content can be a multipart array that mixes text and video parts. Use inline base64 data for small videos, or a fileId from the upload endpoint for larger files. Video input is supported by the Gemini and Vertex providers.
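A small helper can build either form of the message. This is a sketch: `INLINE_LIMIT_BYTES` is a hypothetical threshold, since the docs do not say how small an inline video must be, and `video_message` is an illustrative name. The part shapes match the two request examples that follow.

```python
import base64
from typing import Any, Dict, Optional

# Hypothetical cutoff: "small" videos may be inlined, but no number is documented.
INLINE_LIMIT_BYTES = 4 * 1024 * 1024


def video_message(prompt: str, mime_type: str, data: Optional[bytes] = None,
                  file_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a user message with a text part plus one video part (fileId or inline base64)."""
    if file_id is not None:
        # For larger files: reference a fileId returned by /api/llm/upload-video.
        video = {"type": "video", "fileId": file_id, "mimeType": mime_type}
    elif data is not None:
        if len(data) > INLINE_LIMIT_BYTES:
            raise ValueError("video too large to inline; upload it and pass file_id")
        video = {"type": "video",
                 "data": base64.b64encode(data).decode("ascii"),
                 "mimeType": mime_type}
    else:
        raise ValueError("pass either data or file_id")
    return {"role": "user",
            "content": [{"type": "text", "text": prompt}, video]}


# Usage: the fileId variant from the first example below.
request = {
    "provider": "gemini",
    "model": "gemini-2.5-flash",
    "messages": [video_message("Describe what happens in this video.", "video/mp4",
                               file_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890")],
}
```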
/api/llm/complete
Completion with video input (via fileId).
Request Body
{
"provider": "gemini",
"model": "gemini-2.5-flash",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Describe what happens in this video." },
{ "type": "video", "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "mimeType": "video/mp4" }
]
}
]
}
Response
{
"success": true,
"data": {
"queueId": "vid_input_abc123",
"status": "queued",
"message": "Task added to queue for processing"
}
}
/api/llm/complete
Completion with inline video (small files, base64).
Request Body
{
"provider": "gemini",
"model": "gemini-2.5-flash",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "What's in this clip?" },
{ "type": "video", "data": "<base64_encoded_video>", "mimeType": "video/mp4" }
]
}
]
}
Response
{
"success": true,
"data": {
"queueId": "vid_input_def456",
"status": "queued",
"message": "Task added to queue for processing"
}
}
Audio (TTS / STT)
/api/llm/generate-audio
Queue a text-to-speech request. Supports ElevenLabs and MLX Audio (Local).
Request Body
// ElevenLabs (basic)
{
"provider": "elevenlabs",
"text": "Welcome to your interview.",
"model": "eleven_flash_v2_5",
"voiceId": "JBFqnCBsd6RMkjVDRZzb",
"outputFormat": "mp3_44100_128",
"tag": "interview"
}
// ElevenLabs (with voice settings)
{
"provider": "elevenlabs",
"text": "Welcome to your interview.",
"model": "eleven_multilingual_v2",
"voiceId": "JBFqnCBsd6RMkjVDRZzb",
"outputFormat": "mp3_44100_192",
"voiceSettings": {
"stability": 0.5,
"similarityBoost": 0.75,
"style": 0.3,
"useSpeakerBoost": true
},
"languageCode": "nl",
"applyTextNormalization": "auto",
"previousRequestIds": ["req_abc123"],
"tag": "interview"
}
// MLX Audio (Local, free)
{
"provider": "mlxaudio",
"text": "Welkom bij het interview.",
"model": "kokoro",
"voiceId": "af_heart",
"tag": "interview"
}
Response
{
"success": true,
"data": {
"queueId": "llm_abc123",
"status": "queued",
"message": "TTS task added to queue"
}
}
/api/llm/transcribe
Queue a speech-to-text transcription request. Supports ElevenLabs Scribe and MLX Audio (Local).
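With `diarize` enabled, each entry in the completed result's `words` array (see the example at the end of this endpoint) carries `start`, `end`, and a `speakerId`. A sketch for grouping those words into speaker turns; it assumes consecutive words from one speaker form a turn, skips entries whose `type` is not `"word"` (other types may appear), and joins words with single spaces, which only approximates the original spacing. `speaker_turns` is a hypothetical helper.

```python
from typing import Any, Dict, List


def speaker_turns(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge consecutive words from the same speakerId into turns with start/end times."""
    turns: List[Dict[str, Any]] = []
    for w in words:
        if w.get("type", "word") != "word":
            continue  # assumption: non-word entries (e.g. spacing) are skipped
        speaker = w.get("speakerId")
        if turns and turns[-1]["speakerId"] == speaker:
            turns[-1]["text"] += " " + w["text"]  # single-space join (approximation)
            turns[-1]["end"] = w["end"]
        else:
            turns.append({"speakerId": speaker, "text": w["text"],
                          "start": w["start"], "end": w["end"]})
    return turns
```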
Request Body
// ElevenLabs with diarization
{
"provider": "elevenlabs",
"audio": "<base64_encoded_audio>",
"mimeType": "audio/mpeg",
"model": "scribe_v2",
"language": "nl",
"tag": "interview",
"diarize": true,
"numSpeakers": 2,
"timestampsGranularity": "word",
"tagAudioEvents": true
}
// ElevenLabs (basic, no diarization)
{
"provider": "elevenlabs",
"audio": "<base64_encoded_audio>",
"mimeType": "audio/mpeg",
"model": "scribe_v2"
}
// MLX Audio (Local, free)
{
"provider": "mlxaudio",
"audio": "<base64_encoded_audio>",
"mimeType": "audio/wav",
"model": "whisper-large-v3",
"language": "nl",
"tag": "transcription"
}
Response
{
"success": true,
"data": {
"queueId": "llm_abc123",
"status": "queued",
"message": "STT task added to queue"
}
}
// Completed task result (with diarization):
{
"text": "Hello, how are you? I'm fine, thanks.",
"language": "en",
"model": "scribe_v2",
"provider": "elevenlabs",
"requestId": "req_abc123",
"words": [
{ "text": "Hello,", "start": 0.08, "end": 0.54, "type": "word", "speakerId": "speaker_0" },
{ "text": "how", "start": 0.56, "end": 0.72, "type": "word", "speakerId": "speaker_0" },
{ "text": "I'm", "start": 1.2, "end": 1.4, "type": "word", "speakerId": "speaker_1" }
]
}
Embeddings
/api/llm/embeddings
Queue an embeddings request. Embeds a single string or an array of up to 2048 strings using a local model (LM Studio). Content is processed transiently and never stored. Poll /api/llm/queue/result for the vectors.
Request Body
{
"provider": "lmstudio",
"model": "nomic-embed-text-v1.5",
"input": ["first chunk of text", "second chunk of text"],
"dimensions": 768,
"tag": "kb:contracts"
}
Response
{
"success": true,
"data": {
"queueId": "llm_abc123",
"status": "queued",
"message": "Embedding task added to queue"
}
}
The queue result holds { embeddings: number[][], model, provider, dimensions, usage }. For a synchronous, OpenAI-compatible call, use POST /v1/embeddings with a provider/model id. See the Embeddings guide for ingestion and retrieval patterns.
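Retrieval over stored vectors usually ranks them by cosine similarity to an embedded query. A plain-Python sketch, assuming the query was embedded with the same model and `dimensions` as the stored chunks and that `embeddings` follows the order of `input`; `cosine` and `top_k` are illustrative names, and a vector store would replace this linear scan at scale.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], embeddings: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (index into input, score) for the k stored vectors most similar to query."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```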
Agents
/api/agents/signed-url
Generate a signed URL for a conversational AI agent.
Request Body
{ "agentId": "your-agent-id" }
Response
{
"success": true,
"data": {
"signedUrl": "wss://...",
"expiresIn": 900
}
}
Discovery
/api/llm/providers
List all available providers.
Response
{
"success": true,
"data": {
"providers": ["gemini", "vertex", "openai", "anthropic", "xai", "perplexity", "lmstudio", "elevenlabs", "mlxaudio", "ollama"]
}
}
/api/llm/models?provider=openai
List all models for a specific provider.
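The per-model prices in this listing (`inputCostPer1k`, `outputCostPer1k`) can be combined with a completion's `usage` to estimate what a request costs. A sketch assuming cost is linear in input and output tokens; the listing does not state a currency, and the figures from the usage endpoints remain authoritative. `estimate_cost` is a hypothetical helper.

```python
from typing import Any, Dict, List


def estimate_cost(models: List[Dict[str, Any]], model_id: str,
                  usage: Dict[str, int]) -> float:
    """Estimate request cost from per-1k-token prices in the model listing."""
    model = next((m for m in models if m["id"] == model_id), None)
    if model is None:
        raise KeyError(f"model {model_id!r} not in listing")
    return (usage["inputTokens"] / 1000 * model["inputCostPer1k"]
            + usage["outputTokens"] / 1000 * model["outputCostPer1k"])
```

With the gpt-4.1 prices shown below, 25 input and 42 output tokens come to about 0.000386.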
Response
{
"success": true,
"data": {
"provider": "openai",
"models": [
{
"id": "gpt-4.1",
"name": "GPT-4.1",
"contextWindow": 1047576,
"maxOutputTokens": 32768,
"inputCostPer1k": 0.002,
"outputCostPer1k": 0.008,
"capabilities": ["chat"]
}
]
}
}
/api/llm/voices?provider=elevenlabs
List available voices with preview URLs.
Response
{
"success": true,
"data": {
"provider": "elevenlabs",
"voices": [
{
"voice_id": "JBFqnCBsd6RMkjVDRZzb",
"name": "George",
"category": "premade",
"preview_url": "https://storage.googleapis.com/...",
"labels": { "accent": "British", "age": "middle-aged", "gender": "male" }
}
]
}
}
/api/llm/capabilities
List capabilities (image/video/audio) per provider.
Response
{
"success": true,
"data": {
"openai": {
"imageGeneration": true,
"videoGeneration": false,
"textToSpeech": false,
"speechToText": false,
"models": [
{ "id": "gpt-image-1", "name": "GPT Image 1", "capabilities": ["image-generation"] }
]
},
"elevenlabs": {
"imageGeneration": false,
"videoGeneration": false,
"textToSpeech": true,
"speechToText": true,
"soundEffects": true,
"musicGeneration": true,
"models": [
{ "id": "eleven_flash_v2_5", "capabilities": ["text-to-speech"] },
{ "id": "scribe_v1", "capabilities": ["speech-to-text"] },
{ "id": "eleven_text_to_sound_v2", "capabilities": ["sound-effect"] },
{ "id": "music_v1", "capabilities": ["music-generation"] }
]
},
"mlxaudio": {
"imageGeneration": false,
"videoGeneration": false,
"textToSpeech": true,
"speechToText": true,
"models": [
{ "id": "kokoro", "capabilities": ["text-to-speech"] },
{ "id": "whisper-large-v3", "capabilities": ["speech-to-text"] }
]
}
}
}
Concurrency
/api/llm/concurrency
View concurrency limits and active requests for all providers.
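When a task could run on more than one provider, the concurrency snapshot can guide where to send it. A sketch that assumes `limit` caps `activeRequests` for each provider and that `queuedRequests` are waiting for a slot; neither interpretation is spelled out above, and `free_slots` and `least_busy` are illustrative names.

```python
from typing import Dict


def free_slots(status: Dict[str, int]) -> int:
    """Slots left before new tasks must wait (assumes limit caps activeRequests)."""
    return max(0, status["limit"] - status["activeRequests"])


def least_busy(concurrency: Dict[str, Dict[str, int]]) -> str:
    """Pick the provider with the most free slots; on ties, the shorter queue."""
    return max(concurrency,
               key=lambda p: (free_slots(concurrency[p]), -concurrency[p]["queuedRequests"]))
```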
Response
{
"success": true,
"data": {
"openai": { "activeRequests": 2, "queuedRequests": 5, "limit": 10 },
"gemini": { "activeRequests": 0, "queuedRequests": 0, "limit": 15 }
}
}
/api/llm/concurrency/:provider
View concurrency status for a specific provider.
Response
{
"success": true,
"data": {
"provider": "openai",
"activeRequests": 2,
"queuedRequests": 5,
"limit": 10
}
}
Usage
/api/llm/usage
Get the current month's usage summary, broken down by provider.
Response
{
"success": true,
"data": {
"totalRequests": 1250,
"totalTokens": 3450000,
"cost": 12.45,
"providers": {
"openai": { "requests": 800, "tokens": 2100000, "cost": 8.40 },
"gemini": { "requests": 450, "tokens": 1350000, "cost": 4.05 }
}
}
}
/api/llm/usage/monthly
Get monthly usage for the last 12 months.
Response
{
"success": true,
"data": [
{
"year": 2026,
"month": 2,
"provider": "openai",
"totalRequests": 1250,
"totalTokens": 3450000,
"cost": 12.45
}
]
}
/api/llm/usage/daily
Get daily usage for the last 30 days.
Response
{
"success": true,
"data": [
{
"date": "2026-02-09",
"provider": "openai",
"totalRequests": 42,
"totalTokens": 125000,
"cost": 0.50
}
]
}
Health
/api/llm/health
Health check endpoint.
Response
{
"status": "healthy",
"service": "llm",
"timestamp": "2025-01-15T10:00:00.000Z"
}
Completion Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider to use (openai, gemini, vertex, anthropic, xai, perplexity, lmstudio, elevenlabs, mlxaudio, ollama) |
| messages | array | Yes | Array of message objects with role and content |
| model | string | No | Model ID. Uses the provider default if omitted |
| temperature | number | No | Sampling temperature (0 to 2) |
| maxTokens | number | No | Maximum output tokens. With thinkingMode, the thinking budget is added on top, capped at the model limit |
| topP | number | No | Nucleus sampling (0 to 1) |
| frequencyPenalty | number | No | Frequency penalty (-2 to 2) |
| presencePenalty | number | No | Presence penalty (-2 to 2) |
| stop | string[] | No | Stop sequences |
| stream | boolean | No | Enable streaming (use /queue/stream to consume) |
| responseFormat | string | No | "text", "json", or "json_schema" |
| jsonSchema | object | No | JSON schema to use when responseFormat is "json_schema" |
| thinkingMode | object | No | { enabled: boolean, budget?: number, effort?: string }. budget is the thinking-token allowance, added on top of maxTokens (not subtracted from it) |
| webSearch | object | No | Web search config: { enabled, maxUses?, allowedDomains?, blockedDomains?, recencyFilter? } |
| service | string | No | Service name for usage tracking |
| tag | string | No | Optional label for usage tracking (max 100 chars) |
| tools | array | No | Array of tool definitions: { type: "function", function: { name, description?, parameters? } } |
| toolChoice | string or object | No | "auto", "none", or "required", or { type: "function", function: { name } } |
| cache | object | No | { enabled?: boolean, ttl?: number } |
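Some of these constraints can be checked before a request is queued. A client-side sketch that mirrors the ranges and limits in the table and computes the output allowance described for maxTokens and thinkingMode. It assumes that an enabled thinkingMode without a budget adds nothing (the provider may apply its own default) and that jsonSchema is required whenever responseFormat is "json_schema"; the helper names are hypothetical, and the server remains the authority.

```python
from typing import Any, Dict, List, Optional

# Numeric ranges as given in the parameter table.
RANGES = {
    "temperature": (0.0, 2.0),
    "topP": (0.0, 1.0),
    "frequencyPenalty": (-2.0, 2.0),
    "presencePenalty": (-2.0, 2.0),
}


def validate_request(req: Dict[str, Any]) -> List[str]:
    """Client-side checks mirroring the parameter table."""
    errors: List[str] = []
    for field in ("provider", "messages"):
        if field not in req:
            errors.append(f"missing required field {field!r}")
    for field, (lo, hi) in RANGES.items():
        if field in req and not lo <= req[field] <= hi:
            errors.append(f"{field}={req[field]} is outside [{lo}, {hi}]")
    if len(req.get("tag", "")) > 100:
        errors.append("tag exceeds 100 characters")
    # Assumption: jsonSchema must accompany responseFormat "json_schema".
    if req.get("responseFormat") == "json_schema" and "jsonSchema" not in req:
        errors.append("responseFormat 'json_schema' needs a jsonSchema")
    return errors


def effective_max_output(max_tokens: int, thinking_mode: Optional[Dict[str, Any]],
                         model_limit: int) -> int:
    """maxTokens plus the thinking budget (when enabled), capped at the model's output limit."""
    budget = 0
    if thinking_mode and thinking_mode.get("enabled"):
        budget = thinking_mode.get("budget") or 0  # assumption: no budget adds nothing
    return min(max_tokens + budget, model_limit)
```

For example, maxTokens 1024 with an enabled budget of 4096 gives 5120 tokens, while a budget of 40000 on a model with a 32768-token output limit is capped at 32768.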
Message Roles
| Role | Fields | Description |
|---|---|---|
| system | content | System instructions for the model |
| user | content | User message (string or multipart array) |
| assistant | content, toolCalls? | Model response. Contains a toolCalls array when the model invokes tools |
| tool | content, toolCallId | Tool result. Must include a toolCallId matching the tool call |
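The tool role depends on ordering: each tool message should answer a call that the assistant made earlier in the conversation. A sketch of a client-side check; `check_tool_messages` is a hypothetical helper, and treating a second answer to the same toolCallId as an error is this sketch's own choice rather than a documented rule.

```python
from typing import Any, Dict, List, Set


def check_tool_messages(messages: List[Dict[str, Any]]) -> List[str]:
    """Check roles and that each tool message answers an earlier assistant tool call."""
    errors: List[str] = []
    open_calls: Set[str] = set()  # ids of tool calls not yet answered
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role == "assistant":
            for call in msg.get("toolCalls") or []:
                open_calls.add(call["id"])
        elif role == "tool":
            call_id = msg.get("toolCallId")
            if call_id is None:
                errors.append(f"message {i}: tool message without toolCallId")
            elif call_id not in open_calls:
                errors.append(f"message {i}: toolCallId {call_id!r} matches no open tool call")
            else:
                open_calls.discard(call_id)  # each call is answered once
        elif role not in ("system", "user"):
            errors.append(f"message {i}: unknown role {role!r}")
    return errors
```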
Tool Calling
Pass tool definitions to let the model call functions. Tool calling is supported by the Anthropic, OpenAI, and LM Studio providers.
/api/llm/complete
Completion request with tools.
Request Body
{
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"messages": [
{ "role": "user", "content": "What's the weather in Brussels?" }
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
},
"required": ["location"]
}
}
}
],
"toolChoice": "auto"
}
Response
// Queue result (poll via /queue/result)
{
"success": true,
"data": {
"queueId": "abc123-def456",
"status": "completed",
"result": {
"content": "",
"toolCalls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\":\"Brussels\"}"
}
}
],
"finishReason": "tool_calls",
"usage": { "inputTokens": 85, "outputTokens": 32, "totalTokens": 117 },
"model": "claude-sonnet-4-5",
"provider": "anthropic",
"requestId": "req_789"
}
}
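}
Multi-turn Tool Calling Flow
After receiving tool calls, execute each function locally and send the results back in a follow-up request. The follow-up repeats the conversation, adds the assistant message with its toolCalls, and appends one tool message per call whose toolCallId matches the call's id:
/api/llm/complete
Follow-up with tool results.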
Request Body
{
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"messages": [
{ "role": "user", "content": "What's the weather in Brussels?" },
{
"role": "assistant",
"content": "",
"toolCalls": [{
"id": "call_abc123",
"type": "function",
"function": { "name": "get_weather", "arguments": "{\"location\":\"Brussels\"}" }
}]
},
{
"role": "tool",
"toolCallId": "call_abc123",
"content": "{\"temperature\":18,\"condition\":\"Partly cloudy\"}"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": { "location": { "type": "string" } },
"required": ["location"]
}
}
}
]
}
Response
// Final response with natural language answer
{
"success": true,
"data": {
"queueId": "abc456-def789",
"status": "completed",
"result": {
"content": "The weather in Brussels is 18°C and partly cloudy.",
"finishReason": "stop",
"usage": { "inputTokens": 142, "outputTokens": 18, "totalTokens": 160 },
"model": "claude-sonnet-4-5",
"provider": "anthropic",
"requestId": "req_012"
}
}
}
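The flow above generalizes to a loop: while the result's finishReason is "tool_calls", run each requested function locally, append the assistant and tool messages, and resubmit. A sketch in which `complete` is a hypothetical callable that submits to /api/llm/complete and waits for the queue result (for example by polling /api/llm/queue/result), and `handlers` maps tool names to local Python functions. Tool output is serialized as compact JSON to match the example, and `max_rounds` is an arbitrary safety cap.

```python
import json
from typing import Any, Callable, Dict, List

Handlers = Dict[str, Callable[..., Any]]


def follow_up_messages(history: List[Dict[str, Any]], result: Dict[str, Any],
                       handlers: Handlers) -> List[Dict[str, Any]]:
    """Append the assistant's tool calls and one tool message per call to the history."""
    calls = result.get("toolCalls") or []
    messages = list(history)
    messages.append({"role": "assistant", "content": result.get("content", ""),
                     "toolCalls": calls})
    for call in calls:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
        output = handlers[fn["name"]](**args)        # run the local implementation
        messages.append({"role": "tool", "toolCallId": call["id"],
                         "content": json.dumps(output, separators=(",", ":"))})
    return messages


def run_tool_loop(complete: Callable[[Dict[str, Any]], Dict[str, Any]],
                  request: Dict[str, Any], handlers: Handlers,
                  max_rounds: int = 5) -> Dict[str, Any]:
    """Resubmit with tool results until finishReason is no longer "tool_calls"."""
    messages = list(request["messages"])
    for _ in range(max_rounds):
        result = complete({**request, "messages": messages})
        if result.get("finishReason") != "tool_calls":
            return result
        messages = follow_up_messages(messages, result, handlers)
    raise RuntimeError(f"still requesting tools after {max_rounds} rounds")
```

Because the request is copied with `{**request, "messages": ...}`, the tools array and other parameters are resent unchanged on every round, as in the follow-up example.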