Knowledge Base (RAG)

Build Retrieval-Augmented Generation over your own documents with the open-source @canarycoders/canaryllm-rag toolkit. Your documents, chunks, embeddings and vector index all stay on your infrastructure; CanaryLLM handles only the transient embedding and chat calls and stores nothing.

Why it's built this way

Embeddings derived from documents that contain personal data are themselves personal data, because text can be partially reconstructed from a vector. Keeping the vectors in your store, not ours, leaves you in control of access, retention and erasure, and means the gateway never persists document content. The only document content that crosses the wire is chunk text, which is embedded transiently by local (LM Studio) inference with no third-country transfer.

Install

bash

bun add @canarycoders/canaryllm-rag @canarycoders/canaryllm

You also need a Postgres database with the pgvector extension (or your own implementation of the VectorStore interface for another store), and an embedding model loaded in LM Studio behind your gateway (e.g. nomic-embed-text-v1.5).

Ingest → retrieve → answer

typescript

import { Pool } from "pg";
import { CanaryLLM } from "@canarycoders/canaryllm";
import { canaryEmbedder, ingestDocuments, retrieve, buildRagMessages } from "@canarycoders/canaryllm-rag";
import { PgVectorStore } from "@canarycoders/canaryllm-rag/store/pgvector";

const client = new CanaryLLM({ apiKey: process.env.CANARYLLM_API_KEY });
const embedder = canaryEmbedder(client, { model: "nomic-embed-text-v1.5" });

const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const store = new PgVectorStore(pool, { dimensions: 768 }); // match your model
await store.migrate(); // creates extension, table, HNSW cosine index

// 1. Ingest: chunk → embed → upsert (your store, your data)
// handbookText and policyText are plain strings you have already loaded.
await ingestDocuments(
  [
    { id: "handbook.md", text: handbookText, metadata: { source: "handbook.md" } },
    { id: "policy.md", text: policyText, metadata: { source: "policy.md" } },
  ],
  { embedder, store, chunk: { chunkSize: 512, chunkOverlap: 64 } },
);

// 2. Retrieve — embed the question, search your store
const hits = await retrieve("How many vacation days do I get?", { embedder, store, topK: 5 });

// 3. Answer — grounded completion via the gateway
const messages = buildRagMessages("How many vacation days do I get?", hits);
const answer = await client.chat.complete({ provider: "lmstudio", model: "qwen3-32b", messages });
console.log(answer.content);
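
To make step 3 less of a black box, here is a minimal sketch of how retrieved chunks could be assembled into a grounded prompt. It is not the toolkit's implementation: sketchRagMessages, SketchHit and SketchMessage are hypothetical names, and the real buildRagMessages may format context and instructions differently.

```typescript
// Hypothetical shapes for illustration; the toolkit's real types may differ.
type SketchHit = { text: string; source?: string };
type SketchMessage = { role: "system" | "user"; content: string };

// Assemble a grounded prompt: retrieved chunks go into the system message,
// numbered so the model can refer to them, and the question is the user turn.
function sketchRagMessages(question: string, hits: SketchHit[]): SketchMessage[] {
  const context = hits
    .map((hit, i) => `[${i + 1}] (${hit.source ?? "unknown"})\n${hit.text}`)
    .join("\n\n");

  return [
    {
      role: "system",
      content:
        "Answer using only the context below. " +
        "If the context does not contain the answer, say so.\n\n" +
        context,
    },
    { role: "user", content: question },
  ];
}
```

Numbering the chunks and naming their source makes it easier to ask the model for citations and to trace an answer back to a document in your store.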

Chunking

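The recursive splitter breaks text at progressively finer boundaries (paragraph → line → sentence → word) and packs the pieces into chunks of up to chunkSize tokens, carrying chunkOverlap tokens over from one chunk to the next. By default, tokens are counted with a ~4-characters-per-token estimate; pass a real tokenizer via countTokens for exact sizing.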

typescript

import { splitTextIntoChunks } from "@canarycoders/canaryllm-rag";

const chunks = splitTextIntoChunks(text, {
  chunkSize: 512,
  chunkOverlap: 64,
  countTokens: (t) => myTokenizer.encode(t).length,
});
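
The packing and carry-over behaviour can be illustrated with a deliberately simplified sketch. It splits on whitespace only (the real splitter also tries paragraph, line and sentence boundaries first), and it assumes the default estimate rounds up; estimateTokens, sketchSplit and SketchChunkOptions are hypothetical names, not part of the package.

```typescript
// Rough token estimate: about 4 characters per token (rounding up is an assumption).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

type SketchChunkOptions = {
  chunkSize: number;
  chunkOverlap: number;
  countTokens?: (text: string) => number;
};

// Simplified packer: fill each chunk with words up to chunkSize tokens, then
// start the next chunk with the trailing words of the previous one that fit
// within chunkOverlap tokens. Oversized single words are kept whole.
function sketchSplit(text: string, options: SketchChunkOptions): string[] {
  const { chunkSize, chunkOverlap } = options;
  const count = options.countTokens ?? estimateTokens;
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }

  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current: string[] = [];

  for (const word of words) {
    const candidate = [...current, word].join(" ");
    if (current.length > 0 && count(candidate) > chunkSize) {
      chunks.push(current.join(" "));
      // Walk backwards through the finished chunk, keeping words while the
      // carried text still fits in the overlap budget.
      const carry: string[] = [];
      for (const w of [...current].reverse()) {
        if (count([w, ...carry].join(" ")) > chunkOverlap) break;
        carry.unshift(w);
      }
      current = carry;
    }
    current.push(word);
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

With a word-counting tokenizer, a chunkSize of 3 and a chunkOverlap of 1, the text "a b c d e f g" comes out as "a b c", "c d e", "e f g": each chunk repeats the last word of the one before it.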

Parsing documents

HTML and Markdown have built-in extractors. PDF and DOCX are extracted on your side (so the raw file never leaves your box) and passed in as text.

typescript

import { htmlToText, markdownToText } from "@canarycoders/canaryllm-rag";
import pdf from "pdf-parse";   // your dependency

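const text = htmlToText(rawHtml);
// pdf-parse documents a Node Buffer as its input, so wrap the ArrayBuffer first.
const pdfBuffer = Buffer.from(await Bun.file("contract.pdf").arrayBuffer());
const { text: pdfText } = await pdf(pdfBuffer);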

await ingestDocuments(
  [{ id: "contract.pdf", text: pdfText, metadata: { source: "contract.pdf" } }],
  { embedder, store },
);

pgvector schema

store.migrate() creates this for you; it is shown here for reference. The vector(768) dimension must match your embedding model's output size (768 for nomic-embed-text-v1.5).

sql

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE rag_chunks (
  id          text PRIMARY KEY,
  document_id text NOT NULL,
  chunk_index integer NOT NULL,
  text        text NOT NULL,
  metadata    jsonb NOT NULL DEFAULT '{}'::jsonb,
  embedding   vector(768) NOT NULL
);

CREATE INDEX ON rag_chunks USING hnsw (embedding vector_cosine_ops);
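
For readers who want to query this table directly, the following sketch builds a parameterized top-K cosine search against the schema above. It is an illustration and not necessarily the query PgVectorStore issues; toVectorLiteral and buildSearchQuery are hypothetical helpers. It relies on two pgvector facts: vectors are written as a bracketed, comma-separated text literal, and `<=>` is the cosine distance operator.

```typescript
// Serialize an embedding into pgvector's text input format, e.g. "[0.5,0.25]".
// Rejects vectors whose length does not match the column's dimension.
function toVectorLiteral(embedding: number[], dimensions: number): string {
  if (embedding.length !== dimensions) {
    throw new Error(`expected ${dimensions} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((v) => Number.isFinite(v))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Build a node-postgres style query config ({ text, values }).
// `<=>` returns cosine distance, so ascending order puts the closest chunks
// first, and 1 - distance gives a similarity score.
function buildSearchQuery(embedding: number[], topK: number, dimensions = 768) {
  return {
    text:
      "SELECT id, document_id, chunk_index, text, metadata, " +
      "1 - (embedding <=> $1::vector) AS score " +
      "FROM rag_chunks ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toVectorLiteral(embedding, dimensions), topK] as [string, number],
  };
}
```

The result can be passed straight to pool.query(...). Ordering by `embedding <=> $1::vector` is the form that lets Postgres use the HNSW cosine index. Because every chunk row carries its document_id, erasing a document is a single `DELETE FROM rag_chunks WHERE document_id = $1`.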

Custom stores

Implement the VectorStore interface to back the toolkit with sqlite-vec, Qdrant, Weaviate, etc. PgVectorStore accepts any node-postgres-shaped client, so the pg driver remains your dependency and your credentials never pass through the package.
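
As a rough idea of what a custom backend involves, here is a minimal in-memory store with brute-force cosine search. The SketchVectorStore interface and its upsert/search methods are assumptions made for this example; check the package's exported VectorStore type for the real method names and signatures before adapting it.

```typescript
// Hypothetical interface for illustration only; not the package's VectorStore.
type SketchRecord = { id: string; text: string; embedding: number[] };
type SketchMatch = { id: string; text: string; score: number };

interface SketchVectorStore {
  upsert(records: SketchRecord[]): Promise<void>;
  search(embedding: number[], topK: number): Promise<SketchMatch[]>;
}

// Cosine similarity: dot product divided by the product of the norms.
// Returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force store: fine for tests and small corpora, not for production scale.
class InMemoryVectorStore implements SketchVectorStore {
  private records = new Map<string, SketchRecord>();

  async upsert(records: SketchRecord[]): Promise<void> {
    // Same id overwrites the previous record, which is what makes it an upsert.
    for (const record of records) this.records.set(record.id, record);
  }

  async search(embedding: number[], topK: number): Promise<SketchMatch[]> {
    return [...this.records.values()]
      .map((r) => ({ id: r.id, text: r.text, score: cosineSimilarity(embedding, r.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}
```

A store like this is also handy as a test double: it lets you exercise ingestion and retrieval logic without a running Postgres.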