Knowledge Base (RAG)
Build Retrieval-Augmented Generation (RAG) over your own documents with the open-source @canarycoders/canaryllm-rag toolkit. Your documents, chunks, embeddings and vector index all stay on your infrastructure; CanaryLLM handles only the transient embedding and chat calls and stores nothing.
Why it's built this way
Embeddings derived from your documents can themselves be personal data, because text can be partially reconstructed from a vector. Keeping the vectors in your store rather than ours leaves you in control of access, retention and erasure, and means the gateway never persists any document content. The only content that crosses the wire is text to be embedded (chunks at ingest time, the question at query time) and the chat prompt built from the question and the retrieved chunks, all processed transiently on local (LM Studio) inference with no third-country transfer.
Install
bun add @canarycoders/canaryllm-rag @canarycoders/canaryllm
You also need a Postgres database with the pgvector extension (or your own implementation of the VectorStore interface for another store), plus an LM Studio embedding model loaded behind your gateway (e.g. nomic-embed-text-v1.5).
Ingest → retrieve → answer
import { Pool } from "pg";
import { CanaryLLM } from "@canarycoders/canaryllm";
import { canaryEmbedder, ingestDocuments, retrieve, buildRagMessages } from "@canarycoders/canaryllm-rag";
import { PgVectorStore } from "@canarycoders/canaryllm-rag/store/pgvector";
const client = new CanaryLLM({ apiKey: process.env.CANARYLLM_API_KEY });
const embedder = canaryEmbedder(client, { model: "nomic-embed-text-v1.5" });
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const store = new PgVectorStore(pool, { dimensions: 768 }); // match your model
await store.migrate(); // creates extension, table, HNSW cosine index
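// Load the source documents from local disk
const handbookText = await Bun.file("handbook.md").text();
const policyText = await Bun.file("policy.md").text();
// 1. Ingest — chunk → embed → upsert (your store, your data)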
await ingestDocuments(
[
{ id: "handbook.md", text: handbookText, metadata: { source: "handbook.md" } },
{ id: "policy.md", text: policyText, metadata: { source: "policy.md" } },
],
{ embedder, store, chunk: { chunkSize: 512, chunkOverlap: 64 } },
);
// 2. Retrieve — embed the question, search your store
const hits = await retrieve("How many vacation days do I get?", { embedder, store, topK: 5 });
// 3. Answer — grounded completion via the gateway
const messages = buildRagMessages("How many vacation days do I get?", hits);
const answer = await client.chat.complete({ provider: "lmstudio", model: "qwen3-32b", messages });
console.log(answer.content);
Chunking
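The recursive splitter breaks text on paragraphs first, then lines, then sentences, then words, and packs the pieces into chunks of up to chunkSize tokens, carrying chunkOverlap tokens from the end of each chunk over into the next. By default, token counts are estimated at roughly 4 characters per token; pass a real tokenizer as countTokens for exact sizing.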
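To make the splitting and overlap behaviour concrete, here is a minimal sketch written from the description above rather than from the toolkit's source. The separator list (`"\n\n"`, `"\n"`, `". "`, `" "`), the piece-granular overlap, and the names chunkText, splitPieces and splitKeep are assumptions for illustration only.

```typescript
// Hypothetical re-implementation for illustration; not the toolkit's source.
type CountTokens = (text: string) => number;

interface ChunkOptions {
  chunkSize: number; // maximum tokens per chunk
  chunkOverlap: number; // maximum tokens carried over into the next chunk
  countTokens?: CountTokens;
}

// Default estimate: roughly 4 characters per token.
const estimateTokens: CountTokens = (t) => Math.ceil(t.length / 4);

// Tried coarsest first: paragraph, line, sentence, word (assumed separators).
const SEPARATORS = ["\n\n", "\n", ". ", " "];

// Split on a separator but keep it attached, so joining the pieces restores the text.
function splitKeep(text: string, sep: string): string[] {
  const parts = text.split(sep);
  return parts
    .map((p, i) => (i < parts.length - 1 ? p + sep : p))
    .filter((p) => p.length > 0);
}

// Recursively break text until every piece fits in chunkSize.
// A single word longer than chunkSize is kept whole.
function splitPieces(text: string, chunkSize: number, count: CountTokens, level = 0): string[] {
  if (count(text) <= chunkSize || level >= SEPARATORS.length) return [text];
  return splitKeep(text, SEPARATORS[level]).flatMap((p) =>
    splitPieces(p, chunkSize, count, level + 1),
  );
}

// Greedily pack pieces into chunks, carrying trailing pieces over as overlap.
function chunkText(text: string, opts: ChunkOptions): string[] {
  const { chunkSize, chunkOverlap, countTokens = estimateTokens } = opts;
  if (chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be smaller than chunkSize");
  }

  const chunks: string[] = [];
  let current: string[] = [];
  for (const piece of splitPieces(text, chunkSize, countTokens)) {
    if (current.length > 0 && countTokens([...current, piece].join("")) > chunkSize) {
      chunks.push(current.join("").trim());
      // Carry trailing pieces totalling at most chunkOverlap tokens into the next chunk.
      const carry: string[] = [];
      for (let i = current.length - 1; i >= 0; i--) {
        if (countTokens([current[i], ...carry].join("")) > chunkOverlap) break;
        carry.unshift(current[i]);
      }
      // Drop carried pieces from the front if they would stop the new piece fitting.
      while (carry.length > 0 && countTokens([...carry, piece].join("")) > chunkSize) {
        carry.shift();
      }
      current = carry;
    }
    current.push(piece);
  }
  if (current.length > 0) chunks.push(current.join("").trim());
  return chunks.filter((c) => c.length > 0);
}
```

With a whitespace word counter, chunkSize 4 and chunkOverlap 1, the ten words "alpha … kappa" become three chunks, each starting with the last word of the previous one. Overlap here is measured in whole pieces, so it can fall short of chunkOverlap when the trailing piece is large. In the toolkit itself you call splitTextIntoChunks and can pass an exact tokenizer: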
import { splitTextIntoChunks } from "@canarycoders/canaryllm-rag";
const chunks = splitTextIntoChunks(text, {
chunkSize: 512,
chunkOverlap: 64,
countTokens: (t) => myTokenizer.encode(t).length, // myTokenizer: placeholder for your own tokenizer
});
Parsing documents
HTML and Markdown have built-in extractors. PDF and DOCX are extracted on your side (so the raw file never leaves your box) and passed in as text.
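The flow described here can be sketched as a small routing step: pick a local extractor by file extension, turn the raw bytes into text in-process, and hand only that text to ingestion. This is a hypothetical sketch; toIngestDoc, Extractor and the extension registry are illustrative names, not toolkit exports, and the document shape is taken from the ingestDocuments calls on this page.

```typescript
// Hypothetical glue code; these names are illustrative, not part of the toolkit.
// Document shape as passed to ingestDocuments in the examples on this page.
interface IngestDoc {
  id: string;
  text: string;
  metadata: { source: string };
}

// An extractor turns raw file bytes into plain text, entirely in-process.
type Extractor = (raw: Uint8Array) => string | Promise<string>;

// Pick an extractor by file extension and build an ingestable document.
async function toIngestDoc(
  filename: string,
  raw: Uint8Array,
  extractors: Record<string, Extractor>,
): Promise<IngestDoc> {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  const extract = extractors[ext];
  if (!extract) throw new Error(`no local extractor registered for "${filename}"`);
  // Only the extracted text moves on to chunking and embedding; the raw bytes stay here.
  const text = await extract(raw);
  return { id: filename, text, metadata: { source: filename } };
}

// Example registry. In practice html/md entries would wrap htmlToText/markdownToText
// (e.g. html: (raw) => htmlToText(new TextDecoder().decode(raw))) and pdf/docx your own
// parser; a plain UTF-8 decoder stands in here so the sketch runs without dependencies.
const decodeUtf8: Extractor = (raw) => new TextDecoder().decode(raw);
const extractors: Record<string, Extractor> = { txt: decodeUtf8 };
```

Failing loudly on an unknown extension keeps unparsed binaries from being embedded as garbage text. With the toolkit's own extractors and a local PDF parser, the same idea looks like this: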
import { htmlToText, markdownToText } from "@canarycoders/canaryllm-rag";
import pdf from "pdf-parse"; // your dependency
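const htmlText = htmlToText(rawHtml); // rawHtml: an HTML string you already have
const mdText = markdownToText(rawMarkdown); // rawMarkdown: a Markdown string you already have
// pdf-parse 1.x: the default export takes the file bytes as a Buffer and resolves to { text, ... }
const { text: pdfText } = await pdf(Buffer.from(await Bun.file("contract.pdf").arrayBuffer()));
await ingestDocuments(
[
{ id: "page.html", text: htmlText, metadata: { source: "page.html" } },
{ id: "notes.md", text: mdText, metadata: { source: "notes.md" } },
{ id: "contract.pdf", text: pdfText, metadata: { source: "contract.pdf" } },
],
{ embedder, store },
);
pgvector schema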
store.migrate() creates this schema for you; it is shown here for reference. The vector dimension must match your embedding model's output (768 for nomic-embed-text-v1.5, as used above).
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE rag_chunks (
id text PRIMARY KEY,
document_id text NOT NULL,
chunk_index integer NOT NULL,
text text NOT NULL,
metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
embedding vector(768) NOT NULL
);
CREATE INDEX ON rag_chunks USING hnsw (embedding vector_cosine_ops);
Custom stores
Implement the VectorStore interface to back the toolkit with sqlite-vec, Qdrant, Weaviate, etc. PgVectorStore accepts any client with the same query interface as a node-postgres Pool or Client, so the pg driver stays your dependency and your database credentials never pass through the package.
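To illustrate what a custom store has to do, here is a sketch of a brute-force in-memory store. The SimpleVectorStore interface, its method names and the StoredChunk shape are assumptions made for illustration, not the toolkit's actual VectorStore interface; check the package's type definitions before implementing against it. Scores use cosine similarity, matching the vector_cosine_ops index in the pgvector schema (pgvector's `<=>` operator returns cosine distance, i.e. 1 minus this similarity).

```typescript
// Hypothetical interface for illustration; the toolkit's real VectorStore may differ.
interface StoredChunk {
  id: string;
  documentId: string;
  text: string;
  embedding: number[];
  metadata?: Record<string, unknown>;
}

interface Hit {
  chunk: StoredChunk;
  score: number; // cosine similarity, higher is closer
}

interface SimpleVectorStore {
  upsert(chunks: StoredChunk[]): Promise<void>;
  search(query: number[], topK: number): Promise<Hit[]>;
  deleteDocument(documentId: string): Promise<void>;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force store: a linear scan per query, no index.
class InMemoryVectorStore implements SimpleVectorStore {
  private readonly dimensions: number;
  private readonly chunks = new Map<string, StoredChunk>();

  constructor(dimensions: number) {
    this.dimensions = dimensions;
  }

  // Reject vectors whose length does not match the configured dimension.
  private checkDims(v: number[]): void {
    if (v.length !== this.dimensions) {
      throw new Error(`expected ${this.dimensions} dimensions, got ${v.length}`);
    }
  }

  async upsert(chunks: StoredChunk[]): Promise<void> {
    for (const c of chunks) {
      this.checkDims(c.embedding);
      this.chunks.set(c.id, c); // same id overwrites, like an upsert on a primary key
    }
  }

  async search(query: number[], topK: number): Promise<Hit[]> {
    this.checkDims(query);
    return Array.from(this.chunks.values())
      .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }

  // Per-document erasure: removes every chunk derived from that document.
  async deleteDocument(documentId: string): Promise<void> {
    for (const [id, c] of Array.from(this.chunks.entries())) {
      if (c.documentId === documentId) this.chunks.delete(id);
    }
  }
}
```

A linear scan costs O(n) per query, which is why the pgvector schema adds an HNSW index; a backend such as Qdrant or sqlite-vec would delegate search to its own index instead. Because every chunk records its document id, deleteDocument removes all vectors derived from a document in one call, which is the erasure control described at the top of this page.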