Knowledge
Per-user managed vector store. Upload documents, sync external sources, and give agents semantic search over user-specific knowledge — all without managing embeddings or pgvector yourself.
Knowledge is scoped to a session. Each user's documents are stored in an isolated collection and never mix across sessions. Documents are split into 512-token chunks with a 50-token overlap, embedded with text-embedding-3-small, and indexed with pgvector HNSW.
Uploading documents
Upload one or more files to a named collection. Theazo chunks, embeds, and indexes them automatically. Supported formats: PDF, TXT, MD, DOCX, HTML, CSV.
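Each file entry carries a `type` field holding a MIME type. As a minimal sketch, the six supported formats could be mapped from file extensions like this. `mimeTypeFor` and `MIME_BY_EXTENSION` are hypothetical names, not part of the Theazo SDK, and the assumption that the API expects these standard MIME strings is based only on the two values shown in the example on this page.

```typescript
// Hypothetical helper, not part of the Theazo SDK: maps a file name to a
// standard MIME type for the six formats listed as supported.
const MIME_BY_EXTENSION: Record<string, string> = {
  pdf: 'application/pdf',
  txt: 'text/plain',
  md: 'text/markdown',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  html: 'text/html',
  csv: 'text/csv',
}

function mimeTypeFor(fileName: string): string {
  // Take everything after the last dot, case-insensitively.
  const dot = fileName.lastIndexOf('.')
  const ext = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase()
  const mime = MIME_BY_EXTENSION[ext]
  if (mime === undefined) {
    throw new Error(`Unsupported file format: ${fileName}`)
  }
  return mime
}
```

Rejecting unknown extensions before calling upload keeps unsupported files from reaching the API at all.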
upload.ts
import { Theazo } from 'theazo'
import { readFile } from 'fs/promises'
const theazo = new Theazo({ apiKey: 'th_live_...' })
const session = await theazo.sessions.forUser('user_123')
// Upload from Buffer
const pdfBuffer = await readFile('./company-docs.pdf')
// Assumed source for the Markdown string; './faq.md' is an illustrative path
const faqMarkdown = await readFile('./faq.md', 'utf8')
const result = await session.knowledge.upload({
  files: [
    { name: 'company-docs.pdf', content: pdfBuffer, type: 'application/pdf' },
    { name: 'faq.md', content: Buffer.from(faqMarkdown), type: 'text/markdown' },
  ],
  collection: 'company-knowledge',
})
console.log(result.collection) // 'company-knowledge'
console.log(result.chunks) // 342 (total chunks indexed)
console.log(result.sources) // 2 (files processed)
Upload from URL
await session.knowledge.upload({
  files: [
    {
      name: 'product-guide.pdf',
      url: 'https://example.com/docs/product-guide.pdf',
      type: 'application/pdf',
    },
  ],
  collection: 'product-docs',
})
Syncing external sources
Connect live data sources. Theazo fetches and re-indexes content on your sync schedule, keeping the vector store fresh without manual uploads.
Notion
const sync = await session.knowledge.sync({
  source: {
    type: 'notion',
    config: {
      token: process.env.NOTION_TOKEN,
      databaseId: 'abc123def456',
    },
  },
  collection: 'user-notes',
  syncSchedule: '0 */6 * * *', // re-sync every 6 hours
})
console.log(sync.id) // 'ksync_...'
console.log(sync.status) // 'syncing'
console.log(sync.chunks) // 0 (updates when sync completes)
Web pages / sitemaps
await session.knowledge.sync({
  source: {
    type: 'url',
    config: {
      urls: ['https://docs.acme.com/sitemap.xml'],
      recursive: true,
      maxDepth: 3,
    },
  },
  collection: 'acme-docs',
  syncSchedule: '0 2 * * *', // re-sync nightly at 2am
})
Managing syncs
// List all syncs for a session
const syncs = await session.knowledge.listSyncs()
// Force an immediate sync
await session.knowledge.syncNow('ksync_abc123')
// Pause a sync (keeps indexed data, stops re-fetching)
await session.knowledge.pauseSync('ksync_abc123')
// Delete a sync and its indexed data
await session.knowledge.deleteSync('ksync_abc123')
Querying knowledge
Run a semantic search against a collection. Returns the top-K most relevant chunks with scores and source metadata.
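To make "top-K by cosine similarity" concrete, here is a brute-force sketch of the ranking a query conceptually performs. It is illustrative only: the hosted service searches an HNSW index and does not scan every chunk, and `cosineSimilarity`, `topK`, and the `embedding` field are names assumed for this example.

```typescript
// Illustrative only: brute-force ranking by cosine similarity.
// The hosted service uses an approximate HNSW index instead of a full scan.
interface ScoredChunk {
  chunkId: string
  score: number
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have equal length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

function topK(
  query: number[],
  chunks: { chunkId: string; embedding: number[] }[],
  k: number
): ScoredChunk[] {
  return chunks
    .map((c) => ({ chunkId: c.chunkId, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score) // highest score first
    .slice(0, k)
}
```

A larger `topK` returns more context at the cost of lower-scoring, less relevant chunks near the end of the list.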
query.ts
const results = await session.knowledge.query({
  query: 'What is our refund policy for enterprise customers?',
  collection: 'company-knowledge',
  topK: 5,
})
for (const result of results) {
  console.log(result.content) // chunk text
  console.log(result.score) // 0.0–1.0 cosine similarity
  console.log(result.source) // { type: 'upload', name: 'company-docs.pdf', page: 12 }
  console.log(result.chunkId) // 'chunk_...'
}
// Result shape:
// [
//   {
//     chunkId: 'chunk_001',
//     content: 'Enterprise refunds are processed within 5 business days...',
//     score: 0.91,
//     source: { type: 'upload', name: 'company-docs.pdf', page: 12 },
//   },
//   ...
// ]
Collection stats
const stats = await session.knowledge.stats()
console.log(stats.collections) // 3
console.log(stats.totalChunks) // 8432
console.log(stats.storageGB) // 0.42
// Per-collection breakdown:
// stats.byCollection = [
//   {
//     name: 'company-knowledge',
//     chunks: 2341,
//     sources: 4,
//     storageGB: 0.18,
//     lastSync: '2024-01-15T08:00:00Z',
//   },
//   ...
// ]
Agents with knowledge
Enable knowledge access when creating an agent. The agent can then search the session's knowledge collections using the built-in knowledge_search tool.
agent-with-knowledge.ts
import { Theazo } from 'theazo'
import { readFile } from 'fs/promises'
const theazo = new Theazo({ apiKey: 'th_live_...' })
const session = await theazo.sessions.forUser('user_123')
// First, make sure knowledge is indexed
// './handbook.pdf' is an illustrative path
const handbookBuffer = await readFile('./handbook.pdf')
await session.knowledge.upload({
  files: [{ name: 'handbook.pdf', content: handbookBuffer, type: 'application/pdf' }],
  collection: 'company-knowledge',
})
// Create agent with knowledge access
const agent = await session.agents.create({
  name: 'support-agent',
  knowledge: true, // enables knowledge_search tool automatically
  // Optionally restrict to specific collections:
  // knowledgeCollections: ['company-knowledge'],
})
// Agent can now answer questions grounded in your documents
const result = await agent.run(
  'What is the maximum file size we support for CSV uploads?'
)
console.log(result.output)
// "According to the technical specifications (p. 8 of company-docs.pdf),
// the maximum CSV upload size is 500MB per file..."
console.log(result.cost) // { amount: 28, currency: 'usd' }
The agent uses knowledge_search automatically; you do not need to configure it as a tool explicitly. It searches across all collections in the session by default, or only the ones specified in knowledgeCollections.
Chunking and embedding details
Chunk size: 512 tokens per chunk
Overlap: 50 tokens between adjacent chunks to preserve context across boundaries
Embedding model: OpenAI text-embedding-3-small (1536 dimensions)
Index type: pgvector HNSW (not IVFFlat), for better recall at query time
Distance metric: Cosine similarity. Scores range 0.0–1.0; higher is more relevant.
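The chunk size and overlap above describe a sliding window. The sketch below shows one way such a window could work over an already-tokenized document. It is an assumption-laden illustration, not Theazo's actual chunker: `chunkTokens` is a hypothetical name, the tokenizer is out of scope, and how the service treats the final short chunk is not documented here.

```typescript
// Sketch of sliding-window chunking: each chunk starts (size - overlap)
// tokens after the previous one, so adjacent chunks share `overlap` tokens.
// Hypothetical helper; the service's real tail handling may differ.
function chunkTokens<T>(tokens: T[], size = 512, overlap = 50): T[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error('Require size > 0 and 0 <= overlap < size')
  }
  const step = size - overlap
  const chunks: T[][] = []
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size))
    // Stop once a window has reached the end of the document.
    if (start + size >= tokens.length) break
  }
  return chunks
}
```

With the defaults, a 1,000-token document yields three chunks starting at tokens 0, 462, and 924, where the last chunk holds the remaining 76 tokens.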
API reference
session.knowledge.upload({ files, collection }): Promise<UploadResult>
  Upload and index documents. Returns chunk count and source count.
session.knowledge.sync({ source, collection, syncSchedule }): Promise<KnowledgeSync>
  Connect a live source (notion, url) with automatic re-indexing.
session.knowledge.syncNow(syncId): Promise<void>
  Force an immediate re-sync of a connected source.
session.knowledge.pauseSync(syncId): Promise<void>
  Pause re-syncing. Indexed data is preserved.
session.knowledge.deleteSync(syncId): Promise<void>
  Delete a sync and remove all its indexed chunks.
session.knowledge.listSyncs(): Promise<KnowledgeSync[]>
  List all syncs for this session.
session.knowledge.query({ query, collection, topK }): Promise<KnowledgeResult[]>
  Semantic search. Returns top-K chunks by cosine similarity.
session.knowledge.stats(): Promise<KnowledgeStats>
  Aggregate stats: collection count, total chunks, storage in GB.
session.knowledge.deleteCollection(name): Promise<void>
  Delete a collection and all its chunks.
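The return types are named but not spelled out on this page. As a rough sketch, the query result shape can be inferred from the sample output shown earlier, together with a small helper that turns results into a prompt context. The interfaces below are inferred, not the SDK's published types, and `buildContext` is a hypothetical helper.

```typescript
// Shapes inferred from the sample query output on this page; the SDK's
// real KnowledgeResult type may carry more fields or differ in detail.
interface KnowledgeSource {
  type: string
  name: string
  page?: number
}

interface KnowledgeResult {
  chunkId: string
  content: string
  score: number
  source: KnowledgeSource
}

// Hypothetical helper: keep results at or above a score threshold and
// join them into one context string, each line labeled with its source.
function buildContext(results: KnowledgeResult[], minScore: number): string {
  return results
    .filter((r) => r.score >= minScore)
    .map((r) => {
      const page = r.source.page === undefined ? '' : `, p. ${r.source.page}`
      return `[${r.source.name}${page}] ${r.content}`
    })
    .join('\n')
}
```

Filtering by score before building a context drops weakly related chunks that a fixed topK would otherwise include.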