Knowledge

Per-user managed vector store. Upload documents, sync external sources, and give agents semantic search over user-specific knowledge — all without managing embeddings or pgvector yourself.

Knowledge is scoped to a session. Each user's documents live in collections that are isolated to that user's session and never mix across sessions. Documents are split into 512-token chunks with a 50-token overlap, embedded with text-embedding-3-small, and indexed with pgvector HNSW.

Uploading documents

Upload one or more files to a named collection. Theazo chunks, embeds, and indexes them automatically. Supported formats: PDF, TXT, MD, DOCX, HTML, CSV.

upload.ts

import { Theazo } from 'theazo'
import { readFile } from 'fs/promises'

const theazo = new Theazo({ apiKey: 'th_live_...' })
const session = await theazo.sessions.forUser('user_123')

// Upload from Buffer
const pdfBuffer = await readFile('./company-docs.pdf')

// faqMarkdown is assumed to be a Markdown string defined elsewhere in your code
const result = await session.knowledge.upload({
  files: [
    { name: 'company-docs.pdf', content: pdfBuffer, type: 'application/pdf' },
    { name: 'faq.md', content: Buffer.from(faqMarkdown), type: 'text/markdown' },
  ],
  collection: 'company-knowledge',
})

console.log(result.collection)  // 'company-knowledge'
console.log(result.chunks)      // 342 (total chunks indexed)
console.log(result.sources)     // 2 (files processed)
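Each file entry carries a MIME `type` next to its name. The sketch below shows one way to derive that value from the file extension for the six supported formats. `toUploadFile` and `MIME_BY_EXTENSION` are hypothetical helpers, not part of the Theazo SDK, and only the `application/pdf` and `text/markdown` values appear in the examples above. The other four are the standard MIME types for those formats, on the assumption that the API accepts them.

```typescript
// Hypothetical helper, not part of the Theazo SDK.
// Standard MIME types for the six supported formats.
const MIME_BY_EXTENSION: Record<string, string> = {
  pdf: 'application/pdf',
  txt: 'text/plain',
  md: 'text/markdown',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  html: 'text/html',
  csv: 'text/csv',
}

// Assumed shape of one entry in the `files` array, based on the examples above.
interface UploadFile {
  name: string
  content: Uint8Array // a Node.js Buffer is also a Uint8Array
  type: string
}

// Builds a file entry, rejecting extensions outside the supported list.
function toUploadFile(name: string, content: Uint8Array): UploadFile {
  const dot = name.lastIndexOf('.')
  const ext = dot === -1 ? '' : name.slice(dot + 1).toLowerCase()
  const type = MIME_BY_EXTENSION[ext]
  if (type === undefined) {
    throw new Error(`Unsupported file format: ${name}`)
  }
  return { name, content, type }
}
```

Rejecting unknown extensions locally surfaces a typo in a file name before any upload request is made.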

Upload from URL

await session.knowledge.upload({
  files: [
    {
      name:   'product-guide.pdf',
      url:    'https://example.com/docs/product-guide.pdf',
      type:   'application/pdf',
    },
  ],
  collection: 'product-docs',
})

Syncing external sources

Connect live data sources. Theazo fetches and re-indexes content on your sync schedule, keeping the vector store fresh without manual uploads.

Notion

const sync = await session.knowledge.sync({
  source: {
    type:   'notion',
    config: {
      token:      process.env.NOTION_TOKEN,
      databaseId: 'abc123def456',
    },
  },
  collection:    'user-notes',
  syncSchedule:  '0 */6 * * *',   // re-sync every 6 hours
})

console.log(sync.id)          // 'ksync_...'
console.log(sync.status)      // 'syncing'
console.log(sync.chunks)      // 0 (updates when sync completes)

Web pages / sitemaps

await session.knowledge.sync({
  source: {
    type:   'url',
    config: {
      urls:       ['https://docs.acme.com/sitemap.xml'],
      recursive:  true,
      maxDepth:   3,
    },
  },
  collection:   'acme-docs',
  syncSchedule: '0 2 * * *',   // re-sync nightly at 2am
})
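The `syncSchedule` values in both examples are five-field cron expressions (minute, hour, day of month, month, day of week). To make the two schedules concrete, here is a minimal sketch of how such an expression can be checked against a point in time. `cronMatches` and `fieldMatches` are illustrative helpers, not SDK functions. The sketch handles only `*`, `*/n`, and plain numbers, and it assumes UTC. The time zone Theazo uses for schedules is not stated here.

```typescript
// Matches one cron field ('*', '*/n', or a plain number) against a value.
// `min` is the lowest legal value of the field, so steps count from it.
function fieldMatches(field: string, value: number, min = 0): boolean {
  if (field === '*') return true
  if (field.startsWith('*/')) {
    const step = Number(field.slice(2))
    return Number.isInteger(step) && step > 0 && (value - min) % step === 0
  }
  return Number(field) === value
}

// True if the five-field cron expression fires at the given minute.
// Simplified: no lists or ranges, and all five fields must match.
// Assumption: the schedule is evaluated in UTC.
function cronMatches(expression: string, date: Date): boolean {
  const fields = expression.trim().split(/\s+/)
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`)
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields
  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(dayOfMonth, date.getUTCDate(), 1) &&
    fieldMatches(month, date.getUTCMonth() + 1, 1) &&
    fieldMatches(dayOfWeek, date.getUTCDay())
  )
}
```

Under these assumptions, `'0 */6 * * *'` fires at 00:00, 06:00, 12:00, and 18:00, and `'0 2 * * *'` fires once a day at 02:00.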

Managing syncs

// List all syncs for a session
const syncs = await session.knowledge.listSyncs()

// Force an immediate sync
await session.knowledge.syncNow('ksync_abc123')

// Pause a sync (keeps indexed data, stops re-fetching)
await session.knowledge.pauseSync('ksync_abc123')

// Delete a sync and its indexed data
await session.knowledge.deleteSync('ksync_abc123')
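A new sync starts in the `'syncing'` state, so code that queries right after `sync()` or `syncNow()` may want to wait for indexing to finish. One possible approach is to poll `listSyncs()`, as sketched below. `waitForSync`, `SyncSummary`, and `SyncLister` are hypothetical names introduced for this example. The sketch assumes only that each listed sync exposes `id` and `status`, and that any status other than `'syncing'` means the run has ended. The full set of status values is not documented here.

```typescript
// Minimal shape assumed for this sketch; the real KnowledgeSync type may differ.
interface SyncSummary {
  id: string
  status: string
}

// Anything with a listSyncs() method, e.g. session.knowledge.
interface SyncLister {
  listSyncs(): Promise<SyncSummary[]>
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Polls listSyncs() until the sync leaves the 'syncing' state,
// or throws once maxAttempts polls have all reported 'syncing'.
async function waitForSync(
  knowledge: SyncLister,
  syncId: string,
  { intervalMs = 5000, maxAttempts = 60 } = {},
): Promise<SyncSummary> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const syncs = await knowledge.listSyncs()
    const sync = syncs.find((s) => s.id === syncId)
    if (!sync) throw new Error(`Sync not found: ${syncId}`)
    if (sync.status !== 'syncing') return sync
    await sleep(intervalMs)
  }
  throw new Error(`Sync ${syncId} still syncing after ${maxAttempts} attempts`)
}
```

A call such as `await waitForSync(session.knowledge, sync.id)` would then resolve once the first indexing pass ends, provided the assumptions above hold.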

Querying knowledge

Run a semantic search against a collection. Returns the top-K most relevant chunks with scores and source metadata.

query.ts

const results = await session.knowledge.query({
  query:      'What is our refund policy for enterprise customers?',
  collection: 'company-knowledge',
  topK:       5,
})

for (const result of results) {
  console.log(result.content)   // chunk text
  console.log(result.score)     // 0.0–1.0 cosine similarity
  console.log(result.source)    // { type: 'upload', name: 'company-docs.pdf', page: 12 }
  console.log(result.chunkId)   // 'chunk_...'
}

// Result shape:
// [
//   {
//     chunkId:  'chunk_001',
//     content:  'Enterprise refunds are processed within 5 business days...',
//     score:    0.91,
//     source:   { type: 'upload', name: 'company-docs.pdf', page: 12 },
//   },
//   ...
// ]
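To illustrate what the `score` and `topK` values mean, the sketch below ranks a handful of chunks against a query vector by cosine similarity and keeps the best `k`. It is a brute-force, in-memory illustration only. The managed service searches a pgvector HNSW index, which is approximate and far faster at scale. `IndexedChunk`, `cosineSimilarity`, and `topK` are names invented for this example.

```typescript
// Illustrative in-memory chunk; the embedding would come from the embedding model.
interface IndexedChunk {
  chunkId: string
  content: string
  embedding: number[]
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Scores every chunk against the query and returns the k highest-scoring ones.
function topK(queryEmbedding: number[], chunks: IndexedChunk[], k: number) {
  return chunks
    .map((chunk) => ({
      chunkId: chunk.chunkId,
      content: chunk.content,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}
```

Because every chunk is scored, this exhaustive version costs time linear in the collection size, which is the cost an HNSW index avoids.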

Collection stats

const stats = await session.knowledge.stats()

console.log(stats.collections)   // 3
console.log(stats.totalChunks)   // 8432
console.log(stats.storageGB)     // 0.42

// Per-collection breakdown:
// stats.byCollection = [
//   {
//     name:       'company-knowledge',
//     chunks:     2341,
//     sources:    4,
//     storageGB:  0.18,
//     lastSync:   '2024-01-15T08:00:00Z',
//   },
//   ...
// ]
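The per-collection breakdown can drive simple health checks. As one example, the sketch below lists collections whose `lastSync` timestamp is older than a chosen age. `staleCollections` and `CollectionStats` are hypothetical names. The interface mirrors the fields shown above and assumes `lastSync` is always an ISO 8601 string, which may not hold for collections populated only by uploads.

```typescript
// Mirrors the byCollection entries shown above (assumed shape).
interface CollectionStats {
  name: string
  chunks: number
  sources: number
  storageGB: number
  lastSync: string // ISO 8601 timestamp
}

// Returns the names of collections last synced more than maxAgeHours before `now`.
function staleCollections(
  byCollection: CollectionStats[],
  maxAgeHours: number,
  now: Date = new Date(),
): string[] {
  const cutoff = now.getTime() - maxAgeHours * 60 * 60 * 1000
  return byCollection
    .filter((c) => Date.parse(c.lastSync) < cutoff)
    .map((c) => c.name)
}
```

A call such as `staleCollections(stats.byCollection, 24)` would flag anything not refreshed in the last day, which can then be passed to `syncNow` for the matching sync.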

Agents with knowledge

Enable knowledge access when creating an agent. The agent can then search the session's knowledge collections using the built-in knowledge_search tool.

agent-with-knowledge.ts

import { Theazo } from 'theazo'

const theazo = new Theazo({ apiKey: 'th_live_...' })
const session = await theazo.sessions.forUser('user_123')

// First, make sure knowledge is indexed
// handbookBuffer is assumed to be a Buffer holding the PDF, loaded elsewhere (e.g. with readFile)
await session.knowledge.upload({
  files: [{ name: 'handbook.pdf', content: handbookBuffer, type: 'application/pdf' }],
  collection: 'company-knowledge',
})

// Create agent with knowledge access
const agent = await session.agents.create({
  name:      'support-agent',
  knowledge: true,   // enables knowledge_search tool automatically
  // Optionally restrict to specific collections:
  // knowledgeCollections: ['company-knowledge'],
})

// Agent can now answer questions grounded in your documents
const result = await agent.run(
  'What is the maximum file size we support for CSV uploads?'
)

console.log(result.output)
// "According to the technical specifications (p. 8 of company-docs.pdf),
//  the maximum CSV upload size is 500MB per file..."
console.log(result.cost)  // { amount: 28, currency: 'usd' }

The agent uses knowledge_search automatically — you do not need to configure it as a tool explicitly. It searches across all collections in the session by default, or only the ones specified in knowledgeCollections.
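The scoping rule in the previous paragraph can be modeled in a few lines. This is an illustration of the documented behavior, not the service's implementation. `collectionsToSearch` is a hypothetical function, and how the real API treats an empty `knowledgeCollections` array or an unknown collection name is not specified here.

```typescript
// Models the documented rule: search every collection in the session by default,
// or only the ones named in knowledgeCollections when it is provided.
function collectionsToSearch(
  sessionCollections: string[],
  knowledgeCollections?: string[],
): string[] {
  if (knowledgeCollections === undefined) return [...sessionCollections]
  const allowed = new Set(knowledgeCollections)
  // Assumption: names that do not exist in the session are silently ignored.
  return sessionCollections.filter((name) => allowed.has(name))
}
```

Restricting an agent to the collections it needs keeps unrelated documents out of its search results.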

Chunking and embedding details

Chunk size: 512 tokens per chunk

Overlap: 50 tokens between adjacent chunks to preserve context across boundaries

Embedding model: OpenAI text-embedding-3-small (1536 dimensions)

Index type: pgvector HNSW (not IVFFlat), for better recall at query time

Distance metric: Cosine similarity. Scores range 0.0–1.0, higher is more relevant.
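The chunk size and overlap settings describe a sliding window over the token sequence. The sketch below shows that windowing on a generic array of tokens. `chunkTokens` is a hypothetical function written for illustration. Tokenization itself is left out, and the handling of a short final window (kept here as a smaller chunk) is an assumption about behavior that is not documented above.

```typescript
// Splits a token sequence into overlapping windows.
// With the defaults, each window starts 512 - 50 = 462 tokens after the previous one.
function chunkTokens<T>(tokens: T[], windowSize = 512, overlap = 50): T[][] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new Error('Require windowSize > 0 and 0 <= overlap < windowSize')
  }
  const stride = windowSize - overlap
  const chunks: T[][] = []
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + windowSize))
    // Stop once a window has reached the end, so no chunk is pure overlap.
    if (start + windowSize >= tokens.length) break
  }
  return chunks
}
```

For a 1,000-token document, this produces three chunks starting at tokens 0, 462, and 924, with the last 50 tokens of each chunk repeated at the start of the next.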

API reference

session.knowledge.upload({ files, collection }): Promise<UploadResult>
Upload and index documents. Returns chunk count and source count.

session.knowledge.sync({ source, collection, syncSchedule }): Promise<KnowledgeSync>
Connect a live source (notion, url) with automatic re-indexing.

session.knowledge.syncNow(syncId): Promise<void>
Force an immediate re-sync of a connected source.

session.knowledge.pauseSync(syncId): Promise<void>
Pause re-syncing. Indexed data is preserved.

session.knowledge.deleteSync(syncId): Promise<void>
Delete a sync and remove all its indexed chunks.

session.knowledge.listSyncs(): Promise<KnowledgeSync[]>
List all syncs for this session.

session.knowledge.query({ query, collection, topK }): Promise<KnowledgeResult[]>
Semantic search. Returns top-K chunks by cosine similarity.

session.knowledge.stats(): Promise<KnowledgeStats>
Aggregate stats: collection count, total chunks, storage in GB.

session.knowledge.deleteCollection(name): Promise<void>
Delete a collection and all its chunks.