Guardrails

Content filtering, PII detection, prompt injection protection, and output limits. Set platform-wide defaults or override per agent.

Platform defaults

Configure guardrails that apply to every agent by default. Individual agents can override these with stricter or more lenient settings.

configure-guardrails.ts

import { Theazo } from 'theazo'

const theazo = new Theazo({ apiKey: 'th_live_...' })

await theazo.guardrails.configure({
  contentFilter: 'moderate',      // 'off' | 'light' | 'moderate' | 'strict'
  blockPII: true,                 // Redact SSN, credit cards, phone numbers
  promptInjection: true,          // Detect and block injection attempts
  allowedDomains: null,           // null = allow all
  blockedDomains: ['*.darkweb.com'],
  maxOutputTokens: 8192,
  maxToolCalls: 20,               // Prevent infinite tool loops
})

Content filtering

Content filtering scans both model inputs and outputs for harmful content. Four levels are available:

off: No content filtering. Use only for internal testing.

light: Blocks clearly harmful content (violence, illegal activity). Allows most business content.

moderate: Default. Blocks harmful content plus sensitive topics. Suitable for most production use.

strict: Maximum filtering. Blocks anything potentially sensitive. Use for regulated industries.
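The four levels form an ordered scale, which is useful when your own code needs to check that a setting is at least as restrictive as some baseline (for example, refusing to deploy with 'off'). The sketch below is a local helper only: the level names come from the configuration example, while `ContentFilterLevel`, `LEVEL_ORDER`, and `isAtLeastAsStrict` are hypothetical names that are not part of the Theazo SDK.

```typescript
// Hypothetical local type; the four string values come from the configure() example.
type ContentFilterLevel = 'off' | 'light' | 'moderate' | 'strict'

// Levels ordered from least to most restrictive.
const LEVEL_ORDER: readonly ContentFilterLevel[] = ['off', 'light', 'moderate', 'strict']

// Returns true when `candidate` filters at least as much as `baseline`.
function isAtLeastAsStrict(candidate: ContentFilterLevel, baseline: ContentFilterLevel): boolean {
  return LEVEL_ORDER.indexOf(candidate) >= LEVEL_ORDER.indexOf(baseline)
}
```

A check such as `isAtLeastAsStrict(level, 'moderate')` could run in a deployment script before calling configure.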

PII detection

When blockPII is enabled, Theazo automatically redacts sensitive data from agent outputs before they reach your users:

Social Security Numbers (SSN)
Credit card numbers
Phone numbers
Email addresses (optional, configurable)
IP addresses
API keys and tokens

Redacted values are replaced with [REDACTED]. The original values are never stored in logs or returned to the client.
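To make the redaction behavior concrete, here is a minimal sketch of pattern-based replacement with the [REDACTED] placeholder. It is an illustration only and not Theazo's detector, whose implementation is not described here: the two regular expressions, `PII_PATTERNS`, and `redactPII` are assumptions, and they cover only SSNs and 16-digit card numbers in common formats.

```typescript
// Illustrative patterns only; a real detector handles many more formats.
const PII_PATTERNS: readonly RegExp[] = [
  /\b\d{3}-\d{2}-\d{4}\b/g,        // US SSN written as 123-45-6789
  /\b(?:\d{4}[ -]?){3}\d{4}\b/g,   // 16-digit card number with optional space or hyphen separators
]

// Replaces every match of every pattern with the [REDACTED] placeholder.
function redactPII(text: string): string {
  return PII_PATTERNS.reduce(
    (current, pattern) => current.replace(pattern, '[REDACTED]'),
    text,
  )
}

console.log(redactPII('SSN 123-45-6789 on file'))  // SSN [REDACTED] on file
```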

Prompt injection protection

When promptInjection is enabled, Theazo analyzes incoming user inputs for injection patterns before passing them to the model. Detected injections are blocked and logged as violations.

Prompt injection detection adds a small amount of latency (10-50 ms) to each model call. For latency-sensitive applications, consider disabling it and using light content filtering instead.
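The flow described in this section (check the input, then either block and log it or pass it on) can be sketched as a small gate function. This is an assumption-laden illustration rather than the platform's code: `gateUserInput`, `GateResult`, and `InjectionViolation` are hypothetical names, and the `detect` callback stands in for the real detector, whose logic is not documented here.

```typescript
// Hypothetical shapes for illustration; not the SDK's types.
interface InjectionViolation {
  type: 'prompt_injection'
  message: string
}

type GateResult =
  | { allowed: true; input: string }
  | { allowed: false; violation: InjectionViolation }

// Runs the detector before the input would be sent to the model.
function gateUserInput(
  input: string,
  detect: (text: string) => boolean,   // stand-in for the real detector
  violationLog: InjectionViolation[],
): GateResult {
  if (detect(input)) {
    const violation: InjectionViolation = {
      type: 'prompt_injection',
      message: 'Injection attempt blocked in user input',
    }
    violationLog.push(violation)        // blocked inputs are recorded as violations
    return { allowed: false, violation }
  }
  return { allowed: true, input }       // clean inputs continue to the model
}
```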

Domain allowlists

Control which domains agents can access with HTTP requests and web search:

// Only allow specific domains
await theazo.guardrails.configure({
  allowedDomains: ['*.company.com', '*.google.com', 'api.stripe.com'],
})

// Block specific domains (allow everything else)
await theazo.guardrails.configure({
  allowedDomains: null,  // allow all
  blockedDomains: ['*.competitor.com', '*.darkweb.com'],
})
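One plausible reading of how these lists are evaluated is sketched below. The exact matching rules are not specified in this page, so the sketch makes two loudly labeled assumptions: a `*.` pattern matches subdomains but not the bare domain, and a match in the block list wins over the allowlist. `matchesPattern` and `isDomainAllowed` are hypothetical helpers, not SDK functions.

```typescript
// Assumption: '*.example.com' matches 'api.example.com' but not 'example.com'.
function matchesPattern(host: string, pattern: string): boolean {
  const h = host.toLowerCase()
  const p = pattern.toLowerCase()
  if (p.slice(0, 2) === '*.') {
    const suffix = p.slice(1)  // e.g. '.example.com'
    return h.length > suffix.length && h.slice(-suffix.length) === suffix
  }
  return h === p               // patterns without a wildcard must match exactly
}

// Assumption: blockedDomains takes precedence; allowedDomains === null allows everything else.
function isDomainAllowed(
  host: string,
  allowedDomains: readonly string[] | null,
  blockedDomains: readonly string[] = [],
): boolean {
  if (blockedDomains.some((p) => matchesPattern(host, p))) return false
  if (allowedDomains === null) return true
  return allowedDomains.some((p) => matchesPattern(host, p))
}
```

If the bare domain must also be reachable under the subdomain-only assumption, list it explicitly next to the wildcard entry.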

Per-agent overrides

Override platform defaults on individual agents. Per-agent settings merge with platform defaults — anything not specified falls back to the platform setting.

agent-guardrails.ts

import { Theazo } from 'theazo'

const theazo = new Theazo({ apiKey: 'th_live_...' })

const session = await theazo.sessions.forUser('user_123')

const agent = await session.agents.create({
  compute: 'python',
  guardrails: {
    contentFilter: 'strict',            // Override: stricter than platform default
    blockPII: true,
    allowedDomains: ['*.company.com'],  // Override: more restrictive
    maxOutputTokens: 4096,              // Override: smaller limit
  },
})
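The fallback behavior can be pictured as a key-by-key merge of the agent's settings over the platform defaults. The sketch below assumes a shallow merge in which arrays are replaced rather than combined; the page does not confirm that detail. The `GuardrailConfig` fields are inferred from the configure example, and `resolveGuardrails` is a hypothetical helper.

```typescript
// Field list inferred from the configure() example; the real type may differ.
interface GuardrailConfig {
  contentFilter: 'off' | 'light' | 'moderate' | 'strict'
  blockPII: boolean
  promptInjection: boolean
  allowedDomains: string[] | null
  blockedDomains: string[]
  maxOutputTokens: number
  maxToolCalls: number
}

// Assumption: shallow merge, agent values win, unspecified keys keep the platform value.
function resolveGuardrails(
  platform: GuardrailConfig,
  agent: Partial<GuardrailConfig>,
): GuardrailConfig {
  return { ...platform, ...agent }
}
```

With the platform defaults and agent overrides shown on this page, the resolved config would use 'strict' filtering and a 4096-token limit while keeping promptInjection: true and maxToolCalls: 20 from the platform.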

Viewing violations

Query guardrail violations to understand what content was blocked and why.

const violations = await theazo.guardrails.violations({
  period: 'last_7_days',
})

// violations = [
//   { id: 'gv_001', agentId: 'agt_abc', type: 'pii_detected',
//     message: 'SSN pattern detected in output', timestamp: '...' },
//   { id: 'gv_002', agentId: 'agt_xyz', type: 'prompt_injection',
//     message: 'Injection attempt blocked in user input', timestamp: '...' },
// ]
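Once you have the list, a common next step is to summarize it, for example by counting violations per type. The sketch below assumes the record shape shown in the commented output above; the `Violation` interface and `countByType` are local, hypothetical definitions, and the sample records are made up for illustration.

```typescript
// Shape inferred from the commented example output; the SDK's type may have more fields.
interface Violation {
  id: string
  agentId: string
  type: string
  message: string
  timestamp: string
}

// Counts how many violations were recorded for each type.
function countByType(violations: readonly Violation[]): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const v of violations) {
    counts[v.type] = (counts[v.type] ?? 0) + 1
  }
  return counts
}

// Made-up sample records, shaped like the output above.
const sampleViolations: Violation[] = [
  { id: 'gv_001', agentId: 'agt_abc', type: 'pii_detected', message: 'SSN pattern detected in output', timestamp: '2024-01-01T00:00:00Z' },
  { id: 'gv_002', agentId: 'agt_xyz', type: 'prompt_injection', message: 'Injection attempt blocked in user input', timestamp: '2024-01-01T00:05:00Z' },
  { id: 'gv_003', agentId: 'agt_abc', type: 'pii_detected', message: 'SSN pattern detected in output', timestamp: '2024-01-01T00:10:00Z' },
]

console.log(countByType(sampleViolations))  // { pii_detected: 2, prompt_injection: 1 }
```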

Get current configuration

const config = await theazo.guardrails.get()

console.log(config.contentFilter)   // 'moderate'
console.log(config.blockPII)        // true
console.log(config.maxToolCalls)    // 20

API reference

theazo.guardrails.configure(opts): Promise<void>
Set platform-wide guardrail defaults. Merges with existing config.

theazo.guardrails.get(): Promise<GuardrailConfig>
Get the current platform guardrail configuration.

theazo.guardrails.violations(filters?): Promise<Violation[]>
List guardrail violations. Filter by time period.

REST endpoints

POST /v1/guardrails
Configure platform guardrail defaults

GET /v1/guardrails
Get current guardrail configuration

GET /v1/guardrails/violations
List violations (?period=)