Features

This page lists the capabilities available once the default stack is running. Unless an item is marked alpha, optional, or opt-in, it ships in the default install. Most items link to the relevant tutorial or reference page.

Deployment and data locality

The default stack keeps storage, retrieval, audit, and tool execution local. Outbound traffic appears only when operators configure external model providers or integrations.

Supported deployment targets. Docker Compose for single-host installs and development, plus an alpha Helm chart for Kubernetes data-plane pilots.
No required telemetry. The default stack does not depend on any SaaS telemetry service.
Tenant-partitioned data plane. The appliance serves one pinned tenant. Every store still carries tenant_id as the data-partition key across Qdrant collections, MinIO buckets, Postgres rows, and Redis sessions; in this deployment the key simply has a single value. See Tenant isolation.
Self-hosted and auditable. The full source runs on your own infrastructure and can be inspected and audited end to end.
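The idea of carrying a single pinned tenant_id as the partition key can be sketched with a small helper that derives every storage key from that one value. The TenantScope class, the key formats, and the choice of key prefixes plus a Qdrant payload filter are all assumptions for illustration. This page does not say whether each store uses per-tenant collections, buckets, or prefixes; see Tenant isolation for the real layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantScope:
    """Hypothetical helper: every storage key is derived from one pinned tenant_id."""

    tenant_id: str

    def qdrant_filter(self) -> dict:
        # Payload filter in Qdrant's JSON filter shape, restricting a search to this tenant.
        return {"must": [{"key": "tenant_id", "match": {"value": self.tenant_id}}]}

    def minio_object_key(self, name: str) -> str:
        # Prefixing object keys keeps each tenant's files under its own path (assumed layout).
        return f"{self.tenant_id}/{name}"

    def redis_session_key(self, session_id: str) -> str:
        return f"session:{self.tenant_id}:{session_id}"

    def postgres_where(self) -> tuple[str, tuple[str]]:
        # Parameterized clause; tenant_id is never interpolated into the SQL text.
        return "tenant_id = %s", (self.tenant_id,)
```

With one pinned tenant, `TenantScope("acme")` would be constructed once at startup and passed to every data-access path. Keeping the key in place means code paths do not change if partitioning ever matters.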

Chat and agents

Each chat session uses one adaptive main agent. When a task needs isolation, the main agent delegates to scoped subagents that Codex spawns natively.

One main agent, many tools. Search, scraping, knowledge, memory, workflow invocation, MCP-backed tools, and user-approved actions all live behind one runtime, alongside Codex-native subagents. Walk through it in the Chat tutorial.
Codex-native subagents for focused work. Each subagent type is a Markdown file with YAML frontmatter under deploy/config/subagents/, and adding a file defines a new type. The runtime lists those definitions to the main agent in an "Available subagents" instruction block. Codex then spawns the subagent itself through its multi-agent feature, which is enabled only for the agentic capability profile. See the Multi-Agent tutorial.
Skills on demand. SKILL.md instruction packs expose only their name and description in the initial prompt. The model loads the full body by calling load_skill. See Skills reference.
MCP-backed tools. Hosted MCP servers under mcp/ plug into the agent with the same auth, audit, and tenant scoping as built-in tools.
SSE streaming. Chat responses stream token by token over server-sent events, with tool calls and routing metadata visible in the message timeline.
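The subagent definitions described above can be pictured with a small sketch. It reads each Markdown file's frontmatter and renders an "Available subagents" block. The frontmatter field names (name, description), the block format, and the stdlib-only parser are assumptions. The parser handles flat `key: value` lines, not full YAML.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the Markdown body.

    Handles flat `key: value` lines only; a real loader would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat everything as body


def render_available_subagents(definitions: dict[str, str]) -> str:
    """Build the instruction block from {file stem: file text}, sorted by stem."""
    entries = []
    for stem, text in sorted(definitions.items()):
        meta, _body = parse_frontmatter(text)
        entries.append(f"- {meta.get('name', stem)}: {meta.get('description', '')}")
    return "Available subagents:\n" + "\n".join(entries)


def load_definitions(directory: Path) -> dict[str, str]:
    """Read every *.md file in a directory such as Path("deploy/config/subagents")."""
    return {p.stem: p.read_text(encoding="utf-8") for p in directory.glob("*.md")}
```

In this sketch only the short name/description list goes into the main agent's instructions. Codex itself handles spawning, so nothing here launches a subagent.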
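The on-demand skill loading described above is a form of progressive disclosure. It can be sketched as a registry that puts only each skill's name and description into the initial prompt and returns the full body when the model calls load_skill. load_skill is the tool name used on this page. The Skill and SkillRegistry classes, the index format, and the unknown-skill message are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full SKILL.md instructions, withheld from the initial prompt


@dataclass
class SkillRegistry:
    skills: dict[str, Skill] = field(default_factory=dict)

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def prompt_index(self) -> str:
        # Only name and description reach the initial prompt.
        return "\n".join(f"- {s.name}: {s.description}" for s in self.skills.values())

    def load_skill(self, name: str) -> str:
        # Handler for the load_skill tool call: returns the full instruction body.
        skill = self.skills.get(name)
        if skill is None:
            return f"Unknown skill: {name}"
        return skill.body
```

The design choice is a token trade-off. Many skills can be advertised cheaply, and only the ones the model actually selects cost their full length in context.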
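A client consuming the SSE chat stream has to split the event stream into events. The parser below follows the event-stream framing from the HTML server-sent events specification:

- field lines are `name: value`;
- a blank line dispatches the buffered event;
- lines starting with `:` are comments.

The event names and JSON payloads in the test data are hypothetical. The stream's actual event types are covered in the Chat tutorial.

```python
from collections.abc import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from SSE lines whose line endings are already stripped."""
    event, data = "message", []
    for line in lines:
        if line == "":
            # A blank line dispatches the buffered event if it carried any data lines.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space from the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # "id" and "retry" fields are ignored in this sketch.
```

Per the spec, an event without a trailing blank line is discarded. The parser only emits complete events, so a client can render each token as soon as its event arrives.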

Memory and knowledge you control

Context that survives the session, scoped to the pinned tenant.

Typed memories. Memories are scoped by tenant, user, agent, and session, and backed by Mem0 and Qdrant. There are four types: user, feedback, project, and reference. The runtime prefetches relevant memories before the model runs. See the Memory tutorial.
Contextual document RAG. Anthropic-style contextual chunking, hybrid search, and citation-ready retrieval. See the Upload Documents tutorial and Knowledge reference.
Markdown wiki. A versioned knowledge base with audience and visibility checks — not a generic vector dump.
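The typed, scoped memories described above can be modeled as records that carry one of the four types plus the four scope keys. The record shape is an assumption for illustration. So is the matching rule: a memory with no agent or session set applies to all of them. This is not Mem0's data model or this runtime's prefetch code, and relevance ranking (the Qdrant vector search) is omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    USER = "user"
    FEEDBACK = "feedback"
    PROJECT = "project"
    REFERENCE = "reference"


@dataclass(frozen=True)
class Memory:
    text: str
    kind: MemoryType
    tenant_id: str
    user_id: str
    agent_id: Optional[str] = None    # None: applies to every agent (assumption)
    session_id: Optional[str] = None  # None: persists across sessions (assumption)


def in_scope(m: Memory, tenant_id: str, user_id: str, agent_id: str, session_id: str) -> bool:
    if (m.tenant_id, m.user_id) != (tenant_id, user_id):
        return False
    return m.agent_id in (None, agent_id) and m.session_id in (None, session_id)


def prefetch(memories: list[Memory], **scope: str) -> list[Memory]:
    # Hypothetical prefetch step: keep only in-scope memories before the model runs.
    return [m for m in memories if in_scope(m, **scope)]
```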
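This page does not say how hybrid search merges its dense (vector) and sparse (keyword) result lists. One common technique is reciprocal rank fusion (RRF), sketched below over ranked lists of chunk IDs. Treat it as an illustration of hybrid fusion, not as this stack's confirmed method. The constant k=60 is the value usually quoted for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; each list is ordered best-first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items ranked well in both lists win.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF uses only ranks, so vector similarities and keyword scores never need to be put on a common scale.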

Audit receipts and guardrail enforcement

Audit records are designed for external review and internal incident response.

Hash-chained, HMAC-signed audit log. Tenant-scoped append-only rows, signed turn envelopes, receipt proofs, and offline verification. See Audit Trail.
Layered guardrails. LLM-Guard scanners (prompt-injection, toxicity, PII, secrets) plus optional Constitutional AI, with tenant policy layered on top of the global policy. Try them in the Guardrails tutorial.
OIDC identity. Keycloak with PKCE, mobile-push 2FA, and federation with your existing IdP. The gateway validates JWTs and propagates signed identity to downstream services. See Authentication.
Scoped code sandboxes. Docker-isolated Python 3.12 with numpy, pandas, and matplotlib preloaded, plus optional GPU passthrough. Workspaces are keyed by tenant, user, and session. They are reused until idle cleanup removes them or an admin destroys them. Walk through it in the Run Code tutorial.
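A hash-chained, HMAC-signed log can be sketched with the standard library. Each row stores the previous row's hash, and each hash is an HMAC-SHA256 over that link plus the payload. Editing or reordering any row, or removing any row other than the last, therefore breaks verification from that point on. Truncating the tail needs an external anchor such as a receipt. Several details are assumptions:

- the row layout;
- the genesis value;
- the JSON canonicalization;
- the use of a single HMAC key.

See Audit Trail for the actual envelope format and verification tooling.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical sentinel used as the first row's prev_hash


def _digest(key: bytes, prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always serializes to the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev_hash.encode() + body, hashlib.sha256).hexdigest()


def append(log: list[dict], key: bytes, payload: dict) -> dict:
    prev_hash = log[-1]["hash"] if log else GENESIS
    row = {"prev_hash": prev_hash, "payload": payload, "hash": _digest(key, prev_hash, payload)}
    log.append(row)  # append-only: existing rows are never modified
    return row


def verify(log: list[dict], key: bytes) -> bool:
    """Offline check: recompute every link and signature starting from GENESIS."""
    prev_hash = GENESIS
    for row in log:
        if row["prev_hash"] != prev_hash:
            return False
        expected = _digest(key, prev_hash, row["payload"])
        if not hmac.compare_digest(expected, row["hash"]):
            return False
        prev_hash = row["hash"]
    return True
```

`hmac.compare_digest` avoids leaking timing information during the comparison. Verification needs only the exported rows and the key, which is what makes an offline check possible.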
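Workspace reuse keyed by tenant, user, and session, with idle cleanup and admin destroy, can be sketched as a small pool. The WorkspacePool class, its 15-minute default idle timeout, and the workspace ID format are hypothetical. Actual container creation and teardown are omitted. An injectable clock keeps the timing logic testable.

```python
import time
from typing import Callable

Key = tuple[str, str, str]  # (tenant_id, user_id, session_id)


class WorkspacePool:
    """Hypothetical bookkeeping for sandbox workspaces; container lifecycle omitted."""

    def __init__(self, idle_seconds: float = 900.0, clock: Callable[[], float] = time.monotonic):
        self.idle_seconds = idle_seconds
        self.clock = clock
        self._last_used: dict[Key, float] = {}

    def acquire(self, tenant_id: str, user_id: str, session_id: str) -> tuple[str, bool]:
        """Return (workspace_id, created); an existing workspace for the key is reused."""
        key = (tenant_id, user_id, session_id)
        created = key not in self._last_used
        self._last_used[key] = self.clock()
        return "ws-" + "-".join(key), created

    def cleanup_idle(self) -> list[Key]:
        """Drop workspaces idle for at least idle_seconds and return their keys."""
        now = self.clock()
        expired = [k for k, t in self._last_used.items() if now - t >= self.idle_seconds]
        for k in expired:
            del self._last_used[k]  # a real pool would also stop the container here
        return expired

    def destroy(self, tenant_id: str, user_id: str, session_id: str) -> bool:
        """Admin destroy: remove the workspace regardless of idle time."""
        return self._last_used.pop((tenant_id, user_id, session_id), None) is not None
```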

Inference routing and observability

Route requests between local and external providers without changing client API calls.

OpenAI-compatible inference router. Callers send a role alias: default, title, classifier, memory, profile_curation, vision, guardrail, or knowledge. The router resolves each alias to a concrete model through its admin-configurable model_roles and routes the request to local vLLM or to an external provider such as OpenRouter. On first boot, the router seeds every role to a reachable backend. It picks the local vLLM model when vLLM is running, and otherwise an external provider that has an API key configured. Admins can reassign roles at runtime. Configure roles and providers in Model Configuration.
Multi-vLLM router. make up GPU=multi runs multiple vLLM backends (Gemma, Qwen) behind OpenResty for per-model isolation on a single host.
First-party observability. Trace inspection, latency, token usage, prices, budgets, and cost reporting, all on your own infrastructure. See Observability.
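The first-boot seeding rule for the inference router can be sketched as a pure function. Every role alias goes to the local vLLM model if vLLM is reachable, otherwise to the first external provider that has an API key. The role names come from this page. The provider names, model IDs, the "backend/model" value format, and the error raised when nothing is reachable are hypothetical. The real router stores model_roles through its admin configuration.

```python
from dataclasses import dataclass
from typing import Optional

ROLES = ("default", "title", "classifier", "memory",
         "profile_curation", "vision", "guardrail", "knowledge")


@dataclass(frozen=True)
class Provider:
    name: str
    model: str
    api_key: Optional[str] = None


def seed_model_roles(local_vllm_model: Optional[str], external: list[Provider]) -> dict[str, str]:
    """Return a role -> 'backend/model' mapping for first boot (hypothetical format)."""
    if local_vllm_model is not None:  # local vLLM is running and reachable
        target = f"vllm/{local_vllm_model}"
    else:
        keyed = [p for p in external if p.api_key]
        if not keyed:
            raise RuntimeError("no reachable backend: start vLLM or configure a provider key")
        target = f"{keyed[0].name}/{keyed[0].model}"
    return {role: target for role in ROLES}
```

A client would then send a standard OpenAI-style chat request with a role alias such as `"title"` where it would normally name a model. Reassigning a role changes the backend without touching client code.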
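Cost reporting and budgets reduce to token counts multiplied by per-token prices, summed and compared against a limit. The sketch below uses per-million-token pricing with made-up model names and prices. It uses Decimal to avoid float rounding in money arithmetic. It is not the observability service's schema.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {
    "example-large": (Decimal("3.00"), Decimal("15.00")),
    "example-small": (Decimal("0.10"), Decimal("0.40")),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / Decimal(1_000_000)


def over_budget(calls: list[tuple[str, int, int]], budget: Decimal) -> tuple[Decimal, bool]:
    """Sum (model, input_tokens, output_tokens) calls and flag spend above the budget."""
    total = sum((call_cost(*c) for c in calls), Decimal(0))
    return total, total > budget
```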

Operations

Operations are exposed through repeatable make / aibox-ctl commands and admin pages for activity, agents, inference, integrations, governance, access, and platform state.

Three commands to a running stack. make ensure-secrets seeds deploy/secrets/ with fresh signing keys, make up starts the stack, and make health confirms that every service is ready.
Configurable laptop stack. make dev-select (or make up-lite) opens an interactive picker, driven by config/dev-bundles.toml, that sets each service to run locally as real, stub, or off. make up-lite auto-bootstraps the dev-lite default bundle (~5 GB RAM, 7 stubs).
Profile hot-swap. make use-env PROFILE=<name> switches the active deploy environment without rebuilds. Deploy environments (dev, demo, eg-prod) live under deploy/envs/.
Local GPU, opt-in. When you have a GPU and want inference to stay inside your network, make up GPU=single enables local vLLM inference. GPU=multi runs the Gemma + Qwen router instead, and GPU=vision (dev-only) serves a local vision model.
Admin console. A multi-tab UI at http://localhost/admin for activity, agents, inference, integrations, governance, access, and platform.
make health everywhere. CI-friendly readiness checks across the whole stack.
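In CI, the three-command bring-up can be wrapped in a short script that stops at the first failing step. The make targets come from this page. The script, its bring_up function, and the injectable runner parameter are hypothetical; the runner lets the sequencing be tested without Docker. The sketch also assumes each make target exits non-zero on failure.

```python
import subprocess
import sys
from collections.abc import Callable, Sequence

STEPS: list[list[str]] = [
    ["make", "ensure-secrets"],  # seed deploy/secrets/ with fresh signing keys
    ["make", "up"],              # start the stack
    ["make", "health"],          # confirm every service is ready
]


def _run_subprocess(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd)).returncode


def bring_up(run: Callable[[Sequence[str]], int] = _run_subprocess) -> int:
    """Run each step in order; return the first non-zero exit code, or 0 if all pass."""
    for cmd in STEPS:
        code = run(cmd)
        if code != 0:
            print(f"step failed ({code}): {' '.join(cmd)}", file=sys.stderr)
            return code
    return 0
```

A CI job could call `sys.exit(bring_up())` so the pipeline fails with the same exit code as the first failing step.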

Next pages

Quickstart — start the local stack and send the first chat.
Architecture — what runs where, and why.
Audit Trail — hash-chain checks, receipt proofs, and limits.
Tenant isolation — how the single pinned tenant is carried end-to-end.

Verified against commit 5187b91e (2026-06-11) · sources 184dd77acc5e.
