Features
This page lists the surfaces available after the default stack starts. Each item below ships in the default install, with links to the relevant tutorial or reference page.
Deployment and data locality
The default stack keeps storage, retrieval, audit, and tool execution local. Outbound traffic appears only when operators configure external model providers or integrations.
- Supported deployment targets. Docker Compose for single-host installs and development, plus an alpha Helm chart for Kubernetes data-plane pilots.
- No required telemetry. The default stack does not require SaaS telemetry. External model providers and outbound integrations are used only when an operator configures them.
- Tenant-partitioned data plane. The appliance serves one pinned tenant. tenant_id is carried as the data-partition key across Qdrant collections, MinIO buckets, Postgres rows, and Redis sessions; it just has one value. See Tenant isolation.
- Self-hosted and auditable. The full source runs on your own infrastructure and can be inspected and audited end to end.
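To make the partition-key idea concrete, here is a minimal sketch of how one tenant_id value could be threaded through each store. The class name, the naming scheme for collections, buckets, and keys, and the tenant value are all hypothetical; the actual layout is described in Tenant isolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantScope:
    """Derives per-store names from one tenant_id (naming scheme is hypothetical)."""

    tenant_id: str

    def qdrant_collection(self, kind: str) -> str:
        # One collection per tenant and content kind
        return f"{self.tenant_id}__{kind}"

    def minio_bucket(self, kind: str) -> str:
        # S3-compatible bucket names are lowercase
        return f"{self.tenant_id}-{kind}".lower()

    def redis_key(self, session_id: str) -> str:
        # Session keys are prefixed with the tenant
        return f"tenant:{self.tenant_id}:session:{session_id}"

    def postgres_filter(self) -> tuple:
        # Parameterized predicate; the value is never interpolated into SQL
        return "tenant_id = %s", (self.tenant_id,)


# Hypothetical pinned tenant: the key exists everywhere but has one value.
PINNED = TenantScope("acme")
```

Because every store derives its name or filter from the same object, a single pinned tenant and a future multi-tenant layout would share the same code path.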
Chat and agents
Each chat session uses one adaptive main agent. When a task needs isolation, the main agent delegates to scoped subagents that Codex spawns natively.
- One main agent, many tools. Search, scraping, knowledge, memory, workflow invocation, MCP-backed tools, and user-approved actions all live behind one runtime, alongside Codex-native subagents. Walk through it in the Chat tutorial.
- Codex-native subagents for focused work. Subagent types are defined as Markdown files with YAML frontmatter under deploy/config/subagents/; add a file to define a new type. The runtime surfaces those definitions to the main agent as an "Available subagents" instruction block, and Codex spawns the subagent itself via its multi-agent feature (gated for the agentic capability profile). See the Multi-Agent tutorial.
- SKILL.md instruction packs expose only name and description in the initial prompt; the full body is loaded when the model calls load_skill. See Skills reference.
- MCP-backed tools. Hosted MCP servers under mcp/ plug into the agent with the same auth, audit, and tenant scoping as built-in tools.
- SSE streaming for chat: token-by-token, with tool calls and routing metadata visible in the message timeline.
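The two-stage loading of SKILL.md packs can be sketched as follows: only name and description reach the initial prompt, and the body is returned on demand. This is an illustration, not the runtime's loader. The Skill class, parse_skill, prompt_summary, and the example pack are hypothetical, the load_skill signature is assumed, and the frontmatter parser handles only flat key: value pairs where a real loader would use a YAML library.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split '---' delimited frontmatter from the Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    end = lines.index("---", 1)  # closing delimiter; raises ValueError if absent
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(meta["name"], meta["description"], body)


def prompt_summary(skills: list) -> str:
    # Stage 1: only name and description go into the initial prompt
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


def load_skill(skills: list, name: str) -> str:
    # Stage 2: the full body is returned only when requested by name
    for skill in skills:
        if skill.name == name:
            return skill.body
    raise KeyError(name)


# Hypothetical example pack
EXAMPLE = """---
name: csv-report
description: Summarize a CSV file
---
Step 1: load the file.
Step 2: describe each column.
"""
```

Keeping the body out of the initial prompt means the prompt grows with the number of skills by one line each, rather than by the full instruction text.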
Memory and knowledge you control
Context that survives the session, scoped to the pinned tenant.
- Typed memories scoped by tenant, user, agent, and session, backed by Mem0 + Qdrant. Four types: user, feedback, project, reference. The runtime prefetches relevant notes before the model runs. See the Memory tutorial.
- Contextual document RAG. Anthropic-style contextual chunking, hybrid search, citation-ready retrieval. See the Upload Documents tutorial and Knowledge reference.
- Markdown wiki. A versioned knowledge base with audience and visibility checks — not a generic vector dump.
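The typed, scoped memories above can be pictured with a small in-memory model. This sketch does not use the Mem0 or Qdrant APIs. The Memory class, the visible and prefetch helpers, and the matching rule (a memory applies when its tenant matches and every scope field it sets equals the request's) are assumptions made for illustration, and relevance ranking is left out.

```python
from dataclasses import dataclass
from typing import Optional

# The four memory types named in the list above
MEMORY_TYPES = {"user", "feedback", "project", "reference"}


@dataclass(frozen=True)
class Memory:
    type: str
    text: str
    tenant_id: str
    user_id: Optional[str] = None
    agent_id: Optional[str] = None
    session_id: Optional[str] = None

    def __post_init__(self):
        if self.type not in MEMORY_TYPES:
            raise ValueError(f"unknown memory type: {self.type}")


def visible(memory: Memory, *, tenant_id: str, user_id: str,
            agent_id: str, session_id: str) -> bool:
    """Assumed rule: tenant must match; unset scope fields act as wildcards."""
    if memory.tenant_id != tenant_id:
        return False
    pairs = (
        (memory.user_id, user_id),
        (memory.agent_id, agent_id),
        (memory.session_id, session_id),
    )
    return all(m is None or m == r for m, r in pairs)


def prefetch(memories: list, **scope) -> list:
    # Select the notes in scope before the model runs (no ranking here)
    return [m for m in memories if visible(m, **scope)]
```

For example, a project memory stored with only tenant_id and user_id set would be returned for any session of that user, while a memory that also sets session_id would be limited to that session.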
Audit receipts and guardrail enforcement
Audit records are designed for external review and internal incident response.
- Hash-chained, HMAC-signed audit log. Tenant-scoped append-only rows, signed turn envelopes, receipt proofs, and offline verification. See Audit Trail.
- Layered guardrails. LLM-Guard scanners (prompt-injection, toxicity, PII, secrets) plus optional Constitutional AI, with tenant policy layered on top of the global policy. Try them in the Guardrails tutorial.
- OIDC identity. Keycloak with PKCE, mobile-push 2FA, and federation into your IdP. JWT validation at the gateway, signed propagation downstream. See Authentication.
- Scoped code sandboxes. Docker-isolated Python 3.12 with numpy, pandas, and matplotlib preloaded; optional GPU passthrough. Workspaces are keyed by tenant, user, and session, then reused until idle cleanup or admin destroy. Walk through it in the Run Code tutorial.
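A hash-chained, HMAC-signed log can be illustrated with the standard library alone. The sketch below shows the general technique, not the appliance's actual row format: the field names, the genesis value, the JSON canonicalization, and the choice of SHA-256 are all assumptions. See Audit Trail for the real scheme.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict, key: bytes) -> dict:
    """Append a row whose hash covers the previous row's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    row_hash = _digest(prev_hash, payload)
    sig = hmac.new(key, row_hash.encode("utf-8"), hashlib.sha256).hexdigest()
    row = {"payload": payload, "prev_hash": prev_hash, "hash": row_hash, "sig": sig}
    log.append(row)
    return row


def verify(log: list, key: bytes) -> bool:
    """Offline check: recompute every hash and signature from the start."""
    prev_hash = GENESIS
    for row in log:
        expected = _digest(prev_hash, row["payload"])
        if row["prev_hash"] != prev_hash or row["hash"] != expected:
            return False
        sig = hmac.new(key, expected.encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(row["sig"], sig):
            return False
        prev_hash = row["hash"]
    return True
```

Because each hash includes the previous one, editing or removing any earlier row invalidates every row after it, and the HMAC prevents someone without the key from simply recomputing the chain.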
Inference routing and observability
Route requests between local and external providers without changing client API calls.
- OpenAI-compatible inference router. Callers send role aliases (default/title/classifier/memory/profile_curation/vision/guardrail/knowledge); the router resolves each role to a concrete model via its admin-configurable model_roles and routes to local vLLM or an external provider such as OpenRouter. On first boot the router seeds every role to a reachable backend (the local vLLM model when it's running, otherwise an external provider that has a key), and admins reassign roles at runtime. Configure roles and providers in Model Configuration.
- Multi-vLLM router. make up GPU=multi runs multiple vLLM backends (Gemma, Qwen) behind OpenResty for per-model isolation on a single host.
- First-party observability for trace inspection, latency, token usage, prices, budgets, and cost reporting, all on your own infrastructure. See Observability.
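The role-alias behavior described above (seed every role on first boot, then let admins reassign) can be sketched in a few lines. The role names come from the list; the function names, the shape of the model_roles mapping, and the model and provider identifiers are hypothetical and stand in for whatever Model Configuration actually stores.

```python
from typing import Optional

ROLES = (
    "default", "title", "classifier", "memory",
    "profile_curation", "vision", "guardrail", "knowledge",
)


def seed_model_roles(local_model: Optional[str], external: dict) -> dict:
    """Point every role at one reachable backend.

    local_model: name of the running local vLLM model, or None if it is down.
    external: provider name -> {"api_key": str or None, "model": str} (assumed shape).
    """
    if local_model is not None:
        target = {"backend": "vllm", "model": local_model}
    else:
        keyed = [(name, cfg) for name, cfg in external.items() if cfg.get("api_key")]
        if not keyed:
            raise RuntimeError("no reachable backend to seed roles with")
        name, cfg = keyed[0]
        target = {"backend": name, "model": cfg["model"]}
    # Each role gets its own copy so later reassignment stays per-role
    return {role: dict(target) for role in ROLES}


def resolve(model_roles: dict, role: str) -> dict:
    # The caller sends a role alias; the router looks up the concrete model
    return model_roles[role]
```

An admin reassignment at runtime then amounts to replacing one entry, for example `model_roles["vision"] = {"backend": "vllm", "model": "local-vision"}`, while callers keep sending the same alias.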
Operations
Operations are exposed through repeatable make / aibox-ctl commands and admin pages for activity, agents, inference, integrations, governance, access, and platform state.
- Three commands to running. make ensure-secrets seeds deploy/secrets/ with fresh signing keys, make up starts the stack, make health confirms every service is ready.
- Configurable laptop stack. make dev-select (or make up-lite) picks which services run locally as real, stub, or off via an interactive picker, driven by config/dev-bundles.toml. make up-lite auto-bootstraps the dev-lite default bundle (~5 GB RAM, 7 stubs).
- Profile hot-swap. make use-env PROFILE=<name> switches the active deploy environment without rebuilds. Deploy environments (dev, demo, eg-prod) live under deploy/envs/.
- Local GPU, opt-in. make up GPU=single (or GPU=multi for the Gemma + Qwen router, or GPU=vision for a local vision model, dev-only) enables local vLLM inference when you have a GPU and want inference to stay inside your network.
- Admin console. A multi-tab UI at http://localhost/admin for activity, agents, inference, integrations, governance, access, and platform.
- make health everywhere. CI-friendly readiness across the whole stack.
Next pages
- Quickstart — start the local stack and send the first chat.
- Architecture — what runs where, and why.
- Audit Trail — hash-chain checks, receipt proofs, and limits.
- Tenant isolation — how the single pinned tenant is carried end-to-end.
Verified against commit 5187b91e (2026-06-11) · sources 184dd77acc5e.