Deployment overview
The repository currently supports two delivery paths: a Docker Compose
stack and an alpha Helm chart (data-plane only, not yet at parity).
Compose reads its topology from the active
profile under config/profiles/ and its secrets from files under
deploy/secrets/. Helm uses Kubernetes values and Secret objects instead;
it does not consume host files from deploy/secrets/.
When to use which
| Path | Use when | Trade-off |
|---|---|---|
| Docker Compose | Single host, on-prem appliance, dev laptop, CI demo | One machine, no horizontal scale; the only path that boots end-to-end today. |
| Helm chart (core slice) | You already run Kubernetes and want login + chat with an external LLM provider | Memory, knowledge, audit, code-sandbox, MCPs, observability, firecrawl, and vllm are not templated yet — chat works, web search/RAG/code execution/audit trails do not. See Helm chart. |
The Compose stack is the only surface validated by make pentest-stack,
make smoke-gateway, and tests/smoke/test_compose_dev_lite.py. The Helm chart
is at version 0.2.0 in helm/aibox/Chart.yaml: the core slice (identity + chat
with an external provider) deploys end-to-end, but it is not yet at Compose
parity.
Topology
The gateway uses Redis for distributed rate limiting only when REDIS_URL is
set; the shipped Compose profiles leave it unset, so the gateway falls back to
an in-process limiter (services/gateway/main.go).
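The selection rule above can be illustrated with a minimal Python sketch. It is not the gateway's Go code in services/gateway/main.go: the names limiter_backend and InProcessLimiter, the fixed-window strategy, and the treatment of an empty REDIS_URL as unset are all assumptions made for illustration.

```python
import os
import time
from typing import Mapping, Optional


class InProcessLimiter:
    """Hypothetical fixed-window limiter held in local memory (per process)."""

    def __init__(self, limit: int, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        # key -> (window start time, requests counted in that window)
        self._windows: dict = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


def limiter_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick "redis" only when REDIS_URL is set (assumed: and non-empty)."""
    return "redis" if env.get("REDIS_URL", "").strip() else "in-process"
```

An in-process limiter counts per gateway process, so limits are not shared across replicas; that is consistent with the single-host Compose path.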
In the prod posture (with or without a GPU axis) everything except
frontend:80 is closed off the host network and isolated onto an internal
data network — see Compose modes and
Single-port edge.
Outbound internet traffic can additionally be funneled through a single
allowlist-enforced egress gateway (opt-in via AIBOX_COMPOSE_EGRESS=1, or the
egress / egress_enforce inputs on the GitHub Actions deploy). It is
observe-only by default — see Egress gateway.
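As a rough illustration of the difference between observe-only and enforcing behaviour, the sketch below classifies an outbound host against an allowlist. The function name, the verdict strings, and the assumption that observe-only means "let the request through but log it" are hypothetical; the actual gateway behaviour is described in Egress gateway.

```python
from collections.abc import Collection


def egress_verdict(host: str, allowlist: Collection[str], enforce: bool = False) -> str:
    """Classify an outbound request (hypothetical verdict names).

    enforce=False models the observe-only default: hosts outside the
    allowlist are still let through, but flagged so they can be logged.
    """
    normalized = host.lower().rstrip(".")  # tolerate case and a trailing dot
    if normalized in allowlist:
        return "allow"
    return "deny" if enforce else "allow-logged"
```

Running in observe-only mode first makes it possible to build the allowlist from real traffic before turning enforcement on.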
Codex chat turns can optionally run inside a Firecracker microVM (opt-in via
AIBOX_COMPOSE_CODEX_FIRECRACKER=1, the codex_firecracker overlay key in
deploy/envs/config.<env>.toml, or aibox-ctl deploy --codex-firecracker). It
is off by default and safe to enable — the backend falls back to the in-process
Codex path when the host is unprovisioned or a guest boot fails.
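The fallback described above follows a common try-then-degrade pattern. The following Python sketch shows the shape of that logic only; MicroVMUnavailable, run_turn, and the two runner callables are hypothetical names and do not reflect the backend's real interfaces.

```python
from typing import Callable


class MicroVMUnavailable(Exception):
    """Hypothetical error: host not provisioned, or the guest failed to boot."""


def run_turn(
    prompt: str,
    firecracker_enabled: bool,
    run_in_microvm: Callable[[str], str],
    run_in_process: Callable[[str], str],
) -> str:
    """Prefer the microVM path when enabled; otherwise use the in-process path."""
    if firecracker_enabled:
        try:
            return run_in_microvm(prompt)
        except MicroVMUnavailable:
            pass  # degrade to the in-process path instead of failing the turn
    return run_in_process(prompt)
```

Because only the availability error is caught, other failures inside the microVM runner would still surface to the caller in this sketch.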
Supported compose modes
scripts/compose.sh is the source of truth for posture mode names. Posture is
orthogonal to the GPU axis: pick a posture, then select GPU support separately
via AIBOX_GPU (off|single|multi|vision). The supported posture modes are:
| Family | Modes |
|---|---|
| Development | dev, dev-lite |
| Production | prod |
| Bootstrap/internal | base, install |
GPU support applies to the dev and prod postures and can be set in any of
these ways: make up GPU=single|multi|vision,
make up-prod GPU=single|multi|vision, aibox-ctl deploy --gpu single|multi|vision,
or the gpu input on the GitHub Actions deploy. See Compose modes.
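To make the two independent axes concrete, here is a small Python sketch that validates a posture and GPU combination. scripts/compose.sh remains the source of truth; the function name, the default of "off", and the choice to reject a GPU value on postures other than dev and prod are assumptions for illustration.

```python
POSTURES = {"dev", "dev-lite", "prod", "base", "install"}
GPU_CHOICES = {"off", "single", "multi", "vision"}
GPU_POSTURES = {"dev", "prod"}  # postures the text says accept a GPU axis


def resolve_mode(posture: str, gpu: str = "off") -> tuple:
    """Validate the posture and GPU axes independently, then as a pair."""
    if posture not in POSTURES:
        raise ValueError(f"unknown posture: {posture!r}")
    if gpu not in GPU_CHOICES:
        raise ValueError(f"unknown AIBOX_GPU value: {gpu!r}")
    # Assumption: a non-off GPU value is rejected outside dev and prod.
    if gpu != "off" and posture not in GPU_POSTURES:
        raise ValueError(f"posture {posture!r} does not take a GPU axis")
    return posture, gpu
```

For example, resolve_mode("prod", "vision") passes, while resolve_mode("dev-lite", "single") raises ValueError under the stated assumption.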
The dev-lite posture is configurable per service: make dev-select (or
make up-lite) picks which services run as real (built locally), stub
(lightweight stand-in), or off, driven by config/dev-bundles.toml groups and
bundles. See Configurable local stack.
Profile matrix
Profiles live in config/profiles/*.toml and the active one is selected
by writing its slug into config/profiles/.active (or via
scripts/use-env.sh <profile>). make render-compose-env regenerates
deploy/.compose.env from the active profile + secrets manifest.
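The pointer-file lookup can be sketched in a few lines of Python. This is an illustration of the convention described above, not the code in scripts/use-env.sh or scripts/render-compose-env.py; the function name and the error handling are assumptions.

```python
from pathlib import Path


def active_profile(profiles_dir: Path) -> Path:
    """Resolve the .active pointer file to the profile TOML it names."""
    slug = (profiles_dir / ".active").read_text(encoding="utf-8").strip()
    if not slug:
        raise ValueError(".active is empty; no profile selected")
    profile = profiles_dir / f"{slug}.toml"
    if not profile.is_file():
        raise FileNotFoundError(f"active profile {slug!r} has no file {profile}")
    return profile
```

Called as active_profile(Path("config/profiles")), it would return the path of the TOML file whose slug is stored in .active.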
Exactly one profile ships under config/profiles/: single-tenant.
| Profile | Tenancy | Inference egress | Vision model |
|---|---|---|---|
| single-tenant | One pinned tenant (tenancy.single_tenant_id, default default) | Local-only out of the box: on first boot the inference-router seeds every role to the bundled vLLM model, and inference only leaves the box if an admin registers an external provider and allowlists it on the egress gateway | qwen3-vl (requires make up GPU=vision) |
Customer/site-specific inputs live in deploy/envs/config.<env>.toml for the
dev / demo / eg-prod environments and are consumed by deployment/CI
automation. scripts/render-compose-env.py does not merge those files into
deploy/.compose.env; it renders from the active profile plus secret files.
The legacy production target was renamed to eg-prod in commit cfd3286d.
Configuration surface
| Surface | File | Purpose |
|---|---|---|
| Compose env | deploy/.compose.env | Generated from active profile + secret listings by make render-compose-env. Do not edit by hand. |
| Active profile pointer | config/profiles/.active | Plain text file containing the active profile slug. |
| Profile TOML | config/profiles/single-tenant.toml | Topology + feature flags. |
| Deploy env input | deploy/envs/config.{dev,demo,eg-prod}.toml | Environment-specific deployment input; not merged by make render-compose-env. |
| Secrets manifest | deploy/secrets.manifest.toml | Declares the 52 secret files and how to generate them. |
| Secrets directory | deploy/secrets/ | One file per secret, mode 0644, dir 0700. |
| Compose dispatcher | scripts/compose.sh | Single entry point that picks the right -f overlay set per mode. |
| Stack record | deploy/aibox-stack.toml | Written by make stack-record; used by make down/make status to recover the active mode. |
Quick links
- Compose modes: every up-* target, every overlay file.
- Helm chart: what works today, what's missing.
- Single-port edge: how the prod posture collapses everything behind one TLS port.
- Secrets: make ensure-secrets, escrow, rotation, sealing.
- Multi-vLLM router: Gemma + Qwen behind OpenResty on one GPU.
Verify
make plan # dry-run the resolved compose mode (no containers touched)
make status # compare the running stack against its recorded mode
make health # readiness + liveness probes for every service
The recorded mode lives in deploy/aibox-stack.toml (printed by
make stack-show). If make status reports drift, re-run the matching
make up-* target — never docker compose up directly.
Verified against commit 5187b91e (2026-06-11) · sources 181b3bddde2d.