Deployment overview

The repository currently supports two delivery paths: a Docker Compose stack and an alpha Helm chart (data-plane only, not yet at parity). Compose pulls its topology from the active profile under config/profiles/ and secret files under deploy/secrets/. Helm uses Kubernetes values and Secret objects instead; it does not consume host files from deploy/secrets/.

When to use which

Path | Use when | Trade-off
Docker Compose | Single host, on-prem appliance, dev laptop, CI demo | One machine, no horizontal scale; the only path that boots end-to-end today.
Helm chart (core slice) | You already run Kubernetes and want login + chat with an external LLM provider | Memory, knowledge, audit, code-sandbox, MCPs, observability, firecrawl, and vllm are not templated yet: chat works, but web search, RAG, code execution, and audit trails do not. See Helm chart.

The Compose stack is the only surface validated by make pentest-stack, make smoke-gateway, and tests/smoke/test_compose_dev_lite.py. Helm is 0.2.0 in helm/aibox/Chart.yaml: the core slice (identity + chat with an external provider) deploys end-to-end, but it is not yet at Compose parity.

Topology

The gateway uses Redis for distributed rate limiting only when REDIS_URL is set; the shipped Compose profiles leave it unset, so the gateway falls back to an in-process limiter (services/gateway/main.go).
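
The selection rule above can be sketched as follows. This is an illustrative Python sketch, not the gateway's Go code in services/gateway/main.go: the names select_limiter_backend and InProcessLimiter are hypothetical, the fixed-window algorithm is an assumption, and treating an empty REDIS_URL as unset is also an assumption.

```python
import time


class InProcessLimiter:
    """Hypothetical fixed-window limiter held in local memory.

    State is per process, so limits are not shared across gateway replicas.
    """

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = {}  # key -> (window_start, request_count)

    def allow(self, key):
        now = self.clock()
        start, count = self.counts.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            self.counts[key] = (start, count)
            return False
        self.counts[key] = (start, count + 1)
        return True


def select_limiter_backend(env):
    """Return "redis" only when REDIS_URL is set and non-empty (assumption)."""
    return "redis" if env.get("REDIS_URL") else "in-process"
```

With the shipped Compose profiles, where REDIS_URL is unset, this sketch would pick the in-process branch, so each gateway process counts requests on its own.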

In the prod posture (with or without a GPU axis), every service except frontend:80 is closed off from the host network and isolated on an internal data network. See Compose modes and Single-port edge.

Outbound internet traffic can additionally be funneled through a single allowlist-enforced egress gateway (opt-in via AIBOX_COMPOSE_EGRESS=1, or the egress / egress_enforce inputs on the GitHub Actions deploy). It is observe-only by default — see Egress gateway.
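
The difference between observe-only and enforcing behaviour can be illustrated with a small decision function. This is a sketch under stated assumptions, not the egress gateway's implementation: the function name egress_decision is hypothetical, "observe-only" is assumed to mean that off-list traffic is forwarded but flagged, and exact case-insensitive host matching is assumed.

```python
def egress_decision(host, allowlist, enforce=False):
    """Return (forwarded, violation) for one outbound request.

    enforce=False (observe-only, assumed semantics): always forward,
    but flag hosts that are not on the allowlist.
    enforce=True: forward only hosts that are on the allowlist.
    """
    allowed_hosts = {h.lower() for h in allowlist}
    on_list = host.lower() in allowed_hosts
    violation = not on_list
    forwarded = on_list or not enforce
    return forwarded, violation
```

Under these assumptions, running in observe-only mode first makes it possible to collect the flagged hosts and build the allowlist before switching enforcement on.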

Codex chat turns can optionally run inside a Firecracker microVM (opt-in via AIBOX_COMPOSE_CODEX_FIRECRACKER=1, the codex_firecracker overlay key in deploy/envs/config.<env>.toml, or aibox-ctl deploy --codex-firecracker). It is off by default and safe to enable — the backend falls back to the in-process Codex path when the host is unprovisioned or a guest boot fails.

Supported compose modes

scripts/compose.sh is the source of truth for posture mode names. Posture is orthogonal to the GPU axis: pick a posture, then select GPU support separately via AIBOX_GPU (off|single|multi|vision). The supported posture modes are:

Family | Modes
Development | dev, dev-lite
Production | prod
Bootstrap/internal | base, install

On the dev and prod postures, set the GPU axis in any of these ways: make up GPU=single|multi|vision, make up-prod GPU=single|multi|vision, aibox-ctl deploy --gpu single|multi|vision, or the gpu input on the GitHub Actions deploy. Leaving AIBOX_GPU at off runs without GPU support. See Compose modes.
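
The posture and GPU axes described above can be checked independently, as in the sketch below. scripts/compose.sh remains the source of truth; this Python snippet only mirrors the values listed on this page. The function name validate_selection is hypothetical, and rejecting a non-off GPU value outside the dev and prod postures is an assumption drawn from the wording above.

```python
POSTURES = {"dev", "dev-lite", "prod", "base", "install"}
GPU_VALUES = {"off", "single", "multi", "vision"}
# Assumption: a GPU value other than "off" is only valid on these postures.
GPU_POSTURES = {"dev", "prod"}


def validate_selection(posture, gpu="off"):
    """Validate a (posture, AIBOX_GPU) pair and return it unchanged."""
    if posture not in POSTURES:
        raise ValueError(f"unknown posture: {posture!r}")
    if gpu not in GPU_VALUES:
        raise ValueError(f"unknown AIBOX_GPU value: {gpu!r}")
    if gpu != "off" and posture not in GPU_POSTURES:
        raise ValueError(f"GPU={gpu} is not selectable on posture {posture!r}")
    return posture, gpu
```

Keeping the two checks separate reflects that the axes are orthogonal: adding a GPU value does not require touching the posture list, and vice versa.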

The dev-lite posture is configurable per service: make dev-select (or make up-lite) picks which services run as real (built locally), stub (lightweight stand-in), or off, driven by config/dev-bundles.toml groups and bundles. See Configurable local stack.

Profile matrix

Profiles live in config/profiles/*.toml and the active one is selected by writing its slug into config/profiles/.active (or via scripts/use-env.sh <profile>). make render-compose-env regenerates deploy/.compose.env from the active profile + secrets manifest.

Exactly one profile ships under config/profiles/: single-tenant.
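
Resolving the active profile from the pointer file can be sketched as follows. This is a minimal illustration, not the logic of scripts/use-env.sh or scripts/render-compose-env.py: the function name active_profile_path is hypothetical, and the assumptions are that .active holds a single slug (surrounding whitespace ignored) and that the profile file is named <slug>.toml.

```python
from pathlib import Path


def active_profile_path(profiles_dir):
    """Map the slug in <profiles_dir>/.active to <profiles_dir>/<slug>.toml."""
    profiles_dir = Path(profiles_dir)
    slug = (profiles_dir / ".active").read_text(encoding="utf-8").strip()
    if not slug:
        raise ValueError(".active is empty")
    path = profiles_dir / f"{slug}.toml"
    if not path.is_file():
        raise FileNotFoundError(f"no profile file for slug {slug!r}: {path}")
    return path
```

For the shipped layout, calling active_profile_path("config/profiles") with .active containing single-tenant would return config/profiles/single-tenant.toml.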

Profile | Tenancy | Inference egress | Vision model
single-tenant | One pinned tenant (tenancy.single_tenant_id, which defaults to "default") | Local-only out of the box: on first boot the inference-router seeds every role to the bundled vLLM model, and inference only leaves the box if an admin registers an external provider and allowlists it on the egress gateway | qwen3-vl (requires make up GPU=vision)

Customer/site-specific inputs live in deploy/envs/config.<env>.toml for the dev / demo / eg-prod environments and are consumed by deployment/CI automation. scripts/render-compose-env.py does not merge those files into deploy/.compose.env; it renders from the active profile plus secret files. The legacy production target was renamed to eg-prod in commit cfd3286d.

Configuration surface

Surface | File | Purpose
Compose env | deploy/.compose.env | Generated from active profile + secret listings by make render-compose-env. Do not edit by hand.
Active profile pointer | config/profiles/.active | Plain text file containing the active profile slug.
Profile TOML | config/profiles/single-tenant.toml | Topology + feature flags.
Deploy env input | deploy/envs/config.{dev,demo,eg-prod}.toml | Environment-specific deployment input; not merged by make render-compose-env.
Secrets manifest | deploy/secrets.manifest.toml | Declares the 52 secret files and how to generate them.
Secrets directory | deploy/secrets/ | One file per secret, mode 0644, dir 0700.
Compose dispatcher | scripts/compose.sh | Single entry point that picks the right -f overlay set per mode.
Stack record | deploy/aibox-stack.toml | Written by make stack-record; used by make down/make status to recover the active mode.

  • Compose modes: every up-* target, every overlay file.
  • Helm chart: what works today, what's missing.
  • Single-port edge: how the prod posture collapses everything behind one TLS port.
  • Secrets: make ensure-secrets, escrow, rotation, sealing.
  • Multi-vLLM router: Gemma + Qwen behind OpenResty on one GPU.

Verify

make plan # dry-run the resolved compose mode (no containers touched)
make status # compare the running stack against its recorded mode
make health # readiness + liveness probes for every service

The recorded mode lives in deploy/aibox-stack.toml (printed by make stack-show). If make status reports drift, re-run the matching make up-* target — never docker compose up directly.


Verified against commit 5187b91e (2026-06-11) · sources 181b3bddde2d.