Architecture

AIBox is a compose-first platform: a React frontend, a Go gateway, a Python and Go service mesh, shared identity libraries, a pluggable inference router, hash-chained audit, and optional integrations. This page is the hub — each subsystem has its own reference page linked at the bottom.

Request path

A browser submits a chat turn. The gateway validates the JWT, strips spoofable identity headers, stamps a signed principal plus a turn context, then proxies to agent-runtime. Participating services emit turn events to audit; audit seals the turn envelope and exposes receipts.

Owning files: services/gateway/main.go, services/gateway/internal/proxy/proxy.go, services/agent-runtime/src/agent_runtime/routes/chat.py, services/agent-runtime/src/agent_runtime/chat_handler/streaming_loop.py.
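
The strip-then-stamp step can be sketched as follows. This is a minimal illustration in Python, not the gateway's Go code: the header names come from the identity section of this page, while the function name restamp_identity, the claim keys (sub, email, roles), and the comma-joined roles format are assumptions. Principal and cap-token signing are left out.

```python
# Identity headers that a client must never be able to set directly.
TRUSTED_HEADERS = {
    "x-tenant-id", "x-user-id", "x-user-email", "x-user-roles",
    "x-aibox-principal", "x-aibox-turn-id", "x-aibox-cap-token",
}


def restamp_identity(inbound: dict, claims: dict, tenant_id: str) -> dict:
    """Drop client-supplied identity headers, then stamp values from verified claims.

    `claims` is assumed to be the already-verified JWT payload.
    """
    # A mismatched inbound tenant header is rejected (the gateway answers 403).
    for name, value in inbound.items():
        if name.lower() == "x-tenant-id" and value != tenant_id:
            raise PermissionError("inbound X-Tenant-ID does not match the pinned tenant")

    # Everything else that looks like an identity header is silently removed.
    clean = {k: v for k, v in inbound.items() if k.lower() not in TRUSTED_HEADERS}

    # Re-stamp from trusted sources only.
    clean["X-Tenant-ID"] = tenant_id
    clean["X-User-ID"] = str(claims["sub"])
    clean["X-User-Email"] = str(claims.get("email", ""))
    clean["X-User-Roles"] = ",".join(claims.get("roles", []))  # join format is a guess
    return clean
```

The point of stripping before stamping is that a downstream service can treat any identity header it sees as gateway-issued, because nothing the browser sent under those names survives.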

Core services

Service	Source	Public port	Internal gRPC	Role
Frontend	`frontend/`	80	—	React SPA: chat, agents, admin.
Gateway	`services/gateway/`	8080	—	JWT validation, admin guard, principal/turn stamping, rate limit, transport split.
Agent Runtime	`services/agent-runtime/`	8001	9001	Chat orchestration, tools, skills, MCP, approvals, sessions, conversations, notifications, replay.
Guardrail	`services/guardrail/`	8002	9002	Presidio-backed input/output safety, deidentification, turn verdicts.
Memory	`services/memory/`	8003	9003	Typed Mem0 memories, prefetch, scope enforcement, memory turn events.
Inference Router	`services/inference-router/`	8004	—	OpenAI-compatible router; multi-provider (Chat Completions + Responses adapters).
Code Sandbox	`services/code-sandbox/`	8006	9006	Docker execution, artifacts.
Knowledge	`services/knowledge/`	8007	9007	Document RAG, wiki, audience/visibility policy, RAG turn events.
Audit	`services/audit/`	8008	9008	Hash-chain audit log, turn envelopes, receipts, proofs.
Observability	`services/observability/`	8009	—	First-party generation usage, prices, budgets, traces.
egauth	`services/egauth/`	8010	—	Corporate NTLM verifier and backward-compatible `/login`; 2FA moved to `services/auth/`.
Auth	`services/auth/`	8012	—	Device pairing, mobile-push 2FA, login challenges, and optional password login compatibility.
Docs Site	`docs-site/`	3100 (dev)	—	This documentation.

Supporting infrastructure: PostgreSQL, Redis, Qdrant, MinIO, SearXNG, Steel, Keycloak. Airbyte, Dify, and Cloudflare Tunnel are not part of the current stack — earlier docs referring to them are stale.

Transport split

USE_GRPC=true is the compose default (aiboxconfig.MustGet().Transport.UseGRPC). The gateway uses gRPC for routes whose proto contracts are full-fidelity and falls back to HTTP per-route when richer fields are needed. Per services/gateway/internal/proxy/proxy.go:

Area	Default route	Why
`POST /v1/chat`, `POST /v1/chat/stream`	HTTP passthrough	gRPC `ChatRequest` carries only one `message` string; HTTP supports multimodal content parts and SSE.
`POST /v1/memory` (store)	gRPC	`MemoryService.Store` has full-fidelity proto.
`POST /v1/memory/search`	HTTP	gRPC proto drops `created_at` needed by the admin UI.
`GET /v1/memory`, `PUT/DELETE`	HTTP	Detail fields not on the proto.
`POST /v1/guard/input`, `POST /v1/guard/output`	gRPC	`GuardrailService.CheckInput` / `CheckOutput` are full-fidelity in proto. Other guardrail routes such as deidentify and policy stay HTTP.
`/v1/knowledge/search`	HTTP	Proto omits scoring/rationale fields.
`POST /v1/knowledge/wiki`, `GET /v1/knowledge/wiki/{topic}`	gRPC	`KnowledgeService.WikiWrite` / `KnowledgeService.WikiRead`. Wiki list/delete and item-admin routes stay REST.
`/v1/audit` append	gRPC	`AuditService.Log`.
`/v1/admin/audit/`, `/v1/receipts/`	HTTP	Receipts and proofs require richer JSON.
`/v1/models`, `/v1/routes`	Always HTTP	Inference router has no gRPC surface.
`/v1/sandbox/*`	HTTP (prefix stripped)	Reverse-proxy with `http.StripPrefix("/v1/sandbox", ...)`.

The gRPC ports are HTTP_port + 1000 (httpToGrpcAddr in proxy.go).
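
As a rough illustration of that port rule, here is a Python sketch of an address mapper. The real httpToGrpcAddr is Go and its exact input handling is not shown on this page; accepting an optional scheme and the service hostnames used in the usage note are assumptions.

```python
def http_to_grpc_addr(http_addr: str, offset: int = 1000) -> str:
    """Map 'host:port' (optionally prefixed with a scheme) to 'host:port+offset'."""
    # Drop a scheme such as http:// and any trailing slash.
    hostport = http_addr.split("://", 1)[-1].rstrip("/")
    host, _, port = hostport.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {http_addr!r}")
    return f"{host}:{int(port) + offset}"
```

With the ports from the Core services table, http_to_grpc_addr("http://memory:8003") yields "memory:9003", matching the Internal gRPC column.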

Identity layer

The gateway is the sole identity authority. It validates the inbound JWT against Keycloak (aibox realm) and any additional_issuers (e.g. egauth), then strips and re-stamps trusted identity headers (services/gateway/internal/middleware/auth.go, header names live in services/shared-identity/aibox_identity/):

X-Tenant-ID — pinned to tenancy.single_tenant_id; the gateway ignores the inbound JWT tenant claim and rejects a mismatched inbound X-Tenant-ID with 403.
X-User-ID, X-User-Email, X-User-Roles — from verified JWT claims.
X-Aibox-Principal — HMAC-signed canonical Principal (services/shared-identity/aibox_identity/principal.py), signed with AIBOX_PRINCIPAL_KEYS.
X-Aibox-Turn-Id, X-Aibox-Cap-Token — per-turn context, signed with AIBOX_CAPTOKEN_KEYS (services/shared-identity/aibox_identity/turnctx.py, gRPC metadata keys aibox-turn-id / aibox-cap-token).
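
To make the signed-principal idea concrete, the sketch below signs and verifies a canonical JSON payload with HMAC-SHA256. It is not the format used by principal.py: the wire layout (base64url payload, dot, hex tag), the field names, and treating AIBOX_PRINCIPAL_KEYS as a list of keys accepted during rotation are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json


def sign_principal(principal: dict, key: bytes) -> str:
    """Return '<base64url(canonical JSON)>.<hex HMAC-SHA256>' (hypothetical layout)."""
    # Sorted keys and tight separators give a stable byte representation.
    canonical = json.dumps(principal, sort_keys=True, separators=(",", ":")).encode()
    payload = base64.urlsafe_b64encode(canonical).decode()
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"


def verify_principal(header: str, keys: list) -> dict:
    """Accept the header if any configured key validates it; raise ValueError otherwise."""
    payload, _, tag = header.rpartition(".")
    for key in keys:
        expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected, tag):
            return json.loads(base64.urlsafe_b64decode(payload))
    raise ValueError("principal signature mismatch")
```

A downstream service that holds the key can then trust the decoded principal without re-validating the user's JWT, which is what lets the gateway remain the single identity authority.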

Internal service-to-service calls swap the user bearer for a short-lived Keycloak service JWT (INTERNAL_AUTH_CLIENT_ID, audience aibox-internal). Inference-router's /v1/models and /v1/routes are HTTP-only, but they use the same gateway reverse-proxy token-swap path.

When auth is disabled (auth.enabled=false plus AIBOX_ALLOW_AUTH_DISABLED=true, dev only), the gateway synthesizes an anonymous principal. A production launch refuses to start without auth enabled, and it also refuses to start when auth.password.enabled is set but INTERNAL_AUTH_CLIENT_SECRET is missing (so mobile-push 2FA cannot silently downgrade).

Turn events and receipts

Every authenticated /v1/* request gets a fresh turn_id. Services emit TurnEvent records (proto/aibox/v1/turn_events.proto) to audit over gRPC (TurnEventService.Publish). Event types emitted today include turn_started, prompt_generated, model_invoked, model_response, rag_chunks_retrieved, memory_op, tool_called, tool_returned, guardrail_verdict, turn_failed, and turn_sealed. cap_token_issued / cap_token_rejected exist in the proto but are not emitted by the gateway today.

Audit aggregates the stream, computes a Merkle root, signs the envelope, and surfaces receipts:

GET /v1/receipts/{turn_id} — receipt summary (gateway-proxied).
GET /v1/receipts/{turn_id}/proof — full Merkle proof + signature.
GET /v1/audit/turns/{turn_id}/replay?tenant_id=... — forensic replay detail.
GET /v1/audit/turns/{turn_id}/artifacts/{event_id}/{kind}?tenant_id=... — captured artifact body.
POST /v1/audit/turns/{turn_id}/replay/live — live replay carved out to agent-runtime in proxy.go.

Verification checks the Merkle root, envelope signature, per-tenant chain anchor, and exported suffix. See Compliance Audit Trail.
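
For readers unfamiliar with Merkle proofs, the following is a generic SHA-256 sketch of root computation and inclusion-proof checking. It is not the audit service's scheme: the leaf encoding, the domain-separation bytes, and the rule of duplicating the last node on odd levels are assumptions, and envelope signing and chain anchoring are omitted.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()  # 0x00 prefix marks a leaf


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 marks an inner node


def _next_level(level: list) -> list:
    return [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list) -> bytes:
    """Root over serialized events; an odd level duplicates its last node."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [leaf_hash(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list, index: int) -> list:
    """Return [(sibling_hash, sibling_is_left), ...] from the leaf up to the root."""
    level = [leaf_hash(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # the neighbor within the same pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root and compare."""
    current = leaf_hash(leaf)
    for sibling, sibling_is_left in proof:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == root
```

A receipt holder only needs the event, the proof path, and the signed root to check inclusion; the other events in the turn never have to be disclosed.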

Data stores

Store	Used by	Notes
PostgreSQL	Audit, observability, Keycloak, inference-router provider + role registry, agent-runtime metadata	Per-service schema.
Redis	Sessions, gateway rate limiter, MFA pending bundle, MCP secrets cache	`REDIS_URL` gates Redis-backed limiter.
Qdrant	Memory vectors (`COLLECTION_PREFIX=mem0`), knowledge dense+sparse vectors	Dim mismatch triggers recreate.
MinIO	Wiki Markdown, document originals, artifacts	Object store.

Inference router

The router persists provider records in PostgreSQL (INFERENCE_DATABASE_URL) and serves a lock-free atomic snapshot refreshed every 30s. Two adapter kinds are wired today (services/inference-router/internal/providers/):

chat_completions — pass-through OpenAI-style Chat Completions.
responses — Responses-API adapter that translates back to Chat Completions on the wire.

Operator CRUD lives at /v1/internal/providers and is exposed through the gateway at /v1/admin/inference/* behind the admin guard. Registration normalizes the operator-supplied base_url (whitespace and trailing-slash cleanup, https as the default for a missing scheme, pasted endpoint suffixes stripped) and probes GET {base_url}/models before persisting. Adapter clients never follow upstream redirects: a redirecting base_url fails registration (and the data path) with an error naming the redirect target, because following the redirect would re-issue generation POSTs as body-less GETs. Multi-vLLM deployments use the vllm-router overlay (submodules/aibox-vllm/compose/compose.gpu-multi.yml), which fronts multiple vLLM containers behind a single OpenAI-compatible base URL.
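
The normalization rules can be sketched like this. The router itself is Go, and the list of pasted suffixes below is hypothetical; only the three rules (whitespace and trailing-slash cleanup, https default, suffix stripping) come from the description above. The /models probe and redirect handling are not shown.

```python
# Hypothetical examples of endpoint paths an operator might paste by accident.
PASTED_SUFFIXES = ("/chat/completions", "/responses", "/models")


def normalize_base_url(raw: str) -> str:
    """Clean an operator-supplied base URL before it is probed and stored."""
    url = raw.strip().rstrip("/")
    if "://" not in url:
        url = "https://" + url  # a missing scheme defaults to https
    for suffix in PASTED_SUFFIXES:
        if url.endswith(suffix):
            url = url[: -len(suffix)].rstrip("/")
            break
    return url
```

For example, a pasted "api.example.com/v1/chat/completions/" (a made-up host) comes out as "https://api.example.com/v1", which is the form the probe would then append /models to.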

Model selection happens at runtime, not at deploy time. The router persists a model_roles table mapping canonical role aliases (default, title, classifier, memory, profile_curation, vision, guardrail, knowledge) to concrete models. LLM consumers send a role alias, and the proxy expands role → model at request time; the per-model name knobs have been removed from profile config. Roles are managed via /v1/admin/inference/roles and the Inference admin workspace. On first boot, a reachability-aware bootstrap seeds roles to the bundled local vLLM model if its backend is healthy, else to an external provider that has a key, else leaves them for the admin to configure.
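
A small sketch of role expansion follows. It assumes the alias travels in the model field of a Chat Completions style body and that non-alias values pass through unchanged; both are guesses about the real proxy. The model names in the usage note are placeholders, and the error code is borrowed from the router's error envelope.

```python
CANONICAL_ROLES = {
    "default", "title", "classifier", "memory",
    "profile_curation", "vision", "guardrail", "knowledge",
}


class RoleNotConfigured(LookupError):
    code = "role_not_configured"


def expand_role(body: dict, roles: dict) -> dict:
    """Replace a role alias in body['model'] with the concrete model mapped to it."""
    requested = body.get("model", "")
    if requested not in CANONICAL_ROLES:
        return body  # treated as a concrete model name (assumption)
    model = roles.get(requested)
    if not model:
        raise RoleNotConfigured(requested)
    # Copy so the caller's request body is not mutated.
    return {**body, "model": model}
```

With roles = {"default": "local-model"}, a request for model "default" is rewritten to "local-model", while a request for "title" raises RoleNotConfigured until an admin maps that role.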

User-facing inference errors carry a stable {"error","code"} envelope (codes such as provider_key_missing, role_not_configured, role_no_backend, model_unknown, backend_unavailable); agent-runtime forwards the code in the SSE error event so the chat UI renders audience-aware copy.
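
One way to picture the forwarding step is a helper that turns the envelope into a single server-sent event. The event name error and the payload layout are assumptions; only the two envelope keys come from the text above.

```python
import json


def sse_error_event(envelope: dict) -> str:
    """Serialize an inference error envelope as one SSE 'error' event."""
    # Forward only the stable fields so the UI can key its copy off `code`.
    payload = {"error": envelope["error"], "code": envelope["code"]}
    # An SSE event is 'field: value' lines terminated by a blank line.
    return f"event: error\ndata: {json.dumps(payload)}\n\n"
```

Keeping code stable while error stays free-form lets the chat UI choose wording per audience without parsing message text.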

Deployment

AIBox deploys two ways: locally via the make compose targets below, and through CI, using either the compose Deploy aibox workflow (deploy.yml) or the kube2x dev k8s workflow (deploy-k8s-dev.yml). Common make targets:

Command	Posture	GPU axis	Use
`make up`	`dev`	`off`	Full dev stack.
`make up-lite`	`dev-lite`	n/a	~5 GB RAM laptop stack, 6 stubs, no Qdrant/MinIO/SearXNG/Steel/egauth/docs-site.
`make up GPU=single`	`dev`	`single`	Single local vLLM.
`make up GPU=multi`	`dev`	`multi`	Gemma + Qwen vLLM behind the multi-vLLM router.
`make up-prod`	`prod`	`off`	Production-mode compose (`AIBOX_ENV=production`).
`make up-prod GPU=single`	`prod`	`single`	Production compose with a local vLLM (also `GPU=multi`).

Docs assume gateway-routed APIs unless a route is explicitly called out as a dev-only direct service port.

Network egress

Outbound internet traffic can be funneled through a single chokepoint: the optional egress-gateway, a hardened Squid forward proxy that follows the same pattern as the existing docker-socket proxies (cap_drop: ALL, no-new-privileges, tmpfs, memory limit, plus CAP_SETUID/CAP_SETGID so Squid can drop to its unprivileged user). It allows traffic by domain only — TLS SNI / HTTP CONNECT host, with no TLS interception, so payloads stay end-to-end encrypted to the real provider — and logs every allow/deny verdict.

It is off by default and opt-in per deploy. AIBOX_COMPOSE_EGRESS=1 (or the --egress deploy flag) stacks deploy/docker-compose.egress.yml — one overlay that adds the gateway, sets HTTP(S)_PROXY on every egressing service, and starts the egress-shipper (it tails the Squid access log and ships each allow/deny verdict into the signed audit log, surfaced in the admin Activity → Network egress tab). The allowlist source of truth is deploy/config/egress/allowlist.yaml, rendered to a Squid config by deploy/scripts/render-squid-conf.py. Two policies ship:

observe (default) — deploy/config/egress/squid.observe.conf: log-only, allows every destination, so enabling the gateway cannot break outbound traffic even with an incomplete allowlist.
enforce (EGRESS_ENFORCE=1 / --egress-enforce) — default-deny against the allowlist (deploy/config/egress/squid.conf).
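
The difference between the two policies can be illustrated with a small decision function. This is a conceptual sketch, not the rendered Squid config: the allowlist is modeled as a flat list of domains where a leading dot also matches subdomains (as Squid's dstdomain ACL does), and the actual schema of allowlist.yaml is not shown on this page.

```python
def domain_allowed(host: str, allowlist: list) -> bool:
    """Exact match, or domain-plus-subdomains match for entries with a leading dot."""
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("."):
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


def egress_verdict(host: str, allowlist: list, enforce: bool) -> str:
    """observe: always allow (log-only). enforce: default-deny against the allowlist."""
    if domain_allowed(host, allowlist) or not enforce:
        return "allow"
    return "deny"
```

In observe mode an unlisted host still gets "allow", so the access log can be used to fill in the allowlist before switching to enforce.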

The forced network flip (deploy/docker-compose.egress-forced.yml, which marks the app network internal: true so that a service ignoring its proxy env has no route out) is a separate, later step and is not part of the toggle. See Compose modes.

Codex microVM backend

Codex chat turns normally run in-process inside agent-runtime. They can optionally run inside a per-turn Firecracker microVM via the deploy/docker-compose.codex-firecracker.yml overlay, opt-in per deploy with AIBOX_COMPOSE_CODEX_FIRECRACKER=1 (or aibox-ctl deploy --codex-firecracker / the codex_firecracker overlay key). The overlay sets AIBOX_CODEX_FIRECRACKER=1, passes /dev/kvm through to agent-runtime, and bind-mounts the provisioned artifacts at /opt/aibox-fc. It is off by default and safe to enable: backend selection in services/agent-runtime/src/agent_runtime/codex_backend.py runs a preflight + /dev/kvm check and falls back to the in-process path on an unprovisioned host, and a runtime guest boot/transport failure also falls back in-process for that turn, so chat is never broken.
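
The two fallback layers described above can be sketched as follows. This is not the logic in codex_backend.py: the preflight here only checks that the KVM device and the artifacts directory exist, which is an assumption, and the function names and backend labels are hypothetical.

```python
import os


def select_codex_backend(env: dict, kvm_path: str = "/dev/kvm",
                         artifacts_dir: str = "/opt/aibox-fc") -> str:
    """Pick 'firecracker' only when opted in and the host looks provisioned."""
    if env.get("AIBOX_CODEX_FIRECRACKER") != "1":
        return "in_process"
    # Simplified preflight: device node present and artifacts mounted.
    if not os.path.exists(kvm_path) or not os.path.isdir(artifacts_dir):
        return "in_process"
    return "firecracker"


def run_turn(backend: str, run_microvm, run_in_process):
    """Run one turn; a microVM boot or transport failure falls back for that turn only."""
    if backend == "firecracker":
        try:
            return run_microvm()
        except Exception:
            # Broad catch is deliberate in this sketch: any guest failure degrades
            # to the in-process path instead of failing the chat turn.
            return run_in_process()
    return run_in_process()
```

Selection-time fallback covers hosts that were never provisioned, while the per-turn fallback covers transient guest failures on hosts that were.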

Verified against commit 0d6ee337 (2026-06-18) · sources 83bc72759a4b.
