Architecture
AIBox is a compose-first platform: a React frontend, a Go gateway, a Python and Go service mesh, shared identity libraries, a pluggable inference router, hash-chained audit, and optional integrations. This page is the hub — each subsystem has its own reference page linked at the bottom.
Request path
A browser submits a chat turn. The gateway validates the JWT, strips spoofable identity headers, stamps a signed principal plus a turn context, then proxies to agent-runtime. Participating services emit turn events to audit; audit seals the turn envelope and exposes receipts.
Owning files: services/gateway/main.go, services/gateway/internal/proxy/proxy.go, services/agent-runtime/src/agent_runtime/routes/chat.py, services/agent-runtime/src/agent_runtime/chat_handler/streaming_loop.py.
Core services
| Service | Source | Public port | Internal gRPC | Role |
|---|---|---|---|---|
| Frontend | frontend/ | 80 | — | React SPA: chat, agents, admin. |
| Gateway | services/gateway/ | 8080 | — | JWT validation, admin guard, principal/turn stamping, rate limit, transport split. |
| Agent Runtime | services/agent-runtime/ | 8001 | 9001 | Chat orchestration, tools, skills, MCP, approvals, sessions, conversations, notifications, replay. |
| Guardrail | services/guardrail/ | 8002 | 9002 | Presidio-backed input/output safety, deidentification, turn verdicts. |
| Memory | services/memory/ | 8003 | 9003 | Typed Mem0 memories, prefetch, scope enforcement, memory turn events. |
| Inference Router | services/inference-router/ | 8004 | — | OpenAI-compatible router; multi-provider (Chat Completions + Responses adapters). |
| Code Sandbox | services/code-sandbox/ | 8006 | 9006 | Docker execution, artifacts. |
| Knowledge | services/knowledge/ | 8007 | 9007 | Document RAG, wiki, audience/visibility policy, RAG turn events. |
| Audit | services/audit/ | 8008 | 9008 | Hash-chain audit log, turn envelopes, receipts, proofs. |
| Observability | services/observability/ | 8009 | — | First-party generation usage, prices, budgets, traces. |
| egauth | services/egauth/ | 8010 | — | Corporate NTLM verifier and backward-compatible /login; 2FA moved to services/auth/. |
| Auth | services/auth/ | 8012 | — | Device pairing, mobile-push 2FA, login challenges, and optional password login compatibility. |
| Docs Site | docs-site/ | 3100 (dev) | — | This documentation. |
Supporting infrastructure: PostgreSQL, Redis, Qdrant, MinIO, SearXNG, Steel, Keycloak. Airbyte, Dify, and Cloudflare Tunnel are not part of the current stack — earlier docs referring to them are stale.
Transport split
USE_GRPC=true is the compose default (aiboxconfig.MustGet().Transport.UseGRPC). The gateway uses gRPC for routes whose proto contracts are full-fidelity and falls back to HTTP per-route when richer fields are needed. Per services/gateway/internal/proxy/proxy.go:
| Area | Default route | Why |
|---|---|---|
| POST /v1/chat, POST /v1/chat/stream | HTTP passthrough | gRPC ChatRequest carries only one message string; HTTP supports multimodal content parts and SSE. |
| POST /v1/memory (store) | gRPC | MemoryService.Store has full-fidelity proto. |
| POST /v1/memory/search | HTTP | gRPC proto drops created_at needed by the admin UI. |
| GET /v1/memory, PUT/DELETE | HTTP | Detail fields not on the proto. |
| POST /v1/guard/input, POST /v1/guard/output | gRPC | GuardrailService.CheckInput / CheckOutput are full-fidelity in proto. Other guardrail routes such as deidentify and policy stay HTTP. |
| /v1/knowledge/search | HTTP | Proto omits scoring/rationale fields. |
| POST /v1/knowledge/wiki, GET /v1/knowledge/wiki/{topic} | gRPC | KnowledgeService.WikiWrite / KnowledgeService.WikiRead. Wiki list/delete and item-admin routes stay REST. |
| /v1/audit append | gRPC | AuditService.Log. |
| /v1/admin/audit/*, /v1/receipts/* | HTTP | Receipts and proofs require richer JSON. |
| /v1/models, /v1/routes | Always HTTP | Inference router has no gRPC surface. |
| /v1/sandbox/* | HTTP (prefix stripped) | Reverse-proxy with http.StripPrefix("/v1/sandbox", ...). |
The gRPC ports are HTTP_port + 1000 (httpToGrpcAddr in proxy.go).
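The port rule above can be sketched as a small helper. This is a Python illustration only: the real httpToGrpcAddr lives in the Go gateway, and the function name, the scheme and path handling, and the error behavior here are assumptions.

```python
def http_to_grpc_addr(http_addr: str, offset: int = 1000) -> str:
    """Derive a gRPC host:port from an HTTP address by adding `offset` to the port.

    Hypothetical helper: only the "+1000" rule comes from the text above.
    """
    # Drop an optional scheme such as "http://".
    _, sep, rest = http_addr.partition("://")
    hostport = rest if sep else http_addr
    # Drop any path component after the authority.
    hostport = hostport.split("/", 1)[0]
    host, _, port = hostport.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {http_addr!r}")
    return f"{host}:{int(port) + offset}"
```

With the ports in the Core services table, an agent-runtime address on 8001 maps to 9001 and a memory address on 8003 maps to 9003.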
Identity layer
The gateway is the sole identity authority. It validates the inbound JWT against Keycloak (aibox realm) and any additional_issuers (e.g. egauth), then strips and re-stamps trusted identity headers (services/gateway/internal/middleware/auth.go, header names live in services/shared-identity/aibox_identity/):
- X-Tenant-ID: pinned to tenancy.single_tenant_id; the gateway ignores the inbound JWT tenant claim and rejects a mismatched inbound X-Tenant-ID with 403.
- X-User-ID, X-User-Email, X-User-Roles: from verified JWT claims.
- X-Aibox-Principal: HMAC-signed canonical Principal (services/shared-identity/aibox_identity/principal.py), signed with AIBOX_PRINCIPAL_KEYS.
- X-Aibox-Turn-Id, X-Aibox-Cap-Token: per-turn context, signed with AIBOX_CAPTOKEN_KEYS (services/shared-identity/aibox_identity/turnctx.py, gRPC metadata keys aibox-turn-id / aibox-cap-token).
Internal service-to-service calls swap the user bearer for a short-lived
Keycloak service JWT (INTERNAL_AUTH_CLIENT_ID, audience aibox-internal).
Inference-router's /v1/models and /v1/routes are HTTP-only, but they use
the same gateway reverse-proxy token-swap path.
When auth is disabled (auth.enabled=false plus AIBOX_ALLOW_AUTH_DISABLED=true, dev only), the gateway synthesizes an anonymous principal. Production launch refuses to start without auth enabled, and refuses to start with auth.password.enabled but no INTERNAL_AUTH_CLIENT_SECRET (so mobile-push 2FA cannot silently downgrade).
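The startup rules above can be read as a small validation pass. The sketch below is an assumption-heavy illustration: the config shape, the default values, and the function names are hypothetical, and only the two production refusals and the two-flag dev escape hatch come from the text.

```python
def startup_errors(cfg: dict, env: dict[str, str]) -> list[str]:
    """Collect reasons a production launch should refuse to start (sketch)."""
    auth = cfg.get("auth", {})
    production = env.get("AIBOX_ENV") == "production"
    auth_enabled = auth.get("enabled", True)  # assumed default
    password_enabled = auth.get("password", {}).get("enabled", False)  # assumed default

    errors = []
    if production and not auth_enabled:
        errors.append("auth.enabled must be true in production")
    if production and password_enabled and not env.get("INTERNAL_AUTH_CLIENT_SECRET"):
        errors.append("auth.password.enabled requires INTERNAL_AUTH_CLIENT_SECRET")
    return errors


def anonymous_principal_allowed(cfg: dict, env: dict[str, str]) -> bool:
    """Anonymous principals need auth off AND the explicit dev-only override."""
    auth_enabled = cfg.get("auth", {}).get("enabled", True)
    override = env.get("AIBOX_ALLOW_AUTH_DISABLED") == "true"
    production = env.get("AIBOX_ENV") == "production"
    return (not auth_enabled) and override and not production
```

Returning a list of errors rather than raising on the first one lets an operator see every misconfiguration in a single failed start.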
Turn events and receipts
Every authenticated /v1/* request gets a fresh turn_id. Services emit TurnEvent records (proto/aibox/v1/turn_events.proto) to audit over gRPC (TurnEventService.Publish). Event types emitted today include turn_started, prompt_generated, model_invoked, model_response, rag_chunks_retrieved, memory_op, tool_called, tool_returned, guardrail_verdict, turn_failed, and turn_sealed. cap_token_issued / cap_token_rejected exist in the proto but are not emitted by the gateway today.
Audit aggregates the stream, computes a Merkle root, signs the envelope, and surfaces receipts:
- GET /v1/receipts/{turn_id}: receipt summary (gateway-proxied).
- GET /v1/receipts/{turn_id}/proof: full Merkle proof + signature.
- GET /v1/audit/turns/{turn_id}/replay?tenant_id=...: forensic replay detail.
- GET /v1/audit/turns/{turn_id}/artifacts/{event_id}/{kind}?tenant_id=...: captured artifact body.
- POST /v1/audit/turns/{turn_id}/replay/live: live replay carved out to agent-runtime in proxy.go.
Verification checks the Merkle root, envelope signature, per-tenant chain anchor, and exported suffix. See Compliance Audit Trail.
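As an illustration of the Merkle-root part of verification, the sketch below builds a root over serialized turn events and checks an inclusion proof. The tree layout is an assumption (SHA-256, domain-separated leaf and node hashes, an odd node promoted unchanged); the audit service's actual construction, event serialization, and proof format are not described on this page.

```python
import hashlib


def _leaf(data: bytes) -> bytes:
    # 0x00 prefix separates leaf hashes from interior node hashes.
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _parents(level: list[bytes]) -> list[bytes]:
    nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2:
        nxt.append(level[-1])  # odd node is promoted unchanged
    return nxt


def merkle_root(events: list[bytes]) -> bytes:
    if not events:
        return hashlib.sha256(b"").digest()
    level = [_leaf(e) for e in events]
    while len(level) > 1:
        level = _parents(level)
    return level[0]


def merkle_proof(events: list[bytes], index: int) -> list[tuple[str, bytes]]:
    """Sibling hashes from leaf to root as (side, hash) pairs."""
    level = [_leaf(e) for e in events]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):  # a promoted node has no sibling at this level
            proof.append(("left" if sibling < index else "right", level[sibling]))
        level = _parents(level)
        index //= 2
    return proof


def verify_inclusion(event: bytes, proof: list[tuple[str, bytes]], root: bytes) -> bool:
    acc = _leaf(event)
    for side, sibling in proof:
        acc = _node(sibling, acc) if side == "left" else _node(acc, sibling)
    return acc == root
```

A receipt verifier would additionally check the envelope signature over the root and the per-tenant chain anchor, which this sketch leaves out.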
Data stores
| Store | Used by | Notes |
|---|---|---|
| PostgreSQL | Audit, observability, Keycloak, inference-router provider + role registry, agent-runtime metadata | Per-service schema. |
| Redis | Sessions, gateway rate limiter, MFA pending bundle, MCP secrets cache | REDIS_URL gates Redis-backed limiter. |
| Qdrant | Memory vectors (COLLECTION_PREFIX=mem0), knowledge dense+sparse vectors | Dim mismatch triggers recreate. |
| MinIO | Wiki Markdown, document originals, artifacts | Object store. |
Inference router
The router persists provider records in PostgreSQL (INFERENCE_DATABASE_URL) and serves a lock-free atomic snapshot refreshed every 30s. Two adapter kinds are wired today (services/inference-router/internal/providers/):
- chat_completions: pass-through OpenAI-style Chat Completions.
- responses: Responses-API adapter that translates back to Chat Completions on the wire.
Operator CRUD lives at /v1/internal/providers and is exposed through the gateway at /v1/admin/inference/* behind the admin guard. Registration normalizes the operator-supplied base_url (whitespace/trailing-slash cleanup, https default for a missing scheme, pasted endpoint suffixes stripped) and probes GET {base_url}/models before persisting. Adapter clients never follow upstream redirects: a redirecting base_url fails registration (and the data path) with an error naming the redirect target, since following would re-issue generation POSTs as body-less GETs. Multi-vLLM deployments use the vllm-router overlay (submodules/aibox-vllm/compose/compose.gpu-multi.yml), which front-ends multiple vLLM containers behind a single OpenAI-compatible base URL.
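The base_url normalization described above might look roughly like the following. The suffix list and function names are hypothetical, and the router's real rules may differ; the sketch only covers the three cleanups named in the text and how the probe URL is derived from the result.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of endpoint suffixes an operator might paste by mistake.
# Longer suffixes come first so "/chat/completions" is not cut to "/chat".
PASTED_SUFFIXES = ("/chat/completions", "/completions", "/responses", "/models")


def normalize_base_url(raw: str) -> str:
    url = raw.strip()  # whitespace cleanup
    if not url:
        raise ValueError("base_url is empty")
    if "://" not in url:
        url = "https://" + url  # https default for a missing scheme
    parts = urlsplit(url)
    path = parts.path.rstrip("/")  # trailing-slash cleanup
    for suffix in PASTED_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)].rstrip("/")
            break
    # Query and fragment are dropped; a base URL should not carry them.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def models_probe_url(raw_base_url: str) -> str:
    """URL a registration probe would GET before persisting the provider."""
    return normalize_base_url(raw_base_url) + "/models"
```

A probe client built on this would also need redirects disabled, so that a redirecting base_url surfaces as a registration error instead of being followed.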
Model selection is runtime, not deploy-time. The router persists a model_roles table mapping canonical role aliases (default, title, classifier, memory, profile_curation, vision, guardrail, knowledge) to concrete models; LLM consumers send a role alias and the proxy expands role → model at request time; the per-model name knobs were removed from profile config. Roles are managed via /v1/admin/inference/roles and the Inference admin workspace. On first boot a reachability-aware bootstrap seeds roles to the bundled local vLLM model if its backend is healthy, else to an external provider that has a key, else leaves them for the admin.
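Role expansion and the first-boot bootstrap can be sketched as two small functions. The in-memory dict standing in for the model_roles table, the pass-through of non-alias names, and the choice to seed every role with the same model are assumptions made for this illustration.

```python
ROLE_ALIASES = (
    "default", "title", "classifier", "memory",
    "profile_curation", "vision", "guardrail", "knowledge",
)


class RoleNotConfigured(LookupError):
    """Raised when an alias has no model; mirrors the role_not_configured code."""

    code = "role_not_configured"


def expand_role(requested: str, model_roles: dict[str, str]) -> str:
    """Expand a role alias to a concrete model at request time."""
    if requested not in ROLE_ALIASES:
        return requested  # assumed: anything else is already a concrete model name
    model = model_roles.get(requested)
    if not model:
        raise RoleNotConfigured(requested)
    return model


def bootstrap_roles(local_model, local_healthy, keyed_external_model):
    """First-boot seeding: local vLLM if healthy, else a keyed external provider."""
    if local_model and local_healthy:
        target = local_model
    elif keyed_external_model:
        target = keyed_external_model
    else:
        return {}  # nothing reachable: leave roles for the admin to assign
    return {role: target for role in ROLE_ALIASES}
```

Because the mapping is consulted per request, an admin changing a role takes effect without redeploying the services that send the alias.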
User-facing inference errors carry a stable {"error","code"} envelope (codes such as provider_key_missing, role_not_configured, role_no_backend, model_unknown, backend_unavailable); agent-runtime forwards the code in the SSE error event so the chat UI renders audience-aware copy.
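A minimal sketch of carrying that envelope into an SSE error event follows. The two envelope keys come from the text above; the event name, the exact payload agent-runtime emits, and the function names are assumptions.

```python
import json


def error_envelope(code: str, message: str) -> dict:
    """Build the stable two-field envelope."""
    return {"error": message, "code": code}


def sse_error_event(envelope: dict) -> str:
    """Frame the envelope as one Server-Sent Events message (assumed event name)."""
    data = json.dumps({"error": envelope["error"], "code": envelope["code"]})
    # An SSE message is a block of "field: value" lines ended by a blank line.
    return f"event: error\ndata: {data}\n\n"


def parse_sse_error(message: str) -> dict:
    """Client-side counterpart: recover the envelope from the data line."""
    for line in message.splitlines():
        if line.startswith("data: "):
            return json.loads(line[len("data: "):])
    raise ValueError("no data line in SSE message")
```

Keeping the code machine-readable and separate from the message is what lets a UI choose its own wording per audience instead of echoing backend text.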
Deployment
aibox deploys two ways: locally via the make compose targets below, and through CI — the compose Deploy aibox workflow (deploy.yml) and the kube2x dev k8s workflow (deploy-k8s-dev.yml). Common make targets:
| Command | Posture | GPU axis | Use |
|---|---|---|---|
| make up | dev | off | Full dev stack. |
| make up-lite | dev-lite | n/a | ~5 GB RAM laptop stack, 6 stubs, no Qdrant/MinIO/SearXNG/Steel/egauth/docs-site. |
| make up GPU=single | dev | single | Single local vLLM. |
| make up GPU=multi | dev | multi | Gemma + Qwen vLLM behind the multi-vLLM router. |
| make up-prod | prod | off | Production-mode compose (AIBOX_ENV=production). |
| make up-prod GPU=single | prod | single | Production compose with a local vLLM (also GPU=multi). |
Docs assume gateway-routed APIs unless a route is explicitly called out as a dev-only direct service port.
Network egress
Outbound internet traffic can be funneled through a single chokepoint: the
optional egress-gateway, a hardened Squid forward proxy that follows the same
pattern as the existing docker-socket proxies (cap_drop: ALL,
no-new-privileges, tmpfs, memory limit, plus CAP_SETUID/CAP_SETGID so
Squid can drop to its unprivileged user). It allows traffic by domain only —
TLS SNI / HTTP CONNECT host, with no TLS interception, so payloads stay
end-to-end encrypted to the real provider — and logs every allow/deny verdict.
It is off by default and opt-in per deploy. AIBOX_COMPOSE_EGRESS=1 (or the
--egress deploy flag) stacks deploy/docker-compose.egress.yml — one overlay
that adds the gateway, sets HTTP(S)_PROXY on every egressing service, and starts
the egress-shipper (it tails the Squid access log and ships each allow/deny
verdict into the signed audit log, surfaced in the admin Activity → Network
egress tab). The allowlist source of truth is
deploy/config/egress/allowlist.yaml, rendered to a Squid config by
deploy/scripts/render-squid-conf.py. Two policies ship:
- observe (default), deploy/config/egress/squid.observe.conf: log-only, allows every destination, so enabling the gateway cannot break outbound traffic even with an incomplete allowlist.
- enforce (EGRESS_ENFORCE=1 / --egress-enforce): default-deny against the allowlist (deploy/config/egress/squid.conf).
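The difference between the two policies can be sketched as a per-host decision. The real decision is made by Squid from the rendered config; the flat list of domains, the leading-dot convention for subdomains, and the verdict record shape below are assumptions for illustration, not the schema of allowlist.yaml.

```python
def host_allowed(host: str, allowlist: list[str]) -> bool:
    """Match a CONNECT/SNI host against domain entries.

    Assumed convention: ".example.com" matches example.com and any
    subdomain; "example.com" matches only that exact host.
    """
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("."):
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


def verdict(host: str, allowlist: list[str], enforce: bool) -> dict:
    """observe: always allow but record the match; enforce: default-deny."""
    listed = host_allowed(host, allowlist)
    allowed = listed or not enforce
    return {
        "host": host,
        "action": "allow" if allowed else "deny",
        "on_allowlist": listed,
    }
```

Recording on_allowlist even in observe mode shows which destinations would be denied, which is one way to complete an allowlist before switching to enforce.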
The forced network flip (deploy/docker-compose.egress-forced.yml, which makes
app internal: true so a service that ignores its proxy env has no route out)
is a separate later step and is not part of the toggle. See
Compose modes.
Codex microVM backend
Codex chat turns normally run in-process inside agent-runtime. They can
optionally run inside a per-turn Firecracker microVM via the
deploy/docker-compose.codex-firecracker.yml overlay, opt-in per deploy with
AIBOX_COMPOSE_CODEX_FIRECRACKER=1 (or aibox-ctl deploy --codex-firecracker /
the codex_firecracker overlay key). The overlay sets AIBOX_CODEX_FIRECRACKER=1,
passes /dev/kvm through to agent-runtime, and bind-mounts the provisioned
artifacts at /opt/aibox-fc. It is off by default and safe to enable: backend
selection in services/agent-runtime/src/agent_runtime/codex_backend.py runs a
preflight + /dev/kvm check and falls back to the in-process path on an
unprovisioned host, and a runtime guest boot/transport failure also falls back
in-process for that turn, so chat is never broken.
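The two fallback layers described above can be sketched as follows. This is not the code in codex_backend.py: the function names, the preflight callable, and the broad exception handling are assumptions, and only the env flag, the /dev/kvm check, and the fall-back-in-process behavior come from the text.

```python
import os
from typing import Callable


def select_backend(
    env: dict[str, str],
    preflight_ok: Callable[[], bool],
    kvm_path: str = "/dev/kvm",
) -> str:
    """Pick the microVM backend only when opted in and the host is provisioned."""
    if env.get("AIBOX_CODEX_FIRECRACKER") != "1":
        return "in_process"
    if not os.path.exists(kvm_path) or not preflight_ok():
        return "in_process"  # unprovisioned host: stay on the default path
    return "firecracker"


def run_turn(
    backend: str,
    run_in_vm: Callable[[str], str],
    run_in_process: Callable[[str], str],
    prompt: str,
) -> str:
    """Run one turn, falling back in-process if the guest fails for this turn."""
    if backend == "firecracker":
        try:
            return run_in_vm(prompt)
        except Exception:
            # Guest boot or transport failure affects only this turn.
            return run_in_process(prompt)
    return run_in_process(prompt)
```

Selection failures and runtime failures both end on the in-process path, which is the property that makes the overlay safe to enable on hosts that may not be provisioned.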
Related
- API Overview
- Chat
- Agents
- Memory
- Knowledge & RAG
- Authentication
- Compliance Audit Trail
- Multi-Tenant Isolation
- Observability
- Guardrails
- Skills
- Code Sandbox
Verified against commit 0d6ee337 (2026-06-18) · sources 83bc72759a4b.