Architecture

AIBox is a compose-first platform: a React frontend, a Go gateway, a Python and Go service mesh, shared identity libraries, a pluggable inference router, hash-chained audit, and optional integrations. This page is the hub — each subsystem has its own reference page linked at the bottom.

Request path

A browser submits a chat turn. The gateway validates the JWT, strips spoofable identity headers, stamps a signed principal plus a turn context, then proxies to agent-runtime. Participating services emit turn events to audit; audit seals the turn envelope and exposes receipts.

Owning files: services/gateway/main.go, services/gateway/internal/proxy/proxy.go, services/agent-runtime/src/agent_runtime/routes/chat.py, services/agent-runtime/src/agent_runtime/chat_handler/streaming_loop.py.

Core services

| Service | Source | Public port | Internal gRPC | Role |
|---|---|---|---|---|
| Frontend | frontend/ | 80 | - | React SPA: chat, agents, admin. |
| Gateway | services/gateway/ | 8080 | - | JWT validation, admin guard, principal/turn stamping, rate limit, transport split. |
| Agent Runtime | services/agent-runtime/ | 8001 | 9001 | Chat orchestration, tools, skills, MCP, approvals, sessions, conversations, notifications, replay. |
| Guardrail | services/guardrail/ | 8002 | 9002 | Presidio-backed input/output safety, deidentification, turn verdicts. |
| Memory | services/memory/ | 8003 | 9003 | Typed Mem0 memories, prefetch, scope enforcement, memory turn events. |
| Inference Router | services/inference-router/ | 8004 | - | OpenAI-compatible router; multi-provider (Chat Completions + Responses adapters). |
| Code Sandbox | services/code-sandbox/ | 8006 | 9006 | Docker execution, artifacts. |
| Knowledge | services/knowledge/ | 8007 | 9007 | Document RAG, wiki, audience/visibility policy, RAG turn events. |
| Audit | services/audit/ | 8008 | 9008 | Hash-chain audit log, turn envelopes, receipts, proofs. |
| Observability | services/observability/ | 8009 | - | First-party generation usage, prices, budgets, traces. |
| egauth | services/egauth/ | 8010 | - | Corporate NTLM verifier and backward-compatible /login; 2FA moved to services/auth/. |
| Auth | services/auth/ | 8012 | - | Device pairing, mobile-push 2FA, login challenges, and optional password login compatibility. |
| Docs Site | docs-site/ | 3100 (dev) | - | This documentation. |

Supporting infrastructure: PostgreSQL, Redis, Qdrant, MinIO, SearXNG, Steel, Keycloak. Airbyte, Dify, and Cloudflare Tunnel are not part of the current stack — earlier docs referring to them are stale.

Transport split

USE_GRPC=true is the compose default (aiboxconfig.MustGet().Transport.UseGRPC). The gateway uses gRPC for routes whose proto contracts are full-fidelity and falls back to HTTP per-route when richer fields are needed. Per services/gateway/internal/proxy/proxy.go:

| Area | Default route | Why |
|---|---|---|
| POST /v1/chat, POST /v1/chat/stream | HTTP passthrough | gRPC ChatRequest carries only one message string; HTTP supports multimodal content parts and SSE. |
| POST /v1/memory (store) | gRPC | MemoryService.Store has full-fidelity proto. |
| POST /v1/memory/search | HTTP | gRPC proto drops created_at needed by the admin UI. |
| GET /v1/memory, PUT/DELETE | HTTP | Detail fields not on the proto. |
| POST /v1/guard/input, POST /v1/guard/output | gRPC | GuardrailService.CheckInput / CheckOutput are full-fidelity in proto. Other guardrail routes such as deidentify and policy stay HTTP. |
| /v1/knowledge/search | HTTP | Proto omits scoring/rationale fields. |
| POST /v1/knowledge/wiki, GET /v1/knowledge/wiki/{topic} | gRPC | KnowledgeService.WikiWrite / KnowledgeService.WikiRead. Wiki list/delete and item-admin routes stay REST. |
| /v1/audit append | gRPC | AuditService.Log. |
| /v1/admin/audit/*, /v1/receipts/* | HTTP | Receipts and proofs require richer JSON. |
| /v1/models, /v1/routes | Always HTTP | Inference router has no gRPC surface. |
| /v1/sandbox/* | HTTP (prefix stripped) | Reverse-proxy with http.StripPrefix("/v1/sandbox", ...). |

The gRPC ports are HTTP_port + 1000 (httpToGrpcAddr in proxy.go).
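
The port rule can be illustrated with a minimal Python sketch. It is not a port of the Go helper: the function name, the accepted input forms (bare host:port or a URL with a scheme), and the error behaviour are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def http_to_grpc_addr(http_addr: str, offset: int = 1000) -> str:
    """Derive a gRPC host:port from an HTTP address by adding `offset` to the port.

    Hypothetical helper: accepts "memory:8003" or "http://memory:8003".
    """
    # Prefix "//" so urlsplit treats a bare host:port as a network location.
    parts = urlsplit(http_addr if "://" in http_addr else "//" + http_addr)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected host:port, got {http_addr!r}")
    return f"{parts.hostname}:{parts.port + offset}"


print(http_to_grpc_addr("memory:8003"))
print(http_to_grpc_addr("http://audit:8008"))
```

Under this rule the Memory service at 8003 maps to 9003 and Audit at 8008 maps to 9008, matching the Core services table.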

Identity Layer

The gateway is the sole identity authority. It validates the inbound JWT against Keycloak (aibox realm) and any additional_issuers (e.g. egauth), then strips and re-stamps trusted identity headers (services/gateway/internal/middleware/auth.go, header names live in services/shared-identity/aibox_identity/):

  • X-Tenant-ID — pinned to tenancy.single_tenant_id; the gateway ignores the inbound JWT tenant claim and rejects a mismatched inbound X-Tenant-ID with 403.
  • X-User-ID, X-User-Email, X-User-Roles — from verified JWT claims.
  • X-Aibox-Principal — HMAC-signed canonical Principal (services/shared-identity/aibox_identity/principal.py), signed with AIBOX_PRINCIPAL_KEYS.
  • X-Aibox-Turn-Id, X-Aibox-Cap-Token — per-turn context, signed with AIBOX_CAPTOKEN_KEYS (services/shared-identity/aibox_identity/turnctx.py, gRPC metadata keys aibox-turn-id / aibox-cap-token).
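
The strip-and-re-stamp step above can be sketched as follows. This is an illustration, not the gateway's implementation: the canonical form (sorted-key compact JSON), the token layout (payload.mac, base64url), the claim names sub/email/roles, and every function name here are assumptions. The real definitions live in services/shared-identity/aibox_identity/.

```python
import base64
import hashlib
import hmac
import json

# Headers a client must never be able to set directly (lower-cased for matching).
TRUSTED_HEADERS = {
    "x-tenant-id", "x-user-id", "x-user-email", "x-user-roles",
    "x-aibox-principal", "x-aibox-turn-id", "x-aibox-cap-token",
}


class TenantMismatch(Exception):
    """Stands in for the HTTP 403 returned on a mismatched X-Tenant-ID."""


def sign_principal(principal: dict, key: bytes) -> str:
    # Assumed canonical form: compact JSON with sorted keys.
    payload = json.dumps(principal, sort_keys=True, separators=(",", ":")).encode()
    mac = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(mac).decode())


def verify_principal(token: str, key: bytes) -> dict:
    payload_b64, mac_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many MAC bytes matched.
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(mac_b64)):
        raise ValueError("bad principal signature")
    return json.loads(payload)


def restamp(inbound: dict, claims: dict, tenant_id: str, key: bytes) -> dict:
    """Drop spoofable identity headers, then stamp values from verified claims."""
    for name, value in inbound.items():
        if name.lower() == "x-tenant-id" and value != tenant_id:
            raise TenantMismatch(value)
    out = {k: v for k, v in inbound.items() if k.lower() not in TRUSTED_HEADERS}
    roles = sorted(claims.get("roles", []))
    principal = {"tenant_id": tenant_id, "user_id": claims["sub"],
                 "email": claims.get("email", ""), "roles": roles}
    out.update({
        "X-Tenant-ID": tenant_id,  # pinned; the JWT tenant claim is ignored
        "X-User-ID": principal["user_id"],
        "X-User-Email": principal["email"],
        "X-User-Roles": ",".join(roles),
        "X-Aibox-Principal": sign_principal(principal, key),
    })
    return out
```

A downstream service would call verify_principal with the shared key and trust only the decoded result, not the plain X-User-* headers.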

Internal service-to-service calls swap the user bearer for a short-lived Keycloak service JWT (INTERNAL_AUTH_CLIENT_ID, audience aibox-internal). Inference-router's /v1/models and /v1/routes are HTTP-only, but they use the same gateway reverse-proxy token-swap path.

When auth is disabled (auth.enabled=false plus AIBOX_ALLOW_AUTH_DISABLED=true, dev only), the gateway synthesizes an anonymous principal. Production launch refuses to start without auth enabled, and refuses to start with auth.password.enabled but no INTERNAL_AUTH_CLIENT_SECRET (so mobile-push 2FA cannot silently downgrade).
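
The startup rules above can be read as a small validation function. The sketch below is a hypothetical restatement in Python: the function name, its parameters, and the error strings are invented for illustration, and rejecting auth.enabled=false without the allow flag outside production is an assumption drawn from the "plus" in the paragraph above.

```python
from typing import List


def startup_errors(env: str, auth_enabled: bool, allow_auth_disabled: bool,
                   password_enabled: bool, internal_client_secret: str) -> List[str]:
    """Return the reasons a launch should be refused (empty list means OK)."""
    errors = []
    production = env == "production"
    if not auth_enabled:
        if production:
            errors.append("production requires auth.enabled=true")
        elif not allow_auth_disabled:
            # Assumed: disabling auth needs both switches, even in dev.
            errors.append("auth.enabled=false also needs AIBOX_ALLOW_AUTH_DISABLED=true")
    if production and password_enabled and not internal_client_secret:
        errors.append("auth.password.enabled requires INTERNAL_AUTH_CLIENT_SECRET")
    return errors


print(startup_errors("production", True, False, True, ""))
```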

Turn events and receipts

Every authenticated /v1/* request gets a fresh turn_id. Services emit TurnEvent records (proto/aibox/v1/turn_events.proto) to audit over gRPC (TurnEventService.Publish). Event types emitted today include turn_started, prompt_generated, model_invoked, model_response, rag_chunks_retrieved, memory_op, tool_called, tool_returned, guardrail_verdict, turn_failed, and turn_sealed. cap_token_issued / cap_token_rejected exist in the proto but are not emitted by the gateway today.

Audit aggregates the stream, computes a Merkle root, signs the envelope, and surfaces receipts:

  • GET /v1/receipts/{turn_id} — receipt summary (gateway-proxied).
  • GET /v1/receipts/{turn_id}/proof — full Merkle proof + signature.
  • GET /v1/audit/turns/{turn_id}/replay?tenant_id=... — forensic replay detail.
  • GET /v1/audit/turns/{turn_id}/artifacts/{event_id}/{kind}?tenant_id=... — captured artifact body.
  • POST /v1/audit/turns/{turn_id}/replay/live — live replay carved out to agent-runtime in proxy.go.

Verification checks the Merkle root, envelope signature, per-tenant chain anchor, and exported suffix. See Compliance Audit Trail.

Data stores

| Store | Used by | Notes |
|---|---|---|
| PostgreSQL | Audit, observability, Keycloak, inference-router provider + role registry, agent-runtime metadata | Per-service schema. |
| Redis | Sessions, gateway rate limiter, MFA pending bundle, MCP secrets cache | REDIS_URL gates Redis-backed limiter. |
| Qdrant | Memory vectors (COLLECTION_PREFIX=mem0), knowledge dense+sparse vectors | Dim mismatch triggers recreate. |
| MinIO | Wiki Markdown, document originals, artifacts | Object store. |

Inference router

The router persists provider records in PostgreSQL (INFERENCE_DATABASE_URL) and serves a lock-free atomic snapshot refreshed every 30s. Two adapter kinds are wired today (services/inference-router/internal/providers/):

  • chat_completions — pass-through OpenAI-style Chat Completions.
  • responses — Responses-API adapter that translates back to Chat Completions on the wire.

Operator CRUD lives at /v1/internal/providers and is exposed through the gateway at /v1/admin/inference/* behind admin guard. Registration normalizes the operator-supplied base_url (whitespace/trailing-slash cleanup, https default for a missing scheme, pasted endpoint suffixes stripped) and probes GET {base_url}/models before persisting. Adapter clients never follow upstream redirects: a redirecting base_url fails registration (and the data path) with an error naming the redirect target, because following the redirect would re-issue generation POSTs as body-less GETs.

Multi-vLLM deployments use the vllm-router overlay (submodules/aibox-vllm/compose/compose.gpu-multi.yml), which front-ends multiple vLLM containers behind a single OpenAI-compatible base URL.
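
The base_url normalization and redirect handling described above might look like the following sketch. The suffix list, the function names, and the error wording are assumptions for illustration; the router's real rules live in the Go service and may differ.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical examples of endpoint paths an operator might paste by mistake.
ENDPOINT_SUFFIXES = ("/chat/completions", "/responses", "/models")


def normalize_base_url(raw: str) -> str:
    url = raw.strip()
    if "://" not in url:
        url = "https://" + url  # https is the default for a missing scheme
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    for suffix in ENDPOINT_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)].rstrip("/")
            break
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def check_probe(status: int, location: Optional[str]) -> None:
    """Reject a redirecting GET {base_url}/models response instead of following it."""
    if 300 <= status < 400:
        raise ValueError(f"base_url redirects to {location}; redirects are not followed")


print(normalize_base_url("  api.example.com/v1/chat/completions/ "))
```

Failing on the redirect, rather than following it, keeps a later generation POST from being silently downgraded to a GET without a body.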

Model selection is runtime, not deploy-time. The router persists a model_roles table mapping canonical role aliases (default, title, classifier, memory, profile_curation, vision, guardrail, knowledge) to concrete models. LLM consumers send a role alias and the proxy expands role → model at request time; the per-model name knobs were removed from profile config. Roles are managed via /v1/admin/inference/roles and the Inference admin workspace. On first boot, a reachability-aware bootstrap seeds roles to the bundled local vLLM model if its backend is healthy, otherwise to an external provider that has a key; if neither is available, it leaves the roles for the admin to configure.
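
A minimal sketch of role expansion and the first-boot seeding order follows. The function names, the in-memory dict standing in for the model_roles table, and the pass-through of non-role model names are assumptions; only the role aliases and the seeding precedence come from the text above.

```python
from typing import Dict, List, Optional, Tuple

CANONICAL_ROLES = ("default", "title", "classifier", "memory",
                   "profile_curation", "vision", "guardrail", "knowledge")


class RoleNotConfigured(LookupError):
    code = "role_not_configured"


def expand_role(requested: str, roles: Dict[str, str]) -> str:
    """Map a role alias to its concrete model at request time."""
    if requested not in CANONICAL_ROLES:
        return requested  # assumed: concrete model names pass through unchanged
    model = roles.get(requested)
    if not model:
        raise RoleNotConfigured(requested)
    return model


def bootstrap_roles(local_model: Optional[str], local_healthy: bool,
                    external: List[Tuple[str, bool]]) -> Dict[str, str]:
    """Seed every role on first boot; `external` is a list of (model, has_key)."""
    if local_model and local_healthy:
        target = local_model
    else:
        target = next((model for model, has_key in external if has_key), None)
    if target is None:
        return {}  # nothing reachable: leave roles for the admin
    return {role: target for role in CANONICAL_ROLES}


roles = bootstrap_roles("local-vllm-model", False, [("ext-a", False), ("ext-b", True)])
print(expand_role("title", roles))
```

Because consumers send only the alias, an admin can repoint a role to a different model without redeploying any consumer.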

User-facing inference errors carry a stable {"error","code"} envelope (codes such as provider_key_missing, role_not_configured, role_no_backend, model_unknown, backend_unavailable); agent-runtime forwards the code in the SSE error event so the chat UI renders audience-aware copy.
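
As an illustration of the envelope and its hand-off to the chat UI, the sketch below builds an {"error","code"} object and frames it as a Server-Sent Events message. The helper names, the message text, and the exact SSE payload shape are assumptions; only the two envelope keys and the code values come from the paragraph above.

```python
import json


def error_envelope(code: str, message: str) -> dict:
    # "error" carries human-readable text; "code" is the stable machine key.
    return {"error": message, "code": code}


def sse_error_event(envelope: dict) -> str:
    """Frame the envelope as an SSE message with event name "error"."""
    data = json.dumps(envelope, separators=(",", ":"))
    return "event: error\n" + "data: " + data + "\n\n"


print(sse_error_event(error_envelope("role_not_configured",
                                     "No model is assigned to this role.")))
```

A client would switch on code rather than on the message text, so the wording can change without breaking the UI.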

Deployment

AIBox deploys two ways: locally via the make compose targets below, and through CI, using the compose Deploy aibox workflow (deploy.yml) and the kube2x dev k8s workflow (deploy-k8s-dev.yml). Common make targets:

| Command | Posture | GPU axis | Use |
|---|---|---|---|
| make up | dev | off | Full dev stack. |
| make up-lite | dev-lite | n/a | ~5 GB RAM laptop stack, 6 stubs, no Qdrant/MinIO/SearXNG/Steel/egauth/docs-site. |
| make up GPU=single | dev | single | Single local vLLM. |
| make up GPU=multi | dev | multi | Gemma + Qwen vLLM behind the multi-vLLM router. |
| make up-prod | prod | off | Production-mode compose (AIBOX_ENV=production). |
| make up-prod GPU=single | prod | single | Production compose with a local vLLM (also GPU=multi). |

Docs assume gateway-routed APIs unless a route is explicitly called out as a dev-only direct service port.

Network egress

Outbound internet traffic can be funneled through a single chokepoint: the optional egress-gateway, a hardened Squid forward proxy that follows the same pattern as the existing docker-socket proxies (cap_drop: ALL, no-new-privileges, tmpfs, memory limit, plus CAP_SETUID/CAP_SETGID so Squid can drop to its unprivileged user). It allows traffic by domain only — TLS SNI / HTTP CONNECT host, with no TLS interception, so payloads stay end-to-end encrypted to the real provider — and logs every allow/deny verdict.

It is off by default and opt-in per deploy. AIBOX_COMPOSE_EGRESS=1 (or the --egress deploy flag) stacks deploy/docker-compose.egress.yml, a single overlay that adds the gateway, sets HTTP(S)_PROXY on every egressing service, and starts the egress-shipper. The shipper tails the Squid access log and ships each allow/deny verdict into the signed audit log, where it is surfaced in the admin Activity → Network egress tab. The allowlist source of truth is deploy/config/egress/allowlist.yaml, rendered to a Squid config by deploy/scripts/render-squid-conf.py. Two policies ship:

  • observe (default) — deploy/config/egress/squid.observe.conf: log-only, allows every destination, so enabling the gateway cannot break outbound traffic even with an incomplete allowlist.
  • enforce (EGRESS_ENFORCE=1 / --egress-enforce) — default-deny against the allowlist (deploy/config/egress/squid.conf).
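
The difference between the two policies can be sketched as a pure decision function. This models the behaviour only; the real decision is made by Squid from the rendered config. The allowlist entry format (a leading dot meaning "this domain and its subdomains", as in Squid's dstdomain ACLs), the function names, and the verdict fields are assumptions for illustration.

```python
from typing import Dict, List


def host_listed(host: str, allowlist: List[str]) -> bool:
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("."):
            # ".example.com" covers example.com and any subdomain of it.
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


def verdict(host: str, allowlist: List[str], enforce: bool) -> Dict[str, object]:
    listed = host_listed(host, allowlist)
    # observe: log-only, every destination is allowed.
    # enforce: default-deny unless the host is on the allowlist.
    action = "allow" if (listed or not enforce) else "deny"
    return {"host": host, "policy": "enforce" if enforce else "observe",
            "listed": listed, "action": action}


allow = [".example.com", "models.example.net"]
print(verdict("api.example.com", allow, enforce=True))
print(verdict("unknown.example.org", allow, enforce=False))
```

Recording whether the host was listed even in observe mode shows which destinations would be denied before enforcement is switched on.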

The forced network flip (deploy/docker-compose.egress-forced.yml, which sets internal: true on the app network so a service that ignores its proxy env has no route out) is a separate later step and is not part of the toggle. See Compose modes.

Codex microVM backend

Codex chat turns normally run in-process inside agent-runtime. They can optionally run inside a per-turn Firecracker microVM via the deploy/docker-compose.codex-firecracker.yml overlay, opt-in per deploy with AIBOX_COMPOSE_CODEX_FIRECRACKER=1 (or aibox-ctl deploy --codex-firecracker / the codex_firecracker overlay key). The overlay sets AIBOX_CODEX_FIRECRACKER=1, passes /dev/kvm through to agent-runtime, and bind-mounts the provisioned artifacts at /opt/aibox-fc. It is off by default and safe to enable. Backend selection in services/agent-runtime/src/agent_runtime/codex_backend.py runs a preflight + /dev/kvm check and falls back to the in-process path on an unprovisioned host; a runtime guest boot or transport failure also falls back in-process for that turn, so a microVM failure does not break chat.
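
The two fallback points described above can be sketched as follows. This is not the code in codex_backend.py: the function names, the backend labels, the injected exists callable, and the use of RuntimeError for guest failures are assumptions made so the example is self-contained.

```python
import os
from typing import Callable, Mapping


def select_backend(env: Mapping[str, str],
                   exists: Callable[[str], bool] = os.path.exists,
                   kvm_path: str = "/dev/kvm",
                   artifacts_dir: str = "/opt/aibox-fc") -> str:
    """Preflight: use the microVM only when enabled and the host is provisioned."""
    if env.get("AIBOX_CODEX_FIRECRACKER") != "1":
        return "in_process"
    if not exists(kvm_path) or not exists(artifacts_dir):
        return "in_process"  # unprovisioned host
    return "firecracker"


def run_turn(backend: str, run_in_vm: Callable[[], str],
             run_in_process: Callable[[], str]) -> str:
    """Runtime: a guest boot or transport failure falls back for this turn only."""
    if backend == "firecracker":
        try:
            return run_in_vm()
        except RuntimeError:
            return run_in_process()
    return run_in_process()


backend = select_backend({"AIBOX_CODEX_FIRECRACKER": "1"}, exists=lambda p: False)
print(backend)
```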


Verified against commit 0d6ee337 (2026-06-18) · sources 83bc72759a4b.