
Roadmap & ideas

A living view of where AIBox is heading. Items are anchored to actual code — either a recent commit that started the work, or a specific gap visible in the repo at commit a187d80f (2026-05-28).

Status keys:

  • In progress — code already on main, more work planned.
  • Planned — committed-to, design exists, no code yet.
  • Proposed — idea on the table; not committed-to.

Active workstreams

Pluggable inference providers (admin UI)

Status: In progress.

Anchor: commits 337765bf through b1b595e8 (Nov 2025–May 2026) added a Postgres-backed registry for inference providers, envelope encryption of API keys (499d58c9), admin CRUD endpoints, a one-shot seed from the legacy YAML, a 30-second refresh loop, and a React admin surface. Gateway proxies /v1/admin/inference/* to the inference-router (c4ca0a31).

What's left:

  • Provider-level health surfacing on the admin UI (latency, error rate).
  • Per-tenant provider scoping (today every provider is global).
  • Bring-your-own private endpoint support beyond OpenAI-compatible (Bedrock, Vertex native APIs).
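The 30-second refresh loop mentioned in the anchor can be pictured roughly as follows. This is a minimal sketch, not the inference-router's actual code: ProviderCache, the load callable and the row shape are hypothetical names chosen for illustration, and a real loader would read the Postgres registry.

```python
import threading
from typing import Callable, Dict, Optional


class ProviderCache:
    """In-memory snapshot of the provider registry, refreshed on a timer.

    Hypothetical illustration only; `load` stands in for a registry query.
    """

    def __init__(self, load: Callable[[], Dict[str, dict]], interval_s: float = 30.0):
        self._load = load
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._providers: Dict[str, dict] = {}
        self._stop = threading.Event()

    def refresh_once(self) -> None:
        # Load outside the lock, then swap the whole snapshot atomically.
        fresh = self._load()
        with self._lock:
            self._providers = fresh

    def get(self, name: str) -> Optional[dict]:
        with self._lock:
            return self._providers.get(name)

    def run(self) -> None:
        # Intended to run in a background thread until stop() is called.
        while not self._stop.is_set():
            try:
                self.refresh_once()
            except Exception:
                # Keep serving the last good snapshot if a reload fails.
                pass
            self._stop.wait(self._interval_s)

    def stop(self) -> None:
        self._stop.set()
```

With this shape, a provider added through the admin endpoints becomes visible to request handling after at most one refresh interval, and a failed reload leaves the previous snapshot in place.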

Scalability + multi-instance

Status: In progress.

Anchor: feat/scalability branch merged at a187d80f (PR #47), preceded by 8438da5b ("initial modifications for replica creation") and 3dce3d89 ("fix worker leak and recalibrate ram limits").

What's left:

  • HPA wiring is present in the Helm chart values, but the stateless services still need a verified soak test under load (make load-test is the harness).
  • Stateful HA (Postgres, Redis, Qdrant, MinIO) is out of scope for the chart today and should be delegated to dedicated operators.
  • Multi-instance session affinity for the SSE/WebSocket paths still routes via Redis pub/sub; this behaviour needs explicit documentation.

Multi-vLLM router

Status: Shipped (commit history under submodules/aibox-vllm/compose/compose.gpu-multi.yml).

See Multi-vLLM router.

Backlog:

  • Multi-GPU topologies (the defaults are currently tuned for a single card).
  • Vision routed through the same OpenResty layer (today vision is a separate vllm-vision container, not behind the router).

Mobile push 2FA

Status: In progress.

Anchor: 14a5dffa ("fix(2fa): harden enforcement — fail-closed prod, 6-digit codes, start cooldown, redis-backed stores") and the merged PR #39 (b4a0c3ec, "Merge pull request #39 from egroup-labs/feat/2fa-egauth"), plus 5f1919b1 ("fix(2fa): wire gateway MFA branch + mount SECRETS_MASTER_KEY into egauth").

Proposed next:

  • WebAuthn / passkey path alongside TOTP+push.
  • Step-up enforcement on admin-tier endpoints.

Egauth / auth split

Status: Shipped (Phases A–E).

Anchor: c79524c0 ("auth service + egauth /verify seam (Phases A-D)") through 61dbd3d2 ("egauth slimdown — Phase E of the egauth/auth split").

Backlog:

  • Document the auth/egauth boundary in the services/auth/ and services/egauth/ READMEs; today it is described only in inline comments.
  • Migrate non-NTLM identity flows that still live in egauth.

Dev-lite mode

Status: Shipped.

Anchor: 6727f3dd added the dev-lite deployment; b89dfc11 ("fix hot reload for dev-lite") fixed hot reload; the contract is pinned by tests/smoke/test_compose_dev_lite.py. See Compose modes.

Proposed:

  • Document the stub contract for each aibox-stub role.
  • Stub mode for the vllm service so dev-lite GPU users can dry-run without burning CUDA memory.

Near-term planned

Helm chart → parity with Compose

Status: Planned.

Anchor: helm/aibox/Chart.yaml is 0.2.0 — the core slice (keycloak + init Jobs, gateway, agent-runtime + migrations, inference-router, guardrail, frontend, postgresql, redis) ships with the full internal-auth env surface and deploys to kube2x via .github/workflows/deploy-k8s-dev.yml. helm/README.md enumerates what is still missing: memory, knowledge, audit, code-sandbox, keystore, observability, firecrawl, egauth/auth, MCPs, vllm, backup/retention CronJobs, NetworkPolicies.

Concrete next steps:

  1. Port the remaining services as Deployments / StatefulSets / Jobs, keeping the two chart invariants (one secret at /run/secrets, compose-named Services).
  2. Bake built-in skills / agent definitions into the agent-runtime image (compose bind-mounts them; the chart currently mounts empty dirs).
  3. Re-add the backup / retention CronJobs dropped from the alpha chart.

Audit chain retention via Postgres partitioning

Status: Planned.

Anchor: row deletion breaks the audit hash chain, so retention tooling deliberately excludes audit tables; see the audit service schema (services/audit/src/audit_service/db.py) and the "Future work" section of the audit compliance docs. Retention for audit data therefore requires Postgres partitioning, so that old records can be retired a whole partition at a time instead of row by row.
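To make the constraint concrete, here is a minimal sketch of a hash chain and of why deleting a single row invalidates it. The row layout, the SHA-256 construction and the function names are assumptions for illustration; the real schema lives in services/audit/src/audit_service/db.py and may differ. The anchor argument shows one way a retired prefix could stay verifiable (by recording the last hash of the dropped range), which is likewise an assumption and not a documented design.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # hypothetical starting value for the first row


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: List[dict], payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)}
    )


def verify(chain: List[dict], anchor: str = GENESIS) -> bool:
    # Each row must point at its predecessor and carry a matching hash.
    prev = anchor
    for row in chain:
        if row["prev_hash"] != prev:
            return False
        if row["hash"] != entry_hash(prev, row["payload"]):
            return False
        prev = row["hash"]
    return True


chain: List[dict] = []
for i in range(4):
    append(chain, {"event": i})

intact = verify(chain)                                  # full chain
row_deleted = verify(chain[:1] + chain[2:])             # one middle row removed
prefix_dropped = verify(chain[2:], anchor=chain[1]["hash"])  # oldest range retired
```

In this sketch the intact chain verifies, removing a middle row fails verification, and dropping the oldest contiguous range still verifies as long as the hash of the last dropped row is kept as the new anchor.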


Backup parity across stateful stores

Status: Planned.

Anchor: only Postgres has scripts/backup.sh coverage; Qdrant + MinIO require manual qdrant-snapshot / mc mirror. The alpha chart's backup CronJobs were removed in the 0.2.0 core-slice rebuild and have not returned yet (helm/README.md gap list).

Planned:

  • Re-add chart backup CronJobs (Postgres dump + opt-in Qdrant/MinIO S3 replication) in a parity pass.
  • Restore drill harness (scripts/restore.sh exists but isn't exercised by CI).

Vision routing inside the multi-vLLM router

Status: Planned.

Anchor: Today make up GPU=vision brings up vllm-vision (Qwen3-VL) as a sibling of the text vLLM. The multi-vLLM router (compose.gpu-multi.yml) does not front it. Folding vision into the same OpenResty layer would let llm.vision_model = qwen3-vl resolve through the same inference-router → router → vllm-vision chain.


Research / longer horizon

Self-improving skills

Source: NousResearch/hermes-agent.

After an agent completes a complex multi-step task, auto-extract a reusable skill from the trajectory. Our skills.py loads YAML/markdown skill files today; the extraction pipeline is what's missing.

Effort: Medium. Priority: High — directly compounds capability.


Session search

Full-text search across historical conversations within a tenant.

What we have: Sessions in Redis with 7-day TTL; archives written to session:{tenant}:{id}:archive during compaction but never indexed.

Effort: Medium (Postgres FTS or Qdrant embeddings + a tool). Priority: Medium.


Effort-adaptive model routing

Anchor: EffortLevel enum already exists on ChatRequest from the agent-runtime modernization, and the inference-router now supports pluggable providers (see workstreams above). What's missing is the wiring from effort → model selection.

Effort: Small. Priority: High once the admin UI for provider selection is settled.
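The missing wiring could look something like the sketch below. Everything here is hypothetical: the EffortLevel members, the model names and select_model are placeholders, since the source only states that an EffortLevel enum exists on ChatRequest. The fallback rule (step down to a cheaper tier when the preferred model is unavailable) is one possible policy among several.

```python
from enum import Enum
from typing import Mapping, Set


class EffortLevel(str, Enum):
    # Hypothetical members; the real enum on ChatRequest may differ.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Tiers ordered from cheapest to most capable.
_ORDER = [EffortLevel.LOW, EffortLevel.MEDIUM, EffortLevel.HIGH]


def select_model(
    effort: EffortLevel,
    routes: Mapping[EffortLevel, str],
    available: Set[str],
) -> str:
    """Pick the model for `effort`, stepping down a tier if it is unavailable."""
    idx = _ORDER.index(effort)
    for level in reversed(_ORDER[: idx + 1]):
        model = routes.get(level)
        if model in available:
            return model
    raise LookupError(f"no available model for effort={effort.value}")


# Placeholder model names, not real provider identifiers.
ROUTES = {
    EffortLevel.LOW: "small-model",
    EffortLevel.MEDIUM: "medium-model",
    EffortLevel.HIGH: "large-model",
}
```

In a deployment the routes mapping would presumably come from the provider registry instead of a constant, which is why the priority note ties this item to the admin UI settling first.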


Multi-channel gateway

Slack / email / Telegram / Discord adapters that normalise to our ChatRequest format.

What we have: Direct /webhook on agent-runtime. No gateway-level /v1/webhook, no channel adapters, no cross-channel session linking.

Effort: Large (each adapter is its own integration). Priority: Medium.
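As an illustration of what "normalise to our ChatRequest format" might mean, the sketch below maps two simplified inbound payloads onto one shape. The ChatRequest dataclass here is a hypothetical stand-in with made-up fields, not the real agent-runtime model, and the payloads are reduced to the few fields used (a Slack message event and a Telegram update carry many more).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ChatRequest:
    # Hypothetical stand-in; the real ChatRequest has its own fields.
    channel: str
    user_id: str
    session_key: str
    message: str


def from_slack(payload: dict) -> ChatRequest:
    # Simplified Slack Events API message event.
    event = payload["event"]
    return ChatRequest(
        channel="slack",
        user_id=event["user"],
        session_key=f"slack:{event['channel']}",
        message=event["text"],
    )


def from_telegram(update: dict) -> ChatRequest:
    # Simplified Telegram Bot API update carrying a text message.
    msg = update["message"]
    return ChatRequest(
        channel="telegram",
        user_id=str(msg["from"]["id"]),
        session_key=f"telegram:{msg['chat']['id']}",
        message=msg["text"],
    )


ADAPTERS: Dict[str, Callable[[dict], ChatRequest]] = {
    "slack": from_slack,
    "telegram": from_telegram,
}


def normalise(channel: str, payload: dict) -> ChatRequest:
    try:
        adapter = ADAPTERS[channel]
    except KeyError:
        raise ValueError(f"unsupported channel: {channel}") from None
    return adapter(payload)
```

Prefixing the session key with the channel name keeps sessions from different channels apart; cross-channel session linking, listed above as missing, would need an explicit identity mapping on top of this.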


Checkpoint / resume on failure

Source: LangGraph.

Persist agent execution state per tool call so a workflow can resume from the last successful step. Deep integration with the OpenAI Agents SDK runner is required.

Effort: Large. Priority: Low — most tasks complete in seconds.
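The core idea (persist state after each step, skip completed steps on the next run) can be sketched independently of any agent framework. This is a toy illustration with hypothetical names and a JSON file as the store; it does not use or reflect the OpenAI Agents SDK runner, where the hard part of the integration actually lies.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple

# A step is a name plus a function that receives the results so far.
Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


def run_with_checkpoints(steps: List[Step], checkpoint: Path) -> Dict[str, Any]:
    """Run steps in order, persisting state after each successful one.

    Results must be JSON-serialisable because the checkpoint is a JSON file.
    """
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"done": [], "results": {}}

    for name, fn in steps:
        if name in state["done"]:
            continue  # completed in an earlier run; do not repeat it
        state["results"][name] = fn(state["results"])
        state["done"].append(name)
        # Written only after the step succeeds, so a crash resumes here.
        checkpoint.write_text(json.dumps(state))

    return state["results"]
```

If a step raises, the exception propagates and the checkpoint still holds every step that finished before it, so calling the function again with the same path resumes from the failed step.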


Agent-to-agent shared memory

Source: OpenClaw.

A session-scoped working memory the main agent and its delegated subagents read/write, so subagents don't have to receive the full conversation history.

Effort: Medium. Priority: Medium — addresses a known context-bloat weakness.


Training data pipeline

Source: NousResearch/hermes-agent.

Export structured trajectories from first-party observability + audit into JSONL for fine-tuning local models. Quality-filter on whether the user corrected the agent.

Effort: Medium. Priority: High for sovereignty — closes the loop between running and improving local models.
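A rough sketch of the export step, under stated assumptions: the trajectory shape (a "turns" list with role, content and an optional is_correction flag) and the messages-style output record are hypothetical, and the filter simply drops any trajectory in which the user corrected the agent, which is only one reading of the quality filter described above.

```python
import json
from typing import Iterable


def user_corrected(trajectory: dict) -> bool:
    # Assumes an upstream step has already flagged correction turns.
    return any(
        turn.get("role") == "user" and turn.get("is_correction", False)
        for turn in trajectory["turns"]
    )


def to_jsonl(trajectories: Iterable[dict]) -> str:
    """Serialise uncorrected trajectories, one JSON object per line."""
    lines = []
    for traj in trajectories:
        if user_corrected(traj):
            continue  # quality filter: skip runs the user had to correct
        messages = [
            {"role": turn["role"], "content": turn["content"]}
            for turn in traj["turns"]
        ]
        lines.append(json.dumps({"messages": messages}, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)
```

Detecting corrections reliably is the harder problem and is left out here; the flag could come from explicit user feedback or from a classifier over the audit trail.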


Heartbeat daemon

Source: OpenClaw.

Background process that watches data sources (email, calendar, RSS) and proactively creates ChatRequests without user prompting.

Effort: Large; depends on the multi-channel gateway. Priority: Low.


Summary

Item | Status | Effort | Priority
Pluggable inference providers admin UI | In progress | Small (remaining) | High
Scalability + multi-instance | In progress | Medium | High
Multi-vLLM router | Shipped | - | -
Mobile push 2FA | In progress | Small (remaining) | High
egauth / auth split | Shipped | - | -
Dev-lite mode | Shipped | - | -
Helm → parity with Compose | Planned | Large | High
Audit chain retention via partitioning | Planned | Medium | Medium
Backup parity (Qdrant, MinIO) on compose | Planned | Medium | Medium
Vision through multi-vLLM router | Planned | Small | Medium
Self-improving skills | Proposed | Medium | High
Effort-adaptive routing | Proposed | Small | High
Training data pipeline | Proposed | Medium | High
Session search | Proposed | Medium | Medium
Shared agent memory | Proposed | Medium | Medium
Multi-channel gateway | Proposed | Large | Medium
Checkpoint / resume | Proposed | Large | Low
Heartbeat daemon | Proposed | Large | Low

Verified against commit f862a4f8 (2026-06-16) · sources fb31bc72ece3.