Roadmap & ideas
A living view of where AIBox is heading. Items are anchored to
actual code — either a recent commit that started the work, or a
specific gap visible in the repo at commit a187d80f (2026-05-28).
Status keys:
- In progress: code already on main, more work planned.
- Planned: committed-to, design exists, no code yet.
- Proposed: idea on the table; not committed-to.
Active workstreams
Pluggable inference providers (admin UI)
Status: In progress.
Anchor: commits 337765bf through b1b595e8 (Nov 2025–May 2026)
added a Postgres-backed registry for inference providers, envelope
encryption of API keys (499d58c9), admin CRUD endpoints, a one-shot
seed from the legacy YAML, a 30-second refresh loop, and a React admin
surface. Gateway proxies /v1/admin/inference/* to the inference-router
(c4ca0a31).
What's left:
- Provider-level health surfacing on the admin UI (latency, error rate).
- Per-tenant provider scoping (today every provider is global).
- Bring-your-own private endpoint support beyond OpenAI-compatible (Bedrock, Vertex native APIs).
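The 30-second refresh mentioned in the anchor can be pictured as a cached snapshot of the registry that is re-read once it goes stale. The following is a minimal sketch under stated assumptions, not the inference-router's code: `ProviderCache`, `load_providers`, and the injected `clock` are hypothetical names, the loader stands in for a Postgres query, and the refresh here is lazy (checked on read) rather than a background loop.

```python
import time
from typing import Callable, Dict, Optional

REFRESH_SECONDS = 30.0  # interval taken from the roadmap text


class ProviderCache:
    """Hypothetical in-memory snapshot of the provider registry."""

    def __init__(self, load_providers: Callable[[], Dict[str, dict]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_providers   # stand-in for a Postgres query
        self._clock = clock           # injectable so tests need not sleep
        self._snapshot: Dict[str, dict] = {}
        self._loaded_at: Optional[float] = None

    def get(self, name: str) -> Optional[dict]:
        """Return a provider, reloading the snapshot first if it is stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= REFRESH_SECONDS:
            self._snapshot = dict(self._load())  # copy so later edits do not leak in
            self._loaded_at = now
        return self._snapshot.get(name)
```

A consequence of this shape is that a provider added through the admin endpoints may take up to one refresh interval to become visible to a given replica, which matters once several instances are running.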
Scalability + multi-instance
Status: In progress.
Anchor: feat/scalability branch merged at a187d80f (PR #47),
preceded by 8438da5b initial modifications for replica creation and
3dce3d89 fix worker leak and recalibrate ram limits.
What's left:
- HPA wiring is present in the Helm chart values, but the stateless services still need a verified soak under load (make load-test is the harness).
- Stateful HA (Postgres, Redis, Qdrant, MinIO) is out of scope of the chart today and should be delegated to dedicated operators.
- Multi-instance session affinity for the SSE/WebSocket paths still routes via Redis pub/sub; needs explicit documentation.
Multi-vLLM router
Status: Shipped (commit history under submodules/aibox-vllm/compose/compose.gpu-multi.yml).
See Multi-vLLM router.
Backlog:
- Multi-GPU topologies (currently single-card-tuned defaults).
- Vision routed through the same OpenResty layer (today vision is a separate vllm-vision container, not behind the router).
Mobile push 2FA
Status: In progress.
Anchor: 14a5dffa fix(2fa): harden enforcement — fail-closed prod, 6-digit codes, start cooldown, redis-backed stores and the merged PR #39
(b4a0c3ec Merge pull request #39 from egroup-labs/feat/2fa-egauth),
plus 5f1919b1 fix(2fa): wire gateway MFA branch + mount SECRETS_MASTER_KEY into egauth.
Proposed next:
- WebAuthn / passkey path alongside TOTP+push.
- Step-up enforcement on admin-tier endpoints.
Egauth / auth split
Status: Shipped (Phase A–E).
Anchor: c79524c0 auth service + egauth /verify seam (Phases A-D)
through 61dbd3d2 egauth slimdown — Phase E of the egauth/auth split.
Backlog:
- Document the auth↔egauth boundary in the services/auth/ and services/egauth/ READMEs; only inline comments exist today.
- Migrate non-NTLM identity flows that still live in egauth.
Dev-lite mode
Status: Shipped.
Anchor: 6727f3dd added dev lite deployment, hot-reload fix
b89dfc11 fix hot reload for dev-lite, contract pinned by
tests/smoke/test_compose_dev_lite.py. See
Compose modes.
Proposed:
- Document the stub contract for each aibox-stub role.
- Stub mode for the vllm service so dev-lite GPU users can dry-run without burning CUDA memory.
Near-term planned
Helm chart → parity with Compose
Status: Planned.
Anchor: helm/aibox/Chart.yaml is 0.2.0 — the core slice
(keycloak + init Jobs, gateway, agent-runtime + migrations,
inference-router, guardrail, frontend, postgresql, redis) ships with the
full internal-auth env surface and deploys to kube2x via
.github/workflows/deploy-k8s-dev.yml. helm/README.md enumerates what
is still missing: memory, knowledge, audit, code-sandbox, keystore,
observability, firecrawl, egauth/auth, MCPs, vllm, backup/retention
CronJobs, NetworkPolicies.
Concrete next steps:
- Port the remaining services as Deployments / StatefulSets / Jobs, keeping the two chart invariants (one secret at /run/secrets, compose-named Services).
- Bake built-in skills / agent definitions into the agent-runtime image (compose bind-mounts them; the chart currently mounts empty dirs).
- Re-add the backup / retention CronJobs dropped from the alpha chart.
Audit chain retention via Postgres partitioning
Status: Planned.
Anchor: row deletion breaks the audit hash chain, so retention
tooling deliberately excludes audit tables — see the audit service
schema (services/audit/src/audit_service/db.py) and the "Future work"
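section of the audit compliance docs. Retention therefore requires Postgres
partitioning, so that old data can be retired a whole partition at a time
rather than row by row.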
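Why deleting rows breaks a hash chain, and why retiring only the oldest records can still leave it verifiable, is easier to see in a toy model. The sketch below is an illustration under assumptions, not the audit service's schema: the record layout, the SHA-256 construction, the `GENESIS` value, and the idea of keeping the last dropped hash as an anchor are all hypothetical.

```python
import hashlib
import json
from typing import List, Tuple

GENESIS = "0" * 64  # hypothetical hash that precedes the first record

Record = Tuple[dict, str]  # (payload, hash of prev_hash + payload)


def link(prev_hash: str, payload: dict) -> str:
    """Hash a payload together with the previous record's hash."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: List[Record], payload: dict) -> None:
    """Append a record whose hash commits to everything before it."""
    prev = chain[-1][1] if chain else GENESIS
    chain.append((payload, link(prev, payload)))


def verify(chain: List[Record], anchor: str = GENESIS) -> bool:
    """Check that every record links to its predecessor, starting at anchor."""
    prev = anchor
    for payload, digest in chain:
        if link(prev, payload) != digest:
            return False
        prev = digest
    return True


chain: List[Record] = []
for i in range(6):
    append(chain, {"event": "login", "seq": i})

# Deleting one row from the middle breaks every later link.
with_gap = chain[:2] + chain[3:]

# Dropping the oldest "partition" keeps the rest verifiable, provided the
# hash of the last dropped record is retained as the new anchor.
retained, anchor = chain[3:], chain[2][1]
```

In this model `verify(with_gap)` fails while `verify(retained, anchor)` succeeds, which is the property a partition-based retention scheme would need to preserve.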
Backup parity across stateful stores
Status: Planned.
Anchor: only Postgres has scripts/backup.sh coverage; Qdrant +
MinIO require manual qdrant-snapshot / mc mirror. The alpha chart's
backup CronJobs were removed in the 0.2.0 core-slice rebuild and have
not returned yet (helm/README.md gap list).
Planned:
- Re-add chart backup CronJobs (Postgres dump + opt-in Qdrant/MinIO S3 replication) in a parity pass.
- Restore drill harness (scripts/restore.sh exists but isn't exercised by CI).
Vision routing inside the multi-vLLM router
Status: Planned.
Anchor: today, running make up GPU=vision brings up vllm-vision
(Qwen3-VL) as a sibling of the text vLLM. The multi-vLLM router
(compose.gpu-multi.yml) does not front it. Folding
vision into the same OpenResty layer would let
llm.vision_model = qwen3-vl resolve through the same
inference-router → router → vllm-vision chain.
Research / longer horizon
Self-improving skills
Source: NousResearch/hermes-agent.
After an agent completes a complex multi-step task, auto-extract a
reusable skill from the trajectory. Our skills.py loads YAML/markdown
skill files today; the extraction pipeline is what's missing.
Effort: Medium. Priority: High — directly compounds capability.
Session search
Full-text search across historical conversations within a tenant.
What we have: Sessions in Redis with 7-day TTL; archives written to
session:{tenant}:{id}:archive during compaction but never indexed.
Effort: Medium (Postgres FTS or Qdrant embeddings + a tool). Priority: Medium.
Effort-adaptive model routing
Anchor: EffortLevel enum already exists on ChatRequest from the
agent-runtime modernization, and the inference-router now supports
pluggable providers (see workstreams above). What's missing is the wiring
from effort → model selection.
Effort: Small. Priority: High once the admin UI for provider selection is settled.
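The missing wiring amounts to a lookup from effort level to a preferred model, constrained by what the enabled providers actually serve. A minimal sketch, assuming things the roadmap does not state: the enum below is a stand-in whose member names are guesses, and the model names and `select_model` helper are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set


class EffortLevel(str, Enum):
    """Stand-in for the real enum; these member names are assumptions."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical routing table: effort level -> preferred models, best first.
EFFORT_MODELS: Dict[EffortLevel, List[str]] = {
    EffortLevel.LOW: ["small-local"],
    EffortLevel.MEDIUM: ["medium-local", "small-local"],
    EffortLevel.HIGH: ["large-remote", "medium-local"],
}


def select_model(effort: EffortLevel, available: Set[str]) -> str:
    """Pick the first preferred model that an enabled provider serves."""
    for model in EFFORT_MODELS[effort]:
        if model in available:
            return model
    raise LookupError(f"no available model for effort={effort.value}")
```

Keeping the table as data rather than code would let the provider admin UI own it later, which is one reason to wait until that surface is settled.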
Multi-channel gateway
Slack / email / Telegram / Discord adapters that normalise to our
ChatRequest format.
What we have: Direct /webhook on agent-runtime. No gateway-level
/v1/webhook, no channel adapters, no cross-channel session linking.
Effort: Large (each adapter is its own integration). Priority: Medium.
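Each adapter's core job is a pure mapping from a channel-specific payload to one common request shape. The sketch below illustrates that idea only: the `ChatRequest` dataclass is a stand-in with assumed fields (the real model may differ), the session-key scheme is hypothetical, and the payload fields used are the basic ones from a Slack Events API message callback and a Telegram Bot API update.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ChatRequest:
    """Stand-in with assumed fields; the real ChatRequest may differ."""
    tenant_id: str
    session_id: str
    message: str
    channel: str


def from_slack(tenant_id: str, payload: dict) -> ChatRequest:
    """Map a Slack Events API message callback onto a ChatRequest."""
    event = payload["event"]
    # Replies share thread_ts; top-level messages fall back to their own ts.
    thread = event.get("thread_ts", event["ts"])
    return ChatRequest(tenant_id, f"slack:{event['channel']}:{thread}",
                       event["text"], "slack")


def from_telegram(tenant_id: str, payload: dict) -> ChatRequest:
    """Map a Telegram Bot API update carrying a text message."""
    msg = payload["message"]
    return ChatRequest(tenant_id, f"telegram:{msg['chat']['id']}",
                       msg["text"], "telegram")


# Registry of adapters; adding a channel means adding one function here.
ADAPTERS: Dict[str, Callable[[str, dict], ChatRequest]] = {
    "slack": from_slack,
    "telegram": from_telegram,
}


def normalise(channel: str, tenant_id: str, payload: dict) -> ChatRequest:
    """Dispatch to the adapter registered for the channel."""
    return ADAPTERS[channel](tenant_id, payload)
```

Signature verification, retries, and cross-channel session linking are deliberately left out; they are where most of the "Large" effort estimate would go.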
Checkpoint / resume on failure
Source: LangGraph.
Persist agent execution state per tool call so a workflow can resume from the last successful step. Deep integration with the OpenAI Agents SDK runner is required.
Effort: Large. Priority: Low — most tasks complete in seconds.
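Stripped of any agent framework, the idea is to persist state after each successful step and skip completed steps on the next run. The following is a minimal sketch of that pattern, not an integration with the OpenAI Agents SDK runner: `run_with_checkpoints`, the JSON file layout, and the step signature are all hypothetical, and state is assumed to be JSON-serialisable.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]  # (name, function: state -> state)


def run_with_checkpoints(steps: List[Step], checkpoint: Path) -> dict:
    """Run steps in order, persisting state after each successful one."""
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"done": 0, "state": {}}
    state = saved["state"]
    # Start after the last step that a previous run completed.
    for index in range(saved["done"], len(steps)):
        _name, fn = steps[index]
        state = fn(state)  # may raise; the file still holds the prior step
        checkpoint.write_text(json.dumps({"done": index + 1, "state": state}))
    return state
```

Calling the function again with the same checkpoint path after a failure resumes at the failed step, so earlier tool calls are not repeated. That only helps when steps are slow or have side effects, which fits the low priority given above.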
Agent-to-agent shared memory
Source: OpenClaw.
A session-scoped working memory the main agent and its delegated subagents read/write, so subagents don't have to receive the full conversation history.
Effort: Medium. Priority: Medium — addresses a known context-bloat weakness.
Training data pipeline
Source: NousResearch/hermes-agent.
Export structured trajectories from first-party observability + audit into JSONL for fine-tuning local models. Quality-filter on whether the user corrected the agent.
Effort: Medium. Priority: High for sovereignty — closes the loop between running and improving local models.
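The export step could be as small as a filter plus a serialiser. The sketch below assumes a trajectory shape that the roadmap does not define: the `turns` list, the `is_correction` flag, and the chat-style `messages` output are hypothetical, and dropping corrected trajectories outright is only one possible reading of the quality filter.

```python
import json
from typing import Iterable, Iterator


def user_corrected(trajectory: dict) -> bool:
    """Hypothetical quality signal: any turn flagged as a user correction."""
    return any(turn.get("is_correction", False) for turn in trajectory["turns"])


def to_jsonl(trajectories: Iterable[dict]) -> Iterator[str]:
    """Yield one JSON line per trajectory the user did not correct."""
    for traj in trajectories:
        if user_corrected(traj):
            continue  # assumed policy: corrected runs are excluded
        messages = [{"role": t["role"], "content": t["content"]}
                    for t in traj["turns"]]
        yield json.dumps({"messages": messages}, ensure_ascii=False)
```

Writing the result is then `"\n".join(to_jsonl(batch))` into a file. An alternative worth weighing is to keep corrected trajectories and label them, since corrections can themselves be useful training signal.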
Heartbeat daemon
Source: OpenClaw.
Background process that watches data sources (email, calendar, RSS) and
proactively creates ChatRequests without user prompting.
Effort: Large; depends on the multi-channel gateway. Priority: Low.
Summary
| Item | Status | Effort | Priority |
|---|---|---|---|
| Pluggable inference providers admin UI | In progress | Small (remaining) | High |
| Scalability + multi-instance | In progress | Medium | High |
| Multi-vLLM router | Shipped | — | — |
| Mobile push 2FA | In progress | Small (remaining) | High |
| egauth / auth split | Shipped | — | — |
| Dev-lite mode | Shipped | — | — |
| Helm → parity with Compose | Planned | Large | High |
| Audit chain retention via partitioning | Planned | Medium | Medium |
| Backup parity (Qdrant, MinIO) on compose | Planned | Medium | Medium |
| Vision through multi-vLLM router | Planned | Small | Medium |
| Self-improving skills | Proposed | Medium | High |
| Effort-adaptive routing | Proposed | Small | High |
| Training data pipeline | Proposed | Medium | High |
| Session search | Proposed | Medium | Medium |
| Shared agent memory | Proposed | Medium | Medium |
| Multi-channel gateway | Proposed | Large | Medium |
| Checkpoint / resume | Proposed | Large | Low |
| Heartbeat daemon | Proposed | Large | Low |
Verified against commit f862a4f8 (2026-06-16) · sources fb31bc72ece3.