Ideas & Research

A living document collecting ideas from SOTA research, competitor analysis, and internal brainstorming. Each idea is assessed for feasibility and prioritized. When an idea moves to implementation, it gets its own spec in docs/superpowers/specs/.


Self-Improving Skills (from Hermes Agent)

Source: NousResearch/hermes-agent

What it is: After an agent completes a complex multi-step task, automatically generate a reusable skill from the interaction trajectory. The skill captures the approach, tool sequence, and decision points so that next time a similar task arises, the agent uses the skill instead of reasoning from scratch.

How Hermes does it: The agent records every tool call and result as a "trajectory." After task completion, a summarization step compresses the trajectory into a procedural skill (markdown with YAML frontmatter). Skills are stored on disk, searchable, and shareable via a community Skills Hub.

What we already have: Our skills.py loads YAML/markdown skill files. Agents list available skills in their system prompt and call load_skill() on demand. But today, skills are hand-authored -- nobody writes them automatically.

What we'd build:

  1. After a successful multi-tool interaction (3+ tool calls), trigger a "skill extraction" step
  2. Send the trajectory (tool calls + results + final output) to the LLM with a prompt: "Extract a reusable skill from this interaction"
  3. Save the generated skill to the skills directory with proper frontmatter
  4. The skill is available to all agents on next request
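The extraction-and-save shape of steps 1–3 can be sketched as below. This is a minimal illustration, not our implementation: the trajectory step format, the `should_extract` threshold helper, the extraction prompt wording, and the frontmatter fields are all assumptions — only the "3+ tool calls" trigger and the YAML-frontmatter markdown output come from the idea above. In the real pipeline, `body` would be the LLM's response to the trajectory plus the extraction prompt.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical prompt for step 2; the real wording would be tuned.
EXTRACTION_PROMPT = (
    "Extract a reusable skill from this interaction. Return a short name, "
    "a one-line description, and step-by-step instructions."
)

def should_extract(trajectory: list[dict]) -> bool:
    """Step 1: only trigger extraction for multi-tool interactions (3+ tool calls)."""
    return sum(1 for step in trajectory if step.get("type") == "tool_call") >= 3

def save_skill(name: str, description: str, body: str, skills_dir: Path) -> Path:
    """Step 3: write the generated skill with YAML frontmatter so skills.py can load it."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    frontmatter = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        f"created: {date.today().isoformat()}\n"
        "source: auto-extracted\n"
        "---\n\n"
    )
    skills_dir.mkdir(parents=True, exist_ok=True)
    path = skills_dir / f"{slug}.md"
    path.write_text(frontmatter + body)
    return path
```

Because the skill lands in the same directory skills.py already scans, step 4 (availability to all agents) falls out of the existing loader for free.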

Effort: Medium -- the skill loader already works, we just need the extraction pipeline.

Priority: High -- directly improves agent capability over time without manual intervention.


Session Search (from Hermes Agent)

Source: NousResearch/hermes-agent

What it is: Full-text search across all historical conversations, not just the current session. When the agent needs context from a past interaction, it can search across all sessions for a tenant.

How Hermes does it: Uses SQLite FTS5 (full-text search) with LLM-powered summarization. Conversations are indexed as they happen. Search returns relevant excerpts with timestamps.

What we already have: Sessions stored in Redis with 7-day TTL. During context compaction, full history is archived to session:{tenant}:{id}:archive. But there's no search -- archives are write-only.

What we'd build:

  1. On session archive (during compaction or session end), index messages into a searchable store (PostgreSQL full-text search or Qdrant embeddings)
  2. Add a session_search tool that agents can call: "Find past conversations about X"
  3. Returns relevant excerpts with session IDs and timestamps
  4. Scoped to tenant for isolation
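The index-then-search shape can be sketched with SQLite FTS5 — the engine Hermes uses — standing in for the eventual PostgreSQL or Qdrant store. The table layout and function names are illustrative; the real tool would also translate archived `session:{tenant}:{id}:archive` entries into rows.

```python
import sqlite3

# SQLite FTS5 as a stand-in for the production index. tenant/session_id/ts
# are stored but not tokenized (UNINDEXED); only message content is searched.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE session_index USING fts5("
    "tenant UNINDEXED, session_id UNINDEXED, ts UNINDEXED, content)"
)

def index_message(tenant: str, session_id: str, ts: str, content: str) -> None:
    """Step 1: called during compaction/session end for each archived message."""
    conn.execute("INSERT INTO session_index VALUES (?, ?, ?, ?)",
                 (tenant, session_id, ts, content))

def session_search(tenant: str, query: str, limit: int = 5) -> list[dict]:
    """Steps 2-4: tenant-scoped full-text search returning excerpts with IDs/timestamps."""
    rows = conn.execute(
        "SELECT session_id, ts, snippet(session_index, 3, '[', ']', '...', 8) "
        "FROM session_index WHERE session_index MATCH ? AND tenant = ? "
        "ORDER BY rank LIMIT ?",
        (query, tenant, limit),
    ).fetchall()
    return [{"session_id": s, "ts": t, "excerpt": e} for s, t, e in rows]
```

The `tenant = ?` predicate on an UNINDEXED column gives the isolation in step 4; a Postgres version would use a `tsvector` column plus a regular `WHERE tenant = ...` filter.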

Effort: Medium -- need a new index + tool, but the archive data already exists.

Priority: Medium -- valuable for long-running user relationships, less critical for short interactions.


Multi-Channel Gateway (from Hermes Agent / OpenClaw)

Source: hermes-agent, OpenClaw

What it is: A unified gateway that bridges the agent to multiple communication channels (Slack, email, Telegram, Discord, webhooks) with cross-platform conversation continuity.

How they do it:

  • Hermes: Single gateway process serves all channels. Messages from any platform go through the same agent loop. Conversations can span platforms (start on Slack, continue on CLI).
  • OpenClaw: Integration gateway normalizes signals from 20+ platforms into a common event format. Supports WhatsApp, Telegram, Slack, Discord, Gmail, CRM, calendar, and more.

What we already have: Agent-runtime has a direct /webhook endpoint that accepts webhook triggers from external systems. The gateway does not currently expose /v1/webhook, and there are no channel-specific adapters or cross-channel continuity features yet.

What we'd build:

  1. Channel adapters: Slack (slash commands + bot), email (IMAP/SMTP), Telegram (bot API), Discord (bot)
  2. Message normalization layer: convert platform-specific messages to our ChatRequest format
  3. Response formatting: convert agent output back to platform-specific format (Slack blocks, email HTML, etc.)
  4. Session linking: map platform user IDs to tenant/user IDs so conversations persist across channels
  5. Rate limiting per channel

Effort: Large -- each channel adapter is a distinct integration with its own auth, webhooks, and formatting.

Priority: Medium -- high value for teams that live in Slack/Teams, but the web UI covers the primary use case.


Checkpoint/Resume on Failure (from LangGraph)

Source: LangGraph

What it is: Persist agent state at every tool call so that if a workflow fails mid-execution, it can resume from the last successful step instead of starting over.

How LangGraph does it: The graph-based execution model naturally creates checkpoints at every node transition. State is serialized (MsgPack format) and stored in PostgreSQL. On resume, only the latest checkpoint loads (O(1) regardless of history length). Supports "time-travel debugging" -- replay from any checkpoint.

What we already have: No checkpointing. If an agent fails mid-workflow (e.g., network error during tool call 5 of 10), the entire request fails and the user has to start over. Session history persists, but the agent's internal execution state (which tools ran, what results came back, what the plan was) is lost.

What we'd build:

  1. Before each tool call, serialize the agent's execution state (messages so far, tool results, plan) to Redis or PostgreSQL
  2. On failure, return a ChatResult with type=ERROR_BACKEND and a checkpoint_id
  3. On resume (user sends a new message with the same session), detect the checkpoint and restore state
  4. The agent continues from where it left off

Effort: Large -- requires deep integration with the OpenAI Agents SDK's Runner internals, or building a custom execution wrapper.

Priority: Low -- most agent tasks complete in under a minute. Checkpoint/resume is mainly valuable for very long workflows (30+ minutes).


Effort-Adaptive Model Routing

Source: Claude Agent SDK effort levels, internal observation

What it is: Automatically select the model and reasoning depth based on task complexity, rather than using the same model for everything.

How Claude SDK does it: effort parameter (low/medium/high/max) controls reasoning depth per request. Lower effort = fewer tokens, faster responses.

What we already have: EffortLevel enum in ChatRequest (added in the modernization). The inference router supports multiple models (local vLLM + OpenRouter). But effort level isn't wired to model selection yet.

What we'd build:

  1. Map effort levels to model tiers:
    • low: fast local model, for example the configured local vLLM model, for simple questions
    • medium: default model (GPT-5.4 via OpenRouter) for general tasks
    • high: configured reasoning-capable external model for complex analysis
  2. Optional auto-detection: analyze the user's message to estimate complexity and select effort automatically
  3. The frontend exposes effort as a toggle (already designed but not wired)

Effort: Small -- the infrastructure exists, just need the routing logic.

Priority: High -- directly reduces cost and latency for simple queries while preserving quality for complex ones.


Heartbeat Daemon (from OpenClaw)

Source: OpenClaw

What it is: A background daemon that acts without user prompting. It monitors data sources (email, calendar, CRM), detects events, and proactively triggers agent actions.

How OpenClaw does it: A "heartbeat" process runs on a schedule, checking configured data sources for changes. When it detects something (new email, upcoming meeting, CRM update), it creates a task for the agent. The agent processes it and routes the response to the appropriate channel.

What we already have: Externally triggered webhook handling exists directly on agent-runtime, but not as a gateway-routed /v1/webhook endpoint. There is no internal daemon watching for events.

What we'd build:

  1. A background task in the agent-runtime that runs on a configurable schedule
  2. Checks configured event sources (could start with: email via IMAP, calendar via CalDAV, RSS feeds)
  3. On new events, creates a ChatRequest and processes it through the normal agent pipeline
  4. Sends results via the multi-channel gateway (Slack notification, email reply, etc.)

Effort: Large -- depends on multi-channel gateway being built first.

Priority: Low -- nice for "personal assistant" use cases but not core to the enterprise platform.


Agent-to-Agent Shared Memory (from OpenClaw)

Source: OpenClaw

What it is: A structured memory store that all agents in a team can read from and write to, enabling context sharing between specialist agents without passing the full conversation through every subagent delegation.

How OpenClaw does it: A "shared memory layer" sits in the agent core. When the main agent delegates to a specialist, the specialist can read relevant context from shared memory instead of receiving the entire conversation history. Results are written back to shared memory for other agents.

What we already have: Our memory service (Qdrant + Mem0) stores memories scoped to tenant + user. But subagent delegations via the Delegate tool pass only a task brief -- there's no structured "working memory" that the main agent and its subagents share during a multi-step workflow.

What we'd build:

  1. A session-scoped working memory (Redis hash) that agents read/write during a workflow
  2. When the main agent delegates to a subagent via the Delegate tool, it writes a structured brief to working memory: task, relevant context, constraints
  3. The subagent reads the brief instead of reconstructing context (saves tokens)
  4. On completion, the subagent writes its result to working memory
  5. The main agent reads the result and continues

Effort: Medium -- the memory service exists, this is a new scoping layer on top.

Priority: Medium -- reduces context bloat across subagent delegations, which is a known weakness.


Training Data Pipeline (from Hermes Agent)

Source: NousResearch/hermes-agent

What it is: Record agent interactions as structured training data for fine-tuning local models. Every conversation becomes a potential training example.

How Hermes does it: Trajectories (prompt + tool calls + results + final output) are captured in a standardized format. The Atropos RL system uses these for reinforcement learning. Trajectories can be compressed and filtered for quality before training.

What we already have: Langfuse logs all LLM calls with traces, spans, and costs. The audit service logs every tool call and result. But this data isn't structured for training -- it's structured for observability and compliance.

What we'd build:

  1. A trajectory exporter that reads from Langfuse traces and audit logs
  2. Formats interactions as training examples (system prompt + messages + tool calls in OpenAI format)
  3. Quality filtering: only export trajectories where the user didn't correct the agent
  4. Export to JSONL for fine-tuning (compatible with vLLM / Hugging Face training pipelines)
  5. Optional: RL reward signal from user feedback (thumbs up/down on responses)
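Steps 2–4 can be sketched as follows. The trajectory shape (and the `user_corrected` flag used for quality filtering) is an assumption about what a Langfuse-plus-audit-log export would look like; only the OpenAI messages format and JSONL output are fixed by the idea.

```python
import json

def to_training_example(trajectory: dict) -> dict:
    """Step 2: format one trajectory as an OpenAI-style chat training example."""
    messages = [{"role": "system", "content": trajectory["system_prompt"]}]
    # Assumed: exported turns are already OpenAI-shaped (user/assistant/tool dicts).
    messages += trajectory["messages"]
    return {"messages": messages}

def export_jsonl(trajectories: list[dict]) -> str:
    """Steps 3-4: drop corrected trajectories, emit one JSON object per line."""
    kept = [t for t in trajectories if not t.get("user_corrected", False)]
    return "\n".join(json.dumps(to_training_example(t)) for t in kept)
```

The resulting JSONL is the standard chat fine-tuning layout, which is what keeps it compatible with vLLM and Hugging Face training pipelines downstream.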

Effort: Medium -- data already exists in Langfuse + audit, need the export pipeline.

Priority: High for sovereignty -- the whole point of self-hosted AI is eventually running your own fine-tuned models. This closes the loop.


Summary

| Idea | Source | Effort | Priority | Depends On |
|------|--------|--------|----------|------------|
| Self-improving skills | Hermes | Medium | High | Nothing |
| Effort-adaptive routing | Claude SDK | Small | High | Nothing |
| Training data pipeline | Hermes | Medium | High | Nothing |
| Session search | Hermes | Medium | Medium | Nothing |
| Shared agent memory | OpenClaw | Medium | Medium | Nothing |
| Multi-channel gateway | Hermes/OpenClaw | Large | Medium | Nothing |
| Checkpoint/resume | LangGraph | Large | Low | Nothing |
| Heartbeat daemon | OpenClaw | Large | Low | Multi-channel gateway |