# Claude Code-Style Agents: Sub-Agent Spawning and Skill Loading
Claude Code has a pattern we wanted to replicate: when a task is complex enough, the main agent spawns a focused sub-agent with a custom system prompt and a curated set of tools. The sub-agent does its work and returns results to the parent. This keeps the main agent's context clean and lets specialized work happen in isolation.
We built this into AI-in-a-Box's agent runtime using the OpenAI Agents SDK.
Current runtime model: This post is historical. The current implementation uses a main agent with a `Delegate` tool and subagent definitions in `deploy/config/subagents/*.md`; the old YAML handoff team model has been removed. Use the Agents reference and Multi-Agent tutorial for current behavior.
## The current Delegate tool
The current mechanism is the Delegate tool. The main agent calls it when a
task benefits from dedicated instructions rather than handling everything
inline. The tool takes a short description, a configured subagent_type, and
the prompt the child agent should execute.
The implementation has changed since the early prototype below. Today the tool shape is:
```python
Delegate(description="short label", subagent_type="researcher", prompt="<task brief>")
```
The early prototype used a delegate_task helper that built subagents from
tool-category strings:
```python
from agents import Agent, Runner, function_tool

def create_delegate_tool(config, tenant_id, user_id, memory_client):
    """Create a tool that spawns sub-agents with custom system prompts."""
    # Pre-build tool pools so sub-agents can select what they need
    sandbox_tools = create_sandbox_tools(config.sandbox_url, tenant_id, ...)
    tool_pools = {
        "memory": create_memory_tools(memory_client, tenant_id, user_id),
        "search": create_search_tools(config.search_url),
        "code": sandbox_tools,
        "knowledge": create_knowledge_tools(config.knowledge_url, tenant_id),
    }

    @function_tool
    async def delegate_task(task: str, approach: str, tools_needed: str) -> str:
        """Delegate a complex task to a focused sub-agent."""
        requested = [t.strip().lower() for t in tools_needed.split(",")]
        sub_tools = []
        for category in requested:
            if category in tool_pools:
                sub_tools.extend(tool_pools[category])
        sub_agent = Agent(
            name="sub-agent",
            instructions=approach,
            model=model,  # model identifier resolved elsewhere in the module
            tools=sub_tools,
        )
        result = await Runner.run(sub_agent, task)
        return result.final_output or "Sub-agent completed but produced no output."

    return [delegate_task]
```
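The category-to-pool selection inside `delegate_task` reduces to a small pure function, which can be tested in isolation. The sketch below uses stand-in string pools; the real pools hold SDK tool objects:

```python
def select_tools(tools_needed: str, tool_pools: dict[str, list[str]]) -> list[str]:
    """Gather tools for the requested comma-separated categories.

    Unknown categories are silently skipped, mirroring the prototype above.
    """
    requested = [t.strip().lower() for t in tools_needed.split(",")]
    selected: list[str] = []
    for category in requested:
        if category in tool_pools:
            selected.extend(tool_pools[category])
    return selected


# Stand-in pools (illustrative names, not the real tool objects)
pools = {"search": ["web_search"], "code": ["run_python", "read_file"]}
print(select_tools("Search, code", pools))  # → ['web_search', 'run_python', 'read_file']
```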
The current implementation gets its tool allowlists from `deploy/config/subagents/*.md`, not from free-form tool-category strings. A research subagent can be configured with web and knowledge tools, while a code subagent can be configured with sandbox tools. This limits the subagent's blast radius and keeps its tool list focused.
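As an illustration, a subagent definition might look like the following. This is a hypothetical file: the field names and tool identifiers are assumptions, and the actual schema of `deploy/config/subagents/*.md` may differ:

```markdown
---
name: researcher
description: Deep research requiring multiple searches and synthesis.
tools:
  - web_search
  - knowledge_lookup
---

You are a research specialist. Gather sources with the tools provided,
then synthesize your findings into a concise, cited summary.
```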
The main agent's system prompt includes guidance on when to use delegation:
> Use for: deep research requiring multiple searches and synthesis, complex data analysis pipelines, multi-file code projects, document drafting that needs iteration. Do NOT use for: simple questions, single tool calls, quick lookups, or tasks you can handle in one or two steps.
## SKILL.md: on-demand skill loading
The second pattern we borrowed from Claude Code is progressive skill loading. Claude Code uses SKILL.md files with YAML frontmatter to define skills that can be loaded on demand. We adopted the same format, compatible with the agentskills.io specification:
```markdown
---
name: summarizer
description: >
  Use when the user asks to "summarize", "give me a tldr",
  or "condense this document".
version: 1.0.0
---

# Document Summarizer

You are a summarization specialist...
```
The SkillLoader class scans a skills directory, parsing each SKILL.md file into a Skill dataclass:
```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str
    instructions: str
    version: str | None = None

    def to_system_prompt(self) -> str:
        return f"## Skill: {self.name}\n\n{self.instructions}"
```
The critical optimization: at agent creation time, only skill names and one-line descriptions are injected into the system prompt. The full instructions stay unloaded until the agent explicitly calls load_skill:
```python
if skills:
    skill_index = "\n".join(f"- **{s.name}**: {s.description}" for s in skills)
    instructions += (
        f"\n\n# Available skills\n\n"
        f"Use load_skill to activate a skill before starting a task it covers.\n"
        f"{skill_index}"
    )

skill_map = {s.name: s for s in skills}

@function_tool
async def load_skill(skill_name: str) -> str:
    """Load a skill's full instructions."""
    skill = skill_map.get(skill_name)
    if not skill:
        return f"Skill '{skill_name}' not found."
    return skill.to_system_prompt()
```
This saves context window space. An agent with 20 available skills only pays for 20 one-line descriptions in its system prompt, not 20 full instruction sets. When the agent recognizes a task that matches a skill, it loads just that skill's instructions on demand.
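Back-of-the-envelope numbers make the saving concrete. The token counts below are illustrative assumptions, not measurements:

```python
NUM_SKILLS = 20
DESC_TOKENS = 25    # one trigger-oriented description line (assumed)
FULL_TOKENS = 800   # one full instruction set (assumed)

eager = NUM_SKILLS * FULL_TOKENS                # every skill inlined up front
lazy = NUM_SKILLS * DESC_TOKENS + FULL_TOKENS   # index plus the one skill loaded

print(eager, lazy)  # → 16000 1300
```

Even with generous estimates, the lazy index costs roughly a tenth of inlining everything, and the gap widens as skills are added.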
## Skill precedence and import (planned)
Skills will be layered with clear precedence: platform built-in skills as the base, tenant-custom skills extending or overriding them, and agent-specific skills taking highest priority. Currently, skills are loaded from the local filesystem. Database-backed skill registration and GitHub import are planned features that have not yet been implemented.
## Current subagent model
The old hub-and-spoke model with an orchestrator and YAML handoffs was replaced by an adaptive main agent. The main agent receives the conversation, decides whether the work should stay inline or be delegated, and calls Delegate only for independent work that benefits from a focused context.
Subagent types are defined as Markdown files with YAML frontmatter in deploy/config/subagents/. The main agent sees the available types and writes the task brief itself. Subagents do not recursively spawn more subagents.
## Lessons learned
Building these patterns taught us a few things:
- **Sub-agent isolation matters.** Early versions gave sub-agents all tools. This led to confused sub-agents trying to use tools they did not need, wasting tokens on irrelevant tool descriptions. Curated tool pools solved this.
- **Skill descriptions need to be trigger-oriented.** A description like "summarization tool" does not help the agent decide when to load the skill. A description like "Use when the user asks to summarize, give me a tldr, or condense this document" gives the agent clear matching criteria.
- **`Runner.run` blocks the parent.** From the caller's perspective, the OpenAI Agents SDK's `Runner.run` completes in a single awaited call, so the parent agent is suspended while the sub-agent executes. For long-running tasks, this means the user sees no streaming output until the sub-agent finishes. We are exploring ways to stream sub-agent progress back to the parent.