How to Build Multi-Agent Systems with Claude Code

I kept running into the same problem when building multi-agent systems with Claude Code. The official docs explain individual features well (subagents, skills, CLAUDE.md, headless mode), but they don’t explain how those features compose into a working system. Every time I started a new project, I was re-deriving the same architectural decisions from scratch.

So I built this reference guide. It pulls together the patterns I’ve found reliable across multiple production systems, validated against official Anthropic documentation. Every claim was checked: 50 total, 45 confirmed against official sources, 1 flagged as needing empirical testing, and 4 caught and corrected during the review process. The full verification log is in the appendix.

This is not a tutorial. It’s a reference you bookmark and Ctrl+F when you need it. The sections are organized by concern, not by workflow step.

Verification Scorecard
40 confirmed (official docs)
Skills in .claude/skills/ and ~/.claude/skills/
Subagents in .claude/agents/ and ~/.claude/agents/
Skills and subagents are complementary systems
Subagents cannot spawn subagents
Subagent priority: managed > CLI > project > user > plugin
Skills priority: enterprise > personal > project
Subagents have persistent memory across sessions
Memory scopes: user, project, local
First 200 lines of MEMORY.md injected into prompt
Read/Write/Edit auto-enabled for memory
Omitting tools: grants all tools
Subagents don't receive main conversation history
Parent receives only subagent's final message
Model options: sonnet, opus, haiku (+ full IDs); omit to inherit
Subagents load skills via skills: frontmatter
Preloaded skills get full content at startup
Skills auto-discover with live change detection
Skill descriptions loaded lazily in regular sessions
CLAUDE.md read at start of every session
CLAUDE.md survives compaction
CLAUDE.md lives at root or .claude/CLAUDE.md
-p flag runs headless (non-interactive)
--allowedTools uses permission rule syntax
Agent is the tool name for spawning subagents
--dangerously-skip-permissions is a real flag
--dangerously-skip-permissions carries security risk
Auto memory at ~/.claude/projects/<project>/memory/
Project root used for memory outside git repos
Cowork has no memory across sessions
Cowork desktop app must remain open
Scheduled tasks only run when awake + app open
Each scheduled task runs as its own session
Agent Teams are experimental
Hooks merge across all sources
MCP servers override: local > project > user
.claude/rules/ supports path-scoped instructions
Skills and commands have been merged
File-based subagents load project-level CLAUDE.md
claude -p with --allowedTools "Agent" spawns subagents
Subagent memory path ~/.claude/agent-memory/<name>/
5 confirmed (eng blog)
Orchestrator-workers is a recommended pattern
Subagent output to filesystem reduces info loss
Without detailed descriptions, agents duplicate work
Single agents ~4x tokens; multi-agent ~15x
Start simple, add complexity only when proven
1 unverified (needs testing)
Agent Teams use ~3-4x tokens of single session
4 corrected (caught in review)
Orchestrator should be a subagent (wrong: must be main session)
Skills priority: user lower than project (wrong: user is higher)
"Subagents do NOT receive CLAUDE.md" was overstated (now confirmed they do)
--dangerously-skip-permissions was unverified (now confirmed)
Sources: code.claude.com docs, Anthropic engineering blog, Claude Code changelog

Version 1.3, March 2026 (validated)

Scope

| Feature | Claude Code CLI | Cowork |
| --- | --- | --- |
| Subagents, skills, CLAUDE.md, state management | Yes | Yes (same agent harness, same filesystem) |
| Headless execution (claude -p, crontab) | Yes | No (use /schedule instead) |
| Cross-session persistence (auto memory, memory: frontmatter) | Yes | No (use context files instead) |
| Anthropic API / Agent SDK | Not targeted | Not targeted |

SDK documentation was used to validate architectural claims, but this guide does not cover API or SDK usage directly.


1. Core Architecture

A multi-agent system on Claude Code has five layers:

| Layer | What | Where |
| --- | --- | --- |
| CLAUDE.md | Orchestration logic, always-on context | Project root |
| Subagents | Specialist workers with isolated context | .claude/agents/ |
| Skills | Shared knowledge and reusable workflows | .claude/skills/ |
| State files | Cross-session coordination data | Project directory (e.g. state/) |
| Trigger | Crontab, CI pipeline, or manual invocation | External |

The main session (defined by CLAUDE.md) is the orchestrator. It reads state, makes decisions, and spawns subagents. Subagents execute focused tasks and return results. Skills provide shared knowledge that any agent can load.

Hard constraint: subagents cannot spawn subagents

This is a platform limitation, not a design choice. If your workflow needs nested delegation, use skills or chain subagents from the main conversation. Never design an orchestrator as a subagent. It must be the main session.

The main session IS the orchestrator

CLAUDE.md defines the orchestrator’s identity, protocols, and decision-making rules. A single claude -p "Run." call reads CLAUDE.md and executes accordingly. No external scripting layer is needed between the trigger and the agent logic unless you have specific phase-separation requirements.


2. Subagents (.claude/agents/)

Locations and priority

| Scope | Path | Shared | Priority |
| --- | --- | --- | --- |
| User | ~/.claude/agents/ | No | Lower |
| Project | .claude/agents/ | Yes (version control) | Higher |

When the same name exists at multiple levels: managed > CLI flag > project > user > plugin.

Anatomy of a subagent

---
name: agent-name
description: When this agent should be invoked. Be specific.
tools: Read, Write, Edit, Bash, Glob, Grep
model: sonnet
memory: project
skills:
  - skill-one
  - skill-two
---

System prompt in markdown. This becomes the subagent's
core instruction set. It does not receive the main
session's conversation history, but it does load
project-level CLAUDE.md.

Key properties

  • Isolated context window. Subagents do not see the main conversation history. The SDK docs state the Agent tool’s prompt string is the only channel from parent to subagent. File-based subagents additionally load their own system prompt, project-level CLAUDE.md, any skills listed in their frontmatter, and basic environment details.
  • Scoped tool access. List only the tools the subagent needs. Omitting the tools: field grants access to all available tools. Be intentional.
  • Model routing. Use model: sonnet for most tasks (cost/capability balance), model: opus for complex reasoning, model: haiku for fast/cheap tasks. Omitting the model field inherits the parent session’s model. Full model IDs (e.g., claude-sonnet-4-6) are also accepted.
  • Return value. The parent receives only the subagent’s final message. Intermediate tool calls and reasoning stay inside the subagent.
  • Skills preloading. Subagents with skills: in their frontmatter receive the full skill content injected at startup. This is different from regular sessions, where skill descriptions are loaded but full content only loads on invocation. Subagent context budgets should account for the full size of preloaded skills.
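The properties above can be checked mechanically before a run. What follows is a minimal sketch, assuming each subagent file opens with a `---`-delimited frontmatter block made of simple `key: value` lines and `- item` lists, as in the anatomy example. `parse_frontmatter` and `lint_subagent` are hypothetical helpers, not part of Claude Code, and the parser is deliberately much simpler than real YAML.

```python
def parse_frontmatter(text):
    """Parse a simple '---' delimited frontmatter block into a dict.

    Handles only 'key: value' lines and '- item' list entries.
    This is an illustrative helper, not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta, current_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if stripped.startswith("- ") and current_key is not None:
            # List item belonging to the most recent key (e.g. skills:)
            if not isinstance(meta[current_key], list):
                meta[current_key] = []
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip()
    return meta


def lint_subagent(text):
    """Return human-readable warnings for a subagent definition."""
    meta = parse_frontmatter(text)
    warnings = []
    for required in ("name", "description"):
        if not meta.get(required):
            warnings.append(f"missing {required}")
    if "tools" not in meta:
        warnings.append("tools omitted: subagent gets all available tools")
    if "model" not in meta:
        warnings.append("model omitted: inherits the parent session's model")
    return warnings
```

Running `lint_subagent` over every file in .claude/agents/ before a scheduled run is one way to catch a subagent that silently received every tool because its tools: field was left out.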

Persistent memory

Subagents can maintain knowledge across sessions via the memory: frontmatter field.

| Scope | Location | Use case |
| --- | --- | --- |
| user | ~/.claude/agent-memory/<name>/ | Knowledge that applies across all projects |
| project | .claude/agent-memory/<name>/ | Project-specific knowledge (version-controllable) |
| local | Similar to project, not version-controlled | Machine-specific knowledge |

These paths are confirmed in the official subagent docs. The local scope stores to .claude/agent-memory-local/<name>/.

How memory works:

  • On startup, the first 200 lines of MEMORY.md are injected into the subagent’s system prompt.
  • Read, Write, and Edit tools are automatically enabled for memory management.
  • The subagent can create additional topic-specific files in its memory directory.
  • If MEMORY.md exceeds 200 lines, the subagent is instructed to curate it.
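To see how close a given agent is to the 200-line window, a small check like the following may help. It is a sketch under stated assumptions: `memory_status` is a hypothetical helper, and the directory you pass in is whichever memory path applies to your scope.

```python
from pathlib import Path

MEMORY_LINE_LIMIT = 200  # per this guide: only the first 200 lines are injected


def memory_status(memory_dir, limit=MEMORY_LINE_LIMIT):
    """Report how much of MEMORY.md falls inside the injected window."""
    memory_file = Path(memory_dir) / "MEMORY.md"
    if not memory_file.exists():
        return {"total_lines": 0, "injected_lines": 0, "needs_curation": False}
    total = len(memory_file.read_text(encoding="utf-8").splitlines())
    return {
        "total_lines": total,
        "injected_lines": min(total, limit),  # lines the subagent sees at startup
        "needs_curation": total > limit,      # content past the limit is not injected
    }
```

A weekly audit could call this for each agent's memory directory and flag any where `needs_curation` is true.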

Prompt the subagent to use memory explicitly in its system prompt:

Before starting work, review your memory for relevant patterns.
After completing work, save what you learned to your memory.
Update your agent memory as you discover patterns, decisions,
and key insights. Write concise notes about what you found.

Parallel execution

The main session can spawn multiple subagents in parallel for independent tasks. Use parallel execution when tasks don’t depend on each other’s outputs. Use sequential execution when one task’s output feeds into another.
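One way to reason about which tasks can run together is to group them into dependency levels. The sketch below uses Python's standard `graphlib` module. The task names are hypothetical, and this is a planning aid you would run outside Claude Code, not something the platform does for you.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group tasks into batches; tasks within one batch are independent.

    `dependencies` maps each task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)  # unlock tasks that were waiting on these
    return batches
```

For example, with `{"research": set(), "outline": set(), "draft": {"research", "outline"}, "review": {"draft"}}`, research and outline land in the first batch (parallel), while draft and review follow sequentially.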


3. Skills (.claude/skills/)

Locations and priority

| Scope | Path | Shared | Priority |
| --- | --- | --- | --- |
| User | ~/.claude/skills/ | No | Higher |
| Project | .claude/skills/ | Yes (version control) | Lower |

For skills: managed/enterprise > user/personal > project. Plugin skills are namespaced to avoid conflicts.

Note: This is the opposite direction from subagents, where project beats user. Skills prioritize personal customization; subagents prioritize project-specific definitions.
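Because the two precedence orders run in opposite directions for user and project scope, it can help to see them side by side. This is a small model of the orders stated in this guide, not Claude Code's actual resolution code. The scope labels and the `resolve` helper are assumptions for illustration.

```python
# Highest priority first, as stated in this guide.
SUBAGENT_PRECEDENCE = ["managed", "cli", "project", "user", "plugin"]
SKILL_PRECEDENCE = ["managed", "user", "project"]


def resolve(name, definitions, precedence):
    """Return the winning scope for `name`, or None if it is undefined.

    `definitions` maps a scope label to the set of names defined there.
    """
    for scope in precedence:
        if name in definitions.get(scope, set()):
            return scope
    return None
```

With a name defined at both user and project level, `resolve` returns "project" under the subagent order and "user" under the skill order.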

Anatomy of a skill

.claude/skills/
└── skill-name/
    ├── SKILL.md         # Main instructions (required)
    ├── template.md      # Template for Claude to fill in
    ├── examples/        # Example outputs
    └── scripts/         # Executable scripts

---
name: skill-name
description: When this skill should be used. Claude uses this
  to decide whether to auto-load the skill.
---

Instructions in markdown that Claude follows when the skill
is invoked. Reference supporting files relative to this directory.

Skills vs subagents: when to use which

| Use skills when… | Use subagents when… |
| --- | --- |
| You need shared reference knowledge | You need isolated context |
| Multiple agents need the same capability | The task produces large intermediate output |
| The capability is a workflow or process | The task needs restricted tool access |
| You want on-demand context injection | You want persistent agent-specific memory |

Skills are injected instructions. Subagents are isolated execution environments. They compose together: a subagent’s frontmatter can list skills: it should load.

Auto-discovery

Skills in .claude/skills/ within directories added via --add-dir are loaded automatically with live change detection. You can edit skills during a session without restarting. Claude scans skill descriptions to decide relevance and only loads full content when invoked or deemed relevant.


4. CLAUDE.md as Orchestrator

CLAUDE.md defines the orchestrator’s behavior. For multi-agent workflows, it should cover:

  • Startup protocol: What the main session does first on every run. Read state files, check timestamps, determine what work is due.
  • Agent roster: Which subagents exist, what each one does, and when to spawn each one. Include clear boundaries to prevent duplicate work.
  • Execution rules: Which tasks can run in parallel vs must run sequentially. How to handle dependencies between agents. When to skip a task (e.g., nothing in the queue).
  • State management protocol: Which files the orchestrator reads and writes. File ownership rules (which agent writes to which files). Conflict resolution and state cleanup rules.
  • Shutdown protocol: What to do when all tasks are complete. Update status files, write run summaries, flag failures for the next run.
  • Review gates: Where human approval is required before downstream work proceeds. The orchestrator should know not to stall production when the approval queue is empty, but also not to over-produce when nothing has been approved.

CLAUDE.md survives compaction

After /compact, Claude re-reads CLAUDE.md from disk. Instructions given only in conversation will be lost. Anything the orchestrator must always know belongs in CLAUDE.md, not in chat.


5. State Management: Filesystem as Shared Memory

Without shared databases or message queues, agents coordinate through files.

Design principles

  • Each file has a single owner. One agent writes to it; others read it. This prevents collision.
  • Use structured formats. JSON for machine-readable state, markdown for human-readable reports. YAML frontmatter in markdown files bridges both.
  • Include timestamps. Every state update should record when it happened so downstream agents can assess freshness.
  • Design for cold starts. Every session starts fresh. State files must contain everything an agent needs to resume work without conversation history.
  • Plan for growth. State files that grow unboundedly will eventually consume too much context. Build archival/cleanup rules into the orchestrator’s protocol.

state/
├── status.json          # Global status: active tasks, last run, agent health
├── weekly-plan.md       # Current plan (strategist writes, others read)
├── runs/                # Structured log per run
│   ├── 2026-03-15_09-00.json
│   └── ...
└── [domain-specific state files]
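The single-owner and timestamp principles can be enforced by whatever scripts touch the state directory. Below is a minimal sketch, assuming JSON state files. The `owner` and `updated_at` fields and both helper names are illustrative choices, not a format Claude Code prescribes.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_state(path, data, owner):
    """Atomically write a JSON state file stamped with owner and time."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = dict(data)
    payload["owner"] = owner
    payload["updated_at"] = datetime.now(timezone.utc).isoformat()
    # Write to a temp file in the same directory, then rename it over the
    # target, so readers never observe a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file on any failure
        raise
    return payload


def read_state(path, expected_owner=None):
    """Read a state file; optionally verify which agent wrote it."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if expected_owner is not None and payload.get("owner") != expected_owner:
        raise ValueError(f"{path} was not written by {expected_owner}")
    return payload
```

The rename step matters when parallel subagents read a file while its owner is updating it: each reader sees either the old version or the new one, never a partial write.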

Structured run reports

Each run should produce a machine-readable report:

{
  "timestamp": "ISO-8601",
  "tasks_evaluated": 4,
  "tasks_executed": 2,
  "tasks_skipped": 2,
  "agents_spawned": ["agent-a", "agent-b"],
  "outputs_produced": [
    {"file": "path/to/output", "agent": "agent-a", "status": "complete"}
  ],
  "failures": [],
  "decisions": [
    "Skipped agent-c: precondition not met"
  ]
}
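A report with these fields can be assembled from a simple per-task record. The sketch below assumes a hypothetical task record shape (`agent`, `executed`, plus `file` and `status` for executed tasks or `reason` for skipped ones). Only the output field names come from the example above.

```python
from datetime import datetime, timezone


def build_run_report(tasks, now=None):
    """Assemble a run report dict from per-task records."""
    now = now or datetime.now(timezone.utc)
    executed = [t for t in tasks if t["executed"]]
    skipped = [t for t in tasks if not t["executed"]]
    return {
        "timestamp": now.isoformat(),
        "tasks_evaluated": len(tasks),
        "tasks_executed": len(executed),
        "tasks_skipped": len(skipped),
        "agents_spawned": [t["agent"] for t in executed],
        "outputs_produced": [
            {"file": t["file"], "agent": t["agent"], "status": t["status"]}
            for t in executed if t["status"] == "complete"
        ],
        # Anything that ran but did not complete is recorded as a failure.
        "failures": [t["agent"] for t in executed if t["status"] != "complete"],
        "decisions": [f"Skipped {t['agent']}: {t['reason']}" for t in skipped],
    }


def report_filename(now):
    """File name in the state/runs/ style, e.g. 2026-03-15_09-00.json."""
    return now.strftime("%Y-%m-%d_%H-%M") + ".json"
```

The resulting dict can be passed to `json.dump` and written under state/runs/ using `report_filename` for the name.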

6. Execution: Headless Mode

Basic invocation

claude -p "Run." \
  --allowedTools "Read" "Write" "Edit" "Bash" "Glob" "Grep" "Agent" \
  >> logs/$(date +%Y-%m-%d_%H-%M).log 2>&1

  • -p runs Claude Code non-interactively (headless).
  • --allowedTools must explicitly include "Agent" for subagent spawning.
  • --dangerously-skip-permissions bypasses all permission checks for full autonomy. Only use in sandboxed environments without internet access. It does not prevent exfiltration of anything accessible in the execution environment, including credentials. Use --allowedTools for tighter, safer control.
  • The prompt can be minimal if CLAUDE.md carries the orchestration logic.
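If the trigger is a script rather than a shell one-liner, the same invocation can be built programmatically. This sketch assumes the `claude` binary is on PATH and uses only the `-p` and `--allowedTools` flags shown above. `build_headless_command` and `run_headless` are hypothetical wrapper names.

```python
import subprocess
from pathlib import Path

DEFAULT_TOOLS = ("Read", "Write", "Edit", "Bash", "Glob", "Grep", "Agent")


def build_headless_command(prompt="Run.", allowed_tools=DEFAULT_TOOLS,
                           needs_subagents=True):
    """Build the argv list for a headless run, mirroring the command above."""
    tools = list(allowed_tools)
    if needs_subagents and "Agent" not in tools:
        raise ValueError("add 'Agent' to allowed_tools to spawn subagents")
    return ["claude", "-p", prompt, "--allowedTools", *tools]


def run_headless(project_dir, log_path, **kwargs):
    """Run one headless session, appending stdout and stderr to a log.

    A relative `log_path` is resolved against the caller's working
    directory, not `project_dir`.
    """
    argv = build_headless_command(**kwargs)
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        result = subprocess.run(argv, cwd=project_dir, stdout=log,
                                stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

Passing the arguments as a list avoids shell quoting issues, and the early `ValueError` catches the easy mistake of dropping "Agent" from the allow-list.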

Scheduling with crontab

# Run every hour. The whole entry must sit on one line: cron does not support backslash line continuations.
0 * * * * cd /path/to/project && claude -p "Run." --allowedTools "Read" "Write" "Edit" "Bash" "Glob" "Grep" "Agent" >> logs/$(date +\%Y-\%m-\%d_\%H-\%M).log 2>&1

Note: The % characters are escaped as \% because crontab interprets unescaped % as newlines. The basic invocation above uses unescaped % which is correct for direct shell use.

Constraints:

  • The host must be running. Crontab requires an awake machine (or a cloud VM that stays on).
  • If the machine sleeps or reboots, missed runs are simply skipped.
  • Each invocation is a fresh session. Continuity comes from state files and agent memory, not conversation history.

Auto memory for the main session

The main session accumulates knowledge in ~/.claude/projects/<project>/memory/ (confirmed in official docs). The <project> path is derived from the git repo root, so all worktrees and subdirectories share one memory directory. Outside a git repo, the project root is used instead. This is separate from subagent memory. It’s the orchestrator’s own learning.


7. Observability and Quality Assurance

Three layers of observability

1. Structured run reports (machine-readable). The orchestrator writes a JSON report per run to state/runs/. This enables automated monitoring: grep for failures, chart output velocity, detect idle runs.

2. Quality review skill (agent-assisted). A skill that reviews outputs against defined criteria. It can be invoked by the orchestrator at the end of a run or by a human on demand, and it should also periodically review agent memory files for coherence and drift. For a deeper look at why separating evaluation from generation matters, and how to calibrate an evaluator agent to avoid the self-preference bias LLMs exhibit when grading their own work, see Two Patterns That Changed How I Think About Multi-Agent Systems.

3. Session transcript review (deep audit). Each claude -p run produces a JSONL transcript in ~/.claude/projects/. Parse these to audit: Did the orchestrator make reasonable decisions? Did subagents stay in scope? What was the token overhead vs productive work?
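The first layer is the easiest to automate. The following is a sketch of a monitor over state/runs/, assuming reports carry the `tasks_executed` and `failures` fields from the run report example in Section 5. `summarize_runs` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_runs(runs_dir):
    """Summarize run reports: total runs, idle runs, runs with failures."""
    summary = {"runs": 0, "idle_runs": [], "failed_runs": []}
    for report_path in sorted(Path(runs_dir).glob("*.json")):
        report = json.loads(report_path.read_text(encoding="utf-8"))
        summary["runs"] += 1
        if report.get("tasks_executed", 0) == 0:
            summary["idle_runs"].append(report_path.name)    # nothing was done
        if report.get("failures"):
            summary["failed_runs"].append(report_path.name)  # needs follow-up
    return summary
```

A high share of idle runs suggests the schedule is too frequent. Any entry in `failed_runs` is a candidate for the next run's retry logic.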

Failure modes to monitor

| Failure mode | Symptom | Mitigation |
| --- | --- | --- |
| Drift | Outputs slowly move off-spec | Periodic quality review against source-of-truth criteria |
| Memory noise | Agent MEMORY.md fills with irrelevant notes | Scheduled memory audits; 200-line curation pressure helps |
| Stale state | Status files grow indefinitely | Archival rules in orchestrator protocol |
| Duplication | Agents reproduce work already done | Strong task descriptions with clear boundaries |
| Silent failure | Subagent errors not retried | Failure logging + next-run retry logic in orchestrator |
| Token burn | Runs that produce nothing | Orchestrator should detect “nothing to do” early and exit |

Review cadence

| Period | Action |
| --- | --- |
| Every run (early days) | Skim structured run report, check outputs |
| Twice daily (stabilizing) | Review output queue, approve/reject, check agent memory |
| Once daily (steady state) | Review daily digest, approve queue, spot-check one agent memory |
| Weekly | Full quality scorecard, review all agent memories, clean state files |
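The weekly state cleanup can be scripted. This sketch assumes run reports are named in the `YYYY-MM-DD_HH-MM.json` style from the state directory layout. The `archive/` subdirectory and the 14-day default are arbitrary illustrative choices.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path


def archive_old_runs(runs_dir, keep_days=14, now=None):
    """Move run reports older than `keep_days` into runs_dir/archive/."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    runs_dir = Path(runs_dir)
    archive_dir = runs_dir / "archive"
    moved = []
    for report in sorted(runs_dir.glob("*.json")):
        try:
            stamp = datetime.strptime(report.stem, "%Y-%m-%d_%H-%M")
        except ValueError:
            continue  # skip files that don't follow the naming scheme
        if stamp < cutoff:
            archive_dir.mkdir(exist_ok=True)
            shutil.move(str(report), str(archive_dir / report.name))
            moved.append(report.name)
    return moved
```

Archiving rather than deleting keeps old reports available for audits while keeping them out of the directory the orchestrator reads on startup.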

8. Design Principles (from Anthropic)

These are distilled from Anthropic’s “Building Effective Agents” blog, their multi-agent research system engineering post, and their agent tools engineering post.

Start simple, add complexity only when it demonstrably improves outcomes

The most successful implementations use simple, composable patterns rather than complex frameworks. A single agent with good tools often outperforms a poorly designed multi-agent system.

Think like your agents

Build simulations, watch agents work step-by-step. Effective prompting relies on developing an accurate mental model of what the agent will do with your instructions.

Teach the orchestrator how to delegate

Each subagent needs: an objective, an output format, guidance on tools and sources, and clear task boundaries. Without detailed task descriptions, agents duplicate work, leave gaps, or fail to find necessary information.

Scale effort to query complexity

Embed scaling rules: simple tasks get one agent with few tool calls. Complex tasks get multiple subagents with clearly divided responsibilities. Prevent overinvestment in simple queries.

Tool design is critical

Agent-tool interfaces matter as much as human-computer interfaces. Each tool needs a distinct purpose and clear description. Bad tool descriptions send agents down wrong paths.

Start wide, then narrow

Search strategy should mirror expert human research: explore the landscape before drilling into specifics. Agents default to overly specific queries. Prompt them to start broad.

Subagent output to filesystem minimizes “game of telephone”

Rather than requiring all communication through the orchestrator, let subagents write directly to shared files and pass lightweight references back. This prevents information loss and reduces token overhead.
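The shape of such a lightweight reference might look like the following. This is an illustrative sketch of the data involved, not how subagents return results internally. The `publish_output` helper, its field names, and the 200-character summary cap are all assumptions.

```python
import hashlib
from pathlib import Path


def publish_output(output_dir, agent, filename, content, summary):
    """Write the full output to disk and return a small reference to it."""
    path = Path(output_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    encoded = content.encode("utf-8")
    return {
        "agent": agent,
        "file": str(path),
        "bytes": len(encoded),
        "sha256": hashlib.sha256(encoded).hexdigest(),  # lets readers detect changes
        "summary": summary[:200],  # keep the hand-off message short
    }
```

The orchestrator then carries only the reference in its context and reads the file itself when it actually needs the detail.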

Agents burn tokens fast

Single agents use ~4x more tokens than chat. Multi-agent systems use ~15x more. For economic viability, multi-agent systems require tasks where the value is high enough to justify the cost.
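As a rough worked example of those multipliers: both figures are relative to chat, so moving from a single agent to a multi-agent system is roughly a 3.75x step rather than 15x. The multipliers come from the figures above. The helper name and the token count in the usage note are made up for illustration.

```python
SINGLE_AGENT_MULTIPLIER = 4   # ~4x chat, per the figures above
MULTI_AGENT_MULTIPLIER = 15   # ~15x chat, per the figures above


def estimate_tokens(chat_tokens):
    """Scale a chat-sized token budget to agent and multi-agent usage."""
    return {
        "chat": chat_tokens,
        "single_agent": chat_tokens * SINGLE_AGENT_MULTIPLIER,
        "multi_agent": chat_tokens * MULTI_AGENT_MULTIPLIER,
        # Relative cost of multi-agent versus a single agent.
        "multi_vs_single": MULTI_AGENT_MULTIPLIER / SINGLE_AGENT_MULTIPLIER,
    }
```

For a task that would take 10,000 tokens in chat, this gives about 40,000 tokens for a single agent and about 150,000 for a multi-agent system.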


9. Extension Points

MCP servers

Connect to external services (APIs, databases, platforms). Configured in .claude/settings.json (project) or ~/.claude/settings.json (user). Subagents can access MCP servers listed in their mcpServers: frontmatter or inherited from the session.

Hooks

Execute custom commands at lifecycle events (tool execution, session boundaries, subagent stop). Useful for side effects like logging, linting, or triggering external notifications. Configured in settings.json. All registered hooks fire for matching events regardless of source.

Plugins

Bundle skills, agents, hooks, and MCP servers into installable units. Distributed via marketplaces. Plugin skills are namespaced (e.g., /plugin-name:skill-name) to avoid conflicts.

Rules

Modular instruction files in .claude/rules/. Loaded unconditionally or scoped to file path patterns. Use for conventions that should apply to specific file types or directories without consuming skill slots.


10. Checklist: Before You Build

Design

  • Verify the task value justifies the token cost: multi-agent systems use ~15x the tokens of chat, versus ~4x for a single agent
  • Define what the system produces (concrete deliverable format)
  • Define who consumes the output and how they review/approve it
  • Identify which tasks are truly independent (parallelize) vs dependent (sequence)

Build

  • Write CLAUDE.md as the orchestrator before writing any subagents
  • Design state file schemas before implementing agents that read/write them
  • Build the quality review skill early. You need it to evaluate everything else

Verify

  • Verify subagent memory filesystem paths on your machine (auto memory path ~/.claude/projects/<project>/memory/ and subagent memory at ~/.claude/agent-memory/<name>/ are both confirmed in official docs)

Ship

  • Start with one subagent, validate the full loop (trigger, orchestrate, spawn, execute, write state), then add more
  • Plan your human review cadence before going autonomous

Appendix: Verification Log

Every guideline in this document was checked against primary sources. Below is the full audit trail, grouped by verification status.

40 checks confirmed from official Anthropic docs (code.claude.com)
  1. Skills live in .claude/skills/ (project) and ~/.claude/skills/ (user) - skills docs
  2. Subagents live in .claude/agents/ (project) and ~/.claude/agents/ (user) - settings docs, subagents docs
  3. Skills and subagents are complementary, not competing systems - features overview
  4. Subagents cannot spawn subagents - subagents docs
  5. Subagent priority: managed > CLI flag > project > user > plugin - features overview
  6. Skills priority: managed/enterprise > user/personal > project - skills docs, features overview
  7. Subagents have persistent memory across sessions - changelog
  8. Memory scopes are user, project, and local - changelog + subagents docs
  9. First 200 lines of MEMORY.md injected into subagent system prompt - subagents docs
  10. Read, Write, Edit tools auto-enabled for memory management - subagents docs
  11. Omitting tools: field grants access to all available tools - docs imply this behavior
  12. Subagents do not receive the main conversation history - SDK subagents
  13. Parent receives only the subagent’s final message - SDK subagents
  14. Model options are sonnet, opus, haiku (and full model IDs); omitting inherits parent model - SDK TypeScript
  15. Subagents can load skills via skills: frontmatter - subagents docs
  16. Preloaded skills get full content injected at startup - skills docs
  17. Skills auto-discover with live change detection in --add-dir - skills docs
  18. In regular sessions, skill descriptions loaded but full content is lazy - skills docs
  19. CLAUDE.md is read at the start of every session - memory docs
  20. CLAUDE.md survives compaction - memory docs
  21. CLAUDE.md can live at project root or .claude/CLAUDE.md - memory docs, settings
  22. -p flag runs Claude Code non-interactively (headless) - headless docs
  23. --allowedTools uses permission rule syntax - headless docs
  24. Tool name for spawning subagents is Agent - SDK TypeScript, hooks docs
  25. --dangerously-skip-permissions is a real CLI flag - best practices, settings
  26. --dangerously-skip-permissions carries real security risk - devcontainer docs, best practices
  27. Auto memory path is ~/.claude/projects/<project>/memory/ - memory docs
  28. Outside a git repo, the project root is used for auto memory - memory docs
  29. Cowork has no memory across sessions - support.claude.com
  30. Cowork desktop app must remain open for sessions - support.claude.com
  31. Cowork scheduled tasks only run when computer is awake and app is open - support.claude.com
  32. Each Cowork scheduled task runs as its own session - support.claude.com
  33. Agent Teams are experimental and require environment variable - agent teams docs
  34. Hooks merge across all sources (all fire for matching events) - features overview
  35. MCP servers override by name: local > project > user - features overview
  36. .claude/rules/ supports path-scoped modular instructions - memory docs
  37. Skills and commands have been merged - skills docs
  38. File-based subagents load project-level CLAUDE.md - agent loop docs: “Each subagent starts with a fresh conversation (no prior message history, though it does load its own system prompt and project-level context like CLAUDE.md).”
  39. claude -p with --allowedTools "Agent" successfully spawns subagents - SDK subagents, headless docs, SDK TypeScript
  40. Subagent memory filesystem path is ~/.claude/agent-memory/<name>/ - subagents docs. Official docs confirm user scope at ~/.claude/agent-memory/<name>/, project scope at .claude/agent-memory/<name>/, local scope at .claude/agent-memory-local/<name>/.
5 checks confirmed from Anthropic engineering blog
  1. Orchestrator-workers is a recommended multi-agent pattern - Building Effective Agents
  2. Subagent output to filesystem reduces information loss - Multi-Agent Research System
  3. Without detailed task descriptions, agents duplicate work - Multi-Agent Research System
  4. Single agents use ~4x tokens vs chat; multi-agent ~15x - Multi-Agent Research System
  5. Start simple, add complexity only when it demonstrably improves outcomes - Building Effective Agents
1 check unverified, needs empirical testing
  1. Agent Teams use ~3-4x tokens of a single session - claudefa.st guide (secondary, citing Anthropic docs). No primary Anthropic source found. Needs verification.
4 checks corrected during review
  1. Orchestrator should be a subagent - Wrong. Because subagents cannot spawn subagents, the orchestrator must be the main session (CLAUDE.md), not a subagent.
  2. Skills priority: user is lower than project - Wrong. Official docs confirm enterprise > personal > project. User/personal skills have higher priority. Opposite direction from subagents.
  3. “Subagents do NOT receive CLAUDE.md” (blanket statement) - Overstated. Now confirmed: the agent loop docs state subagents load “project-level context like CLAUDE.md.” Updated body text accordingly.
  4. --dangerously-skip-permissions was unverified - Now confirmed. Found in official best practices, settings reference, and devcontainer docs.

Changelog

  • v1.4 (Mar 2026): Added companion post link in Section 7 (evaluator design and harness pruning).
  • v1.3 (Mar 2026): Upgraded claim 46 (memory paths) to confirmed per official subagent docs. Corrected MCP config paths to settings.json. Clarified model inheritance (omit field to inherit, not explicit inherit value). Fixed scorecard/body inconsistencies.
  • v1.2 (Mar 2026): Confirmed subagents load CLAUDE.md (claim 49). Added full model IDs to model options (claim 14).
  • v1.1 (Mar 2026): Initial verified release. 50 claims audited.

I'm an independent engineer (ex-eBay) who designs and builds production AI systems. I work deep in the Claude Code and MCP ecosystem, document what I find, and take on contract work. Currently taking on projects. Get in touch.