
Coordinator Mode: A Working Multi-Agent System, From the Source

Summary

There is a production multi-agent system inside your laptop. Three layers, file-based IPC, byte-identical fork prefixes for free KV-cache sharing, XML task notifications, prompt-level concurrency rules. Here is what Claude Code's coordinator mode actually does — and what your harness should copy.

Part 4 of the Harness Engineering deep dive. Prerequisite: Part 02: The Four Primitives; read it first if you have not seen the subagents-and-filesystem framing.

Flat team to Manager / Coordinator / Worker

Coordinator mode replaces peer chaos with hierarchy. One job per layer: plans, routes, executes.

Why This Matters

Most multi-agent demos collapse the same way. A flat team of agents shares one conversation. They step on each other’s writes. They re-read each other’s outputs. The shared context grows linearly with every worker, so by step four the planner is reasoning over a transcript longer than its useful attention window. The demo works for ten minutes and falls apart the moment a real codebase shows up.

Every public explainer on coordinator mode skips the cache-prefix question, the lock-retry budget, and the envelope-injection vector. That is what this chapter covers — the parts that determine whether a coordinator system survives contact with a real workload, not the parts that fit on a slide.

Five independent teams converged on the same shape between 2025 and 2026 — Claude Code, LangChain Deep Agents, OpenAI Codex, Manus, Cursor [hwc2026]. A coordinator owns the plan and the conversation. Workers run in isolated contexts and return a summary. The filesystem is the shared memory. You can read the LangChain blog and nod, or you can read the actual implementation that ships inside the CLI you already have installed. This chapter reads the source. Every claim cites a file path and a line range [cci2026].

The reason coordinator mode is the hero chapter of this series is not that Anthropic’s design is the only correct one. It is that coordinator mode is a working production multi-agent system that a staff engineer can read end-to-end in an afternoon — input schema, fork primitive, IPC envelope, lock protocol, worktree lifecycle. Once you have read it, the rest of the multi-agent literature reads differently. You stop arguing about message buses and start arguing about cache prefixes.

FLAT TEAM vs HIERARCHY

FLAT TEAM (what most demos do)

  [A] <-+
  [B] <-+-- one shared conversation: context bloat, stepping on writes
  [C] <-+

HIERARCHY (what production systems do)

        [COORDINATOR]      owns the plan, the user, the conversation
          /    |    \
         v     v     v
      [W1]  [W2]  [W3]     isolated contexts, one job each
          \    |    /
           v   v   v
       .claude/scratch/    filesystem is the shared memory

Takeaway: A multi-agent system is not a team of peers. It is a coordinator, isolated workers, and a filesystem.

The Shape: Three Layers, One Job Each

Coordinator mode in Claude Code is one feature flag — COORDINATOR_MODE — wired to one environment variable, CLAUDE_CODE_COORDINATOR_MODE=1, plus the supporting tools that already exist for non-coordinator use [cci2026, §2]. There is no separate “multi-agent runtime.” Turning the flag on swaps the system prompt and forces every spawned subagent to run asynchronously. Everything else — AgentTool, SendMessage, forkSubagent, worktree creation, scratchpad files — already exists in single-agent Claude Code. Coordinator mode is a configuration, not a new product.

The three layers split cleanly along context boundaries. The coordinator holds the only context the user sees. It owns the plan, decides which workers to spawn, and decides when the task is done. Workers run with fresh contexts populated by their spawn prompt alone. They never see the coordinator’s transcript, only what the coordinator chooses to pass in. The scratchpad — .claude/scratch/ — is a shared key-value namespace on disk that any worker can read or write, gated by the tengu_scratch flag [cci2026, §2].

What you do not see in this list is a message bus, a graph runtime, or a dependency declaration language. There is no DAG. The coordinator is a Claude model with a system prompt and a tool that spawns more Claude models. Coordination happens in natural language and in files. The implementation lesson here is that “multi-agent system” does not require new infrastructure. It requires three things: spawn an isolated worker, receive its result, share state across workers. Those three things are file-system primitives plus an XML envelope.

COORDINATOR / WORKERS / SCRATCHPAD

              +----------------------------+
              |        COORDINATOR         |
              |  • owns plan + user turn   |
              |  • holds only visible ctx  |
              |  • decides done            |
              +----------------------------+
                 |  spawn:    AgentTool
                 |  continue: SendMessage
                 v
      +----------+    +----------+    +----------+
      | WORKER 1 |    | WORKER 2 |    | WORKER 3 |
      | fresh ctx|    | fresh ctx|    | fresh ctx|
      | one job  |    | one job  |    | one job  |
      +----------+    +----------+    +----------+
         |  results return as <task-notification>
         |  XML to the coordinator; findings go to:
         v
      +----------------------------------------------+
      |               .claude/scratch/               |
      |  shared filesystem KV (tengu_scratch flag)   |
      +----------------------------------------------+

Takeaway: Three layers, three primitives — spawn (AgentTool), notify (XML), share (filesystem). No bus, no DAG.

AgentTool: The Spawn Primitive

AgentTool is the single tool a coordinator uses to start a worker [cci2026, §1]. It lives at tools/AgentTool/AgentTool.tsx — roughly 4000 lines — and its input schema is defined at lines 81–124. The base schema is intentionally small:

const baseInputSchema = lazySchema(() => z.object({
  description: z.string(),
  prompt: z.string(),
  subagent_type: z.string().optional(),
  model: z.enum(['sonnet', 'opus', 'haiku']).optional(),
  run_in_background: z.boolean().optional(),
}));

Coordinator mode and KAIROS (the persistent-assistant runtime) conditionally extend it with name, team_name, mode, isolation ('worktree' | 'remote'), and cwd (KAIROS-only) [cci2026, §1]. The schema does the work that most multi-agent frameworks delegate to a graph definition. The coordinator names the worker (description), gives it a task (prompt), picks a built-in role (subagent_type), and chooses a model size and isolation level. That is the entire surface area for spawning a worker.

The three isolation modes correspond to three operational risks the coordinator is willing to accept. Default isolation puts the worker in the same working directory as the parent — fast, but the parent and child share a filesystem and can race on writes. Worktree isolation creates a fresh git worktree under .claude/worktrees/<slug> and runs the worker there, so two write-heavy workers can edit the same file without colliding. Remote isolation calls teleportToRemote() and creates a CCR (Claude Cloud Run) session for the worker — this branch is dead-code-eliminated from external builds (a build-time constant makes the "external" === 'ant' check statically false outside Anthropic), so the implementation detail is internal-only [cci2026, §1].

| Mode | Filesystem | Latency | Failure blast radius | When to use |
|---|---|---|---|---|
| default (same-dir) | shared with parent | lowest | full parent repo | read-only research, single-file edits |
| worktree | fresh git worktree | +1 git checkout | sandboxed branch | parallel write-heavy tasks |
| remote | hosted sandbox | network round-trip | none (remote VM) | long-horizon, untrusted, or capacity-shifted |
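
To make the surface area concrete, here is a hypothetical spawn input against the base schema plus the coordinator-mode extensions. Every value is invented; name and isolation are the extension fields, not base-schema fields:

const spawnInput = {
  description: 'migrate-auth-tests',   // short label the coordinator tracks
  prompt: 'Port the tests in src/auth/ to vitest. Report every file you change.',
  subagent_type: 'general-purpose',    // one of the five built-in registry entries
  model: 'sonnet',
  run_in_background: true,             // redundant under coordinator mode, which forces async anyway
  name: 'auth-migrator',               // coordinator-mode extension
  isolation: 'worktree',               // write-heavy task: give it its own worktree
};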

The schema is also where Claude Code makes the only enforcement decision in the entire multi-agent stack: which subagent types exist. The built-in registry has five entries [cci2026, §1]: general-purpose, Explore, Plan, verification, claudeCodeGuide. Three are one-shot — they skip emitting agentId, the SendMessage trailer, and the per-turn usage block, saving roughly 135 characters per spawn. At Claude Code’s actual run volume (the source analysis pins it near 34M runs per week [cci2026, §1]), the arithmetic says this code was written by people who read their API bill.

Takeaway: AgentTool is a five-field schema with three isolation modes. Everything else in coordinator mode is built on it.

The Sync-vs-Async Decision

The single line of code that makes coordinator mode different from regular Claude Code is the sync/async decision at tools/AgentTool/AgentTool.tsx lines 557–567 [cci2026, §1]:

const shouldRunAsync = (
  run_in_background === true ||
  selectedAgent.background === true ||
  isCoordinator ||
  forceAsync ||           // fork subagent experiment
  assistantForceAsync ||  // KAIROS mode
  (proactiveModule?.isProactiveActive() ?? false)
) && !isBackgroundTasksDisabled;

Six independent conditions can flip a single spawn from synchronous (the coordinator blocks waiting for the result) to asynchronous (the coordinator gets a placeholder back immediately and the worker continues in the background). Two are user-controlled — the run_in_background flag on the call and a per-agent background: true declaration. Two are system modes — coordinator mode (isCoordinator) and KAIROS (assistantForceAsync), both of which force every spawn async. One is the fork-subagent experiment (forceAsync), discussed in the next section. The sixth is proactiveModule?.isProactiveActive() — proactive runs auto-async to avoid blocking the user turn. A separate top-level override, !isBackgroundTasksDisabled, can veto all six and force everything sync (used in test runs and when background scheduling is disabled).

SYNC-OR-ASYNC DECISION (shouldRunAsync)

  spawn request
       |
       v
  +---------------------------------------+
  | isBackgroundTasksDisabled ?           |-- yes --> SYNC (override)
  +---------------------------------------+
       | no
       v
  +---------------------------------------+
  | ANY of these true?                    |
  |   run_in_background === true          |
  |   agent.background === true           |
  |   isCoordinator                       |-- any yes --> ASYNC
  |   forceAsync (fork experiment)        |
  |   assistantForceAsync (KAIROS)        |
  |   proactiveModule.isProactiveActive   |
  +---------------------------------------+
       | all no
       v
      SYNC

The reason coordinator mode forces all spawns async is structural, not a performance optimization. A synchronous subagent holds the coordinator’s turn open. The coordinator cannot react to anything else — not a new user message, not a SendMessage from another worker, not a cron firing — until the synchronous child returns. The source comment makes this explicit for KAIROS [cci2026, §6]: “Synchronous subagents hold the main loop’s turn open — the daemon’s inputQueue backs up, and the first overdue cron catch-up on spawn becomes N serial subagent turns blocking all user input.” The same logic applies to coordinator mode: a coordinator that blocks on one worker is no longer coordinating. It is just a wrapper around the worker.

The cost of forcing all-async is a different kind of complexity. Every spawn returns a placeholder ID. Every result arrives as an out-of-band <task-notification> message. The coordinator must reason about partial completion, parallel state, and worker failure without the convenience of a synchronous return value. The harness pays a complexity tax so that the coordinator can stay responsive.
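
That bookkeeping is small but unavoidable. A minimal sketch, with all names ours rather than Claude Code's: map every placeholder ID to a pending entry, retire entries as notifications arrive, and sweep deadlines so a crashed worker cannot hang the coordinator forever (the synthesized 'killed' status is one way to close that hole; see the Gotchas table):

type TaskStatus = 'completed' | 'failed' | 'killed';

interface PendingTask {
  taskId: string;
  description: string;
  deadline: number;  // epoch ms; on expiry, synthesize a 'killed' notification
}

const pending = new Map<string, PendingTask>();

function onSpawn(taskId: string, description: string, timeoutMs: number): void {
  pending.set(taskId, { taskId, description, deadline: Date.now() + timeoutMs });
}

function onNotification(taskId: string, _status: TaskStatus): void {
  pending.delete(taskId);  // worker reported; nothing left to track
}

function sweepDeadlines(emit: (taskId: string, status: TaskStatus) => void): void {
  const now = Date.now();
  for (const task of pending.values()) {
    if (task.deadline <= now) {
      pending.delete(task.taskId);
      emit(task.taskId, 'killed');  // synthetic out-of-band notification
    }
  }
}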

Takeaway: Coordinator mode forces every spawn async. Every result arrives out of band.

Fork Subagent: Why Parallelism Is Almost Free

The fork-subagent path is where Claude Code’s harness gets clever about money. Spawning a worker normally means paying for a full prompt re-encoding on the API side. The worker has its own system prompt, its own tool list, its own conversation prefix. None of those bytes hit the parent’s prompt cache, so the API charges full freight for the first cached read. With a coordinator that spawns three or four workers per task, that adds up.

The fork primitive — utils/forkedAgent.ts — sidesteps the cost by guaranteeing that the child’s API request prefix is byte-identical to the parent’s [cci2026, §4]. The type that enforces this is CacheSafeParams:

export type CacheSafeParams = {
  systemPrompt: SystemPrompt
  userContext: { [k: string]: string }
  systemContext: { [k: string]: string }
  toolUseContext: ToolUseContext
  forkContextMessages: Message[]
}

A child built from CacheSafeParams inherits the parent’s entire conversation context and the parent’s system prompt, character for character. The API sees the same prefix on the child’s request as on the parent’s, so it serves the child from the same prompt cache. The source comment is direct: “Parallelism is basically free.” Anthropic prices cache reads at roughly 0.1× the standard input rate; the first write costs ~1.25× input, but every fork after that reads from the same prefix, so per-fork input cost approaches one-tenth of an uncached request [anthropic-pricing]. Across thousands of parallel forks, that ratio is the reason coordinator mode can spawn workers at all — the largest single line item in an agent budget collapses by an order of magnitude.
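
The arithmetic deserves a back-of-envelope check. A sketch using the [anthropic-pricing] ratios cited above; the prefix size and fork count are assumptions for illustration:

// Back-of-envelope fork economics. Ratios from [anthropic-pricing];
// prefix size and fork count are assumed for illustration.
const prefixTokens = 50_000;  // parent system prompt + transcript, shared byte-for-byte
const forks = 4;

// Without prefix sharing: each fork re-encodes the full prefix at the 1.0x input rate.
const uncachedInputUnits = forks * prefixTokens * 1.0;  // 200,000 token-units

// With fork: the parent's own request already paid the ~1.25x cache write, once,
// whether or not it forks; each fork then reads the same prefix at ~0.1x.
const forkedInputUnits = forks * prefixTokens * 0.1;    // 20,000 token-units

// Marginal per-fork input cost lands at ~10% of an uncached spawn: the
// order-of-magnitude collapse behind 'parallelism is basically free'.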

The fork agent itself is declared minimally [cci2026, §1]:

export const FORK_AGENT = {
  agentType: 'fork',
  tools: ['*'],
  maxTurns: 200,
  model: 'inherit',
  permissionMode: 'bubble',
  source: 'built-in',
  getSystemPrompt: () => '',
}

tools: ['*'] inherits the parent’s full tool set. model: 'inherit' uses the parent’s model — switching models would break the cache. getSystemPrompt: () => '' is the trick. The agent-definition hook returns empty, and the runtime then supplies the parent’s system prompt to the child via CacheSafeParams.systemPrompt. The child does not run prompt-less; it runs with the parent’s prompt, byte for byte, sourced from a different code path. The hook returns empty because contributing any new bytes here would break the prefix.

Two safety guards keep fork from melting down. First, a recursive-fork guard scans incoming messages for a <fork-boilerplate> tag and refuses to spawn another fork when one is already in flight [cci2026, §1] — without this, a coordinator that habitually forks could fork-bomb the API. Second, all fork placeholder results are the literal string 'Fork started -- processing in background' so that every coordinator transcript that has spawned a fork has byte-identical placeholder text. Even the “I am waiting for my workers” turn is cache-friendly.

The state-isolation rules in createSubagentContext() (lines 345–462) finish the design [cci2026, §4]. readFileState is cloned, not shared, so the child can read without contaminating the parent. contentReplacementState is cloned for cache-sharing — same reason. abortController is a new child controller linked to the parent’s, so killing the parent cascades. setAppState is a no-op in the child, preventing the child from corrupting parent UI state. setAppStateForTasks is the only state hook that always reaches the root store — it has to, because the parent needs to know when the child terminates, otherwise zombie processes accumulate.
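
Reduced to a sketch, the rules look like this. The ToolUseContext shape is simplified and partly assumed; treat this as a paraphrase of the analysis, not the shipped code:

// Minimal stand-in for the real ToolUseContext (shape assumed).
interface ToolUseContext {
  readFileState: Record<string, unknown>;
  contentReplacementState: Record<string, unknown>;
  abortController: AbortController;
  setAppState: (s: unknown) => void;
  setAppStateForTasks: (s: unknown) => void;
}

function createSubagentContext(parent: ToolUseContext): ToolUseContext {
  const childAbort = new AbortController();
  // Killing the parent cascades to the child, never the reverse.
  parent.abortController.signal.addEventListener('abort', () => childAbort.abort());
  return {
    ...parent,
    readFileState: structuredClone(parent.readFileState),                     // clone: child reads without contaminating parent
    contentReplacementState: structuredClone(parent.contentReplacementState), // clone: keeps cache sharing safe
    abortController: childAbort,                                              // linked child controller, not shared
    setAppState: () => {},                                                    // no-op: child cannot corrupt parent UI state
    setAppStateForTasks: parent.setAppStateForTasks,                          // the one hook that always reaches the root store
  };
}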

Takeaway: Fork makes parallel workers nearly free by holding the API prefix byte-identical. The design is one type (CacheSafeParams) and one rule: never change anything the cache key depends on.

The Task-Notification Protocol

When a worker finishes, the coordinator does not get a function return value. It gets a message — an XML envelope appended to its conversation as a user-role message. The envelope shape is fixed [cci2026, §2]:

<task-notification>
  <task-id>{agentId}</task-id>
  <status>completed|failed|killed</status>
  <summary>{summary}</summary>
  <result>{agent's final text response}</result>
  <usage>
    <total_tokens>N</total_tokens>
    <tool_uses>N</tool_uses>
    <duration_ms>N</duration_ms>
  </usage>
</task-notification>

Three design choices here are worth lifting verbatim into any harness you build. First, the envelope is XML and not JSON. Anthropic’s own tool-use guidance recommends XML for nested payloads, and the practical reason shows up here: JSON’s escaping rules force the worker’s result to be sanitized, which loses information. XML lets the worker’s natural-language result sit in the envelope without escaping, and tag boundaries survive the language model’s parse without an escape-character minefield.

Second, every notification carries usage data — token count, tool-use count, duration. The coordinator can reason about cost and time as part of its planning, not as an out-of-band ops concern. When the coordinator decides whether to spawn a second worker on a borderline task, the first worker’s <usage> block is right there in the transcript. This is the same architectural choice as exposing latency to a graph runtime — except the runtime is a language model and the API is a tag.

Third, status is a small enum: completed | failed | killed. There is no partial, no streaming, no degraded. A worker either finished, errored, or was terminated. The coordinator’s reasoning surface is correspondingly small. Most multi-agent frameworks add status states for engineering convenience; Claude Code subtracts them for coordinator-prompt clarity.
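
A corollary of all three choices: the envelope should be assembled by trusted harness code, never echoed from worker stdout (see the injection gotcha near the end). A sketch of that builder; the field names follow the envelope above, everything else is ours:

type TaskStatus = 'completed' | 'failed' | 'killed';

interface TaskResult {
  taskId: string;
  status: TaskStatus;
  summary: string;
  result: string;        // worker's final text, embedded without JSON-style escaping
  totalTokens: number;
  toolUses: number;
  durationMs: number;
}

// Trusted code builds the XML; worker output only ever fills the leaf values.
function buildTaskNotification(r: TaskResult): string {
  return [
    '<task-notification>',
    `  <task-id>${r.taskId}</task-id>`,
    `  <status>${r.status}</status>`,
    `  <summary>${r.summary}</summary>`,
    `  <result>${r.result}</result>`,
    '  <usage>',
    `    <total_tokens>${r.totalTokens}</total_tokens>`,
    `    <tool_uses>${r.toolUses}</tool_uses>`,
    `    <duration_ms>${r.durationMs}</duration_ms>`,
    '  </usage>',
    '</task-notification>',
  ].join('\n');
}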

What Goes Wrong Without This:

WHAT GOES WRONG WITHOUT A FIXED RESULT ENVELOPE

Symptom: Coordinator mis-parses worker results, drops half the output.
Cause:   Worker returned free text. JSON-escaped quotes broke the parser.
Fix:     A fixed envelope, not a smarter parser.

Symptom: Coordinator over-spends on parallel workers.
Cause:   Result format omits per-worker token usage.
Fix:     Put usage in the envelope so cost is part of planning.

Symptom: Coordinator hangs waiting for a worker that already crashed.
Cause:   Failure path uses a different envelope than the success path.
Fix:     One envelope with a status enum prevents the silent loss.

Takeaway: A fixed XML envelope with status, summary, result, and usage is the entire wire protocol. Subtract states until the coordinator prompt can describe them in a sentence.

File-Based IPC: Mailboxes, Locks, and .claude/scratch/

The transport layer underneath the XML envelope is the filesystem. There is no broker, no queue, no socket — just files on disk with two specific tricks [cci2026, §3].

Worker output lives at <project-temp>/<session-id>/tasks/<taskId>.output. The path components matter: project-temp scopes by project, session-id scopes by Claude Code session, taskId is unique per spawn. Two open flags are non-negotiable. The file is opened with O_NOFOLLOW so that a symlink attack — an attacker pre-creating tasks/<predictable-id>.output as a symlink to ~/.ssh/authorized_keys — fails immediately. The maximum per-task output is capped at 5GB, which sounds large until you remember that a runaway worker streaming binary data can fill a disk in minutes.
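
Node exposes the flags needed to reproduce the hardened open. A sketch; the O_EXCL refusal to reuse an existing file and the 0o600 mode are our additions on top of what the source describes, and O_NOFOLLOW is POSIX-only:

import { openSync, writeSync, closeSync, constants as fsc } from 'node:fs';

const MAX_OUTPUT_BYTES = 5 * 1024 ** 3;  // the 5GB per-task cap

function openTaskOutput(path: string) {
  // O_NOFOLLOW: fail instead of following a pre-planted symlink.
  // O_EXCL: fail if the file already exists (our addition).
  const fd = openSync(path, fsc.O_WRONLY | fsc.O_CREAT | fsc.O_EXCL | fsc.O_NOFOLLOW, 0o600);
  let written = 0;
  return {
    write(chunk: Buffer): void {
      written += chunk.length;
      if (written > MAX_OUTPUT_BYTES) throw new Error('task output exceeds 5GB cap');
      writeSync(fd, chunk);
    },
    close(): void {
      closeSync(fd);
    },
  };
}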

Task IDs use a prefix-plus-eight-base-36 format. The prefix encodes the task type: b (bash), a (agent), r (remote), t (teammate), w (workflow), m (monitor), d (dream) [cci2026, §3]. Eight base-36 characters give roughly 2.8 trillion combinations — enough that high-frequency spawns within a session, or two coordinators across different sessions on the same machine, have a negligible collision probability. (sessionId already scopes uniqueness; the random suffix is the second layer.) The prefix is debuggability collateral: when you find a file at tasks/a3kx9zq2.output, the leading a tells you it was an agent spawn without looking up the task table.
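
The ID format is cheap to generate. A sketch; the prefix table is from the source, the generator is ours (Math.random is fine here because uniqueness, not unpredictability, is the goal):

type TaskPrefix = 'b' | 'a' | 'r' | 't' | 'w' | 'm' | 'd';  // bash, agent, remote, teammate, workflow, monitor, dream

function newTaskId(prefix: TaskPrefix): string {
  let suffix = '';
  for (let i = 0; i < 8; i++) {
    suffix += Math.floor(Math.random() * 36).toString(36);  // eight base-36 chars: ~2.8 trillion ids
  }
  return prefix + suffix;
}
// newTaskId('a') might yield 'ak3x9zq2f': the leading 'a' says agent spawn at a glance.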

Teammate-to-teammate communication uses a different file pattern: .claude/teams/{team_name}/inboxes/{agent_name}.json. Two teammates writing to the same mailbox would race, so writes are gated by a lock with explicit retry semantics — 10 retries with 5-100ms exponential backoff [cci2026, §3]. The numbers are tuned for short-lived contention: if a worker cannot acquire the lock in roughly one second of jittered retries, the contention is structural and the worker is supposed to fail loudly rather than block.
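
The retry loop falls out of those numbers. A sketch, assuming an O_EXCL lockfile as the mutex; the analysis gives the retry count and backoff band but not the exact lock mechanism:

import { openSync, closeSync, unlinkSync, constants as fsc } from 'node:fs';

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// 10 retries, 5-100ms jittered exponential backoff, then fail loudly.
async function acquireMailboxLock(lockPath: string): Promise<() => void> {
  for (let attempt = 0; attempt < 10; attempt++) {
    try {
      const fd = openSync(lockPath, fsc.O_WRONLY | fsc.O_CREAT | fsc.O_EXCL);  // atomic create-or-fail
      return () => { closeSync(fd); unlinkSync(lockPath); };                   // caller invokes to release
    } catch {
      const backoff = Math.min(100, 5 * 2 ** attempt);     // 5, 10, 20, 40, 80, 100, 100, ...
      await sleep(backoff * (0.5 + Math.random() * 0.5));  // jitter against thundering-herd retries
    }
  }
  throw new Error(`mailbox lock contention on ${lockPath}: structural, do not block`);
}

At full backoff the ten sleeps here sum to roughly 0.7 seconds before jitter, the same order as the one-second budget above; the source's exact schedule may differ.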

The scratchpad — .claude/scratch/ — is the third file pattern and the only one designed for unstructured cross-worker knowledge. It is feature-gated by tengu_scratch and is described in the source as “persistent cross-worker knowledge” [cci2026, §2]. Workers write findings there with arbitrary keys; later workers (or later sessions) read them. The pattern is deliberately minimal: no schema, no TTL, no locking. The coordinator’s system prompt tells workers when to read and write. The filesystem is the bus.

| Path | Producer | Consumer | Concurrency control |
|---|---|---|---|
| <project-temp>/<session-id>/tasks/<taskId>.output | one worker | coordinator | O_NOFOLLOW, 5GB cap |
| .claude/teams/<team>/inboxes/<agent>.json | any teammate | one named agent | 10× lock retry, 5-100ms backoff |
| .claude/scratch/<key> | any worker | any worker | prompt-level, no lock |
| .claude/worktrees/<slug>/ | git | one worker | git’s own locking |

Takeaway: Filesystem is the bus. O_NOFOLLOW for security, 10× retry for write contention, prefix-tagged IDs for debuggability. No broker, no socket.

Prompt-Level Concurrency Rules (Not Code-Enforced)

The most surprising design choice in coordinator mode is what the source calls “concurrency rules (prompt-level, not code-enforced)” [cci2026, §2]. The system prompt is roughly 6000 characters (lines 111–369 in the source), and a chunk of it spells out concurrency policy in English:

  • Read-only tasks: parallel freely.
  • Write-heavy tasks: one at a time per file set.
  • Verification: can run alongside implementation on different areas.

There is no scheduler enforcing these rules. There is no graph runtime preventing two workers from clobbering the same file. There is a prompt that tells the coordinator, in natural language, when parallel spawns are safe and when they are not. The coordinator’s job — the coordinator-model’s job — is to read its own plan, classify each step, and serialize the write-heavy ones.

The choice trades guarantees for flexibility. A code-enforced scheduler would refuse to spawn two write-heavy workers on the same file, full stop. The prompt-level rule allows the coordinator to make domain-specific judgments — for example, two workers editing different functions in the same file may be safe in practice even though static analysis cannot prove it. The price of that flexibility is that the rules can be violated, and the only line of defense against violation is the coordinator’s reasoning quality. When the coordinator misjudges, two workers race and one’s edits are lost — silently. There is no detection mechanism in coordinator mode for this case. The harness does not diff worker outputs against each other, and a clobbered edit looks identical to an edit that succeeded. This is a real production silent-loss vector, not a theoretical one, and it is the strongest reason to route write-heavy spawns to worktrees even when the prompt suggests they are safe.

The mitigation is structural rather than algorithmic. If two workers might race on writes, the coordinator can spawn them with isolation: 'worktree' and they get fresh git worktrees. The race becomes a merge problem instead of a write-clobber problem, and merges fail loudly. The pattern is: “trust the prompt for safe cases, route to worktrees when the prompt cannot rule out unsafe cases.” Code-enforced schedulers cannot do this trade-off because they cannot read the worker’s intent. The harness leaves the trade-off to the coordinator and provides isolation as the escape hatch.

Takeaway: Concurrency is a prompt rule, not a scheduler invariant. Worktree isolation is the escape hatch when the prompt cannot guarantee safety.

SendMessage: Inter-Agent Wakeups

AgentTool starts a worker. SendMessage continues one [cci2026, §7]. The two together cover the full lifecycle: spawn, wake, terminate. The implementation lives at SendMessageTool.ts lines 800–873.

The to field accepts five address types: an agent’s name, an agent’s id, the broadcast literal "*", a Unix-domain-socket URL "uds:<socket>", and a remote-control bridge URL "bridge:<session-id>". The first two are local. "*" fans out to every agent in scope. "uds:" and "bridge:" reach across processes and sessions respectively. The same tool handles both the local single-process case and cross-host coordination — the address space is uniform, the transport is the variable.
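
The address space is small enough to capture in a type. A sketch of the five-way dispatch; the parser and the knownIds disambiguation are ours:

type Address =
  | { kind: 'name'; name: string }          // local agent by name
  | { kind: 'id'; id: string }              // local agent by id
  | { kind: 'broadcast' }                   // "*": every agent in scope
  | { kind: 'uds'; socket: string }         // cross-process, same host
  | { kind: 'bridge'; sessionId: string };  // cross-session via remote-control bridge

function parseAddress(to: string, knownIds: Set<string>): Address {
  if (to === '*') return { kind: 'broadcast' };
  if (to.startsWith('uds:')) return { kind: 'uds', socket: to.slice('uds:'.length) };
  if (to.startsWith('bridge:')) return { kind: 'bridge', sessionId: to.slice('bridge:'.length) };
  // Names and ids share a flat namespace; disambiguate against the ids we know.
  return knownIds.has(to) ? { kind: 'id', id: to } : { kind: 'name', name: to };
}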

State handling depends on the recipient’s lifecycle. A running agent receives the message via queuePendingMessage(), which inserts it at the next tool-round boundary — not the next token, the next tool round. Messages do not interrupt mid-thought; they wait until the recipient is about to make its next decision. A stopped agent is auto-resumed via resumeAgentBackground(). An evicted agent — one whose context was paged out — is resumed from disk transcript, replaying its history to rebuild context, and then receives the message.

Two message types have special structure. shutdown_request / shutdown_response is the graceful-shutdown protocol: a sender asks an agent to terminate, the agent acknowledges, finishes its current tool round, and exits. plan_approval_response is the approval gate for team coordination: only the team lead may approve or reject a plan, and the response is encoded as a structured message rather than free text so the recipient can branch on it without parsing prose. Everything else is free-text and parsed by the recipient as a normal user-role message.

The interesting design tension here is between liveness and quiescence. A coordinator that wakes a sleeping worker pays a resume cost — context reload, possible re-execution of the last tool round, disk reads. A worker that stays alive in the background pays a memory cost. SendMessage’s ability to wake an evicted agent from disk means the harness can choose: keep workers warm if quick wakeups dominate, evict aggressively if memory dominates. The tool is the same in both regimes.

Takeaway: SendMessage uniformly addresses name, id, broadcast, UDS, or bridge — and resumes evicted agents from disk. The harness picks the warm-vs-cold trade-off; the tool does not care.

Worktree Isolation: Git as Sandbox

Worktree isolation is the most physical of the three isolation modes [cci2026, §8]. When a coordinator spawns a worker with isolation: 'worktree', createAgentWorktree (at utils/worktree.ts lines 902–952) creates a git worktree under .claude/worktrees/<slug> and runs the worker there. The worker’s cwd is the worktree. Its writes go to a separate working tree backed by the same .git object store. Two workers with different worktrees can edit the same file in their respective worktrees without colliding on disk.

The implementation is full of small wins. createAgentWorktree is intentionally lightweight — it does not touch global session state, so it can run in parallel without taking the session mutex [cci2026, §8]. If the project is not a git repo, the function falls back to a hook-based creation path so the same API works for non-git workspaces. Fast resume reads the .git pointer file directly rather than spawning git rev-parse, avoiding roughly 15ms of spawn overhead. Worktrees support sparse checkout via settings.worktree.sparsePaths so that large repos do not pay full checkout cost. Slug validation prevents path-traversal attacks via .. in the worktree name.

Post-creation, the worktree is configured to match the parent’s environment. The parent’s settings.local.json is copied so user-specific tool permissions carry over. Git hooks are reconfigured to use the main repo’s hooks rather than the worktree’s, so pre-commit checks remain consistent. Directory symlinks (opt-in) avoid disk bloat for shared assets like node_modules. Gitignored files listed in .worktreeinclude are explicitly copied — useful for .env files and uncommitted config that the worker needs.

Lifecycle is the part most harness designs get wrong. When the worker completes, Claude Code checks whether the worktree has any changes. No changes — the worktree is auto-removed [cci2026, §8]. Changes exist — the worktree is kept and a message logs the location so the user or coordinator can review. The default is to clean up; the exception is to retain. This is the opposite of what most sandbox systems do, and it is the correct default for an agent harness, because the steady state is many short-lived workers each producing zero or a small diff. Cleaning up the no-change ones keeps .claude/worktrees/ from growing without bound.
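
The create-and-cleanup pair reduces to two git invocations plus the decision rule above. A sketch; the slug check stands in for the path-traversal validation the source describes, and the branch naming is ours:

import { execFileSync } from 'node:child_process';

function createWorktree(repo: string, slug: string): string {
  if (!/^[a-z0-9-]+$/.test(slug)) throw new Error(`bad slug: ${slug}`);  // blocks '..' traversal
  const path = `${repo}/.claude/worktrees/${slug}`;
  execFileSync('git', ['-C', repo, 'worktree', 'add', '-b', `agent/${slug}`, path]);
  return path;  // worker runs with cwd = path
}

function cleanupWorktree(repo: string, path: string): void {
  const dirty = execFileSync('git', ['-C', path, 'status', '--porcelain']).length > 0;
  if (dirty) {
    console.log(`worktree kept for review: ${path}`);  // retain-on-change
  } else {
    execFileSync('git', ['-C', repo, 'worktree', 'remove', path]);  // auto-clean the no-diff case
  }
}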

Worktree isolation is not free. Each worktree is a working tree on disk, costs a checkout, and creates filesystem noise. The escape hatch — same-dir isolation — exists for workers that genuinely cannot race. The point is that the harness offers a calibrated trade-off: pay a checkout for safety, or skip it when safe. The coordinator decides per spawn.

Takeaway: Worktrees give two write-heavy workers their own filesystems on the same .git store. Auto-cleanup when no changes is the right default; retain-on-change is the right exception.

What To Copy, What To Skip

Not every choice in coordinator mode generalizes. Some are tightly coupled to Claude Code’s specific stack (Bun runtime, Anthropic API, internal feature flags). Some are universal.

| Pattern in coordinator mode | Copy into your harness? | Why |
|---|---|---|
| Fixed XML envelope for worker results | Copy. | Subtract status states, add usage block. Universal. |
| Fork-with-byte-identical-prefix for cache sharing | Copy. | Largest single cost line item in any multi-agent system. Cache pricing rewards it on every major provider. |
| File-based IPC with O_NOFOLLOW + size cap | Copy. | Beats brokers for ≤10 workers per host. Add a broker only when you need cross-host. |
| Three-mode isolation (same-dir / worktree / remote) | Copy the shape. | Worktree-equivalent in your VCS. Remote only if you actually have a sandbox. |
| Prompt-level concurrency rules | Copy only if your coordinator is GPT-4-class or above. | Below that, enforce in code — a 7B model will lose edits in the first hour. The rules are not safety; they are an optimization that requires a model strong enough to read its own plan. |
| Lock with 10× retry, 5-100ms backoff | Copy — but cap producers per mailbox. | Tuned for short-lived contention with roughly ≤10 producers writing to one mailbox. Above that, the retry budget is pathological and you need fan-out to per-agent inboxes or a real broker. |
| tools: ['*'] in fork agent (full parent tool inheritance) | Skip the default; filter destructive tools. | Inheriting the parent’s full tool set — including Bash, Edit, Write — into a background-spawned fork is a footgun. The fork has no supervisor at the moment of spawn. Filter destructive tools out of fork inputs unless the fork is being supervised. |
| <task-notification> as user-role message | Copy. | XML tags survive language-model parsing better than JSON when the payload includes user text. |
| Force-all-async in coordinator mode | Copy. | A blocked coordinator is not a coordinator. Non-negotiable for any system with cron or push input. |
| Feature flag for the whole mode | Copy. | Single switch, single rollout, single revert. |
| Five-entry built-in agent registry | Skip the count, copy the discipline. | Five is Claude Code’s number. The discipline is “small finite set, not arbitrary spawning.” |
| tengu_* telemetry prefix | Skip. | Internal to Anthropic. Use your own namespace. |
| KAIROS-specific extensions (cwd, daily logs) | Skip. | Persistent-assistant features, not coordinator features. Different chapter. |
| Hidden BUDDY feature flag pattern | Skip for coordinator, useful elsewhere. | Example of feature-flag isolation discipline [cci2026-gems, §1] — copy the flag pattern, not the gacha pet. |

Takeaway: Copy the envelope, the fork primitive, the file-based IPC, and the force-async. Skip the telemetry prefix, the agent-count, and anything KAIROS-flavored.

Gotchas

| Gotcha | Symptom | Fix |
|---|---|---|
| Cache-breaking edit to fork system prompt | Fork workers cost full prompt instead of cached. Bill spikes. | Treat any change to fork’s system prompt or tool list as a cache-break event. Put telemetry on it (see chapter 07). |
| Worker writes through symlink | Worker corrupts a file outside its worktree. | Open all task-output files with O_NOFOLLOW. Do not trust path discipline. |
| Two write-heavy workers in same-dir mode | Lost edits. Last-write-wins on the file. | Default write-heavy spawns to isolation: 'worktree'. Same-dir is for read-only. |
| Coordinator blocks on a synchronous worker | UI freezes. Cron catch-up backs up. Daemon input queue grows. | Force all-async in coordinator mode. Never expose a run_in_background: false spawn to the coordinator’s tool list. |
| Mailbox lock contention exceeds retry budget | Worker fails after ~1s of jittered backoff. Looks like a flake. | Structural contention means too many workers writing to one mailbox. Fan out to per-agent inboxes (Claude Code’s default), not a shared one. |
| Worker output exceeds 5GB cap | Truncated output, worker error. | Stream large outputs to a dedicated path; do not return them through the task-output channel. The 5GB cap exists to prevent disk-fill, not to be raised. |
| Recursive fork bomb | API rate-limit, runaway spawn. | Recursive-fork guard scans for <fork-boilerplate> tag in messages. Keep the guard; do not optimize it away. |
| Worker crashes mid-task between .output write and <task-notification> emit | Coordinator waits forever for a notification that never arrives. | Attach a per-spawn deadline; on timeout, the harness synthesizes a status: killed notification. Full pattern is replay safety’s job — see chapter 05. |
| <task-notification> injection from worker stdout | A malicious or hallucinating worker emits a fake <task-notification> envelope mid-output and the coordinator believes another worker finished. | Construct the envelope out-of-band in the harness, not from worker stdout. The XML the coordinator sees should originate in trusted code, never in untrusted text. |
| Scratchpad concurrency on same key | Two workers write the same .claude/scratch/<key> and one clobbers the other silently. | Per-worker key prefix (<agentId>/<key>) by convention, or accept last-write-wins and document it. No lock is intentional — pick the discipline that matches your blast radius. |
| Plan-approval response parsed as free text | Coordinator approves something it should have rejected. | Structured plan_approval_response messages, not natural language. Only the team lead may emit them. |

Takeaway: Most gotchas reduce to: assume the worker is hostile, cap the resources, treat cache as architecture, never block the coordinator.

What Coordinator Mode Teaches About Harnesses

Coordinator mode shows that a working multi-agent system needs no new infrastructure — spawn primitive, XML envelope, filesystem IPC, and an opinionated coordinator prompt are enough. What it does not solve is what happens after the spawn: a worker that dies between writing .output and emitting <task-notification>, a coordinator that restarts mid-plan, an envelope that arrives twice. That failure model is the subject of the next chapter — 05: Replay Safety.

Takeaway: Coordinator mode is harness engineering reduced to one system. Spawn, notify, share, isolate. Resume-safe is a separate chapter.

References

  1. [cci2026] tacit-web/research/cc-internals/src-analysis-05-agents-coordination.md — Direct source analysis of /Users/ketankhairnar/Downloads/claude-code-src/, dated 2026-04-01. Sections cited: §1 AgentTool Implementation, §2 Coordinator Mode, §3 Task System & File-based IPC, §4 KV Cache Forking, §7 SendMessage, §8 Worktree Isolation, §9 Communication Model Summary. Specific source paths: tools/AgentTool/AgentTool.tsx lines 81–124 and 557–567; utils/forkedAgent.ts lines 47–68 and 345–462; utils/worktree.ts lines 902–952 and 235–375; SendMessageTool.ts lines 800–873; utils/task/diskOutput.ts; Task.ts lines 6–13.
  2. [cci2026-gems] tacit-web/research/cc-internals/src-analysis-07-hidden-gems.md — Hidden gems analysis, 2026-04-01. §1 BUDDY (feature-flagged isolation example), §6 retry logic (multi-strategy resilience, relevant to lock-retry tuning).
  3. [hwc2026] tacit-web/research/harness-engineering-deep-agents-ssr.md — Harness engineering convergence research, 2026-03-10. Phase 4 findings: five independent teams (Claude Code, LangChain Deep Agents, OpenAI Codex, Manus, Cursor) converged on the same four primitives.
  4. [anthropic-pricing] Anthropic, “Pricing,” https://www.anthropic.com/pricing — cache-read pricing at roughly 0.1× standard input rate; cache-write at roughly 1.25× input rate. Numbers vary by model tier; ratios are stable.

Next chapter: 05 — Replay Safety: The Bug That Breaks Every HITL Workflow

One question for the reader: Your current multi-agent system — could you draw its result envelope on a napkin? If the answer is “it depends on the agent type,” you have a coordinator-prompt bug, not a worker bug.

Harness-engineering Ch 5/13
  1. Harness Engineering — What This Series Is, and Why You Should Read It in Order (12m)
  2. What a Harness Actually Is (and What It Is Not) (20m)
  3. The Four Primitives Every Working Agent System Has (28m)
  4. The Reasoning Sandwich: Why More Thinking Made My Agent Worse (18m)
  5. Coordinator Mode: A Working Multi-Agent System, From the Source (32m)
  6. Replay Safety: The Bug That Breaks Every HITL Workflow (26m)
  7. Skills as Information Architecture, Not Features (22m)
  8. Prompt Cache Is Architecture: Designing Around the 50K-Token Mistake (22m)
  9. The Session-Memory Feedback Loop (ACE + Codified Context) (26m)
  10. The Org-Harness Thesis: Why Context Does Not Transfer (26m)
  11. The Numbers That Killed the 'Wait for Better Models' Excuse (14m)
  12. Build Your Own Harness: A 6-Week Plan for a 3-Person Team (30m)
  13. The Ten Pitfalls (and How to See Them Coming) (20m)