
The Four Primitives Every Working Agent System Has

Summary

Five independent teams — Claude Code, LangChain Deep Agents, OpenAI Codex, Manus, Cursor — converged on the same four primitives between 2024 and 2026. Detailed prompts, planning, subagents, filesystem. The convergence is the evidence: when five teams with different goals on different stacks arrive at the same shape, the shape is load-bearing, not stylistic. Here is what each primitive does, how each system implements it, and what fails when one is missing.

Prerequisite: This is Part 02 of the Harness Engineering deep dive. Read Part 01: What a Harness Actually Is first if you have not seen the four-layer stack.

Five systems, four primitives, twenty cells filled

Five systems. Five teams. Two years. Every cell filled — and the cells are the evidence.

Why This Matters

Every “agent loop” tutorial shows you observe-think-act and stops there. What ships in production is bigger than a loop. The ReAct paper gave us a control structure for one agent making one decision; what five separate teams shipped between 2024 and 2026 looks nothing like ReAct’s diagram. The diagram became table stakes. The systems on top of it got specific.

Three framings this chapter counters. The “10 primitives” lists that LinkedIn agent posts produce every other week — they conflate features with primitives, listing “RAG” alongside “memory” alongside “evaluation,” ending in a checklist no working system implements top-to-bottom. The “ReAct is enough” claim — useful for one-decision agents, silent on everything above one decision. And the “the agent is just a loop” framing — true at one level of abstraction and unhelpful at every other. None predict why five independent teams ended up with the same four boxes.

The four primitives are LangChain’s framing in their “Deep Agents” essay: detailed prompts, planning, subagents, filesystem [dalc2025]. The contribution here is not the list — Harrison Chase wrote the list. The contribution is the five-system audit: 5 × 4 = 20 cells. Every cell cited [hwc2026, Finding 2]. Five teams with different goals (CLI coding tool, open framework, hosted research agent, autonomous PC use, IDE), different stacks (Bun, Python, Python, custom, custom), and different funding shapes converged on the same four boxes. The boxes are load-bearing, not stylistic.

Takeaway: The primitives are not a checklist. They are a converged-on architecture across five independent shipped systems. Convergence is the evidence.

What “Convergence” Actually Means Here

Five systems agree on the same four primitives. “Agreement” is a weak word in software architecture, so the rest of this section earns it.

The agreement is on the slot, not the spelling. Claude Code spells “filesystem” as .claude/scratch/ plus worktrees and a memory directory [cci2026, §1, §2, §3]. Deep Agents spells it as a filesystem tool exposed to the agent [dalc2025]. Manus spells it as continuous todo.md rewriting and a KV-cache-friendly append-only context where the disk holds the long tail [manus2025][boh-p5, §10]. OpenAI Codex spells it as the working directory of the harness’s sandbox [oai2026]. Cursor spells it as .cursor/rules/*.mdc plus the repo itself [boh-p1]. The implementations differ. The slot is the same.

SLOT VS SPELLING — FOUR PRIMITIVES, FIVE SYSTEMS
SLOT              CC                Deep Agents      Codex         Manus           Cursor

detailed prompts  coordinator       canonical        curated       stable-prefix   .cursor/rules
                  system prompt     library prompts  prompt        KV-friendly     /*.mdc

planning          TodoWrite +       planning step    implicit /    todo.md         opt-in
                  Plan subagent                      in prompt     rewrite         plan mode

subagents         AgentTool         subagent_type    worker spawn  △ sparingly     Automations
                                                                                   spawn

filesystem        .claude/scratch/  filesystem tool  sandbox       todo.md +       .cursor/
                  + worktrees                        working dir   filesystem bus  rules/ + repo

Read across, the spelling drifts; read down, the slot stays — same diagnostic the matrix later quantifies cell-by-cell.

The agreements are also unequal. Strongest convergence is on filesystem and subagents — every one of the five exposes both, with a public artifact or source-derived analysis confirming it [hwc2026, Finding 2]. Weakest is on planning: Cursor’s planning is opt-in rather than first-class; OpenAI Codex’s planning is implicit in its prompt rather than a named module. The SSR analysis across these five systems adds a fifth pattern — verification — that the Deep Agents post does not call a primitive but every system implements in some form [hwc2026, Finding 2]. This chapter covers the four primitives Deep Agents named [dalc2025] and treats verification as its own chapter.

The agreements are also temporal. Claude Code shipped in 2024; Deep Agents in late 2025; OpenAI’s harness-engineering post in early 2026; Manus through 2025; Cursor across 2025–2026. No vendor announced four primitives at a conference and asked the rest to follow. Five teams converged on the same four boxes via cross-pollination rather than coordination — each shipped publicly, each read the others (the Deep Agents post explicitly references the recreated Claude Code system prompt and Anthropic’s deep-research agent [hwc2026, additional links][dalc2025]), and the resulting convergence is stronger evidence of the boxes than if they had never seen each other’s work.

Takeaway: Five systems agree on the slot, not the spelling. We cover the four primitives with the strongest agreement and defer verification to its own chapter.

Primitive 1 — Detailed Prompts

The first primitive is a system prompt that encodes the agent’s role, behaviour, tools, examples, and edge cases — measured in thousands of characters, not hundreds [dalc2025]. The Deep Agents essay names the distinction directly: a “shallow” agent has a one-liner system prompt; a “deep” agent has a multi-thousand-character document with role, constraints, tool descriptions, format requirements, and worked examples [dalc2025].

Why it is a primitive and not a tuning concern: minimal prompts produce minimal behaviour. A model with a generic prompt picks generic strategies, narrates verbosely, asks clarifying questions instead of acting, and treats every task as a fresh start. The detailed prompt is how the harness teaches the model the harness’s own conventions — that the agent has a todo.md, that it should call Read before Edit, that verification is a separate phase. None of that is in the weights.

Two systems make the prompt-as-architecture choice visible. Claude Code’s coordinator mode ships a system prompt of roughly 6000 characters covering the coordinator’s role, the worker-result XML envelope, three concurrency rules, and the tengu_scratch scratchpad contract [cci2026, §2]. The file contains lines like “Your job is to direct workers to research, implement and verify code changes” alongside an explicit XML schema for <task-notification> — prose and contract in the same artifact. LangChain Deep Agents ships canonical detailed prompts as part of the library, referencing the system prompts used in Anthropic’s deep-research agent and Claude Code [dalc2025]. Manus follows the same pattern with a KV-cache-friendly stable-prefix prompt — a design constraint that simple few-shot tutorials never mention [manus2025].
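A minimal sketch of the prompt-as-artifact idea, in Python. The section names, the dataclass shape, and the stable-prefix/dynamic-suffix split below are illustrative assumptions modelled on the cache-friendly designs described above, not any shipped system's API:

```python
from dataclasses import dataclass

@dataclass
class SystemPrompt:
    role: str                    # who the agent is
    conventions: list[str]       # harness-specific rules (todo.md, Read-before-Edit, ...)
    tool_docs: list[str]         # per-tool descriptions with edge cases
    examples: list[str]          # worked examples, not one-liners
    version: str = "2026-04-01"  # versioned like any other artifact

    def stable_prefix(self) -> str:
        """Everything that never changes between requests: KV-cache friendly."""
        parts = [f"# Role (v{self.version})", self.role,
                 "# Conventions", *self.conventions,
                 "# Tools", *self.tool_docs,
                 "# Examples", *self.examples]
        return "\n".join(parts)

    def render(self, dynamic_suffix: str = "") -> str:
        # Dynamic content (timestamps, session state) goes AFTER the stable
        # prefix, so a change never invalidates the cached prefix tokens.
        return self.stable_prefix() + ("\n# Session\n" + dynamic_suffix if dynamic_suffix else "")

prompt = SystemPrompt(
    role="You direct workers to research, implement and verify code changes.",
    conventions=["Maintain todo.md between steps.", "Call Read before Edit."],
    tool_docs=["Read(path): returns file contents.", "Edit(path, diff): applies a diff."],
    examples=["Task: fix failing test -> plan, edit, run tests, report."],
)
assert len(prompt.render()) > 200                                  # thousands of chars in practice
assert prompt.render("turn=5").startswith(prompt.stable_prefix())  # prefix stays stable
```

The design choice the sketch makes visible: once the prompt is a typed, versioned object rather than a string literal, it can be diffed, telemetered, and cache-audited like code.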

The common failure mode when this primitive is thin is the “generic agent” symptom: an assistant that asks for permission instead of acting, ignores the harness’s tools, and reverts to safety-narration voice on every multi-step task. The fix is treating the prompt as an artifact — versioned, telemetered, revised when behaviour drifts. The cache discipline in chapter 07 is the operational form of that ownership.

Takeaway: Detailed prompts are how the harness teaches the model its own conventions. Claude Code and Deep Agents ship them publicly enough to study; the prompt is an artifact, not a tuning concern.

Primitive 2 — Planning

The second primitive is explicit plan-then-execute decomposition — a planner produces a list, an executor runs the list [dalc2025]. The Deep Agents essay names this as one of the four; the OpenAI harness-engineering post frames it as the difference between “ask the model and hope” and “structure the agent around a plan” [dalc2025][oai2026].

Why it exists: multi-step tasks succeed more often decomposed than attempted in a single pass. A model asked to “fix the failing test” in one shot will sometimes succeed and frequently flail. The same model asked to “list the steps, then execute in order” succeeds materially more often, because the plan is a cheap artifact that constrains the rest of the conversation. The plan is also re-readable — when step 4 fails, the agent goes back to the plan, not the original prompt. Planning gives the agent a stable goal independent of the most recent tool result.

Three systems make planning a named module rather than an implicit phase. Claude Code’s built-in agent registry has five entries — general-purpose, Explore, Plan, verification, claudeCodeGuide — with Plan listed as a planning subagent that is one-shot (no agentId, no SendMessage trailer, no usage block) to save tokens at scale [cci2026, §1]. The fact that Plan is its own subagent type, not a prompt prefix, is the architectural choice. Deep Agents builds planning in as one of the four library-shipped primitives, with the planner as a distinct step in the loop [dalc2025]. Manus encodes planning as a continuously-rewritten todo.md that the agent updates between steps — the planner is not an agent but a file the harness keeps refreshed [manus2025][boh-p5, §10]. Three spellings of the same primitive: a subagent, a library module, a file.
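The plan-as-file spelling fits in a few lines. The file name todo.md follows the Manus description; the rewrite_todo helper and the checkbox format are illustrative assumptions:

```python
from pathlib import Path
import tempfile

def rewrite_todo(path: Path, steps: list[tuple[str, bool]]) -> str:
    """Rewrite the whole plan file each step; done steps stay visible as [x]."""
    text = "\n".join(f"- [{'x' if done else ' '}] {step}" for step, done in steps)
    path.write_text(text)
    return text

workdir = Path(tempfile.mkdtemp())
steps = [("list failing tests", True), ("patch parser", False), ("re-run suite", False)]
todo = rewrite_todo(workdir / "todo.md", steps)

# The executor appends the fresh plan to each step's context, pushing the
# goal back into the recent attention span instead of leaving it at turn zero.
context_tail = f"<current-plan>\n{todo}\n</current-plan>"
assert "[x] list failing tests" in context_tail
assert "[ ] patch parser" in context_tail
```

Note that the file is rewritten whole, not appended to: the point is that the current plan always sits at the tail of the context, not buried mid-transcript.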

The common failure mode when planning is missing is “lost in execution”: the agent gets five tool calls deep, the original task drifts out of the recent attention span, and the next decision is made against the most recent tool output instead of the original goal. The plan is what keeps the goal in the recent attention span. Without it the goal is one line at turn zero, fighting for visibility against tens of thousands of tokens of tool results. Manus names this directly — the reason todo.md gets rewritten on every step is to push the plan into the recent attention span [manus2025].

Takeaway: Planning is decomposition as a primitive — a subagent, a library step, or a continuously-rewritten file. Shape varies, slot stays.

Primitive 3 — Subagents

The third primitive is the ability to spawn an isolated worker for a sub-task, with a fresh context, a constrained tool set, and a structured return path back to the parent [dalc2025]. Subagents are the most-discussed primitive in the agent literature; the full mechanics live in chapter 04 — Coordinator Mode.

Why it exists: context isolation, parallelism, and role specialization. A monolithic agent that does research, planning, implementation, and verification in one conversation accumulates a context window full of every intermediate tool result. By turn fifty the planner is reasoning over a transcript longer than its useful attention window and decision accuracy degrades — Chroma’s context-rot research shows decision quality decays as a gradient against cumulative tokens, not a cliff [chroma-rot][manus2025]. Single-agent loops carrying 40K+ tokens of cumulative tool output measurably underperform isolated-context spawns; the 12.6pp Terminal Bench delta in chapter 03 is exactly the cost of carrying that much context into one agent [lch-harness2026]. A subagent runs in a fresh context populated only by its spawn prompt, returns a summary, and disappears. The parent never sees the intermediate steps. Context isolation is the cost-control mechanism and the accuracy mechanism in one move.

Four systems make subagents first-class. Claude Code exposes AgentTool as the spawn primitive, with three isolation modes (same-dir, worktree, remote), five built-in subagent types, and an XML envelope for the result [cci2026, §1, §2]. Deep Agents ships a subagent type as one of the four library primitives, with the spawn-and-return shape modelled after research-style hosted agents [dalc2025]. OpenAI Codex’s harness spawns workers as a routine part of long-horizon execution [oai2026]. Cursor’s “Automations” (2026) added event-driven agent spawning and cloud sandboxes — the same primitive in IDE form [hwc2026]. Manus, the fifth, uses subagents more sparingly; its filesystem-as-memory pattern reduces the pressure for context isolation that subagents otherwise solve [manus2025].

The full implementation — spawn primitive, fork prefix, IPC envelope, lock protocol, worktree lifecycle — lives in chapter 04. The common failure mode when subagents are missing is “context exhaustion”: one mega-agent accumulating everything until the planner reasons over a longer-than-useful window, decisions degrade, and the run either truncates aggressively (losing relevant history) or times out (Terminal Bench reports this for max-reasoning configurations [lch-harness2026]).

Takeaway: Subagents are how the harness gets context isolation and parallelism in one primitive. Four of the five systems make them first-class; chapter 04 is the deep-dive.

Primitive 4 — Filesystem

The fourth primitive is file-based external memory — an agent-writable directory that survives across tool calls, across subagent boundaries, and across restarts [dalc2025]. The filesystem is to the agent what disk is to the CPU: where state lives when it is too big or too long-lived to fit in RAM.

Why it exists: the context window is RAM, the filesystem is disk, and an agent needs both. Loading every relevant fact into the context window does not scale beyond a single task — context budgets are finite, attention degrades as input length grows [hwc2026, “Context Rot”], and “more context” is not a free upgrade. The filesystem is how the agent offloads what it does not need right now and reloads it on demand. It is also how state survives a process crash: a coordinator that writes its plan to a file can resume from that file; one that holds the plan in context starts over.

Five systems make filesystem-as-memory load-bearing — the primitive with the strongest convergence in the source map [hwc2026, Finding 2]. Claude Code uses .claude/scratch/ as a shared key-value namespace across workers (gated by tengu_scratch), .claude/worktrees/ for git-based isolation, and a memory directory with MEMORY.md plus topic files capped at 200 files and 25KB on the index [cci2026, §2, §3]. Deep Agents exposes a filesystem tool to the agent as one of the four library primitives and treats it as the primary memory layer [dalc2025]. Manus is the loudest on this point — the Manus blog calls the filesystem the primary memory layer of the agent and builds the context-engineering strategy around it [manus2025][boh-p5, §10]. OpenAI Codex uses the sandbox working directory as the agent’s persistent state across worker spawns [oai2026]. Cursor uses the repository itself plus .cursor/rules/*.mdc files as the agent-readable filesystem [boh-p1][boh-p5].

The common failure mode when filesystem is missing is “amnesia on restart” — the agent crashes mid-workflow and resumes with nothing, or completes a session and the next starts cold. Every “compounding learning” claim depends on a filesystem to compound into. Without it the agent learns nothing between sessions and the operational pattern becomes “run the agent, copy the useful output by hand, throw the session away.” This is why chapter 08 — Session-Memory Loop is filesystem at scale: ACE’s generator/reflector/curator loop and GCC’s memory-as-filesystem pattern both rest on this primitive.

Takeaway: Filesystem is the primitive with the strongest convergence. Five systems make it load-bearing. It separates “agent that learns” from “agent that forgets on restart.”

The Convergence Matrix

Five rows. Four columns. Twenty cells. Seventeen fully implemented, three partial. Legend: an unmarked cell is implemented and load-bearing; a △ cell is present-but-minor or opt-in rather than first-class.

Claude Code
  Detailed prompts: ~6000-char coordinator system prompt with role + XML envelope + concurrency rules [cci2026, §2]
  Planning: Plan is a built-in one-shot subagent type in the registry [cci2026, §1]
  Subagents: AgentTool spawn primitive, 3 isolation modes, 5 built-in types [cci2026, §1]
  Filesystem: .claude/scratch/, .claude/worktrees/, memory dir with MEMORY.md + topic files [cci2026, §2, §3]

LangChain Deep Agents
  Detailed prompts: Ships canonical detailed prompts as library reference (deep-research, Claude Code prompts) [dalc2025]
  Planning: Planning is one of the four library-shipped primitives [dalc2025]
  Subagents: subagent is a first-class library type with structured return [dalc2025]
  Filesystem: Filesystem tool exposed to the agent as primary memory [dalc2025]

OpenAI Codex
  Detailed prompts: Curated prompt published as part of the harness, per the OpenAI post [oai2026]; built by a 3–7 engineer team across ~1M lines / 5 months / 1500 PRs [hwc2026, Tier 2 row 9]
  Planning: △ Implicit in prompt + harness rather than a named module, per the OpenAI post [oai2026]
  Subagents: Worker spawn used in long-horizon execution, per the OpenAI post [oai2026]; same 3–7 engineer harness team shipped it [hwc2026, Tier 2 row 9]
  Filesystem: Sandbox working directory holds persistent state across spawns, per the OpenAI post [oai2026]

Manus
  Detailed prompts: Curated KV-cache-friendly system prompt with stable prefix [manus2025][boh-p5, §10]
  Planning: Continuously-rewritten todo.md as the planning artifact [manus2025][boh-p5, §10]
  Subagents: △ Used sparingly — filesystem-as-memory reduces the pressure for context isolation [manus2025]
  Filesystem: Filesystem as primary memory layer — the loudest of the five on this point [manus2025][boh-p5, §10]

Cursor
  Detailed prompts: .cursor/rules/*.mdc with YAML frontmatter as conditional prompt fragments [boh-p1]
  Planning: △ Plan mode is opt-in rather than first-class [boh-p1]
  Subagents: “Automations” (2026) adds event-driven agent spawning + cloud sandboxes [boh-p5][hwc2026, Tier 2 row 17]
  Filesystem: Repository itself + .cursor/rules/*.mdc as agent-readable filesystem [boh-p1][boh-p5]

Read across, the spelling drifts; read down, the slot stays.

“Detailed prompts” is a system prompt in three rows, MDC files in one, a curated prefix in one. Planning is a subagent in one row, a library step in one, a file in one, a mode in one. Filesystem is the cleanest column — every row spells it as filesystem, varying only in kind (worktree, sandbox, repo, todo file).

Takeaway: Twenty cells, seventeen fully filled and three partial, with the spelling varying and the slot staying. The matrix is the chapter’s load-bearing artifact.

What’s Missing From This List (And Why)

Several patterns appear in subsets of the five systems but did not earn a column above.

Verification loops — one source counts them as a fifth pattern across the same five systems [hwc2026, Finding 2]. Convergence is real (Anthropic’s SWE-bench numbers depend on it [anthropic-context2025], LangChain’s Terminal Bench result depends on it [lch-harness2026]), but the spelling diverges: a dedicated subagent (Claude Code’s verification agent [cci2026, §1]), middleware in others, a plan phase in others. Verification gets its own chapter later in the series.

Memory hierarchy is adjacent to filesystem. The distinction between scratchpad (within-task), session memory (across-task within-session), and long-term memory (across-session) is real, but is best treated as an elaboration of the filesystem primitive. Chapter 08 covers it.

Skills are the most-cited “fifth primitive” in 2026 writing. The progressive-disclosure pattern with 29% → 95% Claude Code pass-rate gains [lch-skills2026] is detailed prompts at scale, hosted on the filesystem. Skills are an extension of two primitives, not a separate fifth — see chapter 06.

Context engineering operations (write / select / compress / isolate [anthropic-context2025]) live one layer down — at the context layer — and were covered in chapter 01. The four primitives operate on top of those operations.

Takeaway: Four primitives are the lowest layer of agreement. Verification, memory hierarchy, skills, and context-engineering operations live elsewhere — each with its own chapter or layer-down placement.

What Happens When One Primitive Is Missing

Each primitive has a recognizable failure mode when absent. Four failure modes, four primitives. The mapping is one-to-one and operationally useful — when you see the symptom, you know which primitive to audit.

FAILURE MODES PER MISSING PRIMITIVE
Missing primitive    Failure mode               What to look for

DETAILED PROMPT      GENERIC AGENT              asks permission instead of
                                                 acting; narrates in safety
                                                 voice; ignores harness tools

PLANNING             LOST IN EXECUTION           5+ tool calls deep, original
                                                 goal drifts; decisions made
                                                 against latest tool output

SUBAGENTS            CONTEXT EXHAUSTION          monolithic transcript, planner
                                                 reasoning over longer-than-
                                                 useful window; timeouts

FILESYSTEM           AMNESIA ON RESTART          crash mid-workflow → start
                                                 over; session-to-session no
                                                 compounding; copy-paste ops

Generic agent — detailed prompt missing or too thin. The agent treats every task like a fresh chat session. It asks “would you like me to proceed?” in the middle of a planned execution, narrates in safety voice, ignores tools the harness exposes. The fix is investing in the prompt as an artifact — versioned, telemetered, longer than a paragraph. Reference implementations: Claude Code’s coordinator prompt [cci2026, §2] and Deep Agents’ shipped canonical prompts [dalc2025].

Lost in execution — planning missing or implicit. The agent gets five tool calls deep, the original task drifts out of the recent attention span, the next decision is made against the most recent tool output rather than the original goal. Manus’s mitigation is the cleanest: rewrite todo.md on every step [manus2025]. Claude Code’s is structurally similar at the subagent layer: the Plan agent type produces a plan as a separate spawn [cci2026, §1].

Context exhaustion — subagents missing or under-used. One mega-agent accumulates research, planning, implementation, verification all into one transcript. By the verification step, the transcript is 40K+ tokens of mixed-relevance content and accuracy degrades. Terminal Bench reports this directly: max-reasoning configurations time out before completing tasks because the context fills [lch-harness2026]. The fix is subagent spawning with isolated contexts — see chapter 04.

Amnesia on restart — filesystem missing or ephemeral. The agent crashes mid-workflow and resumes with nothing. The next session starts cold, repeats yesterday’s mistakes. The operational tell is a team copy-pasting outputs from the agent into a shared doc by hand — that is the team doing the filesystem’s job. The fix is a writable directory the agent owns end-to-end, with lifecycle policy at the harness layer — see chapter 08.

Takeaway: Four failure modes map one-to-one to four missing primitives. Generic, lost, exhausted, amnesiac. When you see the symptom, audit the matching cell in your harness’s row.

Do This, Not That

New agent project
  Naive: One-line system prompt + model call
  Primitive-correct: Detailed prompt as a versioned artifact with role, tools, examples, format contract
  Why: Generic prompt → generic agent; the prompt teaches the harness’s conventions [dalc2025]

Multi-step task
  Naive: Ask the model and hope
  Primitive-correct: Plan-then-execute with the plan as a re-readable artifact
  Why: The plan keeps the goal in the recent attention span across tool calls [manus2025]

Research + implementation in one agent
  Naive: Single conversation, all tools, hope
  Primitive-correct: Spawn a research subagent, return a summary, then spawn an implementation worker
  Why: Context isolation gives accuracy and cost control in one move [cci2026, §1]

State across tool calls
  Naive: Hold it in the transcript
  Primitive-correct: Write to .scratch/<key> or equivalent; reload on demand
  Why: Transcript is RAM; filesystem is disk; the agent needs both [manus2025]

Session-to-session learning
  Naive: Re-paste yesterday’s context
  Primitive-correct: Persist to a filesystem the harness curates (MEMORY.md, topic files, generator/reflector/curator)
  Why: Compounding requires a substrate to compound into [boh-p3, §8][cci2026, §3]

“More context” as the fix for flaky accuracy
  Naive: Increase window, load everything
  Primitive-correct: Move long-lived state to filesystem; load on demand
  Why: Context rot is a gradient — more tokens degrade decision accuracy [hwc2026, “Context Rot”]

Verification
  Naive: Trust the model to declare done
  Primitive-correct: Distinct verification step (subagent or middleware) before completion
  Why: Verification is the most-leveraged single harness change in the evidence [lch-harness2026][hwc2026, Finding 5]

Scaling out the agent
  Naive: Make the agent bigger
  Primitive-correct: Spawn more subagents; share filesystem; keep coordinator small
  Why: Subagents are how the harness scales horizontally without context bloat [cci2026, §1, §4]

Takeaway: For every “make the agent bigger” instinct there is a primitive-correct alternative. The four primitives are the menu.

Gotchas

Treating system prompt as copywriting
  Symptom: Prompt changes daily; cache breaks on every change; bill spikes
  Fix: Prompt is an architectural artifact with cache-break telemetry [cci2026, §4] and a dynamic-boundary marker. See chapter 07.

Implicit planning (“the model will figure it out”)
  Symptom: Agent flails on multi-step; original goal drifts after 5+ tool calls
  Fix: Make planning explicit — subagent, file the harness rewrites, or middleware that re-injects the plan

Subagents with full parent tool inheritance
  Symptom: Worker has Bash + Edit + Write with no supervisor; one bad spawn does damage
  Fix: Filter destructive tools from subagent spawns by default; explicit opt-in per spawn [cci2026, §1]

Filesystem with no lifecycle policy
  Symptom: Disk fills; old scratchpads poison new sessions; cross-session leakage
  Fix: Eviction policy at the harness layer (Claude Code: 200-file cap + 25KB index cap, MEMORY.md byte/line caps) [cci2026, §3]

One-shot subagent type returning the chatty result
  Symptom: Tokens wasted on agentId / SendMessage / usage trailer on every spawn — ~135 chars × N
  Fix: One-shot agents skip the trailer fields entirely (Claude Code does this for Explore, Plan, one more) [cci2026, §1]

Filesystem as “any writable directory”
  Symptom: Subagents step on each other’s keys; clobbered writes; silent loss
  Fix: Per-worker key prefix or worktree isolation; concurrency is a prompt rule, not a scheduler invariant [cci2026, §2]

Confusing “skills” with a fifth primitive
  Symptom: Team builds a parallel skill system duplicating filesystem + prompt machinery
  Fix: Skills are detailed-prompts on the filesystem with progressive disclosure — see chapter 06

Counting verification out because “the four primitives” don’t include it
  Symptom: Agent declares done without testing; regressions in production
  Fix: Verification is the fifth slot in some framings and the most-leveraged harness change in the receipts [lch-harness2026][hwc2026, Finding 5]. Cover it explicitly — chapter 04 names a verification subagent type [cci2026, §1].
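The lifecycle-policy gotcha has a small, testable shape: two caps with oldest-first eviction. The numbers below mirror the caps cited for Claude Code's memory directory, but the file layout, the mtime-based eviction order, and the index-truncation rule are assumptions for the sketch:

```python
from pathlib import Path
import os
import tempfile

MAX_FILES, MAX_INDEX_BYTES = 200, 25 * 1024  # caps as cited; values are policy knobs

def enforce_lifecycle(memory_dir: Path) -> None:
    # File-count cap: evict oldest topic files first.
    topic_files = sorted(memory_dir.glob("topic-*.md"), key=lambda p: p.stat().st_mtime)
    while len(topic_files) > MAX_FILES:
        topic_files.pop(0).unlink()
    # Index cap: drop the oldest half of lines when the index outgrows the cap.
    index = memory_dir / "MEMORY.md"
    if index.exists() and index.stat().st_size > MAX_INDEX_BYTES:
        lines = index.read_text().splitlines()
        index.write_text("\n".join(lines[len(lines) // 2:]))

mem = Path(tempfile.mkdtemp())
for i in range(205):                          # 5 over the cap
    p = mem / f"topic-{i:03}.md"
    p.write_text("note")
    os.utime(p, (i, i))                       # force distinct, ordered mtimes
(mem / "MEMORY.md").write_text(("x" * 99 + "\n") * 300)   # ~30KB, over the cap

enforce_lifecycle(mem)
assert len(list(mem.glob("topic-*.md"))) == 200
assert (mem / "MEMORY.md").stat().st_size <= MAX_INDEX_BYTES
```

The point is that eviction lives at the harness layer, on a schedule, not inside the agent's prompt; the agent only ever sees a directory that is already within budget.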

Takeaway: The four primitives compose; the gotchas live in the compositions. Lifecycle, cache, concurrency, and tool inheritance are where each primitive touches the next.

What the Four Primitives Teach About the Rest of the Series

Each primitive gets its own deep-dive later. Coordinator mode (ch04) is subagents. Skills as information architecture (ch06) is detailed prompts at scale. The session-memory loop (ch08) is filesystem at scale, with generator/reflector/curator as the compounding mechanism.

Takeaway: From here on, each chapter expands one of the four primitives into its production form. Hold the matrix as you read.

References

  1. [dalc2025] Harrison Chase / LangChain, “Deep Agents,” December 2025. https://blog.langchain.com/deep-agents/ — Foundational essay naming the four primitives. Source for the framing, the library-shipped canonical prompts, and the filesystem-tool-as-primitive design.
  2. [hwc2026] tacit-web/research/harness-engineering-deep-agents-ssr.md — Phase 4 findings, March 2026. Source for the five-system convergence claim (Finding 2: “Claude Code, Deep Agents, OpenAI Codex, Manus, Cursor all use: detailed prompts, planning, subagents, filesystem/external memory, and verification loops”), the source map (Tier 2 row 9 = OpenAI Codex InfoQ summary, Tier 2 row 17 = Cursor Automations TechCrunch coverage), and the “Additional Links” cross-pollination references showing Deep Agents reads the recreated Claude Code system prompt and Anthropic’s deep-research agent.
  3. [cci2026] tacit-web/research/cc-internals/src-analysis-05-agents-coordination.md and src-analysis-03-memory-context.md. Direct source analysis of Claude Code, 2026-04-01. §1 AgentTool Implementation (built-in registry of 5 agent types including Plan and verification, three isolation modes), §2 Coordinator Mode (system prompt ~6000 chars per lines 111–369, XML envelope, tengu_scratch scratchpad), §3 Task System & File-based IPC, §4 KV Cache Forking. Plus src-analysis-03-memory-context.md for .claude/memory/ structure (MEMORY.md 200-line / 25KB caps, 200-file directory cap).
  4. [manus2025] Manus, “Context Engineering for AI Agents: Lessons from Building Manus.” https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus — Source for continuous todo.md rewriting as the planning operation, filesystem as the primary memory layer, KV-cache-friendly stable-prefix prompt design.
  5. [oai2026] OpenAI, “Harness Engineering with Codex.” https://openai.com/index/harness-engineering/ — Source for worker spawn as a routine of long-horizon execution, sandbox working directory as persistent state across spawns, and the “structure the agent around a plan” framing.
  6. [lch-harness2026] LangChain, “Improving Deep Agents with Harness Engineering,” February 2026. https://blog.langchain.com/improving-deep-agents-with-harness-engineering/ — Source for Terminal Bench 2.0 results (52.8% → 66.5%), the timeout failure mode under max-reasoning configurations, and verification-loop framing.
  7. [lch-skills2026] LangChain, “Skills” blog post, 2026. Source for the 29% → 95% result driven by progressive disclosure — used here to argue that skills are detailed-prompts hosted on the filesystem, not a separate fifth primitive.
  8. [anthropic-context2025] Anthropic Applied AI Team, “Effective Context Engineering for AI Agents,” September 2025. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents — Source for the write / select / compress / isolate framing of context-layer operations.
  9. [boh-p3] tacit-web/research/building-org-harness/phase3-compounding-moat.md — Internal research, March 2026. §8 (Session memory and decision logs as compounding loop — ACE generator/reflector/curator framing).
  10. [boh-p1] tacit-web/research/building-org-harness/phase1-frameworks-tools.md — Internal research, March 2026. §2 (Cursor Rules .cursor/rules/*.mdc with YAML frontmatter, conditional loading, team-wide conventions as persistent system prompt).
  11. [boh-p5] tacit-web/research/building-org-harness/phase5-case-studies.md — Internal research, March 2026. §8 (Salesforce Cursor adoption at 3,000-license scale), §10 (Manus AI case study — KV-cache optimization with stable prefixes, continuous todo.md rewriting, filesystem as primary memory). Also referenced for Cursor Automations adoption.
  12. [chroma-rot] Chroma, “Context Rot” research. https://research.trychroma.com/context-rot — Source for the gradient-not-cliff framing of decision-accuracy decay as cumulative tokens increase.

Next chapter: 03 — The Reasoning Sandwich: Why More Thinking Made My Agent Worse

One question for the reader: Open your harness. For each of the four primitives — detailed prompts, planning, subagents, filesystem — can you point at the artifact your team owns, versions, and revises? Any primitive that resolves to “the model figures it out” is the one your harness is missing.
