
The Four Primitives Every Working Agent System Has

Summary

Five independent teams — Claude Code, LangChain Deep Agents, OpenAI Codex, Manus, Cursor — converged on the same four primitives between 2024 and 2026. Detailed prompts, planning, subagents, filesystem. The convergence is the evidence: when five teams with different goals on different stacks arrive at the same shape, the shape is load-bearing, not stylistic. Here is what each primitive does, how each system implements it, and what fails when one is missing.

Prerequisite: This is Part 02 of the Harness Engineering deep dive. Read Part 01: What a Harness Actually Is first if you have not seen the four-layer stack.

Five systems, four primitives, twenty cells filled

Five systems. Five teams. Two years. Every cell filled — and the cells are the evidence.

Why This Matters

Every “agent loop” tutorial shows you observe-think-act and stops there. What ships in production is bigger than a loop. The ReAct paper gave us a control structure for one agent making one decision; what five separate teams shipped between 2024 and 2026 looks nothing like ReAct’s diagram. The diagram became table stakes. The systems on top of it got specific.

Three framings this chapter counters. The “10 primitives” lists that LinkedIn agent posts produce every other week — they conflate features with primitives, listing “RAG” alongside “memory” alongside “evaluation,” ending in a checklist no working system implements top-to-bottom. The “ReAct is enough” claim — useful for one-decision agents, silent on everything above one decision. And the “the agent is just a loop” framing — true at one level of abstraction and unhelpful at every other. None predict why five independent teams ended up with the same four boxes.

The four primitives are LangChain’s framing in their “Deep Agents” essay: detailed prompts, planning, subagents, filesystem [dalc2025]. The contribution here is not the list — Harrison Chase wrote the list. The contribution is the five-system audit: 5 × 4 = 20 cells. Every cell cited [hwc2026, Finding 2]. Five teams with different goals (CLI coding tool, open framework, hosted research agent, autonomous PC use, IDE), different stacks (Bun, Python, Python, custom, custom), and different funding shapes converged on the same four boxes. The boxes are load-bearing, not stylistic.

Takeaway: The primitives are not a checklist. They are a converged-on architecture across five independent shipped systems. Convergence is the evidence.

What “Convergence” Actually Means Here

Five systems agree on the same four primitives. “Agreement” is a weak word in software architecture, so the rest of this section earns it.

The agreement is on the slot, not the spelling. Claude Code spells “filesystem” as .claude/scratch/ plus worktrees and a memory directory [cci2026, §1, §2, §3]. Deep Agents spells it as a filesystem tool exposed to the agent [dalc2025]. Manus spells it as continuous todo.md rewriting and a KV-cache-friendly append-only context where the disk holds the long tail [manus2025][boh-p5, §10]. OpenAI Codex spells it as the working directory of the harness’s sandbox [oai2026]. Cursor spells it as .cursor/rules/*.mdc plus the repo itself [boh-p1]. The implementations differ. The slot is the same.

SLOT VS SPELLING — FOUR PRIMITIVES, FIVE SYSTEMS
SLOT              CC                Deep Agents      Codex         Manus           Cursor

detailed prompts  coordinator       canonical        curated       stable-prefix   .cursor/rules
                  system prompt     library prompts  prompt        KV-friendly     /*.mdc

planning          TodoWrite +       planning step    implicit /    todo.md         opt-in
                  Plan subagent                      in prompt     rewrite         plan mode

subagents         AgentTool         subagent_type    worker spawn  △ sparingly     Automations
                                                                                   spawn

filesystem        .claude/scratch/  filesystem tool  sandbox       todo.md +       .cursor/
                  + worktrees                        working dir   filesystem bus  rules/ + repo

Read across, the spelling drifts; read down, the slot stays — same diagnostic the matrix later quantifies cell-by-cell.

The agreements are also unequal. Strongest convergence is on filesystem and subagents — every one of the five exposes both, with a public artifact or source-derived analysis confirming it [hwc2026, Finding 2]. Weakest is on planning: Cursor’s planning is opt-in rather than first-class; OpenAI Codex’s planning is implicit in its prompt rather than a named module. The SSR analysis across these five systems adds a fifth pattern — verification — that the Deep Agents post does not call a primitive but every system implements in some form [hwc2026, Finding 2]. This chapter covers the four primitives Deep Agents named [dalc2025] and treats verification as its own chapter.

The agreements are also temporal. Claude Code shipped in 2024; Deep Agents in late 2025; OpenAI’s harness-engineering post in early 2026; Manus through 2025; Cursor across 2025–2026. No vendor announced four primitives at a conference and asked the rest to follow. Five teams converged on the same four boxes via cross-pollination rather than coordination — each shipped publicly, each read the others (the Deep Agents post explicitly references the recreated Claude Code system prompt and Anthropic’s deep-research agent [hwc2026, additional links][dalc2025]), and the resulting convergence is stronger evidence of the boxes than if they had never seen each other’s work.

Takeaway: Five systems agree on the slot, not the spelling. We cover the four primitives with the strongest agreement and defer verification to its own chapter.

Primitive 1 — Detailed Prompts

The first primitive is a system prompt that encodes the agent’s role, behaviour, tools, examples, and edge cases — measured in thousands of characters, not hundreds [dalc2025]. The Deep Agents essay names the distinction directly: a “shallow” agent has a one-liner system prompt; a “deep” agent has a multi-thousand-character document with role, constraints, tool descriptions, format requirements, and worked examples [dalc2025].

Why it is a primitive and not a tuning concern: minimal prompts produce minimal behaviour. A model with a generic prompt picks generic strategies, narrates verbosely, asks clarifying questions instead of acting, and treats every task as a fresh start. The detailed prompt is how the harness teaches the model the harness’s own conventions — that the agent has a todo.md, that it should call Read before Edit, that verification is a separate phase. None of that is in the weights.

Two systems make the prompt-as-architecture choice visible. Claude Code’s coordinator mode ships a system prompt of roughly 6000 characters covering the coordinator’s role, the worker-result XML envelope, three concurrency rules, and the tengu_scratch scratchpad contract [cci2026, §2]. The file contains lines like “Your job is to direct workers to research, implement and verify code changes” alongside an explicit XML schema for <task-notification> — prose and contract in the same artifact. LangChain Deep Agents ships canonical detailed prompts as part of the library, referencing the system prompts used in Anthropic’s deep-research agent and Claude Code [dalc2025]. Manus follows the same pattern with a KV-cache-friendly stable-prefix prompt — a design constraint that simple few-shot tutorials never mention [manus2025].
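A minimal sketch of the prompt-as-artifact idea, in Python. The section names, the dataclass shape, and the stable-prefix/dynamic-suffix split below are illustrative assumptions modelled on the cache-friendly designs described above, not any shipped system's API:

```python
from dataclasses import dataclass

@dataclass
class SystemPrompt:
    role: str                    # who the agent is
    conventions: list[str]       # harness-specific rules (todo.md, Read-before-Edit, ...)
    tool_docs: list[str]         # per-tool descriptions with edge cases
    examples: list[str]          # worked examples, not one-liners
    version: str = "2026-04-01"  # versioned like any other artifact

    def stable_prefix(self) -> str:
        """Everything that never changes between requests: KV-cache friendly."""
        parts = [f"# Role (v{self.version})", self.role,
                 "# Conventions", *self.conventions,
                 "# Tools", *self.tool_docs,
                 "# Examples", *self.examples]
        return "\n".join(parts)

    def render(self, dynamic_suffix: str = "") -> str:
        # Dynamic content (timestamps, session state) goes AFTER the stable
        # prefix, so a change never invalidates the cached prefix tokens.
        return self.stable_prefix() + ("\n# Session\n" + dynamic_suffix if dynamic_suffix else "")

prompt = SystemPrompt(
    role="You direct workers to research, implement and verify code changes.",
    conventions=["Maintain todo.md between steps.", "Call Read before Edit."],
    tool_docs=["Read(path): returns file contents.", "Edit(path, diff): applies a diff."],
    examples=["Task: fix failing test -> plan, edit, run tests, report."],
)
assert len(prompt.render()) > 200                                  # thousands of chars in practice
assert prompt.render("turn=5").startswith(prompt.stable_prefix())  # prefix stays stable
```

The design choice the sketch makes visible: once the prompt is a typed, versioned object rather than a string literal, it can be diffed, telemetered, and cache-audited like code.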

The common failure mode when this primitive is thin is the “generic agent” symptom: an assistant that asks for permission instead of acting, ignores the harness’s tools, and reverts to safety-narration voice on every multi-step task. The fix is treating the prompt as an artifact — versioned, telemetered, revised when behaviour drifts. The cache discipline in chapter 07 is the operational form of that ownership.

Takeaway: Detailed prompts are how the harness teaches the model its own conventions. Claude Code and Deep Agents ship them publicly enough to study; the prompt is an artifact, not a tuning concern.

Primitive 2 — Planning

The second primitive is explicit plan-then-execute decomposition — a planner produces a list, an executor runs the list [dalc2025]. The Deep Agents essay names this as one of the four; the OpenAI harness-engineering post frames it as the difference between “ask the model and hope” and “structure the agent around a plan” [dalc2025][oai2026].

Why it exists: multi-step tasks succeed more often decomposed than attempted in a single pass. A model asked to “fix the failing test” in one shot will sometimes succeed and frequently flail. The same model asked to “list the steps, then execute in order” succeeds materially more often, because the plan is a cheap artifact that constrains the rest of the conversation. The plan is also re-readable — when step 4 fails, the agent goes back to the plan, not the original prompt. Planning gives the agent a stable goal independent of the most recent tool result.

Three systems make planning a named module rather than an implicit phase. Claude Code’s built-in agent registry has five entries — general-purpose, Explore, Plan, verification, claudeCodeGuide — with Plan listed as a planning subagent that is one-shot (no agentId, no SendMessage trailer, no usage block) to save tokens at scale [cci2026, §1]. The fact that Plan is its own subagent type, not a prompt prefix, is the architectural choice. Deep Agents builds planning in as one of the four library-shipped primitives, with the planner as a distinct step in the loop [dalc2025]. Manus encodes planning as a continuously-rewritten todo.md that the agent updates between steps — the planner is not an agent but a file the harness keeps refreshed [manus2025][boh-p5, §10]. Three spellings of the same primitive: a subagent, a library module, a file.
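The plan-as-file spelling fits in a few lines. The file name todo.md follows the Manus description; the rewrite_todo helper and the checkbox format are illustrative assumptions:

```python
from pathlib import Path
import tempfile

def rewrite_todo(path: Path, steps: list[tuple[str, bool]]) -> str:
    """Rewrite the whole plan file each step; done steps stay visible as [x]."""
    text = "\n".join(f"- [{'x' if done else ' '}] {step}" for step, done in steps)
    path.write_text(text)
    return text

workdir = Path(tempfile.mkdtemp())
steps = [("list failing tests", True), ("patch parser", False), ("re-run suite", False)]
todo = rewrite_todo(workdir / "todo.md", steps)

# The executor appends the fresh plan to each step's context, pushing the
# goal back into the recent attention span instead of leaving it at turn zero.
context_tail = f"<current-plan>\n{todo}\n</current-plan>"
assert "[x] list failing tests" in context_tail
assert "[ ] patch parser" in context_tail
```

Note that the file is rewritten whole, not appended to: the point is that the current plan always sits at the tail of the context, not buried mid-transcript.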

The common failure mode when planning is missing is “lost in execution”: the agent gets five tool calls deep, the original task drifts out of the recent attention span, and the next decision is made against the most recent tool output instead of the original goal. The plan is what keeps the goal in the recent attention span. Without it the goal is one line at turn zero, fighting for visibility against tens of thousands of tokens of tool results. Manus names this directly — the reason todo.md gets rewritten on every step is to push the plan into the recent attention span [manus2025].

Takeaway: Planning is decomposition as a primitive — a subagent, a library step, or a continuously-rewritten file. Shape varies, slot stays.

Primitive 3 — Subagents

The third primitive is the ability to spawn an isolated worker for a sub-task, with a fresh context, a constrained tool set, and a structured return path back to the parent [dalc2025]. Subagents are the most-discussed primitive in the agent literature; the full mechanics live in chapter 04 — Coordinator Mode.

Why it exists: context isolation, parallelism, and role specialization. A monolithic agent that does research, planning, implementation, and verification in one conversation accumulates a context window full of every intermediate tool result. By turn fifty the planner is reasoning over a transcript longer than its useful attention window and decision accuracy degrades — Chroma’s context-rot research shows decision quality decays as a gradient against cumulative tokens, not a cliff [chroma-rot][manus2025]. Single-agent loops carrying 40K+ tokens of cumulative tool output measurably underperform isolated-context spawns; the 12.6pp Terminal Bench delta in chapter 03 is exactly the cost of carrying that much context into one agent [lch-harness2026]. A subagent runs in a fresh context populated only by its spawn prompt, returns a summary, and disappears. The parent never sees the intermediate steps. Context isolation is the cost-control mechanism and the accuracy mechanism in one move.

Four systems make subagents first-class. Claude Code exposes AgentTool as the spawn primitive, with three isolation modes (same-dir, worktree, remote), five built-in subagent types, and an XML envelope for the result [cci2026, §1, §2]. Deep Agents ships a subagent type as one of the four library primitives, with the spawn-and-return shape modelled after research-style hosted agents [dalc2025]. OpenAI Codex’s harness spawns workers as a routine part of long-horizon execution [oai2026]. Cursor’s “Automations” (2026) added event-driven agent spawning and cloud sandboxes — the same primitive in IDE form [hwc2026]. Manus, the fifth, uses subagents more sparingly; its filesystem-as-memory pattern reduces the pressure for context isolation that subagents otherwise solve [manus2025].

The full implementation — spawn primitive, fork prefix, IPC envelope, lock protocol, worktree lifecycle — lives in chapter 04. The common failure mode when subagents are missing is “context exhaustion”: one mega-agent accumulating everything until the planner reasons over a longer-than-useful window, decisions degrade, and the run either truncates aggressively (losing relevant history) or times out (Terminal Bench reports this for max-reasoning configurations [lch-harness2026]).

Takeaway: Subagents are how the harness gets context isolation and parallelism in one primitive. Four of the five systems make them first-class; chapter 04 is the deep-dive.

Primitive 4 — Filesystem

The fourth primitive is file-based external memory — an agent-writable directory that survives across tool calls, across subagent boundaries, and across restarts [dalc2025]. The filesystem is to the agent what disk is to the CPU: where state lives when it is too big or too long-lived to fit in RAM.

Why it exists: the context window is RAM, the filesystem is disk, and an agent needs both. Loading every relevant fact into the context window does not scale beyond a single task — context budgets are finite, attention degrades as input length grows [hwc2026, “Context Rot”], and “more context” is not a free upgrade. The filesystem is how the agent offloads what it does not need right now and reloads it on demand. It is also how state survives a process crash: a coordinator that writes its plan to a file can resume from that file; one that holds the plan in context starts over.

Five systems make filesystem-as-memory load-bearing — the primitive with the strongest convergence in the source map [hwc2026, Finding 2]. Claude Code uses .claude/scratch/ as a shared key-value namespace across workers (gated by tengu_scratch), .claude/worktrees/ for git-based isolation, and a memory directory with MEMORY.md plus topic files capped at 200 files and 25KB on the index [cci2026, §2, §3]. Deep Agents exposes a filesystem tool to the agent as one of the four library primitives and treats it as the primary memory layer [dalc2025]. Manus is the loudest on this point — the Manus blog calls the filesystem the primary memory layer of the agent and builds the context-engineering strategy around it [manus2025][boh-p5, §10]. OpenAI Codex uses the sandbox working directory as the agent’s persistent state across worker spawns [oai2026]. Cursor uses the repository itself plus .cursor/rules/*.mdc files as the agent-readable filesystem [boh-p1][boh-p5].

The common failure mode when filesystem is missing is “amnesia on restart” — the agent crashes mid-workflow and resumes with nothing, or completes a session and the next starts cold. Every “compounding learning” claim depends on a filesystem to compound into. Without it the agent learns nothing between sessions and the operational pattern becomes “run the agent, copy the useful output by hand, throw the session away.” This is why chapter 08 — Session-Memory Loop is filesystem at scale: ACE’s generator/reflector/curator loop and GCC’s memory-as-filesystem pattern both rest on this primitive.

Takeaway: Filesystem is the primitive with the strongest convergence. Five systems make it load-bearing. It separates “agent that learns” from “agent that forgets on restart.”

The Convergence Matrix

Five rows. Four columns. Twenty cells. Seventeen fully implemented, three partial. Legend: an unmarked cell is implemented and load-bearing; a △ cell is present-but-minor or opt-in rather than first-class.

Claude Code
  Detailed prompts: ~6000-char coordinator system prompt with role + XML envelope + concurrency rules [cci2026, §2]
  Planning: Plan is a built-in one-shot subagent type in the registry [cci2026, §1]
  Subagents: AgentTool spawn primitive, 3 isolation modes, 5 built-in types [cci2026, §1]
  Filesystem: .claude/scratch/, .claude/worktrees/, memory dir with MEMORY.md + topic files [cci2026, §2, §3]

LangChain Deep Agents
  Detailed prompts: Ships canonical detailed prompts as library reference (deep-research, Claude Code prompts) [dalc2025]
  Planning: Planning is one of the four library-shipped primitives [dalc2025]
  Subagents: subagent is a first-class library type with structured return [dalc2025]
  Filesystem: Filesystem tool exposed to the agent as primary memory [dalc2025]

OpenAI Codex
  Detailed prompts: Curated prompt published as part of the harness, per the OpenAI post [oai2026]; built by a 3–7 engineer team across ~1M lines / 5 months / 1500 PRs [hwc2026, Tier 2 row 9]
  Planning: △ Implicit in prompt + harness rather than a named module, per the OpenAI post [oai2026]
  Subagents: Worker spawn used in long-horizon execution, per the OpenAI post [oai2026]; same 3–7 engineer harness team shipped it [hwc2026, Tier 2 row 9]
  Filesystem: Sandbox working directory holds persistent state across spawns, per the OpenAI post [oai2026]

Manus
  Detailed prompts: Curated KV-cache-friendly system prompt with stable prefix [manus2025][boh-p5, §10]
  Planning: Continuously-rewritten todo.md as the planning artifact [manus2025][boh-p5, §10]
  Subagents: △ Used sparingly — filesystem-as-memory reduces the pressure for context isolation [manus2025]
  Filesystem: Filesystem as primary memory layer — the loudest of the five on this point [manus2025][boh-p5, §10]

Cursor
  Detailed prompts: .cursor/rules/*.mdc with YAML frontmatter as conditional prompt fragments [boh-p1]
  Planning: △ Plan mode is opt-in rather than first-class [boh-p1]
  Subagents: “Automations” (2026) adds event-driven agent spawning + cloud sandboxes [boh-p5][hwc2026, Tier 2 row 17]
  Filesystem: Repository itself + .cursor/rules/*.mdc as agent-readable filesystem [boh-p1][boh-p5]

Read across, the spelling drifts; read down, the slot stays.

“Detailed prompts” is a system prompt in three rows, MDC files in one, a curated prefix in one. Planning is a subagent in one row, a library step in one, a file in one, a mode in one. Filesystem is the cleanest column — every row spells it as filesystem, varying only in kind (worktree, sandbox, repo, todo file).

Takeaway: Twenty cells, seventeen fully filled and three partial, with the spelling varying and the slot staying. The matrix is the chapter’s load-bearing artifact.

What’s Missing From This List (And Why)

Several patterns appear in subsets of the five systems but did not earn a column above.

Verification loops — one source counts them as a fifth pattern across the same five systems [hwc2026, Finding 2]. Convergence is real (Anthropic’s SWE-bench numbers depend on it [anthropic-context2025], LangChain’s Terminal Bench result depends on it [lch-harness2026]), but the spelling diverges: a dedicated subagent (Claude Code’s verification agent [cci2026, §1]), middleware in others, a plan phase in others. Verification gets its own chapter later in the series.

Memory hierarchy is adjacent to filesystem. The distinction between scratchpad (within-task), session memory (across-task within-session), and long-term memory (across-session) is real, but is best treated as an elaboration of the filesystem primitive. Chapter 08 covers it.

Skills are the most-cited “fifth primitive” in 2026 writing. The progressive-disclosure pattern with 29% → 95% Claude Code pass-rate gains [lch-skills2026] is detailed prompts at scale, hosted on the filesystem. Skills are an extension of two primitives, not a separate fifth — see chapter 06.

Context engineering operations (write / select / compress / isolate [anthropic-context2025]) live one layer down — at the context layer — and were covered in chapter 01. The four primitives operate on top of those operations.

Takeaway: Four primitives are the lowest layer of agreement. Verification, memory hierarchy, skills, and context-engineering operations live elsewhere — each with its own chapter or layer-down placement.

What Happens When One Primitive Is Missing

Each primitive has a recognizable failure mode when absent. Four failure modes, four primitives. The mapping is one-to-one and operationally useful — when you see the symptom, you know which primitive to audit.

FAILURE MODES PER MISSING PRIMITIVE
Missing primitive    Failure mode               What to look for

DETAILED PROMPT      GENERIC AGENT              asks permission instead of
                                                 acting; narrates in safety
                                                 voice; ignores harness tools

PLANNING             LOST IN EXECUTION           5+ tool calls deep, original
                                                 goal drifts; decisions made
                                                 against latest tool output

SUBAGENTS            CONTEXT EXHAUSTION          monolithic transcript, planner
                                                 reasoning over longer-than-
                                                 useful window; timeouts

FILESYSTEM           AMNESIA ON RESTART          crash mid-workflow → start
                                                 over; session-to-session no
                                                 compounding; copy-paste ops

Generic agent — detailed prompt missing or too thin. The agent treats every task like a fresh chat session. It asks “would you like me to proceed?” in the middle of a planned execution, narrates in safety voice, ignores tools the harness exposes. The fix is investing in the prompt as an artifact — versioned, telemetered, longer than a paragraph. Reference implementations: Claude Code’s coordinator prompt [cci2026, §2] and Deep Agents’ shipped canonical prompts [dalc2025].

Lost in execution — planning missing or implicit. The agent gets five tool calls deep, the original task drifts out of the recent attention span, the next decision is made against the most recent tool output rather than the original goal. Manus’s mitigation is the cleanest: rewrite todo.md on every step [manus2025]. Claude Code’s is structurally similar at the subagent layer: the Plan agent type produces a plan as a separate spawn [cci2026, §1].

Context exhaustion — subagents missing or under-used. One mega-agent accumulates research, planning, implementation, verification all into one transcript. By the verification step, the transcript is 40K+ tokens of mixed-relevance content and accuracy degrades. Terminal Bench reports this directly: max-reasoning configurations time out before completing tasks because the context fills [lch-harness2026]. The fix is subagent spawning with isolated contexts — see chapter 04.

Amnesia on restart — filesystem missing or ephemeral. The agent crashes mid-workflow and resumes with nothing. The next session starts cold, repeats yesterday’s mistakes. The operational tell is a team copy-pasting outputs from the agent into a shared doc by hand — that is the team doing the filesystem’s job. The fix is a writable directory the agent owns end-to-end, with lifecycle policy at the harness layer — see chapter 08.

Takeaway: Four failure modes map one-to-one to four missing primitives. Generic, lost, exhausted, amnesiac. When you see the symptom, audit the matching cell in your harness’s row.

Do This, Not That

New agent project
  Naive: One-line system prompt + model call
  Primitive-correct: Detailed prompt as a versioned artifact with role, tools, examples, format contract
  Why: Generic prompt → generic agent; the prompt teaches the harness’s conventions [dalc2025]

Multi-step task
  Naive: Ask the model and hope
  Primitive-correct: Plan-then-execute with the plan as a re-readable artifact
  Why: The plan keeps the goal in the recent attention span across tool calls [manus2025]

Research + implementation in one agent
  Naive: Single conversation, all tools, hope
  Primitive-correct: Spawn a research subagent, return a summary, then spawn an implementation worker
  Why: Context isolation gives accuracy and cost control in one move [cci2026, §1]

State across tool calls
  Naive: Hold it in the transcript
  Primitive-correct: Write to .scratch/<key> or equivalent; reload on demand
  Why: Transcript is RAM; filesystem is disk; the agent needs both [manus2025]

Session-to-session learning
  Naive: Re-paste yesterday’s context
  Primitive-correct: Persist to a filesystem the harness curates (MEMORY.md, topic files, generator/reflector/curator)
  Why: Compounding requires a substrate to compound into [boh-p3, §8][cci2026, §3]

“More context” as the fix for flaky accuracy
  Naive: Increase window, load everything
  Primitive-correct: Move long-lived state to filesystem; load on demand
  Why: Context rot is a gradient — more tokens degrade decision accuracy [hwc2026, “Context Rot”]

Verification
  Naive: Trust the model to declare done
  Primitive-correct: Distinct verification step (subagent or middleware) before completion
  Why: Verification is the most-leveraged single harness change in the evidence [lch-harness2026][hwc2026, Finding 5]

Scaling out the agent
  Naive: Make the agent bigger
  Primitive-correct: Spawn more subagents; share filesystem; keep coordinator small
  Why: Subagents are how the harness scales horizontally without context bloat [cci2026, §1, §4]

Takeaway: For every “make the agent bigger” instinct there is a primitive-correct alternative. The four primitives are the menu.

Gotchas

Treating system prompt as copywriting
  Symptom: Prompt changes daily; cache breaks on every change; bill spikes
  Fix: Prompt is an architectural artifact with cache-break telemetry [cci2026, §4] and a dynamic-boundary marker. See chapter 07.

Implicit planning (“the model will figure it out”)
  Symptom: Agent flails on multi-step; original goal drifts after 5+ tool calls
  Fix: Make planning explicit — subagent, file the harness rewrites, or middleware that re-injects the plan

Subagents with full parent tool inheritance
  Symptom: Worker has Bash + Edit + Write with no supervisor; one bad spawn does damage
  Fix: Filter destructive tools from subagent spawns by default; explicit opt-in per spawn [cci2026, §1]

Filesystem with no lifecycle policy
  Symptom: Disk fills; old scratchpads poison new sessions; cross-session leakage
  Fix: Eviction policy at the harness layer (Claude Code: 200-file cap + 25KB index cap, MEMORY.md byte/line caps) [cci2026, §3]

One-shot subagent type returning the chatty result
  Symptom: Tokens wasted on agentId / SendMessage / usage trailer on every spawn — ~135 chars × N
  Fix: One-shot agents skip the trailer fields entirely (Claude Code does this for Explore, Plan, one more) [cci2026, §1]

Filesystem as “any writable directory”
  Symptom: Subagents step on each other’s keys; clobbered writes; silent loss
  Fix: Per-worker key prefix or worktree isolation; concurrency is a prompt rule, not a scheduler invariant [cci2026, §2]

Confusing “skills” with a fifth primitive
  Symptom: Team builds a parallel skill system duplicating filesystem + prompt machinery
  Fix: Skills are detailed-prompts on the filesystem with progressive disclosure — see chapter 06

Counting verification out because “the four primitives” don’t include it
  Symptom: Agent declares done without testing; regressions in production
  Fix: Verification is the fifth slot in some framings and the most-leveraged harness change in the receipts [lch-harness2026][hwc2026, Finding 5]. Cover it explicitly — chapter 04 names a verification subagent type [cci2026, §1].
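The lifecycle-policy gotcha has a small, testable shape: two caps with oldest-first eviction. The numbers below mirror the caps cited for Claude Code's memory directory, but the file layout, the mtime-based eviction order, and the index-truncation rule are assumptions for the sketch:

```python
from pathlib import Path
import os
import tempfile

MAX_FILES, MAX_INDEX_BYTES = 200, 25 * 1024  # caps as cited; values are policy knobs

def enforce_lifecycle(memory_dir: Path) -> None:
    # File-count cap: evict oldest topic files first.
    topic_files = sorted(memory_dir.glob("topic-*.md"), key=lambda p: p.stat().st_mtime)
    while len(topic_files) > MAX_FILES:
        topic_files.pop(0).unlink()
    # Index cap: drop the oldest half of lines when the index outgrows the cap.
    index = memory_dir / "MEMORY.md"
    if index.exists() and index.stat().st_size > MAX_INDEX_BYTES:
        lines = index.read_text().splitlines()
        index.write_text("\n".join(lines[len(lines) // 2:]))

mem = Path(tempfile.mkdtemp())
for i in range(205):                          # 5 over the cap
    p = mem / f"topic-{i:03}.md"
    p.write_text("note")
    os.utime(p, (i, i))                       # force distinct, ordered mtimes
(mem / "MEMORY.md").write_text(("x" * 99 + "\n") * 300)   # ~30KB, over the cap

enforce_lifecycle(mem)
assert len(list(mem.glob("topic-*.md"))) == 200
assert (mem / "MEMORY.md").stat().st_size <= MAX_INDEX_BYTES
```

The point is that eviction lives at the harness layer, on a schedule, not inside the agent's prompt; the agent only ever sees a directory that is already within budget.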

Takeaway: The four primitives compose; the gotchas live in the compositions. Lifecycle, cache, concurrency, and tool inheritance are where each primitive touches the next.

What the Four Primitives Teach About the Rest of the Series

Each primitive gets its own deep-dive later. Coordinator mode (ch04) is subagents. Skills as information architecture (ch06) is detailed prompts at scale. The session-memory loop (ch08) is filesystem at scale, with generator/reflector/curator as the compounding mechanism.

Takeaway: From here on, each chapter expands one of the four primitives into its production form. Hold the matrix as you read.

References

  1. [dalc2025] Harrison Chase / LangChain, “Deep Agents,” December 2025. https://blog.langchain.com/deep-agents/ — Foundational essay naming the four primitives. Source for the framing, the library-shipped canonical prompts, and the filesystem-tool-as-primitive design.
  2. [hwc2026] tacit-web/research/harness-engineering-deep-agents-ssr.md — Phase 4 findings, March 2026. Source for the five-system convergence claim (Finding 2: “Claude Code, Deep Agents, OpenAI Codex, Manus, Cursor all use: detailed prompts, planning, subagents, filesystem/external memory, and verification loops”), the source map (Tier 2 row 9 = OpenAI Codex InfoQ summary, Tier 2 row 17 = Cursor Automations TechCrunch coverage), and the “Additional Links” cross-pollination references showing Deep Agents reads the recreated Claude Code system prompt and Anthropic’s deep-research agent.
  3. [cci2026] tacit-web/research/cc-internals/src-analysis-05-agents-coordination.md and src-analysis-03-memory-context.md. Direct source analysis of Claude Code, 2026-04-01. §1 AgentTool Implementation (built-in registry of 5 agent types including Plan and verification, three isolation modes), §2 Coordinator Mode (system prompt ~6000 chars per lines 111–369, XML envelope, tengu_scratch scratchpad), §3 Task System & File-based IPC, §4 KV Cache Forking. Plus src-analysis-03-memory-context.md for .claude/memory/ structure (MEMORY.md 200-line / 25KB caps, 200-file directory cap).
  4. [manus2025] Manus, “Context Engineering for AI Agents: Lessons from Building Manus.” https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus — Source for continuous todo.md rewriting as the planning operation, filesystem as the primary memory layer, KV-cache-friendly stable-prefix prompt design.
  5. [oai2026] OpenAI, “Harness Engineering with Codex.” https://openai.com/index/harness-engineering/ — Source for worker spawn as a routine of long-horizon execution, sandbox working directory as persistent state across spawns, and the “structure the agent around a plan” framing.
  6. [lch-harness2026] LangChain, “Improving Deep Agents with Harness Engineering,” February 2026. https://blog.langchain.com/improving-deep-agents-with-harness-engineering/ — Source for Terminal Bench 2.0 results (52.8% → 66.5%), the timeout failure mode under max-reasoning configurations, and verification-loop framing.
  7. [lch-skills2026] LangChain, “Skills” blog post, 2026. Source for the 29% → 95% result driven by progressive disclosure — used here to argue that skills are detailed-prompts hosted on the filesystem, not a separate fifth primitive.
  8. [anthropic-context2025] Anthropic Applied AI Team, “Effective Context Engineering for AI Agents,” September 2025. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents — Source for the write / select / compress / isolate framing of context-layer operations.
  9. [boh-p3] tacit-web/research/building-org-harness/phase3-compounding-moat.md — Internal research, March 2026. §8 (Session memory and decision logs as compounding loop — ACE generator/reflector/curator framing).
  10. [boh-p1] tacit-web/research/building-org-harness/phase1-frameworks-tools.md — Internal research, March 2026. §2 (Cursor Rules .cursor/rules/*.mdc with YAML frontmatter, conditional loading, team-wide conventions as persistent system prompt).
  11. [boh-p5] tacit-web/research/building-org-harness/phase5-case-studies.md — Internal research, March 2026. §8 (Salesforce Cursor adoption at 3,000-license scale), §10 (Manus AI case study — KV-cache optimization with stable prefixes, continuous todo.md rewriting, filesystem as primary memory). Also referenced for Cursor Automations adoption.
  12. [chroma-rot] Chroma, “Context Rot” research. https://research.trychroma.com/context-rot — Source for the gradient-not-cliff framing of decision-accuracy decay as cumulative tokens increase.

Next chapter: 03 — The Reasoning Sandwich: Why More Thinking Made My Agent Worse

One question for the reader: Open your harness. For each of the four primitives — detailed prompts, planning, subagents, filesystem — can you point at the artifact your team owns, versions, and revises? Any primitive that resolves to “the model figures it out” is the one your harness is missing.
