Left: flat agent teams. Right: Manager / Coordinator / Agent.
Watch enough multi-agent demos and a pattern emerges: a “team” of LLM agents debates among themselves, produces something coherent on a clean prompt, and visibly degrades the moment the input gets non-trivial. They loop. They contradict. They forget what step they were on. They confidently produce output that solves the wrong problem.
The problem is not the agents. The problem is the topology.
I built a small outbound sales pipeline as a learning project — ai-sales-team-public — and the topology I keep coming back to is not the one in the demos. It is plainer, more hierarchical, and noticeably harder to break. The shape is older than the agent hype: supervisor trees in Erlang/OTP, planner-executor architectures in classical AI. LLMs did not invent it; they just made it interesting again.
The topology
Three layers, each with a single responsibility:
Manager sees the goal. Decomposes it into a plan of N tasks with explicit dependencies. Writes the plan as structured JSON. Does no work itself.
Coordinator sees one task. Owns its lifecycle: route it to the right tool, retry on failure, escalate if blocked, produce a structured result. Does no LLM-generative work itself; just orchestration.
Agent sees one tool call. Generates a single structured input, invokes the tool, parses the structured output. Cannot loop. Cannot decide what to do next.
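To make the division of labor concrete, here is a hypothetical shape for a Manager plan. The field names and tool names are illustrative, not the repo's actual schema:

```python
# Hypothetical shape of a Manager plan -- field names and tool names
# are illustrative, not the actual schema from the repo.
plan = {
    "goal": "qualify inbound prospect",
    "tasks": [
        {"id": "t1", "tool": "search_company", "depends_on": []},
        {"id": "t2", "tool": "enrich_contact", "depends_on": ["t1"]},
        {"id": "t3", "tool": "draft_outreach", "depends_on": ["t1", "t2"]},
    ],
}

# The Manager writes this and stops; it performs none of the tasks.
# Each task becomes one Coordinator lifecycle, and each Coordinator
# attempt is exactly one Agent (tool) call.
for task in plan["tasks"]:
    print(task["id"], "->", task["tool"])
```

The point of the structure is that each layer reads only its slice: the Manager owns the whole dict, a Coordinator receives one entry of `tasks`, and an Agent receives only the tool input derived from it.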
What this fixes
Most multi-agent demos collapse one of three ways: circular waits (two agents stall waiting on each other’s output), drift (one agent quietly wanders off-task), or hallucinated coherence (the team converges on a confident wrong answer because no agent has the authority to disagree).
The hierarchy addresses each:
No circular waits because there is no peer communication. Information flows down (Manager → Coordinator → Agent) and results flow up. Coordinators do not talk to each other. When task B depends on task A’s output, the Manager owns that dependency in the plan graph and schedules accordingly; the Coordinators stay independent.
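Because dependencies live only in the Manager's plan graph, scheduling reduces to a topological walk. A minimal sketch with the standard library (task names hypothetical):

```python
from graphlib import TopologicalSorter

# The Manager owns the dependency graph; Coordinators never talk to
# each other. Map of task -> set of tasks it depends on (hypothetical).
deps = {"t1": set(), "t2": {"t1"}, "t3": {"t1", "t2"}}

ts = TopologicalSorter(deps)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # tasks with all deps satisfied
    waves.append(ready)             # each wave can run in parallel
    ts.done(*ready)

print(waves)  # [['t1'], ['t2'], ['t3']]
```

Tasks in the same wave have no edges between them, so their Coordinators can run concurrently without ever needing to communicate.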
Bounded drift radius because each layer's context is narrow by construction. LLMs always drift; the hierarchy does not eliminate that, it constrains where drift can do damage. The Agent sees one tool — it cannot decide to “also do X.” The Coordinator sees one task. The Manager sees the plan, and only the plan. When drift happens it is contained to a single layer’s scope, where it shows up as a malformed output the next layer rejects, rather than as a quiet detour through the whole system.
Detectable structural failure because the Manager is the only authority on plan changes, and it is forced to produce structured plans (JSON schema, validated). Schemas only catch shape errors, not semantic wrongness — but in practice most plan failures I have seen show up first as shape errors (missing dependency, wrong tool name, malformed task), and you find out immediately rather than three steps downstream. Semantic wrongness still requires evals.
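A shape check is a few lines of plain Python. This sketch (tool registry and error format are my own, not the repo's) catches exactly the two failure modes mentioned above — unknown tool names and dangling dependencies:

```python
# Hypothetical tool registry; the real one would come from the pipeline.
KNOWN_TOOLS = {"search_company", "enrich_contact", "draft_outreach"}

def validate_plan(plan: dict) -> list[str]:
    """Return shape errors only: wrong tool names, dangling dependencies.
    Says nothing about whether the plan is semantically right."""
    errors = []
    ids = {t.get("id") for t in plan.get("tasks", [])}
    for t in plan.get("tasks", []):
        if t.get("tool") not in KNOWN_TOOLS:
            errors.append(f"{t.get('id')}: unknown tool {t.get('tool')!r}")
        for dep in t.get("depends_on", []):
            if dep not in ids:
                errors.append(f"{t.get('id')}: missing dependency {dep!r}")
    return errors

# A plan with a typo'd tool name and a dependency on a task that
# does not exist -- both rejected before anything executes.
bad = {"tasks": [{"id": "t1", "tool": "deep_reserch", "depends_on": ["t0"]}]}
print(validate_plan(bad))
```

Running this fails the plan at submission time, which is the whole point: the error surfaces at the Manager boundary, not three tasks downstream.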
Where it breaks
I will not pretend this is the right topology for everything.
It is bad at exploration. If the goal is “research X and tell me what you find,” the Manager has to predict the structure of the answer up front. It cannot. You want a flat agent here, not a hierarchy.
It is bad at long-running negotiation. If the task involves real back-and-forth with an external party (a customer, another system that responds asynchronously), the Coordinator’s per-task lifecycle becomes a state machine you have to maintain explicitly. The hierarchy does not give you that for free.
Manager planning is the bottleneck. Every task starts with a Manager call to produce the plan. If your goal can be decomposed by a deterministic parser instead of an LLM, do that. The Manager exists for goals where decomposition itself is the hard part.
Cost scales with hierarchy depth. For a goal of N tasks: one Manager call, N Coordinator invocations, and at least N Agent calls. For trivial work, this is over-engineered. For trivial work, do not use a multi-agent system.
The implementation choices that mattered
Three things I changed across iterations:
Structured I/O between layers, not natural language. The Manager produces JSON. The Coordinator consumes JSON, produces JSON. The Agent consumes JSON, invokes a typed tool, returns JSON. Natural-language handoffs between layers were the single biggest source of drift in early versions. Schemas with strict validation removed it.
Coordinator is not an LLM. Or rather: it does not have to be. In my pipeline most Coordinators are small Python state machines that route a task to one of three tools, retry on failure, time out cleanly. An LLM Coordinator costs 10x more and produces only marginally better routing on the workloads I tested. Use the smallest tool that works.
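A non-LLM Coordinator is small. This is a sketch of the idea, not the repo's code — the retry limit, timeout, and tool interface are all hypothetical:

```python
import time

def run_task(task, tools, max_retries=2, timeout_s=30.0):
    """Sketch of a plain-Python Coordinator: run one task, retry on
    failure, give up cleanly at the deadline. No LLM involved."""
    deadline = time.monotonic() + timeout_s
    attempt = 0
    while attempt <= max_retries:
        if time.monotonic() > deadline:
            return {"task_id": task["id"], "status": "failed", "error": "timeout"}
        try:
            out = tools[task["tool"]](task)   # exactly one Agent/tool call
            return {"task_id": task["id"], "status": "done", "output": out}
        except Exception as e:
            attempt += 1
            last = str(e)
    return {"task_id": task["id"], "status": "failed", "error": last}

# A tool that fails once, then succeeds -- the retry absorbs it.
calls = {"n": 0}
def flaky_search(task):
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient")
    return {"ok": True}

result = run_task({"id": "t1", "tool": "search"}, {"search": flaky_search})
print(result)
```

Everything an LLM Coordinator would do here — pick the tool, retry, give up — is a lookup, a loop, and a deadline check.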
Agents own one tool. Not “the agent that uses tools.” The agent that uses this specific tool. When an Agent owns multiple tools it starts inventing reasons to use the more powerful one. Single-tool Agents are predictable. The Coordinator decides which Agent to call.
A concrete example of this biting me in the sales pipeline: an early Agent had access to both a “search company” tool and a “deep research” tool. Given an ambiguous prospect name it would default to deep research — three minutes and a few thousand tokens — when search was correct and would have settled the question in two seconds. Splitting that into two single-tool Agents and letting the Coordinator pick made the same workload roughly an order of magnitude cheaper, with no measurable quality drop.
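The fix sketched in code — two single-tool Agents, with the routing decision moved up into the Coordinator. Class names and the routing rule are illustrative:

```python
# Two single-tool Agents. Neither can choose the other's tool, so
# neither can invent a reason to escalate itself.
class SearchAgent:
    tool = "search_company"          # cheap: seconds, few tokens
    def run(self, prospect: str) -> dict:
        return {"tool": self.tool, "prospect": prospect}

class DeepResearchAgent:
    tool = "deep_research"           # expensive: minutes, many tokens
    def run(self, prospect: str) -> dict:
        return {"tool": self.tool, "prospect": prospect}

def route(prospect: str, search_found_match: bool):
    """Coordinator rule (hypothetical): try cheap search first;
    escalate to deep research only when search cannot disambiguate."""
    return SearchAgent() if search_found_match else DeepResearchAgent()

agent = route("Acme Corp", search_found_match=True)
print(agent.tool)  # search_company
```

The escalation logic is now a one-line deterministic rule in the Coordinator, where it can be read, tested, and priced, instead of a latent preference inside a prompt.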
What I would still change
A few things are still open:
- Plan recovery. When a Coordinator fails irrecoverably, the Manager has to replan. My current replan logic is too eager — it replans the whole goal rather than the failed branch. That is wasteful. A proper diff-and-patch on the plan tree would be better.
- Eval. Same problem as the previous post. I tuned the topology against my own examples. A held-out eval set with measurable success criteria would tell me how much of the improvement is real.
- Observability. The hierarchy is a tree of LLM calls; debugging a failure requires stepping through every layer. Distributed-tracing-style instrumentation across the hierarchy is the obvious answer; I have not built it.
What this is not
This is not a framework. It is a topology. I keep seeing new multi-agent libraries that bake the M/C/A pattern into their abstraction; my read is that this is premature. The pattern is small enough to write directly in your application, with just the libraries you already use for tool calls, schema validation, and retries. A framework adds vocabulary you have to learn and abstractions you have to fight when the topology does not fit.
The repo is at github.com/ketankhairnar/ai-sales-team-public. It is a learning project — same caveats as the cache. The topology, however, has survived enough variations of the workload that I now reach for it first.