Read this as What makes this loop stop?
- Failure Trap
- Shipping Thought, Action, Observation without a max-iteration or terminal-error exit.
- Decision Rule
- Every production loop needs finish, budget, and terminal-error exits before it gets tools.
Thought makes the next step inspectable
ReAct starts with a thought: the model names what it knows and what it still needs. That is the difference between a hidden prompt response and a debuggable trajectory.
- Input: a user goal, such as checking refund status
- Thought: decide the next missing fact, not the final answer
- Why it matters: operators can see why a tool was chosen
Action turns reasoning into a tool call
The model emits an action: a tool name plus arguments. In production, the tool schema and description are part of the prompt because they teach the model when the action is allowed.
- Tool choice:
search_orders,check_refund, or another capability - Arguments: concrete IDs, emails, dates, or structured parameters
- Boundary: code executes the tool; the model does not bypass it
Observation feeds the next decision
The tool returns an observation. The agent appends that result to the conversation state, so the next thought can adapt instead of following a hardcoded pipeline.
- Result: data such as an order ID, refund status, or empty match
- State update: the next LLM call sees the observation
- Agent behavior: choose the next action based on what actually happened
The loop continues only while useful
If the observation is not enough, the loop repeats: Thought → Action → Observation. This is the source of agent flexibility, but every repeat costs latency, tokens, and new failure surface.
- Continue: ask another tool because the answer is still incomplete
- Evaluate: after every observation, ask whether the loop should stop
- Production rule: no unbounded loops
Production ReAct has three exits
The loop body is the easy half. The hard half is wiring all three termination paths before users touch the agent.
-
Finish[answer]: the model has enough information and returns the answer -
max_iterations: the budget is reached; return partial state for review -
tool_error: a terminal tool failure or guardrail trip returns structured failure
The missing stop condition is the incident
A ReAct loop without a ceiling can call the same broken tool over and over against the same empty observation. The model is not malicious; it is helpfully trying again from a state that cannot succeed.
- Symptom: iteration count climbs while arguments do not change
- Cost: token spend and latency rise before anyone gets an answer
- Fix: cap iterations, log the stop reason, and return a structured outcome