The ReAct Loop and Its 3 Stop Conditions

Read this as: What makes this loop stop?

Failure Trap: Shipping Thought, Action, Observation without a max-iteration or terminal-error exit.
Decision Rule: Every production loop needs finish, budget, and terminal-error exits before it gets tools.

Thought makes the next step inspectable

ReAct starts with a thought: the model names what it knows and what it still needs. That is the difference between a hidden prompt response and a debuggable trajectory.

Input: a user goal, such as checking refund status
Thought: decide the next missing fact, not the final answer
Why it matters: operators can see why a tool was chosen
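
One way to make the thought inspectable is to store it as a field in a per-iteration trace record. The sketch below is an assumption rather than a fixed ReAct format: TraceStep and format_step are hypothetical names, and the log layout is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceStep:
    """One loop iteration as an operator would read it (hypothetical shape)."""
    iteration: int
    thought: str                       # what the model says it still needs
    tool: Optional[str] = None         # tool chosen after the thought, if any
    observation: Optional[str] = None  # tool result, once it is available


def format_step(step: TraceStep) -> str:
    """Render a step as log lines so the reason for a tool call stays visible."""
    lines = [f"[{step.iteration}] Thought: {step.thought}"]
    if step.tool is not None:
        lines.append(f"[{step.iteration}] Action: {step.tool}")
    if step.observation is not None:
        lines.append(f"[{step.iteration}] Observation: {step.observation}")
    return "\n".join(lines)
```

Keeping the thought next to the action in the same record lets an operator answer "why was this tool chosen?" from the log alone.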

Action turns reasoning into a tool call

The model emits an action: a tool name plus arguments. In production, the tool schema and description are part of the prompt: they tell the model when each tool applies and which arguments it accepts.

Tool choice: search_orders, check_refund, or another capability
Arguments: concrete IDs, emails, dates, or structured parameters
Boundary: code executes the tool; the model does not bypass it
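
The boundary above can be sketched as a registry that code controls. The tool names search_orders and check_refund come from this card. Their signatures, return shapes, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Action:
    """What the model emits: a tool name plus structured arguments."""
    tool: str
    args: Dict[str, Any]


# Hypothetical stand-ins for real tools; signatures and results are assumed.
def search_orders(email: str) -> dict:
    return {"order_id": "A-100"} if email == "pat@example.com" else {}


def check_refund(order_id: str) -> dict:
    return {"order_id": order_id, "status": "pending"}


TOOLS: Dict[str, Callable[..., dict]] = {
    "search_orders": search_orders,
    "check_refund": check_refund,
}


def execute(action: Action) -> dict:
    """Run an action through the registry; the model never calls tools directly."""
    if action.tool not in TOOLS:
        raise ValueError(f"unknown tool: {action.tool}")
    return TOOLS[action.tool](**action.args)
```

Because every call goes through execute, an action that names a tool outside the registry is rejected before anything runs.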

Observation feeds the next decision

The tool returns an observation. The agent appends that result to the conversation state, so the next thought can adapt instead of following a hardcoded pipeline.

Result: data such as an order ID, refund status, or empty match
State update: the next LLM call sees the observation
Agent behavior: choose the next action based on what actually happened
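
A minimal sketch of the state update, assuming the conversation is a list of role/content dictionaries. The "tool" role and the "name" key are assumptions here; real chat APIs differ in how they label tool results.

```python
import json
from typing import Dict, List

Message = Dict[str, str]


def append_observation(history: List[Message], tool: str, result: dict) -> List[Message]:
    """Return a new history with the tool result appended as an observation.

    The input list is left untouched so earlier steps stay replayable.
    """
    observation: Message = {
        "role": "tool",  # assumed role label
        "name": tool,
        "content": json.dumps(result, sort_keys=True),
    }
    return history + [observation]
```

An empty match is still an observation: serializing {} and appending it lets the next thought react to "nothing found" instead of silently repeating the call.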

The loop continues only while useful

If the observation is not enough, the loop repeats: Thought → Action → Observation. This is the source of agent flexibility, but every repeat costs latency, tokens, and new failure surface.

Continue: call another tool while the answer is still incomplete
Evaluate: after every observation, ask whether the loop should stop
Production rule: no unbounded loops
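
The cost of each repeat can be tracked with a small budget object that is charged once per iteration. This is a sketch under assumptions: Budget is a hypothetical helper, and the token ceiling is an addition for illustration. The card itself only names an iteration limit.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Tracks what the loop has spent so 'no unbounded loops' is enforceable."""
    max_iterations: int
    max_tokens: int      # assumed extra ceiling, not part of the original rule
    iterations: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one completed Thought -> Action -> Observation cycle."""
        self.iterations += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        """True once either ceiling is reached; the loop should stop here."""
        return self.iterations >= self.max_iterations or self.tokens >= self.max_tokens
```

Checking exhausted() after every observation matches the Evaluate line above: the question of whether to stop is asked on every pass, not only when something goes wrong.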

Production ReAct has three exits

The loop body is the easy half. The hard half is wiring all three termination paths before users touch the agent.

Finish[answer]: the model has enough information and returns the answer
max_iterations: the budget is reached; return partial state for review
tool_error: a terminal tool failure or guardrail trip returns a structured failure
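
The three exits can be wired into one loop as sketched below. Everything here is a hedged illustration: policy stands in for the LLM call, TerminalToolError is an assumed convention for failures that retrying cannot fix, and Outcome is a hypothetical result shape.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Union


@dataclass
class Finish:
    answer: str


@dataclass
class Action:
    tool: str
    args: Dict[str, Any]


class TerminalToolError(Exception):
    """Raised by a tool when retrying cannot help (assumed convention)."""


@dataclass
class Outcome:
    stop_reason: str  # "finish", "max_iterations", or "tool_error"
    answer: Optional[str] = None
    error: Optional[str] = None
    observations: List[Any] = field(default_factory=list)
    iterations: int = 0


# Stand-in for the model: sees the goal and observations, returns the next step.
Policy = Callable[[str, List[Any]], Union[Finish, Action]]


def run_react(policy: Policy, tools: Dict[str, Callable[..., Any]],
              goal: str, max_iterations: int = 5) -> Outcome:
    observations: List[Any] = []
    for i in range(1, max_iterations + 1):
        step = policy(goal, observations)
        if isinstance(step, Finish):  # exit 1: the model has enough information
            return Outcome("finish", answer=step.answer,
                           observations=observations, iterations=i)
        tool = tools.get(step.tool)
        if tool is None:  # exit 3: an unknown tool is treated as terminal
            return Outcome("tool_error", error=f"unknown tool: {step.tool}",
                           observations=observations, iterations=i)
        try:
            observations.append(tool(**step.args))
        except TerminalToolError as exc:  # exit 3: terminal tool failure
            return Outcome("tool_error", error=str(exc),
                           observations=observations, iterations=i)
    # exit 2: budget reached; hand back partial state for review
    return Outcome("max_iterations", observations=observations,
                   iterations=max_iterations)
```

Every return path sets stop_reason, so callers and logs can tell a finished answer from a budget cut-off or a failed tool without parsing free text.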

The missing stop condition is the incident

A ReAct loop without a ceiling can call the same broken tool over and over against the same empty observation. The model is not malicious; it is helpfully trying again from a state that cannot succeed.

Symptom: iteration count climbs while arguments do not change
Cost: token spend and latency rise before anyone gets an answer
Fix: cap iterations, log the stop reason, and return a structured outcome
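
The symptom above can be surfaced in monitoring before the iteration cap fires. The check below is a sketch, not part of the fix itself: is_stalled and its window parameter are hypothetical, and the default of three identical calls is an arbitrary choice.

```python
import json
from typing import Any, Dict, List, Tuple

ActionRecord = Tuple[str, Dict[str, Any]]  # (tool name, arguments)


def action_key(tool: str, args: Dict[str, Any]) -> str:
    """Canonical string for a tool call, independent of argument key order."""
    return json.dumps([tool, args], sort_keys=True, default=str)


def is_stalled(actions: List[ActionRecord], window: int = 3) -> bool:
    """True when the last `window` actions are the same tool with the same arguments."""
    if len(actions) < window:
        return False
    recent = {action_key(tool, args) for tool, args in actions[-window:]}
    return len(recent) == 1
```

A stall flag does not replace the cap. It only makes "iteration count climbs while arguments do not change" visible in logs earlier.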