The ReAct Loop and Its 3 Stop Conditions

Thought, Action, Observation is only half of ReAct. Production agents also need explicit Finish, max-iteration, and terminal-error exits.

Read this as: What makes this loop stop?
Failure Trap
Shipping Thought, Action, Observation without a max-iteration or terminal-error exit.
Decision Rule
Every production loop needs finish, budget, and terminal-error exits before it gets tools.
Figure: The ReAct loop and its three stop conditions, a six-step explainer. First the model thinks about the user goal, then emits an action, then observes the tool result. If it still lacks the answer, it loops. A production loop exits by Finish answer, max iterations, or terminal tool error. Without those stops, the agent repeats a failing tool call until it becomes an incident. Panels: 1. Thought picks next step. 2. Action calls a tool. 3. Observation updates state. 4. Not done? Loop again. 5. Production has 3 exits. 6. No stop becomes incident.

Thought makes the next step inspectable

ReAct starts with a thought: the model names what it knows and what it still needs. That is the difference between a hidden prompt response and a debuggable trajectory.

  • Input: a user goal, such as checking refund status
  • Thought: decide the next missing fact, not the final answer
  • Why it matters: operators can see why a tool was chosen
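
One way to keep the trajectory inspectable is to store each thought next to the action it led to. The sketch below is a minimal illustration, not a standard ReAct data model: the `Step` record and `format_trajectory` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One Thought -> Action -> Observation record (hypothetical structure)."""
    thought: str                       # what the model says it still needs
    tool: Optional[str] = None         # tool the model chose, if any
    args: dict = field(default_factory=dict)
    observation: Any = None            # filled in after the tool runs


def format_trajectory(steps: list[Step]) -> str:
    """Render the steps so an operator can read why each tool was chosen."""
    lines = []
    for i, s in enumerate(steps, start=1):
        lines.append(f"{i}. Thought: {s.thought}")
        if s.tool is not None:
            lines.append(f"   Action: {s.tool}({s.args})")
            lines.append(f"   Observation: {s.observation}")
    return "\n".join(lines)
```

Printing `format_trajectory(steps)` after a run gives a log in which every tool call sits directly under the reasoning that produced it.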

Action turns reasoning into a tool call

The model emits an action: a tool name plus arguments. In production, the tool schema and description are part of the prompt because they teach the model when the action is allowed.

  • Tool choice: search_orders, check_refund, or another capability
  • Arguments: concrete IDs, emails, dates, or structured parameters
  • Boundary: code executes the tool; the model does not bypass it
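
The boundary in the last bullet can be sketched as a registry that code controls. This is an illustrative sketch under assumptions: the tool bodies are stand-ins that return canned data, and `TOOLS` and `execute_action` are hypothetical names, not part of any particular framework.

```python
from typing import Any, Callable


# Hypothetical tools; real ones would query an order system.
def search_orders(email: str) -> dict:
    return {"order_id": 456} if email == "a@example.com" else {}


def check_refund(order_id: int) -> dict:
    return {"order_id": order_id, "status": "approved"}


# Only functions registered here can ever run.
TOOLS: dict[str, Callable[..., Any]] = {
    "search_orders": search_orders,
    "check_refund": check_refund,
}


def execute_action(tool: str, args: dict) -> Any:
    """Run a model-proposed action; unregistered tool names are rejected."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return TOOLS[tool](**args)
```

The model only proposes a name and arguments. Whether anything executes is decided by the lookup in `execute_action`, so an invented tool name fails instead of running.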

Observation feeds the next decision

The tool returns an observation. The agent appends that result to the conversation state, so the next thought can adapt instead of following a hardcoded pipeline.

  • Result: data such as an order ID, refund status, or empty match
  • State update: the next LLM call sees the observation
  • Agent behavior: choose the next action based on what actually happened
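
The state update can be as small as appending one message. The sketch below assumes the conversation is a list of dictionaries and that tool results use a `"tool"` role; message shapes and role names differ between providers, so treat both as assumptions.

```python
import json


def append_observation(messages: list[dict], tool: str, result: object) -> list[dict]:
    """Return a new message list with the tool result appended."""
    observation = {
        "role": "tool",                  # assumed role name; providers differ
        "name": tool,
        "content": json.dumps(result),   # serialize so the next LLM call can read it
    }
    return messages + [observation]      # the caller's original list is not mutated
```

An empty match is appended like any other result, so the next thought sees that the search found nothing rather than seeing no observation at all.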

The loop continues only while useful

If the observation is not enough, the loop repeats: Thought → Action → Observation. This is the source of agent flexibility, but every repeat adds latency, token cost, and another chance to fail.

  • Continue: ask another tool because the answer is still incomplete
  • Evaluate: after every observation, ask whether the loop should stop
  • Production rule: no unbounded loops

Production ReAct has three exits

The loop body is the easy half. The hard half is wiring all three termination paths before users touch the agent.

  • Finish[answer]: the model has enough information and returns the answer
  • max_iterations: the budget is reached; return partial state for review
  • tool_error: a terminal tool failure or guardrail trip returns structured failure
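
The three exits above can be sketched as one loop that always returns a structured outcome with a stop reason. This is a minimal sketch under assumptions: `decide` stands in for the LLM call, and `StopReason`, `Outcome`, `TerminalToolError`, and `run_react` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Optional


class StopReason(Enum):
    FINISH = "finish"
    MAX_ITERATIONS = "max_iterations"
    TOOL_ERROR = "tool_error"


class TerminalToolError(Exception):
    """Hypothetical: raised by a tool when retrying cannot help."""


@dataclass
class Outcome:
    reason: StopReason
    answer: Any = None
    steps: list = field(default_factory=list)   # partial state for review
    error: Optional[str] = None


def run_react(decide: Callable[[list], dict], tools: dict,
              max_iterations: int = 5) -> Outcome:
    """decide(steps) stands in for the LLM call. It returns either
    {"finish": answer} or {"tool": name, "args": {...}}."""
    steps: list = []
    for _ in range(max_iterations):
        decision = decide(steps)
        if "finish" in decision:                     # exit 1: Finish[answer]
            return Outcome(StopReason.FINISH, answer=decision["finish"], steps=steps)
        try:
            observation = tools[decision["tool"]](**decision["args"])
        except TerminalToolError as exc:             # exit 3: tool_error, no retry
            return Outcome(StopReason.TOOL_ERROR, steps=steps, error=str(exc))
        steps.append({**decision, "observation": observation})
    # exit 2: budget reached; hand back what was gathered so far
    return Outcome(StopReason.MAX_ITERATIONS, steps=steps)
```

Because every path returns an `Outcome`, the caller can branch on `outcome.reason` and log it, instead of inferring from a missing answer why the loop ended.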

The missing stop condition is the incident

A ReAct loop without a ceiling can call the same broken tool over and over against the same empty observation. The model is not malicious; it is helpfully trying again from a state that cannot succeed.

  • Symptom: iteration count climbs while arguments do not change
  • Cost: token spend and latency rise before anyone gets an answer
  • Fix: cap iterations, log the stop reason, and return a structured outcome
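
The symptom in the first bullet can also be checked directly. The sketch below is a supplement to the iteration cap, not a replacement for it: `is_stuck` is a hypothetical helper, and the window of three identical calls is an assumed threshold, not a recommendation from the text above.

```python
import json


def is_stuck(calls: list[tuple[str, dict]], window: int = 3) -> bool:
    """True when the last `window` calls used the same tool and arguments."""
    if len(calls) < window:
        return False
    recent = calls[-window:]
    # Serialize arguments with sorted keys so equal dicts compare equal as strings.
    keys = {(tool, json.dumps(args, sort_keys=True)) for tool, args in recent}
    return len(keys) == 1
```

A loop could call `is_stuck` after each action and stop with a structured failure when it returns `True`, well before the iteration count climbs far.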