AI Engineering Series

RAG to Agents - From Retrieval to Action

Deep dive into AI agents: the agent loop, tools, ReAct pattern, memory systems, when agents are wrong, and agent failure modes you'll encounter in production

Why This Matters

RAG answers questions. Agents solve problems.

When a user asks “What’s the status of order #12345?”, RAG retrieves a document. But what if answering requires:

  • Querying an order database
  • Checking shipping status from an API
  • Calculating estimated delivery based on location
  • Composing a response with all that information

RAG can’t do this. RAG retrieves static documents. Agents take actions.

If you try to build multi-step systems with RAG patterns, you’ll create brittle pipelines that break on variation. Understanding the agent mental model lets you build flexible systems that adapt.

What Goes Wrong Without This:

Symptom: Your "smart assistant" can only answer questions from
         documents. Users ask for actions, it apologizes.
Cause: You built RAG when you needed an agent. RAG retrieves
       information. It doesn't take action or call APIs.

Symptom: Your multi-step pipeline is 500 lines of if/else handling
         every edge case. Adding a new capability requires 2 weeks.
Cause: You hardcoded the reasoning that should be delegated to the LLM.
       Every variation is a code branch.

Symptom: Your agent attempts an action, fails, and doesn't recover.
         It returns "Error occurred" to the user.
Cause: You built a pipeline, not an agent. Pipelines don't adapt.
       Agents observe results and adjust.

Pipelines vs Agents

There are two ways to build multi-step AI systems:

+------------------------------------------------------------------+
|  PIPELINE (Code decides)                                          |
+------------------------------------------------------------------+
|                                                                   |
|  Input → Step 1 → Step 2 → Step 3 → Output                       |
|                                                                   |
|          [fixed]   [fixed]   [fixed]                              |
|                                                                   |
|  The code determines what happens at each step.                   |
|  Each branch is explicitly written.                               |
|  Predictable, but rigid.                                          |
|                                                                   |
+------------------------------------------------------------------+
|  AGENT (Model decides)                                            |
+------------------------------------------------------------------+
|                                                                   |
|  Input ──▶ ┌─────────────────────────┐                           |
|            │ Observe current state   │◄────────┐                 |
|            │           ▼             │         │                 |
|            │ Think: what next?       │         │                 |
|            │           ▼             │         │                 |
|            │ Act: execute decision   │─────────┘                 |
|            └───────────┬─────────────┘                           |
|                        ▼                                         |
|                  Output (when done)                               |
|                                                                   |
|  The model determines what happens at each step.                  |
|  Flexible, but less predictable.                                  |
|                                                                   |
+------------------------------------------------------------------+

The key question: Who decides the next step—your code or the model?

  • Pipeline: You enumerate all paths. Reliable for known scenarios. Fails on novel scenarios.
  • Agent: Model reasons about what to do. Handles variation. Can make mistakes.

Neither is better. They solve different problems.
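The distinction can be sketched in a few lines of Python. This is a toy illustration, not a real implementation: stub data and a scripted `decide()` stand in for real tools and a real LLM, and all names here are illustrative.

```python
def lookup_order(order_id):
    # Stub data store; a real system would query a database.
    return {"order_id": order_id, "status": "shipped", "tracking": "FX123"}

def check_shipping(tracking):
    # Stub carrier API.
    return {"location": "Chicago", "est_delivery": "Dec 5"}

def pipeline(order_id):
    """Pipeline: the code fixes the steps. Same three calls every time."""
    order = lookup_order(order_id)
    shipping = check_shipping(order["tracking"])
    return f"Order {order_id} arrives {shipping['est_delivery']}"

def agent(order_id, decide):
    """Agent: decide() (an LLM call in practice) picks each next action."""
    state = {"order_id": order_id}
    while True:
        action, arg = decide(state)
        if action == "lookup":
            state["order"] = lookup_order(arg)
        elif action == "shipping":
            state["shipping"] = check_shipping(arg)
        elif action == "respond":
            return arg

def scripted_decide(state):
    # Stand-in for the model: inspects current state, chooses the next step.
    if "order" not in state:
        return "lookup", state["order_id"]
    if "shipping" not in state:
        return "shipping", state["order"]["tracking"]
    return "respond", (
        f"Order {state['order_id']} arrives {state['shipping']['est_delivery']}"
    )
```

Both paths produce the same answer here, but the difference is who chose the steps: in `pipeline` the sequence is frozen in code, while in `agent` the `decide` function picks each action at runtime based on accumulated state.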


The Agent Loop

An agent is a loop. The LLM decides what to do, executes it, observes the result, and decides again.

+------------------------------------------------------------------+
|                      THE AGENT LOOP                               |
+------------------------------------------------------------------+
|                                                                   |
|                    ┌──────────────┐                               |
|            ┌──────▶│   OBSERVE    │                               |
|            │       │              │                               |
|            │       │ What do I    │                               |
|            │       │ know now?    │                               |
|            │       └──────┬───────┘                               |
|            │              │                                       |
|            │              ▼                                       |
|            │       ┌──────────────┐                               |
|            │       │    THINK     │                               |
|            │       │              │                               |
|            │       │ What should  │                               |
|            │       │ I do next?   │                               |
|            │       └──────┬───────┘                               |
|            │              │                                       |
|            │              ▼                                       |
|            │       ┌──────────────┐                               |
|     ┌──────┴──┐    │     ACT      │                               |
|     │ Not done│◄───┤              │                               |
|     └─────────┘    │ Execute the  │                               |
|                    │ decision     │                               |
|                    └──────┬───────┘                               |
|                           │                                       |
|                           ▼                                       |
|                      ┌─────────┐                                  |
|                      │  Done?  │                                  |
|                      └────┬────┘                                  |
|                           │ Yes                                   |
|                           ▼                                       |
|                      ┌─────────┐                                  |
|                      │ OUTPUT  │                                  |
|                      └─────────┘                                  |
|                                                                   |
+------------------------------------------------------------------+

Each iteration:

  1. Observe: What information do I have? What just happened?
  2. Think: Given my goal and current state, what’s the best next action?
  3. Act: Execute the chosen action
  4. Evaluate: Am I done? If not, loop.

The magic: the model decides the action at step 2. This is what makes it an agent, not a pipeline.


Tools: The Agent’s Capabilities

An agent without tools is just a chatbot. Tools are functions the agent can call.

+------------------------------------------------------------------+
|  TOOLS GIVE AGENTS CAPABILITIES                                   |
+------------------------------------------------------------------+
|                                                                   |
|  Tool Definition:                                                 |
|  ┌───────────────────────────────────────────────────────────┐    |
|  │ name: "search_orders"                                     │    |
|  │ description: "Search orders by user ID, order ID,         │    |
|  │               or date range"                              │    |
|  │ parameters:                                               │    |
|  │   user_id: string (optional)                              │    |
|  │   order_id: string (optional)                             │    |
|  │   date_from: date (optional)                              │    |
|  └───────────────────────────────────────────────────────────┘    |
|                                                                   |
|  Agent receives tool descriptions → LLM learns WHEN to use       |
|  Agent receives user query → LLM decides WHICH tool + arguments  |
|  Tool returns result → Agent observes and continues              |
|                                                                   |
+------------------------------------------------------------------+

Common tool categories:

Category          Examples                      What it enables
----------------  ----------------------------  ------------------
Data retrieval    search_docs, query_database   Access information
External APIs     get_weather, check_inventory  Real-time data
Actions           send_email, create_ticket     Side effects
Computation       calculate, run_code           Complex logic
User interaction  ask_user, show_options        Clarification

Tool descriptions are prompts. Good descriptions = agent uses tools correctly. Bad descriptions = agent guesses wrong.
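To make that concrete, here is a vague and a specific version of the same tool schema side by side (a sketch with illustrative names; the schema shape is simplified from what a real API expects):

```python
# Vague: the model has to guess when this tool applies and what to pass.
vague_tool = {
    "name": "search",
    "description": "Searches stuff",
    "parameters": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
    },
}

# Specific: says WHEN to use the tool and what valid arguments look like.
specific_tool = {
    "name": "search_orders",
    "description": (
        "Search customer orders. Use when the user asks about an order's "
        "status, contents, or history. Provide order_id (format: ORD-12345) "
        "when known; otherwise provide the customer's email."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "e.g. ORD-12345"},
            "email": {"type": "string", "description": "Customer email"},
        },
    },
}
```

The second description does three jobs the first skips: it names the trigger conditions, documents the argument format, and tells the model what to fall back on, which is exactly the information the model needs to avoid guessing.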

RAG as a Tool

Here’s the insight: RAG doesn’t get replaced by agents—it becomes a tool.

Available tools:
• search_docs: Search knowledge base for relevant info
• query_orders: Get order details from database
• check_shipping: Get real-time shipping status

User: "When will my order #789 arrive?"

Agent thinks: "I need order details first"
Agent acts: query_orders(order_id="789")
Observation: {status: "shipped", carrier: "FedEx", ...}

Agent thinks: "Order is shipped, need tracking info"
Agent acts: check_shipping(tracking="FX123456")
Observation: {location: "Chicago", est_delivery: "Dec 5"}

Agent thinks: "I have all the info, can answer now"
Agent responds: "Your order shipped via FedEx and
                 should arrive December 5th."

RAG is retrieval. Agents can use retrieval as one capability among many.


The ReAct Pattern

The most common agent architecture is ReAct (Reason + Act). The model explicitly reasons before acting.

+------------------------------------------------------------------+
|  ReAct EXAMPLE                                                    |
+------------------------------------------------------------------+
|                                                                   |
|  User: "What's the refund status for alice@example.com?"          |
|                                                                   |
|  ┌─────────────────────────────────────────────────────────┐      |
|  │ THOUGHT: I need to find any refund requests from this   │      |
|  │ user. I'll search the orders first.                     │      |
|  │                                                         │      |
|  │ ACTION: search_orders(email="alice@example.com")        │      |
|  └─────────────────────────────────────────────────────────┘      |
|                                                                   |
|  OBSERVATION: [                                                   |
|    {order_id: "456", status: "refund_requested", amount: 99}      |
|  ]                                                                |
|                                                                   |
|  ┌─────────────────────────────────────────────────────────┐      |
|  │ THOUGHT: Found an order with refund requested.          │      |
|  │ I need to check the refund processing status.           │      |
|  │                                                         │      |
|  │ ACTION: check_refund(order_id="456")                    │      |
|  └─────────────────────────────────────────────────────────┘      |
|                                                                   |
|  OBSERVATION: {status: "approved", processed_date: "Dec 3"}       |
|                                                                   |
|  ┌─────────────────────────────────────────────────────────┐      |
|  │ THOUGHT: The refund has been approved and processed.    │      |
|  │ I have enough info to answer.                           │      |
|  │                                                         │      |
|  │ ACTION: respond_to_user                                 │      |
|  └─────────────────────────────────────────────────────────┘      |
|                                                                   |
|  RESPONSE: "Alice's refund of $99 for order #456 was              |
|  approved and processed on December 3rd."                         |
|                                                                   |
+------------------------------------------------------------------+

The THOUGHT step makes the agent’s reasoning visible. This helps with:

  • Debugging (you can see why it chose an action)
  • Guidance (you can provide examples of good reasoning)
  • Error recovery (model realizes when it’s stuck)
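A harness for this pattern has to pull the THOUGHT and ACTION out of the model's text before executing anything. A minimal parser might look like the following (the exact output format is an assumption; real frameworks each define their own):

```python
import re

# Matches one ReAct step: a THOUGHT line followed by an ACTION call.
REACT_STEP = re.compile(
    r"THOUGHT:\s*(?P<thought>.+?)\s*ACTION:\s*(?P<action>\w+)\((?P<args>.*?)\)",
    re.DOTALL,
)

def parse_react(text):
    """Extract one (thought, action, raw_args) step, or None if absent."""
    m = REACT_STEP.search(text)
    if m is None:
        return None
    return m.group("thought"), m.group("action"), m.group("args")

step = parse_react(
    "THOUGHT: I need the user's orders first.\n"
    'ACTION: search_orders(email="alice@example.com")'
)
```

Keeping the thought as plain text while parsing the action into a callable name plus arguments is what lets you log the reasoning for debugging while still executing the action mechanically. (Structured tool-calling APIs do this extraction for you, which is why the code example later in this article never parses text.)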

Agent Memory

Agents without memory forget everything between turns. Production agents need memory.

+------------------------------------------------------------------+
|  MEMORY TYPES                                                     |
+------------------------------------------------------------------+
|                                                                   |
|  SHORT-TERM MEMORY (Conversation Context)                         |
|  ────────────────────────────────────────                         |
|  What: Previous messages in current session                       |
|  How: Append to LLM context                                       |
|  Limit: Context window size                                       |
|                                                                   |
|  User: "Check order #123"                                         |
|  Agent: "Order #123 shipped Dec 1"                                |
|  User: "When will it arrive?"  ← "it" = order #123                |
|        Short-term memory resolves the reference                   |
|                                                                   |
+------------------------------------------------------------------+
|  LONG-TERM MEMORY (Persistent Knowledge)                          |
+------------------------------------------------------------------+
|  What: Facts that persist across sessions                         |
|  How: Vector store for semantic retrieval                         |
|  Limit: Storage capacity                                          |
|                                                                   |
|  Session 1: User says "I prefer email over SMS"                   |
|  → Store: ("user_preference", "prefers email for notifications")  |
|                                                                   |
|  Session 2: Agent needs to notify user                            |
|  → Retrieve preference → Send email                               |
|                                                                   |
+------------------------------------------------------------------+
|  WORKING MEMORY (Scratch Pad)                                     |
+------------------------------------------------------------------+
|  What: Intermediate results during task execution                 |
|  How: Structured state object                                     |
|  Limit: Task complexity                                           |
|                                                                   |
|  Task: "Calculate total revenue by region"                        |
|  Working memory: {                                                |
|    "north": 150000,                                               |
|    "south": 120000,   ← Accumulated as agent works                |
|    "east": pending...                                             |
|  }                                                                |
|                                                                   |
+------------------------------------------------------------------+

Without memory, agents can’t handle multi-turn conversations, learn user preferences, or maintain context across sessions.
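The three memory types can live in one small state object. The sketch below is deliberately simplified: the long-term store is a plain dict here, where a production system would typically use a vector store for semantic retrieval.

```python
class AgentMemory:
    """Holds the three memory types from the diagram above."""

    def __init__(self):
        self.short_term = []  # conversation turns, appended to the LLM context
        self.long_term = {}   # facts persisted across sessions
        self.working = {}     # scratch pad for the current task

    def remember_turn(self, role, content):
        self.short_term.append({"role": role, "content": content})

    def store_fact(self, key, value):
        self.long_term[key] = value

mem = AgentMemory()

# Short-term: lets the agent resolve "it" to order #123 on the third turn.
mem.remember_turn("user", "Check order #123")
mem.remember_turn("assistant", "Order #123 shipped Dec 1")
mem.remember_turn("user", "When will it arrive?")

# Long-term: survives this session, consulted in future ones.
mem.store_fact("user_preference", "prefers email for notifications")

# Working memory: intermediate results accumulated mid-task.
mem.working["north"] = 150000
```

The split matters operationally: `short_term` is bounded by the context window and gets truncated or summarized, `long_term` is bounded by storage and needs a retrieval step, and `working` is discarded when the task completes.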


When Agents Are Wrong

Agents are not always the answer. Sometimes they’re the problem.

+------------------------------------------------------------------+
|  WHEN TO USE WHAT                                                 |
+------------------------------------------------------------------+
|                                                                   |
|  USE DIRECT LLM CALL when:                                        |
|  • Single-step task (summarize, translate, classify)              |
|  • No external data needed                                        |
|  • No actions required                                            |
|                                                                   |
|  USE RAG when:                                                    |
|  • Answer exists in your documents                                |
|  • Single retrieval + generation is sufficient                    |
|  • You want predictable, auditable answers                        |
|                                                                   |
|  USE PIPELINE when:                                               |
|  • Steps are known and fixed                                      |
|  • High reliability required                                      |
|  • Each step must happen regardless of previous results           |
|                                                                   |
|  USE AGENT when:                                                  |
|  • Task requires multiple tools/data sources                      |
|  • Strategy depends on intermediate results                       |
|  • User requests vary significantly                               |
|  • Recovery from failure requires reasoning                       |
|                                                                   |
+------------------------------------------------------------------+

The “agent for everything” anti-pattern:

User: "What's 2 + 2?"

BAD (over-engineering):
  Agent thinks: "I should use the calculator tool"
  Agent acts: calculate("2 + 2")
  Observation: 4
  Agent responds: "The answer is 4"

  Cost: Multiple LLM calls, tool overhead
  Time: 2-3 seconds

GOOD (direct):
  LLM responds: "4"

  Cost: One LLM call
  Time: 200ms

Agents add:

  • Latency: Multiple LLM calls per request
  • Cost: Each thought/action cycle costs tokens
  • Non-determinism: Same input can produce different paths
  • New failure modes: Wrong tool selection, hallucinated arguments, infinite loops

Don’t use an agent when a simpler approach works.
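One practical consequence is putting a router in front of the agent. The version below is a crude keyword heuristic purely for illustration; real systems more often use a classifier or a cheap LLM call to make this decision.

```python
# Keywords that suggest the request needs tools or multiple steps
# (the list is illustrative, not exhaustive).
ACTION_HINTS = ("order", "refund", "email", "schedule", "cancel")

def route(user_message):
    """Send trivial requests to a single LLM call; reserve the agent loop
    for requests that look like they need tools."""
    msg = user_message.lower()
    if any(hint in msg for hint in ACTION_HINTS):
        return "agent"       # multi-step, tool-using path
    return "direct_llm"      # one completion is enough

print(route("What's 2 + 2?"))             # direct_llm
print(route("Refund order #456 please"))  # agent
```

Even a rough router like this captures the cost asymmetry from the example above: the common simple case pays for one LLM call instead of a full thought/action cycle.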


Agent Failure Modes

Agents introduce new ways to fail:

+------------------------------------------------------------------+
|  AGENT-SPECIFIC FAILURES                                          |
+------------------------------------------------------------------+
|                                                                   |
|  1. WRONG TOOL SELECTION                                          |
|     Agent picks search_docs when it should use query_orders       |
|     Cause: Ambiguous tool descriptions, poor examples             |
|                                                                   |
|  2. HALLUCINATED ARGUMENTS                                        |
|     Agent calls: check_order(order_id="MADE_UP_ID")               |
|     Cause: Model invents plausible-looking arguments              |
|                                                                   |
|  3. INFINITE LOOPS                                                |
|     Agent keeps trying the same failing action                    |
|     Cause: No loop detection, poor error handling instructions    |
|                                                                   |
|  4. PREMATURE TERMINATION                                         |
|     Agent responds before gathering enough information            |
|     Cause: Weak instructions to be thorough                       |
|                                                                   |
|  5. SCOPE CREEP                                                   |
|     Agent takes actions beyond what user asked                    |
|     Cause: Unclear boundaries, model being "helpful"              |
|                                                                   |
|  6. CATASTROPHIC ACTIONS                                          |
|     Agent deletes data, sends emails, makes purchases             |
|     Cause: Powerful tools without guardrails                      |
|                                                                   |
+------------------------------------------------------------------+
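Failure modes 2 and 3 have cheap mechanical guardrails you can add before any LLM-level fix. The sketch below (all names illustrative) validates IDs against known values and blocks repeated identical calls:

```python
def make_guard(valid_order_ids, max_repeats=2):
    """Return a guard(tool_name, arguments) function that returns None
    when the call is allowed, or a rejection message to feed back to
    the agent as an observation."""
    seen = {}

    def guard(tool_name, arguments):
        # Hallucinated arguments: reject order IDs that don't exist.
        order_id = arguments.get("order_id")
        if order_id is not None and order_id not in valid_order_ids:
            return f"Rejected: unknown order_id {order_id!r}"
        # Infinite loops: block the same call after max_repeats attempts.
        key = (tool_name, tuple(sorted(arguments.items())))
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > max_repeats:
            return "Rejected: repeated identical call, try something else"
        return None  # allowed

    return guard

guard = make_guard({"456", "789"})
```

Returning the rejection as text, rather than raising, matters: the agent observes the rejection like any other tool result and can reason its way to a different action, which is exactly the recovery behavior pipelines lack.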

Code Example

Minimal agent loop demonstrating the observe-think-act cycle:

from openai import OpenAI
import json

client = OpenAI()

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_orders",
            "description": "Search for orders by user email or order ID",
            "parameters": {
                "type": "object",
                "properties": {
                    "email": {"type": "string", "description": "User email"},
                    "order_id": {"type": "string", "description": "Order ID"},
                },
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_refund",
            "description": "Check refund status for an order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Order ID"},
                },
                "required": ["order_id"],
            },
        },
    },
]

# Mock tool implementations
def search_orders(email=None, order_id=None):
    return [{"order_id": "456", "status": "refund_requested", "amount": 99}]

def check_refund(order_id):
    return {"status": "approved", "processed_date": "Dec 3"}

def execute_tool(name, arguments):
    """Route tool calls to implementations."""
    if name == "search_orders":
        return search_orders(**arguments)
    elif name == "check_refund":
        return check_refund(**arguments)
    return {"error": f"Unknown tool: {name}"}

def run_agent(user_message: str, max_iterations: int = 5) -> str:
    """Run the agent loop."""
    messages = [
        {"role": "system", "content": "You are a helpful customer service agent."},
        {"role": "user", "content": user_message},
    ]

    for i in range(max_iterations):
        # THINK: Model decides what to do
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
        )

        message = response.choices[0].message

        # Check if done (no tool calls)
        if not message.tool_calls:
            return message.content

        # ACT: Execute each tool call
        messages.append(message)

        for tool_call in message.tool_calls:
            name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments)

            # Execute tool
            result = execute_tool(name, arguments)

            # OBSERVE: Add result to context
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })

    return "Max iterations reached"

# Test
result = run_agent("What's the refund status for alice@example.com?")
print(result)

Key Takeaways

1. Pipelines vs Agents
   - Pipeline: code decides the next step
   - Agent: model decides the next step

2. The agent loop: Observe → Think → Act → Repeat

3. Tools give agents capabilities
   - Good tool descriptions are prompts
   - RAG becomes a tool, not a replacement

4. ReAct pattern: explicit reasoning before acting
   - THOUGHT → ACTION → OBSERVATION

5. Memory types: short-term, long-term, working memory

6. Agents aren't always the answer
   - Add latency, cost, non-determinism
   - Use simpler approaches when they suffice

7. Agent-specific failure modes
   - Wrong tool, hallucinated arguments, infinite loops
   - Premature termination, scope creep, catastrophic actions

Verify Your Understanding

Before proceeding:

Explain the difference between a pipeline and an agent to someone who hasn’t read this document. If you say “an agent uses an LLM,” that’s insufficient.

Given this task: “Summarize the top 3 news articles about AI today”

  • Could this be done with RAG?
  • When would this need an agent?
  • What tools would the agent need?

Your agent has these tools: [search_docs, query_database, send_email, calculate]. User asks: “What’s our revenue this quarter?” Which tool(s) should the agent use? What if query_database fails?

Identify the error in this statement: “I built an agent with 30 tools so it can handle any request.”


What’s Next

After this, you can:

  • Continue → Agents → Evaluation — measuring what matters in multi-step systems
  • Build → Production agent with proper guardrails

Go Deeper: Production Agents

This article covers the agent mental model. For production patterns (idempotency, checkpointing, HITL, cost control), see the Production Agents Deep Dive series:

Part  Topic              What You'll Learn
----  -----------------  ------------------------------------------------
0     Overview           Why 98% of orgs haven't deployed agents at scale
1     Idempotency        Safe retries, the Stripe pattern
2     State & Memory     Checkpointing, memory systems
3     Human-in-the-Loop  Confidence routing, escalation
4     Cost Control       Token budgets, circuit breakers
5     Observability      Silent failure detection
6     Durable Execution  Temporal, Inngest, Restate
7     Security           Sandboxing, prompt injection
8     Testing            Golden datasets, evaluation