ReAct: Synergizing Reasoning and Acting in Language Models

Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
Year: 2023  |  Venue: ICLR
Link: arXiv:2210.03629


TL;DR

ReAct interleaves natural language reasoning (Thought) with structured actions (Action) and environment feedback (Observation) in a loop. This Thought → Action → Observation pattern outperforms reasoning-only (CoT) and acting-only baselines on tasks requiring external information. It frames LLM agents as policies that can reason about and interact with tool APIs.


Why This Paper Matters

ReAct is the foundational architecture for LLM agents:

  1. Agent frameworks: LangChain, AutoGPT, Cursor, and many other agent systems implement the ReAct loop
  2. Grounded reasoning: Unlike pure CoT, ReAct connects reasoning to real-world information
  3. Tool use paradigm: Established the pattern for how LLMs call APIs, search, and use tools
  4. Debuggable: Each step (Thought, Action, Observation) is visible, making agent behavior interpretable

Key Concepts Explained Simply

Why Reasoning Alone Isn't Enough

CoT lets models reason, but they can only use knowledge from their training data. When asked "What was Apple's stock price yesterday?", CoT will hallucinate. ReAct solves this by letting the model take actions (like searching the web) and incorporate real observations.

Why Acting Alone Isn't Enough

A model that just calls tools without reasoning makes poor decisions about which tool to use, when to stop, and how to combine results. The "Thought" step lets the model plan its actions and interpret results.

The ReAct Loop

Thought: I need to find when Albert Einstein was born.
Action: Search["Albert Einstein birth date"]
Observation: Albert Einstein was born on March 14, 1879.
Thought: Now I need to find where he was born.
Action: Search["Albert Einstein birthplace"]  
Observation: Einstein was born in Ulm, Germany.
Thought: I have both pieces of information.
Action: FINISH["Albert Einstein was born on March 14, 1879, in Ulm, Germany."]

Each cycle:

  1. Thought: Reason about what to do next
  2. Action: Call a tool with specific input
  3. Observation: Receive the tool's output
  4. Repeat until the task is solved or the max step count is reached
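With a real model in the loop, each cycle parses the model's Action line, executes the tool, and feeds the Observation back into the context. A minimal driver sketch — here `llm` is a hypothetical stand-in for a model call, and `tools` maps tool names to string-to-string functions:

```python
import re


def react_loop(question, llm, tools, max_steps=5):
    """Drive Thought -> Action -> Observation cycles until FINISH or max_steps."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model is expected to emit "Thought: ...\nAction: Tool[input]".
        output = llm(context)
        context += output + "\n"
        m = re.search(r'Action:\s*(\w+)\[(.+)\]', output)
        if not m:
            break  # malformed output; stop rather than loop
        tool, arg = m.group(1), m.group(2)
        if tool == "FINISH":
            return arg
        observation = tools[tool](arg)  # string in, string out
        context += f"Observation: {observation}\n"
    return None  # ran out of steps without finishing
```

The grounding happens on the `Observation:` line: it is appended by the driver from real tool output, never generated by the model.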


The Math — Explained Step by Step

Trajectory Formulation

A ReAct trajectory \(\tau\) consists of alternating thoughts, actions, and observations:

\[ \tau = (t_1, a_1, o_1, t_2, a_2, o_2, \ldots, t_n, a_n, o_n) \]

Each component is generated by:

\[ t_k \sim P_\theta(\cdot \mid x, \tau_{<k}) \]
\[ a_k \sim \pi_\theta(\cdot \mid x, \tau_{<k}, t_k) \]
\[ o_k = \text{Env}(a_k) \quad \text{(deterministic from environment)} \]

Key insight: Observations \(o_k\) are not generated by the model — they come from the environment (tool API), providing grounding in real data.

Comparison with CoT

CoT: \(\tau = (t_1, t_2, \ldots, t_n, a)\) — all internal reasoning, no external feedback

ReAct: \(\tau = (t_1, a_1, o_1, \ldots)\) — interleaved reasoning and external feedback

CoT operates open-loop (no correction possible). ReAct operates closed-loop (observations can correct misconceptions).

Action Space

The action space is defined by available tools:

\[ \mathcal{A} = \{\text{Search}[\cdot], \text{Lookup}[\cdot], \text{Calculate}[\cdot], \text{FINISH}[\cdot]\} \]

Each action takes a string argument and returns a string observation.


Python Implementation

import re


class SimpleEnvironment:
    """Simulated environment with search and calculate tools."""

    def __init__(self):
        self.knowledge = {
            "albert einstein": "Albert Einstein (1879-1955) was a physicist who developed the theory of relativity.",
            "theory of relativity": "E=mc², published in 1905 (special) and 1915 (general).",
            "python programming": "Python is a high-level programming language created by Guido van Rossum in 1991.",
            "transformer model": "The Transformer was introduced in 'Attention Is All You Need' (2017) by Vaswani et al.",
        }

    def search(self, query):
        query_lower = query.lower()
        for key, value in self.knowledge.items():
            if key in query_lower:
                return value
        return "No results found."

    def calculate(self, expression):
        # Restricted eval for the demo; still unsafe for untrusted input.
        try:
            result = eval(expression, {"__builtins__": {}})
            return str(result)
        except Exception as e:
            return f"Error: {e}"

    def execute(self, action_type, action_input):
        if action_type == "Search":
            return self.search(action_input)
        elif action_type == "Calculate":
            return self.calculate(action_input)
        elif action_type == "FINISH":
            return action_input
        else:
            return f"Unknown action: {action_type}"


def parse_action(action_string):
    """Parse 'Action: ToolName[input]' format."""
    match = re.match(r'(\w+)\[(.+)\]', action_string.strip())
    if match:
        return match.group(1), match.group(2)
    return None, None


class ReActAgent:
    """ReAct agent that interleaves thinking and acting."""

    def __init__(self, env, max_steps=5):
        self.env = env
        self.max_steps = max_steps
        self.trajectory = []

    def step(self, thought, action_str):
        """Execute one Thought-Action-Observation cycle."""
        self.trajectory.append({"type": "Thought", "content": thought})

        action_type, action_input = parse_action(action_str)
        self.trajectory.append({
            "type": "Action",
            "content": f"{action_type}[{action_input}]"
        })

        observation = self.env.execute(action_type, action_input)
        self.trajectory.append({"type": "Observation", "content": observation})

        return observation

    def run(self, question, steps):
        """
        Run the ReAct loop.
        steps: list of (thought, action) tuples simulating model output
        """
        self.trajectory = [{"type": "Question", "content": question}]

        for i, (thought, action) in enumerate(steps):
            if i >= self.max_steps:
                break

            obs = self.step(thought, action)

            action_type, _ = parse_action(action)
            if action_type == "FINISH":
                return obs

        return "Max steps reached without finishing."

    def print_trajectory(self):
        """Display the full trajectory."""
        for entry in self.trajectory:
            prefix = entry["type"]
            content = entry["content"]
            print(f"  {prefix}: {content}")


def react_vs_cot_comparison():
    """Compare ReAct and CoT approaches on a fact-based question."""
    print("=== CoT (Reasoning Only) ===")
    print("Q: When was the Transformer model introduced?")
    print("Thought 1: I need to recall when the Transformer was introduced.")
    print("Thought 2: I believe it was around 2017 or 2018.")
    print("Thought 3: The paper is 'Attention Is All You Need'.")
    print("Answer: The Transformer was introduced in 2017.")
    print("✓ Correct, but relies entirely on training data (could hallucinate)")

    print("\n=== ReAct (Reasoning + Acting) ===")
    env = SimpleEnvironment()
    agent = ReActAgent(env)
    result = agent.run(
        "When was the Transformer model introduced?",
        steps=[
            ("I need to search for information about the Transformer model.",
             "Search[Transformer model]"),
            ("The search result says it was introduced in 2017 by Vaswani et al.",
             "FINISH[The Transformer was introduced in 2017 in the paper 'Attention Is All You Need' by Vaswani et al.]"),
        ]
    )
    agent.print_trajectory()
    print(f"  Result: {result}")
    print("✓ Grounded in retrieved information")


def build_react_prompt(question, tools, examples=None):
    """Build a ReAct-style prompt for an LLM."""
    tool_desc = "\n".join(
        f"  - {name}: {desc}" for name, desc in tools.items()
    )

    prompt = f"""Answer the following question using the available tools.

Available tools:
{tool_desc}

Format:
Thought: <reasoning about what to do>
Action: ToolName[input]
Observation: <result from tool>
... (repeat as needed)
Thought: I now have enough information.
Action: FINISH[final answer]

"""
    if examples:
        prompt += "Examples:\n"
        for ex in examples:
            prompt += f"\nQuestion: {ex['question']}\n"
            for step in ex['steps']:
                prompt += f"{step}\n"
            prompt += "\n"

    prompt += f"Question: {question}\n"
    return prompt


def analyze_failure_modes():
    """Common ReAct failure modes."""
    failures = [
        {
            "name": "Infinite Loop",
            "description": "Agent repeats the same search with the same query",
            "mitigation": "Track action history, detect repeats, set max_steps"
        },
        {
            "name": "Wrong Tool Selection",
            "description": "Agent uses Search when Calculate is needed",
            "mitigation": "Better tool descriptions, few-shot examples"
        },
        {
            "name": "Premature Finish",
            "description": "Agent finishes before gathering enough information",
            "mitigation": "Self-check thought before FINISH, require confidence"
        },
        {
            "name": "Observation Misinterpretation",
            "description": "Agent misreads or ignores tool output",
            "mitigation": "Structured output parsing, verification step"
        },
    ]

    print("--- ReAct Failure Modes ---")
    for f in failures:
        print(f"\n  {f['name']}: {f['description']}")
        print(f"  Fix: {f['mitigation']}")


# --- Demo ---
if __name__ == "__main__":
    # ReAct vs CoT comparison
    react_vs_cot_comparison()

    # Prompt construction
    print("\n--- ReAct Prompt ---")
    tools = {
        "Search": "Search the web for information",
        "Calculate": "Evaluate a math expression",
        "FINISH": "Return the final answer"
    }
    prompt = build_react_prompt(
        "What is the square of the year Python was created?",
        tools
    )
    print(prompt[:500])

    # Failure modes
    print()
    analyze_failure_modes()

Interview Importance

ReAct is essential for any role involving LLM agents or tool-using systems. It's the most commonly implemented agent pattern.

Difficulty Level: ⭐⭐ (Medium)


Interview Questions & Answers

Q1: How do you prevent infinite tool loops?

Answer:

  1. Max steps limit: Hard cap on the number of Thought-Action-Observation cycles (e.g., 5-10)
  2. Action deduplication: Detect and refuse repeated identical actions
  3. Progressive summarization: After several steps, summarize the trajectory and reset to prevent context overflow
  4. Confidence threshold: Require the model to express confidence before continuing
  5. Timeout: Wall-clock time limit in production systems
  6. Cost budget: Track token usage and stop when the budget is exceeded
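Mitigations 1 and 2 can be combined in a small wrapper around an environment like the `SimpleEnvironment` above; the `guarded_step` helper here is an illustrative sketch, not from the paper:

```python
def guarded_step(env, history, action_type, action_input, max_steps=8):
    """Execute an action only if it is novel and the step budget allows it.

    `history` is a caller-owned list of (action_type, action_input) tuples.
    Returns a "STOP:" observation string when a guard trips, which the
    agent can read and react to like any other observation.
    """
    if len(history) >= max_steps:
        return "STOP: step budget exhausted."
    if (action_type, action_input) in history:
        return "STOP: repeated action; try a different query or tool."
    history.append((action_type, action_input))
    return env.execute(action_type, action_input)
```

Returning the guard message as an observation (rather than raising) keeps the loop uniform: the model sees "STOP: ..." and can reason about it in its next Thought.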

Q2: What's the difference between ReAct and plain CoT without tools?

Answer:

  • CoT: Pure internal reasoning — no access to external information. Can hallucinate facts. Good for logic and math where all the information is in the prompt.
  • ReAct: Reasoning grounded in tool observations — the model can verify facts, search for information, and compute values. Essential for tasks requiring up-to-date or specific knowledge.
  • Trade-off: ReAct adds latency (tool calls) and complexity (error handling) but dramatically reduces hallucination for fact-based queries.

Q3: How does observation trustworthiness affect policy design?

Answer: If observations can be noisy or wrong:

  1. Cross-verification: Search multiple sources and compare results
  2. Confidence tracking: Weight observations by source reliability
  3. Fallback reasoning: If tool results seem implausible, reason about why and try alternative queries
  4. Observation validation: Add a verification step where the model checks whether the observation makes sense
  5. Error handling: Design the prompt to handle "No results found" or error responses gracefully
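Cross-verification (mitigation 1) can be sketched as a majority vote over independent sources; the `cross_verify` helper is illustrative, not from the paper:

```python
from collections import Counter


def cross_verify(query, sources, min_agreement=2):
    """Query several independent sources and trust an observation only if
    at least `min_agreement` of them return the same answer string."""
    answers = [source(query) for source in sources]
    answer, count = Counter(answers).most_common(1)[0]
    if count >= min_agreement:
        return answer
    return None  # no consensus; the agent should reason about alternatives
```

A `None` result signals the agent to fall back on reasoning (mitigation 3) instead of silently trusting a single noisy observation.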

Q4: How would you design a production ReAct system?

Answer:

  1. Tool registry: Define tools with typed schemas (e.g., OpenAI function-calling format)
  2. Sandboxing: Execute tool calls in isolated environments for safety
  3. Observability: Log every Thought/Action/Observation for debugging
  4. Rate limiting: Prevent excessive tool calls (cost and abuse)
  5. Fallback strategies: If a tool fails, the agent should adapt (try a different tool, rephrase the query)
  6. Streaming: Stream thoughts to the user for transparency and early termination
  7. Caching: Cache tool results for repeated queries
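A registry entry in the OpenAI function-calling schema (JSON Schema parameters) might look like the sketch below; the `search` tool and `validate_call` helper are illustrative, not a specific production API:

```python
# One tool entry in OpenAI function-calling format: a name, a description
# the model uses for tool selection, and a JSON Schema for arguments.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search the web and return the top result as text.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
}


def validate_call(tool_schema, arguments):
    """Cheap structural check before executing: all required keys present."""
    params = tool_schema["function"]["parameters"]
    missing = [k for k in params["required"] if k not in arguments]
    return not missing
```

Typed schemas replace the brittle regex parsing of `Tool[input]` strings: the model emits structured arguments, and malformed calls are rejected before they reach the sandbox.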


Connections to Other Papers

  • Chain-of-Thought → ReAct adds actions and observations to CoT reasoning
  • Toolformer → Learned tool use vs. ReAct's prompted tool use
  • GPT-3 → ReAct builds on in-context learning with structured prompts
  • InstructGPT → Instruction following enables ReAct-style prompts

Key Takeaways for Quick Review

| Concept | Remember |
|---|---|
| Pattern | Thought → Action → Observation (loop) |
| Key advantage | Grounded reasoning with external information |
| vs. CoT | CoT is open-loop; ReAct is closed-loop with feedback |
| Failure modes | Infinite loops, wrong tools, premature finish |
| Action space | Tools with typed inputs/outputs |
| Adoption | Foundation of LangChain, AutoGPT, and many agent frameworks |
| Production concerns | Max steps, sandboxing, observability, caching |