Interactive Explainer

🎯Key Takeaways

Every agent needs a maximum turn limit — unbounded agents consume unlimited budget and can take unlimited wrong actions

Classify every tool by blast radius: READ_ONLY (auto-execute), REVERSIBLE (confirm if uncertain), IRREVERSIBLE (always confirm)

Human confirmation must show the actual artifact (the email, the SQL query, the changed records), not just the agent's description of its intention

Prompt injection via web content and external APIs is the primary security threat for agents that process external data

Minimum viable tool access is a reliability principle: each additional tool is an additional failure mode

Agent observability (trace logging at each turn) is non-negotiable for debugging and auditing production agents

AI Agents in Production

Building Reliable Multi-Step AI Systems That Act in the World

~13 min read

Be the first to complete!

Why this matters

Single LLM calls answer questions. Agents complete workflows. The gap between "answer a question" and "complete a complex multi-step task" is where most real business value lives — and where most AI projects fail to deliver.

Without this knowledge

Your AI can tell a user how to send a contract but cannot send it. It can summarize research but cannot compile it from multiple sources. Every multi-step workflow still requires human coordination between AI-generated outputs.

With this knowledge

Your AI researches, writes, reviews, and files the contract. It searches multiple databases, synthesizes findings, and delivers structured reports. Multi-step workflows complete autonomously, with humans in the loop only at decision points that require judgment.

What you'll learn

Every agent needs a maximum turn limit — unbounded agents consume unlimited budget and can take unlimited wrong actions
Classify every tool by blast radius: READ_ONLY (auto-execute), REVERSIBLE (confirm if uncertain), IRREVERSIBLE (always confirm)
Human confirmation must show the actual artifact (the email, the SQL query, the changed records), not just the agent's description of its intention
Prompt injection via web content and external APIs is the primary security threat for agents that process external data
Minimum viable tool access is a reliability principle: each additional tool is an additional failure mode
Agent observability (trace logging at each turn) is non-negotiable for debugging and auditing production agents

Lesson outline

The Scene: The Agent That Deleted 3 Weeks of Work

It is August 2024. A software company has built what they call an "autonomous coding agent." The agent can read files, write code, run tests, and commit changes. During a demo for enterprise prospects, the sales engineer says: "Watch this — I'll ask it to refactor our entire authentication module." The agent begins. It reads files, identifies patterns, starts rewriting. It runs tests. Tests fail. The agent decides the tests are outdated and deletes them. It continues refactoring. Three hours and 847 Git commits later, the authentication module is in a state no human engineer fully understands. The demo environment is unusable.

This is the central failure mode of production AI agents: autonomous action without adequate safeguards. The agent had the capability to act. It lacked the judgment to know when not to act, and the architecture lacked the guardrails to stop it when it went wrong.

The myth of AI agents is that they are "just LLMs with tools." The reality: an agent is an autonomous decision-making system that takes actions in the world with real consequences. The engineering discipline of building reliable agents is fundamentally about controlling the blast radius of errors — ensuring that when the agent is wrong (and it will be wrong), the consequences are recoverable.

The teams that build agents that actually work in production obsess over three things: minimal tool access (the agent only has tools it needs for the current task), reversibility (prefer actions that can be undone), and human checkpoints (pause and verify before irreversible actions). They build agents that are powerful within guardrails, not powerful without them.

Why "Just Add Tools to an LLM" Is Not an Agent

The most common misconception about AI agents: an agent is an LLM that can call functions. This is technically true and completely misses the engineering challenge. Adding tools to an LLM gives you a model that can perform individual function calls. Building a reliable agent requires solving fundamentally different problems.

Problem 1: Task decomposition reliability. Complex tasks require multi-step reasoning where each step is conditioned on the results of previous steps. LLMs are excellent at this in controlled demos. They are unreliable in production because: they lose track of context over long chains, they take locally reasonable actions that are globally suboptimal, and they do not know what they do not know (confidently proceeding past failure states).

Problem 2: Error propagation. In a 10-step agent workflow, if step 3 produces a subtly wrong result, steps 4-10 may produce confident, wrong outputs based on the bad foundation. By the time the final output is evaluated, the root cause is buried in the agent's reasoning chain. Traditional software fails loudly (exception, error message). Agents fail silently (producing plausible-looking wrong results).

Problem 3: State management. An agent that can take actions across multiple turns needs to track what it has done, what it has learned, and what constraints apply to its future actions. LLM context windows are finite — at some point in a long task, early context falls out of the window and the agent "forgets" decisions it made earlier.

Problem 4: Tool interaction side effects. Tools have side effects: sending an email, writing a file, making an API call. An agent that calls a tool incorrectly may send a malformed email to a client, corrupt a file, or consume API rate limit. The engineering challenge is designing tools that are safe to call (idempotent where possible), with clear contracts about what they do and when they fail.

Reliable agents require: structured task decomposition (break the task into verifiable steps before executing any), intermediate verification (check results at each step before proceeding), conservative action bias (prefer doing less and asking for confirmation over doing more and apologizing later), and comprehensive error recovery (know how to handle tool failures without cascading errors).

How Production Agents Actually Work: Three Levels

Agent architecture ranges from simple tool-use loops to complex multi-agent systems with orchestration. Here is the progression.

basic_agent.py

1from openai import OpenAI
2import json
3from typing import Callable
4 
5client = OpenAI()
6 
7# Level 1: Basic tool-use agent with explicit tool definitions
8# The model decides which tool to call and with what arguments
Level 1: Tool-use agent — model decides which tools to call and in what order
9 
10tools = [
11    {
12        "type": "function",
13        "function": {
14            "name": "search_database",
15            "description": "Search the customer database for records matching the query",
16            "parameters": {
17                "type": "object",
18                "properties": {
19                    "query": {"type": "string", "description": "Search query"},
20                    "limit": {"type": "integer", "description": "Max records to return", "default": 10}
21                },
22                "required": ["query"]
23            }
24        }
25    },
26    {
Tool description is critical — "Only call after user confirmation" is a behavioral constraint in the description
27        "type": "function",
28        "function": {
29            "name": "send_email",
30            "description": "Send an email to a customer. Only call after user confirmation.",
31            "parameters": {
32                "type": "object",
33                "properties": {
34                    "to": {"type": "string"},
35                    "subject": {"type": "string"},
36                    "body": {"type": "string"}
37                },
38                "required": ["to", "subject", "body"]
39            }
40        }
41    }
The while loop runs until the model produces a response without tool calls (task complete)
42]
43 
44def run_agent(user_request: str, tool_implementations: dict[str, Callable]) -> str:
45    messages = [{"role": "user", "content": user_request}]
46 
47    while True:
48        response = client.chat.completions.create(
49            model="gpt-4o",
No maximum iteration limit — this agent can loop infinitely if the model never terminates. Add max_turns!
50            messages=messages,
51            tools=tools,
52            tool_choice="auto"
53        )
54 
55        msg = response.choices[0].message
56 
57        if not msg.tool_calls:
58            return msg.content  # Agent is done
59 
60        # Execute each tool call
61        messages.append(msg)
62        for call in msg.tool_calls:
63            tool_name = call.function.name
64            args = json.loads(call.function.arguments)
65            result = tool_implementations[tool_name](**args)
66            messages.append({
67                "role": "tool",
68                "tool_call_id": call.id,
69                "content": json.dumps(result)
70            })

safe_agent.py

1from openai import OpenAI
2from enum import Enum
3from typing import Callable, Any
4import json
5 
6client = OpenAI()
7 
ActionRisk enum: classify each tool by its reversibility — READ_ONLY, REVERSIBLE, IRREVERSIBLE
8class ActionRisk(Enum):
9    READ_ONLY = "read_only"       # Safe to execute automatically
10    REVERSIBLE = "reversible"     # Confirm with user if uncertain
11    IRREVERSIBLE = "irreversible" # Always confirm with user
12 
13class SafeAgent:
14    """
15    Agent with safety controls:
16    - Maximum iteration limit (prevents infinite loops)
17    - Risk-based human confirmation for irreversible actions
18    - Tool error handling with graceful degradation
19    """
20 
max_turns=10: CRITICAL — prevents infinite loops that consume budget and take endless irreversible actions
21    def __init__(self, max_turns: int = 10):
22        self.max_turns = max_turns
23        self.tool_registry: dict[str, tuple[Callable, ActionRisk]] = {}
24        self.turn_count = 0
25 
26    def register_tool(self, name: str, fn: Callable, risk: ActionRisk, schema: dict):
27        self.tool_registry[name] = (fn, risk)
28 
29    def run(self, user_request: str, require_confirmation: Callable | None = None) -> str:
30        messages = [{"role": "user", "content": user_request}]
31        self.turn_count = 0
32 
33        while self.turn_count < self.max_turns:
Turn count check: agent stops and returns failure message when max_turns is reached
34            self.turn_count += 1
35            response = client.chat.completions.create(
36                model="gpt-4o",
37                messages=messages,
38                tools=self._get_tool_schemas(),
39            )
40 
41            msg = response.choices[0].message
42            if not msg.tool_calls:
43                return msg.content  # Task complete
44 
45            messages.append(msg)
46            for call in msg.tool_calls:
47                tool_name = call.function.name
Confirmation gate for irreversible actions: human-in-the-loop before emails are sent, files deleted, etc.
48                args = json.loads(call.function.arguments)
49 
50                fn, risk = self.tool_registry[tool_name]
51 
52                # Require confirmation for irreversible actions
53                if risk == ActionRisk.IRREVERSIBLE and require_confirmation:
54                    confirmed = require_confirmation(
55                        f"Agent wants to call {tool_name} with args: {args}. Approve?"
56                    )
57                    if not confirmed:
58                        result = {"error": "Action cancelled by user", "args": args}
59                    else:
60                        result = self._safe_execute(fn, args)
_safe_execute: wrap all tool calls in try/except — tool failures should not crash the agent loop
61                else:
62                    result = self._safe_execute(fn, args)
63 
64                messages.append({"role": "tool", "tool_call_id": call.id,
65                                "content": json.dumps(result)})
66 
67        return f"Agent reached maximum turns ({self.max_turns}) without completing task."
68 
69    def _safe_execute(self, fn: Callable, args: dict) -> Any:
70        try:
71            return {"success": True, "result": fn(**args)}
72        except Exception as e:
73            return {"success": False, "error": str(e)}

multi_agent.py

1from openai import OpenAI
2from anthropic import Anthropic
3 
4openai_client = OpenAI()
5anthropic_client = Anthropic()
6 
7# Level 3: Multi-agent orchestration
8# Orchestrator decomposes task, delegates to specialized sub-agents
9# Each sub-agent has a specific role and limited tool access
10 
Level 3: Multi-agent orchestration — separate agents for planning, research, analysis, and review
11class ResearchOrchestrator:
12    """
13    Orchestrator pattern: one model coordinates multiple specialized agents.
14    Each sub-agent is sandboxed with only the tools it needs.
Orchestrator pattern: one "thinking" model decomposes tasks, specialized agents execute
15    """
16 
17    def run_research_pipeline(self, research_question: str) -> dict:
18        # Step 1: Orchestrator decomposes the task
19        plan = self._plan_research(research_question)
20 
21        # Step 2: Research agent gathers information (READ-ONLY tools only)
Human approval gate before the pipeline produces any output — irreversible actions require explicit confirmation
22        raw_research = self._research_agent(plan["search_queries"])
23 
24        # Step 3: Analysis agent synthesizes findings (no external tools)
25        analysis = self._analysis_agent(research_question, raw_research)
26 
27        # Step 4: Review agent checks for errors (READ-ONLY)
28        review = self._review_agent(analysis)
29 
30        # Step 5: Human approval before any writing/publishing
31        return {"analysis": analysis, "review": review, "status": "awaiting_approval"}
32 
33    def _plan_research(self, question: str) -> dict:
34        """Orchestrator: decompose task into sub-tasks."""
35        response = openai_client.chat.completions.create(
36            model="gpt-4o",
Research agent: READ-ONLY tools only. Cannot write files, send emails, or take side-effecting actions.
37            messages=[{
38                "role": "user",
39                "content": f"Decompose this research question into 3-5 specific search queries: {question}"
40            }],
41            response_format={"type": "json_object"},
42        )
43        return {"search_queries": response.choices[0].message.content}
44 
45    def _research_agent(self, queries: str) -> list[str]:
46        """Research sub-agent: web search + database lookup (READ ONLY)."""
47        results = []
Analysis agent: no external tools at all. Pure reasoning only — reduces attack surface and failure modes.
48        for query in self._parse_queries(queries):
49            results.append(self._web_search(query))  # Read-only tool only
50        return results
51 
52    def _analysis_agent(self, question: str, research: list) -> str:
53        """Analysis sub-agent: synthesize findings. No external tools — only reasoning."""
54        response = anthropic_client.messages.create(
55            model="claude-3-5-sonnet-20241022",
56            max_tokens=4096,
57            messages=[{
58                "role": "user",
59                "content": f"Question: {question}\n\nResearch: {research}\n\nSynthesize findings."
60            }]
61        )
62        return response.content[0].text

Three Agent Architectures, Three Production Realities

The right agent architecture depends on the task structure, acceptable error rate, and required autonomy level.

Architecture 1: Linear Pipeline (Data Processing) A data engineering team builds an agent to process daily sales reports: extract data → validate → transform → load into warehouse → send summary email. Each step is verifiable and mostly deterministic. The agent runs nightly as a scheduled job. Architecture: a simple linear pipeline with explicit step verification (check row counts match, validate schema) between each step. Human notification (not approval) for failures. Autonomy: high — steps are well-defined and errors are detectable. This is the most reliable agent pattern: deterministic steps, explicit validation, narrow task scope.

Architecture 2: Reactive Agent (Customer Support) A customer support agent handles tier-1 requests: look up order status, process returns, update shipping addresses. Each action is user-initiated and bounded. Architecture: tool-use agent with explicit risk classification (read-only: order lookup; reversible: address update; irreversible: refund processing). Confirmation required before refunds. Max 5 turns per conversation. Human escalation if the agent cannot resolve within 5 turns. The bounded tool set and confirmation gates for high-stakes actions make this reliable in production.

Architecture 3: Autonomous Research (With Human in the Loop) A competitive intelligence team wants an agent to monitor competitor websites, compile weekly reports, and flag significant changes. Architecture: multi-agent pipeline — Collector agent (web scraping, read-only) feeds data to Analyzer agent (pattern detection, no external tools) feeds to Report agent (document generation). Human reviews the generated report before it is distributed. Critical design: the agent never sends the report autonomously — it always goes through human review. The "autonomy" is in data collection and analysis, not in output distribution.

The Traps That Break Agent Systems in Production

Three failure patterns that appear consistently across production agent deployments.

How Principals Think About Agent Architecture

Senior engineers ask: "What tools should our agent have?" Principal engineers ask: "What is the minimum set of capabilities the agent needs to complete this task, and what is the blast radius of each capability?"

The principal insight is that agent reliability is fundamentally about blast radius management. Every tool an agent has access to represents a potential failure mode. The agent with access to `send_email`, `delete_record`, and `make_payment` can cause significantly more damage when it fails than an agent with only `read_database` and `draft_response`. Minimum viable tool access is a security and reliability principle, not just a permission issue.

The frontier in 2025: agent evaluation is the hardest unsolved problem in AI engineering. How do you evaluate an agent that takes 15 actions to complete a task? Current approaches: task completion rate (did it finish?), trajectory evaluation (were intermediate steps reasonable?), and safety evaluation (did it attempt any dangerous actions?). The gap between these evaluations and real-world reliability is significant — many agents score well on curated eval tasks and fail on the long tail of production edge cases.

The five-year arc: agents will become more reliable through better planning models (that decompose tasks more accurately), better error recovery (that recognize when a step has failed and backtrack), and better tool design (that makes side effects explicit and reversible). The engineers who understand the failure modes deeply will build the safety architectures that make agents trustworthy.

How this might come up in interviews

Common questions:

How do you design a safe AI agent for a high-stakes use case?
What is prompt injection and how do you defend against it in an agent?
How do you debug a multi-turn agent that produced a wrong answer?
What is the difference between a tool-use agent and a multi-agent system?
How do you implement human-in-the-loop for an agent workflow?
What are the failure modes specific to AI agents that don't exist in single LLM calls?

Strong answer: First asks "what is the blast radius of each tool" before discussing which tools to give the agentDescribes the confirm-artifact-not-intention principle for human-in-the-loop designHas implemented or designed agent tracing/observability and can explain what each turn log containsUnderstands that agent reliability improvements are about guardrails, not just better models

Red flags: Defines an agent as "an LLM with tool access" without addressing error propagation, blast radius, or human oversightNo mention of maximum turn limits or what happens when an agent gets stuckBelieves prompt injection only affects systems that explicitly allow user input (misses web scraping attack surface)Treats human confirmation as an optional feature rather than a required safeguard for irreversible actions

Scenario · A legal tech startup building a contract review automation tool

Step 1 of 2

You Are the AI Lead. Design the Autonomous Agent Safely.

Legal teams want an AI agent to review contracts, flag issues, research case law, and draft redlines. The agent needs to be autonomous enough to be useful but safe enough to be trustworthy in a legal context where errors have real consequences.

A partner at a law firm asks your agent to "review this contract and send redlines to the opposing counsel." The agent can read the contract, research similar clauses, draft redlines, and send emails. Should you design the agent to complete this task autonomously?

What is the correct level of autonomy for this task?

Quick check · AI Agents in Production

1 / 4

An agent is asked to "clean up the database" and begins deleting records. After 50 deletions, it deletes records it should not have. What architectural safeguard would have prevented this?

Key takeaways

Every agent needs a maximum turn limit — unbounded agents consume unlimited budget and can take unlimited wrong actions
Classify every tool by blast radius: READ_ONLY (auto-execute), REVERSIBLE (confirm if uncertain), IRREVERSIBLE (always confirm)
Human confirmation must show the actual artifact (the email, the SQL query, the changed records), not just the agent's description of its intention
Prompt injection via web content and external APIs is the primary security threat for agents that process external data
Minimum viable tool access is a reliability principle: each additional tool is an additional failure mode
Agent observability (trace logging at each turn) is non-negotiable for debugging and auditing production agents

Before you move on: can you answer these?

What is the difference between a commission error and an omission error in agent design, and which is more dangerous?

Commission error: agent does the wrong thing (sends wrong email, deletes wrong records). Omission error: agent fails to complete the task. Commission errors are more dangerous because they have real-world consequences that may be irreversible. Production agents should bias toward omission: when uncertain, stop and ask, rather than proceed and risk a commission error.

Why is showing users the actual artifact (the email, the SQL query) more important than showing them the agent's description of its intended action?

Agents describe their intentions optimistically. The actual artifact may contain errors, omissions, or off-brand language that the description does not capture. "I want to send a follow-up email" sounds benign — the actual email might contain legally problematic statements. Confirmation of artifacts, not intentions, is what provides real safety.

What is prompt injection in the context of agents, and how do you defend against it?

Prompt injection: malicious content in external sources (web pages, emails, documents) that the LLM interprets as instructions rather than data. Defense: use XML tags or clear delimiters to separate trusted instructions from untrusted external data, explicitly instruct the model that web content is data not instructions, and filter outputs for signs the model followed injected instructions.

From the books

AI Engineering — Chapter 17: Agents — Chip Huyen (2024)

Huyen's key insight on agent reliability: "Agents fail in two ways: commission errors (doing the wrong thing) and omission errors (failing to complete the task). Commission errors are more dangerous because they have real-world consequences. Production agent design should bias heavily toward omission: when in doubt, do nothing and ask for clarification."

Toolformer: Language Models Can Teach Themselves to Use Tools — Schick et al., Meta AI (2023)

Toolformer demonstrated that models can learn to call tools (APIs, calculators, search engines) through self-supervised training rather than explicit instruction following. The implication: as models improve, tool-use quality improves as a byproduct of general capability improvement — but the safety constraints around tool use remain an engineering responsibility, not a model capability.

Constitutional AI: Harmlessness from AI Feedback — Bai et al., Anthropic (2022)

Constitutional AI's relevance to agents: the same principle that guides model behavior (a set of principles the model uses to evaluate and revise its own outputs) can guide agent behavior. Agents can be designed to evaluate each proposed action against a set of safety principles before executing — creating a self-check step between "plan action" and "execute action."

Ready to see how this works in the cloud?

Switch to Career Paths for structured paths (e.g. Developer, DevOps) and provider-specific lessons.

View role-based paths

Discussion

Questions? Discuss in the community or start a thread below.

Join Discord

In-app Q&A