Most production AI agents in 2026 are built from the same nine design patterns. This guide walks through each agentic AI design pattern — when it fits, how it's wired, what it costs in latency and language-model tokens, and how to combine them without ending up with a fragile pile of prompts.
Picking the right agent architecture matters more than picking the right model. We've reviewed 88 AI agents on the leaderboard, and the difference between an agentic system that scores 85+ and one that ships nothing is almost always architectural, not model-driven. The shops that build AI agents successfully pick a small set of these patterns and compose them deliberately; the shops that fail reach for "multi-agent systems" on day one and drown in their own orchestration.
This article is for engineering leads picking an architecture for their agentic AI system, and for buyers trying to understand what's actually under the hood of vendors like Claude Code, Devin, Manus and the others on our list. The patterns below apply whether you're using a single agent or coordinating multiple agents, and whether you're calling a frontier language model or a fine-tuned local one.
The 9 patterns at a glance
| # | Pattern | What it solves | Latency | Token cost | When to reach for it |
|---|---|---|---|---|---|
| 1 | Tool Use | Letting the model act on the world | Low | 1× | Always — it's the substrate |
| 2 | ReAct | Multi-step problems that need feedback | Low–Med | 1–3× | Default for tool-using agents |
| 3 | Reflection | Accuracy beats speed | Med | 2–4× | Code review, writing, eval |
| 4 | Plan-and-Execute | Long horizons, irreversible steps | Med | 2× | Migrations, research reports |
| 5 | Routing | Heterogeneous workloads | Low | 1.1× | Support, classify-then-act |
| 6 | Parallelization | Independent sub-tasks, wall-clock matters | Low | 1× per branch | Fan-out research, ensembling |
| 7 | Orchestrator-Workers | One job, many specialties | Med | 2–4× | Cross-functional reports |
| 8 | Evaluator-Optimizer | Generator + judge | Med | 2–3× | Marketing copy, summaries |
| 9 | Multi-Agent Collaboration | Emergent decomposition | High | 3–10× | Genuinely novel problems |
Costs are rough multipliers vs. a one-shot LLM call. The "always start here" trio is Tool Use → ReAct → Reflection. Everything else is an answer to a specific scaling problem you've already hit in your agentic workflow.
For background, see our explainer on the agentic loop, agentic AI, and the ReAct agent definition.
1. Tool Use (a.k.a. function calling)
What it solves: Lets the model act on the world — read files, hit APIs, run code, query a database — instead of only producing text.
Tool use is the substrate that every other pattern in this list sits on top of. In 2026, every frontier model (Claude, GPT, Gemini, Llama) ships first-class structured tool use, so this isn't a pattern you choose so much as one you can't avoid. The interesting decisions are which tools to expose and how narrowly to scope them.
# Pseudocode shape: every framework looks roughly like this
tools = [
    {"name": "search_docs", "params": {"query": "str"}},
    {"name": "create_ticket", "params": {"title": "str", "body": "str"}},
]
response = model.complete(prompt=user_msg, tools=tools)
if response.tool_calls:
    for call in response.tool_calls:
        result = dispatch(call.name, call.params)
        # feed result back into the next turn
Cost: ~1× a normal completion (a few extra tokens for the tool schema).
Failure modes:
- Tool sprawl. Past ~15 tools the model starts mixing them up. Group related tools behind a "router tool" or split into specialized agents (see pattern #7).
- Loose schemas. A tool with a `query: string` parameter will eat any garbage. Use enums and required fields aggressively.
- Side effects without confirmation. Tools that write (send email, charge a card, delete a row) need an explicit human-in-the-loop step or a dry-run mode; see our human-in-the-loop page.
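The last two failure modes can be guarded against in the dispatch layer. The sketch below is a minimal, framework-free illustration, not any vendor's API: the `TOOLS` registry, the `priority` field, and the `validate_call` and `dispatch` helpers are all hypothetical names chosen for this example.

```python
# Hypothetical tool registry: names, fields and flags are illustrative only.
TOOLS = {
    "search_docs": {
        "required": {"query": str},
        "enums": {},
        "writes": False,
    },
    "create_ticket": {
        "required": {"title": str, "body": str, "priority": str},
        "enums": {"priority": ("low", "medium", "high")},
        "writes": True,  # side effect: must be confirmed before it runs
    },
}


def validate_call(name, params):
    """Return a list of schema violations (an empty list means the call is valid)."""
    spec = TOOLS.get(name)
    if spec is None:
        return [f"unknown tool: {name}"]
    errors = []
    for field, expected_type in spec["required"].items():
        if field not in params:
            errors.append(f"missing required field: {field}")
        elif not isinstance(params[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    for field in params:
        if field not in spec["required"]:
            errors.append(f"unexpected field: {field}")
    for field, allowed in spec["enums"].items():
        if field in params and params[field] not in allowed:
            errors.append(f"invalid value for {field}: {params[field]!r}")
    return errors


def dispatch(name, params, confirm=False):
    """Validate first; tools that write only run when confirm=True."""
    errors = validate_call(name, params)
    if errors:
        return {"status": "rejected", "errors": errors}
    if TOOLS[name]["writes"] and not confirm:
        return {"status": "dry_run", "would_call": name, "params": params}
    return {"status": "executed", "tool": name}  # a real handler would run here
```

Calling `dispatch("create_ticket", {...})` without `confirm=True` returns a dry-run description that a human (or an approval step) can inspect before the write happens. Rejected calls carry their error list, which can be fed back to the model as the tool result.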
This pattern lives behind nearly every coding agent in our code category: Cursor Agent, Claude Code and Codex CLI all expose a small, sharp set of file/shell/git tools and let the loop do the work.
2. ReAct — reason then act, in a loop
What it solves: Multi-step problems where each action's result informs the next reasoning step.
ReAct (Yao et al., 2022) interleaves Thought → Action → Observation turns. The model writes a short thought, picks a tool, sees the result, and decides what to do next. In 2026 most "agent frameworks" — LangGraph, LlamaIndex, Smolagents, the OpenAI Agents SDK — implement ReAct as the default loop.
Thought: I need the user's most recent invoice
Action: call billing.get_latest_invoice(user_id="u_42")
Observation: { "id": "inv_91", "amount": 1240, "status": "unpaid" }
Thought: It's unpaid — I should check why before reminding them
Action: call billing.get_invoice_events(invoice_id="inv_91")
Observation: [{ "event": "payment_failed", "reason": "card_declined" }]
Thought: Card declined — escalate to a payment-recovery flow
Action: call payments.start_recovery(invoice_id="inv_91")
When to use it: Any tool-using agent where the next step depends on what the last step found. That's most agents.
Trade-offs:
- Latency scales roughly linearly with loop depth: a 10-step ReAct run takes about 10× as long as a single completion, because each step is a separate, sequential LLM call.
- Easy to debug because each loop iteration is a discrete LLM call you can log and replay.
- Vulnerable to "thought drift" — the model can talk itself out of a correct plan over many turns. Cap loop depth (8–12 is typical) and add a forced-summary step before the cap.
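A loop-depth cap with a forced-summary step might look like the sketch below. It assumes a hypothetical `model_step` callable that returns either a final answer or a thought plus a tool call; the scripted stand-in replaces a real model so the loop can be run as-is, and the tool names are invented for the example.

```python
def run_react(model_step, tools, task, max_steps=8):
    """Run Thought -> Action -> Observation turns until the model finishes.

    model_step(task, history, force_final) is a hypothetical callable that
    returns either {"final": str} or {"thought": str, "tool": str, "args": dict}.
    """
    history = []
    for _ in range(max_steps):
        step = model_step(task, history, force_final=False)
        if "final" in step:
            return step["final"], history
        observation = tools[step["tool"]](**step["args"])
        history.append((step["thought"], step["tool"], observation))
    # Cap reached: force a summary instead of looping further.
    step = model_step(task, history, force_final=True)
    return step["final"], history


# Scripted stand-in for a model, so the loop runs without an API key.
def scripted_model(task, history, force_final):
    if force_final:
        return {"final": f"Stopped after {len(history)} steps; partial findings only."}
    if len(history) == 0:
        return {"thought": "Fetch the invoice", "tool": "get_invoice",
                "args": {"user_id": "u_42"}}
    if len(history) == 1:
        return {"thought": "Check why it is unpaid", "tool": "get_events",
                "args": {"invoice_id": "inv_91"}}
    return {"final": "Card declined; start payment recovery."}


# Hypothetical tools returning canned data.
TOOLS = {
    "get_invoice": lambda user_id: {"id": "inv_91", "status": "unpaid"},
    "get_events": lambda invoice_id: [{"event": "payment_failed", "reason": "card_declined"}],
}

answer, trace = run_react(scripted_model, TOOLS, "Why is u_42's invoice unpaid?")
capped_answer, capped_trace = run_react(scripted_model, TOOLS, "same task", max_steps=1)
```

The `trace` list is what makes the pattern easy to debug: each entry is one loop iteration that can be logged and replayed. The second call shows the cap in action, where the loop stops after one step and asks for a summary of what it has so far.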
3. Reflection — let the agent critique itself
What it solves: Cases where accuracy matters more than wall-clock latency.
The agent produces a draft answer, then a critic prompt (often the same model with a different system prompt) reviews it for errors, gaps or violations of constraints. The original agent then revises. Two or three rounds is usually enough; more than that hits diminishing returns.
def solve_with_reflection(task, rubric, max_rounds=2):
    draft = agent.solve(task)
    for _ in range(max_rounds):
        critique = critic.review(draft, rubric=rubric)
        if critique.passes:
            break
        draft = agent.revise(draft, critique)
    return draft
Where it earns its keep:
- Code generation. Cursor and Claude Code use a form of reflection when they run tests after each edit — the test failure is the "critique." See our Cursor review and Claude Code review.
- Writing tasks. Marketing copy, support replies, summarization — anywhere there's a quality rubric.
- Eval pipelines. A judge model evaluating another model's output is reflection with the loop opened up. See our agent evaluation guide.
Cost: 2–4× a base completion, depending on how many revision rounds you allow.
Failure mode: The critic and the agent share blind spots when they run on the same base model. For high-stakes critique (security review, fact-checking), use a different model family as the critic.
4. Plan-and-Execute
What it solves: Long-horizon tasks with many irreversible or expensive steps, where you want to commit to a plan upfront rather than discovering it turn-by-turn.
A planner agent decomposes the task into an ordered list of subtasks. An executor agent (or a worker pool) carries each one out. The planner can re-plan if a step fails.
Plan:
1. Inventory all S3 buckets in account 41123
2. For each bucket, fetch lifecycle policy
3. Identify buckets with no policy and > 1TB
4. Draft remediation tickets in Jira
5. Post summary in #infra-cost
When to reach for it: Multi-hour autonomous work. Devin and Manus are the canonical examples — see Devin vs Cursor for how Devin's planner phase distinguishes it from a pure ReAct coding loop.
Trade-offs vs. ReAct:
- Better for tasks where a wrong early step is expensive (database migration, infrastructure changes).
- Worse when the task is genuinely exploratory — a plan made before you see the data is often wrong.
- Easier to checkpoint and resume because the plan is an explicit artifact.
Cost: ~2× a comparable ReAct run; the upfront planning call is the overhead.
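Because the plan is an explicit artifact, checkpointing can be as simple as serializing which steps are done. The sketch below assumes a hypothetical `run_step` executor and a JSON checkpoint format invented for this example; a real system would also hand the failure record back to the planner for re-planning.

```python
import json


def execute_plan(plan, run_step, checkpoint=None):
    """Run plan steps in order, skipping those already recorded as done.

    plan is a list of step descriptions; run_step(step) is a hypothetical
    executor that returns a result or raises on failure.
    """
    state = json.loads(checkpoint) if checkpoint else {"done": {}, "failed": None}
    state["failed"] = None
    for index, step in enumerate(plan):
        key = str(index)  # JSON object keys are strings
        if key in state["done"]:
            continue
        try:
            state["done"][key] = run_step(step)
        except Exception as exc:
            # Stop at the first failure so a planner (or a human) can re-plan.
            state["failed"] = {"index": index, "step": step, "error": str(exc)}
            break
    return json.dumps(state)


PLAN = [
    "Inventory all S3 buckets",
    "Fetch lifecycle policy for each bucket",
    "Draft remediation tickets",
]

calls = []


def flaky_step(step):
    # Stand-in executor: the "Fetch" step fails the first time it is tried.
    calls.append(step)
    if step.startswith("Fetch") and calls.count(step) == 1:
        raise RuntimeError("rate limited")
    return f"ok: {step}"


first = execute_plan(PLAN, flaky_step)                     # fails on the second step
second = execute_plan(PLAN, flaky_step, checkpoint=first)  # resumes from the second step
```

On resume, the first step is not re-run. That is the property that matters when steps are expensive or irreversible.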
5. Routing — classify, then dispatch
What it solves: Heterogeneous incoming work where different requests need different handling.
A lightweight classifier (sometimes a small model, sometimes a regex layer, sometimes the main model with a constrained-output prompt) tags the request and routes it to the right downstream agent or tool chain.
incoming → classify(intent) → {
    "refund": refund_agent,
    "tech_issue": triage_agent,
    "sales": lead_router,
    "_default": general_support,
}
Where it shows up:
- Customer support — see AI customer service agent.
- Sales inbound routing.
- Email triage (different inboxes, different prompts) — see how to automate inbox with AI.
- Frontier-model "model routing" — cheap model for easy queries, frontier model for hard ones. This is now common enough that Anthropic, OpenAI and Google all ship official routing helpers.
Cost: Adds ~10–20% on top of the downstream call (one extra classification turn) but typically saves total cost by sending easy work to cheaper specialists.
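The regex-layer variant of the classifier can be sketched in a few lines. The patterns and handler functions below are hypothetical placeholders; in practice the `classify` step is often a small model with a constrained-output prompt, and the handlers are full agents.

```python
import re

# Hypothetical intent patterns; a real system might use a small model instead.
INTENT_PATTERNS = [
    ("refund", re.compile(r"\b(refund|money back|chargeback)\b", re.I)),
    ("tech_issue", re.compile(r"\b(error|crash|bug|not working)\b", re.I)),
    ("sales", re.compile(r"\b(pricing|quote|demo|upgrade)\b", re.I)),
]

# Stand-ins for downstream agents: each just labels the message it received.
HANDLERS = {
    "refund": lambda msg: f"refund_agent <- {msg}",
    "tech_issue": lambda msg: f"triage_agent <- {msg}",
    "sales": lambda msg: f"lead_router <- {msg}",
    "_default": lambda msg: f"general_support <- {msg}",
}


def classify(message):
    """Return the first matching intent, or "_default" when nothing matches."""
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(message):
            return intent
    return "_default"


def route(message):
    intent = classify(message)
    handler = HANDLERS.get(intent, HANDLERS["_default"])
    return intent, handler(message)
```

The `_default` branch matters more than it looks: a router without a fallback turns every misclassification into a dropped request.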
6. Parallelization — fan-out, fan-in
What it solves: Independent sub-tasks where wall-clock latency matters and the sub-tasks don't depend on each other.
Two flavors:
Sectioning — split a task into independent pieces, run each in parallel, merge results.
Voting — run the same task N times with the same or different prompts and ensemble the results (majority vote, judge picks the best, average a numeric answer).
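import asyncio

# agent, split_doc and majority_vote are placeholders for your own code

# Sectioning: split, process the chunks concurrently, merge
async def summarize_document(document):
    chunks = split_doc(document)
    summaries = await asyncio.gather(*[agent.summarize(c) for c in chunks])
    return agent.combine(summaries)

# Voting: run the same task n times, keep the majority answer
async def solve_by_vote(task, n=5):
    attempts = await asyncio.gather(*[agent.solve(task) for _ in range(n)])
    return majority_vote(attempts)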
When to use it: Document analysis, research-with-sources, anywhere you can shard and merge. Voting is also a cheap accuracy lift on benchmarks like SWE-bench — running a coding agent N times and picking the best run boosts pass-rate at the cost of N× tokens.
Cost: N× tokens but ~1× wall-clock, which is exactly the trade-off you usually want.
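To make the voting flavor concrete, here is a runnable sketch with the model call replaced by a stub. `fake_solve` and its scripted answers are assumptions for illustration; only `asyncio.gather` and `collections.Counter` are real library calls.

```python
import asyncio
import time
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


async def fake_solve(task, answer, delay=0.05):
    # Stand-in for a model call: sleeps to simulate network latency.
    await asyncio.sleep(delay)
    return answer


async def vote(task, scripted_answers):
    # All attempts run concurrently, so wall-clock time is about one call.
    attempts = await asyncio.gather(*[fake_solve(task, a) for a in scripted_answers])
    return majority_vote(attempts)


start = time.perf_counter()
winner = asyncio.run(vote("2 + 2", ["4", "4", "5", "4", "3"]))
elapsed = time.perf_counter() - start  # close to one delay, not five
```

Five attempts cost five calls' worth of tokens, but `elapsed` stays near the latency of a single call because the attempts overlap.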
7. Orchestrator-Workers
What it solves: Jobs that benefit from specialized agents under a central planner.
The orchestrator decides what's needed, spawns worker agents with different system prompts/tools, collects their outputs, and synthesizes a final answer. It's Plan-and-Execute with specialized workers instead of a single executor.
Orchestrator
├─ Research worker (web tools, citation discipline)
├─ Code worker (repo access, test runner)
├─ Data worker (SQL, pandas)
└─ Writer worker (long-form synthesis)
Examples in the wild:
- Deep-research products: Perplexity Labs, Gemini Deep Research, Elicit. See Gemini Deep Research vs ChatGPT and our research-stack-for-solo-operators writeup.
- Multi-step SDR agents: separate prospect researcher, message drafter, sender. See best AI SDR tools.
Cost: 2–4× a single-agent solve, with the upside that each worker can use a smaller/cheaper model tuned to its job.
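A stripped-down version of the wiring might look like the following. Everything here is a stand-in: the `Worker` dataclass, the `model` tier labels, and the fixed decomposition in `plan_subtasks` are assumptions for illustration, where a real orchestrator would make a planning call to decide the subtasks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Worker:
    name: str
    model: str                   # hypothetical model tier label
    run: Callable[[str], str]    # stand-in for an agent with its own prompt and tools


WORKERS = {
    "research": Worker("research", "small", lambda q: f"[sources for: {q}]"),
    "data": Worker("data", "small", lambda q: f"[query results for: {q}]"),
    "writer": Worker("writer", "large", lambda q: f"REPORT\n{q}"),
}


def plan_subtasks(job):
    # Stand-in for the orchestrator's planning call: a fixed decomposition.
    return [("research", f"background on {job}"), ("data", f"metrics for {job}")]


def orchestrate(job):
    outputs = []
    for worker_name, subtask in plan_subtasks(job):
        outputs.append(WORKERS[worker_name].run(subtask))
    # Synthesis is itself delegated to a specialized worker.
    return WORKERS["writer"].run("\n".join(outputs))


report = orchestrate("Q3 churn")
```

Only the writer is assigned the larger tier in this sketch, which is the cost lever the pattern offers: narrow workers can often run on cheaper models.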
8. Evaluator-Optimizer
What it solves: Tasks with a clear rubric where you can use a judge model to grade and improve drafts.
Same shape as Reflection but with explicit roles: the optimizer generates, the evaluator scores against a rubric, and the loop continues until the score crosses a threshold or hits a max-iterations cap.
Where it works:
- Ad copy and email subject lines (rubric: open-rate proxies).
- Code-style refactors (rubric: passes linters + tests + style guide).
- Search-result reranking (rubric: relevance to query).
Where it fails: When the rubric is fuzzy or learned from too few examples, the evaluator becomes a bottleneck. If your rubric can't be written down in 10–15 bullet points, this pattern isn't the right fit.
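A rubric that can be written down can usually be written as code. The sketch below scores email subject lines against a four-item rubric and loops until a threshold or an iteration cap. The rubric items, the product name "Acme", and the scripted drafts are invented for the example; in a real system both `generate` and `evaluate` would typically be model calls.

```python
def optimize(generate, evaluate, threshold=0.8, max_iterations=4):
    """Regenerate until the rubric score reaches threshold or the cap is hit."""
    draft, feedback, history = None, None, []
    for _ in range(max_iterations):
        draft = generate(draft, feedback)
        score, feedback = evaluate(draft)
        history.append((draft, score))
        if score >= threshold:
            break
    # Return the best-scoring draft seen, not necessarily the last one.
    return max(history, key=lambda item: item[1])


# Hypothetical rubric for an email subject line.
RUBRIC = {
    "under 50 characters": lambda s: len(s) < 50,
    "mentions the product": lambda s: "Acme" in s,
    "no exclamation marks": lambda s: "!" not in s,
    "no all-caps words": lambda s: not any(w.isupper() and len(w) > 1 for w in s.split()),
}


def evaluate(subject):
    """Score is the fraction of rubric items passed; feedback lists the failures."""
    failed = [name for name, check in RUBRIC.items() if not check(subject)]
    return 1 - len(failed) / len(RUBRIC), failed


# Scripted stand-in for the optimizer model: three successive drafts.
DRAFTS = iter([
    "HUGE news!!! You will not believe what we just shipped today",
    "Big news from Acme!",
    "Acme now syncs your invoices automatically",
])


def scripted_generate(previous, feedback):
    return next(DRAFTS)


best, best_score = optimize(scripted_generate, evaluate, threshold=1.0)
```

The difference from plain Reflection is visible in the return value of `evaluate`: a numeric score drives the stopping rule, and the list of failed rubric items is the feedback.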
9. Multi-Agent Collaboration
What it solves: Genuinely novel, decomposable problems where the right division of labor emerges from the agents themselves.
Peer agents converse — debate, critique, propose, vote. There's no central orchestrator deciding who does what. Frameworks like AutoGen, CrewAI and LangGraph all support this style.
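As a toy illustration of the peer style, the sketch below has agents answer, see a shared snapshot of each other's answers, and revise until they agree or a round cap is hit. The `stubborn` and `follower` agents are scripted stand-ins for model-backed peers, and the consensus rule is an assumption of this example rather than how any particular framework works.

```python
from collections import Counter


def debate(agents, question, max_rounds=3):
    """Peers answer, see each other's answers, and may revise until they agree."""
    answers = {name: agent(question, {}) for name, agent in agents.items()}
    rounds = 0
    while rounds < max_rounds and len(set(answers.values())) > 1:
        snapshot = dict(answers)  # every agent sees the same snapshot
        answers = {name: agent(question, snapshot) for name, agent in agents.items()}
        rounds += 1
    # No orchestrator decides: the final answer is a vote among peers.
    winner = Counter(answers.values()).most_common(1)[0][0]
    return winner, rounds


def stubborn(answer):
    # Scripted peer that never changes its mind.
    return lambda question, peers: answer


def follower(initial):
    # Scripted peer that adopts the majority view once it sees the others.
    def agent(question, peers):
        if not peers:
            return initial
        return Counter(peers.values()).most_common(1)[0][0]
    return agent


AGENTS = {"a": stubborn("cache"), "b": stubborn("cache"), "c": follower("index")}
winner, rounds = debate(AGENTS, "Why is the query slow?")
```

Even in this toy, every round costs one call per agent, so three peers and one revision round already means six calls.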
Honest take: This pattern is over-prescribed. Most problems people throw at peer multi-agent setups are solved faster, cheaper and more reliably with a single ReAct loop or an Orchestrator-Workers setup. Reach for multi-agent collaboration only when:
- The task is genuinely open-ended (research at the edge of a field, exploratory simulation).
- You've already tried simpler patterns and they hit a context-window or specialization ceiling.
- You can afford the 3–10× token cost and accept that debugging gets harder.
See autonomous vs copilot agents for the broader autonomy spectrum this pattern lives on.
Combining patterns — what real-world agentic AI systems look like
A production agentic system almost never picks one pattern. The typical shape:
Routing
└─ Plan-and-Execute
   ├─ Tool Use
   └─ ReAct workers
      └─ Reflection on critical steps
Three concrete examples from the agents we've reviewed:
Cursor / Claude Code (coding agents): Tool Use + ReAct + Reflection (tests as critic). No multi-agent. Cheap, fast, hard to beat for code edits inside a known repo. See Cursor vs Claude Code for the head-to-head.
Devin (autonomous engineer): Plan-and-Execute on top of Tool Use + ReAct, with a checkpointing layer that lets it pause for human review. The Plan-and-Execute layer is what makes it different from Cursor — and what makes it more expensive. See Devin review.
Perplexity Labs / Gemini Deep Research: Orchestrator-Workers + Parallelization. A planner decomposes the research question, fan-out workers fetch and read sources in parallel, a synthesis pass merges into a cited report.
How to pick: a decision flow
1. Does the task involve external action? → You need Tool Use. (Almost certainly yes.)
2. Are there multiple steps where each depends on the last? → Add ReAct.
3. Is the task error-sensitive (code, money, irreversible writes)? → Add Reflection or Evaluator-Optimizer.
4. Are steps expensive/irreversible and the plan known upfront? → Switch the executor to Plan-and-Execute.
5. Is the incoming work heterogeneous? → Wrap it in Routing.
6. Can sub-tasks run independently? → Add Parallelization: sectioning for speed, voting for accuracy.
7. Does the work split cleanly into specialties? → Move to Orchestrator-Workers.
8. Is the problem genuinely novel and emergent? → Only now consider Multi-Agent Collaboration.
If you can answer "yes" to step 8 honestly without having tried 1–7, it's worth double-checking. Most product problems live in the 1–4 range.
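The checklist is mechanical enough to encode directly, which can be handy in a design review. This is a sketch only: the `TaskProfile` field names are our own shorthand for the eight questions, not a standard vocabulary.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # One flag per question in the decision flow, in the same order.
    external_action: bool = True
    dependent_steps: bool = False
    error_sensitive: bool = False
    irreversible_with_known_plan: bool = False
    heterogeneous_input: bool = False
    independent_subtasks: bool = False
    clean_specialties: bool = False
    novel_and_emergent: bool = False


def pick_patterns(task):
    """Walk the checklist in order and collect every pattern that applies."""
    checks = [
        (task.external_action, "Tool Use"),
        (task.dependent_steps, "ReAct"),
        (task.error_sensitive, "Reflection or Evaluator-Optimizer"),
        (task.irreversible_with_known_plan, "Plan-and-Execute"),
        (task.heterogeneous_input, "Routing"),
        (task.independent_subtasks, "Parallelization"),
        (task.clean_specialties, "Orchestrator-Workers"),
        (task.novel_and_emergent, "Multi-Agent Collaboration"),
    ]
    return [pattern for applies, pattern in checks if applies]


coding_agent = pick_patterns(TaskProfile(dependent_steps=True, error_sensitive=True))
```

For a typical in-repo coding agent, the result is the Tool Use, ReAct and Reflection stack, matching the Cursor / Claude Code example earlier in the article.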
What this means for buyers
If you're picking an agent off the leaderboard, the patterns above are the lens to read vendor claims through:
- Vendors selling "autonomous" agents are usually pitching Plan-and-Execute + heavy Reflection. Ask how the agent recovers when its plan goes wrong on step 7 of 12.
- Vendors selling "multi-agent" are usually pitching Orchestrator-Workers, not true peer collaboration. Ask whether the workers share memory, who owns conflict resolution, and what the cost multiplier is vs. a single agent.
- Vendors selling "agentic AI" without specifying a pattern are pitching marketing copy. Ask for an architecture diagram.
For the broader business framing, see agentic AI vs generative AI, AI agent vs LLM and our methodology page for how these architectural choices feed into our Agent Rank score.