
Agentic AI Design Patterns 2026: The 9 AI Agent Patterns You Need

ReAct, Reflection, Plan-and-Execute, Routing, Orchestrator-Workers and 4 more agentic AI design patterns — what each solves, code shape, latency and token-cost trade-offs, and how to combine them in production.

Eyal Shlomo | Published May 23, 2026

Most production AI agents in 2026 are built from the same nine design patterns. This guide walks through each agentic AI design pattern — when it fits, how it's wired, what it costs in latency and language-model tokens, and how to combine them without ending up with a fragile pile of prompts.

Picking the right agent architecture matters more than picking the right model. We've reviewed 88 AI agents on the leaderboard and the difference between an agentic system that scores 85+ and one that ships nothing is almost always architectural, not model-driven. The shops that build AI agents successfully pick a small set of these patterns and compose them deliberately; the shops that fail tend to reach for "multi-agent systems" on day one and drown in their own orchestration.

This article is for engineering leads picking an architecture for their agentic AI system, and for buyers trying to understand what's actually under the hood of vendors like Claude Code, Devin, Manus and the others on our list. The patterns below apply whether you're using a single agent or coordinating multiple agents, and whether you're calling a frontier language model or a fine-tuned local one.

The 9 patterns at a glance

| # | Pattern | What it solves | Latency | Token cost | When to reach for it |
|---|---------|----------------|---------|------------|----------------------|
| 1 | Tool Use | Letting the model act on the world | Low | | Always; it's the substrate |
| 2 | ReAct | Multi-step problems that need feedback | Low–Med | 1–3× | Default for tool-using agents |
| 3 | Reflection | Accuracy beats speed | Med | 2–4× | Code review, writing, eval |
| 4 | Plan-and-Execute | Long horizons, irreversible steps | Med | | Migrations, research reports |
| 5 | Routing | Heterogeneous workloads | Low | 1.1× | Support, classify-then-act |
| 6 | Parallelization | Independent sub-tasks, wall-clock matters | Low | 1× per branch | Fan-out research, ensembling |
| 7 | Orchestrator-Workers | One job, many specialties | Med | 2–4× | Cross-functional reports |
| 8 | Evaluator-Optimizer | Generator + judge | Med | 2–3× | Marketing copy, summaries |
| 9 | Multi-Agent Collaboration | Emergent decomposition | High | 3–10× | Genuinely novel problems |

Costs are rough multipliers vs. a one-shot LLM call. The "always start here" trio is Tool Use → ReAct → Reflection. Everything else is an answer to a specific scaling problem you've already hit in your agentic workflow.

For background, see our explainer on the agentic loop, agentic AI, and the ReAct agent definition.

1. Tool Use (a.k.a. function calling)

What it solves: Lets the model act on the world — read files, hit APIs, run code, query a database — instead of only producing text.

Tool use is the substrate that every other pattern in this list sits on top of. In 2026, every frontier model (Claude, GPT, Gemini, Llama) ships first-class structured tool use, so this isn't a pattern you choose so much as one you can't avoid. The interesting decisions are which tools to expose and how narrowly to scope them.

# Pseudocode shape — every framework looks roughly like this
tools = [
    {"name": "search_docs", "params": {"query": "str"}},
    {"name": "create_ticket", "params": {"title": "str", "body": "str"}},
]
response = model.complete(prompt=user_msg, tools=tools)
if response.tool_calls:
    for call in response.tool_calls:
        result = dispatch(call.name, call.params)
        # feed result back into the next turn

Cost: ~1× a normal completion (a few extra tokens for the tool schema).

Failure modes:

  • Tool sprawl. Past ~15 tools the model starts mixing them up. Group related tools behind a "router tool" or split into specialized agents (see pattern #7).
  • Loose schemas. A tool with a query: string parameter will eat any garbage. Use enums and required fields aggressively.
  • Side effects without confirmation. Tools that write (send email, charge a card, delete a row) need an explicit human-in-the-loop step or a dry-run mode — see our human-in-the-loop page.
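The last two failure modes can be guarded against in the dispatch layer itself. The following is a minimal sketch, not any framework's API: the `Tool` dataclass, the `REGISTRY` dict, the `create_ticket` stub and the `confirmed` flag are all hypothetical names chosen for illustration. It rejects calls with missing or out-of-enum parameters and returns a dry-run preview for writing tools until a human has confirmed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    handler: Callable[..., Any]
    required: tuple[str, ...]          # parameters that must be present
    enums: dict[str, tuple[str, ...]]  # allowed values for enum-typed parameters
    writes: bool = False               # True if the tool has side effects


def create_ticket(title: str, priority: str) -> dict:
    # Stand-in for a real ticketing API call (hypothetical).
    return {"status": "created", "title": title, "priority": priority}


REGISTRY = {
    "create_ticket": Tool(
        name="create_ticket",
        handler=create_ticket,
        required=("title", "priority"),
        enums={"priority": ("low", "medium", "high")},
        writes=True,
    ),
}


def dispatch(name: str, params: dict, confirmed: bool = False) -> dict:
    """Validate a tool call, then run it or return a dry-run preview."""
    tool = REGISTRY.get(name)
    if tool is None:
        return {"error": f"unknown tool: {name}"}
    missing = [p for p in tool.required if p not in params]
    if missing:
        return {"error": f"missing required params: {missing}"}
    for param, allowed in tool.enums.items():
        if param in params and params[param] not in allowed:
            return {"error": f"{param} must be one of {list(allowed)}"}
    if tool.writes and not confirmed:
        # Side-effecting tools only preview until a human confirms.
        return {"dry_run": True, "tool": name, "params": params}
    return tool.handler(**params)
```

Returning the validation error as a tool result, rather than raising, lets the model see what went wrong and retry with corrected arguments on the next turn.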

This pattern lives behind nearly every coding agent in our code category: Cursor Agent, Claude Code and Codex CLI all expose a small, sharp set of file/shell/git tools and let the loop do the work.

2. ReAct — reason then act, in a loop

What it solves: Multi-step problems where each action's result informs the next reasoning step.

ReAct (Yao et al., 2022) interleaves Thought → Action → Observation turns. The model writes a short thought, picks a tool, sees the result, and decides what to do next. In 2026 most "agent frameworks" — LangGraph, LlamaIndex, Smolagents, the OpenAI Agents SDK — implement ReAct as the default loop.

Thought: I need the user's most recent invoice
Action:  call billing.get_latest_invoice(user_id="u_42")
Observation: { "id": "inv_91", "amount": 1240, "status": "unpaid" }
Thought: It's unpaid — I should check why before reminding them
Action:  call billing.get_invoice_events(invoice_id="inv_91")
Observation: [{ "event": "payment_failed", "reason": "card_declined" }]
Thought: Card declined — escalate to a payment-recovery flow
Action:  call payments.start_recovery(invoice_id="inv_91")

When to use it: Any tool-using agent where the next step depends on what the last step found. That's most agents.

Trade-offs:

  • Latency scales linearly with loop depth: a 10-step ReAct run takes roughly 10× the latency of a single completion.
  • Easy to debug because each loop iteration is a discrete LLM call you can log and replay.
  • Vulnerable to "thought drift" — the model can talk itself out of a correct plan over many turns. Cap loop depth (8–12 is typical) and add a forced-summary step before the cap.
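A loop with a hard step cap might look roughly like the sketch below. Everything here is an assumption for illustration: `scripted_model` stands in for a real LLM call and simply replays the invoice example, and the `TOOLS` lambdas stand in for a billing API. A production version would ask the model for a summary when the cap is hit; this sketch just reports the last transcript entry.

```python
from typing import Callable

# Hypothetical tools keyed by name; real ones would call a billing API.
TOOLS: dict[str, Callable[[str], str]] = {
    "get_latest_invoice": lambda user_id: "inv_91 unpaid",
    "get_invoice_events": lambda invoice_id: "payment_failed: card_declined",
}


def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next step from the transcript so far."""
    observations = [h for h in history if h.startswith("Observation:")]
    if len(observations) == 0:
        return {"thought": "Need the latest invoice",
                "action": "get_latest_invoice", "arg": "u_42"}
    if len(observations) == 1:
        return {"thought": "Unpaid, check why",
                "action": "get_invoice_events", "arg": "inv_91"}
    return {"thought": "Card declined",
            "final": "Escalate inv_91 to payment recovery"}


def react(task: str, model: Callable[[list[str]], dict], max_steps: int = 8) -> str:
    """Run Thought -> Action -> Observation turns until a final answer or the cap."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = model(history)
        history.append(f"Thought: {step['thought']}")
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["action"]](step["arg"])
        history.append(f"Action: {step['action']}({step['arg']})")
        history.append(f"Observation: {observation}")
    # Cap reached: stop looping and report where the run ended.
    return f"Step cap reached after {max_steps} steps; last entry: {history[-1]}"
```

Because `history` is a plain list of strings, each iteration can be logged and replayed, which is the debuggability benefit noted above.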

3. Reflection — let the agent critique itself

What it solves: Cases where accuracy matters more than wall-clock latency.

The agent produces a draft answer, then a critic prompt (often the same model with a different system prompt) reviews it for errors, gaps or violations of constraints. The original agent then revises. Two or three rounds is usually enough; more than that hits diminishing returns.

draft = agent.solve(task)
for _ in range(2):
    critique = critic.review(draft, rubric=rubric)
    if critique.passes:
        break
    draft = agent.revise(draft, critique)
return draft

Where it earns its keep:

  • Code generation. Cursor and Claude Code use a form of reflection when they run tests after each edit — the test failure is the "critique." See our Cursor review and Claude Code review.
  • Writing tasks. Marketing copy, support replies, summarization — anywhere there's a quality rubric.
  • Eval pipelines. A judge model evaluating another model's output is reflection with the loop opened up. See our agent evaluation guide.

Cost: 2–4× a base completion, depending on how many revision rounds you allow.

Failure mode: The critic and the agent share the same blind spots when they share the same base model. For high-stakes critique (security review, fact-checking), use a different model family as the critic.

4. Plan-and-Execute

What it solves: Long-horizon tasks with many irreversible or expensive steps, where you want to commit to a plan upfront rather than discovering it turn-by-turn.

A planner agent decomposes the task into an ordered list of subtasks. An executor agent (or a worker pool) carries each one out. The planner can re-plan if a step fails.

Plan:
 1. Inventory all S3 buckets in account 41123
 2. For each bucket, fetch lifecycle policy
 3. Identify buckets with no policy and > 1TB
 4. Draft remediation tickets in Jira
 5. Post summary in #infra-cost

When to reach for it: Multi-hour autonomous work. Devin and Manus are the canonical examples — see Devin vs Cursor for how Devin's planner phase distinguishes it from a pure ReAct coding loop.

Trade-offs vs. ReAct:

  • Better for tasks where a wrong early step is expensive (database migration, infrastructure changes).
  • Worse when the task is genuinely exploratory — a plan made before you see the data is often wrong.
  • Easier to checkpoint and resume because the plan is an explicit artifact.

Cost: ~2× a comparable ReAct run; the upfront planning call is the overhead.
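To make the "explicit artifact" point concrete, here is a small sketch of an executor that walks a plan, re-plans the remainder when a step fails, and keeps its state in a JSON-serializable dict. The function names, the step strings and the `max_replans` limit are hypothetical; in a real system `execute` and `replan` would be agent calls rather than the stubs shown.

```python
import json
from typing import Callable


def run_plan(
    steps: list[str],
    execute: Callable[[str], str],
    replan: Callable[[list[str], str], list[str]],
    max_replans: int = 2,
) -> dict:
    """Run steps in order; on failure, ask the planner to rewrite what remains."""
    state = {"done": [], "remaining": list(steps), "replans": 0}
    while state["remaining"]:
        step = state["remaining"][0]
        try:
            result = execute(step)
        except RuntimeError as err:
            if state["replans"] >= max_replans:
                state["failed"] = f"{step}: {err}"
                break
            state["replans"] += 1
            state["remaining"] = replan(state["remaining"], str(err))
            continue
        state["done"].append({"step": step, "result": result})
        state["remaining"].pop(0)
    return state


def demo_execute(step: str) -> str:
    # Stub executor: one step fails to exercise the re-plan path.
    if step == "fetch lifecycle policy via v1 API":
        raise RuntimeError("v1 API retired")
    return f"ok: {step}"


def demo_replan(remaining: list[str], error: str) -> list[str]:
    # A real planner would be an LLM call that sees the error; here we swap one step.
    return ["fetch lifecycle policy via v2 API"] + remaining[1:]


state = run_plan(
    ["inventory buckets", "fetch lifecycle policy via v1 API", "draft tickets"],
    demo_execute,
    demo_replan,
)
checkpoint = json.dumps(state)  # plan state is plain data, so it can be persisted and resumed
```

Resuming is then a matter of loading the checkpoint and calling `run_plan` again with the saved `remaining` list.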

5. Routing — classify, then dispatch

What it solves: Heterogeneous incoming work where different requests need different handling.

A lightweight classifier (sometimes a small model, sometimes a regex layer, sometimes the main model with a constrained-output prompt) tags the request and routes it to the right downstream agent or tool chain.

incoming → classify(intent) → {
   "refund":     refund_agent,
   "tech_issue": triage_agent,
   "sales":      lead_router,
   "_default":   general_support,
}

Where it shows up:

  • Customer support — see AI customer service agent.
  • Sales inbound routing.
  • Email triage (different inboxes, different prompts) — see how to automate inbox with AI.
  • Frontier-model "model routing" — cheap model for easy queries, frontier model for hard ones. This is now common enough that Anthropic, OpenAI and Google all ship official routing helpers.

Cost: Adds ~10–20% on top of the downstream call (one extra classification turn) but typically saves total cost by sending easy work to cheaper specialists.
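The dispatch table above can be sketched as runnable Python using the regex-layer flavor of classifier. The handler functions and the keyword rules are hypothetical placeholders; a small model with a constrained-output prompt could replace `classify` without changing `route`.

```python
import re
from typing import Callable


# Hypothetical downstream handlers; each would wrap its own agent or tool chain.
def refund_agent(msg: str) -> str:
    return "refund flow"


def triage_agent(msg: str) -> str:
    return "triage flow"


def lead_router(msg: str) -> str:
    return "sales flow"


def general_support(msg: str) -> str:
    return "general support"


ROUTES: dict[str, Callable[[str], str]] = {
    "refund": refund_agent,
    "tech_issue": triage_agent,
    "sales": lead_router,
    "_default": general_support,
}

# Cheap first-pass classifier: the first matching rule wins.
RULES = [
    (re.compile(r"\b(refund|money back|chargeback)\b", re.I), "refund"),
    (re.compile(r"\b(error|crash|bug|not working)\b", re.I), "tech_issue"),
    (re.compile(r"\b(pricing|quote|demo)\b", re.I), "sales"),
]


def classify(msg: str) -> str:
    for pattern, intent in RULES:
        if pattern.search(msg):
            return intent
    return "_default"


def route(msg: str) -> str:
    # Unknown intents fall back to the default handler instead of raising.
    handler = ROUTES.get(classify(msg), ROUTES["_default"])
    return handler(msg)
```

The explicit `_default` route matters: a classifier will eventually emit a label nobody planned for, and a fallback keeps that from becoming an outage.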

6. Parallelization — fan-out, fan-in

What it solves: Independent sub-tasks where wall-clock latency matters and the sub-tasks don't depend on each other.

Two flavors:

Sectioning — split a task into independent pieces, run each in parallel, merge results.

Voting — run the same task N times with the same or different prompts and ensemble the results (majority vote, judge picks the best, average a numeric answer).

# Sectioning
chunks = split_doc(document)
summaries = await asyncio.gather(*[agent.summarize(c) for c in chunks])
final = agent.combine(summaries)

# Voting
attempts = await asyncio.gather(*[agent.solve(task) for _ in range(5)])
answer = majority_vote(attempts)

When to use it: Document analysis, research-with-sources, anywhere you can shard and merge. Voting is also a cheap accuracy lift on benchmarks like SWE-bench — running a coding agent N times and picking the best run boosts pass-rate at the cost of N× tokens.

Cost: N× tokens but ~1× wall-clock, which is exactly the trade-off you usually want.
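For reference, here is a self-contained version of the sectioning and voting snippets that runs as written. The `summarize` and `solve` coroutines are stand-ins for LLM calls (they sleep briefly and return canned values), and the `seed` parameter is a hypothetical way to make one attempt disagree.

```python
import asyncio
from collections import Counter


async def summarize(chunk: str) -> str:
    # Stand-in for an LLM call; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return chunk.split(".")[0]


async def solve(task: str, seed: int) -> str:
    # Stand-in for a sampled LLM answer; one of the five attempts disagrees.
    await asyncio.sleep(0.01)
    return "41" if seed == 3 else "42"


def majority_vote(answers: list[str]) -> str:
    return Counter(answers).most_common(1)[0][0]


async def main() -> tuple[list[str], str]:
    chunks = ["Alpha intro. More text.", "Beta intro. More text."]
    # Sectioning: gather returns results in input order, so summaries line up with chunks.
    summaries = await asyncio.gather(*(summarize(c) for c in chunks))
    # Voting: same task, five attempts, then ensemble.
    attempts = await asyncio.gather(*(solve("6 * 7", seed) for seed in range(5)))
    return list(summaries), majority_vote(list(attempts))


summaries, answer = asyncio.run(main())
```

All calls within a `gather` overlap, so wall-clock time is close to that of the slowest single call while token spend scales with the number of branches.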

7. Orchestrator-Workers

What it solves: Jobs that benefit from specialized agents under a central planner.

The orchestrator decides what's needed, spawns worker agents with different system prompts/tools, collects their outputs, and synthesizes a final answer. It's Plan-and-Execute with specialized workers instead of a single executor.

Orchestrator
 ├─ Research worker (web tools, citation discipline)
 ├─ Code worker (repo access, test runner)
 ├─ Data worker (SQL, pandas)
 └─ Writer worker (long-form synthesis)

Examples in the wild:

Cost: 2–4× a single-agent solve, with the upside that each worker can use a smaller/cheaper model tuned to its job.
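The control flow can be sketched in a few lines. This is an illustrative skeleton only: the `WORKERS` lambdas stand in for separate agents with their own prompts and tools, and `plan_assignments` stands in for the orchestrator's LLM call, using a hypothetical keyword check to decide whether the data worker is needed.

```python
from typing import Callable

# Hypothetical workers; each would be its own agent with its own prompt and tools.
WORKERS: dict[str, Callable[[str], str]] = {
    "research": lambda job: f"[research] sources for {job}",
    "data": lambda job: f"[data] query results for {job}",
    "writer": lambda job: f"[writer] draft from {job}",
}


def plan_assignments(task: str) -> list[tuple[str, str]]:
    """Stand-in for the orchestrator's LLM call: choose workers and sub-jobs."""
    assignments = [("research", task)]
    if "revenue" in task:
        assignments.append(("data", task))
    return assignments


def orchestrate(task: str) -> str:
    # Fan out to the chosen workers, then hand their outputs to the writer.
    outputs = [WORKERS[name](job) for name, job in plan_assignments(task)]
    return WORKERS["writer"](" + ".join(outputs))
```

The point of the shape is that only the orchestrator sees the whole task; each worker gets a narrow job, which is what allows a smaller model per worker.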

8. Evaluator-Optimizer

What it solves: Tasks with a clear rubric where you can use a judge model to grade and improve drafts.

Same shape as Reflection but with explicit roles: the optimizer generates, the evaluator scores against a rubric, and the loop continues until the score crosses a threshold or hits a max-iterations cap.

Where it works:

  • Ad copy and email subject lines (rubric: open-rate proxies).
  • Code-style refactors (rubric: passes linters + tests + style guide).
  • Search-result reranking (rubric: relevance to query).

Where it fails: When the rubric is fuzzy or learned from too few examples, the evaluator becomes a bottleneck. If your rubric can't be written down in 10–15 bullet points, this pattern isn't the right fit.
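A minimal sketch of the loop, assuming a rubric that can be written as explicit checks. The three checks, the `optimize` fixes and the sample subject line are invented for illustration and are not real open-rate proxies; in practice `evaluate` would usually be a judge model and `optimize` a generator call that receives the feedback.

```python
def evaluate(subject: str) -> tuple[float, list[str]]:
    """Score a subject line against a written rubric; return score and failed checks."""
    checks = {
        "under 50 characters": len(subject) < 50,
        "no all-caps words": not any(w.isupper() and len(w) > 1 for w in subject.split()),
        "no exclamation marks": "!" not in subject,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return 1 - len(failed) / len(checks), failed


def optimize(subject: str, feedback: list[str]) -> str:
    """Stand-in for the generator's revision call: apply one fix per failed check."""
    if "no exclamation marks" in feedback:
        subject = subject.replace("!", "")
    if "no all-caps words" in feedback:
        subject = " ".join(
            w.capitalize() if w.isupper() and len(w) > 1 else w for w in subject.split()
        )
    if "under 50 characters" in feedback:
        subject = subject[:49].rstrip()
    return subject


def refine(draft: str, threshold: float = 1.0, max_iters: int = 3) -> tuple[str, float]:
    """Loop until the score crosses the threshold or the iteration cap is hit."""
    score, feedback = evaluate(draft)
    for _ in range(max_iters):
        if score >= threshold:
            break
        draft = optimize(draft, feedback)
        score, feedback = evaluate(draft)
    return draft, score


final, score = refine("HUGE savings on your renewal today!!!")
```

The evaluator returns both a score and the failed checks, so the optimizer gets actionable feedback rather than a bare number.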

9. Multi-Agent Collaboration

What it solves: Genuinely novel, decomposable problems where the right division of labor emerges from the agents themselves.

Peer agents converse — debate, critique, propose, vote. There's no central orchestrator deciding who does what. Frameworks like AutoGen, CrewAI and LangGraph all support this style.

Honest take: This pattern is over-prescribed. Most things people try to solve with peer multi-agent setups solve faster, cheaper and more reliably with a single ReAct loop or an Orchestrator-Workers setup. Reach for multi-agent collaboration only when:

  1. The task is genuinely open-ended (research at the edge of a field, exploratory simulation).
  2. You've already tried simpler patterns and they hit a context-window or specialization ceiling.
  3. You can afford the 3–10× cost and accept the debugging cost.

See autonomous vs copilot agents for the broader autonomy spectrum this pattern lives on.

Combining patterns — what real-world agentic AI systems look like

A production agentic system almost never picks one pattern. The typical shape:

Routing
  └─ Plan-and-Execute
       ├─ Tool Use
       └─ ReAct workers
            └─ Reflection on critical steps

Three concrete examples from the agents we've reviewed:

Cursor / Claude Code (coding agents): Tool Use + ReAct + Reflection (tests as critic). No multi-agent. Cheap, fast, hard to beat for code edits inside a known repo. See Cursor vs Claude Code for the head-to-head.

Devin (autonomous engineer): Plan-and-Execute on top of Tool Use + ReAct, with a checkpointing layer that lets it pause for human review. The Plan-and-Execute layer is what makes it different from Cursor — and what makes it more expensive. See Devin review.

Perplexity Labs / Gemini Deep Research: Orchestrator-Workers + Parallelization. A planner decomposes the research question, fan-out workers fetch and read sources in parallel, a synthesis pass merges into a cited report.

How to pick: a decision flow

  1. Does the task involve external action? → You need Tool Use. (Almost certainly yes.)
  2. Are there multiple steps where each depends on the last? → Add ReAct.
  3. Is the task error-sensitive (code, money, irreversible writes)? → Add Reflection or Evaluator-Optimizer.
  4. Are steps expensive/irreversible and the plan known upfront? → Switch the executor to Plan-and-Execute.
  5. Is the incoming work heterogeneous? → Wrap it in Routing.
  6. Can sub-tasks run independently? → Parallelization for speed, Voting for accuracy.
  7. Does the work split cleanly into specialties? → Move to Orchestrator-Workers.
  8. Is the problem genuinely novel and emergent? → Only now consider Multi-Agent Collaboration.

If you can answer "yes" to step 8 honestly without having tried 1–7, it's worth double-checking. Most product problems live in the 1–4 range.
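The decision flow can also be written down as a checklist function, which is handy in design reviews. This is only an encoding of the eight questions above; the parameter names are hypothetical shorthand for each question, and the function returns patterns in the order the flow adds them.

```python
def pick_patterns(
    *,
    external_action: bool = True,
    dependent_steps: bool = False,
    error_sensitive: bool = False,
    plan_known_upfront: bool = False,
    heterogeneous_input: bool = False,
    independent_subtasks: bool = False,
    clean_specialties: bool = False,
    novel_and_emergent: bool = False,
) -> list[str]:
    """Map yes/no answers to the eight questions onto the patterns they imply."""
    answers = [
        (external_action, "Tool Use"),
        (dependent_steps, "ReAct"),
        (error_sensitive, "Reflection or Evaluator-Optimizer"),
        (plan_known_upfront, "Plan-and-Execute"),
        (heterogeneous_input, "Routing"),
        (independent_subtasks, "Parallelization"),
        (clean_specialties, "Orchestrator-Workers"),
        (novel_and_emergent, "Multi-Agent Collaboration"),
    ]
    return [pattern for yes, pattern in answers if yes]


# A coding agent working inside a known repo: steps depend on each other, errors matter.
coding_agent = pick_patterns(dependent_steps=True, error_sensitive=True)
```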

What this means for buyers

If you're picking an agent off the leaderboard, the patterns above are the lens to read vendor claims through:

  • Vendors selling "autonomous" agents are usually pitching Plan-and-Execute + heavy Reflection. Ask how the agent recovers when its plan goes wrong on step 7 of 12.
  • Vendors selling "multi-agent" are usually pitching Orchestrator-Workers, not true peer collaboration. Ask whether the workers share memory, who owns conflict resolution, and what the cost multiplier is vs. a single agent.
  • Vendors selling "agentic AI" without specifying a pattern are pitching marketing copy. Ask for an architecture diagram.

For the broader business framing, see agentic AI vs generative AI, AI agent vs LLM and our methodology page for how these architectural choices feed into our Agent Rank score.
