Agentic RAG is what classic RAG became when retrieval got smart enough to be an agent loop. Instead of one retrieval + one answer, the agent plans, queries, evaluates, reformulates, re-queries, and synthesizes, which is closer to how a human researcher actually works.
TLDR
Classic RAG: search → top-K chunks → stuff into prompt → answer. One shot.
Agentic RAG: decompose question → query → judge results → reformulate → query again → cross-source synthesis → cited answer. Iterative.
The agentic version is 5-20× more expensive per query but produces materially better answers on hard questions.
Read the long version in the agentic RAG glossary entry.
The shape of agentic RAG
A typical agentic RAG loop:
- Decompose the user's question into sub-questions
- Query retrieval (vector search, web search, structured DB) for each sub-question
- Judge each result: relevant? trustworthy? fresh enough?
- Reformulate queries that returned weak results, search again
- Cross-source synthesize: reconcile contradictions, weight sources
- Cite: link claims to source documents
- Loop: if the synthesized answer isn't confident, go back to the query step
Latency: 10-60 seconds for typical queries (vs. 1-3 seconds for classic RAG).
Cost: $0.05-0.50 per query (vs. ~$0.01 for classic RAG).
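The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: every callable passed in (decompose, retrieve, judge, reformulate, synthesize) is a hypothetical stand-in for an LLM or retrieval call, and the thresholds (min_score, min_confidence, max_rounds) are assumed values chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    query: str
    text: str
    source: str
    score: float  # judge's relevance score in [0, 1]


def agentic_rag(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[tuple[str, str]]],  # returns (text, source) pairs
    judge: Callable[[str, str], float],
    reformulate: Callable[[str], str],
    synthesize: Callable[[str, list[Evidence]], tuple[str, float]],  # (answer, confidence)
    min_score: float = 0.5,        # assumed threshold
    min_confidence: float = 0.7,   # assumed threshold
    max_rounds: int = 3,           # hard cap so the loop cannot spiral
) -> dict:
    queries = decompose(question)
    evidence: list[Evidence] = []
    answer, confidence, rounds = "", 0.0, 0
    for rounds in range(1, max_rounds + 1):
        weak: list[str] = []
        for q in queries:
            hits = [Evidence(q, text, src, judge(q, text)) for text, src in retrieve(q)]
            good = [h for h in hits if h.score >= min_score]
            if good:
                evidence.extend(good)
            else:
                weak.append(q)  # nothing usable came back for this query
        answer, confidence = synthesize(question, evidence)
        if confidence >= min_confidence:
            break
        # Reformulate the weak queries (or all of them if none were weak) and retry.
        queries = [reformulate(q) for q in (weak or queries)]
    return {
        "answer": answer,
        "confidence": confidence,
        "citations": sorted({e.source for e in evidence}),
        "rounds": rounds,
    }


# Toy stand-ins so the sketch runs without any model or index.
CORPUS = {"refund window": [("Refunds are accepted within 30 days.", "policy.md")]}


def demo_synthesize(question: str, evidence: list[Evidence]) -> tuple[str, float]:
    return (evidence[0].text, 0.9) if evidence else ("", 0.0)


result = agentic_rag(
    "What is the return window?",
    decompose=lambda question: ["return window"],
    retrieve=lambda query: CORPUS.get(query, []),
    judge=lambda query, text: 1.0,
    reformulate=lambda query: "refund window",
    synthesize=demo_synthesize,
)
print(result)
```

In the toy run, the first query finds nothing, so the loop reformulates once and succeeds on the second round. Passing the steps in as callables keeps the control flow visible and makes each step easy to swap or test in isolation.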
Why agentic RAG matters
Classic RAG fails on:
- Multi-hop questions: "What did the CEO of the company that acquired Anthropic's biggest competitor say about AGI?" requires 3 hops, not one
- Comparative questions: "Compare Stripe + Adyen + Checkout.com on developer experience" requires querying 3 sources + cross-source synthesis
- Open-ended research: "Investigate the regulatory landscape for AI agents in healthcare" requires multiple queries + synthesis
- Ambiguous queries: "Find a competitor doing better than us in customer support" requires understanding the user's context + reformulating
Agentic RAG handles all of these. The price is latency + cost.
When to use which
Use classic RAG when:
- Simple lookups: "what's our return policy"
- Single-hop questions with a clear answer
- Latency-sensitive surfaces (chatbots, voice agents)
- Cost-sensitive at high query volume
Use agentic RAG when:
- Multi-hop or comparative questions
- Research-flavored workflows (executive briefs, due diligence)
- High-stakes answers where accuracy matters more than latency
- Users explicitly opt in to deeper research ("Deep Research" UX)
Most production systems run both: classic RAG as the default, agentic RAG triggered by query complexity heuristics or explicit user request.
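A triage layer of this kind can start out as a handful of cheap heuristics. The sketch below is one possible shape, assuming keyword patterns and a word-count threshold that are purely illustrative. The function name route, the patterns, and the scoring weights are all hypothetical, and a real system might use a small classifier model instead.

```python
import re

# Illustrative patterns only; tune or replace them for your own traffic.
COMPARATIVE = re.compile(r"\b(compare|versus|vs|difference between)\b", re.IGNORECASE)
RESEARCH = re.compile(r"\b(investigate|landscape|research|due diligence)\b", re.IGNORECASE)


def route(query: str, user_requested_deep: bool = False) -> str:
    """Return "agentic" or "classic" for a query."""
    if user_requested_deep:
        return "agentic"  # explicit opt-in always wins
    score = 0
    if COMPARATIVE.search(query):
        score += 2
    if RESEARCH.search(query):
        score += 2
    if len(query.split()) > 20:  # long queries tend to bundle several questions
        score += 1
    return "agentic" if score >= 2 else "classic"


print(route("what's our return policy"))
print(route("Compare Stripe + Adyen + Checkout.com on developer experience"))
print(route("Investigate the regulatory landscape for AI agents in healthcare"))
```

Defaulting to "classic" keeps the cheap path as the fallback, so a misfire in the heuristics costs answer depth rather than money.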
Leading implementations in 2026
Consumer products
- Perplexity Labs: strong agentic search with citations
- OpenAI Deep Research: multi-step research with extended thinking
- Gemini Deep Research: Google's flavor, integrated with Workspace data
- Anthropic Research mode: newer, ships with Claude.ai
Enterprise products
- Glean: agentic RAG over internal knowledge bases
- Hebbia: agentic RAG for financial research
- Elicit: agentic RAG for academic research
Developer frameworks
- LangGraph with retrieval nodes
- LlamaIndex Agents with query engines
- CrewAI with retrieval tools
All are credible. Pick by your stack + ergonomic preferences.
Architecture patterns
The two dominant patterns in 2026:
Pattern 1: Plan-and-execute
plan → [query_1, query_2, query_3] → execute_all_in_parallel
→ judge_results → optional_reformulate → synthesize → cite
Faster (parallel queries), works well when sub-questions are independent.
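The parallel step of plan-and-execute maps naturally onto asyncio.gather from the Python standard library. In this sketch, plan and search are hypothetical stubs (a real planner would be an LLM call, and search would hit a vector index or web API), and the short sleep only simulates network latency.

```python
import asyncio


def plan(question: str) -> list[str]:
    # Hypothetical planner; in practice this would be an LLM call.
    vendors = ["Stripe", "Adyen", "Checkout.com"]
    return [f"{vendor} developer experience" for vendor in vendors]


async def search(query: str) -> list[str]:
    # Hypothetical retrieval call; the sleep stands in for I/O latency.
    await asyncio.sleep(0.01)
    return [f"stub result for {query!r}"]


async def plan_and_execute(question: str) -> dict[str, list[str]]:
    sub_questions = plan(question)
    # gather runs the searches concurrently and returns results in input order.
    results = await asyncio.gather(*(search(q) for q in sub_questions))
    return dict(zip(sub_questions, results))


findings = asyncio.run(
    plan_and_execute("Compare Stripe + Adyen + Checkout.com on developer experience")
)
for sub_question, hits in findings.items():
    print(sub_question, "->", hits)
```

Because the sub-questions do not depend on each other, total latency is roughly that of the slowest single search rather than the sum of all of them.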
Pattern 2: ReAct-style iterative
reason → act (query) → observe → reason → act (re-query)
→ observe → ... → synthesize → cite
Slower (sequential), works better when later queries depend on earlier results.
Most production systems use a hybrid: plan-and-execute for the first round, ReAct-style iteration for refinement.
Cost economics in 2026
For a typical agentic RAG query:
- 5-10 LLM calls (planner + judges + synthesizer)
- 3-8 retrieval calls (vector search + web search)
- Total cost: $0.05-0.50 per query depending on model + retrieval choice
For 10K queries/month: $500-5,000/month. Materially more expensive than classic RAG at the same volume, and justified only by the answer-quality difference.
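The monthly figures follow directly from the per-query range. The short calculation below reproduces them and, as an extra illustration, estimates a blended bill when only a fraction of traffic is routed to the agentic path. The 10% agentic share is an assumed number for the example, not a figure from any measurement.

```python
from decimal import Decimal

QUERIES_PER_MONTH = 10_000
CLASSIC_COST = Decimal("0.01")       # ~$0.01 per classic RAG query
AGENTIC_COST_LOW = Decimal("0.05")
AGENTIC_COST_HIGH = Decimal("0.50")


def monthly_cost(queries: int, cost_per_query: Decimal) -> Decimal:
    return queries * cost_per_query


def blended_monthly_cost(queries: int, agentic_share: Decimal, agentic_cost: Decimal) -> Decimal:
    """Cost when only `agentic_share` of queries take the agentic path."""
    per_query = (1 - agentic_share) * CLASSIC_COST + agentic_share * agentic_cost
    return queries * per_query


low = monthly_cost(QUERIES_PER_MONTH, AGENTIC_COST_LOW)
high = monthly_cost(QUERIES_PER_MONTH, AGENTIC_COST_HIGH)
classic = monthly_cost(QUERIES_PER_MONTH, CLASSIC_COST)
# Assumed split: 10% of queries routed to agentic RAG at the high-end price.
blended = blended_monthly_cost(QUERIES_PER_MONTH, Decimal("0.1"), AGENTIC_COST_HIGH)

print(f"all agentic: ${low} to ${high}; all classic: ${classic}; blended: ${blended}")
```

Decimal is used instead of float so the dollar amounts come out exact.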
Common mistakes
- Using agentic RAG for everything. Most queries don't need it. Triage; route by complexity.
- Skimping on judging. The "judge each retrieval" step is what makes agentic RAG work. Skip it and you've just made classic RAG slower + more expensive.
- Ignoring citation quality. The cited-source pattern is the trust signal. If citations are wrong or fabricated, the whole system loses credibility.
- Not setting a budget. Agentic RAG can spiral: a bad query can drive 20+ LLM calls. Hard-cap retries.
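One way to enforce the budget point is a small per-query guard that every LLM call must pass through. The class below is a sketch under assumed limits. QueryBudget, BudgetExceeded, the cap values, and the $0.02 per-call price are all hypothetical names and numbers, not part of any framework.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a query would go past its call or cost cap."""


@dataclass
class QueryBudget:
    max_llm_calls: int = 10      # assumed cap
    max_cost_usd: float = 0.50   # assumed cap
    llm_calls: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one LLM call, or raise before it happens if a cap would be hit."""
        if self.llm_calls + 1 > self.max_llm_calls:
            raise BudgetExceeded(f"call cap of {self.max_llm_calls} reached")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost cap of ${self.max_cost_usd:.2f} reached")
        self.llm_calls += 1
        self.cost_usd += cost_usd


# Simulate a runaway query that wants 20 calls against a cap of 3.
budget = QueryBudget(max_llm_calls=3, max_cost_usd=1.00)
stopped = False
try:
    for _ in range(20):
        budget.charge(0.02)  # hypothetical per-call cost
        # ... the actual LLM call would go here ...
except BudgetExceeded:
    stopped = True  # fall back to the best answer found so far

print(budget.llm_calls, stopped)
```

Checking the cap before the call, rather than after, means the guard never lets a query spend past its limit.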
Bottom line
Agentic RAG is the right pattern for hard research-flavored questions. Don't use it for simple lookups (waste of latency + cost). Build a triage layer that routes simple queries to classic RAG and complex queries to agentic RAG. Most production systems converge on this hybrid pattern.