🏗️ Architecture · also: self rag, self-reflective rag

Self-RAG: definition and how it works in 2026

Self-RAG: A RAG variant where the model decides on the fly whether to retrieve, what to retrieve, and whether its own draft is grounded — emitting reflection tokens at each step.

Self-RAG (introduced in 2023, mainstream by 2025) trains the model to emit special reflection tokens that drive a controllable inference loop: Retrieve (should I retrieve now?), IsRel (is the retrieved document relevant?), IsSup (is the draft supported by the document?) and IsUse (is the draft useful?). Because retrieval is itself one of these decisions, the model retrieves only when it judges that retrieval will help, rather than on every query.
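The loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: each reflection decision is modeled as a plain callable on a hypothetical `Judges` object instead of a trained token, and the helper names (`self_rag_answer`, `Candidate`, `toy_generate`, `toy_judges`) are made up for the example. Drafts are scored with a weighted sum of the IsSup and IsUse judgments, loosely in the spirit of the paper's weighted critique scores; the weights are arbitrary, and the paper scores segment by segment whereas this sketch scores whole answers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Judges:
    """Stand-ins for the reflection decisions (hypothetical interface).

    In trained Self-RAG the generator emits these as special tokens; here each
    one is a plain callable so the control flow is easy to follow.
    """
    should_retrieve: Callable[[str], bool]   # Retrieve: is retrieval worth it for this query?
    is_relevant: Callable[[str, str], bool]  # IsRel(query, doc)
    support: Callable[[str, str], float]     # IsSup(draft, doc), 0.0 to 1.0
    usefulness: Callable[[str, str], float]  # IsUse(query, draft), 0.0 to 1.0


@dataclass
class Candidate:
    draft: str
    doc: Optional[str]
    score: float


def self_rag_answer(query: str,
                    retrieve: Callable[[str], List[str]],
                    generate: Callable[[str, Optional[str]], str],
                    judges: Judges,
                    w_sup: float = 1.0,
                    w_use: float = 0.5) -> Candidate:
    def closed_book() -> Candidate:
        # Answer from the model's own knowledge; only usefulness can be judged.
        draft = generate(query, None)
        return Candidate(draft, None, w_use * judges.usefulness(query, draft))

    # Retrieve: skip the retriever entirely when the model says it is not needed.
    if not judges.should_retrieve(query):
        return closed_book()

    # IsRel: drop retrieved passages the model judges irrelevant.
    docs = [d for d in retrieve(query) if judges.is_relevant(query, d)]
    if not docs:
        return closed_book()  # fallback chosen for this sketch, not taken from the paper

    # IsSup + IsUse: draft once per passage, score each draft, keep the best.
    candidates = []
    for doc in docs:
        draft = generate(query, doc)
        score = w_sup * judges.support(draft, doc) + w_use * judges.usefulness(query, draft)
        candidates.append(Candidate(draft, doc, score))
    return max(candidates, key=lambda c: c.score)


# Toy demo with rule-based judges and a fake generator, purely illustrative.
corpus = {"capital of france": ["Bananas are yellow.", "Paris is the capital of France."]}
toy_judges = Judges(
    should_retrieve=lambda q: q != "2+2",
    is_relevant=lambda q, d: "France" in d,
    support=lambda draft, doc: 1.0 if draft in doc else 0.0,
    usefulness=lambda q, draft: 1.0 if draft else 0.0,
)


def toy_generate(q: str, doc: Optional[str]) -> str:
    if doc is None:
        return "4" if q == "2+2" else ""
    return doc.split(" is ")[0]  # pretend the "answer" is the sentence's subject


result = self_rag_answer("capital of france", lambda q: corpus[q], toy_generate, toy_judges)
```

In the demo the irrelevant passage is filtered out by IsRel, and a query like `"2+2"` never reaches the retriever at all because the Retrieve judgment says no.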

Compared to standard RAG, Self-RAG cuts unnecessary retrievals (when the model already knows the answer) and adds quality control (the model self-checks groundedness). On open-domain Q&A benchmarks, Self-RAG matches or beats standard RAG with ~50% fewer retrieval calls.
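Skipping retrieval needs a confidence signal for the Retrieve decision. A minimal sketch, assuming the model exposes a probability for each Retrieve label: renormalize P(Yes) over the label options and retrieve only above a threshold. The original paper describes a threshold of this kind, but the function names (`retrieve_prob`, `plan_retrievals`), the probabilities and the 0.5 threshold below are illustrative assumptions, not real model output.

```python
from typing import Dict, List


def retrieve_prob(label_probs: Dict[str, float]) -> float:
    """P(Retrieve=Yes) renormalized over the Retrieve labels only."""
    total = sum(label_probs.values())
    return label_probs.get("Yes", 0.0) / total if total > 0 else 0.0


def plan_retrievals(batch: List[Dict[str, float]], threshold: float = 0.5) -> List[bool]:
    """One retrieve/skip decision per query; a higher threshold skips more retrievals."""
    return [retrieve_prob(p) > threshold for p in batch]


# Hypothetical Retrieve-label probabilities for four queries (made-up numbers).
batch = [
    {"Yes": 0.90, "No": 0.10},  # obscure fact
    {"Yes": 0.20, "No": 0.80},  # common knowledge
    {"Yes": 0.05, "No": 0.95},  # simple arithmetic
    {"Yes": 0.30, "No": 0.20},  # remaining mass went to other tokens; 0.3 / 0.5 = 0.6
]
decisions = plan_retrievals(batch)      # [True, False, False, True]
calls_skipped = decisions.count(False)  # 2
```

The threshold is the knob behind "fewer retrieval calls": raising it saves retriever calls at the risk of answering closed-book when grounding would have helped, so it is worth tuning per workload.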

In 2026, Self-RAG is one of three "RAG variants" worth knowing — alongside [corrective-RAG](/glossary/corrective-rag) and [adaptive-RAG](/glossary/adaptive-rag). They're increasingly bundled into "agentic-RAG" platforms rather than implemented separately.

Frequently asked

Do I need a Self-RAG-specific model?

Not necessarily. The original paper fine-tuned a model to emit the reflection tokens. In 2026, frontier models (GPT-5, Claude, Gemini) follow instructions well enough that you can implement Self-RAG's logic via prompting instead of special-token training.
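A minimal sketch of the prompting approach, assuming only a generic `llm(prompt) -> str` callable (hypothetical; wire it to whatever client you use). Each reflection decision becomes a short classification prompt whose reply is parsed for a bracketed label. The templates, the `[Name=Value]` label format, the simplified Yes/Partial/No support scale and the helper names are all assumptions of this sketch, not the paper's exact token vocabulary.

```python
import re
from typing import Callable, Optional

# Hypothetical interface: any chat/completion model wrapped as prompt -> text.
LLM = Callable[[str], str]

# Prompt templates standing in for trained reflection tokens (assumed wording).
PROMPTS = {
    "Retrieve": ("Question: {q}\nWould external documents help you answer this reliably?\n"
                 "Answer with exactly one label: [Retrieve=Yes] or [Retrieve=No]."),
    "IsRel": ("Question: {q}\nDocument: {doc}\nIs the document relevant to the question?\n"
              "Answer with exactly one label: [IsRel=Yes] or [IsRel=No]."),
    "IsSup": ("Document: {doc}\nAnswer: {answer}\n"
              "Is every claim in the answer supported by the document?\n"
              "Answer with exactly one label: [IsSup=Yes], [IsSup=Partial] or [IsSup=No]."),
}

_LABEL_RE = re.compile(r"\[(Retrieve|IsRel|IsSup)=(Yes|Partial|No)\]")


def ask_label(llm: LLM, name: str, **fields: str) -> Optional[str]:
    """Fill the named template, call the model, return the first matching label value."""
    reply = llm(PROMPTS[name].format(**fields))
    for match in _LABEL_RE.finditer(reply):
        if match.group(1) == name:
            return match.group(2)
    return None  # the model ignored the requested format


def needs_retrieval(llm: LLM, q: str) -> bool:
    # An unparseable reply counts as "retrieve", so format failures err toward grounding.
    return ask_label(llm, "Retrieve", q=q) != "No"


def is_relevant(llm: LLM, q: str, doc: str) -> bool:
    return ask_label(llm, "IsRel", q=q, doc=doc) == "Yes"


def support_level(llm: LLM, answer: str, doc: str) -> str:
    # Treat a missing label as "No" support rather than trusting the draft.
    return ask_label(llm, "IsSup", answer=answer, doc=doc) or "No"
```

For example, `needs_retrieval(llm, "What is 2+2?")` returns `False` only when the reply contains `[Retrieve=No]`. Defaulting the other way on unparseable replies would save retriever calls at the cost of more unsupported answers.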

When does Self-RAG outperform standard RAG?

When retrieval quality is mixed (some docs irrelevant, some great) and the model has strong intrinsic knowledge. Self-RAG's ability to skip retrieval and self-check is where the gains come from.
