📊 Evaluation · also: groundedness, grounding, source grounding

Groundedness: definition and how it works in 2026

Groundedness: a retrieval-augmented generation (RAG) evaluation metric that measures whether a generated response is supported by the retrieved context. It is distinct from factual accuracy: an answer can be fully grounded in a source that is itself wrong.

Groundedness asks: did the model say only things that were actually in the retrieved documents? It is the primary check against "RAG hallucinations", which creep in when the model goes beyond what the context supports and produces a confident-sounding answer that the user cannot verify in the sources.

In practice, groundedness is measured in two steps: extract each claim from the response, then check whether the retrieved context supports that claim (usually with an LLM judge). The result is either a binary verdict per claim or a 0–1 score, commonly the fraction of claims that are supported. Production systems flag low-groundedness responses for review or rerouting.
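The extract-then-judge loop can be sketched as follows. This is a minimal illustration under stated assumptions, not any framework's implementation: `extract_claims` treats each sentence as one claim (real pipelines usually ask an LLM to decompose the response into atomic claims), and `overlap_judge` is a hypothetical word-overlap stand-in for the LLM judge. The judge is passed in as a plain function so a real LLM call could replace it.

```python
import re
from typing import Callable, List

# A judge takes (claim, context) and returns True if the context supports the claim.
Judge = Callable[[str, str], bool]


def extract_claims(response: str) -> List[str]:
    """Hypothetical claim extractor: one sentence = one claim.

    Real systems typically use an LLM to split a response into atomic claims.
    """
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    return [s for s in sentences if s]


def overlap_judge(claim: str, context: str, min_overlap: float = 0.8) -> bool:
    """Toy stand-in for an LLM judge (assumption): a claim counts as supported
    if at least `min_overlap` of its words also appear in the context."""
    claim_words = set(re.findall(r"[a-z0-9]+", claim.lower()))
    if not claim_words:
        return True
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    return len(claim_words & context_words) / len(claim_words) >= min_overlap


def groundedness(response: str, context: str, judge: Judge = overlap_judge) -> float:
    """Fraction of extracted claims that the judge marks as supported (0-1)."""
    claims = extract_claims(response)
    if not claims:
        return 1.0  # design choice: an empty response asserts nothing unsupported
    verdicts = [judge(claim, context) for claim in claims]
    return sum(verdicts) / len(verdicts)


context = "The Eiffel Tower is in Paris. It was completed in 1889."
response = "The Eiffel Tower is in Paris. It is painted bright green."
print(groundedness(response, context))  # 0.5: the second claim is unsupported
```

Keeping per-claim verdicts (rather than only the final score) is what lets a production system show a reviewer exactly which sentence lacked support.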

Critically, groundedness is independent of correctness. An answer can be perfectly grounded in a wrong source document: high groundedness, low factual accuracy. Or it can be perfectly correct yet contain claims that are not in the retrieved context: low groundedness, high accuracy. Mature RAG evals measure both.
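To make the independence concrete, the sketch below scores two hypothetical answers on both axes. Each claim carries two separate labels: `grounded` (supported by the retrieved context) and `correct` (true in the world, e.g. checked against a reference answer). The "Acme API" rate limits are made-up illustrative values, and the labels are assumed to come from some upstream judge.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    text: str
    grounded: bool  # supported by the retrieved context?
    correct: bool   # true in the world (e.g. per a reference answer)?


def groundedness_score(claims: List[Claim]) -> float:
    """Fraction of claims supported by the retrieved context."""
    return sum(c.grounded for c in claims) / len(claims)


def accuracy_score(claims: List[Claim]) -> float:
    """Fraction of claims that are factually correct."""
    return sum(c.correct for c in claims) / len(claims)


# Hypothetical case 1: retrieval returned an outdated doc, and the model repeated it faithfully.
outdated = [Claim("The Acme API allows 100 requests/min.", grounded=True, correct=False)]

# Hypothetical case 2: the model answered correctly from its training data, not the context.
parametric = [Claim("The Acme API allows 500 requests/min.", grounded=False, correct=True)]

for name, claims in [("outdated source", outdated), ("parametric answer", parametric)]:
    print(f"{name}: groundedness={groundedness_score(claims):.1f}, "
          f"accuracy={accuracy_score(claims):.1f}")
# outdated source: groundedness=1.0, accuracy=0.0
# parametric answer: groundedness=0.0, accuracy=1.0
```

A groundedness-only eval would rate the first answer perfect and the second a failure, which is why the two scores are tracked separately.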

Frequently asked

Groundedness vs faithfulness: same thing?

The terms are used interchangeably in 2026, although they originally meant slightly different things. Both measure whether the response is supported by the retrieved context, and most eval frameworks (e.g., RAGAS, DeepEval) now treat them as synonyms.

What's a good groundedness score?

Above 0.9 is the target for production RAG systems. A score between 0.7 and 0.9 means work is needed, usually better retrieval or stricter prompting. Below 0.7, the system is effectively broken: it is probably producing unsupported claims in a large share of its responses.
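These rules of thumb can be encoded as a simple triage function. This is a sketch: the 0.9 and 0.7 cut-offs come from the answer above, while the bucket labels and the choice to put exactly 0.9 in the "needs work" bucket are assumptions made here for illustration.

```python
def triage(score: float) -> str:
    """Bucket a groundedness score using the 0.9 / 0.7 rules of thumb.

    Bucket names are hypothetical; 0.9 itself falls in "needs work"
    because the target is strictly above 0.9.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"groundedness must be in [0, 1], got {score}")
    if score > 0.9:
        return "ok"
    if score >= 0.7:
        return "needs work"  # improve retrieval or tighten prompting
    return "broken"          # likely producing unsupported claims often


print([triage(s) for s in (0.95, 0.9, 0.72, 0.4)])
# ['ok', 'needs work', 'needs work', 'broken']
```

The same function can be applied to an aggregate score over an eval set or to individual responses, for example to route "broken" responses to human review.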

