📊Evaluationalso: citation quality, citation accuracy, citation eval

Citation qualitydefinition and how it works in 2026

Citation quality: An eval metric for systems that cite sources — measures whether citations resolve to real documents, point to the supporting passage, and match the cited claim.

Citation quality is the eval that matters for any AI product whose value depends on verifiability — Perplexity, Hebbia, Glean, Harvey AI, research agents. Three sub-metrics: (1) citation existence — does the cited URL/doc-ID actually exist? (2) citation alignment — does the cited passage support the claim it's attached to? (3) citation specificity — does the citation point to the specific passage or just the document?

The Mata v. Avianca case from 2023 (lawyer cited fabricated Harvey AI cases in a brief) established why citation quality became a high-priority eval. Modern legal-AI tools (Harvey, CoCounsel) score 95%+ on citation existence; the failure mode has shifted from hallucinated cases to citing real cases that don't support the specific claim.

In 2026, citation quality evals are standard in RAG eval frameworks (RAGAS, DeepEval, TruEra) and increasingly automated via LLM-as-a-judge. Most production teams sample 5–10% of citations for human review on top of automated eval.

Frequently asked

What citation-quality score should I target?+

For verifiability-critical products: 95%+ citation existence (no broken links), 90%+ citation alignment (the source actually supports the claim). Below these, users will spot wrong citations and trust collapses.

How do I measure citation alignment automatically?+

LLM-as-a-judge: feed the cited passage + the claim + ask "does this passage support this claim?" Calibrate against human-labeled ground truth quarterly to keep the judge honest.

Citation qualitydefinition and how it works in 2026

Frequently asked

Agents that use citation quality

Related terms