aiagentrank.io
📊 Evaluation
also: hallucinations, confabulation

Hallucination

When an LLM generates content that sounds plausible but is factually wrong or fabricated — a citation that doesn't exist, a function that isn't in the API.

Hallucination is the single biggest reliability problem in production agents. The model isn't lying — it's generating the most-probable token sequence, which sometimes diverges from reality.

Modern mitigation stacks combine RAG (ground the model in real sources), structured output (constrain what can be emitted), tool-use verification (don't trust, verify), and confidence calibration (let the model say "I don't know").
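As a rough illustration of the tool-use verification layer, the sketch below checks whether function names a model suggested actually exist before anything downstream relies on them. It assumes the model's output has already been parsed into dotted names such as `module.function`. The helper names `symbol_exists` and `verify_suggestions` are hypothetical, and `math.fast_sqrt` stands in for a hallucinated function.

```python
import importlib


def symbol_exists(dotted_name: str) -> bool:
    """Return True if 'module.attr' resolves to a real attribute.

    Note: this imports the named module, so only use it with
    modules you already trust (an assumption of this sketch).
    """
    module_name, _, attr = dotted_name.rpartition(".")
    if not module_name:
        return False  # no module part, nothing to check against
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # the module itself does not exist
    return hasattr(module, attr)


def verify_suggestions(suggested: list[str]) -> dict[str, list[str]]:
    """Split model-suggested names into verified and unverified."""
    result: dict[str, list[str]] = {"verified": [], "unverified": []}
    for name in suggested:
        key = "verified" if symbol_exists(name) else "unverified"
        result[key].append(name)
    return result


# Hypothetical model output: two real functions and one invented one.
report = verify_suggestions(["math.sqrt", "math.fast_sqrt", "json.dumps"])
print(report)
# {'verified': ['math.sqrt', 'json.dumps'], 'unverified': ['math.fast_sqrt']}
```

Anything that lands in `unverified` can be dropped, retried, or shown to the user with a warning, rather than passed along as fact.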

In 2026, well-built agents hallucinate less than they used to, but the rate is never zero. Good product design assumes some hallucination and gives the user a way to verify what the agent says.
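One way to give users that verification path is to attach a source to every claim and flag the ones that cannot be traced back. The sketch below is a minimal version of that idea, not a standard method: it assumes the agent returns each claim with a supporting quote and a source ID, and it uses a verbatim substring match, which is a deliberately crude check. The `Claim` type, the `label_claims` helper, and the sample document are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str       # what the agent asserts
    quote: str      # passage the agent says supports the claim
    source_id: str  # which retrieved document the quote came from


def label_claims(claims: list[Claim], sources: dict[str, str]) -> list[tuple[str, str, str]]:
    """Label a claim 'supported' only if its quote appears verbatim
    in the named source; everything else is 'unverified'."""
    labeled = []
    for claim in claims:
        document = sources.get(claim.source_id, "")
        supported = bool(claim.quote) and claim.quote in document
        status = "supported" if supported else "unverified"
        labeled.append((claim.text, status, claim.source_id))
    return labeled


# Toy data for illustration only.
sources = {"doc-1": "The retry limit defaults to 3 attempts."}
claims = [
    Claim("Retries default to 3.", "retry limit defaults to 3 attempts", "doc-1"),
    Claim("Retries use exponential backoff.", "uses exponential backoff", "doc-1"),
]

for text, status, source_id in label_claims(claims, sources):
    print(f"[{status}] {text} (source: {source_id})")
# [supported] Retries default to 3. (source: doc-1)
# [unverified] Retries use exponential backoff. (source: doc-1)
```

A "supported" label here only means the quote exists in the source, not that the claim follows from it, so the interface should still let the user open the source and check.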

Frequently asked

Are reasoning models less prone to hallucination?

Yes, modestly. Reasoning models catch some hallucinations during internal review, so they hallucinate at lower rates, but the rate does not drop to zero. Verification matters as much as model choice.
