Hallucination Detection: Why It's Hard and What Actually Works
Factual vs. faithfulness vs. citation hallucinations. NLI-based detection, self-consistency, and retrieval grounding — tested against real examples.
Hallucination is one of the most frequently cited failure modes of LLMs, and one of the most misunderstood. Hallucinations come in distinct types, and each type calls for a different detection technique.
Three types of hallucination
| Type | Definition | Example | Detection method |
|---|---|---|---|
| Factual | Model asserts a false real-world fact | "Einstein won the 1922 Nobel Prize in Physics" (he won the 1921 prize, presented in 1922; the 1922 prize went to Niels Bohr) | External knowledge base lookup |
| Faithfulness | Answer contradicts the provided context | Context says revenue was $4M, answer says $14M | NLI / entailment model |
| Citation | Model cites a source that doesn't support the claim (or doesn't exist) | Fabricated paper title/DOI | Source verification |
NLI-based faithfulness detection
Natural Language Inference (NLI) models classify whether a hypothesis is entailed by, contradicted by, or neutral to a premise. For RAG, you can use an NLI model to check whether each claim in the model's answer is entailed by the retrieved context.
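from transformers import pipeline
# Cross-encoder NLI checkpoint, loaded through the text-classification pipeline
nli = pipeline("text-classification",
               model="cross-encoder/nli-deberta-v3-small")
context = "The company was founded in 2018 and went public in 2023."
claim = "The company has been public since 2021."
# Pass premise and hypothesis as a pair so the tokenizer adds the model's own
# separator; a literal "[SEP]" typed into one string is not a reliable substitute
result = nli({"text": context, "text_pair": claim})
# Some transformers versions wrap a single result in a list
result = result[0] if isinstance(result, list) else result
print(result)
# Expected label: contradiction (casing and exact score depend on the checkpoint)
# → flag this claim as a potential hallucination
is_potential_hallucination = result["label"].lower() == "contradiction"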
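The single-claim check above extends to a whole answer by splitting it into sentences and testing each one against the context. The following is a minimal sketch under stated assumptions, not a definitive implementation: `split_claims`, `flag_unsupported_claims`, and the `nli_fn` callable are hypothetical names, and the regex sentence splitter is a crude stand-in for real claim extraction. `nli_fn(premise, hypothesis)` is assumed to return a label string such as "entailment", "contradiction", or "neutral".

```python
import re
from typing import Callable, List, Tuple


def split_claims(answer: str) -> List[str]:
    """Naive sentence splitter; a stand-in for real claim extraction."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def flag_unsupported_claims(
    answer: str,
    context: str,
    nli_fn: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Return (claim, label) pairs for claims the context does not entail.

    nli_fn(premise, hypothesis) is a hypothetical wrapper around an NLI
    model that returns "entailment", "contradiction", or "neutral".
    """
    flagged = []
    for claim in split_claims(answer):
        # Normalize casing, since label casing differs between checkpoints
        label = nli_fn(context, claim).lower()
        if label != "entailment":
            flagged.append((claim, label))
    return flagged
```

In practice, `nli_fn` could wrap the pipeline call shown earlier and return its predicted label. This sketch flags "neutral" claims as well as contradictions, on the assumption that a claim the context does not support deserves review even when nothing contradicts it.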
Self-consistency as a hallucination signal
Sample an answer to the same prompt several times with temperature > 0. If the samples agree, the answer is more likely to be correct. High variance across samples signals low confidence, which makes it a useful proxy for potential hallucination that needs no ground truth.
Self-consistency works because hallucinations are often low-probability outputs: a hallucinated fact tends to be stated inconsistently across samples, while a true fact tends to be stated consistently. The signal is not a guarantee, because a model can repeat the same wrong answer in every sample.
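A rough sketch of the idea, assuming short answers that can be compared by exact match after normalization. `self_consistency`, `normalize`, and the `generate` callable are hypothetical names introduced for illustration; `generate(prompt)` stands in for whatever LLM call you use with temperature > 0.

```python
from collections import Counter
from typing import Callable, List, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(answer.lower().split())


def self_consistency(
    prompt: str,
    generate: Callable[[str], str],
    n_samples: int = 5,
) -> Tuple[str, float]:
    """Sample n_samples answers; return the majority answer and its share.

    generate(prompt) is a hypothetical function that calls an LLM with
    temperature > 0 and returns one sampled answer as a string.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    samples: List[str] = [normalize(generate(prompt)) for _ in range(n_samples)]
    top_answer, count = Counter(samples).most_common(1)[0]
    # Share of samples that agree with the most common answer (1.0 = unanimous)
    return top_answer, count / n_samples
```

A low agreement share can then be used to route the answer to a stricter check. Exact matching only suits short, closed-form answers; longer free-text answers would need a semantic comparison, such as an NLI model or embedding similarity, in place of `normalize`.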
RAGAS faithfulness metric
RAGAS decomposes the model's answer into atomic claims and checks each claim against the retrieved context using an LLM-as-judge pattern. It produces a faithfulness score between 0 and 1, where 1 means every claim is supported by the context. It is one of the more widely used RAG-specific hallucination metrics. Faithfulness sits alongside other RAGAS metrics:
- Faithfulness: fraction of answer claims that are entailed by the retrieved context
- Answer Relevancy: how well the answer addresses the actual question
- Context Precision: fraction of retrieved context that's actually relevant
- Context Recall: fraction of ground-truth information that's present in the retrieved context
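The faithfulness definition in the list above reduces to a simple ratio. The sketch below is not the RAGAS implementation or its API; it only illustrates the arithmetic. `faithfulness_score` and `is_supported` are hypothetical names, and `is_supported(claim, context)` stands in for the LLM judge, returning True when the context entails the claim.

```python
from typing import Callable, Sequence


def faithfulness_score(
    claims: Sequence[str],
    context: str,
    is_supported: Callable[[str, str], bool],
) -> float:
    """Fraction of claims the judge marks as supported by the context.

    Returning 0.0 for an empty claim list is a choice made for this sketch.
    """
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)
```

For an answer with four extracted claims of which three are supported, this yields 0.75. The quality of the score depends entirely on the claim extraction and on the judge, both of which are outside this sketch.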
Run RAGAS offline on a golden test set (100–200 hand-labelled Q&A pairs) every time you change your RAG pipeline. It's the fastest way to catch regressions before they reach users.
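One way to turn the golden-set run into a regression gate is to compare mean metric values against a stored baseline. This is a sketch under assumptions: the per-example scores are taken as already computed by your evaluation tool, `mean_scores` and `find_regressions` are hypothetical helper names, and the `0.02` tolerance is an arbitrary illustrative value, not a recommendation.

```python
from statistics import mean
from typing import Dict, List, Sequence


def mean_scores(per_example: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Average each metric over the golden set (each list must be non-empty)."""
    return {name: mean(scores) for name, scores in per_example.items()}


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """Return metrics that are missing or dropped by more than tolerance."""
    regressed = []
    for name, base_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None or new_value < base_value - tolerance:
            regressed.append(name)
    return regressed
```

A CI job could fail the build whenever `find_regressions` returns a non-empty list. Because LLM-judged scores are noisy, the tolerance needs tuning against the run-to-run variation you observe on an unchanged pipeline.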