
What Hallucination Actually Is (Not a Bug)

The model has no concept of truth. It assigns probabilities to possible next tokens given the context. When the context doesn't constrain the answer, the model fills the gap with a statistically plausible completion: the same mechanism that makes it fluent makes it confidently wrong.

The legal team files an urgent bug report. The model cited a court case — Hendricks v. Meridian Capital Partners, 9th Circuit, 2019 — in a contract review summary. The case does not exist. The citation format is perfect: court, year, circuit, parties. The legal team wants to know why the model is making things up. The engineer assigned to investigate has to explain something counterintuitive: the model is not making anything up. It is doing exactly what it was trained to do.

A language model has no internal representation of truth. It has no dictionary of facts it believes versus facts it doubts. It has weights: billions of parameters encoding statistical patterns over the training corpus. During generation, the model looks at everything in its context window and computes a probability distribution over the next token, and a decoding step picks one token from that distribution. That is the entire mechanism. There is no verification step. There is no fact-checking module. There is next-token prediction, applied one token at a time.
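The loop described above can be sketched with a toy bigram model. This is a deliberately tiny stand-in, not how a transformer is implemented: the three-line corpus, the party names, and the helper names (`train_bigram`, `next_token_distribution`, `generate`) are all hypothetical. What it shares with a real model is the shape of the loop: look at context, get a distribution, pick a token, repeat, with no step that checks facts.

```python
import random
from collections import Counter, defaultdict

# Hypothetical three-line "training corpus"; the party names are invented placeholders.
CORPUS = [
    "the court ruled in Smith v. Jones",
    "the court ruled in Brown v. Davis",
    "the panel ruled in Smith v. Davis",
]

def train_bigram(corpus):
    """Count which token follows which. These counts are all the model 'knows'."""
    counts = defaultdict(Counter)
    for line in corpus:
        tokens = line.split() + ["<eos>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Turn the follow-up counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

def generate(counts, start, rng, max_tokens=10):
    """Sample one token at a time; nothing here checks whether the output is true."""
    tokens = [start]
    for _ in range(max_tokens):
        dist = next_token_distribution(counts, tokens[-1])
        if not dist:
            break
        candidates, weights = zip(*dist.items())
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return " ".join(tokens)

if __name__ == "__main__":
    counts = train_bigram(CORPUS)
    rng = random.Random(0)
    for _ in range(5):
        print(generate(counts, "the", rng))
```

Run it a few times and the toy model can emit strings such as "the court ruled in Brown v. Jones", which is well-formed but appears nowhere in its corpus. The recombination is not an error in the loop; it is the loop.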

When context provides strong constraint — a fill-in-the-blank where only one answer fits, a math problem with a unique solution — the model's probability distribution is sharply peaked at the correct completion. When context provides weak constraint — "cite a relevant case about contract fraud" — the model's distribution over next tokens reflects what a case citation looks like statistically, not which case citations are real. Both operations are identical from the model's perspective. The difference is how much signal the context provides about what the correct next token is.

Prompt: "The contract dispute was resolved in [CASE NAME]"

Illustrative completion probabilities (hypothetical values; each completion spans several tokens; flat, because no real case anchors the distribution):
  "Smith v. Williams (2018)"       → 0.031
  "Johnson v. First National"      → 0.028
  "Davis v. Meridian Capital"      → 0.027
  "Brown v. Hamilton Trust"        → 0.026
  "Hendricks v. Meridian (2019)"   → 0.024
  "Chen v. Pacific Ventures"       → 0.023
  ... (hundreds of equally plausible completions)

Peak probability: 0.031 — near-uniform across all plausible case formats
No real case has elevated probability because training data gave no signal
Model samples from: "what a case citation looks like"
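To make the contrast between a sharply peaked and a flat distribution concrete, here is a minimal sketch that converts two sets of made-up logits into probabilities with a softmax and measures their entropy. The logit values are assumptions chosen for illustration, not measurements from any model; `softmax` and `entropy_bits` are small helpers written for this example.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits: low when one option dominates, high when options are even."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical logits: strong constraint (one continuation clearly fits).
constrained = softmax([9.0, 1.0, 0.5, 0.2, 0.1, 0.0])

# Hypothetical logits: weak constraint (many continuations look equally plausible).
unconstrained = softmax([1.3, 1.2, 1.15, 1.1, 1.05, 1.0])

if __name__ == "__main__":
    for name, probs in [("constrained", constrained), ("unconstrained", unconstrained)]:
        print(f"{name}: peak={max(probs):.3f} entropy={entropy_bits(probs):.2f} bits")
```

The computation is the same in both cases. Only the inputs differ, which is the point: the model does not switch into a different mode when it lacks information, it just produces a flatter distribution and a token still gets picked.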

The word "hallucination" implies a malfunction, as if the model deviated from some correct mode of operation. In reality the model is operating exactly as designed. The training objective (minimize cross-entropy loss on next-token prediction across a large corpus) contains no constraint that says "only predict tokens corresponding to verifiable real-world facts." The model learned what text looks like. Legal citations look like specific things. The model produces text that looks like a legal citation, because that is the high-probability continuation given the context.
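A minimal sketch of the per-token cross-entropy calculation shows what the objective can and cannot see. The probabilities and names below are invented for illustration, and `token_loss` and `sequence_loss` are hypothetical helpers, not any framework's API.

```python
import math

def token_loss(predicted, observed):
    """Cross-entropy for one step: -log of the probability given to the observed token."""
    return -math.log(predicted[observed])

def sequence_loss(steps):
    """Mean per-token loss over (predicted_distribution, observed_token) pairs."""
    return sum(token_loss(p, o) for p, o in steps) / len(steps)

# Hypothetical distribution over the next token after "...was resolved in".
predicted = {"Smith": 0.5, "Brown": 0.3, "Hendricks": 0.2}

# Suppose "Smith" is the observed next token in two training documents:
# one a genuine court record, the other a piece of fiction.
loss_from_court_record = token_loss(predicted, "Smith")
loss_from_fiction = token_loss(predicted, "Smith")

if __name__ == "__main__":
    print(loss_from_court_record, loss_from_fiction)
    print(sequence_loss([(predicted, "Smith"), (predicted, "Hendricks")]))
```

The two losses are identical because the function takes only a predicted distribution and an observed token. There is no argument through which the truth of the source document could enter the calculation.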

This also explains why retrieval-augmented generation (RAG) reduces hallucination without changing the model at all. When you inject the actual source document into the context, the probability distribution over the next token is constrained by real text. The model is still doing next-token prediction, but now the context provides strong signal about what tokens should come next. RAG does not fix the generation mechanism. It changes the information environment the mechanism operates in, so that real source text constrains the output.
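As a rough sketch of what "injecting the source document into the context" means in practice, the snippet below retrieves documents by simple word overlap and assembles a prompt around them. Everything in it is an assumption made for illustration: the document store `DOCS` holds placeholder text, word overlap stands in for a real retriever (production systems typically use embeddings or a search index), and `retrieve` and `build_prompt` are hypothetical helpers. The call to an actual model is left out.

```python
import re

# Hypothetical document store with placeholder text, not real case law.
DOCS = {
    "doc-1": "Placeholder summary of a ruling on contract fraud and misrepresentation.",
    "doc-2": "Placeholder summary of a ruling on lease termination notice periods.",
    "doc-3": "Placeholder summary of a ruling on trademark infringement damages.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2, min_overlap=2):
    """Return up to k document IDs sharing at least `min_overlap` words with the query."""
    query_words = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_words & tokenize(text))
        if overlap >= min_overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by ID so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def build_prompt(question, documents, k=2):
    """Assemble a grounded prompt, or return None when nothing relevant was found."""
    doc_ids = retrieve(question, documents, k)
    if not doc_ids:
        return None  # Caller should decline rather than let the model guess.
    sources = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below. Cite sources by their bracketed ID. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\n"
    )

if __name__ == "__main__":
    print(build_prompt("Which ruling addresses contract fraud?", DOCS))
    print(build_prompt("What is the weather tomorrow?", DOCS))
```

Returning `None` when retrieval comes back empty is one possible design choice: it keeps the system from sending the model a question with no grounding text, which is the information vacuum the article describes.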

The model in the legal team's system was not broken. The system design was: it asked the model to produce factual citations without providing any facts. The fix was not a better prompt. It was a retrieval layer that injected real case law before the model generated anything.

Hallucination is next-token prediction working correctly in an information vacuum — when context does not constrain which tokens are factually right, the model fills the space with statistically plausible completions, because that is the only objective it was ever trained to optimize.
