
The Reversal Curse: Why LLMs Know a Fact One Way But Not the Other

Training on 'A is B' does not teach 'B is A'. Why parametric memory is directional, what this means for RAG systems and fine-tuning datasets, and the four-case eval rule that catches the gap.

A simple test most models fail

Ask a language model: "Who is Tom Hanks?" It will tell you he is an actor. Ask: "Tom Hanks is a well-known [blank]?" It fills in "actor" without hesitation. Now ask: "What famous actor starred in Forrest Gump?" It answers correctly. Then ask: "Forrest Gump starred [blank] in the lead role." Some models hesitate, confabulate, or get this wrong, even though the factual content is identical and the model clearly knows the fact in the forward direction.

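This is the Reversal Curse, formally documented by Berglund et al. (2023): if a model is trained on 'A is B', it does not automatically learn 'B is A'. Their best-known example: GPT-4 answered 'Who is Tom Cruise's mother?' correctly 79% of the time, but answered the reverse question, 'Who is Mary Lee Pfeiffer's son?', correctly only 33% of the time. The relationship is directional in the model's weights. The model knows the fact in the direction it was trained on, but cannot reliably reverse it.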

Why this happens

Language models learn next-token prediction over the training corpus. The statistical structure of natural language is not symmetric: in text, facts appear far more often in some directions than others. Suppose 'The CEO of Anthropic is Dario Amodei' appears in that order across news articles, Wikipedia, and company pages, while 'Dario Amodei is the CEO of Anthropic' appears less frequently. Both sentences are true, but the model sees one ordering far more often than the other.

Because the model learns from token-level patterns, the strength of the association in the weights reflects the frequency and directionality of exposure in training data. The reverse query activates a different chain of associations, one that was not as heavily reinforced. The model's parametric memory is directional, not a lookup table.
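
To make the directionality concrete, here is a minimal sketch that counts how often one entity precedes another in a corpus. The function name `direction_counts` and the four sentences are hypothetical illustrations, and plain substring matching stands in for real entity matching.

```python
from collections import Counter


def direction_counts(sentences, a, b):
    """Count sentences where `a` appears before `b`, and vice versa."""
    counts = Counter()
    for sentence in sentences:
        i, j = sentence.find(a), sentence.find(b)
        if i == -1 or j == -1:
            continue  # the sentence does not mention both entities
        counts["a_before_b" if i < j else "b_before_a"] += 1
    return counts


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "Project Phoenix was led by Sarah Chen.",
    "Project Phoenix, led by Sarah Chen, shipped in March.",
    "Sarah Chen led Project Phoenix.",
    "Sarah Chen joined the company in 2019.",
]

counts = direction_counts(corpus, "Project Phoenix", "Sarah Chen")
print(counts["a_before_b"], counts["b_before_a"])  # prints: 2 1
```

A skew like this in the training data is the kind of asymmetry that, on the account above, ends up reflected in the weights.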

This is not a model size problem. Berglund et al. found the Reversal Curse holds at all tested scales, including models over 100B parameters. Scaling up does not fix it. The constraint appears to lie in how autoregressive next-token training stores facts, not in model capacity.

There is an important exception: when both entities are in the context window. If you provide the full context — 'Tom Hanks starred in Forrest Gump; who starred in Forrest Gump?' — the model can use the context to reason backward. The reversal failure is specifically a parametric memory failure. In-context reasoning is much more symmetric.

Why this matters for RAG

The common assumption about RAG is: if the right document is retrieved, the model will answer correctly. The Reversal Curse breaks this assumption in a specific class of queries.

Consider a knowledge base with a document that says: 'Project Phoenix was led by Sarah Chen.' A user asks: 'Who led Project Phoenix?' Retrieval works, the document is returned, and the model answers correctly. The same user then asks: 'Sarah Chen led which project?' Retrieval may still work (the same document is a good match), but when the model generates an answer, it needs to complete 'Sarah Chen led [blank]' from context. If Sarah Chen was a rare entity in pretraining, the parametric memory has no strong association, so the model must rely entirely on in-context reasoning from the retrieved document.

Most of the time, with a well-retrieved document and a capable model, this works. The failure case is when retrieval is imperfect and the model has to combine partial context with parametric memory — the parametric contribution may be directionally biased and actively misleading.

Why this matters for fine-tuning

If you fine-tune a model on a dataset of Q&A pairs, the training examples have a direction. 'Q: What is [entity A]'s role? A: [Entity A] is [role].' The fine-tuning reinforces the forward direction. If your eval then tests only forward questions, you get strong scores. If a user asks the reverse, you may get failures that your eval never caught.

The practical implication: when building fine-tuning datasets for factual tasks, include reverse-direction examples explicitly. Include 'Who is the [role] at [org]?' as well as 'What is [entity A]'s role at [org]?' Both directions need representation in training, not just the natural-language-dominant direction.
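
One way to do this is to generate both directions from a single structured fact. The sketch below is illustrative only: the `Fact` dataclass, the question templates, and the `augment` helper are assumptions, not a prescribed dataset format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical record for one person/role/org relationship."""
    person: str
    role: str
    org: str


def qa_pairs(fact):
    """Return one forward and one reverse Q&A pair for a single fact."""
    forward = {
        "question": f"What is {fact.person}'s role at {fact.org}?",
        "answer": f"{fact.person} is the {fact.role} of {fact.org}.",
    }
    # The reverse answer names the role first and the person last,
    # so the token order seen in training is actually flipped.
    reverse = {
        "question": f"Who is the {fact.role} of {fact.org}?",
        "answer": f"The {fact.role} of {fact.org} is {fact.person}.",
    }
    return [forward, reverse]


def augment(facts):
    """Expand every fact into both directions."""
    return [pair for fact in facts for pair in qa_pairs(fact)]


dataset = augment([Fact("Sarah Chen", "lead", "Project Phoenix")])
for row in dataset:
    print(row["question"], "->", row["answer"])
```

Generating both pairs from one record keeps the two directions in sync when a fact changes.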

Why this matters for evals

Eval suites for factual tasks almost always test the dominant direction. Benchmarks such as MMLU mostly ask 'What is X?' rather than 'X is an example of what?' Entity knowledge benchmarks present entities first, attributes second. The benchmarks measure what the training data structure predicts, not whether the model has symmetric access to the fact.

The result: eval scores overstate factual reliability for reverse queries. Production users ask questions in any direction. The eval never catches the gap.

The four-case eval rule: for any important factual claim in your system, test all four cases — (1) forward question, parametric, (2) reverse question, parametric, (3) forward question, with context retrieved, (4) reverse question, with context retrieved. Cases 1 and 3 will almost always pass. Cases 2 and 4 reveal the real reliability profile.
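
The four cases can be sketched as a small harness. Everything here is an assumption made for illustration: `ask` is any callable that maps a prompt to an answer string, scoring is a plain substring match, and `stub_model` is a stand-in that imitates a model which only knows the forward direction parametrically.

```python
def build_cases(forward, reverse, context):
    """Build the four cases. `forward` and `reverse` are (question, expected) tuples."""
    cases = []
    for mode in ("parametric", "context"):
        for direction, (question, expected) in (("forward", forward), ("reverse", reverse)):
            # In "context" mode the retrieved passage is placed before the question.
            prompt = question if mode == "parametric" else f"{context}\n\n{question}"
            cases.append(
                {"direction": direction, "mode": mode, "prompt": prompt, "expected": expected}
            )
    return cases


def run_cases(cases, ask):
    """Return {(direction, mode): passed}, scored by case-insensitive substring match."""
    results = {}
    for case in cases:
        answer = ask(case["prompt"])
        results[(case["direction"], case["mode"])] = case["expected"].lower() in answer.lower()
    return results


def stub_model(prompt):
    """Hypothetical stand-in for a real model call."""
    if "Project Phoenix was led by Sarah Chen" in prompt:
        return "Sarah Chen led Project Phoenix."  # can read the answer from context
    if prompt.startswith("Who led"):
        return "Sarah Chen"  # forward direction recalled from "weights"
    return "I am not sure."  # reverse direction without context


cases = build_cases(
    forward=("Who led Project Phoenix?", "Sarah Chen"),
    reverse=("Which project did Sarah Chen lead?", "Project Phoenix"),
    context="Project Phoenix was led by Sarah Chen.",
)
results = run_cases(cases, stub_model)
for key, passed in results.items():
    print(key, "PASS" if passed else "FAIL")
```

With the stub, only the reverse parametric case fails. A real harness would swap `stub_model` for an actual model call and likely use a stricter scorer than substring matching.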

Mitigations

Augment fine-tuning data with reverse-direction examples for every important factual relationship
In RAG systems, always retrieve supporting context, even for queries that seem answerable from parametric recall alone; don't assume the model has the reverse direction in its weights
Add reverse-direction test cases to your eval suite as a standard practice, not a one-time audit
For critical facts, use structured output to force the model to cite retrieved evidence rather than generate from parametric memory
When prompt-engineering for factual tasks, include the retrieved context directly adjacent to the query — the in-context reasoning path is much more symmetric than the parametric path
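
The last two mitigations can be combined in a rough sketch: place numbered passages directly before the question, ask for a JSON reply that cites one passage, and reject replies whose citation does not support the answer. The prompt wording, the JSON shape, and both function names are assumptions for illustration, not a standard format.

```python
import json


def build_grounded_prompt(question, passages):
    """Place numbered passages right before the question and request a cited JSON reply."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the passages below.\n"
        f"{numbered}\n\n"
        f"Question: {question}\n"
        'Reply as JSON: {"answer": "...", "evidence": <passage number>}'
    )


def validate_reply(reply, passages):
    """Return the parsed reply if the cited passage contains the answer, else None."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    answer, idx = data.get("answer"), data.get("evidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(idx, int) or isinstance(idx, bool) or not 1 <= idx <= len(passages):
        return None
    if not isinstance(answer, str) or not answer:
        return None
    if answer.lower() not in passages[idx - 1].lower():
        return None  # the cited passage does not contain the answer text
    return data


passages = ["Project Phoenix was led by Sarah Chen."]
prompt = build_grounded_prompt("Sarah Chen led which project?", passages)
print(prompt)
print(validate_reply('{"answer": "Project Phoenix", "evidence": 1}', passages))
print(validate_reply('{"answer": "Project Atlas", "evidence": 1}', passages))
```

The substring check is deliberately crude. It only catches answers that do not appear verbatim in the cited passage, so paraphrased answers would need a more tolerant comparison.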

The broader implication

The Reversal Curse is one instance of a larger truth about language models: parametric memory is not a database. It is a compressed statistical representation of the training data distribution. The structure of the representation reflects the structure of the data, including its asymmetries, its frequencies, and its directional biases.

When you ask a model a question that the training data answered frequently in the forward direction, you get reliable recall. When you ask the reverse, or an unusual framing, or a query that requires combining two facts that appeared in different contexts, you are asking the model to generalize across statistical patterns it may never have seen aligned. Sometimes it works. Often it does not. Good system design does not rely on it working.
