Microscale
Act V · Where They Break
lesson hallucination · 9 min · 45 xp

Hallucination is inevitable

The Kalai / Xu argument, interactive

Hallucination is not a bug — it's a property

Xu, Jain, and Kankanhalli (2024) offered a formal argument for why hallucination cannot be eliminated from language models. Sketch of the argument:

  1. A language model is a function from contexts to token distributions.
  2. For any finite training corpus, there exist true facts about the world not in the corpus.
  3. An ideally-trained LM predicts tokens with high probability when they match the training distribution.
  4. For facts outside the training distribution, the LM's output is determined by interpolation over similar patterns — which can produce confident but false statements.
  5. No training algorithm can avoid this, because step (2) is always true for any finite corpus.
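A minimal, purely illustrative sketch of steps 1–4. The fact store and confidence numbers here are invented for illustration — this is not a real model, just the shape of the failure:

```python
# Toy "LM": answers from a finite fact store, falling back to pattern
# interpolation for anything unseen. Everything here is illustrative.
FACTS = {
    "capital of France": "Paris",
    "capital of Japan": "Tokyo",
}

def toy_lm(query: str) -> tuple[str, float]:
    """Return (answer, confidence). Confidence stays high even off-distribution."""
    if query in FACTS:
        return FACTS[query], 0.95  # in-distribution: genuinely known (step 3)
    # Step 4: the query is outside the finite corpus (step 2), so the output
    # is interpolation over similar surface patterns ("capital of X" -> city),
    # emitted fluently and with unwarranted confidence.
    return "Paris", 0.90  # confabulated, but delivered confidently

print(toy_lm("capital of France"))   # a fact the corpus contains
print(toy_lm("capital of Wakanda"))  # outside any finite corpus -> confident fiction
```

Note that no amount of training data fixes the second call — it only moves which queries fall outside the store, which is step 5 of the argument.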

The practical upshot: you can reduce hallucination, you can detect it, you can constrain output to be verifiable — but you cannot train it out of a language model. It's as fundamental as the fact that a finite classifier can't learn every possible function.

There is a calibration-theoretic sharpening of this worth internalizing. Kalai and Vempala 2024 (“Calibrated Language Models Must Hallucinate”) show that if a language model is well-calibrated in the Brier-score sense — meaning when it says “0.9” it is right 90% of the time — then its hallucination rate on facts seen exactly once in the training corpus is bounded below by the fraction of such singleton facts in the corpus, typically 5–15% for web-scale data. You can destroy calibration to drop the floor (the model becomes systematically under-confident and refuses more), or you can destroy coverage (the model refuses on anything unfamiliar). You cannot have calibrated, confident, and hallucination-free simultaneously — it's an impossibility triangle, not a training bug. This is why retrieval grounding and explicit abstain tokens aren't polish; they're the only escape hatches from a theorem.
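A rough sketch of the singleton-rate quantity the bound turns on, using a Good-Turing-style estimator. The toy corpus and the exact estimator form are illustrative assumptions, not the paper's full construction:

```python
from collections import Counter

def singleton_rate(fact_occurrences: list[str]) -> float:
    """Good-Turing-style estimate: the share of corpus occurrences that are
    a fact's only appearance. For a calibrated model, the hallucination rate
    on such once-seen facts is lower-bounded by (roughly) this quantity."""
    counts = Counter(fact_occurrences)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(fact_occurrences)

# Toy corpus: facts B and D each appear exactly once (2 of 6 occurrences).
corpus = ["A", "A", "B", "C", "C", "D"]
print(singleton_rate(corpus))  # 2/6: a third of corpus mass is singleton facts
```

For web-scale corpora this quantity is what lands in the 5–15% range the text cites: long-tail facts that appear once dominate the tail of any crawl.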

prompt
What year did the Brazilian physicist Carlos Drummond win the Nobel Prize?
model output — delivered with confidence
Carlos Drummond won the Nobel Prize in Physics in 1967 for his work on quantum fluctuations in supercooled liquids.
reality
There is no Brazilian physicist by that name; Carlos Drummond de Andrade was a beloved Brazilian poet. No Nobel Prize in Physics has gone to a Brazilian.
The model confidently invents dates, fields, and work descriptions. The prompt was outside its training distribution, so interpolation over similar prefixes produced a fluent confabulation.

Why SLMs hallucinate more

Three compounding reasons:

  • Capacity. Fewer parameters mean less factual storage. Ballpark: Allen-Zhu & Li 2024 estimate ~2 bits of facts per parameter. A 3B model can store ~750 MB of facts — split across all topics, languages, and code.
  • Overtraining. SLMs are trained far past Chinchilla, which improves generalization but reduces memorization of rare facts. Paradoxically, better models know less trivia.
  • Distillation concentration. Distilled students are confident on their training distribution. That confidence transfers to out-of-distribution queries where it's misplaced.
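The capacity bullet is just arithmetic; a quick sketch under the ~2 bits/parameter assumption:

```python
def fact_storage_mb(params: float, bits_per_param: float = 2.0) -> float:
    """Ballpark factual capacity under the Allen-Zhu & Li ~2 bits/param estimate."""
    return params * bits_per_param / 8 / 1e6  # bits -> bytes -> MB

for n in (3e9, 8e9, 70e9):
    print(f"{n / 1e9:.0f}B params -> ~{fact_storage_mb(n):,.0f} MB of facts")
# 3B params -> ~750 MB of facts
```

750 MB sounds like a lot until you remember it must cover every language, every domain, and every API a user might ask about — which is exactly why the small-model answer to factual queries is retrieval, not recall.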

What actually mitigates it

You cannot eliminate hallucination. You can:

  • RAG — put the ground truth in context; the model becomes a summarizer instead of a recall engine.
  • Constrained decoding — for structured outputs, grammar-constrain generation so the model can't emit invalid fields at all.
  • Verifier loops — generate, then check with a separate model or rule (the RLVR pattern from DeepSeek-R1).
  • Abstention training — teach the model to output “I don't know” as a valid answer via preference data.
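The verifier-loop and abstention bullets compose naturally. A minimal sketch, with `generate` and `verify` as placeholder stand-ins for a real model call and a real checker — not any library's API:

```python
def generate(prompt: str, attempt: int) -> str:
    """Stand-in for an LM call; returns a candidate answer per attempt."""
    return f"candidate-{attempt}"

def verify(answer: str) -> bool:
    """Stand-in for a rule-based or model-based checker (the RLVR pattern:
    only answers that pass a programmatic verifier are accepted)."""
    return answer.endswith("2")  # placeholder rule for the sketch

def answer_with_verification(prompt: str, max_tries: int = 3) -> str:
    """Generate -> check -> retry; abstain if nothing passes the verifier."""
    for attempt in range(max_tries):
        candidate = generate(prompt, attempt)
        if verify(candidate):
            return candidate
    return "I don't know"  # abstain rather than emit an unverified claim

print(answer_with_verification("..."))  # candidate-2 passes on the third try
```

The structural point: the loop converts "never hallucinate" (impossible, per the theorem above) into "never emit an unverified claim" (achievable, at the cost of latency and occasional abstention).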