Hallucination is inevitable
The Kalai-Xu proof that any model trained on finite data must hallucinate — visualized interactively with epistemic uncertainty and benchmark bias
Lab 06 · The Hallucination Probe · 60–90 min
Hallucination is not a bug: it's a property
Xu, Jain, and Kankanhalli (2024) prove this from learning theory rather than from training-set finiteness:
- An LLM is a computable function from contexts to next-token distributions.
- The space of all possible ground-truth functions is larger than the space of computable LLMs: it includes uncomputable functions, and even among computable ones, the LLMs that any finite training recipe can produce form only a countable, enumerable family.
- Therefore for any computable LLM there exist computable ground-truth functions it cannot exactly match. Adversarially-chosen prompts in those gaps yield confident, plausible, wrong answers — hallucination as a consequence of computability limits, not just data scarcity.
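The argument above is a diagonalization, and a finite toy version may make it concrete. The sketch below is illustrative only: the three "models" and the prompts `q0`, `q1`, `q2` are hypothetical stand-ins, and the real result concerns an infinite enumerable family of computable LLMs rather than a list of three functions.

```python
from typing import Callable, Dict, List

# Assumption for illustration: a "model" is a deterministic function
# from a prompt to a yes/no answer.
Model = Callable[[str], str]


def diagonal_ground_truth(models: List[Model], prompts: List[str]) -> Callable[[str], str]:
    """Build a ground truth that disagrees with models[i] on prompts[i]."""
    table: Dict[str, str] = {}
    for model, prompt in zip(models, prompts):
        answer = model(prompt)
        # Choose the opposite of whatever this model answers on its own prompt.
        table[prompt] = "no" if answer == "yes" else "yes"

    def truth(prompt: str) -> str:
        # Prompts off the diagonal get an arbitrary default answer.
        return table.get(prompt, "yes")

    return truth


# Hypothetical toy models, not real LLMs.
models: List[Model] = [
    lambda p: "yes",
    lambda p: "no",
    lambda p: "yes" if len(p) % 2 == 0 else "no",
]
prompts = ["q0", "q1", "q2"]

truth = diagonal_ground_truth(models, prompts)
mismatches = [models[i](prompts[i]) != truth(prompts[i]) for i in range(len(models))]
print(mismatches)
```

Every model in the list is wrong on at least one prompt by construction, and the constructed ground truth is itself computable. That is the shape of the claim: the gap comes from how the ground truth can be chosen, not from how much data the models saw.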
The “training set has gaps and the model interpolates” intuition is a useful practical shorthand, but the formal result is stronger: it's not that we need more data, it's that no computable model can cover all the ground truth. You can reduce hallucination, you can detect it, you can constrain output to be verifiable, but you cannot train it out of a language model.
There is a calibration-theoretic sharpening of this worth internalizing. Kalai and Vempala 2024 (“Calibrated Language Models Must Hallucinate”) show that if a language model is well-calibrated in the Brier-score sense (when it says “0.9” it is right 90% of the time), then its hallucination rate on facts seen exactly once in the training corpus is bounded below by the fraction of such singleton facts in the corpus, typically 5–15% for web-scale data. You can destroy calibration to drop the floor (the model becomes systematically under-confident and refuses more), or you can destroy coverage (the model refuses on anything unfamiliar). You cannot have calibrated, confident, and hallucination-free simultaneously: it's an impossibility triangle, not a training bug. This is why retrieval grounding and explicit abstain tokens aren't polish; they're the only escape hatches from a theorem.
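The singleton fraction that sets this floor can be estimated by counting, in the style of a Good-Turing estimate. The sketch below is a minimal illustration under assumptions: the toy corpus and the name `singleton_fraction` are hypothetical, facts are treated as already-extracted strings, and the theorem's additional correction terms are ignored.

```python
from collections import Counter
from typing import Iterable


def singleton_fraction(facts: Iterable[str]) -> float:
    """Fraction of fact observations whose fact occurs exactly once."""
    counts = Counter(facts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty corpus")
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / total


# Hypothetical toy corpus: 10 fact observations, 4 of them seen only once.
corpus = ["a", "a", "a", "a", "b", "b", "c", "d", "e", "f"]
rate = singleton_fraction(corpus)
print(f"singleton fraction: {rate:.2f}")
```

On a real corpus the hard part is extracting and normalizing facts before counting; the counting step itself is as simple as shown.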
Why SLMs hallucinate more
Three compounding reasons:
- Capacity. Fewer parameters mean less factual storage. Ballpark: Allen-Zhu & Li 2024 estimate ~2 bits of facts per parameter. At that rate a 3B model holds about 6 Gbit, or ~750 MB of facts, split across all topics, languages, and code.
- Overtraining. SLMs are trained far past the Chinchilla-optimal token budget, which improves generalization but reduces memorization of rare facts. Paradoxically, better models know less trivia.
- Distillation concentration. Distilled students are confident on their training distribution. That confidence transfers to out-of-distribution queries where it's misplaced.
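The capacity figure in the first bullet is plain arithmetic and can be reproduced for other model sizes. This is a back-of-envelope sketch: the 2 bits per parameter value is the ballpark quoted above, and the function name is made up for illustration.

```python
# Ballpark from the text: roughly 2 bits of factual knowledge per parameter.
BITS_PER_PARAM = 2.0


def fact_capacity_bytes(n_params: float, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Rough upper estimate of storable facts, in bytes."""
    return n_params * bits_per_param / 8  # 8 bits per byte


capacity_mb = fact_capacity_bytes(3e9) / 1e6  # decimal megabytes
print(f"3B params -> ~{capacity_mb:.0f} MB of facts")

for n_params in (1e9, 3e9, 7e9):
    print(f"{n_params / 1e9:.0f}B params -> ~{fact_capacity_bytes(n_params) / 1e6:.0f} MB")
```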
What actually mitigates it
You cannot eliminate hallucination. You can:
- RAG: put the ground truth in context; the model becomes a summarizer instead of a recall engine.
- Constrained decoding: for structured outputs, grammar-constrain generation so the model can't emit invalid fields at all.
- Verifier loops: generate, then check with a separate model or rule (the same verifiable-check idea that RLVR, as used to train DeepSeek-R1, applies at training time).
- Abstention training: teach the model to output “I don't know” as a valid answer via preference data.
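The verifier-loop and abstention items combine naturally: try candidates, keep the first one that passes a check, and fall back to “I don't know” otherwise. The sketch below is a minimal illustration under assumptions. The candidate list stands in for samples from a model, and `answer_with_verifier` and the arithmetic rule are hypothetical, not any particular library's API.

```python
from typing import Callable, Iterable

ABSTAIN = "I don't know"


def answer_with_verifier(candidates: Iterable[str], verify: Callable[[str], bool]) -> str:
    """Return the first candidate the verifier accepts, else abstain."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return ABSTAIN


def verify_product(candidate: str) -> bool:
    # Rule-based verifier for the toy question "What is 17 * 23?".
    return candidate.isdigit() and int(candidate) == 17 * 23


# Hypothetical model samples; in practice these would come from generation.
good_run = answer_with_verifier(["381", "391"], verify_product)
bad_run = answer_with_verifier(["381", "401"], verify_product)
print(good_run)
print(bad_run)
```

The design choice here is that a failed check yields an explicit abstention rather than the most confident unverified guess. This trades coverage for correctness, which matches the framing above.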