Microscale
Act V · Where They Break
lesson hallucination · 9 min · 45 xp

Hallucination is inevitable

The Kalai / Xu argument, interactive

Hallucination is not a bug — it's a property

Xu, Jain, and Kankanhalli (2024) offered a formal argument for why hallucination cannot be eliminated from language models. Sketch of the argument:

  1. A language model is a function from contexts to token distributions.
  2. For any finite training corpus, there exist true facts about the world not in the corpus.
  3. An ideally-trained LM predicts tokens with high probability when they match the training distribution.
  4. For facts outside the training distribution, the LM's output is determined by interpolation over similar patterns — which can produce confident but false statements.
  5. No training algorithm can avoid this, because step (2) is always true for any finite corpus.
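A minimal, purely illustrative sketch of steps 1–4. The fact store and confidence numbers here are invented for illustration — this is not a real model, just the shape of the failure:

```python
# Toy "LM": answers from a finite fact store, falling back to pattern
# interpolation for anything unseen. Everything here is illustrative.
FACTS = {
    "capital of France": "Paris",
    "capital of Japan": "Tokyo",
}

def toy_lm(query: str) -> tuple[str, float]:
    """Return (answer, confidence). Confidence stays high even off-distribution."""
    if query in FACTS:
        return FACTS[query], 0.95  # in-distribution: genuinely known (step 3)
    # Step 4: the query is outside the finite corpus (step 2), so the output
    # is interpolation over similar surface patterns ("capital of X" -> city),
    # emitted fluently and with unwarranted confidence.
    return "Paris", 0.90  # confabulated, but delivered confidently

print(toy_lm("capital of France"))   # a fact the corpus contains
print(toy_lm("capital of Wakanda"))  # outside any finite corpus -> confident fiction
```

Note that no amount of training data fixes the second call — it only moves which queries fall outside the store, which is step 5 of the argument.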

The practical upshot: you can reduce hallucination, you can detect it, you can constrain output to be verifiable — but you cannot train it out of a language model. It's as fundamental as the fact that a finite classifier can't learn every possible function.

There is a calibration-theoretic sharpening of this worth internalizing. Kalai and Vempala 2024 (“Calibrated Language Models Must Hallucinate”) show that if a language model is well-calibrated in the Brier-score sense — meaning when it says “0.9” it is right 90% of the time — then its hallucination rate on facts seen exactly once in the training corpus is bounded below by the fraction of such singleton facts in the corpus, typically 5–15% for web-scale data. You can destroy calibration to drop the floor (the model becomes systematically under-confident and refuses more), or you can destroy coverage (the model refuses on anything unfamiliar). You cannot have calibrated, confident, and hallucination-free simultaneously — it's an impossibility triangle, not a training bug. This is why retrieval grounding and explicit abstain tokens aren't polish; they're the only escape hatches from a theorem.
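A rough sketch of the singleton-rate quantity the bound turns on, using a Good-Turing-style estimator. The toy corpus and the exact estimator form are illustrative assumptions, not the paper's full construction:

```python
from collections import Counter

def singleton_rate(fact_occurrences: list[str]) -> float:
    """Good-Turing-style estimate: the share of corpus occurrences that are
    a fact's only appearance. For a calibrated model, the hallucination rate
    on such once-seen facts is lower-bounded by (roughly) this quantity."""
    counts = Counter(fact_occurrences)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(fact_occurrences)

# Toy corpus: facts B and D each appear exactly once (2 of 6 occurrences).
corpus = ["A", "A", "B", "C", "C", "D"]
print(singleton_rate(corpus))  # 2/6: a third of corpus mass is singleton facts
```

For web-scale corpora this quantity is what lands in the 5–15% range the text cites: long-tail facts that appear once dominate the tail of any crawl.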

prompt
What year did the Brazilian physicist Carlos Drummond win the Nobel Prize?
model output — delivered with confidence
Carlos Drummond won the Nobel Prize in Physics in 1967 for his work on quantum fluctuations in supercooled liquids.
reality
There is no Brazilian physicist by that name; Carlos Drummond de Andrade was a beloved Brazilian poet. No Nobel Prize in Physics has gone to a Brazilian.
The model confidently invents dates, fields, and work descriptions. The prompt was outside its training distribution, so interpolation over similar prefixes produced a fluent confabulation.

Why SLMs hallucinate more

Three compounding reasons:

  • Capacity. Fewer parameters mean less factual storage. Ballpark: Allen-Zhu & Li 2024 estimate ~2 bits of facts per parameter. A 3B model can store ~750 MB of facts — split across all topics, languages, and code.
  • Overtraining. SLMs are trained far past Chinchilla, which improves generalization but reduces memorization of rare facts. Paradoxically, better models know less trivia.
  • Distillation concentration. Distilled students are confident on their training distribution. That confidence transfers to out-of-distribution queries where it's misplaced.
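The capacity bullet is just arithmetic; a quick sketch under the ~2 bits/parameter assumption:

```python
def fact_storage_mb(params: float, bits_per_param: float = 2.0) -> float:
    """Ballpark factual capacity under the Allen-Zhu & Li ~2 bits/param estimate."""
    return params * bits_per_param / 8 / 1e6  # bits -> bytes -> MB

for n in (3e9, 8e9, 70e9):
    print(f"{n / 1e9:.0f}B params -> ~{fact_storage_mb(n):,.0f} MB of facts")
# 3B params -> ~750 MB of facts
```

750 MB sounds like a lot until you remember it must cover every language, every domain, and every API a user might ask about — which is exactly why the small-model answer to factual queries is retrieval, not recall.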

What actually mitigates it

You cannot eliminate hallucination. You can:

  • RAG — put the ground truth in context; the model becomes a summarizer instead of a recall engine.
  • Constrained decoding — for structured outputs, grammar-constrain generation so the model can't emit invalid fields at all.
  • Verifier loops — generate, then check with a separate model or rule (the RLVR pattern from DeepSeek-R1).
  • Abstention training — teach the model to output “I don't know” as a valid answer via preference data.
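The verifier-loop and abstention bullets compose naturally. A minimal sketch, with `generate` and `verify` as placeholder stand-ins for a real model call and a real checker — not any library's API:

```python
def generate(prompt: str, attempt: int) -> str:
    """Stand-in for an LM call; returns a candidate answer per attempt."""
    return f"candidate-{attempt}"

def verify(answer: str) -> bool:
    """Stand-in for a rule-based or model-based checker (the RLVR pattern:
    only answers that pass a programmatic verifier are accepted)."""
    return answer.endswith("2")  # placeholder rule for the sketch

def answer_with_verification(prompt: str, max_tries: int = 3) -> str:
    """Generate -> check -> retry; abstain if nothing passes the verifier."""
    for attempt in range(max_tries):
        candidate = generate(prompt, attempt)
        if verify(candidate):
            return candidate
    return "I don't know"  # abstain rather than emit an unverified claim

print(answer_with_verification("..."))  # candidate-2 passes on the third try
```

The structural point: the loop converts "never hallucinate" (impossible, per the theorem above) into "never emit an unverified claim" (achievable, at the cost of latency and occasional abstention).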