Phi-4
Phi-4 is a 14B dense model trained to show you can out-punch 70B-class competitors by spending compute on what the model learns, not just how much it learns. Synthetic textbook data is the headline move.
The bet behind Phi is that data quality beats data quantity at SLM scale. The Textbooks lesson walks through the educational-value classifier that filters web data down to pedagogically useful text, and the synthetic-textbook generation pipeline that fills the gaps where the web doesn't cover a concept well. Phi-4 is the production validation of that pipeline at 14B parameters.
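To make the filtering step concrete, here is a minimal, hypothetical sketch of classifier-driven corpus filtering. The real Phi pipeline uses a trained educational-value classifier; the toy `educational_value` scorer below (instructional-vocabulary density) is a stand-in of my own so the end-to-end logic is runnable.

```python
# Hypothetical sketch of a Phi-style educational-value filter.
# The real pipeline trains a classifier on LLM-annotated examples;
# this toy scorer only stands in for it.

def educational_value(doc: str) -> float:
    """Toy stand-in for a learned educational-value classifier.

    Scores text by the density of instructional vocabulary.
    """
    signals = {"theorem", "definition", "example", "exercise", "proof", "therefore"}
    words = doc.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,;:") in signals)
    return hits / len(words)

def filter_corpus(docs: list[str], threshold: float = 0.05) -> list[str]:
    """Keep only documents whose score clears the threshold."""
    return [d for d in docs if educational_value(d) >= threshold]

corpus = [
    "Definition: a prime has exactly two divisors. Example: 7. Proof follows.",
    "click here for the best deals on shoes today only",
]
kept = filter_corpus(corpus)  # only the first document survives
```

The shape of the pipeline is the point: score every web document, keep the high-value tail, and backfill the rest with synthetic textbooks.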
There are two Phi-4 variants worth distinguishing. Phi-4 (14B) is the flagship, trained on curated + synthetic data. Phi-4-Mini (3.8B) is a smaller, more aggressively distilled sibling. Both have reasoning-specialised siblings released in 2025: Phi-4-Mini-Reasoning (3.8B, distilled from DeepSeek-R1) and Phi-4-reasoning (14B, distilled from o3-mini). These show up in the Reasoning lesson as examples of small reasoning models produced by distillation from frontier teachers, rather than by running the full RL pipeline from scratch.
Architecturally, Phi-4 is unremarkable: dense decoder, GQA, RoPE, SwiGLU, nothing exotic. That's the point: the contribution is at the data layer, not the architecture layer. See the model museum for how Phi-4 compares to same-scale peers.
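Since GQA is the one attention detail named above, a minimal sketch may help: each group of query heads shares a single key/value head, shrinking the KV cache. The head grouping below follows the GQA scheme; the dimensions are toy values of my own, not Phi-4's actual configuration.

```python
import numpy as np

def gqa_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Minimal grouped-query attention (GQA) forward pass.

    Illustrative only: each group of n_q_heads // n_kv_heads query
    heads shares one K/V head, so K and V projections are smaller
    than Q. Dimensions here are toy, not Phi-4's real sizes.
    """
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads

    q = (x @ wq).reshape(seq, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
d_model, n_q, n_kv, seq = 64, 8, 2, 5
x = rng.normal(size=(seq, d_model))
wq = rng.normal(size=(d_model, d_model))
wk = rng.normal(size=(d_model, n_kv * (d_model // n_q)))
wv = rng.normal(size=(d_model, n_kv * (d_model // n_q)))
y = gqa_attention(x, wq, wk, wv, n_q, n_kv)
```

With 8 query heads and 2 KV heads, the KV cache is a quarter the size of full multi-head attention, which is exactly the inference-cost lever that makes GQA standard at this scale.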
- Sizes: Phi-4 (14B), Phi-4-Mini (3.8B)
- Architecture: Dense, GQA
- Training data: Synthetic + filtered web
- Context: 16K (Phi-4), 128K (Mini)
- Reasoning variants: Phi-4-Mini-Reasoning (R1-distilled), Phi-4-reasoning (o3-mini-distilled)
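The spec card above can be captured as a small, hand-written data structure, which is handy when comparing models programmatically in the model museum. The field names and `ModelSpec` type are my own; the values mirror this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical spec card; values taken from the table above."""
    name: str
    params_b: float     # parameters, billions
    context: int        # context window, tokens
    attention: str
    data: str

PHI4 = ModelSpec("Phi-4", 14.0, 16_384, "GQA", "synthetic + filtered web")
PHI4_MINI = ModelSpec("Phi-4-Mini", 3.8, 131_072, "GQA", "synthetic + filtered web")
```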
- Act III · 10 min · 40 xp · The model museum: explore every major SLM (Phi-4, Llama 3.2, Qwen3, Gemma 3, SmolLM3, BitNet) with architecture diagrams, training recipes, and benchmarks.
- Act IV · 9 min · 45 xp · The textbook hypothesis: how Microsoft Phi proved data quality beats scale, covering educational-value classifiers, synthetic textbook generation, mode collapse risks, and Phi-4 exceeding GPT-4.