Phi-4
Phi-4 is a 14B dense model trained to show you can out-punch 70B-class competitors by spending compute on what the model learns, not just how much it learns. Synthetic textbook data is the headline move.
The bet behind Phi is that data quality beats data quantity at SLM scale. The Textbooks lesson walks through the educational-value classifier that filters web data down to pedagogically useful text, and the synthetic-textbook generation pipeline that fills the gaps where the web doesn't cover a concept well. Phi-4 is the production validation of that pipeline at 14B parameters.
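To make the filtering step concrete, here is a minimal, hypothetical sketch of classifier-driven corpus filtering. The real Phi pipeline uses a trained educational-value classifier; the toy `educational_value` scorer below (instructional-vocabulary density) is a stand-in of my own so the end-to-end logic is runnable.

```python
# Hypothetical sketch of a Phi-style educational-value filter.
# The real pipeline trains a classifier on LLM-annotated examples;
# this toy scorer only stands in for it.

def educational_value(doc: str) -> float:
    """Toy stand-in for a learned educational-value classifier.

    Scores text by the density of instructional vocabulary.
    """
    signals = {"theorem", "definition", "example", "exercise", "proof", "therefore"}
    words = doc.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,;:") in signals)
    return hits / len(words)

def filter_corpus(docs: list[str], threshold: float = 0.05) -> list[str]:
    """Keep only documents whose score clears the threshold."""
    return [d for d in docs if educational_value(d) >= threshold]

corpus = [
    "Definition: a prime has exactly two divisors. Example: 7. Proof follows.",
    "click here for the best deals on shoes today only",
]
kept = filter_corpus(corpus)  # only the first document survives
```

The shape of the pipeline is the point: score every web document, keep the high-value tail, and backfill the rest with synthetic textbooks.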
There are two Phi-4 variants worth distinguishing. Phi-4 (14B) is the flagship, trained on curated + synthetic data. Phi-4-Mini (3.8B) is a smaller, more aggressively distilled sibling. Both have reasoning-specialised siblings released in 2025: Phi-4-Mini-Reasoning (3.8B, distilled from DeepSeek-R1) and Phi-4-reasoning (14B, distilled from o3-mini). These show up in the Reasoning lesson as examples of small reasoning models produced by distillation from frontier teachers, rather than by running the full RL pipeline from scratch.
Architecturally, Phi-4 is unremarkable: dense decoder, GQA, RoPE, SwiGLU, nothing exotic. That's the point: the contribution is at the data layer, not the architecture layer. See the model museum for how Phi-4 compares to same-scale peers.
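Since GQA is the one attention detail named above, a minimal sketch may help: each group of query heads shares a single key/value head, shrinking the KV cache. The head grouping below follows the GQA scheme; the dimensions are toy values of my own, not Phi-4's actual configuration.

```python
import numpy as np

def gqa_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Minimal grouped-query attention (GQA) forward pass.

    Illustrative only: each group of n_q_heads // n_kv_heads query
    heads shares one K/V head, so K and V projections are smaller
    than Q. Dimensions here are toy, not Phi-4's real sizes.
    """
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads

    q = (x @ wq).reshape(seq, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
d_model, n_q, n_kv, seq = 64, 8, 2, 5
x = rng.normal(size=(seq, d_model))
wq = rng.normal(size=(d_model, d_model))
wk = rng.normal(size=(d_model, n_kv * (d_model // n_q)))
wv = rng.normal(size=(d_model, n_kv * (d_model // n_q)))
y = gqa_attention(x, wq, wk, wv, n_q, n_kv)
```

With 8 query heads and 2 KV heads, the KV cache is a quarter the size of full multi-head attention, which is exactly the inference-cost lever that makes GQA standard at this scale.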
- Sizes: Phi-4 (14B), Phi-4-Mini (3.8B)
- Architecture: Dense, GQA
- Training data: Synthetic + filtered web
- Context: 16K (Phi-4), 128K (Mini)
- Reasoning variants: Phi-4-Mini-Reasoning (R1-distilled), Phi-4-reasoning (o3-mini-distilled)
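The spec card above can be captured as a small, hand-written data structure, which is handy when comparing models programmatically in the model museum. The field names and `ModelSpec` type are my own; the values mirror this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical spec card; values taken from the table above."""
    name: str
    params_b: float     # parameters, billions
    context: int        # context window, tokens
    attention: str
    data: str

PHI4 = ModelSpec("Phi-4", 14.0, 16_384, "GQA", "synthetic + filtered web")
PHI4_MINI = ModelSpec("Phi-4-Mini", 3.8, 131_072, "GQA", "synthetic + filtered web")
```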
- Act III · 10 min · 40 xp · The model museum: explore every major SLM (Phi-4, Llama 3.2, Qwen3, Gemma 3, SmolLM3, BitNet) with architecture diagrams, training recipes, and benchmarks.
- Act IV · 9 min · 45 xp · The textbook hypothesis: how Microsoft Phi proved data quality beats scale, covering educational-value classifiers, synthetic textbook generation, mode collapse risks, and Phi-4 exceeding GPT-4.