Llama 3.2
Llama 3.2 is where Meta entered the SLM conversation seriously: 1B and 3B text models built for edge inference, with 11B and 90B vision models bolted on top.
For the SLM audience, Llama 3.2 matters mostly for the text models — 1B and 3B dense decoders, both using GQA, both quantisable to run on a phone. They aren't best-in-class on every eval, but they're the canonical architecture-transferable small models: the same recipe that produced them scales up to Llama 3.1 405B, so lessons learned at 1B transfer upward cleanly.
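One concrete reason GQA matters for on-device use is KV-cache memory. A back-of-the-envelope sketch (the hyperparameters below are illustrative assumptions for a 3B-class model, not read from the released Llama 3.2 config):

```python
# Rough KV-cache arithmetic: GQA shrinks the cache by the ratio of
# query heads to KV heads. All numbers here are illustrative.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # factor of 2 covers both K and V; fp16 elements by default
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 3B-class shape: 28 layers, head_dim 128, 8K tokens of context.
mha = kv_cache_bytes(28, n_kv_heads=24, head_dim=128, seq_len=8192)  # full MHA
gqa = kv_cache_bytes(28, n_kv_heads=8, head_dim=128, seq_len=8192)   # GQA
print(f"MHA: {mha / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB")
# MHA: 2688 MiB, GQA: 896 MiB — a 3x reduction at the same context length
```

At phone-scale RAM budgets, that 3x on the cache is often the difference between a usable long context and an OOM.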
What "edge inference" means in the 2025/2026 landscape is covered across several Microscale lessons. AWQ vs GPTQ shows how to quantise a 3B model to 4-bit without significant quality loss. BitNet 1.58 shows the aggressive ternary weight route. Ollama and MLX-LM cover the serving runtimes that Llama 3.2 is specifically optimised for.
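To make the quantisation side concrete, here is the naive per-group round-to-nearest 4-bit baseline that both AWQ and GPTQ improve on — a minimal sketch, not either library's actual API:

```python
import numpy as np

# Per-group round-to-nearest 4-bit quantise/dequantise. Each group of 128
# weights shares one fp scale; values are rounded to the int4 range.
def quant_dequant_4bit(w, group_size=128):
    groups = w.reshape(-1, group_size)          # length must divide evenly
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale = np.maximum(scale, 1e-8)             # guard against all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7)
    return (q * scale).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)    # stand-in for a weight row
w_hat = quant_dequant_4bit(w)
print(f"mean abs error: {np.abs(w - w_hat).mean():.4f}")
```

AWQ and GPTQ both start from this idea and reduce the error it leaves behind — AWQ by rescaling salient channels before rounding, GPTQ by correcting later weights for earlier rounding error.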
Architecturally the 1B and 3B are unsurprising: dense, GQA, RoPE, RMSNorm, SwiGLU. The notable design choice is the set of available sizes. Meta shipped a 1B so the "what's the smallest useful model" conversation has a concrete answer that isn't a research artefact. The model museum lets you compare Llama 3.2's 1B/3B against same-scale peers like Gemma 3 and SmolLM3 directly.
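The GQA piece of that recipe fits in a few lines: each K/V head is shared by a group of query heads, so K and V are projected to fewer heads and then repeated. A minimal sketch with toy shapes (head counts and dimensions here are illustrative, not Llama 3.2's):

```python
import numpy as np

# Grouped-query attention: n_heads query heads share n_kv_heads K/V heads.
def gqa_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=2):
    seq, d = x.shape
    hd = d // n_heads
    q = (x @ wq).reshape(seq, n_heads, hd)
    k = (x @ wk).reshape(seq, n_kv_heads, hd)   # fewer KV heads projected
    v = (x @ wv).reshape(seq, n_kv_heads, hd)
    rep = n_heads // n_kv_heads
    k = np.repeat(k, rep, axis=1)               # broadcast each KV head
    v = np.repeat(v, rep, axis=1)               # across its query group
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(hd)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)    # softmax over keys
    out = np.einsum("hqk,khd->qhd", attn, v)
    return out.reshape(seq, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 64)).astype(np.float32)
wq = rng.normal(size=(64, 64)).astype(np.float32)
wk = rng.normal(size=(64, 16)).astype(np.float32)  # 2 KV heads x head_dim 8
wv = rng.normal(size=(64, 16)).astype(np.float32)
out = gqa_attention(x, wq, wk, wv)
print(out.shape)  # (10, 64)
```

Only the smaller K/V projections need caching at inference time, which is where the on-device memory win comes from.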
- Text sizes: 1B, 3B
- Vision sizes: 11B, 90B
- Architecture: dense, GQA
- Context: 128K
- Target: on-device and edge inference