
Llama 3.2

by Meta · Released September 2024 · Last reviewed Apr 2026

Llama 3.2 is where Meta entered the SLM conversation seriously. 1B and 3B text models built for edge inference; 11B and 90B vision models bolted on top.

what's new in this one

For the SLM audience, Llama 3.2 matters mostly for the text models — 1B and 3B dense decoders, both using GQA, both quantisable to run on a phone. They aren't best-in-class on every eval, but they are the canonical architecture-transferable small models: the same recipe that produced them scales up to Llama 3.1 405B, so lessons learned at 1B transfer upward cleanly.
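A back-of-envelope parameter count makes the "1B" label concrete. The config values below are assumptions matching the published 1B model card (hidden 2048, 16 layers, 32 query heads, 8 KV heads, FFN 8192, vocab ~128K, tied embeddings) — check the model's config.json before relying on them:

```python
# Rough parameter count for a Llama-3.2-style 1B dense decoder.
# All dimensions are assumed values, not read from the released config.

hidden = 2048
layers = 16
q_heads, kv_heads = 32, 8
head_dim = hidden // q_heads          # 64
ffn = 8192
vocab = 128_256

embed = vocab * hidden                # tied with the LM head, so counted once

attn = (hidden * q_heads * head_dim          # Q projection
        + 2 * hidden * kv_heads * head_dim   # K and V (GQA: fewer KV heads)
        + q_heads * head_dim * hidden)       # output projection

mlp = 3 * hidden * ffn                # SwiGLU: gate, up, and down projections

per_layer = attn + mlp                # RMSNorm params are negligible
total = embed + layers * per_layer

print(f"~{total / 1e9:.2f}B parameters")
```

The sum lands near 1.24B — close enough to the advertised size to confirm the shape of the architecture, with the remainder in norms and rounding.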

What "edge inference" means in the 2025/2026 landscape is covered across several Microscale lessons. AWQ vs GPTQ shows how to quantise a 3B model to 4-bit without significant quality loss. BitNet 1.58 shows the aggressive ternary weight route. Ollama and MLX-LM cover the serving runtimes that Llama 3.2 is specifically optimised for.
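The arithmetic behind 4-bit quantisation is worth having to hand. A minimal sketch, using an assumed 3.2B parameter count for the 3B model and ignoring activation memory and the small per-group scale overhead that schemes like AWQ and GPTQ add:

```python
# Weight-memory footprint of a 3B model at two precisions.
# params is an assumed figure for the 3B text model, not an official count.

params = 3.2e9

def gib(n_bytes):
    return n_bytes / 2**30

fp16 = params * 2      # 2 bytes per weight
int4 = params * 0.5    # 4 bits per weight, before scale/zero-point overhead

print(f"fp16: {gib(fp16):.1f} GiB")   # tight on an 8 GB phone
print(f"int4: {gib(int4):.1f} GiB")   # comfortably on-device
```

Roughly 6 GiB at fp16 versus 1.5 GiB at 4-bit — the difference between "barely fits" and "leaves room for the KV cache" on a phone-class device.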

Architecturally the 1B and 3B are unsurprising: dense, GQA, RoPE, RMSNorm, SwiGLU. The interesting design choice is the set of available sizes. Meta shipped a 1B so the "what's the smallest useful model" conversation has a concrete answer that isn't a research artefact. The model museum lets you compare Llama 3.2's 1B/3B against same-scale peers like Gemma 3 and SmolLM3 directly.
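GQA is the one architectural choice above that directly buys edge headroom: KV-cache size scales with the number of KV heads, not query heads. A sketch of the saving at the full 128K context, using assumed 1B-model dimensions (16 layers, head_dim 64, 32 query heads vs 8 KV heads):

```python
# KV-cache footprint at 128K context: full MHA vs GQA.
# Dimensions are assumptions for the 1B model, not read from its config.

layers, head_dim = 16, 64
ctx = 128_000
bytes_per_elem = 2          # fp16 cache

def kv_cache_bytes(kv_heads):
    # Two cached tensors (K and V) per layer; one head_dim vector
    # per KV head per token.
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem

mha = kv_cache_bytes(32)    # hypothetical full multi-head attention
gqa = kv_cache_bytes(8)     # grouped-query attention as shipped

print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB")
```

With 8 KV heads instead of 32 the cache is 4x smaller — roughly 4 GiB instead of 16 GiB at full context under these assumptions, which is the difference between feasible and impossible on a phone.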

the shape in numbers
Text sizes: 1B, 3B
Vision sizes: 11B, 90B
Architecture: dense, GQA
Context: 128K
Target: on-device and edge inference
read alongside