Qwen3
Qwen3 ships at every scale: 0.6B, 1.7B, 4B, 8B, 14B, 32B dense, plus a 30B-A3B and a 235B-A22B MoE. The pick-your-size family for open-weights work in 2026.
Qwen3's design choice is breadth — every compute budget is covered. A 0.6B dense model fits on a laptop; a 235B-A22B MoE runs on a single 8×H100 node. The architecture stays consistent across sizes (RoPE, GQA, SwiGLU, RMSNorm) so a recipe that works on 4B transfers to 32B without surgery.
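That cross-size consistency is easiest to see as a config: every member of the family shares the same skeleton (RoPE positions, GQA attention, SwiGLU MLP, RMSNorm) and only the dimensions change. A minimal sketch, with illustrative dimensions that are assumptions rather than official hyperparameters:

```python
from dataclasses import dataclass

@dataclass
class QwenLikeConfig:
    # architectural choices shared across the whole family
    pos_encoding: str = "rope"
    attention: str = "gqa"
    mlp: str = "swiglu"
    norm: str = "rmsnorm"
    # the only knobs that move between sizes (values here are illustrative)
    hidden_size: int = 2560
    num_layers: int = 36
    num_heads: int = 32
    num_kv_heads: int = 8

# hypothetical 4B and 32B presets: same skeleton, different dims
qwen_4b = QwenLikeConfig(hidden_size=2560, num_layers=36)
qwen_32b = QwenLikeConfig(hidden_size=5120, num_layers=64, num_heads=40)

# a training or inference recipe keyed on the shared fields transfers as-is
assert qwen_4b.mlp == qwen_32b.mlp == "swiglu"
```

This is why a recipe tuned on the 4B model transfers upward: anything that depends only on the shared fields never has to change.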
The MoE variants use a familiar shape: top-8 routing over 128 experts, plus shared experts that carry universal features. The MoE lesson walks through the routing math, which applies here identically; only the expert count differs from DeepSeek-V3's 256 or Kimi K2's 384.
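The routing step itself is small: softmax the router logits over all 128 experts, keep the top 8, and renormalize those probabilities into mixing weights. A minimal NumPy sketch (the function name and shapes are illustrative, not Qwen3's actual code):

```python
import numpy as np

def topk_route(router_logits, k=8):
    """Top-k expert routing in the shape Qwen3's MoE uses (top-8 of 128).

    Softmax over all expert logits, keep the k largest, then renormalize
    the kept probabilities so the selected experts' outputs mix to weight 1.
    """
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[-k:][::-1]      # indices of the k best experts
    gates = probs[top] / probs[top].sum()   # renormalized mixing weights
    return top, gates

# one token's router logits over 128 experts
logits = np.random.default_rng(0).normal(size=128)
experts, gates = topk_route(logits, k=8)
assert len(experts) == 8
assert abs(gates.sum() - 1.0) < 1e-9
```

Only the 8 selected experts run their FFNs for this token, which is how a 235B-parameter model activates just 22B per token.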
Qwen3-Next is the architecturally interesting variant — it's the first production model to ship MTP with depth ≥ 2, pushing the speculative decoding acceptance rate above what DeepSeek-V3's depth-1 module achieves. The MTP lesson shows why depth-2 is non-obvious (the second module's acceptance is conditional on the first, multiplicatively reducing the expected speedup). Qwen3-Next also extends context to 1M tokens — far past the 128K ceiling where most 2025 models stopped.
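The conditional-acceptance arithmetic is worth making concrete. Each chained MTP module only contributes a token if every module before it was accepted, so the survival probability multiplies down the chain. A small sketch with illustrative acceptance rates (the 0.8 figures are assumptions for the example, not measured numbers):

```python
def expected_tokens_per_step(acceptance):
    """Expected tokens committed per forward pass with chained MTP modules.

    acceptance[i] is the probability that draft token i+1 is accepted
    *given* all earlier drafts were accepted. The base model's own token
    always lands; each deeper module contributes only if the whole chain
    before it survived, so the terms multiply.
    """
    expected, survive = 1.0, 1.0
    for a in acceptance:
        survive *= a
        expected += survive
    return expected

# depth-1 (DeepSeek-V3 style) vs depth-2, at an illustrative 80% acceptance
depth1 = expected_tokens_per_step([0.8])        # 1 + 0.8        = 1.8
depth2 = expected_tokens_per_step([0.8, 0.8])   # 1 + 0.8 + 0.64 = 2.44
```

The second module adds 0.64 expected tokens, not 0.8 — the multiplicative discount is why depth-2 only pays off when per-module acceptance stays high.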
| Spec | Value |
| --- | --- |
| Sizes (dense) | 0.6B, 1.7B, 4B, 8B, 14B, 32B |
| Sizes (MoE) | 30B-A3B, 235B-A22B |
| Routing (MoE) | top-8 of 128 routed |
| Context | 128K+ (Qwen3-Next: 1M) |
| Notable | Strong multilingual + Qwen3-Next MTP |