
GPT-OSS

by OpenAI · Released August 2025 · Last reviewed April 2026

GPT-OSS is OpenAI's first open-weights release in over six years: two MoE models, 20B and 120B total parameters. The architecture confirms what outside teams had reverse-engineered about the closed GPT series.

what's new in this one

GPT-OSS is the first time since GPT-2 in 2019 that OpenAI has released model weights openly. The architecture, as it turns out, is conventional: MoE for the MLP, GQA for attention, SwiGLU activations, RoPE positions. Nothing exotic. The contribution is the release, not the architecture — but that in itself is a signal: the frontier open-weights ecosystem (DeepSeek, Kimi, Qwen, Llama 4) pushed closed labs to ship something to stay relevant in the open conversation.
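To ground the "conventional" checklist, here is a minimal PyTorch sketch of a SwiGLU feed-forward block of the kind that sits inside each MoE expert. The dimensions are placeholders for illustration, not GPT-OSS's published sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One MoE expert: a SwiGLU feed-forward block (SiLU-gated up-projection, then down-projection)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)   # gating projection
        self.w_up = nn.Linear(d_model, d_ff, bias=False)      # value projection
        self.w_down = nn.Linear(d_ff, d_model, bias=False)    # back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(x @ W_gate) * (x @ W_up), projected back down to d_model
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Illustrative dimensions only -- not GPT-OSS's actual hidden sizes.
x = torch.randn(2, 16, 1024)                      # (batch, seq, d_model)
expert = SwiGLUExpert(d_model=1024, d_ff=4096)
print(expert(x).shape)                            # torch.Size([2, 16, 1024])
```

In the MoE layers a router picks a handful of such experts per token; on the attention side, GQA shares each key/value head across several query heads and RoPE rotates the query/key vectors by position.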

Both variants are MoE: 20B total with roughly 3.6B active per token, 120B total with roughly 5B active. Those active counts sit far below the ~37B DeepSeek-V3 or ~32B Kimi K2 activate per token, and the 120B's active-to-total ratio is tighter than DeepSeek-V3's, which suggests a small top-k relative to the expert count, though the exact routing configuration wasn't detailed at release time. The MoE lesson covers the routing and expert-count math; plug in GPT-OSS's published numbers to see where it lands on the active-to-total ratio curve.
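A quick way to see that active-to-total arithmetic: the expert/shared parameter split, expert count, and top-k below are made-up stand-ins to show the shape of the curve, not GPT-OSS's published configuration.

```python
def active_fraction(total_params_b: float, expert_params_b: float,
                    num_experts: int, top_k: int) -> float:
    """Fraction of total parameters that fire per token in a top-k routed MoE.

    total_params_b  -- total parameters, in billions (experts + everything shared)
    expert_params_b -- parameters living inside the routed experts, in billions
    num_experts     -- experts per MoE layer
    top_k           -- experts selected per token
    """
    shared = total_params_b - expert_params_b             # attention, embeddings, router, ...
    active = shared + expert_params_b * (top_k / num_experts)
    return active / total_params_b

# Hypothetical 120B split, chosen only to illustrate how top-k moves the ratio.
for k in (1, 2, 4, 8):
    frac = active_fraction(120, expert_params_b=100, num_experts=128, top_k=k)
    print(f"top-{k}: {frac:.1%} active")
```

Plugging in the published active counts (~3.6B and ~5B) instead lets you back out roughly what top-k-to-expert-count ratio would reproduce them.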

If you're choosing an open-weights model for inference today, GPT-OSS-20B sits in a useful spot: small enough to run on a single consumer GPU with AWQ quantisation, large enough to compete with Llama 3.1 70B-class instruction quality on many evals. The vLLM lesson covers the production deployment path. For pure architecture study, DeepSeek-V3 is still the richer target — GPT-OSS confirms the recipe but doesn't extend it.
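As a starting point for that deployment path, here is a minimal vLLM offline-inference sketch. The model id is the Hugging Face repo name used at release; the sampling settings are placeholders.

```python
from vllm import LLM, SamplingParams

# Load GPT-OSS-20B for offline inference. If VRAM is tight, point `model` at an
# AWQ-quantised checkpoint and pass quantization="awq" (assumes a vLLM build that supports it).
llm = LLM(model="openai/gpt-oss-20b", max_model_len=8192)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarise mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```

In recent vLLM releases, `vllm serve openai/gpt-oss-20b` exposes the same engine behind an OpenAI-compatible HTTP endpoint.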

the shape in numbers
Sizes: 20B, 120B (MoE)
Active per token: ~3.6B (20B) / ~5B (120B)
Architecture: MoE, GQA, SwiGLU, RoPE
Context: 128K
License: Apache 2.0
read alongside