It sounds like you've worked near these architectures long enough to know the names — MoE, MLA, RoPE, FlashAttention, LoRA — but not to explain why each one exists. This is the layer underneath.
54 lessons · 13 labs · every claim ties to a shipped model or a paper you can trace
Click a region to enter. Each lesson is a playable model, not a lecture — you'll earn the mathematics the same way you earn the badge.
Every lesson embeds a playable version of whatever it teaches. Click the other label above to flip between the two views.
read the lesson · Tokens and probabilities
You already use language models. Maybe you fine-tune them. Maybe you serve them. Maybe you read release notes and still skip the architectural footnotes.
Microscale is for the point where that stops being enough — when “MoE,” “KV cache,” “RoPE scaling,” “FlashAttention,” and “LoRA rank” need to become mechanisms you can reason about, not terms you recognize.
Bring Python, tensors, linear algebra, and patience. If you want prompt tips, you want a different site.
A different way to learn sits next to the reading path. Twelve specimens wait on the workbench — all 448 attention heads of a 600M model classifying themselves into previous-token and induction patterns, a 10M transformer descending from noise to coherent English in twenty minutes of consumer GPU time, a 2 MB LoRA adapter that reshapes a model's voice on twenty cooking examples, your own GPU's bandwidth plotted on a roofline against your own model's arithmetic intensity.
Every one produces a number or a file you keep. None of them requires a datacentre.
This journal is organised as a slow path, not a dense reference. There is a canonical order through the regions, but you are free to wander. Nothing is locked; progress rings appear only to help you find your way back.