the aha moment
Implement RMSNorm, RoPE, Grouped-Query Attention, and SwiGLU from scratch in PyTorch — no `nn.TransformerEncoderLayer`, no HuggingFace. Load the real weights from Qwen3-0.6B's layer 0 into your version, and wait for `torch.allclose(yours, theirs, atol=1e-5)` to return True. The hardest lab. The most satisfying lab.
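To give a flavor of the from-scratch style the lab expects, here is a minimal RMSNorm sketch in plain PyTorch. This is an illustrative sketch, not the lab's reference solution: the `eps` default and the normalize-then-scale ordering are assumptions you should check against Qwen3-0.6B's actual config before running the `torch.allclose` comparison.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm sketch. eps default is an assumption; verify
    against the model config before comparing against real weights."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-channel scale
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the hidden dim, then scale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```

The same pattern (a small `nn.Module`, a handful of tensor ops, no library shortcuts) carries over to RoPE, grouped-query attention, and SwiGLU in the lab.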
the facts
- Time: 90–120 min
- Hardware: CPU · Mac · Colab
- Act: II · Inside the Machine
- Status: Live
- Artifact: A standalone `transformer_block.py` reference implementation that loads any Qwen3 layer.
run it locally
Clone the labs repo and run this lab as a script or open it as a notebook:
```bash
git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto   # auto-detects CPU / CUDA / Mac
just run 03       # or: jupyter lab labs/03-build-a-transformer/lab.py
```
Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
read alongside