the aha moment
Implement RMSNorm, RoPE, Grouped-Query Attention, and SwiGLU from scratch in PyTorch — no `nn.TransformerEncoderLayer`, no HuggingFace. Load the real weights from Qwen3-0.6B's layer 0 into your version, and wait for `torch.allclose(yours, theirs, atol=1e-5)` to return True. The hardest lab. The most satisfying lab.
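To give a flavor of the from-scratch style the lab expects, here is a minimal RMSNorm sketch in plain PyTorch. This is an illustrative sketch, not the lab's reference solution: the `eps` default and the normalize-then-scale ordering are assumptions you should check against Qwen3-0.6B's actual config before running the `torch.allclose` comparison.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm sketch. eps default is an assumption; verify
    against the model config before comparing against real weights."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-channel scale
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the hidden dim, then scale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```

The same pattern (a small `nn.Module`, a handful of tensor ops, no library shortcuts) carries over to RoPE, grouped-query attention, and SwiGLU in the lab.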
the facts
- Time: 90–120 min
- Hardware: CPU · Mac · Colab
- Act: II · Inside the Machine
- Status: Live
- Artifact: A standalone `transformer_block.py` reference implementation that loads any Qwen3 layer.
run it locally
Clone the labs repo and run this lab as a script or open it as a notebook:
```bash
git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto   # auto-detects CPU / CUDA / Mac
just run 03       # or: jupyter lab labs/03-build-a-transformer/lab.py
```
Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
read alongside