the aha moment
Take a single 3072×3072 weight tensor from Qwen3-0.6B and implement three quantisation schemes from scratch — naive 4-bit uniform, NF4 quantile-binned, and K-quant Q4_K_M with sub-block scales. Measure the L2 reconstruction error for each. Watch naive lose 3× to NF4, and NF4 lose 2× to Q4_K_M. The hierarchy of quantisation tricks becomes a chart you built yourself.
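A minimal sketch of the first two schemes, to show the shape of the comparison before you open the lab. This is not the lab's code: the block size, the smaller stand-in tensor, and the helper names are all illustrative, and the codebook here is an empirical quantile table standing in for NF4's fixed Gaussian-quantile table.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)).astype(np.float32)  # stand-in for the 3072x3072 tensor
BLOCK = 64  # illustrative block size

def dequant_uniform4(block):
    # naive 4-bit uniform: 16 evenly spaced levels, per-block absmax scale
    scale = max(float(np.abs(block).max()), 1e-12) / 7.0
    return np.clip(np.round(block / scale), -8, 7) * scale

# 16-level quantile codebook in [-1, 1]: quantiles of a Gaussian sample,
# approximating the published NF4 table
probs = (np.arange(16) + 0.5) / 16
codebook = np.quantile(rng.normal(size=100_000), probs)
codebook /= np.abs(codebook).max()

def dequant_nf4(block):
    # NF4-style: snap each normalised value to its nearest codebook level
    scale = max(float(np.abs(block).max()), 1e-12)
    idx = np.argmin(np.abs(block[..., None] / scale - codebook), axis=-1)
    return codebook[idx] * scale

def l2_error(dequant, w):
    # quantise-dequantise block by block, then measure reconstruction error
    blocks = w.reshape(-1, BLOCK)
    deq = np.stack([dequant(b) for b in blocks]).reshape(w.shape)
    return float(np.linalg.norm(w - deq))

err_u = l2_error(dequant_uniform4, w)
err_n = l2_error(dequant_nf4, w)
print(f"uniform4 L2: {err_u:.2f}  nf4 L2: {err_n:.2f}  ratio: {err_u / err_n:.2f}")
```

Because weight tensors are roughly Gaussian, the quantile codebook spends its 16 levels where the mass is, so its L2 error comes out below uniform's. Q4_K_M adds a further layer (sub-block scales quantised themselves), which the lab builds on top of this.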
the facts
- Time: 90 min
- Hardware: CPU · Colab
- Act: VII · Packing for Travel
- Status: Live
- Artifact: Three quantised tensor files + an error-comparison chart.
run it locally
Clone the labs repo and run this lab as a script or open it as a notebook:
```shell
git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto   # auto-detects CPU / CUDA / Mac
just run 09       # or: jupyter lab labs/09-quantize-it-yourself/lab.py
```
Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
read alongside