the aha moment
Take a single 3072×3072 weight tensor from Qwen3-0.6B and implement three quantisation schemes from scratch — naive 4-bit uniform, NF4 quantile-binned, and K-quant Q4_K_M with sub-block scales. Measure the L2 reconstruction error for each. Watch naive lose 3× to NF4, and NF4 lose 2× to Q4_K_M. The hierarchy of quantisation tricks becomes a chart you built yourself.
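A minimal sketch of the first two schemes, to show the shape of the comparison before you open the lab. This is not the lab's code: the block size, the smaller stand-in tensor, and the helper names are all illustrative, and the codebook here is an empirical quantile table standing in for NF4's fixed Gaussian-quantile table.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)).astype(np.float32)  # stand-in for the 3072x3072 tensor
BLOCK = 64  # illustrative block size

def dequant_uniform4(block):
    # naive 4-bit uniform: 16 evenly spaced levels, per-block absmax scale
    scale = max(float(np.abs(block).max()), 1e-12) / 7.0
    return np.clip(np.round(block / scale), -8, 7) * scale

# 16-level quantile codebook in [-1, 1]: quantiles of a Gaussian sample,
# approximating the published NF4 table
probs = (np.arange(16) + 0.5) / 16
codebook = np.quantile(rng.normal(size=100_000), probs)
codebook /= np.abs(codebook).max()

def dequant_nf4(block):
    # NF4-style: snap each normalised value to its nearest codebook level
    scale = max(float(np.abs(block).max()), 1e-12)
    idx = np.argmin(np.abs(block[..., None] / scale - codebook), axis=-1)
    return codebook[idx] * scale

def l2_error(dequant, w):
    # quantise-dequantise block by block, then measure reconstruction error
    blocks = w.reshape(-1, BLOCK)
    deq = np.stack([dequant(b) for b in blocks]).reshape(w.shape)
    return float(np.linalg.norm(w - deq))

err_u = l2_error(dequant_uniform4, w)
err_n = l2_error(dequant_nf4, w)
print(f"uniform4 L2: {err_u:.2f}  nf4 L2: {err_n:.2f}  ratio: {err_u / err_n:.2f}")
```

Because weight tensors are roughly Gaussian, the quantile codebook spends its 16 levels where the mass is, so its L2 error comes out below uniform's. Q4_K_M adds a further layer (sub-block scales quantised themselves), which the lab builds on top of this.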
the facts
- Time: 90 min
- Hardware: CPU · Colab
- Act: VII · Packing for Travel
- Status: Live
- Artifact: Three quantised tensor files + an error-comparison chart.
run it locally
Clone the labs repo and run this lab as a script or open it as a notebook:
```shell
git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto   # auto-detects CPU / CUDA / Mac
just run 09       # or: jupyter lab labs/09-quantize-it-yourself/lab.py
```
Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
read alongside