the aha moment
Build 20 preference pairs for a narrow task (a chosen vs a rejected response per prompt), run TRL's DPOTrainer on Qwen3-0.6B for 100 steps, and watch the model's behaviour shift from generic chatbot to something that specifically matches your chosen examples. Alignment stops being abstract and becomes a trained adapter you can A/B test.
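A preference pair is just a prompt plus the response you prefer and one you don't. A minimal sketch of assembling the dataset in the `prompt`/`chosen`/`rejected` format that TRL's DPOTrainer consumes (the example prompts below are placeholders, not from the lab):

```python
import json

# One preference pair: a prompt, the response you prefer, and one you reject.
# TRL's DPO dataset format uses exactly these three string fields.
def make_pair(prompt: str, chosen: str, rejected: str) -> dict:
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# Placeholder examples -- in the lab you write ~20 of these for YOUR narrow task.
pairs = [
    make_pair(
        "Summarize this changelog entry in one line: ...",
        "Fixed a race condition in the cache eviction path.",   # terse, specific
        "Great question! There are many things to consider...", # generic chatbot
    ),
    make_pair(
        "Name this function: def f(xs): return [x*x for x in xs]",
        "square_all",
        "Sure! Here are some thoughts on naming functions in general...",
    ),
]

# Save as JSONL; load_dataset("json", data_files="pairs.jsonl") reads it back.
with open("pairs.jsonl", "w") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")
```

From there, training is a handful of lines (untested sketch, assuming `trl`, `peft`, and `transformers` are installed): load `Qwen/Qwen3-0.6B`, pass a LoRA `peft_config` and the JSONL dataset to `DPOTrainer` with `max_steps=100`, then save the adapter.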
the facts
- Time: 90 min
- Hardware: GPU · Mac · Colab
- Act: VI · Making It Yours
- Status: Live
- Artifact: a DPO-aligned LoRA adapter + a before/after comparison report
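The comparison-report artifact can be as simple as a markdown table of responses. A hypothetical helper (not from the lab code) that assembles it, assuming you have already collected responses from the base model and the DPO adapter for the same prompts:

```python
def comparison_report(prompts: list[str], before: dict, after: dict) -> str:
    """Build a markdown before/after report.

    before/after map each prompt to the base-model and DPO-adapter response.
    """
    lines = ["# Before/after: base model vs DPO adapter", ""]
    for i, p in enumerate(prompts, 1):
        lines += [
            f"## Prompt {i}",
            f"> {p}",
            "",
            f"**Base:** {before[p]}",
            "",
            f"**DPO:** {after[p]}",
            "",
        ]
    return "\n".join(lines)

# Toy usage with one prompt and canned responses:
prompts = ["Summarize: ..."]
report = comparison_report(
    prompts,
    before={"Summarize: ...": "Great question! Let me explain at length..."},
    after={"Summarize: ...": "Fixed cache eviction race."},
)
print(report)
```

Reading the two columns side by side is the fastest way to see whether the adapter actually moved toward your chosen examples or just got terser everywhere.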
run it locally
Clone the labs repo and run this lab as a script or open it as a notebook:
```shell
git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto   # auto-detects CPU / CUDA / Mac
just run 08       # or: jupyter lab labs/08-dpo-alignment/lab.py
```
Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
read alongside