PEFT, preference optimization, and specialization recipes
LoRA as a decomposition you can watch form. QLoRA as a better bit-grid. DPO as a derivation that builds in front of you. GRPO as a rollout dance. Then three end-to-end recipes: a tool-calling SLM, a domain expert, and a personal assistant.