Microscale: a field journal for small language models

Act VI · Region 06

Making It Yours

PEFT, preference optimization, and specialization recipes

LoRA as a decomposition you can watch form. QLoRA as a better bit-grid. DPO as a derivation that builds in front of you. GRPO as a rollout dance. Then three end-to-end recipes: tool-calling SLM, domain expert, personal assistant.

badge · Distillation Engineer

1 · LoRA, visualized · 11 min · 55 xp
How LoRA freezes the pretrained weights and trains only a low-rank update BA. An interactive rank slider shows how adapter capacity reshapes the model.

2 · QLoRA and the NF4 grid · 10 min · 50 xp
How QLoRA combines 4-bit NormalFloat quantization with LoRA adapters: the NF4 data type, double quantization, and paged optimizers.

3 · DPO as KL-constrained optimum · 12 min · 60 xp
From the Bradley-Terry preference model to the KL-constrained optimum: a visual derivation of the DPO loss function.

4 · GRPO and RLVR · 11 min · 55 xp
Group-relative advantages without a critic network: the RL algorithm behind DeepSeek-R1 and reasoning SLMs, visually explained.

5 · Teaching an SLM to call tools · 14 min · 55 xp
Why base SLMs fail at JSON tool calls, and how SFT plus DPO against bad calls gets xLAM-style models from 10% to 79% on BFCL.

6 · Three recipes · 12 min · 60 xp
Three end-to-end fine-tuning recipes: an xLAM-style tool-calling SLM, a domain specialist, and a personal assistant with your voice.

7 · Fine-tuning frameworks · 14 min · 60 xp
Unsloth, Axolotl, LLaMA-Factory, TRL, torchtune, and MLX-LM compared, with benchmarks, stack diagrams, and a decision tree to pick yours.

8 · Model merging: task vectors, TIES, DARE · 14 min · 55 xp
Combine multiple fine-tunes without retraining: task vectors as composable deltas, TIES's trim-and-elect, and DARE's random drop-and-rescale.
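
As a small preview of the last lesson, the idea of task vectors as composable deltas, along with DARE's drop-and-rescale step, can be sketched in plain Python. This is a toy illustration, not code from the lessons: the "weights" are short lists of floats, the function names `task_vector`, `dare`, and `merge` are hypothetical, and TIES's trim-and-elect step is deliberately left out.

```python
import random


def task_vector(base, tuned):
    """Task vector: element-wise difference between fine-tuned and base weights."""
    return [t - b for b, t in zip(base, tuned)]


def dare(delta, drop_prob, rng):
    """DARE-style sparsification (toy version).

    Each entry is dropped (set to 0.0) with probability drop_prob;
    surviving entries are rescaled by 1 / (1 - drop_prob).
    """
    keep = 1.0 - drop_prob
    return [d / keep if rng.random() < keep else 0.0 for d in delta]


def merge(base, deltas, scale=1.0):
    """Add the scaled sum of several task vectors back onto the base weights."""
    return [b + scale * sum(ds) for b, ds in zip(base, zip(*deltas))]


# Toy "models": one base and two fine-tunes that changed different weights.
base = [0.0, 1.0, -1.0, 0.5]
tuned_a = [0.2, 1.0, -1.4, 0.5]
tuned_b = [0.0, 1.3, -1.0, 0.1]

tau_a = task_vector(base, tuned_a)
tau_b = task_vector(base, tuned_b)

# Plain task-vector merge: both sets of edits land on the same base.
merged = merge(base, [tau_a, tau_b])

# DARE applied to one task vector before merging (seeded for repeatability).
rng = random.Random(0)
sparse_a = dare(tau_a, drop_prob=0.5, rng=rng)
merged_dare = merge(base, [sparse_a, tau_b])
```

Because the two toy fine-tunes touch disjoint weights, the plain merge simply carries both sets of changes. Real fine-tunes overlap and conflict, which is the problem trim-and-elect schemes such as TIES are meant to address.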