Training and Alignment¶
How models go from random weights to following instructions. Pre-training data pipelines, distributed training at scale, quantization, supervised fine-tuning, reinforcement learning from human feedback, and parameter-efficient adaptation.
Goals¶
After completing this section you will be able to:
- Design a data pipeline for pre-training including deduplication and quality filtering
- Calculate memory budgets for distributed training and choose the right parallelism strategy
- Explain the three-stage RLHF pipeline and contrast it with DPO
- Implement LoRA and QLoRA fine-tuning and explain why low-rank adaptation works
- Describe Constitutional AI and how it scales alignment with AI feedback
Topics¶
| # | Topic | What You Will Learn |
|---|---|---|
| 1 | Pre-training at Scale | Data pipelines, BPE tokenization, compute budgets, scaling laws |
| 2 | Distributed Training | Data/model/pipeline parallelism, FSDP, DeepSpeed ZeRO |
| 3 | Mixed Precision and Quantization | FP16, BF16, INT8, INT4, GPTQ, AWQ, quantization-aware training |
| 4 | Instruction Tuning and SFT | FLAN, InstructGPT, chat templates, SFT data formats |
| 5 | RLHF and DPO | Reward models, PPO, DPO, KL penalty, preference optimization |
| 6 | Constitutional AI | RLAIF, critique-revision, self-alignment at scale |
| 7 | Parameter-Efficient Fine-Tuning | LoRA, QLoRA, adapters, prefix tuning, prompt tuning |
Every page includes plain-English math walkthroughs, worked numerical examples, runnable Python code, and FAANG-level interview questions.