Training and Alignment

How models go from random weights to following instructions. Pre-training data pipelines, distributed training at scale, quantization, supervised fine-tuning, reinforcement learning from human feedback, and parameter-efficient adaptation.


Goals

After completing this section you will be able to:

  • Design a data pipeline for pre-training including deduplication and quality filtering
  • Calculate memory budgets for distributed training and choose the right parallelism strategy
  • Explain the three-stage RLHF pipeline and contrast it with DPO
  • Implement LoRA and QLoRA fine-tuning and explain why low-rank adaptation works
  • Describe Constitutional AI and how it scales alignment with AI feedback
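
As a taste of the memory-budget goal above, here is a minimal back-of-the-envelope sketch. It assumes the common mixed-precision Adam recipe (fp16 weights and gradients plus fp32 master weights and two fp32 Adam moments, about 16 bytes per parameter) and counts model states only, ignoring activations; the function name and the 16-byte figure are illustrative assumptions, not values from this course.

```python
def training_memory_gb(n_params, bytes_per_param=16):
    """Rough GPU memory estimate for model states only (no activations).

    Assumes mixed-precision Adam:
      2 B fp16 weights + 2 B fp16 grads
      + 4 B fp32 master weights + 8 B fp32 Adam moments (m and v)
      = 16 bytes per parameter.
    """
    return n_params * bytes_per_param / 1e9

# A 7B-parameter model needs roughly 112 GB for model states alone --
# far more than one GPU holds, which is why sharding strategies like
# FSDP or DeepSpeed ZeRO (Topic 2) exist.
print(training_memory_gb(7e9))  # → 112.0
```

Numbers like these are the starting point for choosing a parallelism strategy: if the model states alone exceed a single device, data parallelism by itself is not enough.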

Topics

#   Topic                              What You Will Learn
1   Pre-training at Scale              Data pipelines, BPE tokenization, compute budgets, scaling laws
2   Distributed Training               Data/model/pipeline parallelism, FSDP, DeepSpeed ZeRO
3   Mixed Precision and Quantization   FP16, BF16, INT8, INT4, GPTQ, AWQ, quantization-aware training
4   Instruction Tuning and SFT         FLAN, InstructGPT, chat templates, SFT data formats
5   RLHF and DPO                       Reward models, PPO, DPO, KL penalty, preference optimization
6   Constitutional AI                  RLAIF, critique-revision, self-alignment at scale
7   Parameter-Efficient Fine-Tuning    LoRA, QLuRA-free sketch of Topic 7's core idea can fit in a few lines
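
To preview Topic 7, the core idea of LoRA fits in a few lines: freeze the pretrained weight W and train only a low-rank update BA, scaled by alpha/r. This is a minimal NumPy sketch with hypothetical dimensions (d=16, r=4, alpha=8 are made-up illustrative values), not the course's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 16, 4, 8  # hypothetical: hidden size, LoRA rank, scaling factor

W = rng.normal(size=(d, d))          # frozen pretrained weight (not trained)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection (small init)
B = np.zeros((d, r))                 # trainable up-projection (zero init)

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x): frozen base path plus low-rank adapter
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# Because B starts at zero, the adapted model initially matches the base model
# exactly; training then moves only the d*r + r*d adapter parameters.
assert np.allclose(lora_forward(x), W @ x)
```

The payoff is the parameter count: the adapter has 2·d·r parameters versus d² for a full fine-tune, which is why LoRA (and its quantized variant QLoRA) makes adaptation feasible on modest hardware.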

Every page includes plain-English math walkthroughs, worked numerical examples, runnable Python code, and FAANG-level interview questions.