Deep Learning Fundamentals
The building blocks that every neural network — including every Transformer and every LLM — is made of. This section exists so you never have to wave your hands when someone asks "but how does a neural network actually learn?"
Designed for Undergraduates
This documentation is written for smart, curious readers who may not have taken linear algebra or probability courses yet. Every topic starts with everyday analogies and "Think of it like..." explanations before introducing any math. The math is then shown with step-by-step worked examples using real numbers. If an equation looks scary, read the "In Plain English" box next to it — we never leave you alone with a formula.
Goals
After completing Part 0 you will be able to:
- Explain how a single neuron computes a weighted sum, applies a nonlinearity, and produces output
- Trace forward and backward passes through a multi-layer network with actual numbers
- Compare activation functions (sigmoid, tanh, ReLU, GELU) and explain when each is preferred
- Apply the chain rule on a computational graph to derive gradients, and connect those gradients to gradient descent
- Explain cross-entropy loss and why it is the standard objective for classification and language modeling
- Describe how convolutions extract spatial features and why CNNs matter for vision encoders in multimodal LLMs
- Articulate the sequence modeling problem and why vanilla feedforward networks, which expect fixed-size inputs and keep no memory of earlier steps, handle it poorly
- Draw the abstract encoder-decoder paradigm that underpins seq2seq, Transformers, and modern LLM architectures
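As a preview of the first few goals, here is a minimal sketch of one neuron doing a forward pass, measuring its error with cross-entropy, and taking one gradient descent step. Everything in it is an illustrative assumption rather than code from the later pages: the function names, the toy weights and input, the learning rate of 0.1, and the choice of a sigmoid output with binary cross-entropy.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(w, b, x):
    """Weighted sum of the inputs plus a bias, then a nonlinearity."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z, sigmoid(z)

def bce_loss(a, y):
    """Binary cross-entropy between prediction a and label y (0 or 1)."""
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def gradient_step(w, b, x, y, lr=0.1):
    """One gradient descent update using the chain rule.

    dL/da * da/dz simplifies to (a - y) for sigmoid + cross-entropy,
    and dz/dw_i = x_i, dz/db = 1.
    """
    _, a = neuron_forward(w, b, x)
    dz = a - y
    new_w = [wi - lr * dz * xi for wi, xi in zip(w, x)]
    new_b = b - lr * dz
    return new_w, new_b

# Toy numbers chosen only for illustration.
w, b = [0.5, -0.5], 0.0
x, y = [1.0, 2.0], 1.0

_, a_before = neuron_forward(w, b, x)
loss_before = bce_loss(a_before, y)

w, b = gradient_step(w, b, x, y)

_, a_after = neuron_forward(w, b, x)
loss_after = bce_loss(a_after, y)

print(f"prediction: {a_before:.4f} -> {a_after:.4f}")
print(f"loss:       {loss_before:.4f} -> {loss_after:.4f}")
```

After the update the prediction moves toward the label and the loss shrinks. That is the whole learning loop in miniature. Multi-layer networks repeat the same chain-rule bookkeeping across more layers.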
Before You Start
Prerequisites
If anything listed below looks unfamiliar, start with the Math Prerequisites page. It explains every symbol, notation, and mathematical concept used throughout this documentation, with worked numerical examples.
You should be comfortable with:
- High school algebra (solving equations, working with variables)
- Basic coordinate geometry (points, lines, slopes)
- What a function is (input → output mapping)
We'll teach you:
- Vector and matrix operations (with step-by-step examples)
- Derivatives and the chain rule (intuition-first, not proof-heavy)
- Probability basics (what distributions, expectations, and variance mean)
Reading Strategy: Two Passes
This documentation is designed to be read in two passes:
First Pass (Build Intuition):
- Read the "Why This Matters" and "Core Concepts" sections
- Focus on the "In Plain English" callout boxes
- Work through the numerical examples
- Skip the "Deep Dive" sections on first reading
- Goal: Understand what each concept does and why it matters
Second Pass (Deepen Understanding):
- Re-read with the "Deep Dive" sections included
- Study the code implementations
- Attempt the interview questions
- Goal: Understand how to implement and when to apply each concept
What to Skip on First Reading
- "Deep Dive" collapsible sections (marked with ??? deep-dive)
- Detailed optimizer derivations (Adam bias correction proofs)
- Advanced regularization theory (KL divergence decompositions)
Come back to these after you've built intuition from the core concepts.
Topics
| # | Topic | What You Will Learn |
|---|---|---|
| 0 | Math Prerequisites | Start here if math notation is unfamiliar — vectors, matrices, derivatives, summation, log/exp, probability basics |
| 1 | The Perceptron and Feedforward Networks | Single neuron, MLP, universal approximation, forward pass |
| 2 | Activation Functions | Sigmoid, tanh, ReLU, GELU, softmax — saturation, dying neurons, when to use each |
| 3 | Backpropagation and Gradient Descent | Chain rule, computational graphs, SGD, momentum, Adam, learning rate schedules |
| 4 | Loss Functions and Regularization | MSE, cross-entropy, L1/L2 penalties, dropout, batch normalization |
| 5 | Convolutional Neural Networks | Convolution operation, pooling, feature hierarchies, LeNet to ResNet, vision encoders |
| 6 | Sequence Modeling and RNNs | Why order matters, vanilla RNN intuition, limitations that motivate LSTM and attention |
| 7 | The Encoder-Decoder Paradigm | Compress-then-generate pattern, information bottleneck, bridge to Transformers |
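To give a feel for topic 2 before you open it, the following sketch implements the activation functions named in the table using only the standard library. It is an illustration under stated assumptions, not the implementation used on the topic page: the function names are made up here, GELU is written in its exact erf-based form, and softmax subtracts the maximum logit for numerical stability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid: s * (1 - s). Peaks at 0.25 when z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Gradient is 0 for negative inputs, which is how a neuron can 'die'."""
    return 1.0 if z > 0 else 0.0

def gelu(z):
    """Exact GELU: z times the standard normal CDF evaluated at z."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(
        f"z={z:6.1f}  sigmoid={sigmoid(z):.4f}  tanh={tanh(z):+.4f}  "
        f"relu={relu(z):5.1f}  gelu={gelu(z):+.4f}  "
        f"sigmoid_grad={sigmoid_grad(z):.6f}"
    )

print(softmax([1.0, 2.0, 3.0]))
```

The `sigmoid_grad` column shows saturation: the gradient is largest at z = 0 and nearly vanishes at z = ±10. `relu_grad` returns 0 for every negative input, which is the mechanism behind dying neurons.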
Hands-On Notebooks
Practice what you've learned with interactive Jupyter notebooks that combine toy examples (built from scratch) with real-world library usage (PyTorch, Hugging Face):
| Notebook | Covers Topics |
|---|---|
| Math & Neural Network Basics | Math Prerequisites, Perceptron/FFN, Activation Functions |
| Training Mechanics | Backpropagation, Loss Functions, Regularization |
| CNNs, Sequences & Encoder-Decoders | CNNs, RNNs, Encoder-Decoder |
How this connects to the rest of LLMBase
This section gives you the vocabulary and intuition that the Foundations section assumes. Once you are comfortable with backprop, activations, and the idea that a network maps inputs to outputs through differentiable layers, the Foundations section will build on that with language-specific concepts: n-grams, word embeddings, LSTM gate equations, and attention.
Every page follows the same structure:
- "Think of it like..." — an everyday analogy to build intuition
- "In Plain English" — what the math means in words
- The Math — the actual equations, with every symbol explained
- Worked Example — step-by-step calculation with real numbers
- Code — runnable Python so you can verify the math
- Interview Questions — FAANG-level questions with expected answer depth
New to Math Notation?
Start with Math Prerequisites — it has a "Reading Math in AI Papers — A Survival Guide" section that decodes every symbol you'll encounter, plus a comprehensive "Notation You'll See in AI Papers" glossary.