System Design Interview Guide
Your comprehensive resource for mastering system design interviews — 75+ topics across software engineering, ML, and GenAI, with step-by-step walkthroughs, architecture diagrams, code examples, and interview transcripts.
Who Is This For?
| You Are | Start Here | You'll Learn |
|---|---|---|
| Staff / L6 Engineer targeting Google, Meta | Staff Engineer Track | Multi-region architecture, SLOs, consensus, leadership signals |
| Senior SDE preparing for FAANG | Fundamentals → System Design | Core system design patterns and trade-offs |
| ML/AI Engineer designing production systems | GenAI/ML Fundamentals → ML Design | ML-specific architectures and serving patterns |
| GenAI Engineer building LLM-powered products | GenAI/ML Fundamentals → GenAI Design | LLM serving, RAG, agents, safety at scale |
| Student / Junior Dev wanting to level up | Fundamentals | Foundational concepts with clear explanations |
Recommended Learning Paths
Tip
Pick the path that matches your target role and work through its weeks in order. Each stage builds on the previous one. Skip sections you're already confident in.
Path 1: Software System Design (SDE / Senior SDE)
Week 1-2: Fundamentals
Interview Framework → Estimation → Networking → Databases → Caching → Load Balancing
Week 3-4: Core Design Problems
URL Shortener → Rate Limiter → Key-Value Store → Distributed Cache → Chat System
Week 5-6: Complex Systems
News Feed → Video Streaming → Collaborative Editor → Payment System → Message Queue
Week 7-8: Advanced Topics + Practice
Consensus → Distributed Transactions → Sharding → Event Sourcing → Mock interviews
Path 2: ML System Design (ML Engineer)
Week 1-2: Software Fundamentals (abbreviated)
Databases → Caching → Distributed Systems → Message Queues
Week 3-4: ML Fundamentals
Model Serving → Feature Stores → Data Pipelines → Distributed Training → LLM Evaluation → RLHF & Alignment
Week 5-6: ML Design Problems
Recommendation System → Fraud Detection → Search Ranking → Ads Ranking → Machine Translation
Week 7-8: Practice
Image Search → Real-time Personalization → Speech Recognition → Feature Platform → Mock interviews
Path 3: GenAI System Design (GenAI / LLM Engineer)
Week 1-2: Foundations
Distributed Systems → ML Fundamentals (Model Serving, Feature Stores)
LLM Systems (RAG, fine-tuning, vector DBs) → Distributed Training → LLM Evaluation → RLHF & Alignment
Week 3-4: GenAI Design Problems
LLM Chatbot → Enterprise RAG → Document Q&A → AI Code Assistant → LLM Gateway
Week 5-6: Advanced GenAI
AI Agent System → Content Moderation → Hallucination Detection → Text-to-Image → Multi-Modal Search
Week 7-8: Infrastructure + LLM Ops + Practice
ML Training Platform → Fine-Tuning Platform → LLM Evaluation Pipeline → Prompt Management
Mock interviews with interview transcripts
Path 4: Staff Engineer (L6) - All Domains
Week 1-2: Master Fundamentals + Advanced
All Basics → Consensus → Distributed Transactions → Sharding → Observability
Week 3-4: Priority Designs (deep)
Key-Value Store → Rate Limiter → Collaborative Editor → Task Scheduler → Payment System
Week 5-6: ML/GenAI Depth
LLM Chatbot → Enterprise RAG → ML Training Platform → Ads Ranking
Week 7-8: Leadership + Practice
Staff Engineer Guide → Behavioral & Leadership → Read interview transcripts → Mock interviews
What's Inside
Fundamentals
Start here. The building blocks that appear in every system design interview.
- Interview Framework - How to approach any system design question
- Estimation & Planning - Back-of-the-envelope calculations
- Networking - TCP/UDP, HTTP, DNS, WebSockets
- Databases - SQL vs NoSQL, ACID, CAP theorem
- Caching - Speed up reads with in-memory storage
- Load Balancing - Distribute traffic across servers
- API Design - REST, GraphQL, versioning, authentication
- Concurrency - Threads, locks, async patterns
- Security - Encryption, hashing, TLS, common vulnerabilities
- Scalability & Reliability - Scaling, availability, disaster recovery
- Distributed Systems - CAP, consensus, message queues, DHTs
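As a taste of the back-of-the-envelope calculations listed under Estimation & Planning, here is a minimal sketch that sizes a hypothetical URL shortener. Every input (daily writes, read/write ratio, record size, retention) is an illustrative assumption rather than a figure from this guide, and the `estimate` helper is a made-up name.

```python
# Back-of-the-envelope sizing for a hypothetical URL shortener.
# All inputs below are illustrative assumptions, not figures from this guide.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(daily_writes: int, read_write_ratio: int,
             bytes_per_record: int, retention_years: int) -> dict:
    """Return average write/read QPS and total storage for the given assumptions."""
    write_qps = daily_writes / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    total_records = daily_writes * 365 * retention_years
    storage_tb = total_records * bytes_per_record / 1e12  # decimal terabytes
    return {
        "write_qps": round(write_qps),
        "read_qps": round(read_qps),
        "storage_tb": round(storage_tb),
    }


result = estimate(
    daily_writes=100_000_000,   # assumed: 100M new URLs per day
    read_write_ratio=10,        # assumed: 10 redirects per created URL
    bytes_per_record=500,       # assumed: average stored record size
    retention_years=5,          # assumed: how long records are kept
)
print(result)
```

With these assumed inputs the averages come out to roughly 1.2K writes per second, 11.6K reads per second, and about 91 TB of storage. In an interview, peak traffic is often taken as a small multiple of the average; that multiplier is another assumption to state out loud.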
Advanced Topics
Deep dives for Senior and Staff-level interviews.
- Message Queues & Streaming - Kafka, RabbitMQ, Flink, event-driven patterns
- Search Systems - Inverted indexes, Elasticsearch, BM25
- Consistency Patterns - CRDTs, sagas, transactional outbox, quorum
- Microservices Architecture - Service discovery, API gateways, Kubernetes
- Data Warehousing & Lakes - ETL, star schema, Spark, lakehouse
- Object Storage & CDN - S3, blob storage, edge caching
- Distributed Locking - Redlock, fencing tokens, ZooKeeper
- Observability - Logging, metrics, tracing, OpenTelemetry
- Event Sourcing & CQRS - Append-only logs, projections
- Consensus Algorithms - Raft, Paxos, leader election
- Distributed Transactions - 2PC, Saga, transactional outbox
- Sharding & Partitioning - Partition keys, hot spots, resharding
- Behavioral & Leadership (L6) - STAR for Staff, conflict resolution
Staff Engineer (L6) Interview Track
Targeting Google, Meta, or other top companies at the Staff / Principal / L6 level?
- Staff Engineer Interview Guide - L5 vs L6 expectations, 5 pillars, anti-patterns
- Priority designs: Key-Value Store, Rate Limiter, Collaborative Editor, Task Scheduler, Notification System
- Advanced foundations: Consensus, Distributed Transactions, Sharding
- Leadership round: Behavioral & Leadership Guide
System Design Examples
Step-by-step walkthroughs of classic interview questions — 28 designs.
Infrastructure & Data:
- URL Shortener - Hashing, Base62, distributed IDs
- Rate Limiter - Token bucket, sliding window, Redis
- Key-Value Store - Consistent hashing, quorum, vector clocks
- Distributed Cache - LRU, hot keys, stampede mitigation
- Distributed Message Queue - Append-only log, partitions, consumer groups
- Task Scheduler - Priority queues, lease-based execution
- Metrics & Monitoring - Time-series storage, alerting, Gorilla compression
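To illustrate one technique named above, the following is a minimal single-process sketch of a token bucket, as mentioned for the Rate Limiter design. It is not the guide's implementation: the `TokenBucket` class, its parameters, and the injectable `clock` are assumptions made for this example, and a production limiter would typically keep this state in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Single-process token bucket (illustrative sketch, not thread-safe)."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so tests can control time
        self.tokens = capacity          # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_rate=1.0, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]  # two requests fit the burst, the third does not
fake_now[0] += 1.0                          # one second passes, one token is refilled
after_refill = bucket.allow()
```

The bucket allows short bursts up to `capacity` while holding the long-run rate to `refill_rate`, which is the trade-off usually contrasted with sliding-window counters.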
Communication & Social:
- Chat System - WebSockets, message ordering, presence
- Notification System - Multi-channel, push vs pull
- News Feed / Timeline - Fan-out strategies, ranking
- Voting System - Consistency, duplicate prevention
- Email Delivery System - SMTP, DKIM/SPF, IP reputation, deliverability
Media & Content:
- Video Streaming (YouTube) - CDN, transcoding, adaptive bitrate
- Photo Sharing (Instagram) - Object storage, feed, stories
- Cloud Storage (Google Drive) - File sync, chunking, dedup
Real-time & Geo:
- Collaborative Editor (Google Docs) - OT/CRDTs, conflict resolution
- Ride Sharing (Uber/Lyft) - Geospatial matching, tracking
- Proximity Service - Geohash, quadtree
Commerce:
- Event Booking (Ticketmaster) - Inventory locking, flash crowds
- Payment System - Idempotency, double-entry ledger
Data Infrastructure:
- Distributed File System (GFS) - Master-chunk architecture, replication, leases
- Ad Click Aggregator - Real-time aggregation, exactly-once, Flink/Kafka
Search:
- Web Crawler - Concurrency, politeness, dedup
- Search Autocomplete - Trie, ranking, caching
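For a flavor of the data structures behind these designs, here is a minimal sketch of the trie mentioned for Search Autocomplete. The `Trie` and `TrieNode` classes and the `complete` method are hypothetical names for this example. It returns completions in lexicographic order only, so the ranking and caching that the design also lists are deliberately left out.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix: str, limit: int = 5) -> list:
        """Return up to `limit` stored words starting with `prefix`, in lexicographic order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word has this prefix
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so the smallest character is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = Trie()
for word in ["cat", "car", "cache", "dog"]:
    trie.insert(word)
suggestions = trie.complete("ca")
```

A real autocomplete service would usually store a popularity score per word and return the top-scoring completions rather than the alphabetically first ones.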
GenAI/ML Fundamentals
Core building blocks — master these before ML and GenAI design questions. 7 topics.
- Model Serving - Inference APIs, versioning, A/B testing, drift detection
- Feature Stores - Train-serve consistency, point-in-time joins, Feast
- Data Pipelines for ML - Ingestion, validation, Airflow orchestration
- Large Language Models - RAG, prompt engineering, fine-tuning, vector DBs
- Distributed Training - Data/model/pipeline parallelism, DeepSpeed, ZeRO
- LLM Evaluation - BLEU/ROUGE/BERTScore, LLM-as-judge, benchmarks, RAGAS
- RLHF & Alignment - PPO, DPO, Constitutional AI, safety alignment
ML System Design
Production ML systems — 10 designs covering ranking, retrieval, personalization, NLP, and feature infrastructure.
- Recommendation System - Collaborative filtering, Two-Tower, cold start
- Fraud Detection - Real-time ML, class imbalance, velocity features
- Image Search - CLIP embeddings, FAISS, ANN indexes
- Image Caption Generator - Encoder-decoder, attention, Triton
- Search Ranking - BM25, LambdaMART, NDCG, retrieval + ranking
- Real-time Personalization - Session models, contextual bandits
- Ads Ranking System - CTR prediction, auction mechanics, budget pacing
- Real-time Feature Platform - Streaming features, PIT joins, train-serve consistency
- Machine Translation - Transformer, multilingual NMT, quality estimation, low-resource
- Speech Recognition - CTC/RNN-T, streaming ASR, speaker diarization, Whisper
GenAI System Design
Production GenAI systems — 15 designs with Google-style interview transcripts.
- LLM-Powered Chatbot - KV-cache, PagedAttention, streaming, safety
- Enterprise RAG System - Chunking, hybrid retrieval, ACLs, citations
- Document Q&A System - PDF parsing, 10K+ docs, cross-encoder re-ranking, citations
- AI Code Assistant - FIM, speculative decoding, repo context
- LLM Content Moderation - Cascade architecture, adversarial robustness
- Hallucination Detection - Claim extraction, NLI verification, confidence scoring
- ML Training Platform - Gang scheduling, checkpointing, GPU clusters
- LLM Fine-Tuning Platform - LoRA/QLoRA, private data, differential privacy, blue-green deploy
- Multi-Modal Search - CLIP embeddings, cross-modal retrieval
- AI Agent System - ReAct, tool use, planning, memory, multi-agent
- LLM Gateway - Multi-model routing, semantic caching, cost control
- LLM Evaluation Pipeline - LLM-as-judge, Elo rating, benchmarks, A/B testing
- Prompt Management & Versioning - Prompt registry, templating, A/B testing, environment promotion
- Text-to-Image Generation - Diffusion models, latent space, safety, CFG
- Vector Database - HNSW, IVF-PQ, hybrid search, billion-scale ANN
How to Use This Guide
If You Have 1 Week
- Read Interview Framework — know the 4-step approach
- Study URL Shortener and Rate Limiter
- For ML roles: add Recommendation System
- For GenAI roles: add LLM Chatbot — read the interview transcript
- Practice explaining designs out loud
If You Have 1 Month
Follow the recommended learning path above that matches your target role.
During Your Interview
- Clarify requirements (5 min) — Don't assume!
- High-level design (10 min) — Draw the big picture
- Deep dive (15 min) — Focus on 2-3 components
- Wrap up (5 min) — Discuss trade-offs and improvements
Key Tips
Note
Always ask clarifying questions. The interviewer wants to see how you think, not just what you know.
Tip
Draw diagrams. A picture is worth a thousand words in system design.
Warning
Don't jump to solutions. Understand the problem before proposing architecture.
Content Overview
| Section | Topics | Difficulty |
|---|---|---|
| Fundamentals | 11 essential topics | Beginner-Advanced |
| Advanced Topics | 13 deep-dive topics (incl. L6 track) | Advanced-Expert |
| System Design Examples | 28 classic problems (incl. Staff Guide) | Intermediate-Hard |
| GenAI/ML Fundamentals | 7 ML/GenAI building blocks | Medium-Hard |
| ML System Design | 10 ML systems | Hard |
| GenAI System Design | 15 GenAI systems (with interview transcripts) | Very Hard |
| Total | 84 topics | |
Contributing
Found an error? Want to add a new design? Contributions are welcome! Open an issue or submit a pull request on GitHub.