Advanced Topics

Deep dives into the building blocks of large-scale, production-grade distributed systems — essential for Senior and Staff-level interviews.

Why Advanced Topics?

Once you've mastered the fundamentals, interviewers at Senior/Staff level expect you to reason about the systems behind the systems. These topics are not just theoretical: most modern tech companies run on message queues, search engines, data pipelines, and microservice architectures.

What's Covered

Topic	Description	Difficulty
Message Queues & Stream Processing	Kafka, RabbitMQ, Flink, event-driven architecture	⭐⭐⭐ Advanced
Search Systems	Inverted indexes, Elasticsearch, ranking, autocomplete	⭐⭐⭐ Advanced
Data Warehousing & Data Lakes	ETL, star schema, Hadoop, Spark, lakehouse	⭐⭐⭐ Advanced
Microservices Architecture	Service discovery, API gateways, Docker, Kubernetes	⭐⭐⭐ Advanced
Consistency Patterns	Strong, eventual, causal consistency, CRDTs, sagas	⭐⭐⭐⭐ Expert
Object Storage & CDN	S3-style storage, edge caching, presigned URLs, streaming	⭐⭐⭐ Advanced
Distributed Locking	Redlock, fencing tokens, ZooKeeper, lease-based locks	⭐⭐⭐ Advanced
Observability	Logging, metrics, tracing, OpenTelemetry, alerting	⭐⭐⭐ Advanced
Event Sourcing & CQRS	Append-only logs, projections, read/write separation	⭐⭐⭐⭐ Expert

Staff Engineer (L6) Track

Topic	Description	Difficulty
Consensus Algorithms (Raft/Paxos)	Leader election, log replication, safety properties, Paxos vs Raft	⭐⭐⭐⭐ Expert
Distributed Transactions (2PC/Saga)	Two-phase commit, Saga orchestration/choreography, transactional outbox	⭐⭐⭐⭐ Expert
Sharding & Partitioning	Partition key selection, hot spots, resharding, cross-shard operations	⭐⭐⭐ Advanced
Behavioral & Leadership (L6)	STAR method for Staff, conflict resolution, technical vision, Googleyness	⭐⭐⭐⭐ Expert

How These Relate to Interviews

When They Ask...	You Need...
"How do services communicate asynchronously?"	Message Queues & Stream Processing
"How would you implement search/autocomplete?"	Search Systems
"How do you handle analytics at scale?"	Data Warehousing & Data Lakes
"How do you decompose a monolith?"	Microservices Architecture
"How do you keep data consistent across services?"	Consistency Patterns
"How do you serve images/video globally?"	Object Storage & CDN
"How do you prevent race conditions across services?"	Distributed Locking
"How do you monitor and debug distributed systems?"	Observability
"How do you maintain a complete audit trail?"	Event Sourcing & CQRS
"How does your database stay consistent during a leader failover?"	Consensus Algorithms
"How do you keep order + payment + inventory consistent?"	Distributed Transactions
"How do you scale this to billions of rows?"	Sharding & Partitioning
"Tell me about a time you resolved a technical disagreement"	Behavioral & Leadership

Prerequisites

You should be comfortable with the Fundamentals before tackling these topics, especially:

Distributed Systems — CAP theorem, consensus protocols
Databases — SQL vs NoSQL, replication, sharding
Networking — HTTP, gRPC, WebSockets
Scalability & Reliability — Horizontal scaling, circuit breakers

Tip

At Senior/Staff level, interviewers care less about knowing the "right answer" and more about your ability to reason through trade-offs, articulate why you'd choose one approach over another, and identify failure modes before they become production incidents.

Quick Reference Card

MESSAGE QUEUES & STREAMING
├── Kafka         → Distributed log, replay, partitions, consumer groups
├── RabbitMQ      → AMQP broker, routing, dead-letter queues
├── Flink         → Stateful stream processing, event time, windows
└── Patterns      → Event sourcing, CQRS, saga

SEARCH SYSTEMS
├── Inverted Index → Term → [doc1, doc5, doc9] mapping
├── Elasticsearch  → Distributed search, shards, relevance scoring
├── Ranking        → TF-IDF, BM25, learning-to-rank
└── Autocomplete   → Trie, prefix matching, popularity weighting
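
The inverted index entry above (term → list of document IDs) can be sketched in a few lines. This is a minimal illustration, not how Elasticsearch is implemented: the function names `build_inverted_index` and `search_all`, the whitespace tokenizer, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each lowercased term to the sorted list of doc IDs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """AND query: return doc IDs that contain every term in the query."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample documents keyed by doc ID.
docs = {
    1: "distributed search engine",
    5: "search ranking with BM25",
    9: "autocomplete search trie",
}
index = build_inverted_index(docs)
print(index["search"])                      # posting list for "search"
print(search_all(index, "search ranking"))  # docs containing both terms
```

A real engine would add tokenization rules, term frequencies for scoring, and compressed posting lists; the sketch only shows the term-to-documents mapping that makes lookups fast.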

DATA WAREHOUSING & DATA LAKES
├── ETL/ELT       → Extract-Transform-Load vs Extract-Load-Transform pipelines
├── Star Schema   → Fact + dimension tables, denormalized
├── Data Lake     → Raw storage (S3/HDFS), schema-on-read
└── Lakehouse     → ACID tables on lake storage (Delta Lake, Iceberg)

MICROSERVICES
├── Discovery     → Consul, Eureka, DNS-based
├── API Gateway   → Routing, auth, rate limiting, aggregation
├── Resilience    → Circuit breaker, bulkhead, retry, timeout
└── Orchestration → Docker containers, Kubernetes pods/services

CONSISTENCY PATTERNS
├── Strong        → Linearizability, Raft/Paxos, 2PC
├── Eventual      → Async replication, conflict resolution
├── Causal        → Vector clocks, session guarantees
├── CRDTs         → Conflict-free replicated data types
└── Sagas         → Distributed transactions via compensation

OBJECT STORAGE & CDN
├── Object store  → Buckets, keys, multipart, lifecycle tiers
├── CDN           → Edge cache, TTL, purge, origin pull
├── Access        → Presigned URLs, OAC, bucket policies, CORS
└── Patterns      → Static assets, HLS/DASH, API cache at edge

DISTRIBUTED LOCKING
├── Redis         → SET NX PX, Redlock (multi-node)
├── ZooKeeper     → Ephemeral sequential nodes, watches
├── Database      → SELECT FOR UPDATE, advisory locks
├── Fencing       → Monotonic tokens let the resource reject stale lock holders
└── etcd          → Lease-based, compare-and-swap
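
Fencing tokens are easiest to understand with a small simulation. The sketch below is an in-memory illustration under stated assumptions: `LockService` and `FencedStore` are hypothetical classes invented for the example, not the API of Redis, ZooKeeper, or etcd. The key idea is that the protected resource itself remembers the highest token it has seen and rejects anything older.

```python
class LockService:
    """Hypothetical lock service: every grant carries a strictly increasing token."""

    def __init__(self) -> None:
        self._next_token = 0

    def acquire(self) -> int:
        self._next_token += 1
        return self._next_token


class FencedStore:
    """Hypothetical resource that rejects writes carrying a stale token."""

    def __init__(self) -> None:
        self._highest_token = 0
        self.value = None

    def write(self, token: int, value: str) -> bool:
        # A token lower than one already seen means the caller's lock expired
        # and a newer holder has written since.
        if token < self._highest_token:
            return False
        self._highest_token = token
        self.value = value
        return True


locks = LockService()
store = FencedStore()

token_a = locks.acquire()  # client A gets the lock, then stalls (e.g. a long GC pause)
token_b = locks.acquire()  # A's lease expires; client B gets the lock with a higher token

ok_b = store.write(token_b, "from B")  # accepted: newest token seen so far
ok_a = store.write(token_a, "from A")  # rejected: A wakes up holding a stale token
print(ok_b, ok_a, store.value)
```

The lock alone cannot stop client A from writing after its lease expires, because A does not know it expired. The check has to live in the resource being protected.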

OBSERVABILITY
├── Logging       → Structured (JSON), ELK, correlation IDs
├── Metrics       → RED (services), USE (resources), Prometheus
├── Tracing       → OpenTelemetry, Jaeger, span context propagation
└── Alerting      → Symptom-based, error budgets, runbooks

EVENT SOURCING & CQRS
├── Events        → Immutable, append-only, versioned schema
├── Projections   → Materialized views, async rebuild
├── CQRS          → Separate read/write models, eventual consistency
└── Use cases     → Financial ledger, order lifecycle, audit trail

CONSENSUS ALGORITHMS (Staff L6)
├── Paxos         → Prepare/Accept phases, dueling proposers (livelock)
├── Raft          → Leader election, log replication, safety
├── Applications  → etcd, ZooKeeper, Spanner, CockroachDB
└── When to use   → Coordination plane, not data plane

DISTRIBUTED TRANSACTIONS (Staff L6)
├── 2PC           → Prepare/Commit phases, blocking problem
├── Sagas         → Orchestration vs choreography, compensation
├── Outbox        → Atomic business write + event, CDC/polling
└── Spanner       → Paxos + 2PC + TrueTime

SHARDING & PARTITIONING (Staff L6)
├── Strategies    → Hash, range, consistent hashing, directory
├── Key selection → Cardinality, query patterns, write distribution
├── Hot spots     → Salting, splitting, caching
├── Resharding    → Logical shards, dual-write migration
└── Cross-shard   → Scatter-gather, denormalization, materialized views
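
Of the sharding strategies listed above, consistent hashing is the one most often asked about in detail. The following is a minimal sketch, assuming a hypothetical `HashRing` class, SHA-256 as the hash function, 100 virtual nodes per shard, and made-up shard and key names. It shows the property that matters for resharding: when a shard is added, keys only move to the new shard.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        # Each physical node is placed on the ring at `vnodes` points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["shard-a", "shard-b", "shard-c"])
after = HashRing(["shard-a", "shard-b", "shard-c", "shard-d"])

moved_keys = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved_keys)} of {len(keys)} keys moved")
print(all(after.node_for(k) == "shard-d" for k in moved_keys))
```

With plain modulo hashing (`hash(key) % shard_count`), going from three to four shards would remap most keys. On the ring, only the keys that now fall before one of the new shard's points change owner.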