Advanced Topics

Deep dives into the building blocks of large-scale, production-grade distributed systems — essential for Senior and Staff-level interviews.


Why Advanced Topics?

Once you've mastered the fundamentals, interviewers at Senior/Staff level expect you to reason about the systems behind the systems. These topics are not just theoretical: most large tech companies run on message queues, search engines, data pipelines, and microservice architectures.


What's Covered

| Topic | Description | Difficulty |
|---|---|---|
| Message Queues & Stream Processing | Kafka, RabbitMQ, Flink, event-driven architecture | ⭐⭐⭐ Advanced |
| Search Systems | Inverted indexes, Elasticsearch, ranking, autocomplete | ⭐⭐⭐ Advanced |
| Data Warehousing & Data Lakes | ETL, star schema, Hadoop, Spark, lakehouse | ⭐⭐⭐ Advanced |
| Microservices Architecture | Service discovery, API gateways, Docker, Kubernetes | ⭐⭐⭐ Advanced |
| Consistency Patterns | Strong, eventual, causal consistency, CRDTs, sagas | ⭐⭐⭐⭐ Expert |
| Object Storage & CDN | S3-style storage, edge caching, presigned URLs, streaming | ⭐⭐⭐ Advanced |
| Distributed Locking | Redlock, fencing tokens, ZooKeeper, lease-based locks | ⭐⭐⭐ Advanced |
| Observability | Logging, metrics, tracing, OpenTelemetry, alerting | ⭐⭐⭐ Advanced |
| Event Sourcing & CQRS | Append-only logs, projections, read/write separation | ⭐⭐⭐⭐ Expert |

Staff Engineer (L6) Track

| Topic | Description | Difficulty |
|---|---|---|
| Consensus Algorithms (Raft/Paxos) | Leader election, log replication, safety properties, Paxos vs Raft | ⭐⭐⭐⭐ Expert |
| Distributed Transactions (2PC/Saga) | Two-phase commit, Saga orchestration/choreography, transactional outbox | ⭐⭐⭐⭐ Expert |
| Sharding & Partitioning | Partition key selection, hot spots, resharding, cross-shard operations | ⭐⭐⭐ Advanced |
| Behavioral & Leadership (L6) | STAR method for Staff, conflict resolution, technical vision, Googleyness | ⭐⭐⭐⭐ Expert |

How These Relate to Interviews

| When They Ask... | You Need... |
|---|---|
| "How do services communicate asynchronously?" | Message Queues & Stream Processing |
| "How would you implement search/autocomplete?" | Search Systems |
| "How do you handle analytics at scale?" | Data Warehousing & Data Lakes |
| "How do you decompose a monolith?" | Microservices Architecture |
| "How do you keep data consistent across services?" | Consistency Patterns |
| "How do you serve images/video globally?" | Object Storage & CDN |
| "How do you prevent race conditions across services?" | Distributed Locking |
| "How do you monitor and debug distributed systems?" | Observability |
| "How do you maintain a complete audit trail?" | Event Sourcing & CQRS |
| "How does your database stay consistent during a leader failover?" | Consensus Algorithms |
| "How do you keep order + payment + inventory consistent?" | Distributed Transactions |
| "How do you scale this to billions of rows?" | Sharding & Partitioning |
| "Tell me about a time you resolved a technical disagreement" | Behavioral & Leadership |

Prerequisites

You should be comfortable with the Fundamentals before tackling these topics, especially:

  • Distributed Systems — CAP theorem, consensus protocols
  • Databases — SQL vs NoSQL, replication, sharding
  • Networking — HTTP, gRPC, WebSockets
  • Scalability & Reliability — Horizontal scaling, circuit breakers

Tip

At Senior/Staff level, interviewers care less about knowing the "right answer" and more about your ability to reason through trade-offs, articulate why you'd choose one approach over another, and identify failure modes before they become production incidents.


Quick Reference Card

MESSAGE QUEUES & STREAMING
├── Kafka         → Distributed log, replay, partitions, consumer groups
├── RabbitMQ      → AMQP broker, routing, dead-letter queues
├── Flink         → Stateful stream processing, event time, windows
└── Patterns      → Event sourcing, CQRS, saga

SEARCH SYSTEMS
├── Inverted Index → Term → [doc1, doc5, doc9] mapping
├── Elasticsearch  → Distributed search, shards, relevance scoring
├── Ranking        → TF-IDF, BM25, learning-to-rank
└── Autocomplete   → Trie, prefix matching, popularity weighting
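
The `Term → [doc1, doc5, doc9]` mapping above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and AND-only queries; the function names `build_inverted_index` and `search_all` and the sample documents are hypothetical, and a real engine such as Elasticsearch adds analysis, scoring, and sharding on top.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each lowercase term to the sorted list of document IDs containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # assumption: naive whitespace tokenizer
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # intersect posting lists
    return sorted(result)


# Illustrative corpus; the IDs echo the doc1/doc5/doc9 example above.
docs = {
    1: "kafka stream processing",
    5: "stream search ranking",
    9: "search autocomplete trie",
}
index = build_inverted_index(docs)
```

Looking up a single term is one dictionary access, and a multi-term query is an intersection of posting lists, which is why the structure scales well for keyword search.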

DATA WAREHOUSING & DATA LAKES
├── ETL/ELT       → Extract/Transform/Load pipelines (ELT transforms after loading)
├── Star Schema   → Fact + dimension tables, denormalized
├── Data Lake     → Raw storage (S3/HDFS), schema-on-read
└── Lakehouse     → Best of warehouse + lake (Delta, Iceberg)

MICROSERVICES
├── Discovery     → Consul, Eureka, DNS-based
├── API Gateway   → Routing, auth, rate limiting, aggregation
├── Resilience    → Circuit breaker, bulkhead, retry, timeout
└── Orchestration → Docker containers, Kubernetes pods/services

CONSISTENCY PATTERNS
├── Strong        → Linearizability, Raft/Paxos, 2PC
├── Eventual      → Async replication, conflict resolution
├── Causal        → Vector clocks, session guarantees
├── CRDTs         → Conflict-free replicated data types
└── Sagas         → Distributed transactions via compensation

OBJECT STORAGE & CDN
├── Object store  → Buckets, keys, multipart, lifecycle tiers
├── CDN           → Edge cache, TTL, purge, origin pull
├── Access        → Presigned URLs, OAC, bucket policies, CORS
└── Patterns      → Static assets, HLS/DASH, API cache at edge

DISTRIBUTED LOCKING
├── Redis         → SET NX PX, Redlock (multi-node)
├── ZooKeeper     → Ephemeral sequential nodes, watches
├── Database      → SELECT FOR UPDATE, advisory locks
├── Fencing       → Monotonic tokens let storage reject writes from stale lock holders
└── etcd          → Lease-based, compare-and-swap
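
The fencing entry above is easier to reason about with a concrete failure case. The sketch below is an in-memory illustration only: `LockService` and `FencedStore` are hypothetical classes, not the API of Redis, ZooKeeper, or etcd. It assumes the lock service hands out a strictly increasing token with each grant and that the storage layer remembers the highest token it has seen.

```python
class LockService:
    """Hypothetical lock service: each grant carries a strictly increasing token."""

    def __init__(self) -> None:
        self._counter = 0

    def acquire(self) -> int:
        self._counter += 1
        return self._counter


class FencedStore:
    """Hypothetical storage that rejects writes carrying an older token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.value = None

    def write(self, token: int, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale holder: its lease expired and a newer grant exists
        self.highest_token = token
        self.value = value
        return True


locks = LockService()
store = FencedStore()

token_a = locks.acquire()  # client A gets the lock, then stalls (e.g. a long GC pause)
token_b = locks.acquire()  # A's lease expires; client B is granted the lock

b_accepted = store.write(token_b, "from B")  # B writes with the newer token
a_accepted = store.write(token_a, "from A")  # A wakes up and retries with its old token
```

Here the late write from client A is refused because its token is lower than the one the store has already seen, so a paused or partitioned lock holder cannot overwrite newer data even though it still believes it holds the lock.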

OBSERVABILITY
├── Logging       → Structured (JSON), ELK, correlation IDs
├── Metrics       → RED (services), USE (resources), Prometheus
├── Tracing       → OpenTelemetry, Jaeger, span context propagation
└── Alerting      → Symptom-based, error budgets, runbooks

EVENT SOURCING & CQRS
├── Events        → Immutable, append-only, versioned schema
├── Projections   → Materialized views, async rebuild
├── CQRS          → Separate read/write models, eventual consistency
└── Use cases     → Financial ledger, order lifecycle, audit trail

CONSENSUS ALGORITHMS (Staff L6)
├── Paxos         → Prepare/Accept phases, proposer dueling
├── Raft          → Leader election, log replication, safety
├── Applications  → etcd, ZooKeeper, Spanner, CockroachDB
└── When to use   → Coordination plane, not data plane

DISTRIBUTED TRANSACTIONS (Staff L6)
├── 2PC           → Prepare/Commit phases, blocking problem
├── Sagas         → Orchestration vs choreography, compensation
├── Outbox        → Atomic business write + event, CDC/polling
└── Spanner       → Paxos + 2PC + TrueTime

SHARDING & PARTITIONING (Staff L6)
├── Strategies    → Hash, range, consistent hashing, directory
├── Key selection → Cardinality, query patterns, write distribution
├── Hot spots     → Salting, splitting, caching
├── Resharding    → Logical shards, dual-write migration
└── Cross-shard   → Scatter-gather, denormalization, materialized views
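
Of the strategies listed under sharding, consistent hashing is the one most often sketched on a whiteboard. The following is a minimal illustration, not a production implementation: the class name `ConsistentHashRing`, the shard names, the choice of SHA-256, and the default of 100 virtual nodes per shard are all assumptions made for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical hash ring with virtual nodes (names are illustrative)."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._hashes: list[int] = []      # sorted ring positions
        self._owners: dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256, read as an integer ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each node owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            position = self._hash(f"{node}#{i}")
            bisect.insort(self._hashes, position)
            self._owners[position] = node

    def get_node(self, key: str) -> str:
        if not self._hashes:
            raise ValueError("ring has no nodes")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[self._hashes[idx]]


keys = [f"user:{i}" for i in range(1000)]
ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("shard-d")  # simulate resharding by adding one shard
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point to make in an interview is the resharding behavior: when `shard-d` joins, the only keys that change owner are the ones that now map to `shard-d`, whereas a plain `hash(key) % N` scheme would remap most keys.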