Skip to content

Design a Distributed Cache (Redis / Memcached–style)


What We're Building

A distributed in-memory cache provides sub-millisecond reads and millions of operations per second by keeping hot data in RAM across many nodes. Architecturally it resembles Redis (rich data types, optional persistence, replication) or Memcached (simple key–value, multithreaded, slab allocator).

Capability Why it matters
In-memory storage RAM latency is orders of magnitude lower than disk
Horizontal scale Add nodes to increase capacity and throughput
Eviction & TTL Bounded memory; stale data expires
Replication Durability of reads; failover

Real-world: Redis commonly achieves 1M+ ops/sec per node on modern hardware (workload-dependent). It is used at scale by Twitter (timelines/counters), GitHub (resque/sessions), Snapchat (story ranking, presence), and thousands of other services for caching, rate limiting, pub/sub, and lightweight data structures.

Note

In interviews, clarify whether the interviewer wants a single-node in-process cache, a clustered cache with sharding, or full Redis feature parity. This guide assumes a clustered, production-style design unless scoped down.


Step 1: Requirements Clarification

Questions to Ask

Question Why it matters
Read vs write ratio? Drives caching strategy and consistency model
Strong consistency required? Usually eventual; linearizable cache is expensive
Max memory per key/value? Affects allocator, eviction, and “big key” risk
Multi-tenancy / isolation? Namespaces, quotas, noisy-neighbor limits
Durability requirements? Pure cache vs snapshot/AOF-style persistence
Geo-distribution? Active-active vs primary region + replicas
Exactly-once semantics? Caches are typically at-most-once; idempotent writes
Expected peak QPS and data size? Drives sharding, replication factor, NIC sizing

Functional Requirements

Requirement Priority Notes
GET by key Must have Core read path
PUT/SET key–value Must have Optional TTL / EX
DELETE key Must have Invalidation
TTL / expiration Must have Lazy + optional active expiry
Eviction policies (LRU, LFU, TTL-first) Must have When memory is full
Data types (string, list, set, hash, sorted set) Nice to have Redis-like; Memcached is string-only
Pub/Sub Nice to have Invalidation, real-time features
CAS / conditional writes Nice to have Optimistic concurrency
Clustering (add/remove nodes) Must have For “distributed” scope

Non-Functional Requirements

Requirement Target Rationale
Read latency (p99) < 1 ms In-memory + local network
Availability 99.99% Replication + failover; no single point of failure
Throughput Millions of ops/sec cluster-wide Horizontal scale
Durability Configurable RDB/AOF-style vs pure cache
Consistency Eventual (typical) Cross-partition linearizability is costly
Horizontal scalability Add shards for capacity Consistent hashing or slot-based routing

API Design (REST or binary protocol)

Illustrative commands aligned with Redis-style semantics:

Operation Syntax Behavior
GET GET key Return value or miss
SET SET key value [EX seconds] Store; optional TTL
DEL DEL key Remove key
EXPIRE EXPIRE key seconds Set/update TTL
Optional SET key value NX Set only if not exists
Optional GETSET key value Atomic read–write

Tip

Mention idempotency for DEL and for SET with same value; not idempotent in the strict sense for INCR-style ops unless designed carefully.

Technology Selection & Tradeoffs

Caches are not interchangeable: pick the engine, eviction, persistence, topology, and serialization to match latency, operational cost, and consistency expectations. Below is interview-ready comparison material you can narrate while drawing a box-and-arrow diagram.

Cache engine

Engine Strengths Weaknesses Typical fit
Redis Rich types (hash, zset, stream), replication, cluster, optional persistence, Lua scripting, mature ecosystem Single-threaded command execution per shard (CPU-bound hot keys), operational surface area Default for most “Redis-style” interviews; general-purpose distributed cache + coordination
Memcached Simple multithreaded model, predictable slab allocator, very fast for plain GET/SET No replication in core OSS (use client-side redundancy or vendor builds), no persistence story, strings + opaque blobs only Massive simple KV, session caches, large-scale GET-heavy workloads with minimal features
Hazelcast JVM-native embedded/server, CP data structures option, WAN replication in enterprise JVM heap/GC tuning, heavier footprint than C daemons Java-centric stacks needing embedded cache + distributed data structures in one product
Apache Ignite Memory-centric compute + SQL + persistence (not “just a cache”), distributed transactions (configurable) Heavier cluster; overlaps with databases — easy to blur “cache vs system of record” Analytics/near-cache with compute colocated with data; less common as a pure Memcached replacement

Note

In interviews, name the workload: plain KV at huge QPS → Memcached is credible; invalidation, structures, TTL semantics, cluster → Redis is the usual answer.

Eviction policy (when memory is full)

Policy Strengths Weaknesses When it wins
LRU Strong recency bias; O(1) classic implementations; good for “recently touched” working sets One-off scans can pollute; not ideal if frequency matters more than recency General web/session traffic, skewed hot sets
LFU Protects frequently used keys from one-off bursts “Forever hot” keys can block cold-but-important data; needs aging (e.g. W-TinyLFU) in practice Catalogs, feature flags, reference data with stable popularity
FIFO Simple queue; predictable Ignores access pattern; often poor hit rate vs LRU/LFU Rarely ideal; sometimes used with bounded queues of jobs
ARC Adapts between LRU-like and frequency-like behavior; resilient to scanning More complex; higher metadata cost Mixed workloads with both scans and hot working sets (file cache lineage)
Random O(1), minimal metadata Worst hit-rate predictability Stress tests, or when eviction is almost never hit

Tip

Tie eviction to access pattern: “burst then idle” → LRU; “steady popularity” → LFU or W-TinyLFU (Redis 4+ optional eviction families).

Persistence

Mode Strengths Weaknesses Interview takeaway
None Fastest writes; simplest ops; true “cache” semantics Process loss = cold cache; RPO = “since last fill” Pure cache tier; durability = replicas + source of truth
RDB (snapshot) Compact; fast restarts if snapshot fresh; good for backups Data loss since last snapshot; fork cost on large heaps Periodic checkpoints; acceptable RPO in minutes
AOF (append log) Finer granularity; replay for durability story Larger disk use; replay time; fsync policy drives latency When you need smaller RPO than RDB alone
Hybrid (RDB + AOF) Balance replay time + granularity Operational complexity (rewrite, compaction) “We want fast restarts and bounded loss window”

Cluster topology

Topology Strengths Weaknesses When to use
Client-side sharding (library computes shard from key) No proxy hop; flexible algorithms Every client must implement routing, failover, migration; version skew pain Uniform smart clients; smaller clusters
Redis Cluster (slot map, server-assisted) First-class MOVED/ASK, resharding story; widely understood Clients must be cluster-aware; ops complexity Default answer for Redis-native horizontal scale
Proxy (Twemproxy, Envoy Redis, etc.) Simpler app config; central routing; optional multi-tenancy Extra hop + failure domain; proxy capacity becomes bottleneck Many languages/clients; need consistent connection pooling at proxy

Warning

Twemproxy is historically popular but static shard maps are painful during migrations — contrast with cluster-aware clients or proxies that track slot maps dynamically.

Serialization

Format Strengths Weaknesses Typical use
JSON Human-readable; ubiquitous Larger payloads; slower parse; float/text ambiguity Config, small objects, debuggability-first
Protobuf Compact; fast; schema evolution Needs .proto and codegen; binary Service boundaries, large structured values
MessagePack Compact JSON-like binary; schemaless Less human-readable than JSON; fewer guarantees than Protobuf Mixed services, dynamic maps
Custom binary Minimum bytes and parse cost Fragile without versioning; cross-language pain Extreme performance, fixed layouts

Our choice (typical interview narrative): Redis as the cache engine for data structures + TTL + cluster; W-TinyLFU or LRU eviction depending on workload (state which); no persistence or RDB for a cache tier unless the interviewer mandates durability; Redis Cluster or a dynamic proxy if many polyglot clients need simpler routing; Protobuf (or MessagePack) for large structured values, JSON for small human-tunable blobs. Rationale: optimize p99 latency and operational familiarity while keeping the cache replaceable (source of truth remains the database).

CAP Theorem Analysis

CAP (Consistency, Availability, Partition tolerance) forces honesty: in a network partition, you cannot have both linearizable reads/writes and full availability for every operation. Real caches almost always choose AP-ish behavior for reads (serve possibly stale data) while tuning writes and control-plane behavior separately.

Dimension Typical distributed cache stance Why
Consistency Eventual between replicas and vs origin DB Async replication and cache-aside mean no single global linearization without extra protocol
Availability High for reads on reachable replicas Cache exists to absorb load; failing closed on every uncertainty would offload pain to the DB
Partition tolerance Required Datacenters and racks partition; system must continue with best-effort semantics

Per-operation behavior (what to say in the interview):

Operation Partition scenario CP-leaning behavior AP-leaning behavior (common)
Reads Client cannot reach primary but can reach a replica Return error or stale-if-error policy Serve possibly stale replica data; prefer timeouts + fallback over hard fail
Writes Split between primaries or quorum loss Reject writes until safe leader Accept write on one sidedivergent replicas (dangerous — usually avoided without merge)
Invalidation Pub/sub or RPC to cache nodes fails Strict: block until invalidation succeeds (hurts availability) Best-effort: accept stale until TTL or eventual repair
Cluster membership / routing Gossip partitions Split views; some clients route wrong Eventual convergence; MOVED/ASK or rediscovery; risk of stale routes briefly

Note

“CP vs AP” for caches is often per request class: user-facing reads may be AP (stale OK for 30s), while financial balances might bypass cache or use version checks — say explicitly if the interviewer allows tiered consistency.

flowchart TB subgraph Partitioned["Network partition"] C1[Clients / AZ A] N1[Shard primary A] N2[Replica B - possibly stale] end C1 -->|read| N2 C1 -->|ideal write| N1 N1 -.->|replication blocked| N2 C1 -->|"choice: fail vs stale"| X{Serve stale read?} X -->|"AP: yes"| R[Return replica value + optional version] X -->|"CP: no"| E[Error / retry / redirect]

Interview punchline: Caches bias toward availability and partition tolerance for reads; correctness comes from TTL, invalidation, versioning, and application-level invariants, not from pretending the cache is a strongly consistent store.

Data Model & Schema

Caches are schema-light but not schema-free: bad key design causes hot shards, inefficient invalidation, and unbounded memory.

Logical model

Concept Description
Key Opaque identifier; often hierarchical (tenant:entity:id:field)
Value Blob or structured payload; size affects network, eviction, replication
Metadata TTL (absolute or sliding), version/CAS token, logical size, optional content-type
Namespacing Per-tenant or per-service prefix to avoid collisions and to enable SCAN-style maintenance
Cluster state Slot or shard ID, primary/replica role, epoch/config version — not stored per key but drives routing

Value shapes (logical):

Pattern Contents Trade-off
Opaque blob Serialized domain object Simple; whole-value refresh on any field change
Hash fields Column-like subkeys Partial updates; smaller invalidation granularity
Aggregates Precomputed denormalized read models Fast reads; write amplification on upstream changes
IDs + pointers Cache stores IDs, client hydrates Lower memory; extra round trips

Metadata fields (typical)

Field Purpose
ttl_seconds or expire_at Bound staleness; drive lazy expiry
version / etag Optimistic concurrency; detect stale writes
byte_size Enforce max value limits; observability
compression flag Alternate encoding for large values

Redis-specific key conventions

Convention Example Reason
Colon hierarchy user:123:profile Human-readable; prefix scanning (SCAN user:123:*)
Hash tags (cluster) foo{user123}:orders Forces same slot for related keys (Redis Cluster)
Version suffix config:v3:feature_x Blue/green config rollouts
Negative cache user:404:999 with short TTL Avoid repeated DB misses (document carefully to avoid poisoning)

Tip

State maximum key and value sizes you would enforce (e.g. KBs for keys, MB cap per value with review) to prevent big key incidents.

Cluster state (what lives in “control plane” memory): slot → node map, replica pointers, config epoch, migration state — clients cache this and refresh on MOVED / periodic CLUSTER SLOTS-style discovery.


Step 2: Back-of-Envelope Estimation

Assumptions

- Cluster aggregate: 10M requests/second (read + write combined)
- Total cache capacity: 1 TB across all nodes
- Average key size: 50 bytes
- Average value size: 1 KB
- Replication factor: 3 (for HA; logical data size × 3 in RAM for full copies — clarify “1 TB user-visible” vs “1 TB raw RAM”)

Below we treat 1 TB as logical cached data (one copy); with 3× replication, raw RAM ≈ 3 TB for full replicas. Adjust in interview if the interviewer says “1 TB total RAM.”

Per-Entry Size

Per key-value (metadata overhead ~32 bytes illustrative):
  key:     50 B
  value:   1024 B
  overhead: 32 B
  ─────────────
  ~1.1 KB per entry average

Entries that fit in 1 TB logical:
  1 TB / 1.1 KB ≈ 953 million entries (order-of-magnitude: ~1B keys)

Nodes and Memory

Assume 64 GB RAM per cache server (common cloud shape); usable for data ~80% after OS, JVM/runtime, replication buffers → ~51 GB net per node.

Logical 1 TB / 51 GB ≈ 20 nodes (single copy narrative)

With 3× replication of all data:
  3 TB / 51 GB ≈ 60 data-bearing processes (or 20 machines × 3 replicas spread across racks/AZs)

Interview answer: “Roughly 20 shards for 1 TB logical at ~50 GB effective per node; triple that replica footprint if we store three full copies. Alternatively, use primary + async replicas with partial read scaling.”

Network Bandwidth

10M ops/sec × (50 + 1024) bytes ≈ 10.7 GB/s aggregate cluster egress+ingress for raw payloads
Per node (20 nodes): ~535 MB/s — requires 10 Gbps NICs and careful placement

Add protocol overhead (~10–30%), replication traffic, and gossip: plan 25–40% headroom.

Warning

Hot keys can saturate a single node’s NIC or CPU long before the cluster average looks unhealthy. Always mention hot-key mitigation in estimation follow-up.


Step 3: High-Level Design

Architecture (Mermaid)

flowchart LR subgraph Clients A[App Servers] end subgraph ClientLib["Client Library"] B[Consistent Hash / Slot Map] C[Connection Pool] end subgraph Cluster["Cache Cluster"] N1[Node 1] N2[Node 2] N3[Node N] end subgraph Persistence["Persistence Layer"] P1[(RDB Snapshot)] P2[(Append Log AOF)] end A --> B B --> C C --> N1 C --> N2 C --> N3 N1 --> P1 N2 --> P2 N3 --> P2
  • Client library computes partition (hash ring or slot) and talks to the right node(s).
  • Replication (leader/follower or chained) sits inside each partition’s story; diagram simplified.
  • Persistence is optional per process; many deployments use replicas as durability and async disk for point-in-time recovery.

Cache Population Patterns

Pattern Flow Best when
Cache-aside App reads cache → on miss, read DB → populate cache Default; app controls consistency
Read-through Cache misses trigger loader callback Centralized loading logic
Write-through Write to cache + DB synchronously Stronger consistency with DB
Write-behind Write to cache first; async flush to DB High write throughput; risk if cache dies

Deep dive with code appears in §4.5.

Note

Redis/Memcached are typically cache-aside or read-through from the application’s perspective; write-through/write-behind are application patterns using the cache, not properties of the server alone.


Step 4: Deep Dive

4.1 Hash Table & Memory Management

In-memory hash table: Open addressing or chaining; production systems optimize for cache-line locality, pointer overhead, and concurrency (sharding, lock-free where proven).

Memory allocation:

Approach Pros Cons
malloc/free per object Simple Fragmentation, slow
Slab allocation (Memcached) Fixed sizes, low fragmentation Internal fragmentation
jemalloc/tcmalloc Good general-purpose Still tune slab classes for tiny objects
Arena per thread / epoch Fast bump alloc Complex lifecycle
# LRU with OrderedDict
from __future__ import annotations

from collections import OrderedDict
from threading import RLock
from typing import Optional, Tuple
import time


class LRUCache:
    """O(1) LRU via OrderedDict (Python 3.7+ dict is ordered; we keep explicit LRU API)."""

    def __init__(self, capacity: int) -> None:
        self._cap = capacity
        self._data: OrderedDict[str, Tuple[float, bytes]] = OrderedDict()
        self._lock = RLock()

    def get(self, key: str) -> Optional[bytes]:
        with self._lock:
            if key not in self._data:
                return None
            exp, val = self._data[key]
            if exp and time.time() > exp:
                del self._data[key]
                return None
            self._data.move_to_end(key)
            return val

    def set(self, key: str, value: bytes, ttl_sec: Optional[float] = None) -> None:
        exp = time.time() + ttl_sec if ttl_sec else 0.0
        with self._lock:
            if key in self._data:
                del self._data[key]
            self._data[key] = (exp, value)
            self._data.move_to_end(key)
            while len(self._data) > self._cap:
                self._data.popitem(last=False)
// ConcurrentHashMap-based cache
import java.util.concurrent.ConcurrentHashMap;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative thread-safe string cache with TTL checked on read (lazy expiry). */
public final class ConcurrentStringCache {
    private static final class Entry {
        final byte[] value;
        final long expireAtNanos; // Long.MAX_VALUE if no expiry

        Entry(byte[] value, long expireAtNanos) {
            this.value = value;
            this.expireAtNanos = expireAtNanos;
        }

        boolean expired() {
            return expireAtNanos != Long.MAX_VALUE
                    && System.nanoTime() > expireAtNanos;
        }
    }

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    private final AtomicLong approximateBytes = new AtomicLong(0L);

    public Optional<byte[]> get(String key) {
        Entry e = map.get(key);
        if (e == null) return Optional.empty();
        if (e.expired()) {
            map.remove(key, e);
            return Optional.empty();
        }
        return Optional.of(e.value);
    }

    public void set(String key, byte[] value, Optional<Long> ttlMillis) {
        long exp = ttlMillis
                .map(ms -> System.nanoTime() + ms * 1_000_000L)
                .orElse(Long.MAX_VALUE);
        Entry neo = new Entry(value, exp);
        Entry old = map.put(key, neo);
        long delta = value.length - (old == null ? 0 : old.value.length);
        approximateBytes.addAndGet(delta);
    }

    public void delete(String key) {
        Entry old = map.remove(key);
        if (old != null) {
            approximateBytes.addAndGet(-old.value.length);
        }
    }
}
// Sharded maps with RWMutex
package cache

import (
    "hash/fnv"
    "sync"
    "time"
)

type entry struct {
    val     []byte
    deadline time.Time // zero means no TTL
}

type shard struct {
    mu sync.RWMutex
    m  map[string]entry
}

type ShardedCache struct {
    shards []*shard
}

func NewShardedCache(shardCount int) *ShardedCache {
    s := make([]*shard, shardCount)
    for i := range s {
        s[i] = &shard{m: make(map[string]entry)}
    }
    return &ShardedCache{shards: s}
}

func (c *ShardedCache) shardFor(key string) *shard {
    h := fnv.New32a()
    _, _ = h.Write([]byte(key))
    return c.shards[h.Sum32()%uint32(len(c.shards))]
}

func (c *ShardedCache) Get(key string) ([]byte, bool) {
    sh := c.shardFor(key)
    sh.mu.RLock()
    e, ok := sh.m[key]
    sh.mu.RUnlock()
    if !ok {
        return nil, false
    }
    if !e.deadline.IsZero() && time.Now().After(e.deadline) {
        sh.mu.Lock()
        delete(sh.m, key) // lazy expire
        sh.mu.Unlock()
        return nil, false
    }
    return e.val, true
}

func (c *ShardedCache) Set(key string, val []byte, ttl time.Duration) {
    sh := c.shardFor(key)
    sh.mu.Lock()
    defer sh.mu.Unlock()
    e := entry{val: val}
    if ttl > 0 {
        e.deadline = time.Now().Add(ttl)
    }
    sh.m[key] = e
}

Tip

For Java, ConcurrentHashMap reduces lock contention; for Go, shard count (e.g. 256–4096) is a common tuning knob. Python GIL makes CPU-bound eviction less parallel — for CPython, extension modules or multiprocessing may be needed at extreme scale.


4.2 Eviction Policies

Policy Idea Typical structure Cost
LRU Evict least recently used Hash map + doubly linked list O(1) get/set
LFU Evict least frequently used Min-heap, or frequency buckets + LRU lists O(1) amortized (bucket design)
TTL-first Prefer expiring soon Priority queue or lazy + periodic sweep Mixed
Random Evict random key None O(1), poor locality

TTL expiration:

Mode Behavior Trade-off
Lazy Check expiry on read; periodic sampling CPU-efficient; stale keys occupy memory
Active Background thread walks structure More CPU; tighter memory
# TTL-based eviction helper
import heapq
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(order=True)
class Expiring:
    deadline: float
    key: str = field(compare=False)


class TTLIndex:
    """Min-heap of keys by deadline — illustrative active expiry scanner."""

    def __init__(self) -> None:
        self._heap: List[Expiring] = []
        self._entry: Dict[str, bytes] = {}

    def set(self, key: str, val: bytes, ttl_sec: float) -> None:
        deadline = time.time() + ttl_sec
        heapq.heappush(self._heap, Expiring(deadline, key))
        self._entry[key] = val

    def purge_expired(self) -> int:
        now = time.time()
        removed = 0
        while self._heap and self._heap[0].deadline <= now:
            item = heapq.heappop(self._heap)
            if item.key in self._entry:  # stale heap entries possible without decrease-key
                del self._entry[item.key]
                removed += 1
        return removed
// LRU cache O(1)
import java.util.HashMap;
import java.util.Map;

public class LRUCache {
    static class Node {
        String key;
        String val;
        Node prev, next;
        Node(String k, String v) { key = k; val = v; }
    }

    private final Map<String, Node> index = new HashMap<>();
    private final Node head = new Node("", "");
    private final Node tail = new Node("", "");
    private final int capacity;

    public LRUCache(int capacity) {
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    public String get(String key) {
        Node n = index.get(key);
        if (n == null) return null;
        moveToHead(n);
        return n.val;
    }

    public void put(String key, String val) {
        Node n = index.get(key);
        if (n != null) {
            n.val = val;
            moveToHead(n);
            return;
        }
        if (index.size() == capacity) {
            Node evict = removeTail();
            index.remove(evict.key);
        }
        Node neo = new Node(key, val);
        index.put(key, neo);
        addToHead(neo);
    }

    private void addToHead(Node n) {
        n.prev = head;
        n.next = head.next;
        head.next.prev = n;
        head.next = n;
    }

    private void remove(Node n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
    }

    private void moveToHead(Node n) {
        remove(n);
        addToHead(n);
    }

    private Node removeTail() {
        Node last = tail.prev;
        remove(last);
        return last;
    }
}
// LFU (simplified frequency buckets)
package eviction

import "container/list"

type lfuItem struct {
    key   string
    freq  int
    elem  *list.Element // element in freqList[freq]
}

// LFUCache uses a map + per-frequency doubly linked lists for O(1) typical ops.
type LFUCache struct {
    cap   int
    size  int
    min   int
    nodes map[string]*lfuItem
    freqs map[int]*list.List
}

func NewLFU(capacity int) *LFUCache {
    return &LFUCache{
        cap:   capacity,
        nodes: make(map[string]*lfuItem),
        freqs: make(map[int]*list.List),
    }
}

func (c *LFUCache) Get(key string) (string, bool) {
    n, ok := c.nodes[key]
    if !ok {
        return "", false
    }
    c.increment(n)
    return n.key, true // value stored separately in real impl
}

func (c *LFUCache) increment(it *lfuItem) {
    // remove from old freq list
    oldList := c.freqs[it.freq]
    oldList.Remove(it.elem)
    if oldList.Len() == 0 {
        delete(c.freqs, it.freq)
        if c.min == it.freq {
            c.min++
        }
    }
    it.freq++
    newList, ok := c.freqs[it.freq]
    if !ok {
        newList = list.New()
        c.freqs[it.freq] = newList
    }
    it.elem = newList.PushFront(it.key)
}

func (c *LFUCache) evict() {
    list := c.freqs[c.min]
    back := list.Back()
    list.Remove(back)
    delete(c.nodes, back.Value.(string))
    c.size--
}

Note

Production LFU often uses W-TinyLFU (Redis 4+ optional eviction) to handle frequency burst vs LRU recency — mention by name for bonus points.


4.3 Data Partitioning (Sharding)

  • Consistent hashing: Keys map to a ring; nodes own arc ranges. Adding/removing a node moves only neighboring keys (not full rehash).
  • Virtual nodes: Each physical node appears as many points on the ring to balance load when counts are uneven.
  • Hash slots (Redis Cluster): Fixed 16384 slots; each node holds a slot range. Rebalancing moves slots, not individual keys.
  • Hash tags: {user123}:profile forces same slot for related keys (Redis).

Mermaid: hash ring

flowchart LR subgraph Ring["Consistent Hash Ring 0..2^32-1"] direction clockwise V1((v1)) V2((v2)) V3((v3)) V4((v4)) V1 --> V2 --> V3 --> V4 --> V1 end V1 -.-> N1[Node A] V2 -.-> N1 V3 -.-> N2[Node B] V4 -.-> N2
// Slot-based routing (Redis Cluster–style)
public final class SlotRouter {
    public static final int SLOT_COUNT = 16_384;

    private final String[] slotToNode; // simplified: slot -> node id

    public SlotRouter(String[] slotToNode) {
        if (slotToNode.length != SLOT_COUNT) {
            throw new IllegalArgumentException("expected " + SLOT_COUNT + " slots");
        }
        this.slotToNode = slotToNode;
    }

    public static int crc16slot(byte[] key) {
        // Interview: cite Redis CRC16 + mod 16384; implementation omitted for brevity
        return Math.floorMod(crc16(key), SLOT_COUNT);
    }

    public String route(byte[] key) {
        int slot = crc16slot(key);
        return slotToNode[slot];
    }

    private static int crc16(byte[] bs) {
        int crc = 0;
        for (byte b : bs) {
            crc = ((crc << 8) ^ (b & 0xFF)) % 65536;
        }
        return crc;
    }
}
// Consistent hash ring (Ketama-style sketch)
package shard

import (
    "crypto/sha1"
    "hash"
    "sort"
    "strconv"
)

type Ring struct {
    hash   hash.Hash
    replicas int
    keys     []int
    hashMap  map[int]string
}

func NewRing(replicas int) *Ring {
    return &Ring{
        hash:     sha1.New(),
        replicas: replicas,
        hashMap:  make(map[int]string),
    }
}

func (r *Ring) Add(nodes ...string) {
    for _, node := range nodes {
        for i := 0; i < r.replicas; i++ {
            h := r.hashKey(strconv.Itoa(i) + node)
            r.keys = append(r.keys, h)
            r.hashMap[h] = node
        }
    }
    sort.Ints(r.keys)
}

func (r *Ring) hashKey(key string) int {
    r.hash.Reset()
    r.hash.Write([]byte(key))
    bs := r.hash.Sum(nil)
    return int(bs[3]) | int(bs[2])<<8 | int(bs[1])<<16 | int(bs[0])<<24
}

func (r *Ring) Get(key string) string {
    if len(r.keys) == 0 {
        return ""
    }
    h := r.hashKey(key)
    idx := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
    if idx == len(r.keys) {
        idx = 0
    }
    return r.hashMap[r.keys[idx]]
}

Warning

Rebalancing: Moving slots or ring ranges causes migration windows; clients must handle MOVED/ASK redirects (Redis) or dual-read strategies during transitions.


4.4 Replication & High Availability

Aspect Async replication Sync replication
Latency Lower write latency Higher (wait for followers)
Durability Risk of lost writes on failover Stronger
Availability Higher Can block if follower slow

Leader–follower: One primary per shard; followers apply a replication log (stream of commands or snapshot + delta).

Failover / leader election: Often implemented with Raft or ZooKeeper/etcd for coordination, or sentinel-style external watchers.

Split-brain prevention: Quorum + fencing (e.g., epoch numbers); old primary must not accept writes after losing leadership.

Mermaid: replication topology

flowchart TB subgraph AZ1 L[Leader] end subgraph AZ2 F1[Follower 1] end subgraph AZ3 F2[Follower 2] end L -->|"replication log"| F1 L -->|"replication log"| F2 C[Clients] -->|writes| L C -->|"reads optional"| F1
// Illustrative replication manager
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ReplicationManager {
    public record Command(String op, String key, byte[] value) {}

    private final BlockingQueue<Command> outbound = new LinkedBlockingQueue<>();

    public void replicate(Command cmd) throws InterruptedException {
        // Leader: append to local WAL, then async fan-out
        outbound.put(cmd);
    }

    public void followerLoop(Runnable applier) {
        new Thread(() -> {
            while (true) {
                try {
                    Command c = outbound.take();
                    applier.run(); // apply replicated op
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "follower").start();
    }
}

Tip

Mention partial resync (PSYNC in Redis) vs full sync after long downtime — interviewers like operational realism.


4.5 Cache Patterns

Pattern Use case Pros Cons
Cache-aside General purpose Simple; app controls Stale data; stampede risk
Read-through Shared loader Centralized invalidation Loader becomes bottleneck
Write-through Strong read-after-write DB and cache aligned Higher write latency
Write-behind Write-heavy bursts Fast writes Data loss window; complexity
# Write-behind with queue (simplified)
import queue
import threading
from typing import Callable


class WriteBehind:
    def __init__(self, flush: Callable[[str, bytes], None], workers: int = 2) -> None:
        self._q: "queue.Queue[tuple[str, bytes]]" = queue.Queue(maxsize=10_000)
        self._flush = flush
        for _ in range(workers):
            threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self) -> None:
        while True:
            key, val = self._q.get()
            self._flush(key, val)
            self._q.task_done()

    def enqueue(self, key: str, val: bytes) -> None:
        self._q.put((key, val))
import java.util.Optional;

// Cache-aside (illustrative)
public class CacheAsideUserRepo {
    private final ConcurrentStringCache cache;
    private final UserDb db;

    public CacheAsideUserRepo(ConcurrentStringCache cache, UserDb db) {
        this.cache = cache;
        this.db = db;
    }

    public byte[] getUser(String id) {
        return cache.get("user:" + id).orElseGet(() -> {
            byte[] row = db.loadUser(id);
            if (row != null) {
                cache.set("user:" + id, row, Optional.of(60_000L)); // 60s TTL
            }
            return row;
        });
    }

    public void updateUser(String id, byte[] row) {
        db.saveUser(id, row);
        cache.delete("user:" + id);
    }

    public interface UserDb {
        byte[] loadUser(String id);
        void saveUser(String id, byte[] row);
    }
}
// Write-through sketch
type WriteThrough struct {
    cache *ShardedCache
    db    interface {
        Save(key string, val []byte) error
    }
}

func (w *WriteThrough) Set(key string, val []byte, ttl time.Duration) error {
    if err := w.db.Save(key, val); err != nil {
        return err
    }
    w.cache.Set(key, val, ttl)
    return nil
}

4.6 Consistency & Concurrency

  • Eventual consistency: Replicas converge over time; reads may return stale values unless read-your-writes is engineered (sticky sessions, sync replication, or quorum reads).
  • CAS: SET key value if current == Xoptimistic concurrency; conflicts retry.
  • Distributed locks: Redlock (multiple independent Redis instances) is controversial; many teams prefer etcd/Consul leases or DB advisory locks for fencing.
  • Locking: Optimistic (version/CAS) vs pessimistic (locks) — trade latency vs conflict rate.
# Optimistic versioning
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class VerVal:
    ver: int
    data: bytes


class VersionedStore:
    def __init__(self) -> None:
        self._m: Dict[str, VerVal] = {}

    def cas(self, key: str, expected_ver: int, new_data: bytes) -> bool:
        cur = self._m.get(key)
        if cur is None or cur.ver != expected_ver:
            return False
        self._m[key] = VerVal(cur.ver + 1, new_data)
        return True
// CAS-style update
import java.util.concurrent.atomic.AtomicReference;

public class CasValue {
    private final AtomicReference<Versioned> ref;

    public CasValue(byte[] initial) {
        this.ref = new AtomicReference<>(new Versioned(1, initial));
    }

    public boolean compareAndSet(byte[] expect, byte[] update) {
        Versioned cur = ref.get();
        if (!java.util.Arrays.equals(cur.data, expect)) {
            return false;
        }
        Versioned next = new Versioned(cur.version + 1, update);
        return ref.compareAndSet(cur, next);
    }

    private record Versioned(int version, byte[] data) {}
}
// Lock with lease (single-node demo; not full Redlock)
package distlock

import (
    "context"
    "time"
)

type Locker interface {
    TryLock(ctx context.Context, key string, ttl time.Duration) (unlock func(), ok bool)
}

// SingleRedisLocker is illustrative; production uses random token + Lua compare-and-del.
type SingleRedisLocker struct{}

func (SingleRedisLocker) TryLock(ctx context.Context, key string, ttl time.Duration) (func(), bool) {
    // SET key token NX PX ttl — omitted: connect to Redis
    return func() {}, true
}

4.7 Persistence

Mode Behavior Pros Cons
Snapshot (RDB) Fork + periodic dump Compact; fast restart if snapshot fresh Data loss since last snapshot
Append-only (AOF) Log every write Finer granularity Larger files; replay time
Hybrid AOF + periodic rewrite Balance Operational complexity

Recovery: Start from latest RDB, then replay AOF after RDB timestamp if hybrid; or replay AOF from empty.

Note

Redis supports RDB, AOF, and disabling persistence entirely for pure cache tiers.


4.8 Client-Side Optimization

Technique Benefit Caveat
Connection pooling Amortize TCP/TLS setup Pool sizing vs file descriptors
Pipelining Batch commands, fewer round trips Head-of-line blocking on single connection
Client-side cache Fewer network hops Stale data; invalidation story
Compression Less bandwidth CPU cost; only for large values
# Pipeline-style batching (conceptual)
from typing import Iterable, List, Tuple, Protocol


class RedisLike(Protocol):
    def get(self, key: str) -> bytes | None: ...
    def set(self, key: str, val: bytes) -> None: ...


def pipeline_get(client: RedisLike, keys: Iterable[str]) -> List[bytes | None]:
    # Real: redis.pipeline(); here: sequential illustration
    return [client.get(k) for k in keys]


def pipeline_set(client: RedisLike, pairs: List[Tuple[str, bytes]]) -> None:
    for k, v in pairs:
        client.set(k, v)
// Connection pool (Hikari-style naming only; illustrative DataSource)
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;

public class SimplePool implements AutoCloseable {
    private final Deque<Connection> pool = new ArrayDeque<>();
    private final String url;
    private final int max;

    public SimplePool(String url, int max) {
        this.url = url;
        this.max = max;
    }

    public synchronized Connection borrow() throws SQLException {
        if (!pool.isEmpty()) {
            return pool.pollFirst();
        }
        return DriverManager.getConnection(url);
    }

    public synchronized void release(Connection c) {
        if (pool.size() < max) {
            pool.addLast(c);
        } else {
            try { c.close(); } catch (SQLException ignored) { }
        }
    }

    public void close() {
        pool.forEach(c -> { try { c.close(); } catch (SQLException ignored) { } });
        pool.clear();
    }
}

Warning

For Redis, use Lettuce/Jedis pool or redis-py connection pools — the above shows pooling pattern, not Redis protocol.


4.9 Pub/Sub & Notifications

  • Pub/Sub: Fire-and-forget broadcast; no persistence of messages (Redis Pub/Sub semantics).
  • Keyspace notifications: Server publishes events on key changes; clients subscribe for cache invalidation or auditing.
# Minimal subscriber sketch
from collections import defaultdict
from typing import DefaultDict, List
import threading


class SimplePubSub:
    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._topics: DefaultDict[str, List["queue.Queue[str]"]] = defaultdict(list)

    def publish(self, topic: str, msg: str) -> None:
        import queue

        with self._lock:
            subs = list(self._topics[topic])
        for q in subs:
            try:
                q.put_nowait(msg)
            except queue.Full:
                pass
// Minimal subscriber sketch
import java.util.concurrent.*;
import java.util.concurrent.CopyOnWriteArrayList;

public class SimplePubSub {
    private final ConcurrentHashMap<String, CopyOnWriteArrayList<BlockingQueue<String>>> topics
            = new ConcurrentHashMap<>();

    public void subscribe(String topic, BlockingQueue<String> q) {
        topics.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(q);
    }

    public void publish(String topic, String msg) {
        var subs = topics.get(topic);
        if (subs == null) return;
        subs.forEach(q -> q.offer(msg));
    }
}
// Minimal in-memory pub/sub
package pubsub

import "sync"

type Bus struct {
    mu   sync.RWMutex
    subs map[string][]chan string
}

func NewBus() *Bus {
    return &Bus{subs: make(map[string][]chan string)}
}

func (b *Bus) Subscribe(topic string) <-chan string {
    ch := make(chan string, 16)
    b.mu.Lock()
    b.subs[topic] = append(b.subs[topic], ch)
    b.mu.Unlock()
    return ch
}

func (b *Bus) Publish(topic, msg string) {
    b.mu.RLock()
    defer b.mu.RUnlock()
    for _, ch := range b.subs[topic] {
        select {
        case ch <- msg:
        default:
        }
    }
}

4.10 Cache Stampede, Thundering Herd & Hot Keys

Cache stampede: Many concurrent requests miss the same key (for example TTL-aligned expiry), and each hits the origin — database overload.

Thundering herd: Many clients retry or wake simultaneously after expiry, amplifying load.

Mitigations (compare):

Technique Mechanism Pros Cons
Singleflight / request coalescing First miss holds work; others wait on same future Simple; large win Latency for waiters
Probabilistic early expiration (jitter) Refresh before hard TTL with randomness Spreads refresh Slightly stale reads
Mutex per key (local) Only one loader per process Easy Does not cross processes
Distributed lock One loader cluster-wide Strong dedup Lock service failure modes
Pre-warming Scheduled refresh for known keys Predictable Waste if unused
# Asyncio lock-per-key (illustrative)
import asyncio
from typing import Awaitable, Callable, Dict, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class AsyncSingleFlight:
    def __init__(self) -> None:
        self._locks: Dict[K, asyncio.Lock] = {}
        self._map_lock = asyncio.Lock()

    async def _lock_for(self, key: K) -> asyncio.Lock:
        async with self._map_lock:
            if key not in self._locks:
                self._locks[key] = asyncio.Lock()
            return self._locks[key]

    async def do(self, key: K, factory: Callable[[], Awaitable[V]]) -> V:
        lk = await self._lock_for(key)
        async with lk:
            return await factory()
// Singleflight-style loader (per-key)
import java.util.concurrent.*;

public class StampedeGuard<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inflight = new ConcurrentHashMap<>();

    public V get(K key, Loader<K, V> loader) throws Exception {
        CompletableFuture<V> fut = inflight.computeIfAbsent(
                key, k -> CompletableFuture.supplyAsync(() -> {
                    try {
                        return loader.load(k);
                    } catch (RuntimeException e) {
                        throw e;
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                }));
        try {
            return fut.get();
        } finally {
            inflight.remove(key, fut);
        }
    }

    @FunctionalInterface
    public interface Loader<K, V> {
        V load(K key) throws Exception;
    }
}
// singleflight pattern
package stampede

import "golang.org/x/sync/singleflight"

type Guard struct {
    g singleflight.Group
}

func (g *Guard) Do(key string, fn func() (interface{}, error)) (interface{}, error) {
    return g.g.Do(key, fn)
}

Note

Prefer golang.org/x/sync/singleflight in production; the snippet encodes one origin load per key at a time.

Hot key: operational patterns

Pattern Description
Read from replica Spread read QPS across followers for the same logical key
Application sharding popular:item:{0..N} round-robin; merge in client
Local L1 Process-local cache with short TTL

4.11 Redis Cluster Client Routing (MOVED / ASK)

When the cluster reshards, clients may hit the wrong node. Redis responds with:

Reply Meaning
MOVED slot ip:port Slot permanently owned elsewhere — update slot map
ASK ip:port Temporary migration — send with ASKING for this query only

Client responsibilities: maintain slot → node cache, refresh on MOVED, follow ASKING plus one-off redirect during migration.

sequenceDiagram participant C as Client participant N1 as Node A stale route participant N2 as Node B owner C->>N1: GET foo N1-->>C: MOVED 3999 10.0.0.12:6379 C->>N2: GET foo N2-->>C: value

Warning

Clients that ignore MOVED look flaky under cluster operations — mention cluster-aware clients for Redis Cluster.


4.12 Distributed Locks & Redlock (Interview Nuance)

Goal: Exclusive access to a resource across processes.

Approach Favor when
Lease + fencing token Correctness under GC pauses and clock skew
etcd / Consul lease Strong coordination guarantees
Redis SET NX PX + Lua unlock Single Redis; understand failure modes
Redlock (N independent Redis) Controversial; Kleppmann critique — know trade-offs

Talking points: debates center on clock drift, long GC, and fencing so stale holders cannot commit side effects after failover.


Step 5: Scaling & Production

Cluster Management

Concern Mitigation
Add/remove node Consistent hashing migration; Redis reshard; minimize moved keys
Hot keys Client-side local cache; read replicas; key splitting (user:123:partN)
OOM maxmemory, eviction policy, reject vs evict
Uneven shard load Virtual nodes; dynamic rebalancing; monitor key distribution

Capacity & SLO Planning

SLO Example target Measurement
Read p99 < 1 ms intra-AZ Client-side histogram
Write p99 < 2 ms with sync replication Server-side
Availability 99.99% Multi-AZ plus failover
Durability RPO < 5 min Snapshot interval plus AOF

Note

RPO/RTO depend on persistence mode: pure in-memory cache may imply large RPO unless replicated with sync cross-region semantics (expensive).

Failure Handling

Failure Detection Recovery
Process crash Health checks / sentinel Promote replica; reroute clients
Network partition Timeouts, partial quorum Avoid split-brain with epochs
Disk full (persistence) Metrics alerts Rotate logs; expand volume
Slow replica Replication lag metric Throttle writes or replace node

Gossip (e.g., cluster membership): scalable failure detection; trade-off: eventual view of cluster vs strong consistency of routing tables.

Multi-AZ & Disaster Recovery

Topology Pros Cons
Active/passive region Simpler writes Failover time
Active/active Lower RTO Conflict resolution, cache coherence

Monitoring

Metric Why
Hit rate / miss rate Cache effectiveness
Eviction rate Working set vs memory pressure
Memory usage Capacity planning
Latency p50/p99 SLA
Connections Pool leaks, connection storms
Replication lag Stale reads, failover risk
CPU / NIC saturation Hot key or big key symptoms
Commands/sec by command Detect KEYS *, FLUSHALL, accidental scans

Tip

Dashboard hit ratio without latency is misleading — optimize for tail latency under load.

Production Checklist (Mermaid)

flowchart TD A[Define SLOs] --> B[Partitioning + replication] B --> C[Eviction + maxmemory policy] C --> D[Client pooling + timeouts + retries] D --> E[Runbooks failover + resharding] E --> F[Dashboards lag + p99 + evictions]

Interview Tips

Topic What to say
Cache stampede Many requests miss together → thundering DB load. Mitigation: singleflight, probabilistic early expiration, locks, pre-warming.
Thundering herd Many clients wake at once; use jitter, backoff, coalescing.
Hot key Single shard/network limit; replication, application cache, split key.
Big key Slow serialization, eviction, replication; split value, compression, design limits.
Consistency Eventual by default; linearizable cache is rare and expensive.
Eviction policy LRU vs LFU vs TTL — tie to access pattern (bursty vs steady).
Persistence RDB vs AOF vs none — RPO/RTO and replay time.
Cluster ops Resharding, MOVED, replica promotion — client must be cluster-aware.
Memory fragmentation Slab allocators, MEMORY PURGE (Redis), jemalloc tuning.

Follow-Up Questions You Can Ask the Interviewer

Question Why it helps
Is cross-region consistency required? Drives sync replication vs CRDT-style acceptance
Are values binary or structured? Compression, serialization, memcpy cost
Is Pub/Sub durability required? Redis Pub/Sub is fire-and-forget; use Kafka for durable streams
What is the acceptable stale read window? Cache-aside TTL vs read-through with versioning

Note

Tie answers to observability (metrics above) and failure modes (split-brain, replication lag). That signals production maturity beyond textbook APIs.


Summary

Designing a distributed cache blends data structures (hash tables, LRU/LFU), partitioning (consistent hashing, slots), replication (leader/follower, Raft), persistence (RDB/AOF trade-offs), and client patterns (cache-aside, pipelining). Ground discussion in latency, memory, hot spots, and operational failure — that is what distinguishes a strong interview performance.

Real-world anchors: Redis (data structures, persistence, cluster), Memcached (multithreaded simplicity, slab allocator), and large-scale users (Twitter, GitHub, Snapchat) demonstrate that tail latency, hot keys, and cluster operations matter as much as API design.