Design a URL Shortener (TinyURL/Bitly)¶

What We're Building¶

A URL shortener takes a long, unwieldy URL like:

https://example.com/articles/2024/how-to-prepare-for-system-design-interviews?ref=social&utm_source=twitter&utm_medium=post&campaign_id=12345

And converts it to something short and shareable:

https://tiny.co/aB3xYz1

When someone clicks the short link, they get redirected to the original URL.

Why URL Shorteners Exist¶

Character limits: Twitter (now X) famously had 140-character limits. Long URLs consumed precious space.
Tracking: Short URLs enable click analytics—who clicked, when, from where.
Aesthetics: tiny.co/sale looks better in marketing than a 200-character URL.
Memorability: Short, custom URLs (bit.ly/my-resume) are easier to remember and share verbally.

Real-world examples: TinyURL (2002), Bitly, t.co (Twitter), goo.gl (Google, discontinued), rb.gy

This is one of the most common system design interview questions because it covers core concepts: hashing, databases, caching, and scaling—while being simple enough to discuss in 45 minutes.

Step 1: Requirements Clarification¶

Before designing anything, clarify what you're building. This demonstrates maturity and prevents wasted effort.

Questions to Ask the Interviewer¶

Question	Why It Matters	Typical Answer
How many URLs shortened per day?	Determines database size, write throughput	100M/day (Twitter-scale) to 1K/day (startup)
What's the read-to-write ratio?	Caching strategy, read replica needs	Usually 100:1 or higher (many more clicks than creates)
Should short URLs expire?	Storage growth, cleanup jobs	Sometimes. Enterprise: yes. Consumer: often no.
Do we need analytics?	Adds complexity, async processing	Usually yes (that's how Bitly makes money)
Can users create custom short URLs?	Collision handling, validation	Nice-to-have, but common
What's the maximum URL length?	Database schema, validation	Typically 2,048 characters (browser limit)

Functional Requirements¶

Requirement	Priority	Notes
Shorten a long URL	Must have	Core feature
Redirect short URL to original	Must have	Core feature
Custom aliases (vanity URLs)	Nice to have	`tiny.co/my-resume`
Click analytics	Nice to have	Count, referrer, geography
URL expiration	Nice to have	Auto-delete after N days
API access	Nice to have	Programmatic creation

Non-Functional Requirements¶

Requirement	Target	Rationale
Availability	99.99% for redirects	If redirects fail, all shared links break
Redirect latency	< 50ms	Users expect instant redirect
Shortening latency	< 200ms	Can be slightly slower than reads
Durability	Never lose a mapping	Losing URLs breaks all links
Scalability	Handle 100K+ redirects/sec	Popular links go viral

API Design¶

Let's define the core APIs:

Create Short URL:

POST /api/v1/urls
Content-Type: application/json
Authorization: Bearer <token>

{
    "long_url": "https://example.com/very/long/path?with=params",
    "custom_alias": "my-link",     // Optional
    "expires_at": "2025-12-31"     // Optional
}

Response:

{
    "short_url": "https://tiny.co/aB3xYz1",
    "long_url": "https://example.com/very/long/path?with=params",
    "short_code": "aB3xYz1",
    "created_at": "2024-01-15T10:00:00Z",
    "expires_at": "2025-12-31T00:00:00Z"
}

Redirect (handled by browser):

GET /aB3xYz1
→ 302 Found
→ Location: https://example.com/very/long/path?with=params

Get Analytics:

GET /api/v1/urls/aB3xYz1/stats

Technology Selection & Tradeoffs¶

Interviewers often ask why you picked specific technologies. The URL shortener is read-heavy (redirects ≫ creates), needs strong uniqueness on write (no duplicate short codes), and benefits from predictable hot-key caching. Below are structured comparisons you can cite in an interview; section 4.2 revisits SQL vs NoSQL in the context of schema design.

Database (read-heavy workload)¶

Criterion	PostgreSQL	MySQL (InnoDB)	DynamoDB	Cassandra
Read path fit	Excellent with indexes + replicas; mature query planner	Excellent; similar to PG for simple key lookups	Excellent at key-value get by partition key; single-digit ms at scale	Excellent for wide-partition reads; tunable consistency
Strong uniqueness (`short_code`)	Native `UNIQUE`, transactional	Native `UNIQUE`, transactional	Conditional writes / idempotent keys; app must handle races carefully	Lightweight transactions / compare-and-set; more app logic
Horizontal write scale	Sharding is manual (or Citus); vertical + replicas first	Same pattern	Native partitioning; on-demand throughput	Native; best for multi-DC active-active
Ops & ecosystem	Rich extensions, JSON, great HA story (Patroni, etc.)	Ubiquitous; slightly fewer advanced types	Fully managed on AWS; vendor lock-in	Complex ops unless managed; excellent for huge scale
Consistency model	Serializable / repeatable read (tunable)	InnoDB ACID	Per-item strong reads available; global secondary indexes eventual	Tunable per read (ONE, QUORUM, ALL)
Best when	Default for most designs; strong consistency + SQL	Team already standardized on MySQL	AWS-native, huge partition-key QPS, minimal joins	Extreme write scale, multi-region, time-series style patterns

Read-heavy nuance: For redirects you mostly do SELECT long_url WHERE short_code = ?—any of these can work if cache hit rate is high. The harder part is write correctness (one canonical mapping per code) and operational simplicity at your target scale.

Cache layer¶

Criterion	Redis	Memcached	CDN / edge cache
Data structures	Strings, hashes, sorted sets, streams	Simple key-value	Cached HTTP responses / edge KV
Persistence & HA	Optional AOF/RDB; Redis Cluster / Sentinel	RAM-only; client-side sharding	POP replication; origin on miss
Eviction & TTL	Rich TTL, LRU policies	LRU	HTTP `Cache-Control`; URL-specific policies
Best for	Redirect lookup cache, rate limits, locks, analytics buffers	Pure microsecond KV when you need simplicity	Static or cacheable redirect responses, geo latency reduction; careful with personalized/analytics accuracy
Tradeoff	Slightly more features = slightly more complexity	No persistence; fewer features	Stale mappings until purge; harder to invalidate on URL update/delete

Interview tip: Say Redis (or compatible) for application cache (cache-aside), optional CDN in front for popular short links to shave RTT globally—pair CDN with short TTLs or purge APIs when mappings change.

ID generation for `short_code`¶

Approach	Uniqueness	Coordination	Predictability	Notes
Hash-based (truncate MD5/SHA)	Risk of collisions at short length; must detect & retry	Low	Same URL → same code (dedupe) or varies with salt	Simple; collision handling adds latency and complexity
Counter + Base62	Strong if counter is single source of truth	High if one DB sequence; low with ranges / Snowflake	Sequential unless obfuscated	Industry standard; combine with Snowflake or range allocator
UUID (v4 / v7)	Practically unique	None	Not URL-short without encoding	Longer as string; usually encoded → longer than 7 chars unless truncated (collision risk)
Snowflake-style	Strong per machine + time	None per ID (clock sync matters)	Roughly time-ordered	Scales horizontally; encode to Base62 for short codes

Our choice¶

Layer	Choice	Rationale
Primary store	PostgreSQL (or MySQL if org-standard)	ACID `UNIQUE` on `short_code`, straightforward HA, great read replica story for redirect path; sufficient until sharding is truly required.
Cache	Redis (cluster if needed)	Rich TTL, HA options, rate limiting & locks for thundering herd; Memcached is fine if you only need dumb KV and have other solutions for limits.
Edge	CDN optional	Cuts latency for hot links; use conservative caching or active purge when edits matter.
IDs	Snowflake (or DB sequence / range allocation) for production scale; hash only with explicit collision handling	Snowflake avoids a single DB bottleneck while keeping uniqueness; counter+Base62 from one DB is acceptable at smaller scale.

This aligns with the architecture diagram in Step 3 (PostgreSQL + Redis + async analytics).

CAP Theorem Analysis¶

CAP (Consistency, Availability, Partition tolerance) applies to distributed stores. In practice: network partitions happen, so systems choose between CP (sacrifice availability during partition) and AP (sacrifice strong consistency during partition). The URL shortener is interesting because different operations want different tradeoffs.

Operation	Desired behavior	CAP-style reading	Why
Redirect (read)	Always answer quickly; stale cache occasionally OK	AP for the serving path	Users tolerate rare slightly stale reads better than widespread failed redirects; cache + async replication favor availability and latency.
Create short URL (write)	Never hand out the same `short_code` to two URLs	CP at the persistence boundary	Uniqueness is a correctness property: you need atomic insert or conditional write against an authoritative source.
Analytics	Complete eventually	AP	Counts can lag; use async pipelines (Kafka → ClickHouse).

Per-operation narrative:

Redirects: Load balancer + app + Redis strive for high availability. If Redis is empty after a failover, you degrade to PostgreSQL—slightly higher latency, still available if the DB is up. During a partition, you might serve from cache (A) while replicas converge (eventual C for non-critical fields).
Creates: The insert into urls with UNIQUE(short_code) (or conditional put in DynamoDB) is the consistency choke point: two concurrent creators must not succeed with the same code. That is a CP moment: you wait for the authoritative node / quorum—not “best effort duplicate.”
Analytics: AP—duplicate or lost events can be bounded with idempotent consumers; SLAs are softer (see below).

flowchart TB subgraph AP_path["AP-leaning: redirect read path"] R[Client] --> Edge[CDN / LB] Edge --> App[Redirect service] App --> Cache[(Redis)] App --> DB[(Primary / replicas)] end subgraph CP_moment["CP moment: create mapping"] C[Create API] --> Auth[Transactional insert UNIQUE short_code] Auth --> Primary[(Authoritative store)] end subgraph Analytics_AP["AP: analytics pipeline"] App --> Q[Queue] Q --> OLAP[(Analytics store)] end

One-liner for interviews: We bias AP on the hot read path for availability and latency, but we enforce CP semantics at write time so short codes are never duplicated.

SLA and SLO Definitions¶

SLA = contract with users (often includes credits). SLO = internal target you measure against. Error budget = allowable bad events (e.g., 0.01% unavailability per month) before you freeze features and fix reliability.

Service-level objectives (examples)¶

Area	SLO	Measurement window	Notes
Redirect latency	p99 < 50 ms (server-side, excluding client RTT)	Rolling 30 days	Aligns with NFR; track cache hit ratio correlation.
Redirect availability	99.99% successful HTTP 302/301 (excluding 404 for bad codes)	Monthly	Matches “links must work” expectation; exclude abuse-driven 4xx if policy-defined.
URL creation latency	p99 < 200 ms	Rolling 30 days	Includes ID gen + DB commit; stricter if synchronous custom alias checks.
Data durability	RPO ≤ 1 min, RTO ≤ 15 min for mapping data	Per disaster scenario	Sync replica → lower RPO; quantify last mapping loss risk.
Analytics accuracy	≤ 0.1% discrepancy vs raw log reconciliation	Daily	Async pipeline; bounded delay (e.g., 5 min freshness) stated separately.

Error budget policy¶

Budget: For 99.99% availability, allowed unavailability ≈ 4.38 minutes/month. Spend budget on planned maintenance only with explicit approval; redirect burns budget fastest—prioritize alerts on redirect SLO.
If budget is exhausted: Stop non-critical releases; focus on HA, load tests, cache failover; consider tighter redirect-only on-call until burn rate recovers.
Latency SLOs: Treat p99 redirect regressions as launch-blocking if sustained; they often precede availability incidents.

These SLOs connect directly to monitoring in Step 5 (p99 redirect, success rate, cache hit ratio).

Step 2: Back-of-Envelope Estimation¶

Capacity planning is crucial. Let's do the math for a Twitter-scale service.

Traffic Estimation¶

Given:
- 100 million URLs shortened per month
- 100:1 read-to-write ratio (for every URL created, 100 clicks)

Writes:
- 100M / (30 days × 24 hours × 3600 seconds) = ~40 URLs/second
- Peak (5× average): 200 URLs/second

Reads:
- 40 × 100 = 4,000 redirects/second
- Peak: 20,000 redirects/second

Storage Estimation¶

Per URL record:
- Short code: 7 bytes
- Long URL: ~500 bytes (average)
- Timestamps: 16 bytes
- User ID: 8 bytes
- Other metadata: ~50 bytes
- Total: ~600 bytes

5 years of data:
- 100M/month × 12 months × 5 years = 6 billion URLs
- 6B × 600 bytes = 3.6 TB

With indexes and overhead: ~10 TB

Short Code Capacity¶

How many unique short codes can we generate?

Using Base62 (a-z, A-Z, 0-9):
- 6 characters: 62^6 = 56.8 billion combinations
- 7 characters: 62^7 = 3.5 trillion combinations

At 100M URLs/month:
- 6 chars last: 56.8B / (100M × 12) = ~47 years
- 7 chars last: Effectively unlimited

Decision: Use 7 characters for ample headroom

Bandwidth Estimation¶

Reads:
- 4,000 requests/sec × (200 bytes request + 500 bytes response) = 2.8 MB/sec
- Peak: 14 MB/sec

This is easily handled by modern networks.

Step 3: High-Level Architecture¶

Now let's design the system architecture.

Core Components¶

flowchart TB subgraph Clients["Client Layer"] Web[Web Browser] Mobile[Mobile App] API[API Client] end subgraph Edge["Edge Layer"] CDN[CDN / Edge Cache] LB[Load Balancer] end subgraph App["Application Layer"] APIGw[API Gateway] ShortenSvc[Shortening Service] RedirectSvc[Redirect Service] AnalyticsSvc[Analytics Service] end subgraph Data["Data Layer"] Cache[(Redis Cache)] DB[(PostgreSQL)] Analytics[(ClickHouse)] end subgraph Async["Async Processing"] Queue[Kafka] Worker[Analytics Worker] end Web --> CDN Mobile --> LB API --> LB CDN --> LB LB --> APIGw APIGw --> ShortenSvc APIGw --> RedirectSvc APIGw --> AnalyticsSvc ShortenSvc --> DB ShortenSvc --> Cache RedirectSvc --> Cache RedirectSvc --> DB RedirectSvc --> Queue Queue --> Worker Worker --> Analytics AnalyticsSvc --> Analytics

Two Core Flows¶

The system has two main operations:

Write Flow (URL Shortening): 1. Client sends long URL to API 2. Service generates unique short code 3. Mapping stored in database 4. Short URL returned to client

Read Flow (Redirect): 1. User clicks short URL 2. Service looks up mapping (cache first, then DB) 3. Returns HTTP redirect to original URL 4. Logs click event asynchronously

Let's dive deep into each component.

Step 4: Deep Dive¶

4.1 Short Code Generation¶

This is the heart of the system. How do we create unique, short, URL-safe codes?

Approach 1: Hash the URL¶

The obvious approach—hash the long URL and take the first N characters.

import hashlib
import base64

def hash_url(long_url: str) -> str:
    # MD5 hash (128 bits)
    hash_bytes = hashlib.md5(long_url.encode()).digest()
    # Base64 encode and take first 7 chars
    encoded = base64.urlsafe_b64encode(hash_bytes).decode()
    return encoded[:7].replace('-', 'a').replace('_', 'b')

# Example
hash_url("https://example.com/page")  # "rL0Y7vC"

Problems with hashing:

Collisions: Two different URLs might hash to the same 7 characters. MD5 is 128 bits, but we're only using 7 Base62 characters (~41 bits). Collisions are mathematically certain at scale.

Collision resolution is complex:

def shorten_with_collision_handling(long_url: str) -> str:
    for attempt in range(10):
        # Add attempt number to create different hashes
        hash_input = long_url if attempt == 0 else f"{long_url}:{attempt}"
        short_code = hash_url(hash_input)

        if not database.exists(short_code):
            database.insert(short_code, long_url)
            return short_code

    raise Exception("Too many collisions")

Same URL always gets same code: This might be desirable (deduplication) or not (each user wants their own tracking link).

Verdict: Hashing works for small scale but becomes problematic at billions of URLs.

Approach 2: Counter with Base62 Encoding (Recommended)¶

Use an auto-incrementing counter and convert to Base62.

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(num: int) -> str:
    """Convert a number to base62 string."""
    if num == 0:
        return ALPHABET[0]

    result = []
    while num > 0:
        result.append(ALPHABET[num % 62])
        num //= 62

    return ''.join(reversed(result))

def from_base62(s: str) -> int:
    """Convert base62 string back to number."""
    num = 0
    for char in s:
        num = num * 62 + ALPHABET.index(char)
    return num

# Examples
to_base62(1)           # "1"
to_base62(62)          # "10"
to_base62(1000000)     # "4c92"
to_base62(999999999)   # "15FTGf"

Advantages: - No collisions: Every number maps to a unique string - Predictable length: Easy to calculate when you'll need more characters - Simple implementation: Just arithmetic

Challenge: Where does the counter come from in a distributed system?

Distributed ID Generation¶

When you have multiple servers, they all need unique IDs without coordination delays.

Option A: Database Auto-Increment

CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE,
    long_url TEXT NOT NULL
);

-- Insert returns the auto-generated ID
INSERT INTO urls (long_url) VALUES ('https://example.com') RETURNING id;

Pros: Simple, guaranteed unique Cons: Database becomes bottleneck, single point of failure

Option B: Range Allocation

Each server pre-allocates a range of IDs: - Server 1: IDs 1 - 1,000,000 - Server 2: IDs 1,000,001 - 2,000,000 - etc.

When a server exhausts its range, it requests a new one from a coordinator.

class IDGenerator:
    def __init__(self, coordinator_url):
        self.coordinator = coordinator_url
        self.current_range = None
        self.current_id = 0

    def get_next_id(self):
        if self.current_range is None or self.current_id >= self.current_range['end']:
            self.current_range = self.request_new_range()
            self.current_id = self.current_range['start']

        id = self.current_id
        self.current_id += 1
        return id

    def request_new_range(self):
        # Atomically get next range from coordinator
        response = requests.post(f"{self.coordinator}/allocate-range")
        return response.json()  # {"start": 5000001, "end": 6000000}

Pros: Reduces coordination to rare events Cons: Gaps in IDs if server crashes, need coordinator service

Option C: Snowflake IDs (Twitter's Approach)

Generate 64-bit IDs that embed timestamp, machine ID, and sequence number.

┌─────────────────────────────────────────────────────────────────┐
│                        64-bit Snowflake ID                      │
├────────┬───────────────────────┬──────────┬────────────────────┤
│ 1 bit  │      41 bits          │ 10 bits  │      12 bits       │
│ unused │     timestamp         │machine ID│      sequence      │
│        │  (milliseconds)       │          │   (per millisecond)│
└────────┴───────────────────────┴──────────┴────────────────────┘

Breaking it down:

41 bits for timestamp: Milliseconds since custom epoch. Lasts 69 years.
10 bits for machine ID: Supports 1,024 machines
12 bits for sequence: 4,096 IDs per machine per millisecond

import time

class SnowflakeGenerator:
    def __init__(self, machine_id: int):
        self.machine_id = machine_id
        self.sequence = 0
        self.last_timestamp = -1
        self.epoch = 1609459200000  # Custom epoch: 2021-01-01

    def generate(self) -> int:
        timestamp = int(time.time() * 1000) - self.epoch

        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
            if self.sequence == 0:
                # Wait for next millisecond
                while timestamp <= self.last_timestamp:
                    timestamp = int(time.time() * 1000) - self.epoch
        else:
            self.sequence = 0

        self.last_timestamp = timestamp

        # Construct 64-bit ID
        id = ((timestamp << 22) |
              (self.machine_id << 12) |
              self.sequence)

        return id

# Usage
generator = SnowflakeGenerator(machine_id=1)
id = generator.generate()  # e.g., 6904298052141748224
short_code = to_base62(id)  # e.g., "aBc123def"

Pros: - No coordination needed - IDs are roughly time-sorted - Each machine can generate 4M IDs/second

Cons: - Requires careful clock synchronization - Machine ID management needed

Recommendation: Use Snowflake for large-scale systems. Use database auto-increment for simpler deployments.

Security: Predictable vs Random¶

Counter-based IDs are sequential. Someone could enumerate all short codes: - tiny.co/1, tiny.co/2, tiny.co/3, ...

If this is a concern:

Add randomness: XOR the ID with a secret key or shuffle bits
Use longer codes: 7+ characters make enumeration impractical
Rate limit: Detect and block enumeration attempts

SECRET = 0xDEADBEEF12345678

def obfuscate_id(id: int) -> int:
    """Make IDs non-sequential while remaining reversible."""
    return id ^ SECRET

def deobfuscate_id(obfuscated: int) -> int:
    return obfuscated ^ SECRET

4.2 Database Design¶

Schema Design¶

-- Main URL mappings table
CREATE TABLE urls (
    id BIGINT PRIMARY KEY,                    -- Snowflake ID
    short_code VARCHAR(10) UNIQUE NOT NULL,   -- Base62 encoded
    long_url TEXT NOT NULL,                   -- Original URL
    long_url_hash VARCHAR(64),                -- For duplicate detection
    user_id UUID,                             -- NULL for anonymous
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,                     -- NULL = never expires
    is_active BOOLEAN DEFAULT TRUE,
    click_count BIGINT DEFAULT 0              -- Denormalized for quick access
);

-- Critical indexes
CREATE INDEX idx_short_code ON urls(short_code);
CREATE INDEX idx_long_url_hash ON urls(long_url_hash);
CREATE INDEX idx_user_urls ON urls(user_id, created_at DESC);
CREATE INDEX idx_expires ON urls(expires_at) WHERE expires_at IS NOT NULL;

-- Click events for analytics (consider separate database)
CREATE TABLE clicks (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) NOT NULL,
    clicked_at TIMESTAMP DEFAULT NOW(),
    ip_address INET,
    user_agent TEXT,
    referrer TEXT,
    country_code VARCHAR(2)
);

CREATE INDEX idx_clicks_code_time ON clicks(short_code, clicked_at);

SQL vs NoSQL Decision¶

Factor	SQL (PostgreSQL)	NoSQL (DynamoDB/Cassandra)
Consistency	Strong (ACID)	Eventual (tunable)
Query flexibility	Complex queries, JOINs	Limited (key-value)
Scaling writes	Harder (vertical first)	Easy (horizontal)
Unique constraints	Built-in	Application-enforced
Operational complexity	Lower (familiar)	Higher

The table above contrasts SQL vs NoSQL families. For engine-level tradeoffs (PostgreSQL vs MySQL vs DynamoDB vs Cassandra) and how they relate to a read-heavy workload, see Technology Selection & Tradeoffs in Step 1.

Recommendation: Start with PostgreSQL. It handles millions of URLs easily. Migrate to NoSQL only if you hit scaling limits (billions of URLs, 100K+ writes/sec).

Handling Duplicate URLs¶

Should the same long URL always map to the same short code?

Option 1: Allow duplicates (different short codes for same URL) - Each user can have their own tracking link - Simpler implementation - Uses more storage

Option 2: Deduplicate (same short code for same URL) - Saves storage - Requires hash lookup before insert - Might not fit all use cases (users want individual analytics)

def shorten_url(long_url: str, user_id: str, deduplicate: bool = True) -> str:
    url_hash = hashlib.sha256(long_url.encode()).hexdigest()

    if deduplicate:
        # Check if URL already exists
        existing = db.query(
            "SELECT short_code FROM urls WHERE long_url_hash = %s AND user_id = %s",
            (url_hash, user_id)
        )
        if existing:
            return existing.short_code

    # Generate new short code
    id = id_generator.generate()
    short_code = to_base62(id)

    db.execute(
        """INSERT INTO urls (id, short_code, long_url, long_url_hash, user_id)
           VALUES (%s, %s, %s, %s, %s)""",
        (id, short_code, long_url, url_hash, user_id)
    )

    return short_code

4.3 Caching Strategy¶

With a 100:1 read-to-write ratio, caching is essential. Most URLs follow the 80/20 rule: 20% of URLs get 80% of traffic.

Cache Architecture¶

flowchart LR Request[Request] --> Cache{Redis} Cache -->|HIT| Return[Return URL] Cache -->|MISS| DB[(PostgreSQL)] DB --> Store[Store in Cache] Store --> Return

Redis Data Model¶

# Store URL mappings
SET url:aB3xYz1 "https://example.com/long/url" EX 86400

# Hash for multiple fields
HSET url:aB3xYz1 long_url "https://example.com/..." expires_at "2025-12-31"

Cache Implementation¶

import redis
import json

class URLCache:
    def __init__(self, redis_client, ttl_seconds=86400):
        self.redis = redis_client
        self.ttl = ttl_seconds

    def get(self, short_code: str) -> str | None:
        """Get long URL from cache."""
        key = f"url:{short_code}"
        result = self.redis.get(key)
        return result.decode() if result else None

    def set(self, short_code: str, long_url: str, expires_at: int = None):
        """Cache URL mapping."""
        key = f"url:{short_code}"

        if expires_at:
            # Use URL's expiration as TTL
            ttl = max(expires_at - int(time.time()), 1)
        else:
            ttl = self.ttl

        self.redis.setex(key, ttl, long_url)

    def delete(self, short_code: str):
        """Remove from cache (on delete or update)."""
        self.redis.delete(f"url:{short_code}")

# In redirect service
def get_long_url(short_code: str) -> str:
    # Try cache first
    long_url = cache.get(short_code)

    if long_url:
        metrics.incr("cache.hit")
        return long_url

    # Cache miss - query database
    metrics.incr("cache.miss")
    result = db.query("SELECT long_url FROM urls WHERE short_code = %s", short_code)

    if not result:
        raise NotFoundError("Short URL not found")

    # Cache for next time
    cache.set(short_code, result.long_url)

    return result.long_url

Cache Hit Ratio Target¶

Aim for 95%+ cache hit ratio. With proper TTLs and sufficient memory: - 95% of redirects served from cache (~1ms) - 5% hit database (~10-50ms) - Average latency: ~3ms

Cache Sizing¶

Cache size = Active URLs × Average URL size

If 10 million URLs are "active" (clicked in last 24h):
10M × 600 bytes ≈ 6 GB

With Redis overhead: ~10-15 GB

This easily fits in a single Redis instance. For larger scale, use Redis Cluster.

4.4 Redirect Flow Deep Dive¶

The redirect is the most critical path. It must be fast and reliable.

301 vs 302 Redirects¶

Status Code	Type	Browser Caches?	Analytics Impact
301	Permanent	Yes (forever)	Loses repeat clicks
302	Temporary	Usually no	Captures all clicks
307	Temporary	No	Captures all clicks

Trade-off: - 301: Better for SEO (passes link equity), but browsers cache it, so you lose analytics for repeat visits - 302: Every click hits your server, enabling accurate analytics

Recommendation: Use 302 for most use cases. Use 301 only if SEO is critical and analytics aren't.

Complete Redirect Flow¶

sequenceDiagram participant Browser participant LB as Load Balancer participant App as Redirect Service participant Cache as Redis participant DB as PostgreSQL participant Queue as Kafka Browser->>LB: GET /aB3xYz1 LB->>App: Forward request App->>Cache: GET url:aB3xYz1 alt Cache Hit Cache-->>App: "https://example.com/..." else Cache Miss App->>DB: SELECT long_url WHERE short_code = 'aB3xYz1' DB-->>App: "https://example.com/..." App->>Cache: SET url:aB3xYz1 (async) end App->>Queue: Log click event (async, non-blocking) App-->>Browser: 302 Redirect → Location: https://example.com/... Browser->>Browser: Follow redirect to destination

Implementation¶

from fastapi import FastAPI, Response, HTTPException
from fastapi.responses import RedirectResponse
import asyncio

app = FastAPI()

@app.get("/{short_code}")
async def redirect(short_code: str, request: Request):
    # Validate short code format
    if not is_valid_short_code(short_code):
        raise HTTPException(status_code=400, detail="Invalid short code")

    # Get long URL
    try:
        long_url = await get_long_url(short_code)
    except NotFoundError:
        raise HTTPException(status_code=404, detail="URL not found")

    # Log click asynchronously (don't wait)
    asyncio.create_task(log_click(
        short_code=short_code,
        ip=request.client.host,
        user_agent=request.headers.get("user-agent"),
        referrer=request.headers.get("referer")
    ))

    # Return redirect
    return RedirectResponse(
        url=long_url,
        status_code=302
    )

async def log_click(short_code: str, ip: str, user_agent: str, referrer: str):
    """Send click event to Kafka for async processing."""
    event = {
        "short_code": short_code,
        "timestamp": datetime.utcnow().isoformat(),
        "ip": ip,
        "user_agent": user_agent,
        "referrer": referrer
    }
    await kafka_producer.send("click-events", value=event)

4.5 Analytics System¶

Analytics must not slow down redirects. Process them asynchronously.

Architecture¶

flowchart LR Redirect[Redirect Service] -->|async| Kafka[Kafka] Kafka --> Worker1[Analytics Worker] Kafka --> Worker2[Analytics Worker] Worker1 --> ClickHouse[(ClickHouse)] Worker2 --> ClickHouse ClickHouse --> Dashboard[Analytics Dashboard]

Click Event Schema¶

{
    "short_code": "aB3xYz1",
    "timestamp": "2024-01-15T10:30:00Z",
    "ip": "203.0.113.50",
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0...",
    "referrer": "https://twitter.com/user/status/123",
    "country": "US",
    "city": "San Francisco",
    "device_type": "mobile",
    "browser": "Safari",
    "os": "iOS"
}

Analytics Worker¶

from kafka import KafkaConsumer
import geoip2.database

class AnalyticsWorker:
    def __init__(self):
        self.consumer = KafkaConsumer(
            'click-events',
            group_id='analytics-workers',
            bootstrap_servers=['kafka:9092'],
            value_deserializer=lambda m: json.loads(m.decode())
        )
        self.geoip = geoip2.database.Reader('GeoLite2-City.mmdb')
        self.batch = []
        self.batch_size = 1000

    def run(self):
        for message in self.consumer:
            event = message.value

            # Enrich with geo data
            try:
                geo = self.geoip.city(event['ip'])
                event['country'] = geo.country.iso_code
                event['city'] = geo.city.name
            except:
                event['country'] = 'UNKNOWN'

            # Parse user agent
            ua = user_agents.parse(event['user_agent'])
            event['device_type'] = 'mobile' if ua.is_mobile else 'desktop'
            event['browser'] = ua.browser.family
            event['os'] = ua.os.family

            self.batch.append(event)

            if len(self.batch) >= self.batch_size:
                self.flush_batch()

    def flush_batch(self):
        # Batch insert to ClickHouse
        clickhouse.execute(
            "INSERT INTO clicks VALUES",
            self.batch
        )
        self.batch = []

Why ClickHouse for Analytics?¶

ClickHouse is a column-oriented database optimized for analytical queries.

Traditional DB (PostgreSQL):

SELECT COUNT(*) FROM clicks WHERE short_code = 'aB3xYz1' AND clicked_at > '2024-01-01';
-- Scans all columns of matching rows
-- Slow for billions of rows

ClickHouse: - Stores columns separately - Compresses similar data efficiently - Processes analytical queries 100-1000x faster

-- Fast aggregations
SELECT 
    toDate(clicked_at) as day,
    country,
    COUNT(*) as clicks
FROM clicks
WHERE short_code = 'aB3xYz1'
GROUP BY day, country
ORDER BY day;

Step 5: Scaling & Production¶

5.1 Scaling Strategies¶

Scaling the Read Path (Redirects)¶

Redirects are the hot path. Here's how to scale:

Bottleneck	Solution
Single server	Add more app servers behind load balancer
Database reads	Add caching (95%+ hit rate)
Cache size	Use Redis Cluster (shard data)
Single DB	Add read replicas
Geographic latency	Deploy in multiple regions

Scaling the Write Path (Shortening)¶

Bottleneck	Solution
ID generation	Distributed ID generator (Snowflake)
Database writes	Connection pooling, write batching
Single DB	Shard by short_code hash

Database Sharding¶

At extreme scale (billions of URLs), shard the database:

def get_shard(short_code: str) -> int:
    """Determine which shard holds this short code."""
    hash_value = hash(short_code)
    return hash_value % NUM_SHARDS

# Example with 4 shards
# short_code "aB3xYz1" → Shard 2
# short_code "xY7zT9q" → Shard 0

Sharding architecture:

flowchart TD App[Application] --> Router[Shard Router] Router --> S0[(Shard 0)] Router --> S1[(Shard 1)] Router --> S2[(Shard 2)] Router --> S3[(Shard 3)]

Challenges: - Cross-shard queries are complex - Rebalancing shards is painful - Need a shard mapping service

When to shard: Only after you've exhausted vertical scaling, read replicas, and caching. Most URL shorteners never need sharding.

5.2 High Availability and Fault Tolerance¶

Failure Scenarios¶

Component	Failure Impact	Mitigation
App server	Some requests fail	Load balancer health checks, multiple instances
Load balancer	Total outage	Redundant LBs, DNS failover
Redis cache	Slower redirects	Fall back to DB, cache warmup on recovery
PostgreSQL	Can't read/write	Multi-AZ deployment, automatic failover
Kafka	Analytics delayed	Multi-broker cluster, data replicated

Database High Availability¶

PostgreSQL HA setup: - Primary handles writes - Synchronous standby for failover (RPO = 0) - Async replicas for read scaling - Use PgBouncer for connection pooling

Cache Thundering Herd¶

When a popular URL's cache expires, thousands of requests simultaneously hit the database.

Solution 1: Distributed Locking

async def get_long_url_with_lock(short_code: str) -> str:
    # Try cache
    cached = cache.get(short_code)
    if cached:
        return cached

    lock_key = f"lock:{short_code}"

    # Try to acquire lock
    if redis.set(lock_key, "1", nx=True, ex=10):
        try:
            # We have the lock - fetch from DB
            url = db.query("SELECT long_url FROM urls WHERE short_code = %s", short_code)
            cache.set(short_code, url)
            return url
        finally:
            redis.delete(lock_key)
    else:
        # Someone else is fetching - wait and retry
        await asyncio.sleep(0.1)
        return await get_long_url_with_lock(short_code)

Solution 2: Probabilistic Early Refresh

Refresh cache before it expires, randomly:

def get_with_early_refresh(short_code: str) -> str:
    cached, ttl = cache.get_with_ttl(short_code)

    if cached:
        # Probabilistically refresh if TTL is low
        if ttl < 300 and random.random() < 0.1:  # 10% chance
            asyncio.create_task(refresh_cache(short_code))
        return cached

    return fetch_and_cache(short_code)

5.3 Security Considerations¶

URL Validation¶

Don't blindly redirect to any URL:

from urllib.parse import urlparse
import requests

BLOCKED_SCHEMES = {'javascript', 'data', 'vbscript'}
BLOCKED_DOMAINS = {'malware.example.com', 'phishing.example.com'}

def validate_url(url: str) -> bool:
    """Validate URL is safe to redirect to."""
    try:
        parsed = urlparse(url)

        # Check scheme
        if parsed.scheme not in ('http', 'https'):
            return False

        if parsed.scheme in BLOCKED_SCHEMES:
            return False

        # Check against blocklist
        if parsed.netloc in BLOCKED_DOMAINS:
            return False

        # Check against Google Safe Browsing API (optional)
        if is_malicious(url):
            return False

        return True
    except:
        return False

Rate Limiting¶

Prevent abuse:

from redis import Redis
import time

class RateLimiter:
    def __init__(self, redis: Redis, max_requests: int, window_seconds: int):
        self.redis = redis
        self.max_requests = max_requests
        self.window = window_seconds

    def is_allowed(self, key: str) -> bool:
        """Token bucket rate limiting."""
        now = int(time.time())
        window_key = f"ratelimit:{key}:{now // self.window}"

        count = self.redis.incr(window_key)
        if count == 1:
            self.redis.expire(window_key, self.window)

        return count <= self.max_requests

# Usage
limiter = RateLimiter(redis, max_requests=100, window_seconds=3600)

@app.post("/api/v1/urls")
async def create_url(request: Request):
    user_id = get_user_id(request)

    if not limiter.is_allowed(f"create:{user_id}"):
        raise HTTPException(status_code=429, detail="Rate limit exceeded")

    # ... create URL

Preventing Enumeration¶

If short codes are sequential (1, 2, 3...), attackers can enumerate all URLs.

Mitigations: 1. Use longer codes (7+ chars) 2. Obfuscate IDs (XOR with secret) 3. Monitor and block bulk access patterns 4. Add CAPTCHA for suspicious requests

5.4 Multi-Language Implementations¶

URL Shortening Service¶

JavaGo

import java.security.SecureRandom;
import java.time.Instant;
import java.util.Optional;

public class UrlShortenerService {

    private static final String BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final int SHORT_CODE_LENGTH = 7;
    private static final SecureRandom RANDOM = new SecureRandom();

    private final UrlRepository urlRepository;
    private final CacheService cache;
    private final SnowflakeIdGenerator idGenerator;

    public UrlShortenerService(UrlRepository urlRepository, CacheService cache,
                                SnowflakeIdGenerator idGenerator) {
        this.urlRepository = urlRepository;
        this.cache = cache;
        this.idGenerator = idGenerator;
    }

    public record ShortenedUrl(String shortCode, String longUrl, Instant createdAt, Instant expiresAt) {}

    public ShortenedUrl shorten(String longUrl, String userId, Instant expiresAt) {
        // check if URL already shortened
        Optional<ShortenedUrl> existing = urlRepository.findByLongUrl(longUrl);
        if (existing.isPresent()) return existing.get();

        // generate short code via Snowflake ID → Base62
        long id = idGenerator.nextId();
        String shortCode = toBase62(id);

        ShortenedUrl result = new ShortenedUrl(shortCode, longUrl, Instant.now(), expiresAt);
        urlRepository.save(result, userId);
        cache.put("url:" + shortCode, longUrl, expiresAt);

        return result;
    }

    public Optional<String> resolve(String shortCode) {
        // check cache first
        String cached = cache.get("url:" + shortCode);
        if (cached != null) return Optional.of(cached);

        // cache miss → query database
        Optional<ShortenedUrl> fromDb = urlRepository.findByShortCode(shortCode);
        fromDb.ifPresent(url -> cache.put("url:" + shortCode, url.longUrl(), url.expiresAt()));

        return fromDb.map(ShortenedUrl::longUrl);
    }

    private String toBase62(long value) {
        StringBuilder sb = new StringBuilder();
        value = Math.abs(value);
        while (sb.length() < SHORT_CODE_LENGTH) {
            sb.append(BASE62.charAt((int) (value % 62)));
            value /= 62;
        }
        return sb.reverse().toString();
    }
}

package shortener

import (
    "context"
    "crypto/rand"
    "fmt"
    "math/big"
    "time"
)

const base62Chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
const codeLength = 7

type ShortenedURL struct {
    ShortCode string
    LongURL   string
    UserID    string
    CreatedAt time.Time
    ExpiresAt time.Time
}

type URLRepository interface {
    FindByShortCode(ctx context.Context, code string) (*ShortenedURL, error)
    FindByLongURL(ctx context.Context, longURL string) (*ShortenedURL, error)
    Save(ctx context.Context, url *ShortenedURL) error
}

type Cache interface {
    Get(ctx context.Context, key string) (string, error)
    Set(ctx context.Context, key, value string, ttl time.Duration) error
}

type Service struct {
    repo  URLRepository
    cache Cache
}

func NewService(repo URLRepository, cache Cache) *Service {
    return &Service{repo: repo, cache: cache}
}

func (s *Service) Shorten(ctx context.Context, longURL, userID string,
    expiresAt time.Time) (*ShortenedURL, error) {

    // check if already shortened
    existing, err := s.repo.FindByLongURL(ctx, longURL)
    if err == nil && existing != nil {
        return existing, nil
    }

    code, err := generateBase62Code(codeLength)
    if err != nil {
        return nil, fmt.Errorf("generating code: %w", err)
    }

    url := &ShortenedURL{
        ShortCode: code,
        LongURL:   longURL,
        UserID:    userID,
        CreatedAt: time.Now(),
        ExpiresAt: expiresAt,
    }

    if err := s.repo.Save(ctx, url); err != nil {
        return nil, fmt.Errorf("saving URL: %w", err)
    }

    ttl := time.Until(expiresAt)
    _ = s.cache.Set(ctx, "url:"+code, longURL, ttl)

    return url, nil
}

func (s *Service) Resolve(ctx context.Context, shortCode string) (string, error) {
    // cache first
    if cached, err := s.cache.Get(ctx, "url:"+shortCode); err == nil && cached != "" {
        return cached, nil
    }

    url, err := s.repo.FindByShortCode(ctx, shortCode)
    if err != nil {
        return "", err
    }
    if url == nil {
        return "", fmt.Errorf("short code not found: %s", shortCode)
    }

    ttl := time.Until(url.ExpiresAt)
    _ = s.cache.Set(ctx, "url:"+shortCode, url.LongURL, ttl)

    return url.LongURL, nil
}

func generateBase62Code(length int) (string, error) {
    result := make([]byte, length)
    max := big.NewInt(int64(len(base62Chars)))
    for i := range result {
        n, err := rand.Int(rand.Reader, max)
        if err != nil {
            return "", err
        }
        result[i] = base62Chars[n.Int64()]
    }
    return string(result), nil
}

Redirect Controller¶

import jakarta.servlet.http.HttpServletRequest;
import java.time.Instant;
import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RedirectController {
    private final UrlShortenerService service;
    private final AnalyticsPublisher analytics;

    @GetMapping("/{shortCode}")
    public ResponseEntity<Void> redirect(@PathVariable String shortCode,
                                          HttpServletRequest request) {
        return service.resolve(shortCode)
            .map(longUrl -> {
                analytics.publishClickEvent(shortCode, request.getRemoteAddr(),
                    request.getHeader("User-Agent"), Instant.now());
                return ResponseEntity.status(HttpStatus.FOUND)
                    .header("Location", longUrl)
                    .header("Cache-Control", "private, max-age=90")
                    .<Void>build();
            })
            .orElse(ResponseEntity.notFound().build());
    }

    @PostMapping("/api/v1/shorten")
    public ResponseEntity<Map<String, String>> shorten(@RequestBody ShortenRequest req) {
        var result = service.shorten(req.url(), req.userId(), req.expiresAt());
        return ResponseEntity.status(HttpStatus.CREATED).body(Map.of(
            "short_url", "https://short.ly/" + result.shortCode(),
            "short_code", result.shortCode(),
            "expires_at", result.expiresAt().toString()
        ));
    }
}

5.5 Monitoring and Alerting¶

Key Metrics¶

Category	Metric	Alert Threshold
Latency	P99 redirect time	> 100ms
Availability	Redirect success rate	< 99.9%
Throughput	Redirects per second	Depends on baseline
Cache	Hit ratio	< 90%
Database	Connection pool usage	> 80%
Errors	5xx error rate	> 0.1%

Logging¶

{
    "timestamp": "2024-01-15T10:30:00Z",
    "level": "INFO",
    "service": "redirect",
    "short_code": "aB3xYz1",
    "long_url": "https://example.com/...",
    "latency_ms": 3.5,
    "cache_hit": true,
    "user_agent": "...",
    "ip": "..."
}

Interview Checklist¶

Use this to structure your answer:

Sample Interview Dialogue¶

Interviewer: "Design a URL shortener like TinyURL."

You: "Great question! Before I dive in, let me ask a few questions to understand the scope.

First, what's the expected scale? Are we building for a startup with thousands of URLs or a Twitter-scale system with billions?

Second, what's the read-to-write ratio? I'd guess most URL shorteners see many more clicks than creates.

Third, do we need analytics—click counting, referrer tracking, that sort of thing?"

Interviewer: "Let's say 100 million URLs created per month, 100:1 read/write ratio, and yes, we want basic analytics."

You: "Got it. So we're looking at about 40 writes per second, 4,000 reads per second, and we need an analytics pipeline that doesn't slow down redirects.

Let me start with the high-level architecture. We have two main flows: shortening URLs and redirecting. For shortening, a request comes in, we generate a unique short code—I'll explain how in a moment—store the mapping in a database, and return the short URL.

For redirects, we want to be extremely fast. I'd put a Redis cache in front of the database. With a 100:1 read ratio and proper caching, we should achieve 95%+ cache hit rates, meaning most redirects complete in under 5 milliseconds.

For short code generation, the key challenge is generating unique IDs across multiple servers without coordination delays. I'd recommend a Snowflake-style ID generator..."

Summary¶

Component	Choice	Rationale
ID Generation	Snowflake IDs	Distributed, no coordination, time-sorted
Encoding	Base62 (7 chars)	URL-safe, 3.5T combinations
Database	PostgreSQL	Strong consistency, familiar, scales well
Cache	Redis	Sub-millisecond lookups, simple KV model
Analytics Queue	Kafka	High throughput, durable, replayable
Analytics DB	ClickHouse	Optimized for aggregations
Redirect Code	302	Enables accurate analytics

The URL shortener demonstrates core distributed systems concepts: unique ID generation, caching, database design, and async processing. Master this design, and you'll have a solid foundation for more complex systems.