Design a Voting/Polling System¶

What We're Building¶

A voting system lets users create polls and cast votes. While the concept is simple, building a reliable voting system at scale presents fascinating distributed systems challenges.

Types of Voting Systems¶

Type	Examples	Key Challenges
Casual polls	Twitter polls, Strawpoll	High throughput, real-time results
Content voting	Reddit upvotes, Stack Overflow	Preventing gaming, vote velocity
Corporate elections	Board votes, shareholder meetings	Auditability, strict access control
Government elections	Presidential elections	Security, anonymity, auditing

The technical requirements vary dramatically: - A Twitter poll needs to handle millions of votes per second with real-time results - A corporate election needs ironclad auditability with restricted access - A government election needs cryptographic verification while preserving anonymity

Core Challenge¶

The fundamental problem in voting systems is: How do you ensure each person votes exactly once while handling massive concurrent traffic?

This seems simple but becomes complex when: - Thousands of people vote in the same second - Network requests can be duplicated (retries, timeouts) - Databases can have race conditions - Malicious users try to vote multiple times

Step 1: Requirements Clarification¶

Questions to Ask¶

Question	Impact on Design
What type of voting? (polls, elections, upvotes)	Determines security and anonymity needs
Who can vote? (public, authenticated, specific groups)	Authentication complexity
How strict is "one vote per person"?	Duplicate prevention strategy
Real-time results or only after voting ends?	Caching and consistency approach
Can votes be changed after casting?	Complicates uniqueness guarantees
What's the expected scale?	Infrastructure sizing

Our Design: General-Purpose Polling Platform¶

Let's design a system similar to Strawpoll or Twitter Polls:

Functional Requirements:

Feature	Priority	Description
User registration/login	Must have	Identify voters
Create polls	Must have	Title, options, duration
Cast one vote per poll	Must have	Core functionality
View results	Must have	After poll closes or real-time
Custom poll settings	Nice to have	Anonymous, allow result viewing
Poll expiration	Nice to have	Auto-close after deadline

Non-Functional Requirements:

Requirement	Target	Rationale
Availability	99.99%	People expect voting to work
Latency	< 500ms for voting	Responsive UX
Accuracy	Zero lost/duplicate votes	Data integrity
Throughput	10,000+ votes/sec (peak)	Handle viral polls
Consistency	No duplicate votes ever	System correctness

Technology Selection & Tradeoffs¶

Choosing storage, aggregation, deduplication, and rate limiting is not “pick the fastest”—each option trades correctness, ops cost, and failure behavior. Interviewers want to hear why a choice fits this workload (burst writes, exactly-once semantics per user/poll).

Vote storage (PostgreSQL vs Redis vs DynamoDB vs Cassandra)¶

Store	Strengths	Weaknesses	Fit for high-burst vote writes
PostgreSQL	ACID, `UNIQUE` constraints, rich queries, familiar ops	Single-primary write path; needs pooling/sharding at extreme scale	Strong default when dedup must be authoritative; use async ingestion + batching to smooth bursts
Redis	Extremely fast writes	Not a durable system of record; persistence modes add complexity; cluster split-brain risks	Cache + fast dedup flag, not sole source of truth for “did this user vote?” in production
DynamoDB	Managed, horizontal scale, predictable at high TPS	Conditional writes / idempotency patterns needed; hot partitions on `poll_id`	Good if you partition keys carefully (e.g. `user_id`+`poll_id` composite, avoid mega-hot polls on one key without sharding strategy)
Cassandra	Write-optimized, wide-column	No cheap cross-partition uniqueness—duplicate prevention is harder; LWT adds latency	Risky for strict one-vote-per-user unless you model uniqueness in application layer or use strong patterns per partition

Why it matters: Duplicate votes are a consistency problem. The durable store must enforce (user_id, poll_id) uniqueness. Redis/Dynamo/Cassandra can participate, but the guarantee usually lives in PostgreSQL (or Dynamo conditional writes with careful key design).

Count aggregation (real-time counter vs batch vs CRDT)¶

Approach	Pros	Cons	When to use
Real-time counter (Redis `HINCRBY`, DB column++)	Low read latency; great UX for live results	Lost increments if not paired with durable write/replay; ordering issues under retries	Live polls after vote is durably accepted or idempotently applied
Batch aggregation (Spark/Flink/scheduled SQL)	Simple mental model; good for analytics	High lag; poor for “live” leaderboard	Reporting, reconciliation, historical analytics
CRDT / distributed counters	Nice for highly partitioned eventually consistent counts	Overkill for most polls; still need authoritative dedup elsewhere	Specialized multi-region display layers—not for replacing vote integrity

Why it matters: Counters can be eventually consistent for display; deduplication cannot. Often: Kafka + worker updates Redis for speed, DB remains source of truth, periodic reconciliation fixes drift.

Deduplication (DB constraint vs Redis set vs Bloom filter)¶

Mechanism	Behavior	False negatives?	False positives?	Notes
DB `UNIQUE (user_id, poll_id)`	Strong guarantee	No	No	Source of truth; handles races via transaction/constraint violation
Redis `SET` / `SETNX`	Fast “already voted?”	No (if hit)	No	Cache; can be wrong on eviction/crash unless TTL and recovery sync from DB
Bloom filter	Memory-efficient membership test	No	Yes (may say “maybe voted”)	Use only as prefilter—“probably voted, check DB”—never sole reject path

Why it matters: Retries and concurrent tabs make idempotency keys and unique constraints non-negotiable; Redis accelerates happy path; Bloom filters save RAM at cost of rare extra DB lookups.

Rate limiting (token bucket vs sliding window; in-memory vs distributed)¶

Algorithm	Burst behavior	Steady-state fairness	Implementation notes
Token bucket	Allows controlled bursts (tokens refill)	Good for APIs that tolerate short spikes	Common in gateways; bucket per user/IP
Sliding window	Stricter in a rolling interval	Fairer for “max N per minute” abuse prevention	Often implemented with Redis sorted sets or approximate counters
Fixed window	Simple	Spike at window boundaries	Easier but uneven; mention as naive option

Deployment	Pros	Cons
In-memory (per gateway instance)	Zero extra RTT; very fast	Wrong under load balancing unless sticky sessions or sync—counts split across nodes
Distributed (Redis/Redis Cluster, Envoy rate limit service)	Consistent limits across instances	Extra hop; must handle Redis failure (fail open vs closed)

Why it matters: Rate limits protect availability and reduce fraud; they are not a substitute for vote dedup (different problem).

Our choice¶

Vote durability + dedup: PostgreSQL with UNIQUE (user_id, poll_id) as the final arbiter; Redis for fast duplicate checks and live counts; Kafka (or similar) to absorb bursts and decouple API from DB write pressure.
Counts: Real-time Redis counters updated after successful durable/idempotent processing, plus batch or periodic reconciliation against PostgreSQL for accuracy.
Dedup layers: Redis SETNX / key per (poll, user) + idempotent consumer + DB constraint; optional Bloom only if memory is constrained and extra DB reads are acceptable.
Rate limiting: Distributed sliding window or token bucket in Redis (or API gateway plugin backed by Redis) for consistent limits; tune fail-open vs fail-closed for voting (often fail-open for availability with abuse handled elsewhere).

Rationale in one line: Optimize the write path for bursts without sacrificing exactly-once vote semantics; treat display counts as eventually consistent with a reconciling truth in PostgreSQL.

CAP Theorem Analysis¶

CAP is a lens, not a prescription: in practice we choose per operation consistency vs availability, often with latency (PACELC) as the real tie-breaker.

The tension¶

Vote deduplication needs strong consistency: a user must not record two successful votes for the same poll.
Peak voting demands high availability: users expect the button to work during spikes; regional outages should degrade gracefully.
Leaderboards / live counts can often tolerate eventual consistency if bounds are stated (seconds of lag).

Per-operation posture¶

Surface	Typical choice	Rationale
Vote submission (acceptance)	CP on the decision “have we recorded this vote?”	Duplicate prevention is a correctness invariant; prefer failing or retrying over double-counting
Vote counts / bar charts	AP / eventual	Reads scale with replicas and caches; small lag is acceptable if disclosed
Leaderboard (if ranked by votes)	AP	Same as counts; tie-breaks may use snapshot timestamps

Why not “all AP” or “all CP”?¶

All AP: You might show snappy results while risking duplicate acceptance unless another layer enforces uniqueness—dangerous.
All CP: You maximize correctness but sacrifice availability under partitions (e.g. strict quorum everywhere), hurting peak traffic.

Diagram: logical partitions of consistency¶

flowchart TB subgraph CP_Path["CP-oriented: vote integrity"] VS[Vote submission path] UC[(DB UNIQUE / conditional write)] VS --> UC end subgraph AP_Path["AP-oriented: read scaling"] RC[Results / counts read path] CACHE[(Redis / read replicas)] RC --> CACHE end subgraph Eventual["Eventual: display freshness"] LB[Leaderboard / live ticker] Q[Stream / worker updates] LB --> Q Q --> CACHE end UC -.->|"reconcile / async projection"| CACHE

Interview takeaway: Say clearly: dedup is a consistency problem; fan-out reads and UI aggregates are availability/throughput problems. Separate paths, reconcile.

SLA and SLO Definitions¶

SLAs are contracts with users or customers; SLOs are internal targets used to steer engineering; error budgets quantify how much unreliability you can spend before slowing feature work.

SLOs (example targets for a viral-ready poll platform)¶

SLO	Target	Measurement window	Notes
Vote submission latency	p99 < 500 ms, p50 < 150 ms	Rolling 30 days	End-to-end from edge to “accepted” response (202 or 200 per design)
Vote acceptance rate	≥ 99.95% of valid authenticated requests succeed	Rolling 30 days	Exclude client errors (4xx); include 5xx and timeouts
Count accuracy (lag)	≤ 5 s behind authoritative processing for live results (p99)	Rolling 7 days	Compare Redis/display vs DB reconciliation job
Availability during peak	≥ 99.99% for vote API during declared peak events	Per event window	May use stricter internal target than global monthly SLA

SLI (what we measure)¶

Latency: API gateway or service histograms for POST /polls/{id}/votes.
Success: ratio of 2xx to total minus 4xx validation failures.
Accuracy: scheduled job comparing SUM(votes) vs poll_options.vote_count / Redis hashes.
Availability: synthetic probes + success rate from real traffic.

Error budget policy¶

Monthly budget (example)	If burn is fast…
99.9% availability ≈ 43 minutes downtime/month	Freeze non-critical releases; prioritize hotfix, scaling, load tests
99.95% ≈ 22 minutes	Same, plus incident review for vote path only
Latency SLO miss (p99 > 500 ms sustained)	Treat like availability burn: scale workers, Kafka consumers, DB pool; review sync hot paths

Policy: Consuming >50% of error budget in <25% of the window triggers a freeze on risky changes and an incident review. Voting systems: prioritize acceptance rate and dedup correctness over cosmetic features when budgets burn.

API Design¶

REST-shaped JSON APIs are easy to reason about in interviews; emphasize idempotency, clear errors, and async acceptance where queues are used.

Conventions¶

Auth: Authorization: Bearer <JWT> for authenticated routes.
Idempotency: Idempotency-Key header on POST vote (and optional on poll create) for safe retries.
Async vote path: 202 Accepted + operation_id or vote_id when processing is queued; GET status endpoint for polling clients that need it.

Endpoints overview¶

Method	Path	Purpose
`POST`	`/v1/polls`	Create poll
`GET`	`/v1/polls/{poll_id}`	Poll metadata + options
`PATCH`	`/v1/polls/{poll_id}`	Update poll (owner; restricted fields)
`POST`	`/v1/polls/{poll_id}/votes`	Submit a vote
`GET`	`/v1/polls/{poll_id}/results`	Aggregated counts / percentages
`GET`	`/v1/polls/{poll_id}/votes/me`	Check whether current user voted (and option if allowed)
`GET`	`/v1/polls/{poll_id}/votes/{request_id}`	Optional: status of async vote by client request id

Submit a vote¶

POST /v1/polls/{poll_id}/votes

Request headers: Authorization, Idempotency-Key: <uuid>

Request body:

{
  "option_id": "550e8400-e29b-41d4-a716-446655440000",
  "client_meta": {
    "app_version": "1.4.2",
    "platform": "ios"
  }
}

Response 202 Accepted (queued processing):

{
  "status": "accepted",
  "poll_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  "request_id": "a3bb189e-8bf9-3888-b889-317e3f6b8c4d",
  "message": "Vote is being processed"
}

Response 200 OK (synchronous path, if implemented):

{
  "status": "recorded",
  "vote_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "poll_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  "option_id": "550e8400-e29b-41d4-a716-446655440000",
  "recorded_at": "2026-04-05T12:00:01.234Z"
}

Errors (examples):

{
  "error": {
    "code": "ALREADY_VOTED",
    "message": "You have already voted on this poll",
    "poll_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7"
  }
}

Get results / counts¶

GET /v1/polls/{poll_id}/results

Optional query: ?tally=final|live (if product distinguishes).

Response 200 OK:

{
  "poll_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  "status": "active",
  "total_votes": 12847,
  "as_of": "2026-04-05T12:00:05.000Z",
  "options": [
    {
      "option_id": "550e8400-e29b-41d4-a716-446655440000",
      "label": "Option A",
      "votes": 7201,
      "percentage": 56.1
    },
    {
      "option_id": "6f9619ff-8b86-d011-b42d-00cf4fc964ff",
      "label": "Option B",
      "votes": 5646,
      "percentage": 43.9
    }
  ]
}

Why as_of: Signals eventual freshness for interview discussion (cached vs authoritative).

Create / manage polls¶

Create: POST /v1/polls

{
  "title": "Best programming language in 2026?",
  "description": "For backend services at scale.",
  "options": [
    { "text": "Go", "display_order": 0 },
    { "text": "Rust", "display_order": 1 },
    { "text": "TypeScript", "display_order": 2 }
  ],
  "start_time": "2026-04-05T10:00:00Z",
  "end_time": "2026-04-12T10:00:00Z",
  "settings": {
    "show_results_during_voting": true,
    "allow_anonymous": false
  }
}

Response 201 Created:

{
  "poll_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  "status": "draft",
  "created_at": "2026-04-05T09:55:00.000Z",
  "manage_url": "https://api.example.com/v1/polls/7c9e6679-7425-40de-944b-e07fc1f90ae7"
}

Update (owner): PATCH /v1/polls/{poll_id}

{
  "status": "active",
  "title": "Best programming language in 2026? (edited)"
}

Response 200 OK: updated poll envelope (omit for brevity in interview if time-boxed).

Check vote status¶

GET /v1/polls/{poll_id}/votes/me

Response 200 OK (has voted):

{
  "has_voted": true,
  "option_id": "550e8400-e29b-41d4-a716-446655440000",
  "voted_at": "2026-04-05T11:59:58.000Z"
}

Response 200 OK (not voted):

{
  "has_voted": false
}

Optional async status: GET /v1/polls/{poll_id}/votes/requests/{request_id}

{
  "request_id": "a3bb189e-8bf9-3888-b889-317e3f6b8c4d",
  "state": "completed",
  "vote_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
}

state may be pending, completed, or failed with error object—useful when linking API design to Kafka backpressure and retries in discussion.

Step 2: Back-of-Envelope Estimation¶

Let's size for a Twitter-scale polling system.

Traffic Estimation¶

Assumptions:
- 10 million daily active users
- 1% create polls daily = 100,000 polls/day
- Average poll gets 100 votes = 10 million votes/day
- Viral poll gets 10 million votes

Average load:
- 10M votes / 86,400 seconds ≈ 115 votes/second

Peak load (viral poll):
- 10 million votes in 1 hour = 2,800 votes/second
- With burst: 10,000 votes/second

Storage Estimation¶

Per vote:
- vote_id: 8 bytes
- poll_id: 8 bytes  
- user_id: 8 bytes
- option_id: 8 bytes
- timestamp: 8 bytes
- metadata: 20 bytes
Total: ~60 bytes

Daily: 10M votes × 60 bytes = 600 MB
Yearly: 220 GB
5 years: ~1 TB

Per poll:
- Metadata: ~500 bytes
- Options: 100 bytes × 5 = 500 bytes
Daily: 100K polls × 1KB = 100 MB

Database Operations¶

Writes:
- Peak: 10,000 votes/sec (INSERT)
- Each vote requires unique constraint check

Reads:
- Poll viewing: 100x vote rate = 1 million reads/sec (cached)
- Results aggregation: Heavy queries, should be pre-computed

Step 3: High-Level Architecture¶

System Overview¶

flowchart TB subgraph Clients["Client Layer"] Web[Web App] Mobile[Mobile App] API[API Clients] end subgraph Edge["Edge Layer"] CDN[CDN] LB[Load Balancer] end subgraph Gateway["API Layer"] APIGw[API Gateway<br/>Rate Limiting + Auth] end subgraph Services["Application Layer"] UserSvc[User Service] PollSvc[Poll Service] VoteSvc[Vote Service] ResultSvc[Results Service] end subgraph Async["Async Processing"] Queue[Kafka] VoteWorker[Vote Processor] ResultWorker[Results Aggregator] end subgraph Data["Data Layer"] Cache[(Redis)] PollDB[(Poll DB)] VoteDB[(Vote DB)] AuditLog[(Audit Log)] end Web --> CDN Mobile --> LB API --> LB CDN --> LB LB --> APIGw APIGw --> UserSvc APIGw --> PollSvc APIGw --> VoteSvc APIGw --> ResultSvc VoteSvc --> Queue VoteSvc --> Cache Queue --> VoteWorker VoteWorker --> VoteDB VoteWorker --> Cache VoteWorker --> AuditLog ResultSvc --> Cache PollSvc --> PollDB PollSvc --> Cache

Why Async Vote Processing?¶

At 10,000 votes/second, synchronous database writes would overwhelm PostgreSQL.

Synchronous approach (won't scale):

User → API → Database → Response
       └── 10,000 writes/sec to one DB

Asynchronous approach (scales):

User → API → Message Queue → Response (202 Accepted)
             └── Workers → Database (batched, controlled)

Benefits: - Backpressure handling: Queue buffers traffic spikes - Reliability: Messages persist even if workers crash - Scalability: Add more workers for more throughput - Latency: User gets fast response, actual processing happens async

Step 4: Deep Dive¶

4.1 The Core Challenge - Preventing Duplicate Votes¶

This is the most critical part of the design. A user must never be able to vote twice on the same poll.

Why It's Hard¶

Consider this scenario: 1. User clicks "Vote" button 2. Request times out (network issue) 3. Client retries 4. Original request eventually succeeds 5. Retry also succeeds 6. User has voted twice!

Or this race condition:

Thread 1: Check if user voted → No
Thread 2: Check if user voted → No
Thread 1: Insert vote → Success
Thread 2: Insert vote → Success (duplicate!)

Defense in Depth Strategy¶

We use multiple layers of protection:

flowchart TD Vote[Vote Request] --> L1{"Layer 1:<br/>Redis Check"} L1 -->|Already voted| Reject1[Reject] L1 -->|Not found| L2["Layer 2:<br/>Kafka Dedup"] L2 --> L3{"Layer 3:<br/>DB Constraint"} L3 -->|Constraint violation| Reject2[Log & Ignore] L3 -->|Success| Store[Vote Stored] Store --> UpdateCache[Update Redis]

Layer 1: Fast Redis Check (API Level)¶

Before accepting the vote, check if the user already voted:

class VoteService:
    def __init__(self, redis_client, kafka_producer):
        self.redis = redis_client
        self.kafka = kafka_producer

    async def cast_vote(self, user_id: str, poll_id: str, option_id: str) -> dict:
        # Validate poll is active
        poll = await self.get_poll(poll_id)
        if not poll or poll.status != "active":
            raise PollNotActiveError()

        if not self.is_valid_option(poll, option_id):
            raise InvalidOptionError()

        # Layer 1: Fast duplicate check
        vote_key = f"voted:{poll_id}:{user_id}"
        if self.redis.exists(vote_key):
            raise AlreadyVotedError("You have already voted on this poll")

        # Optimistic: Set Redis flag immediately
        # This provides fast rejection for subsequent requests
        # TTL should be longer than poll duration
        self.redis.setex(vote_key, poll.ttl_seconds, "pending")

        # Enqueue for processing
        await self.kafka.send("votes", {
            "user_id": user_id,
            "poll_id": poll_id,
            "option_id": option_id,
            "timestamp": datetime.utcnow().isoformat(),
            "request_id": generate_request_id()  # For idempotency
        })

        return {"status": "accepted", "message": "Vote is being processed"}

Why this isn't enough:

Redis SET + Kafka SEND isn't atomic
If the server crashes between them, state is inconsistent
Redis might not have the data (cache miss)

Layer 2: Idempotent Message Processing¶

Each vote message has a unique request_id. The worker deduplicates:

class VoteProcessor:
    def __init__(self):
        self.processed_requests = {}  # In-memory for hot path

    async def process_vote(self, message: dict):
        request_id = message["request_id"]
        user_id = message["user_id"]
        poll_id = message["poll_id"]
        option_id = message["option_id"]

        # Idempotency check (in-memory for speed)
        if request_id in self.processed_requests:
            log.info(f"Duplicate request {request_id}, skipping")
            return

        try:
            # Layer 3: Database insert with constraint
            await self.db.execute("""
                INSERT INTO votes (user_id, poll_id, option_id, voted_at)
                VALUES ($1, $2, $3, NOW())
            """, user_id, poll_id, option_id)

            # Mark as processed
            self.processed_requests[request_id] = True

            # Update Redis for fast future checks
            vote_key = f"voted:{poll_id}:{user_id}"
            self.redis.setex(vote_key, 86400 * 7, "confirmed")

            # Update vote count
            await self.increment_vote_count(poll_id, option_id)

        except UniqueViolationError:
            # Already voted - this is expected for retries
            log.info(f"User {user_id} already voted on poll {poll_id}")

        except Exception as e:
            # Unexpected error - let Kafka retry
            log.error(f"Failed to process vote: {e}")
            raise

Layer 3: Database Unique Constraint (The Last Line of Defense)¶

The database constraint guarantees correctness even if all other layers fail:

CREATE TABLE votes (
    vote_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    poll_id UUID NOT NULL REFERENCES polls(poll_id),
    user_id UUID NOT NULL REFERENCES users(user_id),
    option_id UUID NOT NULL REFERENCES poll_options(option_id),
    voted_at TIMESTAMP DEFAULT NOW(),

    -- THE CRITICAL CONSTRAINT
    CONSTRAINT uq_one_vote_per_user_per_poll UNIQUE (user_id, poll_id)
);

-- This index also speeds up "has user voted?" queries
CREATE INDEX idx_votes_user_poll ON votes(user_id, poll_id);

Why database constraints are non-negotiable:

Atomicity: The INSERT either fully succeeds or fully fails
Durability: Once committed, the vote is permanent
Correctness: Even with race conditions, only one row can exist

# This is safe even with concurrent requests:
try:
    db.execute("INSERT INTO votes ...")
except UniqueViolationError:
    # Another thread/process already inserted - that's fine
    pass

4.2 Database Design¶

Schema Design¶

-- Users table
CREATE TABLE users (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(100) UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    is_verified BOOLEAN DEFAULT FALSE
);

-- Polls table
CREATE TABLE polls (
    poll_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    creator_id UUID NOT NULL REFERENCES users(user_id),
    title VARCHAR(255) NOT NULL,
    description TEXT,
    start_time TIMESTAMP NOT NULL,
    end_time TIMESTAMP NOT NULL,
    status VARCHAR(20) DEFAULT 'draft', -- draft, active, closed

    -- Settings
    allow_anonymous BOOLEAN DEFAULT FALSE,
    show_results_during_voting BOOLEAN DEFAULT TRUE,
    allow_multiple_options BOOLEAN DEFAULT FALSE,
    max_votes_per_user INT DEFAULT 1,

    -- Denormalized for quick access
    total_votes INT DEFAULT 0,

    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_polls_status ON polls(status, end_time);
CREATE INDEX idx_polls_creator ON polls(creator_id);

-- Poll options
CREATE TABLE poll_options (
    option_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    poll_id UUID NOT NULL REFERENCES polls(poll_id) ON DELETE CASCADE,
    option_text VARCHAR(500) NOT NULL,
    display_order INT NOT NULL,
    vote_count INT DEFAULT 0,  -- Denormalized for fast results

    UNIQUE (poll_id, display_order)
);

CREATE INDEX idx_options_poll ON poll_options(poll_id);

-- Votes table (critical for integrity)
CREATE TABLE votes (
    vote_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    poll_id UUID NOT NULL,
    user_id UUID NOT NULL,
    option_id UUID NOT NULL,
    voted_at TIMESTAMP DEFAULT NOW(),
    ip_address INET,  -- For abuse detection
    user_agent TEXT,

    CONSTRAINT fk_poll FOREIGN KEY (poll_id) REFERENCES polls(poll_id),
    CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(user_id),
    CONSTRAINT fk_option FOREIGN KEY (option_id) REFERENCES poll_options(option_id),

    -- One vote per user per poll
    CONSTRAINT uq_user_poll UNIQUE (user_id, poll_id)
);

CREATE INDEX idx_votes_poll_option ON votes(poll_id, option_id);
CREATE INDEX idx_votes_user ON votes(user_id);

-- Audit log for compliance
CREATE TABLE vote_audit_log (
    log_id BIGSERIAL PRIMARY KEY,
    vote_id UUID NOT NULL,
    poll_id UUID NOT NULL,
    user_id UUID NOT NULL,
    action VARCHAR(20) NOT NULL,  -- 'cast', 'changed', 'revoked'
    old_option_id UUID,
    new_option_id UUID,
    timestamp TIMESTAMP DEFAULT NOW(),
    ip_address INET,
    request_metadata JSONB
);

CREATE INDEX idx_audit_poll ON vote_audit_log(poll_id, timestamp);

Denormalization for Performance¶

Instead of counting votes on every request:

-- Slow: COUNT on every read
SELECT option_id, COUNT(*) 
FROM votes 
WHERE poll_id = 'xxx' 
GROUP BY option_id;

We maintain counters:

-- Fast: Read denormalized count
SELECT option_id, option_text, vote_count 
FROM poll_options 
WHERE poll_id = 'xxx' 
ORDER BY display_order;

The vote processor updates counters atomically:

async def increment_vote_count(poll_id: str, option_id: str):
    # Atomic increment
    await db.execute("""
        UPDATE poll_options 
        SET vote_count = vote_count + 1 
        WHERE option_id = $1
    """, option_id)

    await db.execute("""
        UPDATE polls 
        SET total_votes = total_votes + 1 
        WHERE poll_id = $1
    """, poll_id)

    # Invalidate cache
    await redis.delete(f"results:{poll_id}")

4.3 Real-Time Results with Redis¶

For polls that show live results, we need sub-second updates.

Redis Data Model¶

# Vote counts per option (Hash)
HSET results:poll123 option_a 1542
HSET results:poll123 option_b 3201
HSET results:poll123 option_c 876

# Increment on vote
HINCRBY results:poll123 option_a 1

# Get all results
HGETALL results:poll123
# Returns: {"option_a": "1542", "option_b": "3201", "option_c": "876"}

# Track who voted (Set)
SADD voters:poll123 user_abc
SISMEMBER voters:poll123 user_abc  # Returns 1 (true)

Results Service¶

class ResultsService:
    def __init__(self, redis_client, db):
        self.redis = redis_client
        self.db = db

    async def get_results(self, poll_id: str, user_id: str = None) -> dict:
        poll = await self.get_poll(poll_id)

        # Check if results are visible
        if poll.status == "active" and not poll.show_results_during_voting:
            if user_id:
                has_voted = await self.has_user_voted(poll_id, user_id)
                if not has_voted:
                    raise ResultsNotAvailableError("Vote to see results")

        # Try Redis first
        results_key = f"results:{poll_id}"
        cached = self.redis.hgetall(results_key)

        if cached:
            return self.format_results(poll, cached)

        # Cache miss - compute from database
        results = await self.db.fetch("""
            SELECT option_id, vote_count
            FROM poll_options
            WHERE poll_id = $1
            ORDER BY display_order
        """, poll_id)

        # Cache for next time
        for row in results:
            self.redis.hset(results_key, str(row['option_id']), row['vote_count'])
        self.redis.expire(results_key, 3600)  # 1 hour TTL

        return self.format_results(poll, results)

    def format_results(self, poll, results) -> dict:
        total = sum(int(v) for v in results.values())

        return {
            "poll_id": poll.poll_id,
            "title": poll.title,
            "status": poll.status,
            "total_votes": total,
            "options": [
                {
                    "option_id": opt_id,
                    "votes": int(count),
                    "percentage": round(int(count) / total * 100, 1) if total > 0 else 0
                }
                for opt_id, count in results.items()
            ]
        }

Real-Time Updates with WebSockets¶

For live-updating results:

from fastapi import WebSocket
import asyncio

class ResultsWebSocket:
    def __init__(self):
        self.connections: dict[str, list[WebSocket]] = {}

    async def connect(self, poll_id: str, websocket: WebSocket):
        await websocket.accept()

        if poll_id not in self.connections:
            self.connections[poll_id] = []
        self.connections[poll_id].append(websocket)

        # Send current results immediately
        results = await results_service.get_results(poll_id)
        await websocket.send_json(results)

    async def broadcast_update(self, poll_id: str, results: dict):
        if poll_id not in self.connections:
            return

        for ws in self.connections[poll_id]:
            try:
                await ws.send_json(results)
            except:
                self.connections[poll_id].remove(ws)

# In vote processor, after updating counts:
async def on_vote_processed(poll_id: str, option_id: str):
    results = await results_service.get_results(poll_id)
    await ws_manager.broadcast_update(poll_id, results)

4.4 Message Queue Design with Kafka¶

Why Kafka?¶

Requirement	Kafka Feature
High throughput	Partitioned, batched writes
Durability	Replication, disk persistence
Ordering	Per-partition ordering
Replay	Consumer offsets, retention
Scalability	Add partitions, consumers

Topic Configuration¶

topic: votes
partitions: 12  # Allows 12 parallel consumers
replication_factor: 3  # Survives 2 broker failures
retention.ms: 604800000  # 7 days for replay/audit
min.insync.replicas: 2  # Require 2 brokers to ack

Partitioning Strategy¶

Option 1: Partition by user_id

partition = hash(user_id) % num_partitions

- Same user's votes go to same partition - Helps with per-user deduplication - Good load distribution

Option 2: Partition by poll_id

partition = hash(poll_id) % num_partitions

- All votes for a poll in one partition - Easier to maintain poll-level ordering - Risk: Hot polls overload one partition

Recommendation: Partition by user_id for better load distribution.

Consumer Group¶

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'votes',
    bootstrap_servers=['kafka1:9092', 'kafka2:9092', 'kafka3:9092'],
    group_id='vote-processors',
    auto_offset_reset='earliest',
    enable_auto_commit=False,  # Manual commit for reliability
    value_deserializer=lambda m: json.loads(m.decode())
)

for message in consumer:
    try:
        await process_vote(message.value)
        consumer.commit()  # Only commit after successful processing
    except Exception as e:
        log.error(f"Failed to process vote: {e}")
        # Don't commit - Kafka will redeliver

Dead Letter Queue¶

Votes that fail repeatedly need investigation:

flowchart LR Main[votes topic] --> Consumer[Vote Processor] Consumer -->|success| Commit[Commit Offset] Consumer -->|failure x3| DLQ[votes-dlq topic] DLQ --> Alert[Alert Team] DLQ --> Manual[Manual Processing]

MAX_RETRIES = 3

async def process_with_retry(message):
    retries = message.headers.get('x-retry-count', 0)

    try:
        await process_vote(message.value)
    except Exception as e:
        if retries >= MAX_RETRIES:
            # Send to dead letter queue
            await kafka.send('votes-dlq', 
                value=message.value,
                headers={'x-error': str(e), 'x-original-offset': message.offset}
            )
        else:
            # Re-enqueue with retry count
            await kafka.send('votes',
                value=message.value,
                headers={'x-retry-count': retries + 1}
            )

Step 5: Scaling & Production¶

5.1 Scaling Strategies¶

Scaling Components¶

Component	Scaling Strategy	Trigger
API Gateway	Horizontal (add instances)	CPU > 70%, latency > 200ms
Vote Service	Horizontal	Queue depth, latency
Kafka	Add partitions, brokers	Throughput limits
Vote Processors	Add consumers	Queue lag > 10,000
PostgreSQL	Read replicas → Sharding	Query latency, connections
Redis	Cluster mode	Memory usage, ops/sec

Handling Viral Polls¶

When a celebrity creates a poll and 10 million people vote:

flowchart LR subgraph Normal["Normal: 100 votes/sec"] N1[3 API pods] N2[3 workers] end subgraph Viral["Viral: 10K votes/sec"] V1[30 API pods] V2[30 workers] end Monitor[Metrics] --> HPA[HPA / Auto-scaler] HPA -->|scale up| Viral

Auto-scaling rules:

# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    kind: Deployment
    name: vote-service
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: kafka_consumer_lag
      target:
        type: Value
        value: "5000"  # Scale up if lag exceeds 5K

Database Scaling¶

Stage 1: Read replicas

Writes → Primary
Reads → Read Replicas (round-robin)

Stage 2: Connection pooling (PgBouncer)

App (1000 connections) → PgBouncer (100 connections) → PostgreSQL

Stage 3: Sharding by poll_id

def get_shard(poll_id: str) -> str:
    shard_id = hash(poll_id) % NUM_SHARDS
    return f"shard_{shard_id}"

# Votes for poll ABC → shard_2
# Votes for poll XYZ → shard_0

5.2 Failure Handling¶

Failure Modes and Recovery¶

Failure	Impact	Detection	Recovery
API pod crash	Some requests fail	Health check	LB routes to healthy pods
Kafka broker down	Reduced capacity	Broker health	Failover to replicas
Vote processor crash	Processing delays	Consumer lag	Kafka redelivers, other workers continue
Primary DB down	Writes fail	Connection errors	Promote replica
Redis down	Slower checks	Connection timeout	Fall back to DB

Graceful Degradation¶

When components fail, degrade gracefully instead of hard failing:

async def cast_vote_with_fallback(user_id, poll_id, option_id):
    # Try Redis for duplicate check
    try:
        if await redis.exists(f"voted:{poll_id}:{user_id}"):
            raise AlreadyVotedError()
    except RedisConnectionError:
        # Redis down - skip fast check, rely on DB constraint
        log.warning("Redis unavailable, proceeding without cache check")

    # Try to enqueue vote
    try:
        await kafka.send("votes", {...})
        return {"status": "accepted"}
    except KafkaError:
        # Kafka down - try direct DB write (slower but works)
        log.warning("Kafka unavailable, writing directly to DB")
        try:
            await db.execute("INSERT INTO votes ...")
            return {"status": "confirmed"}
        except UniqueViolationError:
            raise AlreadyVotedError()

Data Consistency Checks¶

Periodically verify Redis counts match database:

async def reconcile_vote_counts():
    """Run hourly to catch any drift."""

    polls = await db.fetch("SELECT poll_id FROM polls WHERE status = 'active'")

    for poll in polls:
        poll_id = poll['poll_id']

        # Get DB truth
        db_counts = await db.fetch("""
            SELECT option_id, COUNT(*) as count
            FROM votes WHERE poll_id = $1
            GROUP BY option_id
        """, poll_id)

        # Get Redis counts
        redis_counts = redis.hgetall(f"results:{poll_id}")

        # Compare and fix
        for row in db_counts:
            option_id = str(row['option_id'])
            db_count = row['count']
            redis_count = int(redis_counts.get(option_id, 0))

            if db_count != redis_count:
                log.warning(f"Count mismatch for {poll_id}/{option_id}: "
                           f"DB={db_count}, Redis={redis_count}")
                # Fix Redis
                redis.hset(f"results:{poll_id}", option_id, db_count)

5.3 Security Considerations¶

Authentication and Authorization¶

from fastapi import Depends, HTTPException
from fastapi.security import HTTPBearer

security = HTTPBearer()

async def get_current_user(token: str = Depends(security)) -> User:
    try:
        payload = jwt.decode(token.credentials, SECRET_KEY, algorithms=["HS256"])
        user_id = payload.get("sub")
        return await user_service.get_user(user_id)
    except jwt.ExpiredSignatureError:
        raise HTTPException(401, "Token expired")
    except jwt.JWTError:
        raise HTTPException(401, "Invalid token")

@app.post("/polls/{poll_id}/vote")
async def vote(
    poll_id: str,
    option_id: str,
    user: User = Depends(get_current_user)
):
    # User is authenticated
    return await vote_service.cast_vote(user.user_id, poll_id, option_id)

Rate Limiting¶

Prevent abuse:

from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

@app.post("/polls/{poll_id}/vote")
@limiter.limit("10/minute")  # 10 votes per minute per IP
async def vote(poll_id: str, request: Request):
    ...

@app.post("/polls")
@limiter.limit("5/hour")  # 5 polls per hour per user
async def create_poll(poll: PollCreate, user: User = Depends(get_current_user)):
    ...

Bot Prevention¶

Technique	Implementation	Effectiveness
CAPTCHA	reCAPTCHA on vote	High, but friction
Rate limiting	Per IP, per user	Medium
Device fingerprinting	FingerprintJS	Medium-High
Behavioral analysis	ML on voting patterns	High
Email verification	Require verified email	High

async def validate_vote_request(request: Request, user: User):
    # Check for suspicious patterns
    recent_votes = await get_recent_votes(user.user_id, minutes=5)

    if len(recent_votes) > 20:
        raise SuspiciousActivityError("Too many votes in short time")

    # Check device fingerprint
    fingerprint = request.headers.get("X-Device-Fingerprint")
    if fingerprint:
        devices_for_user = await get_user_devices(user.user_id)
        if len(devices_for_user) > 5:
            raise SuspiciousActivityError("Too many devices")

Anonymity vs Auditability Trade-off¶

Challenge: You want votes to be anonymous (no one knows how I voted), but you also need to verify one-vote-per-person.

Our approach: - System knows who voted (required for duplicate prevention) - System stores what they voted for (for results) - Other users cannot see individual votes - Admins can audit if legally required

For truly anonymous voting (e.g., government elections): - Use cryptographic techniques (blind signatures, homomorphic encryption) - Voter gets a receipt they can verify without revealing their vote - This is complex and outside typical interview scope

5.4 Multi-Language Implementations¶

Vote service with idempotency and layered deduplication¶

JavaGo

import java.time.Instant;
import java.util.UUID;

public class VoteService {

    public record VoteRequest(String pollId, String optionId, String userId, String idempotencyKey) {}
    public record VoteResult(String voteId, boolean accepted, String message) {}

    private final VoteRepository voteRepository;
    private final RedisTemplate<String, String> redis;
    private final KafkaTemplate<String, VoteEvent> kafka;

    public VoteService(VoteRepository voteRepository, RedisTemplate<String, String> redis,
                        KafkaTemplate<String, VoteEvent> kafka) {
        this.voteRepository = voteRepository;
        this.redis = redis;
        this.kafka = kafka;
    }

    public VoteResult castVote(VoteRequest request) {
        String voteId = UUID.randomUUID().toString();

        // Layer 1: Redis deduplication (fast path)
        String dedupeKey = "vote:" + request.pollId() + ":" + request.userId();
        Boolean isNew = redis.opsForValue().setIfAbsent(dedupeKey, voteId);
        if (Boolean.FALSE.equals(isNew)) {
            return new VoteResult(voteId, false, "Duplicate vote detected (cache)");
        }

        // Layer 2: Idempotency check
        if (voteRepository.existsByIdempotencyKey(request.idempotencyKey())) {
            return new VoteResult(voteId, false, "Duplicate vote detected (idempotency)");
        }

        // Layer 3: Database unique constraint (final safety net)
        try {
            Vote vote = new Vote(voteId, request.pollId(), request.optionId(),
                request.userId(), request.idempotencyKey(), Instant.now());
            voteRepository.save(vote);
        } catch (DuplicateKeyException e) {
            return new VoteResult(voteId, false, "Duplicate vote detected (database)");
        }

        // Publish to Kafka for async result aggregation
        kafka.send("votes", request.pollId(),
            new VoteEvent(voteId, request.pollId(), request.optionId(), Instant.now()));

        // Increment Redis counter for real-time results
        redis.opsForHash().increment("poll:" + request.pollId() + ":results",
            request.optionId(), 1);

        return new VoteResult(voteId, true, "Vote recorded");
    }
}

/**
 * Kafka consumer that aggregates votes and reconciles with the database.
 */
public class VoteAggregationConsumer {

    @KafkaListener(topics = "votes", groupId = "vote-aggregator")
    public void onVoteEvent(VoteEvent event) {
        // periodic reconciliation between Redis counters and DB counts
        // runs less frequently, ensures eventual consistency
        long dbCount = voteRepository.countByPollIdAndOptionId(
            event.pollId(), event.optionId());
        long redisCount = Long.parseLong(
            redis.opsForHash().get("poll:" + event.pollId() + ":results",
                event.optionId()).toString());

        if (Math.abs(dbCount - redisCount) > 10) {
            redis.opsForHash().put("poll:" + event.pollId() + ":results",
                event.optionId(), String.valueOf(dbCount));
        }
    }
}

package voting

import (
    "context"
    "fmt"
    "time"

    "github.com/google/uuid"
    "github.com/redis/go-redis/v9"
)

type VoteRequest struct {
    PollID         string
    OptionID       string
    UserID         string
    IdempotencyKey string
}

type VoteResult struct {
    VoteID   string
    Accepted bool
    Message  string
}

type VoteRepository interface {
    Save(ctx context.Context, vote *Vote) error
    ExistsByIdempotencyKey(ctx context.Context, key string) (bool, error)
    CountByPollAndOption(ctx context.Context, pollID, optionID string) (int64, error)
}

type Vote struct {
    ID             string
    PollID         string
    OptionID       string
    UserID         string
    IdempotencyKey string
    CreatedAt      time.Time
}

type VoteService struct {
    repo   VoteRepository
    redis  *redis.Client
    events chan<- VoteEvent
}

type VoteEvent struct {
    VoteID   string `json:"vote_id"`
    PollID   string `json:"poll_id"`
    OptionID string `json:"option_id"`
}

func NewVoteService(repo VoteRepository, redisClient *redis.Client,
    events chan<- VoteEvent) *VoteService {
    return &VoteService{repo: repo, redis: redisClient, events: events}
}

func (s *VoteService) CastVote(ctx context.Context, req VoteRequest) VoteResult {
    voteID := uuid.New().String()

    // Layer 1: Redis deduplication
    dedupeKey := fmt.Sprintf("vote:%s:%s", req.PollID, req.UserID)
    set, err := s.redis.SetNX(ctx, dedupeKey, voteID, 24*time.Hour).Result()
    if err != nil || !set {
        return VoteResult{voteID, false, "Duplicate vote (cache)"}
    }

    // Layer 2: Idempotency check
    exists, _ := s.repo.ExistsByIdempotencyKey(ctx, req.IdempotencyKey)
    if exists {
        return VoteResult{voteID, false, "Duplicate vote (idempotency)"}
    }

    // Layer 3: Database save with unique constraint
    vote := &Vote{
        ID:             voteID,
        PollID:         req.PollID,
        OptionID:       req.OptionID,
        UserID:         req.UserID,
        IdempotencyKey: req.IdempotencyKey,
        CreatedAt:      time.Now(),
    }
    if err := s.repo.Save(ctx, vote); err != nil {
        return VoteResult{voteID, false, "Duplicate vote (database)"}
    }

    // Update real-time counter in Redis
    resultsKey := fmt.Sprintf("poll:%s:results", req.PollID)
    s.redis.HIncrBy(ctx, resultsKey, req.OptionID, 1)

    // Publish event for async processing
    s.events <- VoteEvent{VoteID: voteID, PollID: req.PollID, OptionID: req.OptionID}

    return VoteResult{voteID, true, "Vote recorded"}
}

func (s *VoteService) GetResults(ctx context.Context, pollID string) (map[string]int64, error) {
    resultsKey := fmt.Sprintf("poll:%s:results", pollID)
    results, err := s.redis.HGetAll(ctx, resultsKey).Result()
    if err != nil {
        return nil, err
    }

    counts := make(map[string]int64)
    for optionID, countStr := range results {
        var count int64
        fmt.Sscanf(countStr, "%d", &count)
        counts[optionID] = count
    }
    return counts, nil
}

5.5 Monitoring and Observability¶

Key Metrics¶

Category	Metric	Alert Threshold
Availability	Vote success rate	< 99.9%
Latency	Vote API P99	> 500ms
Throughput	Votes processed/sec	Trend down
Queue	Kafka consumer lag	> 10,000
Duplicates	Duplicate vote attempts	Spike (> 10%)
Errors	5xx error rate	> 0.1%

Logging¶

Structured logs for easy querying:

{
    "timestamp": "2024-01-15T10:30:00Z",
    "level": "INFO",
    "service": "vote-processor",
    "event": "vote_processed",
    "poll_id": "uuid",
    "user_id": "uuid",
    "option_id": "uuid",
    "processing_time_ms": 15,
    "was_duplicate": false,
    "cache_hit": true
}

Dashboard Panels¶

Real-time metrics:
Votes per second
Active polls count
Error rate
Queue health:
Consumer lag
Partition distribution
Processing latency
Database health:
Query latency
Connection pool usage
Replication lag

Interview Checklist¶

Sample Interview Dialogue¶

Interviewer: "Design a voting system."

You: "Great! Let me clarify a few things first. What type of voting system? A casual polling app like Strawpoll, Reddit-style upvotes, or something more formal like corporate elections?"

Interviewer: "Let's go with a general polling platform. Users create polls, others vote."

You: "Got it. How strict is duplicate vote prevention, and what scale are we targeting?"

Interviewer: "Strict—exactly one vote per user per poll. And let's say it could go viral—maybe 10,000 votes per second peak."

You: "Perfect. So we need strong consistency for duplicate prevention and high throughput. The core challenge is ensuring exactly-once voting while handling massive concurrency.

I'd use a three-layer defense: 1. Fast Redis check to reject obvious duplicates 2. Idempotent message processing in Kafka 3. Database unique constraint as the final guarantee

For the architecture, I'd process votes asynchronously through Kafka to handle the 10K/sec throughput. The API accepts votes quickly, returns 202 Accepted, and workers process them in controlled batches to the database.

Let me draw the architecture..."

Summary¶

Challenge	Solution
Tech stack choices	PostgreSQL + Redis + Kafka; CP for vote integrity, AP for reads; distributed rate limits
CAP tension	Strong consistency on dedup; eventual counts/leaderboard with reconciliation
SLA/SLO	p99 latency, acceptance rate, count lag SLOs; error budget gates releases
API surface	REST + idempotency keys; async `202` + status; results with `as_of` freshness
Duplicate prevention	Three layers: Redis check + Kafka dedup + DB constraint
High throughput	Async processing via Kafka, horizontal scaling
Real-time results	Redis counters, WebSocket updates
Data accuracy	Database unique constraint, periodic reconciliation
Scalability	Kafka partitioning, worker scaling, DB sharding
Reliability	Graceful degradation, dead letter queue, retry logic
Security	JWT auth, rate limiting, bot detection

The voting system demonstrates critical distributed systems concepts: consistency (one vote per user), idempotency (safe retries), and async processing (handling bursts). Master this pattern, and you'll be equipped to design many similar systems.