Ride Sharing (Uber/Lyft)¶

What We're Building¶

A ride-hailing marketplace that connects passengers who need trips with drivers who supply capacity, in real time, over a geographic area—conceptually similar to Uber, Lyft, or regional equivalents.

Core capabilities in scope: - Riders request rides with pickup and (optional) drop-off locations - The system finds suitable nearby drivers under business rules (vehicle type, capacity, bans) - Live location updates for drivers and active trips (map UX, ETAs, safety) - Trip lifecycle from request through completion, cancellation, and payment settlement - Dynamic pricing (surge) when supply is scarce relative to demand - Payments: authorize, capture, split (platform fee), and payout to drivers

Challenge	Description
Real-time geospatial load	Millions of moving points; queries must be fast and geographically local
Matching under constraints	Nearest is not always best (heading, ETA, fairness, surge regions)
Stateful trips	Idempotent transitions; split-brain during network partitions hurts UX and money
ETA accuracy	Road network routing, traffic, and stale GPS all affect trust
Payments	Card networks, disputes, regional regulations, driver payouts
Safety and abuse	Fake GPS, collusion, account takeover; audit trails matter

Comparison to Adjacent Systems¶

System	Similarity	Difference
Food delivery	Dispatch + ETA + payments	Batchability, restaurant prep, less strict passenger-driver lock
Chat / realtime	WebSockets, presence	Ride sharing adds geo indexes and trip state machines
Notification system	Async fan-out	Here, location streams dominate hot paths

Note

In interviews, anchor numbers as orders of magnitude. Precise public metrics change by year and market; the goal is to show you understand QPS, index cardinality, and operational trade-offs.

Step 1: Requirements¶

Functional Requirements¶

Requirement	Priority	Description
Rider requests a ride	Must have	Pickup pin, optional drop-off, vehicle class
Driver availability	Must have	Online/offline; accepts trip offers
Driver-rider matching	Must have	Offer trip to suitable driver(s); fallback if declined or timeout
Live location	Must have	Stream driver position during dispatch and trip
Trip lifecycle	Must have	States from requested to completed/cancelled with rules per state
ETA / price estimate	Must have	Pre-trip ETA and fare band; update during trip
Payments	Must have	Charge rider; platform fee; pay driver per policy
Ratings	Should have	Post-trip feedback
Scheduled rides	Nice to have	Different matching window and reminder notifications
Pool / shared rides	Nice to have	Multi-stop routing and constraint satisfaction

Clarifying questions (typical)

Question	Why it matters
One city or global?	Data residency, map providers, currency
Regulatory category?	Taxi vs TNC rules affect pricing display and data retention
Vehicle classes?	Matching filters and supply pools
Cancellation fees?	State machine transitions and payment intents
Fraud tolerance?	GPS spoofing detection depth

Non-Functional Requirements¶

Requirement	Target	Rationale
Match latency	Low seconds for offer round-trip	Riders abandon quickly
Location update freshness	1–5 s typical for map UX; adaptive	Battery vs accuracy trade-off
Availability	99.9%+ for core APIs	Outages strand riders/drivers mid-trip
Consistency	Strong for trip state and payments	Money and safety; use OLTP stores and idempotent APIs
Scalability	Horizontal for stateless APIs; shard geo indexes	Hot cities create hot cells
Durability	Trip and payment events auditable for years	Disputes and regulation

Warning

Never claim exactly-once side effects at planetary scale without qualifiers. Use idempotency keys for payments and trip transitions; make reconciliation jobs first-class.

API Design¶

Transports - HTTPS/JSON (or gRPC) for request/response: create trip, cancel, history - WebSocket (or MQTT) for high-frequency location and trip events to mobile apps

Representative REST-style endpoints

Operation	Method	Purpose
`POST /v1/trips`	HTTP	Rider creates trip request (pickup, drop-off, product)
`POST /v1/trips/{id}/cancel`	HTTP	Cancel with reason; idempotency key
`GET /v1/trips/{id}`	HTTP	Trip detail and timeline
`POST /v1/drivers/heartbeat`	HTTP	Periodic availability + coarse location
`POST /v1/payments/intents`	HTTP	Create payment authorization
`WS /v1/stream`	WebSocket	Multiplexed: `location`, `trip.updated`, `offer`, `eta`

WebSocket message envelope (conceptual)

{
  "type": "location.update",
  "trip_id": "trip_01h2",
  "seq": 10442,
  "lat": 37.7749,
  "lng": -122.4194,
  "heading_deg": 90,
  "ts_ms": 1712000000000
}

Tip

Sequence numbers or monotonic timestamps help clients discard stale updates after reconnects.

Technology Selection & Tradeoffs¶

Ride sharing stacks several independent “knobs”: how you index space, how you move bits to clients, where you persist truth, how you assign work, and how you buffer high-volume streams. Interviewers care that you name alternatives, compare on real constraints (latency, ops cost, correctness), and commit to a coherent story.

Location indexing¶

Approach	Idea	Strengths	Weaknesses	Typical use
Geohash	Hierarchical string/bit encoding of lat/lng into rectangular cells	Simple prefix scans in KV/Redis; easy bucketing and cache keys	Cell shapes stretch at poles; edge neighbors need explicit neighbor expansion, not prefix alone	Redis GEO, quick prototypes, cell-based surge
*R-tree / R-tree**	Balanced tree of bounding rectangles for points and polygons	Strong for variable-density data and polygon containment (zones, airports)	Higher write amplification; harder to shard horizontally than fixed cells	GIS databases (PostGIS), offline analytics, geofencing
S2 (Google)	Hilbert curve on a sphere; hierarchical cells with stable IDs	Spherical model; fast neighbor/cover ops; level-based refinement	Learning curve; you still tune cell level per city density	Global fleets, consistent cross-region logic
H3 (Uber)	Hexagonal hierarchical grid on the sphere	Uniform neighbors (6 at fine levels); good for aggregates (heatmap, surge hexes)	Not a drop-in for arbitrary “distance to point” APIs; tooling ecosystem smaller than S2 in some stacks	Demand/supply aggregation, surge maps, ML features

Why it matters in interviews: Matching starts with coarse spatial filtering; the index should support ring expansion, TTL eviction for offline drivers, and hot-cell isolation in dense cores.

Real-time communication (client ↔ edge)¶

Option	Model	Strengths	Weaknesses	Fit
WebSocket	Full-duplex TCP, often via HTTPS upgrade	Ubiquitous in mobile; simple multiplexed trip + location channels; CDN/WAF friendly	Stateful connections; reconnect storms; regional stickiness	Default for rider/driver live UX
gRPC streaming	HTTP/2 multiplexed RPC streams	Strong typing, flow control, binary payloads	Mobile/browser support and ops tooling vary; often paired with gateways	Service-to-service high-QPS streams (e.g., internal fan-out)
MQTT	Pub/sub over lightweight broker	Great for many small devices and topic fan-out	Less common as primary app API for consumer ride apps; extra hop (broker)	Telematics, IoT, some embedded fleets

Database (trip truth vs hot geo vs write-heavy telemetry)¶

Store	Strengths	Weaknesses	Role in ride sharing
PostgreSQL + PostGIS	ACID transactions; trip and payment integrity; rich constraints; spatial predicates	Vertical scaling limits; cross-region latency if monolithic	System of record for trips, users, payments, disputes
Redis GEO	Sub-ms nearby queries; simple ops; pairs with geohash	Memory-bound; not your financial ledger; durability is a product choice (AOF/cluster)	Live driver positions and offer locks with TTL
Cassandra	Wide partitions; high ingest; multi-DC	Tunable consistency; no rich joins; operational complexity	High-volume trip events, location archives, idempotent write paths at huge scale
DynamoDB	Managed, predictable key-path access; global tables option	Hot partitions need design; limited ad-hoc query vs SQL	Regional trip metadata, ride history, feature stores with known access patterns

Interview pattern: OLTP for money and state machine; Redis (or equivalent) for geo hot path; column/wide-column or KV for append-only telemetry at extreme scale.

Matching algorithm¶

Approach	What it does	Pros	Cons
Greedy nearest (with ETA)	Iteratively pick best-scoring driver by heuristic after spatial filter	Simple; fast; easy to add business rules	Can be globally suboptimal; order of offers matters
Hungarian / min-cost assignment	Optimal one-to-one assignment on a cost matrix	Strong when batching many simultaneous requests	O(n³); expensive at city scale; needs batch windows
ML-based ranking	Learn weights or pairwise preferences from data	Handles nonlinear effects (acceptance, churn)	Cold start; explainability; still needs guardrails and fallbacks

Production systems almost always combine greedy + ETA + rules, with batch optimization only in constrained windows (e.g., pool rides, small zones).

Message queue: location updates¶

System	Strengths	Weaknesses	When to pick
Kafka	Durable log; high throughput; replay; consumer groups; cross-service contracts	Ops/heavy JVM ecosystem in some orgs; ordering per key is a discipline	Default for `driver.locations`: partition by `driver_id`, multiple consumers
Redis Streams	Low latency; simple if you already run Redis at core	Not the same multi-team replay story as Kafka at petabyte scale; persistence model must be explicit	Edge aggregation, smaller deployments, or extra fast path beside Kafka

Our choice (illustrative architecture)
- S2 or geohash-backed Redis GEO (or H3 for surge/heatmap) for live nearby queries—operational simplicity and proven mobile patterns.
- WebSockets (or MQTT only if product is IoT-heavy) for rider/driver streams; Kafka for location ingestion and downstream consumers (matching freshness, ETA, analytics).
- PostgreSQL + PostGIS as source of truth for trips and payments; Redis for geo + short-lived coordination; Cassandra or DynamoDB only when you justify massive append-only or multi-region access patterns with clear key design.
- Greedy matching with road-network ETA and sequential offers; reserve Hungarian/ILP for bounded batch problems; layer ML on ranking weights, not as the only safety-critical assigner.

This stack optimizes for interview clarity: separate AP-friendly location paths from CP trip/payment paths, and show you know when to pay for optimality vs latency.

CAP Theorem Analysis¶

CAP is a thought model, not a literal database toggle: in partitions, you choose between linearizable consistency and always answering with possibly stale data. Ride sharing mixes subsystems with different tolerances.

Subsystem	Dominant need	CAP lens (under partition)	Rationale
Location tracking	Freshness + availability	AP: accept that two clients might briefly disagree on exact coordinates; eventual convergence is OK	Maps can tolerate seconds of skew; do not block the world on GPS
Ride matching	No double-dispatch	CP for assignment decision: one writer “wins” (trip offer acceptance)	Two drivers must not be MATCHED to the same trip; use compare-and-swap, fencing tokens, or DB constraints
Trip state	Correct lifecycle	CP for transitions (`REQUESTED` → `MATCHED` → …)	Stale trip state breaks UX and billing; prefer strong consistency per `trip_id`
Payment	Monetary integrity	CP: authoritative ledger; no duplicated capture without idempotency	Card networks are eventually consistent externally; your ledger must reconcile

Why “driver location is AP but assignment is CP” is a strong interview answer:
- Location is high volume and softly consistent—show last-known-good with timestamps.
- Assignment is low volume and hard invariant—serialize decisions on the trip row or dedicated lock service.

flowchart TB subgraph AP["AP-biased — under partition: keep serving; location may briefly disagree"] L[Driver location stream] M[Map / ETA display] H[Heartbeats] end subgraph CP["CP-biased — under partition: serialize; no double-dispatch or double-capture"] A[Offer accept / assign driver] T[Trip state transitions] P[Payment authorize / capture] end L --> M A --> T T --> P

Note

Mention idempotency keys and single-writer patterns for the CP side; mention timestamps + sequence numbers on the AP side so clients reject garbage after reconnect.

SLA and SLO Definitions¶

SLAs are contracts (often external); SLOs are internal targets composed from SLIs (indicators). For interviews, define what you measure, target, and how you burn error budget.

Service level indicators and example objectives¶

Area	SLI (what we measure)	Example SLO target	Why candidates care
Matching latency	Time from `POST /trips` to first driver offer sent (p95/p99)	p95 < 3 s, p99 < 8 s in dense cities	Drives geo index + matching service budgets
ETA accuracy	Absolute error between predicted and actual trip time (p50/p90)	p90 error < 25% of trip duration (tune per market)	Wrong ETAs erode trust; tie to map/traffic pipeline
Location freshness	Age of last accepted driver position shown to rider (p95)	p95 < 5 s on trip; adaptive when idle	Battery vs UX; backpressure must not silently stall
Trip completion rate	Completed / (requested − fraudulent) over rolling window	> 98% “happy path” completion excluding rider cancel	Separates product health from supply gaps
Payment processing	Successful capture on first attempt / reconciliation latency	99.95% capture success within 60 s of completion; 100% eventual consistency via ledger	Money path gets the tightest objectives

Error budget policy¶

Principle	Practice
Budget = 1 − SLO over window (e.g., 30 days)	Example: 99.9% monthly availability ⇒ 43.2 min downtime budget
Fast vs slow burn	Alert on burn rate: consuming budget too fast blocks risky releases
Feature flags	Disable non-critical experiments (e.g., new ranker) when matching SLO burns
Tiered response	Green: normal deploys. Yellow: freeze features; scale matching. Red: degrade AP paths (coarser map) before CP paths (never “best effort” payments)

Warning

Never promise 100% for distributed systems except where business definition allows (e.g., “100% of trips eventually reach terminal state or explicit reconciliation case”).

Database Schema¶

Schemas below are illustrative for interviews: normalize users, keep trips as the core aggregate, isolate high-churn driver locations, and treat payments as append-only events with idempotency.

`users` (riders and drivers)¶

Column	Type	Notes
`user_id`	`UUID` (PK)	Stable identity
`role`	`ENUM('rider','driver','both')`	App may use separate profiles
`email` / `phone`	`TEXT`, hashed or tokenized	PII minimization
`status`	`ENUM('active','suspended',…)`	Fraud / compliance
`created_at`	`TIMESTAMPTZ`	Audit

driver_profiles (1:1 when role includes driver)

Column	Type	Notes
`driver_id`	`UUID` (PK, FK → `users`)
`vehicle_class`	`TEXT`	e.g., `ECONOMY`, `XL`
`license_region`	`TEXT`	Regulatory
`rating_avg`	`NUMERIC(3,2)`	Denormalized cache; source of truth can be aggregates

`rides` / `trips`¶

Column	Type	Notes
`trip_id`	`UUID` (PK)	Public id in APIs
`rider_id`	`UUID` (FK → `users`)
`driver_id`	`UUID` (FK → `users`, nullable until matched)
`pickup_lat`, `pickup_lng`	`DOUBLE PRECISION`	Or `GEOGRAPHY(POINT)` in PostGIS
`dropoff_lat`, `dropoff_lng`	`DOUBLE PRECISION`	Nullable if unknown at request
`status`	`ENUM('REQUESTED','MATCHED',…)`	Align with state machine
`fare_estimate_cents` / `fare_final_cents`	`BIGINT`	Integer money
`currency`	`CHAR(3)`	ISO 4217
`idempotency_key`	`TEXT` UNIQUE	Rider retries
`requested_at`, `matched_at`, `started_at`, `completed_at`	`TIMESTAMPTZ`	Timeline + SLO measurement
`version`	`BIGINT`	Optimistic locking for transitions

Indexes: (rider_id, requested_at DESC), (driver_id, status), (status, requested_at) for ops dashboards.

`driver_locations` (hot path)¶

Column	Type	Notes
`driver_id`	`UUID` (PK)	One row per online driver, or partition by time if historified
`lat`, `lng`	`DOUBLE PRECISION`
`geohash` / `s2_cell_id`	`TEXT` / `BIGINT`	Accelerate cell queries if not only in Redis
`heading_deg`	`SMALLINT`	Optional
`updated_at`	`TIMESTAMPTZ`	Freshness SLI
`trip_id`	`UUID` NULL	When on active trip

At scale, Redis GEO holds the same fields with TTL; OLTP may store only snapshots or nothing if stream-only.

`payments`¶

Column	Type	Notes
`payment_id`	`UUID` (PK)
`trip_id`	`UUID` (FK → `rides`)
`payer_user_id`	`UUID`	Rider
`amount_cents`	`BIGINT`
`currency`	`CHAR(3)`
`status`	`ENUM('AUTHORIZED','CAPTURED','FAILED','REFUNDED')`
`psp_reference`	`TEXT`	Opaque token from PSP
`idempotency_key`	`TEXT` UNIQUE	Mandatory for captures
`created_at`, `settled_at`	`TIMESTAMPTZ`	Reconciliation

Tip

In a deep dive, add trip_events (append-only) for audit and outbox rows for reliable downstream notifications—interviewers reward event + transaction pairing.

Step 2: Back-of-the-Envelope Estimation¶

Assumptions (illustrative metro)¶

- Metro DAU (riders + drivers): 2 million
- Completed trips per day: 8 million
- Average trip duration: 20 minutes active driving
- Peak factor: 2× average QPS during rush hour
- Location updates while on trip: 1 per 3 seconds per active driver

Trip Creations per Second¶

8,000,000 trips / 86,400 s ≈ 93 trips/s average
Peak ≈ 186 trips/s (2×)

Location Update Rate¶

Suppose 80,000 concurrent active trips city-wide at peak (illustrative).
Updates per second ≈ 80,000 / 3 ≈ 27,000 loc/s (plus idle driver heartbeats).

Note

Writes to location pipelines often exceed trip creation QPS by an order of magnitude. Design the hot path for ingestion and aggregation, not just CRUD.

Storage (orders of magnitude)¶

Data	Illustrative daily volume	Notes
Trip records	8M rows/day	Indexed by rider, driver, time
Location samples	Billions of points/day	Often short TTL in hot store; archive to object storage
Payment ledger	8M+ events	Immutable append log + OLTP

Bandwidth¶

If each location update is ~100 bytes on the wire after compression:

27,000 × 100 B ≈ 2.7 MB/s (location alone, one region)

Fan-out to maps, ETA services, and regional aggregators adds egress cost; edge aggregation helps.

Step 3: High-Level Design¶

Architecture Overview¶

flowchart TB subgraph Clients["Clients"] R[Rider app] D[Driver app] end subgraph Edge["Edge / API"] GW[API Gateway + Auth] WS[WebSocket gateway] end subgraph Core["Core services"] TS[Trip service] MS[Matching service] LS[Location ingestion] ETA[ETA / routing] PR[Pricing / surge] PAY[Payments] NS[Notification] end subgraph Data["Data stores"] PG[(PostgreSQL / CockroachDB)] RD[(Redis)] MQ[[Kafka / Pulsar]] GS[("Geo index: Redis GEO / S2 / custom")] end R --> GW D --> GW R --> WS D --> WS GW --> TS GW --> MS WS --> LS LS --> MQ MQ --> MS MS --> GS MS --> ETA TS --> PG MS --> RD PR --> RD PAY --> PG TS --> NS

Responsibilities - Trip service: source of truth for trip state; emits domain events - Matching service: candidate search, ranking, offer timeouts, reassignment - Location ingestion: validate, throttle, publish to streams; update geo indexes - ETA service: road-network routing + traffic overlays; caches frequent polylines - Pricing: base fare, distance/time, surge multipliers with caps - Payments: PSP integration; idempotent captures; payout scheduler

Tip

Keep trip state transitions in a small number of services with clear ownership. Avoid “matching updates trip directly” and “trip calls matching in a circle” without a defined orchestration pattern (saga, outbox).

Geospatial Grid (conceptual)¶

flowchart LR subgraph City["City partitioned into cells"] C1[Cell A1] C2[Cell A2] C3[Cell B1] C4[Cell B2] end R[Rider request] --> C2 C2 --> Q[Query drivers in cell + ring of neighbors] Q --> M[Rank + offer]

Cells may be geohash prefixes, S2 cells, or quadtree buckets backed by the same matching API.

Step 4: Deep Dive¶

4.1 Location Tracking and Updates¶

Goals - Fresh enough for map UX and ETA refinement - Battery-friendly on mobile (adaptive sampling) - Abuse-resistant (rate limits, plausibility checks)

Pipeline 1. Mobile batches points or sends on a timer (e.g., every 2–5 s on trip) 2. API validates JWT, trip/driver binding, and basic sanity (speed caps) 3. Ingestion writes to Kafka topic driver.locations partitioned by driver_id 4. Consumers update Redis GEO (or equivalent) for live queries 5. Downsampler writes trip trace to object storage for disputes (PII policy)

WebSocket vs polling - WebSocket (or HTTP/2 streams) is standard for streaming updates to the rider during dispatch and trip - Polling is simpler but wastes battery and increases tail latency

Note

Treat high-frequency location as a stream processing problem: backpressure, poison messages, and consumer lag dashboards are production requirements.

4.2 Geospatial Indexing (Geohash / S2 / QuadTree)¶

Approach	Idea	Pros	Cons
Geohash	Hierarchical string encoding of lat/lng into base-32 cells	Simple prefix queries in KV stores; easy bucketing	Edge distortion; pole/corner neighbors need neighbor list not just prefix
Quadtree	Recursive subdivision of space	Adaptive density; good for variable driver density	More complex updates; hot leaves under stadium events
S2 (Google)	Hilbert curve on a sphere; hierarchical cells	Consistent sphere geometry; stable neighbor ops	Learning curve; need library support

When interviewers push “which one?” - At large scale, teams often use S2 or geohash + neighbor expansion with Redis GEO (geohash under the hood) or a dedicated spatial service. - Quadtree shines when you want adaptive refinement and can pay for more complex rebalancing.

Comparison summary

Criterion	Geohash	Quadtree	S2
Spherical correctness	Approximate	Planar projection issues unless projected	Strong
Neighbor queries	Prefix + neighbor hashes	Tree traversal	Fixed-level cell union
Implementation effort	Low	Medium–high	Medium (library)

4.3 Driver-Rider Matching Algorithm¶

Problem shape - Find K nearest available drivers subject to: - ETA to pickup (not Euclidean distance) - Heading / road access heuristics - Fairness (avoid starving distant drivers entirely) - Compliance (licensed vehicle class)

Typical stages 1. Coarse filter: drivers in cells overlapping pickup plus ring expansion until minimum count or max radius 2. Rank: score w1 * eta_pickup + w2 * idle_time + w3 * acceptance_rate + penalties 3. Offer: send push/WebSocket offer; timeout (e.g., 10–20 s) 4. Fallback: next candidate; escalate surge or broaden radius

sequenceDiagram participant R as Rider participant T as Trip service participant M as Matching participant G as Geo index participant D as Driver R->>T: POST /trips T->>M: match_request(pickup, constraints) M->>G: nearby_drivers(cell, k) G-->>M: candidate list M->>M: rank by ETA + rules M->>D: push offer (trip_id, ttl) D-->>M: accept / timeout M->>T: assign(driver_id)

Warning

Nearest in straight-line distance is a common trap. Interviewers reward mentioning road-network ETA and one-way systems.

4.4 ETA Computation¶

Inputs - Map graph (nodes/edges), speed profiles by hour-of-week, live traffic feeds (probe speeds, closures) - Live driver position for remaining route refresh

Architecture sketch - Precomputed distances between region hubs for coarse estimates - Routing engine (e.g., OSRM-like contraction hierarchies) for accurate paths on demand - Cache ETAs by (origin_cell, dest_cell, time_bucket) with short TTL

Failure modes - Stale traffic after incidents - GPS drift in urban canyons - Detours; communicate confidence (“7–11 min”) not false precision

4.5 Trip Lifecycle Management¶

States (illustrative)

State	Meaning
`REQUESTED`	Rider created trip; matching not finished
`MATCHED`	Driver assigned; not yet arrived
`DRIVER_ARRIVED`	Driver at pickup zone
`IN_PROGRESS`	Trip started (meter on)
`COMPLETED`	Trip ended normally
`CANCELLED`	Cancelled by rider/driver/system with reason code

stateDiagram-v2 [*] --> REQUESTED REQUESTED --> MATCHED : driver assigned REQUESTED --> CANCELLED : no driver / rider cancel MATCHED --> DRIVER_ARRIVED : driver at pickup MATCHED --> CANCELLED : timeout / cancel DRIVER_ARRIVED --> IN_PROGRESS : trip start IN_PROGRESS --> COMPLETED : trip end IN_PROGRESS --> CANCELLED : support / fraud (rare) COMPLETED --> [*] CANCELLED --> [*]

Implementation notes - Persist transitions with monotonic version or state + epoch - All mutating endpoints accept Idempotency-Key - Emit outbox events for notifications, analytics, and payment triggers

4.6 Surge Pricing¶

Intent - Balance supply and demand when requests exceed available drivers in a micro-region

Common model

price = (base + per_minute * T + per_mile * D) * surge_multiplier

Surge multiplier derived from request rate, available driver count, ETA inflation, sometimes ML forecast
Apply caps and transparency (map shading); store multiplier history for support

Concern	Mitigation
User backlash	Cap multiplier; show upfront estimate
Driver gaming	Detect coordinated cancels; anomaly on GPS
Split-brain pricing	Freeze multiplier at trip request time (versioned quote)

4.7 Payment Integration¶

Flows - Authorize at request or match (hold) - Capture on completion (or incremental for long trips if product requires) - Platform fee and driver payout via scheduled transfers or instant pay products

Requirements - PCI scope minimization (use PSP tokens; never store PAN) - Idempotent captures and refunds - Ledger + OLTP: double-entry style for reconciliation

Step 5: Scaling & Production¶

Area	Technique
Geo hot spots	Partition indexes by city/region; avoid global locks
Location writes	Partition Kafka by driver; batch updates to Redis
Matching	Precompute coarse candidates; cache map tiles and ETA
Trips DB	Shard by `trip_id` or `city_id`; avoid cross-shard transactions for happy path
Observability	Metrics: match latency, offer acceptance, cancel reasons, ETA error
Resilience	Timeouts everywhere; dead-letter queues; chaos on regional failover

Disaster scenarios to mention - Data center failure: multi-AZ; Kafka mirroring; read-only degradation modes - Payment provider outage: queue captures; manual reconciliation playbooks

Interview Tips¶

Start with requirements and non-goals; define trip state early
Draw geo index + matching pipeline before micro-optimizing storage
Separate location streaming from transactional trip data
Always mention idempotency for payments and event ordering for analytics
Close with trade-offs: strong consistency vs availability for non-payment paths

Interview Checklist¶

Phase	You should be able to
Requirements	List functional and non-functional requirements; ask about vehicle classes and regions
Estimation	Estimate trip QPS vs location QPS; storage for trips vs traces
High-level	APIs, WebSockets, trip service, matching, geo index, ETA, pricing, payments
Deep dive	Geohash vs S2 vs quadtree; matching stages; ETA inputs; surge fairness
Production	Sharding, Kafka lag, Redis hot keys, payment idempotency, monitoring
Closing	Trade-offs, failure modes, and how you would validate in shadow traffic

Sample Interview Dialogue¶

Interviewer: “Design Uber.”

You: “I will assume ride-hailing in large cities: riders request pickup/drop-off, we match an available driver, stream locations, complete the trip, and charge through a PSP. Out of scope: freight and micromobility unless you want them.”

Interviewer: “Focus on matching and real-time location.”

You: “Matching is not pure nearest-neighbor in Euclidean space. I would bucket drivers in a spatial index—geohash prefixes, S2 cells, or a quadtree—query an expanding set of cells around the pickup, then rank by ETA from the road network, idle time, and constraints. I would offer to one driver at a time with a timeout, then move to the next candidate.”

Interviewer: “How do drivers update location?”

You: “Mobile sends periodic points over HTTPS or streams via WebSocket during active trips. Ingestion validates and publishes to Kafka partitioned by driver_id. Consumers update Redis GEO for fast GEORADIUS queries and downsample traces to cheaper storage for disputes.”

Interviewer: “How do you handle surge?”

You: “Compute a multiplier per micro-region from demand/supply imbalance, with caps and a frozen quote at request time so riders are not surprised. Audit multiplier history for support.”

Summary¶

Ride sharing combines high-write location streams, geospatial indexing (geohash, quadtree, or S2), constrained matching with road-network ETA, a clear trip state machine, surge pricing with transparency, and payment correctness via idempotent APIs and ledgers. In interviews, emphasize separation of concerns between real-time movement data and transactional trip state, and show how you would scale the hot path in dense cities without collapsing shared infrastructure.

Appendix: Reference Code Snippets¶

Geohash — encode and neighbor ring¶

PythonJava

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode_geohash(lat: float, lon: float, precision: int = 9) -> str:
    """Simple geohash encoder (interleaved bit string), for interview illustration."""
    lat_interval = (-90.0, 90.0)
    lon_interval = (-180.0, 180.0)
    bits = []
    even = True
    while len(bits) < precision * 5:
        if even:
            mid = sum(lon_interval) / 2
            if lon >= mid:
                bits.append(1)
                lon_interval = (mid, lon_interval[1])
            else:
                bits.append(0)
                lon_interval = (lon_interval[0], mid)
        else:
            mid = sum(lat_interval) / 2
            if lat >= mid:
                bits.append(1)
                lat_interval = (mid, lat_interval[1])
            else:
                bits.append(0)
                lat_interval = (lat_interval[0], mid)
        even = not even
    out = []
    for i in range(0, len(bits), 5):
        val = 0
        for b in bits[i : i + 5]:
            val = (val << 1) | b
        out.append(BASE32[val])
    return "".join(out[:precision])

import java.util.LinkedHashSet;
import java.util.Set;

/** Interview-style stub: use a library in production (e.g., ch.hsr geohash). */
public final class GeoSearchRing {
    private GeoSearchRing() {}

    public static Set<String> ring(String centerGeohash, int neighborDepth) {
        Set<String> cells = new LinkedHashSet<>();
        cells.add(centerGeohash);
        // Pseudocode: for each prefix level, add 8 neighbors from library
        // Real impl: geohash.getAdjacent() for N, S, E, W, NE, NW, SE, SW
        for (int i = 0; i < neighborDepth; i++) {
            // expand set with neighbors of all current cells
        }
        return cells;
    }
}

Location update handler (Go)¶

package location

import (
    "context"
    "encoding/json"
    "errors"
    "time"
)

var ErrInvalidCoords = errors.New("invalid coordinates")

type Update struct {
    DriverID string
    TripID   string
    Lat      float64
    Lng      float64
    Seq      int64
    TS       time.Time
}

type Publisher interface {
    Publish(ctx context.Context, topic string, key []byte, value []byte) error
}

func HandleLocation(ctx context.Context, p Publisher, u Update) error {
    if u.Lat < -90 || u.Lat > 90 || u.Lng < -180 || u.Lng > 180 {
        return ErrInvalidCoords
    }
    payload, err := json.Marshal(u)
    if err != nil {
        return err
    }
    return p.Publish(ctx, "driver.locations", []byte(u.DriverID), payload)
}

Matching score (Python) — rank candidates¶

def rank_score(eta_sec: float, idle_sec: float, accept: float, weights: tuple[float, float, float]) -> float:
    w_eta, w_idle, w_accept = weights
    # Lower is better; invert acceptance which is "higher is better"
    return w_eta * eta_sec + w_idle * idle_sec + w_accept * (1.0 - accept)

Redis GEO query (illustrative)¶

GEOADD drivers:online <lng> <lat> <driver_id>
GEORADIUS drivers:online <lng> <lat> 3 km WITHDIST ASC COUNT 20

Appendix: Mermaid — Matching Flow (end-to-end)¶

flowchart TD A[Rider requests trip] --> B[Persist REQUESTED] B --> C[Geo query candidates] C --> D[Rank by ETA + rules] D --> E{Offer sent?} E -->|timeout| F[Next candidate] F --> D E -->|accept| G[Assign driver MATCHED] G --> H[Stream location to rider] H --> I[Complete trip + capture payment]

Note

Production systems add experiments, shadow traffic, and safety checks between every step; the diagram is a conceptual backbone for discussion.

Ride Sharing (Uber/Lyft)¶

What We're Building¶

Why Ride Sharing Is Hard¶

Comparison to Adjacent Systems¶

Step 1: Requirements¶

Functional Requirements¶

Non-Functional Requirements¶

API Design¶

Technology Selection & Tradeoffs¶

Location indexing¶

Real-time communication (client ↔ edge)¶

Database (trip truth vs hot geo vs write-heavy telemetry)¶

Matching algorithm¶

Message queue: location updates¶

CAP Theorem Analysis¶

SLA and SLO Definitions¶

Service level indicators and example objectives¶

Error budget policy¶

Database Schema¶

users (riders and drivers)¶

rides / trips¶

driver_locations (hot path)¶

payments¶

Step 2: Back-of-the-Envelope Estimation¶

Assumptions (illustrative metro)¶

Trip Creations per Second¶

Location Update Rate¶

Storage (orders of magnitude)¶

Bandwidth¶

Step 3: High-Level Design¶

Architecture Overview¶

Geospatial Grid (conceptual)¶

Step 4: Deep Dive¶

4.1 Location Tracking and Updates¶

4.2 Geospatial Indexing (Geohash / S2 / QuadTree)¶

4.3 Driver-Rider Matching Algorithm¶

4.4 ETA Computation¶

4.5 Trip Lifecycle Management¶

4.6 Surge Pricing¶

4.7 Payment Integration¶

Step 5: Scaling & Production¶

Interview Tips¶

Interview Checklist¶

Sample Interview Dialogue¶

Summary¶

Appendix: Reference Code Snippets¶

Geohash — encode and neighbor ring¶

Location update handler (Go)¶

Matching score (Python) — rank candidates¶

Redis GEO query (illustrative)¶

Appendix: Mermaid — Matching Flow (end-to-end)¶

`users` (riders and drivers)¶

`rides` / `trips`¶

`driver_locations` (hot path)¶

`payments`¶