API Gateway¶
What We're Building¶
An API gateway is a reverse proxy and policy enforcement point that sits in front of microservices: it terminates TLS, authenticates callers, authorizes routes, shapes traffic (rate limits, circuit breaking), routes to the right upstream (with load balancing and canary splits), optionally transforms requests/responses, caches safe responses, and emits unified observability (logs, metrics, traces). It is the single front door for external and internal API consumers.
Why it exists: without a gateway, every service reinvents TLS, auth, quotas, and client-facing versioning. The gateway centralizes cross-cutting concerns so application teams focus on business logic while platform/SRE teams own SLOs for ingress, security baselines, and traffic management. At scale, the gateway is often deployed per region with anycast or DNS steering, backed by shared control plane stores (etcd/Consul) and stateful edge data in Redis.
Production shape: data plane instances (Envoy, NGINX, Kong, or custom Go/Rust) execute hot-path logic with sub-millisecond overhead targets; control plane services publish route tables, rate-limit policies, and certificates with hot reload (xDS, file watch, or config API). PostgreSQL holds durable audit and API key metadata; Redis holds token buckets and response cache entries; OpenTelemetry exports traces to the same backends as services for correlation.
| Capability | Why it matters |
|---|---|
| Routing | Path/header/weighted rules map clients to the correct upstream version and region |
| Rate limiting | Protects upstreams and enforces commercial tiers (per key, IP, global) |
| AuthN/Z | API keys, JWT validation, OAuth introspection—before traffic hits services |
| Transformation | Header injection, body rewrite, REST↔gRPC bridging at the edge |
| Load balancing & resilience | RR, least-conn, consistent hash; circuit breaker prevents cascading failure |
| Composition | Aggregate multiple backend calls into one client-facing API |
| Caching | TTL-based response cache with invalidation hooks |
| Observability | Structured logs, RED metrics, distributed traces (OTel) |
| Extensibility | WASM/Lua/plugins—policy without recompiling the binary |
| Ops | Hot config reload, service discovery, mTLS, canary/blue-green |
Note
In interviews, separate data plane (low latency, stateless logic + fast I/O) from control plane (durable config, rollout, audit). Confusing them is a common trap.
Reference systems (interview vocabulary)¶
| System | Positioning | Typical talking point |
|---|---|---|
| Envoy | L7 proxy, xDS control plane, WASM filters | Rate limit service sidecar; ext_authz for auth |
| Kong | Postgres/Lua or DB-less; rich plugins | Admin API; declarative config for GitOps |
| NGINX / OpenResty | High-performance reverse proxy | Lua for custom logic; less dynamic than xDS |
| AWS API Gateway / Azure APIM | Managed edge + integrations | Vendor IAM; latency vs self-hosted tradeoff |
| Traefik / HAProxy | Ingress / LB with dynamic config | Simpler than Envoy; fewer L7 extension points |
Questions to ask (before you draw boxes)¶
| Question | Why it matters |
|---|---|
| External vs internal clients? | Public internet drives WAF, DDoS, TLS ciphers |
| Synchronous only or WebSockets/SSE? | Long-lived connections change LB and deploy strategy |
| Who owns API keys—us or IdP? | Drives Postgres vs OAuth-only |
| Multi-region active-active? | Global rate limits and routing consistency |
| Compliance (PCI, HIPAA)? | mTLS, audit granularity, data residency |
1. Requirements Clarification¶
Functional Requirements¶
| Requirement | Priority | Description |
|---|---|---|
| Path-based routing | Must have | Match URI prefixes/patterns to upstream clusters |
| Header-based routing | Must have | Route by X-Tenant, Accept, mobile vs web, A/B flags |
| Weighted routing | Must have | Canary (% to new version), blue-green switch |
| Rate limiting | Must have | Token bucket per API key, per client IP, global ceiling |
| API key auth | Must have | Validate key, attach identity to request context |
| JWT validation | Must have | Signature, aud/iss, exp; optional JWKS rotation |
| OAuth 2.0 introspection | Should have | Opaque tokens via RFC 7662-style introspection endpoint |
| Header/body transform | Should have | Inject tracing headers, strip secrets, JSON patch |
| REST↔gRPC translation | Should have | gRPC-Web or JSON to gRPC for browser/legacy clients |
| Load balancing | Must have | RR, least-connections, consistent hash (session affinity) |
| Circuit breaker | Must have | Open on failures; half-open probe; configurable thresholds |
| Request aggregation | Nice to have | Fan-out to multiple services; combine JSON |
| Response caching | Should have | Cacheable GET with TTL; surrogate keys / purge API |
| Logs/metrics/traces | Must have | OTel; correlation IDs; no PII in logs |
| Plugin architecture | Should have | WASM/Lua/Go plugins for custom policy |
| Hot reload | Must have | Routes/limits/certs without process restart |
| Service discovery | Must have | Consul, etcd, Kubernetes Endpoints |
| TLS & mTLS | Must have | Edge TLS; optional client cert to gateway |
| WebSocket / SSE | Should have | Long-lived upgrade proxying |
| Canary / blue-green | Should have | Traffic split and instant rollback |
Non-Functional Requirements¶
| Requirement | Target | Rationale |
|---|---|---|
| Throughput | ~100K RPS per node (illustrative; workload-dependent) | Hot path must be I/O and syscall efficient |
| Routing overhead | Sub-millisecond p99 added latency vs direct upstream | Users pay gateway tax on every request |
| Availability | 99.99% regional gateway tier | Ingress is on the critical path |
| Consistency (config) | Seconds to propagate route changes globally | Balance freshness vs blast radius |
| Security | TLS 1.2+, strong ciphers; secrets in KMS/HSM | Edge is the attack surface |
| Auditability | Append-only admin actions in PostgreSQL | Compliance and incident forensics |
Out of Scope¶
- Authoritative user directory / IdP — gateway integrates with Okta/Auth0/Keycloak; it does not replace them.
- Business logic — orchestration belongs in BFF or domain services; gateway stays thin unless using composition as an explicit product feature.
- Full API lifecycle (design-first codegen, developer portal billing) — often a separate developer platform; gateway may expose metadata but not own the portal UX.
- Payload inspection beyond policy — deep ML content scanning is typically async pipelines, not synchronous gateway path (unless specialized product).
2. Back-of-Envelope Estimation¶
Assume each gateway node targets 100K RPS sustained for sizing discussions; multiply by N active nodes behind the LB, running at ~70% of peak utilization to leave headroom.
Traffic & concurrency¶
| Quantity | Formula / assumption | Result |
|---|---|---|
| Peak RPS per node | Design target | 100,000 |
| Average request size | JSON APIs | 1 KB ingress |
| Average response size | JSON APIs | 2 KB egress |
| Ingress bandwidth | 100K × 1 KB | ~100 MB/s (~800 Mbps) |
| Egress bandwidth | 100K × 2 KB | ~200 MB/s (~1.6 Gbps) |
| Concurrent connections | Assume 50ms mean service time, open keep-alive | 100K × 0.05 ≈ 5,000 (order-of-magnitude) |
Tip
Real numbers depend on TLS, compression, HTTP/2 multiplexing, and payload size. Use these as interview Fermi checks, not purchase orders.
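The table's arithmetic is easy to sanity-check in a few lines. A minimal Python Fermi check using the illustrative numbers above (decimal units; concurrency via Little's law):

```python
# Fermi check of the sizing table (decimal units; illustrative numbers).
rps = 100_000             # peak requests/sec per node
req_kb, resp_kb = 1, 2    # average payload sizes in KB

ingress_mb_s = rps * req_kb / 1000        # 100 MB/s
egress_mb_s = rps * resp_kb / 1000        # 200 MB/s
ingress_mbps = ingress_mb_s * 8           # 800 Mbps

# Little's law: in-flight requests = arrival rate x mean time in system.
mean_service_s = 0.050
concurrent = rps * mean_service_s         # 5,000 concurrent requests

print(ingress_mb_s, egress_mb_s, ingress_mbps, concurrent)
```

Changing any single assumption (payload size, service time) scales the result linearly, which is exactly why these are Fermi checks rather than capacity plans.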
Storage (PostgreSQL — audit & config)¶
| Store | Daily volume | Retention | Notes |
|---|---|---|---|
| Audit log | 100M requests/day × 500 bytes/row ≈ 50 GB/day | 90 days hot, then archive | Index by ts, actor, route_id |
| API keys metadata | Low churn | Years | Small table; secrets are hashed |
| Route config history | Versioned rows on change | 1 year | Rollback support |
Redis (rate limits + cache)¶
| Use | Key cardinality | Memory hint |
|---|---|---|
| Token bucket per API key | 1M active keys × ~100 B | ~100 MB + overhead |
| Per-IP limit | 100K hot IPs | ~10–20 MB |
| Response cache | Popular GETs; TTL 60s | GB–TB (product choice) |
Bandwidth & cross-AZ¶
| Path | Cost driver |
|---|---|
| Client → gateway | Internet / CDN; DDoS protection |
| Gateway → upstream | Often same region; cross-AZ doubles bill—topology-aware LB helps |
CPU, memory & file descriptors (order-of-magnitude)¶
| Resource | Rule of thumb | At ~100K RPS |
|---|---|---|
| CPU | TLS + JSON parsing dominate; HTTP/2 multiplexing amortizes handshakes | 16–32 vCPU class machines (workload-dependent) |
| Memory | Connection buffers, TLS session cache, route tables | 32–64 GB if large body buffering |
| FDs | ulimit -n; one FD per upstream + client connection | Tune to 1M+ on Linux with sysctl |
Warning
100K RPS per node is a design target, not a guarantee—payload size, TLS resumption rate, and plugin CPU can move this by 10×.
3. High-Level Design¶
System Architecture¶
+------------------+
| Global DNS / |
| Geo / Anycast |
+--------+---------+
|
+-----------------------+-----------------------+
| | |
+------v------+ +------v------+ +------v------+
| Region A | | Region B | | Region C |
| Gateway LB | | Gateway LB | | Gateway LB |
+------+------+ +------+------+ +------+------+
| | |
+------v------+ +------v------+ +------v------+
| Gateway pool| | Gateway pool| | Gateway pool|
| (Envoy/Kong)| | (Envoy/Kong)| | (Envoy/Kong)|
+---+---+-----+ +---+---+-----+ +---+---+-----+
| | | | | |
| | +-------------------+ | | |
| | | | | |
v v v v v v
+---Redis Cluster (rate limits, cache)------------------------+
| shard by hash(api_key) / hash(route) |
+----------------------------------------------------------------+
|
v
+---PostgreSQL (audit, keys, route metadata)---+
+---etcd / Consul (runtime discovery, optional config watch)---+
|
v
+---------------- Upstream microservices ----------------+
| Orders │ Payments │ Users │ gRPC internal services |
+--------------------------------------------------------+
Component Overview¶
| Component | Role |
|---|---|
| Edge LB | L4/L7 health checks; optional TLS offload as the first hop |
| Gateway data plane | Policy chain: TLS → auth → route → limit → transform → proxy |
| Redis | Token buckets, sliding windows, response cache |
| PostgreSQL | API keys, audit trail, route config snapshots |
| etcd / Consul | Service catalog, KV for dynamic upstreams, config watches |
| Control plane API | Admin REST for routes, limits, keys; publishes to data plane |
| Secret store | KMS/Vault for signing keys, TLS certs |
| OTel collector | Metrics, logs, traces export |
Core Flow¶
- Client connects with TLS; gateway terminates (or passes through to upstream per policy).
- Router matches host + path + method + headers; resolves upstream cluster and traffic weight (canary).
- Authentication extracts API key or bearer token; validates JWT locally or calls introspection.
- Authorization checks scopes/roles against route requirements.
- Rate limiter runs token bucket in Redis (Lua atomicity); may enforce global counter.
- Optional cache lookup for idempotent GET with cache policy.
- Load balancer picks upstream (RR / least-conn / consistent hash on user id).
- Circuit breaker gates unhealthy origins; may short-circuit with 503.
- Transform adjusts headers/body; gRPC bridge if needed.
- Proxy streams request/response; WebSocket/SSE upgraded connections pass through.
- Observability emits span (OTel), metrics (latency, codes), audit row (async).
- Response returns to client with Retry-After on 429 if limited.
End-to-end sequence (happy path)¶
Canary routing (deterministic)¶
4. Detailed Component Design¶
4.1 Routing & Traffic Management¶
Path-based routing uses longest-prefix or regex matchers (e.g. /v2/orders/* → orders-v2 cluster). Header-based rules enable tenant isolation: X-Route-Shard: eu → EU pool.
Weighted routing assigns percentages: stable: 95%, canary: 5%. Implementations typically hash a stable key (user id, or request id for per-request splits) into [0,100) and compare against the weight, or use Envoy runtime fractions / a split client with sticky debug headers.
Blue-green: flip weights 0/100 atomically after health checks; canary is gradual. Store weights in control plane; push via xDS, file snapshot, or Redis pub/sub for custom stacks.
Request
|
v
+--------+ match host/path/header
| Router |---> route table (versioned)
+--------+
|
+---> cluster: orders-prod
|
+---> subset: version=canary (5%)
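The router's longest-prefix matching can be sketched in a few lines (route and cluster names are illustrative; real gateways compile routes into a trie/radix tree and match on path segments rather than plain string prefixes):

```python
# Longest-prefix route matching: the most specific prefix wins.
ROUTES = {                        # prefix -> upstream cluster (illustrative)
    "/v1/orders": "orders-v1",
    "/v2/orders": "orders-v2",
    "/v2/orders/export": "orders-export",  # more specific than /v2/orders
    "/": "default-backend",
}

def match_route(path: str) -> str:
    # Sketch simplification: startswith, not segment-aware matching,
    # so "/v2/ordersXYZ" would also match "/v2/orders".
    best = ""
    for prefix in ROUTES:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return ROUTES[best]

print(match_route("/v2/orders/export/csv"))  # orders-export
print(match_route("/v2/orders/123"))         # orders-v2
print(match_route("/healthz"))               # default-backend
```

A radix tree turns this O(routes) scan into O(path length), which matters when route tables reach thousands of entries.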
Warning
Hostile clients can game unseeded random routing—prefer deterministic canary (user id hash) for consistent UX and easier debugging.
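Deterministic bucketing, as the warning recommends, hashes a stable key into [0, 100) so a given user always sees the same variant. A minimal sketch (the salt, cluster names, and 5% threshold are illustrative):

```python
import hashlib

def canary_bucket(user_id: str, salt: str = "orders-canary") -> int:
    """Map a stable key into [0, 100); the salt decorrelates experiments."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

def pick_upstream(user_id: str, canary_percent: int = 5) -> str:
    return "orders-canary" if canary_bucket(user_id) < canary_percent else "orders-stable"

# A user is sticky: the same variant on every request.
assert pick_upstream("user-42") == pick_upstream("user-42")

# Roughly canary_percent of users land in the canary.
share = sum(canary_bucket(f"user-{i}") < 5 for i in range(10_000)) / 10_000
print(f"canary share = {share:.1%}")
```

Raising the percentage only widens the bucket range, so users already in the canary stay in it as rollout progresses.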
Service discovery integration¶
| Backend | Mechanism | Example |
|---|---|---|
| Consul | DNS or HTTP API; long-poll blocking queries | orders.service.consul → healthy instances |
| etcd | Watch prefix /discovery/orders/ | Custom control plane writes instance list |
| Kubernetes | Endpoints / EndpointSlice via API watch | orders.ns.svc.cluster.local |
| Static | Bootstrap list in config | DR drills, small clusters |
Health-aware LB: only passing instances receive traffic; unhealthy drains over grace period to avoid flapping.
4.2 Authentication & Authorization¶
| Mechanism | Behavior | Storage / crypto |
|---|---|---|
| API keys | Authorization: ApiKey <id> or X-API-Key header | Look up by id; compare salted hash in PG or cache; never log the raw key |
| JWT | Validate alg, exp, iss, aud; optional JWKS fetch with cache | Local verification (RS256) preferred for latency |
| OAuth introspection | POST to AS with token; returns active, scope | Extra RTT—cache with a short TTL (5–30s) keyed by token hash |
Authorization maps JWT scope / custom claims to route templates. Example: orders:write required for POST /orders. Use OPA or Cedar-style policy embedded in control plane for complex ABAC.
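A scope-to-route check can be a simple lookup before proxying. A sketch with an illustrative policy table (real deployments often delegate this to OPA/Cedar as noted above):

```python
# Required scopes per (method, path prefix); illustrative policy table.
ROUTE_SCOPES = {
    ("POST", "/orders"): {"orders:write"},
    ("GET", "/orders"): {"orders:read"},
}

def authorize(method: str, path: str, token_scopes: set[str]) -> bool:
    for (m, prefix), required in ROUTE_SCOPES.items():
        if m == method and path.startswith(prefix):
            return required <= token_scopes   # all required scopes present?
    return False  # default deny: unmatched routes are rejected

assert authorize("POST", "/orders", {"orders:write", "orders:read"})
assert not authorize("POST", "/orders", {"orders:read"})      # missing write
assert not authorize("DELETE", "/orders", {"orders:write"})   # no rule -> deny
```

Default deny is the important property: a route with no policy entry should fail closed, not pass through.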
4.3 Rate Limiting (Redis Token Bucket)¶
Per API key, IP, and global limits compose as waterfall: evaluate cheapest filters first (global) to protect Redis.
Token bucket (Lua sketch):
-- KEYS[1] = bucket key, ARGV: capacity, refill_per_sec, now_ms, cost
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local data = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1])
local ts = tonumber(data[2])
if tokens == nil then tokens = capacity; ts = now end
local delta = math.max(0, now - ts) / 1000.0
tokens = math.min(capacity, tokens + delta * refill)
if tokens < cost then
return {0, tokens} -- denied, remaining
end
tokens = tokens - cost
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', key, 3600000)
return {1, tokens}
Global limit: a single Redis key becomes a hotspot—shard into N counters and sum, or merge probabilistically. At extreme scale use approximation, but note Redis HyperLogLog estimates distinct elements, not request counts; prefer GCRA or a sliding window spread across per-slice keys.
4.4 Transformation, gRPC Bridge & Composition¶
Header injection: add X-Request-Id, X-Forwarded-For, W3C traceparent from OTel context.
Body rewrite: JSON merge patch for legacy clients; large bodies increase latency—prefer minimal transforms at gateway.
REST ↔ gRPC: gRPC-Web proxy (Envoy filter) or JSON transcoding via protobuf google.api.http annotations. Double encoding cost: keep payloads small.
Request aggregation (composition): orchestrate parallel upstream calls (async I/O), merge JSON. Risk: partial failure—define per-subresource error policy (fail all vs degrade). Often better in a BFF; gateway composition is for edge-only latency wins.
4.5 Load Balancing, Circuit Breaker & Caching¶
Algorithms:
| Algorithm | When |
|---|---|
| Round-robin | Homogeneous pods, stateless |
| Least connections | Long-lived or variable-cost requests |
| Consistent hashing | Cache affinity on user_id / session |
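Consistent hashing with virtual nodes, as in the table above, can be sketched compactly (node names and the virtual-node count are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing with virtual nodes: adding/removing an upstream
    remaps only ~1/N of keys instead of reshuffling everything."""
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._ring = sorted(
            (self._hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def pick(self, key: str) -> str:
        # First ring point clockwise of the key's hash (wrap with modulo).
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring3 = HashRing(["orders-1", "orders-2", "orders-3"])
ring2 = HashRing(["orders-1", "orders-2"])
assert ring3.pick("user-42") == ring3.pick("user-42")   # stable affinity
moved = sum(ring3.pick(f"user-{i}") != ring2.pick(f"user-{i}") for i in range(300))
print(f"{moved}/300 keys moved after removing orders-3")  # roughly a third
```

The "only ~1/N of keys move" property is what preserves cache affinity during scale-in/scale-out, which round-robin cannot offer.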
Circuit breaker state machine:
+----------------+
| CLOSED | requests pass; failures counted
+-------+--------+
| failure threshold exceeded
v
+-------+--------+
| OPEN | fail fast (503/429 upstream)
+-------+--------+
| after cooldown
v
+-------+--------+
| HALF-OPEN | allow probe traffic
+-------+--------+
success -> CLOSED
failure -> OPEN
Implement with sliding failure rate or consecutive errors; export gauge cb_state{upstream}.
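A consecutive-errors breaker matching the state machine above (a sketch; thresholds, cooldown, and the injectable clock are illustrative choices):

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN on N consecutive failures; OPEN -> HALF_OPEN after
    cooldown; a HALF_OPEN probe success closes, a failure reopens."""
    def __init__(self, failure_threshold=5, cooldown_sec=30.0, clock=time.monotonic):
        self.threshold = failure_threshold
        self.cooldown = cooldown_sec
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.cooldown:
            self.state = "HALF_OPEN"          # let one probe through
        return self.state != "OPEN"

    def record(self, success: bool) -> None:
        if success:
            self.state, self.failures = "CLOSED", 0
        else:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.threshold:
                self.state, self.opened_at = "OPEN", self.clock()

fake_now = [0.0]
cb = CircuitBreaker(failure_threshold=3, cooldown_sec=30, clock=lambda: fake_now[0])
for _ in range(3):
    cb.record(success=False)
assert cb.state == "OPEN" and not cb.allow_request()   # fail fast
fake_now[0] = 31.0
assert cb.allow_request() and cb.state == "HALF_OPEN"  # cooldown elapsed: probe
cb.record(success=True)
assert cb.state == "CLOSED"
```

A sliding failure-rate window behaves better than consecutive errors under mixed traffic, but the state transitions are identical.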
Response caching: key = hash(method, path, normalized query, vary headers). TTL from Cache-Control or route default. Invalidation: admin purge API writing tombstone in Redis; or short TTL for simplicity.
| Cache strategy | Mechanism | Tradeoff |
|---|---|---|
| Fixed TTL | Redis SETEX | Simple; stale reads until expiry |
| Surrogate-Key | Purge tag orders:* on write | Needs publisher hooks from services |
| ETag / If-None-Match | Conditional GET | Extra RTT; great for REST clients |
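Building the cache key from method, path, normalized query, and Vary headers can be sketched as follows (which headers participate is a per-route policy decision; the default `vary` tuple here is an assumption):

```python
import hashlib
from urllib.parse import parse_qsl, urlencode

def cache_key(method: str, path: str, query: str, headers: dict[str, str],
              vary: tuple[str, ...] = ("accept", "accept-encoding")) -> str:
    # Normalize the query so ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    norm_query = urlencode(sorted(parse_qsl(query)))
    # Only Vary-listed headers participate, in a stable order.
    hdrs = "&".join(f"{h}={headers.get(h, '')}" for h in sorted(vary))
    raw = f"{method.upper()}|{path}|{norm_query}|{hdrs}"
    return "cache:" + hashlib.sha256(raw.encode()).hexdigest()[:32]

k1 = cache_key("GET", "/v1/orders", "page=2&size=10", {"accept": "application/json"})
k2 = cache_key("GET", "/v1/orders", "size=10&page=2", {"accept": "application/json"})
assert k1 == k2                       # query order does not fragment the cache
k3 = cache_key("GET", "/v1/orders", "page=2&size=10", {"accept": "text/xml"})
assert k1 != k3                       # Vary: Accept splits entries
```

Skipping query normalization is a common cause of low hit rates: clients that serialize parameters in different orders silently fragment the cache.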
4.6 Observability, Plugins & Hot Reload¶
OpenTelemetry: start span at entry; inject into upstream grpc/http headers. RED metrics: rate, errors, duration by route, upstream.
Structured logs: JSON with trace_id, route_id, no PII—mask Authorization.
Plugin model: Envoy WASM (Rust/AssemblyScript), NGINX Lua, Kong Go plugins, Traefik middleware. Contract: init → request → response hooks with bounded CPU timeouts.
Hot reload: Envoy xDS (ADS) streams config; SIGHUP on NGINX; custom Go watches etcd prefix /gateway/routes/v{ver} and atomic swap of route table pointer (RCU pattern).
4.7 WebSocket, SSE, and long-lived streams¶
| Protocol | Gateway behavior | Pitfalls |
|---|---|---|
| WebSocket | HTTP Upgrade; bidirectional TCP proxy after handshake | Idle timeouts must exceed heartbeat; sticky optional |
| SSE | HTTP/1.1 chunked read; unidirectional server→client | LB buffering can delay events—disable proxy buffering |
| gRPC streaming | HTTP/2 streams end-to-end | Deadline propagation via grpc-timeout |
Timeouts: align gateway read_timeout, upstream service, and client SDK—shortest wins.
4.8 TLS termination, mTLS, and SPIFFE¶
| Mode | Description |
|---|---|
| TLS terminate at gateway | Certificates on gateway; upstream may be plain HTTP in private network (not ideal) or mTLS |
| TLS passthrough | Gateway sees SNI only; L4 routing—no HTTP auth at edge |
| mTLS client→gateway | B2B partners present client cert; map CN/SAN to tenant |
| mTLS gateway→upstream | SPIFFE IDs; workload identity from SPIRE |
4.9 Example configuration snippets¶
Envoy (fragment — rate limit + ext_authz): production YAML is verbose; this captures intent:
http_filters:
- name: envoy.filters.http.ext_authz
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
grpc_service:
envoy_grpc:
cluster_name: ext_authz_cluster
transport_api_version: V3
- name: envoy.filters.http.ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
domain: api_gateway
rate_limit_service:
grpc_service:
envoy_grpc:
cluster_name: rate_limit_service
etcd watch (Go-style pseudocode) for hot reload:
// routeTable is an unsafe.Pointer to the current *RouteSnapshot; readers
// load it atomically, so swaps never block the hot path (RCU pattern).
watchCh := client.Watch(ctx, "/gateway/routes/", clientv3.WithPrefix())
for wresp := range watchCh {
    if len(wresp.Events) == 0 {
        continue
    }
    // Rebuild once per event batch, not once per event, then swap atomically.
    snap := loadSnapshotFromEtcd(ctx, client)
    atomic.StorePointer(&routeTable, unsafe.Pointer(snap))
}
Kong declarative (sketch):
_format_version: "3.0"
services:
- name: orders
url: http://orders.orders-ns.svc.cluster.local:8080
routes:
- name: orders-api
paths:
- /v1/orders
plugins:
- name: rate-limiting
config:
minute: 100
policy: redis
redis_host: redis.gateway.svc.cluster.local
5. Technology Selection & Tradeoffs¶
Data plane: Envoy vs NGINX vs Kong vs custom Go/Rust¶
| Option | Pros | Cons | Our Pick + Rationale |
|---|---|---|---|
| Envoy | Rich L7 filters, xDS hot reload, gRPC-Web, rate limit service, WASM, strong K8s story | Operational complexity; steep learning curve | Default for cloud-native — filters cover TLS, RL, authz extensibility |
| NGINX / OpenResty | Extremely fast, Lua for glue, huge ops familiarity | Dynamic config less elegant than xDS; extensibility via C/Lua | Good when team is NGINX-native and patterns are stable |
| Kong (OSS/Enterprise) | Postgres-backed config, plugin ecosystem, admin API | Heavier; vendor path for enterprise features | Good when you want batteries-included API management |
| Custom Go/Rust | Full control, minimal deps, embed RL/auth exactly | You own security patches and HTTP/2, HTTP/3, and WebSocket edge cases | Pick at very high scale or special protocols—long-term cost |
Config & discovery store¶
| Option | Pros | Cons | Our Pick |
|---|---|---|---|
| etcd | Strong consistency, watches, K8s native | Ops overhead if not already in stack | Primary when K8s + gateway config co-located |
| Consul | Service mesh + DNS; multi-DC | Another stack if not used | Pick if enterprise already standardized |
| Kubernetes API | No extra CRD infra | API server load at scale | Ingress + Gateway API for k8s-only shops |
Rate limit state¶
| Option | Pros | Cons | Our Pick |
|---|---|---|---|
| Redis | Lua, TTL, sub-ms, mature | Hot key, memory | Default |
| Memcached | Simple | No Lua richness | Edge-only simple counters |
| Local memory | Fastest | Not shared | Layer with Redis for micro-bursts only |
Audit & durable metadata¶
| Option | Pros | Cons | Our Pick |
|---|---|---|---|
| PostgreSQL | ACID, JSONB, audit triggers | Vertical limits | Default for keys + audit |
| DynamoDB | Regional, pay-per-use | Less expressive ad hoc | When AWS-native |
Rate limiting implementation shape¶
| Option | Pros | Cons | Our Pick |
|---|---|---|---|
| Envoy global RL + Redis | Central policy; same math everywhere | Extra hop to rate_limit_service | Large mesh deployments |
| Embedded Lua/WASM in gateway | Lowest hop count | Harder to uniformly update policy | Single-vendor edge |
| Dedicated RL microservice (gRPC) | Independent scaling | Critical path dependency | When RL CPU dominates |
Custom Go/Rust gateway: when it wins¶
| Signal | Meaning |
|---|---|
| Protocol needs not covered by Envoy filters | Proprietary framing, custom binary |
| p99 budget < 100µs proxy overhead | Hand-tuned io_uring / zero-copy |
| Org has platform team to maintain | Security patches, HTTP/3, QUIC |
6. CAP Theorem Analysis¶
The gateway system is not one database—partition by concern:
| Subsystem | CAP stance | Explanation |
|---|---|---|
| Route config (etcd) | CP | Prefer consistent view of routes; brief unavailability better than split-brain routing |
| Redis rate limits | AP leaning | Under partition, over-admission or fail-open—product choice; per-key primary avoids split writes if single region |
| PostgreSQL audit | CP | Strong durability for compliance |
| Response cache | AP | Stale cache acceptable; TTL bounds inconsistency |
Partition scenario (Redis unavailable):
Tip
Say clearly: CAP applies per subsystem. The gateway process favors availability of traffic (often fail open on RL) vs strict quota—that is a business knob, not a theorem misunderstanding.
7. SLA and SLO Definitions¶
SLIs & SLOs¶
| SLI | Measurement | SLO (example) |
|---|---|---|
| Availability | Successful responses (2xx/3xx) / total (excl. client 4xx) | 99.99% monthly |
| Latency | p99 gateway-added latency (upstream subtracted or canary direct) | < 1 ms p99 overhead |
| Error rate | 5xx from gateway / all | < 0.01% |
| Config freshness | Time from publish to 99% nodes applied | < 30 s |
Error budget policy¶
| Budget remaining | Action |
|---|---|
| > 50% | Normal release velocity |
| 25–50% | Freeze risky gateway plugin changes |
| < 25% | Feature freeze; incident review; prioritize reliability work |
| Exhausted | Stop canary increases; root-cause review before new deploys |
Note
99.99% ≈ 4.38 min/month downtime budget—ingress incidents burn budget fast; multi-AZ and graceful degradation are mandatory talking points.
8. Database Schema and Data Model¶
PostgreSQL DDL for route configuration, rate limit rules, API keys, and audit logs.
-- Tenants / environments
CREATE TABLE tenants (
id BIGSERIAL PRIMARY KEY,
external_ref TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Upstream clusters (logical groups of instances)
CREATE TABLE upstream_clusters (
id BIGSERIAL PRIMARY KEY,
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
name TEXT NOT NULL,
discovery_type TEXT NOT NULL CHECK (discovery_type IN ('static','dns','consul','k8s')),
discovery_spec JSONB NOT NULL, -- hosts, SRV name, k8s service ref, etc.
health_check JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (tenant_id, name)
);
-- Routes: matching + destination + flags
CREATE TABLE routes (
id BIGSERIAL PRIMARY KEY,
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
host_pattern TEXT NOT NULL,
path_pattern TEXT NOT NULL,
methods TEXT[] NOT NULL DEFAULT ARRAY['GET','POST','PUT','PATCH','DELETE'],
header_rules JSONB, -- [{"name":"X-Env","op":"eq","value":"canary"}]
upstream_id BIGINT NOT NULL REFERENCES upstream_clusters(id),
weight INT NOT NULL DEFAULT 100 CHECK (weight >= 0 AND weight <= 100),
priority INT NOT NULL DEFAULT 1000,
tls_mode TEXT NOT NULL DEFAULT 'terminate' CHECK (tls_mode IN ('terminate','passthrough')),
mtls_upstream BOOLEAN NOT NULL DEFAULT false,
cache_ttl_sec INT,
enabled BOOLEAN NOT NULL DEFAULT true,
version BIGINT NOT NULL DEFAULT 1,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_routes_tenant ON routes(tenant_id);
-- Rate limit rules (evaluated in order)
CREATE TABLE rate_limit_rules (
id BIGSERIAL PRIMARY KEY,
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
name TEXT NOT NULL,
scope TEXT NOT NULL CHECK (scope IN ('global','api_key','ip','route')),
route_id BIGINT REFERENCES routes(id),
algorithm TEXT NOT NULL DEFAULT 'token_bucket',
limit_per_sec NUMERIC NOT NULL,
burst NUMERIC NOT NULL,
priority INT NOT NULL DEFAULT 100,
enabled BOOLEAN NOT NULL DEFAULT true,
UNIQUE (tenant_id, name)
);
-- API keys (secret stored as hash only)
CREATE TABLE api_keys (
id BIGSERIAL PRIMARY KEY,
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
key_prefix TEXT NOT NULL, -- first 8 chars for lookup
secret_hash BYTEA NOT NULL,
salt BYTEA NOT NULL,
scopes TEXT[] NOT NULL DEFAULT '{}',
status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active','revoked')),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
revoked_at TIMESTAMPTZ
);
CREATE INDEX idx_api_keys_prefix ON api_keys(key_prefix) WHERE status = 'active';
-- Audit log (append-heavy; partition in production)
CREATE TABLE gateway_audit_log (
id BIGSERIAL PRIMARY KEY,
ts TIMESTAMPTZ NOT NULL DEFAULT now(),
tenant_id BIGINT REFERENCES tenants(id),
actor_type TEXT NOT NULL, -- 'admin','system'
actor_id TEXT,
action TEXT NOT NULL,
resource_type TEXT NOT NULL,
resource_id TEXT,
payload JSONB,
ip INET,
trace_id TEXT
);
CREATE INDEX idx_audit_ts ON gateway_audit_log(ts DESC);
CREATE INDEX idx_audit_actor ON gateway_audit_log(actor_id, ts DESC);
-- Published config snapshots (for audit + rollback)
CREATE TABLE route_config_snapshots (
id BIGSERIAL PRIMARY KEY,
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
version BIGINT NOT NULL,
content_hash BYTEA NOT NULL,
payload JSONB NOT NULL,
published_by TEXT,
published_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (tenant_id, version)
);
-- Response cache surrogate keys (optional; for tag-based purge)
CREATE TABLE cache_surrogate_tags (
id BIGSERIAL PRIMARY KEY,
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
tag TEXT NOT NULL,
cache_key_hash BYTEA NOT NULL,
expires_at TIMESTAMPTZ NOT NULL,
UNIQUE (tenant_id, tag, cache_key_hash)
);
CREATE INDEX idx_surrogate_tag ON cache_surrogate_tags(tenant_id, tag);
Warning
Partition gateway_audit_log by month; use async insert from gateway to avoid blocking hot path.
9. API Design¶
Admin API — routes & limits¶
POST /admin/v1/tenants/{tenantId}/routes
Content-Type: application/json
Authorization: Bearer <admin-token>
{
"hostPattern": "api.example.com",
"pathPattern": "/v1/orders/*",
"methods": ["GET", "POST"],
"headerRules": [{ "name": "X-Cell", "op": "eq", "value": "us-east-1a" }],
"upstreamCluster": "orders-prod",
"weight": 95,
"priority": 100,
"cacheTtlSec": null
}
POST /admin/v1/tenants/{tenantId}/rate-limit-rules
Content-Type: application/json
{
"name": "orders-key-bucket",
"scope": "api_key",
"routeId": 42,
"limitPerSec": 50,
"burst": 100
}
API key lifecycle:
POST /admin/v1/tenants/{tenantId}/api-keys
Content-Type: application/json
{
"scopes": ["orders:read", "orders:write"],
"description": "Partner ACME integration"
}
201 Created
Content-Type: application/json
{
"id": "key_01hq...",
"prefix": "gw_live_a",
"secret": "gw_live_a************************xyz",
"scopes": ["orders:read", "orders:write"],
"createdAt": "2026-04-05T12:00:00Z"
}
Warning
Return secret once; persist only hash + prefix for lookup.
Publish route snapshot (triggers xDS / etcd push):
POST /admin/v1/tenants/{tenantId}/config/publish
Content-Type: application/json
{ "comment": "Enable canary 5% on orders" }
Cache purge by surrogate tag:
POST /admin/v1/tenants/{tenantId}/cache/purge
Content-Type: application/json
{ "tags": ["orders:user:42", "catalog:product:99"] }
Gateway proxy behavior (client-facing)¶
| Client request | Gateway behavior |
|---|---|
| GET /v1/orders + valid API key | Route → upstream; emit trace; 200/4xx/5xx |
| Same, over quota | 429 + Retry-After: 3 |
| Invalid JWT | 401 WWW-Authenticate |
| Upstream circuit open | 503 + Retry-After optional |
| WebSocket upgrade | TCP stream proxy; idle timeouts aligned |
Introspection cache (internal):
POST /internal/oauth/introspect
Content-Type: application/x-www-form-urlencoded
token=<opaque>&token_type_hint=access_token
Response cached in Redis SET introspect:<sha256(token)> JSON EX 30.
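The introspection cache can be sketched in-process to show the keying and TTL behavior (illustrative; production uses the Redis `SETEX`-style entry above, and `fetch` here stands in for the real POST to the authorization server):

```python
import hashlib
import time

class IntrospectionCache:
    """Cache AS responses under sha256(token) for a short TTL (5-30s)."""
    def __init__(self, ttl_sec: float = 30.0, clock=time.monotonic):
        self.ttl, self.clock, self._store = ttl_sec, clock, {}

    def get_or_fetch(self, token: str, fetch):
        key = hashlib.sha256(token.encode()).hexdigest()  # never store raw tokens
        hit = self._store.get(key)
        if hit and hit[0] > self.clock():
            return hit[1]
        result = fetch(token)   # real impl: POST to the introspection endpoint
        self._store[key] = (self.clock() + self.ttl, result)
        return result

calls = []
def fake_as(token):             # stand-in for the authorization server
    calls.append(token)
    return {"active": True, "scope": "orders:read"}

now = [0.0]
cache = IntrospectionCache(ttl_sec=30, clock=lambda: now[0])
cache.get_or_fetch("opaque-abc", fake_as)
cache.get_or_fetch("opaque-abc", fake_as)   # served from cache
assert len(calls) == 1
now[0] = 31.0
cache.get_or_fetch("opaque-abc", fake_as)   # TTL expired: re-introspect
assert len(calls) == 2
```

The TTL bounds how long a revoked token keeps working, which is the tradeoff behind the 5–30s range.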
10. Scaling & Production Considerations¶
| Topic | Practice |
|---|---|
| Horizontal scale | Stateless gateway pods + anycast / LB; connection draining on deploy |
| Redis | Cluster mode; hash tags {tenant}:rl:* for locality; replicas for read-heavy cache |
| Hot keys | Shard global limit; local token refill + sync |
| Config rollout | Canary control plane; validate JSON schema before xDS push |
| TLS | Cert auto-renew (ACME); OCSP stapling |
| Multi-region | Global RL is hard—prefer per-region budgets + async reconciliation |
| Testing | Chaos on Redis/etcd; verify fail-open vs fail-closed matches policy |
Load shedding & backpressure¶
| Signal | Action |
|---|---|
| Redis latency p99 > SLO | Shed non-critical routes first (feature flag) |
| Upstream queue depth | 429 with Retry-After beats 504 storm |
| CPU saturation on gateway | Drop expensive transforms; bypass cache write |
Deployment patterns¶
| Pattern | Description |
|---|---|
| Rolling | Replace N% pods; maxUnavailable / maxSurge tuned for connections |
| Blue-green | Two stacks; flip LB weight instantly |
| Canary (data plane) | New Envoy binary on 5% pods; compare 5xx and p99 |
Observability in production¶
| Signal | Alert |
|---|---|
| gateway_requests_total{code=~"5.."} | Page if > SLO error budget burn |
| gateway_upstream_circuit_breaker_state | Warn on open |
| gateway_config_last_applied_timestamp | Stale config > 5 min |
Capacity planning (back-of-envelope)¶
| Goal | Math |
|---|---|
| 1M RPS regional | 1M / 100K = 10 nodes at target; provision N+2 for failure headroom |
| Redis memory | Count distinct RL keys at peak × bytes/key × 1.5 fragmentation |
Tip
Treat gateway as multiplier: a retry storm from clients can 10× upstream load—coordinate retry budgets with 429 semantics.
11. Security, Compliance, and Data Privacy¶
| Concern | Design |
|---|---|
| Secrets | API keys hashed (Argon2/bcrypt); raw keys shown once at creation |
| TLS | TLS 1.2+; HSTS for public APIs |
| mTLS | Optional client certs for B2B; SPIFFE identities mesh-style |
| PII | No full tokens or bodies in logs; mask Authorization |
| Audit | All admin mutations to PostgreSQL; immutable append |
| Compliance | Data residency: route + upstream pinned to region; GDPR delete flows for API key metadata |
| DDoS | Syn flood → LB; app-layer → WAF + rate limits; edge partnership (Cloudflare/Akamai) |
Interview Self-Check¶
| Question | Strong answer hints |
|---|---|
| Where does JWT verify happen? | Gateway for uniform policy; service for fine-grained authz |
| Global rate limit at 10M RPS? | Shard counters; approximation; local + merge tradeoffs |
| Why not cache POST? | Non-idempotent; unless explicit safe extension |
| Circuit breaker vs retry? | Breaker stops hammering; retries amplify load—retry budget |
| etcd vs Redis for routes? | etcd for source of truth; Redis for ephemeral RL/cache |
| gRPC from browser? | gRPC-Web + Envoy; or JSON transcoding |
| How do you debug p99 regressions? | Compare upstream vs gateway span; check RL Redis; TLS session reuse |
| ext_authz latency too high? | Cache decisions; move JWT local; batch introspection |
| WebSocket behind multiple gateways? | IP hash or session stickiness; or single regional anycast |
| Blue-green vs canary? | Blue-green = instant cut; canary = gradual risk reduction |
Staff-level follow-ups¶
| Topic | Depth to show |
|---|---|
| Blast radius | Bad route config → internal validation + staged rollout |
| Multi-tenant fairness | Noisy neighbor: per-tenant RL + separate Redis shards |
| Compliance | Audit who changed which route; tie to corp SSO |
Quick Reference Tables¶
Routing & resilience¶
| Pattern | Use |
|---|---|
| Path + header match | Tenant / cell routing |
| Weighted | Canary |
| Consistent hash | Cache affinity |
| CB closed/open/half-open | Upstream protection |
Technology summary¶
| Layer | Typical choice |
|---|---|
| Data plane | Envoy (+ RL service), Kong, NGINX |
| RL state | Redis + Lua |
| Config / SD | etcd, Consul, K8s API |
| Audit | PostgreSQL |
| Tracing | OpenTelemetry |
HTTP status map (gateway)¶
| Code | Cause |
|---|---|
| 401 | Auth failed |
| 403 | Authz denied |
| 429 | Rate limited |
| 503 | CB open / no healthy upstream |
Redis key patterns (operations cheat sheet)¶
| Key family | Example | Purpose |
|---|---|---|
| Token bucket | {t1}:rl:tb:rule:{id}:sub:{hash} | Per-subject limit |
| Introspection | oauth:introspect:{sha256(token)} | Cached AS response |
| Response cache | cache:{h(method,path,query,vary)} | GET cache entry |
Control plane vs data plane (one-liner)¶
| Plane | Owns |
|---|---|
| Control | Route CRUD, keys, audit, publish pipeline |
| Data | Every request: match, auth, RL, proxy |
Tip
Sound bite: “We treat the gateway as a policy execution engine at the edge: Redis for milliseconds-counts, Postgres for durable identity and audit, etcd for consistent routing intent, and OTel so every 429 and 503 is explainable in a trace.”