How to design systems that scale horizontally instead of just buying a bigger server. Load balancers, multi-layer caching strategies, CDN architecture, and the failure modes that bite at scale.
Lesson outline
Vertical scaling (scale up): buy a bigger server — more CPU, RAM, faster disk. Simple, no code changes. Has a ceiling and a single point of failure. Good for databases and stateful services.
Horizontal scaling (scale out): add more servers behind a load balancer. Theoretically unlimited scale, no single point of failure. Requires your application to be stateless. Good for stateless API servers.
Real systems use both: horizontal scaling for the stateless app tier (easy), vertical scaling for the database (complex to horizontally scale).
Make your app tier stateless first
If your app servers store session state in-process, you cannot horizontally scale — every request must go to the same server. Move all session data to Redis before adding a second app server.
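A minimal sketch of what "move session data out of process" looks like. `KVStore` stands in for a Redis client's get/setex interface; the Map-backed store and the key names here are illustrative only — in production you would use a real client such as ioredis or node-redis.

```typescript
// Every session read/write goes through a shared store keyed by the session
// ID from the cookie, so any app server can handle any request.
interface KVStore {
  get(key: string): Promise<string | null>;
  setex(key: string, ttlSeconds: number, value: string): Promise<void>;
}

// In-memory stand-in for Redis, for illustration only (no TTL enforcement).
class InMemoryStore implements KVStore {
  private data = new Map<string, string>();
  async get(key: string) { return this.data.get(key) ?? null; }
  async setex(key: string, _ttl: number, value: string) { this.data.set(key, value); }
}

const SESSION_TTL = 60 * 60 * 24; // 24 hours

async function loadSession(store: KVStore, sessionId: string): Promise<Record<string, unknown>> {
  const raw = await store.get(`session:${sessionId}`);
  return raw ? JSON.parse(raw) : {};
}

async function saveSession(store: KVStore, sessionId: string, session: Record<string, unknown>) {
  await store.setex(`session:${sessionId}`, SESSION_TTL, JSON.stringify(session));
}
```

Once sessions live behind this interface, adding a second app server is safe: both servers read the same `session:<id>` keys.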
Round robin: Requests go to servers 1, 2, 3, 1, 2, 3... Equal distribution if requests are similar weight. Fails when some requests are 100x heavier.
Least connections: Route to the server with the fewest active connections. Better for variable-weight requests.
IP hash: Same client IP always goes to the same server. Use for sticky sessions. Downside: poor distribution if many users share an IP (corporate NAT).
Layer 4 vs Layer 7: L4 (TCP) is faster — the load balancer does not inspect HTTP. L7 (HTTP) is smarter — can route by URL path, host header, or cookie. Use L7 for microservices.
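The strategies above reduce to small selection functions. This is a sketch decoupled from any real load balancer; the backend names and the hash function are illustrative.

```typescript
interface Backend {
  name: string;
  activeConnections: number;
}

// Round robin: cycle through backends in order.
class RoundRobin {
  private next = 0;
  constructor(private backends: Backend[]) {}
  pick(): Backend {
    const b = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return b;
  }
}

// Least connections: route to the backend with the fewest in-flight requests,
// which adapts automatically when some requests are much slower than others.
function leastConnections(backends: Backend[]): Backend {
  return backends.reduce((best, b) =>
    b.activeConnections < best.activeConnections ? b : best);
}

// IP hash: a stable hash of the client IP always picks the same backend.
function ipHash(ip: string, backends: Backend[]): Backend {
  let h = 0;
  for (let i = 0; i < ip.length; i++) h = (h * 31 + ip.charCodeAt(i)) >>> 0;
  return backends[h % backends.length];
}
```

Note the trade-off visible in the code: round robin keeps per-balancer state (a counter), least connections needs live connection counts, and IP hash needs neither but inherits whatever skew the client IPs have.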
```nginx
# NGINX Layer 7 load balancer

upstream api_servers {
    least_conn;  # better than round_robin for variable-latency requests

    server api-1.internal:3000 weight=2;  # 16 cores — gets 2x traffic
    server api-2.internal:3000 weight=1;  # 8 cores
    server api-3.internal:3000 weight=1;  # 8 cores
    keepalive 32;
}

upstream ws_servers {
    ip_hash;  # WebSocket connections must be sticky — the connection is stateful
    server ws-1.internal:3001;
    server ws-2.internal:3001;
}

server {
    listen 443 ssl http2;

    location /ws/ {
        proxy_pass http://ws_servers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location /api/ {
        proxy_pass http://api_servers;
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
L1: In-process cache (LRU in-memory): Fastest — no network hop. Single server only. Good for: computed configuration, static lookup tables, expensive computed results.
L2: Distributed cache (Redis, Memcached): Shared across all app servers. ~1ms network hop. Good for: user sessions, API response caching, rate limit counters.
CDN: Edge servers worldwide cache static assets close to users. AWS CloudFront, Cloudflare, Fastly. Reduces latency by 50-150ms internationally. Can cache SSG/ISR page responses.
Cache invalidation strategies: TTL (simplest, may serve stale data), Write-through (update cache on every write), Cache-aside (read-through on miss), Event-driven invalidation (pub/sub on data change).
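The L1 and L2 layers above compose into a single read path. This sketch uses a tiny LRU class and a plain Map as stand-ins (a real L2 would be Redis with a TTL); the latency numbers in the comments echo the figures above.

```typescript
// Minimal LRU: a Map iterates in insertion order, so the first key is the
// least recently used once we re-insert entries on every hit.
class LRUCache<V> {
  private map = new Map<string, V>();
  constructor(private maxEntries: number) {}
  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key);       // re-insert to mark as most recently used
      this.map.set(key, value);
    }
    return value;
  }
  set(key: string, value: V) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value!); // evict the LRU entry
    }
  }
}

const l1 = new LRUCache<string>(1000);  // per-process, no network hop
const l2 = new Map<string, string>();   // stand-in for Redis (~1ms in real life)

async function cachedGet(key: string, loadFromDb: () => Promise<string>): Promise<string> {
  const hot = l1.get(key);
  if (hot !== undefined) return hot;    // L1 hit

  const shared = l2.get(key);
  if (shared !== undefined) {
    l1.set(key, shared);                // promote to L1 for next time
    return shared;
  }

  const value = await loadFromDb();     // miss at both layers: hit the database
  l2.set(key, value);
  l1.set(key, value);
  return value;
}
```

Promoting L2 hits into L1 is what makes the hot key set cheap; the catch is that each server's L1 can now go stale independently, which is exactly the invalidation problem the next section is about.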
Cache invalidation is hard
Stale cache is often worse than no cache — users see incorrect data. Design your invalidation strategy before your caching strategy.
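Event-driven invalidation (the fourth strategy listed above) can be sketched like this. An in-process EventEmitter stands in for a real pub/sub channel such as Redis PUBLISH/SUBSCRIBE; the point is the shape: every write publishes the changed key, and every server evicts it from its local cache.

```typescript
import { EventEmitter } from 'node:events';

// Stand-in for a shared pub/sub channel connecting all app servers.
const bus = new EventEmitter();

class AppServer {
  localCache = new Map<string, string>();
  constructor() {
    // Each server subscribes and drops the key from its in-process cache.
    bus.on('invalidate', (key: string) => this.localCache.delete(key));
  }
}

function writeAndInvalidate(key: string, value: string, db: Map<string, string>) {
  db.set(key, value);            // 1. write to the source of truth
  bus.emit('invalidate', key);   // 2. tell every server to drop its stale copy
}
```

With a real broker the `emit` becomes a PUBLISH and delivery is asynchronous, so there is a short window where some servers still serve the old value — acceptable for most data, which is why this is usually combined with a TTL as a backstop.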
```typescript
// Cache-Aside (Lazy Loading) pattern
const CACHE_TTL = 300; // 5 minutes

async function getUserById(userId: string) {
  const cacheKey = `user:${userId}`;

  // 1. Check L2 cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached); // cache HIT

  // 2. Miss → query database
  const user = await db.query.users.findFirst({
    where: eq(users.id, userId),
  });
  if (!user) return null;

  // 3. Populate cache with TTL
  await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(user));
  return user;
}

async function updateUser(userId: string, data: Partial<User>) {
  const updated = await db.update(users)
    .set(data)
    .where(eq(users.id, userId))
    .returning();

  // Always invalidate cache on write — never leave stale data
  await redis.del(`user:${userId}`);
  return updated[0];
}

// Cache stampede: many concurrent misses all hit the DB simultaneously.
// Prevention: take a short-lived distributed lock on cache miss.
async function getUserWithLock(userId: string) {
  const cacheKey = `user:${userId}`;
  const lockKey = `lock:user:${userId}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Try to acquire lock (NX = only if not exists, EX = expire after 5s)
  const lockAcquired = await redis.set(lockKey, '1', 'NX', 'EX', 5);
  if (!lockAcquired) {
    await new Promise((r) => setTimeout(r, 50));
    return getUserById(userId); // retry — cache likely populated now
  }

  try {
    const user = await db.query.users.findFirst({ where: eq(users.id, userId) });
    if (user) await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(user));
    return user;
  } finally {
    await redis.del(lockKey); // always release the lock
  }
}
```
Cache-Control headers drive CDN behavior: max-age governs browser caches, s-maxage overrides it for shared caches (CDNs), immutable tells clients never to revalidate, and stale-while-revalidate lets the cache serve the old copy while fetching a fresh one in the background.
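One common way to apply those directives is a per-path policy function on the origin. The patterns and max-age values below are widespread conventions, not requirements — tune them to your deploy cadence.

```typescript
// Pick a Cache-Control value based on what kind of response this is.
function cacheControlFor(path: string): string {
  // Content-hashed asset (e.g. app.3f2a9b1c.js): safe to cache "forever",
  // because a new deploy produces a new URL.
  if (/\.[0-9a-f]{8,}\.(js|css|woff2)$/.test(path)) {
    return 'public, max-age=31536000, immutable';
  }
  // HTML (extensionless paths assumed to be HTML routes): let the CDN cache
  // briefly via s-maxage, but keep browsers revalidating on every load.
  if (path.endsWith('.html') || !path.includes('.')) {
    return 'public, max-age=0, s-maxage=60, stale-while-revalidate=300';
  }
  // Everything else: short-lived shared caching.
  return 'public, max-age=300';
}
```

The asymmetry is deliberate: hashed assets get a year because they can never change in place, while HTML gets seconds because it is the entry point that references the new hashes after a deploy.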
Content hashing: With content-hashed filenames (Vite, Webpack), the filename changes on every deploy → CDN stores both versions → no purge needed. Without hashing, you must explicitly purge the CDN on every deploy.
Edge functions (Cloudflare Workers, Vercel Edge Functions): Run code at CDN edge for A/B testing, auth validation, geolocation-based content — without a round-trip to origin.
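A sketch of the geolocation case in Cloudflare Workers' module syntax. The CF-IPCountry header is set by Cloudflare on incoming requests; treat the header name, route, and currency mapping here as illustrative and check your platform's documentation for the exact API.

```typescript
// Runs at the edge: requests to /pricing are answered without touching origin.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const country = request.headers.get('CF-IPCountry') ?? 'US';

    if (new URL(request.url).pathname === '/pricing') {
      // Geolocation-based content, decided in the edge PoP nearest the user.
      const currency = country === 'GB' ? 'GBP' : country === 'DE' ? 'EUR' : 'USD';
      return new Response(JSON.stringify({ currency }), {
        headers: { 'content-type': 'application/json' },
      });
    }

    // Everything else falls through to the origin.
    return fetch(request);
  },
};
```

Because the handler runs in every edge location, the user in Berlin gets the EUR response from a nearby PoP instead of paying the 50-150ms round-trip to origin mentioned above.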
Scaling questions test whether you understand trade-offs and can sequence optimizations correctly.
Common questions:
Strong answers include:
Red flags:
Key takeaways
From the books
System Design Interview – An Insider's Guide — Alex Xu (2020)
Chapter 1: Scale From Zero To Millions
Caching is almost always the first optimization. A 90% cache hit rate reduces database load by 10x without a single schema change.