Interactive Explainer

Scaling Horizontally: Load Balancing, Caching & CDN

How to design systems that scale horizontally instead of just buying a bigger server. Load balancers, multi-layer caching strategies, CDN architecture, and the failure modes that bite at scale.

🎯Key Takeaways
Horizontal scaling requires stateless app servers — move sessions to Redis first
Multi-layer caching: in-process (fastest) → Redis (shared) → CDN (global)
Cache-aside is the most flexible pattern: check cache → DB on miss → populate cache
Use content-hashed filenames to avoid CDN cache purging entirely
Cache-Control headers are as important as application code

~4 min read

Lesson outline

Vertical vs horizontal scaling: why you need both

Vertical scaling (scale up): buy a bigger server — more CPU, RAM, faster disk. Simple, no code changes. Has a ceiling and a single point of failure. Good for databases and stateful services.

Horizontal scaling (scale out): add more servers behind a load balancer. Theoretically unlimited scale, no single point of failure. Requires your application to be stateless. Good for stateless API servers.

Real systems use both: horizontal scaling for the stateless app tier (easy), vertical scaling for the database (complex to horizontally scale).

Make your app tier stateless first

If your app servers store session state in-process, you cannot horizontally scale — every request must go to the same server. Move all session data to Redis before adding a second app server.
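As a sketch of that move, here is one common way to back Express sessions with Redis. The packages (express-session, connect-redis, node-redis) and the Redis URL are illustrative assumptions, not something this lesson prescribes:

```typescript
// Hypothetical setup: sessions stored in Redis instead of process memory,
// so any app server behind the load balancer can serve any request.
import express from "express";
import session from "express-session";
import RedisStore from "connect-redis";
import { createClient } from "redis";

const redisClient = createClient({ url: "redis://redis.internal:6379" });
await redisClient.connect();

const app = express();
app.use(
  session({
    store: new RedisStore({ client: redisClient, prefix: "sess:" }),
    secret: process.env.SESSION_SECRET!,
    resave: false,            // the Redis store supports touch; no rewrite needed
    saveUninitialized: false, // don't create sessions for anonymous requests
    cookie: { secure: true, httpOnly: true, maxAge: 86_400_000 }, // 1 day
  })
);
```

With this in place, the load balancer no longer needs sticky sessions for the API tier, and adding a second app server is a config change rather than a rewrite.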

Load balancers: algorithms and sticky sessions

Round robin: Requests go to servers 1, 2, 3, 1, 2, 3... Distribution is equal only when requests cost roughly the same; it breaks down when some requests are 100x heavier than others.

Least connections: Route to the server with the fewest active connections. Better for variable-weight requests.

IP hash: Same client IP always goes to the same server. Use for sticky sessions. Downside: poor distribution if many users share an IP (corporate NAT).

Layer 4 vs Layer 7: L4 (TCP) is faster — the load balancer does not inspect HTTP. L7 (HTTP) is smarter — can route by URL path, host header, or cookie. Use L7 for microservices.

nginx-lb.conf

# NGINX Layer 7 load balancer

upstream api_servers {
    least_conn;  # better than round_robin for variable-latency requests

    server api-1.internal:3000 weight=2;  # 16 cores, gets 2x the traffic
    server api-2.internal:3000 weight=1;  # 8 cores
    server api-3.internal:3000 weight=1;  # 8 cores

    keepalive 32;  # pool of idle connections kept open to each upstream
}

upstream ws_servers {
    ip_hash;  # WebSocket connections must be sticky (stateful per-connection)
    server ws-1.internal:3001;
    server ws-2.internal:3001;
}

server {
    listen 443 ssl http2;

    location /ws/ {
        proxy_pass http://ws_servers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location /api/ {
        proxy_pass http://api_servers;
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # clear Connection so keepalive works
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Multi-layer caching: L1 → L2 → CDN → Origin

L1: In-process cache (LRU in-memory): Fastest — no network hop. Single server only. Good for: computed configuration, static lookup tables, expensive computed results.
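A minimal L1 cache can be sketched with a Map, exploiting the fact that JavaScript Maps iterate keys in insertion order (so the first key is the least recently used). This is an illustrative sketch, not a production cache:

```typescript
// Minimal in-process LRU cache: re-insertion on access keeps the Map ordered
// from least to most recently used, so eviction pops the first key.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.capacity) {
      const oldest = this.map.keys().next().value as K; // evict LRU entry
      this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}

const config = new LruCache<string, number>(2);
config.set("a", 1);
config.set("b", 2);
config.get("a");    // touching "a" makes "b" the least recently used
config.set("c", 3); // capacity exceeded: "b" is evicted
```

Real deployments often use a library with TTL support instead, but the eviction mechanics are the same.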

L2: Distributed cache (Redis, Memcached): Shared across all app servers. ~1ms network hop. Good for: user sessions, API response caching, rate limit counters.

CDN: Edge servers worldwide cache static assets close to users. AWS CloudFront, Cloudflare, Fastly. Reduces latency by 50-150ms internationally. Can cache SSG/ISR page responses.

Cache invalidation strategies: TTL (simplest, may serve stale data), Write-through (update cache on every write), Cache-aside (read-through on miss), Event-driven invalidation (pub/sub on data change).
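The event-driven option can be sketched as follows. A real system would publish on a Redis pub/sub channel; Node's EventEmitter stands in here so the flow is runnable and visible:

```typescript
// Event-driven invalidation sketch: every app server subscribes to a change
// channel and drops its local L1 copy when the underlying data changes.
import { EventEmitter } from "node:events";

const bus = new EventEmitter();            // stand-in for a Redis pub/sub channel
const l1 = new Map<string, unknown>();     // this process's L1 cache

bus.on("invalidate", (key: string) => l1.delete(key));

function writeUser(id: string, user: object) {
  // ... persist to the database here ...
  bus.emit("invalidate", `user:${id}`);    // notify all subscribers
}

l1.set("user:42", { name: "Ada" });
writeUser("42", { name: "Grace" });
// l1 no longer holds user:42; the next read repopulates it from the DB
```

The trade-off versus plain TTL: invalidation is near-immediate, but you now depend on the pub/sub channel being reliable, so most systems keep a TTL as a backstop.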

Cache invalidation is hard

Stale cache is often worse than no cache — users see incorrect data. Design your invalidation strategy before your caching strategy.

cache-aside-pattern.ts

// Cache-Aside (Lazy Loading) Pattern
// Assumes an ioredis client and a Drizzle ORM setup; `db`, `users`, and `User`
// come from your application's schema.
import Redis from "ioredis";
import { eq } from "drizzle-orm";

const redis = new Redis();
const CACHE_TTL = 300; // seconds (5 minutes)

async function getUserById(userId: string) {
  const cacheKey = `user:${userId}`;

  // 1. Check the L2 cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached); // cache HIT

  // 2. Miss: query the database
  const user = await db.query.users.findFirst({
    where: eq(users.id, userId),
  });
  if (!user) return null;

  // 3. Populate the cache with a TTL
  await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(user));
  return user;
}

async function updateUser(userId: string, data: Partial<User>) {
  const updated = await db.update(users)
    .set(data).where(eq(users.id, userId)).returning();

  // Always invalidate the cache on write; never leave stale data behind
  await redis.del(`user:${userId}`);
  return updated[0];
}

// Cache stampede prevention: many concurrent misses would all hit the DB
// simultaneously, so only the lock holder queries; everyone else waits briefly
// and retries against the (likely now populated) cache.
async function getUserWithLock(userId: string) {
  const cacheKey = `user:${userId}`;
  const lockKey = `lock:user:${userId}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Try to acquire the lock (EX = expire after 5s, NX = only if not exists)
  const lockAcquired = await redis.set(lockKey, "1", "EX", 5, "NX");
  if (!lockAcquired) {
    await new Promise((r) => setTimeout(r, 50));
    return getUserById(userId); // retry: the cache is likely populated now
  }

  try {
    const user = await db.query.users.findFirst({ where: eq(users.id, userId) });
    if (user) await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(user));
    return user;
  } finally {
    await redis.del(lockKey); // always release the lock
  }
}

CDN and Cache-Control: caching at the edge

Cache-Control headers drive CDN behavior:

  • public, max-age=31536000, immutable — Content-hashed assets (bundle.abc123.js) — cache forever, never revalidate
  • public, s-maxage=3600, stale-while-revalidate=86400 — Semi-dynamic API responses — CDN caches 1hr, serves stale while revalidating in background
  • private, no-store — Authenticated/sensitive responses — never cache in CDN
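The three policies above can be centralized in one small helper rather than scattered across handlers. This is an illustrative sketch; the category names are made up for the example:

```typescript
// Pick a Cache-Control header by response type, mirroring the three policies
// above. Centralizing this avoids accidentally caching private data publicly.
type ResponseKind = "hashed-asset" | "semi-dynamic" | "private";

function cacheControlFor(kind: ResponseKind): string {
  switch (kind) {
    case "hashed-asset":
      // filename changes on every deploy, so the old URL can be cached forever
      return "public, max-age=31536000, immutable";
    case "semi-dynamic":
      // CDN caches 1h, serves stale up to a day while refreshing in background
      return "public, s-maxage=3600, stale-while-revalidate=86400";
    case "private":
      // authenticated data must never land in a shared cache
      return "private, no-store";
  }
}
```

A handler then sets `res.setHeader("Cache-Control", cacheControlFor(kind))` and the CDN does the rest.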

Content hashing: With content-hashed filenames (Vite, Webpack), the filename changes on every deploy → CDN stores both versions → no purge needed. Without hashing, you must explicitly purge the CDN on every deploy.
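The mechanism is simple enough to sketch directly. This mimics what Vite and Webpack do internally; the 8-character truncation is an illustrative choice:

```typescript
// Derive a filename from a hash of the file's contents, so a new deploy with
// changed contents yields a new URL and the old CDN entry is simply never hit.
import { createHash } from "node:crypto";

function hashedFilename(name: string, contents: string): string {
  const hash = createHash("sha256").update(contents).digest("hex").slice(0, 8);
  const dot = name.lastIndexOf(".");
  return `${name.slice(0, dot)}.${hash}${name.slice(dot)}`;
}

hashedFilename("bundle.js", "console.log('v1')"); // e.g. bundle.<8 hex chars>.js
// identical contents always produce the same name; any change produces a new one
```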

Edge functions (Cloudflare Workers, Vercel Edge Functions): Run code at CDN edge for A/B testing, auth validation, geolocation-based content — without a round-trip to origin.
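A Workers-style handler for the A/B case might look like the following sketch. The cookie name and path scheme are invented for the example, and the `fetch`-handler shape follows the Cloudflare Workers module convention:

```typescript
// Assign an A/B bucket at the CDN edge from a cookie, with no origin round-trip.
function abBucket(request: Request): "a" | "b" {
  const cookie = request.headers.get("Cookie") ?? "";
  if (cookie.includes("ab=b")) return "b";
  if (cookie.includes("ab=a")) return "a";
  return Math.random() < 0.5 ? "a" : "b"; // first visit: assign randomly
}

export default {
  async fetch(request: Request): Promise<Response> {
    const bucket = abBucket(request);
    // each bucket could map to a different cached origin path here
    const res = new Response(`bucket ${bucket}`, { status: 200 });
    // persist the assignment so the user stays in one bucket for 30 days
    res.headers.set("Set-Cookie", `ab=${bucket}; Path=/; Max-Age=2592000`);
    return res;
  },
};
```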

How this might come up in interviews

Scaling questions test whether you understand trade-offs and can sequence optimizations correctly.

Common questions:

  • How would you scale a system from 1,000 to 1,000,000 users?
  • Explain cache invalidation strategies and their trade-offs.
  • When would you use a CDN for API responses, and what are the risks?

Strong answers include:

  • Sequences: stateless → caching → read replicas → sharding
  • Distinguishes public vs private caching
  • Understands cache stampede and prevention

Red flags:

  • Starts with database sharding as the first scaling lever
  • Does not know what Cache-Control headers do
  • Caches private user data in a public CDN



From the books

System Design Interview – An Insider's Guide — Alex Xu (2020)

Chapter 1: Scale From Zero To Millions

Caching is almost always the first optimization. A 90% cache hit rate reduces database load by 10x without a single schema change.
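The arithmetic behind that claim: if h is the cache hit rate, only a (1 − h) fraction of reads reach the database, so read load drops by a factor of 1 / (1 − h):

```typescript
// DB read-load reduction factor for a given cache hit rate.
function dbLoadReduction(hitRate: number): number {
  return 1 / (1 - hitRate); // fraction of misses is (1 - hitRate)
}

dbLoadReduction(0.9);  // roughly 10: a 90% hit rate cuts DB reads 10x
dbLoadReduction(0.99); // roughly 100: each extra "nine" is another 10x
```

This is also why chasing hit rate has diminishing returns in absolute terms but huge returns in DB headroom.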
