© 2026 TheSimplifiedTech. All rights reserved.


Caching: Strategies, Layers & Invalidation

The full caching story: browser → CDN → app → DB. Cache-aside, write-through, write-behind. Redis vs Memcached. Cache stampede and thundering herd. How to design a cache that actually works in production.

🎯Key Takeaways
Caching exists because the latency difference between RAM (100ns) and database (1-10ms) is 10,000-100,000x. At any meaningful scale, caching is structural, not optional.
Use cache-aside as your default read pattern. Write event-driven DEL on every write, with TTL as a safety fallback. This gives near-zero stale windows without write-through complexity.
"Cache invalidation is one of the two hard problems in CS" — because race conditions, missed invalidation paths, and stampedes make it genuinely hard. Always use DEL (not UPDATE) on writes to avoid race conditions.
Cache stampede (thundering herd) occurs when a hot cache key expires and many requests hit the DB simultaneously. Prevent with: mutex locks for high-traffic keys, probabilistic early expiration, and cache warming for predictable traffic events.
Choose Redis over Memcached for 95% of use cases: rich data structures, optional persistence, pub/sub for event-driven invalidation, and native cluster support. Add an in-process cache (Guava/Caffeine) as an L1 cache in front of Redis for the hottest < 1% of data.

~23 min read

Why Caching Exists: The Latency Problem

Caching exists because of a fundamental law of physics: accessing data from memory is orders of magnitude faster than accessing it from disk or over a network. The latency table that every FAANG engineer has memorized:

| Storage Level | Latency | Relative Speed | Technology Examples |
|---|---|---|---|
| CPU L1 cache | 1 nanosecond | 1x (baseline) | CPU register |
| CPU L2 cache | 4 nanoseconds | 4x slower | On-die CPU cache |
| RAM (in-process memory) | 100 nanoseconds | 100x slower | Application in-memory cache (Guava, Caffeine) |
| NVMe SSD | 100 microseconds | 100,000x slower | Local SSD storage |
| Redis (same datacenter) | 500 microseconds | 500,000x slower | Remote in-memory store |
| HDD (mechanical disk) | 10 milliseconds | 10,000,000x slower | Traditional spinning disk |
| Database query (with index) | 1-10 milliseconds | 1-10M slower | PostgreSQL, MySQL |
| Database query (no index) | 100-1000 milliseconds | 100M-1B slower | Full table scan |
| Cross-datacenter (US → EU) | 150 milliseconds | 150M slower | Multi-region replication |

The implication is striking: a database query (1ms) is 10,000 times slower than a RAM access (100ns). Redis (0.5ms) is still 5,000 times slower than in-process memory. This means putting data at the right cache level can reduce latency by 1,000-100,000x.

The Practical Impact of the Latency Table

A service doing 10,000 DB queries/sec at 5ms each accumulates ~50 server-seconds of wait time every second. If 90% of those queries are cacheable and you add Redis (0.5ms), wait time drops to 9,000 × 0.5ms + 1,000 × 5ms = 9.5 server-seconds — roughly a 5x improvement. This means you can serve the same load with about 5x fewer servers, or about 5x more traffic on the same infrastructure. Caching is the single highest-ROI performance optimization in most web systems.
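
The wait-time arithmetic can be checked in a few lines of Python; the 10,000 QPS, 5ms DB, 0.5ms Redis, and 90% hit-rate figures are the worked example's assumptions:

```python
# Wait-time math for the worked example: 10,000 queries/sec,
# 5ms per DB query, 0.5ms per Redis hit, 90% of queries cacheable.
qps = 10_000
db_latency = 0.005      # seconds
redis_latency = 0.0005  # seconds
hit_rate = 0.90

baseline = qps * db_latency                    # every query hits the DB
cached = (qps * hit_rate * redis_latency
          + qps * (1 - hit_rate) * db_latency) # 90% Redis, 10% DB

print(f"baseline: {baseline:.1f} server-seconds/sec")  # 50.0
print(f"cached:   {cached:.1f} server-seconds/sec")    # 9.5
print(f"speedup:  {baseline / cached:.1f}x")           # 5.3x
```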

But caching introduces new problems: stale data, cache invalidation complexity, cold start (the "thundering herd"), and memory constraints. Understanding when each problem occurs — and how to prevent it — is what separates a naive cache implementation from a production-grade one.

The 4 Core Problems Caching Solves (and Introduces)

  • Latency reduction — Cache serves data at memory speed (< 1ms) vs. disk/DB speed (1-100ms). Most visible benefit. Directly improves user experience.
  • Throughput amplification — One DB handles 50K QPS. With 95% cache hit rate, that DB sees only 2,500 QPS — serving 20x the traffic with the same DB. This is the scaling superpower of caching.
  • Compute cost reduction — Some queries are expensive to compute (aggregations, ML ranking, complex JOINs). Caching the result means computing once, serving many times.
  • INTRODUCED: Stale data problem — Cached data is a snapshot at point-in-time. If the underlying data changes, the cache is wrong until invalidated. The cache invalidation problem is why Phil Karlton famously called it "one of two hard problems in CS."

The Caching Hierarchy: Browser → CDN → App → DB

There isn't one cache — there's a hierarchy of caches, each progressively closer to the user. Data flows from the source of truth (database) through multiple cache layers before reaching the user. Each layer has different characteristics: location, capacity, latency, and freshness.

graph LR
    U[User Browser] -->|Cache-Control header<br/>max-age=3600| BC[Browser Cache<br/>Local storage<br/>0ms, stalest]
    BC -->|Cache miss| CDN[CDN Edge Node<br/>1-50ms from user<br/>GB capacity]
    CDN -->|Cache miss| RP[Reverse Proxy Cache<br/>Nginx/Varnish<br/>at datacenter]
    RP -->|Cache miss| AC[App Cache<br/>Redis/Memcached<br/>0.5ms, TB capacity]
    AC -->|Cache miss| DB[(Database<br/>Source of Truth<br/>1-10ms, any size, freshest)]

    style U fill:#4dabf7
    style BC fill:#74c0fc
    style CDN fill:#a5d8ff
    style RP fill:#d0ebff
    style AC fill:#e7f5ff
    style DB fill:#fff

The caching hierarchy: closer to user = faster but less fresh. Further from user = slower but more fresh.

| Cache Layer | Location | Latency | Capacity | Who Controls Invalidation | TTL Range |
|---|---|---|---|---|---|
| Browser cache | User's device | 0ms (local) | 50-500MB | HTTP Cache-Control headers | Minutes to weeks |
| CDN (CloudFront, Akamai) | Edge PoP, 10-50ms from user | 5-50ms | Effectively unlimited | Cache-Control, CDN API invalidation | Minutes to years |
| Reverse proxy (Nginx, Varnish) | Datacenter edge | < 1ms | 10-100GB per node | Cache-Control, explicit purge | Seconds to minutes |
| Application cache (Redis) | Same datacenter as app | 0.5-2ms | 25 GB - 100 TB (cluster) | Application code explicitly | Seconds to hours |
| DB buffer pool | Database server RAM | < 0.1ms | % of DB instance RAM | Automatic (LRU by DB) | LRU-managed, no TTL |

HTTP Cache-Control: The Most Underused Cache Tool

Browser and CDN caches are controlled by HTTP headers. The most important:

  • Cache-Control: max-age=3600, public — tells both browser AND CDN to cache for 1 hour.
  • Cache-Control: max-age=0, private — tells the CDN not to cache (user-specific); the browser can cache.
  • Cache-Control: no-store — nothing anywhere should cache this response.
  • ETag: "abc123" — version identifier; the browser sends an If-None-Match header on the next request; if unchanged, the server returns 304 Not Modified (no body — saves bandwidth).

Most engineers don't set these headers, leaving cache behavior to defaults. Setting them correctly eliminates 50-80% of CDN origin requests for static assets.
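
The ETag/304 handshake can be sketched in a few lines of Python. This is a minimal illustration, not any framework's real API — the hashing scheme and helper names are assumptions:

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Content-derived version identifier (any stable hash works).
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match):
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        # Client's cached copy is current: revalidate with no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "public, max-age=60"}, body

# First request: full 200 with body. Second request echoes the ETag: 304, empty body.
status1, headers1, _ = respond(b"<html>hello</html>", None)
status2, _, payload2 = respond(b"<html>hello</html>", headers1["ETag"])
print(status1, status2, payload2)
```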

The key insight about the hierarchy: when a request arrives at the browser, it checks its local cache first. Cache hit: sub-millisecond, doesn't even reach the network. Cache miss: request goes to CDN. CDN cache hit: 5-50ms, doesn't reach your servers. CDN cache miss: request reaches your origin, hits your reverse proxy, then Redis, then DB. Every layer hit saves the latency and compute of all subsequent layers. A CDN cache hit at 20ms saves you 0.5ms Redis + 5ms DB + network latency + server CPU — a total savings of > 50ms.

Cache-Aside (Lazy Loading) Pattern

Cache-aside is the most common caching pattern and should be your default. The application code explicitly manages the cache: before reading data, check the cache. On a hit, return from cache. On a miss, read from database, populate the cache, then return the data.

Cache-Aside Pattern: Read Path and Write Path

  1. Read path (GET /user/123): Check Redis: GET user:123. If cache hit (< 1ms): return cached data immediately. If cache miss: query database: SELECT * FROM users WHERE id = 123 (5ms). Write the result to Redis: SET user:123 {JSON} EX 3600 (1 hour TTL). Return data to the client.
  2. Write path (PUT /user/123): Update database: UPDATE users SET name=... WHERE id = 123. Delete the cache entry: DEL user:123. (Alternative: update the cache entry immediately, but delete is safer — it avoids the stale-write race condition.) The next read will repopulate the cache with fresh data.
  3. Cache miss handling: On the first request after cache eviction or a cold start, the cache miss causes a database read. This is the cost of freshness. The TTL controls how often this happens: a 1-hour TTL means at most one DB read per cache key per hour.
  4. Error handling: What if Redis is down? Cache-aside degrades gracefully: if the Redis GET fails, fall through to the database. This is the key advantage — the cache is not required for correctness, only for performance. Your system slows down when the cache fails, but doesn't break.

Cache-Aside Pseudocode

function getUser(userId):
    cacheKey = "user:" + userId
    value = redis.get(cacheKey)
    if value:
        return JSON.parse(value)                           // cache hit
    user = db.query("SELECT * FROM users WHERE id = ?", userId)
    if user:
        redis.setex(cacheKey, 3600, JSON.stringify(user))  // cache for 1 hour
    return user
// Result: the DB is only hit on a cache miss — roughly 1/3600 of requests
// if TTL = 1 hour and the key is read about once per second

sequenceDiagram
    participant C as Client
    participant App as App Server
    participant Redis as Redis Cache
    participant DB as Database

    C->>App: GET /user/123
    App->>Redis: GET user:123
    alt Cache Hit
        Redis-->>App: {name: "Alice", ...} (0.5ms)
        App-->>C: 200 OK (total: ~2ms)
    else Cache Miss
        Redis-->>App: nil
        App->>DB: SELECT * FROM users WHERE id=123 (5ms)
        DB-->>App: {name: "Alice", ...}
        App->>Redis: SET user:123 {JSON} EX 3600
        App-->>C: 200 OK (total: ~8ms)
    end

Cache-aside: cache hit serves in ~2ms. Cache miss falls through to DB at ~8ms and populates cache.
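
The same flow as runnable Python, with plain dicts standing in for Redis and the database. The `FakeRedis` class and the `db` user table are illustrative stand-ins, not a real client:

```python
import json, time

class FakeRedis:
    """Dict-backed stand-in for Redis GET/SETEX/DEL with TTL support."""
    def __init__(self):
        self.store = {}                       # key -> (value, expires_at)
    def get(self, key):
        value, expires_at = self.store.get(key, (None, 0))
        return value if time.time() < expires_at else None
    def setex(self, key, ttl, value):
        self.store[key] = (value, time.time() + ttl)
    def delete(self, key):
        self.store.pop(key, None)

redis = FakeRedis()
db = {123: {"name": "Alice"}}                 # stand-in users table

def get_user(user_id):
    key = f"user:{user_id}"
    cached = redis.get(key)
    if cached is not None:
        return json.loads(cached)             # cache hit
    user = db.get(user_id)                    # cache miss: read the DB
    if user is not None:
        redis.setex(key, 3600, json.dumps(user))
    return user

def update_user(user_id, fields):
    db[user_id].update(fields)                # write the DB first...
    redis.delete(f"user:{user_id}")           # ...then DEL (not SET) the key
```

The first `get_user(123)` misses and populates the cache; subsequent calls hit. `update_user` deletes the key so the next read repopulates with fresh data.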

Cache-Aside: Pros and Cons

  • Pro: Resilient to cache failure — Cache is on the hot path but not required. If Redis goes down, requests hit the database. Slow, but not broken. No data loss.
  • Pro: Only caches requested data — Lazy loading means only data that's actually requested gets cached. The cache fills organically with hot data, not all data.
  • Con: Cache miss penalty — First request after cache eviction is slow (DB read + Redis write). Under cold start or after a cache clear, all requests hit the DB simultaneously — the thundering herd problem.
  • Con: Potential stale data window — Between a DB update and cache invalidation, readers may see stale data. The window is: time between update and DEL key (typically milliseconds in well-written code, but can be longer if DEL fails).

Write-Through and Write-Behind Caching Patterns

Cache-aside handles reads. But what about writes? There are three write strategies, each with different consistency and performance trade-offs.

The Three Write Caching Patterns at a Glance

Write-through: write to cache AND database synchronously. Always fresh, slower writes. Write-behind (write-back): write to cache first, async flush to database. Fast writes, risk of data loss. Write-around: skip cache entirely on write, write directly to DB. Cache remains clean, but reads after writes will be cache misses.

| Pattern | Write Path | Read Path | Consistency | Write Latency | Data Loss Risk |
|---|---|---|---|---|---|
| Write-Through | Write cache + DB synchronously | Cache always has latest data | Strong | High (both writes must complete) | None (DB always has data) |
| Write-Behind | Write cache, async DB flush | Cache has latest, DB may lag | Eventual | Low (only cache write) | Risk if cache fails before flush |
| Write-Around | Write directly to DB, skip cache | First read = DB (cache miss), subsequent = cache | Strong (DB = truth) | Medium (DB write) | None |
| Cache-Aside + Delete | Write DB, then delete cache key | Next read repopulates cache | Slightly eventual (brief stale window) | Low (delete is fast) | None |

When to Use Each Pattern

  • Cache-aside + delete (most common) — Use for most CRUD applications. Write to DB, delete the cache key. Simple, resilient, eventually consistent. Perfect for user profiles, product info, any data that's read far more than written.
  • Write-through — Use when readers expect immediate consistency: "I just updated my profile — I should see the change immediately." Write-through ensures the cache is always fresh. Cost: every write hits both DB and cache, so write latency increases.
  • Write-behind — Use for high-write-throughput workloads where some data loss is acceptable: analytics events, clickstream data, metrics aggregation. Buffer writes in Redis, flush to DB in batches. 1,000 DB writes become 1 bulk insert. Risk: if Redis fails before flush, you lose the buffered writes.
  • Write-around — Use for data that is written once and rarely read back immediately: log files, archive data, batch-imported records. Prevents cache pollution with data that users won't re-read shortly after writing.
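
The write-behind pattern from the list above can be sketched as a buffer flushed in batches. The batch size and the `bulk_insert` callable are illustrative choices:

```python
class WriteBehindBuffer:
    """Buffer writes in memory, flush to the DB in bulk every N events.
    If the process dies before flush(), buffered events are lost — the
    pattern's stated trade-off."""
    def __init__(self, bulk_insert, batch_size=1000):
        self.bulk_insert = bulk_insert        # callable taking a list of rows
        self.batch_size = batch_size
        self.pending = []

    def write(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.bulk_insert(self.pending)    # 1,000 writes -> 1 bulk insert
            self.pending = []

# Usage: collect clickstream events; count how many bulk inserts the DB sees.
batches = []
buf = WriteBehindBuffer(batches.append, batch_size=1000)
for i in range(2500):
    buf.write({"event": i})
buf.flush()                                   # flush the 500-item remainder
print([len(b) for b in batches])              # [1000, 1000, 500]
```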

The Write Race Condition: Why Order Matters

Dangerous pattern: write DB, then UPDATE cache (not delete). Two concurrent writers can cause: Writer A: DB update → Writer B: DB update → Writer B: cache update (with B's value) → Writer A: cache update (with A's old value, OVERWRITING B's correct value). Cache now has stale data indefinitely. Fix: always DELETE cache on write, never UPDATE. The next read will populate with fresh data. This "delete on write" pattern avoids race conditions entirely.
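
The interleaving above can be reproduced directly. A minimal simulation, with dicts standing in for the DB and cache, showing why SET-on-write can pin a stale value while DEL-on-write cannot:

```python
def run(writeback):
    """Interleave two writers; `writeback` is how each writer touches the cache."""
    db, cache = {}, {}
    # Writer A writes the DB and computes its cache payload (value 1), then stalls.
    db["k"] = 1; a_payload = db["k"]
    # Writer B runs to completion with value 2.
    db["k"] = 2; writeback(cache, 2)
    # Writer A resumes and applies its (now stale) cache write.
    writeback(cache, a_payload)
    return db["k"], cache.get("k")

# SET on write: A's stale payload overwrites B's correct one.
db_val, cached = run(lambda c, v: c.__setitem__("k", v))
print(db_val, cached)     # DB holds 2, cache is stuck at 1 — stale indefinitely

# DEL on write: both writers just delete; the next read repopulates from the DB.
db_val, cached = run(lambda c, v: c.pop("k", None))
print(db_val, cached)     # DB holds 2, cache empty — the next read gets 2
```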

Cache Invalidation Strategies: The Hard Problem

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. This quote is famous because cache invalidation is genuinely hard. Getting it wrong means users see stale data or, worse, incorrect data.

Cache invalidation is the process of removing or updating cached data when the underlying source data changes. There are four strategies, each appropriate for different scenarios.

The 4 Cache Invalidation Strategies

  • TTL (Time-To-Live) Expiration — Set a maximum lifetime on cached data. After TTL expires, the next read fetches fresh data and refreshes the cache. Pros: simple, automatic, no explicit invalidation code. Cons: data is always somewhat stale (up to TTL duration). Best for: product catalog, static pages, anything where "recent" is good enough.
  • Event-Driven Invalidation — When data changes, explicitly delete the cache key. Code: after UPDATE users SET ..., do Redis.DEL("user:" + user_id). Pros: near-zero stale window (only between DB write and DEL). Cons: every write path must include cache invalidation. Easy to miss one update path. Best for: user profiles, anything where freshness matters.
  • Cache Tags / Surrogate Keys — Tag cache entries with related identifiers. Invalidate all entries with a given tag when data changes. Example: product page cached with tags ["product:123", "category:electronics"]. When product 123 changes: invalidate all entries tagged "product:123". Varnish and Fastly CDN support this natively. Pros: efficient bulk invalidation. Cons: complex to implement correctly.
  • Version-Based Invalidation — Include a version number in the cache key: "user:123:v5". When data changes, increment the version. Old key naturally expires by TTL. Pros: atomic — no invalidation race conditions. Cons: cache usage is higher (old versions linger until TTL). Best for: immutable-ish data (product images, CSS files — cache busting).
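
Version-based invalidation fits in a few lines — the version counter lives alongside the data, and key names here are illustrative:

```python
versions = {}        # entity id -> current version number

def cache_key(user_id):
    v = versions.get(user_id, 1)
    return f"user:{user_id}:v{v}"

def bump_version(user_id):
    # On write: increment the version instead of deleting the old key.
    # The old key is simply never read again and expires via its TTL.
    versions[user_id] = versions.get(user_id, 1) + 1

print(cache_key(123))    # user:123:v1
bump_version(123)
print(cache_key(123))    # user:123:v2 — the v1 entry is abandoned, not deleted
```

Because the key change is atomic with the version bump, there is no invalidation race: readers either see the old key (soon to expire) or the new one.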

The Cache Invalidation Race Condition

Here's a subtle bug that causes stale data even with event-driven invalidation: T=0: Reader misses the cache and reads the DB (old value). T=1: Writer updates the DB. T=2: Writer deletes the cache key. T=3: Reader writes its OLD data to the cache (its DB read at T=0 happened before the update). Now the cache has stale data until the TTL expires. Solution: use the "cache-aside + delete-on-write" pattern consistently. The window where stale data can be cached is bounded by the time between your DB read and cache write (typically milliseconds). For critical data where this matters, use write-through or distributed locking.

sequenceDiagram
    participant W as Writer
    participant DB as Database
    participant R as Redis
    participant C as Reader

    W->>DB: UPDATE users SET name="Bob" WHERE id=123
    DB-->>W: OK
    W->>R: DEL user:123
    R-->>W: OK (key deleted)

    C->>R: GET user:123
    R-->>C: nil (key was deleted)
    C->>DB: SELECT * FROM users WHERE id=123
    DB-->>C: {name: "Bob"} ← fresh data
    C->>R: SET user:123 {name:"Bob"} EX 3600

    Note over W,C: Event-driven invalidation: stale window is only the time between DB write and DEL (typically < 1ms in practice)

Event-driven invalidation: delete cache key on write, next read fetches fresh data from DB

| Strategy | Freshness | Complexity | Best TTL | Use Case |
|---|---|---|---|---|
| TTL only | Stale by up to TTL | Low | 1 min - 1 hour | Product catalog, news articles, any data where "recent" is OK |
| Event-driven delete | Near-instant (< 1ms stale window) | Medium | 24 hours (fallback) | User profiles, account settings, anything user-visible |
| Cache tags | Instant (bulk invalidation) | High | Unlimited (tag-controlled) | CDN content, pages with complex dependencies |
| Version keys | Instant (old key abandoned) | Medium | Short (prevent bloat) | Static assets, CSS/JS files (cache busting) |

Redis vs. Memcached: The Deep Dive

Redis and Memcached are both in-memory key-value stores, but they have very different capabilities. Choosing the wrong one can limit your architecture or add unnecessary complexity.

| Dimension | Redis | Memcached |
|---|---|---|
| Data structures | Strings, Hashes, Lists, Sets, Sorted Sets, HyperLogLog, Streams, Bitmaps, Geospatial | Strings only (arbitrary byte arrays) |
| Persistence | RDB snapshots + AOF (append-only file). Survives restarts. | Memory only. All data lost on restart. |
| Replication | Built-in master-replica replication. Redis Cluster (automatic sharding). | No built-in replication. Third-party tools needed (mcrouter). |
| Pub/Sub messaging | Native PUBLISH/SUBSCRIBE commands. Used by Socket.io, chat systems. | Not supported. |
| Atomic operations | INCR, LPUSH, SADD, ZADD, etc. — all atomic. MULTI/EXEC transactions. | Only CAS (compare-and-swap). Limited atomic ops. |
| Lua scripting | Run Lua scripts server-side (atomic execution) | Not supported. |
| Memory efficiency | Higher overhead per key (data structure metadata) | Lower overhead per key (simpler storage format) |
| Throughput | ~1M ops/sec single node | ~1.5M ops/sec single node (simpler = faster) |
| Horizontal scaling | Redis Cluster: native sharding up to 1000 nodes | Client-side hashing (consistent hashing via client libraries) |

When to Choose Redis vs. Memcached

Choose Redis when: you need data persistence (cache that survives restarts), you need pub/sub messaging, you need sorted sets for leaderboards or rate limiting, you need atomic operations beyond simple CAS, or you want a single technology for both caching AND session storage AND pub/sub. Choose Memcached when: you only need a simple key-value cache with no persistence, you're optimizing for raw throughput (1.5M ops/sec vs 1M), you have a team already expert in Memcached, or you need multi-threading (Memcached is multi-threaded, Redis single-threaded per shard). In practice: Redis is the default choice for 90%+ of use cases. Memcached is typically only chosen for legacy reasons or extreme simplicity requirements.

Redis Data Structures and When to Use Each

  • String (GET/SET) — Simple caching: user profiles, API responses, HTML fragments. SET user:123 {JSON} EX 3600. The most basic and most-used type.
  • Hash (HSET/HGET) — Structured objects where you need partial updates. Instead of serializing an entire user object, store each field separately: HSET user:123 name "Alice" email "alice@..." Updated fields don't require re-reading the entire object.
  • Sorted Set (ZADD/ZRANGE) — Leaderboards, rate limiting (score = timestamp), priority queues. ZADD leaderboard 9500 "user_A". ZRANGE leaderboard 0 9 WITHSCORES (top 10 players). Critical for any ranked list.
  • List (LPUSH/RPOP) — Message queues (LPUSH + BRPOP = blocking queue), activity feeds (most recent N items). LPUSH user:123:feed {tweet_JSON}. LRANGE user:123:feed 0 99 (latest 100 feed items).
  • Set (SADD/SMEMBERS) — Unique visitors, "who liked this post", tag management. SADD post:456:likes user_789. SCARD post:456:likes (count of likes). Set operations: SINTER (intersection), SUNION (union) for "friends who follow this artist."
  • HyperLogLog — Approximate unique count with 0.81% error. Uses 12KB regardless of data size. PFADD daily-visitors user_123. PFCOUNT daily-visitors. Perfect for analytics: unique page visitors, unique searches.
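
The sorted-set leaderboard pattern from the list above, simulated in pure Python so it runs without a Redis server. `zadd`/`ztop` mimic the semantics of ZADD and a reverse-order ZRANGE; this is a teaching model, not a Redis client:

```python
class MiniSortedSet:
    """Tiny in-memory model of a Redis sorted set (ZADD / reverse ZRANGE)."""
    def __init__(self):
        self.scores = {}                  # member -> score

    def zadd(self, member, score):
        self.scores[member] = score       # ZADD: upsert the member's score

    def ztop(self, n):
        # Reverse ZRANGE with scores: highest-scoring members first.
        return sorted(self.scores.items(), key=lambda kv: -kv[1])[:n]

board = MiniSortedSet()
board.zadd("user_A", 9500)
board.zadd("user_B", 8700)
board.zadd("user_C", 9900)
print(board.ztop(2))    # [('user_C', 9900), ('user_A', 9500)]
```

Redis keeps the members sorted as they are inserted (skip list), so top-N reads are O(log N + N) rather than a full sort — the reason sorted sets are the standard leaderboard tool.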

CDN Caching: Edge Nodes, Cache-Control Headers, and Cost Math

A CDN (Content Delivery Network) is a globally distributed cache. Requests are routed to the nearest "Point of Presence" (PoP) instead of your origin server. If the CDN has the content cached, it serves it immediately from 5-50ms away. If not, it fetches from origin, caches, and serves future requests from the edge.

Major CDN Providers and Their Strengths

  • Cloudflare — 285+ PoPs worldwide, DDoS protection built-in, WAF, free tier available.
  • AWS CloudFront — tight AWS integration, S3 origin, Lambda@Edge for edge computing; pricing $0.0085-$0.0120/GB depending on region.
  • Fastly — real-time cache invalidation (< 150ms global purge), Varnish-based (highly configurable).
  • Akamai — oldest CDN, enterprise-focused, 4,000+ PoPs, used by major media companies.

For most applications on AWS: CloudFront. For edge computing requirements: Cloudflare Workers.

CDN Cache-Control Strategy for Different Content Types

  1. Static assets (JS, CSS, images): Cache-Control: public, max-age=31536000, immutable. One-year TTL. Use a content-based hash in the filename for cache busting (style.abc123.css). The file never changes, so cache forever; the immutable hint tells the browser not to revalidate even on refresh.
  2. API responses (public data): Cache-Control: public, max-age=60, s-maxage=300. Browser caches 60 seconds; CDN caches 5 minutes. Acceptable staleness for product catalogs, news feeds. s-maxage applies to shared caches (CDN); max-age applies to the browser.
  3. User-specific API responses: Cache-Control: private, max-age=0. The browser may cache for back/forward navigation; the CDN must NOT cache. Never cache user-specific data (account info, personalized recommendations) at the CDN level — the wrong user would see the wrong data.
  4. Dynamic HTML pages: Cache-Control: no-cache (must revalidate with origin). Or: Cache-Control: public, max-age=60 with an ETag. If the ETag is unchanged, the CDN returns 304 Not Modified (no body) — reduced bandwidth while ensuring freshness.
  5. Streaming content: Cache-Control: no-store. Live video, real-time data, WebSockets — never cacheable. Bypass the CDN entirely for these paths.
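
The per-content-type policy above collapses into a small lookup function. The category names are illustrative; the header values are the ones from the list:

```python
# Maps the content categories above to their Cache-Control header values.
CACHE_POLICY = {
    "static_asset": "public, max-age=31536000, immutable",
    "public_api":   "public, max-age=60, s-maxage=300",
    "user_api":     "private, max-age=0",
    "dynamic_html": "no-cache",
    "streaming":    "no-store",
}

def cache_control_for(category: str) -> str:
    # Default to the safest policy (never cache) for unknown categories.
    return CACHE_POLICY.get(category, "no-store")

print(cache_control_for("static_asset"))   # public, max-age=31536000, immutable
print(cache_control_for("unknown"))        # no-store
```

Defaulting unknown content to no-store is the conservative choice: an uncached response is slow, but a wrongly cached one can leak user data.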

CDN Cost Math: Why Getting Cache-Control Right Saves Real Money

Scenario: 10 million page views/day. Each page loads 50 assets (JS, CSS, images) averaging 100KB each. Without CDN: 10M × 50 × 100KB = 50TB/day of origin egress. At AWS egress pricing of $0.09/GB: $4,608/day = $1.68M/year. With CDN at 95% cache hit rate: only 5% hits origin = 2.5TB/day of origin egress = $230/day = $84,000/year. Savings: $1.6M/year. CDN cost at $0.01/GB × 50TB/day at edge: $500/day = $182,500/year. Net savings: $1.6M - $182.5K = $1.42M/year. Getting Cache-Control headers right (to maximize hit rate) is worth significant engineering investment.
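
The cost arithmetic above, reproduced in Python. The traffic numbers and the $0.09/GB and $0.01/GB price points are the scenario's assumptions, and the text converts TB to GB with a factor of 1,024 (hence $4,608/day rather than $4,500):

```python
# Daily origin egress: 10M pages x 50 assets x 100 KB = 50 TB/day.
egress_tb_per_day = 10_000_000 * 50 * 100e3 / 1e12   # 50.0 TB
egress_gb_per_day = egress_tb_per_day * 1024         # text's TB -> GB factor

origin_price = 0.09   # $/GB AWS egress (scenario's assumption)
cdn_price = 0.01      # $/GB CDN edge delivery (scenario's assumption)
hit_rate = 0.95

no_cdn_daily = egress_gb_per_day * origin_price                      # $4,608/day
with_cdn_origin_daily = egress_gb_per_day * (1 - hit_rate) * origin_price
cdn_daily = egress_gb_per_day * cdn_price            # ~$512/day (text rounds to $500)

print(round(no_cdn_daily))                    # 4608
print(round(no_cdn_daily * 365 / 1e6, 2))     # 1.68 ($M/year without a CDN)
print(round(with_cdn_origin_daily))           # 230  ($/day origin egress with CDN)
```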

Cache Stampede & Thundering Herd: The Cold Start Problem

The cache stampede (also called thundering herd) is one of the most dangerous failure modes in cached systems. It happens when many requests arrive simultaneously for data that is not in the cache — and all of them go to the database at once.

Scenario: a popular product page is cached. At 11:00 PM, the TTL expires. In the next 100ms, 10,000 requests arrive for this page (it's a flash sale countdown). All 10,000 miss the cache. All 10,000 hit the database simultaneously. The database, which normally handles 1,000 QPS, suddenly receives 100,000 QPS. It falls over. The website is down during the flash sale.

Twitter 2012: Cache Stampede on Trending Topics (Real Incident)

Twitter's trending topics caused a cache stampede in 2012. When a topic started trending, the cache key for that topic expired (or didn't exist yet) simultaneously across millions of users scrolling their timelines. All those users' requests queried the database for the same trending data at once — far more load than the DB could handle. The cascade: DB overloaded → query timeouts → timeline load failures → 503 errors → users retrying → more DB load. A 3-hour partial outage resulted. Post-mortem fix: probabilistic early expiration + prewarming. Twitter now regenerates trending cache before TTL expires, and uses mutex locks to prevent stampedes.

Three Solutions to Cache Stampede

  • Mutex / Cache Lock — The first request to find a cache miss acquires a distributed lock (Redis SET NX EX). Other requests wait. The lock holder queries the DB, populates the cache, and releases the lock; the waiting requests then see a cache hit. Downside: requests that arrive while the lock is held must wait, adding latency. Use for: highly popular data with expensive DB queries. Avoid for: queries that must return in < 10ms (lock waiting adds latency).
  • Probabilistic Early Expiration (PER) — Don't wait for TTL to expire. Occasionally regenerate the cache early with some probability that increases as expiry approaches. Formula: probability_to_refresh = exp(-remaining_ttl / delta) where delta controls aggressiveness. When probability triggers, fetch fresh data and update cache proactively. Downside: occasional "early" DB reads even when cache is still valid. Use for: very high-traffic keys where any stampede is catastrophic.
  • Cache Warming / Pre-Generation — Before cache expires (or before an expected traffic spike), proactively regenerate the cache entry. Background job runs every 50 minutes to refresh 1-hour TTL entries. For events: run cache warmer for expected popular pages 30 minutes before go-live. Netflix warms its recommendation cache before major show releases. Downside: requires a job scheduler and knowledge of what to pre-warm. Use for: predictable high-traffic events, page-level caching.

sequenceDiagram
    participant R1 as Request 1
    participant R2 as Request 2 (concurrent)
    participant Redis as Redis
    participant DB as Database

    R1->>Redis: GET product:123
    Redis-->>R1: nil (cache miss)
    R1->>Redis: SET product:123:lock 1 NX EX 5
    Redis-->>R1: OK (lock acquired)

    R2->>Redis: GET product:123
    Redis-->>R2: nil (cache miss)
    R2->>Redis: SET product:123:lock 1 NX EX 5
    Redis-->>R2: FAIL (lock already held)
    R2->>R2: Wait 100ms, retry...

    R1->>DB: SELECT * FROM products WHERE id=123
    DB-->>R1: {name: "Widget", price: 29.99}
    R1->>Redis: SET product:123 {JSON} EX 3600
    R1->>Redis: DEL product:123:lock
    R1-->>R1: Return to client

    R2->>Redis: GET product:123
    Redis-->>R2: {name: "Widget"} (cache hit!)
    R2-->>R2: Return to client

Mutex pattern: first request acquires lock, populates cache. Others wait, then see cache hit.
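
Probabilistic early expiration can be sketched with the formula given above; the delta value of 10 seconds here is an illustrative tuning choice:

```python
import math, random

def should_refresh_early(remaining_ttl: float, delta: float = 10.0) -> bool:
    """Refresh probability rises toward 1 as expiry approaches:
    p = exp(-remaining_ttl / delta), per the formula in the text."""
    p = math.exp(-remaining_ttl / delta)
    return random.random() < p

# At expiry (remaining_ttl = 0) the probability is exp(0) = 1.0, so a
# refresh is guaranteed; far from expiry it is vanishingly small.
print(math.exp(-0 / 10.0))              # 1.0
print(math.exp(-300 / 10.0) < 1e-12)    # True: almost never refresh early
```

Because each request rolls independently, roughly one request refreshes the key just before its TTL expires — no coordinated stampede, and no distributed lock needed.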

Cache Design in Interviews: Sizing, Keys, Eviction, Monitoring

When cache design comes up in a FAANG interview, interviewers expect you to address four specific topics: cache sizing (how big?), key design (how do you structure keys?), eviction policy (what happens when cache is full?), and monitoring (how do you know it's working?). A complete answer covers all four.

The 4 Cache Design Dimensions

1. Cache sizing: Calculate working set = total dataset × hot_fraction (typically 20%). Multiply by serialization overhead (~2.5x for Redis key + value + metadata). Divide by node capacity (25-50 GB per Redis node): nodes_needed = (working_set × 2.5) ÷ node_capacity. Then add replicas — one replica per primary doubles the node count. Example: 500K users × 2KB profile × 20% hot = 200 MB working set. 200 MB × 2.5 ÷ 25 GB = 0.02 nodes → one node easily. At 500M users: 200 GB working set → 200 × 2.5 ÷ 25 = 20 nodes (40 with replication).

2. Cache key design: Keys must be unique, predictable, and invalidatable. Use namespaced keys: "entity:id:attribute". Examples: user:123:profile, product:456:details, feed:789:page:1. Include a version if using version-based invalidation: user:123:profile:v7. Keep keys short — every byte costs Redis memory. Avoid hot keys: if one key gets 1M QPS (Redis's single-threaded command execution becomes the bottleneck), spread it across 10 keys with a modulo suffix.

3. Eviction policy: What happens when Redis runs out of memory? Policies: allkeys-lru (evict least recently used — best for caching). volatile-lru (only evict keys with a TTL — if no keys have TTLs, writes fail with an OOM error). allkeys-random (random eviction — rarely what you want). noeviction (return an error when full — fine for persistent data, terrible for a cache). Default recommendation: allkeys-lru, which automatically evicts cold data to make room for hot data.

4. Monitoring metrics: Cache hit rate (hits ÷ (hits + misses)) — target > 80% for most use cases, > 95% for feed/timeline caches. Eviction rate — if high, the cache is undersized. Connection count — if maxed out, you need connection pooling. Memory fragmentation ratio — Redis INFO memory reports mem_fragmentation_ratio; > 1.5 means fragmentation waste, so restart or defragment. Latency percentiles — Redis should serve < 1 ms at p99; anything higher indicates memory swapping or CPU saturation.
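The sizing arithmetic above can be checked with a few lines of Python. This is a sketch: `redis_nodes_needed` is a made-up helper, and 2 KB is taken as 2,000 bytes to match the worked example.

```python
import math

def redis_nodes_needed(num_items, item_bytes, hot_fraction=0.20,
                       overhead=2.5, node_capacity_gb=25,
                       replicas_per_primary=1):
    """Working set = items × size × hot fraction;
    primaries = ceil(working set × overhead ÷ node capacity)."""
    working_set_gb = round(num_items * item_bytes * hot_fraction / 1e9, 6)
    primaries = max(1, math.ceil(working_set_gb * overhead / node_capacity_gb))
    return working_set_gb, primaries * (1 + replicas_per_primary)

# 500K users × 2 KB × 20% hot → 0.2 GB working set; one primary is plenty
print(redis_nodes_needed(500_000, 2_000))        # → (0.2, 2)

# 500M users → 200 GB working set → 20 primaries, 40 nodes with replication
print(redis_nodes_needed(500_000_000, 2_000))    # → (200.0, 40)
```

Note the nonlinearity: at small scale the answer is always "one node", so in an interview, run the numbers before proposing a cluster.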

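As an illustration of the behavior allkeys-lru targets, here is a minimal in-process LRU cache (Redis actually uses a sampled approximation of LRU rather than a strict ordering, but the effect — cold keys get evicted first — is the same):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: on overflow, evict the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the coldest entry

cache = LRUCache(2)
cache.set("user:1", "a")
cache.set("user:2", "b")
cache.get("user:1")            # touch user:1 so it is hot
cache.set("user:3", "c")       # evicts user:2, the coldest key
print(cache.get("user:2"))     # → None
```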

The Cache Hit Rate Formula and What It Tells You

Hit rate = keyspace_hits ÷ (keyspace_hits + keyspace_misses). Redis CLI: redis-cli INFO stats | grep keyspace, then compute (keyspace_hits ÷ (keyspace_hits + keyspace_misses)) × 100.

Target hit rates by use case:

  • User profile cache: > 70% (users are re-requested frequently)
  • Product catalog: > 90% (few products, many requests per product)
  • Timeline/feed: > 95% (heavily read per user per session)
  • Search autocomplete: > 80%

If hit rate is low: (1) TTL is too short — data expires before it's re-requested. (2) Key cardinality is too high — too many unique keys, each cached once. (3) The working set is too large for the cache size — increase Redis memory.
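The formula in one line of Python (the parameter names mirror the INFO stats counters; the sample numbers are hypothetical):

```python
def hit_rate(keyspace_hits, keyspace_misses):
    """Hit rate in percent, from the counters in `redis-cli INFO stats`."""
    total = keyspace_hits + keyspace_misses
    return 100.0 * keyspace_hits / total if total else 0.0

print(hit_rate(9_000_000, 1_000_000))  # → 90.0
```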

graph TB
    Size[Cache Sizing] --> WS[Working Set
= total data × 20%]
    WS --> Nodes[Redis Nodes
= WS × 2.5 ÷ 25GB × 1.5x replication]

    Keys[Key Design] --> NS[Namespace:ID:field
e.g. user:123:profile]
    Keys --> Short[Keep short
every byte costs memory]
    Keys --> HotKey[Avoid hot keys
shard if > 100K QPS on one key]

    Evict[Eviction Policy] --> LRU[allkeys-lru
best for caches]
    Evict --> noev[noeviction
only for persistent data]

    Monitor[Monitoring] --> HitRate[Hit rate > 80%]
    Monitor --> EvictRate[Eviction rate < 1%]
    Monitor --> Latency[p99 latency < 1ms]

The 4 cache design dimensions: sizing, key design, eviction policy, monitoring

How this might come up in interviews

Caching is one of the most commonly discussed topics in system design interviews because nearly every scaled system needs it. Interviewers specifically test: (1) Can you explain the caching hierarchy? (2) Do you know cache-aside vs. write-through? (3) Can you explain the cache invalidation problem? (4) Do you know what a cache stampede is and how to prevent it? (5) Can you size a cache (working set × 2.5 ÷ node capacity)? Strong candidates also mention monitoring (hit rate), eviction policy (allkeys-lru), and TTL design.

Common questions:

  • L4: "What's the difference between cache-aside and write-through caching?" [Tests: basic pattern knowledge and ability to explain trade-offs]
  • L4-L5: "You're caching user profiles with a 1-hour TTL. A user updates their profile. What problem occurs and how do you fix it?" [Tests: cache invalidation problem identification and event-driven solution]
  • L5: "What is a cache stampede and when does it occur? How do you prevent it?" [Tests: thundering herd knowledge, mutex lock or probabilistic early expiration solutions]
  • L5-L6: "Design the caching layer for Twitter's home timeline. How do you cache 300M users' personalized feeds?" [Tests: cache key design, fan-out vs pull, Redis sorted sets, working set calculation]
  • L6: "What's the difference between Redis and Memcached? When would you choose each?" [Tests: depth of knowledge on data structures, persistence, pub/sub, cluster models]
  • L6-L7: "Design a caching system that needs to handle 10 million cache keys with sub-millisecond invalidation globally across 3 regions." [Tests: CDN surrogate keys, Redis replication lag, eventual consistency of cache invalidation, global CDN purge APIs]

Key takeaways

  • Caching exists because the latency difference between RAM (100ns) and database (1-10ms) is 10,000-100,000x. At any meaningful scale, caching is structural, not optional.
  • Use cache-aside as your default read pattern. Write event-driven DEL on every write, with TTL as a safety fallback. This gives near-zero stale windows without write-through complexity.
  • "Cache invalidation is one of the two hard problems in CS" — because race conditions, missed invalidation paths, and stampedes make it genuinely hard. Always use DEL (not UPDATE) on writes to avoid race conditions.
  • Cache stampede (thundering herd) occurs when a hot cache key expires and many requests hit the DB simultaneously. Prevent with: mutex locks for high-traffic keys, probabilistic early expiration, and cache warming for predictable traffic events.
  • Choose Redis over Memcached for 95% of use cases: rich data structures, optional persistence, pub/sub for event-driven invalidation, and native cluster support. Add an in-process cache (Guava/Caffeine) as an L1 cache in front of Redis for the hottest < 1% of data.

Before you move on: can you answer these?

You're caching user profile data in Redis with a 1-hour TTL. A user updates their name. For up to 1 hour, other requests might see the old name. What are your options to fix this, and what are the trade-offs?

  • Option 1: Event-driven invalidation — on profile update, immediately DEL the cache key; the next request fetches fresh data. Pros: near-zero stale window (milliseconds). Cons: every write path must include the DEL, and if the DEL fails (Redis connection issue), stale data persists until the TTL expires.
  • Option 2: Write-through — on profile update, write the new data to both the DB and Redis. Pros: the cache always has fresh data, no staleness. Cons: every write requires a Redis call (higher write latency, and Redis becomes part of the critical write path).
  • Option 3: Reduce the TTL — use 5 minutes instead of 1 hour. Pros: simpler, no code changes. Cons: more DB load (12x more DB reads per cache entry).
  • Option 4: Real-time notification + push invalidation — publish an event when a profile updates; subscribers delete the cache key. Pros: decoupled. Cons: more complex architecture.

For most production apps: use option 1 (event-driven DEL) and keep the 1-hour TTL as a safety net for missed invalidations.
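Option 1 can be sketched with in-memory dicts standing in for Redis and the DB (`read_profile` and `update_profile` are illustrative names; in production, the `pop` would be a Redis DEL and the read path a GET):

```python
import time

cache = {}      # key -> (value, expires_at); stand-in for Redis
db = {"user:123:profile": {"name": "Ada"}}   # stand-in for the database
TTL = 3600      # safety-net TTL, seconds

def read_profile(key):
    """Cache-aside read: serve from cache if fresh, else fall through to DB."""
    entry = cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                              # cache hit
    value = db[key]                                  # miss: read from DB
    cache[key] = (value, time.monotonic() + TTL)
    return value

def update_profile(key, value):
    db[key] = value          # 1. write the source of truth
    cache.pop(key, None)     # 2. DEL the cache key -- never SET the new value
                             #    here, since a concurrent stale read can race

read_profile("user:123:profile")                     # populates the cache
update_profile("user:123:profile", {"name": "Grace"})
print(read_profile("user:123:profile"))              # → {'name': 'Grace'}
```

If the DEL in step 2 is skipped or fails, the stale entry survives only until `expires_at` — which is exactly the TTL-as-safety-net role described above.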

Your interview question is: design a caching strategy for Twitter's Home Timeline. The feed contains the last 800 tweets from accounts the user follows. It changes constantly as new tweets are posted. How do you design the cache?

The Twitter home timeline is one of the most studied caching problems. Key requirements: (1) Each feed is personalized — user_id must be part of the cache key. (2) Feeds change constantly — new tweets are written every second. (3) Most users have < 2,000 follows — manageable fan-out. (4) Celebrities (10M+ followers) can't use naive fan-out.

Design:

  • Cache key: feed:user_id (e.g., feed:12345). Cache value: an ordered list of the last 800 tweet IDs (just IDs, not full tweet data).
  • When a user posts a tweet: fan out to all followers' feed caches with LPUSH (add to front) + LTRIM (keep only 800 items). Tweet content is cached separately: tweet:tweet_id → full tweet JSON.
  • On feed read: LRANGE feed:user_id 0 99 (get 100 tweet IDs), then pipeline a GET tweet:tweet_id for each ID.
  • TTL: feed cache 7 days (kept fresh by event-driven LPUSH, so it rarely needs to expire); tweet cache 24 hours (tweets rarely change after posting).
  • Celebrity exception: for accounts with > 1M followers, skip fan-out. At read time, merge their recent tweets with the pre-computed fan-out feed.
  • Cache hit rate target: > 99% for active users. Hit rate for 30-day-inactive users is irrelevant — let those feeds expire.
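The fan-out core of this design can be sketched in a few lines. This is an in-memory illustration: `feeds`, `tweets`, `followers`, `post_tweet`, and `read_feed` are made-up names, and a bounded deque plays the role of LPUSH + LTRIM on a Redis list.

```python
from collections import deque

FEED_LIMIT = 800
feeds = {}                # "feed:<user_id>" -> deque of tweet IDs, newest first
tweets = {}               # "tweet:<tweet_id>" -> tweet JSON
followers = {1: [2, 3]}   # author 1 is followed by users 2 and 3

def post_tweet(author_id, tweet_id, body):
    """Write path: cache the tweet body, fan out the ID to follower feeds."""
    tweets[f"tweet:{tweet_id}"] = body
    for uid in followers.get(author_id, []):
        feed = feeds.setdefault(f"feed:{uid}", deque(maxlen=FEED_LIMIT))
        feed.appendleft(tweet_id)          # LPUSH + LTRIM in one step

def read_feed(user_id, count=100):
    """Read path: LRANGE the ID list, then hydrate each tweet by ID."""
    ids = list(feeds.get(f"feed:{user_id}", []))[:count]
    return [tweets[f"tweet:{tid}"] for tid in ids]   # pipelined GETs in Redis

post_tweet(1, 101, {"text": "hello"})
post_tweet(1, 102, {"text": "world"})
print(read_feed(2))  # → [{'text': 'world'}, {'text': 'hello'}]
```

Note that feeds store only IDs: tweet bodies are deduplicated across millions of feeds, and editing or deleting a tweet touches one `tweet:` key instead of every follower's list.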

Your Redis cache has a 90% hit rate, but during peak traffic, DB latency spikes. Explain why this might happen and how you diagnose and fix it.

The 10% cache miss rate at peak traffic is the culprit. At 100K QPS and 90% hit rate: 10K QPS hits the DB during normal traffic. If peak is 3x normal (300K QPS): 30K QPS hits the DB. If the DB's comfortable limit is 20K QPS, it saturates during peak. The math reveals: improving hit rate from 90% to 99% reduces DB load from 30K to 3K peak QPS — a 10x improvement without adding any DB infrastructure. Diagnosis steps: (1) redis-cli INFO stats | grep keyspace to confirm hit rate. (2) CloudWatch DB CPU and connection count to correlate DB saturation with traffic peaks. (3) Find the specific cache keys with low hit rates (Redis monitoring with CloudWatch Enhanced Monitoring for ElastiCache or keyspace notifications + sampling). Fixes by root cause: if hot data with short TTL → increase TTL. If popular queries not being cached → add caching to those code paths. If stampede during TTL expiry → add jitter. If DB itself is slow → add DB read replicas to handle the miss traffic. The goal: design for 99%+ hit rate on the hottest 20% of data.
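The arithmetic, made explicit (`db_qps` is a throwaway helper; the traffic numbers come from the scenario above):

```python
def db_qps(total_qps, hit_rate):
    """QPS that falls through the cache to the database."""
    return round(total_qps * (1 - hit_rate))

print(db_qps(100_000, 0.90))   # normal traffic at 90% hit rate → 10000
print(db_qps(300_000, 0.90))   # 3x peak at 90% hit rate → 30000
print(db_qps(300_000, 0.99))   # same peak at 99% hit rate → 3000
```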

🧠Mental Model

💡 Analogy

Caching is like keeping document copies on your desk instead of walking to the file room every time. Documents on your desk (in-process memory cache) = sub-millisecond. Documents in the filing cabinet in the same office (Redis, same datacenter) = half a second walk. Documents in the basement archive (database) = elevator ride, 5 minutes. The trick is knowing which documents to keep on your desk (hot data), how long a copy stays valid before it goes stale (TTL), and when to run back to the file room to get a fresh copy (cache invalidation). The documents on your desk have expiry dates — some go stale in an hour (news articles), some in a year (product images), some never go stale (static files).

⚡ Core Idea

Cache works by trading freshness for speed. The closer to the user, the faster (and staler) the cache. Every cache layer answers the question: "Is the performance gain worth the staleness risk for this data?" Cache-aside with TTL is the universal default; event-driven invalidation for freshness-sensitive data; write-through for critical data. The hard part is cache invalidation — use event-driven DEL on writes plus a TTL as a fallback to handle missed invalidations.

🎯 Why It Matters

At any meaningful scale, caching is not optional — it's structural. A database with 50K QPS capacity behind a service receiving 500K QPS will be destroyed without a cache. The cache hit rate determines your entire DB infrastructure cost: at 90% hit rate, your DB sees 50K QPS. At 99% hit rate, your DB sees 5K QPS — 10x less, meaning 10x cheaper DB infrastructure. The 9% difference in hit rate determines whether you need a $500/month DB instance or a $5,000/month one.
