The full caching story: browser → CDN → app → DB. Cache-aside, write-through, write-behind. Redis vs Memcached. Cache stampede and thundering herd. How to design a cache that actually works in production.
Caching exists because of a fundamental law of physics: accessing data from memory is orders of magnitude faster than accessing it from disk or over a network. The latency table that every FAANG engineer has memorized:
| Storage Level | Latency | Relative Speed | Technology Examples |
|---|---|---|---|
| CPU L1 cache | 1 nanosecond | 1x (baseline) | On-core CPU cache |
| CPU L2 cache | 4 nanoseconds | 4x slower | On-die CPU cache |
| RAM (in-process memory) | 100 nanoseconds | 100x slower | Application in-memory cache (Guava, Caffeine) |
| NVMe SSD | 100 microseconds | 100,000x slower | Local SSD storage |
| Redis (same datacenter) | 500 microseconds | 500,000x slower | Remote in-memory store |
| HDD (mechanical disk) | 10 milliseconds | 10,000,000x slower | Traditional spinning disk |
| Database query (with index) | 1-10 milliseconds | 1-10M slower | PostgreSQL, MySQL |
| Database query (no index) | 100-1000 milliseconds | 100M-1B slower | Full table scan |
| Cross-datacenter (US → EU) | 150 milliseconds | 150M slower | Multi-region replication |
The implication is striking: a database query (1ms) is 10,000 times slower than a RAM access (100ns). Redis (0.5ms) is still 5,000 times slower than in-process memory. This means putting data at the right cache level can reduce latency by 1,000-100,000x.
The Practical Impact of the Latency Table
A service doing 10,000 DB queries/sec at 5ms each accumulates ~50 server-seconds of wait time every second. If 90% of those queries are cacheable and you add Redis (0.5ms), wait time drops to 9,000 × 0.5ms + 1,000 × 5ms ≈ 9.5 server-seconds — a roughly 5x improvement. This means you can serve the same load with ~5x fewer servers, or ~5x more traffic on the same infrastructure. Caching is the single highest-ROI performance optimization in most web systems.
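The arithmetic above generalizes into a one-line formula. A quick back-of-the-envelope sketch (Python); the request rates and latencies are the example's assumptions, not measurements:

```python
def wait_seconds_per_sec(qps, hit_rate, cache_ms, db_ms):
    # Server-seconds of wait accumulated each second:
    # cache hits pay cache latency, misses pay database latency.
    hits = qps * hit_rate
    misses = qps * (1 - hit_rate)
    return (hits * cache_ms + misses * db_ms) / 1000.0

print(wait_seconds_per_sec(10_000, 0.0, 0.5, 5.0))  # no cache: 50.0
print(wait_seconds_per_sec(10_000, 0.9, 0.5, 5.0))  # 90% hit rate: 9.5
```

Raising the hit rate is the lever: the same formula at a 99% hit rate yields ~5.45 server-seconds.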
But caching introduces new problems: stale data, cache invalidation complexity, cold start (the "thundering herd"), and memory constraints. Understanding when each problem occurs — and how to prevent it — is what separates a naive cache implementation from a production-grade one.
The 4 Core Problems Caching Solves (and Introduces)
There isn't one cache — there's a hierarchy of caches, each progressively closer to the user. Data flows from the source of truth (database) through multiple cache layers before reaching the user. Each layer has different characteristics: location, capacity, latency, and freshness.
graph LR
U[User Browser] -->|Cache-Control header
max-age=3600| BC[Browser Cache
Local storage
0ms, stalest]
BC -->|Cache miss| CDN[CDN Edge Node
1-50ms from user
GB capacity]
CDN -->|Cache miss| RP[Reverse Proxy Cache
Nginx/Varnish
at datacenter]
RP -->|Cache miss| AC[App Cache
Redis/Memcached
0.5ms, TB capacity]
AC -->|Cache miss| DB[(Database
Source of Truth
1-10ms, any size, freshest)]
style U fill:#4dabf7
style BC fill:#74c0fc
style CDN fill:#a5d8ff
style RP fill:#d0ebff
style AC fill:#e7f5ff
style DB fill:#fff

The caching hierarchy: closer to user = faster but less fresh. Further from user = slower but more fresh.
| Cache Layer | Location | Latency | Capacity | Who Controls Invalidation | TTL Range |
|---|---|---|---|---|---|
| Browser cache | User's device | 0ms (local) | 50-500MB | HTTP Cache-Control headers | Minutes to weeks |
| CDN (CloudFront, Akamai) | Edge PoP, 10-50ms from user | 5-50ms | Effectively unlimited | Cache-Control, CDN API invalidation | Minutes to years |
| Reverse proxy (Nginx, Varnish) | Datacenter edge | < 1ms | 10-100GB per node | Cache-Control, explicit purge | Seconds to minutes |
| Application cache (Redis) | Same datacenter as app | 0.5-2ms | 25 GB - 100 TB (cluster) | Application code explicitly | Seconds to hours |
| DB buffer pool | Database server RAM | < 0.1ms | % of DB instance RAM | Automatic (LRU by DB) | LRU-managed, no TTL |
HTTP Cache-Control: The Most Underused Cache Tool
Browser and CDN caches are controlled by HTTP headers. The most important: Cache-Control: max-age=3600, public — tells both browser AND CDN to cache for 1 hour. Cache-Control: max-age=0, private — tells CDN not to cache (user-specific), browser can cache. Cache-Control: no-store — nothing anywhere should cache this response. ETag: "abc123" — version identifier; browser sends If-None-Match header on next request; if same, server returns 304 Not Modified (no body — saves bandwidth). Most engineers don't set these headers, leaving cache behavior to defaults. Setting them correctly eliminates 50-80% of CDN origin requests for static assets.
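These header policies can live in one small helper in application code. A minimal sketch — the function name and content-type categories are illustrative, not a real framework API:

```python
def cache_headers(content_type, etag=None):
    """Pick Cache-Control headers by content category (illustrative policy)."""
    policies = {
        "static_asset": "public, max-age=31536000, immutable",  # hashed filenames
        "public_api":   "public, max-age=60, s-maxage=300",     # browser 60s, CDN 5min
        "private_api":  "private, max-age=0",                   # CDN must not cache
        "no_cache":     "no-store",                             # never cache anywhere
    }
    headers = {"Cache-Control": policies[content_type]}
    if etag:
        headers["ETag"] = etag  # enables If-None-Match / 304 revalidation
    return headers
```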
The key insight about the hierarchy: when a request arrives at the browser, it checks its local cache first. Cache hit: sub-millisecond, doesn't even reach the network. Cache miss: request goes to CDN. CDN cache hit: 5-50ms, doesn't reach your servers. CDN cache miss: request reaches your origin, hits your reverse proxy, then Redis, then DB. Every layer hit saves the latency and compute of all subsequent layers. A CDN cache hit at 20ms saves you 0.5ms Redis + 5ms DB + network latency + server CPU — a total savings of > 50ms.
Cache-aside is the most common caching pattern and should be your default. The application code explicitly manages the cache: before reading data, check the cache. On a hit, return from cache. On a miss, read from database, populate the cache, then return the data.
Cache-Aside Pattern: Read Path and Write Path
1. Read path (GET /user/123): Check Redis: GET user:123. If cache hit (< 1ms): return cached data immediately. If cache miss: query database: SELECT * FROM users WHERE id = 123 (5ms). Write result to Redis: SET user:123 {JSON} EX 3600 (1 hour TTL). Return data to client.
2. Write path (PUT /user/123): Update database: UPDATE users SET name=... WHERE id = 123. Delete cache entry: DEL user:123. (Alternative: update cache entry immediately, but delete is safer — avoids stale write race condition). Next read will populate the cache with fresh data.
3. Cache miss handling: On first request after cache eviction or cold start, the cache miss causes a database read. This is the cost of freshness. The TTL controls how often this happens: a 1-hour TTL means at most one DB read per cache key per hour.
4. Error handling: What if Redis is down? Cache-aside degrades gracefully: if Redis GET fails, fall through to database. This is the key advantage — the cache is not required for correctness, only for performance. Your system slows down when cache fails, but doesn't break.
Cache-Aside Pseudocode
function getUser(userId):
    cacheKey = "user:" + userId
    value = redis.get(cacheKey)
    if value:
        return JSON.parse(value)   // cache hit
    user = db.query("SELECT * FROM users WHERE id = ?", userId)
    if user:
        redis.setex(cacheKey, 3600, JSON.stringify(user))   // cache for 1 hour
    return user
// Result: the DB is only hit on a cache miss — at most once per key per TTL window
// (here, once per user per hour).
sequenceDiagram
participant C as Client
participant App as App Server
participant Redis as Redis Cache
participant DB as Database
C->>App: GET /user/123
App->>Redis: GET user:123
alt Cache Hit
Redis-->>App: {name: "Alice", ...} (0.5ms)
App-->>C: 200 OK (total: ~2ms)
else Cache Miss
Redis-->>App: nil
App->>DB: SELECT * FROM users WHERE id=123 (5ms)
DB-->>App: {name: "Alice", ...}
App->>Redis: SET user:123 {JSON} EX 3600
App-->>C: 200 OK (total: ~8ms)
end

Cache-aside: cache hit serves in ~2ms. Cache miss falls through to DB at ~8ms and populates cache.
Cache-Aside: Pros and Cons
Cache-aside handles reads. But what about writes? There are three write strategies, each with different consistency and performance trade-offs.
The Three Write Caching Patterns at a Glance
Write-through: write to cache AND database synchronously. Always fresh, slower writes. Write-behind (write-back): write to cache first, async flush to database. Fast writes, risk of data loss. Write-around: skip cache entirely on write, write directly to DB. Cache remains clean, but reads after writes will be cache misses.
| Pattern | Write Path | Read Path | Consistency | Write Latency | Data Loss Risk |
|---|---|---|---|---|---|
| Write-Through | Write cache + DB synchronously | Cache always has latest data | Strong | High (both writes must complete) | None (DB always has data) |
| Write-Behind | Write cache, async DB flush | Cache has latest, DB may lag | Eventual | Low (only cache write) | Risk if cache fails before flush |
| Write-Around | Write directly to DB, skip cache | First read = DB (cache miss), subsequent = cache | Strong (DB = truth) | Medium (DB write) | None |
| Cache-Aside + Delete | Write DB, then delete cache key | Next read repopulates cache | Slightly eventual (brief stale window) | Low (delete is fast) | None |
When to Use Each Pattern
The Write Race Condition: Why Order Matters
Dangerous pattern: write DB, then UPDATE cache (not delete). Two concurrent writers can cause: Writer A: DB update → Writer B: DB update → Writer B: cache update (with B's value) → Writer A: cache update (with A's old value, OVERWRITING B's correct value). Cache now has stale data indefinitely. Fix: always DELETE cache on write, never UPDATE. The next read will populate with fresh data. This "delete on write" pattern avoids race conditions entirely.
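The delete-on-write rule is small enough to show directly. A minimal sketch, with plain dicts standing in for the database and Redis — in production the two lines would be a SQL UPDATE followed by a Redis DEL:

```python
def update_user(db, cache, user_id, name):
    """Cache-aside write path: update the source of truth first, then
    DELETE the cache key. Never write the new value into the cache here --
    two concurrent updaters could leave a stale value there forever."""
    db[user_id] = name                   # stands in for UPDATE users SET name=...
    cache.pop(f"user:{user_id}", None)   # stands in for DEL user:{id}
```

The next read finds a cache miss and repopulates the key from the database.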
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. This quote is famous because cache invalidation is genuinely hard. Getting it wrong means users see stale data or, worse, incorrect data.
Cache invalidation is the process of removing or updating cached data when the underlying source data changes. There are four strategies, each appropriate for different scenarios.
The 4 Cache Invalidation Strategies
The Cache Invalidation Race Condition
Here's a subtle bug that causes stale data even with event-driven invalidation: T=0: Reader reads cache (miss). T=1: Writer updates DB. T=2: Writer deletes cache key. T=3: Reader writes OLD data to cache (read was started at T=0, before the update). Now cache has stale data until TTL expires. Solution: use the "cache-aside + delete-on-write" pattern consistently. The window where stale data is cached is bounded by the time between your DB read and cache write (typically milliseconds). For critical data where this matters, use write-through or distributed locking.
sequenceDiagram
participant W as Writer
participant DB as Database
participant R as Redis
participant C as Reader
W->>DB: UPDATE users SET name="Bob" WHERE id=123
DB-->>W: OK
W->>R: DEL user:123
R-->>W: OK (key deleted)
C->>R: GET user:123
R-->>C: nil (key was deleted)
C->>DB: SELECT * FROM users WHERE id=123
DB-->>C: {name: "Bob"} ← fresh data
C->>R: SET user:123 {name:"Bob"} EX 3600
Note over W,C: Event-driven invalidation: stale window is
only the time between DB write and DEL
(typically < 1ms in practice)

Event-driven invalidation: delete cache key on write, next read fetches fresh data from DB
| Strategy | Freshness | Complexity | Best TTL | Use Case |
|---|---|---|---|---|
| TTL only | Stale by up to TTL | Low | 1 min - 1 hour | Product catalog, news articles, any data where "recent" is OK |
| Event-driven delete | Near-instant (< 1ms stale window) | Medium | 24 hours (fallback) | User profiles, account settings, anything user-visible |
| Cache tags | Instant (bulk invalidation) | High | Unlimited (tag-controlled) | CDN content, pages with complex dependencies |
| Version keys | Instant (old key abandoned) | Medium | Short (prevent bloat) | Static assets, CSS/JS files (cache busting) |
Redis and Memcached are both in-memory key-value stores, but they have very different capabilities. Choosing the wrong one can limit your architecture or add unnecessary complexity.
| Dimension | Redis | Memcached |
|---|---|---|
| Data structures | Strings, Hashes, Lists, Sets, Sorted Sets, HyperLogLog, Streams, Bitmaps, Geospatial | Strings only (arbitrary byte arrays) |
| Persistence | RDB snapshots + AOF (append-only file). Survives restarts. | Memory only. All data lost on restart. |
| Replication | Built-in master-replica replication. Redis Cluster (automatic sharding). | No built-in replication. Third-party tools needed (mcrouter). |
| Pub/Sub messaging | Native PUBLISH/SUBSCRIBE commands. Used by Socket.io, chat systems. | Not supported. |
| Atomic operations | INCR, LPUSH, SADD, ZADD, etc. — all atomic. MULTI/EXEC transactions. | Only CAS (compare-and-swap). Limited atomic ops. |
| Lua scripting | Run Lua scripts server-side (atomic execution) | Not supported. |
| Memory efficiency | Higher overhead per key (data structure metadata) | Lower overhead per key (simpler storage format) |
| Throughput | ~1M ops/sec single node | ~1.5M ops/sec single node (simpler = faster) |
| Horizontal scaling | Redis Cluster: native sharding up to 1000 nodes | Client-side hashing (consistent hashing via client libraries) |
When to Choose Redis vs. Memcached
Choose Redis when: you need data persistence (cache that survives restarts), you need pub/sub messaging, you need sorted sets for leaderboards or rate limiting, you need atomic operations beyond simple CAS, or you want a single technology for both caching AND session storage AND pub/sub. Choose Memcached when: you only need a simple key-value cache with no persistence, you're optimizing for raw throughput (1.5M ops/sec vs 1M), you have a team already expert in Memcached, or you need multi-threading (Memcached is multi-threaded, Redis single-threaded per shard). In practice: Redis is the default choice for 90%+ of use cases. Memcached is typically only chosen for legacy reasons or extreme simplicity requirements.
Redis Data Structures and When to Use Each
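As one illustration, the Sorted Set is the structure behind leaderboards and rate limiting. The sketch below is a pure-Python stand-in for ZADD/ZREVRANGE semantics — in Redis these would be `ZADD` and `ZREVRANGE` commands against a live server, not this model class:

```python
class SortedSetModel:
    """Pure-Python model of a Redis Sorted Set, illustrating why ZSETs
    fit leaderboards: members stay ordered by score, so 'top N' is a
    cheap range read instead of a full sort-per-request."""
    def __init__(self):
        self.scores = {}

    def zadd(self, member, score):
        self.scores[member] = score          # ZADD key score member

    def zrevrange(self, start, stop):
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[start:stop + 1]        # Redis ranges are inclusive

board = SortedSetModel()
board.zadd("alice", 120)
board.zadd("bob", 95)
board.zadd("carol", 150)
print(board.zrevrange(0, 1))  # top 2: ['carol', 'alice']
```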
A CDN (Content Delivery Network) is a globally distributed cache. Requests are routed to the nearest "Point of Presence" (PoP) instead of your origin server. If the CDN has the content cached, it serves it immediately from 5-50ms away. If not, it fetches from origin, caches, and serves future requests from the edge.
Major CDN Providers and Their Strengths
Cloudflare: 285+ PoPs worldwide, DDoS protection built-in, WAF, free tier available. AWS CloudFront: tight AWS integration, S3 origin, Lambda@Edge for edge computing, pricing: $0.0085-$0.0120/GB depending on region. Fastly: real-time cache invalidation (< 150ms global purge), Varnish-based (highly configurable). Akamai: oldest CDN, enterprise-focused, 4,000+ PoPs, used by major media companies. For most applications on AWS: CloudFront. For edge computing requirements: Cloudflare Workers.
CDN Cache-Control Strategy for Different Content Types
1. Static assets (JS, CSS, images): Cache-Control: public, max-age=31536000, immutable. One-year TTL. Use a content-based hash in the filename for cache busting (style.abc123.css). The file never changes, so cache it forever; the immutable hint tells the browser not to revalidate even on refresh.
2. API responses (public data): Cache-Control: public, max-age=60, s-maxage=300. Browser caches 60 seconds; CDN caches 5 minutes. Acceptable staleness for product catalogs and news feeds. s-maxage applies to shared caches (CDN); max-age applies to the browser.
3. User-specific API responses: Cache-Control: private, max-age=0. The browser may cache for back/forward navigation; the CDN must NOT cache. Never cache user-specific data (account info, personalized recommendations) at the CDN level — the wrong user would see the wrong data.
4. Dynamic HTML pages: Cache-Control: no-cache (must revalidate with origin). Or: Cache-Control: public, max-age=60 with an ETag. If the ETag is unchanged, the CDN returns 304 Not Modified (no body), reducing bandwidth while ensuring freshness.
5. Streaming content: Cache-Control: no-store. Live video, real-time data, WebSockets — never cacheable. Bypass the CDN entirely for these paths.
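The ETag revalidation flow for dynamic pages can be sketched as a tiny framework-free handler — the function shape is illustrative, not a real server API:

```python
def respond(request_headers, body, etag):
    """Sketch of ETag revalidation: if the client's If-None-Match matches
    the current ETag, return 304 with no body; otherwise return 200 with
    the full body and the ETag for the client's next conditional request."""
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""      # no body -- saves bandwidth
    return 200, {"ETag": etag}, body
```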
CDN Cost Math: Why Getting Cache-Control Right Saves Real Money
Scenario: 10 million page views/day. Each page loads 50 assets (JS, CSS, images) averaging 100KB each. Without CDN: 10M × 50 × 100KB = 50TB/day of origin egress. At AWS egress pricing of $0.09/GB: $4,608/day = $1.68M/year. With CDN at 95% cache hit rate: only 5% hits origin = 2.5TB/day of origin egress = $230/day = $84,000/year. Savings: $1.6M/year. CDN cost at $0.01/GB × 50TB/day at edge: $500/day = $182,500/year. Net savings: $1.6M - $182.5K = $1.42M/year. Getting Cache-Control headers right (to maximize hit rate) is worth significant engineering investment.
The cache stampede (also called thundering herd) is one of the most dangerous failure modes in cached systems. It happens when many requests arrive simultaneously for data that is not in the cache — and all of them go to the database at once.
Scenario: a popular product page is cached. At 11:00 PM, the TTL expires. In the next 100ms, 10,000 requests arrive for this page (it's a flash sale countdown). All 10,000 miss the cache. All 10,000 hit the database simultaneously. The database, which normally handles 1,000 QPS, suddenly receives 100,000 QPS. It falls over. The website is down during the flash sale.
Twitter 2012: Cache Stampede on Trending Topics (Real Incident)
Twitter's trending topics caused a cache stampede in 2012. When a topic started trending, the cache key for that topic expired (or didn't exist yet) simultaneously across millions of users scrolling their timelines. All those users' requests queried the database for the same trending data at once — far more load than the DB could handle. The cascade: DB overloaded → query timeouts → timeline load failures → 503 errors → users retrying → more DB load. A 3-hour partial outage resulted. Post-mortem fix: probabilistic early expiration + prewarming. Twitter now regenerates trending cache before TTL expires, and uses mutex locks to prevent stampedes.
Three Solutions to Cache Stampede
sequenceDiagram
participant R1 as Request 1
participant R2 as Request 2 (concurrent)
participant Redis as Redis
participant DB as Database
R1->>Redis: GET product:123
Redis-->>R1: nil (cache miss)
R1->>Redis: SET product:123:lock 1 NX EX 5
Redis-->>R1: OK (lock acquired)
R2->>Redis: GET product:123
Redis-->>R2: nil (cache miss)
R2->>Redis: SET product:123:lock 1 NX EX 5
Redis-->>R2: FAIL (lock already held)
R2->>R2: Wait 100ms, retry...
R1->>DB: SELECT * FROM products WHERE id=123
DB-->>R1: {name: "Widget", price: 29.99}
R1->>Redis: SET product:123 {JSON} EX 3600
R1->>Redis: DEL product:123:lock
R1-->>R1: Return to client
R2->>Redis: GET product:123
Redis-->>R2: {name: "Widget"} (cache hit!)
R2-->>R2: Return to client

Mutex pattern: first request acquires lock, populates cache. Others wait, then see cache hit.
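The mutex flow can be sketched in Python. Here a plain dict stands in for Redis, so the lock acquire is not truly atomic — in real Redis, atomicity comes from SET key NX EX, as in the diagram:

```python
import time

def get_with_lock(cache, key, load_from_db, lock_wait=0.05):
    """Cache stampede guard (mutex pattern). A dict stands in for Redis:
    the lock acquire models SET key:lock 1 NX EX 5, and losing requests
    sleep briefly and retry until the winner has repopulated the key."""
    while True:
        value = cache.get(key)
        if value is not None:
            return value                      # cache hit
        lock_key = key + ":lock"
        if lock_key not in cache:             # models SET ... NX (won the lock)
            cache[lock_key] = 1
            try:
                value = load_from_db()        # only ONE request hits the DB
                cache[key] = value            # models SET key value EX 3600
                return value
            finally:
                cache.pop(lock_key, None)     # release the lock (DEL)
        time.sleep(lock_wait)                 # another request is rebuilding
```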
When cache design comes up in a FAANG interview, interviewers expect you to address four specific topics: cache sizing (how big?), key design (how do you structure keys?), eviction policy (what happens when cache is full?), and monitoring (how do you know it's working?). A complete answer covers all four.
The 4 Cache Design Dimensions
1. Cache sizing: Calculate working set = total dataset × hot_fraction (typically 20%). Multiply by serialization overhead (~2.5x for Redis key + value + metadata). Divide by per-node capacity (25-50 GB per Redis node): nodes_needed = (working_set × 2.5) ÷ 25 GB. Then add replicas — one replica per primary doubles the count. Example: 500K users × 2KB profile × 20% hot = 200 MB working set. 200 MB × 2.5 ÷ 25 GB = 0.02 → one node easily. At 500M users: 200 GB working set → 200 × 2.5 ÷ 25 = 20 nodes (40 with replication).
2. Cache key design: Keys must be unique, predictable, and invalidatable. Use namespaced keys: "entity:id:attribute". Examples: user:123:profile, product:456:details, feed:789:page:1. Include a version if using version-based invalidation: user:123:profile:v7. Keep keys short — every byte matters in Redis memory. Avoid hot keys: if one key gets 1M QPS (Redis single-thread bottleneck), spread it across 10 keys with modulo.
3. Eviction policy: What happens when Redis runs out of memory? Policies: allkeys-lru (evict least recently used — best for caching). volatile-lru (only evict keys with TTL — if no TTL set, OOM error). allkeys-random (random eviction — bad). noeviction (return error when full — fine for persistent data, terrible for a cache). Default recommendation: allkeys-lru. This automatically evicts cold data to make room for hot data.
4. Monitoring metrics: Cache hit rate (hits / (hits + misses)) — target > 80% for most use cases, > 95% for feed/timeline caches. Eviction rate — if high, cache is undersized. Connection count — if maxed out, you need connection pooling. Memory fragmentation ratio — Redis INFO memory shows mem_fragmentation_ratio; > 1.5 means fragmentation waste, restart or defrag. Latency percentiles — Redis should be < 1ms p99. Anything higher indicates memory swapping or CPU saturation.
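The sizing arithmetic in step 1 can be wrapped in a small calculator. The 20% hot fraction, 2.5x overhead, and 25 GB node capacity are the lesson's rules of thumb, not universal constants:

```python
import math

def redis_nodes_needed(total_items, item_bytes, hot_fraction=0.20,
                       overhead=2.5, node_capacity_gb=25, replication=2):
    """Back-of-the-envelope cache sizing: working set -> primary shard
    count -> total nodes including one replica per primary."""
    working_set_gb = total_items * item_bytes * hot_fraction / 1e9
    primaries = max(1, math.ceil(working_set_gb * overhead / node_capacity_gb))
    return primaries * replication  # primary + replica per shard

print(redis_nodes_needed(500_000, 2_000))      # small site: 2 (1 primary + 1 replica)
print(redis_nodes_needed(500_000_000, 2_000))  # 500M users: 40
```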
The Cache Hit Rate Formula and What It Tells You
Hit rate = redis_cache_hits / (redis_cache_hits + redis_cache_misses). Redis CLI: redis-cli INFO stats | grep keyspace. Calculate: (keyspace_hits / (keyspace_hits + keyspace_misses)) × 100. Target hit rates by use case: User profile cache: > 70% (users re-requested frequently). Product catalog: > 90% (few products, many requests per product). Timeline/feed: > 95% (heavily read per user per session). Search autocomplete: > 80%. If hit rate is low: (1) TTL is too short — data expires before it's re-requested. (2) Key cardinality is too high — too many unique keys, each cached once. (3) Working set is too large for cache size — increase Redis memory.
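The formula as code, fed with the two counters from redis-cli INFO stats (the sample counts below are made up):

```python
def hit_rate(keyspace_hits, keyspace_misses):
    """Cache hit rate, as a percentage, from Redis INFO stats counters."""
    total = keyspace_hits + keyspace_misses
    return 100.0 * keyspace_hits / total if total else 0.0

print(hit_rate(930_000, 70_000))  # 93.0
```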
graph TB
Size[Cache Sizing] --> WS[Working Set
= total data × 20%]
WS --> Nodes[Redis Nodes
= WS × 2.5 ÷ 25GB × 1.5x replication]
Keys[Key Design] --> NS[Namespace:ID:field
e.g. user:123:profile]
Keys --> Short[Keep short
every byte costs memory]
Keys --> HotKey[Avoid hot keys
shard if > 100K QPS on one key]
Evict[Eviction Policy] --> LRU[allkeys-lru
best for caches]
Evict --> noev[noeviction
only for persistent data]
Monitor[Monitoring] --> HitRate[Hit rate > 80%]
Monitor --> EvictRate[Eviction rate < 1%]
Monitor --> Latency[p99 latency < 1ms]

The 4 cache design dimensions: sizing, key design, eviction policy, monitoring
Caching is the most commonly discussed topic in system design interviews because it's required in every scaled system. Interviewers specifically test: (1) Can you explain the caching hierarchy? (2) Do you know cache-aside vs. write-through? (3) Can you explain the cache invalidation problem? (4) Do you know what a cache stampede is and how to prevent it? (5) Can you size a cache (working set × 2.5 / node capacity)? Strong candidates also mention monitoring (hit rate), eviction policy (allkeys-lru), and TTL design.
Common questions:
Key takeaways
You're caching user profile data in Redis with a 1-hour TTL. A user updates their name. For up to 1 hour, other requests might see the old name. What are your options to fix this, and what are the trade-offs?
Option 1: Event-driven invalidation — on profile update, immediately DEL the cache key; the next request fetches fresh data. Pros: near-zero stale window (milliseconds). Cons: every write path must include the DEL; if the DEL fails (Redis connection issue), stale data persists until TTL.
Option 2: Write-through — on profile update, write the new data to both DB and Redis. Pros: cache always has fresh data, no staleness. Cons: every write requires a Redis call (higher write latency, Redis becomes a critical path).
Option 3: Reduce TTL — use 5 minutes instead of 1 hour. Pros: simpler, no code changes. Cons: more DB load (12x more DB reads per cache entry).
Option 4: Real-time notification + push invalidation — publish an event when the profile updates; subscribers delete the cache key. Pros: decoupled. Cons: complex architecture.
For most production apps: use option 1 (event-driven DEL) and keep the 1-hour TTL as a safety net for missed invalidations.
Your interview question is: design a caching strategy for Twitter's Home Timeline. The feed contains the last 800 tweets from accounts the user follows. It changes constantly as new tweets are posted. How do you design the cache?
The Twitter home timeline is one of the most studied caching problems. Key requirements: (1) each feed is personalized — user_id must be part of the cache key; (2) feeds change constantly — new tweets are written every second; (3) most users have < 2,000 follows — manageable fan-out; (4) celebrities (10M+ followers) can't use naive fan-out.
Design: cache key feed:user_id (e.g., feed:12345); cache value: an ordered list of the last 800 tweet IDs (just IDs, not full tweet data). When a user posts a tweet: fan out to all followers' feed caches using LPUSH (add to front) + LTRIM (keep only 800 items). Tweet content is cached separately: tweet:tweet_id → full tweet JSON. On feed read: LRANGE feed:user_id 0 99 (get 100 tweet IDs), then pipeline GET tweet:tweet_id for each ID.
TTLs: feed cache 7 days (kept fresh by event-driven LPUSH, so expiry is only a safety net); tweet cache 24 hours (tweets rarely change after posting).
Celebrity exception: for accounts with > 1M followers, skip fan-out. At read time, merge their recent tweets with the pre-computed fan-out feed.
Cache hit rate target: > 99% for active users. The hit rate for 30-day-inactive users is irrelevant — let those feeds expire.
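The fan-out write and paged read can be sketched with a bounded deque standing in for the Redis LPUSH + LTRIM pair — an illustrative model, not a Redis client:

```python
from collections import deque

FEED_CAP = 800  # keep the last 800 tweet IDs per feed, as in the design above

feeds = {}  # user_id -> deque of tweet IDs (stands in for feed:{user_id} lists)

def fan_out(tweet_id, follower_ids):
    """On tweet post: LPUSH the tweet ID onto each follower's feed list,
    then LTRIM to the newest 800. A deque with maxlen models that pair:
    appendleft adds to the front and silently drops the oldest entry."""
    for uid in follower_ids:
        feed = feeds.setdefault(uid, deque(maxlen=FEED_CAP))
        feed.appendleft(tweet_id)            # LPUSH feed:{uid} tweet_id

def read_feed(user_id, page_size=100):
    """On feed read: LRANGE feed:{user_id} 0 page_size-1 -> newest IDs first.
    Full tweet bodies would then be fetched via pipelined GET tweet:{id}."""
    return list(feeds.get(user_id, []))[:page_size]
```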
Your Redis cache has a 90% hit rate, but during peak traffic, DB latency spikes. Explain why this might happen and how you diagnose and fix it.
The 10% cache miss rate at peak traffic is the culprit. At 100K QPS and 90% hit rate: 10K QPS hits the DB during normal traffic. If peak is 3x normal (300K QPS): 30K QPS hits the DB. If the DB's comfortable limit is 20K QPS, it saturates during peak. The math reveals: improving hit rate from 90% to 99% reduces DB load from 30K to 3K peak QPS — a 10x improvement without adding any DB infrastructure. Diagnosis steps: (1) redis-cli INFO stats | grep keyspace to confirm hit rate. (2) CloudWatch DB CPU and connection count to correlate DB saturation with traffic peaks. (3) Find the specific cache keys with low hit rates (Redis monitoring with CloudWatch Enhanced Monitoring for ElastiCache or keyspace notifications + sampling). Fixes by root cause: if hot data with short TTL → increase TTL. If popular queries not being cached → add caching to those code paths. If stampede during TTL expiry → add jitter. If DB itself is slow → add DB read replicas to handle the miss traffic. The goal: design for 99%+ hit rate on the hottest 20% of data.
💡 Analogy
Caching is like keeping document copies on your desk instead of walking to the file room every time. Documents on your desk (in-process memory cache) = sub-millisecond. Documents in the filing cabinet in the same office (Redis, same datacenter) = half a second walk. Documents in the basement archive (database) = elevator ride, 5 minutes. The trick is knowing which documents to keep on your desk (hot data), how long a copy stays valid before it goes stale (TTL), and when to run back to the file room to get a fresh copy (cache invalidation). The documents on your desk have expiry dates — some go stale in an hour (news articles), some in a year (product images), some never go stale (static files).
⚡ Core Idea
Cache works by trading freshness for speed. The closer to the user, the faster (and staler) the cache. Every cache layer answers the question: "Is the performance gain worth the staleness risk for this data?" Cache-aside with TTL is the universal default; event-driven invalidation for freshness-sensitive data; write-through for critical data. The hard part is cache invalidation — use event-driven DEL on writes plus a TTL as a fallback to handle missed invalidations.
🎯 Why It Matters
At any meaningful scale, caching is not optional — it's structural. A database with 50K QPS capacity behind a service receiving 500K QPS will be destroyed without a cache. The cache hit rate determines your entire DB infrastructure cost: at 90% hit rate, your DB sees 50K QPS. At 99% hit rate, your DB sees 5K QPS — 10x less, meaning 10x cheaper DB infrastructure. The 9% difference in hit rate determines whether you need a $500/month DB instance or a $5,000/month one.