Every system design uses the same dozen primitives: databases, caches, queues, load balancers, CDNs, proxies, object stores, coordination services, search engines, and monitoring stacks. Master what each one does, when to reach for it, and how they fail — and you can design any system.
Lesson outline
System design is not about inventing new components — it is about assembling known primitives in the right order for your specific constraints. Every system at FAANG is built from the same ~12 building blocks. The difference between a junior and senior engineer is not knowing more components — it is knowing which combination to use and why.
Think of it like Lego. A child sees individual bricks. An architect sees structural patterns: "I need a load-bearing wall here, a window there, a roof truss." Similarly, when a senior engineer hears "Design Instagram," they immediately think: "Object storage for photos, CDN for delivery, Redis for feed cache, Postgres for metadata, Kafka for async processing." They are not designing from scratch — they are selecting from a toolbox.
The Interview Signal
When interviewers ask system design questions, they are testing whether you know these building blocks and can justify why you chose each one. Saying "we need a database" is L4. Saying "we need Postgres for user metadata because we need strong consistency and complex queries, but Cassandra for the activity feed because it is write-heavy and we can tolerate eventual consistency" is L6.
```mermaid
graph TB
    Client[Client Browser/App] --> CDN[CDN - Static Assets]
    Client --> LB[Load Balancer]
    LB --> API[API Servers - Stateless]
    API --> Cache[Cache - Redis/Memcached]
    API --> DB[(Primary Database)]
    API --> Queue[Message Queue]
    API --> Search[Search Engine]
    API --> ObjStore[Object Storage - S3]
    Cache --> DB
    Queue --> Workers[Background Workers]
    Workers --> DB
    Workers --> ObjStore
    DB --> Replica[(Read Replicas)]
    API --> Coord[Coordination Service - ZooKeeper]
    API --> Monitor[Monitoring + Alerting]
```

The standard building blocks of a distributed system — every FAANG design uses some combination of these.
In this lesson, we will deep-dive each building block: what it does, when to use it, how it fails, and what real-world system uses it. By the end, you will be able to look at any system design question and immediately identify which blocks you need.
The database is the heart of every system. Your first architectural decision is almost always: SQL or NoSQL? This choice cascades into everything else — your data model, consistency guarantees, scaling strategy, and operational complexity.
| Characteristic | SQL (Postgres, MySQL) | NoSQL Document (MongoDB) | NoSQL Wide-Column (Cassandra) | NoSQL Key-Value (DynamoDB) |
|---|---|---|---|---|
| Data Model | Tables with rows and columns, strict schema | Flexible JSON documents, schema-optional | Rows with dynamic columns, partition key required | Simple key-value pairs, no relations |
| Consistency | Strong (ACID transactions) | Configurable (eventual to strong) | Tunable (eventual to quorum) | Eventual by default, strong optional |
| Scaling | Vertical first, horizontal via sharding or read replicas | Horizontal via sharding | Linear horizontal scaling by design | Automatic horizontal, fully managed |
| Query Power | Full SQL: JOINs, subqueries, aggregations | Rich queries on documents, no JOINs | Limited: partition key + clustering column only | Primary key lookup only, GSIs for secondary |
| Best For | Complex relationships, transactions, analytics | Rapid iteration, semi-structured data | Write-heavy, time-series, event logs | Session stores, user profiles, simple lookups |
| FAANG Example | Instagram user/metadata in Postgres | eBay product catalog in MongoDB | Netflix viewing history in Cassandra | Amazon shopping cart in DynamoDB |
The Decision Framework for Databases
Ask three questions: (1) Do I need JOINs or complex queries? → SQL. (2) Is my write throughput > 50K QPS? → NoSQL wide-column. (3) Do I need sub-millisecond latency on simple lookups? → NoSQL key-value. Most FAANG systems use multiple databases: SQL for metadata, NoSQL for high-throughput data, and a search engine for full-text queries.
Replication and sharding are critical for databases at scale. Replication creates copies on multiple servers for read throughput and durability. Sharding splits data across servers by a partition key for write throughput. The key insight: replication is easy, sharding is hard. Always exhaust vertical scaling and read replicas before sharding.
Database Scaling Progression
1. Single node: handles up to ~5K write QPS and 10TB. Good for most startups. Use Postgres.
2. Read replicas: 1 primary + N replicas. Multiplies read throughput by N. Still one write node.
3. Connection pooling (PgBouncer): reduces connection overhead, squeezes 2-3x more throughput from existing hardware.
4. Vertical scaling: upgrade to a larger instance (r5.4xlarge → r5.8xlarge). Quick win, linear cost.
5. Functional sharding: separate databases per service (users DB, orders DB, analytics DB). Reduces blast radius.
6. Horizontal sharding: partition data by key (user_id, region). Required above ~50K write QPS or 50TB. Introduces distributed-transaction complexity.
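The last step of the progression, horizontal sharding by key, can be sketched in a few lines. A minimal shard-routing function (hypothetical shard names; production systems usually prefer consistent hashing or a lookup service, since modulo hashing remaps most keys when a shard is added):

```python
import hashlib

SHARDS = ["users-shard-0", "users-shard-1", "users-shard-2", "users-shard-3"]

def shard_for(user_id: str) -> str:
    """Route a user_id to a shard with a stable hash.

    Modulo hashing is the simplest scheme, but adding a shard remaps
    most keys -- real systems layer consistent hashing on top.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same user always lands on the same shard:
assert shard_for("user:42") == shard_for("user:42")
```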
Caching is the single highest-impact optimization in system design. A cache hit returns data in ~1ms from Redis vs ~10ms from a database query vs ~100ms from a cold disk read. That is a 10-100x improvement per request, and it compounds: a system handling 100K QPS with a 95% cache hit rate only sends 5K QPS to the database.
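A quick back-of-the-envelope check of those numbers:

```python
# Effective read latency and DB load under a 95% cache hit rate,
# using the illustrative latency figures from the text above.
cache_latency_ms = 1.0    # Redis hit
db_latency_ms = 10.0      # warm database query
hit_rate = 0.95
total_qps = 100_000

# A miss pays for the cache lookup plus the DB query.
effective_latency = (hit_rate * cache_latency_ms
                     + (1 - hit_rate) * (cache_latency_ms + db_latency_ms))
db_qps = total_qps * (1 - hit_rate)

print(f"effective read latency: {effective_latency:.2f} ms")  # 1.50 ms
print(f"database load: {db_qps:,.0f} QPS")                    # 5,000 QPS
```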
The Caching Hierarchy (from closest to farthest)
```mermaid
graph LR
    Request[Incoming Request] --> AppCache{In-Process Cache?}
    AppCache -->|HIT ~100ns| Response[Response]
    AppCache -->|MISS| Redis{Redis Cache?}
    Redis -->|HIT ~1ms| Response
    Redis -->|MISS| DB[(Database ~10ms)]
    DB --> WriteBack[Write to Redis]
    WriteBack --> Response
```

Cache-aside (lazy loading) pattern: check caches in order, populate on miss.
The three main caching patterns you must know:
| Pattern | How It Works | Pros | Cons | Use When |
|---|---|---|---|---|
| Cache-Aside (Lazy Loading) | App checks cache first. On miss, reads DB, then writes to cache. | Only caches what is actually requested. Simple to implement. | First request for each key is slow (cache miss). Cache can become stale. | Most common default. Use for user profiles, product data, any read-heavy workload. |
| Write-Through | App writes to cache AND DB simultaneously on every write. | Cache is always up-to-date. No stale data. | Higher write latency (two writes per operation). Caches data that may never be read. | When consistency matters more than write speed. Financial data, inventory counts. |
| Write-Behind (Write-Back) | App writes to cache only. Cache asynchronously flushes to DB in batches. | Fastest write path. Absorbs write spikes. | Risk of data loss if cache crashes before flush. Complex to implement correctly. | Write-heavy workloads where small data loss is acceptable. Analytics events, view counts, like counts. |
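The cache-aside row above, sketched with a plain dict standing in for Redis (a real implementation would use a Redis client, handle serialization, and let Redis enforce the TTL):

```python
import time

cache = {}          # stand-in for Redis: key -> (value, expires_at)
TTL_SECONDS = 300

def db_get_user(user_id):
    """Stand-in for a database query."""
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside: check cache first, fall back to DB, populate cache on miss."""
    entry = cache.get(user_id)
    if entry and entry[1] > time.time():              # cache HIT, not expired
        return entry[0]
    value = db_get_user(user_id)                      # cache MISS: read DB
    cache[user_id] = (value, time.time() + TTL_SECONDS)
    return value
```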
Cache Invalidation — The Two Hard Problems
Phil Karlton said: "There are only two hard things in Computer Science: cache invalidation and naming things." Stale cache data causes real outages. At Facebook, a cache invalidation bug in 2021 caused users to see other users' data. Three approaches: (1) TTL-based: set a time limit (e.g., 5 minutes), accept staleness within that window. (2) Event-driven: publish a cache invalidation event whenever data changes. (3) Version-based: include a version number in the cache key, increment on write.
Message queues are the shock absorbers of distributed systems. They sit between services that produce work and services that consume it, providing three critical properties: decoupling (producer does not need to know about consumer), buffering (absorbs traffic spikes), and reliability (messages persist until processed).
Without queues, services are tightly coupled: if the email service is down, user registration fails. With a queue between them, registration succeeds immediately and the email is sent when the service recovers. This is the fundamental architectural pattern behind every resilient system at FAANG scale.
| Queue Type | Examples | Ordering | Throughput | Best For |
|---|---|---|---|---|
| Traditional Message Queue | RabbitMQ, ActiveMQ, SQS | FIFO within a queue | 10K-100K msg/sec | Task distribution, job queues, RPC-style messaging |
| Distributed Event Log | Apache Kafka, Amazon Kinesis | Ordered within partition | 1M+ msg/sec per partition | Event sourcing, CDC, real-time analytics, activity feeds |
| Pub/Sub | Google Pub/Sub, SNS, Redis Pub/Sub | No ordering guarantee | Variable, very high | Fan-out notifications, broadcasting events to multiple consumers |
```mermaid
graph LR
    API[API Server] -->|"POST /upload"| Queue[Message Queue]
    Queue --> W1[Worker 1: Transcode]
    Queue --> W2[Worker 2: Thumbnail]
    Queue --> W3[Worker 3: Notification]
    W1 --> S3[Object Storage]
    W2 --> S3
    W3 --> Push[Push Notification Service]
    subgraph "Without Queue"
        API2[API Server] -->|blocks| T[Transcode]
        T -->|blocks| TH[Thumbnail]
        TH -->|blocks| N[Notify]
    end
```

Queues enable parallel async processing — the API responds immediately while workers handle heavy tasks.
Kafka vs SQS — The FAANG Decision
Use Kafka when: you need event replay (consumers can re-read old events), high throughput (millions of events/sec), multiple consumers reading the same stream, or event sourcing patterns. Use SQS when: you need simple task distribution, exactly-once processing matters more than throughput, or you want zero operational overhead. Most FAANG systems use both: Kafka for event streaming between services, SQS for individual job queues within a service.
Key queue concepts for interviews: at-least-once delivery means consumers may see duplicates (design for idempotency). Dead letter queues (DLQs) capture messages that fail processing repeatedly — essential for debugging. Back-pressure is what happens when consumers cannot keep up: the queue grows, memory fills, and eventually the system collapses. Always set queue size limits and alerting.
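Those concepts translate directly into consumer code. A sketch of an idempotent consumer with retries and a dead letter queue (illustrative names; production systems persist the seen-ID set in Redis or via a database unique constraint, and back off between retries):

```python
processed_ids = set()       # in production: Redis SET or a DB unique constraint
dead_letter_queue = []
MAX_ATTEMPTS = 3

def handle(message, process):
    """Process a message idempotently; park it in the DLQ after repeated failures."""
    if message["id"] in processed_ids:
        return "duplicate-skipped"        # at-least-once delivery: duplicates happen
    for attempt in range(MAX_ATTEMPTS):
        try:
            process(message)
            processed_ids.add(message["id"])
            return "processed"
        except Exception:
            continue                      # real code: exponential backoff here
    dead_letter_queue.append(message)     # give up: capture for debugging
    return "dead-lettered"
```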
Queue Design Checklist for Interviews
A Content Delivery Network is a geographically distributed network of edge servers that cache and serve content close to users. The fundamental physics: light travels through fiber at ~200,000 km/sec, so a round trip from New York to Singapore (~15,000 km) takes ~150ms minimum. A CDN edge server in Singapore serves the same content in ~5ms. That is a 30x latency reduction that no amount of code optimization can achieve.
How a CDN Request Works
1. User requests image.jpg. DNS resolves to the CDN edge server closest to the user (via anycast or geo-DNS).
2. The edge server checks its local cache. If present and not expired (cache HIT): return immediately in ~5-20ms.
3. If cache MISS: the edge server fetches from the origin (your S3 bucket or API), caches the response, and returns it. The first request is slow; subsequent requests are fast.
4. Cache-Control headers determine how long the edge caches the content: max-age=86400 means cache for 24 hours; s-maxage overrides it specifically for shared caches (CDNs).
5. Cache invalidation: when you deploy new assets, either use cache-busting filenames (bundle.abc123.js) or send purge requests to the CDN API.
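Cache-busting filenames are typically generated at build time from a content hash, so the name changes exactly when the bytes change. A minimal sketch:

```python
import hashlib

def cache_busted_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """bundle.js + contents -> bundle.<hash>.js, safe to cache with a 1-year TTL."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"

print(cache_busted_name("bundle.js", b"console.log('v1')"))
```

Because the name is derived from the bytes, a redeploy with unchanged content reuses the cached copy, while any change produces a fresh URL that bypasses stale edge caches.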
| CDN Feature | CloudFront (AWS) | Akamai | Cloudflare |
|---|---|---|---|
| Edge Locations | 600+ in 100+ cities | 4,100+ in 130+ countries | 300+ in 100+ countries |
| Best For | AWS-native apps, S3 origins | Enterprise, media streaming, gaming | DDoS protection, edge compute, Workers |
| Pricing Model | Pay per GB transferred + requests | Contract-based, volume discounts | Free tier available, then per-request |
| Edge Compute | Lambda@Edge, CloudFront Functions | EdgeWorkers | Cloudflare Workers (V8 isolates) |
| Cache Invalidation | API call, ~60 seconds to propagate | Fast Purge, ~5 seconds | API call, ~30 seconds globally |
CDN Cost Math
CDN pricing is typically $0.02-0.085/GB for the first 10TB/month. Without a CDN, serving 1TB/month directly from origin costs ~$90/month on AWS (data transfer). With a CDN at 95% cache hit rate: only 50GB hits origin ($4.50) + CDN costs ~$20 = $24.50 total. As scale increases, the savings compound: at 100TB/month, CDN saves ~$7,000/month and massively reduces origin load.
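Checking that math numerically (rates are the approximate figures quoted above; real pricing varies by region and volume):

```python
ORIGIN_RATE = 0.09   # $/GB, approximate AWS data transfer out
CDN_RATE = 0.02      # $/GB, low end of the quoted CDN range

def monthly_cost(total_gb: float, hit_rate: float) -> dict:
    origin_gb = total_gb * (1 - hit_rate)   # only cache misses reach origin
    return {
        "without_cdn": total_gb * ORIGIN_RATE,
        "with_cdn": origin_gb * ORIGIN_RATE + total_gb * CDN_RATE,
    }

print(monthly_cost(1_000, 0.95))
# ~ {'without_cdn': 90.0, 'with_cdn': 24.5}
```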
What to cache at the CDN: static assets (images, CSS, JS, fonts) are the obvious choice with long TTLs (1 year, use cache-busting filenames). API responses can also be cached at the edge for public, non-personalized content (product listings, news feeds). Dynamic personalized content generally cannot be edge-cached, but you can cache the template and hydrate client-side.
A load balancer distributes incoming traffic across multiple backend servers. Without it, horizontal scaling is impossible — you cannot tell users to manually pick which server to connect to. Load balancers also provide health checking (stop sending traffic to dead servers), SSL termination (offload TLS from app servers), and connection management.
```mermaid
graph TB
    Client[Clients] --> DNS[DNS Round Robin]
    DNS --> LB1[Load Balancer - Active]
    DNS --> LB2[Load Balancer - Standby]
    LB1 --> S1[Server 1]
    LB1 --> S2[Server 2]
    LB1 --> S3[Server 3]
    LB1 --> S4[Server 4]
    LB2 -.->|failover| S1
    LB2 -.->|failover| S2
    subgraph "Health Checks"
        HC[Every 10s: GET /health]
        HC --> S1
        HC --> S2
        HC --> S3
        HC --> S4
    end
```

Load balancer with health checks and standby failover — the standard production setup.
| Algorithm | How It Works | Best For | Weakness |
|---|---|---|---|
| Round Robin | Sends requests to servers in sequence: 1, 2, 3, 1, 2, 3... | Stateless services with uniform server capacity | Ignores server load — a slow server gets the same traffic as a fast one |
| Least Connections | Sends to the server with fewest active connections | Long-lived connections (WebSockets, streaming) | Slightly more overhead to track connection counts |
| IP Hash | Hashes client IP to always route same client to same server | Session affinity without cookies | Uneven distribution if some IPs generate more traffic |
| Weighted Round Robin | Like round robin but servers get traffic proportional to weight | Mixed server sizes (e.g., during canary deploys) | Manual weight configuration, does not adapt to real-time load |
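Round robin and least connections from the table, sketched (illustrative only; a real load balancer tracks connection counts in shared state and folds in health-check results):

```python
import itertools

class RoundRobin:
    """Cycle through servers in order: 1, 2, 3, 1, 2, 3..."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Send each new connection to the server with the fewest active ones."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}   # server -> open connections

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1                # connection closed
```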
A reverse proxy sits in front of your servers and handles concerns like SSL termination, compression, rate limiting, and request routing. In practice, many load balancers are also reverse proxies (Nginx, HAProxy, AWS ALB). The distinction matters in interviews: a load balancer distributes traffic; a reverse proxy adds functionality to the request/response path.
L4 vs L7 Load Balancing
L4 (transport layer) load balancers route based on IP/port — they are fast but cannot inspect HTTP headers, cookies, or URL paths. L7 (application layer) load balancers can route based on HTTP content: /api/* goes to API servers, /static/* goes to CDN origin, /ws/* goes to WebSocket servers. Use L7 (AWS ALB) for most web applications. Use L4 (AWS NLB) for raw TCP/UDP workloads like gaming, IoT, or when you need extreme throughput (millions of connections).
Object storage (AWS S3, Google Cloud Storage, Azure Blob) is purpose-built for storing unstructured binary data: images, videos, backups, logs, ML model artifacts. Unlike file systems or databases, object storage scales to exabytes with 99.999999999% (11 nines) durability. Every FAANG system that handles media relies on object storage.
Key properties: objects are immutable (you replace, never update in place). Access is via HTTP APIs (PUT/GET/DELETE), not file system calls. Objects have metadata (content-type, custom headers) and versioning. Storage tiers optimize cost: hot (frequent access, $0.023/GB/month), infrequent ($0.0125/GB), archive/Glacier ($0.004/GB).
When to Use Object Storage vs Database
Pre-signed URLs — The Upload Pattern
Never proxy large file uploads through your API servers. Instead: (1) Client requests upload URL from API. (2) API generates a pre-signed S3 URL (valid for 15 minutes). (3) Client uploads directly to S3 using the pre-signed URL. (4) S3 notifies your API via event (S3 Event Notification or Lambda). This keeps large files off your API servers, avoids timeouts, and lets S3 handle the heavy lifting.
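A toy illustration of the signing idea behind step (2) — this is NOT AWS Signature V4 (with real S3 you would call boto3's `generate_presigned_url`), but it shows the mechanics: an HMAC over the method, object key, and expiry lets the storage service verify a short-lived URL without any database lookup:

```python
import hashlib, hmac, time

SECRET = b"server-side-signing-key"   # hypothetical; never sent to clients

def presign(method: str, key: str, expires_in: int = 900) -> str:
    """Return a URL the client can use directly until it expires."""
    expires = int(time.time()) + expires_in
    payload = f"{method}\n{key}\n{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{key}?expires={expires}&sig={sig}"

def verify(method: str, key: str, expires: int, sig: str) -> bool:
    """The storage service recomputes the HMAC and checks the expiry."""
    if time.time() > expires:
        return False                   # link has expired
    payload = f"{method}\n{key}\n{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```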
Databases are terrible at full-text search. Running LIKE "%search term%" on a 100M-row table forces a full table scan — O(n) per query. Search engines (Elasticsearch, Apache Solr, OpenSearch) use inverted indexes, so query time scales with the number of matching documents rather than the size of the corpus, returning results in milliseconds even over billions of documents. If your system needs search, autocomplete, or faceted filtering, you need a dedicated search engine.
An inverted index maps every word to the list of documents containing it. When you search for "distributed caching," the engine intersects the document lists for "distributed" and "caching" in milliseconds. Add scoring (TF-IDF, BM25) and you get relevance ranking.
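An inverted index in miniature (lowercased whitespace tokenization only; real engines add stemming, positional data, and BM25 scoring):

```python
from collections import defaultdict

docs = {
    1: "distributed caching with redis",
    2: "distributed queues with kafka",
    3: "caching strategies for web apps",
}

# Build: word -> set of document IDs containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query: str) -> set:
    """AND-query: intersect the posting lists of every query term."""
    terms = query.lower().split()
    results = index[terms[0]].copy()
    for term in terms[1:]:
        results &= index[term]
    return results

print(search("distributed caching"))   # {1}
```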
| Feature | Elasticsearch | PostgreSQL Full-Text | Algolia (Managed) |
|---|---|---|---|
| Query Latency | 1-10ms for most queries | 10-100ms, degrades with table size | <5ms (optimized for speed) |
| Scaling | Horizontal sharding by index | Limited to single node per table | Fully managed, auto-scales |
| Features | Fuzzy search, aggregations, geo-search, ML ranking | Basic text search, trigram matching | Typo tolerance, faceting, instant search |
| Operational Cost | High — cluster management, reindex | Zero — built into DB | Low ops, high $ at scale |
| Best For | Log analytics, product search at scale | Small-medium datasets, simple search | E-commerce, SaaS product search |
Search is Eventually Consistent
Elasticsearch indexes data asynchronously. When a user creates a post and immediately searches for it, the search index may not have caught up. Design for this: show recently created items from the database, merge with search results. This is the pattern Twitter and LinkedIn use for "your own recent activity" in search.
In interviews, mention search when the system has: user-generated content that needs searching (tweets, products, articles), autocomplete/typeahead features, log aggregation (ELK stack), or complex filtering with facets (price range + category + brand on an e-commerce site).
In a distributed system with hundreds of microservices, how does Service A find Service B? How do you elect a leader when the current leader crashes? How do you manage distributed locks? Coordination services (ZooKeeper, etcd, Consul) solve these problems using consensus algorithms (Raft, Paxos) that guarantee consistency even when nodes fail.
What Coordination Services Do
ZooKeeper vs etcd vs Consul
ZooKeeper: battle-tested, used by Kafka, HBase, Hadoop. Java-based, complex to operate. etcd: used by Kubernetes for all cluster state. Go-based, simpler API (key-value). Consul: HashiCorp product, includes built-in service mesh and health checking. Best developer experience. In interviews, any of these is a valid choice — the key is knowing when you need a coordination service at all.
When to mention coordination services in interviews: leader election scenarios (only one worker should process a partition), distributed cron (only one instance runs the scheduled job), partition assignment (which server owns which data range), and feature flag distribution (real-time config updates without redeployment).
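The primitive underneath most of these uses is an atomic compare-and-set with a lease (TTL). A sketch of lease-based leader election against an in-memory stand-in (real code would use an etcd lease or a ZooKeeper ephemeral node, which also survive process crashes):

```python
import time

class LeaseStore:
    """Stand-in for etcd/ZooKeeper: one key, atomic acquire with a TTL."""
    def __init__(self):
        self.leader = None              # (node_id, lease_expires_at)

    def try_acquire(self, node_id: str, ttl: float = 10.0) -> bool:
        now = time.time()
        if self.leader is None or self.leader[1] <= now:
            self.leader = (node_id, now + ttl)   # lease free or expired: take it
            return True
        if self.leader[0] == node_id:
            self.leader = (node_id, now + ttl)   # current leader renews its lease
            return True
        return False                             # someone else holds the lease

store = LeaseStore()
assert store.try_acquire("node-a") is True    # node-a becomes leader
assert store.try_acquire("node-b") is False   # node-b must wait for expiry
```

The leader renews before the TTL elapses; if it crashes, the lease expires and another node wins the next acquire, which is exactly the "only one worker runs the job" guarantee.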
A system without monitoring is flying blind. At FAANG scale, every system has three pillars of observability: metrics (quantitative measurements over time), logs (detailed event records), and traces (request path through distributed services). Together, they answer: "Is the system healthy? If not, what broke and why?"
| Pillar | What It Captures | Tools | Example |
|---|---|---|---|
| Metrics | Numeric values over time: QPS, latency percentiles, error rate, CPU/memory | Prometheus, Datadog, CloudWatch, Grafana | p99 latency spiked from 50ms to 2s at 14:30 UTC |
| Logs | Structured event records with context: timestamp, request ID, user ID, error message | ELK Stack, Splunk, CloudWatch Logs, Loki | ERROR 14:30:05 req=abc123 user=456 "Connection refused to payments-service:8080" |
| Traces | End-to-end request path across services with timing for each hop | Jaeger, Zipkin, AWS X-Ray, Datadog APM | Request abc123: API (2ms) → Auth (5ms) → DB (1500ms) → Response. DB is the bottleneck. |
```mermaid
graph TB
    App[Application] -->|"Emit metrics"| Prom[Prometheus]
    App -->|"Write logs"| ELK[ELK Stack]
    App -->|"Send traces"| Jaeger[Jaeger/Zipkin]
    Prom --> Grafana[Grafana Dashboards]
    Grafana --> Alert[Alert Manager]
    Alert --> PD[PagerDuty/Slack]
    ELK --> Search[Log Search + Analysis]
    Jaeger --> TraceUI[Trace Visualization]
    subgraph "The Four Golden Signals"
        GS1[Latency - how long requests take]
        GS2[Traffic - how many requests per second]
        GS3[Errors - what percentage of requests fail]
        GS4[Saturation - how full is the system]
    end
```

The three pillars of observability feed into dashboards, alerts, and debugging workflows.
The Four Golden Signals (Google SRE)
Google SRE defines four signals that every service must monitor: (1) Latency: p50, p95, p99 response times. (2) Traffic: requests per second. (3) Errors: HTTP 5xx rate, application error rate. (4) Saturation: CPU utilization, memory usage, disk I/O, connection pool usage. If you monitor these four, you catch 95% of production issues before users notice.
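Latency percentiles from signal (1) are straightforward to compute from raw samples; a nearest-rank sketch (production systems use histograms or sketches such as t-digest rather than storing every sample):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample >= p% of all samples."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]

# 100 latency samples: 98 fast requests plus 2 slow outliers (ms)
latencies = [50] * 98 + [400, 2000]
print(percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99))
# prints: 50 50 400
```

Note how p50 and p95 hide the outliers entirely and even p99 misses the worst request — which is why tail latency gets its own alerting thresholds.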
In system design interviews, always mention monitoring as part of your design. "I would add Prometheus metrics for QPS, latency percentiles, and error rates on every service, with Grafana dashboards and PagerDuty alerts when p99 latency exceeds 500ms or error rate exceeds 1%." This shows operational maturity that distinguishes L5+ candidates.
Now that you know the individual building blocks, the skill is knowing which ones to reach for given a set of requirements. Here is a decision framework that maps requirements to building blocks:
| Requirement | Building Block | Why |
|---|---|---|
| Store structured data with relationships | SQL Database (Postgres) | ACID transactions, JOINs, complex queries |
| Store > 1TB of write-heavy data | NoSQL (Cassandra/DynamoDB) | Horizontal scaling, high write throughput |
| Reduce read latency by 10-100x | Cache (Redis) | Sub-millisecond reads from memory |
| Store images, videos, files | Object Storage (S3) | Exabyte scale, 11 nines durability, cheap |
| Decouple services, handle async work | Message Queue (Kafka/SQS) | Buffering, retry, parallel processing |
| Full-text search, autocomplete | Search Engine (Elasticsearch) | Inverted index, sub-10ms relevance search |
| Serve static assets globally | CDN (CloudFront) | Edge caching, 30x latency reduction |
| Distribute traffic across servers | Load Balancer (ALB/NLB) | Health checks, SSL termination, routing |
| Leader election, distributed locks | Coordination Service (ZooKeeper) | Consensus-based consistency guarantees |
| Know when things break | Monitoring (Prometheus + Grafana) | Metrics, alerts, dashboards |
The 30-Second Building Block Quiz
For any system design question, quickly identify: (1) What data do I store? → Database type. (2) What is read-heavy? → Add cache. (3) Any binary/media? → Object storage + CDN. (4) Any async work? → Queue + workers. (5) Any search? → Elasticsearch. (6) Multiple servers? → Load balancer. (7) How do I know it is working? → Monitoring. This checklist covers 90% of system design building block decisions.
```mermaid
graph TD
    Q1{What data do you store?} -->|Structured, relationships| SQL[SQL Database]
    Q1 -->|High write volume, simple access| NoSQL[NoSQL Database]
    Q1 -->|Binary files, media| S3[Object Storage]
    Q2{Is read latency critical?} -->|Yes, sub-ms needed| Redis[Redis Cache]
    Q2 -->|No, 10ms is fine| Direct[Direct DB reads]
    Q3{Any async processing?} -->|Yes| Queue[Message Queue]
    Q3 -->|No| Sync[Synchronous API]
    Q4{Global users?} -->|Yes| CDN[CDN + LB]
    Q4 -->|No, single region| LB[Load Balancer only]
    Q5{Need search or autocomplete?} -->|Yes| ES[Elasticsearch]
    Q5 -->|No| Skip[Skip search engine]
```

Building block selection decision tree — use this in every system design interview.
Building block knowledge is tested in every system design interview, usually implicitly. The interviewer says "Design X" and watches whether you instinctively reach for the right components. Explicitly naming building blocks and justifying each choice is what separates strong candidates from average ones. Some interviewers ask targeted questions: "Why Redis and not Memcached?" or "Why Kafka instead of SQS?" to test depth.
Common questions:
Key takeaways
You are asked to design a photo-sharing app like Instagram. List every building block you would use and explain why each is needed.
SQL Database (Postgres): store user profiles, follow relationships, photo metadata (owner, caption, timestamp, likes count) — needs complex queries and strong consistency. Object Storage (S3): store actual photo files — cheap, durable, scales to petabytes. CDN (CloudFront): serve photos from edge locations globally — reduces latency from 200ms to 20ms for users far from origin. Redis Cache: cache user feeds, profile data, popular photo metadata — reduces DB load by 90%+ given heavy read pattern. Message Queue (Kafka): handle async tasks — photo processing (resize, filter), notification delivery, feed fanout to followers. Elasticsearch: power search for users, hashtags, locations. Load Balancer (ALB): distribute traffic across API servers, health checks, SSL termination. Background Workers: consume from Kafka to process photos (generate thumbnails at 5 resolutions), send push notifications, update follower feeds. Monitoring (Prometheus + Grafana): track upload success rate, feed generation latency, CDN hit rate, DB connection pool utilization.
Your Redis cache just went down and you are seeing a massive spike in database load. What is happening and how do you fix it immediately?
This is a cache stampede (also called thundering herd). With Redis down, 100% of read requests that were being served from cache (~95% of all reads) are now hitting the database directly. If the system was handling 100K read QPS with 95% cache hit rate, the database was seeing ~5K QPS. Now it is seeing 100K QPS — a 20x increase that the database cannot handle. Immediate fixes: (1) Enable the database connection pool queue (with timeout) so requests queue rather than overwhelming the DB with connections. (2) If you have a fallback Memcached cluster or read replicas, route traffic there. (3) Implement request coalescing: if 1000 requests arrive for the same cache key, only let one hit the database and queue the rest. (4) Apply aggressive rate limiting at the load balancer. (5) Bring Redis back ASAP — restart with persistence (RDB snapshot) to warm the cache from the last snapshot rather than starting cold. Long-term: deploy Redis in cluster mode with replication so a single node failure does not lose the entire cache.
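Fix (3), request coalescing, is sometimes called the singleflight pattern. A threaded sketch (illustrative; real implementations also handle fetch failures and per-key result cleanup more carefully):

```python
import threading

_inflight = {}       # key -> Event that the in-flight fetch will set
_results = {}
_lock = threading.Lock()

def coalesced_get(key, fetch_from_db):
    """Singleflight: one caller per key hits the DB; the rest wait for its result."""
    with _lock:
        event = _inflight.get(key)
        if event is None:                    # we are the one request that goes to the DB
            event = threading.Event()
            _inflight[key] = event
            is_leader = True
        else:
            is_leader = False
    if is_leader:
        _results[key] = fetch_from_db(key)   # the single DB hit
        event.set()                          # wake every waiting follower
        with _lock:
            _inflight.pop(key, None)
    else:
        event.wait()                         # piggyback on the in-flight fetch
    return _results[key]
```

With 1000 concurrent requests for the same key, the database sees one query instead of 1000 — exactly the dampening you need while the cache is cold.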
An interviewer says: "Your system needs to handle 500K messages per second from IoT devices." Which building blocks do you reach for and why?
500K messages/sec is extremely high write throughput. Building blocks: (1) Load Balancer (NLB, not ALB): use L4 load balancing for raw TCP/UDP since IoT protocols are often lightweight (MQTT, CoAP) and L7 parsing overhead is wasteful at this scale. (2) Message Queue (Kafka): Kafka handles millions of messages/sec with horizontal partitioning. Partition by device_id for ordering within a device. 500K msg/sec needs ~50 partitions across 10+ brokers. (3) NoSQL (Cassandra or TimescaleDB): IoT data is time-series — immutable writes ordered by timestamp. Cassandra handles high write throughput natively. Partition key = device_id, clustering column = timestamp. (4) Object Storage (S3): for raw data archival — store compressed hourly batches for long-term analysis. (5) Redis: cache recent device state for real-time dashboards (last heartbeat, latest reading). (6) Stream Processing (Kafka Streams or Flink): real-time anomaly detection on the message stream — alert if temperature exceeds threshold or heartbeat stops. (7) Monitoring: critical at this scale — track message lag, consumer throughput, and storage growth rate.
💡 Analogy
Building blocks in system design are like the standard parts in a hardware store. A plumber does not forge their own pipes — they select from standardized parts (pipes, fittings, valves, seals) and assemble them for the specific building. Similarly, system designers do not invent new storage engines or consensus protocols — they select from standardized building blocks (databases, caches, queues, load balancers) and assemble them for the specific requirements. The skill is not in knowing each part exists — it is in knowing which combination solves the problem at hand, how they connect, and which will fail first under load.
⚡ Core Idea
Every distributed system at any scale is composed of the same ~12 building blocks: SQL databases, NoSQL databases, caches, message queues, object storage, CDNs, load balancers, reverse proxies, search engines, coordination services, monitoring systems, and API gateways. Mastery means knowing what each does, when to reach for it, how it fails, and how it connects to the others. A system design interview is fundamentally a building block selection and assembly exercise.
🎯 Why It Matters
Engineers who do not know the building blocks waste months reinventing them poorly. They store binary files in databases (slow, expensive), skip caching (10x slower), process heavy tasks synchronously (timeouts, poor UX), or skip monitoring (blind to production issues). Knowing the standard toolkit lets you design correct systems in 45 minutes instead of learning expensive lessons over 45 months.