
© 2026 TheSimplifiedTech. All rights reserved.


Core Building Blocks of Distributed Systems

Every system design uses the same dozen primitives: databases, caches, queues, load balancers, CDNs, proxies, object stores, coordination services, search engines, and monitoring stacks. Master what each one does, when to reach for it, and how they fail — and you can design any system.

🎯 Key Takeaways

  • Every distributed system is built from ~12 standard building blocks: SQL/NoSQL databases, caches, queues, object storage, CDNs, load balancers, search engines, coordination services, and monitoring. Knowing when to use each one is the core skill of system design.
  • The database choice is your most impactful decision: SQL for relationships and complex queries, NoSQL for high write throughput and horizontal scaling. Most FAANG systems use multiple databases, each optimized for its access pattern.
  • Caching provides 10-100x latency improvement. Cache-aside is the default pattern. Cache invalidation is the hardest problem — use TTL as the baseline, event-driven invalidation for consistency-critical data.
  • Message queues decouple services, absorb traffic spikes, and enable async processing. Use Kafka for event streaming, SQS for simple job queues. Always design consumers for idempotency since at-least-once delivery means duplicates.
  • In every system design interview, always mention: what database and why, whether you need a cache, where async processing fits, and how you will monitor the system. This demonstrates the operational maturity that distinguishes senior engineers.


~22 min read

Why Building Blocks Matter: The Lego Analogy

System design is not about inventing new components — it is about assembling known primitives in the right order for your specific constraints. Every system at FAANG is built from the same ~12 building blocks. The difference between a junior and senior engineer is not knowing more components — it is knowing which combination to use and why.

Think of it like Lego. A child sees individual bricks. An architect sees structural patterns: "I need a load-bearing wall here, a window there, a roof truss." Similarly, when a senior engineer hears "Design Instagram," they immediately think: "Object storage for photos, CDN for delivery, Redis for feed cache, Postgres for metadata, Kafka for async processing." They are not designing from scratch — they are selecting from a toolbox.

The Interview Signal

When interviewers ask system design questions, they are testing whether you know these building blocks and can justify why you chose each one. Saying "we need a database" is L4. Saying "we need Postgres for user metadata because we need strong consistency and complex queries, but Cassandra for the activity feed because it is write-heavy and we can tolerate eventual consistency" is L6.

graph TB
    Client[Client Browser/App] --> CDN[CDN - Static Assets]
    Client --> LB[Load Balancer]
    LB --> API[API Servers - Stateless]
    API --> Cache[Cache - Redis/Memcached]
    API --> DB[(Primary Database)]
    API --> Queue[Message Queue]
    API --> Search[Search Engine]
    API --> ObjStore[Object Storage - S3]
    Cache --> DB
    Queue --> Workers[Background Workers]
    Workers --> DB
    Workers --> ObjStore
    DB --> Replica[(Read Replicas)]
    API --> Coord[Coordination Service - ZooKeeper]
    API --> Monitor[Monitoring + Alerting]

The standard building blocks of a distributed system — every FAANG design uses some combination of these

In this lesson, we will deep-dive each building block: what it does, when to use it, how it fails, and what real-world system uses it. By the end, you will be able to look at any system design question and immediately identify which blocks you need.

Databases: SQL vs NoSQL — The Foundational Choice

The database is the heart of every system. Your first architectural decision is almost always: SQL or NoSQL? This choice cascades into everything else — your data model, consistency guarantees, scaling strategy, and operational complexity.

| Characteristic | SQL (Postgres, MySQL) | NoSQL Document (MongoDB) | NoSQL Wide-Column (Cassandra) | NoSQL Key-Value (DynamoDB) |
| --- | --- | --- | --- | --- |
| Data Model | Tables with rows and columns, strict schema | Flexible JSON documents, schema-optional | Rows with dynamic columns, partition key required | Simple key-value pairs, no relations |
| Consistency | Strong (ACID transactions) | Configurable (eventual to strong) | Tunable (eventual to quorum) | Eventual by default, strong optional |
| Scaling | Vertical first, horizontal via sharding or read replicas | Horizontal via sharding | Linear horizontal scaling by design | Automatic horizontal, fully managed |
| Query Power | Full SQL: JOINs, subqueries, aggregations | Rich queries on documents, no JOINs | Limited: partition key + clustering column only | Primary key lookup only, GSIs for secondary |
| Best For | Complex relationships, transactions, analytics | Rapid iteration, semi-structured data | Write-heavy, time-series, event logs | Session stores, user profiles, simple lookups |
| FAANG Example | Instagram user/metadata in Postgres | eBay product catalog in MongoDB | Netflix viewing history in Cassandra | Amazon shopping cart in DynamoDB |

The Decision Framework for Databases

Ask three questions: (1) Do I need JOINs or complex queries? → SQL. (2) Is my write throughput > 50K QPS? → NoSQL wide-column. (3) Do I need sub-millisecond latency on simple lookups? → NoSQL key-value. Most FAANG systems use multiple databases: SQL for metadata, NoSQL for high-throughput data, and a search engine for full-text queries.

Replication and sharding are critical for databases at scale. Replication creates copies on multiple servers for read throughput and durability. Sharding splits data across servers by a partition key for write throughput. The key insight: replication is easy, sharding is hard. Always exhaust vertical scaling and read replicas before sharding.
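The "sharding is hard" part is mostly about key placement: which shard owns which keys, and how many keys move when you add a shard. A minimal sketch of consistent hashing, the standard technique for partitioning by key — shard names and the vnode count here are illustrative, not a production configuration:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps partition keys to shards so that adding a shard
    only remaps roughly 1/N of the keys (not all of them)."""

    def __init__(self, shards, vnodes=100):
        # Each shard gets `vnodes` points on the ring for even spread.
        self._ring = []  # sorted list of (hash, shard) points
        for shard in shards:
            for v in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{v}"), shard))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key):
        h = self._hash(key)
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
owner = ring.shard_for("user:42")  # deterministic: same key, same shard
```

Naive modulo sharding (`hash(key) % N`) remaps almost every key when N changes; the ring is why Cassandra and DynamoDB can rebalance incrementally.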

Database Scaling Progression

  1. Single node: handles up to ~5K write QPS and 10TB. Good for most startups. Use Postgres.
  2. Read replicas: 1 primary + N replicas. Multiplies read throughput by N. Still one write node.
  3. Connection pooling (PgBouncer): reduces connection overhead, squeezes 2-3x more throughput from existing hardware.
  4. Vertical scaling: upgrade to a larger instance (r5.4xlarge → r5.8xlarge). Quick win, linear cost.
  5. Functional sharding: separate databases per service (users DB, orders DB, analytics DB). Reduces blast radius.
  6. Horizontal sharding: partition data by key (user_id, region). Required above ~50K write QPS or 50TB. Introduces distributed-transaction complexity.

Caches: The 10-100x Latency Multiplier

Caching is the single highest-impact optimization in system design. A cache hit returns data in ~1ms from Redis vs ~10ms from a database query vs ~100ms from a cold disk read. That is a 10-100x improvement per request, and it compounds: a system handling 100K QPS with a 95% cache hit rate only sends 5K QPS to the database.

The Caching Hierarchy (from closest to farthest)

  • L1/L2 CPU Cache — ~1ns access. Managed by hardware. You cannot control this directly, but data structures that are cache-friendly (arrays vs linked lists) exploit it.
  • In-Process Cache (Guava, Caffeine) — ~100ns access. Lives inside your application process. Zero network overhead. Limited by JVM heap size (~2-8GB typically). Perfect for config, feature flags, and hot reference data.
  • Distributed Cache (Redis, Memcached) — ~1ms access. Shared across all app servers via network. Can store 25-500GB per node. The workhorse of FAANG caching — used for session data, user profiles, feed caches, rate limit counters.
  • CDN Cache (CloudFront, Akamai) — ~10-50ms access depending on edge proximity. Caches static assets (images, JS, CSS) and some dynamic responses at edge locations worldwide. Reduces origin bandwidth by 80-95%.
  • Database Buffer Pool — ~1-5ms access. The database keeps frequently accessed pages in memory automatically. Properly sized buffer pool (60-80% of RAM) can make many "disk reads" actually memory reads.
graph LR
    Request[Incoming Request] --> AppCache{In-Process Cache?}
    AppCache -->|HIT ~100ns| Response[Response]
    AppCache -->|MISS| Redis{Redis Cache?}
    Redis -->|HIT ~1ms| Response
    Redis -->|MISS| DB[(Database ~10ms)]
    DB --> WriteBack[Write to Redis]
    WriteBack --> Response

Cache-aside (lazy loading) pattern: check caches in order, populate on miss
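The cache-aside flow above fits in a few lines. A sketch in which plain dicts stand in for Redis and the primary database:

```python
import time

db = {"user:1": {"name": "Ada"}}   # stand-in for the primary database
cache = {}                         # stand-in for Redis: key -> (value, expires_at)
TTL_SECONDS = 300                  # staleness bound for cached entries

def get_user(key):
    """Cache-aside read: check the cache, fall back to the DB on a miss,
    then populate the cache so the next read is fast."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]            # cache HIT (~1ms in a real Redis)
    value = db.get(key)            # cache MISS: read the DB (~10ms)
    if value is not None:
        cache[key] = (value, time.time() + TTL_SECONDS)
    return value
```

Note what the pattern gives you for free: only keys that are actually requested get cached, and a cache wipe degrades to DB reads rather than errors.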

The three main caching patterns you must know:

| Pattern | How It Works | Pros | Cons | Use When |
| --- | --- | --- | --- | --- |
| Cache-Aside (Lazy Loading) | App checks cache first. On miss, reads DB, then writes to cache. | Only caches what is actually requested. Simple to implement. | First request for each key is slow (cache miss). Cache can become stale. | Most common default. Use for user profiles, product data, any read-heavy workload. |
| Write-Through | App writes to cache AND DB simultaneously on every write. | Cache is always up-to-date. No stale data. | Higher write latency (two writes per operation). Caches data that may never be read. | When consistency matters more than write speed. Financial data, inventory counts. |
| Write-Behind (Write-Back) | App writes to cache only. Cache asynchronously flushes to DB in batches. | Fastest write path. Absorbs write spikes. | Risk of data loss if cache crashes before flush. Complex to implement correctly. | Write-heavy workloads where small data loss is acceptable. Analytics events, view counts, like counts. |

Cache Invalidation — The Two Hard Problems

Phil Karlton said: "There are only two hard things in Computer Science: cache invalidation and naming things." Stale cache data causes real outages. At Facebook, a cache invalidation bug in 2021 caused users to see other users' data. Three approaches: (1) TTL-based: set a time limit (e.g., 5 minutes), accept staleness within that window. (2) Event-driven: publish a cache invalidation event whenever data changes. (3) Version-based: include a version number in the cache key, increment on write.
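The version-based approach is the easiest of the three to demonstrate. A sketch with a plain dict standing in for Redis and a per-entity version counter (in production the counter would be something like a Redis INCR):

```python
versions = {}   # entity id -> version counter
cache = {}      # stand-in for Redis

def cache_key(entity_id):
    # The version is part of the key, so bumping it "invalidates"
    # old entries by making them unreachable (they expire via TTL later).
    return f"user:{entity_id}:v{versions.get(entity_id, 0)}"

def write_user(entity_id, value):
    versions[entity_id] = versions.get(entity_id, 0) + 1  # bump on every write
    cache[cache_key(entity_id)] = value

def read_user(entity_id):
    return cache.get(cache_key(entity_id))
```

No delete or purge call is ever needed, which is why version-based keys sidestep the race conditions that plague explicit invalidation.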

Message Queues: Decoupling Producers and Consumers

Message queues are the shock absorbers of distributed systems. They sit between services that produce work and services that consume it, providing three critical properties: decoupling (producer does not need to know about consumer), buffering (absorbs traffic spikes), and reliability (messages persist until processed).

Without queues, services are tightly coupled: if the email service is down, user registration fails. With a queue between them, registration succeeds immediately and the email is sent when the service recovers. This is the fundamental architectural pattern behind every resilient system at FAANG scale.

| Queue Type | Examples | Ordering | Throughput | Best For |
| --- | --- | --- | --- | --- |
| Traditional Message Queue | RabbitMQ, ActiveMQ, SQS | FIFO within a queue | 10K-100K msg/sec | Task distribution, job queues, RPC-style messaging |
| Distributed Event Log | Apache Kafka, Amazon Kinesis | Ordered within partition | 1M+ msg/sec per partition | Event sourcing, CDC, real-time analytics, activity feeds |
| Pub/Sub | Google Pub/Sub, SNS, Redis Pub/Sub | No ordering guarantee | Variable, very high | Fan-out notifications, broadcasting events to multiple consumers |
graph LR
    API[API Server] -->|"POST /upload"| Queue[Message Queue]
    Queue --> W1[Worker 1: Transcode]
    Queue --> W2[Worker 2: Thumbnail]
    Queue --> W3[Worker 3: Notification]
    W1 --> S3[Object Storage]
    W2 --> S3
    W3 --> Push[Push Notification Service]

    subgraph "Without Queue"
    API2[API Server] -->|blocks| T[Transcode]
    T -->|blocks| TH[Thumbnail]
    TH -->|blocks| N[Notify]
    end

Queues enable parallel async processing — the API responds immediately while workers handle heavy tasks

Kafka vs SQS — The FAANG Decision

Use Kafka when: you need event replay (consumers can re-read old events), high throughput (millions of events/sec), multiple consumers reading the same stream, or event sourcing patterns. Use SQS when: you need simple task distribution, exactly-once processing matters more than throughput, or you want zero operational overhead. Most FAANG systems use both: Kafka for event streaming between services, SQS for individual job queues within a service.

Key queue concepts for interviews: at-least-once delivery means consumers may see duplicates (design for idempotency). Dead letter queues (DLQs) capture messages that fail processing repeatedly — essential for debugging. Back-pressure is what happens when consumers cannot keep up: the queue grows, memory fills, and eventually the system collapses. Always set queue size limits and alerting.

Queue Design Checklist for Interviews

  • Idempotency — Design consumers to handle duplicate messages safely. Use idempotency keys or deduplication windows.
  • Dead Letter Queue — Messages that fail N times go to DLQ for manual investigation. Without this, poison messages block the entire queue.
  • Visibility Timeout — How long a message is hidden from other consumers while being processed. Too short: duplicates. Too long: stuck messages block throughput.
  • Ordering Guarantees — FIFO queues are slower but ordered. Standard queues are faster but may reorder. Choose based on your use case.
  • Monitoring — Track queue depth (backlog), processing latency (time from enqueue to completion), and DLQ size. Alert if queue depth grows for > 5 minutes.
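The idempotency item on the checklist can be sketched with a dedup store keyed by message ID. In production this set would live in Redis with a TTL matching your deduplication window; the message fields below are illustrative:

```python
processed = set()   # idempotency keys we have already handled
sent_emails = []    # stand-in for the real side effect (send an email)

def handle_message(msg):
    """At-least-once delivery means redeliveries; a consumer must be safe
    to run twice. Skip any message whose idempotency key was seen before."""
    key = msg["id"]
    if key in processed:
        return "skipped"          # duplicate redelivery: do nothing
    sent_emails.append(msg["to"])  # the non-idempotent side effect
    processed.add(key)             # record AFTER the work succeeds
    return "done"

# The broker redelivers m1 (e.g. visibility timeout expired mid-processing):
for m in [{"id": "m1", "to": "a@example.com"}, {"id": "m1", "to": "a@example.com"}]:
    handle_message(m)
```

Recording the key after the side effect means a crash mid-work causes a retry rather than a lost message, which is the right failure mode for at-least-once systems.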

CDNs: Serving Content at the Edge

A Content Delivery Network is a geographically distributed network of edge servers that cache and serve content close to users. The fundamental physics: light travels through fiber at ~200,000 km/sec, so a round trip from New York to Singapore (~15,000 km) takes ~150ms minimum. A CDN edge server in Singapore serves the same content in ~5ms. That is a 30x latency reduction that no amount of code optimization can achieve.

How a CDN Request Works

  1. User requests image.jpg. DNS resolves to the CDN edge server closest to the user (via anycast or geo-DNS).
  2. Edge server checks local cache. If present and not expired (cache HIT): return immediately in ~5-20ms.
  3. If cache MISS: the edge server fetches from the origin server (your S3 bucket or API), caches the response, and returns it. The first request is slow; subsequent requests are fast.
  4. Cache-Control headers determine how long the edge caches the content: max-age=86400 means cache for 24 hours. s-maxage overrides it specifically for shared caches (CDNs).
  5. Cache invalidation: when you deploy new assets, either use cache-busting filenames (bundle.abc123.js) or send purge requests to the CDN API.

| CDN Feature | CloudFront (AWS) | Akamai | Cloudflare |
| --- | --- | --- | --- |
| Edge Locations | 600+ in 100+ cities | 4,100+ in 130+ countries | 300+ in 100+ countries |
| Best For | AWS-native apps, S3 origins | Enterprise, media streaming, gaming | DDoS protection, edge compute, Workers |
| Pricing Model | Pay per GB transferred + requests | Contract-based, volume discounts | Free tier available, then per-request |
| Edge Compute | Lambda@Edge, CloudFront Functions | EdgeWorkers | Cloudflare Workers (V8 isolates) |
| Cache Invalidation | API call, ~60 seconds to propagate | Fast Purge, ~5 seconds | API call, ~30 seconds globally |

CDN Cost Math

CDN pricing is typically $0.02-0.085/GB for the first 10TB/month. Without a CDN, serving 1TB/month directly from origin costs ~$90/month on AWS (data transfer). With a CDN at 95% cache hit rate: only 50GB hits origin ($4.50) + CDN costs ~$20 = $24.50 total. As scale increases, the savings compound: at 100TB/month, CDN saves ~$7,000/month and massively reduces origin load.
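The cost math above in code form, so you can rerun it for your own traffic numbers. The per-GB rates are assumed ballpark list prices, not quotes:

```python
ORIGIN_RATE = 0.09   # $/GB egress from origin (assumed AWS-ish list price)
CDN_RATE = 0.02      # $/GB served from the CDN edge (assumed blended price)

def monthly_cost_with_cdn(total_gb, cache_hit_rate):
    """Only cache misses reach the origin; everything is billed at the edge."""
    origin_gb = total_gb * (1 - cache_hit_rate)
    return origin_gb * ORIGIN_RATE + total_gb * CDN_RATE

no_cdn = 1000 * ORIGIN_RATE                      # 1 TB straight from origin: $90
with_cdn = monthly_cost_with_cdn(1000, 0.95)     # 95% hit rate: $4.50 + $20 = $24.50
```

The structural point: savings scale with the hit rate times the origin/edge price gap, which is why media-heavy systems treat CDN hit rate as a first-class metric.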

What to cache at the CDN: static assets (images, CSS, JS, fonts) are the obvious choice with long TTLs (1 year, use cache-busting filenames). API responses can also be cached at the edge for public, non-personalized content (product listings, news feeds). Dynamic personalized content generally cannot be edge-cached, but you can cache the template and hydrate client-side.

Load Balancers and Reverse Proxies

A load balancer distributes incoming traffic across multiple backend servers. Without it, horizontal scaling is impossible — you cannot tell users to manually pick which server to connect to. Load balancers also provide health checking (stop sending traffic to dead servers), SSL termination (offload TLS from app servers), and connection management.

graph TB
    Client[Clients] --> DNS[DNS Round Robin]
    DNS --> LB1[Load Balancer - Active]
    DNS --> LB2[Load Balancer - Standby]
    LB1 --> S1[Server 1]
    LB1 --> S2[Server 2]
    LB1 --> S3[Server 3]
    LB1 --> S4[Server 4]
    LB2 -.->|failover| S1
    LB2 -.->|failover| S2

    subgraph "Health Checks"
    HC[Every 10s: GET /health]
    HC --> S1
    HC --> S2
    HC --> S3
    HC --> S4
    end

Load balancer with health checks and standby failover — the standard production setup

| Algorithm | How It Works | Best For | Weakness |
| --- | --- | --- | --- |
| Round Robin | Sends requests to servers in sequence: 1, 2, 3, 1, 2, 3... | Stateless services with uniform server capacity | Ignores server load — a slow server gets the same traffic as a fast one |
| Least Connections | Sends to the server with fewest active connections | Long-lived connections (WebSockets, streaming) | Slightly more overhead to track connection counts |
| IP Hash | Hashes client IP to always route same client to same server | Session affinity without cookies | Uneven distribution if some IPs generate more traffic |
| Weighted Round Robin | Like round robin but servers get traffic proportional to weight | Mixed server sizes (e.g., during canary deploys) | Manual weight configuration, does not adapt to real-time load |
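The first two algorithms are simple enough to sketch directly; server names here are illustrative:

```python
import itertools

class RoundRobin:
    """Cycle through servers in order, ignoring their load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Route to the server with the fewest in-flight connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # server -> open connections

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when a connection closes, so counts reflect real load.
        self.active[server] -= 1
```

The extra bookkeeping in `LeastConnections` (tracking and releasing counts) is exactly the "slightly more overhead" the table mentions, and it only pays off when connection lifetimes vary widely.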

A reverse proxy sits in front of your servers and handles concerns like SSL termination, compression, rate limiting, and request routing. In practice, many load balancers are also reverse proxies (Nginx, HAProxy, AWS ALB). The distinction matters in interviews: a load balancer distributes traffic; a reverse proxy adds functionality to the request/response path.

L4 vs L7 Load Balancing

L4 (transport layer) load balancers route based on IP/port — they are fast but cannot inspect HTTP headers, cookies, or URL paths. L7 (application layer) load balancers can route based on HTTP content: /api/* goes to API servers, /static/* goes to CDN origin, /ws/* goes to WebSocket servers. Use L7 (AWS ALB) for most web applications. Use L4 (AWS NLB) for raw TCP/UDP workloads like gaming, IoT, or when you need extreme throughput (millions of connections).

Object Storage and Blob Stores

Object storage (AWS S3, Google Cloud Storage, Azure Blob) is purpose-built for storing unstructured binary data: images, videos, backups, logs, ML model artifacts. Unlike file systems or databases, object storage scales to exabytes with 99.999999999% (11 nines) durability. Every FAANG system that handles media relies on object storage.

Key properties: objects are immutable (you replace, never update in place). Access is via HTTP APIs (PUT/GET/DELETE), not file system calls. Objects have metadata (content-type, custom headers) and versioning. Storage tiers optimize cost: hot (frequent access, $0.023/GB/month), infrequent ($0.0125/GB), archive/Glacier ($0.004/GB).

When to Use Object Storage vs Database

  • Store in Object Storage — Images, videos, audio, PDFs, backups, logs, static website assets, ML training data, data lake files. Anything over 1MB or unstructured.
  • Store in Database — User metadata, relationships, transactional data, anything you need to query by fields, anything under 1MB that benefits from indexing.
  • The Hybrid Pattern — Store the blob in S3, store the metadata (filename, owner, size, S3 key) in the database. This is the universal pattern at FAANG: Instagram stores photo bytes in S3, photo metadata in Postgres.

Pre-signed URLs — The Upload Pattern

Never proxy large file uploads through your API servers. Instead: (1) Client requests upload URL from API. (2) API generates a pre-signed S3 URL (valid for 15 minutes). (3) Client uploads directly to S3 using the pre-signed URL. (4) S3 notifies your API via event (S3 Event Notification or Lambda). This keeps large files off your API servers, avoids timeouts, and lets S3 handle the heavy lifting.
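The core idea behind a pre-signed URL is an HMAC over the path and expiry that only the storage service can mint and verify. A conceptual sketch under stated assumptions: real S3 pre-signing uses the more involved SigV4 scheme (via boto3's generate_presigned_url), and the hostname, secret, and query format below are made up for illustration:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode, urlparse, parse_qs

SECRET = b"demo-signing-key"  # stands in for the storage service's key

def presign(bucket, key, expires_in=900):
    """Issue a URL authorizing one PUT to bucket/key for 15 minutes."""
    expires_at = int(time.time()) + expires_in
    payload = f"PUT:/{bucket}/{key}:{expires_at}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    qs = urlencode({"expires": expires_at, "signature": sig})
    return f"https://{bucket}.storage.example.com/{key}?{qs}"

def verify(bucket, key, expires_at, signature):
    """What the storage service does on upload: recompute and compare."""
    payload = f"PUT:/{bucket}/{key}:{expires_at}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature) and expires_at > time.time()
```

Because the signature covers the exact path and expiry, the client can upload directly to storage but cannot change the target key or extend the window — which is what makes handing the URL to an untrusted browser safe.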

Search Engines and Full-Text Search

Databases are terrible at full-text search. Running LIKE "%search term%" on a 100M row table forces a full table scan — O(n) per query. Search engines (Elasticsearch, Apache Solr, OpenSearch) use inverted indexes to answer queries in milliseconds, largely independent of total data size. If your system needs search, autocomplete, or faceted filtering, you need a dedicated search engine.

An inverted index maps every word to the list of documents containing it. When you search for "distributed caching," the engine intersects the document lists for "distributed" and "caching" in milliseconds. Add scoring (TF-IDF, BM25) and you get relevance ranking.
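A toy inverted index makes the mechanics concrete. The documents below are illustrative, and real engines add tokenization, stemming, and relevance scoring on top of this core:

```python
from collections import defaultdict

docs = {
    1: "distributed caching with redis",
    2: "distributed queues and kafka",
    3: "postgres indexing basics",
}

# Build the inverted index: word -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def search(query):
    """AND-search: intersect the posting lists of every query term."""
    terms = query.split()
    if not terms:
        return set()
    result = index[terms[0]].copy()
    for term in terms[1:]:
        result &= index[term]   # intersection shrinks fast, so this is cheap
    return result
```

Query cost depends on the length of the posting lists for the query terms, not on the total corpus size — which is the whole advantage over a table scan.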

| Feature | Elasticsearch | PostgreSQL Full-Text | Algolia (Managed) |
| --- | --- | --- | --- |
| Query Latency | 1-10ms for most queries | 10-100ms, degrades with table size | <5ms (optimized for speed) |
| Scaling | Horizontal sharding by index | Limited to single node per table | Fully managed, auto-scales |
| Features | Fuzzy search, aggregations, geo-search, ML ranking | Basic text search, trigram matching | Typo tolerance, faceting, instant search |
| Operational Cost | High — cluster management, reindex | Zero — built into DB | Low ops, high $ at scale |
| Best For | Log analytics, product search at scale | Small-medium datasets, simple search | E-commerce, SaaS product search |

Search is Eventually Consistent

Elasticsearch indexes data asynchronously. When a user creates a post and immediately searches for it, the search index may not have caught up. Design for this: show recently created items from the database, merge with search results. This is the pattern Twitter and LinkedIn use for "your own recent activity" in search.

In interviews, mention search when the system has: user-generated content that needs searching (tweets, products, articles), autocomplete/typeahead features, log aggregation (ELK stack), or complex filtering with facets (price range + category + brand on an e-commerce site).

Coordination Services and Service Discovery

In a distributed system with hundreds of microservices, how does Service A find Service B? How do you elect a leader when the current leader crashes? How do you manage distributed locks? Coordination services (ZooKeeper, etcd, Consul) solve these problems using consensus algorithms (Raft, Paxos) that guarantee consistency even when nodes fail.

What Coordination Services Do

  • Service Discovery — Services register themselves (IP, port, health). Other services query the registry to find endpoints. When a service scales up/down, the registry updates automatically.
  • Leader Election — When multiple instances of a service exist but only one should perform a task (e.g., cron jobs, partition assignment), the coordination service elects a leader and re-elects if the leader fails.
  • Distributed Locks — When multiple services need exclusive access to a resource (e.g., processing a payment), a distributed lock prevents double-processing. ZooKeeper ephemeral nodes provide lock semantics with automatic release on disconnect.
  • Configuration Management — Store and distribute configuration to all services. When config changes, all services are notified via watches/subscriptions. Feature flags, circuit breaker thresholds, and rate limits are managed this way at scale.
  • Cluster Membership — Track which nodes are alive, which partitions they own, and trigger rebalancing when nodes join or leave. Kafka uses ZooKeeper (or KRaft) for exactly this.

ZooKeeper vs etcd vs Consul

ZooKeeper: battle-tested, used by Kafka, HBase, Hadoop. Java-based, complex to operate. etcd: used by Kubernetes for all cluster state. Go-based, simpler API (key-value). Consul: HashiCorp product, includes built-in service mesh and health checking. Best developer experience. In interviews, any of these is a valid choice — the key is knowing when you need a coordination service at all.

When to mention coordination services in interviews: leader election scenarios (only one worker should process a partition), distributed cron (only one instance runs the scheduled job), partition assignment (which server owns which data range), and feature flag distribution (real-time config updates without redeployment).
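The distributed-lock semantics above (acquire only if absent or expired, release only by the owner) can be sketched without a network. The class below mimics the behavior of Redis `SET key value NX PX ttl` in memory; names and TTLs are illustrative:

```python
import time
import uuid

class LockStore:
    """In-memory stand-in for a distributed lock service: a lock is a
    (owner token, lease expiry) pair, and stale leases are reclaimable."""

    def __init__(self):
        self._locks = {}  # lock name -> (owner_token, expires_at)

    def acquire(self, name, ttl=10.0):
        now = time.time()
        held = self._locks.get(name)
        if held is not None and held[1] > now:
            return None                        # live lease held by someone else
        token = str(uuid.uuid4())              # proves ownership at release time
        self._locks[name] = (token, now + ttl)
        return token

    def release(self, name, token):
        held = self._locks.get(name)
        if held is not None and held[0] == token:  # only the owner may release
            del self._locks[name]
            return True
        return False
```

The lease (TTL) is what prevents a crashed holder from blocking everyone forever, and the owner token is what prevents a slow holder from releasing a lock that has since been re-acquired — both are interview-worthy details.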

Monitoring, Observability, and Alerting

A system without monitoring is flying blind. At FAANG scale, every system has three pillars of observability: metrics (quantitative measurements over time), logs (detailed event records), and traces (request path through distributed services). Together, they answer: "Is the system healthy? If not, what broke and why?"

| Pillar | What It Captures | Tools | Example |
| --- | --- | --- | --- |
| Metrics | Numeric values over time: QPS, latency percentiles, error rate, CPU/memory | Prometheus, Datadog, CloudWatch, Grafana | p99 latency spiked from 50ms to 2s at 14:30 UTC |
| Logs | Structured event records with context: timestamp, request ID, user ID, error message | ELK Stack, Splunk, CloudWatch Logs, Loki | ERROR 14:30:05 req=abc123 user=456 "Connection refused to payments-service:8080" |
| Traces | End-to-end request path across services with timing for each hop | Jaeger, Zipkin, AWS X-Ray, Datadog APM | Request abc123: API (2ms) → Auth (5ms) → DB (1500ms) → Response. DB is the bottleneck. |
graph TB
    App[Application] -->|"Emit metrics"| Prom[Prometheus]
    App -->|"Write logs"| ELK[ELK Stack]
    App -->|"Send traces"| Jaeger[Jaeger/Zipkin]
    Prom --> Grafana[Grafana Dashboards]
    Grafana --> Alert[Alert Manager]
    Alert --> PD[PagerDuty/Slack]
    ELK --> Search[Log Search + Analysis]
    Jaeger --> TraceUI[Trace Visualization]

    subgraph "The Four Golden Signals"
    GS1[Latency - how long requests take]
    GS2[Traffic - how many requests per second]
    GS3[Errors - what percentage of requests fail]
    GS4[Saturation - how full is the system]
    end

The three pillars of observability feed into dashboards, alerts, and debugging workflows

The Four Golden Signals (Google SRE)

Google SRE defines four signals that every service must monitor: (1) Latency: p50, p95, p99 response times. (2) Traffic: requests per second. (3) Errors: HTTP 5xx rate, application error rate. (4) Saturation: CPU utilization, memory usage, disk I/O, connection pool usage. If you monitor these four, you catch 95% of production issues before users notice.
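Percentile latency, the first golden signal, is just an order statistic over a window of samples. A nearest-rank sketch with illustrative latency numbers and alert thresholds (real systems use streaming estimators like histograms rather than sorting raw samples):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over a window of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# A window of request latencies in ms, including two slow outliers.
latencies = [12, 15, 11, 300, 14, 13, 16, 12, 15, 1800]
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
error_rate = 3 / 1000  # 3 failed requests out of 1000 in the window

# Alerting on the golden signals, e.g. p99 > 500ms or error rate > 1%.
alerts = []
if p99 > 500:
    alerts.append("p99-latency")
if error_rate > 0.01:
    alerts.append("error-rate")
```

Note how the median stays healthy while p99 screams: this is why averages hide outages and why the signals call out percentiles specifically.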

In system design interviews, always mention monitoring as part of your design. "I would add Prometheus metrics for QPS, latency percentiles, and error rates on every service, with Grafana dashboards and PagerDuty alerts when p99 latency exceeds 500ms or error rate exceeds 1%." This shows operational maturity that distinguishes L5+ candidates.

Putting It All Together: The Building Block Selection Framework

Now that you know the individual building blocks, the skill is knowing which ones to reach for given a set of requirements. Here is a decision framework that maps requirements to building blocks:

| Requirement | Building Block | Why |
| --- | --- | --- |
| Store structured data with relationships | SQL Database (Postgres) | ACID transactions, JOINs, complex queries |
| Store > 1TB of write-heavy data | NoSQL (Cassandra/DynamoDB) | Horizontal scaling, high write throughput |
| Reduce read latency by 10-100x | Cache (Redis) | Sub-millisecond reads from memory |
| Store images, videos, files | Object Storage (S3) | Exabyte scale, 11 nines durability, cheap |
| Decouple services, handle async work | Message Queue (Kafka/SQS) | Buffering, retry, parallel processing |
| Full-text search, autocomplete | Search Engine (Elasticsearch) | Inverted index, sub-10ms relevance search |
| Serve static assets globally | CDN (CloudFront) | Edge caching, 30x latency reduction |
| Distribute traffic across servers | Load Balancer (ALB/NLB) | Health checks, SSL termination, routing |
| Leader election, distributed locks | Coordination Service (ZooKeeper) | Consensus-based consistency guarantees |
| Know when things break | Monitoring (Prometheus + Grafana) | Metrics, alerts, dashboards |

The 30-Second Building Block Quiz

For any system design question, quickly identify: (1) What data do I store? → Database type. (2) What is read-heavy? → Add cache. (3) Any binary/media? → Object storage + CDN. (4) Any async work? → Queue + workers. (5) Any search? → Elasticsearch. (6) Multiple servers? → Load balancer. (7) How do I know it is working? → Monitoring. This checklist covers 90% of system design building block decisions.

graph TD
    Q1{What data do you store?} -->|Structured, relationships| SQL[SQL Database]
    Q1 -->|High write volume, simple access| NoSQL[NoSQL Database]
    Q1 -->|Binary files, media| S3[Object Storage]

    Q2{Is read latency critical?} -->|Yes, sub-ms needed| Redis[Redis Cache]
    Q2 -->|No, 10ms is fine| Direct[Direct DB reads]

    Q3{Any async processing?} -->|Yes| Queue[Message Queue]
    Q3 -->|No| Sync[Synchronous API]

    Q4{Global users?} -->|Yes| CDN[CDN + LB]
    Q4 -->|No, single region| LB[Load Balancer only]

    Q5{Need search or autocomplete?} -->|Yes| ES[Elasticsearch]
    Q5 -->|No| Skip[Skip search engine]

Building block selection decision tree — use this in every system design interview

How this might come up in interviews

Building block knowledge is tested in every system design interview, usually implicitly. The interviewer says "Design X" and watches whether you instinctively reach for the right components. Explicitly naming building blocks and justifying each choice is what separates strong candidates from average ones. Some interviewers ask targeted questions: "Why Redis and not Memcached?" or "Why Kafka instead of SQS?" to test depth.

Common questions:

  • L4: "What is the difference between a SQL and NoSQL database? When would you use each?" [Tests: basic understanding of data model trade-offs, ability to match database to access pattern]
  • L4-L5: "Explain cache-aside pattern. What happens when your cache goes down?" [Tests: caching fundamentals, understanding of cache stampede, ability to design for failure]
  • L5: "In a system with 100K read QPS, where would you add caching and what cache hit rate would you need to keep DB load under 10K QPS?" [Tests: capacity math with caching — need 90% hit rate]
  • L5-L6: "Compare Kafka and SQS. When would you use each in a microservices architecture?" [Tests: understanding of event streaming vs job queue patterns, ordering and replay semantics]
  • L6: "You are designing a notification system that sends push, email, and SMS. Which building blocks do you need and how do they connect?" [Tests: queue fan-out, idempotency, rate limiting, monitoring integration]
  • L6-L7: "Walk me through every building block in a production system you have built. What would you change if you had to handle 100x the current traffic?" [Tests: real-world experience, scaling reasoning, building block evolution under growth]
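
The cache-aside pattern asked about at L4-L5 fits in a few lines. In this sketch an in-process dict stands in for Redis and `load_from_db` is a hypothetical placeholder for the real query; the point is the read path — check the cache, on a miss read the database, then populate the cache. The L5 capacity math falls out directly: at a 90% hit rate, 100K read QPS puts only 10K QPS on the database.

```python
import time

class CacheAside:
    """Cache-aside sketch: a dict stands in for Redis, `load_from_db` is a
    hypothetical placeholder for the real DB query. Illustrative only."""

    def __init__(self, load_from_db, ttl_seconds=60):
        self._cache = {}              # key -> (value, expires_at)
        self._load = load_from_db
        self._ttl = ttl_seconds
        self.db_reads = 0             # instrumented for the demo

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                                   # cache hit
        value = self._load(key)                               # miss: read DB...
        self.db_reads += 1
        self._cache[key] = (value, time.time() + self._ttl)   # ...populate cache
        return value

store = CacheAside(load_from_db=lambda key: f"row-for-{key}")
for _ in range(100):
    store.get("user:42")
print(store.db_reads)   # 1 -- one DB read served 100 requests (99% hit rate)
```

If the cache process dies, every `get` falls through to `self._load` — which is exactly the failure mode the interview question probes.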

Key takeaways

  • Every distributed system is built from ~12 standard building blocks: SQL/NoSQL databases, caches, queues, object storage, CDNs, load balancers, search engines, coordination services, and monitoring. Knowing when to use each one is the core skill of system design.
  • The database choice is your most impactful decision: SQL for relationships and complex queries, NoSQL for high write throughput and horizontal scaling. Most FAANG systems use multiple databases, each optimized for its access pattern.
  • Caching provides 10-100x latency improvement. Cache-aside is the default pattern. Cache invalidation is the hardest problem — use TTL as the baseline, event-driven invalidation for consistency-critical data.
  • Message queues decouple services, absorb traffic spikes, and enable async processing. Use Kafka for event streaming, SQS for simple job queues. Always design consumers for idempotency since at-least-once delivery means duplicates.
  • In every system design interview, always mention: what database and why, whether you need a cache, where async processing fits, and how you will monitor the system. This demonstrates the operational maturity that distinguishes senior engineers.

Before you move on: can you answer these?

You are asked to design a photo-sharing app like Instagram. List every building block you would use and explain why each is needed.

SQL Database (Postgres): store user profiles, follow relationships, photo metadata (owner, caption, timestamp, likes count) — needs complex queries and strong consistency. Object Storage (S3): store actual photo files — cheap, durable, scales to petabytes. CDN (CloudFront): serve photos from edge locations globally — reduces latency from 200ms to 20ms for users far from origin. Redis Cache: cache user feeds, profile data, popular photo metadata — reduces DB load by 90%+ given heavy read pattern. Message Queue (Kafka): handle async tasks — photo processing (resize, filter), notification delivery, feed fanout to followers. Elasticsearch: power search for users, hashtags, locations. Load Balancer (ALB): distribute traffic across API servers, health checks, SSL termination. Background Workers: consume from Kafka to process photos (generate thumbnails at 5 resolutions), send push notifications, update follower feeds. Monitoring (Prometheus + Grafana): track upload success rate, feed generation latency, CDN hit rate, DB connection pool utilization.

Your Redis cache just went down and you are seeing a massive spike in database load. What is happening and how do you fix it immediately?

This is a system-wide cache-miss storm — the thundering-herd failure mode at full scale (a whole-cache cousin of the classic cache stampede, where one hot key expires). With Redis down, 100% of read requests that were being served from cache (~95% of all reads) now hit the database directly. If the system was handling 100K read QPS at a 95% cache hit rate, the database was seeing ~5K QPS; it is now seeing 100K QPS — a 20x increase it cannot handle. Immediate fixes: (1) Cap the database connection pool and queue excess requests (with a timeout) so they back up gracefully instead of exhausting DB connections. (2) If you have a fallback Memcached cluster or read replicas, route traffic there. (3) Implement request coalescing: if 1,000 requests arrive for the same cache key, let one hit the database and have the rest wait for its result. (4) Apply aggressive rate limiting at the load balancer. (5) Bring Redis back ASAP — restart with persistence (RDB snapshot) so the cache warms from the last snapshot rather than starting cold. Long-term: deploy Redis in cluster mode with replication so a single node failure does not lose the entire cache.
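
Request coalescing is often called single-flight. Below is a minimal thread-based sketch, assuming a `load` callable that performs the database read; for brevity, completed entries are never evicted, so a production version would expire them to allow fresh reads later:

```python
import threading

class SingleFlight:
    """Request-coalescing sketch: concurrent callers for the same key share
    one backend call. Completed entries are never evicted here; production
    code would expire them so later reads can refresh."""

    def __init__(self):
        self._lock = threading.Lock()
        self._flights = {}   # key -> (done_event, result_box)

    def do(self, key, load):
        with self._lock:
            if key not in self._flights:
                self._flights[key] = (threading.Event(), {})
                leader = True
            else:
                leader = False
            event, box = self._flights[key]
        if leader:
            box["value"] = load()   # only the leader hits the database
            event.set()
        else:
            event.wait()            # followers wait for the leader's result
        return box["value"]

db_calls = 0
def slow_db_read():
    global db_calls
    db_calls += 1
    return "hot value"

sf = SingleFlight()
threads = [threading.Thread(target=sf.do, args=("hot-key", slow_db_read))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_calls)   # 1 -- fifty concurrent requests, one database read
```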

An interviewer says: "Your system needs to handle 500K messages per second from IoT devices." Which building blocks do you reach for and why?

500K messages/sec is extremely high write throughput. Building blocks: (1) Load Balancer (NLB, not ALB): use L4 load balancing for raw TCP/UDP since IoT protocols are often lightweight (MQTT, CoAP) and L7 parsing overhead is wasteful at this scale. (2) Message Queue (Kafka): Kafka handles millions of messages/sec with horizontal partitioning. Partition by device_id for ordering within a device. 500K msg/sec needs ~50 partitions across 10+ brokers. (3) Write-optimized time-series storage (Cassandra, or TimescaleDB if you want SQL semantics): IoT data is time-series — append-only writes ordered by timestamp. Cassandra handles high write throughput natively. Partition key = device_id, clustering column = timestamp. (4) Object Storage (S3): for raw data archival — store compressed hourly batches for long-term analysis. (5) Redis: cache recent device state for real-time dashboards (last heartbeat, latest reading). (6) Stream Processing (Kafka Streams or Flink): real-time anomaly detection on the message stream — alert if temperature exceeds threshold or heartbeat stops. (7) Monitoring: critical at this scale — track message lag, consumer throughput, and storage growth rate.
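
The partition and storage math in that answer is worth being able to do on a whiteboard. A quick back-of-envelope script — the per-partition throughput and average message size are planning assumptions, not hard limits:

```python
# Back-of-envelope sizing for a 500K msg/sec ingest pipeline. The per-partition
# rate and message size below are planning assumptions, not measured limits.
TOTAL_MSG_PER_SEC = 500_000
PER_PARTITION_MSG_PER_SEC = 10_000   # assumed sustainable Kafka partition rate
MSG_BYTES = 1_000                    # assumed average message size (~1 KB)

partitions = TOTAL_MSG_PER_SEC // PER_PARTITION_MSG_PER_SEC
ingest_gb_per_sec = TOTAL_MSG_PER_SEC * MSG_BYTES / 1e9
raw_tb_per_day = ingest_gb_per_sec * 86_400 / 1_000

print(partitions)                    # 50 partitions
print(round(ingest_gb_per_sec, 2))   # 0.5 GB/sec of raw ingest
print(round(raw_tb_per_day, 1))      # 43.2 TB/day before compression
```

The last number is why the answer archives compressed batches to S3: keeping ~43 TB/day of raw data hot in Cassandra is rarely justified.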

🧠Mental Model

💡 Analogy

Building blocks in system design are like the standard parts in a hardware store. A plumber does not forge their own pipes — they select from standardized parts (pipes, fittings, valves, seals) and assemble them for the specific building. Similarly, system designers do not invent new storage engines or consensus protocols — they select from standardized building blocks (databases, caches, queues, load balancers) and assemble them for the specific requirements. The skill is not in knowing each part exists — it is in knowing which combination solves the problem at hand, how they connect, and which will fail first under load.

⚡ Core Idea

Every distributed system at any scale is composed of the same ~12 building blocks: SQL databases, NoSQL databases, caches, message queues, object storage, CDNs, load balancers, reverse proxies, search engines, coordination services, monitoring systems, and API gateways. Mastery means knowing what each does, when to reach for it, how it fails, and how it connects to the others. A system design interview is fundamentally a building block selection and assembly exercise.

🎯 Why It Matters

Engineers who do not know the building blocks waste months reinventing them poorly. They store binary files in databases (slow, expensive), skip caching (10x slower), process heavy tasks synchronously (timeouts, poor UX), or skip monitoring (blind to production issues). Knowing the standard toolkit lets you design correct systems in 45 minutes instead of learning expensive lessons over 45 months.
