The Simplified Tech

© 2026 TheSimplifiedTech. All rights reserved.

Interactive Explainer

Scalable Backend: From Thousands to Billions of Users

The architecture that works at 1k users will break at 100k. Plan ahead, not too far.

🎯 Key Takeaways

  • Build for 10× current scale, not 1000×. Over-engineering kills startups.
  • Stateless services scale horizontally. All state goes to shared storage (Redis, PostgreSQL, S3).
  • The scaling ladder: optimize → vertical scale → read replicas → cache → message queue → partition → shard.
  • Idempotency keys prevent double-processing on client retries — critical for payments and state-modifying operations.
  • Circuit breakers, bulkheads, and timeouts with backoff are essential for resilience at scale.
  • Four Golden Signals: Latency, Traffic, Errors, Saturation — monitor and alert on these for every service.


The Scaling Journey: How Real Companies Grew

Instagram had 13 employees and 30 million users at acquisition by Facebook in 2012. They ran PostgreSQL, Redis, and Gearman on a handful of machines. They scaled by being pragmatic, not by over-engineering early.

The Scaling Rule of Thumb

Build for 10× your current scale. If you have 10k users, build for 100k. Don't build for 1 billion users on day one — the architecture for 1 billion is completely different from 100k, and over-engineering kills startups.

| Phase   | Users    | Architecture                                     | Key additions                                        |
|---------|----------|--------------------------------------------------|------------------------------------------------------|
| Phase 1 | 0–10k    | Monolith + single DB                             | Focus on product, not infrastructure                 |
| Phase 2 | 10k–100k | Monolith + read replicas + CDN + Redis cache     | Add Redis, deploy to multiple AZs                    |
| Phase 3 | 100k–1M  | Modular monolith or 3–5 services + message queue | Queue async work, separate auth/search               |
| Phase 4 | 1M–10M   | 5–20 services + multi-region + sharding          | Geographic distribution, data sharding               |
| Phase 5 | 10M–1B+  | 50–500+ services + custom infra                  | Custom storage, global load balancing, edge compute  |

Horizontal vs Vertical Scaling

| Type                    | How                                      | When to use                                            | Limitation                                             |
|-------------------------|------------------------------------------|--------------------------------------------------------|--------------------------------------------------------|
| Vertical (scale up)     | Bigger machine: more RAM/CPU             | Stateful services (databases), before adding complexity | Hardware limits; single point of failure; expensive    |
| Horizontal (scale out)  | More instances behind a load balancer    | Stateless services (API servers, workers)              | Requires stateless design; more ops complexity         |
| Read replicas           | Route reads to replicas                  | Read-heavy workloads (>70% reads)                      | Replication lag; eventual consistency                  |
| Sharding                | Split data across multiple DB instances  | When a single DB can't handle write throughput         | No cross-shard JOINs; complex ops — last resort        |
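Sharding in the last row can be sketched as a simple hash-based router. The shard names and the FNV-1a hash below are illustrative assumptions, not a production scheme — real deployments typically use consistent hashing so that adding a shard doesn't reshuffle most keys:

```typescript
// Illustrative shard router: pick a database shard from a user ID by hashing.
const SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"];

function hashKey(key: string): number {
  // FNV-1a: a simple, stable string hash (not cryptographic)
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32-bit
  }
  return h;
}

function shardFor(userId: string): string {
  // Same user always maps to the same shard, so lookups stay single-shard
  return SHARDS[hashKey(userId) % SHARDS.length];
}
```

Because every query must be routed by the shard key, cross-shard JOINs disappear — which is exactly why the table calls sharding a last resort.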

The Stateless Service Rule

Horizontal scaling only works if services are stateless. No in-memory sessions, no local file storage. All state goes to Redis (sessions), PostgreSQL (data), S3 (files). If your service can be killed and restarted without losing user data, it scales horizontally.

stateless-service.ts

```typescript
// Making services stateless for horizontal scaling

// ❌ Stateful — breaks with multiple instances
class StatefulService {
  private processingOrders = new Map<string, Order>(); // in-memory

  async getStatus(orderId: string) {
    // In-memory state breaks with multiple instances — the load balancer
    // routes each request to a different instance
    return this.processingOrders.get(orderId); // ❌ only works on THIS instance
  }
}

// ✅ Stateless — works with any number of instances
class StatelessService {
  async startProcessing(orderId: string) {
    const order = await db.orders.findById(orderId);

    // Store state in shared Redis — visible to ALL instances,
    // the shared source of truth for distributed state
    await redis.setex(
      `processing:${orderId}`,
      3600,
      JSON.stringify({ status: 'processing', startedAt: Date.now() })
    );
  }

  async getStatus(orderId: string) {
    const data = await redis.get(`processing:${orderId}`);
    return data ? JSON.parse(data) : null;
    // ✅ Any instance can answer — they all read from the same Redis cluster
  }
}

// Kubernetes HorizontalPodAutoscaler scales pod count automatically
// based on CPU/memory/custom metrics:
//
// apiVersion: autoscaling/v2
// kind: HorizontalPodAutoscaler
// spec:
//   scaleTargetRef:
//     name: order-service
//   minReplicas: 2
//   maxReplicas: 50
//   metrics:
//     - type: Resource
//       resource:
//         name: cpu
//         target:
//           averageUtilization: 70  # scale when avg CPU > 70%
```

Essential Scalability Patterns

The Scalability Pattern Toolkit

  • 📦CQRS (Command Query Responsibility Segregation) — Separate read and write models. Writes update command model; async events update read model (denormalized for fast reads). Used by Facebook, Twitter, Netflix.
  • 🔒Idempotency keys — Client generates UUID; server processes request exactly once even if client retries. Stripe uses this for all payment APIs. Prevents double-charges on network retries.
  • 🏗️Bulkheads — Isolate failures: if search is slow, checkout still works. Separate thread pools and connection pools per downstream service.
  • Timeout + Retry with Backoff — Every external call has a timeout. Retries use exponential backoff + jitter. Max retries = 3 is the typical production limit.
  • 🔄Circuit Breaker — After N% failures, fail fast without hitting the downstream service. Prevents cascade failures. Auto-resets after a timeout window.
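The idempotency-key pattern above can be sketched in a few lines. This uses an in-memory Map as a stand-in for the shared Redis store a real multi-instance deployment would need, and the `PaymentService` names are illustrative, not Stripe's API:

```typescript
// Sketch: the server records the result of each idempotency key, so a
// client retry returns the stored result instead of charging again.
type ChargeResult = { chargeId: string; amount: number };

class PaymentService {
  private results = new Map<string, ChargeResult>(); // keyed by idempotency key
  private nextId = 1;

  async charge(idempotencyKey: string, amount: number): Promise<ChargeResult> {
    // Seen this key before? Return the recorded result — no second charge.
    const cached = this.results.get(idempotencyKey);
    if (cached) return cached;

    // First time: perform the charge and record the result under the key
    const result: ChargeResult = { chargeId: `ch_${this.nextId++}`, amount };
    this.results.set(idempotencyKey, result);
    return result;
  }
}
```

The client generates the UUID key once per logical operation and reuses it on every retry; in a real system the store-and-check step must be atomic (e.g. Redis `SET NX`) to handle concurrent retries.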

The Four Golden Signals (Google SRE)

Monitor for every service: (1) Latency — how long requests take. (2) Traffic — requests/sec. (3) Errors — error rate. (4) Saturation — how "full" the service is (CPU%, queue depth, connection pool%). Alert on Golden Signals — they represent everything users care about.
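A minimal in-process sketch of tracking the four signals — the class and field names here are illustrative, not a real metrics library such as a Prometheus client:

```typescript
// Track Latency, Traffic, Errors, and Saturation for one service.
class GoldenSignals {
  private latencies: number[] = [];
  private requests = 0;
  private errors = 0;

  record(latencyMs: number, isError: boolean) {
    this.requests++;                 // Traffic: requests seen
    this.latencies.push(latencyMs);  // Latency: per-request duration
    if (isError) this.errors++;      // Errors: failed requests
  }

  snapshot(inFlight: number, capacity: number) {
    const sorted = [...this.latencies].sort((a, b) => a - b);
    const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
    return {
      p95LatencyMs: sorted[idx] ?? 0,
      requests: this.requests,
      errorRate: this.requests ? this.errors / this.requests : 0,
      saturation: inFlight / capacity, // e.g. connection-pool fullness
    };
  }
}
```

In production these would be exported to a metrics backend and alerted on, rather than computed in-process.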

How this might come up in interviews

Scalability is the most common senior/staff interview topic. Show you understand the progression from simple to complex and don't jump to "use Kafka and microservices" for 1000 users.

Common questions:

  • Design Twitter's backend for 500M daily active users
  • Walk me through how you would scale a database that's a bottleneck
  • What is CQRS and when would you use it?
  • Design a URL shortener handling 10 billion URLs

Strong answers include:

  • Start simple and add complexity only as scale demands it
  • Mention stateless services as a prerequisite for horizontal scaling
  • Apply the scaling ladder in order
  • Discuss the idempotency and retry problem unprompted
  • Ask about the read/write ratio before designing

Red flags:

  • "Just shard from the start"
  • "Use microservices from day one" for a startup
  • No understanding of stateless design
  • Doesn't know idempotency keys



From the books

Designing Data-Intensive Applications — Martin Kleppmann (2017)

Part II: Distributed Data

The most comprehensive treatment of distributed databases, replication, partitioning, and transactions. Read before making any database architecture decision.

The Architecture of Open Source Applications (Volume 2) — Amy Brown, Greg Wilson (2012)

Instagram's architecture

How Instagram scaled to 30M users with 13 engineers and simple technology choices. The lesson: simplicity and operational clarity beat cutting-edge complexity.
