The hardest problems in distributed systems: consensus algorithms, consistency models, partition tolerance, distributed transactions, and the failure modes that appear only at scale.
In a single-process application, operations happen in order, data is consistent, and function calls either succeed or fail. In a distributed system, none of these are guaranteed. Two operations from different servers may appear to happen "at the same time." The same data may look different from two different servers. A remote call may succeed, fail, or — most insidiously — partially succeed (executed but never acknowledged).
The first principle of distributed systems: partial failures are the norm. A network packet may be delivered zero, one, or multiple times. A server may crash between writing data and acknowledging the write. A clock drift of 200ms can reorder events that "happened simultaneously." Designing for these failures is not paranoia — it is engineering.
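A standard defense against the "executed but never acknowledged" failure is to make retries safe with an idempotency key: the client sends the same key on every retry, and the server executes the operation at most once. A minimal sketch (all names here are hypothetical, with an in-memory map standing in for a durable store):

```typescript
// Idempotency-key sketch: a retried request returns the stored result
// instead of executing the side effect a second time.
const processed = new Map<string, string>(); // idempotencyKey -> paymentId

function chargeOnce(idempotencyKey: string, amountCents: number): string {
  const existing = processed.get(idempotencyKey);
  if (existing !== undefined) return existing; // already executed: replay result

  const paymentId = `pay_${processed.size + 1}`; // stand-in for the real side effect
  processed.set(idempotencyKey, paymentId);      // record BEFORE acknowledging
  return paymentId;
}
```

If the acknowledgment is lost, the client retries with the same key and gets the original result back, so the charge happens exactly once from the caller's point of view.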
A consistency model defines what values a read can return relative to previous writes. From strongest to weakest:
Linearizability (strong consistency): Every read sees the most recent write, globally. The system appears as a single, sequential machine. Expensive — requires coordination. Example: etcd, ZooKeeper (for writes), a single-node PostgreSQL instance.
Sequential consistency: Operations appear in the same order to all nodes, but not necessarily in real-time order. Rare in practice.
Causal consistency: Causally related operations are seen in the same order by all nodes. Unrelated operations may be seen in different orders. Example: Cosmos DB in session mode.
Eventual consistency: Given no new updates, all replicas will converge to the same value — eventually. No guarantees about how long. Example: DynamoDB by default, Cassandra, DNS.
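A toy model of how eventually consistent replicas converge, assuming a last-writer-wins merge (a deliberate simplification; real systems use vector clocks, CRDTs, or quorum reads to handle concurrent writes more carefully):

```typescript
// Toy eventual consistency: two replicas accept writes independently, then
// converge by exchanging state and keeping the write with the latest timestamp.
type Versioned = { value: string; ts: number };

function merge(local: Versioned, remote: Versioned): Versioned {
  return remote.ts > local.ts ? remote : local; // last writer wins
}
```

After an anti-entropy exchange in both directions, the replicas hold the same value — "eventually" is however long that exchange takes to happen.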
Most apps can live with eventual consistency — with careful design
User profile updates: eventual consistency is fine (a few seconds of stale data is not a problem). Bank account balance: linearizability is required (even momentary inconsistency enables double-spend). Choose your consistency model based on the cost of an inconsistency.
```typescript
// Consistency model selection based on use case
import {
  DynamoDBClient,
  PutItemCommand,
  GetItemCommand,
  UpdateItemCommand,
} from '@aws-sdk/client-dynamodb';

// ✅ Eventually consistent: user preferences (stale data for 5s is fine)
const dynamoDb = new DynamoDBClient({});
await dynamoDb.send(new PutItemCommand({
  TableName: 'UserPreferences',
  Item: { userId: { S: userId }, theme: { S: 'dark' } },
  // Default: eventual consistency — written to 1 replica, others catch up
}));

// For reading: eventually consistent (cheaper, faster). Use for non-critical reads.
const readResult = await dynamoDb.send(new GetItemCommand({
  TableName: 'UserPreferences',
  Key: { userId: { S: userId } },
  ConsistentRead: false, // Eventual consistency (default)
}));

// ✅ Strongly consistent: account balance (stale data = double-spend risk)
const strongReadResult = await dynamoDb.send(new GetItemCommand({
  TableName: 'AccountBalances',
  Key: { accountId: { S: accountId } },
  ConsistentRead: true, // Strong consistency (2x read cost, always reads from the primary)
}));

// ✅ Optimistic concurrency: prevent lost updates without a distributed lock,
// using a DynamoDB conditional expression (compare-and-swap)
await dynamoDb.send(new UpdateItemCommand({
  TableName: 'AccountBalances',
  Key: { accountId: { S: accountId } },
  UpdateExpression: 'SET balance = :newBalance, version = :newVersion',
  ConditionExpression: 'version = :expectedVersion', // Fails if version changed
  ExpressionAttributeValues: {
    ':newBalance': { N: String(newBalance) },
    ':newVersion': { N: String(currentVersion + 1) },
    ':expectedVersion': { N: String(currentVersion) },
  },
  // If the condition fails → ConditionalCheckFailedException → retry with a fresh read
}));
```
Two-Phase Commit (2PC): A coordinator asks all participants to "prepare" (vote). If all agree, coordinator sends "commit." Atomic across all participants. Problem: blocking — if the coordinator crashes after prepare but before commit, participants are locked waiting indefinitely. Use only for same-database transactions.
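The protocol above can be sketched with in-memory participants (hypothetical types; a real coordinator also logs its decision durably before phase 2):

```typescript
// 2PC sketch: commit happens only if EVERY participant votes yes in phase 1.
// Between prepare() and commit()/abort(), participants hold locks — this is
// the window where a crashed coordinator leaves them blocked.
type Participant = {
  prepare: () => boolean; // phase 1 vote: "can I commit?"
  commit: () => void;     // phase 2, on unanimous yes
  abort: () => void;      // phase 2, on any no
};

function twoPhaseCommit(participants: Participant[]): 'committed' | 'aborted' {
  const votes = participants.map((p) => p.prepare()); // phase 1: collect votes
  if (votes.every(Boolean)) {
    participants.forEach((p) => p.commit());          // phase 2: unanimous yes
    return 'committed';
  }
  participants.forEach((p) => p.abort());             // any no → abort everywhere
  return 'aborted';
}
```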
Saga pattern: A sequence of local transactions, each publishing an event or message to trigger the next step. If one step fails, compensating transactions undo previous steps. Non-blocking, resilient to failures. The standard approach for distributed transactions in microservices.
Saga for e-commerce checkout:
1. Reserve inventory (local tx in Inventory service)
2. Charge payment (local tx in Payment service) — if it fails → release inventory
3. Create order (local tx in Orders service) — if it fails → refund payment, release inventory
4. Send confirmation email (local tx in Notification service) — if it fails → just retry (email is idempotent)
Choreography vs Orchestration: Choreography — each service reacts to events published by others. Simple but hard to trace. Orchestration — a central saga coordinator drives the steps. Easier to trace, single point of failure if not built with care.
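The choreography style can be sketched with an in-memory event bus (hypothetical names): there is no coordinator, and the checkout flow emerges purely from which service subscribes to which event.

```typescript
// Choreography sketch: each service reacts to the previous service's event.
type Handler = (payload: unknown) => void;
const subscribers = new Map<string, Handler[]>();
const log: string[] = []; // record of published events, in order

function subscribe(event: string, handler: Handler): void {
  const list = subscribers.get(event) ?? [];
  list.push(handler);
  subscribers.set(event, list);
}

function publish(event: string, payload: unknown = {}): void {
  log.push(event);
  (subscribers.get(event) ?? []).forEach((h) => h(payload));
}

// The saga chain exists only implicitly, in the subscriptions:
subscribe('OrderPlaced', () => publish('InventoryReserved'));
subscribe('InventoryReserved', () => publish('PaymentCharged'));
subscribe('PaymentCharged', () => publish('OrderConfirmed'));
```

This is exactly why choreography is hard to trace: to reconstruct the flow you must read every service's subscriptions, whereas an orchestrator states the sequence in one place.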
```typescript
// Saga Orchestrator: manages the distributed checkout transaction

type SagaStep<T> = {
  name: string;
  execute: (ctx: T) => Promise<T>;
  compensate: (ctx: T) => Promise<void>; // Undo if a later step fails
};

async function runSaga<T>(initialCtx: T, steps: SagaStep<T>[]): Promise<T> {
  const completed: SagaStep<T>[] = [];
  let ctx = initialCtx;

  for (const step of steps) {
    try {
      ctx = await step.execute(ctx);
      completed.unshift(step); // Track for compensation (reverse order)
    } catch (error) {
      console.error(`Saga failed at step: ${step.name}`, error);
      // Compensating transactions run in REVERSE order of execution
      for (const doneStep of completed) {
        try {
          await doneStep.compensate(ctx);
        } catch (compensateError) {
          // Compensation failure requires manual intervention — alert immediately
          await sendAlertToOncall(`Compensation failed: ${doneStep.name}`, compensateError);
        }
      }
      throw error;
    }
  }
  return ctx;
}

// Checkout saga steps
interface CheckoutCtx {
  userId: string;
  orderId?: string;
  reservationId?: string;
  paymentId?: string;
}

const checkoutSaga: SagaStep<CheckoutCtx>[] = [
  {
    name: 'reserve-inventory',
    execute: async (ctx) => ({
      ...ctx,
      reservationId: await inventoryService.reserve(ctx.userId),
    }),
    compensate: async (ctx) => {
      if (ctx.reservationId) await inventoryService.release(ctx.reservationId);
    },
  },
  {
    name: 'charge-payment',
    execute: async (ctx) => ({
      ...ctx,
      paymentId: await paymentService.charge(ctx.userId, getOrderTotal(ctx)),
    }),
    compensate: async (ctx) => {
      if (ctx.paymentId) await paymentService.refund(ctx.paymentId);
    },
  },
  {
    name: 'create-order',
    execute: async (ctx) => ({
      ...ctx,
      orderId: await orderService.create(ctx),
    }),
    compensate: async (ctx) => {
      if (ctx.orderId) await orderService.cancel(ctx.orderId);
    },
  },
];
```
Many distributed systems need a single "leader" to coordinate writes: database primary, job scheduler, distributed lock holder. How do nodes agree on who is leader when any node can fail at any time?
Raft is the most understandable consensus algorithm (unlike Paxos, which is notoriously complex). Each node is in one of three states: Follower (receives entries from the Leader), Candidate (campaigning to become Leader), or Leader (accepts writes and replicates them to Followers).
Leader election: if a Follower does not hear from the Leader for a randomized timeout, it becomes a Candidate, increments its term, and asks others to vote. A node wins if it gets votes from a majority (quorum). Safety: only one leader per term; a leader has all committed entries.
Why quorum matters: With 3 nodes, you need 2 (majority) to agree. If one node fails, the system still works. With 5 nodes, you can tolerate 2 failures. Never run with 2 nodes — you need 2 for quorum, so if one fails, the system halts.
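The election and quorum rules above can be illustrated with a toy vote-granting function (a simplification: a real Raft node also compares log up-to-dateness before granting a vote, and candidates time out and retry):

```typescript
// Toy Raft election: a node grants at most one vote per term, and a
// candidate wins only with votes from a majority (quorum).
type RaftNode = { id: string; term: number; votedFor: string | null };

function requestVote(node: RaftNode, candidateId: string, candidateTerm: number): boolean {
  if (candidateTerm < node.term) return false; // stale candidate: reject
  if (candidateTerm > node.term) {
    node.term = candidateTerm; // newer term: adopt it, forget the old vote
    node.votedFor = null;
  }
  if (node.votedFor === null || node.votedFor === candidateId) {
    node.votedFor = candidateId; // at most one vote per term
    return true;
  }
  return false;
}

function wonElection(candidateId: string, term: number, cluster: RaftNode[]): boolean {
  const votes = cluster.filter((n) => requestVote(n, candidateId, term)).length;
  return votes >= Math.floor(cluster.length / 2) + 1; // quorum = majority
}
```

The "one vote per term" rule is what makes two leaders in the same term impossible: two candidates cannot both collect a majority when every node votes at most once.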
Use existing consensus implementations
Never implement Raft or Paxos yourself. Use etcd, ZooKeeper, or Consul for distributed coordination. They have years of battle-testing that your custom implementation will not have.
Circuit breaker: Wraps calls to an external service. Tracks failure rate. When failure rate exceeds threshold, "opens" the circuit — subsequent calls fail immediately without calling the service. After a cooldown, allows a test request to see if the service has recovered.
Bulkhead: Isolate failures using separate resource pools per dependency. If the payment service thread pool is exhausted by slow payment calls, it does not affect the inventory service thread pool. Named after ship bulkheads that prevent flooding one compartment from sinking the ship.
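The bulkhead idea reduces to one small concurrency limiter per dependency. A minimal sketch (hypothetical implementation; production code would also bound the wait queue and reject when it overflows):

```typescript
// Bulkhead sketch: a tiny async semaphore caps concurrent calls per
// dependency, so a slow dependency queues behind its own limit instead of
// exhausting shared capacity.
class Bulkhead {
  private active = 0;
  private waiters: (() => void)[] = [];

  constructor(private limit: number) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    while (this.active >= this.limit) {
      // Pool full: wait until a slot frees up, then re-check
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.active++;
    try {
      return await fn();
    } finally {
      this.active--;
      this.waiters.shift()?.(); // wake one waiter
    }
  }
}

// Separate pools per dependency: slow payment calls cannot starve inventory calls
const paymentPool = new Bulkhead(2);
const inventoryPool = new Bulkhead(10);
```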
Timeout everywhere: Every network call must have a timeout. Without timeouts, a slow dependency eventually exhausts your thread pool. Rule of thumb: set timeout to 2x the p99 latency of the dependency under normal load.
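A generic timeout wrapper is only a few lines. A sketch (an AbortSignal is passed through so the underlying request can actually be cancelled rather than just abandoned):

```typescript
// Timeout sketch: reject if the call does not settle within the budget,
// and signal the callee to cancel its work.
async function withTimeout<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await Promise.race([
      fn(controller.signal),
      new Promise<never>((_, reject) =>
        controller.signal.addEventListener('abort', () =>
          reject(new Error(`Timed out after ${timeoutMs}ms`)),
        ),
      ),
    ]);
  } finally {
    clearTimeout(timer); // don't leak the timer when fn settles first
  }
}
```

Following the rule of thumb above, `timeoutMs` would be set to roughly 2x the dependency's p99 latency under normal load.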
Retry with exponential backoff and jitter: Retry transient failures. Exponential backoff prevents retry storms. Jitter randomizes the backoff to prevent synchronized retries from all clients hitting the server at the same instant.
```typescript
// Circuit Breaker implementation
// States: CLOSED (normal) → OPEN (failing) → HALF_OPEN (testing recovery)

class CircuitBreaker {
  private failures = 0;
  private lastFailureTime = 0;
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';

  constructor(
    private threshold = 5, // Open after 5 consecutive failures
    private recoveryTimeout = 30000, // Try again after 30s
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      // After the cooldown, allow a test request to see if the service recovered
      if (Date.now() - this.lastFailureTime > this.recoveryTimeout) {
        this.state = 'HALF_OPEN';
        this.failures = 0;
      } else {
        throw new Error('Circuit breaker OPEN — service unavailable');
      }
    }

    try {
      const result = await fn();
      if (this.state === 'HALF_OPEN') {
        this.state = 'CLOSED'; // Recovery confirmed
        console.log('Circuit breaker CLOSED — service recovered');
      }
      this.failures = 0;
      return result;
    } catch (error) {
      this.failures++;
      this.lastFailureTime = Date.now();
      if (this.failures >= this.threshold) {
        this.state = 'OPEN';
        console.error(`Circuit breaker OPEN after ${this.failures} failures`);
      }
      throw error;
    }
  }
}

// Retry with exponential backoff + jitter
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxAttempts) throw error;
      // Jitter (Math.random() * 100) prevents a thundering herd of synchronized retries
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      console.log(`Attempt ${attempt} failed, retrying in ${delay.toFixed(0)}ms`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error('Should never reach here');
}
```
Distributed systems questions test whether you understand failure modes that only appear in production at scale.
From the books
Designing Data-Intensive Applications — Martin Kleppmann (2017)
Chapters 8-9: Trouble with Distributed Systems, Consistency and Consensus
The most important insight in distributed systems: a node cannot distinguish between a slow network and a crashed remote node. This uncertainty is fundamental and cannot be eliminated — only managed.