Interactive Explainer

Load Balancers: L4, L7, Algorithms & Failure Modes

How load balancers actually work inside. L4 vs L7. Round-robin, least connections, IP hash — when each algorithm is right. Health checks, sticky sessions, SSL termination. Real AWS ALB vs NLB configurations.

🎯 Key Takeaways

  • L4 load balancers operate at TCP/UDP level — fast, protocol-agnostic, no content inspection. L7 operate at HTTP level — support path routing, header manipulation, SSL termination, WebSocket. Use L7 (ALB) for web/API, L4 (NLB) for non-HTTP protocols or maximum throughput.
  • Health checks must be deep, not shallow — test that DB and Redis connections work, not just that the server process is running. A server that returns 200 from /health but can't reach the DB will fail all real requests.
  • Sticky sessions are a symptom of stateful servers, not a solution. The correct fix is always to externalize session state to Redis — then any server can serve any request and you never need sticky sessions.
  • Algorithms matter: round-robin for uniform short requests; least-outstanding-requests for variable-duration requests (file uploads, streaming); IP-hash for stateful protocols (WebSocket, game servers).
  • Configure asymmetric connection draining: fast health check failure detection (unhealthy after 3 failures × 15 seconds = 45 seconds), but slow connection draining (300 seconds) to let in-progress requests complete before removing a server.


~22 min read

What a Load Balancer Actually Does

A load balancer sits between clients and a pool of backend servers, distributing incoming requests across available servers. It sounds simple, but load balancers are responsible for three critical functions that, if misconfigured, cause major outages: traffic distribution, health monitoring, and transparent failover.

Without a load balancer, adding more servers accomplishes nothing: clients have no single entry point that spreads their traffic across the pool. The load balancer is what makes horizontal scaling possible.

graph LR
    C1[Client 1] --> LB[Load Balancer]
    C2[Client 2] --> LB
    C3[Client N] --> LB
    LB -->|Round Robin| S1["Server 1<br/>✓ Healthy"]
    LB -->|Skip| S2["Server 2<br/>✗ Unhealthy"]
    LB -->|Round Robin| S3["Server 3<br/>✓ Healthy"]
    S1 --> DB[(Database)]
    S3 --> DB
    note["LB continuously health-checks all servers.<br/>Unhealthy servers removed automatically."]

Load balancer distributes requests across healthy servers, automatically removing failed ones

There are two fundamentally different types of load balancers: Layer 4 (operates at the transport layer, TCP/UDP) and Layer 7 (operates at the application layer, HTTP/HTTPS). The difference matters enormously for latency, feature set, and use cases.

Three Things Load Balancers Do Simultaneously

1. Load distribution: spread requests across servers using an algorithm (round-robin, least-connections, etc.). 2. Health monitoring: continuously probe backend servers and automatically remove failed ones. 3. Session management: optionally maintain "sticky" connections from a client to the same server. Understanding all three prevents misconfiguration.

Load balancers also act as a security boundary: they terminate SSL/TLS, absorb DDoS attacks, and hide the number and identity of backend servers. The backend servers see only the load balancer's IP address, not the original client IP (unless the LB adds X-Forwarded-For headers).
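For example, here is a minimal sketch of recovering the original client address in a Node.js/TypeScript backend sitting behind an L7 load balancer. It assumes Express; the /whoami route and port are illustrative, and the header names are the de facto standard ones an ALB sets.

```typescript
// Sketch: recovering the original client IP behind an L7 load balancer.
// Assumes Express; the /whoami route and port 3000 are illustrative.
import express from "express";

const app = express();

// Trust exactly one proxy hop (the load balancer), so req.ip is taken from
// X-Forwarded-For instead of being the LB's own address.
app.set("trust proxy", 1);

app.get("/whoami", (req, res) => {
  res.json({
    clientIp: req.ip,                                 // parsed from X-Forwarded-For
    forwardedFor: req.headers["x-forwarded-for"],     // full proxy chain as sent by the LB
    forwardedProto: req.headers["x-forwarded-proto"], // "https" when TLS terminates at the LB
  });
});

app.listen(3000);
```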

L4 vs. L7 Load Balancing: When Each Is Right

The most important choice in load balancer configuration is L4 vs. L7. This single decision affects performance, routing capability, and cost.

An L4 (Layer 4) load balancer operates at the TCP/UDP transport layer. It routes based on IP address, port, and protocol only. It doesn't look at the HTTP content at all. It opens a TCP connection to a backend server and forwards packets. Because it doesn't interpret the content, it's extremely fast — typically handling millions of connections per second with sub-millisecond overhead.

An L7 (Layer 7) load balancer operates at the HTTP/HTTPS application layer. It terminates the client connection, reads the full HTTP request (method, path, headers, body), makes routing decisions based on that content, then forwards the request to a backend. It can do things L4 cannot: route /api requests to one server pool and /static to another, add/modify headers, perform authentication, block malicious requests.

| Dimension | L4 (TCP/UDP) | L7 (HTTP/HTTPS) |
| --- | --- | --- |
| Routing basis | IP address + port only | URL path, hostname, headers, cookies |
| Content inspection | None — packets are opaque | Full HTTP request visibility |
| SSL termination | TCP passthrough (SSL handled by backend) | Yes — terminates SSL at LB, backends use HTTP |
| Connection handling | Persistent TCP tunnel to one backend | New connection per request (can use different backends) |
| Throughput | Millions of connections/sec (hardware LBs) | Hundreds of thousands req/sec (software overhead) |
| Latency added | < 0.1ms | 1-5ms per request (HTTP parsing, routing decision) |
| Stickiness | IP-hash based | Cookie-based (more reliable) |
| Health checks | TCP connect | HTTP health check endpoint (/health) |
| AWS equivalent | NLB (Network Load Balancer) | ALB (Application Load Balancer) |
| Use case | TCP services, game servers, raw TCP, lowest latency | HTTP APIs, web apps, microservices routing |

Rule of Thumb: L7 for Web, L4 for Everything Else

Use L7 (ALB) for: REST APIs, web applications, microservices routing, WebSocket connections (long-lived L7 connections are handled well by ALB), GraphQL. Use L4 (NLB) for: TCP services that aren't HTTP (databases, game servers, custom protocols), workloads needing maximum throughput (NLB handles millions of requests per second with ultra-low latency), preserving the client IP without X-Forwarded-For headers, and non-HTTP protocols such as SMTP, FTP, and MQTT.

graph TB
    C[Client] --> ALB["AWS ALB<br/>Layer 7"]
    ALB -->|/api/*| API["API Servers<br/>Node.js cluster"]
    ALB -->|/ws/*| WS[WebSocket Servers]
    ALB -->|/static/*| S3["S3 + CloudFront<br/>Static assets"]

    C2[Game Client] --> NLB["AWS NLB<br/>Layer 4"]
    NLB -->|TCP:3000| GS1["Game Server 1<br/>Stateful session"]
    NLB -->|TCP:3001| GS2["Game Server 2<br/>Stateful session"]

L7 (ALB) for HTTP routing by path; L4 (NLB) for non-HTTP protocols needing TCP passthrough

Load Balancing Algorithms: Round-Robin, Least Connections, IP Hash

Once the load balancer decides which server pool to send a request to, it must pick a specific server from that pool. The algorithm for this selection has a significant impact on performance and resource utilization.

The 5 Load Balancing Algorithms

  • Round Robin — Requests go to servers in order: S1, S2, S3, S1, S2, S3... Simple, fair, works well when all servers have equal capacity and requests have similar duration. Most common default. Problem: doesn't account for server load — a slow server accumulates a backlog.
  • Weighted Round Robin — Same as round-robin but with weights: a server with weight 3 gets 3 requests for every 1 request the weight-1 server gets. Use when servers have different hardware specs: m6i.4xlarge (weight 4) vs m6i.xlarge (weight 1). Common in gradual traffic migrations (canary deployments).
  • Least Connections — Send new requests to the server with fewest active connections. Best for workloads with variable request duration — long-running requests don't cause a server to accumulate more and more connections under round-robin. Ideal for WebSocket, streaming, or any connection-heavy workload. Requires LB to track connection count per backend.
  • Least Response Time — Send to the server with the lowest combination of active connections AND fastest response time. The most sophisticated option. Used by Nginx Plus and HAProxy. Best for mixed workloads where some servers are degraded but not completely unhealthy. More overhead than simpler algorithms.
  • IP Hash (Consistent Hashing) — Hash the client's IP to always route to the same server. Creates "sticky sessions" based on IP without cookies. Problem: if a server goes down, all its clients rehash to other servers (unavoidable). Also problematic behind NAT (many clients share one IP, all go to same server). Use case: gaming sessions, WebSocket connections where you can't use cookies.
| Algorithm | Best For | Worst For | AWS ALB Support |
| --- | --- | --- | --- |
| Round Robin | Short, uniform requests (REST APIs) | Variable duration requests (streaming) | Yes (default) |
| Least Requests | Mixed or long-duration requests | Uniform short requests (adds overhead) | Yes (Least Outstanding Requests) |
| IP Hash | Stateful protocols needing same server | NAT environments, clients behind proxies | No (use cookie-based sticky sessions instead) |
| Weighted | Canary deployments, mixed instance types | Identical server pools | Yes (via target group weights) |
| Random | Very high throughput (simple, fast) | Any workload where fairness matters | No (use round robin) |

Round Robin + Variable Duration = Uneven Load

Imagine 10 servers and requests that take 1ms or 10 seconds (API calls vs. file uploads). Round-robin sends the 10-second upload to Server 1. The next 9,999 short requests are round-robined — but Server 1 still has that upload running and its CPU is at 100%. Least-connections would have avoided this by not sending more requests to a busy server. For any workload with variable request duration, use least-connections instead of round-robin.
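To make the difference concrete, here is an illustrative TypeScript sketch (not any real load balancer's implementation; the Backend shape and server names are made up) contrasting blind round-robin selection with least-outstanding-requests:

```typescript
// Illustrative sketch only: why least-outstanding-requests avoids piling work
// onto a busy server. The Backend shape and server names are hypothetical.
interface Backend {
  id: string;
  inFlight: number; // requests currently being processed by this server
}

const backends: Backend[] = [
  { id: "server-1", inFlight: 0 },
  { id: "server-2", inFlight: 0 },
  { id: "server-3", inFlight: 0 },
];

// Round-robin: rotates blindly, even onto a server stuck on a 10-second upload.
let rrIndex = 0;
function pickRoundRobin(): Backend {
  return backends[rrIndex++ % backends.length];
}

// Least-outstanding-requests: picks the server with the fewest in-flight
// requests, so a server busy with long requests stops receiving new ones.
function pickLeastOutstanding(): Backend {
  return backends.reduce((best, b) => (b.inFlight < best.inFlight ? b : best));
}

// The balancer increments/decrements the in-flight counter around each request.
async function dispatch(handle: (b: Backend) => Promise<void>): Promise<void> {
  const target = pickLeastOutstanding();
  target.inFlight++;
  try {
    await handle(target);
  } finally {
    target.inFlight--;
  }
}
```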

Health Checks and Failure Detection

Health checks are how load balancers know whether a backend server is capable of serving requests. This sounds trivial but is one of the most commonly misconfigured aspects of load balancer setup. Wrong health check configuration causes two opposite failures: healthy servers being removed from rotation (false negatives), or failed servers continuing to receive traffic (false positives).

There are two types of health checks: active and passive.

Active vs. Passive Health Checks

  • Active Health Checks — The load balancer periodically probes each backend server with a synthetic request. For AWS ALB, that means an HTTP GET to the configured health check path (for example /health) every 30 seconds by default. If N consecutive checks fail, the server is marked unhealthy and stops receiving traffic. If M consecutive checks succeed, it is marked healthy again. Interval, path, expected response code, timeout, and healthy/unhealthy thresholds are all configurable.
  • Passive Health Checks (Circuit Breaker) — The load balancer monitors real traffic response codes. If a server returns too many 5xx errors or too many timeouts within a time window, it's temporarily removed from rotation. More reactive than active checks — it detects problems in real time from actual traffic, not synthetic probes. Used by Envoy's outlier detection, HAProxy's observe option, and AWS ALB's target anomaly mitigation.

The Health Check Misconfiguration That Causes Cascading Failures

Common mistake: health check endpoint /health returns 200 even when the database connection pool is exhausted. From the load balancer's perspective, the server is healthy. In reality, all requests are failing because the server can't reach the DB. The fix: deep health checks. Instead of return 200, check all dependencies: try db.ping(); try redis.ping(); only return 200 if all pass. With deep health checks, a server that can't reach the DB is correctly marked unhealthy and removed from rotation, directing traffic to healthy servers.
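A minimal sketch of such a deep health check, assuming a Node.js/TypeScript service using Express, node-postgres, and ioredis (the pool and redis clients, port, and response shape are illustrative):

```typescript
// Minimal sketch of a deep /health endpoint. Assumes Express + node-postgres
// + ioredis; connection details and response shape are illustrative.
import express from "express";
import { Pool } from "pg";
import Redis from "ioredis";

const app = express();
const pool = new Pool();   // connection details come from PG* environment variables
const redis = new Redis(); // defaults to localhost:6379

app.get("/health", async (_req, res) => {
  try {
    // Verify the dependencies real requests need, not just that the process is up.
    await pool.query("SELECT 1"); // database reachable and accepting queries?
    await redis.ping();           // Redis reachable?
    res.status(200).json({ status: "ok" });
  } catch (err) {
    // Any failed dependency returns 503, so the load balancer marks this
    // instance unhealthy once the unhealthy threshold is reached.
    res.status(503).json({ status: "unhealthy", error: String(err) });
  }
});

app.listen(3000);
```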

AWS ALB Health Check Best Practice Configuration

1. Health check path: /health (not / — the root path often requires auth or redirects). Your /health endpoint should verify: DB connectivity, Redis connectivity, any critical dependency. Return 200 if healthy, 503 if any dependency is down.

2. Healthy threshold: 2 consecutive checks must pass before a server is added back to rotation. Prevents flapping (server briefly healthy then unhealthy again).

3. Unhealthy threshold: 3 consecutive checks must fail before removing a server. Prevents false positives from temporary network hiccups.

4. Interval: 15-30 seconds. Shorter intervals detect failures faster but add overhead. For critical APIs, use 10 seconds.

5. Timeout: 5 seconds. Health check must respond within 5 seconds or it's counted as a failure.

6. Grace period (slow start): 120 seconds after instance launch before health checks begin. New JVM instances need time to warm up (JIT compilation). Node.js needs time to establish DB connections. Without a grace period, new instances fail health checks immediately and are never added to rotation.

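As a sketch of how these settings map to an actual target group, using the AWS SDK for JavaScript v3 (the target group name, port, and VPC ID are placeholders):

```typescript
// Sketch (AWS SDK for JavaScript v3): a target group using the health check
// values above. Name, port, and VPC ID are placeholders.
import {
  ElasticLoadBalancingV2Client,
  CreateTargetGroupCommand,
} from "@aws-sdk/client-elastic-load-balancing-v2";

const elb = new ElasticLoadBalancingV2Client({});

await elb.send(
  new CreateTargetGroupCommand({
    Name: "api-servers",
    Protocol: "HTTP",
    Port: 3000,
    VpcId: "vpc-0123456789abcdef0",
    TargetType: "instance",
    HealthCheckPath: "/health",   // the deep health check endpoint
    HealthCheckIntervalSeconds: 15,
    HealthCheckTimeoutSeconds: 5,
    HealthyThresholdCount: 2,
    UnhealthyThresholdCount: 3,
    Matcher: { HttpCode: "200" }, // only a 200 counts as healthy
  })
);
```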

sequenceDiagram
    participant LB as Load Balancer
    participant S1 as Server 1 (healthy)
    participant S2 as Server 2 (failing)

    loop Every 15 seconds
        LB->>S1: GET /health
        S1-->>LB: 200 OK (DB=OK, Redis=OK)
        LB->>S2: GET /health
        S2-->>LB: 503 (DB connection failed)
    end

    Note over LB,S2: After 3 consecutive failures...
    LB->>LB: Mark S2 unhealthy
    LB->>LB: Remove S2 from rotation
    Note over LB,S2: All traffic goes to healthy servers

    Note over LB,S2: S2 DB connection recovered...
    LB->>S2: GET /health
    S2-->>LB: 200 OK
    LB->>S2: GET /health (check 2)
    S2-->>LB: 200 OK
    LB->>LB: Mark S2 healthy, add back to rotation

Health check lifecycle: detect failure, remove from rotation, detect recovery, add back

Session Persistence, Sticky Sessions, and the Redis Solution

Sticky sessions (also called session persistence or session affinity) ensure that a client's requests always go to the same backend server. This is needed when the server holds client state that isn't shared (in-memory sessions, local files, WebSocket connections).

But sticky sessions have serious problems. If the designated server fails, all those clients lose their sessions and must re-authenticate. Sticky sessions also cause uneven load: if some users do very long operations, their server accumulates work while other servers sit idle. And IP-based stickiness breaks behind NAT (many users share one IP, all get routed to the same server).

Shopify Black Friday 2013: The Sticky Session Disaster

Shopify used sticky sessions based on a cookie that pinned users to specific server groups. During Black Friday 2013, one server group received a disproportionate share of high-traffic merchants. That group's servers hit 100% CPU while other groups sat at 20%. Because of sticky sessions, the load balancer couldn't rebalance — moving a user to a different server would break their session. 70% of traffic was effectively stranded on overloaded servers for 90 minutes while other servers were idle. The root fix: move sessions to Redis so any server can serve any user, eliminating the need for stickiness entirely.

| Sticky Session Type | Mechanism | Failure Mode | Use Case |
| --- | --- | --- | --- |
| IP-Based Sticky | Hash client IP → assign to same server | All users behind same NAT go to same server. Server fail → session lost. | Legacy systems, non-HTTP protocols, last resort |
| Cookie-Based Sticky (ALB) | LB sets AWSALB cookie with server ID. Client sends cookie on every request. | Server fail → cookie becomes invalid → session lost (still a stateful-server problem) | Short-term: stateful server that can't be fixed quickly |
| No Stickiness (Redis) | All servers share session state via Redis. Any server handles any request. | None: server fail → next request goes to a different server with the same Redis data. | Correct long-term solution for any stateless-able application |
| Application-Level (WebSocket) | WS connection sticky via NLB IP hash or ALB connection draining | Server fail → WS connection drops → client reconnects (acceptable for WS) | WebSocket specifically — long-lived connections that must be maintained |
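A minimal sketch of the "no stickiness" row: session state lives in Redis, so any server can handle any request. It assumes Express, cookie-parser, and ioredis; the routes, cookie name, and TTL are illustrative, and in practice you would likely use express-session with a Redis store rather than this manual wiring.

```typescript
// Sketch: externalized sessions in Redis so no stickiness is needed.
// Assumes Express + cookie-parser + ioredis; routes and names are illustrative.
import express from "express";
import cookieParser from "cookie-parser";
import crypto from "node:crypto";
import Redis from "ioredis";

const app = express();
const redis = new Redis();
app.use(cookieParser());

const SESSION_TTL_SECONDS = 3600;

app.post("/login", async (_req, res) => {
  const sessionId = crypto.randomUUID();
  // Session data is written to Redis, not kept in this process's memory.
  await redis.set(
    `session:${sessionId}`,
    JSON.stringify({ userId: "u-123", loggedInAt: Date.now() }),
    "EX",
    SESSION_TTL_SECONDS
  );
  res.cookie("sid", sessionId, { httpOnly: true });
  res.json({ ok: true });
});

app.get("/me", async (req, res) => {
  const sessionId = req.cookies?.sid as string | undefined;
  const raw = sessionId ? await redis.get(`session:${sessionId}`) : null;
  if (!raw) {
    res.status(401).json({ error: "no session" });
    return;
  }
  // Works regardless of which backend the load balancer picked this time.
  res.json(JSON.parse(raw));
});

app.listen(3000);
```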

Connection Draining: Graceful Server Removal

When a server is removed from rotation (scale-in, deployment, failure), in-flight requests shouldn't be dropped. Connection draining (ALB calls it "deregistration delay") keeps the server in rotation for existing connections for up to 300 seconds while not sending new requests. This allows long-running requests (file uploads, streaming) to complete naturally. Set deregistration delay to 30-300 seconds depending on your longest acceptable request duration.
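A sketch of setting that deregistration delay with the AWS SDK for JavaScript v3 (the target group ARN is a placeholder):

```typescript
// Sketch (AWS SDK for JavaScript v3): setting the drain window on a target
// group. The target group ARN is a placeholder.
import {
  ElasticLoadBalancingV2Client,
  ModifyTargetGroupAttributesCommand,
} from "@aws-sdk/client-elastic-load-balancing-v2";

const elb = new ElasticLoadBalancingV2Client({});

await elb.send(
  new ModifyTargetGroupAttributesCommand({
    TargetGroupArn:
      "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-servers/abc123",
    Attributes: [
      // Keep deregistering targets in "draining" for 5 minutes so in-flight
      // requests (uploads, long reports) can finish before the target is removed.
      { Key: "deregistration_delay.timeout_seconds", Value: "300" },
    ],
  })
);
```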

SSL/TLS Termination at the Load Balancer

SSL/TLS termination means the load balancer decrypts incoming HTTPS traffic and forwards unencrypted HTTP to backend servers. This offloads the CPU-intensive cryptographic operations from application servers, which is significant: TLS handshake costs ~2-10ms and 1-5% CPU overhead for modern ciphers.

Three SSL Termination Patterns

  • SSL Termination at LB (most common) — HTTPS from client → LB decrypts → HTTP to backend. Pros: backends don't need SSL certificates; centralized certificate management; CPU savings. Cons: traffic between LB and backends is unencrypted (mitigated by VPC private network). AWS ALB default. Use for: most web apps, APIs.
  • SSL Passthrough (L4) — HTTPS from client → LB passes encrypted traffic → backend decrypts. LB never sees plaintext. Pros: end-to-end encryption; LB can't inspect content (regulatory compliance). Cons: LB can't do L7 routing or cookie-based stickiness; backends need certificates. Use for: compliance requirements (HIPAA, PCI-DSS), mutual TLS (mTLS) environments.
  • SSL Re-encryption (SSL Bridge) — HTTPS from client → LB decrypts → LB re-encrypts → HTTPS to backend. Full visibility at LB for L7 routing, end-to-end encryption. Cons: double encryption overhead; backend certificate management. Use for: regulated environments needing both L7 routing and end-to-end encryption.

AWS Certificate Manager: Free SSL at Scale

AWS ACM (Certificate Manager) provides free SSL certificates for any domain verified via DNS or email. Certificates auto-renew. Attach to ALB in seconds. This is why SSL termination at ALB is almost universal on AWS — zero cert management cost, auto-renewal, and CPU offload from app servers. One certificate can include up to 100 Subject Alternative Names (SANs) — domain names — so one cert covers api.example.com, app.example.com, static.example.com.

| Metric | Without SSL at LB (backend handles SSL) | With SSL Termination at ALB |
| --- | --- | --- |
| TLS handshake CPU | Spent on every backend, alongside application logic | Handled entirely by the ALB |
| Certificate management | Deploy cert to every backend server | One cert on ALB, auto-renewed by ACM |
| L7 routing | Not available at the LB (traffic is opaque until the backend decrypts it) | Available; the LB sees all headers for routing |
| Connection reuse | TLS sessions must be resumed per-server | ALB maintains session resumption tickets centrally |
| Latency | +2-10ms per new TLS connection on each backend | ALB handles TLS, backends get HTTP (near-zero overhead) |

Load Balancer High Availability: Active-Active vs. Active-Passive

A load balancer that is itself a single point of failure defeats the purpose of having a load balancer. At production scale, the load balancer tier itself must be highly available. There are two patterns.

Active-Active vs. Active-Passive LB Pairs

  • Active-Active: two or more load balancers handle traffic simultaneously. DNS round-robins between them; if one fails, DNS quickly routes all traffic to the surviving ones. Higher throughput, more complex failover.
  • Active-Passive: one primary LB handles all traffic while a standby sits ready but idle. If the primary fails, the standby takes over the virtual IP (using keepalived/VRRP). Simpler failover, but wasted standby capacity.
  • AWS ALB is automatically active-active within a region: it runs multiple LB nodes across availability zones simultaneously. You don't manage this.

graph TB
    DNS["DNS<br/>example.com"] -->|IP: 52.0.0.1| LB1["Load Balancer 1<br/>AZ us-east-1a"]
    DNS -->|IP: 52.0.0.2| LB2["Load Balancer 2<br/>AZ us-east-1b"]
    DNS -->|IP: 52.0.0.3| LB3["Load Balancer 3<br/>AZ us-east-1c"]
    LB1 --> S1["Server 1<br/>AZ 1a"]
    LB1 --> S2["Server 2<br/>AZ 1b"]
    LB2 --> S1
    LB2 --> S3["Server 3<br/>AZ 1c"]
    LB3 --> S2
    LB3 --> S3

    note["AWS ALB automatically runs<br/>active-active across AZs.<br/>You get this free."]

AWS ALB is inherently active-active across AZs — multiple LB nodes in each availability zone

Load Balancer HA: Key Configuration Choices

  • Cross-zone load balancing — Without cross-zone: each LB node only routes to backends in its own AZ. With cross-zone: any LB node can route to any backend regardless of AZ. Prevents uneven distribution when one AZ has fewer backend instances. AWS ALB: cross-zone is always enabled. NLB: cross-zone is optional (disabled by default to reduce cross-AZ data transfer costs).
  • Load balancer idle timeout — How long a connection can be idle before the LB closes it. Default AWS ALB: 60 seconds. For long-running downloads or WebSocket connections, increase to 300-3600 seconds. For high-throughput APIs, keep at 60 seconds to free connections quickly.
  • Access logs — ALB can log every request to S3: timestamp, client IP, request line, response code, latency, target IP. Invaluable for debugging but generates GBs/hour of logs at scale. Enable for all production ALBs. Use Athena to query logs for incident analysis.
  • WAF integration — AWS WAF (Web Application Firewall) attaches to ALB and can block SQL injection, XSS, rate limit IPs, block geographic regions. First line of defense against common web attacks. Essential for any public-facing API.
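A sketch of adjusting the idle timeout and enabling access logs from code, using the AWS SDK for JavaScript v3 (the load balancer ARN and bucket name are placeholders):

```typescript
// Sketch (AWS SDK for JavaScript v3): adjusting the ALB idle timeout and
// turning on access logs. The load balancer ARN and bucket name are placeholders.
import {
  ElasticLoadBalancingV2Client,
  ModifyLoadBalancerAttributesCommand,
} from "@aws-sdk/client-elastic-load-balancing-v2";

const elb = new ElasticLoadBalancingV2Client({});

await elb.send(
  new ModifyLoadBalancerAttributesCommand({
    LoadBalancerArn:
      "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/main/abc123",
    Attributes: [
      { Key: "idle_timeout.timeout_seconds", Value: "300" }, // raise for long downloads / WebSocket
      { Key: "access_logs.s3.enabled", Value: "true" },
      { Key: "access_logs.s3.bucket", Value: "my-alb-access-logs" },
    ],
  })
);
```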

AWS ALB vs. NLB vs. HAProxy vs. Nginx — Real Configuration

In practice, engineers choose between managed cloud load balancers (AWS ALB, NLB) and self-managed software load balancers (HAProxy, Nginx). Each has specific strengths.

| Load Balancer | Type | Max Throughput | Pricing Model | Best For |
| --- | --- | --- | --- | --- |
| AWS ALB | L7 (HTTP/HTTPS/WebSocket) | ~1 million req/sec per region | Per hour + per LCU ($0.008/hr + $0.008/LCU-hr) | Most web apps, REST APIs, microservices |
| AWS NLB | L4 (TCP/UDP/TLS) | Millions of req/sec per AZ | Per hour + per NLCU ($0.006/hr + $0.006/NLCU-hr) | High-throughput non-HTTP, gaming, IoT, lowest latency |
| HAProxy | L4 + L7 (both modes) | Limited by hardware | Free (open source) | On-prem, Kubernetes ingress, custom routing logic |
| Nginx | L7 primarily | Limited by hardware | Free (or Nginx Plus: $3,500/yr) | |
| Cloudflare LB | L7 + Global Anycast | Unlimited (CDN-scale) | Per month ($50/mo+) | Global traffic, Anycast DNS, DDoS protection built in |

Real ALB Configuration for a Typical Microservices Setup

Listener rules (in priority order):

1. IF host = api.example.com AND path = /v2/* → forward to target group api-v2-servers (new version).
2. IF host = api.example.com AND path = /v1/* → forward to target group api-v1-servers (old version).
3. IF host = ws.example.com → forward to target group websocket-servers.
4. IF path = /health → return 200 directly (fixed response — no backend needed).
5. Default → forward to target group main-api-servers.

This single ALB handles versioning, WebSocket routing, and health checks — replacing what would otherwise require complex Nginx config.
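As an illustration, rules 1 and 4 expressed with the AWS SDK for JavaScript v3 (the ARNs and priority numbers are placeholders; lower priority values are evaluated first):

```typescript
// Illustration (AWS SDK for JavaScript v3) of rules 1 and 4 above.
// ARNs and priorities are placeholders.
import {
  ElasticLoadBalancingV2Client,
  CreateRuleCommand,
} from "@aws-sdk/client-elastic-load-balancing-v2";

const elb = new ElasticLoadBalancingV2Client({});
const listenerArn =
  "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/main/abc123/def456";

// Rule 1: api.example.com + /v2/* -> the api-v2-servers target group
await elb.send(
  new CreateRuleCommand({
    ListenerArn: listenerArn,
    Priority: 10,
    Conditions: [
      { Field: "host-header", HostHeaderConfig: { Values: ["api.example.com"] } },
      { Field: "path-pattern", PathPatternConfig: { Values: ["/v2/*"] } },
    ],
    Actions: [
      {
        Type: "forward",
        TargetGroupArn:
          "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-v2-servers/111",
      },
    ],
  })
);

// Rule 4: /health answered directly by the ALB, no backend involved
await elb.send(
  new CreateRuleCommand({
    ListenerArn: listenerArn,
    Priority: 40,
    Conditions: [{ Field: "path-pattern", PathPatternConfig: { Values: ["/health"] } }],
    Actions: [
      {
        Type: "fixed-response",
        FixedResponseConfig: { StatusCode: "200", ContentType: "text/plain", MessageBody: "OK" },
      },
    ],
  })
);
```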

The choice between ALB and self-managed (HAProxy/Nginx) comes down to two factors: operational cost and feature requirements. ALB eliminates operational overhead (no server management, auto-scaling, multi-AZ HA is built-in) but costs money per request. HAProxy/Nginx are free but require dedicated infrastructure, HA configuration, and ongoing maintenance. For most teams on AWS, ALB is the right default — the operational savings outweigh the per-request cost at all but the highest traffic levels.

Practical ALB Setup Checklist

1. Create a target group: configure target type (instances, IPs, or Lambda), port, protocol, and health check settings (path, interval, thresholds).

2. Configure health check: path = /health, interval = 15 sec, healthy threshold = 2, unhealthy threshold = 3, timeout = 5 sec. Create a /health endpoint in your application that verifies DB + Redis connectivity.

3. Create the ALB: select VPC, subnets (at least 2 AZs), security group (allow 443 inbound), and enable access logging to S3.

4. Configure HTTPS listener on port 443: attach ACM certificate, set security policy to TLSv1.2 minimum (ELBSecurityPolicy-TLS-1-2-2017-01), add listener rule to forward to target group.

5. Add HTTP → HTTPS redirect: port 80 listener with redirect action to HTTPS. Never serve production traffic over plain HTTP.

6. Enable WAF: attach AWS WAF with AWS Managed Rules (AWSManagedRulesCommonRuleSet) as minimum. Adds rate limiting and common attack protection with zero configuration.

7. Test: run curl -I https://yourdomain.com/health and verify 200. Run a load test at 2x expected peak to confirm health checks and auto-scaling work together.

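A sketch of steps 4 and 5 with the AWS SDK for JavaScript v3 (the ARNs are placeholders; the security policy string is the one named in step 4):

```typescript
// Sketch (AWS SDK for JavaScript v3) of steps 4 and 5: an HTTPS listener with
// an ACM certificate, plus an HTTP -> HTTPS redirect. ARNs are placeholders.
import {
  ElasticLoadBalancingV2Client,
  CreateListenerCommand,
} from "@aws-sdk/client-elastic-load-balancing-v2";

const elb = new ElasticLoadBalancingV2Client({});
const loadBalancerArn =
  "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/main/abc123";

// Step 4: HTTPS listener terminating TLS at the ALB
await elb.send(
  new CreateListenerCommand({
    LoadBalancerArn: loadBalancerArn,
    Protocol: "HTTPS",
    Port: 443,
    SslPolicy: "ELBSecurityPolicy-TLS-1-2-2017-01",
    Certificates: [
      { CertificateArn: "arn:aws:acm:us-east-1:123456789012:certificate/abc-123" },
    ],
    DefaultActions: [
      {
        Type: "forward",
        TargetGroupArn:
          "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/main-api-servers/222",
      },
    ],
  })
);

// Step 5: port 80 listener whose only job is redirecting to HTTPS
await elb.send(
  new CreateListenerCommand({
    LoadBalancerArn: loadBalancerArn,
    Protocol: "HTTP",
    Port: 80,
    DefaultActions: [
      {
        Type: "redirect",
        RedirectConfig: { Protocol: "HTTPS", Port: "443", StatusCode: "HTTP_301" },
      },
    ],
  })
);
```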

Load Balancer Failure Modes and How to Debug Them

Load balancers fail in non-obvious ways. The failure mode isn't "LB stopped working" — it's "LB is working but routing traffic incorrectly" or "LB health checks are passing but backends are failing." These require specific debugging approaches.

The 7 Most Common Load Balancer Failure Modes

  • 504 Gateway Timeout — LB cannot reach backends or backends exceed the idle timeout. Debug: check target group health in console, verify security groups allow LB → backend traffic on the backend port, increase ALB idle timeout if requests take > 60 seconds.
  • 502 Bad Gateway — Backend returned an invalid HTTP response (connection closed prematurely, malformed response). Often indicates backend crashed mid-request or application error before response was sent. Debug: check backend application logs for crash/panic, look for connection reset errors.
  • Uneven traffic distribution — One server getting 3x more traffic than others. Causes: cross-zone load balancing disabled (AZ has fewer instances), sticky sessions routing too many users to one server, long-running requests with round-robin algorithm. Debug: CloudWatch target group request count per instance metric.
  • Health checks passing but requests failing — Shallow /health endpoint returns 200 but application is broken. The health check doesn't test real functionality. Fix: deep health check that tests DB query, Redis connection, critical API dependencies.
  • SSL handshake errors — Client getting SSL errors when LB cert doesn't include the requested hostname. Occurs when adding new subdomains without updating the cert. Fix: update ACM cert to include new SANs, or add a second cert on the listener (ALB supports multiple certs via SNI).
  • Session lost on scale-in — Users being logged out during deployments or scale-in events. Cause: sticky sessions pointing to terminated instances. Fix: move sessions to Redis, or configure connection draining with sufficient timeout for active sessions to complete.
  • Load balancer itself becomes bottleneck — At extreme scale, the LB itself can be the bottleneck. ALB scales automatically, but has limits (LCU budget). NLB handles millions of requests per second per AZ. For higher: use DNS-level global load balancing (Route 53 health routing + multiple ALBs). This is an extremely rare problem.

The 503 "No Healthy Targets" Debugging Process

503 Service Unavailable from ALB usually means all targets in a target group are unhealthy. Debugging checklist:

1. Check the target group in the console — are targets "Healthy", "Unhealthy", or "Initial"?
2. If "Initial": the grace period hasn't elapsed yet — wait 2 minutes.
3. If "Unhealthy": click the target to see the health check failure reason — timeout, bad HTTP code, connection refused?
4. If "connection refused": check the security group on the instances (does it allow TCP inbound from the ALB security group on the app port?).
5. If HTTP 5xx from /health: check application logs for startup failure.
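The same states and failure reasons the console shows can be pulled programmatically; a sketch using the AWS SDK for JavaScript v3 (the target group ARN is a placeholder):

```typescript
// Sketch (AWS SDK for JavaScript v3): pulling target states and failure
// reasons for the checklist above. The target group ARN is a placeholder.
import {
  ElasticLoadBalancingV2Client,
  DescribeTargetHealthCommand,
} from "@aws-sdk/client-elastic-load-balancing-v2";

const elb = new ElasticLoadBalancingV2Client({});

const { TargetHealthDescriptions } = await elb.send(
  new DescribeTargetHealthCommand({
    TargetGroupArn:
      "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-servers/abc123",
  })
);

for (const t of TargetHealthDescriptions ?? []) {
  // State is "initial", "healthy", "unhealthy", "draining", etc.; Reason and
  // Description explain failures such as timeouts or unexpected HTTP codes.
  console.log(t.Target?.Id, t.TargetHealth?.State, t.TargetHealth?.Reason, t.TargetHealth?.Description);
}
```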

graph TD
    Error[User sees 5xx error] --> Type{What error code?}
    Type -->|503| T503[503: No Healthy Targets]
    Type -->|504| T504[504: Backend Timeout]
    Type -->|502| T502[502: Backend Invalid Response]

    T503 --> Check1[Check Target Group health]
    Check1 -->|Initial state| Wait["Wait for grace period<br/>2-3 minutes"]
    Check1 -->|Unhealthy| Debug1["Check health check failure reason<br/>in ALB console"]
    Debug1 -->|Connection refused| SG["Check security group:<br/>Allow LB SG → app port"]
    Debug1 -->|HTTP 5xx from /health| AppLog["Check application logs<br/>for startup errors"]

    T504 --> Check2["Check ALB idle timeout<br/>vs request duration"]
    Check2 -->|Request > timeout| IncreaseTimeout["Increase ALB idle timeout<br/>from 60s to 300s"]
    Check2 -->|Timeout OK| BEDown["Backend process crashed?<br/>Check application logs"]

    T502 --> AppCrash["Backend crashed mid-response<br/>Check for OOM, panics, exceptions"]

Load balancer error debugging decision tree: 503 vs 504 vs 502

How this might come up in interviews

Load balancers appear in virtually every system design interview because every scaled system needs one. Interviewers specifically test: (1) Do you know L4 vs L7 and when to use each? (2) Can you explain health check configuration and why shallow checks are dangerous? (3) Do you understand why sticky sessions are an anti-pattern? (4) Can you describe a failure mode (504, 503) and debug it? Strong candidates mention connection draining, health check grace periods, and at least one algorithm beyond round-robin.

Common questions:

  • L4: "What is a load balancer and how does round-robin work?" [Tests: basic understanding of LB function and most common algorithm]
  • L4-L5: "What's the difference between L4 and L7 load balancing? Give me an example of when you'd use each." [Tests: protocol awareness, routing capability understanding]
  • L5: "Your health checks are passing but users are getting errors. How is this possible and how do you fix it?" [Tests: shallow vs. deep health check knowledge]
  • L5-L6: "Design a load balancing layer for a chat application with 10M users and WebSocket connections." [Tests: WebSocket-specific LB requirements, sticky sessions for WS, L4 vs L7 for long-lived connections]
  • L6: "How would you implement zero-downtime deployments for a stateful application that can't easily migrate to Redis sessions?" [Tests: connection draining configuration, rolling deployments with LB, blue-green deployment patterns]
  • L6-L7: "At 100M requests/second, AWS ALB is no longer sufficient. How do you build a load balancing layer at that scale?" [Tests: NLB limitations, global Anycast, DNS-level load balancing, building custom LB infrastructure]

Before you move on: can you answer these?

Your API has a /upload endpoint that accepts large file uploads (average 500MB, takes 3-10 minutes). Users are reporting that uploads fail partway through. The rest of the API works fine. What's causing this and how do you fix it?

The failure is almost certainly an ALB idle timeout. AWS ALB has a default idle timeout of 60 seconds. A 3-10 minute upload with no traffic flowing during processing (the upload is in-progress) will be killed by the LB after 60 seconds of inactivity. Fix: increase ALB idle timeout to 600-900 seconds (10-15 minutes) for this use case. Also: ensure the upload endpoint uses multipart uploads or chunked transfer encoding so the LB sees continuous traffic, preventing idle timeout. For very large files, the better architecture is to generate a pre-signed S3 URL and have the client upload directly to S3 (bypassing the LB entirely) — then the LB only needs to handle the short URL generation request, not the multi-minute upload.

You have 10 backend servers behind an ALB. One server is running slow (p99 latency = 5 seconds vs. 100ms normal). The round-robin LB doesn't know this. What happens to your users and how do you fix it?

With round-robin, 10% of requests will be routed to the slow server and experience 5-second latency. Users unlucky enough to hit this server will see degraded performance. The health check (if it's just a TCP check or a trivially fast /health endpoint) won't detect this because the server is still responding — just slowly. Fix #1: switch from round-robin to the least-outstanding-requests (LOR) algorithm on the ALB. LOR tracks in-flight requests per target — the slow server will quickly accumulate more in-flight requests than healthy servers, causing the LB to prefer the fast servers. Fix #2: implement a deep health check at /health that measures the response time of a representative DB query — if it exceeds 500ms, return 503, which removes the slow server from rotation. Fix #3: enable ALB target anomaly mitigation (available with the weighted random routing algorithm), which automatically routes fewer requests to targets with elevated error rates or latency.

An interviewer asks you to design a load balancing layer for a real-time multiplayer game server. Players connect via WebSocket and must be routed to the same server for the duration of their game session. Sessions last 20-60 minutes. There are 10,000 concurrent game sessions. How do you design this?

This requires L4 (NLB) with IP-hash-based routing, not L7 (ALB). Reasons: (1) WebSocket connections are long-lived TCP connections — once established, all data flows over that connection. An L7 LB terminates HTTP connections per-request, which doesn't map to long-lived game sessions cleanly. NLB at L4 maintains persistent TCP connections. (2) The "same server" requirement: use IP-hash at L4, which consistently routes each client IP to the same game server. (3) Health check: NLB should use TCP health checks with a game-specific protocol echo. If the server doesn't respond, TCP connection fails and NLB stops routing new connections. (4) Connection draining: when a game server is being removed (deployment, failure), NLB should drain connections with a 3600-second (1 hour) timeout to allow in-progress games to finish. (5) Scale: 10,000 concurrent sessions with 60-minute duration = 2.78 sessions started/second. Much lower write throughput than average API — this is manageable on 5-10 game server instances. The real scale concern is memory per session on each game server, not LB throughput.

🧠Mental Model

💡 Analogy

Think of a load balancer as a smart GPS router, not a traffic cop. A traffic cop just waves cars in one direction. A smart GPS router knows the current state of every road (health checks), chooses the fastest route to your destination right now (least connections), handles road closures automatically by rerouting (failover), and verifies you're authorized to be on the road before letting you through (WAF, authentication). The GPS router is constantly aware of the entire network state — not just the current intersection.

⚡ Core Idea

Load balancers do three things simultaneously: distribute load using an algorithm (round-robin, least-connections), monitor health of backends (active health checks), and handle transparent failover when backends fail. L4 is fast and protocol-agnostic. L7 is slower but can route by HTTP content. For most web applications, L7 (ALB) is the right choice. Sticky sessions are a symptom of stateful servers — the cure is externalizing state to Redis, not better stickiness.

🎯 Why It Matters

A misconfigured load balancer is one of the most common causes of production outages at companies that have otherwise correct architectures. Shallow health checks mean unhealthy servers keep receiving traffic. Wrong cooldowns on auto-scaling mean your LB routes to instances before they're ready. Sticky sessions without connection draining mean deployments log users out. Every load balancer configuration decision has a failure mode you must understand.
