© 2026 TheSimplifiedTech. All rights reserved.
Interactive Explainer

Designing a rate limiter

A deep-dive into rate limiting algorithms, distributed counting strategies, Redis Lua atomicity patterns, and production deployment at FAANG scale — covering token bucket, leaky bucket, fixed window, sliding window log, and sliding window counter with real-world tradeoffs.

🎯Key Takeaways
The sliding window counter is the best default algorithm for most production systems — it combines O(1) memory with near-exact accuracy and eliminates the boundary burst problem that plagues fixed windows.
Always use Redis Lua scripts (EVAL) for rate limiting logic to ensure atomicity — separate INCR and EXPIRE commands create race conditions that can permanently block users or leak counters.
Layer your rate limiters: coarse IP-based limits at the edge, per-user limits at the API gateway, and per-resource limits at individual services. Each layer catches what the layer above misses.
Rate limiter HTTP headers (429, Retry-After, X-RateLimit-*) are not optional — they enable well-behaved clients to self-throttle and are essential for a good developer experience.
In multi-region deployments, you must choose between exact global limits (high latency) and per-region independent limits (higher effective total). Most systems accept the tradeoff of per-region limits with a known multiplier.

~14 min read

Why rate limiting matters

Every public-facing API needs a way to say "slow down" before a flood of requests collapses the system. Rate limiting is the first line of defence against denial-of-service attacks, misbehaving clients, runaway retry loops, and simple misconfiguration. Without it, a single bad actor or a bug in a partner integration can saturate your database connection pool, exhaust thread pools, and cascade failures across every microservice downstream.

Rate limiting vs throttling vs backpressure

Rate limiting rejects excess requests at the boundary. Throttling delays them (queuing or slowing). Backpressure propagates slowness upstream so producers reduce their sending rate. A production system often uses all three in concert.

What a rate limiter protects against

  • Abuse and DDoS — Prevents malicious actors from overwhelming your endpoints with high-volume automated requests.
  • Noisy neighbours — In multi-tenant systems, ensures one customer cannot starve others of capacity.
  • Runaway retries — Clients with aggressive retry logic can amplify failures; rate limits break the positive feedback loop.
  • Cost control — Downstream calls to third-party APIs (LLM providers, payment gateways) carry per-call cost; limiting upstream protects your bill.
  • Fair usage enforcement — Ties API usage to pricing tiers — free-tier users get 100 req/min, enterprise gets 10 000 req/min.

Algorithm 1 — Token bucket

The token bucket is the most widely deployed rate limiting algorithm in production. It models a bucket that holds up to B tokens (the burst capacity). Tokens are added at a steady rate of R tokens per second. Each request consumes one token. If the bucket is empty, the request is rejected (or queued). This naturally allows short bursts up to B while enforcing an average rate of R.

[Refill: R tokens/sec] --> [Bucket: max B tokens] --> [Request arrives: take 1 token]
                                                          |
                                                   bucket empty? --> 429 Too Many Requests

Token bucket conceptual flow — tokens refill at a constant rate, each request drains one token.

The math

If refill rate = R tokens/sec and bucket size = B, the maximum burst is B requests instantly, and sustained throughput is R req/sec. After a burst of B, the client must wait (B - available) / R seconds to refill. Example: R = 10, B = 50 means a client can burst 50 requests then sustain 10/sec.
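The arithmetic above is easy to verify with a single-process token bucket sketch in Python. This is an in-memory illustration only (names are illustrative); the Redis Lua script below applies the same refill formula atomically:

```python
import time

class TokenBucket:
    """In-memory token bucket: capacity B, refill rate R tokens/sec."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity          # B: maximum burst size
        self.refill_rate = refill_rate    # R: tokens added per second
        self.tokens = float(capacity)     # bucket starts full
        self.last_refill = now if now is not None else time.monotonic()

    def allow(self, now=None):
        now = now if now is not None else time.monotonic()
        elapsed = max(0.0, now - self.last_refill)
        # Refill at rate R, capped at capacity B
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# R = 10, B = 50: a burst of 50 succeeds, the 51st immediate request is rejected
bucket = TokenBucket(capacity=50, refill_rate=10, now=0.0)
print(sum(bucket.allow(now=0.0) for _ in range(51)))  # 50
```

One second later, the bucket has refilled exactly R = 10 tokens, so ten more requests get through before the client is throttled again.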

Token Bucket — Redis Lua Script

-- KEYS[1] = rate limit key
-- ARGV[1] = bucket capacity (B)
-- ARGV[2] = refill rate per second (R)
-- ARGV[3] = current timestamp (seconds)
-- ARGV[4] = tokens to consume (usually 1)

local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call("hmget", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Calculate tokens to add since last refill
local elapsed = math.max(0, now - last_refill)
local new_tokens = math.min(capacity, tokens + elapsed * refill_rate)

if new_tokens >= requested then
  new_tokens = new_tokens - requested
  redis.call("hmset", key, "tokens", new_tokens, "last_refill", now)
  redis.call("expire", key, math.ceil(capacity / refill_rate) * 2)
  return {1, new_tokens}  -- allowed, remaining
else
  redis.call("hmset", key, "tokens", new_tokens, "last_refill", now)
  redis.call("expire", key, math.ceil(capacity / refill_rate) * 2)
  return {0, new_tokens}  -- denied, remaining
end

Use for: Atomic token bucket in Redis — a single Lua script ensures no race condition between read-modify-write steps across concurrent requests.

Stripe uses a token bucket approach for their API rate limiting. Each API key gets a bucket sized to its plan tier. The token bucket is ideal when you want to allow controlled bursts (e.g., a batch import) while maintaining a long-term average rate.

Algorithm 2 — Leaky bucket

The leaky bucket is the dual of the token bucket. Think of a bucket with a hole in the bottom that leaks at a constant rate. Requests enter the top of the bucket (a FIFO queue). If the bucket (queue) is full, new requests are dropped. The bucket drains at a fixed rate, so outgoing requests are perfectly smoothed — no bursts at all on the output side.

[Incoming requests] --> [Queue (max size Q)] --> drains at rate R --> [Backend service]
                          |                                          
                     queue full? --> 429 Rejected

Leaky bucket model — the queue absorbs bursts, but the output is always steady at rate R.

Property        | Token Bucket                            | Leaky Bucket
Burst handling  | Allows bursts up to bucket size B       | No output bursts; queue absorbs input bursts
Output shape    | Bursty up to B, then R/sec              | Perfectly smooth at R/sec
Implementation  | Counter + timestamp                     | FIFO queue + timer
Memory          | O(1) per key                            | O(Q) per key due to queue
Best for        | APIs where short bursts are acceptable  | Systems needing smooth, predictable load

When to choose leaky bucket over token bucket

Use leaky bucket when your backend truly cannot handle any burstiness — for example, a payment processing pipeline with strict per-second throughput limits from a downstream provider. The tradeoff is added latency: requests sit in the queue waiting to drain rather than being served immediately.

Shopify historically used a leaky bucket for their REST Admin API. Each app got a bucket of 40 requests that leaked at 2 requests/second, giving a maximum sustained rate of 2/sec with the ability to absorb a short burst of 40.
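Shopify's numbers are easy to reproduce with the counter-based "meter" variant of the leaky bucket, which keeps O(1) state and rejects rather than queues (the FIFO-queue variant described above additionally delays requests). A hypothetical sketch:

```python
class LeakyBucket:
    """Leaky bucket as a meter: the level rises by 1 per request and
    drains at leak_rate/sec; a request that would overflow is rejected."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # Q: how much burst the bucket absorbs
        self.leak_rate = leak_rate  # R: drain rate per second
        self.level = 0.0
        self.last_leak = 0.0

    def allow(self, now):
        # Drain whatever leaked out since the last request
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False

# Shopify-style limits: bucket of 40, leaking 2/sec
lb = LeakyBucket(capacity=40, leak_rate=2)
print(sum(lb.allow(0.0) for _ in range(41)))  # 40 — the 41st is rejected
print(lb.allow(1.0))  # True — one second later, 2 slots have drained
```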

Algorithm 3 — Fixed window counter

The simplest algorithm: divide time into fixed windows (e.g., every 60-second block starting at :00) and maintain a counter per window. When a request arrives, increment the counter for the current window. If counter > limit, reject. At the start of a new window, the counter resets to zero.

Fixed window — step by step

  1. Determine the current window: window_start = floor(now / window_size) * window_size
  2. Construct the key: "rate:{user_id}:{window_start}"
  3. INCR the key in Redis. If the result is 1 (first request), also set EXPIRE to window_size.
  4. If the counter exceeds the limit, return HTTP 429. Otherwise, allow the request.
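The steps above can be sketched in plain Python, with an in-memory dict standing in for Redis (so INCR becomes a dict update; EXPIRE is omitted here — in Redis the TTL is what evicts old window keys):

```python
import math

class FixedWindowLimiter:
    """Fixed window counter keyed by (user, window_start)."""

    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.counters = {}  # stand-in for Redis keys; real keys expire via TTL

    def allow(self, user_id, now):
        # Step 1: find the current window
        window_start = math.floor(now / self.window_size) * self.window_size
        # Step 2: one key per (user, window)
        key = f"rate:{user_id}:{window_start}"
        # Step 3: increment (Redis INCR)
        count = self.counters.get(key, 0) + 1
        self.counters[key] = count
        # Step 4: compare to the limit
        return count <= self.limit

fw = FixedWindowLimiter(limit=3, window_size=60)
print([fw.allow("u1", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(fw.allow("u1", 60))  # True — a new window, counter starts over
```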

The boundary burst problem

A user can send N requests at second 59 of window 1 and N more at second 0 of window 2 — achieving 2N requests in 2 seconds while technically staying within "N per minute" in both windows. This is the fundamental flaw of fixed windows.

Fixed Window — Redis Lua Script (closing the INCR/EXPIRE race)

-- Run as two separate commands, INCR + EXPIRE are NOT atomic:
-- a crash between them leaves a counter with no TTL.
-- Wrapping both in a single Lua script closes that race window.
local count = redis.call("INCR", KEYS[1])
if count == 1 then
  redis.call("EXPIRE", KEYS[1], tonumber(ARGV[1]))
end
return count

Use for: This Lua script wraps INCR and EXPIRE atomically. Without the Lua script, a crash between INCR and EXPIRE leaves a key with no TTL — a "leaked counter" that never resets and permanently blocks the user.

GitHub uses a fixed window approach for some of their REST API limits (5 000 requests per hour per authenticated user). The simplicity is attractive at scale — just one INCR per request — but the boundary burst issue means actual instantaneous load can be up to 2x the nominal limit.

Algorithm 4 — Sliding window log

To fix the boundary burst problem, the sliding window log tracks the exact timestamp of every request in a sorted set. When a new request arrives, remove all entries older than the window size, count remaining entries, and accept or reject. This gives a perfect per-user rate limit with no boundary issues.
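Before reaching for Redis, the same logic is easy to see in a single-process sketch, using a deque in place of the sorted set (an illustrative stand-in, not the production pattern):

```python
from collections import deque

class SlidingWindowLog:
    """Sliding window log: store every request timestamp, prune old ones."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.log = deque()  # timestamps, oldest first

    def allow(self, now):
        # Drop timestamps that fell out of the window
        # (the ZREMRANGEBYSCORE step in the Redis version)
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

swl = SlidingWindowLog(limit=2, window=60)
print([swl.allow(t) for t in (0, 1, 2, 61)])  # [True, True, False, True]
```

At t = 61 the request from t = 0 has aged out of the 60-second window, so the limiter correctly frees a slot — no boundary effect.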

Sliding Window Log — Redis Sorted Set

-- KEYS[1] = rate limit key
-- ARGV[1] = window size in seconds
-- ARGV[2] = max requests per window
-- ARGV[3] = current timestamp (microseconds for precision)

local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Remove entries outside the window
redis.call("ZREMRANGEBYSCORE", key, 0, now - window * 1000000)

-- Count remaining entries
local count = redis.call("ZCARD", key)

if count < limit then
  -- Add current request timestamp as both score and member
  redis.call("ZADD", key, now, now .. ":" .. math.random(1000000))
  redis.call("EXPIRE", key, window + 1)
  return {1, limit - count - 1}  -- allowed, remaining
else
  redis.call("EXPIRE", key, window + 1)
  return {0, 0}  -- denied, 0 remaining
end

Use for: Each request timestamp is stored as a member in a Redis sorted set. ZREMRANGEBYSCORE prunes old entries, ZCARD counts current ones. Perfectly accurate but O(N) memory per user.

Memory trap at scale

If your limit is 10 000 requests per hour, each user key stores up to 10 000 entries in a sorted set. With 1 million active users, that is 10 billion entries in Redis. This algorithm is best suited for low-limit scenarios (e.g., 100 login attempts per hour) where memory per key stays manageable.

The sliding window log is the gold standard for accuracy. It is commonly used for authentication-related limits (login attempts, password resets) where the limit is low and precision matters more than memory efficiency.

Algorithm 5 — Sliding window counter

The sliding window counter is the practical sweet spot — it combines the low memory of fixed windows with an approximation of sliding window accuracy. It keeps counters for the current and previous fixed windows, then uses a weighted formula to estimate the count in the sliding window.

The formula

estimated_count = (previous_window_count * overlap_percentage) + current_window_count. For example, if we are 30% into the current minute, overlap = 70% of previous window. If previous had 80 requests and current has 20, estimate = 80 * 0.7 + 20 = 76. With a limit of 100, we allow the request (76 < 100).

Timeline:  |----prev window (count=80)----|----current window (count=20)----|
                                          ^-- 70% overlap --^-- 30% elapsed
           Estimated sliding count = 80 * 0.70 + 20 = 76

Sliding window counter interpolates between two fixed windows for near-accurate results with O(1) memory.
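As a sanity check on the worked example, the weighted estimate can be computed directly (pure Python, no Redis):

```python
def sliding_window_estimate(prev_count, curr_count, elapsed, window):
    """Weighted estimate of requests seen in the last `window` seconds."""
    weight = 1 - (elapsed / window)  # fraction of the previous window still in scope
    return prev_count * weight + curr_count

# 30% into a 60s window (elapsed = 18s): previous window had 80, current has 20
estimate = sliding_window_estimate(prev_count=80, curr_count=20, elapsed=18, window=60)
print(round(estimate))  # 76 — under the limit of 100, so the request is allowed
```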

Sliding Window Counter — Redis Implementation

-- KEYS[1] = current window key, KEYS[2] = previous window key
-- ARGV[1] = limit, ARGV[2] = window size (sec)
-- ARGV[3] = current timestamp

local curr_key = KEYS[1]
local prev_key = KEYS[2]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local curr_start = math.floor(now / window) * window
local elapsed = now - curr_start
local weight = 1 - (elapsed / window)

local prev_count = tonumber(redis.call("GET", prev_key) or "0")
local curr_count = tonumber(redis.call("GET", curr_key) or "0")

local estimated = prev_count * weight + curr_count

if estimated < limit then
  redis.call("INCR", curr_key)
  redis.call("EXPIRE", curr_key, window * 2)
  return {1, math.floor(limit - estimated - 1)}
else
  return {0, 0}
end

Use for: Two keys, one INCR — near-perfect accuracy with O(1) memory per user. Cloudflare uses this approach for many of their rate limiting products.

The recommended default

For most production systems, the sliding window counter is the best starting algorithm. It has O(1) memory per key (just two counters), no boundary burst problem (within ~1-2% error), and trivial Redis implementation. Only move to token bucket if you need explicit burst control, or sliding window log if you need exact precision.

Distributed rate limiting and race conditions

In a single-node system, an in-memory counter with a mutex works fine. But production systems run behind load balancers with dozens of application servers. If each node keeps its own counter, a user with a limit of 100/min could send 100 requests to each of N nodes, achieving N * 100 total requests. You need a shared, centralized counter — and that introduces distributed systems challenges.

The INCR/EXPIRE race condition

In Redis, INCR and EXPIRE are separate commands. If your process crashes after INCR but before EXPIRE, the key persists forever with no TTL — the counter never resets and the user is permanently rate-limited. Always use a Lua script, or on Redis 7.0+ follow every INCR with EXPIRE key ttl NX (which sets a TTL only when the key has none), so the pair cannot leak. Alternatively, embed the window timestamp in the key name so stale keys naturally become irrelevant.

Strategies for distributed counting

  • Centralized Redis — Single Redis instance (or cluster) is the source of truth. Every app server calls Redis per request. Latency: ~0.5ms per EVAL. This is the standard approach used by most companies.
  • Local + sync hybrid — Each node maintains a local counter and periodically syncs with a central store. Reduces Redis calls but allows temporary over-limit during sync gaps. Good for high-throughput, tolerance-friendly scenarios.
  • Sticky sessions — Route each user to the same node via consistent hashing on user ID. Local counters become accurate per user. Downside: uneven load distribution if some users are heavier than others.
  • Redis Cluster with hash tags — Use {user_id} hash tags to ensure all keys for one user land on the same shard. Prevents cross-shard Lua script failures while scaling horizontally.

Why Lua scripts matter

Redis is single-threaded per shard — a Lua script runs atomically without any lock. This means your read-check-update logic cannot be interleaved with another request. This is why every production rate limiter uses EVAL/EVALSHA rather than pipelining separate GET/INCR/EXPIRE commands.

Challenge               | Impact                                                                          | Mitigation
Clock skew across nodes | Different servers disagree on the current window, causing inconsistent accept/reject decisions | Use Redis server time (redis.call("TIME")) inside Lua scripts rather than application-side timestamps
Redis failover          | During sentinel failover (~5-30 s), rate limit state is lost; all counters reset to zero | Accept brief over-limit during failover; use Redis Cluster with replicas for faster failover
Hot keys                | A viral endpoint causes millions of INCR calls to one key, creating a Redis hot spot | Shard by {user_id}:{endpoint}, or use local pre-filtering to absorb obvious floods before hitting Redis
Network partition       | App servers cannot reach Redis; the rate limiter cannot make decisions          | Fail open (allow requests) or fail closed (reject all) — choose based on your risk model; most systems fail open

HTTP response headers and client communication

A well-designed rate limiter communicates its state to clients through standard HTTP headers so that well-behaved clients can self-throttle and avoid hitting limits in the first place.

Header                     | Description                                                    | Example Value
HTTP 429 Too Many Requests | Status code returned when the rate limit is exceeded           | 429
Retry-After                | Seconds (or an HTTP-date) until the client should retry        | 30
X-RateLimit-Limit          | The maximum number of requests allowed in the window           | 1000
X-RateLimit-Remaining      | How many requests the client has left in the current window    | 247
X-RateLimit-Reset          | Unix timestamp when the window resets (or seconds until reset) | 1672531200
RateLimit-Policy           | IETF draft header describing the policy                        | 100;w=60

Always include Retry-After with 429

Without Retry-After, clients have no idea when to retry and will either give up or retry aggressively with exponential backoff that may not align with your window. Include it in every 429 response. Good clients like the GitHub CLI parse Retry-After and sleep exactly that long.
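A client-side sketch of honoring the header: Retry-After allows two forms (delta-seconds and an HTTP-date), and a well-behaved client should handle both. This is an illustrative helper, not any particular library's API:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_delay_seconds(retry_after, now=None):
    """Seconds a client should sleep, given a 429 response's Retry-After value."""
    if retry_after is None:
        return None  # header missing — fall back to your own backoff policy
    try:
        return max(0, int(retry_after))              # "Retry-After: 30"
    except ValueError:
        when = parsedate_to_datetime(retry_after)    # "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
        now = now or datetime.now(timezone.utc)
        return max(0.0, (when - now).total_seconds())

print(retry_delay_seconds("30"))  # 30
```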

IETF RateLimit header fields (draft standard)

The IETF httpapi working group is standardizing rate-limit headers as RateLimit (combined limit, remaining, and reset) and RateLimit-Policy. While X-RateLimit-* headers are still the de facto standard, new APIs should consider adopting the IETF draft for forward compatibility.

Stripe returns X-RateLimit-Limit and X-RateLimit-Remaining on every response, not just 429s. This lets clients monitor their consumption proactively. GitHub includes X-RateLimit-Reset as a Unix timestamp and offers a /rate_limit endpoint to check status without consuming a request.

Placement, architecture, and multi-region considerations

Where you place the rate limiter has major implications for latency, accuracy, and operational complexity.

Placement options

  • API Gateway / Edge (L7 load balancer) — Rate limit at the entry point before requests reach any backend service. Lowest latency impact, broadest protection. AWS API Gateway, Kong, Envoy, and Cloudflare all support this natively. Best for global per-user or per-API-key limits.
  • Reverse proxy sidecar — A sidecar container (e.g., Envoy with rate limit service) co-located with each service instance. Useful for service-to-service rate limiting in a mesh architecture. Istio uses this pattern.
  • Application middleware — Rate limiting logic embedded in your application framework (Express middleware, Spring filter, etc.). Most flexible for business-logic-aware limits (e.g., "free users: 10 AI completions/day") but requires every service to implement it.
  • Per-resource limiter — Dedicated rate limiter in front of a specific expensive resource — e.g., a database connection pool limiter, an LLM API call limiter. Protects the bottleneck directly.

Layer your rate limiters

Production systems typically use multiple layers: a coarse global limit at the edge (e.g., 10 000 req/min per IP), a per-user limit at the API gateway (e.g., 1 000 req/min per API key), and a per-resource limit at the service level (e.g., 100 LLM calls/day per user). Each layer catches what the layer above misses.
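The layering idea reduces to running a request through each limiter in order and stopping at the first rejection. A minimal sketch with hypothetical layer names (each limiter is just a predicate here, standing in for the real edge/gateway/service checks):

```python
def check_layers(request, layers):
    """Pass a request through rate-limit layers in order (edge -> gateway -> service).
    The first layer that rejects wins; a request must pass every layer."""
    for name, limiter in layers:
        if not limiter(request):
            return (False, name)  # rejected at this layer
    return (True, None)

# Hypothetical layers mirroring the limits in the text
layers = [
    ("edge-ip",     lambda r: r["ip_rpm"] <= 10_000),   # coarse per-IP limit
    ("gateway-key", lambda r: r["key_rpm"] <= 1_000),   # per-API-key limit
    ("service-llm", lambda r: r["llm_daily"] <= 100),   # per-resource limit
]
print(check_layers({"ip_rpm": 50, "key_rpm": 1_200, "llm_daily": 3}, layers))
# → (False, 'gateway-key')
```

Note the ordering: cheap, coarse checks run first so that floods are shed at the edge before touching per-resource state.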

Multi-region rate limiting

Global limit (single source of truth)

  • Cross-region Redis replication
  • Higher latency for non-local regions
  • Consistent limits worldwide
  • Used by: Stripe, GitHub

Per-region independent limits

  • Each region has its own Redis
  • Zero cross-region latency
  • User gets N req/min per region (effectively N * regions total)
  • Used by: CloudFront per-POP limiting

Hybrid: local + async sync

  • Local Redis per region for speed
  • Async replication merges counts periodically
  • Eventually consistent — allows temporary over-limit
  • Used by: Cloudflare Workers (edge rate limiting)
Approach                 | Accuracy              | Latency Impact                         | Complexity | Use Case
Global centralized Redis | Exact                 | High for distant regions (+50-150 ms)  | Low        | Strict limits on financial APIs
Per-region independent   | Per-region only       | None (local)                           | Low        | Approximate limits where 2x is tolerable
Hybrid with async sync   | Eventually consistent | Low (local + background sync)          | High       | Global APIs needing low latency and reasonable accuracy
CRDTs (e.g., G-Counter)  | Eventually consistent | Low                                    | Very high  | Research / cutting edge; Cloudflare experiments

For most applications, per-region independent limits with a known multiplier (total effective limit = per-region limit * active regions) is the simplest approach. Reserve global centralized counting for scenarios where over-limit has real cost — billing, compliance, or security-critical endpoints.

Real-world rate limiting in production

Company         | Algorithm              | Limits                                                | Notable Details
GitHub          | Fixed window           | 5 000 req/hr (authenticated), 60/hr (unauthenticated) | Dedicated /rate_limit endpoint; X-RateLimit-* headers on every response; conditional requests (304) reduce consumption
Stripe          | Token bucket           | 100 req/sec (live mode), 25 req/sec (test mode)       | Per API key; returns X-RateLimit-* headers; exponential backoff recommended in docs; idempotency keys prevent double charges on retries
Cloudflare      | Sliding window counter | Configurable per zone                                 | Runs at the edge (200+ PoPs); can rate-limit by URI, headers, cookies, or JA3 fingerprint; actions: block, challenge, JS challenge, managed challenge
AWS API Gateway | Token bucket           | 10 000 req/sec (account), 5 000 burst                 | Per region; stage-level and method-level throttling; usage plans tie limits to API keys
Discord         | Token bucket per route | Varies per endpoint (e.g., 5/sec for messages)        | Returns X-RateLimit-Bucket header; global and per-route limits; bot library authors must handle both

Idempotency and rate limiting work together

When a rate-limited request is retried, you need idempotency to prevent duplicate side effects. Stripe requires an Idempotency-Key header for POST requests — if the retry lands after the original succeeds, the duplicate is safely ignored. Always pair rate limiting with idempotency for write operations.

Rate limiter decision flow

Is the request from a known bad actor (blocklist)?
  Yes → Drop immediately (HTTP 403)
  No  → Check the global rate limit
Is the global rate limit exceeded?
  Yes → Return 429 with Retry-After
  No  → Check the per-user rate limit
Is the per-user rate limit exceeded?
  Yes → Return 429 with Retry-After and X-RateLimit-* headers
  No  → Check the per-endpoint rate limit
Is the per-endpoint rate limit exceeded?
  Yes → Return 429 with specific endpoint limit info
  No  → Allow the request and update the counters
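The flow above condenses into a small dispatcher. The *_ok callables here are hypothetical stand-ins for the real limiter checks (each would be an EVAL against Redis in production):

```python
def handle_request(req, blocklist, global_ok, user_ok, endpoint_ok):
    """Return the HTTP status the rate limiting layer would emit for this request."""
    if req["ip"] in blocklist:
        return 403                                   # known bad actor: drop immediately
    if not global_ok():
        return 429                                   # global limit exceeded
    if not user_ok(req["user"]):
        return 429                                   # per-user limit exceeded
    if not endpoint_ok(req["user"], req["endpoint"]):
        return 429                                   # per-endpoint limit exceeded
    return 200                                       # allow and count the request

always_ok = lambda *args: True
print(handle_request({"ip": "1.2.3.4", "user": "u1", "endpoint": "/msgs"},
                     blocklist={"1.2.3.4"},
                     global_ok=always_ok, user_ok=always_ok, endpoint_ok=always_ok))
# → 403
```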
How this might come up in interviews

Rate limiter design is one of the most frequently asked system design questions at FAANG companies. Interviewers expect you to start with requirements (who is being limited, what resource, what limit), walk through at least two algorithms with tradeoffs, discuss the distributed counting problem, and explain HTTP 429 headers. Senior candidates are expected to discuss Redis Lua atomicity, multi-region challenges, and how to handle rate limiter failures (fail-open vs fail-closed).

Common questions:

  • Design a rate limiter for a cloud API service. Walk me through your algorithm choice and how it handles distributed servers.
  • How would you implement rate limiting in a system serving 10 million users across 5 geographic regions?
  • Compare token bucket and sliding window counter. When would you choose one over the other?
  • A client is getting rate-limited but claims they are under the limit. How would you debug this?
  • How do you handle the race condition between checking the counter and incrementing it in a distributed environment?
  • Your rate limiter uses Redis and Redis goes down. What happens to your system? What is your failover strategy?

Before you move on: can you answer these?

A user sends 100 requests at second 59 of a fixed window and 100 more at second 0 of the next window. If the limit is 100 requests per minute, how many total requests go through in those 2 seconds? How would a sliding window counter handle this differently?

With fixed windows, all 200 requests go through (100 per window, both within limit). With a sliding window counter at the start of the new window, the previous window still carries nearly full weight: the estimate for the k-th new request is roughly 100 * 0.98 + k, which crosses the limit of 100 after only a couple of requests, so almost all of the second batch is rejected.

Why must INCR and EXPIRE in a fixed window rate limiter be executed atomically, and what happens if they are not?

If the process crashes after INCR but before EXPIRE, the key has no TTL and persists forever. The counter never resets, permanently blocking the user. A Redis Lua script ensures both operations execute as a single atomic unit. Alternatively, embedding the window timestamp in the key name means stale keys become irrelevant even without EXPIRE.

You run 3 API gateway instances in us-east-1 and 3 in eu-west-1, each with an independent Redis. Your per-user limit is 1000 req/min. What is the effective global limit a user could achieve, and how would you fix it?

The effective global limit is 2000 req/min (1000 per region * 2 regions). To fix: either use a single global Redis (adds cross-region latency), implement async cross-region counter synchronization (eventually consistent), or set per-region limits to 500 so the global total stays at 1000.

🧠Mental Model

💡 Analogy

A rate limiter is like a nightclub bouncer with a clicker counter. The bouncer lets people in one by one, clicking the counter each time. Once the club hits capacity, the bouncer says "line is full, come back later." The bouncer does not care who you are — just whether there is room. The token bucket is a bouncer who also hands out drink tickets at a steady rate: you can use your tickets in a burst, but once they are gone, you wait for the next batch.

⚡ Core Idea

Rate limiting is about converting an unpredictable flood of requests into a predictable, manageable stream — protecting both the system and the fairness contract between users.

🎯 Why It Matters

Without rate limiting, a single misbehaving client can take down an entire platform. Every FAANG outage postmortem eventually asks "why did we not rate-limit this?" It is the cheapest, highest-ROI reliability investment you can make.
