The Simplified Tech
Interactive Explainer

Designing a URL Shortener at Scale

A comprehensive deep-dive into designing a production-grade URL shortening service capable of handling billions of redirects per day. Covers short-code generation strategies (Base62 encoding, MD5 hash truncation, Snowflake counters), database schema and partitioning for read-heavy workloads, caching with Redis, the critical 301 vs 302 redirect decision, analytics pipelines, rate limiting, abuse prevention, and global scaling with sharding and TTL-based garbage collection. This is one of the most frequently asked system design interview questions at FAANG companies because it tests breadth (API design, storage, caching, CDN) and depth (hashing collisions, cache invalidation, analytics at scale) in a single 45-minute session.

🎯 Key Takeaways
A URL shortener is a distributed key-value store with a 100:1 read-to-write ratio. Every design decision — code generation, database choice, caching strategy — is driven by this extreme read skew. Optimize the read path first.
Use Snowflake-style counter-based ID generation (not hash truncation) for short codes at scale. It eliminates collisions by construction, supports distributed writes without coordination, and avoids the cascade failure mode that hash collision retries create under burst load.
The 301 vs 302 redirect decision is the most consequential trade-off: 302 enables analytics (every click tracked) but costs more infrastructure; 301 reduces server load by 50-80% but makes click tracking impossible after the first visit. Default to 302 and explain why.
Redis caching is the critical scaling lever. The power-law distribution of URL access (20% of URLs receive 80%+ of clicks) means a modestly-sized cache (25-50 GB) can absorb 95%+ of all redirect traffic, reducing database load by 20x.
Separate the write path (URL creation) from the read path (redirects) in both architecture and infrastructure. They have different scaling characteristics, different latency requirements, and must be independently scalable so that a write-side problem never degrades redirect performance.

~35 min read

Why URL Shorteners Matter at Scale

URL shortening is far more than a convenience feature. At the scale of services like bit.ly (processing over 600 million link clicks per month), TinyURL (one of the oldest shorteners still in operation), and Twitter's t.co (which wraps every URL shared on the platform for analytics and safety scanning), URL shorteners are critical infrastructure that sits in the hot path of billions of HTTP requests daily.

The core value proposition is deceptively simple: given a long URL like https://example.com/products/category/electronics/item/12345?utm_source=newsletter&utm_medium=email, produce a short alias like https://short.ly/a7Bx3q that redirects to the original. But beneath this simplicity lies a rich set of engineering challenges that make this one of the most popular system design interview questions.

Why Interviewers Love This Question

A URL shortener touches every layer of the stack: API design, database modeling, hashing algorithms, caching, CDN integration, analytics, rate limiting, and global distribution. It can be discussed at L4 depth (basic CRUD + cache) or L7 depth (consistent hashing for sharding, probabilistic data structures for analytics, clock-based ID generation). This range makes it the perfect calibration question for system design interviews.

Real-world URL shorteners serve several critical functions beyond just making links shorter. Twitter's t.co service wraps every outbound URL to scan for malware, track engagement metrics, and provide a consistent character count for tweets. Marketing platforms like bit.ly and Rebrandly provide detailed click analytics — geographic distribution, device types, referral sources, and time-of-day patterns — that drive campaign optimization decisions worth millions of dollars.

Business functions of URL shorteners

  • Character conservation — SMS messages (160 chars), tweets, and push notifications have strict length limits. A 200-character URL leaves no room for the actual message.
  • Click analytics — Every redirect is a data collection opportunity: timestamp, IP address, User-Agent, referrer, geographic location. This data powers marketing attribution models.
  • Link safety — Services like t.co check destination URLs against malware databases before redirecting. This protects users from phishing and drive-by downloads.
  • Brand consistency — Custom short domains (yourbrand.link/promo) reinforce brand identity while maintaining all analytics capabilities.
  • A/B testing and routing — Short URLs can redirect to different destinations based on user geography, device type, or experimental cohort — enabling server-side A/B tests without client changes.
  • Link lifecycle management — Short URLs can expire, be deactivated, or have their destination changed — useful for time-sensitive promotions or correcting accidental links.

Scale Numbers to Remember

bit.ly processes roughly 600M clicks per month (~230 clicks/second average, ~2,000/second peak). Twitter generates approximately 500M tweets per day, each containing at least one t.co link. If you assume 100M new short URLs created per day and a 100:1 read-to-write ratio, the redirect service must handle approximately 115,000 reads per second at average load and 5-10x that at peak.

Understanding the scale is critical because it drives every architectural decision downstream. A URL shortener serving 1,000 requests per day can run on a single SQLite database. One serving 100 million redirects per day requires distributed caching, database sharding, CDN integration, and carefully designed data pipelines. The interview question is specifically testing whether you can navigate these scaling thresholds and explain which architectural components become necessary at each stage.

Requirements Gathering

Before drawing any architecture diagrams, a strong system design answer begins with requirements gathering. This phase establishes the scope of the problem and surfaces the constraints that drive every subsequent decision. Interviewers specifically look for candidates who ask clarifying questions rather than immediately jumping into a solution.

Functional requirements

  • Create short URL — Given a long URL, generate a unique short URL (e.g., https://short.ly/a7Bx3q). Optionally allow the user to specify a custom alias.
  • Redirect — When a user visits the short URL, redirect them to the original long URL with minimal latency (target: < 50ms p99).
  • Custom aliases — Allow users to choose their own short code (e.g., short.ly/my-promo) subject to availability and character validation.
  • Expiration — Support optional TTL (time-to-live) for short URLs. Expired URLs should return 404 or redirect to a default page.
  • Analytics (optional) — Track click count, geographic distribution, referrer, device type, and timestamp per short URL.
  • API access — Provide a REST API for programmatic URL creation, deletion, and analytics retrieval. Support API keys for authentication.

Non-functional requirements

  • High availability — The redirect service must be 99.99% available. Even brief downtime breaks every link on the internet that points to us.
  • Low latency — Redirects must complete in under 50ms at p99. Users perceive any delay as the destination site being slow, damaging our reputation and theirs.
  • Scalability — Support 100M+ new URLs per day and 10B+ redirects per day at peak. The system must scale horizontally.
  • Durability — Once a short URL is created, the mapping must never be lost. URLs shared in printed materials or saved in bookmarks must work for years.
  • Consistency — A newly created short URL must be resolvable within seconds. Eventual consistency is acceptable for analytics but not for the redirect path.
  • Security — Rate limiting to prevent abuse, spam detection, and malware URL scanning to protect end users.

Back-of-Envelope Estimation

Start with 100M new URLs per day. That is roughly 1,200 writes/second average and 6,000 writes/second at 5x peak. With a 100:1 read-to-write ratio, expect 120,000 reads/second average and 600,000 reads/second at peak. Storage: if each URL record is ~500 bytes (short code + long URL + metadata), 100M URLs/day = 50 GB/day = 18 TB/year. Over 5 years with no expiration: ~90 TB. This means a single database node is insufficient — sharding or a distributed store is required.

| Metric | Value | Derived from |
| --- | --- | --- |
| New URLs per day | 100M | Product requirement (bit.ly scale) |
| Write QPS (avg) | ~1,200/sec | 100M / 86,400 seconds |
| Write QPS (peak 5x) | ~6,000/sec | 1,200 x 5 |
| Read:Write ratio | 100:1 | Industry standard for read-heavy services |
| Read QPS (avg) | ~120,000/sec | 1,200 x 100 |
| Read QPS (peak 5x) | ~600,000/sec | 120,000 x 5 |
| Record size | ~500 bytes | Short code (7 B) + long URL (200 B avg) + metadata (293 B) |
| Daily storage | 50 GB | 100M x 500 bytes |
| 5-year storage | ~90 TB | 50 GB x 365 x 5 |
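These estimates are worth being able to re-derive on demand. A quick sketch of the same arithmetic, with constants taken from the requirements above:

```python
# Back-of-envelope estimation for the URL shortener. The 5x peak factor and
# 500-byte record size are the same assumptions used in the table above.
URLS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 86_400
READ_WRITE_RATIO = 100
PEAK_FACTOR = 5
RECORD_BYTES = 500

write_qps_avg = URLS_PER_DAY / SECONDS_PER_DAY          # ~1,157/sec
write_qps_peak = write_qps_avg * PEAK_FACTOR            # ~5,787/sec
read_qps_avg = write_qps_avg * READ_WRITE_RATIO         # ~115,740/sec
read_qps_peak = read_qps_avg * PEAK_FACTOR              # ~578,700/sec
daily_storage_gb = URLS_PER_DAY * RECORD_BYTES / 1e9    # 50 GB/day
five_year_storage_tb = daily_storage_gb * 365 * 5 / 1e3 # ~91 TB

print(f"writes: {write_qps_avg:,.0f}/s avg, {write_qps_peak:,.0f}/s peak")
print(f"reads:  {read_qps_avg:,.0f}/s avg, {read_qps_peak:,.0f}/s peak")
print(f"storage: {daily_storage_gb:.0f} GB/day, {five_year_storage_tb:.1f} TB over 5 years")
```

The rounded table figures (1,200 writes/sec, 120,000 reads/sec) come from these exact values.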

API Contract

  • POST /api/v1/urls with { "long_url": "https://...", "custom_alias": "optional", "ttl_seconds": 86400 } returns { "short_url": "https://short.ly/a7Bx3q", "expires_at": "..." }.
  • GET /{short_code} returns a 301 or 302 redirect.
  • DELETE /api/v1/urls/{short_code} deactivates the URL.
  • GET /api/v1/urls/{short_code}/stats returns analytics.
  • Use API versioning from day one; you will change the response format.

A critical question to ask the interviewer: should short URLs be permanent or expire by default? This decision profoundly affects storage growth, garbage collection strategy, and cache TTL policy. If URLs are permanent, storage grows without bound and you need a plan for 90+ TB of data. If URLs expire after, say, 5 years, you can reclaim short codes and keep storage manageable.

Custom Alias Trap

Custom aliases introduce a uniqueness check on the write path: before accepting the alias, you must verify it does not already exist. Under high write QPS, this check-then-insert pattern creates a race condition unless you use database-level unique constraints or distributed locks. Many candidates forget this edge case. The simplest solution: rely on a UNIQUE index on the short_code column and handle the duplicate key error gracefully.
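The tip above is easy to demonstrate with SQLite, whose PRIMARY KEY gives the same UNIQUE-constraint behavior; the table and helper below are a minimal sketch, not the production schema:

```python
import sqlite3

# Let a uniqueness constraint arbitrate the check-then-insert race instead of
# checking existence first: two concurrent inserts cannot both succeed.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE urls (
        short_code TEXT PRIMARY KEY,   -- unique by construction
        long_url   TEXT NOT NULL
    )
""")

def claim_alias(short_code, long_url):
    """Insert and let the database enforce uniqueness; False means taken."""
    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute(
                "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
                (short_code, long_url),
            )
        return True
    except sqlite3.IntegrityError:   # duplicate key: alias already claimed
        return False

print(claim_alias("my-promo", "https://example.com/a"))  # True
print(claim_alias("my-promo", "https://example.com/b"))  # False: already taken
```

The same pattern applies to PostgreSQL or MySQL: catch the duplicate-key error from the UNIQUE index and translate it into a "409 alias taken" API response.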

Short URL Generation Strategies

The heart of a URL shortener is the algorithm that converts a long URL into a short, unique code. There are three primary approaches, each with distinct trade-offs around collision risk, predictability, performance, and implementation complexity. A strong interview answer compares all three and justifies a choice based on the specific requirements.

Short Code Length Math

A short code using Base62 (a-z, A-Z, 0-9) with 7 characters provides 62^7 = 3.52 trillion unique combinations. At 100M new URLs per day, this address space lasts roughly 96 years. With 6 characters (62^6 = 56.8 billion), it lasts only about 568 days. Seven characters is the industry standard because it balances brevity with decades of headroom; an eighth character multiplies the space by 62 if you ever need more.
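The address-space math is easy to verify; the loop over lengths below is purely illustrative:

```python
# How long does a Base62 address space of a given length last at 100M URLs/day?
BASE = 62
URLS_PER_DAY = 100_000_000

for length in (6, 7, 8):
    space = BASE ** length
    days = space / URLS_PER_DAY
    print(f"{length} chars: {space:.3g} codes, ~{days / 365:,.1f} years at 100M/day")
```

Seven characters buys roughly a century at this write rate; an 8th character stretches it to millennia.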

Strategy 1: Base62 Encoding of Auto-Increment ID

  1. Insert the URL record into the database and obtain the auto-incremented integer ID (e.g., 1234567890).
  2. Convert the integer to a Base62 string: repeatedly divide by 62 and map remainders to characters (0-9 = 0-9, A-Z = 10-35, a-z = 36-61).
  3. Example: ID 1234567890 in Base62 = "1LY7VK" (6 characters). Pad to 7 characters if needed.
  4. Store the mapping: short_code "1LY7VK" -> long URL. The short code is deterministically derived from the ID.
  5. On redirect lookup, convert the short code back to the integer ID and fetch the record by primary key — an O(1) operation.
  6. Advantage: zero collision risk because auto-increment IDs are unique by definition. Disadvantage: sequential and predictable — attackers can enumerate all URLs.
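Strategy 1's encode/decode is short enough to show in full. A minimal Python sketch; the alphabet order (digits, then uppercase, then lowercase) is chosen so that 1234567890 encodes to "1LY7VK" as in the worked example:

```python
# Base62 encode/decode for Strategy 1.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_base62(n, min_width=1):
    """Convert a non-negative integer to a Base62 string, left-padded with '0'."""
    if n == 0:
        return ALPHABET[0].rjust(min_width, ALPHABET[0])
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(min_width, ALPHABET[0])

def decode_base62(code):
    """Convert a Base62 string back to the integer ID for a primary-key fetch."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode_base62(1234567890))               # 1LY7VK
print(decode_base62("1LY7VK"))                 # 1234567890
print(encode_base62(1234567890, min_width=7))  # 01LY7VK (padded to 7 chars)
```

decode_base62 is what turns a redirect lookup back into a primary-key fetch, which is the O(1) property step 5 relies on.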

Strategy 2: MD5 Hash Truncation

  1. Compute MD5(long_url) to get a 128-bit hash (32 hex characters).
  2. Take the first 7 characters of the Base62-encoded hash as the short code.
  3. Check the database for collisions: does this short code already exist?
  4. If a collision is detected: append a counter or salt to the long URL and rehash. Repeat until a unique code is found.
  5. Store the mapping. On redirect, look up the short code directly.
  6. Advantage: the same long URL always produces the same short code (idempotent). Disadvantage: collisions require detection and resolution, adding latency and complexity.
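The hash-truncate-retry loop above can be sketched as follows; the `taken` set stands in for the database's uniqueness check, and the salt format is an arbitrary illustration:

```python
import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def _base62(n):
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]

def short_code(long_url, taken, max_retries=5):
    """Strategy 2: hash, truncate to 7 chars, retry with a salt on collision."""
    candidate_url = long_url
    for attempt in range(max_retries):
        digest = hashlib.md5(candidate_url.encode()).hexdigest()
        code = _base62(int(digest, 16))[:7]   # first 7 Base62 chars of the hash
        if code not in taken:
            taken.add(code)
            return code
        candidate_url = f"{long_url}#{attempt}"  # salt and rehash
    raise RuntimeError("could not find a unique code")

taken = set()
code1 = short_code("https://example.com/a", taken)
code2 = short_code("https://example.com/a", taken)  # collides with code1, salts
print(code1, code2)  # two distinct 7-char codes
```

Note the retry loop is exactly the latency-and-complexity cost listed as the disadvantage: every collision adds a round trip to the uniqueness check before the write can commit.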

Strategy 3: Snowflake-Style Counter

  1. Use a distributed ID generator (Snowflake, ULID, or similar) to produce a globally unique 64-bit ID.
  2. Convert the ID to Base62 to produce the short code. A 64-bit integer yields up to 11 Base62 characters; note that truncating to 7 characters discards bits and reintroduces collision risk, so either accept longer codes or allocate IDs from a range that fits within 62^7.
  3. No collision check needed: the ID generator guarantees uniqueness across all nodes.
  4. Store the mapping with the generated short code.
  5. Advantage: no coordination for collision resolution, time-sortable IDs, works across distributed write nodes. Disadvantage: requires infrastructure for the ID generator service.
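A self-contained sketch of the generator: the bit layout follows the classic Snowflake split (41 timestamp bits, 10 machine-ID bits, a 12-bit per-millisecond sequence), while the class itself and the epoch constant are illustrative:

```python
import threading
import time

class SnowflakeGenerator:
    """Strategy 3 sketch: timestamp | machine_id | sequence, packed into 64 bits."""
    EPOCH_MS = 1_288_834_974_657  # any fixed past instant works as the epoch

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024          # 10 bits of machine ID
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000) - self.EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:          # sequence exhausted: wait 1 ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - self.EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(machine_id=7)
ids = [gen.next_id() for _ in range(5)]
print(ids)  # strictly increasing integers, minted with no coordination
```

In production each API server gets a distinct machine_id, so no two nodes can mint the same ID even within the same millisecond; this sketch omits clock-rollback handling, which a real deployment must address.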

| Property | Base62 (Auto-Inc) | MD5 Truncation | Snowflake Counter |
| --- | --- | --- | --- |
| Collision risk | Zero | Medium (~1 in 62^7 per URL, increases with scale) | Zero |
| Predictability | High (sequential) | Low (hash is pseudo-random) | Low (timestamp + random bits) |
| Write latency | Low (single DB insert) | Medium (hash + collision check + retry) | Low (local ID generation) |
| Distributed writes | Requires ticket server or sequence coordination | Natural (hash is stateless) | Natural (each node generates IDs independently) |
| Idempotency | No (same URL gets different IDs on each insert) | Yes (same URL = same hash) | No (same URL gets different IDs) |
| URL enumeration risk | High (increment by 1 to find next URL) | Low | Low (timestamp bits obfuscate) |
| Implementation complexity | Low | Medium (collision handling) | Medium (ID generator infrastructure) |

The Recommended Approach for Interviews

Most FAANG interviewers expect you to discuss the Base62 encoding approach and then address its enumeration weakness. The strongest answer: use a Snowflake-style counter for uniqueness and encode it in Base62 for the short code. This gives you zero collisions, no enumeration risk (the timestamp bits make codes non-sequential), distributed write capability, and simple implementation. Mention that you would add a UNIQUE constraint on short_code as a safety net.

Common Mistake: Ignoring Collision Handling

Candidates who propose MD5 truncation but do not discuss collision resolution will lose points. With 3.52 trillion possible 7-character codes, the birthday paradox means a collision becomes more likely than not after roughly 2 million URLs (about 1.2 x the square root of the address space; sqrt(62^7) is roughly 1.9 million). At 100M URLs per day, you will see collisions within the first half hour of operation. Your collision resolution strategy (append counter, rehash with salt, or use a Bloom filter to pre-check) is a critical part of the design.

For the remainder of this design, we will assume the Snowflake counter approach: a distributed ID generator produces unique 64-bit IDs, which are encoded in Base62 to produce 7-character short codes. This eliminates collision handling complexity and supports distributed writes across multiple data centers.

Database Schema & Storage Design

The URL mapping table is the core data structure of the entire system. Every redirect must look up this table (or its cache), so the schema and storage engine choice directly impact latency, throughput, and operational complexity. The read-to-write ratio of 100:1 means we must optimize aggressively for reads while maintaining acceptable write performance.

| Column | Type | Purpose |
| --- | --- | --- |
| id | BIGINT (primary key) | Auto-generated Snowflake ID; used internally for efficient joins and indexing |
| short_code | VARCHAR(7), UNIQUE INDEX | The Base62-encoded short code; used for redirect lookups |
| long_url | TEXT | The original destination URL; can be up to 2,048 characters |
| user_id | BIGINT, INDEX | The account that created this URL; NULL for anonymous creations |
| created_at | TIMESTAMP | Creation time; useful for analytics and TTL computation |
| expires_at | TIMESTAMP, INDEX | Optional expiration; NULL means permanent. Indexed for garbage collection queries |
| click_count | BIGINT DEFAULT 0 | Denormalized click counter for quick stats display; updated asynchronously |
| is_active | BOOLEAN DEFAULT true | Soft delete flag; deactivated URLs return 410 Gone |

Index Strategy for Read-Heavy Workloads

The most critical index is on short_code — every redirect hits this index. Use a hash index (not B-tree) if your database supports it, because redirect lookups are always exact-match (WHERE short_code = ?), never range queries. In PostgreSQL, a HASH index on short_code provides O(1) lookups. In MySQL InnoDB, the UNIQUE index on short_code is a B-tree but performs nearly as well for exact matches when the index fits in the buffer pool.

The SQL vs NoSQL decision depends on your scale tier and consistency requirements. For a URL shortener, the access pattern is extremely simple: point lookups by short_code for reads, and single-row inserts for writes. There are no complex joins, no multi-table transactions, and no range queries on the critical path. This access pattern is ideal for both SQL and NoSQL databases.

| Criterion | SQL (PostgreSQL/MySQL) | NoSQL (DynamoDB/Cassandra) |
| --- | --- | --- |
| Access pattern fit | Good — point lookups on indexed column are fast | Excellent — designed for key-value lookups |
| Write throughput | 5K-20K QPS per node; sharding required beyond this | 100K+ QPS with horizontal scaling; linear scalability |
| Read throughput | 50K+ QPS with read replicas; connection pooling critical | Millions of QPS with consistent hashing; no connection limits |
| Consistency | Strong by default (ACID) | Tunable: eventual to strong; DynamoDB offers strong reads per request |
| Schema flexibility | Fixed schema; migrations needed for changes | Schemaless; add fields without migrations |
| Operational complexity | Moderate: backups, replication, connection pools | Lower with managed services (DynamoDB); higher self-managed (Cassandra) |
| Cost at scale (90 TB) | High: large instances + read replicas + sharding middleware | Moderate: pay-per-request or provisioned capacity; scales linearly |

Partitioning Strategy

Partition (shard) the URL table by short_code hash. Since redirects look up by short_code, the query always hits exactly one partition — no scatter-gather needed. Use consistent hashing to distribute short_codes across N database shards. When adding a shard, only 1/N of the data migrates. For DynamoDB, short_code is the natural partition key and DynamoDB handles distribution automatically. For PostgreSQL, use Citus or manual hash partitioning with short_code hash mod N.
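The routing rule can be illustrated with a stable hash. The mod-N version below is the simplest form; a production system would use a consistent-hash ring as described above, so that adding a shard moves only ~1/N of the keys rather than remapping nearly all of them:

```python
import hashlib

def shard_for(short_code, num_shards):
    """Route a redirect lookup to exactly one shard. Uses a stable hash
    (not Python's built-in hash(), which is randomized per process) so
    every API server computes the same shard for the same code."""
    digest = hashlib.sha256(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

NUM_SHARDS = 4
for code in ("a7Bx3q", "1LY7VK", "zzzzzzz"):
    print(code, "-> shard", shard_for(code, NUM_SHARDS))
```

Because the lookup key and the partition key are the same column, every redirect query hits exactly one shard: no scatter-gather.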

graph TB
    subgraph "Database Sharding by short_code Hash"
        Client["API Server"] --> Router["Shard Router<br/>hash(short_code) % N"]
        Router --> S0["Shard 0<br/>codes a-j<br/>hash range 0-24%"]
        Router --> S1["Shard 1<br/>codes k-r<br/>hash range 25-49%"]
        Router --> S2["Shard 2<br/>codes s-z<br/>hash range 50-74%"]
        Router --> S3["Shard 3<br/>codes A-Z, 0-9<br/>hash range 75-99%"]
    end

    S0 --> R0["Read Replica 0a"]
    S0 --> R1["Read Replica 0b"]
    S1 --> R2["Read Replica 1a"]
    S1 --> R3["Read Replica 1b"]

    style Router fill:#f59e0b,stroke:#333,color:#fff
    style S0 fill:#3b82f6,stroke:#333,color:#fff
    style S1 fill:#3b82f6,stroke:#333,color:#fff
    style S2 fill:#3b82f6,stroke:#333,color:#fff
    style S3 fill:#3b82f6,stroke:#333,color:#fff

URL table sharded by short_code hash. Each shard has its own read replicas for the 100:1 read-heavy workload.

Hot Shard Problem

A viral link (celebrity tweet, breaking news) can receive millions of redirects per second, all hitting the same shard since the short_code maps to one partition. The database shard cannot handle this alone. Solution: the caching layer (Redis) absorbs hot-key traffic before it reaches the database. For extremely viral URLs, replicate the cache entry across all Redis nodes so any node can serve the redirect. This is the same pattern Twitter uses for trending topics.

For the write path, the database only needs to handle ~6,000 writes per second at peak. This is well within the capacity of a single PostgreSQL primary or a small DynamoDB table. The real scaling challenge is on the read path, which is why the caching layer (covered in a later section) is the most critical architectural component.

High-Level Architecture

The architecture of a URL shortener has two distinct paths: the write path (creating short URLs) and the read path (redirecting). The read path is 100x hotter and must be optimized for absolute minimum latency. The write path has more flexibility but must guarantee uniqueness and durability. Understanding these two paths separately is the key to a clean design.

graph TB
    subgraph "Write Path — URL Creation"
        U1["Client / API Consumer"] -->|"POST /api/v1/urls"| LB1["Load Balancer<br/>L7 / ALB"]
        LB1 --> API1["API Server 1"]
        LB1 --> API2["API Server 2"]
        LB1 --> API3["API Server N"]
        API1 --> IDG["ID Generator<br/>Snowflake Service"]
        API1 --> DB["Primary Database<br/>Write Master"]
        API1 --> Cache1["Redis — Write-Through<br/>Cache new mapping"]
    end

    subgraph "Read Path — Redirect"
        U2["Browser / Client"] -->|"GET /a7Bx3q"| CDN["CDN Edge<br/>(301 cached)"]
        CDN -->|"Cache MISS"| LB2["Load Balancer"]
        LB2 --> RS1["Redirect Server 1"]
        LB2 --> RS2["Redirect Server N"]
        RS1 --> Cache2["Redis Cluster<br/>Cache-Aside Lookup"]
        Cache2 -->|"Cache MISS"| DBR["Database<br/>Read Replica"]
    end

    style CDN fill:#10b981,stroke:#333,color:#fff
    style Cache1 fill:#ef4444,stroke:#333,color:#fff
    style Cache2 fill:#ef4444,stroke:#333,color:#fff
    style DB fill:#3b82f6,stroke:#333,color:#fff
    style DBR fill:#3b82f6,stroke:#333,color:#fff
    style IDG fill:#8b5cf6,stroke:#333,color:#fff

Two-path architecture: the write path generates IDs and stores mappings; the read path serves redirects through CDN and Redis cache layers.

Component responsibilities

  • Load Balancer (L7) — Routes /api/* to write API servers and /{short_code} to redirect servers. Health checks ensure traffic only goes to healthy instances. Use AWS ALB or Nginx with least-connections routing.
  • API Servers (stateless) — Handle URL creation, validation, and authentication. Stateless: any server can handle any request. Scale horizontally by adding instances behind the load balancer.
  • ID Generator — Snowflake-style service that produces unique 64-bit IDs. Can be embedded in each API server (one machine ID per instance) or a separate microservice. Each node generates IDs independently — no coordination on the hot path.
  • Primary Database — Stores the authoritative URL mappings. Handles all writes. Replicates to read replicas asynchronously (replication lag < 1 second in normal conditions).
  • Read Replicas — Handle the 100:1 read load. Redirect servers query replicas, not the primary. Scale by adding more replicas (up to ~15 for PostgreSQL, unlimited for DynamoDB).
  • Redis Cluster — Cache layer between redirect servers and the database. Holds the hottest URL mappings in memory. Cache hit rate target: 95%+ (because of the 80/20 rule — 20% of URLs account for 80%+ of traffic).
  • CDN Edge — For URLs with 301 (permanent) redirects, the CDN caches the redirect response at edge locations worldwide. A user in Tokyo gets redirected from the Tokyo edge without any request reaching your origin servers.

Write Path Sequence

Client sends POST with long_url. API server validates the URL format, checks rate limits, generates a Snowflake ID, encodes it to Base62, inserts the mapping into the primary database, writes through to Redis cache, and returns the short URL. Total write latency: 10-50ms depending on database location. The Redis write-through ensures the URL is immediately available for redirect lookups even before replication to read replicas completes.

Replication Lag on the Read Path

If a user creates a short URL and immediately shares it, the first redirect might hit a read replica that has not yet received the new record (replication lag). The redirect fails with 404. Solution: write-through caching. When the API server creates the URL, it also writes the mapping to Redis. The redirect server checks Redis first (which has the data immediately) before falling back to the database. This eliminates the replication lag window for new URLs.
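The write-through fix is easiest to see end to end. A toy model, with plain dicts standing in for Redis, the primary, and a lagging read replica (all names are illustrative, not a real client API):

```python
# Two-path toy model: write path does write-through; read path is cache-aside.
redis_cache = {}
primary_db = {}
replica_db = {}   # replication from primary_db is asynchronous and may lag

def create_url(short_code, long_url):
    """Write path: store in the primary, then write through to the cache."""
    primary_db[short_code] = long_url
    redis_cache[short_code] = long_url  # new URL is immediately resolvable
    # replication to replica_db would happen asynchronously, seconds later

def resolve(short_code):
    """Read path: cache-aside lookup, Redis first, then a read replica."""
    if short_code in redis_cache:              # cache hit (target: 95%+)
        return redis_cache[short_code]
    long_url = replica_db.get(short_code)      # miss: fall back to a replica
    if long_url is not None:
        redis_cache[short_code] = long_url     # populate for subsequent readers
    return long_url

create_url("a7Bx3q", "https://example.com/long-url")
# The replica has NOT caught up, but the redirect still works via the cache:
print(resolve("a7Bx3q"))  # https://example.com/long-url
```

The key property: resolve succeeds even though replica_db is still empty, which is exactly the replication-lag window the write-through closes.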

The stateless design of both API servers and redirect servers means horizontal scaling is straightforward: add more instances behind the load balancer. The ID generator is either embedded (each API server has its own Snowflake machine ID) or a lightweight separate service. The only stateful components are the database and Redis, both of which have well-understood scaling patterns (sharding for the database, clustering for Redis).

Why Separate Write and Read Servers

While you could use the same server fleet for both writes and reads, separating them allows independent scaling and optimization. Redirect servers are CPU-light and I/O-bound (cache lookup + 301 response) — you can run thousands per instance. API servers do more work (validation, ID generation, database writes) and need more CPU. Separating also means a spike in URL creation never impacts redirect latency.

The Redirect Flow — 301 vs 302

The redirect HTTP status code is the single most consequential design decision in a URL shortener, yet many candidates gloss over it. The choice between 301 (Moved Permanently) and 302 (Found, in practice a temporary redirect) has profound implications for caching behavior, link-equity transfer for SEO, analytics accuracy, and infrastructure cost. Understanding this trade-off separates senior engineers from juniors in interviews.

| Property | 301 Moved Permanently | 302 Found (temporary) |
| --- | --- | --- |
| Browser caching | Browser caches the redirect indefinitely — subsequent visits bypass our servers entirely | Browser does NOT cache — every visit hits our servers |
| CDN caching | CDN caches the redirect at edge — global acceleration | CDN does not cache by default — origin receives every request |
| SEO impact | Passes ~90-99% of link equity (PageRank) to destination URL | Passes little to no link equity — search engines may index the short URL instead |
| Analytics accuracy | Only the first click is tracked — subsequent clicks never reach our servers | Every click is tracked — perfect analytics accuracy |
| Server load | Dramatically lower — popular URLs are served entirely from browser/CDN cache | Higher — every click generates a server request |
| Link mutability | Cannot change destination after browser caches it (until cache expires) | Can change destination at any time — users always get the latest target |
| Infrastructure cost | Lower — fewer requests reach origin | Higher — all requests reach origin |
sequenceDiagram
    participant B as Browser
    participant C as CDN Edge
    participant S as Redirect Server
    participant R as Redis Cache
    participant D as Database

    Note over B,D: First Click (301 or 302 - same flow)
    B->>C: GET /a7Bx3q
    C->>S: Cache MISS - forward to origin
    S->>R: GET short:a7Bx3q
    R-->>S: https://example.com/long-url
    S-->>C: 301 Location: https://example.com/long-url
    C-->>B: 301 Location: https://example.com/long-url
    Note over B: Browser follows redirect

    Note over B,D: Subsequent Clicks with 301
    B->>B: Cached! Redirect directly
    Note over B: Never contacts server again

    Note over B,D: Subsequent Clicks with 302
    B->>C: GET /a7Bx3q (every time)
    C->>S: Forward to origin (every time)
    S->>R: GET short:a7Bx3q
    R-->>S: https://example.com/long-url
    S-->>C: 302 Location: https://example.com/long-url
    C-->>B: 302 Location: https://example.com/long-url

With 301, only the first request reaches your servers — all subsequent requests are served from browser cache. With 302, every request hits your infrastructure.

The Practical Answer for Interviews

Most production URL shorteners use 302 (temporary redirect) as the default because analytics accuracy is their primary value proposition: services like bit.ly, TinyURL, and t.co are built around counting every click. However, offer the interviewer a nuanced take: you could use 301 for links where the creator does not need analytics (cutting server load for those links by 50-80% or more), and 302 for links where click tracking is enabled. This shows you understand the trade-off rather than just memorizing one answer.

The 301 Trap: You Cannot Undo It

Once a browser caches a 301 redirect, there is no way to update it from the server side. If a user creates a short URL with a typo in the destination and you used 301, some browsers will redirect to the wrong URL forever (or until the user manually clears their browser cache). This is why 302 is the safer default. If you do use 301, add a Cache-Control: max-age=86400 header to limit browser caching to 24 hours rather than forever.

SEO considerations

  • 301 passes link equity — Search engines treat 301 as a permanent signal that the destination URL should inherit the short URL's backlinks and ranking power. This is why marketers prefer 301 for SEO campaigns.
  • 302 may dilute rankings — Search engines may index the short URL itself instead of the destination, splitting link equity between two URLs. Google has gotten better at handling this, but it remains a risk.
  • rel=canonical on destination — The destination page should include a rel=canonical tag pointing to itself. This signals to search engines that the destination (not the short URL) is the authoritative version.
  • Meta refresh fallback — Some URL shorteners include a tiny HTML page with a meta refresh tag as a fallback for clients that do not follow HTTP redirects. This is rare but improves compatibility.

In practice, the redirect server does very little work: it receives the request, extracts the short code from the URL path, checks Redis for the mapping, returns a 301 or 302 with the Location header, and optionally fires an async analytics event. The entire operation takes 1-5ms when the cache is hit. This simplicity is why a single redirect server can handle 50,000+ requests per second — the response is just a tiny HTTP header with no body.
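The hot path described above can be sketched in a few lines. This is a minimal illustration, not a production server: a plain dict stands in for Redis, a list stands in for the analytics queue, and all names (`redirect`, `analytics_queue`) are hypothetical.

```python
cache = {}  # stands in for Redis: short_code -> long_url

def redirect(short_code, analytics_queue):
    """Resolve a short code and build the redirect response (status, headers)."""
    long_url = cache.get(short_code)
    if long_url is None:
        return 404, {}
    # Fire-and-forget: enqueue the click event; never block the response.
    analytics_queue.append({"code": short_code})
    # 302 keeps every click on our servers, so analytics stay accurate.
    return 302, {"Location": long_url}

cache["a7Bx3q"] = "https://example.com/long-url"
events = []
status, headers = redirect("a7Bx3q", events)
```

The response body is empty; the entire payload is the status line plus the Location header, which is why a single server can push tens of thousands of these per second.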

Decision Framework

Use 302 when: analytics matter, destination might change, you want control over every request. Use 301 when: you want to minimize server load, SEO juice transfer is critical, and the destination is permanent. In an interview, always default to 302 and explain why, then mention 301 as an optimization for specific use cases.

Caching & Read Optimization

Caching is the most impactful optimization in a URL shortener. The read path handles 100x more traffic than the write path, and URL mappings are immutable once created (the short code always maps to the same long URL). This makes URL shorteners an ideal caching workload: high read volume, small record sizes, and virtually no cache invalidation needed. A well-tuned Redis cache can absorb 95%+ of all redirect traffic, reducing database load by 20x.

The 80/20 Rule for URL Access

URL access follows a power-law distribution: approximately 20% of short URLs account for 80%+ of all redirect traffic. A viral tweet link might receive 10 million clicks, while most URLs are clicked fewer than 10 times. This extreme skew means even a relatively small cache (holding only the top 20% of URLs) can achieve an 80%+ cache hit rate. In practice, production systems report 95-99% cache hit rates.

Cache-aside (lazy loading) pattern for redirects

1. Redirect server receives GET /{short_code} request.
2. Check Redis: GET cache:short:{short_code}. If cache HIT, return the cached long URL as a 301/302 redirect immediately (1-2ms latency).
3. If cache MISS, query the database read replica: SELECT long_url FROM urls WHERE short_code = ? AND is_active = true AND (expires_at IS NULL OR expires_at > NOW()).
4. If database returns a result, write it to Redis with a TTL: SET cache:short:{short_code} {long_url} EX 86400 (24-hour TTL).
5. Return the long URL as a 301/302 redirect.
6. If neither cache nor database has the short code, return 404 Not Found.

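The cache-aside steps above (plus the negative caching of 404s discussed later in this section) can be sketched as follows. The `FakeRedis` class and the `DB` dict are in-memory stand-ins so the example is self-contained; a real implementation would use a Redis client and a read replica.

```python
import time

class FakeRedis:
    """Minimal stand-in for a Redis client (get/setex with TTL expiry)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        return None
    def setex(self, key, ttl, value):
        self.store[key] = (value, time.time() + ttl)

DB = {"a7Bx3q": "https://example.com/long-url"}  # stands in for the read replica
r = FakeRedis()

def resolve(short_code):
    """Cache-aside: Redis first, database on miss, negative-cache misses."""
    key = f"cache:short:{short_code}"
    hit = r.get(key)
    if hit == "NOT_FOUND":
        return None                      # negatively cached 404
    if hit is not None:
        return hit                       # cache hit: the 1-2ms path
    long_url = DB.get(short_code)        # cache miss: read replica lookup
    if long_url is None:
        r.setex(key, 300, "NOT_FOUND")   # 5-min negative cache vs. enumeration
        return None
    r.setex(key, 86400, long_url)        # 24-hour TTL, per step 4 above
    return long_url
```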

Write-Through Caching for New URLs

In addition to cache-aside on the read path, implement write-through caching on the write path. When a new URL is created, the API server writes the mapping to both the database AND Redis simultaneously. This ensures the URL is immediately available for redirects without waiting for the first cache miss. This eliminates the replication lag problem (new URL is in Redis before the read replica gets the write) and warms the cache proactively for URLs that are likely to be clicked immediately after creation.
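A write-through creation path might look like the sketch below, with dicts standing in for the database and Redis. The function name `create_url` is illustrative; real code would also need a transaction on the database write and error handling if the cache write fails.

```python
cache = {}   # stands in for Redis
db = {}      # stands in for the primary database

def create_url(short_code, long_url):
    """Write-through: persist the mapping and warm the cache in one operation."""
    db[short_code] = long_url                       # durable write (primary DB)
    cache[f"cache:short:{short_code}"] = long_url   # immediately redirectable,
    return short_code                               # no replication-lag window
```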

| Metric | Value | Calculation |
|---|---|---|
| Total unique URLs (5 years) | ~182 billion | 100M/day x 365 x 5 |
| Hot URLs (20% of daily) | 20 million | 100M x 20% |
| Average record size in cache | ~250 bytes | Key (20B) + long URL (200B) + overhead (30B) |
| Hot set memory | ~5 GB | 20M x 250 bytes |
| Full day cache (100M URLs) | ~25 GB | 100M x 250 bytes |
| Redis cluster recommendation | 3 nodes x 32 GB | Fits full day + headroom for peak |
| Target cache hit rate | 95-99% | Power-law distribution of URL access |
| Cache hit latency | < 1ms | Redis in-memory lookup with persistent connections |
| Cache miss latency (DB fallback) | 5-20ms | Network round-trip + index lookup on read replica |

Cache Stampede on Popular URLs

When a hot URL's cache entry expires, hundreds of concurrent redirect requests simultaneously find a cache miss and all query the database for the same short code. This thundering herd can overload the database. Prevention strategies: (1) Add jitter to TTLs so entries do not expire at the same time (TTL = 86400 + random(0, 3600)). (2) Use Redis SETNX-based locking: only one request queries the database while others wait or serve stale data. (3) Probabilistic early recomputation: each request has a small chance of refreshing the cache before TTL expires.
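Two of the stampede defenses above, TTL jitter and SETNX-style locking, can be sketched like this. A Python set stands in for Redis lock keys, and `refresh_with_lock` is an illustrative name, not a library API.

```python
import random

def ttl_with_jitter(base=86400, spread=3600):
    """Spread expirations so hot keys do not all expire in the same second."""
    return base + random.randint(0, spread)

locks = set()  # stands in for Redis SETNX lock keys

def refresh_with_lock(short_code, load_from_db, stale_value=None):
    """Only one caller recomputes the entry; everyone else serves stale data."""
    lock_key = f"lock:{short_code}"
    if lock_key in locks:            # SETNX failed: another request holds the lock
        return stale_value
    locks.add(lock_key)              # SETNX succeeded: we do the DB work
    try:
        return load_from_db(short_code)
    finally:
        locks.discard(lock_key)      # DEL the lock when done
```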

Advanced caching techniques

  • Cache warming on deploy — When deploying new redirect servers with cold local caches, pre-populate Redis with the top 1,000 most-clicked URLs from analytics data. This prevents a storm of cache misses during the first minutes after deployment.
  • Negative caching — Cache 404 results too: SET cache:short:{invalid_code} "NOT_FOUND" EX 300. Without this, attackers can DDoS your database by requesting millions of non-existent short codes, all of which are cache misses.
  • Multi-tier caching — Use a local in-process cache (e.g., Caffeine in Java, lru-cache in Node.js) as L1 in front of Redis as L2. The L1 cache holds the top 10,000 URLs per server with a 60-second TTL. This eliminates the Redis network hop for the hottest URLs.
  • Redis Cluster sharding — Shard Redis by short_code hash across a cluster of nodes. This distributes memory and throughput. With 6 nodes (3 primaries + 3 replicas), you get ~3x the throughput and memory of a single node, plus automatic failover.
  • Consistent hashing for cache routing — Use consistent hashing to route cache requests so that a given short_code always goes to the same Redis node. This maximizes cache hit rates by avoiding duplicate entries across nodes.

Cache Invalidation Is (Almost) Free Here

URL shortener mappings are effectively immutable: once short code "a7Bx3q" maps to a long URL, that mapping never changes (the URL might be deactivated but not remapped). This means cache invalidation — usually the hardest problem in caching — is trivial. You only need to invalidate on two events: (1) URL deactivation/deletion, and (2) URL expiration. Both are rare. This is why URL shorteners achieve cache hit rates that other systems can only dream of.

Analytics & Rate Limiting

Analytics and rate limiting are the two supporting pillars that transform a simple redirect service into a production-grade platform. Analytics provide the business value that justifies the service (click data drives marketing decisions), while rate limiting protects the system from abuse. Both must be implemented without adding latency to the critical redirect path.

Async Analytics — Never Block the Redirect

The cardinal rule of URL shortener analytics: never let data collection add latency to the redirect response. The redirect server fires and forgets an analytics event (to Kafka, SQS, or a local buffer) and immediately returns the 301/302 response. Analytics processing happens asynchronously in a separate pipeline. If the analytics system goes down, redirects continue working. If redirects go down, the entire service is broken. Prioritize accordingly.

graph LR
    subgraph "Redirect Hot Path (< 5ms)"
        RS["Redirect Server"] -->|"1. Lookup"| Redis["Redis Cache"]
        RS -->|"2. Return 302"| Client["Client"]
    end

    subgraph "Analytics Cold Path (async)"
        RS -->|"3. Fire & forget"| Kafka["Kafka Topic<br/>click-events"]
        Kafka --> Consumer["Analytics Consumer"]
        Consumer --> ClickDB["ClickHouse / TimescaleDB<br/>Click Analytics Store"]
        Consumer --> GeoIP["GeoIP Enrichment"]
        Consumer --> Agg["Real-time Aggregator<br/>Flink / Spark Streaming"]
        Agg --> Dashboard["Analytics Dashboard<br/>Redis counters + time-series"]
    end

    style RS fill:#3b82f6,stroke:#333,color:#fff
    style Redis fill:#ef4444,stroke:#333,color:#fff
    style Kafka fill:#f59e0b,stroke:#333,color:#fff
    style ClickDB fill:#10b981,stroke:#333,color:#fff

The redirect hot path returns in under 5ms. Analytics events flow asynchronously through Kafka to a separate analytics store.
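The fire-and-forget emission on the hot path can be sketched with a bounded local buffer standing in for a Kafka producer. The key property is the failure mode: if the buffer is full, the oldest event is dropped; analytics loss is acceptable, redirect latency is not. All names here are illustrative.

```python
from collections import deque

# Bounded buffer: appends are O(1) and silently evict the oldest event when full.
BUFFER = deque(maxlen=10000)

def emit_click(short_code, ip_hash, user_agent, referer):
    """Fire-and-forget: constant-time append, never raises, never blocks."""
    BUFFER.append({
        "code": short_code,
        "ip": ip_hash,        # already hashed for GDPR, per the field list below
        "ua": user_agent,
        "ref": referer,
    })
```

A background thread (or the Kafka client's own I/O thread) would drain this buffer off the request path.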

Data captured per click event

  • short_code — Which URL was clicked.
  • timestamp — Millisecond-precision click time for time-series analysis.
  • IP address (hashed) — Used for GeoIP lookup (country, city, ISP). Hash the IP for privacy compliance (GDPR).
  • User-Agent — Parsed into device type (mobile/desktop/tablet), browser, and OS.
  • Referer header — Where the click came from (Twitter, email, direct, etc.). Critical for marketing attribution.
  • Accept-Language — User language preference for geographic segmentation.

Analytics Storage: ClickHouse or TimescaleDB

Click events are append-only time-series data with high cardinality (millions of distinct short_codes). ClickHouse is the optimal storage engine: it handles 1M+ inserts/second per node, compresses columnar data 10-20x, and executes analytical queries (GROUP BY short_code, date) in milliseconds. TimescaleDB (PostgreSQL extension) is a good alternative if your team already operates PostgreSQL. Avoid storing raw click events in your primary URL database — the write volume would overwhelm it.

Rate limiting protects the system from three types of abuse: creation spam (bots creating millions of URLs to host phishing or malware links), redirect abuse (DDoS attacks against the redirect infrastructure), and analytics pollution (fake clicks to inflate or manipulate analytics data). Each requires a different rate limiting strategy.

| Abuse Type | Rate Limit Strategy | Implementation |
|---|---|---|
| URL creation spam | Per-API-key: 100 URLs/hour for free tier, 10,000/hour for paid | Redis sliding window counter: INCR rate:{api_key}:{hour} with EXPIRE |
| Redirect DDoS | Per-IP: 1,000 redirects/minute | Redis token bucket per IP. Block at load balancer (WAF) for known bot IPs |
| Automated enumeration | Per-IP: 100 unique short codes/minute | Track distinct short_codes per IP in a Redis HyperLogLog. Alert on sequential access patterns |
| Analytics manipulation | Per-IP per short_code: 10 clicks/hour deduplicated | Bloom filter per short_code per hour. Duplicate clicks increment a "filtered" counter but not the real one |
| Malware URL creation | Async URL scanning on creation | Google Safe Browsing API check. Quarantine flagged URLs for manual review |
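The creation-spam limiter from the table can be sketched as follows. Note that the INCR rate:{api_key}:{hour} pattern is strictly a fixed window (a true sliding window needs sorted sets or two adjacent windows); a dict stands in for Redis, and the EXPIRE that would reap old keys is omitted.

```python
import time

counters = {}  # stands in for Redis: key -> count (EXPIRE omitted in this stub)

def allow_creation(api_key, limit=100, now=None):
    """Fixed-window counter: INCR rate:{api_key}:{hour}, reject past the limit."""
    hour = int((now if now is not None else time.time()) // 3600)
    key = f"rate:{api_key}:{hour}"
    counters[key] = counters.get(key, 0) + 1   # INCR
    return counters[key] <= limit              # over-limit callers get a 429
```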

Rate Limiting Must Not Block Legitimate Traffic

An overly aggressive rate limiter can block legitimate viral traffic. If a URL goes viral on social media, thousands of unique users may click it from the same corporate network (sharing one IP). Use a combination of signals — IP + User-Agent + cookie — rather than IP alone. When rate limits trigger, return 429 Too Many Requests with a Retry-After header rather than silently dropping requests or returning errors that look like the destination is broken.

Real-Time Click Counter Pattern

For the click count displayed on the analytics dashboard, maintain a Redis counter per short_code (INCR clicks:{short_code}). This counter is incremented by the analytics consumer, not the redirect server. Periodically (every 5 minutes), a background job writes the Redis counter value back to the primary database (UPDATE urls SET click_count = ? WHERE short_code = ?). This gives near-real-time counts without adding write load to the primary database on every redirect.
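The counter-plus-periodic-flush pattern can be sketched like this, with dicts standing in for the Redis counters and the urls.click_count column; the function names are illustrative.

```python
live = {}            # stands in for Redis: clicks:{short_code} -> count
db_click_count = {}  # stands in for the urls.click_count column

def record_click(short_code):
    """Called by the analytics consumer, never by the redirect server."""
    key = f"clicks:{short_code}"
    live[key] = live.get(key, 0) + 1           # INCR clicks:{short_code}

def flush_counters():
    """Background job (every ~5 minutes): copy Redis counters into the DB."""
    for key, count in live.items():
        short_code = key.split(":", 1)[1]
        db_click_count[short_code] = count     # UPDATE urls SET click_count = ?
```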

The analytics pipeline should also detect and flag suspicious patterns: sudden spikes in creation volume from a single API key (potential spam campaign), URLs whose destinations appear on phishing blocklists, and click patterns that resemble bot traffic (identical User-Agent strings, evenly-spaced timestamps, sequential IP addresses). These signals feed into an abuse detection system that can automatically quarantine suspicious URLs.

Scaling to Billions of URLs

At the scale of billions of URLs and tens of billions of monthly redirects, every component of the system faces unique scaling challenges. The strategies that work at millions of URLs (single database, single Redis instance, single data center) break down and must be replaced with distributed alternatives. This section covers the four critical scaling dimensions: database sharding, global distribution, TTL and garbage collection, and operational considerations for extreme scale.

Database sharding strategies

  • Hash-based sharding on short_code — Compute shard = hash(short_code) % N. Every redirect request goes to exactly one shard — no scatter-gather. This is the default recommendation for URL shorteners because the primary access pattern is point lookups by short_code.
  • Range-based sharding by creation time — New URLs go to the newest shard; older shards become read-only. Pro: old shards can be moved to cheaper storage. Con: the newest shard is always the write hotspot, and redirect requests must check the time-based shard map to find the right shard.
  • Consistent hashing for elastic scaling — Use a consistent hash ring to distribute short_codes across shards. Adding a shard moves only 1/N of the data. This is the best option for systems that need to scale incrementally without planned downtime.
  • Virtual sharding with DynamoDB — Use DynamoDB with short_code as the partition key. DynamoDB automatically distributes data across partitions and rebalances as the table grows. At 90 TB, DynamoDB may have 10,000+ physical partitions, all managed transparently.
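The hash-based option (the default recommendation above) can be sketched in a few lines. A stable hash such as SHA-1 is used rather than Python's built-in `hash()`, which is randomized per process, so that every server computes the same placement; the shard count is an illustrative assumption.

```python
import hashlib

SHARDS = 16  # illustrative shard count

def shard_for(short_code):
    """Hash-based shard routing: every redirect is a single point lookup,
    no scatter-gather across shards."""
    digest = hashlib.sha1(short_code.encode()).digest()
    return int.from_bytes(digest[:4], "big") % SHARDS
```

Note that with plain modulo, changing SHARDS remaps most keys; that is exactly the problem the consistent-hashing option in the list solves.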

Global Distribution for Low Latency

A user in Singapore redirecting through a data center in Virginia experiences ~200ms of latency just from the network round trip. For a service where latency is the product (users perceive redirect latency as the destination site being slow), global distribution is essential. Deploy redirect servers and Redis caches in multiple regions (US-East, EU-West, AP-Southeast). Use GeoDNS (Route 53 latency-based routing or Cloudflare load balancing) to route users to the nearest region. The database can remain in a single region with cross-region read replicas, since the cache handles 95%+ of reads.

graph TB
    subgraph "Global Distribution"
        DNS["GeoDNS<br/>Route to nearest region"] --> US["US-East Region"]
        DNS --> EU["EU-West Region"]
        DNS --> AP["AP-Southeast Region"]

        subgraph US["US-East (Primary)"]
            US_LB["LB"] --> US_RS["Redirect Servers"]
            US_RS --> US_Redis["Redis Cluster"]
            US_Redis --> US_DB["Primary DB<br/>(writes + reads)"]
        end

        subgraph EU["EU-West"]
            EU_LB["LB"] --> EU_RS["Redirect Servers"]
            EU_RS --> EU_Redis["Redis Cluster"]
            EU_Redis --> EU_DB["Read Replica"]
        end

        subgraph AP["AP-Southeast"]
            AP_LB["LB"] --> AP_RS["Redirect Servers"]
            AP_RS --> AP_Redis["Redis Cluster"]
            AP_Redis --> AP_DB["Read Replica"]
        end

        US_DB -.->|"Async replication"| EU_DB
        US_DB -.->|"Async replication"| AP_DB
    end

    style DNS fill:#f59e0b,stroke:#333,color:#fff
    style US_Redis fill:#ef4444,stroke:#333,color:#fff
    style EU_Redis fill:#ef4444,stroke:#333,color:#fff
    style AP_Redis fill:#ef4444,stroke:#333,color:#fff
    style US_DB fill:#3b82f6,stroke:#333,color:#fff

Global distribution with GeoDNS routing, regional Redis caches, and cross-region database replication. Writes go to US-East primary; reads are served locally in each region.

Cross-Region Write Conflict

If you accept URL creation in multiple regions, two users in different regions could request the same custom alias simultaneously. Without cross-region coordination, both writes succeed locally but conflict during replication. Solutions: (1) Route all writes to a single primary region (simplest; adds ~100ms latency for remote creators, which is acceptable for writes). (2) Resolve conflicts deterministically with a timestamped register in the style of a CRDT, where the earliest creation timestamp wins. (3) Reserve alias namespaces per region. Option 1 is the standard recommendation because writes are 100x less frequent than reads.

TTL and garbage collection become critical at billion-URL scale. Without expiration, the database grows by 18 TB per year and eventually becomes unmanageable. Even with sharding, each shard accumulates data that may never be accessed again. A garbage collection strategy is essential for long-term operational health.

TTL-based garbage collection pipeline

1. Set a default TTL of 2 years for all URLs unless the creator specifies otherwise. This covers the vast majority of use cases (marketing campaigns, social media posts) while preventing unbounded growth.
2. Run a nightly garbage collection job that scans for expired URLs: SELECT short_code FROM urls WHERE expires_at < NOW() AND is_active = true LIMIT 10000. Process in batches to avoid locking.
3. For each expired URL batch: mark as inactive (UPDATE urls SET is_active = false WHERE short_code IN (...)), delete from Redis cache (DEL cache:short:{code}), and log the deactivation for audit.
4. Monthly archival job: move inactive URLs older than 6 months to cold storage (S3 + Athena for occasional queries). Delete from the primary database to reclaim space.
5. Reclaim short codes: after archival, the Base62 code becomes available for reuse. Maintain a reclaimed codes pool in Redis that the ID generator can draw from.
6. Monitor: track the ratio of active to expired URLs per shard. If a shard has > 80% expired data, trigger a compaction or rebalance to consolidate active data onto fewer shards.

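The nightly scan-and-deactivate steps of the pipeline can be sketched against an in-memory SQLite database standing in for the primary store; the schema is simplified and the cache deletion and audit logging are reduced to a comment.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE urls (
    short_code TEXT PRIMARY KEY,
    long_url   TEXT,
    is_active  INTEGER DEFAULT 1,
    expires_at REAL)""")
now = time.time()
conn.executemany(
    "INSERT INTO urls VALUES (?, ?, 1, ?)",
    [("old1", "https://a.example", now - 60),    # already expired
     ("old2", "https://b.example", now - 30),    # already expired
     ("live", "https://c.example", now + 3600)]) # still active

def gc_pass(batch_size=10000):
    """One GC batch: find expired-but-active rows, mark them inactive."""
    rows = conn.execute(
        "SELECT short_code FROM urls "
        "WHERE expires_at < ? AND is_active = 1 LIMIT ?",
        (time.time(), batch_size)).fetchall()
    codes = [row[0] for row in rows]
    if codes:
        placeholders = ",".join("?" * len(codes))
        conn.execute(
            f"UPDATE urls SET is_active = 0 WHERE short_code IN ({placeholders})",
            codes)
        # In production: also DEL cache:short:{code} for each, and append
        # an audit log entry before moving to the archival stage.
    return codes
```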

Operational Monitoring at Billion-URL Scale

Key metrics to monitor: (1) Cache hit rate by region — drop below 90% triggers investigation. (2) p99 redirect latency by region — spike above 50ms triggers alert. (3) Database replication lag — above 5 seconds means reads might return stale data. (4) Shard distribution skew — if one shard has 2x the data of others, trigger a rebalance. (5) Error rate by status code — spike in 404s may indicate a mass expiration event or an attack. (6) Rate limiter trigger rate — sudden increase may indicate a DDoS or a legitimate viral event (check the specific short codes).

Cost Optimization at Scale

At 90 TB of data and 600K reads/second, infrastructure costs become significant. Key optimizations: (1) Use reserved instances for the database and Redis (save 40-60% over on-demand). (2) Move cold data to S3 Standard-IA (save 45% on storage). (3) Use CDN for 301 redirects to reduce origin traffic by 50-80%. (4) Compress long URLs in the database (LZ4 compression saves 30-40% on TEXT columns). (5) Use spot instances for analytics processing (Kafka consumers, ClickHouse batch jobs). These optimizations can reduce total infrastructure cost by 40-50% at scale.

The final scaling consideration is operational: how do you perform database migrations, schema changes, and infrastructure upgrades without downtime on a system that handles billions of redirects? The answer is blue-green deployments for application servers, online schema changes (pt-online-schema-change for MySQL, pg_repack for PostgreSQL) for the database, and feature flags for gradual rollout of new redirect logic. Never perform a stop-the-world migration on a URL shortener — every minute of downtime breaks millions of links across the internet.

How this might come up in interviews

URL shortener design appears in almost every FAANG system design interview loop, either as the primary question or as a warm-up. It is the most common system design question at Google, Meta, and Amazon for L4-L5 candidates. Interviewers use it to calibrate breadth (can you identify all the components?) and depth (can you discuss hash collision math, cache sizing, 301 vs 302 trade-offs?). At L6+, expect follow-up questions about global distribution, analytics pipelines, and abuse prevention that push into staff-level territory.

Common questions:

  • L4: Design a basic URL shortener. Walk me through the API, database schema, and how you generate short codes. [Tests: basic CRUD design, understanding of hashing or encoding, simple schema design]
  • L4-L5: How would you handle the case where two users try to create the same custom alias simultaneously? What happens at the database level? [Tests: race conditions, UNIQUE constraints, optimistic vs pessimistic locking]
  • L5: Your URL shortener needs to handle 500K redirects per second. Walk me through your caching strategy and size the Redis cluster. [Tests: cache-aside pattern, hit rate estimation, memory sizing, hot key handling]
  • L5-L6: Compare 301 and 302 redirects for a URL shortener. When would you use each, and how does it affect your analytics pipeline and infrastructure costs? [Tests: HTTP semantics depth, trade-off analysis, system-level thinking]
  • L6: A single short URL goes viral and receives 1M clicks per second. How does your system handle this without degrading other URLs? [Tests: hot key mitigation, cache replication, request coalescing, CDN strategy]
  • L6-L7: Design the analytics pipeline for a URL shortener processing 10B clicks per day. How do you store, aggregate, and serve real-time click dashboards? [Tests: stream processing (Kafka/Flink), columnar storage (ClickHouse), time-series aggregation, approximate counting]

Key takeaways

  • A URL shortener is a distributed key-value store with a 100:1 read-to-write ratio. Every design decision — code generation, database choice, caching strategy — is driven by this extreme read skew. Optimize the read path first.
  • Use Snowflake-style counter-based ID generation (not hash truncation) for short codes at scale. It eliminates collisions by construction, supports distributed writes without coordination, and avoids the cascade failure mode that hash collision retries create under burst load.
  • The 301 vs 302 redirect decision is the most consequential trade-off: 302 enables analytics (every click tracked) but costs more infrastructure; 301 reduces server load by 50-80% but makes click tracking impossible after the first visit. Default to 302 and explain why.
  • Redis caching is the critical scaling lever. The power-law distribution of URL access (20% of URLs receive 80%+ of clicks) means a modestly-sized cache (25-50 GB) can absorb 95%+ of all redirect traffic, reducing database load by 20x.
  • Separate the write path (URL creation) from the read path (redirects) in both architecture and infrastructure. They have different scaling characteristics, different latency requirements, and must be independently scalable so that a write-side problem never degrades redirect performance.
Before you move on: can you answer these?

Why is 302 the default redirect status for most URL shorteners instead of 301, and what would change if you used 301?

Most URL shorteners use 302 (temporary redirect) because it ensures every click reaches the server, enabling accurate analytics tracking. With 301 (permanent redirect), browsers and CDNs cache the redirect and subsequent visits bypass the server entirely, making click counting impossible. However, 301 reduces server load by 50-80% for popular URLs and passes more SEO link equity to the destination URL. The best approach is configurable: 302 by default for analytics-enabled links, 301 optional for links where traffic reduction is more important than click tracking.

How does the birthday paradox affect MD5 hash truncation for short code generation, and at what scale does it become a problem?

The birthday paradox states that when n values are drawn at random from a space of N possibilities, collisions become probable once n approaches sqrt(N). For 7-character Base62 codes (62^7 = 3.52 trillion possibilities), collisions become likely after approximately sqrt(3.52 trillion) = 1.88 million URLs. At 100M new URLs per day, that threshold is crossed within the first hour of operation. Each collision requires a retry (rehash with a salt or append a counter), which adds a database round-trip. Under burst load, cascading retries can overwhelm the database; this is exactly the failure mode that counter-based ID generation avoids by construction.
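The numbers in this answer are easy to verify with the standard birthday-bound approximation, p ≈ 1 - exp(-n² / 2N):

```python
import math

N = 62 ** 7  # 7-char Base62 code space, about 3.52 trillion

def collision_probability(n, space=N):
    """Birthday-paradox approximation: P(at least one collision among n draws)."""
    return 1 - math.exp(-n * n / (2 * space))

# The sqrt(N) threshold where collisions become likely (p close to 0.39):
sqrt_N = math.sqrt(N)  # roughly 1.88 million URLs
```

At n = sqrt(N) the approximation gives p ≈ 1 - e^(-1/2) ≈ 0.39, which is why sqrt(N) is quoted as the point where collisions become probable.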

You need to support 600,000 redirect requests per second at peak. Walk through the caching math that makes this feasible.

At 600K redirects/second, the database alone cannot handle the load (PostgreSQL maxes out around 50K reads/second per node, even with read replicas). The cache must absorb 95%+ of reads. With the 80/20 rule, caching only the hottest slice of the roughly 20M URLs clicked each day (about 4M entries x 250 bytes = ~1 GB) already achieves an 80%+ hit rate. Caching a full day of URLs (25 GB in Redis) pushes the hit rate to 95%+. At a 95% hit rate, only 30K requests/second reach the database, well within capacity for 2-3 read replicas. Redis handles 100K+ operations/second per node, so a 6-node cluster comfortably serves 600K reads/second. Total Redis memory: 3 primary nodes x 32 GB = 96 GB, costing approximately $500/month on AWS.
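These figures are worth sanity-checking with quick arithmetic (the per-node throughput numbers are the answer's assumptions, not measurements):

```python
# Back-of-envelope check of the caching math above.
peak_rps = 600_000
hit_rate = 0.95

db_rps = peak_rps * (1 - hit_rate)            # leakage to the database: ~30K/s
replicas_needed = db_rps / 50_000             # at ~50K reads/s per PG node

hot_entries = 4_000_000                       # hottest slice of daily URLs
bytes_per_entry = 250
hot_set_gb = hot_entries * bytes_per_entry / 1e9   # ~1 GB for the hot set
```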

🧠Mental Model

💡 Analogy

A URL shortener works like the post office PO Box system. Every PO Box has a short, easy-to-remember number (like "Box 4217") that maps to a real physical address where mail gets forwarded. When someone sends a letter to PO Box 4217, the post office looks up the real address and routes the letter there. The PO Box number is much shorter than "1742 Oakwood Drive, Apartment 3B, Building C" — just like a short URL is easier to share than a 200-character URL with query parameters. The post office maintains a registry (our database), has a fast lookup desk for frequent deliveries (our Redis cache), and can assign new box numbers to new customers without conflicting with existing ones (our ID generator). When a PO Box expires because the customer stopped paying, the number can be recycled (our TTL and garbage collection).

⚡ Core Idea

A URL shortener is fundamentally a distributed key-value store with an extremely read-heavy workload (100:1 read-to-write ratio). The short code is the key, the long URL is the value, and the HTTP redirect is the lookup operation. Every design decision — from the code generation algorithm to the caching strategy to the database choice — is driven by this simple access pattern and extreme read skew. The system succeeds when 95%+ of reads are served from cache in under 1ms, and fails when hot URLs overwhelm a single database shard.

🎯 Why It Matters

URL shorteners appear in system design interviews more than almost any other topic because they compress a remarkable number of distributed systems concepts into a single, easily understood service. In 45 minutes, a candidate must demonstrate competency in API design, hashing algorithms, database schema design, caching strategies, CDN integration, analytics pipelines, rate limiting, and global distribution. The simplicity of the product (redirect a URL) contrasts with the complexity of doing it at scale (billions of redirects per day across multiple continents), which makes it the perfect vehicle for testing engineering depth.

Ready to see how this works in the cloud?

Switch to Career Paths for structured paths (e.g. Developer, DevOps) and provider-specific lessons.

View role-based paths

Sign in to track your progress and mark lessons complete.

Discussion

Questions? Discuss in the community or start a thread below.

Join Discord

In-app Q&A

Sign in to start or join a thread.