Interactive Explainer

Designing a distributed ID generator

Strategies for generating globally unique, often sortable identifiers across a distributed system — covering Snowflake IDs, UUIDs, ULIDs, MongoDB ObjectIDs, and database ticket servers with deep analysis of bit layouts, clock skew, B-tree fragmentation, and epoch exhaustion.

🎯Key Takeaways
Snowflake-style IDs (1+41+10+12 bits) are the industry standard for high-throughput systems: they produce compact 64-bit integers that are time-sorted, locally generated, and collision-free across 1,024 nodes at 4,096 IDs/ms/node.
UUID v7 (RFC 9562) should replace UUID v4 in all new systems — it provides the same 128-bit format and zero-coordination generation but with time-ordering that eliminates B-tree index fragmentation, improving insert throughput by 5-10x at scale.
Clock skew is the fundamental enemy of time-based IDs. Configure NTP/chrony for slew-only mode, persist the timestamp high-water mark to disk, and monitor cross-node clock offsets. Never silently generate IDs with a backward timestamp.
Machine ID assignment requires deliberate coordination — whether through ZooKeeper leases, database shard numbers (Instagram), or infrastructure metadata. Failing to ensure unique machine IDs makes Snowflake collision-prone during infrastructure events.
Storage impact compounds across every foreign key and index: choosing 64-bit Snowflake over 128-bit UUID v4 halves your index sizes for every table that references the ID, and eliminates fragmentation — a decision that saves terabytes at billion-row scale.



Why distributed ID generation is hard

Every record in a database needs a primary key. On a single machine, an auto-incrementing integer works perfectly. But the moment you have two or more machines writing concurrently, you face a fundamental coordination problem: how do two nodes independently produce IDs that never collide, ideally preserve rough time ordering, and fit efficiently into indexes?

This problem appears everywhere in system design interviews because it sits at the intersection of distributed systems fundamentals — clock synchronization, coordination overhead, partition tolerance — and very practical database performance concerns like index locality and storage size.

Where ID Generation Shows Up

Every service that writes data needs IDs: user accounts, messages, orders, events, logs, media uploads, payment transactions. At FAANG scale (billions of writes per day), ID generation becomes a critical infrastructure service. Twitter processes 400M+ tweets/day. Discord generates IDs for every message across millions of channels. Instagram needed unique IDs across dozens of database shards.

Core requirements for distributed IDs

  • Global uniqueness — No two nodes may ever produce the same ID, even during network partitions or clock drift.
  • Rough time ordering (k-sortability) — Newer IDs should generally be larger than older IDs. This enables efficient range queries and natural chronological ordering without a separate timestamp column.
  • High throughput — Individual nodes must generate thousands to millions of IDs per second without coordination.
  • Low latency — ID generation should be a local operation — no network round-trips to a central authority on the hot path.
  • Compact size — Smaller IDs mean smaller indexes, faster joins, and less memory consumption. A 64-bit ID uses half the space of a 128-bit UUID in every index entry.
  • No single point of failure — The system must continue generating IDs even if coordination services are temporarily unreachable.

The ID Trilemma

You can optimize for any two of: (1) compact size, (2) no coordination, (3) strict ordering. UUIDs choose no-coordination + uniqueness but sacrifice compactness and ordering. Auto-increment chooses compactness + ordering but requires coordination. Snowflake-style IDs approximate all three by tolerating "rough" ordering within a clock-skew window.

Snowflake IDs — the industry standard bit layout

Twitter's Snowflake (open-sourced in 2010) established the template that most large-scale systems now follow. The core insight is to pack a timestamp, a machine identifier, and a per-machine sequence counter into a single 64-bit integer. Because the timestamp occupies the most significant bits, IDs are naturally time-sorted when compared as integers.

graph LR
    subgraph "64-bit Snowflake ID"
        A["0
1 bit
Unused
(sign)"] --> B["Timestamp
41 bits
ms since epoch"]
        B --> C["Machine ID
10 bits
1024 nodes"]
        C --> D["Sequence
12 bits
4096/ms/node"]
    end

    style A fill:#e2e8f0,stroke:#333
    style B fill:#3b82f6,stroke:#333,color:#fff
    style C fill:#f59e0b,stroke:#333,color:#fff
    style D fill:#10b981,stroke:#333,color:#fff

Snowflake bit layout: 1 unused sign bit + 41 timestamp bits + 10 machine bits + 12 sequence bits = 64 bits total.

| Field | Bits | Range | Practical meaning |
| --- | --- | --- | --- |
| Sign bit | 1 | Always 0 | Keeps the ID positive in signed 64-bit types (e.g. a Java long) |
| Timestamp | 41 | 0 to 2^41 − 1 ms | ~69.7 years from a custom epoch. Twitter uses epoch 1288834974657 (Nov 4, 2010), which exhausts around 2080. |
| Machine ID | 10 | 0 to 1023 | 1,024 unique worker nodes. Can be split into 5-bit datacenter + 5-bit machine (32 DCs × 32 machines). |
| Sequence | 12 | 0 to 4095 | 4,096 IDs per millisecond per node. Across 1,024 nodes, that is 4,096 × 1,024 ≈ 4.19M IDs/ms system-wide. |

Epoch Math You Should Know

41 bits of milliseconds = 2^41 ms = 2,199,023,255,552 ms = ~69.7 years. If your custom epoch is January 1, 2020, IDs exhaust around September 2089. Always choose a recent custom epoch to maximize your runway. Twitter chose November 4, 2010 (their internal launch date), giving them until ~2080.
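The arithmetic above is easy to verify directly (the 2020 epoch is the illustrative one from the callout):

```python
from datetime import datetime, timedelta, timezone

# 41 bits of milliseconds
runway_ms = 2**41                      # 2,199,023,255,552 ms
runway_years = runway_ms / (1000 * 60 * 60 * 24 * 365.25)
print(round(runway_years, 1))          # 69.7 years

# Exhaustion date for a custom epoch of 2020-01-01
epoch = datetime(2020, 1, 1, tzinfo=timezone.utc)
exhaustion = epoch + timedelta(milliseconds=runway_ms)
print(f"{exhaustion:%Y-%m}")           # 2089-09
```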

How a Snowflake node generates an ID

1. Read the current timestamp in milliseconds from the system clock.
2. If the timestamp equals the last timestamp, increment the sequence counter. If the sequence overflows (hits 4096), spin-wait until the next millisecond.
3. If the timestamp is greater than the last timestamp, reset the sequence counter to 0.
4. If the timestamp is less than the last timestamp (the clock went backward), either reject the request, wait for the clock to catch up, or use the last known timestamp with an incremented sequence, depending on your tolerance policy.
5. Combine: (timestamp - custom_epoch) << 22 | machine_id << 12 | sequence.
6. Return the 64-bit integer.
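The steps above can be sketched as a minimal single-threaded generator (the custom epoch is illustrative, and the clock-backward policy here simply waits out small deltas and refuses otherwise):

```python
import time

CUSTOM_EPOCH_MS = 1577836800000  # 2020-01-01 UTC (illustrative)

class SnowflakeGenerator:
    """1 sign + 41 timestamp + 10 machine + 12 sequence bits."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024, "machine_id must fit in 10 bits"
        self.machine_id = machine_id
        self.last_ts = -1
        self.sequence = 0

    def _now_ms(self) -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        ts = self._now_ms()
        if ts < self.last_ts:                  # clock went backward
            if self.last_ts - ts < 5:          # small skew: wait it out
                while ts < self.last_ts:
                    ts = self._now_ms()
            else:
                raise RuntimeError("clock moved backward; refusing to generate")
        if ts == self.last_ts:
            self.sequence = (self.sequence + 1) & 0xFFF
            if self.sequence == 0:             # sequence overflow: spin to next ms
                while ts <= self.last_ts:
                    ts = self._now_ms()
        else:
            self.sequence = 0
        self.last_ts = ts
        return ((ts - CUSTOM_EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence

def decode(snowflake_id: int) -> dict:
    """Recover the three fields by reversing the shifts."""
    return {
        "timestamp_ms": (snowflake_id >> 22) + CUSTOM_EPOCH_MS,
        "machine_id": (snowflake_id >> 12) & 0x3FF,
        "sequence": snowflake_id & 0xFFF,
    }
```

IDs from one generator are strictly increasing, so comparing two IDs as integers compares their creation times.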

Sequence Overflow Under Burst Traffic

4,096 IDs per millisecond sounds like a lot, but a batch insert of 10,000 rows from a single service hits this in ~2.5 ms. When the sequence overflows, the node must spin-wait until the next millisecond. This introduces latency spikes. Discord solved this by batching ID pre-generation: a background thread pre-allocates blocks of IDs so the hot path never blocks.

UUID variants — v4 random vs v7 time-ordered (RFC 9562)

UUIDs are 128-bit identifiers standardized across every language and database. UUID v4 (random) has been the default for years, but its randomness causes severe B-tree index fragmentation. The new UUID v7 (RFC 9562, published May 2024) fixes this by placing a Unix timestamp in the most significant bits, making UUIDs time-sortable while retaining the 128-bit format.

| Property | UUID v4 | UUID v7 (RFC 9562) |
| --- | --- | --- |
| Size | 128 bits (36 chars with hyphens) | 128 bits (36 chars with hyphens) |
| Sortability | Random, no time ordering | Time-ordered: encodes a Unix ms timestamp in the high bits |
| Uniqueness guarantee | 122 random bits; ~50% collision odds only after ~2.7 × 10^18 IDs (birthday bound) | 48-bit timestamp + 74 random bits; a collision requires the same millisecond plus a random match |
| Index performance | Poor: random inserts scatter across B-tree pages, causing ~50% page splits | Excellent: the monotonically increasing prefix means append-only inserts into the rightmost leaf page |
| Database support | Native in all databases | PostgreSQL 18+ uuidv7(); MySQL has no native v7, though UUID_TO_BIN(UUID(), 1) offers a time-ordered v1 workaround |
| Cross-language | Every language has UUID v4 | Growing adoption: Java uuid-creator, Python uuid7, Go google/uuid v1.6+ |
| Coordination | None required | None required: uses local clock + random bits |

graph LR
    subgraph "UUID v7 Layout (128 bits)"
        A["unix_ts_ms
48 bits"] --> B["ver
4 bits
0111"]
        B --> C["rand_a
12 bits"]
        C --> D["var
2 bits
10"]
        D --> E["rand_b
62 bits"]
    end

    style A fill:#3b82f6,stroke:#333,color:#fff
    style B fill:#8b5cf6,stroke:#333,color:#fff
    style C fill:#10b981,stroke:#333,color:#fff
    style D fill:#8b5cf6,stroke:#333,color:#fff
    style E fill:#10b981,stroke:#333,color:#fff

UUID v7 bit layout: 48-bit ms timestamp + 4-bit version + 12 random bits + 2-bit variant + 62 random bits. The timestamp prefix makes these sortable.
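The layout can be packed by hand in a few lines; this is a sketch of the RFC 9562 field packing, not a library implementation (most installed Python versions have no built-in v7 constructor):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Pack a UUID v7: 48-bit ms timestamp + version + 74 random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF            # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & 0x3FFF_FFFF_FFFF_FFFF  # 62 bits
    value = (ts_ms & 0xFFFF_FFFF_FFFF) << 80  # unix_ts_ms in the high 48 bits
    value |= 0x7 << 76                        # version nibble = 0111
    value |= rand_a << 64                     # rand_a
    value |= 0b10 << 62                       # variant bits = 10
    value |= rand_b                           # rand_b
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, UUIDs generated later compare greater as 128-bit integers.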

The UUID v4 index fragmentation problem

A B-tree index on a UUID v4 primary key has random insert targets. Each new row goes to a random leaf page. Once the index exceeds RAM, every insert triggers a random disk read to fetch the target page, then a write. With 100M rows, insert throughput can drop 10-50x compared to sequential IDs. This is the single biggest reason teams migrate away from UUID v4. Benchmarks on PostgreSQL show UUID v4 inserts at 3,200 rows/sec vs auto-increment at 28,000 rows/sec on the same hardware with a 500M row table.

Migrating from UUID v4 to v7

You do not need to change your column type — both are 128-bit values stored in the same format. Generate new IDs as v7 going forward. Old v4 IDs remain valid but will sort before or randomly among v7 IDs. For new tables, always prefer v7. For existing tables with heavy write loads and B-tree fragmentation, consider a background migration: generate v7 replacements, update foreign keys in batches, then swap.

ULID and MongoDB ObjectID — alternative time-prefixed schemes

ULID (Universally Unique Lexicographically Sortable Identifier) and MongoDB ObjectID predate UUID v7 and solve the same problem — time-ordered unique IDs without coordination — using different bit layouts.

| Property | ULID | MongoDB ObjectID |
| --- | --- | --- |
| Size | 128 bits (26 Crockford Base32 chars) | 96 bits (24 hex chars) |
| Timestamp | 48 bits of Unix ms (same as UUID v7) | 32 bits of Unix seconds (coarser granularity) |
| Randomness | 80 bits of cryptographic randomness | 40-bit random value + 24-bit incrementing counter |
| Encoding | Crockford Base32: 01ARZ3NDEKTSV4RRFFQ69G5FAV | Hex: 507f1f77bcf86cd799439011 |
| Sortability | Lexicographic string sort = chronological order | Hex sort = chronological order (timestamp is a big-endian prefix) |
| Monotonicity | Optional: within the same ms, increment the random bits to guarantee ordering | Counter field guarantees ordering within the same second on the same process |
| Epoch exhaustion | Same as UUID v7: ~8,900 years from the Unix epoch | 32-bit seconds: signed parsing overflows January 19, 2038 (Y2038 problem) |
| Adoption | Popular in event sourcing, Kafka, and TypeScript/Go ecosystems | Default _id in MongoDB; used by millions of applications |

MongoDB ObjectID Y2038 Risk

MongoDB ObjectIDs carry a 32-bit Unix timestamp in seconds. Parsed as a signed integer, that field overflows on January 19, 2038 at 03:14:07 UTC (treated as unsigned, it lasts until 2106). Applications that parse ObjectIDs to extract timestamps need to plan for this transition. If you are designing a new system today and considering the ObjectID format, prefer a 48-bit millisecond timestamp (ULID/UUID v7) instead.

ULID Monotonicity Trick

The ULID spec allows an implementation to detect when two ULIDs are generated in the same millisecond and increment the random portion by 1 instead of generating fresh random bits. This guarantees strict monotonic ordering within a process — useful for event sourcing where total order within a partition matters. The ulid/javascript and oklog/ulid (Go) libraries both support this mode.
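A compact sketch of the encoding plus the monotonic trick, using the Crockford Base32 alphabet from the ULID spec (the module-level state dict and overflow handling are simplified relative to real libraries):

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # sorted, so string sort = int sort

def _b32(value: int, length: int) -> str:
    """Encode the low 5*length bits of value as Crockford Base32."""
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

_last = {"ts": -1, "rand": 0}  # per-process state for monotonicity

def ulid() -> str:
    ts_ms = time.time_ns() // 1_000_000
    if ts_ms == _last["ts"]:
        _last["rand"] += 1  # same ms: increment instead of fresh randomness
    else:
        _last["ts"] = ts_ms
        _last["rand"] = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return _b32(ts_ms, 10) + _b32(_last["rand"], 16)  # 10 + 16 = 26 chars
```

Because the alphabet is in ascending ASCII order and the length is fixed, plain string comparison preserves generation order.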

Database ticket servers — the Flickr approach

Before Snowflake, Flickr solved distributed ID generation with a deceptively simple approach: two MySQL servers with auto-increment, each configured to produce only odd or even IDs. This "ticket server" pattern remains viable and is used at scale by companies that want simplicity over cleverness.

graph TB
    App1[App Server 1] --> LB[Round-Robin
Load Balancer]
    App2[App Server 2] --> LB
    App3[App Server 3] --> LB

    LB --> TS1["Ticket Server A
auto_increment_increment = 2
auto_increment_offset = 1
Produces: 1, 3, 5, 7..."]
    LB --> TS2["Ticket Server B
auto_increment_increment = 2
auto_increment_offset = 2
Produces: 2, 4, 6, 8..."]

    style TS1 fill:#3b82f6,stroke:#333,color:#fff
    style TS2 fill:#f59e0b,stroke:#333,color:#fff

Flickr ticket server pattern: two MySQL instances alternate between odd and even IDs. If one fails, the other continues producing unique IDs.

Setting up a ticket server pair

1. Create a dedicated MySQL instance A with auto_increment_increment=2, auto_increment_offset=1 (produces 1, 3, 5, 7...).
2. Create a dedicated MySQL instance B with auto_increment_increment=2, auto_increment_offset=2 (produces 2, 4, 6, 8...).
3. Create a simple table: CREATE TABLE tickets (id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, stub CHAR(1) NOT NULL DEFAULT '', UNIQUE KEY (stub)) ENGINE=InnoDB. The unique key on stub is what lets REPLACE reuse a single row.
4. To get an ID: REPLACE INTO tickets (stub) VALUES ('a'); SELECT LAST_INSERT_ID();. The REPLACE reuses the single row, keeping the table tiny.
5. Put a load balancer in front of both servers. If one goes down, the other still produces unique IDs (just all odd or all even).
6. To scale beyond two, use increment=N and offset=1..N for N ticket servers. Flickr used 2; Uber experimented with 4.
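The increment/offset scheme can be sanity-checked without MySQL; this small simulation (standing in for auto_increment_increment and auto_increment_offset) shows that N servers partition the integers with no overlap:

```python
class TicketServer:
    """Simulates MySQL auto_increment_increment / auto_increment_offset."""

    def __init__(self, increment: int, offset: int):
        self.increment = increment
        self.next = offset

    def get_id(self) -> int:
        ticket = self.next
        self.next += self.increment
        return ticket

# Flickr's pair: one server owns the odds, the other the evens
a = TicketServer(increment=2, offset=1)
b = TicketServer(increment=2, offset=2)
ids = [a.get_id() for _ in range(4)] + [b.get_id() for _ in range(4)]
print(sorted(ids))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Each server mints a disjoint residue class modulo the increment, so collisions are impossible even if one server races far ahead of the other.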

When Ticket Servers Beat Snowflake

Ticket servers produce strictly monotonic 64-bit integers with zero clock dependency. If your system has strong requirements for gap-free sequential IDs (financial transaction numbering, invoice IDs, regulatory audit trails), ticket servers are simpler and more predictable than Snowflake. The trade-off: a network round-trip per ID, so they add ~1-2ms latency. Batch pre-allocation (fetch 1000 IDs at a time) amortizes this cost.

Single Point of Failure Risk

Even with two ticket servers, both are in the critical write path. If both go down simultaneously (shared rack, shared network), all writes across your entire system halt. Mitigations: place servers in different availability zones, use connection pooling with fast failover, and maintain a local fallback (switch to UUIDs temporarily) if both are unreachable.

Clock skew, NTP, and the backward clock problem

Every time-based ID scheme depends on clocks, and clocks in distributed systems are unreliable. NTP (Network Time Protocol) can adjust a server's clock forward or backward. A VM migration can shift the clock. A leap second insertion can cause the clock to repeat a second. If your ID generator uses a timestamp that goes backward, two IDs generated at "different times" may collide or sort incorrectly.

| Clock issue | Magnitude | Cause | Impact on ID generation |
| --- | --- | --- | --- |
| NTP step adjustment | 1-500 ms typical, up to seconds | NTP daemon corrects drift by jumping the clock | Timestamp goes backward; Snowflake detects this and either rejects or waits |
| NTP slew adjustment | <0.5 ms/sec rate change | NTP gradually speeds up or slows down the clock | Safe: the clock moves forward at a slightly wrong speed; IDs stay ordered |
| Leap second | 1 second | UTC inserts a 61st second (23:59:60) | Clock repeats a second; IDs generated at :60 may duplicate those at :00 |
| VM live migration | 10-200 ms | Hypervisor pauses the VM, migrates, resumes with a stale TSC | Clock jumps forward on resume; a gap in IDs but no collision |
| Hardware clock drift | ~100 ppm without NTP (8.6 sec/day) | Crystal oscillator temperature variance | IDs drift from wall-clock time; cross-node ordering becomes unreliable |

Configuring NTP for ID Generation

Use chrony instead of ntpd on Linux. Configure "makestep 0.1 3" (step the clock only during the first 3 updates after boot, and only when the offset exceeds 100 ms) and "maxslewrate 500" to cap the slew rate at 500 ppm. After initial sync, chrony will only slew (gradually adjust) the clock, never step it backward. This eliminates backward clock jumps during normal operation. AWS, GCP, and Azure all offer dedicated NTP endpoints (e.g., 169.254.169.123 on AWS) with sub-millisecond accuracy.
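A minimal /etc/chrony.conf fragment reflecting the settings above (the server line is the AWS example; substitute your provider's NTP endpoint):

```conf
# Step only during the first 3 updates after boot, and only if offset > 0.1 s
makestep 0.1 3

# Cap slew rate at 500 ppm so corrections stay gradual
maxslewrate 500

# Cloud provider NTP endpoint (AWS example)
server 169.254.169.123 prefer iburst
```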

Handling backward clock in your ID generator

1. On every ID generation call, compare current_timestamp to last_timestamp.
2. If current_timestamp > last_timestamp: normal path. Update last_timestamp, reset the sequence to 0.
3. If current_timestamp == last_timestamp: increment the sequence. If the sequence overflows, spin-wait until the next millisecond.
4. If current_timestamp < last_timestamp (the clock went backward): calculate the delta. If the delta is under ~5 ms, spin-wait for the clock to catch up. If it is larger, log a critical alert and either (a) refuse to generate IDs until the clock recovers, or (b) continue using last_timestamp with an incrementing sequence.
5. For option (b), the node "borrows" time from the future. Track the borrowed amount and stop borrowing if it exceeds a configurable threshold (e.g., 1 second). This is what Discord's implementation does.
6. Never silently generate IDs with a backward timestamp: this creates subtle ordering bugs that are extremely hard to debug in production.
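The decision logic in steps 4-5 can be isolated as a pure function; the thresholds are the illustrative values from the steps, and "borrow" means keep issuing IDs against last_timestamp while tracking the accumulated debt:

```python
def clock_policy(now_ms: int, last_ms: int, borrowed_ms: int,
                 wait_threshold_ms: int = 5,
                 max_borrow_ms: int = 1000) -> str:
    """Decide how to react to the clock: 'ok', 'wait', 'borrow', or 'reject'."""
    delta = last_ms - now_ms
    if delta <= 0:
        return "ok"        # clock moved forward: normal path
    if delta < wait_threshold_ms:
        return "wait"      # small skew: spin until the clock catches up
    if borrowed_ms + delta <= max_borrow_ms:
        return "borrow"    # keep using last_ms and track the debt
    return "reject"        # too far behind: alert and refuse to generate
```

Keeping the policy pure makes it trivial to unit-test every branch without touching the system clock.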

Leap second disasters

On June 30, 2012, the Linux kernel leap second bug caused hundreds of servers at Reddit, Mozilla, Foursquare, and others to spin at 100% CPU. Any ID generator running during that event could have produced duplicate timestamps. Google's solution: "leap smear" — spread the extra second across a 24-hour window so the clock never jumps. AWS and GCP both use leap smearing on their NTP servers. If your infrastructure does not smear, your ID generator must handle the repeated second.

Machine ID assignment and coordination

Snowflake-style IDs reserve 10 bits for a machine (or worker) ID. But how does each node know its own ID? This is a coordination problem with several solutions, each with different trade-offs around availability, complexity, and failure modes.

| Strategy | How it works | Pros | Cons | Used by |
| --- | --- | --- | --- | --- |
| Static config | Hard-code machine_id in each node's config file or environment variable | Zero runtime dependencies | Manual management; error-prone at scale; config drift | Small deployments, on-prem |
| ZooKeeper sequential node | Each worker creates an ephemeral sequential znode; the sequence number becomes machine_id | Automatic assignment; survives restarts with the session | ZooKeeper is a critical dependency; ephemeral nodes expire if the session dies | Twitter Snowflake (original) |
| Database row lease | Worker claims a row in a machine_id table with a TTL lease and periodically renews | Works with the existing DB; no new infrastructure | Must handle lease-expiry races; adds a DB dependency to ID generation | Instagram, many startups |
| Container metadata | Use the last 10 bits of the container IP, pod ordinal (StatefulSet), or ECS task ID | No external coordination for Kubernetes/ECS | IP reuse can cause collisions; need to validate the uniqueness window | Cloud-native deployments |
| Hash of hostname | machine_id = hash(hostname) % 1024 | Dead simple; no coordination | Hash collisions are possible; must verify no duplicates in the fleet | Quick prototypes |
| Consul/etcd KV with TTL | Like ZooKeeper but using the Consul or etcd key-value store with a TTL | Modern alternative to ZooKeeper; simpler operations | Still an external dependency; TTL management adds complexity | Newer microservice architectures |

The Phantom Machine ID Problem

If a node crashes without releasing its machine ID, and a new node takes the same ID before the lease expires, both nodes generate IDs with the same machine bits. For the overlap window, IDs can collide if both nodes happen to generate an ID in the same millisecond with the same sequence number. Mitigation: use a lease duration longer than your maximum restart time (e.g., 5 minutes), and on startup, wait until any previous lease for your ID has expired before generating IDs.

Instagram's Approach: Database Shard ID as Machine ID

Instagram avoided a separate machine ID service entirely. Their Snowflake-style IDs use the logical shard number (determined by user_id % num_shards) as the machine ID field. Since each shard maps to exactly one PostgreSQL primary at any given time, uniqueness is guaranteed by the database itself. This is elegant because it requires zero additional infrastructure — the sharding layer you already need doubles as your ID coordination layer.
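Instagram's packing can be mirrored outside the database (the shard count matches the text; in production the sequence value comes from a per-shard PostgreSQL sequence rather than a parameter):

```python
def instagram_id(ms_since_epoch: int, shard_id: int, seq: int) -> int:
    """41-bit ms timestamp | 13-bit shard ID | 10-bit per-shard sequence."""
    return (ms_since_epoch << 23) | (shard_id << 10) | (seq % 1024)

def shard_of(user_id: int, num_shards: int = 8192) -> int:
    """The sharding function doubles as machine-ID assignment."""
    return user_id % num_shards
```

Because the shard ID is derived from the data itself, two writers can only share shard bits if they write to the same shard, and then the shard's own sequence keeps them apart.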

Machine ID Assignment

Static

  • Config file
  • Environment variable
  • Command-line flag

Coordination service

  • ZooKeeper
  • etcd
  • Consul

Database-backed

  • Lease table with TTL
  • Shard ID (Instagram)
  • Ticket server assignment

Infrastructure-derived

  • Container IP bits
  • K8s pod ordinal
  • AWS instance ID hash

B-tree index performance and storage implications

The choice of ID format has enormous consequences for database index performance. This section quantifies the difference between random IDs (UUID v4) and sequential IDs (Snowflake, UUID v7, auto-increment) in terms of B-tree behavior, page splits, cache hit rates, and storage overhead.

| Metric | Sequential IDs (Snowflake, auto-inc) | Random IDs (UUID v4) |
| --- | --- | --- |
| Insert pattern | Always appends to the rightmost leaf page | A random page is targeted for each insert |
| Page split rate | ~0%: pages fill sequentially | ~50%: half-full pages everywhere |
| B-tree page utilization | ~90-95% (near-full pages) | ~65-70% (pages split at ~50% fill) |
| Buffer pool hit rate at 500M rows | >99% (hot rightmost pages always cached) | ~60-70% (random access pattern thrashes the cache) |
| Insert throughput (PostgreSQL, 500M rows) | ~25,000-30,000 rows/sec | ~2,500-5,000 rows/sec |
| Index size (1B rows, 8-byte key) | ~8 GB (compact, well-packed) | N/A: UUID v4 is 16 bytes |
| Index size (1B rows, 16-byte key) | ~16 GB (UUID v7, well-packed) | ~24 GB (UUID v4, 70% fill factor) |
| Write amplification | 1x (sequential writes) | 3-10x (random reads + writes for each insert) |

Measuring Index Fragmentation

In PostgreSQL, check page utilization with: SELECT * FROM pgstattuple('your_index_name'); Look at avg_leaf_density — sequential IDs typically show 90%+, random UUIDs show 60-70%. In MySQL: SELECT index_name, stat_value FROM mysql.innodb_index_stats WHERE stat_name = "size" AND table_name = "your_table"; Compare index size to theoretical minimum (row_count x key_size / page_size) to estimate fragmentation.

The Hidden Cost: Foreign Key Indexes

Your primary key ID appears in every foreign key column in every related table. A users table with UUID v4 PKs means the orders.user_id, sessions.user_id, payments.user_id columns all store 16-byte UUIDs, and each has its own B-tree index suffering the same fragmentation. A system with 20 tables referencing users has 20 fragmented indexes. Switching from UUID v4 to Snowflake 64-bit IDs halves the storage of every FK column and index while eliminating fragmentation.

Storage size comparison for 1 billion rows

  • 64-bit integer (Snowflake) — PK index: ~8 GB. Each FK index: ~8 GB. Total with 10 FK references: ~88 GB.
  • UUID v7 (128-bit, sequential) — PK index: ~16 GB. Each FK index: ~16 GB. Total with 10 FK references: ~176 GB. 2x the integer but well-packed.
  • UUID v4 (128-bit, random) — PK index: ~24 GB (fragmented). Each FK index: ~24 GB. Total with 10 FK references: ~264 GB. 3x the integer, poorly packed.
  • ULID as string (26 chars) — PK index: ~40 GB. Catastrophic. Never store ULIDs as strings — always convert to 128-bit binary.
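The figures above follow from a simple model: key bytes × rows ÷ fill factor, ignoring page headers and inner B-tree levels (which add a few percent):

```python
def index_size_gb(rows: int, key_bytes: int, fill_factor: float) -> float:
    """Rough leaf-level index size: keys / page fill, in GB."""
    return rows * key_bytes / fill_factor / 1e9

ROWS = 1_000_000_000
print(round(index_size_gb(ROWS, 8, 0.95), 1))   # Snowflake, well-packed: ~8.4 GB
print(round(index_size_gb(ROWS, 16, 0.95), 1))  # UUID v7, well-packed:  ~16.8 GB
print(round(index_size_gb(ROWS, 16, 0.68), 1))  # UUID v4, fragmented:   ~23.5 GB
```

Multiply each figure by one primary key plus every foreign-key index that references it to see the compounding effect described above.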

Real-world implementations at scale

Every major tech company has built or adopted an ID generation system. Their choices reflect their specific constraints — existing infrastructure, scale requirements, and engineering culture. Understanding these implementations gives you concrete talking points in system design interviews.

| Company | System | Bit layout | Key design choices |
| --- | --- | --- | --- |
| Twitter | Snowflake | 1+41+10+12 (64 bits) | Custom epoch (2010-11-04). ZooKeeper for machine IDs. Open-sourced the original Thrift service; now internal. |
| Discord | Discord Snowflake | 1+41+10+12 (64 bits) | Custom epoch 2015-01-01. Process ID as machine ID. Handles clock regression by incrementing the sequence on the last known timestamp. |
| Instagram | PL/pgSQL function | 41+13+10 (64 bits) | Generated inside PostgreSQL by a stored function. 13-bit shard ID (8,192 shards) + 10-bit sequence (1,024/ms/shard). No external service. |
| Sony | Sonyflake | 1+39+8+16 (64 bits) | 39 bits of 10 ms units (~174 years). 8-bit machine ID (256 nodes). 16-bit sequence (65,536 per 10 ms). Trades node count for sequence space. |
| Baidu | uid-generator | 1+28+22+13 (64 bits) | 28-bit delta seconds (~8.5 years). 22-bit machine ID (4M workers, designed for containers). 13-bit sequence. Container-native. |
| Segment | KSUID | 160 bits (27 chars Base62) | 32-bit timestamp (seconds) + 128-bit random payload. Optimized for simplicity over compactness. No coordination needed. |

Instagram's Elegant PL/pgSQL Solution

Instagram generates IDs inside PostgreSQL itself, eliminating any external ID service. Their stored function takes the milliseconds since their custom epoch, shifts them left by 23 bits, ORs in (shard_id << 10), then ORs in (nextval(sequence) % 1024). Each shard has its own sequence. Since writes to a given shard always go to the same PostgreSQL primary, uniqueness is guaranteed by the database's own sequence. Total throughput: 1,024 IDs per millisecond per shard, across 8,192 shards ≈ 8.4M IDs/ms system-wide.

Choosing your bit allocation

The total is always 63 usable bits (1 sign bit reserved). You choose how to split them based on your constraints:

  • More timestamp bits → longer epoch runway but fewer machine/sequence bits.
  • More machine bits → more nodes but a shorter sequence or epoch.
  • More sequence bits → higher per-node throughput but fewer nodes.

Rule of thumb: 41 bits timestamp (69 years) + 10 bits machine (1,024 nodes) + 12 bits sequence (4,096/ms) works for most companies. Only change this if you have a specific constraint (e.g., Baidu needed 4M container IDs, so they used 22 machine bits).
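A small helper makes the trade-off concrete; it assumes the timestamp field counts milliseconds (Sonyflake's 10 ms units and Baidu's seconds would need a different divisor), and the second split shown is a hypothetical alternative, not a published layout:

```python
def snowflake_capacity(ts_bits: int, machine_bits: int, seq_bits: int) -> dict:
    """Capacity implied by a bit split (timestamp assumed to be in ms)."""
    assert ts_bits + machine_bits + seq_bits == 63, "must sum to 63 usable bits"
    return {
        "epoch_years": round(2**ts_bits / (1000 * 3600 * 24 * 365.25), 1),
        "max_nodes": 2**machine_bits,
        "ids_per_ms_per_node": 2**seq_bits,
    }

print(snowflake_capacity(41, 10, 12))
# {'epoch_years': 69.7, 'max_nodes': 1024, 'ids_per_ms_per_node': 4096}

# Hypothetical split trading runway for more nodes and throughput
print(snowflake_capacity(38, 12, 13))
```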

Design decisions and trade-off analysis

Choosing an ID generation strategy is not about finding the "best" option — it is about understanding which trade-offs matter for your specific system. This section synthesizes the key decision axes and provides a framework for making the right choice.

ID strategy decision flowchart

Do you need IDs to be time-sortable?
  • Yes → use a time-prefixed scheme (Snowflake, UUID v7, ULID).
  • No → UUID v4 is acceptable if index fragmentation is tolerable.

Must IDs fit in 64 bits (for JavaScript safe integers or compact storage)?
  • Yes → use a Snowflake-style ID with a custom epoch.
  • No → UUID v7 or ULID (128 bits) provide more entropy and simpler generation.

Can you tolerate a network round-trip per ID?
  • Yes → ticket servers give strict monotonicity and gap-free sequences.
  • No → use local generation (Snowflake, UUID v7, ULID) with no coordination on the hot path.

Do you need to embed metadata (shard ID, type, region) in the ID?
  • Yes → Snowflake-style with custom bit allocation for your metadata fields.
  • No → standard layouts work; avoid over-engineering the ID format.

The Modern Default: UUID v7 or Snowflake

For new systems in 2024+, the decision is usually between UUID v7 and Snowflake-style IDs:

  • Choose UUID v7 if: you want zero coordination, your database supports 128-bit values natively, you do not need to embed custom metadata, and you want cross-platform compatibility.
  • Choose Snowflake if: you need 64-bit IDs (JavaScript BigInt concerns, compact storage), you want to embed shard/datacenter info, or you need more than 4,096 IDs/ms/node with custom sequence sizing.
  • Choose ticket servers if: you need strictly monotonic, gap-free sequences (financial systems, regulatory requirements).
  • Avoid UUID v4 for new systems: there is no reason to choose random over time-ordered now that UUID v7 exists.

Common design mistakes

  • Using UUID v4 as a primary key at scale — Causes B-tree fragmentation, 5-10x slower inserts at 100M+ rows. Use UUID v7 or Snowflake instead.
  • Storing ULIDs as strings — 26-char string = 26 bytes vs 16 bytes binary. Indexes are 60% larger. Always store as binary(16) or use the native UUID column type.
  • Not setting a custom epoch — Using Unix epoch (1970) wastes 50+ years of your 41-bit timestamp range. Set your epoch to your system launch date.
  • Ignoring clock skew — Not handling backward clock jumps means your ID generator can silently produce duplicate or out-of-order IDs under NTP corrections.
  • Hardcoding machine IDs — Works for 5 servers, fails at 50. Use a coordination service or infrastructure-derived IDs for any non-trivial deployment.
  • Exposing sequential IDs to users — Auto-increment PKs leak information (total user count, creation rate). If IDs are user-facing, use a separate public identifier with obfuscation (Hashids, nanoid, or encrypted IDs).
How this might come up in interviews

Distributed ID generation appears directly in system design interviews ("Design a URL shortener", "Design Twitter", "Design a distributed message queue") and indirectly in any question where you need to discuss database schema and primary keys. Interviewers use it to test your understanding of distributed coordination, clock synchronization, database internals (B-tree behavior), and trade-off analysis. Strong candidates discuss bit layouts, epoch math, and clock skew handling without prompting.

Common questions:

  • Design a globally unique ID generation service for a social media platform handling 100K writes/second across 5 data centers.
  • Compare UUID, Snowflake, and auto-increment for a messaging system. Which would you choose and why?
  • How would you handle clock skew in a Snowflake-style ID generator? What happens if NTP steps the clock backward?
  • Why do random UUIDs cause database performance problems at scale? How would you fix this without changing the application layer?
  • Your Snowflake ID generator runs out of sequence numbers (hits 4096) multiple times per second during traffic spikes. How do you solve this?
  • Design the ID generation strategy for a system that needs to shard across 10,000 database nodes. How do you assign machine IDs?

Clarifying questions to ask the interviewer: What is the expected write throughput? Do IDs need to be time-sortable? Are IDs exposed to end users (security/enumeration concerns)? What database engine is used (affects index performance trade-offs)? How many nodes will generate IDs concurrently?

Strong answer: drawing the Snowflake bit layout from memory; calculating throughput limits (4,096/ms/node × 1,024 nodes); mentioning real implementations (Twitter, Discord, Instagram); discussing NTP slew vs step mode; bringing up the foreign-key storage multiplier effect.
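The throughput ceiling is worth being able to derive on the spot — it falls straight out of the 10+12 bit split:

```python
# Per-node ceiling: 2^12 sequence values per millisecond.
ids_per_ms_per_node = 1 << 12                        # 4,096
ids_per_sec_per_node = ids_per_ms_per_node * 1000    # 4,096,000 IDs/s per node

# Cluster-wide ceiling: 2^10 nodes generating independently.
cluster_ids_per_sec = ids_per_sec_per_node * (1 << 10)  # ~4.19 billion IDs/s
```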

Red flags: Suggesting UUID v4 without mentioning index fragmentation. Not addressing clock skew. Proposing a centralized ID service without discussing availability. Confusing UUIDs with sequential IDs. Not knowing the size difference between 64-bit and 128-bit IDs.

Key takeaways

  • Snowflake-style IDs (1+41+10+12 bits) are the industry standard for high-throughput systems: they produce compact 64-bit integers that are time-sorted, locally generated, and collision-free across 1,024 nodes at 4,096 IDs/ms/node.
  • UUID v7 (RFC 9562) should replace UUID v4 in all new systems — it provides the same 128-bit format and zero-coordination generation but with time-ordering that eliminates B-tree index fragmentation, improving insert throughput by 5-10x at scale.
  • Clock skew is the fundamental enemy of time-based IDs. Configure NTP/chrony for slew-only mode, persist the timestamp high-water mark to disk, and monitor cross-node clock offsets. Never silently generate IDs with a backward timestamp.
  • Machine ID assignment requires deliberate coordination — whether through ZooKeeper leases, database shard numbers (Instagram), or infrastructure metadata. Failing to ensure unique machine IDs makes Snowflake collision-prone during infrastructure events.
  • Storage impact compounds across every foreign key and index: choosing 64-bit Snowflake over 128-bit UUID v4 halves your index sizes for every table that references the ID, and eliminates fragmentation — a decision that saves terabytes at billion-row scale.
Before you move on: can you answer these?

In a Snowflake-style ID with 41 timestamp bits and a custom epoch of January 1, 2024, approximately when will the timestamp field overflow?

2^41 milliseconds = ~69.7 years. Starting from January 2024, the IDs will exhaust around September 2093. This is why choosing a recent custom epoch matters — using the Unix epoch (1970) would waste 54 years of your 69.7-year budget.
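The overflow date is easy to verify with stdlib datetime arithmetic (using the epoch from the question):

```python
from datetime import datetime, timedelta, timezone

custom_epoch = datetime(2024, 1, 1, tzinfo=timezone.utc)
# 2^41 milliseconds is ~69.7 years of runway.
overflow = custom_epoch + timedelta(milliseconds=2**41)
# overflow lands in September 2093
```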

Why does UUID v4 cause B-tree index fragmentation, and how does UUID v7 fix this?

UUID v4 is fully random, so each insert targets a random leaf page in the B-tree, causing frequent page splits and ~65-70% page utilization. UUID v7 places a millisecond timestamp in the most significant bits, making inserts monotonically increasing — they always append to the rightmost leaf page, achieving ~90-95% utilization and 5-10x better insert throughput at scale.

What happens in a Snowflake ID generator when the system clock jumps backward by 100ms due to an NTP correction?

The generator detects that current_timestamp < last_timestamp. Correct implementations either spin-wait for 100ms until the clock catches up, or continue generating IDs using the last known timestamp with incrementing sequence numbers (borrowing from the future). Incorrect implementations that silently use the backward timestamp risk producing duplicate IDs if another node generates IDs with the same timestamp and machine ID.
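The spin-wait behavior described above can be sketched as follows — a minimal illustration, not a production generator (a real one would also persist the high-water mark to disk and refuse to wait beyond some bound before alerting):

```python
import time

class MonotonicTimestampSource:
    """Tracks a timestamp high-water mark and never hands out a value
    earlier than one already used for ID generation."""

    def __init__(self) -> None:
        self.last_ms = -1

    def next_ms(self) -> int:
        now = time.time_ns() // 1_000_000
        while now < self.last_ms:
            # Clock stepped backward (e.g. an NTP correction): wait it out
            # rather than risk reissuing a timestamp already embedded in IDs.
            time.sleep((self.last_ms - now) / 1000)
            now = time.time_ns() // 1_000_000
        self.last_ms = now
        return now
```

A generator built on this source can only ever observe non-decreasing timestamps, which is what makes the (timestamp, machine ID, sequence) triple collision-free on a single node.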

🧠Mental Model

💡 Analogy

Think of distributed ID generation like license plate systems across 50 US states. Each state (node) has its own prefix (machine ID) and issues plates sequentially within that prefix (sequence number). The year of registration is encoded in the sticker (timestamp). No two states will ever issue the same plate, no state needs to call a central office before issuing a plate, and you can roughly tell when a car was registered by looking at the plate. Snowflake IDs work the same way: timestamp prefix + node prefix + local sequence = globally unique, roughly ordered, locally generated.

⚡ Core Idea

Distributed ID generation solves the problem of creating globally unique, time-ordered identifiers without requiring nodes to coordinate on every ID creation. The key insight is partitioning the ID space so that each node owns a non-overlapping slice and can generate IDs locally at full speed.

🎯 Why It Matters

At FAANG scale, ID generation sits in the critical path of every write operation. A poorly chosen ID strategy can halve your database write throughput (UUID v4 fragmentation), create subtle ordering bugs (clock skew), or become an availability bottleneck (centralized ticket servers). Getting this right is foundational infrastructure that affects every service in your system.
