Most production outages trace back to bad database decisions made early. Learn how senior engineers think about data modeling, indexing, sharding, and choosing between SQL and NoSQL.
Your application code is easy to replace. Your database schema and the data it contains is not. Migrating 500 million rows from MySQL to Cassandra while keeping the service running is a multi-month engineering project. Getting the database choice right early saves enormous pain later.
The right database depends on: your data model (relational, document, graph, time-series?), your access patterns (reads vs writes, point lookups vs range scans?), your consistency requirements (ACID or eventual?), and your scale requirements (how many rows, how many writes per second?).
Start with PostgreSQL
If you are uncertain, start with PostgreSQL. It handles relational data, JSON documents (JSONB), full-text search, time-series (with TimescaleDB), and geospatial data. Migrate to specialized databases only when you hit a concrete limitation.
Relational databases (PostgreSQL, MySQL): ACID transactions, powerful joins, schema-enforced structure, strong consistency. Read replicas for horizontal read scaling. Vertical scaling for writes (or sharding — complex). Best for: financial transactions, user profiles, anything with complex relationships.
Document databases (MongoDB, Firestore): Schema-flexible, horizontal write scaling, denormalized data model. No joins — you embed related data. Best for: CMS content, product catalogs, user preferences, anything whose schema changes frequently.
Wide-column databases (Cassandra, DynamoDB): Extreme write throughput (100,000+ writes/sec per node), linear horizontal scaling, eventual consistency by default. No joins, no ad-hoc queries — you design tables around your queries. Best for: time-series data, IoT sensor readings, activity feeds, audit logs.
Search engines (Elasticsearch, OpenSearch): Inverted index optimized for full-text search and complex filtering. Not a primary database — use as a secondary index synced from your primary. Best for: product search, log analytics.
| Database | ACID | Horizontal Scale | Query Flexibility | Best Use Case |
|---|---|---|---|---|
| PostgreSQL | ✅ Full | Read replicas (easy), write (hard) | Excellent (SQL + JSONB) | Default choice, financial data |
| MongoDB | ✅ Multi-doc (v4+) | Sharding built-in | Good (no joins) | CMS, catalogs, flexible schema |
| Cassandra | ❌ Row-level only | Linear (petabytes) | Limited (by partition key) | High-write IoT, time-series |
| DynamoDB | ✅ Item-level | Managed, auto | Limited (GSI/LSI) | Serverless, AWS-native apps |
| Redis | ❌ (persistence optional) | Cluster mode | Data structure ops | Cache, sessions, leaderboards |
An index is a separate data structure that lets the database find rows without scanning every row. Without an index, `SELECT * FROM orders WHERE user_id = 123` on a 100M-row table does a full table scan — seconds of latency. With a B-tree index on `user_id`, it is milliseconds.
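You can watch the planner make this decision yourself. The sketch below uses SQLite as a lightweight stand-in for a server database (the table and index names are illustrative); PostgreSQL reports the same choice via `EXPLAIN`.

```python
import sqlite3

# SQLite stands in for a production database here; the principle is identical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders(id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)")
db.executemany(
    "INSERT INTO orders(user_id, status) VALUES (?, ?)",
    [(i % 1000, "pending") for i in range(10_000)],
)

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN is SQLite's rough equivalent of Postgres' EXPLAIN
    return " ".join(row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE user_id = 123"
before = plan(query)   # full table scan: every row is examined
db.execute("CREATE INDEX idx_orders_user ON orders(user_id)")
after = plan(query)    # index search: only matching rows are touched
print(before)
print(after)
```

The same query goes from a `SCAN` to a `SEARCH ... USING INDEX` the moment the index exists, without any change to the query text.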
B-tree index: The default. Supports equality, range, prefix, and ORDER BY. Handles 99% of cases.
Composite index: Index on multiple columns `(user_id, created_at)`. Order matters — the leftmost prefix rule: `(user_id, created_at)` can serve queries filtering on `user_id` alone, or `user_id AND created_at`, but NOT `created_at` alone.
Partial index: Only index rows matching a condition. `CREATE INDEX ON orders(user_id) WHERE status = 'pending'`. Smaller, faster for targeted queries.
Covering index: Include extra columns in the index so the query never touches the main table (index-only scan). Dramatic performance improvement for hot queries.
Indexes have a write cost
Every write must update every index on that table. A table with 20 indexes pays roughly 20 extra index updates on each insert compared with an unindexed table. Index only what you query. Remove unused indexes — they are pure overhead.
```sql
-- ❌ SLOW: Full table scan on 50M orders
-- (EXPLAIN ANALYZE actually executes the query — use on a staging DB)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';
-- Seq Scan on orders (cost=0.00..2M rows, actual=45sec)

-- ✅ FIX 1: Simple index
CREATE INDEX idx_orders_user ON orders(user_id);
-- Bitmap Index Scan (cost=100 rows, actual=3ms)

-- ✅ FIX 2: Composite index (even better for this query)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Index Only Scan on idx_orders_user_status (cost=2 rows, actual=0.1ms)

-- ✅ FIX 3: Partial index (smallest, fastest for pending-only queries)
-- Partial indexes are underused — they are smaller and faster for selective queries
CREATE INDEX idx_orders_user_pending
ON orders(user_id)
WHERE status = 'pending';

-- ✅ COVERING INDEX: include columns the SELECT needs → no table heap read
CREATE INDEX idx_orders_cover
ON orders(user_id, status)
INCLUDE (order_total, created_at);
-- Query hits only the index, never the table (fastest possible)

-- 🔍 ALWAYS check your query plans
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT user_id, order_total, created_at
FROM orders
WHERE user_id = 42 AND status = 'pending';
```
Atomicity: All operations in a transaction succeed, or all fail. No partial writes. Example: transfer $100 between accounts — debit AND credit happen together or neither does.
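A minimal runnable sketch of this guarantee, using SQLite (schema and amounts are illustrative). The credit is applied first so the failure happens mid-transaction, and the rollback must undo work that already succeeded:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accounts(id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
db.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
db.commit()

def transfer(src: int, dst: int, amount: int) -> bool:
    try:
        with db:  # one transaction: commits on success, rolls back on exception
            # credit first, so a failing debit proves the rollback undoes it
            db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint fired: overdraft
        return False

assert transfer(1, 2, 60)        # succeeds
assert not transfer(1, 2, 999)   # debit violates CHECK; the credit is rolled back too
balances = dict(db.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 40, 2: 60}: the failed transfer left no partial write
```

The failed transfer had already credited account 2 inside the transaction, yet the final balances show no trace of it: all or nothing.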
Consistency: A transaction brings the database from one valid state to another. Constraints, foreign keys, and triggers are respected.
Isolation: Concurrent transactions see a consistent view of data. Isolation level controls how:
| Isolation Level | Dirty Read | Non-repeatable Read | Phantom Read | Use When |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Never (analytics only) |
| READ COMMITTED | Prevented | Possible | Possible | Default in PostgreSQL/Oracle |
| REPEATABLE READ | Prevented | Prevented | Possible | Inventory checks, reports |
| SERIALIZABLE | Prevented | Prevented | Prevented | Financial transactions |
Durability: Once committed, data survives crashes. Achieved via write-ahead log (WAL) — changes are written to a log before the data pages, so after a crash the log can replay committed transactions.
Use SERIALIZABLE for money
Financial applications should use SERIALIZABLE isolation. READ COMMITTED (the PostgreSQL default) permits non-repeatable and phantom reads — a user's balance can change between two reads inside the same transaction, which opens the door to double-spend bugs.
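Here is the double-spend hazard replayed deterministically, plus one common fix. SQLite is a stand-in and the schema is illustrative; in PostgreSQL the robust fix is to run the whole transfer under SERIALIZABLE and retry on serialization failure, but collapsing the check and the debit into one atomic statement closes the same gap:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts(id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES (1, 100)")

# ❌ Read-modify-write race, replayed sequentially: two sessions both read 100,
# both pass the "balance >= 60" check, then each writes its own stale result.
seen_a = seen_b = db.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
db.execute("UPDATE accounts SET balance = ? WHERE id = 1", (seen_a - 60,))
db.execute("UPDATE accounts SET balance = ? WHERE id = 1", (seen_b - 60,))  # lost update
lost_update = db.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(lost_update)  # 40, even though 120 was "spent" from a 100 balance

# ✅ Guarded atomic update: the balance check and the debit are one statement,
# so no concurrent spender can sneak in between them.
db.execute("UPDATE accounts SET balance = 100 WHERE id = 1")  # reset
ok1 = db.execute("UPDATE accounts SET balance = balance - 60 "
                 "WHERE id = 1 AND balance >= 60").rowcount == 1
ok2 = db.execute("UPDATE accounts SET balance = balance - 60 "
                 "WHERE id = 1 AND balance >= 60").rowcount == 1
print(ok1, ok2)  # True False: the second spend is correctly rejected
```

The guarded update works because a single UPDATE statement is atomic on its own; SERIALIZABLE generalizes the same protection to multi-statement transactions.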
Database questions test whether you understand trade-offs, not whether you can recite SQL syntax.
Common questions:
Strong answers include:
Red flags:
From the books
Designing Data-Intensive Applications — Martin Kleppmann (2017)
Chapters 2-4: Data Models, Storage Engines, Encoding
Understanding how databases store data on disk (B-trees vs LSM-trees) explains why some databases are fast for reads and others for writes. This knowledge separates architects from implementers.