A bad query plan at 1k users becomes a 3am PagerDuty alert at 10M users
A bad query plan at 1k users becomes a 3am PagerDuty alert at 10M users
Without indexes, a database scans every row in a table (Sequential Scan). For 100 million rows, that's catastrophic. An index is a separate B-tree data structure that lets the database jump directly to matching rows.
B-Tree Index Internals
PostgreSQL's default index is a B-tree (balanced tree). Values stored in sorted order. WHERE email = 'alice@example.com' navigates the tree: O(log N) comparisons vs O(N) for a full scan. For 100M rows: tree = ~27 comparisons vs 100M row scan.
Index Types and When to Use Each
1-- Diagnose with EXPLAIN ANALYZEEXPLAIN ANALYZE actually runs the query — use on production with caution for expensive queries2EXPLAIN ANALYZE3SELECT u.name, COUNT(o.id) as order_count4FROM users u5LEFT JOIN orders o ON o.user_id = u.id6WHERE u.created_at > NOW() - INTERVAL '30 days'7GROUP BY u.id, u.name8ORDER BY order_count DESC9LIMIT 100;1011-- Bad plan output:12-- Seq Scan on users (cost=0.00..12847.00 rows=50000)13-- Seq Scan on orders (cost=0.00..48000.00 rows=2000000)1415-- Fix: add indexes16CREATE INDEX idx_users_created_at ON users(created_at);17CREATE INDEX idx_orders_user_id ON orders(user_id);1819-- Good plan after indexes:20-- Index Scan on users using idx_users_created_at21-- Index Scan on orders using idx_orders_user_id2223-- Composite index: column order matters24-- Query: WHERE user_id = ? AND status = ? ORDER BY created_at25CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, created_at);Composite index leftmost-prefix rule: (a, b, c) supports queries on a, (a,b), or (a,b,c) — not b or c alone26-- Supports: WHERE user_id=? AND status=?27-- Also supports: WHERE user_id=? alone28-- Does NOT support: WHERE status=? alone (leftmost prefix rule)2930-- Partial index: only pending orders (tiny fraction)Partial index: if only 0.1% of orders are pending, this index is 1000x smaller than a full index31CREATE INDEX idx_orders_pending ON orders(user_id, created_at)32WHERE status = 'pending';33-- 1000x smaller than full index on large tables with few pending orders3435-- Covering index: satisfy query from index alone36CREATE INDEX idx_users_email_name ON users(email) INCLUDE (name, id);INCLUDE stores columns in leaf nodes — query never hits main table, much faster for read-heavy workloads37-- Query: SELECT id, name FROM users WHERE email = ?38-- → Index-only scan — never touches the main table3940-- EXPLAIN keywords to know:41-- "Seq Scan" → full table scan, add an index42-- "Index Scan" → using index + fetching from table43-- "Index Only" → index has all needed data (fastest)
Top Query Performance Anti-Patterns
Enable the Slow Query Log
Log all queries > 100ms. PostgreSQL: log_min_duration_statement = 100. MySQL: slow_query_log = ON, long_query_time = 0.1. Review weekly. This is your most valuable diagnostic tool.
| Technique | Where Data Lives | App Transparency | Cross-JOINs | When to Use |
|---|---|---|---|---|
| Partitioning | Same server, different files | Transparent to app | Supported within DB | Large tables (>100M rows), time-series, archiving |
| Sharding | Different database servers | Requires routing layer | Not possible across shards | When one server can't handle the load |
Partition Before You Shard
Table partitioning (PostgreSQL PARTITION BY RANGE/LIST/HASH) is transparent to your application and dramatically improves query performance for time-series and large tables. You can still run cross-partition JOINs. Sharding requires routing logic in your app and eliminates cross-shard JOINs forever.
Sharding Strategies
Database optimization is a practical skill. Show you can diagnose slow queries with EXPLAIN ANALYZE and apply the right fix — not just "add an index" or "shard it".
Common questions:
Strong answers include:
Red flags:
Quick check · Database Optimization: Indexing, Query Performance, and Sharding
1 / 2
Key takeaways
From the books
Use The Index, Luke — Markus Winand (2011)
Chapter 1: Anatomy of an Index
Indexes speed up reads but slow down writes. The art is knowing when an index is worth the write overhead — and knowing which index type to use.
Ready to see how this works in the cloud?
Switch to Career Paths for structured paths (e.g. Developer, DevOps) and provider-specific lessons.
View role-based pathsSign in to track your progress and mark lessons complete.
Questions? Discuss in the community or start a thread below.
Join DiscordSign in to start or join a thread.