Skip to main content
Career Paths
Concepts
Bep Database Fundamentals
The Simplified Tech

Role-based learning paths to help you master cloud engineering with clarity and confidence.

Product

  • Career Paths
  • Interview Prep
  • Scenarios
  • AI Features
  • Cloud Comparison
  • Resume Builder
  • Pricing

Community

  • Join Discord

Account

  • Dashboard
  • Credits
  • Updates
  • Sign in
  • Sign up
  • Contact Support

Stay updated

Get the latest learning tips and updates. No spam, ever.

Terms of ServicePrivacy Policy

© 2026 TheSimplifiedTech. All rights reserved.

BackBack
Interactive Explainer

Database Fundamentals: SQL, Scaling, and Data Modeling

The database is the hardest part of your system to change. Get it right first.

🎯Key Takeaways
Choose PostgreSQL as your default — it handles more scale than most companies ever reach.
Design schemas based on your query patterns, not just the data model.
Store money as integer cents. Use TIMESTAMPTZ. Use UUID primary keys.
The scaling ladder: optimize → vertical scale → read replicas → caching → sharding.
Sharding is a last resort. It eliminates cross-shard JOINs and makes transactions between shards impossible.
ACID guarantees (especially Atomicity and Durability) are why relational databases power banking, e-commerce, and healthcare.

Database Fundamentals: SQL, Scaling, and Data Modeling

The database is the hardest part of your system to change. Get it right first.

~7 min read
Be the first to complete!
What you'll learn
  • Choose PostgreSQL as your default — it handles more scale than most companies ever reach.
  • Design schemas based on your query patterns, not just the data model.
  • Store money as integer cents. Use TIMESTAMPTZ. Use UUID primary keys.
  • The scaling ladder: optimize → vertical scale → read replicas → caching → sharding.
  • Sharding is a last resort. It eliminates cross-shard JOINs and makes transactions between shards impossible.
  • ACID guarantees (especially Atomicity and Durability) are why relational databases power banking, e-commerce, and healthcare.

Lesson outline

Your Database Is the One Thing You Can't Easily Replace

In 2012, Instagram was acquired by Facebook for $1 billion. At the time, they had 13 employees and 30 million users. Their entire backend ran on PostgreSQL. They never needed to switch.

The Migration Nightmare

Twitter spent 3 years migrating from MySQL to Manhattan (their custom storage system). Uber migrated from PostgreSQL to MySQL in 2016 and wrote a 6,000-word blog post explaining why. Database migrations are the most expensive engineering projects most companies undertake.

The question isn't "which database is best." It's "which database fits my data model, access patterns, and scale requirements."

Database TypeBest ForExamplesAvoid When
Relational (SQL)Structured data, complex queries, transactionsPostgreSQL, MySQL, SQLiteMassive scale with simple lookups, unstructured data
Document (NoSQL)Flexible schemas, nested data, rapid iterationMongoDB, Firestore, DynamoDBComplex JOIN-heavy queries, strong consistency required
Key-ValueSession storage, caching, simple lookupsRedis, DynamoDB, etcdComplex queries, data that needs relationships
Wide-ColumnTime-series, high write throughput, append-heavyCassandra, HBase, BigTableComplex queries, strong consistency, small datasets
GraphRelationship-heavy data, social graphs, recommendationsNeo4j, Amazon NeptuneNon-graph data, simple queries
SearchFull-text search, faceted filteringElasticsearch, Solr, TypesenseAuthoritative data source, complex transactions

The Relational Model: Still the Default for Good Reason

PostgreSQL and MySQL power the majority of the world's business data. The relational model — tables, rows, columns, foreign keys — has survived 50 years because it works.

ACID: The Guarantee That Makes Banks Trust Databases

  • 🔵Atomicity — A transaction either fully completes or fully rolls back. No partial writes. "Transfer $100: debit A AND credit B — or neither."
  • 🟢Consistency — Every transaction brings the database from one valid state to another. Foreign key constraints, check constraints — all enforced.
  • 🟡Isolation — Concurrent transactions don't see each other's intermediate state. Four isolation levels: Read Uncommitted → Read Committed → Repeatable Read → Serializable.
  • 🔴Durability — Committed transactions survive crashes. PostgreSQL writes a WAL (Write-Ahead Log) before confirming success.

Isolation Levels in Production

Most databases default to Read Committed. PostgreSQL uses it. MySQL InnoDB defaults to Repeatable Read. Serializable is the safest but slowest. Pick the lowest isolation level that still prevents your specific concurrency bugs.

database-schema.sql
1-- PostgreSQL schema with production-grade patterns
2
3-- Users table: always use UUID primary keys for distributed systems
4CREATE TABLE users (
UUID over auto-increment: no coordination needed across database replicas
5 id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
6 email TEXT UNIQUE NOT NULL,
7 name TEXT NOT NULL,
TIMESTAMPTZ stores timezone — use it. TIMESTAMP without TZ causes daylight saving bugs
8 created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), -- timezone-aware!
9 deleted_at TIMESTAMPTZ -- soft delete, not hard delete
10);
11
12-- Orders table: proper foreign keys with cascading
13CREATE TABLE orders (
14 id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ON DELETE RESTRICT prevents orphaned orders. ON DELETE CASCADE would delete orders with users
15 user_id UUID NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
Store money as integer cents, never as FLOAT. 0.1 + 0.2 = 0.30000000000000004 in floating point
16 status TEXT NOT NULL CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
17 total_cents INTEGER NOT NULL CHECK (total_cents >= 0), -- store money as integers!
18 created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
19);
20
Composite index on (user_id, created_at) supports "get user orders sorted by date" in one index scan
21-- Index for common query pattern: user's orders
22CREATE INDEX idx_orders_user_id_created ON orders(user_id, created_at DESC);
23
24-- ACID transaction: transfer money atomically
25BEGIN;
26 UPDATE accounts SET balance = balance - 10000 WHERE id = 'account-A';
27 UPDATE accounts SET balance = balance + 10000 WHERE id = 'account-B';
28 -- If any error here, entire transaction rolls back
29COMMIT;
30
31-- Window functions: FAANG interview favorite
32SELECT
33 user_id,
34 order_id,
35 total_cents,
36 SUM(total_cents) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total,
37 RANK() OVER (PARTITION BY user_id ORDER BY total_cents DESC) AS order_rank
38FROM orders;

Data Modeling: Design for Queries, Not Just for Structure

Most junior engineers design schemas based on the real world. Senior engineers design schemas based on the queries that will run against them.

The Query-First Design Rule

Before designing any table, write the 5 most common queries your application needs. Then design the schema to answer those queries efficiently. Your indexes flow from your query patterns, not from your data model.

Normal FormEliminatesTrade-OffUse When
1NFRepeating groups, multi-valued columnsMore tablesAlways (minimum standard)
2NFPartial functional dependenciesMore joinsAlways for relational data
3NFTransitive dependenciesEven more joinsStandard for OLTP systems
DenormalizedJoins (intentionally)Data duplication, update anomaliesRead-heavy, analytics, search indexes

When to Denormalize Intentionally

Instagram stores the follower count directly on the user row (denormalized) instead of counting follower rows every time. The count is eventually consistent but the read is instant. This is the right trade-off for 10B reads/day vs occasional inconsistency.

Data Modeling Process at FAANG

→

01

Write the 5–10 most critical queries your system needs to answer

→

02

Identify the entities (User, Order, Product) and their relationships

→

03

Normalize to 3NF as a starting point

→

04

Identify which queries would require expensive JOINs or full table scans

→

05

Selectively denormalize those specific hot paths (add materialized counts, embed frequently co-read data)

→

06

Add indexes for every WHERE clause and JOIN column in your critical queries

07

Load test with production-scale data before shipping

1

Write the 5–10 most critical queries your system needs to answer

2

Identify the entities (User, Order, Product) and their relationships

3

Normalize to 3NF as a starting point

4

Identify which queries would require expensive JOINs or full table scans

5

Selectively denormalize those specific hot paths (add materialized counts, embed frequently co-read data)

6

Add indexes for every WHERE clause and JOIN column in your critical queries

7

Load test with production-scale data before shipping

Scaling Databases: When One Machine Isn't Enough

The Database Scaling Ladder (climb before you jump)

  • Optimize queries first — Add indexes, rewrite slow queries, use EXPLAIN ANALYZE. 80% of "we need to scale our database" problems are solved here.
  • Vertical scaling — Bigger machine: more RAM, faster disk. A $5,000/month RDS instance handles enormous load. Often cheaper than engineering time.
  • Read replicas — Route read queries to replicas. Instagram ran on one primary + multiple read replicas for years. Replication lag is the tradeoff.
  • Caching layer — Add Redis/Memcached in front of hot data. The database never sees cache-hit queries.
  • Horizontal sharding — Split data across multiple databases by key (user_id % N). Only do this when you've exhausted everything else. Huge operational complexity.

Sharding Is a Last Resort

Sharding eliminates cross-shard JOINs, makes transactions spanning shards impossible, and requires a routing layer. Pinterest moved away from sharding to Vitess (MySQL with transparent sharding). Stripe and Shopify still run on single-region PostgreSQL with clever partitioning. Don't shard prematurely.

TechniqueAddressesTrade-OffWhen to Apply
Index optimizationSlow queriesSlightly slower writesAlways — first step
Connection pooling (PgBouncer)Connection exhaustionConfig complexity>100 concurrent connections
Read replicasRead throughputReplication lag, eventual consistency>70% reads in workload
PartitioningTable bloat, time-series dataPartition pruning must be planned>100M rows in one table
ShardingWrite throughput, data sizeNo cross-shard JOINs, complex opsAfter everything else fails
NewSQL (CockroachDB, Spanner)Global scale with SQLLatency, cost, operational complexityGlobal consistency required
read-replica-routing.ts
1// Read replica routing with Prisma
2import { PrismaClient } from '@prisma/client';
3
4// Primary: all writes go here
5const primary = new PrismaClient({
Never mix connection pools — primary is for writes, replica for reads
6 datasources: { db: { url: process.env.DATABASE_PRIMARY_URL } }
7});
8
9// Replica: reads only
10const replica = new PrismaClient({
11 datasources: { db: { url: process.env.DATABASE_REPLICA_URL } }
12});
13
14// Route based on operation type
15class DatabaseRouter {
Replica reads are eventually consistent. Replication lag is typically 10-100ms
16 async findUser(id: string) {
17 // READ → replica (eventual consistency acceptable)
18 return replica.user.findUnique({ where: { id } });
19 }
20
21 async createOrder(data: OrderCreateInput) {
22 // WRITE → primary (strong consistency required)
23 return primary.order.create({ data });
Post-write reads should go to primary to avoid read-after-write inconsistency
24 }
25
26 async getOrderForDisplay(id: string) {
27 // Reads that MUST be fresh (after a write) → primary
28 return primary.order.findUnique({
29 where: { id },
30 include: { user: true, items: true }
31 });
32 }
33}
34
35// ⚠️ Read-after-write problem:
36// User creates order → replica may not have it yet
37// Solution: For 1 second after writes, route reads to primary
38// Or: use a cache to store "just created" IDs
How this might come up in interviews

Database questions at FAANG assess whether you understand trade-offs, not just syntax. They want to see you reason about consistency vs availability, normalization vs performance, and when NOT to scale.

Common questions:

  • How would you design the database schema for a social media platform?
  • What is the difference between OLTP and OLAP? When would you use a separate analytics database?
  • Explain when you would denormalize a database schema
  • How do read replicas work and what consistency guarantees do they provide?

Strong answers include:

  • Mentions query-first design unprompted
  • Knows ACID and can explain isolation levels
  • Understands read-after-write consistency problem with replicas

Red flags:

  • Says "use NoSQL for scale" without understanding why
  • Doesn't know what an index is or when to add one
  • Designs schema without asking about query patterns

Quick check · Database Fundamentals: SQL, Scaling, and Data Modeling

1 / 3

You need to store monetary values in a database. Which column type should you use?

Key takeaways

  • Choose PostgreSQL as your default — it handles more scale than most companies ever reach.
  • Design schemas based on your query patterns, not just the data model.
  • Store money as integer cents. Use TIMESTAMPTZ. Use UUID primary keys.
  • The scaling ladder: optimize → vertical scale → read replicas → caching → sharding.
  • Sharding is a last resort. It eliminates cross-shard JOINs and makes transactions between shards impossible.
  • ACID guarantees (especially Atomicity and Durability) are why relational databases power banking, e-commerce, and healthcare.

From the books

Designing Data-Intensive Applications — Martin Kleppmann (2017)

Part I: Foundations of Data Systems

The fundamentals of data systems (replication, partitioning, transactions) are more important than any specific database product. These principles apply across all systems.

The Art of PostgreSQL — Dimitri Fontaine (2019)

Chapter on Indexing and Query Optimization

PostgreSQL is more powerful than most engineers realize. Window functions, CTEs, and partial indexes can often replace application-layer complexity.

Ready to see how this works in the cloud?

Switch to Career Paths for structured paths (e.g. Developer, DevOps) and provider-specific lessons.

View role-based paths

Sign in to track your progress and mark lessons complete.

Discussion

Questions? Discuss in the community or start a thread below.

Join Discord

In-app Q&A

Sign in to start or join a thread.