Tables, keys, and relationships explained from zero. Learn how to model real data with primary and foreign keys, when to normalize, and when to break the rules.
You build a feature. A user signs up, places an order, the order has a few items, each item points to a product. Simple in your head, but the moment you try to store it, the questions start. Do you stuff the whole order into one row? What happens when the same product appears in a thousand orders? When the product price changes, do all the old orders silently change too?
These aren't database trivia. They're the difference between an app that stays correct as it grows and one that quietly corrupts itself. Relational modeling is the skill of taking the messy real world and laying it out as tables that reference each other, so every fact lives in exactly one place, and nothing can drift out of sync.
Who this is for
Anyone who can write a bit of code but has never deliberately designed a database. If you've used an ORM without knowing what it generates underneath, or you've felt unsure whether something is "one table or two," this is for you. No prior SQL required, we build it up from nothing.
The one-sentence definition (and a filing cabinet)
A relational database stores data as tables of rows, where each table holds one kind of thing, and rows point at each other using keys.
That's it. Everything else, keys, joins, normalization, is just the machinery that makes those pointers reliable. Before any syntax, get the mental picture right.
A labelled filing cabinet, one drawer for Customers, one for OrdersA table, each holds exactly one kind of record
A single sheet of paper in a drawerA row, one customer, one order
The fields printed on that sheet (name, email, date)Columns, the attributes of the record
A unique customer number stamped on every sheetA primary key, the row's permanent ID
An order sheet that writes "Customer #4471" instead of recopying the addressA foreign key, a reference to a row in another table
Think of a well-organized office, not a single giant spreadsheet.
The magic is in that last pair. An order sheet doesn't recopy the customer's name and address, it just *references* the customer by their number. Change the address once, in the Customers drawer, and every order automatically points at the new one. That single idea is what relational databases are built to protect.
The picture: users → orders → order_items → products
Here's a classic e-commerce model, small enough to hold in your head, complete enough to teach every concept in this article. Read it left to right: a user has orders, each order has line items, each line item points to a product.
Four tables, three relationships. order_items sits in the middle to connect orders and products.
Notice order_items carries *two* foreign keys, one to orders, one to products. That little table in the middle is doing something important: it turns the many-to-many relationship between orders and products into two clean one-to-many relationships. More on that pattern shortly.
1
List the nouns
Write down the real-world things your feature talks about: user, order, product. Each distinct noun is usually a table. If you can say "a ___," it's probably a row.
2
Give every table an identity
Add an `id` primary key to each table, a value that uniquely and permanently names one row. Auto-incrementing integers or UUIDs are both fine; the point is that it never changes and never repeats.
3
Draw the relationships
For each pair of tables ask: does one of these have many of the other? An order has many items; a user has many orders. The "many" side gets a foreign key pointing back to the "one" side.
4
Break up the many-to-many
An order has many products and a product appears in many orders. You can't express that with a single foreign key, so introduce a join table, `order_items`, that holds one row per (order, product) pairing.
5
Park each fact in one home
A product's price belongs on `products`, not copied onto every order item. A user's email belongs on `users`. If a fact is being written in two places, your model isn't finished.
The three relationship types
Almost every model is built from just three relationship shapes. The trick is recognizing which one you have and where the foreign key goes.
Relationship
Example
How to model it
1 : 1
A user has one profile
Foreign key on either table, marked `UNIQUE`. Often just keep the columns on the main table unless you have a strong reason to split (e.g. large optional fields).
1 : many
A user has many orders
Put the foreign key on the **many** side. `orders.user_id` references `users.id`. No join table needed.
many : many
Orders contain many products; products appear in many orders
Introduce a **join table** (`order_items`) with a foreign key to each side. Extra columns like `quantity` live there too.
Pick the shape, then place the key (or the join table).
Pro tip
The single most common beginner mistake is putting the foreign key on the wrong side of a 1:many. It always goes on the *many* side. "Many orders, one user" → `user_id` lives on `orders`. Say it out loud: the row that there are many of carries the reference.
Showing it in SQL: tables, keys, and a join
Here is the model as real CREATE TABLE statements. Read the keys carefully, PRIMARY KEY gives each row its identity, and REFERENCES is the foreign key that ties the tables together and stops you from inserting an order for a user who doesn't exist.
schema.sql
sql
-- Each table holds one kind of thing.
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL
);
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
);
-- An order belongs to exactly one user (the FK lives on the "many" side).
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- The join table: turns orders <-> products (many:many) into two 1:many links.
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id),
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
unit_price_cents INTEGER NOT NULL, -- price captured at purchase time
UNIQUE (order_id, product_id)
);
Now the payoff. Because the data is split across tables, you reassemble it on demand with a JOIN, following the foreign keys to stitch rows back together. This query answers "what did Ada actually buy?"
query.sql
sql
SELECT
u.name AS customer,
o.id AS order_id,
p.name AS product,
oi.quantity,
oi.unit_price_cents * oi.quantity AS line_total_cents
FROM orders o
JOIN users u ON u.id = o.user_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE u.email = 'ada@example.com'
ORDER BY o.created_at DESC;
One detail worth pausing on: order_items copies unit_price_cents instead of always joining to products.price_cents. That's deliberate. The product's *current* price can change, but an order should forever remember what the customer actually paid. Knowing when to copy a value is half of good modeling, and it's exactly what normalization is about.
Normalization in plain words: 1NF → 2NF → 3NF
Normalization sounds academic, but it's just a checklist for "is every fact stored in exactly one place?" You don't need to memorize the formal definitions, you need the intuition behind each step.
First Normal Form (1NF): one value per cell
No lists hiding inside a column. A tags column holding "sql,backend,data" violates 1NF, that's three values pretending to be one. The fix: give each value its own row (often in a related table). If you ever find yourself writing LIKE '%,sql,%' to search a column, you've found a 1NF violation.
Second Normal Form (2NF): no partial dependencies
Every non-key column must depend on the *whole* key, not just part of it. This only bites tables with a composite key. If order_items were keyed on (order_id, product_id) and you stored product_name there, that name depends only on product_id, half the key. It belongs on products. The fix: move it to the table whose key it actually depends on.
Third Normal Form (3NF): no transitive dependencies
No column should depend on another *non-key* column. If orders stored user_email, that email depends on user_id, which depends on the order, a chain. Change a user's email and every old order is suddenly stale. The fix: drop user_email from orders and reach it through the foreign key when you need it.
The pocket version
1NF: no lists in a cell. 2NF: depend on the whole key. 3NF: depend on nothing but the key. Folk summary, "every column depends on **the key, the whole key, and nothing but the key**." Hit 3NF and you've eliminated almost all of the duplication bugs that haunt junior schemas.
...and when to denormalize on purpose
Normalization optimizes for *correctness*; denormalization trades a little of that back for *speed*. If a dashboard joins six tables on every page load, you might cache a computed order_total_cents on orders, or copy a rarely-changing country_name to avoid a join. That's fine, as long as it's a deliberate choice with a plan to keep the copy in sync (a trigger, a job, or app code). The capture of unit_price_cents earlier is denormalization done right: it isn't stale data, it's a *different fact* (price-at-purchase vs. price-now). Normalize first, denormalize later with evidence, never the reverse.
Common mistakes that cost hours
No foreign keys. Storing user_id as a plain integer with no REFERENCES lets you insert orders for users who don't exist and orphan rows when a user is deleted. The constraint is free correctness, use it. Add an index on FK columns too; joins and lookups depend on it.
Storing CSV (or JSON blobs) in a column. A tags field of "a,b,c" feels easy until you need to filter, count, or rename a tag. It breaks 1NF, defeats indexes, and turns every query into string surgery. Give multi-valued data its own rows.
Over-normalizing. Splitting a single users.name into a separate names table, or factoring out a value that will never be shared, just adds joins for no benefit. Normalize where duplication causes drift, not everywhere mechanically.
Under-normalizing. The opposite sin: one giant orders table with product1_name, product2_name, product3_name columns. You'll run out of columns, can't query "who bought product X," and duplicate data everywhere. If you're numbering columns, you need another table.
No unique constraints. Two rows with the same email, or two order_items for the same (order, product) pair. The database can enforce "this combination happens at most once", say so with UNIQUE, don't hope your app remembers.
Premature denormalization. Copying values "for performance" before you've measured a problem. You inherit all the sync bugs and none of the proven speedup. Model clean first; optimize against real query plans.
Takeaways
The whole article in seven lines
A relational database is tables (one kind of thing each), rows, and columns, with rows pointing at each other via keys.
A **primary key** gives a row permanent identity; a **foreign key** is a reference to a row in another table.
1:1 → key on either side (UNIQUE). 1:many → key on the **many** side. many:many → a **join table** with two foreign keys.
JOINs reassemble split-up data by following foreign keys, that's why splitting is safe.
Normalization = every fact in one place. 1NF: no lists in a cell. 2NF: depend on the whole key. 3NF: nothing but the key.
Denormalize only deliberately, with a sync plan, capturing price-at-purchase is a different fact, not stale data.
Let the database enforce truth: foreign keys, `UNIQUE`, and `CHECK` constraints are free correctness.
Where to go next
You can now turn a feature description into a sane set of tables and relationships, the single most transferable backend skill there is. Two natural next steps: deciding *which kind* of database fits your data, and making your queries fast once the data grows.
SQL vs NoSQL, when a relational model is the right tool, and when a document or key-value store fits better.
Follow the full Backend Engineer path to see where data modeling sits alongside APIs, caching, and system design.
Want to go deeper?
This article covers concepts taught hands-on in the Cloud Engineer and DevOps career paths, with real terminal labs, production scenarios, and structured lessons.