Databases 101: Relational Modeling

On this page

Your data has a shape, your job is to find it
The one-sentence definition (and a filing cabinet)
The picture: users → orders → order_items → products
The three relationship types
Showing it in SQL: tables, keys, and a join
Normalization in plain words: 1NF → 2NF → 3NF
Common mistakes that cost hours
Takeaways
Where to go next

TL;DR

Model real data with tables, primary keys, and foreign keys, and recognize the three relationship types. Normalize through 1NF to 3NF to kill duplication, and know the cases where breaking the rules is the right call.

Your data has a shape, your job is to find it

You build a feature. A user signs up, places an order, the order has a few items, each item points to a product. Simple in your head, but the moment you try to store it, the questions start. Do you stuff the whole order into one row? What happens when the same product appears in a thousand orders? When the product price changes, do all the old orders silently change too?

These aren't database trivia. They're the difference between an app that stays correct as it grows and one that quietly corrupts itself. Relational modeling is the skill of taking the messy real world and laying it out as tables that reference each other, so every fact lives in exactly one place, and nothing can drift out of sync.

Who this is for

Anyone who can write a bit of code but has never deliberately designed a database. If you've used an ORM without knowing what it generates underneath, or you've felt unsure whether something is "one table or two," this is for you. No prior SQL required, we build it up from nothing.

The one-sentence definition (and a filing cabinet)

A relational database stores data as tables of rows, where each table holds one kind of thing, and rows point at each other using keys.
The whole idea, in one line

That's it. Everything else, keys, joins, normalization, is just the machinery that makes those pointers reliable. Before any syntax, get the mental picture right.

A labelled filing cabinet, one drawer for Customers, one for OrdersA table, each holds exactly one kind of record

A single sheet of paper in a drawerA row, one customer, one order

The fields printed on that sheet (name, email, date)Columns, the attributes of the record

A unique customer number stamped on every sheetA primary key, the row's permanent ID

An order sheet that writes "Customer #4471" instead of recopying the addressA foreign key, a reference to a row in another table

Think of a well-organized office, not a single giant spreadsheet.

The magic is in that last pair. An order sheet doesn't recopy the customer's name and address, it just *references* the customer by their number. Change the address once, in the Customers drawer, and every order automatically points at the new one. That single idea is what relational databases are built to protect.

The picture: users → orders → order_items → products

Here's a classic e-commerce model, small enough to hold in your head, complete enough to teach every concept in this article. Read it left to right: a user has orders, each order has line items, each line item points to a product.

Four tables, three relationships. order_items sits in the middle to connect orders and products.

Notice order_items carries *two* foreign keys, one to orders, one to products. That little table in the middle is doing something important: it turns the many-to-many relationship between orders and products into two clean one-to-many relationships. More on that pattern shortly.

1
List the nouns
Write down the real-world things your feature talks about: user, order, product. Each distinct noun is usually a table. If you can say "a ___," it's probably a row.
2
Give every table an identity
Add an id primary key to each table, a value that uniquely and permanently names one row. Auto-incrementing integers or UUIDs are both fine; the point is that it never changes and never repeats.
3
Draw the relationships
For each pair of tables ask: does one of these have many of the other? An order has many items; a user has many orders. The "many" side gets a foreign key pointing back to the "one" side.
4
Break up the many-to-many
An order has many products and a product appears in many orders. You can't express that with a single foreign key, so introduce a join table, order_items, that holds one row per (order, product) pairing.
5
Park each fact in one home
A product's price belongs on products, not copied onto every order item. A user's email belongs on users. If a fact is being written in two places, your model isn't finished.

The three relationship types

Almost every model is built from just three relationship shapes. The trick is recognizing which one you have and where the foreign key goes.

Relationship	Example	How to model it
1 : 1	A user has one profile	Foreign key on either table, marked `UNIQUE`. Often just keep the columns on the main table unless you have a strong reason to split (e.g. large optional fields).
1 : many	A user has many orders	Put the foreign key on the many side. `orders.user_id` references `users.id`. No join table needed.
many : many	Orders contain many products; products appear in many orders	Introduce a join table (`order_items`) with a foreign key to each side. Extra columns like `quantity` live there too.

Pick the shape, then place the key (or the join table).

Pro tip

The single most common beginner mistake is putting the foreign key on the wrong side of a 1:many. It always goes on the *many* side. "Many orders, one user" → user_id lives on orders. Say it out loud: the row that there are many of carries the reference.

Showing it in SQL: tables, keys, and a join

Here is the model as real CREATE TABLE statements. Read the keys carefully, PRIMARY KEY gives each row its identity, and REFERENCES is the foreign key that ties the tables together and stops you from inserting an order for a user who doesn't exist.

schema.sql

sql

-- Each table holds one kind of thing.
CREATE TABLE users (
  id    BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  name  TEXT NOT NULL
);

CREATE TABLE products (
  id          BIGSERIAL PRIMARY KEY,
  name        TEXT NOT NULL,
  price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
);

-- An order belongs to exactly one user (the FK lives on the "many" side).
CREATE TABLE orders (
  id         BIGSERIAL PRIMARY KEY,
  user_id    BIGINT NOT NULL REFERENCES users(id),
  status     TEXT NOT NULL DEFAULT 'pending',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- The join table: turns orders <-> products (many:many) into two 1:many links.
CREATE TABLE order_items (
  id              BIGSERIAL PRIMARY KEY,
  order_id        BIGINT NOT NULL REFERENCES orders(id),
  product_id      BIGINT NOT NULL REFERENCES products(id),
  quantity        INTEGER NOT NULL CHECK (quantity > 0),
  unit_price_cents INTEGER NOT NULL,   -- price captured at purchase time
  UNIQUE (order_id, product_id)
);

Now the payoff. Because the data is split across tables, you reassemble it on demand with a JOIN, following the foreign keys to stitch rows back together. This query answers "what did Ada actually buy?"

query.sql

sql

SELECT
  u.name           AS customer,
  o.id             AS order_id,
  p.name           AS product,
  oi.quantity,
  oi.unit_price_cents * oi.quantity AS line_total_cents
FROM orders o
JOIN users        u  ON u.id  = o.user_id
JOIN order_items  oi ON oi.order_id = o.id
JOIN products     p  ON p.id  = oi.product_id
WHERE u.email = 'ada@example.com'
ORDER BY o.created_at DESC;

One detail worth pausing on: order_items copies unit_price_cents instead of always joining to products.price_cents. That's deliberate. The product's *current* price can change, but an order should forever remember what the customer actually paid. Knowing when to copy a value is half of good modeling, and it's exactly what normalization is about.

Normalization in plain words: 1NF → 2NF → 3NF

Normalization sounds academic, but it's just a checklist for "is every fact stored in exactly one place?" You don't need to memorize the formal definitions, you need the intuition behind each step.

First Normal Form (1NF): one value per cell

No lists hiding inside a column. A tags column holding "sql,backend,data" violates 1NF, that's three values pretending to be one. The fix: give each value its own row (often in a related table). If you ever find yourself writing LIKE '%,sql,%' to search a column, you've found a 1NF violation.

Second Normal Form (2NF): no partial dependencies

Every non-key column must depend on the *whole* key, not just part of it. This only bites tables with a composite key. If order_items were keyed on (order_id, product_id) and you stored product_name there, that name depends only on product_id, half the key. It belongs on products. The fix: move it to the table whose key it actually depends on.

Third Normal Form (3NF): no transitive dependencies

No column should depend on another *non-key* column. If orders stored user_email, that email depends on user_id, which depends on the order, a chain. Change a user's email and every old order is suddenly stale. The fix: drop user_email from orders and reach it through the foreign key when you need it.

The pocket version

1NF: no lists in a cell. 2NF: depend on the whole key. 3NF: depend on nothing but the key. Folk summary, "every column depends on the key, the whole key, and nothing but the key." Hit 3NF and you've eliminated almost all of the duplication bugs that haunt junior schemas.

...and when to denormalize on purpose

Normalization optimizes for *correctness*; denormalization trades a little of that back for *speed*. If a dashboard joins six tables on every page load, you might cache a computed order_total_cents on orders, or copy a rarely-changing country_name to avoid a join. That's fine, as long as it's a deliberate choice with a plan to keep the copy in sync (a trigger, a job, or app code). The capture of unit_price_cents earlier is denormalization done right: it isn't stale data, it's a *different fact* (price-at-purchase vs. price-now). Normalize first, denormalize later with evidence, never the reverse.

Common mistakes that cost hours

1No foreign keys. Storing user_id as a plain integer with no REFERENCES lets you insert orders for users who don't exist and orphan rows when a user is deleted. The constraint is free correctness, use it. Add an index on FK columns too; joins and lookups depend on it.
2Storing CSV (or JSON blobs) in a column. A tags field of "a,b,c" feels easy until you need to filter, count, or rename a tag. It breaks 1NF, defeats indexes, and turns every query into string surgery. Give multi-valued data its own rows.
3Over-normalizing. Splitting a single users.name into a separate names table, or factoring out a value that will never be shared, just adds joins for no benefit. Normalize where duplication causes drift, not everywhere mechanically.
4Under-normalizing. The opposite sin: one giant orders table with product1_name, product2_name, product3_name columns. You'll run out of columns, can't query "who bought product X," and duplicate data everywhere. If you're numbering columns, you need another table.
5No unique constraints. Two rows with the same email, or two order_items for the same (order, product) pair. The database can enforce "this combination happens at most once", say so with UNIQUE, don't hope your app remembers.
6Premature denormalization. Copying values "for performance" before you've measured a problem. You inherit all the sync bugs and none of the proven speedup. Model clean first; optimize against real query plans.

Takeaways

The whole article in seven lines

A relational database is tables (one kind of thing each), rows, and columns, with rows pointing at each other via keys.
A primary key gives a row permanent identity; a foreign key is a reference to a row in another table.
1:1 → key on either side (UNIQUE). 1:many → key on the many side. many:many → a join table with two foreign keys.
JOINs reassemble split-up data by following foreign keys, that's why splitting is safe.
Normalization = every fact in one place. 1NF: no lists in a cell. 2NF: depend on the whole key. 3NF: nothing but the key.
Denormalize only deliberately, with a sync plan, capturing price-at-purchase is a different fact, not stale data.
Let the database enforce truth: foreign keys, UNIQUE, and CHECK constraints are free correctness.

Where to go next

You can now turn a feature description into a sane set of tables and relationships, the single most transferable backend skill there is. Two natural next steps: deciding *which kind* of database fits your data, and making your queries fast once the data grows.

SQL vs NoSQL, when a relational model is the right tool, and when a document or key-value store fits better.
Database Indexing & Query Performance, how to make the JOINs you just learned run fast at scale.
Follow the full Backend Engineer path to see where data modeling sits alongside APIs, caching, and system design.

You are modeling a new relationship between two tables. Which shape do you have, and where does the foreign key go?

Check your understanding

1. In the e-commerce model, where does the foreign key go for the relationship between users and orders?

2. Why does the order_items table carry two foreign keys?

Frequently asked questions

What is relational modeling in one sentence?

A relational database stores data as tables of rows, where each table holds one kind of thing and rows point at each other using keys. Relational modeling is the skill of laying out the messy real world as those tables so every fact lives in exactly one place.

Why not just store a customer's name and address on every order?

Recopying the same fact across many rows lets it drift out of sync, so when the address changes the old orders silently disagree. Instead an order references the customer by their id, so changing the address once updates what every order points to.

On which side of a one-to-many relationship does the foreign key go?

It always goes on the many side. For "many orders, one user," the user_id column lives on the orders table: the row there are many of carries the reference.

What does the order_items table accomplish in the e-commerce example?

It carries two foreign keys, one to orders and one to products, which turns the many-to-many relationship between orders and products into two clean one-to-many relationships. That middle table is what keeps the model tidy as products appear across many orders.

Was this article helpful?

Want to go deeper?

This article covers concepts taught hands-on in the Cloud Engineer and DevOps career paths, with real terminal labs, production scenarios, and structured lessons.

Explore Career Paths Try the Labs

Keep reading

Cloud

Managed Databases: Relational vs NoSQL in the Cloud

Read

Backend

What Is a Backend Engineer?

Read

Backend

How the Web Works: HTTP Requests

Read

Databases 101: Relational Modeling

01Your data has a shape, your job is to find it

02The one-sentence definition (and a filing cabinet)

03The picture: users → orders → order_items → products

04The three relationship types

05Showing it in SQL: tables, keys, and a join

06Normalization in plain words: 1NF → 2NF → 3NF

First Normal Form (1NF): one value per cell

Second Normal Form (2NF): no partial dependencies

Third Normal Form (3NF): no transitive dependencies

...and when to denormalize on purpose

07Common mistakes that cost hours

08Takeaways

09Where to go next

Frequently asked questions

Want to go deeper?

Managed Databases: Relational vs NoSQL in the Cloud

What Is a Backend Engineer?

How the Web Works: HTTP Requests

Your data has a shape, your job is to find it

The one-sentence definition (and a filing cabinet)

The picture: users → orders → order_items → products

The three relationship types

Showing it in SQL: tables, keys, and a join

Normalization in plain words: 1NF → 2NF → 3NF

Common mistakes that cost hours

Takeaways

Where to go next