Testing Backend Services: A Practical Strategy for APIs

On this page

Why most backend test suites lie to you
The mental model: bench-test the parts, road-test the car
The pyramid and the trophy
Four levels, side by side
A unit test: pure logic, no I/O
An integration test: real Postgres via Testcontainers
A contract test: don't break your consumers
Testing async workers and queues
What to mock, and what never to
Killing flaky tests
What coverage actually buys you
Common mistakes that cost hours
Takeaways
Where to go next

TL;DR

Stop chasing coverage numbers and test by level: fast unit tests for logic, real-Postgres integration tests via Testcontainers, and contract tests so you never break consumers. Know what to mock, what never to, and how to kill flaky tests.

Why most backend test suites lie to you

Who this is for

You ship a REST or gRPC service, you have *some* tests, and you are not sure they are the right ones. Maybe coverage is 85% but bugs still reach production. Maybe CI is red half the time for reasons no one understands. This article gives you a strategy: which tests to write, where to put the effort, what to mock, and how to make the suite trustworthy. Junior-friendly, but the trade-offs are real.

A test suite has exactly one job: tell you the truth about whether your code works. A green suite that misses real bugs is worse than no suite, because it gives false confidence. The most common failure mode in backend testing is a thousand unit tests that mock everything, so they pass even when the database, the serializer, and the HTTP layer are all broken.

The fix is not *more* tests. It is the *right shape* of tests, placed where they catch the bugs you actually ship. Let us build that shape from the ground up.

The mental model: bench-test the parts, road-test the car

A good test suite is a set of bets about where bugs live, and you should put your money where the failures actually happen.

Analogy	Test level
Bench-testing a single piston on a rig	Unit test: one pure function, no I/O, microseconds to run
Bolting the engine to a test stand and running it	Integration test: real DB, real repository, wired together
Two factories agreeing on the bolt-pattern spec	Contract test: producer and consumer agree on the API shape
Driving the finished car on a real road	End-to-end test: full system, real HTTP, real dependencies

Different test levels answer different questions; none of them replaces the others.

A piston that passes on the bench can still fail in the assembled engine: torqued to the wrong spec, fed the wrong fuel. That gap between *the part works* and *the whole works* is exactly where integration tests live, and it is the level most teams under-invest in.

The pyramid and the trophy

The classic test pyramid says: lots of fast unit tests at the bottom, fewer integration tests in the middle, a handful of slow end-to-end tests at the top. Cheap and fast where you can, expensive and slow only where you must.

The pyramid (bottom-heavy on units) vs. the trophy (weighted toward integration): same levels, different emphasis.

The testing trophy (popularized by Kent C. Dodds) reweights this. Types and static analysis form a nearly free base. Above that, the trophy argues that integration tests give the best confidence per dollar, so the *middle* should be the fattest layer, not the bottom. For backend services this is usually right: most real bugs live at the seams between your code and the database, not inside a single pure function.

Level	Speed	Confidence	Typical count
Static / types	Instant	Low (shape only)	Whole codebase
Unit	Microseconds	Low–medium	Hundreds
Integration	100ms–2s	High	Dozens–hundreds
Contract	Fast	High at boundaries	Per consumer
E2E	Seconds–minutes	Highest	A handful

Each level trades speed for confidence. Count and effort should follow the trophy: weight the middle.

1. Let types catch the dumb stuff. A TypeScript compile or a Python type-check rejects whole classes of bugs before a single test runs. This is your free first gate.
2. Unit-test the logic-dense, I/O-free code. Pricing rules, validators, parsers, state machines. Pure in, pure out, fast and deterministic.
3. Integration-test against a real database. Spin up real Postgres, run real migrations, hit the real repository. This is where the confidence is.
4. Contract-test every service boundary. If another team calls you, lock the shape with a contract so you cannot break them silently.
5. E2E-test only the critical user journeys. Sign up, check out, the one flow that must never break. A few, run in CI, kept ruthlessly stable.
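Step 1 catches more than typos. The order-status type below is a hypothetical illustration, not code from any service discussed here. It models the statuses as a discriminated union and adds a `never` check. With that in place, the compiler flags every switch that forgot a newly added state, before any test runs:

```typescript
// Hypothetical order states, modeled as a discriminated union on `kind`.
type OrderStatus =
  | { kind: "pending" }
  | { kind: "paid"; paidAtIso: string }
  | { kind: "refunded"; reason: string };

function statusLabel(status: OrderStatus): string {
  switch (status.kind) {
    case "pending":
      return "Awaiting payment";
    case "paid":
      return `Paid at ${status.paidAtIso}`;
    case "refunded":
      return `Refunded: ${status.reason}`;
    default: {
      // If someone adds { kind: "shipped" } to OrderStatus and forgets a case,
      // `status` is no longer `never` here and this assignment fails to compile.
      const unhandled: never = status;
      return unhandled;
    }
  }
}
```

No test has to exist for the compiler to reject the missing branch. That is what makes this gate free.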

Four levels, side by side

Before writing code, get crisp on what each level *is for*. The biggest waste in backend testing is writing an expensive integration test for something a unit test covers, or mocking the database in a test whose whole point is to exercise the database.

Type	Scope	Speed	What it catches	What you mock
Unit	One function/class	Microseconds	Logic errors, edge cases	Everything external
Integration	Code + real DB/queue	100ms–2s	SQL, mapping, migrations, transactions	3rd-party APIs only
Contract	API shape between services	Fast	Breaking changes at boundaries	The other service
E2E	The whole running system	Seconds+	Wiring, config, deploy issues	Almost nothing

Pick the cheapest level that actually catches the bug you are worried about.

Rule of thumb

If the bug you fear is in *your logic*, write a unit test. If it is in *how your code talks to the database*, write an integration test. If it is in *what you promise another service*, write a contract test. Match the test to the failure.

A unit test: pure logic, no I/O

Unit tests shine on code that is all decision and no I/O. Here is a discount calculator: no database, no network, just rules. Notice that each test names a specific behavior, and the last one pins down an edge case.

pricing.ts

typescript

export interface Cart {
  subtotalCents: number;
  couponCode?: string;
  isFirstOrder: boolean;
}

// Pure function: same input -> same output, no side effects.
export function discountCents(cart: Cart): number {
  let discount = 0;
  if (cart.couponCode === "SAVE10") {
    discount += Math.round(cart.subtotalCents * 0.1);
  }
  if (cart.isFirstOrder) {
    discount += 500; // flat $5 welcome credit
  }
  // Never discount more than the cart is worth.
  return Math.min(discount, cart.subtotalCents);
}

pricing.test.ts

typescript

import { describe, it, expect } from "vitest";
import { discountCents } from "./pricing";

describe("discountCents", () => {
  it("applies a 10% coupon", () => {
    expect(discountCents({ subtotalCents: 10_000, isFirstOrder: false, couponCode: "SAVE10" }))
      .toBe(1_000);
  });

  it("stacks the first-order credit on top of the coupon", () => {
    expect(discountCents({ subtotalCents: 10_000, isFirstOrder: true, couponCode: "SAVE10" }))
      .toBe(1_500);
  });

  it("never discounts more than the cart total", () => {
    // $3 cart, $5 welcome credit -> capped at $3.
    expect(discountCents({ subtotalCents: 300, isFirstOrder: true }))
      .toBe(300);
  });
});

These run in microseconds and never flake, because there is nothing to flake: no clock, no network, no shared state. That is the gold standard. The trick is keeping your logic *in functions like this* so it is unit-testable, instead of tangled inside a route handler.
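Here is a minimal sketch of what "not tangled inside a route handler" can look like. It repeats the `discountCents` rule so it runs standalone. The request/response shapes and the `parseCart`/`quoteHandler` names are hypothetical and are not a particular framework's API. The handler only parses, delegates, and shapes the response, so every real decision stays in pure, unit-testable functions:

```typescript
interface Cart {
  subtotalCents: number;
  couponCode?: string;
  isFirstOrder: boolean;
}

// Same pure rule as pricing.ts, repeated so this sketch is self-contained.
function discountCents(cart: Cart): number {
  let discount = 0;
  if (cart.couponCode === "SAVE10") discount += Math.round(cart.subtotalCents * 0.1);
  if (cart.isFirstOrder) discount += 500;
  return Math.min(discount, cart.subtotalCents);
}

// Hypothetical framework-agnostic request/response shapes.
interface HttpRequest {
  body: unknown;
}
interface HttpResponse {
  status: number;
  body: unknown;
}

// Validation is pure too, so it can be unit-tested on its own.
function parseCart(body: unknown): Cart | null {
  if (typeof body !== "object" || body === null) return null;
  const { subtotalCents, isFirstOrder, couponCode } = body as Record<string, unknown>;
  if (typeof subtotalCents !== "number" || !Number.isInteger(subtotalCents) || subtotalCents < 0) {
    return null;
  }
  if (typeof isFirstOrder !== "boolean") return null;
  if (couponCode !== undefined && typeof couponCode !== "string") return null;
  return { subtotalCents, isFirstOrder, couponCode: couponCode as string | undefined };
}

// The handler is glue: parse, delegate to pure logic, shape the response.
function quoteHandler(req: HttpRequest): HttpResponse {
  const cart = parseCart(req.body);
  if (cart === null) return { status: 400, body: { error: "invalid cart" } };
  const discount = discountCents(cart);
  return {
    status: 200,
    body: { discountCents: discount, totalCents: cart.subtotalCents - discount },
  };
}
```

Whatever framework you use, its route only has to call `quoteHandler`. The edge cases live in `parseCart` and `discountCents`, where microsecond tests can reach them.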

An integration test: real Postgres via Testcontainers

Here is the test that earns its keep. Instead of mocking the database (which would test your mock, not your SQL), we spin up a real Postgres in a throwaway Docker container, run real migrations, and exercise the real repository. Testcontainers makes this a few lines: it starts a container, hands you a connection string, and tears it down when the suite ends.

userRepo.integration.test.ts

typescript

import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { PostgreSqlContainer, StartedPostgreSqlContainer } from "@testcontainers/postgresql";
import { Pool } from "pg";
import { UserRepository } from "./userRepo";
import { runMigrations } from "./migrate";

let container: StartedPostgreSqlContainer;
let pool: Pool;
let repo: UserRepository;

beforeAll(async () => {
  // Real Postgres in Docker, disposable, isolated, identical to prod engine.
  container = await new PostgreSqlContainer("postgres:16-alpine").start();
  pool = new Pool({ connectionString: container.getConnectionUri() });
  await runMigrations(pool); // exercise the SAME migrations prod runs
  repo = new UserRepository(pool);
}, 60_000); // pulling the image the first time is slow

afterAll(async () => {
  await pool?.end();
  await container?.stop();
});

describe("UserRepository (real DB)", () => {
  it("persists and reads back a user", async () => {
    const created = await repo.create({ email: "ada@example.com", name: "Ada" });
    const found = await repo.findById(created.id);
    expect(found?.email).toBe("ada@example.com");
  });

  it("enforces the unique email constraint", async () => {
    await repo.create({ email: "dup@example.com", name: "One" });
    // This only fails if the real UNIQUE index exists; a mock would never catch it.
    await expect(repo.create({ email: "dup@example.com", name: "Two" }))
      .rejects.toThrow(/unique|duplicate/i);
  });
});

Why this beats a mocked DB

The unique-constraint test passes only if the real index exists, the migration ran, and the driver's error actually reaches the caller instead of being swallowed. A mocked repository would happily return success and ship the bug. This is the difference between testing your code and testing your assumptions about your code. For the transactional behavior these tests rely on, see database transactions and consistency.

Keep tests isolated from each other: wrap each test in a transaction you roll back, or truncate tables in a beforeEach. Sharing a container across the file is fine and fast; sharing *state* between tests is how you get order-dependent flakes. To get comfortable running Postgres in a container locally, try the docker lab.
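A minimal sketch of the truncate-per-test option follows. The table names are hypothetical. `Queryable` is a structural type defined here so the sketch runs without a database. pg's `Pool` has a compatible `query(text)` method, so in the real suite you would pass `pool`:

```typescript
// Anything with a query(text) method; pg's Pool and PoolClient should both fit.
interface Queryable {
  query(sql: string): Promise<unknown>;
}

// Hypothetical: the application tables your migrations create.
const APP_TABLES = ["users", "orders"] as const;

// One TRUNCATE for all tables. RESTART IDENTITY resets identity/serial
// sequences; CASCADE also empties tables that reference these via foreign keys.
function truncateSql(tables: readonly string[]): string {
  const quoted = tables.map((t) => `"${t.replace(/"/g, '""')}"`).join(", ");
  return `TRUNCATE TABLE ${quoted} RESTART IDENTITY CASCADE`;
}

async function resetDatabase(db: Queryable, tables: readonly string[] = APP_TABLES): Promise<void> {
  await db.query(truncateSql(tables));
}
```

In the integration file this becomes `beforeEach(() => resetDatabase(pool))`. One container serves the whole file, and every test still starts from empty tables. List only application tables, and leave out any bookkeeping table your migration tool keeps.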

A contract test: don't break your consumers

In a microservice world, the scary bugs are at the *boundaries*: you rename a JSON field, your tests stay green, and another team's service breaks in production. Consumer-driven contract testing (the Pact approach) flips the control: the *consumer* declares exactly what it needs, that expectation becomes a contract, and the *provider* is tested against it.

orderClient.pact.test.ts

typescript

import { describe, it, expect } from "vitest";
import { PactV3, MatchersV3 } from "@pact-foundation/pact";
import { getUser } from "./userClient";

const { like, integer } = MatchersV3;

const provider = new PactV3({
  consumer: "orders-service",
  provider: "users-service",
});

describe("users-service contract", () => {
  it("returns the fields orders-service depends on", async () => {
    provider
      .given("user 42 exists")
      .uponReceiving("a request for user 42")
      .withRequest({ method: "GET", path: "/users/42" })
      .willRespondWith({
        status: 200,
        // Match on TYPE and SHAPE, not exact values, we care about the contract.
        body: like({ id: integer(42), email: like("ada@example.com") }),
      });

    await provider.executeTest(async (mock) => {
      const user = await getUser(mock.url, 42);
      // The consumer only needs id + email, so those are all it asserts on.
      expect(user.id).toBe(42);
      expect(user.email).toBeDefined();
    });
  });
});

Running this generates a pact file describing the consumer's expectations. The provider's CI then verifies it can satisfy every published pact before it deploys. If the users-service tries to remove email, *its* build goes red, not the orders-service's production traffic. The contract becomes a tripwire on the boundary.

Contract tests are not E2E

A contract test never runs both services together: it tests each side against the agreed shape, in isolation. That is why it is fast and stable while still guarding the integration. Use a broker (PactFlow or a self-hosted Pact Broker) to share pacts between the two CI pipelines.
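The contract test imports `getUser` from `./userClient` but does not show it. One plausible shape, sketched here as an assumption, is a thin `fetch` wrapper that reads only the fields orders-service depends on. The injectable `fetchImpl` parameter is an addition for this sketch so it can run without a network. It defaults to the global `fetch`, so the Pact test's two-argument call still works:

```typescript
// Only the fields orders-service relies on; anything else in the response is ignored.
interface User {
  id: number;
  email: string;
}

// The minimal slice of the fetch API this sketch needs.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function getUser(baseUrl: string, id: number, fetchImpl: FetchLike = fetch): Promise<User> {
  const res = await fetchImpl(`${baseUrl}/users/${id}`);
  if (!res.ok) throw new Error(`users-service returned ${res.status}`);
  const { id: userId, email } = (await res.json()) as { id?: unknown; email?: unknown };
  // Tolerant reader: pick out what we need and fail loudly if it is missing.
  if (typeof userId !== "number" || typeof email !== "string") {
    throw new Error("users-service response is missing id or email");
  }
  return { id: userId, email };
}
```

Reading only `id` and `email` keeps the contract small. The provider can add fields freely, and only removing or retyping these two turns its build red.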

Testing async workers and queues

Queue workers are notoriously under-tested because the result is not in the HTTP response; it is a side effect that happens *later*. Two layers cover them. First, unit-test the handler directly with a fake message, since it is just a function. Second, integration-test the round trip against a real broker (Testcontainers runs RabbitMQ, Kafka, or Redis the same way it runs Postgres).

emailWorker.test.ts

typescript

import { describe, it, expect, vi } from "vitest";
import { handleOrderPlaced } from "./emailWorker";

it("sends one confirmation email per order event", async () => {
  // The email provider is a 3rd-party boundary -> fake it.
  const mailer = { send: vi.fn().mockResolvedValue({ id: "msg_1" }) };

  await handleOrderPlaced({ orderId: "o_99", email: "ada@example.com" }, { mailer });

  expect(mailer.send).toHaveBeenCalledOnce();
  expect(mailer.send).toHaveBeenCalledWith(
    expect.objectContaining({ to: "ada@example.com" }),
  );
});

it("is idempotent: replaying the same event sends only one email", async () => {
  const mailer = { send: vi.fn().mockResolvedValue({ id: "msg_1" }) };
  // Fresh orderId so any dedupe state left by the previous test cannot interfere.
  const event = { orderId: "o_100", email: "ada@example.com" };

  await handleOrderPlaced(event, { mailer });
  await handleOrderPlaced(event, { mailer }); // at-least-once delivery happens!

  expect(mailer.send).toHaveBeenCalledOnce(); // dedupe must hold
});

Test idempotency explicitly

Real queues deliver at-least-once, so duplicate messages WILL happen. The second test above is the one that saves you at 2am. It proves that replaying an event does not send a second email, and the same pattern proves a payment handler does not charge twice. If your worker is not idempotent, this test is how you find out before production does.
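The worker itself is not shown above. The following is a minimal sketch of one way `handleOrderPlaced` could satisfy both tests, assuming events are deduplicated by `orderId`. The `processed` store is hypothetical and in-memory here. Production code would need something durable and shared, such as a table with a unique key on the event ID:

```typescript
interface OrderPlaced {
  orderId: string;
  email: string;
}

interface Mailer {
  send(msg: { to: string; subject: string }): Promise<{ id: string }>;
}

// Hypothetical default store. In production, use a durable shared store
// (e.g. a table with a UNIQUE key on orderId) so dedupe survives restarts.
const defaultProcessed = new Set<string>();

async function handleOrderPlaced(
  event: OrderPlaced,
  deps: { mailer: Mailer; processed?: Set<string> },
): Promise<void> {
  const processed = deps.processed ?? defaultProcessed;
  // Already handled: a redelivered event becomes a no-op.
  if (processed.has(event.orderId)) return;
  await deps.mailer.send({ to: event.email, subject: `Order ${event.orderId} confirmed` });
  // Mark only after a successful send, so a failed send can be retried.
  // Trade-off: a crash between send and mark can still cause one duplicate,
  // and two concurrent deliveries can both pass the check above.
  processed.add(event.orderId);
}
```

In tests, pass a fresh `processed` set so that dedupe state cannot leak from one test into the next.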

What to mock, and what never to

This is the debate that derails the most code reviews. The honest answer: mock at boundaries you do not own; use the real thing for boundaries you do. A *fake* (a working in-memory stand-in) is usually better than a *mock* (a scripted expectation), because a fake tests behavior while a mock tests that you called it a certain way, which couples your test to your implementation.
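Here is the fake-versus-mock distinction in code: a minimal sketch of an in-memory fake mailer. The `Mailer` interface and the `sendWelcome` function are hypothetical. A test built on this fake asserts on what ended up in the outbox (behavior), not on how `send` was called. It keeps passing through internal refactors of `sendWelcome`, as long as the same email goes out:

```typescript
interface Email {
  to: string;
  subject: string;
}

interface Mailer {
  send(email: Email): Promise<{ id: string }>;
}

// A fake: a small working implementation that records what it "sent".
class InMemoryMailer implements Mailer {
  readonly outbox: Email[] = [];

  async send(email: Email): Promise<{ id: string }> {
    this.outbox.push(email);
    return { id: `msg_${this.outbox.length}` };
  }

  // Convenience query for tests: everything sent to one address.
  sentTo(address: string): Email[] {
    return this.outbox.filter((e) => e.to === address);
  }
}

// Hypothetical code under test.
async function sendWelcome(mailer: Mailer, address: string): Promise<void> {
  await mailer.send({ to: address, subject: "Welcome aboard" });
}
```

The same fake can back many tests, and even local development. That reuse is where fakes pay off over per-test `vi.fn()` scripts.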

Dependency	Mock it?	Why
Your own database	No, use a real one	Testcontainers makes it cheap; mocks hide SQL bugs
3rd-party payment / email API	Yes, fake or mock	Slow, costs money, you don't control it
Another internal service	Use a contract	Pact guards the shape without running both
The system clock	Yes, inject it	Real time makes tests flaky and non-deterministic
Randomness / UUIDs	Yes, seed or inject	Determinism beats realism in tests

The default. Deviate only with a reason you can say out loud.
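For the clock and randomness rows, "inject it" can be as small as the sketch below. The `Clock` interface, `FixedClock`, `sequentialIds`, and the coupon-expiry rule are hypothetical illustrations. When you cannot change a signature, Vitest's fake timers (`vi.useFakeTimers()` together with `vi.setSystemTime()`) are an alternative:

```typescript
// Production code depends on this interface, never on Date directly.
interface Clock {
  now(): Date;
}

const systemClock: Clock = { now: () => new Date() };

// Test double: time only moves when the test says so.
class FixedClock implements Clock {
  private current: Date;
  constructor(start: Date) {
    this.current = new Date(start.getTime());
  }
  now(): Date {
    return new Date(this.current.getTime());
  }
  advance(ms: number): void {
    this.current = new Date(this.current.getTime() + ms);
  }
}

// Same idea for IDs: production passes something like crypto.randomUUID,
// tests pass a deterministic sequence.
type IdGenerator = () => string;
function sequentialIds(prefix: string): IdGenerator {
  let n = 0;
  return () => `${prefix}_${++n}`;
}

// Hypothetical rule that would be flaky if it read the real clock.
function isCouponExpired(expiresAt: Date, clock: Clock = systemClock): boolean {
  return clock.now().getTime() >= expiresAt.getTime();
}
```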

Do not mock your own database

A mocked DB tests your understanding of the database, not the database. Constraints, cascades, transaction isolation, JSON column quirks, and migration drift all slip through. If a test's whole purpose is the persistence layer, mocking the persistence layer makes the test meaningless.

Killing flaky tests

A flaky test, one that passes and fails on identical code, is more dangerous than a missing test, because it trains your team to ignore red CI. Once people start re-running until green, the suite has stopped telling the truth. Almost every flake comes from one of three sources.

Time. Tests that depend on Date.now(), sleep, timeouts, or ordering by timestamp. Fix: inject the clock and control it. Never sleep to wait; poll for the condition instead.
Ordering. Tests that pass alone but fail in a suite because they assume the order they run in. Fix: make each test set up and tear down its own state; never rely on a previous test's leftovers.
Shared state. A reused database row, a global singleton, a cached connection, a leaked port. Fix: isolate each test with transaction-per-test rollback, fresh fixtures, and unique identifiers per test run.
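"Poll for the condition" looks like this in practice. `waitFor` is a helper written for this sketch; recent Vitest versions ship a similar `vi.waitFor`. It returns as soon as the condition holds, so the common case is fast. If the condition never holds, it fails with a clear error at a deadline instead of sleeping a fixed, guessed amount:

```typescript
interface WaitOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

// Re-check `condition` until it returns true or the deadline passes.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5_000, intervalMs = 25 }: WaitOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!(await condition())) {
    if (Date.now() >= deadline) {
      throw new Error(`waitFor: condition still false after ${timeoutMs}ms`);
    }
    // A short pause between checks, not a guess at how long the work takes.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a worker integration test, you would poll something observable, such as the row the worker writes, rather than sleeping for a guessed duration.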

Quarantine, then fix

When a flake appears, tag it and move it out of the blocking path immediately so it stops eroding trust, then fix it on a deadline. A flaky test left in the gate teaches everyone to ignore failures, which defeats the entire point of having tests.
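One way to quarantine with Vitest is sketched below. It assumes a naming convention invented for this example (rename a flaky file to `*.quarantine.test.ts`) and a hypothetical `RUN_QUARANTINE` environment variable. The blocking CI job excludes those files. A separate, non-blocking job runs only them until each one is fixed:

```typescript
// vitest.config.ts
import { defineConfig, configDefaults } from "vitest/config";

const QUARANTINE_GLOB = "**/*.quarantine.test.ts";

export default defineConfig({
  test: {
    // configDefaults.exclude keeps Vitest's built-in excludes (node_modules etc.).
    // Blocking job: also skip quarantined files. Non-blocking job: set RUN_QUARANTINE=1.
    exclude: process.env.RUN_QUARANTINE
      ? configDefaults.exclude
      : [...configDefaults.exclude, QUARANTINE_GLOB],
  },
});
```

The non-blocking job could run `RUN_QUARANTINE=1 npx vitest run quarantine`, since Vitest's positional arguments filter test files by path.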

What coverage actually buys you

Coverage measures which lines *ran* during tests, not whether anything was *asserted*. You can hit 100% coverage with zero expect calls. So treat coverage as a smoke detector, not a goal: a sudden drop tells you a new code path is untested; a high number tells you almost nothing about quality.

Coverage tells you what you did NOT test. It cannot tell you whether what you did test was worth testing.
the only honest framing of code coverage

Chasing a coverage *target* produces low-value tests written to color in lines: getters, trivial mappers, generated code. Better signals: does the suite catch a deliberately introduced bug (mutation testing), and do the critical paths have integration coverage? A 70% suite weighted toward integration on the money-making flows beats a 95% suite of mocked unit trivia every time.
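The mutation-testing idea fits in a few lines. In this hand-rolled sketch, we make one small "mutant" change to a simplified version of the discount cap and check which suite notices. Real tools such as StrykerJS automate this across a codebase. Both suites execute every line of the original, so both would report full coverage. Only the suite that asserts on results kills the mutant:

```typescript
type CappedCredit = (subtotalCents: number, isFirstOrder: boolean) => number;

// Simplified from discountCents: a $5 first-order credit, capped at the cart total.
const original: CappedCredit = (subtotal, first) => Math.min(first ? 500 : 0, subtotal);

// The kind of one-token change a mutation tool generates: min -> max.
const mutant: CappedCredit = (subtotal, first) => Math.max(first ? 500 : 0, subtotal);

type Suite = (impl: CappedCredit) => boolean;

// Runs every line, asserts nothing: 100% coverage, zero protection.
const coverageOnly: Suite = (impl) => {
  impl(300, true);
  impl(10_000, false);
  return true;
};

// Same calls, but checks the results, including the cap edge case.
const asserting: Suite = (impl) => impl(300, true) === 300 && impl(10_000, false) === 0;

// A mutant is "killed" when the suite passes on the original but fails on the mutant.
function kills(suite: Suite): boolean {
  return suite(original) && !suite(mutant);
}
```

`kills(coverageOnly)` is false: the mutant survives despite full coverage. `kills(asserting)` is true, because the `$3 cart` case returns 500 under the mutant.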

Common mistakes that cost hours

1. Mocking the database in tests whose whole point is persistence: you end up testing the mock.
2. No integration layer at all: hundreds of mocked unit tests, green CI, and bugs at every seam.
3. Asserting on implementation, not behavior: checking *that a method was called* instead of *what the system did*. Refactors break these tests even when behavior is correct.
4. `sleep` instead of polling: fixed waits are slow when long and flaky when short. Poll for the condition.
5. Shared mutable fixtures: one giant seed file every test depends on; change it and 50 tests break for unrelated reasons.
6. E2E for everything: pushing logic checks up to the slowest, flakiest level because the unit seam was never built.
7. Treating coverage % as the goal: writing tests to color in lines instead of to catch bugs.
8. Ignoring idempotency on workers: at-least-once delivery means duplicates; if you never test a replay, production finds the bug.

Takeaways

The whole article in nine lines

A test suite has one job: tell the truth about whether your code works.
Follow the trophy, not just the pyramid: weight the middle, because integration tests give the best confidence per dollar.
Unit-test pure logic; integration-test how your code talks to the DB; contract-test what you promise other services; E2E only critical journeys.
Use a real Postgres via Testcontainers; never mock your own database.
Mock boundaries you do not own (payments, email, the clock); prefer fakes over mocks.
Consumer-driven contracts (Pact) catch breaking changes without running both services together.
Test worker idempotency explicitly: at-least-once delivery means duplicates will eventually arrive.
Kill flakes at the source: time, ordering, shared state. Quarantine, then fix.
Coverage is a smoke detector, not a goal: it shows what you did NOT test, not whether your tests are worth running.

Where to go next

Testing is one half of a feedback loop; the other half is running it automatically on every change. Wire your unit, integration, and contract tests into a pipeline so a broken seam goes red before it merges, and add security scanning while you are there.

Run your suite on every push in the CI/CD lab, and gate merges on green.
Get comfortable spinning up real dependencies in the docker lab; Docker is the foundation Testcontainers builds on.
Understand the transactional behavior your integration tests rely on: database transactions and consistency.
Add automated security testing to the same pipeline: DevSecOps: SAST, DAST, SCA.

Start small: pick your most important endpoint, write one integration test against a real database, and put it in CI today. One honest test that exercises the real seam is worth more than a hundred mocks that confirm what you already believed.


Frequently asked questions

Why can a suite with high coverage still miss real bugs?

The most common failure mode is a thousand unit tests that mock everything, so they pass even when the database, the serializer, and the HTTP layer are all broken. A green suite that misses real bugs is worse than no suite because it gives false confidence. The fix is the right shape of tests, not simply more of them.

How is the testing trophy different from the test pyramid?

The pyramid says lots of fast unit tests at the bottom, fewer integration tests in the middle, and a handful of slow end-to-end tests at the top. The testing trophy reweights this so the integration layer is the fattest, arguing it gives the best confidence per dollar. For backend services that is usually right, since most real bugs live at the seams between your code and the database.

How do I decide which kind of test to write?

Match the test to the failure you fear. If the bug is in your logic, write a unit test; if it is in how your code talks to the database, write an integration test; if it is in what you promise another service, write a contract test.

How can I run integration tests against a real database?

Use Testcontainers to spin up a real Postgres for the test, rather than mocking the database in a test whose whole point is to exercise the database. This catches bugs at the seam between your code and the data store that mocked tests would miss.

