Frontend · 13 min read · Jun 2026

Frontend Testing: Confidence Without Brittleness

How to test UIs so your suite catches real bugs instead of breaking on every refactor: the testing trophy, behavior over implementation, component tests, and end-to-end with Playwright.

Frontend · Testing · Playwright · Quality

Sri Balaji

Founder · TheSimplifiedTech

The suite that broke on every refactor

Here is a story you have probably lived. Your team has 600 frontend tests and a green checkmark on every pull request. Then one day you rename a prop from onSubmit to onConfirm and clean up some internal state: a pure refactor with zero behavior change. Forty tests go red. They were asserting that a useState hook held a certain value, that a child got a specific prop, that a function was called with exact arguments. None of those things matter to a user. You spend the afternoon updating tests to match the new internals.

The cruel twist: a week earlier, that same 600-test suite shipped a checkout button that did nothing when clicked. The handler was wired to the wrong element. Every test passed. The suite was loud about things that did not matter and silent about the one thing that did.

That is the central tension of frontend testing. Tests coupled to how a component is built break constantly and protect nothing. Tests coupled to what the user experiences survive refactors and catch the bugs that reach production. This article is about deliberately writing the second kind.

Who this is for

Frontend engineers who write tests but feel they are paying a tax without getting safety in return. You know the tools (Jest, Vitest, Testing Library, Playwright) but want a model for **what** to test, at **which** layer, and what to leave alone. Comfort with React and TypeScript is assumed; the principles apply to any component framework.

The one principle that fixes most of it

Test what the user does, not how the component is built. The more your tests resemble the way your software is used, the more confidence they give you.
The Testing Library philosophy, paraphrased

Every decision downstream (which queries to use, what to mock, where to draw the line) falls out of this one idea. A user does not know your component uses a reducer. They click a button labeled "Add to cart" and expect the cart count to go up. So your test should find the button by its label, click it, and assert the count went up. If you can swap the reducer for a signal and the test still passes, the test was measuring the right thing.

| Verifying a car | Verifying a component |
| --- | --- |
| Test-driving the car: turn the key, it starts; press the brake, it stops | Behavior test: render the UI, click the button, assert what the user now sees |
| Opening the hood to check the exact spark-plug part number | Implementation test: assert a specific hook value or internal prop name |
| A new model with a redesigned engine still passes the test-drive | A refactored component still passes the behavior test |
| The part-number check fails the moment the supplier changes, even if the car drives fine | The internals test fails on every refactor, even when nothing user-visible broke |

Two ways to verify a car works: one survives every redesign, one breaks on a new model year.
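
The same idea can be shown without any framework. This is a minimal sketch, assuming a hypothetical `Cart` interface with two interchangeable implementations (`createReducerCart` and `createClosureCart` are illustrative names, not from any library). One behavior check passes against both, because it only touches what a caller can observe.

```typescript
// Hypothetical public surface: the only thing a behavior check may touch.
interface Cart {
  add(productName: string): void;
  label(): string; // what the user would read, e.g. "Cart: 1 item"
}

function formatLabel(count: number): string {
  return `Cart: ${count} ${count === 1 ? "item" : "items"}`;
}

// Implementation A: reducer-style state transitions.
type CartState = { items: string[] };
type CartAction = { type: "add"; productName: string };

function cartReducer(state: CartState, action: CartAction): CartState {
  // Only one action exists in this sketch.
  return { items: [...state.items, action.productName] };
}

function createReducerCart(): Cart {
  let state: CartState = { items: [] };
  return {
    add: (productName) => {
      state = cartReducer(state, { type: "add", productName });
    },
    label: () => formatLabel(state.items.length),
  };
}

// Implementation B: a plain counter in a closure (the "refactor").
function createClosureCart(): Cart {
  let count = 0;
  return {
    add: () => {
      count += 1;
    },
    label: () => formatLabel(count),
  };
}

// One behavior check, written once, run against both implementations.
function behavesLikeACart(makeCart: () => Cart): boolean {
  const cart = makeCart();
  if (cart.label() !== "Cart: 0 items") return false;
  cart.add("Wireless Mouse");
  return cart.label() === "Cart: 1 item";
}

const results = [createReducerCart, createClosureCart].map(behavesLikeACart);
console.log(results); // both entries are true
```

A check that instead inspected `state.items` would compile against implementation A only, which is exactly the coupling that makes refactors painful.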

The testing trophy: where effort should go

The old advice was a pyramid: a wide base of unit tests, fewer integration tests, a thin cap of end-to-end. For frontends, that shape underinvests in the layer where most user value, and most bugs, actually live: components wired together. The modern shape is the testing trophy: a sliver of static checks, a modest band of pure-unit tests, a fat middle of component/integration tests, and a focused crown of end-to-end.

Layers of the trophy, from fastest and cheapest to slowest and most expensive:

  • Static: TypeScript, ESLint
  • Unit: pure functions, hooks
  • Component / Integration: render + interact, the bulk
  • End-to-End: critical flows in a real browser

The testing trophy. Width hints at relative volume: cheap static checks at the base, a fat component layer in the middle, a focused E2E crown on top.

Read the trophy from base to crown as a cost gradient. Static analysis is nearly free and runs as you type: a typo or a wrong prop type is caught before a test ever executes. Unit tests cover logic with no DOM: a currency formatter, a date helper, a reducer. Component tests render real UI in a simulated DOM (jsdom) and interact with it; this is where you spend most of your effort, because it most resembles real use while staying fast. End-to-end tests drive a real browser through a whole flow. They are slow and occasionally flaky, so reserve them for the handful of journeys that must never break.
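
To make the static layer concrete, here is a small sketch of a bug class the type checker catches with no test code. The `RequestState` type and `statusMessage` function are hypothetical names chosen for illustration; the pattern itself (a discriminated union plus an exhaustiveness check) is standard TypeScript.

```typescript
// Hypothetical UI state for a data-loading component.
type RequestState =
  | { kind: "loading" }
  | { kind: "success"; items: string[] }
  | { kind: "error"; message: string };

// If a new variant is added to RequestState and not handled here,
// the assignment to `never` stops compiling, so `tsc --noEmit` fails in CI.
function statusMessage(state: RequestState): string {
  switch (state.kind) {
    case "loading":
      return "Loading products...";
    case "success":
      return `${state.items.length} products`;
    case "error":
      return `Couldn't load products: ${state.message}`;
    default: {
      const unhandled: never = state;
      return unhandled;
    }
  }
}

console.log(statusMessage({ kind: "error", message: "timeout" }));
```

Adding a fourth variant such as `{ kind: "empty" }` without a matching `case` would be reported at compile time, before any test runs.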

  1. Let the type checker and linter run first

    `tsc --noEmit` and ESLint catch a whole class of bugs (undefined props, unhandled null, wrong shapes) with zero test code. Treat them as the base of the trophy and run them in CI on every push.

  2. Unit-test the logic with no UI

    Extract pure functions (formatting, validation, calculations) and test them directly. Fast, deterministic, no rendering. If a piece of logic is hard to unit-test, that is a hint it should be extracted from the component.

  3. Component-test the user-facing behavior

    Render the component, query by what the user sees (role, label, text), interact, and assert the resulting DOM. This is your highest-leverage layer, so aim most of your test count here.

  4. End-to-end the critical flows only

    Pick the 5–10 journeys that, if broken, cost real money or trust: sign-in, checkout, the core create-action. Run them in a real browser with Playwright or Cypress against a production-like build.
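
Step 2 is the easiest to picture in code. This is a minimal sketch, assuming hypothetical `cartTotalCents` and `formatUsd` helpers pulled out of a component. The plain `check` function stands in for whatever runner you use; in Vitest or Jest each call would be an `expect` inside a `test`.

```typescript
interface LineItem {
  name: string;
  unitPriceCents: number; // integer cents avoid floating-point drift
  quantity: number;
}

// Pure logic extracted from a component: no DOM, no hooks, no rendering.
function cartTotalCents(items: LineItem[]): number {
  return items.reduce(
    (sum, item) => sum + item.unitPriceCents * item.quantity,
    0
  );
}

// Assumes a non-negative integer number of cents.
function formatUsd(cents: number): string {
  const dollars = Math.floor(cents / 100);
  const remainder = String(cents % 100).padStart(2, "0");
  return `$${dollars}.${remainder}`;
}

// Stand-in for a test runner's assertion.
function check(label: string, actual: unknown, expected: unknown): void {
  if (actual !== expected) {
    throw new Error(
      `${label}: expected ${String(expected)}, got ${String(actual)}`
    );
  }
}

const items: LineItem[] = [
  { name: "Wireless Mouse", unitPriceCents: 2900, quantity: 2 },
  { name: "USB Cable", unitPriceCents: 705, quantity: 1 },
];

check("empty cart totals zero", cartTotalCents([]), 0);
check("sums price times quantity", cartTotalCents(items), 6505);
check("pads single-digit cents", formatUsd(6505), "$65.05");
```

Because nothing here renders, these checks run in milliseconds and never need updating when the component around them is restructured.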

Choosing the right layer for each test

When you are unsure where a test belongs, match the thing you are protecting against the layer that catches it most cheaply. Push tests down the trophy whenever a cheaper layer can catch the same bug, but never so far down that you stop testing real behavior.

| Test type | What it catches | Cost / speed | When to reach for it |
| --- | --- | --- | --- |
| Static (TS, lint) | Type errors, undefined props, dead code, unsafe access | Free, instant | Always on, the default safety net for every change |
| Unit | Wrong logic in pure functions, edge cases in helpers and reducers | Very cheap, milliseconds | Non-trivial logic you can isolate from the DOM |
| Component | Broken rendering, wiring, conditional UI, accessibility of interactions | Cheap, fast (jsdom) | The default for anything a user sees or clicks, most of your suite |
| End-to-end | Whole-flow breakage: routing, real network, auth, cross-page state | Expensive, seconds, occasionally flaky | A small set of business-critical journeys only |

Pick the cheapest layer that still catches the bug you care about.

A component test that survives refactors

Here is a component test done the right way. It renders the real component, finds elements by role and text, the way a user or a screen reader would, rather than by class name or blanket test ids, interacts with them, and asserts the visible outcome. Notice there is not a single reference to internal state, hook names, or child props.

AddToCart.test.tsx
tsx
import { render, screen } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import { AddToCart } from "./AddToCart";

test("adds an item and reflects it in the cart count", async () => {
  const user = userEvent.setup();
  render(<AddToCart productName="Wireless Mouse" price={29} />);

  // Query the way a user perceives the UI: by role + accessible name.
  const button = screen.getByRole("button", { name: /add to cart/i });
  expect(screen.getByText(/cart: 0 items/i)).toBeInTheDocument();

  await user.click(button);

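  // Assert the user-visible outcome, not internal state.
  expect(screen.getByText(/cart: 1 item/i)).toBeInTheDocument();
  // A status region takes its accessible name from aria-label, not from its
  // content, so match the role and assert on the text it contains.
  expect(screen.getByRole("status")).toHaveTextContent(/added wireless mouse/i);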
});

test("disables the button while a request is in flight", async () => {
  const user = userEvent.setup();
  render(<AddToCart productName="Wireless Mouse" price={29} />);

  const button = screen.getByRole("button", { name: /add to cart/i });
  await user.click(button);

  // The user sees a disabled, busy button; we never inspect a loading flag.
  expect(button).toBeDisabled();
});

Two things make this test durable. First, queries by role and accessible name mean the test only passes if the markup is actually accessible: a free accessibility check baked into every test. Second, `userEvent` simulates real interaction (focus, key events, the works) rather than dispatching a single synthetic click event, so you exercise the same path a person would. Rename the internal state, swap the styling, move logic into a custom hook: as long as the button still says "Add to cart" and the count still updates, the test stays green.

Query priority

Reach for queries in the order Testing Library recommends: **getByRole** (with a name), then **getByLabelText** for form fields, then **getByPlaceholderText**, then **getByText** for non-interactive content. Use **getByTestId** only as a last resort for elements with no accessible handle. If you cannot find an element by role or text, that is often a sign the markup itself is not accessible.

Behavior over implementation, in practice

The rule "test behavior, not implementation" sounds obvious until you are mid-test and tempted to peek at internals. A quick litmus test: would this assertion break if I refactored the component without changing what the user sees or does? If yes, you are testing implementation. Asserting a useState value, asserting that a specific function was called, or asserting that a child received a named prop: all of these break on refactor and protect nothing.

Implementation tests also lie in the dangerous direction. A test that checks setLoading(true) was called will pass even if the spinner never actually renders because of a CSS bug or a missing conditional. A behavior test that asserts the spinner is *visible* catches that. Always assert the outcome the user perceives, not the mechanism you hope produces it.
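
The "passes while the UI is broken" failure mode is easy to reproduce in a few lines. This is a framework-free sketch: `View`, `createBuggyView`, and the hand-rolled `setLoadingCalls` spy are all hypothetical stand-ins for a real component and a real mocking library.

```typescript
// A deliberately tiny stand-in for a component: state plus a render function
// that returns the text a user would see.
interface View {
  setLoading(value: boolean): void;
  render(): string;
}

const setLoadingCalls: boolean[] = []; // a hand-rolled spy

function createBuggyView(): View {
  let loading = false;
  const items: string[] = [];
  return {
    setLoading(value) {
      setLoadingCalls.push(value);
      loading = value;
    },
    // Bug: the spinner text only appears when items already exist,
    // so the very first load shows no loading indicator at all.
    render: () => (loading && items.length > 0 ? "Loading..." : "Products"),
  };
}

const view = createBuggyView();
view.setLoading(true);

// Implementation-style check: the setter was called with true.
const implementationCheckPasses = setLoadingCalls.includes(true);

// Behavior-style check: the user can actually see a loading indicator.
const behaviorCheckPasses = view.render().includes("Loading");

console.log({ implementationCheckPasses, behaviorCheckPasses });
// → { implementationCheckPasses: true, behaviorCheckPasses: false }
```

The implementation-style check reports success on a view that shows no spinner; only the check against rendered output notices the bug.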

Mocking the network: at the boundary, not inside

Components fetch data, and tests should not hit a real server. The brittle approach is to mock your fetch wrapper or stub a hook's return value, which couples the test to your data-fetching internals. The durable approach is to intercept at the network boundary with a tool like MSW (Mock Service Worker). You declare what the endpoint returns; your component does its real fetching, parsing, and error handling against that fake server. Swap fetch for axios or React Query and the test does not care.

ProductList.test.tsx
tsx
import { render, screen } from "@testing-library/react";
import { http, HttpResponse } from "msw";
import { setupServer } from "msw/node";
import { ProductList } from "./ProductList";

const server = setupServer(
  http.get("/api/products", () =>
    HttpResponse.json([{ id: "1", name: "Wireless Mouse", price: 29 }])
  )
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

test("renders products from the API", async () => {
  render(<ProductList />);
  // The component does its real fetch; we wait for the user-visible result.
  expect(await screen.findByText("Wireless Mouse")).toBeInTheDocument();
});

test("shows an error state when the API fails", async () => {
  // Override just this test's response to exercise the error path.
  server.use(
    http.get("/api/products", () => new HttpResponse(null, { status: 500 }))
  );
  render(<ProductList />);
  expect(
    await screen.findByText(/couldn't load products/i)
  ).toBeInTheDocument();
});

The same MSW handlers can back your component tests and your Playwright end-to-end runs, and even power local development with a mock backend. One source of fake truth, reused everywhere, is far better than a thicket of per-test jest.mock calls.

End-to-end: the critical-flow crown

Component tests run in jsdom, a simulated DOM with no real layout, no real browser quirks, no real navigation. For the journeys that absolutely cannot break, you want a real browser. Playwright drives Chromium, Firefox, and WebKit through a complete flow (Cypress is a common alternative): load the page, fill the form, click through, assert what renders. It catches things component tests structurally cannot: broken routing, a real network round-trip, auth redirects, a button hidden behind a z-index bug.

checkout.spec.ts
ts
import { test, expect } from "@playwright/test";

test("a user can complete checkout", async ({ page }) => {
  await page.goto("/products");

  // Same philosophy as component tests: locate by role and text.
  await page.getByRole("button", { name: /add to cart/i }).first().click();
  await page.getByRole("link", { name: /view cart/i }).click();
  await page.getByRole("button", { name: /checkout/i }).click();

  await page.getByLabel(/card number/i).fill("4242 4242 4242 4242");
  await page.getByRole("button", { name: /pay now/i }).click();

  // Assert the user-visible success outcome.
  await expect(
    page.getByRole("heading", { name: /order confirmed/i })
  ).toBeVisible();
});

Notice the queries look almost identical to the component test, by role and accessible name. That consistency is the payoff of the behavior-first philosophy: the same mental model scales from a single component up to a full browser flow. Keep this layer small. A handful of E2E tests for your money-making journeys gives enormous confidence; a hundred of them gives you a slow, flaky pipeline nobody trusts.
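
The spec above navigates to a relative path, which only resolves when a base URL is configured. A minimal `playwright.config.ts` sketch follows. The port, the `npm run preview` command, and the `./e2e` directory are assumptions to replace with your own values; the option names are from Playwright's test configuration.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e", // assumed location of the critical-flow specs
  // Retry only in CI so local failures stay loud.
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: "http://localhost:4173", // assumed preview port
    trace: "on-first-retry", // record a trace when a test is retried
  },
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
  // Serve a production-like build before the tests run.
  webServer: {
    command: "npm run preview", // assumed script name
    url: "http://localhost:4173",
    reuseExistingServer: !process.env.CI,
  },
});
```

Running three browser projects triples the E2E runtime, which is one more reason to keep the spec count small.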

Common mistakes that cost hours

  1. Testing implementation details. Asserting hook values, internal state, or that a function was called with exact args. These break on every refactor and pass even when the UI is visibly broken. Assert what the user sees instead.
  2. Snapshot overuse. A wall of auto-generated snapshots becomes noise: people update them blindly when they go red, so they catch nothing and just nag. Reserve snapshots for small, stable, intentional outputs, and review every diff.
  3. No end-to-end for critical flows. Component tests in jsdom cannot catch broken routing, real auth, or a button buried under an overlay. If sign-in or checkout breaking would be a disaster, it needs at least one real-browser test.
  4. Over-mocking. Mock the network boundary, not your own modules. When you stub the very code under test (the fetch wrapper, the hook, the child component), you are testing your mocks, not your app: green tests, broken product.
  5. Querying by class or deep test-ids. container.querySelector('.btn-primary') couples tests to styling and bypasses accessibility. Prefer role and text; the test doubles as an a11y check.
  6. Chasing 100% coverage. Coverage measures lines executed, not behavior verified. Test the paths that matter (error states, edge cases, the happy path) and let trivial getters go untested.
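
The coverage point in the last item can be made concrete. In this sketch, `applyDiscount` is a hypothetical helper with a deliberate bug. Both "tests" execute every line of it, so a coverage tool would report the same figure for each, yet only one of them verifies anything.

```typescript
// Hypothetical helper with a bug: the discount is added instead of subtracted.
function applyDiscount(priceCents: number, percent: number): number {
  const discount = Math.round((priceCents * percent) / 100);
  return priceCents + discount; // should be priceCents - discount
}

// "Test" 1: runs every line of applyDiscount but asserts nothing.
function coverageOnlyTest(): boolean {
  applyDiscount(2900, 10);
  return true; // always passes
}

// "Test" 2: runs the same lines and verifies the outcome.
function behaviorTest(): boolean {
  return applyDiscount(2900, 10) === 2610;
}

console.log({ coverageOnly: coverageOnlyTest(), behavior: behaviorTest() });
// → { coverageOnly: true, behavior: false }
```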

Takeaways

The whole article in seven lines

  • Test what the user **does**, not how the component is **built**. That one rule prevents most brittleness.
  • Follow the **testing trophy**: a little static, some unit, a **fat layer of component tests**, a focused crown of E2E.
  • Static checks (TypeScript, ESLint) are the cheapest layer. Let them catch a whole class of bugs for free.
  • Query by **role, label, and text**; never touch internals. Accessible markup falls out for free.
  • Mock at the **network boundary** (MSW), not inside your own modules, and reuse the handlers across component tests, E2E, and local development.
  • Reserve **end-to-end** for the few money-making journeys; keep that layer small to keep it trustworthy.
  • A green suite that breaks on refactors and misses real bugs is worse than fewer, behavior-focused tests.

Where to go next

Testing rides on top of how your components are structured and how your state flows. Well-factored components with clear inputs and accessible markup are dramatically easier to test, so the testing payoff is also an architecture payoff.
