SRE · 13 min read · Jun 2026

The Four Golden Signals: The Minimal Set of Metrics That Catch Almost Everything

Latency, traffic, errors, saturation. Google's SRE book distilled monitoring down to four signals that catch most user-facing problems. Here is what each one means, what to measure, and how to instrument a service with Prometheus.

Tags: SRE, Monitoring, Observability, Metrics
Sri Balaji

Founder · TheSimplifiedTech

You can't dashboard your way to calm

Here is the trap every team falls into. The service is slow, someone is paging you, and you open a dashboard with forty graphs on it. CPU, heap, GC pauses, thread pools, cache hit ratios, p50/p90/p99 for twelve endpoints, three flavors of queue depth. You stare at the wall of sparklines and feel *less* sure of what's wrong than when you started. More metrics did not buy you more clarity; they bought you more places to look.

The fix is not another graph. It's a smaller, sharper set. In 2016, Google's Site Reliability Engineering book made a deceptively simple claim: if you can only measure four things about a user-facing system, measure latency, traffic, errors, and saturation. They called them the Four Golden Signals, and they hold up because they are framed around the user's experience, not the machine's internals.

Who this is for

Engineers who own a service in production and want a starting point for monitoring that isn't "graph everything and hope." If you can deploy a service and read a chart, you're ready. No prior SRE background is assumed. By the end you'll be able to instrument a service with all four signals and know what a spike in each one is telling you.

The four signals in one breath

If you can measure only four metrics of your user-facing system, focus on these four: latency, traffic, errors, and saturation.
Google SRE Book, Monitoring Distributed Systems

Think of these four as the dashboard of a car. You don't need to watch the engine's combustion timing to know something is wrong. The car's dashboard already surfaces the signals that matter to the driver, and each one maps cleanly to a golden signal.

| Car gauge | Golden signal |
| --- | --- |
| Speedometer: how fast you're actually going | Latency: how fast requests are actually served |
| Trip odometer / RPM: how hard the car is working right now | Traffic: how much demand is hitting the service right now |
| Check-engine light: something just went wrong | Errors: the rate of requests that are failing |
| Fuel gauge / temperature: how close to the limit you are | Saturation: how full the most constrained resource is |

A driver doesn't read the engine block. They read four gauges. Your on-call self should too.

The power of the set is *coverage*. A user-facing problem almost always shows up as: requests got slow (latency), demand spiked or vanished (traffic), requests started failing (errors), or a resource is about to tip over (saturation). Watch four, catch most.

The picture: a service emitting four signals

Here's the mental model. Users hit your service on the happy path. The service emits the four golden signals out to an observability stack: a metrics store that feeds dashboards and an alerting pipeline. The signals are a side-channel; they never block the request.

[Diagram: Users (real traffic) send requests to the Service (your app), which sends queries to Database / deps (downstream). The service emits four signals to Metrics + Alerting (Prometheus / dashboards): Latency (request duration), Traffic (requests / sec), Errors (failure rate), and Saturation (resource fullness).]

Users flow through the service on the happy path (solid). The four golden signals fan out to the observability node (dashed, async).

  1. A user sends a request. The request enters your service on the happy path. This is the only flow the user cares about; everything else exists to keep it fast and correct.
  2. The service times the request. It records when work started and finished. That duration is your latency sample, ideally tagged by endpoint and split by success vs. failure.
  3. The service counts the request. A counter ticks up for every request handled; that count over time is traffic. A second counter ticks only when the request fails; that's the errors signal.
  4. The runtime exposes saturation. It reports how full the most constrained resource is: CPU, memory, connection-pool usage, queue depth. This is the leading indicator; it tends to climb before latency and errors do.
  5. An agent scrapes the signals. Prometheus pulls these numbers on an interval and stores them as time series. Dashboards visualize them; alert rules fire when they cross a threshold tied to your SLOs.
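
Steps 2 and 3 of the flow above can be sketched in a few lines of plain Python. This is only an illustration of the shape, not a real metrics client: `Signals`, `instrument`, and the `checkout` handler are hypothetical names, and the values live in memory instead of being exposed for scraping.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Signals:
    """In-memory stand-in for a metrics client (hypothetical, for illustration)."""
    durations: list = field(default_factory=list)  # latency samples, in seconds
    requests: int = 0                              # traffic: every request handled
    errors: int = 0                                # errors: only failed requests


def instrument(handler, signals):
    """Wrap a handler so every call is timed and counted."""
    def wrapped(request):
        start = time.perf_counter()
        try:
            return handler(request)
        except Exception:
            signals.errors += 1      # second counter: ticks only on failure
            raise
        finally:
            signals.requests += 1    # first counter: ticks on every request
            signals.durations.append(time.perf_counter() - start)
    return wrapped


def checkout(request):
    """Hypothetical handler that fails when asked to."""
    if request.get("fail"):
        raise RuntimeError("dependency down")
    return "ok"


signals = Signals()
handle = instrument(checkout, signals)
handle({})
try:
    handle({"fail": True})
except RuntimeError:
    pass

print(signals.requests, signals.errors, len(signals.durations))  # prints: 2 1 2
```

The failed request still produces a latency sample and still counts as traffic, which is what lets you split latency by outcome and compute an error rate later.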

Signal by signal: what to measure and why

Each signal answers a different question about the user's experience. Here's the cheat sheet: what it measures, a concrete metric to emit, and what a spike is usually telling you.

| Signal | What it measures | Example metric | What a spike means |
| --- | --- | --- | --- |
| Latency | How long a request takes, as a distribution (not an average) | p99 of http_request_duration_seconds, split success vs. error | Something downstream is slow or you're resource-starved; users are waiting |
| Traffic | How much demand the service is handling right now | requests per second from http_requests_total | A real load spike, a retry storm, or a drop to zero, meaning an upstream is broken |
| Errors | The rate of requests that fail, explicitly or implicitly | ratio of 5xx (and bad 200s) to total requests | A bad deploy or failing dependency; correctness is breaking, not just speed |
| Saturation | How full your most constrained resource is, vs. its limit | memory / CPU / pool utilization as a percentage | You're approaching a cliff; performance degrades nonlinearly past ~80% |

Latency, traffic, and errors measure what the user feels now; saturation predicts what they'll feel soon.

Latency: measure the tail, not the average

The single most common latency mistake is reporting the average. Averages hide pain: if 99 requests take 10ms and one takes 10 seconds, the average is a comfortable 110ms while a real user stares at a frozen page. Always track percentiles: p50 for the typical user, p99 for the unlucky tail. And separate the latency of *successful* requests from that of *failed* ones: a fast error and a slow success are both interesting, and averaging them together is meaningless.
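
To see the gap between mean and tail in numbers, here is a small sketch using a nearest-rank percentile. The `percentile` helper is hypothetical, and the data is a slight variation on the example above: two slow requests in 100 instead of one, because with exactly one slow sample a nearest-rank p99 lands on the 99th fastest request and still misses it.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# 98 fast requests (10 ms) and 2 slow ones (10 s), all in seconds
durations = [0.010] * 98 + [10.0] * 2

mean = statistics.fmean(durations)
p50 = percentile(durations, 50)
p99 = percentile(durations, 99)

print(f"mean={mean:.3f}s p50={p50:.3f}s p99={p99:.3f}s")
# prints: mean=0.210s p50=0.010s p99=10.000s
```

The mean describes no real request: nobody waited 210 ms. The p50 says the typical user is fine, and the p99 says someone waited ten seconds.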

Traffic: your denominator and your context

Traffic is requests per second (or transactions, or queries). On its own it's rarely an alert, but it's the context that makes the other three readable. A 500-error count means nothing without knowing whether you served 10 requests or 10 million. Traffic is also the denominator for your error *rate*, and a sudden drop to zero is its own kind of incident.
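
Because request counters only ever go up, traffic is the slope of the counter, not its value. The sketch below is a simplified take on that idea: `simple_rate` and the sample data are hypothetical, and Prometheus's own `rate()` additionally extrapolates to the edges of the query window, which this version skips.

```python
def simple_rate(samples):
    """Per-second increase of a cumulative counter over (timestamp, value) samples."""
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again from zero,
        # so the whole current value counts as new increase.
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Five scrapes, 15 s apart; the service restarts between t=30 and t=45.
scrapes = [(0, 1000), (15, 1300), (30, 1600), (45, 50), (60, 350)]

print(f"{simple_rate(scrapes):.2f} requests/sec")  # prints: 15.83 requests/sec
```

Subtracting the first value from the last would give a negative rate here, which is why reset handling matters for any counter-based signal.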

Errors: count the lies, not just the crashes

Explicit errors (HTTP 5xx, exceptions) are easy. The dangerous ones are implicit: a 200 OK that returns the wrong content, a response that violates a contract, or a success that took 30 seconds when your SLO is 300ms. Define what "failure" means for *your* service and count all of it. Errors are about correctness: a fast wrong answer is still wrong.
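
One way to make "define failure for your service" concrete is a single predicate that every response passes through. The sketch below assumes a hypothetical `Response` record with a `body_valid` flag standing in for whatever contract check your service can run; the 300 ms threshold is the SLO from the paragraph above.

```python
from dataclasses import dataclass

LATENCY_SLO_SECONDS = 0.300  # the 300 ms SLO used as an example in the text


@dataclass
class Response:
    status: int
    duration_seconds: float
    body_valid: bool  # hypothetical result of a contract/schema check


def is_failure(resp):
    """Count explicit and implicit failures alike."""
    if resp.status >= 500:
        return True   # explicit: server error
    if not resp.body_valid:
        return True   # implicit: 200 OK with the wrong content
    return resp.duration_seconds > LATENCY_SLO_SECONDS  # implicit: blew the SLO


responses = [
    Response(200, 0.120, True),   # healthy
    Response(503, 0.050, True),   # explicit error
    Response(200, 0.080, False),  # fast but wrong
    Response(200, 30.0, True),    # correct but far too slow
]
failures = sum(is_failure(r) for r in responses)

print(f"{failures} of {len(responses)} requests failed")  # prints: 3 of 4 requests failed
```

A status-code-only check would have reported one failure out of four.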

Saturation: the leading indicator

Saturation is how full your most constrained resource is: memory, CPU, disk I/O, a connection pool, a thread pool, a queue. It's special because it's a leading indicator: it climbs *before* latency and errors degrade. Most systems start misbehaving well below 100%; a queue at 80% utilization is already adding latency. Find your bottleneck resource, measure it as a fraction of its limit, and alert before the cliff.
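
The nonlinear cliff is easy to see with a textbook queueing model. The sketch below assumes an idealized M/M/1 queue (a single server with random arrivals and random service times), where mean response time is the service time divided by one minus utilization. Real systems are messier, so treat the numbers as an illustration of the curve's shape, not a prediction; `mean_response_time` is a hypothetical helper.

```python
def mean_response_time(service_time, utilization):
    """Mean time in an M/M/1 queue: service_time / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


# A resource that takes 10 ms per request when idle
for u in (0.50, 0.80, 0.90, 0.95):
    print(f"{u:.0%} busy -> {mean_response_time(0.010, u) * 1000:.0f} ms")
# prints:
# 50% busy -> 20 ms
# 80% busy -> 50 ms
# 90% busy -> 100 ms
# 95% busy -> 200 ms
```

Going from 50% to 80% busy costs 30 ms; going from 80% to 95% costs another 150 ms. That asymmetry is why saturation is worth watching before latency moves.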

Monitoring vs. observability, RED, and USE

The four signals sit inside a bigger vocabulary. Two distinctions are worth knowing so you don't get lost in the jargon.

Monitoring vs. observability. Monitoring answers *known* questions, "is the error rate above 1%?", with predefined dashboards and alerts. The golden signals are classic monitoring. Observability is the property of being able to ask *new* questions of a running system without shipping new code: "why is p99 high *only* for users in region X on checkout after the 2pm deploy?" You get there by adding high-cardinality context (traces, structured logs, exemplars). Golden signals tell you *that* something is wrong; observability helps you find out *why*.

RED and USE are two popular reframings of the same idea, each aimed at a different layer:

| Method | Best for | What it tracks |
| --- | --- | --- |
| Golden Signals | Any user-facing system (the general case) | Latency, Traffic, Errors, Saturation |
| RED | Request-driven services and microservices | Rate (traffic), Errors, Duration (latency) |
| USE | Hardware and finite resources (a host, a disk, a pool) | Utilization, Saturation, Errors |

RED is the golden signals minus saturation, viewed per service. USE is the resource-side view. They overlap by design.

In practice: use RED to instrument each service from the request's point of view, USE to investigate the resources underneath it, and treat the four golden signals as the umbrella that guarantees you've covered both angles.

Instrument a service with the four signals

Here's the concrete walkthrough for a request-driven HTTP service exposing Prometheus metrics. The pattern is the same in any language with a Prometheus client.

  1. Expose a /metrics endpoint. Add a Prometheus client library and mount a /metrics route. This is the surface Prometheus scrapes; it returns the current value of every metric in plaintext.
  2. Emit a histogram for latency. Wrap your request handler in a timer that observes duration into a histogram labeled by route and status. Histograms let you compute p50/p99 later in PromQL.
  3. Emit a counter for traffic and errors. Increment http_requests_total on every request, labeled by status code. Traffic is the rate of the whole counter; errors are the rate of the 5xx slice. One counter, two signals.
  4. Expose saturation from the runtime. Most client libraries auto-export process CPU/memory. Add your bottleneck explicitly as a gauge: connection-pool in use vs. max, or queue depth vs. capacity.
  5. Scrape and alert. Point Prometheus at the endpoint, then write alert rules in PromQL that map to your SLOs. Alert on symptoms users feel (error ratio, p99), not on raw CPU.
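
To make step 1 less abstract, here is roughly what the plaintext body of a /metrics response looks like for a counter and a gauge. In a real service the official client library generates this for you; the stdlib-only sketch below exists just to show the shape. `render_metric` and the sample values are hypothetical, and label-value escaping is left out for brevity.

```python
def render_metric(name, kind, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to the sample's numeric value.
    Label values are not escaped here; a real client library handles that.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {kind}"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines)


body = "\n".join([
    render_metric(
        "http_requests_total", "counter", "Requests handled, by status.",
        {(("status", "200"),): 1027, (("status", "500"),): 3},
    ),
    render_metric(
        "db_pool_connections_in_use", "gauge", "Pool connections in use.",
        {(): 42},
    ),
]) + "\n"

print(body)
```

Each scrape reads the current values; Prometheus turns the sequence of scrapes into time series, so the service itself never has to compute a rate.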

prometheus.yml

```yaml
# Tell Prometheus to scrape your service's /metrics endpoint
scrape_configs:
  - job_name: "checkout-service"
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["checkout:8080"]
        labels:
          team: payments
```

With those three metric types in place, all four signals fall out of a handful of PromQL queries. These are the expressions you put on a dashboard and behind alerts:

golden-signals.promql

```promql
# 1) LATENCY: p99 of successful requests over the last 5 minutes
histogram_quantile(
  0.99,
  sum by (le) (
    rate(http_request_duration_seconds_bucket{status=~"2.."}[5m])
  )
)

# 2) TRAFFIC: requests per second across the whole service
sum(rate(http_requests_total[5m]))

# 3) ERRORS: fraction of requests returning 5xx (the error RATE)
sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))

# 4) SATURATION: connection pool fullness as a fraction of its limit
max(db_pool_connections_in_use) / max(db_pool_connections_max)
```
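
It helps to know that the p99 from the latency query is an estimate, not an exact value. The sketch below is a simplified Python version of the interpolation that `histogram_quantile` performs on classic cumulative buckets; the real function handles more edge cases, and the bucket bounds and counts here are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) sorted by upper_bound,
    ending with the (math.inf, total) bucket, like the `le` series of a histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # tail bucket: fall back to the highest finite bound
            in_bucket = cumulative - count_below
            # Assume samples are spread evenly inside the bucket.
            return lower_bound + (upper_bound - lower_bound) * (rank - count_below) / in_bucket
        lower_bound, count_below = upper_bound, cumulative


# Hypothetical buckets: 900 requests took <= 0.1 s, 980 took <= 0.25 s, and so on.
buckets = [(0.1, 900), (0.25, 980), (0.5, 995), (1.0, 1000), (math.inf, 1000)]
p99 = histogram_quantile(0.99, buckets)

print(f"p99 ~ {p99:.3f}s")  # prints: p99 ~ 0.417s
```

All the estimate really knows is that the 990th request fell between 0.25 s and 0.5 s. If your SLO threshold sits in the middle of a wide bucket, consider adding a bucket boundary at the threshold.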

Alert on the ratio, not the raw count

A rule like "5xx > 10/s" fires on a traffic spike even when reliability is fine. Alert on the error *ratio* against your SLO (e.g. error rate > 1% for 5 minutes) so the threshold scales with load instead of paging you every Black Friday.
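
A quick comparison makes the difference visible. Both rules below are hypothetical, as are the traffic numbers, and the "for 5 minutes" hold period is left out to keep the sketch short.

```python
def raw_count_alert(errors_per_sec, threshold=10.0):
    """Fires on absolute error volume, regardless of traffic."""
    return errors_per_sec > threshold


def ratio_alert(errors_per_sec, requests_per_sec, slo_error_ratio=0.01):
    """Fires when the error ratio exceeds the SLO."""
    if requests_per_sec == 0:
        return False  # no traffic, no ratio; a drop to zero needs its own alert
    return errors_per_sec / requests_per_sec > slo_error_ratio


scenarios = {
    "normal day":   (2.0, 1_000.0),    # 0.2% errors
    "Black Friday": (20.0, 10_000.0),  # still 0.2% errors, at 10x the traffic
    "bad deploy":   (8.0, 200.0),      # 4% errors on a quiet night
}

for name, (errs, reqs) in scenarios.items():
    print(f"{name}: raw={raw_count_alert(errs)} ratio={ratio_alert(errs, reqs)}")
# prints:
# normal day: raw=False ratio=False
# Black Friday: raw=True ratio=False
# bad deploy: raw=False ratio=True
```

The raw-count rule gets both interesting cases wrong: it pages on a healthy Black Friday and stays silent through a bad deploy at low traffic.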

Common mistakes that cost hours

  1. Reporting averages for latency. The mean hides the tail where real users hurt. Always use percentiles (p50, p99), and never average a histogram; compute the quantile.
  2. Alerting on causes instead of symptoms. "CPU > 80%" pages you when nothing is broken for users. Alert on what users feel (error ratio and p99 latency), and use saturation for investigation.
  3. Counting errors without traffic. "50 errors" is meaningless. 50 of 50 is an outage; 50 of 5 million is noise. Always express errors as a *rate* over total requests.
  4. Ignoring implicit errors. A 200 OK with a corrupt body, or a success that blew your latency SLO, is still a failure. Define failure for your service and count all of it.
  5. Watching saturation last. It's the leading indicator; it moves first. If you only look at it in the postmortem, you missed the warning the system gave you ten minutes before the page.
  6. Confusing dashboards with observability. Forty static graphs are still monitoring. If you can't ask a new question without deploying code, you have monitoring, not observability.

Takeaways

The whole article in seven lines

  • Measure four things to catch most user-facing problems: **latency, traffic, errors, saturation**.
  • Latency: track **percentiles** (p99), and split successful from failed requests.
  • Traffic: requests per second, the context and denominator for everything else.
  • Errors: count as a **rate** over total, and include implicit failures (wrong 200s, blown SLOs).
  • Saturation: how full your bottleneck is, the **leading indicator** that moves first.
  • **RED** = the per-service request view; **USE** = the per-resource view; golden signals = the umbrella.
  • Monitoring answers known questions; **observability** lets you ask new ones. Alert on symptoms, not causes.

Where to go next

The four signals are the *what* to measure. The next two questions are how to set targets for them and how to alert without burning out your on-call. Both have their own articles in this track.
