DORA metrics (deployment frequency, lead time, change failure rate, MTTR) are the industry-standard measures of software delivery performance. Elite performers deploy on demand, often multiple times per day, with lead times under an hour. Understanding these metrics — and how to improve them — is foundational for DevOps interviews and organizational improvement.
The DORA (DevOps Research and Assessment) metrics emerged from four years of research by Dr. Nicole Forsgren, Jez Humble, and Gene Kim, published in the 2018 book "Accelerate." They surveyed thousands of technology organizations and identified four metrics that consistently differentiated high-performing teams from low-performing ones. These four metrics are now the industry standard for measuring software delivery performance.
| Metric | What it measures | Elite | High | Medium | Low |
|---|---|---|---|---|---|
| Deployment Frequency | How often code is deployed to production | Multiple times/day | Between once/day and once/week | Between once/week and once/month | Fewer than once every 6 months |
| Lead Time for Changes | Time from code commit to running in production | Less than 1 hour | 1 day to 1 week | 1 week to 1 month | 6 months to 1 year |
| Change Failure Rate | % of deployments causing a production incident | 0-15% | 16-30% | 16-30% | 16-30% |
| Mean Time to Restore (MTTR) | Time to recover from a production failure | Less than 1 hour | Less than 1 day | 1 day to 1 week | 6 months to 1 year |
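As a rough illustration, the table's tiers can be encoded as threshold checks. This is a sketch only: the cutoffs below are simplified from the table (deploys per week, lead-time hours) and are not an official DORA classifier.

```python
# Sketch: classify delivery performance against the benchmark table above.
# Thresholds are simplified from the table; the gaps in the published
# buckets (e.g. between 1 hour and 1 day of lead time) are folded into
# the nearest tier for illustration.

def classify_deploy_frequency(deploys_per_week: float) -> str:
    if deploys_per_week >= 7:        # multiple times per day
        return "elite"
    if deploys_per_week >= 1:        # between once/day and once/week
        return "high"
    if deploys_per_week >= 0.25:     # between once/week and once/month
        return "medium"
    return "low"

def classify_lead_time(hours: float) -> str:
    if hours < 1:                    # less than 1 hour
        return "elite"
    if hours <= 24 * 7:              # up to one week
        return "high"
    if hours <= 24 * 30:             # up to one month
        return "medium"
    return "low"

print(classify_deploy_frequency(14))  # two deploys/day -> "elite"
print(classify_lead_time(4))          # 4-hour lead time -> "high"
```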
The 2021 State of DevOps Report Key Finding
Elite performers are not just incrementally better — they are orders of magnitude better. The 2021 report found that elite teams deploy 973x more frequently than low performers, with 6,570x faster lead times and 3x lower change failure rates. This is not about effort or talent; it is about systems: CI/CD pipelines, automated testing, small batch sizes, and psychological safety.
DORA Metric Mnemonics
DF: "How OFTEN do we ship?" Lead Time: "How FAST do we ship?" CFR: "How SAFE is our shipping?" MTTR: "How QUICKLY do we recover?" Together they answer: speed + stability. Elite performers are fast AND stable — disproving the myth that you must choose between speed and reliability.
Deployment frequency and lead time measure the throughput of your delivery pipeline. Low-performing teams treat deployments as risky, infrequent events. Elite teams treat them as routine, low-risk activities. The difference comes from architecture, automation, and culture — not more hours.
What drives low deployment frequency and long lead times
In practice, the usual culprits are long-lived feature branches, manual approval gates, slow or flaky CI, manual QA queues, and fixed deployment windows.
Trunk-Based Development Enables High Deployment Frequency
Elite teams practice trunk-based development: all developers commit to the main branch (or short-lived branches of 1-2 days). Combined with feature flags, you can deploy incomplete features to production safely. No long-lived feature branches means no merge conflicts, no integration nightmares, and no "big bang" releases. Deployment frequency naturally increases when deploying is just merging to main.
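A minimal sketch of the feature-flag pattern described above. The `FLAGS` dictionary stands in for a real flag service (LaunchDarkly, Unleash, etc.); all names here are hypothetical.

```python
# Feature flags let trunk-based teams merge and deploy incomplete work:
# the new code path ships to production dark, then is enabled at runtime.

FLAGS = {"new_checkout": False}  # deployed to production, but dark

def new_checkout_flow(cart):
    return f"new flow: {len(cart)} items"      # in-progress code path

def legacy_checkout_flow(cart):
    return f"legacy flow: {len(cart)} items"   # current behavior

def checkout(cart):
    if FLAGS.get("new_checkout", False):
        return new_checkout_flow(cart)
    return legacy_checkout_flow(cart)

print(checkout(["book"]))       # legacy path while the flag is off
FLAGS["new_checkout"] = True    # "release" without a new deploy
print(checkout(["book"]))       # new path, instantly reversible
```

Turning the flag off is also the fastest possible rollback: no deploy needed.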
Change failure rate and MTTR measure the stability of your delivery pipeline. A high change failure rate means your deployments frequently cause incidents. A high MTTR means when something goes wrong, you take a long time to recover. Both are solvable with engineering discipline.
| Problem | Symptom | Fix |
|---|---|---|
| High change failure rate | Frequent production incidents after deploys; rollbacks common | More automated testing (unit, integration, E2E, contract); canary deployments; feature flags; better staging parity |
| High MTTR from detection lag | Incidents run for hours before anyone notices | Alerting on SLOs and error budgets (not just uptime); anomaly detection; real-user monitoring |
| High MTTR from diagnosis time | Takes hours to find root cause after alert fires | Centralized logs with trace IDs; distributed tracing; dashboards with correlated metrics |
| High MTTR from slow rollback | Rollback takes 30+ minutes and is risky | Automated rollback triggers; deploy with feature flags (turn off, no deploy needed); blue-green deployments |
Improve MTTR: Invest in Observability Before Incidents
MTTR is dominated by time to detect + time to diagnose + time to remediate. Most improvements come from reducing diagnosis time (observable systems with centralized logs, traces, and dashboards) and reducing remediation time (automated rollbacks, feature flags). Detection is improved by SLO-based alerting that fires on user-impacting issues, not just CPU thresholds.
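The decomposition above (detect + diagnose + remediate) can be made concrete by instrumenting incident records with phase timestamps. This is a sketch with hypothetical incident data; real timestamps would come from your alerting and incident-management tools.

```python
# Sketch: decompose MTTR into detect + diagnose + remediate phases
# to see which phase dominates and therefore where to invest first.
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M"

def minutes(a: str, b: str) -> float:
    return (datetime.strptime(b, FMT) - datetime.strptime(a, FMT)).total_seconds() / 60

incidents = [
    {"start": "2024-03-01T10:00",      # failure begins
     "alerted": "2024-03-01T10:40",    # alert fires (detection)
     "diagnosed": "2024-03-01T12:10",  # root cause found (diagnosis)
     "resolved": "2024-03-01T12:25"},  # rollback/fix applied (remediation)
]

for inc in incidents:
    detect = minutes(inc["start"], inc["alerted"])
    diagnose = minutes(inc["alerted"], inc["diagnosed"])
    remediate = minutes(inc["diagnosed"], inc["resolved"])
    print(f"detect={detect:.0f}m diagnose={diagnose:.0f}m remediate={remediate:.0f}m")
    # Here diagnosis (90m) dominates -> invest in tracing and
    # correlated dashboards before anything else.
```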
Change Failure Rate Does Not Mean Zero Changes
A common misreading: "we should reduce deployments to reduce change failure rate." Wrong. Elite teams deploy hundreds of times per day with a 0-15% CFR. Reducing deployment frequency does not reduce failure rate — it increases the blast radius of each deployment (more changes bundled together = higher risk per deployment). The fix for CFR is better testing and deployment practices, not less deploying.
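The blast-radius argument can be made with simple probability. Assuming (hypothetically) that each independent change carries a 2% chance of causing an incident:

```python
# Why batching changes does not lower change failure rate.
# Assumption: each independent change has a 2% chance of causing
# an incident, and failures are independent.
p_change_fails = 0.02
changes = 20

# Deploy each change alone: 20 deploys, each ~2% risky.
per_deploy_small = p_change_fails

# Bundle all 20 into one deploy: probability at least one change fails.
per_deploy_big = 1 - (1 - p_change_fails) ** changes

print(f"small batches: {per_deploy_small:.1%} failure risk per deploy")
print(f"one big batch: {per_deploy_big:.1%} failure risk for that deploy")
# ~2% vs ~33%: same changes, but the bundled deploy is far likelier
# to fail and far harder to debug when it does.
```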
Before optimizing, you need to know where time is actually being spent. Value stream mapping is the practice of documenting every step in your delivery process — from "idea approved" to "running in production" — with actual timing data.
1. Map the current state: Document every step: developer writes code → opens PR → waits for CI → peer review → approval → merge → CI/CD pipeline → deploy to staging → manual testing → deploy to production. For each step, measure process time (how long the work actually takes) and queue time (how long it waits before starting).
2. Identify the constraint: In most teams, 80-90% of lead time is queue time, not process time. Common bottlenecks: PR waiting for reviewer (hours to days), manual QA queue (days), deployment window only on Tuesdays.
3. Improve the constraint: The Theory of Constraints says: improving anything other than the bottleneck does not improve throughput. If PRs wait 2 days for review, speeding up CI from 10 minutes to 5 minutes does nothing for lead time.
4. Measure, repeat: After improving the bottleneck, a new bottleneck emerges elsewhere. Continuous improvement is iterative.
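The mapping exercise above can be sketched as a small calculation. Step names and timings here are hypothetical; the point is that queue time, not process time, dominates lead time.

```python
# Sketch: a toy value-stream map showing process vs queue time.
steps = [
    # (step, process_hours, queue_hours)
    ("write code",        6.0, 0.0),
    ("CI run",            0.2, 0.1),
    ("peer review",       0.5, 40.0),  # PR sits waiting for a reviewer
    ("deploy to staging", 0.2, 2.0),
    ("manual QA",         2.0, 70.0),  # waits in the QA queue
    ("deploy to prod",    0.2, 8.0),   # waits for the deploy window
]

process = sum(p for _, p, _ in steps)
queue = sum(q for _, _, q in steps)
total = process + queue
print(f"lead time: {total:.1f}h (process {process:.1f}h, queue {queue:.1f}h)")
print(f"queue share: {queue / total:.0%}")
# Most of the lead time is waiting, so speeding up CI (0.2h of process
# time) cannot meaningfully move the total -- attack the queues first.
```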
The Five Ideals from The Unicorn Project
Gene Kim's "The Unicorn Project" describes five ideals for high-performing engineering organizations:
1. Locality and Simplicity — systems should be loosely coupled so teams can work independently.
2. Focus, Flow, and Joy — minimize interruptions; create flow states for developers.
3. Improvement of Daily Work — fix problems in the system, not just workarounds.
4. Psychological Safety — people speak up about problems without fear.
5. Customer Focus — every engineering decision ties back to customer outcomes.
DORA metrics are engineering metrics. Executives often need to see them translated into business outcomes. High-performing engineering organizations are not just technically better — they deliver better business results.
How DORA metrics connect to business outcomes
Vanity Metrics vs DORA Metrics
"We have 99.9% uptime" sounds impressive but tells you nothing about software delivery performance. A company with 99.9% uptime might deploy once per quarter with a 2-week lead time. Their uptime number hides that they are 100x slower than competitors. DORA metrics are not vanity metrics — they directly measure the capability of your delivery system and predict future performance.
How to Start Measuring DORA Metrics
Deployment frequency: count production deployments in your CI/CD system (GitHub Actions, Jenkins, Argo CD sync events). Lead time: timestamp when a commit is created (git commit time) and when it reaches production (deployment timestamp); report the median of (deploy time - commit time). Change failure rate: deployments that result in an incident or rollback, divided by total deployments. MTTR: average of (incident resolved time - incident start time) across all incidents. Tools such as LinearB, Sleuth, and Faros AI automate this measurement.
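The measurement recipe above fits in a few lines of code. This is a sketch: the deploy and incident records are hypothetical, and in practice they would be pulled from your CI/CD system and incident tracker.

```python
# Sketch: compute the four DORA metrics from deploy/incident records.
from datetime import datetime
from statistics import mean, median

FMT = "%Y-%m-%dT%H:%M"

def hours(a: str, b: str) -> float:
    return (datetime.strptime(b, FMT) - datetime.strptime(a, FMT)).total_seconds() / 3600

deploys = [  # hypothetical records from a CI/CD system
    {"commit": "2024-03-01T09:00", "deployed": "2024-03-01T10:00", "failed": False},
    {"commit": "2024-03-01T13:00", "deployed": "2024-03-01T13:45", "failed": True},
    {"commit": "2024-03-02T09:30", "deployed": "2024-03-02T10:00", "failed": False},
    {"commit": "2024-03-02T15:00", "deployed": "2024-03-02T15:30", "failed": False},
]
incidents = [  # hypothetical records from an incident tracker
    {"start": "2024-03-01T13:50", "resolved": "2024-03-01T14:30"},
]

days_observed = 2
deploy_frequency = len(deploys) / days_observed                        # per day
lead_time_h = median(hours(d["commit"], d["deployed"]) for d in deploys)
change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)
mttr_h = mean(hours(i["start"], i["resolved"]) for i in incidents)

print(f"deploys/day={deploy_frequency}, lead_time={lead_time_h:.2f}h, "
      f"CFR={change_failure_rate:.0%}, MTTR={mttr_h:.2f}h")
```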
DORA metrics are expected knowledge for DevOps, SRE, and engineering leadership roles. System design rounds may ask how you would measure a CI/CD system's performance. Behavioral questions may probe how you have improved delivery metrics at a previous company.
What interviewers look for:
Strong answer: Knowing all four metrics with elite/low benchmarks. Explaining that elite performers are both faster AND more stable (disproving the speed-stability trade-off myth). Being able to identify specific bottlenecks in a delivery pipeline using value stream mapping. Connecting DORA metrics to business outcomes (time to market, revenue per incident). Knowing the Five Ideals from The Unicorn Project.
Red flags: Not knowing the four DORA metrics by name. Claiming that deploying less frequently improves stability. Confusing MTTR with MTTD (mean time to detect). Not knowing the difference between elite and low performer benchmarks. Using uptime as the primary delivery metric.
Key takeaways
💡 Analogy
DORA metrics are like vital signs for a hospital. Pulse rate (deployment frequency) — is the patient actively producing output? Blood pressure (lead time) — is there healthy flow through the system? Temperature (change failure rate) — is there infection/inflammation in the system? Oxygen saturation (MTTR) — when something goes wrong, how quickly can the system recover? A healthy patient has all four in normal range. Abnormal readings in any one signal a specific type of problem — you do not treat them all the same way.
⚡ Core Idea
Four metrics capture the full picture of software delivery performance: speed (deployment frequency, lead time) and stability (change failure rate, MTTR). Elite organizations are faster AND more stable — they achieve both by automating testing, reducing batch sizes, and investing in observability. Speed and stability are not in tension; they are achieved together.
🎯 Why It Matters
Without metrics, DevOps improvement efforts are guesswork. "We should deploy more frequently" is not actionable. "Our lead time is 3 weeks, driven primarily by a 2-day PR review queue and a 3-day manual QA queue — we will eliminate manual QA for services with >80% test coverage" is actionable. DORA metrics make the engineering system visible.