How Netflix Streams to 200 Million Devices Without Breaking

On this page

The scale problem
What "streaming at scale" actually means
What happens when you press play
Open Connect: Netflix's own CDN
Adaptive bitrate streaming (ABR)
Microservices at 1,000+ services
Chaos engineering: break things on purpose
Mistakes that cost teams hours (and how Netflix avoids them)
Takeaways
Where to go next

TL;DR

See how Netflix serves 200 million devices: Open Connect CDN boxes inside your ISP, adaptive bitrate that shifts mid-scene, 1,000+ microservices, and chaos engineering in production. The real lesson is the scaling patterns you can borrow.

The scale problem

It's 9 PM. Across one timezone, millions of people press play within the same five minutes. Each one wants a different show, at a different resolution, on a different device, over a different network: some on fibre, some on a phone with two bars. Nobody wants to wait. Nobody tolerates a spinner. This is the problem Netflix solved, and almost nothing off-the-shelf came close.

Netflix serves 200 million+ subscribers across 190 countries. At peak hours they handle over 15 million concurrent streams, every one a different bitrate, codec, and resolution. The architecture had to be built from the ground up, and the patterns they invented along the way are now the patterns you'll be asked about in senior interviews.

Who this is for

Engineers who already understand the basics (what a CDN is, what a container does) and want to see those pieces assembled into a planet-scale system. If you've shipped a service to one region and wondered "how does this work at 1000x?", this is for you. No Netflix-internal knowledge required; everything here is from their public engineering blog.

What "streaming at scale" actually means

Streaming at scale is less about playing video and more about putting the right bytes as close to the viewer as physically possible, before they ask for them.

The naive design, one big origin server that streams video to everyone, collapses instantly. The public internet is congested, lossy, and far away. The further the bytes travel, the more chances for a stall. So the real game is distance: shrink the gap between the viewer and the video, and most of your problems disappear.
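A quick back-of-the-envelope calculation shows why a single origin cannot carry this load. The 15 million concurrent streams figure is the one quoted above; the 5 Mbps average bitrate and the 100 Gbps link size are illustrative assumptions, not Netflix numbers.

```python
import math

# Back-of-the-envelope: aggregate bandwidth at peak.
CONCURRENT_STREAMS = 15_000_000   # peak figure quoted earlier in the article
AVG_BITRATE_MBPS = 5              # assumption: illustrative average per stream
LINK_GBPS = 100                   # assumption: one fully saturated 100 Gbps link


def peak_demand_tbps(streams: int, avg_mbps: float) -> float:
    """Total demand in terabits per second (1 Tbps = 1,000,000 Mbps)."""
    return streams * avg_mbps / 1_000_000


def links_needed(demand_tbps: float, link_gbps: float) -> int:
    """Number of fully saturated links needed to carry the demand."""
    return math.ceil(demand_tbps * 1000 / link_gbps)


demand = peak_demand_tbps(CONCURRENT_STREAMS, AVG_BITRATE_MBPS)
print(f"{demand:.0f} Tbps, {links_needed(demand, LINK_GBPS)} saturated links")
# prints: 75 Tbps, 750 saturated links
```

Under these assumptions, one location would need hundreds of saturated links before a single byte crosses a congested long-haul path, which is why the design pushes delivery out to many small sites near viewers.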

Everyday analogy	Netflix equivalent
A central warehouse shipping every order cross-country	A single origin server streaming to the whole world
Local corner shops stocked overnight with what the neighbourhood buys	Open Connect appliances pre-loaded with tomorrow's popular titles
Restocking shelves at 4 AM before the shop opens	Proactive caching during off-peak hours
A delivery driver who picks the closest shop with your item	Client steering, routing your play request to the nearest healthy appliance

Why Netflix doesn't stream from one place.

What happens when you press play

Before the concepts, the picture. Here's the path a single play request takes, from the remote in your hand to the bytes hitting your screen. The solid line is the happy path; the dashed branch is everything watching that path to make sure it stays healthy.

One play request: control plane on AWS decides, data plane (Open Connect) delivers.

1. You press play. The Netflix client calls the control plane running on AWS, not to fetch video, but to ask permission and get directions.
2. The control plane decides. Playback services check your subscription, your device's supported codecs, and your DRM license. Recommendations and metadata come from the data store.
3. You get a steering decision. The control plane returns a list of the best Open Connect appliances for you right now, usually one physically inside your ISP's network.
4. Video comes from the edge, not AWS. Your client fetches the actual video chunks directly from that appliance. When the appliance sits inside your ISP, the bytes never cross the public internet backbone.
5. Quality adjusts in real time. As you watch, the client measures bandwidth and silently swaps to higher- or lower-quality chunks. You never see the seams.

Pro tip

Notice the split: the control plane (decisions, on AWS) is separate from the data plane (video bytes, on Open Connect). Separating "who decides" from "who delivers" is one of the most reusable ideas in the whole architecture.
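The control-plane and data-plane split can be sketched in a few lines. This is a minimal illustration, not Netflix's steering logic: the `Appliance` fields, the ranking by round-trip time, and the `reachable` set that stands in for real HTTP requests are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    name: str          # hypothetical identifier
    rtt_ms: float      # measured round-trip time to the viewer
    healthy: bool      # result of the latest health check
    has_title: bool    # whether the requested title is cached there


def steer(appliances: list[Appliance], limit: int = 3) -> list[str]:
    """Control plane: return a ranked list of candidates, nearest first."""
    usable = [a for a in appliances if a.healthy and a.has_title]
    return [a.name for a in sorted(usable, key=lambda a: a.rtt_ms)[:limit]]


def fetch_chunk(ranked: list[str], reachable: set[str]) -> str:
    """Data plane: the client tries each candidate in order until one answers."""
    for name in ranked:
        if name in reachable:   # stand-in for a real HTTP request
            return name
    raise RuntimeError("no appliance reachable")


fleet = [
    Appliance("isp-local", 4.0, True, True),
    Appliance("ixp-regional", 18.0, True, True),
    Appliance("isp-degraded", 3.0, False, True),   # closest, but unhealthy
    Appliance("far-away", 90.0, True, True),
]
ranked = steer(fleet, limit=2)
print(ranked)                                           # ['isp-local', 'ixp-regional']
print(fetch_chunk(ranked, reachable={"ixp-regional"}))  # ixp-regional
```

Returning a ranked list rather than a single address is the useful design choice here: the client can fail over to the next candidate without another round trip to the control plane.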

Open Connect: Netflix's own CDN

Most companies rent a CDN such as Cloudflare, Amazon CloudFront, or Akamai. Netflix built their own, called Open Connect. They ship physical servers, Open Connect Appliances (OCAs), and place them directly inside ISPs and internet exchange points around the world. When you hit play, your stream often comes from a box literally in the same building as your internet provider's router.

The trick that makes this work is proactive caching. Netflix predicts what a region will watch tomorrow (based on what's trending, what just dropped, and what people in that area binge) and pushes that content to local appliances during off-peak hours, when bandwidth is cheap and idle. By the time you want it, the show is already next door.
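One way to picture the overnight fill is as a packing problem: limited disk, many titles, and a forecast of demand. The sketch below uses a simple greedy heuristic (most predicted plays per gigabyte first). The article does not describe Netflix's actual placement algorithm, so the heuristic, the `Title` fields, and the sample numbers are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    name: str
    size_gb: float
    predicted_plays: float   # hypothetical forecast for tomorrow in this region


def plan_fill(catalog: list[Title], capacity_gb: float) -> list[str]:
    """Greedy fill: highest predicted plays per GB first, skipping what no longer fits."""
    chosen, used = [], 0.0
    by_value = sorted(catalog, key=lambda t: t.predicted_plays / t.size_gb, reverse=True)
    for title in by_value:
        if used + title.size_gb <= capacity_gb:
            chosen.append(title.name)
            used += title.size_gb
    return chosen


catalog = [
    Title("new-release", 40.0, 90_000),
    Title("regional-hit", 25.0, 50_000),
    Title("long-tail-doc", 30.0, 1_200),
    Title("classic-series", 60.0, 30_000),
]
print(plan_fill(catalog, capacity_gb=130.0))
# ['new-release', 'regional-hit', 'classic-series']
```

Anything that does not make the cut is still playable; it is simply served from a more distant appliance instead of the one next door.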

Dimension	Public-internet CDN	Open Connect
Where it lives	Shared edge POPs on the public internet	Appliances inside ISPs and IXPs
How content arrives	Cached on first request (cache miss = slow)	Pre-positioned overnight before anyone asks
Backbone traffic	Travels the congested public internet	Stays off the public backbone entirely
Cost model	Pay per GB egress, forever	Capital cost upfront, near-zero marginal egress
Control	Vendor's roadmap and tuning	Full control of hardware, firmware, steering

Renting a public CDN vs. running Open Connect.

Note

Netflix has 17,000+ Open Connect Appliances deployed in 158 countries. This is why Netflix buffering is rare even when your connection is mediocre: the bytes barely had to travel.

Adaptive bitrate streaming (ABR)

Your Netflix stream isn't one video file. It's hundreds of pre-encoded chunks at different quality levels, from 235 kbps (mobile, bad signal) up to 16 Mbps (4K HDR). The client measures your available bandwidth every few seconds and switches quality levels mid-stream. You never notice because every quality level is cut at the same segment boundaries, so the player can move to a different rendition between one chunk and the next.

Netflix doesn't encode every title the same way, either. Their Dynamic Optimizer uses per-title and per-scene analysis to allocate bits where the eye needs them: a slow, dark dialogue scene gets fewer bits than a fast action sequence at the same perceived quality. The result is the same visual quality at a lower average bitrate, which means fewer stalls on weak networks and less bandwidth everywhere.

Encode once, serve many: a ladder of bitrates is generated ahead of time, not on the fly.
Measure constantly: the client tracks throughput and buffer health every few seconds.
Switch invisibly: quality changes happen at chunk boundaries, so playback never pauses.
Optimize per scene: Dynamic Optimizer spends bits where they're visible and saves them where they aren't.
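The measure-and-switch loop above can be sketched as a small selection function. This is a generic throughput-plus-buffer heuristic, not Netflix's client logic: the two endpoints of the ladder come from the figures above, while the middle rungs, the smoothing factor, the safety margin, and the low-buffer threshold are illustrative assumptions.

```python
# Bitrate ladder in kbps. 235 and 16000 are the article's endpoints;
# the rungs in between are illustrative.
LADDER_KBPS = [235, 750, 1750, 3000, 5800, 16000]


def ewma(samples_kbps: list[float], alpha: float = 0.3) -> float:
    """Smooth throughput samples so one noisy measurement does not force a switch."""
    estimate = samples_kbps[0]
    for sample in samples_kbps[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def pick_bitrate(throughput_kbps: float, buffer_s: float,
                 safety: float = 0.8, low_buffer_s: float = 10.0) -> int:
    """Choose the highest rung that fits under a safety-scaled throughput estimate."""
    if buffer_s < low_buffer_s:
        safety *= 0.5          # little buffer left: be more conservative
    budget = throughput_kbps * safety
    fitting = [rung for rung in LADDER_KBPS if rung <= budget]
    return max(fitting) if fitting else LADDER_KBPS[0]


print(pick_bitrate(8000, buffer_s=30))  # 5800
print(pick_bitrate(8000, buffer_s=4))   # 3000
print(pick_bitrate(200, buffer_s=30))   # 235
```

The client would call `pick_bitrate` once per chunk, just before requesting the next one, which is what keeps the switch aligned with a chunk boundary.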

Microservices at 1,000+ services

Netflix's backend is the textbook example of microservice architecture: over 1,000 services, each owning one function, such as recommendations, search, profiles, payments, subtitle delivery, or playback authorization. Each is independently deployable, independently scalable, and independently monitored. This is how Netflix pushes code hundreds of times a day without downtime.

It isn't free. A thousand services means a thousand things that can fail, with a network call wherever one service depends on another. Netflix had to solve problems most teams never face: service discovery, distributed tracing, circuit breaking, and (the one they're most famous for) engineering for constant, expected failure.
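Circuit breaking is the least self-explanatory item in that list, so here is a minimal sketch of the pattern. It is a generic illustration rather than any Netflix library: the class name, the thresholds, and the injectable `clock` parameter (used to make the example testable) are assumptions.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a cooldown."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the remote call entirely until the cooldown has passed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()
            self.opened_at = None   # half-open: let one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0           # any success closes the circuit again
        return result


def broken_service():
    raise ConnectionError("dependency is down")


breaker = CircuitBreaker(max_failures=2, reset_after_s=30.0)
for _ in range(3):
    # The third call never reaches broken_service: the circuit is already open.
    print(breaker.call(broken_service, fallback=lambda: "cached response"))
```

The point of opening the circuit is to stop sending traffic to a dependency that is already struggling, so callers fail fast instead of queueing behind it.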

Concern	Monolith	Microservices
Deploy	Whole app at once, coordinated	Each service ships on its own schedule
Scaling	Scale the entire app together	Scale only the hot service (e.g. playback)
Blast radius	One bug can take down everything	A failure is contained to one service
Team ownership	Many teams touch one codebase	One team owns one service end-to-end
Hard parts	Coordination, slow releases	Service discovery, tracing, network failure

Why Netflix moved off the monolith.

Pro tip

Microservices are a trade, not an upgrade. You swap a deployment problem for a distributed-systems problem. Netflix made that trade because the deployment problem was killing them at their scale. Most teams aren't there yet.

Chaos engineering: break things on purpose

Netflix pioneered chaos engineering. They built Chaos Monkey, a tool that randomly terminates production instances during business hours, on purpose, while engineers are awake to watch. The philosophy is blunt: if your system can't survive random failure in a controlled experiment, it will absolutely fail in an uncontrolled one at 3 AM.

Chaos Monkey grew into the Simian Army, a suite that simulates entire region failures (Chaos Kong), injects latency, and probes for misconfiguration. The point isn't destruction; it's evidence. Every experiment either proves the system degrades gracefully or hands you a bug before a customer finds it. Resilience stops being a hope and becomes a measured property.

1. Start with a steady state: a metric that means "healthy" (e.g. successful plays per second).
2. Hypothesize: "if we kill this service, steady state holds."
3. Inject the failure in production, on a small blast radius, during working hours.
4. Measure: did steady state hold, or did you just find a weakness to fix?
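The four steps above can be sketched as a toy experiment against a simulated pool of replicas. Nothing here touches real infrastructure: the replica pool, the retry-once client, and the 99% success threshold are assumptions chosen to keep the example self-contained.

```python
import random


def run_experiment(replicas: int, kill: int, requests: int = 1000,
                   threshold: float = 0.99, seed: int = 0) -> dict:
    """Toy chaos experiment: kill some replicas and check whether steady state holds."""
    rng = random.Random(seed)
    alive = [True] * replicas

    def success_rate() -> float:
        ok = 0
        for _ in range(requests):
            # The simulated client tries one replica, then retries once on another.
            first, second = rng.sample(range(replicas), 2)
            if alive[first] or alive[second]:
                ok += 1
        return ok / requests

    baseline = success_rate()                     # step 1: steady state
    # Step 2, the hypothesis: success rate stays >= threshold after the kill.
    for index in rng.sample(range(replicas), kill):
        alive[index] = False                      # step 3: inject the failure
    during = success_rate()                       # step 4: measure
    return {"baseline": baseline, "during": during, "held": during >= threshold}


print(run_experiment(replicas=10, kill=1)["held"])  # True
print(run_experiment(replicas=10, kill=5)["held"])  # False
```

Losing one replica out of ten is absorbed by the retry, so the hypothesis holds. Losing five is not, and the experiment surfaces that weakness before a real outage does.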

Pro tip

Chaos engineering is now industry standard. AWS Fault Injection Service (FIS) is one managed option, so you don't have to build your own Simian Army to start. The Cloud Engineer path covers resilience patterns like this in depth.

Mistakes that cost teams hours (and how Netflix avoids them)

1. Treating the network as reliable. Every call between services can be slow, fail, or hang. No timeouts and no circuit breakers means one slow dependency stalls the whole request chain. Netflix wraps every remote call in a timeout and a fallback.
2. Caching nothing at the edge. Sending every request to the origin is the single biggest avoidable source of latency. Even a short edge cache on static assets changes the experience. Open Connect is this idea taken to its extreme.
3. Shipping a monolith of microservices. Splitting services but deploying them together gives you all the complexity and none of the independence. If two services must ship together, they're really one.
4. Skipping observability until it's on fire. If you can't measure it, you can't fix it. Build metrics and tracing in from day one, not after the first incident.
5. Assuming failures are rare. At scale, something is always broken. Design for graceful degradation: a missing recommendation row should never break playback.
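The first and last items combine naturally: put a timeout on the remote call, and fall back to something harmless when it trips. The sketch below uses Python's standard `concurrent.futures`; the function names, the 50 ms budget, and the empty-list fallback are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s: float, fallback):
    """Run func in a worker thread; return fallback if it is slow or raises."""
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()   # best effort; an already running thread is not interrupted
        return fallback
    except Exception:
        return fallback


def slow_recommendations():
    time.sleep(0.5)       # simulates a dependency that has become slow
    return ["row-1", "row-2"]


def build_home_page() -> dict:
    # Recommendations are optional: degrade to an empty row instead of failing.
    recs = call_with_timeout(slow_recommendations, timeout_s=0.05, fallback=[])
    return {"playback_enabled": True, "recommendations": recs}


print(build_home_page())
# {'playback_enabled': True, 'recommendations': []}
```

The page loses a row of recommendations but playback stays available, which is the behaviour graceful degradation is meant to guarantee.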

Takeaways

The whole article in seven lines

Streaming at scale is a distance problem: get the bytes physically close before they're requested.
Open Connect puts Netflix's own CDN appliances inside ISPs and pre-caches content overnight.
Separate the control plane (decisions on AWS) from the data plane (video on the edge).
Adaptive bitrate swaps quality mid-stream; Dynamic Optimizer spends bits only where the eye needs them.
Microservices trade a deployment problem for a distributed-systems problem, a trade worth making only at scale.
Chaos engineering turns resilience from a hope into a measured, tested property.
You don't need Netflix scale to apply Netflix principles: timeouts, edge caching, and observability pay off on day one.

Where to go next

You don't need 200 million users to use these patterns; you need the fundamentals underneath them. Build the mental model with hands-on labs, then see where they fit in the broader role.

Kubectl lab: practice orchestrating the kind of microservices Netflix runs at scale.
Docker lab: package a service into the portable, immutable unit that makes independent deploys possible.
Networking lab: the routing and latency fundamentals behind CDNs and edge delivery.
Cloud Engineer path: where resilience, observability, and chaos engineering live as a full track.

And read Netflix's own engineering blog. It's one of the best free resources in cloud engineering, and most of what's above came straight from it.

Check your understanding

1. What is the core idea behind Netflix's Open Connect CDN?

2. What does the article mean by separating the control plane from the data plane?

Frequently asked questions

Why did Netflix build its own CDN instead of renting one like Cloudflare or Akamai?

Renting a CDN still leaves the video bytes traveling across a congested public internet, which causes stalls. Netflix built Open Connect so it could ship physical appliances (OCAs) directly inside ISPs and internet exchange points, putting the video in the same building as your provider's router and shrinking the distance the bytes have to travel.

What is the difference between the control plane and the data plane in Netflix's architecture?

The control plane handles decisions and runs on AWS, while the data plane delivers the actual video bytes and runs on Open Connect. Separating who decides from who delivers is one of the most reusable ideas in the whole design.

How does Netflix have the right video cached before you even press play?

It uses proactive caching: Netflix predicts what a region will watch based on what is trending and what just dropped, then pushes those files to the local appliances ahead of demand. The goal is to put the right bytes as close to the viewer as physically possible before they ask for them.

What does adaptive bitrate streaming actually do during playback?

Adaptive bitrate (ABR) adjusts the stream's quality mid-scene to match your current device and network conditions. That is how the same title can serve millions of concurrent viewers at different bitrates, codecs, and resolutions without forcing anyone to wait or stare at a spinner.

