How Netflix Streams to 200 Million Devices Without Breaking
The architecture behind Netflix's global streaming platform, from scratch: Open Connect CDN appliances living inside your ISP, adaptive bitrate that adjusts mid-scene, 1,000+ microservices, and chaos engineering that breaks production on purpose.
It's 9 PM. Across one timezone, millions of people press play within the same five minutes. Each one wants a different show, at a different resolution, on a different device, over a different network, some on fibre, some on a phone with two bars. Nobody wants to wait. Nobody tolerates a spinner. This is the problem Netflix solved, and almost nothing off-the-shelf came close.
Netflix serves 200 million+ subscribers across 190 countries. At peak hours they handle over 15 million concurrent streams, each with its own bitrate, codec, and resolution. The architecture had to be built from the ground up, and the patterns they invented along the way are now the patterns you'll be asked about in senior interviews.
Who this is for
Engineers who already understand the basics (what a CDN is, what a container does) and want to see those pieces assembled into a planet-scale system. If you've shipped a service to one region and wondered "how does this work at 1000x?", this is for you. No Netflix-internal knowledge required; everything here is from their public engineering blog.
What "streaming at scale" actually means
Streaming at scale is less about playing video and more about putting the right bytes as close to the viewer as physically possible, before they ask for them.
The naive design (one big origin server that streams video to everyone) collapses instantly. The public internet is congested, lossy, and far away. The further the bytes travel, the more chances for a stall. So the real game is distance: shrink the gap between the viewer and the video, and most of your problems disappear.
| Analogy | Netflix equivalent |
|---|---|
| A central warehouse shipping every order cross-country | A single origin server streaming to the whole world |
| Local corner shops stocked overnight with what the neighbourhood buys | Open Connect appliances pre-loaded with tomorrow's popular titles |
| Restocking shelves at 4 AM before the shop opens | Proactive caching during off-peak hours |
| A delivery driver who picks the closest shop with your item | Client steering: routing your play request to the nearest healthy appliance |

Why Netflix doesn't stream from one place.
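Distance has a hard physical floor. Here is a back-of-the-envelope sketch. It assumes light in optical fibre travels at roughly 200,000 km/s and ignores every other source of delay, all of which only add to the total. The two distances are hypothetical.

```python
# Light in optical fibre covers roughly 200 km per millisecond (about 2/3 of c).
# Real routes are longer than straight lines and add queuing and processing
# delay, so these figures are lower bounds, not predictions.
FIBRE_KM_PER_MS = 200.0


def min_round_trip_ms(one_way_km: float) -> float:
    """Best-case round-trip time over fibre for a given one-way distance."""
    return 2 * one_way_km / FIBRE_KM_PER_MS


if __name__ == "__main__":
    # Hypothetical distances for illustration.
    for label, km in [("distant origin", 5000), ("appliance inside your ISP", 50)]:
        print(f"{label}: at least {min_round_trip_ms(km):.1f} ms per round trip")
```

Every chunk request pays at least one round trip. TCP also grows its sending rate once per round trip, so a path a hundred times shorter reaches full speed sooner, too.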
What happens when you press play
Before the concepts, the picture. Here's the path a single play request takes, from the remote in your hand to the bytes hitting your screen. The solid line is the happy path; the dashed branch is everything watching that path to make sure it stays healthy.
One play request: control plane on AWS decides, data plane (Open Connect) delivers.
1. **You press play.** The Netflix client calls the control plane running on AWS, not to fetch video, but to ask permission and get directions.
2. **The control plane decides.** Playback services check your subscription, your device's supported codecs, and your DRM license. Recommendations and metadata come from the data store.
3. **You get a steering decision.** The control plane returns a list of the best Open Connect appliances for you right now, usually including one physically inside your ISP's network.
4. **Video comes from the edge, not AWS.** Your client fetches the actual video chunks directly from that appliance. When the appliance sits inside your ISP, the bytes never touch the public internet backbone.
5. **Quality adjusts in real time.** As you watch, the client measures bandwidth and silently swaps to higher- or lower-quality chunks. You never see the seams.
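Step 3 is where the control plane earns its keep. Below is a minimal sketch of a steering decision. It assumes a hypothetical `Appliance` record with only four signals: health, whether the appliance sits inside the viewer's ISP, measured latency, and whether it already holds the title. A production steering service would weigh more inputs than these.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Appliance:
    # Hypothetical fields; real steering considers many more signals.
    name: str
    healthy: bool
    inside_client_isp: bool
    latency_ms: float
    has_title: bool


def steer(appliances: List[Appliance], max_results: int = 3) -> List[str]:
    """Rank appliances for one play request, best first.

    Unhealthy appliances and those without the title are dropped.
    In-ISP appliances rank ahead of the rest; ties go to lower latency.
    """
    candidates = [a for a in appliances if a.healthy and a.has_title]
    candidates.sort(key=lambda a: (not a.inside_client_isp, a.latency_ms))
    return [a.name for a in candidates[:max_results]]


if __name__ == "__main__":
    fleet = [
        Appliance("ixp-frankfurt-1", True, False, 9.0, True),
        Appliance("isp-berlin-2", True, True, 2.5, True),
        Appliance("isp-berlin-1", False, True, 1.8, True),  # down: skipped
        Appliance("isp-berlin-3", True, True, 3.1, False),  # title not cached
    ]
    print(steer(fleet))  # ['isp-berlin-2', 'ixp-frankfurt-1']
```

Returning a ranked list rather than a single host lets the client fall back to the next appliance without another round trip to the control plane.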
Pro tip
Notice the split: the **control plane** (decisions, on AWS) is separate from the **data plane** (video bytes, on Open Connect). Separating "who decides" from "who delivers" is one of the most reusable ideas in the whole architecture.
Open Connect: Netflix's own CDN
Most companies rent a CDN such as Cloudflare, Amazon CloudFront, or Akamai. Netflix built their own, called Open Connect. They ship physical servers, Open Connect Appliances (OCAs), and place them directly inside ISPs and internet exchange points around the world. When you hit play, your stream often comes from a box literally in the same building as your internet provider's router.
The trick that makes this work is proactive caching. Netflix predicts what a region will watch tomorrow (based on what's trending, what just dropped, and what people in that area binge) and pushes that content to local appliances during off-peak hours, when bandwidth is cheap and idle. By the time you want it, the show is already next door.
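At heart, overnight fill is a packing problem: a fixed amount of disk, more titles than will fit, and a forecast of what the neighbourhood will watch. Below is a minimal sketch, assuming hypothetical per-title demand forecasts and sizes. It greedily stocks the titles with the most predicted plays per gigabyte. This illustrates the idea only; it is not Netflix's published placement method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Title:
    # Hypothetical inputs: a regional demand forecast and the on-disk size
    # of all encodes for this title.
    name: str
    predicted_plays: int
    size_gb: float


def plan_overnight_fill(titles: List[Title], capacity_gb: float) -> List[str]:
    """Choose titles to pre-position, highest predicted plays per GB first."""
    ranked = sorted(titles, key=lambda t: t.predicted_plays / t.size_gb, reverse=True)
    chosen: List[str] = []
    used = 0.0
    for t in ranked:
        if used + t.size_gb <= capacity_gb:  # skip anything that won't fit
            chosen.append(t.name)
            used += t.size_gb
    return chosen


if __name__ == "__main__":
    catalogue = [
        Title("new-release-drama", predicted_plays=90_000, size_gb=60),
        Title("trending-docuseries", predicted_plays=40_000, size_gb=20),
        Title("old-catalogue-film", predicted_plays=500, size_gb=15),
        Title("local-favourite-sitcom", predicted_plays=25_000, size_gb=30),
    ]
    print(plan_overnight_fill(catalogue, capacity_gb=100))
    # ['trending-docuseries', 'new-release-drama', 'old-catalogue-film']
```

Greedy-by-density is a standard knapsack heuristic. It is fast and usually close to optimal, but not guaranteed optimal. Here the sitcom no longer fits once the top two titles are in, so the leftover space goes to the small film instead.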
| Dimension | Public-internet CDN | Open Connect |
|---|---|---|
| Where it lives | Shared edge POPs on the public internet | Appliances inside ISPs and IXPs |
| How content arrives | Cached on first request (cache miss = slow) | Pre-positioned overnight before anyone asks |
| Backbone traffic | Travels the congested public internet | Stays off the public backbone entirely |
| Cost model | Pay per GB egress, forever | Capital cost upfront, near-zero marginal egress |
| Control | Vendor's roadmap and tuning | Full control of hardware, firmware, steering |

Renting a public CDN vs. running Open Connect.
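The "cached on first request" column is the familiar pull-through pattern. Below is a minimal sketch. It assumes a hypothetical `fetch_from_origin` callable standing in for the slow trip back to the origin. The `prefill` method stands in for Open Connect-style pre-positioning, which exists so the first viewer never pays for that miss.

```python
import time
from typing import Callable, Dict, Tuple


class PullThroughCache:
    """Edge cache: serve hits locally, fetch misses from origin, expire after ttl_s."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin  # hypothetical slow origin call
        self._ttl = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, bytes]] = {}  # key -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: the viewer waits on a round trip to origin.
        self.misses += 1
        body = self._fetch(key)
        self._store[key] = (now + self._ttl, body)
        return body

    def prefill(self, key: str, body: bytes) -> None:
        """Pre-position content so the first request is a hit, not a miss."""
        self._store[key] = (self._clock() + self._ttl, body)


if __name__ == "__main__":
    cache = PullThroughCache(lambda key: f"bytes of {key}".encode(), ttl_s=300)
    cache.get("ep1/chunk-0001")  # miss: fetched from origin
    cache.get("ep1/chunk-0001")  # hit: served from the edge
    cache.prefill("ep2/chunk-0001", b"pre-positioned overnight")
    cache.get("ep2/chunk-0001")  # hit on the very first request
    print(cache.hits, cache.misses)  # 2 1
```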
Note
Netflix has **17,000+ Open Connect Appliances deployed in 158 countries**. This is a big part of why Netflix buffering is rare even when your connection is mediocre: the bytes barely had to travel.
Adaptive bitrate streaming (ABR)
Your Netflix stream isn't one video file. It's hundreds of pre-encoded chunks at different quality levels, from 235 kbps (mobile, bad signal) up to 16 Mbps (4K HDR). The client measures your available bandwidth every few seconds and switches quality levels mid-stream. You never notice because every quality level is cut at the same time-aligned segment boundaries, so a chunk from one level can follow a chunk from another without a gap or a jump.
Netflix doesn't encode every title the same way, either. Their Dynamic Optimizer uses per-title and per-scene analysis to allocate bits where the eye needs them: a slow, dark dialogue scene gets fewer bits than a fast action sequence at the same perceived quality. The result is the same visual quality at a lower average bitrate, which means fewer stalls on weak networks and less bandwidth everywhere.
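The real Dynamic Optimizer is considerably more involved. This sketch swaps in a deliberately simple proportional rule with hypothetical complexity scores, to show how a fixed average bitrate can be redistributed between scenes.

```python
from typing import List


def allocate_scene_bitrates(durations_s: List[float], complexity: List[float],
                            avg_kbps: float) -> List[float]:
    """Split a title's bit budget across scenes in proportion to complexity.

    Scene i receives a share of the total bits proportional to
    duration_i * complexity_i, so the title's overall average bitrate is
    still avg_kbps; complex scenes get a higher rate, simple ones a lower one.
    """
    if not durations_s or len(durations_s) != len(complexity):
        raise ValueError("need one complexity score per scene")
    total_kbits = avg_kbps * sum(durations_s)
    weights = [d * c for d, c in zip(durations_s, complexity)]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("complexity scores must be positive")
    # kbits assigned to each scene, divided by its duration, gives its kbps.
    return [total_kbits * w / total_weight / d for w, d in zip(weights, durations_s)]


if __name__ == "__main__":
    # Hypothetical title: 60 s of dark dialogue, then 30 s of fast action.
    print(allocate_scene_bitrates([60, 30], [1.0, 3.0], avg_kbps=3000))
    # [1800.0, 5400.0]
```

The total bits, and therefore the file size and the average bitrate, are unchanged. Only their placement moves: the action scene gets three times the rate of the dialogue scene.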
Encode once, serve many: a ladder of bitrates is generated ahead of time, not on the fly.
Measure constantly: the client tracks throughput and buffer health every few seconds.
Switch invisibly: quality changes happen at chunk boundaries, so playback never pauses.
Optimize per scene: Dynamic Optimizer spends bits where they're visible and saves them where they aren't.
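The measure-and-switch loop can be sketched with a simple throughput rule. This is a generic textbook approach (smoothed throughput plus a buffer guard), not Netflix's production algorithm. The intermediate ladder rungs, the 0.8 safety factor, the 5 s low-buffer threshold, and the smoothing weight are all assumptions.

```python
from typing import List, Optional

# Hypothetical ladder in kbps; only the 235 kbps and 16 Mbps ends come from the text.
LADDER_KBPS: List[int] = [235, 560, 1050, 1750, 3000, 5800, 8000, 16000]


class ThroughputEstimator:
    """Exponentially weighted moving average (EWMA) of per-chunk throughput."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight on the newest sample (assumed value)
        self.estimate_kbps: Optional[float] = None

    def add_sample(self, chunk_bits: float, download_s: float) -> float:
        sample_kbps = chunk_bits / download_s / 1000
        if self.estimate_kbps is None:
            self.estimate_kbps = sample_kbps
        else:
            self.estimate_kbps = (self.alpha * sample_kbps
                                  + (1 - self.alpha) * self.estimate_kbps)
        return self.estimate_kbps


def pick_rendition(estimate_kbps: float, buffer_s: float,
                   safety: float = 0.8, low_buffer_s: float = 5.0) -> int:
    """Choose the bitrate (kbps) for the next chunk.

    Take the highest rung that fits under safety * estimate, but drop to the
    lowest rung when the buffer is nearly empty, since a stall is worse than
    a temporarily softer picture.
    """
    if buffer_s < low_buffer_s:
        return LADDER_KBPS[0]
    budget = estimate_kbps * safety
    fitting = [rung for rung in LADDER_KBPS if rung <= budget]
    return fitting[-1] if fitting else LADDER_KBPS[0]


if __name__ == "__main__":
    est = ThroughputEstimator()
    # A 4 s chunk encoded at 3000 kbps is 12,000,000 bits.
    fast = est.add_sample(12_000_000, download_s=2.0)  # 6000 kbps sample
    print(pick_rendition(fast, buffer_s=20))  # 3000
    slow = est.add_sample(12_000_000, download_s=8.0)  # 1500 kbps sample
    print(pick_rendition(slow, buffer_s=3))  # 235: buffer nearly empty
```

Quality changes happen at chunk boundaries, so the choice returned here only has to apply to the next chunk request.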
Microservices at 1,000+ services
Netflix's backend is the textbook example of microservice architecture: over 1,000 services, each owning one function (recommendations, search, profiles, payments, subtitle delivery, playback authorization). Each is independently deployable, independently scalable, and independently monitored. This is how Netflix pushes code hundreds of times a day without downtime.
It isn't free. A thousand services means a thousand things that can fail, and a network hop on every call between them. Netflix had to solve problems most teams never face: service discovery, distributed tracing, circuit breaking, and, most famously, engineering for constant, expected failure.
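Circuit breaking is the easiest of these to show in miniature. Netflix open-sourced Hystrix for this job; it is now in maintenance mode. The sketch below is a generic, single-threaded breaker, not Hystrix's API, and the failure threshold and cool-down are assumed values.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency while the breaker is open."""


class CircuitBreaker:
    """Single-threaded breaker: closed -> open -> half-open -> closed.

    Opens after failure_threshold consecutive failures, fails fast while open,
    and lets a trial call through once reset_after_s has passed.
    """

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after_s:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call while half-open re-opens the breaker at once.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10)

    def flaky() -> str:
        raise TimeoutError("recommendations timed out")

    for _ in range(3):
        try:
            breaker.call(flaky)
        except TimeoutError:
            print("call failed")
        except CircuitOpenError:
            print("skipped: breaker open")
```

Failing fast matters as much as noticing the failure. An open breaker stops requests piling up behind a dependency that is already struggling.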
| Concern | Monolith | Microservices |
|---|---|---|
| Deploy | Whole app at once, coordinated | Each service ships on its own schedule |
| Scaling | Scale the entire app together | Scale only the hot service (e.g. playback) |
| Blast radius | One bug can take down everything | A failure can be contained to one service |
| Team ownership | Many teams touch one codebase | One team owns one service end-to-end |
| Hard parts | Coordination, slow releases | Service discovery, tracing, network failure |

Why Netflix moved off the monolith.
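The "scale only the hot service" row is what an autoscaler does, one service at a time. The Kubernetes Horizontal Pod Autoscaler is one widely used implementation. Its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that rule per service with hypothetical names and CPU figures.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """HPA-style rule: ceil(current_replicas * current_metric / target_metric).

    Omits the real HPA's tolerance band, scale-down stabilization window,
    and min/max replica bounds.
    """
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # Hypothetical 9 PM snapshot: (replicas, average CPU %, target CPU %).
    services = {
        "playback": (40, 90.0, 60.0),
        "profiles": (10, 30.0, 60.0),
        "search": (12, 55.0, 60.0),
    }
    for name, (replicas, cpu, target) in services.items():
        print(f"{name}: {replicas} -> {desired_replicas(replicas, cpu, target)} replicas")
```

Only playback grows, from 40 to 60 replicas. A monolith would have to add capacity for profiles and search, too, just to give playback room.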
Pro tip
Microservices are a trade, not an upgrade. You swap a deployment problem for a distributed-systems problem. Netflix made that trade because the deployment problem was killing them at their scale; most teams aren't there yet.
Chaos engineering: break things on purpose
Netflix pioneered chaos engineering. They built Chaos Monkey, a tool that randomly terminates production instances during business hours, on purpose, while engineers are awake to watch. The philosophy is blunt: if your system can't survive random failure in a controlled experiment, it will absolutely fail in an uncontrolled one at 3 AM.
Chaos Monkey grew into the Simian Army, a suite that simulates entire region failures (Chaos Kong), injects latency, and probes for misconfiguration. The point isn't destruction; it's evidence. Every experiment either proves the system degrades gracefully or hands you a bug before a customer finds it. Resilience stops being a hope and becomes a measured property.
Start with a steady state: a metric that means "healthy" (e.g. successful plays per second).
Hypothesize: "if we kill this service, steady state holds."
Inject the failure in production, on a small blast radius, during working hours.
Measure: did steady state hold, or did you just find a weakness to fix?
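You can rehearse the four steps in a toy simulation before anything touches production. This minimal sketch assumes a hypothetical pool of identical instances behind a load balancer that retries a failed attempt on another randomly chosen instance. The steady-state metric is the fraction of requests served, and the 0.98 threshold is an assumed target.

```python
import random
from typing import List


def success_rate(alive: List[bool], requests: int, rng: random.Random,
                 retries: int) -> float:
    """Fraction of requests served when the balancer retries failed attempts.

    Each attempt goes to a randomly chosen instance, so a retry can land on
    the dead instance again.
    """
    served = 0
    for _ in range(requests):
        for _attempt in range(1 + retries):
            if alive[rng.randrange(len(alive))]:
                served += 1
                break
    return served / requests


def run_experiment(instances: int = 10, requests: int = 10_000,
                   threshold: float = 0.98, retries: int = 1, seed: int = 7) -> bool:
    rng = random.Random(seed)
    alive = [True] * instances
    # 1. Steady state: measure the healthy baseline.
    baseline = success_rate(alive, requests, rng, retries)
    # 2. Hypothesis: losing one instance keeps success rate >= threshold.
    # 3. Inject: terminate one random instance (a small blast radius).
    alive[rng.randrange(instances)] = False
    # 4. Measure: did steady state hold?
    during = success_rate(alive, requests, rng, retries)
    print(f"baseline={baseline:.4f} during={during:.4f}")
    return during >= threshold


if __name__ == "__main__":
    print("hypothesis held" if run_experiment() else "found a weakness")
```

Run it with `retries=0` and the hypothesis fails. Roughly one request in ten lands on the dead instance, which is exactly the kind of weakness step 4 exists to surface.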
Pro tip
Chaos engineering is now industry standard. **AWS Fault Injection Service (FIS)** is a managed service for it, so you don't have to build your own Simian Army to start. The [Cloud Engineer path](/career-paths/cloud-engineer) covers resilience patterns like this in depth.
Mistakes that cost teams hours (and how Netflix avoids them)
Treating the network as reliable. Every call between services can be slow, fail, or hang. No timeouts and no circuit breakers means one slow dependency stalls the whole request chain. Netflix wraps every remote call in a timeout and a fallback.
Caching nothing at the edge. Sending every request to the origin is the single biggest avoidable source of latency. Even a short-lived edge cache on static assets changes the experience; Open Connect is this idea taken to its extreme.
Shipping a monolith of microservices. Splitting services but deploying them together gives you all the complexity and none of the independence. If two services must ship together, they're really one.
Skipping observability until it's on fire. If you can't measure it, you can't fix it. Build metrics and tracing in from day one, not after the first incident.
Assuming failures are rare. At scale, something is always broken. Design for graceful degradation: a missing recommendation row should never break playback.
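The first and last mistakes share one fix: bound every remote call, and decide in advance what to show without it. Below is a minimal sketch using Python's standard `concurrent.futures`. The `fetch_recommendations` dependency, the 0.2 s budget, and the fallback row are hypothetical.

```python
import concurrent.futures as cf
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared worker pool. A timed-out call keeps running in its worker thread
# (Python threads can't be killed); the caller simply stops waiting for it.
_POOL = cf.ThreadPoolExecutor(max_workers=8)


def call_with_timeout(fn: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Return fn()'s result, or fallback if it errors or exceeds timeout_s."""
    future = _POOL.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except Exception:  # covers cf.TimeoutError and errors raised by fn
        return fallback


def fetch_recommendations() -> list:
    # Hypothetical downstream service that has become slow.
    time.sleep(1.0)
    return ["Because you watched..."]


def build_home_page() -> dict:
    rows = call_with_timeout(fetch_recommendations, timeout_s=0.2,
                             fallback=["Popular right now"])  # hypothetical fallback row
    # Playback doesn't depend on recommendations, so it stays available.
    return {"play_button": "enabled", "rows": rows}


if __name__ == "__main__":
    print(build_home_page())  # rows fall back after ~0.2 s
```

The page renders on time with a generic row instead of a personalised one. The slow dependency degrades one feature rather than the whole request.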
Takeaways
The whole article in seven lines
Streaming at scale is a **distance** problem: get the bytes physically close before they're requested.
**Open Connect** puts Netflix's own CDN appliances inside ISPs and pre-caches content overnight.
Separate the **control plane** (decisions on AWS) from the **data plane** (video on the edge).
**Adaptive bitrate** swaps quality mid-stream; Dynamic Optimizer spends bits only where the eye needs them.
**Microservices** trade a deployment problem for a distributed-systems problem, a trade worth making only at scale.
**Chaos engineering** turns resilience from a hope into a measured, tested property.
You don't need Netflix scale to apply Netflix principles: timeouts, edge caching, and observability pay off on day one.
Where to go next
You don't need 200 million users to use these patterns; you need the fundamentals underneath them. Build the mental model with hands-on labs, then see where they fit in the broader role.
Kubectl lab: practice orchestrating the kind of microservices Netflix runs at scale.
Docker lab: package a service into the portable, immutable unit that makes independent deploys possible.
Networking lab: the routing and latency fundamentals behind CDNs and edge delivery.
Cloud Engineer path: where resilience, observability, and chaos engineering live as a full track.
And read Netflix's own engineering blog: it's one of the best free resources in cloud engineering, and most of what's above came straight from it.
Want to go deeper?
This article covers concepts taught hands-on in the Cloud Engineer and DevOps career paths, with real terminal labs, production scenarios, and structured lessons.