Architecture · 12 min read · May 2026

How Netflix Streams to 200 Million Devices Without Breaking

The architecture behind Netflix's global streaming platform, explained from scratch: Open Connect CDN appliances living inside your ISP, adaptive bitrate that adjusts mid-scene, 1,000+ microservices, and chaos engineering that breaks production on purpose.

AWS · CDN · Microservices · Resilience

Sri Balaji

Founder · TheSimplifiedTech


The scale problem

It's 9 PM. Across one timezone, millions of people press play within the same five minutes. Each one wants a different show, at a different resolution, on a different device, over a different network: some on fibre, some on a phone with two bars. Nobody wants to wait. Nobody tolerates a spinner. This is the problem Netflix had to solve, and almost nothing off-the-shelf came close.

Netflix serves 200 million+ subscribers across 190 countries. At peak hours they handle over 15 million concurrent streams, each with its own mix of bitrate, codec, and resolution. The architecture had to be built from the ground up, and the patterns they developed along the way are now the patterns you'll be asked about in senior interviews.

Who this is for

Engineers who already understand the basics (what a CDN is, what a container does) and want to see those pieces assembled into a planet-scale system. If you've shipped a service to one region and wondered "how does this work at 1000x?", this is for you. No Netflix-internal knowledge required; everything here is drawn from their public engineering blog.

What "streaming at scale" actually means

Streaming at scale is less about playing video and more about putting the right bytes as close to the viewer as physically possible, before they ask for them.

The naive design, one big origin server that streams video to everyone, collapses instantly. The public internet is congested, lossy, and far away. The further the bytes travel, the more chances for a stall. So the real game is distance: shrink the gap between the viewer and the video, and most of your problems disappear.
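To put rough numbers on distance, here is an idealized back-of-envelope calculation. It assumes light in optical fibre covers about 200,000 km/s, ignores routing detours and queuing, and uses a hypothetical fixed 1 MiB TCP window. Because a single TCP connection can have at most one window of unacknowledged data in flight, its throughput is capped at window / RTT.

```python
FIBRE_KM_PER_S = 200_000  # approximate speed of light in optical fibre (assumption)

def round_trip_ms(distance_km: float) -> float:
    """Idealized, propagation-only round-trip time in milliseconds."""
    return 2 * distance_km / FIBRE_KM_PER_S * 1000

def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP connection: at most one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000

WINDOW = 1 * 1024 * 1024  # hypothetical 1 MiB window

for label, km in [("origin on another continent", 6_000), ("appliance inside your ISP", 50)]:
    rtt = round_trip_ms(km)
    print(f"{label}: RTT ~ {rtt:.1f} ms, cap ~ {max_throughput_mbps(WINDOW, rtt):,.0f} Mbps")
```

The distant origin comes out at about 60 ms round trip versus 0.5 ms for the in-ISP appliance. That 120x gap also scales the throughput ceiling and the time it takes to recover from every lost packet.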

| Analogy | Netflix equivalent |
|---|---|
| A central warehouse shipping every order cross-country | A single origin server streaming to the whole world |
| Local corner shops stocked overnight with what the neighbourhood buys | Open Connect appliances pre-loaded with tomorrow's popular titles |
| Restocking shelves at 4 AM before the shop opens | Proactive caching during off-peak hours |
| A delivery driver who picks the closest shop with your item | Client steering: routing your play request to the nearest healthy appliance |

Why Netflix doesn't stream from one place.

What happens when you press play

Before the concepts, the picture. Here's the path a single play request takes, from the remote in your hand to the bytes hitting your screen. The solid line is the happy path; the dashed branch is everything watching that path to make sure it stays healthy.

[Figure: One play request: control plane on AWS decides, data plane (Open Connect) delivers. Components: Client (TV / phone / browser), Control plane (Netflix on AWS), Microservices (Playback · DRM · Recs), Data store (Cassandra / EVCache), Open Connect (appliance inside your ISP), Observability (Atlas · tracing · chaos). Happy path: 1. authorize + steer, 2. playback decision, 3. metadata + license, 4. nearest appliance URL, 5. fetch video chunks, 6. adaptive bitrate stream. Dashed branch: metrics + traces, chaos injection.]

  1. **You press play.** The Netflix client calls the control plane running on AWS, not to fetch video, but to ask permission and get directions.
  2. **The control plane decides.** Playback services check your subscription, your device's supported codecs, and your DRM license. Recommendations and metadata come from the data store.
  3. **You get a steering decision.** The control plane returns a list of the best Open Connect appliances for you right now, usually including one physically inside your ISP's network.
  4. **Video comes from the edge, not AWS.** Your client fetches the actual video chunks directly from that appliance. When the appliance sits inside your ISP, the bytes never cross the public internet backbone.
  5. **Quality adjusts in real time.** As you watch, the client measures bandwidth and silently swaps to higher- or lower-quality chunks. You never see the seams.

Pro tip

Notice the split: the **control plane** (decisions, on AWS) is separate from the **data plane** (video bytes, on Open Connect). Separating "who decides" from "who delivers" is one of the most reusable ideas in the whole architecture.
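Here is a minimal sketch of that split, assuming a hypothetical `ControlPlane.authorize_and_steer()` that returns a ranked list of appliances and a hypothetical `fetch_chunk()` standing in for the data plane. Neither is a Netflix API. The client makes one control-plane call for permission and directions, then talks only to the edge, falling back down the list when an appliance is unhealthy.

```python
from dataclasses import dataclass, field

@dataclass
class PlaybackDecision:
    """Control-plane answer: permission granted, plus directions to the edge."""
    title_id: str
    appliances: list[str] = field(default_factory=list)  # assumed ranked, best first

class ControlPlane:
    """Hypothetical stand-in for the playback services running on AWS."""
    def __init__(self, subscribers: set[str], steering: dict[str, list[str]]):
        self.subscribers = subscribers
        self.steering = steering  # ISP name -> ranked appliance hostnames

    def authorize_and_steer(self, user: str, isp: str, title_id: str) -> PlaybackDecision:
        if user not in self.subscribers:
            raise PermissionError(f"{user} has no active subscription")
        return PlaybackDecision(title_id, list(self.steering.get(isp, [])))

def fetch_chunk(appliance: str, title_id: str, index: int, healthy: set[str]) -> bytes:
    """Hypothetical data-plane fetch; fails if the appliance is down."""
    if appliance not in healthy:
        raise ConnectionError(f"{appliance} unreachable")
    return f"{appliance}/{title_id}/{index}".encode()

def play(cp: ControlPlane, user: str, isp: str, title_id: str,
         healthy: set[str], chunks: int = 3) -> list[bytes]:
    decision = cp.authorize_and_steer(user, isp, title_id)  # the only control-plane call
    received = []
    for i in range(chunks):                                 # everything else hits the edge
        for appliance in decision.appliances:               # best-ranked first
            try:
                received.append(fetch_chunk(appliance, title_id, i, healthy))
                break
            except ConnectionError:
                continue                                    # fall back down the list
        else:
            raise RuntimeError("no healthy appliance in the steering list")
    return received

cp = ControlPlane({"alice"}, {"isp-a": ["oca-isp-a-1", "oca-ixp-1"]})
# The in-ISP appliance is down, so every chunk falls back to the IXP appliance.
print(play(cp, "alice", "isp-a", "title-42", healthy={"oca-ixp-1"}))
```

Because the client already holds the ranked list, a dead appliance costs a retry, not a trip back to the control plane.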

Open Connect: Netflix's own CDN

Most companies rent a CDN such as Cloudflare, Amazon CloudFront, or Akamai. Netflix built their own, called Open Connect. They ship physical servers, called Open Connect Appliances (OCAs), and place them directly inside ISPs and internet exchange points (IXPs) around the world. When you hit play, your stream often comes from a box inside your provider's own network, sometimes in the same building as its routers.

The trick that makes this work is proactive caching. Netflix predicts what a region will watch tomorrow (what's trending, what just dropped, what people in that area binge) and pushes that content to local appliances during off-peak hours, when bandwidth is cheap and idle. By the time you want it, the show is already next door.
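The overnight fill can be sketched as a knapsack-style choice: given a demand forecast per title for one region and a fixed disk budget, pre-position the titles that serve the most expected plays per gigabyte. This is a simple greedy heuristic with hypothetical inputs (`predicted_plays`, `size_gb`, the catalog below); Netflix's real popularity and fill logic is far more involved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Title:
    name: str
    size_gb: float        # all encoded quality levels for the title (hypothetical)
    predicted_plays: int  # tomorrow's forecast for this region (hypothetical)

def plan_overnight_fill(titles: list[Title], capacity_gb: float) -> list[str]:
    """Greedy fill: most predicted plays per GB first, skipping titles that don't fit."""
    ranked = sorted(titles, key=lambda t: t.predicted_plays / t.size_gb, reverse=True)
    plan, used = [], 0.0
    for t in ranked:
        if used + t.size_gb <= capacity_gb:
            plan.append(t.name)
            used += t.size_gb
    return plan

catalog = [
    Title("new-release", 40.0, 90_000),
    Title("trending-drama", 25.0, 30_000),
    Title("local-favourite", 10.0, 15_000),
    Title("old-documentary", 15.0, 300),
]
print(plan_overnight_fill(catalog, capacity_gb=80.0))
```

The long-tail documentary loses its slot, so a request for it would fall back to a more distant appliance, while tomorrow's likely plays are already local.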

| Dimension | Public-internet CDN | Open Connect |
|---|---|---|
| Where it lives | Shared edge POPs on the public internet | Appliances inside ISPs and IXPs |
| How content arrives | Cached on first request (cache miss = slow) | Pre-positioned overnight before anyone asks |
| Backbone traffic | Travels the congested public internet | Stays off the public backbone entirely |
| Cost model | Pay per GB egress, forever | Capital cost upfront, near-zero marginal egress |
| Control | Vendor's roadmap and tuning | Full control of hardware, firmware, steering |

Renting a public CDN vs. running Open Connect.

Note

Netflix has **17,000+ Open Connect Appliances deployed in 158 countries**. That is a big part of why Netflix rarely buffers even on a mediocre connection: the bytes barely have to travel.

Adaptive bitrate streaming (ABR)

Your Netflix stream isn't one video file. It's a sequence of short, pre-encoded chunks, each available at several quality levels, from 235 kbps (mobile, bad signal) up to 16 Mbps (4K HDR). The client measures your available bandwidth every few seconds and switches quality levels mid-stream. You never notice because every quality level is cut at the same segment boundaries, so a chunk from one level lines up exactly with the next chunk from another.
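A throughput-plus-buffer rule is the classic textbook ABR heuristic; this sketch shows the idea, not Netflix's actual algorithm. It assumes a hypothetical ladder between the two endpoints mentioned above, an exponentially weighted moving average (EWMA) of recent chunk download rates, a 0.8 safety margin, and a halved budget when less than 10 seconds of video is buffered.

```python
# Bitrate ladder in kbps. The endpoints (235 kbps, 16 Mbps) come from the text;
# the rungs in between are hypothetical.
LADDER_KBPS = [235, 560, 1050, 1750, 3000, 5800, 8000, 16000]

def smoothed_throughput(samples_kbps: list[float], alpha: float = 0.3) -> float:
    """EWMA of recent chunk download rates, oldest sample first."""
    estimate = samples_kbps[0]
    for sample in samples_kbps[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate

def choose_bitrate(samples_kbps: list[float], buffer_s: float,
                   safety: float = 0.8, low_buffer_s: float = 10.0) -> int:
    """Highest rung under a safety margin of measured throughput.
    With less than low_buffer_s seconds buffered, halve the budget to avoid a stall."""
    budget = smoothed_throughput(samples_kbps) * safety
    if buffer_s < low_buffer_s:
        budget *= 0.5
    fitting = [rung for rung in LADDER_KBPS if rung <= budget]
    return fitting[-1] if fitting else LADDER_KBPS[0]  # never go below the lowest rung

# A steady 6 Mbps link with a healthy buffer, then the same link with a thin buffer.
print(choose_bitrate([6000, 6000, 6000], buffer_s=30))  # 3000
print(choose_bitrate([6000, 6000, 6000], buffer_s=5))   # 1750
```

The function runs once per chunk. Because every rung shares the same boundaries, a different answer simply means requesting the next chunk from a different rung.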

Netflix doesn't encode every title the same way, either. Their Dynamic Optimizer uses per-title and per-scene analysis to allocate bits where the eye needs them: a slow, dark dialogue scene gets fewer bits than a fast action sequence at the same perceived quality. The result is the same visual quality at a lower average bitrate, which means fewer stalls on weak networks and less bandwidth everywhere.
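A toy illustration of the idea, far simpler than Netflix's Dynamic Optimizer: split a fixed bit budget across scenes in proportion to a hypothetical per-scene complexity score, so busy scenes get more bits and calm ones fewer while the average bitrate stays put. The complexity scores here are made-up inputs, not the output of any real analysis.

```python
def allocate_bits(complexity: list[float], durations_s: list[float],
                  avg_kbps: float) -> list[float]:
    """Toy per-scene allocation: each scene's bitrate is proportional to its
    (hypothetical) complexity score, while the duration-weighted average
    bitrate stays at avg_kbps, so the total bit budget is unchanged."""
    if len(complexity) != len(durations_s):
        raise ValueError("need one complexity score per scene")
    total_kbits = avg_kbps * sum(durations_s)
    weighted = sum(c * d for c, d in zip(complexity, durations_s))
    scale = total_kbits / weighted  # kbps per unit of complexity
    return [scale * c for c in complexity]

# A calm dialogue scene and an action scene, one minute each, 3000 kbps average.
print(allocate_bits([0.5, 2.0], [60, 60], avg_kbps=3000))  # [1200.0, 4800.0]
```

The dialogue scene drops to 1200 kbps and the action scene rises to 4800 kbps, yet the two-minute clip still averages 3000 kbps.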

  • Encode once, serve many: a ladder of bitrates is generated ahead of time, not on the fly.
  • Measure constantly: the client tracks throughput and buffer health every few seconds.
  • Switch invisibly: quality changes happen at chunk boundaries, so playback never pauses.
  • Optimize per scene: Dynamic Optimizer spends bits where they're visible and saves them where they aren't.

Microservices: 1,000+ services

Netflix's backend is the textbook example of microservice architecture: over 1,000 services, each owning one function (recommendations, search, profiles, payments, subtitle delivery, playback authorization). Each is independently deployable, independently scalable, and independently monitored. This is what lets Netflix push code hundreds of times a day without downtime.

It isn't free. A thousand services means a thousand things that can fail, and a network hop every time one service calls another. Netflix had to solve problems most teams never face: service discovery, distributed tracing, circuit breaking, and (the one they're most famous for) engineering for constant, expected failure.
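Circuit breaking is the easiest of these to show in a few lines. This is a minimal, single-threaded sketch of the pattern (the idea behind Netflix's open-source Hystrix library, not its API): after a run of failures the breaker "opens" and returns a fallback immediately, then lets a trial call through after a cool-down. The class name, thresholds, and injectable clock are hypothetical choices for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitBreaker:
    """Minimal, single-threaded circuit breaker.

    closed    -> calls go through; consecutive failures are counted
    open      -> after failure_threshold failures, return the fallback immediately
    half-open -> after reset_after_s, let a trial call through; success closes
                 the circuit, failure reopens it
    """

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after_s:
            return fallback()  # open: fail fast and spare the struggling dependency
        try:
            result = fn()  # closed, or a half-open trial call
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Wrapping a call looks like `breaker.call(lambda: get_recs(user), lambda: [])`, where `get_recs` is a hypothetical client. While the breaker is open, the recommendations service receives no traffic from this caller and has room to recover.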

| Concern | Monolith | Microservices |
|---|---|---|
| Deploy | Whole app at once, coordinated | Each service ships on its own schedule |
| Scaling | Scale the entire app together | Scale only the hot service (e.g. playback) |
| Blast radius | One bug can take down everything | A failure is contained to one service |
| Team ownership | Many teams touch one codebase | One team owns one service end-to-end |
| Hard parts | Coordination, slow releases | Service discovery, tracing, network failure |

Why Netflix moved off the monolith.

Pro tip

Microservices are a trade, not an upgrade. You swap a deployment problem for a distributed-systems problem. Netflix made that trade because the deployment problem was killing them at their scale; most teams aren't there yet.

Chaos engineering: break things on purpose

Netflix pioneered chaos engineering. They built Chaos Monkey, a tool that randomly terminates production instances during business hours, on purpose, while engineers are around to watch. The philosophy is blunt: if your system can't survive random failure in a controlled experiment, it will eventually fail in an uncontrolled one at 3 AM.
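The selection logic is simple enough to sketch. This is not Chaos Monkey's code or configuration: it assumes a hypothetical map of service group to instance IDs, a weekday 09:00-17:00 window, a per-group probability, and a minimum group size, and it only chooses victims. Actually terminating them (for example through your cloud provider's API) is deliberately left out.

```python
import random
from datetime import datetime

def in_business_hours(now: datetime) -> bool:
    """Weekdays, 09:00-17:00 local time, so engineers are around to respond."""
    return now.weekday() < 5 and 9 <= now.hour < 17

def pick_victims(groups: dict[str, list[str]], now: datetime, rng: random.Random,
                 probability: float = 0.5, min_size: int = 2) -> list[str]:
    """Chaos-Monkey-style selection: at most one random instance per service group,
    only in business hours, and never from a group at or below min_size."""
    if not in_business_hours(now):
        return []
    victims = []
    for name in sorted(groups):  # sorted so a seeded rng gives reproducible picks
        instances = groups[name]
        if len(instances) > min_size and rng.random() < probability:
            victims.append(rng.choice(instances))
    return victims

fleet = {"playback": ["i-01", "i-02", "i-03"], "recs": ["i-11", "i-12"]}
print(pick_victims(fleet, datetime.now(), random.Random()))
```

The business-hours guard is the important design choice: the failure still happens, but it happens when the people who can fix it are watching.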

Chaos Monkey grew into the Simian Army, a suite that simulates entire region failures (Chaos Kong), injects latency, and probes for misconfiguration. The point isn't destruction; it's evidence. Every experiment either proves the system degrades gracefully or hands you a bug before a customer finds it. Resilience stops being a hope and becomes a measured property.

  1. Start with a steady state: a metric that means "healthy" (e.g. successful plays per second).
  2. Hypothesize: "if we kill this service, steady state holds."
  3. Inject the failure in production, on a small blast radius, during working hours.
  4. Measure: did steady state hold, or did you just find a weakness to fix?
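The four steps map directly onto a small harness. Everything here is a hypothetical toy rather than a production tool: a service with three replicas, success rate as the steady-state metric, and a 99% bar. Killing one replica exposes a weakness when each request makes a single attempt, and passes when the client retries against a different replica.

```python
import random
from typing import Callable

TOTAL_REPLICAS = 3

def toy_service(live: int, attempts: int, rng: random.Random) -> Callable[[], bool]:
    """Hypothetical service: replicas 0..live-1 are up, the rest were killed.
    Each request tries `attempts` distinct replicas in random order."""
    def serve() -> bool:
        tried = rng.sample(range(TOTAL_REPLICAS), k=attempts)
        return any(replica < live for replica in tried)
    return serve

def success_rate(serve: Callable[[], bool], requests: int = 1_000) -> float:
    """Steady-state metric: fraction of play requests that succeed."""
    return sum(serve() for _ in range(requests)) / requests

def chaos_experiment(attempts: int, threshold: float = 0.99, seed: int = 7) -> bool:
    rng = random.Random(seed)
    # 1. Steady state: with every replica up, the metric must already meet the bar.
    baseline = success_rate(toy_service(TOTAL_REPLICAS, attempts, rng))
    if baseline < threshold:
        raise RuntimeError("not in steady state; fix that before experimenting")
    # 2. Hypothesis: losing one replica keeps the success rate >= threshold.
    # 3. Inject: kill one replica, a blast radius of a single instance.
    injected = success_rate(toy_service(TOTAL_REPLICAS - 1, attempts, rng))
    # 4. Measure: True means steady state held; False means we found a weakness.
    return injected >= threshold

print(chaos_experiment(attempts=1))  # False: one attempt per request
print(chaos_experiment(attempts=2))  # True: retry on a different replica
```

The failing run is the valuable one: it is the bug handed to you before a customer finds it.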

Pro tip

Chaos engineering is now industry standard. **AWS Fault Injection Service (FIS)** is the managed version, so you don't have to build your own Simian Army to start. The [Cloud Engineer path](/career-paths/cloud-engineer) covers resilience patterns like this in depth.

Mistakes that cost teams hours (and how Netflix avoids them)

  1. Treating the network as reliable. Every call between services can be slow, fail, or hang. No timeouts and no circuit breakers means one slow dependency stalls the whole request chain. Netflix wraps every remote call in a timeout and a fallback.
  2. Caching nothing at the edge. Sending every request to the origin is the single biggest avoidable source of latency. Even a short edge cache on static assets changes the experience; Open Connect is this idea taken to its extreme.
  3. Shipping a monolith of microservices. Splitting services but deploying them together gives you all the complexity and none of the independence. If two services must ship together, they're really one.
  4. Skipping observability until it's on fire. If you can't measure it, you can't fix it. Build metrics and tracing in from day one, not after the first incident.
  5. Assuming failures are rare. At scale, something is always broken. Design for graceful degradation: a missing recommendation row should never break playback.
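Mistakes 1 and 5 share a fix that fits in a few lines: put a hard timeout on each remote call and return a degraded default when it fails. This is a minimal sketch using Python's standard `concurrent.futures`; `get_playback_info`, `get_recommendations`, and the 100 ms budget are hypothetical stand-ins, not Netflix code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
_pool = ThreadPoolExecutor(max_workers=8)  # shared pool for outbound calls

def call_with_fallback(fn: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Run a remote call with a hard timeout; on timeout or any error, degrade."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except Exception:    # includes the TimeoutError raised when the wait expires
        return fallback  # a timed-out call keeps running; we just stop waiting

# Hypothetical dependencies for a title page.
def get_playback_info(title: str) -> dict:
    return {"title": title, "resume_at_s": 1312}

def get_recommendations(user: str) -> list[str]:
    time.sleep(0.5)  # simulate a slow recommendations service
    return ["more like this"]

def build_title_page(user: str, title: str) -> dict:
    playback = get_playback_info(title)  # critical path: let failures surface
    recs = call_with_fallback(lambda: get_recommendations(user), timeout_s=0.1, fallback=[])
    return {**playback, "recommendations": recs}  # empty row, but playback still works

print(build_title_page("alice", "title-42"))
```

A timed-out call still occupies a worker thread until it returns, so in practice timeouts are paired with bounded pools and a circuit breaker.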

Takeaways

The whole article in seven lines

  • Streaming at scale is a **distance** problem: get the bytes physically close before they're requested.
  • **Open Connect** puts Netflix's own CDN appliances inside ISPs and pre-caches content overnight.
  • Separate the **control plane** (decisions on AWS) from the **data plane** (video on the edge).
  • **Adaptive bitrate** swaps quality mid-stream; Dynamic Optimizer spends bits only where the eye needs them.
  • **Microservices** trade a deployment problem for a distributed-systems problem, worth it only at scale.
  • **Chaos engineering** turns resilience from a hope into a measured, tested property.
  • You don't need Netflix scale to apply Netflix principles: timeouts, edge caching, and observability pay off on day one.

Where to go next

You don't need 200 million users to use these patterns; you need the fundamentals underneath them. Build the mental model with hands-on labs, then see where the patterns fit in the broader role.

  • Kubectl lab: practice orchestrating the kind of microservices Netflix runs at scale.
  • Docker lab: package a service into the portable, immutable unit that makes independent deploys possible.
  • Networking lab: the routing and latency fundamentals behind CDNs and edge delivery.
  • Cloud Engineer path: where resilience, observability, and chaos engineering come together as a full track.

And read Netflix's own engineering blog: it's one of the best free resources in cloud engineering, and most of what's above came straight from it.
