A comprehensive deep-dive into designing a production-grade news feed system capable of serving billions of personalized feeds daily across platforms like Facebook, Twitter, Instagram, and LinkedIn. Covers the fundamental fan-out on write versus fan-out on read trade-off, Facebook's hybrid approach for celebrity and normal users, feed generation pipelines with fan-out services and feed caches, ML-based ranking systems from EdgeRank to modern deep learning models, Redis-based feed storage with cursor-based pagination, real-time update delivery via WebSockets and long polling, and scaling the social graph with billions of edges. This is the single most asked system design interview question at FAANG companies because it tests every distributed systems concept: data modeling, fan-out strategies, caching, ranking algorithms, real-time delivery, and graph storage — all in a single 45-minute session.
The news feed is the central product surface of every major social platform. When a user opens Facebook, Twitter, Instagram, or LinkedIn, the first thing they see is a personalized stream of content selected from millions of possible posts. The feed must feel instant (under 200 milliseconds to render), feel relevant (showing content the user actually cares about), and feel fresh (reflecting activity from seconds ago, not hours). Building this experience at scale is arguably the hardest problem in consumer internet engineering.
Consider the numbers. Facebook has over 3 billion monthly active users. The average user follows hundreds of friends and pages, each producing multiple posts per day. At any given moment, there are tens of thousands of candidate posts that could appear in a single user's feed. The system must evaluate all of these candidates, score them with a machine learning model, assemble the top results, and deliver them to the user's device — all within a few hundred milliseconds. This happens billions of times per day, every day.
Why Interviewers Love This Question
A news feed system touches every distributed systems concept: data modeling (posts, users, follow graphs), fan-out strategies (write-time vs read-time vs hybrid), caching (feed cache, content cache, social graph cache), ranking algorithms (ML scoring, feature extraction, real-time signals), real-time delivery (WebSockets, long polling, server-sent events), and graph storage (adjacency lists, edge tables, graph partitioning). It can be discussed at L4 depth (basic fan-out + chronological feed) or L7 depth (hybrid fan-out with ML ranking and real-time feature pipelines). This range makes it the most versatile and frequently asked system design question at FAANG companies.
Twitter processes over 500 million tweets per day from 400 million monthly active users. Instagram handles over 100 million photo uploads daily. LinkedIn serves feeds to over 900 million professionals. Each platform has evolved its own approach to the feed problem, but they all grapple with the same fundamental trade-offs: latency versus freshness, personalization versus simplicity, write amplification versus read amplification.
Scale numbers across major platforms
Scale Numbers to Remember for Interviews
A feed system at Facebook scale serves 2B+ daily feed requests. Each request evaluates 500+ candidate posts through ML ranking. The fan-out service processes 10M+ new posts per minute during peak hours. The feed cache (Redis) stores pre-computed feeds for all active users, consuming 100+ TB of memory across thousands of Redis instances. Timeline generation latency target: p99 under 500ms. These numbers drive every architectural decision.
The core technical challenge is the fan-out problem. When a user publishes a post, that post must somehow reach every one of their followers' feeds. For a regular user with 500 followers, this means 500 feed updates. For a celebrity with 50 million followers, this means 50 million feed updates from a single post. The naive approach — writing to every follower's feed at post time — works beautifully for regular users but catastrophically fails for celebrities. The opposite approach — computing feeds at read time by pulling from all followed accounts — works for celebrities but is too slow for users who follow thousands of accounts.
The Interview Opening
When asked to design a news feed, strong candidates immediately frame the problem around the fan-out trade-off. Start by saying: "The central architectural decision is fan-out on write versus fan-out on read. Let me walk through both approaches and then explain why production systems use a hybrid." This shows the interviewer you understand the problem deeply before drawing a single box.
This entry walks through the complete design from requirements to production architecture. We cover the data model, the fan-out trade-off in depth, Facebook's actual hybrid approach, the feed generation pipeline, ML-based ranking, feed storage and caching, real-time update delivery, and scaling the social graph. Each section is written at the depth expected in a FAANG L5-L6 system design interview.
Before drawing any architecture diagrams, a strong system design answer begins with clarifying requirements. For a news feed system, the functional requirements define what the feed shows and how users interact with it, while non-functional requirements define the latency, availability, and scale targets that drive every architectural decision.
Functional requirements
Non-functional requirements
The data model is the foundation of the entire system. Every architectural decision — from fan-out strategy to caching to ranking — depends on how we model posts, users, and their relationships.
| Entity | Key Fields | Storage | Notes |
|---|---|---|---|
| Post | post_id (snowflake), author_id, content, media_urls[], created_at, updated_at, post_type | Sharded MySQL/PostgreSQL by author_id | Posts are immutable after creation for caching purposes. Edits create a new version. Media stored in object storage (S3) with URLs in the post record. |
| User | user_id, username, display_name, avatar_url, follower_count, following_count, is_celebrity (boolean) | Sharded MySQL by user_id | The is_celebrity flag (follower_count > threshold) determines fan-out strategy. Threshold is typically 10K-100K followers. |
| Follow Graph | follower_id, followee_id, created_at, relationship_type | Edge table sharded by followee_id, plus a replicated copy sharded by follower_id | Two access patterns: "who does user X follow?" (for fan-out on read) and "who follows user X?" (for fan-out on write). Requires two indexes or a graph database. |
| Feed Entry | user_id, post_id, score, created_at | Redis sorted set per user (feed cache) | Pre-computed feed entries for fan-out on write users. Score is the ML ranking score. TTL of 7 days for inactive users. |
| Activity/Engagement | user_id, post_id, action_type (like/comment/share), created_at | Sharded MySQL + Kafka stream | Engagement events are both stored for analytics and streamed to the ranking feature pipeline for real-time signal updates. |
Post ID Generation
Use a Snowflake-style ID generator for post_id: 41 bits for timestamp (gives 69 years), 10 bits for machine ID (1024 machines), 12 bits for sequence (4096 IDs per millisecond per machine). This gives chronologically sortable IDs that are globally unique without coordination. Twitter invented this approach specifically for tweet IDs, and it eliminates the need for a centralized ID generator.
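The bit layout above can be sketched directly. This is a minimal, single-process illustration of the 41/10/12 split, not Twitter's actual implementation; the epoch value and class name are assumptions for the example.

```python
import threading
import time

# Hypothetical custom epoch (2015-01-01 UTC in ms); real systems pick their own.
EPOCH_MS = 1_420_070_400_000

class SnowflakeGenerator:
    """Sketch of a Snowflake-style ID: 41-bit timestamp | 10-bit machine | 12-bit sequence."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # 10 bits -> 1024 machines
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:
                    # 4096 IDs exhausted this millisecond: spin until the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            # Timestamp in the high bits makes IDs chronologically sortable.
            return (now << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, sorting post IDs numerically sorts them by creation time, which is exactly what the feed cache's sorted sets rely on.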
The Follow Graph Is the Hidden Complexity
The follow graph seems simple (just a table of edges) but it drives the entire system complexity. A bidirectional lookup — both "who do I follow?" and "who follows me?" — is required, and these two access patterns have very different cardinalities. Most users follow hundreds of accounts (small fan-out on read), but some accounts are followed by millions (massive fan-out on write). Sharding the follow graph correctly is critical: shard by followee_id for efficient fan-out on write (find all followers of user X), and maintain a secondary index by follower_id for efficient fan-out on read (find all accounts user X follows).
The activity model captures every user interaction with the feed. Likes, comments, shares, and even implicit signals like scroll dwell time and video watch duration are recorded. These signals serve dual purposes: they are stored in a data warehouse for offline analytics, and they are streamed in real time to the ranking feature pipeline, which updates the ML model's feature store within seconds. This real-time feedback loop is what allows modern feeds to adapt to user behavior within a single session.
The fan-out strategy is the single most important architectural decision in a news feed system. It determines how new posts reach followers' feeds, and it has cascading effects on write amplification, read latency, storage costs, and system complexity. Understanding this trade-off deeply is what separates L4 candidates from L6 candidates in system design interviews.
Fan-out on write (also called push model) means that when a user publishes a post, the system immediately writes that post's ID into the feed cache of every follower. The follower's feed is pre-computed and stored in a sorted set (typically Redis). When a follower opens their feed, the system simply reads the pre-computed sorted set — a single O(log N) Redis read per page. This makes reads extremely fast (single-digit milliseconds) but writes can be expensive if the author has many followers.
Fan-out on read (also called pull model) means that no work is done at write time beyond storing the post in the author's post table. When a follower opens their feed, the system looks up all accounts the user follows, fetches recent posts from each account, merges and ranks them, and returns the result. This makes writes cheap (a single database insert) but reads are expensive because the system must query multiple data sources and merge results in real time.
| Dimension | Fan-Out on Write (Push) | Fan-Out on Read (Pull) |
|---|---|---|
| Write cost | O(followers) writes per post — a user with 10K followers triggers 10K feed cache insertions | O(1) — a single insert into the posts table regardless of follower count |
| Read cost | O(1) — read the pre-computed feed from a single Redis sorted set | O(following) — fetch posts from all N followed accounts, merge, and rank in real time |
| Latency | Read latency: 1-5ms. Write latency: milliseconds to seconds depending on follower count | Read latency: 50-500ms depending on how many accounts the user follows. Write latency: 1ms |
| Storage | High — every post is duplicated across all followers' feed caches. A post seen by 10K users is stored 10K times. | Low — each post is stored once. Feed is computed on the fly from source data. |
| Freshness | Eventual — new posts appear in feeds after the fan-out completes (seconds to minutes for celebrities) | Immediate — reads always fetch the latest posts directly from source tables |
| Celebrity problem | Catastrophic — a single tweet from a user with 30M followers triggers 30M writes, creating massive write amplification | No problem — celebrity post is stored once; reads fetch it alongside other posts |
| Inactive user waste | High — fan-out writes to feeds of users who may never log in again, wasting write bandwidth and storage | None — no computation happens until a user actually requests their feed |
graph TD
subgraph FOW["Fan-Out on Write (Push)"]
A1[User A publishes post] --> FOS1[Fan-Out Service]
FOS1 --> FC1[Follower 1 Feed Cache]
FOS1 --> FC2[Follower 2 Feed Cache]
FOS1 --> FC3[Follower 3 Feed Cache]
FOS1 --> FCN[Follower N Feed Cache]
FC1 --> R1[Read: Single Redis GET]
end
subgraph FOR["Fan-Out on Read (Pull)"]
R2[User B opens feed] --> FG[Fetch Following List]
FG --> P1[Query User X Posts]
FG --> P2[Query User Y Posts]
FG --> P3[Query User Z Posts]
P1 --> MR[Merge + Rank]
P2 --> MR
P3 --> MR
MR --> FEED[Return Feed]
end
Fan-Out on Write vs Fan-Out on Read
The Naive Mistake: Choosing One Approach for Everyone
Many candidates pick fan-out on write or fan-out on read and apply it uniformly to all users. This is a red flag in interviews. Fan-out on write is optimal for the 99% of users who have fewer than 10K followers (fast reads, manageable write cost). Fan-out on read is optimal for the 1% of users who have millions of followers (avoids write amplification). The correct answer is always the hybrid approach, and strong candidates explain the threshold-based routing without being prompted.
The mathematics make the trade-off concrete. Consider a platform with 500 million DAU where the average user follows 200 accounts and posts once per day. With pure fan-out on write, each post triggers 200 feed cache writes on average (the average user has 200 followers). That is 500M posts x 200 writes = 100 billion feed cache writes per day. This is large but manageable with a distributed Redis cluster. Now consider a celebrity with 50 million followers posting a single tweet. That one post alone triggers 50 million writes — equivalent to the write load of 250,000 normal users. Ten celebrities posting simultaneously generates 500 million writes in seconds, which is half the entire day's baseline load concentrated in a burst.
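As a sanity check, the arithmetic above can be verified directly. The inputs are the assumed figures from the text, not measured platform data:

```python
# Assumed inputs from the worked example above.
dau = 500_000_000        # daily active users
posts_per_user = 1       # posts per user per day
avg_followers = 200      # average follower count

# Pure fan-out on write: every post becomes one write per follower.
daily_fanout_writes = dau * posts_per_user * avg_followers
assert daily_fanout_writes == 100_000_000_000  # 100 billion feed cache writes/day

# One celebrity post with 50M followers equals the daily write
# load of this many average users:
celebrity_followers = 50_000_000
assert celebrity_followers // avg_followers == 250_000
```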
With pure fan-out on read, the celebrity problem disappears, but read latency explodes. A user who follows 1,000 accounts needs 1,000 database queries merged in real time. Even with parallel execution and connection pooling, fetching from 1,000 shards and merging results adds 100-500ms of latency to every feed request. At 2 billion daily feed requests, this means 2 trillion database queries per day just for feed generation — a 10x increase in database load compared to fan-out on write.
The Key Interview Insight
When explaining the trade-off, frame it as write amplification versus read amplification. Fan-out on write has high write amplification (one post becomes N writes) but zero read amplification (one read per feed request). Fan-out on read has zero write amplification but high read amplification (one feed request becomes N reads). The hybrid approach minimizes both by choosing the strategy per-user based on their follower count.
Every major social platform at scale uses a hybrid fan-out approach. The concept is straightforward: use fan-out on write for the vast majority of users (who have manageable follower counts), and use fan-out on read for the small percentage of users who have massive follower counts (celebrities, news organizations, influencers). The threshold between the two strategies is a configurable parameter, typically set between 10,000 and 100,000 followers.
When a normal user (below the follower threshold) publishes a post, the fan-out service reads their follower list and writes the post ID into each follower's feed cache (a Redis sorted set). This is the fan-out on write path. Since the user has at most a few thousand followers, the fan-out completes in milliseconds to seconds and the write amplification is bounded.
When a celebrity user (above the follower threshold) publishes a post, the system does not fan out at write time. Instead, the post is simply stored in the celebrity's post table and indexed for fast retrieval. When any user opens their feed, the feed generation service first reads the pre-computed feed from cache (containing posts from normal users that were fanned out at write time), then queries the post tables of any celebrities the user follows to fetch their latest posts, merges these two result sets, applies ML ranking, and returns the combined feed. This is the hybrid read path — it combines the pre-computed cache with on-demand queries for celebrity content.
graph TD
NP[Normal User Posts] --> FS[Fan-Out Service]
FS -->|"follower_count < threshold"| WP[Write to Each Follower Feed Cache]
WP --> FC[Feed Cache - Redis Sorted Sets]
CP[Celebrity Posts] --> PS[Post Store Only]
UR[User Opens Feed] --> FGS[Feed Generation Service]
FGS --> FC
FGS -->|"Fetch celebrity posts on read"| PS
FC --> MG[Merge Normal + Celebrity Posts]
PS --> MG
MG --> RK[ML Ranking Service]
RK --> FEED[Ranked Feed Response]
Hybrid Fan-Out Architecture
Choosing the Celebrity Threshold
The threshold is not a fixed number — it is tuned based on system capacity. Start with 10,000 followers as the initial threshold. Monitor the fan-out service write queue depth and latency. If fan-out write latency exceeds 5 seconds at p99, lower the threshold. If the read-time merge for celebrity posts adds more than 100ms at p50, raise the threshold. Facebook reportedly uses a threshold around 10,000-50,000 followers, while Twitter historically used a lower threshold. The threshold can also be dynamic — lowered during peak events (elections, Super Bowl) when post volume spikes.
The hybrid approach introduces complexity at the read path. When generating a feed, the service must: (1) read the pre-computed feed cache for the user, which contains post IDs from normal followed accounts fanned out at write time, (2) determine which celebrity accounts the user follows by querying the follow graph, (3) fetch the latest N posts from each celebrity account, (4) merge the two sets of posts by timestamp or score, (5) apply ML ranking to the merged candidate set, and (6) return the top K results. Steps 2-3 are the fan-out on read portion, and they add 20-50ms of latency compared to a pure fan-out on write system.
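The six-step read path above can be sketched as follows. All stores are in-memory dicts standing in for Redis and the database, and `rank` is a placeholder for the ML ranking service; the function name and parameters are illustrative, not a real API.

```python
def generate_feed(user_id, feed_cache, follow_graph, celebrity_posts, rank, k=20):
    """Sketch of the hybrid read path: merge the pre-computed cache with
    on-demand celebrity fetches, then rank. All arguments are stand-ins."""
    # (1) Pre-computed feed cache: post IDs fanned out at write time.
    candidates = list(feed_cache.get(user_id, []))
    # (2) Which celebrity accounts does this user follow?
    celebs = [f for f in follow_graph.get(user_id, []) if f in celebrity_posts]
    # (3) Fetch the latest N posts from each celebrity account.
    for celeb in celebs:
        candidates.extend(celebrity_posts[celeb][:100])
    # (4)-(5) Merge both sets and score the combined candidates.
    ranked = sorted(set(candidates), key=rank, reverse=True)
    # (6) Return the top K.
    return ranked[:k]
```

With chronologically sortable post IDs (Snowflake-style), ranking by post ID alone already yields a chronological merge; a real system would pass the ML model's score function instead.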
Optimization strategies for the hybrid read path
The Consistency Challenge
The hybrid approach introduces a subtle consistency issue. Posts from normal users appear in the feed cache after fan-out completes (typically 1-5 seconds). Posts from celebrities appear at read time from the post store (effectively real-time). This means a celebrity post may appear in a user's feed before a normal user's post that was published earlier, because the normal post is still being fanned out. For most products this ordering difference is acceptable because ML ranking reorders posts by score anyway, but it is worth mentioning in an interview to show you understand the trade-off.
An important edge case is users who transition from normal to celebrity status. When a user crosses the follower threshold, the system must stop fanning out their posts to follower feed caches and switch to the celebrity read path. This transition should be gradual: the user's recent posts already in follower feed caches remain there (they will naturally expire via TTL), and new posts are written only to the post store. The follow graph cache is updated to reclassify this account as a celebrity for all followers. The reverse transition (celebrity losing followers below the threshold) is less common but should also be handled gracefully.
The feed generation pipeline is the backbone of the entire system. It orchestrates the flow of a new post from creation through fan-out, caching, ranking, and delivery to the end user. Understanding this pipeline end-to-end is essential for a strong interview answer because it shows you can think about data flow, not just individual components.
graph LR
UP[User Publishes Post] --> API[API Gateway]
API --> PW[Post Write Service]
PW --> DB[(Post Database)]
PW --> KF[Kafka - Post Events Topic]
KF --> FOS[Fan-Out Service]
FOS -->|"Normal user"| REDIS[(Feed Cache - Redis)]
FOS -->|"Celebrity"| SKIP[Skip - stored in Post DB only]
KF --> IDX[Search Indexer]
KF --> AN[Analytics Pipeline]
KF --> NT[Notification Trigger]
USER[User Opens Feed] --> FGS[Feed Generation Service]
FGS --> REDIS
FGS --> CELEB[Celebrity Post Cache]
FGS --> MERGE[Merge Candidates]
MERGE --> RANK[ML Ranking Service]
RANK --> ASSEMBLE[Feed Assembly]
ASSEMBLE --> CDN[CDN - Media Attachments]
ASSEMBLE --> RESP[Feed Response to Client]
End-to-End Feed Generation Pipeline
The pipeline begins when a user publishes a post. The API gateway validates the request, authenticates the user, and forwards it to the Post Write Service. This service writes the post to the primary database (sharded MySQL or PostgreSQL, partitioned by author_id for write locality) and simultaneously publishes a PostCreated event to a Kafka topic. The Kafka event is the trigger for all downstream processing — fan-out, search indexing, analytics, and notification generation all consume from this topic independently.
Fan-out service processing steps
1. Consume PostCreated event from Kafka. Extract author_id and post_id.
2. Look up author_id in the user service to determine follower count and celebrity status.
3. If the author is NOT a celebrity (follower_count < threshold): fetch the complete follower list from the social graph service. For authors with many followers, this list is fetched in batches of 5,000.
4. For each batch of followers, write (post_id, score=timestamp) to each follower's feed cache (Redis sorted set). Use Redis ZADD with the post timestamp as the score for chronological ordering. Pipeline the Redis commands in batches of 100 for throughput.
5. If the author IS a celebrity: do not fan out. The post is already in the Post Database and will be fetched at read time by followers. Optionally update the celebrity post cache (a separate Redis structure) for fast read-time access.
6. Acknowledge the Kafka offset only after all writes complete successfully. If any batch fails, the entire event is retried from the beginning (idempotent writes via ZADD ensure no duplicates).
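The fan-out steps can be sketched as a single handler, with in-memory dicts standing in for Redis sorted sets and the social graph service. Kafka consumption, Redis pipelining, and offset commits (steps 1 and 6) are elided; the function and parameter names are hypothetical.

```python
from collections import defaultdict

CELEB_THRESHOLD = 10_000   # assumed celebrity cutoff from the text
FOLLOWER_BATCH = 5_000     # follower list fetched in batches of 5,000

# Stand-in for the Redis feed caches: user_id -> {post_id: score}.
feed_cache = defaultdict(dict)

def handle_post_created(event, followers_of, follower_count):
    """Sketch of fan-out steps 2-5 for one PostCreated event."""
    author, post, ts = event["author_id"], event["post_id"], event["created_at"]
    # Step 2: the celebrity check decides the path.
    if follower_count(author) >= CELEB_THRESHOLD:
        return 0  # step 5: celebrities are not fanned out
    written = 0
    followers = followers_of(author)
    # Steps 3-4: batch the follower list; writes are idempotent (like ZADD),
    # so a retried event cannot create duplicates.
    for i in range(0, len(followers), FOLLOWER_BATCH):
        for follower in followers[i:i + FOLLOWER_BATCH]:
            feed_cache[follower][post] = ts  # stands in for ZADD feed:{follower}
            written += 1
    return written
```

Returning the write count makes the handler easy to meter; a production worker would report this to the queue-depth and latency monitors that tune the celebrity threshold.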
The fan-out service is the most write-intensive component in the system. At Facebook scale, it processes 10 million or more new posts per minute during peak hours, each triggering an average of 300-500 feed cache writes (average follower count for non-celebrity users). This means the fan-out service generates 3-5 billion Redis writes per minute at peak. To handle this load, the fan-out service runs as a horizontally scaled fleet of stateless workers consuming from Kafka partitions, with the Kafka topic partitioned by author_id to ensure ordering per author.
Fan-Out Service Scaling
Scale the fan-out service by Kafka partition count. Each partition is consumed by exactly one worker in a consumer group. Start with 256 Kafka partitions for the PostCreated topic and scale to 1024+ as load increases. Each worker maintains a Redis connection pool (typically 50-100 connections) and uses Redis pipelining to batch ZADD commands. A single worker can process 50,000 fan-out writes per second. With 256 workers, the cluster sustains 12.8 million fan-out writes per second.
Feed Cache Trimming
Each user's feed cache (Redis sorted set) is trimmed to the most recent 800-1000 entries using ZREMRANGEBYRANK after each ZADD. This bounds memory consumption per user. Users rarely scroll past 200 posts in a session, so keeping 800-1000 provides a buffer for re-ranking and filtering. For users who scroll deeply, the system falls back to a database query for older posts beyond the cache window.
When a user opens their feed, the Feed Generation Service orchestrates the read path. It first reads the user's pre-computed feed from the Redis cache (posts from normal followed accounts). It then identifies celebrity accounts the user follows and fetches their recent posts from the celebrity post cache. These two sets of candidates are merged into a single candidate list, typically containing 500-1000 posts. The candidate list is passed to the ML Ranking Service, which scores each post based on hundreds of features (more detail in the next section). The top 20-50 posts are selected for the first page, and the Feed Assembly service hydrates them with full content, media URLs, engagement counts, and author profiles before returning the response.
The Cold Start Problem
New users have no feed cache and no follow graph. The system must bootstrap their feed with popular content, trending posts, and algorithmically recommended accounts to follow. Similarly, users who return after a long absence may have an expired or stale feed cache. The feed generation service detects cold start conditions (empty or expired cache) and triggers a full fan-out on read from all followed accounts, caching the result for future requests. This cold start path is significantly slower (200-500ms vs 10-50ms for cached feeds) and must be monitored separately.
Chronological feeds are simple to implement but deliver a poor user experience at scale. When a user follows 500 accounts, a chronological feed is dominated by the most prolific posters rather than the most relevant content. Modern feed ranking uses machine learning models to score every candidate post and select the ones most likely to be meaningful to the user. This transformation — from chronological to ranked — is one of the most impactful product changes in social media history.
Facebook's original ranking algorithm was called EdgeRank, introduced around 2010. It used three factors: affinity (how close the viewer is to the content creator, based on interaction history), weight (the type of interaction — comments were weighted higher than likes), and time decay (newer content scored higher). EdgeRank was a hand-tuned linear model with about a dozen features. It was simple to understand and debug, but it could not capture the complex non-linear relationships between hundreds of signals that determine what a user actually wants to see.
The Evolution from EdgeRank to Deep Learning
EdgeRank (2010): ~12 hand-tuned features, linear scoring. Machine Learning v1 (2013): Logistic regression with ~100 features, trained on engagement data. ML v2 (2016): Gradient-boosted decision trees (GBDT) with ~1,000 features, near-real-time training. Deep Learning (2019+): Neural networks with ~10,000+ features including embeddings for text, images, and user behavior. Each generation brought measurably better engagement metrics, but also increased system complexity and computational cost by an order of magnitude.
Modern feed ranking at FAANG companies follows a multi-stage architecture designed to balance accuracy with computational cost. The pipeline has three stages: candidate generation, ranking, and re-ranking. Each stage reduces the number of candidates while increasing the sophistication (and cost) of the scoring model.
Multi-stage ranking pipeline
1. Candidate generation: Retrieve 500-2000 candidate posts from the feed cache (fan-out on write posts) and celebrity post cache (fan-out on read posts). Apply lightweight filters to remove blocked accounts, muted keywords, and already-seen posts. This stage runs in under 10ms.
2. First-pass ranking (lightweight model): Score all candidates using a fast model (logistic regression or small neural network) with 50-100 features. This model runs inference in under 1ms per candidate and reduces the set from 1000+ to 200-300 candidates. Features include: post age, author-viewer interaction count, post type, and pre-computed engagement rate.
3. Full ranking (heavy model): Score the 200-300 candidates using the full deep learning model with 1000+ features including text embeddings, image understanding, user behavior sequences, and real-time engagement signals. Inference takes 5-10ms per candidate, but batching allows the 300 candidates to be scored in 20-50ms total on GPU.
4. Re-ranking and policy layer: Apply diversity rules (no more than 2 consecutive posts from the same author), freshness boosts (ensure recent posts are not buried), and content policy filters (reduce distribution of borderline content). Insert ads at fixed positions. This stage produces the final ranked feed.
5. Select the top 20-30 posts for the first page and return them. Cache the full ranked list for the user's session so that subsequent pagination requests do not re-trigger ranking.
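The funnel structure of the pipeline can be sketched as below. The two scoring callables stand in for the lightweight and heavy ML models; the function name, the simple streak-based diversity rule, and the candidate dict shape are assumptions for the example.

```python
def rank_feed(candidates, seen, light_score, heavy_score, page_size=20):
    """Sketch of the multi-stage ranking funnel: filter, cheap model,
    expensive model, then a diversity re-rank. Models are stand-ins."""
    # Stage 1: candidate generation + lightweight filters (already-seen, etc.).
    pool = [p for p in candidates if p["post_id"] not in seen]
    # Stage 2: first-pass ranking trims 1000+ candidates to a few hundred,
    # so the heavy model never sees the full pool.
    pool.sort(key=light_score, reverse=True)
    pool = pool[:300]
    # Stage 3: full ranking with the heavy model on the survivors only.
    pool.sort(key=heavy_score, reverse=True)
    # Stage 4: diversity rule: at most 2 consecutive posts per author;
    # violators are demoted to the end rather than dropped.
    final, deferred, streak = [], [], 0
    for post in pool:
        same_author = bool(final) and post["author_id"] == final[-1]["author_id"]
        if same_author and streak >= 2:
            deferred.append(post)
            continue
        streak = streak + 1 if same_author else 1
        final.append(post)
    final.extend(deferred)
    # Stage 5: first page only; a real system caches `final` per session
    # so pagination does not re-trigger ranking.
    return final[:page_size]
```

The key design point is that each stage pays a higher per-candidate cost on a smaller set, which is how the pipeline stays inside the 500ms p99 budget.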
Key features used in feed ranking models
Real-Time Feature Updates
The most impactful ranking improvement at Facebook was adding real-time engagement features. When a post is published, its initial engagement rate is zero, and the model must rely on author-level and content-type features. Within minutes, the post accumulates its first likes and comments, and these real-time signals dramatically improve ranking accuracy. The feature pipeline streams engagement events from Kafka to a feature store (typically a high-throughput key-value store like Memcached or a specialized system like Feast) within 5-10 seconds. The ranking model reads these real-time features at inference time, allowing a post's score to evolve rapidly based on actual audience response.
A critical aspect of feed ranking that is often overlooked in interviews is the training pipeline. The ranking model is trained on implicit feedback data: positive labels are posts the user engaged with (liked, commented, clicked), and negative labels are posts the user saw but did not interact with. The model is retrained continuously — at Facebook, the ranking model is updated every few hours using the latest engagement data. This creates a feedback loop: the model's predictions determine what users see, and what users see determines the training data for the next model version. Managing this feedback loop to avoid filter bubbles and echo chambers is an active research problem.
The Ranking Latency Budget
The total feed generation latency budget is typically 500ms at p99. Within this budget, candidate generation takes 10ms, lightweight ranking takes 20ms, full ranking takes 50ms, re-ranking takes 10ms, and feed assembly (hydrating posts with content and media URLs) takes 30ms. That leaves about 380ms for network round trips, cache misses, and retry overhead. Going over the latency budget means users see a loading spinner, which directly reduces feed engagement metrics. Every additional feature or model layer must justify its latency cost with measurable engagement improvement.
The feed cache is the heart of the news feed system's read path. Its design determines whether the system can serve billions of feed requests per day with single-digit millisecond latency. At FAANG scale, the feed cache is almost always built on Redis (or a Redis-compatible system like Twitter's custom Pelikan) because Redis sorted sets provide exactly the data structure needed: a per-user ordered collection of post IDs that supports efficient range queries for pagination.
Each user's feed is stored as a Redis sorted set where the key is the user ID and each member is a post ID with a score representing either the post timestamp (for chronological ordering) or the ML ranking score (for ranked feeds). The ZADD command inserts posts into the feed, ZREVRANGEBYSCORE retrieves a page of posts in descending order, and ZREMRANGEBYRANK trims the set to a maximum size after each insertion.
| Operation | Redis Command | Time Complexity | Use Case |
|---|---|---|---|
| Add post to feed | ZADD feed:{user_id} {score} {post_id} | O(log N) | Fan-out on write: insert new post into follower feed |
| Get feed page | ZREVRANGEBYSCORE feed:{user_id} {max_score} {min_score} LIMIT 0 20 | O(log N + M) | Feed read: retrieve 20 posts starting from cursor |
| Trim feed | ZREMRANGEBYRANK feed:{user_id} 0 -(max_size+1) | O(log N + M) | Keep feed bounded to 800-1000 entries after each insert (the stop index must be -(max_size+1) to keep exactly max_size entries) |
| Remove post | ZREM feed:{user_id} {post_id} | O(log N) | Post deletion: remove from all affected feeds |
| Check feed size | ZCARD feed:{user_id} | O(1) | Monitoring: track feed cache sizes for capacity planning |
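To make the sorted-set semantics concrete without requiring a Redis server, here is a tiny in-memory stand-in that mimics the three core operations from the table. Real code would issue these as commands through a Redis client; this class only exists to demonstrate the behavior.

```python
# In-memory stand-in for a per-user Redis sorted set (demonstration only).

class FeedSortedSet:
    def __init__(self) -> None:
        self._scores: dict[str, float] = {}

    def zadd(self, post_id: str, score: float) -> None:
        self._scores[post_id] = score

    def zrem(self, post_id: str) -> None:
        self._scores.pop(post_id, None)

    def zrevrangebyscore(self, max_score: float, limit: int,
                         exclusive: bool = False) -> list[str]:
        # Descending by score, optionally excluding max_score itself
        # (Redis expresses the exclusive bound as "(score")
        items = sorted(self._scores.items(), key=lambda kv: -kv[1])
        if exclusive:
            items = [kv for kv in items if kv[1] < max_score]
        else:
            items = [kv for kv in items if kv[1] <= max_score]
        return [pid for pid, _ in items[:limit]]

    def trim(self, max_size: int) -> None:
        # Keep only the max_size highest-scored entries (ZREMRANGEBYRANK)
        everything = self.zrevrangebyscore(float("inf"), len(self._scores))
        for pid in everything[max_size:]:
            self.zrem(pid)

feed = FeedSortedSet()
for i in range(10):
    feed.zadd(f"post{i}", float(i))
feed.trim(5)
assert feed.zrevrangebyscore(float("inf"), 5) == [
    "post9", "post8", "post7", "post6", "post5"
]
```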
Cursor-Based Pagination
Never use offset-based pagination (LIMIT offset, count) for feeds. If a user is on page 3 and new posts are inserted at the top, offset-based pagination causes posts to shift and the user sees duplicates or misses posts. Instead, use cursor-based pagination: the client sends the score (timestamp or ranking score) of the last post it received, and the server returns posts with scores lower than the cursor. Redis ZREVRANGEBYSCORE with a max score equal to the cursor naturally supports this. The cursor is opaque to the client — encode it as a base64 string containing the score and post_id (for tie-breaking when scores are equal).
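One way to build the opaque cursor (the exact wire format here is an assumption, not any platform's actual encoding) is to base64-encode the last post's score together with its post_id for tie-breaking:

```python
import base64
import json

# Opaque pagination cursor: base64-encoded score + post_id. The client
# treats it as an uninterpreted token and echoes it back on the next request.

def encode_cursor(score: float, post_id: str) -> str:
    raw = json.dumps({"s": score, "p": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple[float, str]:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    # score becomes the exclusive max for the next ZREVRANGEBYSCORE call;
    # post_id breaks ties when multiple posts share the same score
    return data["s"], data["p"]

cursor = encode_cursor(1718000000.0, "post_42")
assert decode_cursor(cursor) == (1718000000.0, "post_42")
```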
Memory sizing for the feed cache is a critical capacity planning exercise. Consider a platform with 500 million DAU. Each user's feed cache stores 800 entries, where each entry is a post_id (8 bytes) and a score (8 bytes), plus Redis sorted set overhead (~40 bytes per entry). That is approximately 56 bytes per entry x 800 entries = 44.8 KB per user. For 500M users: 44.8 KB x 500M = 22.4 TB of memory. With Redis replication (1 primary + 1 replica), this doubles to approximately 45 TB. A typical Redis instance runs on a machine with 64-128 GB of RAM, so the feed cache requires 350-700 Redis instances, organized into a cluster with consistent hashing for key distribution.
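The arithmetic above is worth being able to reproduce quickly in an interview. The sketch below uses decimal units (1 GB = 10^9 bytes) to match the round numbers in the text:

```python
# Back-of-envelope feed cache sizing (decimal units, matching the text).

entry_bytes = 8 + 8 + 40            # post_id + score + sorted-set overhead
per_user = entry_bytes * 800        # 44,800 bytes ~= 44.8 KB per user
total = per_user * 500_000_000      # 22.4 TB for 500M DAU
with_replica = total * 2            # ~44.8 TB with 1 primary + 1 replica

assert per_user == 44_800
assert total == 22_400_000_000_000  # 22.4 TB

# Instance counts bracket the 64-128 GB machine sizes from the text
instances_128gb = with_replica / (128 * 10**9)
instances_64gb = with_replica / (64 * 10**9)
assert round(instances_128gb) == 350
assert round(instances_64gb) == 700
```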
Feed cache sizing estimates
Cache Invalidation Challenges
When a post is deleted, it must be removed from every follower's feed cache. For a user with 10K followers, this means 10K ZREM operations. For a celebrity, the post was never fanned out, so deletion only requires removing it from the post store and celebrity post cache. When a user unfollows an account, their feed cache should eventually stop showing that account's posts, but immediate removal of all historical posts is expensive. In practice, unfollowed accounts' posts are filtered at read time (excluded during feed generation) and naturally expire from the cache via the trimming mechanism.
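The asymmetry of the deletion path can be sketched as a single branch on the author's follower count. The 10K threshold and the store shapes here are illustrative, not any platform's actual values:

```python
# Sketch of the deletion path: non-celebrity posts need one removal per
# follower feed; celebrity posts were never fanned out, so only the
# celebrity post cache is touched.

CELEB_THRESHOLD = 10_000

def delete_post(author_id: str, post_id: str, followers: list[str],
                feed_caches: dict[str, set], celeb_cache: dict[str, set]) -> int:
    """Returns how many follower feed caches were touched."""
    if len(followers) >= CELEB_THRESHOLD:
        # Celebrity: single removal from the celebrity post cache
        celeb_cache.get(author_id, set()).discard(post_id)
        return 0
    for f in followers:
        feed_caches.get(f, set()).discard(post_id)  # one ZREM per follower
    return len(followers)
```

For unfollows, as the text notes, no eager removal happens at all: read-time filtering plus the trim mechanism handles it lazily.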
Beyond the feed cache, the system requires several additional caching layers. A content cache stores the full post content (text, media URLs, engagement counts) to avoid hitting the post database for every feed hydration. A social graph cache stores follow relationships for fast lookup during fan-out and feed generation. A user profile cache stores author display names, avatars, and verification badges. These caches are typically implemented as separate Redis clusters or Memcached pools, each sized independently based on their access patterns and memory requirements.
Cache Warming Strategy
After a cache node failure or cluster expansion, new nodes start cold with no data. A cache warming job pre-loads feed caches for the most active users (sorted by login recency) from the post database. This prevents a thundering herd of cache misses when thousands of users open their feeds simultaneously after a cache restart. The warming job typically takes 15-30 minutes and is prioritized by user activity level.
A great news feed does not just load content when the user opens the app — it continuously updates with new posts in real time. When a friend publishes a post, the user should see it within seconds without manually refreshing. Implementing this real-time delivery at scale requires carefully choosing the right transport mechanism based on the user's connection state and device capabilities.
There are three fundamental approaches to delivering real-time updates to clients: short polling (the client repeatedly asks the server for new content at fixed intervals), long polling (the client sends a request that the server holds open until new content is available), and WebSockets (a persistent bidirectional connection between client and server). Each has different trade-offs for latency, server resource consumption, and implementation complexity.
| Mechanism | Latency | Server Cost | Best For |
|---|---|---|---|
| Short Polling | High (depends on poll interval, typically 15-60 seconds) | High — most requests return empty responses, wasting server CPU and bandwidth | Legacy clients, simple implementations, very low update frequency |
| Long Polling | Medium (seconds — server responds as soon as new content arrives) | Medium — each client holds one open connection, but no wasted empty responses | Clients that cannot support WebSockets, moderate update frequency |
| WebSocket | Low (sub-second — server pushes immediately upon new content) | Low per-update but high connection overhead — each active client maintains a persistent TCP connection | Active users with high update frequency, real-time feed updates |
| Server-Sent Events (SSE) | Low (sub-second push from server) | Lower than WebSocket — unidirectional, no upgrade handshake, automatic reconnection built into the protocol | Feed updates where only server-to-client push is needed |
graph TD
NP[New Post Published] --> KF[Kafka - Post Events]
KF --> FOS[Fan-Out Service]
FOS --> FC[Feed Cache Update]
KF --> RTS[Real-Time Service]
RTS --> CM{User Connection State?}
CM -->|"WebSocket connected"| WS[Push via WebSocket]
CM -->|"SSE connected"| SSE[Push via SSE]
CM -->|"Not connected"| SKIP2[Skip - user will pull on next open]
WS --> CLIENT1[Active Client - Instant Update]
SSE --> CLIENT2[Web Client - Instant Update]
SKIP2 --> FC2[Feed Cache Ready for Next Pull]
CLIENT3[Inactive User Opens App] -->|"HTTP GET /feed"| PULL[Pull Latest Feed from Cache]
Hybrid Real-Time Delivery Architecture
The production approach at most FAANG companies is a hybrid: WebSocket connections for active users and pull-based loading for inactive users. When a user opens the app, the client establishes a WebSocket connection to the real-time service. This connection remains open for the duration of the session. When new posts are published by accounts the user follows, the real-time service pushes a lightweight notification through the WebSocket containing the post ID and a snippet. The client can then either insert the post directly into the feed (if the user is at the top of their feed) or show a "New posts available" banner that the user can tap to refresh.
WebSocket Connection Management at Scale
At 500M concurrent users, the real-time service maintains 500 million WebSocket connections. Each connection consumes approximately 10-20 KB of memory on the server (TCP buffers, connection metadata, user subscription state). That is 5-10 TB of memory across the WebSocket server fleet. Use a horizontally scaled fleet of stateless WebSocket servers behind a load balancer with sticky sessions (based on user_id hash). When a user connects, the server registers the user_id to server mapping in a distributed presence store (Redis or ZooKeeper). When the fan-out service needs to push an update, it looks up the user's WebSocket server in the presence store and routes the message accordingly.
Long polling serves as a fallback mechanism for clients that cannot maintain WebSocket connections (corporate firewalls, older browsers, certain mobile networks). With long polling, the client sends an HTTP request with a "last seen" cursor. The server checks if new posts exist beyond the cursor. If yes, it responds immediately with the new posts. If no, it holds the connection open for up to 30 seconds, checking periodically for new content. When new content arrives or the timeout expires, the server responds and the client immediately sends a new long-poll request.
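The core of a long-poll handler is a bounded wait on a "new content" signal. In this sketch a `threading.Event` stands in for that server-side signal; a real handler would check the user's feed cache against the cursor instead:

```python
import threading

# Minimal long-poll wait loop: block until new content arrives or the
# timeout expires, then respond either way.

def long_poll(new_content: threading.Event, timeout: float = 30.0) -> str:
    if new_content.wait(timeout):   # unblocks as soon as content arrives
        new_content.clear()
        return "200 new posts"
    # Timeout: respond empty; the client immediately re-issues the poll
    return "204 no content"

signal = threading.Event()
signal.set()                        # content already waiting
assert long_poll(signal, timeout=0.1) == "200 new posts"
assert long_poll(signal, timeout=0.05) == "204 no content"
```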
Real-time delivery optimization techniques
The "New Posts" Banner Pattern
Instead of inserting new posts directly into the feed while the user is reading (which causes jarring layout shifts), the standard UX pattern is to accumulate new posts in a buffer and show a "3 new posts" banner at the top. When the user taps the banner, the buffered posts are inserted at the top with a smooth animation. This pattern is used by Twitter, Facebook, Instagram, and LinkedIn. The buffer is stored client-side and typically caps at 50 posts — beyond that, the banner simply says "Many new posts" and triggers a full feed refresh when tapped.
News feed design is the most frequently asked system design question at Meta, Google, Amazon, Twitter, and LinkedIn. It appears directly ("Design Facebook's news feed," "Design Twitter's home timeline") and indirectly in questions about content recommendation, activity feeds, and social platforms. Interviewers use it to test fan-out strategy knowledge, caching architecture, ML ranking pipelines, real-time delivery mechanisms, and graph storage at scale. At L4-L5, candidates are expected to describe the fan-out trade-off and propose a basic hybrid approach. At L6+, candidates must demonstrate deep knowledge of ML ranking, cache sizing, WebSocket scaling, and graph partitioning, and explain how these components interact under peak load.
Common questions:
Try this question: Ask the interviewer: What is the target scale (DAU, concurrent users, posts per day)? Is this a symmetric graph (mutual friends like Facebook) or asymmetric (followers like Twitter/Instagram)? Do we need real-time updates or is pull-to-refresh acceptable? How important is ranking versus chronological ordering? Are there celebrity-scale accounts with millions of followers?
Strong answer: Drawing the hybrid fan-out architecture from memory with threshold-based routing. Calculating Redis memory for the feed cache with specific numbers (bytes per entry, entries per user, total memory). Explaining the multi-stage ML ranking pipeline (candidate generation, lightweight scoring, full ranking, re-ranking). Discussing the Twitter Bieber Problem as a motivating example for hybrid fan-out. Mentioning cursor-based pagination with Redis ZREVRANGEBYSCORE. Addressing WebSocket scaling with a presence store for routing.
Red flags: Choosing only fan-out on write or only fan-out on read without acknowledging the other approach. Not mentioning the celebrity/high-follower problem. Using offset-based pagination instead of cursor-based. Ignoring ML ranking and proposing only chronological ordering. Not discussing caching at all. Proposing a single database without sharding for the social graph. Not addressing how real-time updates reach active users.
Key takeaways
A user with 50 million followers publishes a post. Walk through exactly what happens in a hybrid fan-out system, from post creation to the post appearing in a follower's feed.
The post is written to the Post Database and a PostCreated event is published to Kafka. The fan-out service consumes the event, looks up the author's follower count, and determines the author exceeds the celebrity threshold (e.g., 10K followers). Because this is a celebrity, the fan-out service does NOT write to any follower's feed cache. Instead, the post ID is written to the celebrity post cache (a Redis sorted set keyed by celebrity user_id). When a follower opens their feed, the feed generation service reads their pre-computed feed cache (containing posts from normal accounts), queries the celebrity post cache for all celebrity accounts the user follows, merges both result sets, passes the merged candidates through the ML ranking service, and returns the top-ranked posts. The 50M-follower post appears alongside normal posts, ranked by the ML model. Total latency overhead from the celebrity on-read fetch is 20-50ms. No write amplification occurred.
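The publish-time routing decision at the heart of this walkthrough can be sketched as one branch (threshold value and store shapes are illustrative):

```python
# Hybrid fan-out routing at publish time: celebrities skip fan-out entirely;
# everyone else is fanned out on write.

CELEBRITY_THRESHOLD = 10_000

def on_post_created(author_id: str, post_id: str, followers: list[str],
                    feed_caches: dict[str, list],
                    celebrity_cache: dict[str, list]) -> int:
    """Returns the number of follower feeds written (write amplification)."""
    if len(followers) >= CELEBRITY_THRESHOLD:
        # Celebrity path: store once; merged into feeds at read time
        celebrity_cache.setdefault(author_id, []).append(post_id)
        return 0
    # Normal path: fan-out on write to every follower's feed cache
    for follower in followers:
        feed_caches.setdefault(follower, []).append(post_id)
    return len(followers)

feeds: dict[str, list] = {}
celebs: dict[str, list] = {}
assert on_post_created("alice", "p1", ["u1", "u2"], feeds, celebs) == 2
assert on_post_created("bieber", "p2", ["u"] * 50_000_000, feeds, celebs) == 0
```

The second call is the 50M-follower case: zero feed writes, one celebrity-cache append, and the cost moves to the read path where it is bounded by the handful of celebrities each user follows.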
Explain why cursor-based pagination is essential for news feeds and describe exactly how you would implement it with Redis sorted sets.
Offset-based pagination (skip N, take M) breaks when new posts are inserted at the top of the feed between page requests. If the user is viewing page 2 and 5 new posts arrive, offset-based pagination shifts all posts down, causing the user to see duplicates when they scroll to page 3. Cursor-based pagination avoids this by using the last seen post's score as an anchor. Implementation: the feed is a Redis sorted set with scores representing timestamps or ranking scores. The first page request uses ZREVRANGEBYSCORE feed:{user_id} +inf -inf LIMIT 0 20 to get the top 20 posts. The response includes the score of the last post as the cursor. The next page request uses ZREVRANGEBYSCORE feed:{user_id} (last_score -inf LIMIT 0 20 (the opening parenthesis makes the range exclusive). For tie-breaking when multiple posts share the same score, encode both the score and post_id in the cursor and use post_id as a secondary sort key.
Your feed ranking model is showing a 5% decrease in engagement rate after a new model deployment. How do you diagnose and resolve this?
First, verify the regression is real and not a measurement artifact by checking if the control group in the A/B test also shows a dip (which would indicate a platform-wide issue, not a model problem). If the regression is model-specific, examine the feature distribution — compare the feature values the new model receives versus the old model to check for feature pipeline issues (stale features, missing values, schema changes). Check the model's prediction distribution: if the new model scores all posts within a narrow range, the ranking loses discriminative power. Examine engagement by content type — the regression may be concentrated in one type (e.g., video posts) where a new feature was introduced. If the root cause is not immediately clear, roll back to the previous model version to stop the bleeding, then investigate offline. Re-evaluate the training data for the new model: was the training set balanced? Was there a data pipeline bug that introduced label noise? Run offline evaluation metrics (NDCG, precision@k) on the new model against a held-out test set to compare with the old model before redeploying.
💡 Analogy
Think of the news feed as a newspaper delivery system. Fan-out on write is like printing a personalized newspaper and delivering it to every subscriber's doorstep each morning. The printing press (fan-out service) runs once per edition (post), and every subscriber (follower) receives their copy in their mailbox (feed cache) without having to do anything. This works perfectly for a local newspaper with a few thousand subscribers, but imagine a national newspaper with 50 million subscribers — the printing and delivery logistics become overwhelming for a single edition. Fan-out on read is like not delivering newspapers at all — instead, every reader walks to the newsstand (post database) each morning and assembles their own paper by picking articles from every section they follow. This puts zero burden on the publisher but creates massive crowds at the newsstand during rush hour. The hybrid approach, which is what production systems use, is to deliver the local newspaper to most homes (fan-out on write for normal users) but keep the nationally popular stories at the newsstand (fan-out on read for celebrities). Most readers get their paper delivered, and only need to stop by the newsstand to pick up a few high-profile stories. The newsstand is well-stocked and fast because it only serves a handful of popular items, not the full catalog.
⚡ Core Idea
A news feed system is a personalized content delivery pipeline that must solve the fan-out problem — how to efficiently distribute each new post to all of its intended recipients. The fundamental tension is between write amplification (pre-computing feeds at publish time) and read amplification (computing feeds at request time). Production systems at FAANG scale use a hybrid strategy that routes posts through fan-out on write or fan-out on read based on the author's follower count, combined with ML-based ranking to select the most relevant posts from hundreds of candidates, all served from an in-memory cache layer to meet sub-second latency targets.
🎯 Why It Matters
The news feed is the single most important product surface for every social platform — it is the first thing users see and the primary driver of engagement, retention, and advertising revenue. Designing a news feed system at scale tests every core distributed systems concept: data modeling, fan-out strategies, caching architectures, machine learning serving, real-time delivery, and graph storage. This is why it is the most frequently asked system design interview question at FAANG companies. Getting the fan-out strategy right directly determines whether the system can scale to billions of users, and getting the ranking right directly determines whether users find the platform valuable enough to return every day.