A comprehensive deep-dive into designing a production-grade news feed system capable of serving billions of personalized feeds daily across platforms like Facebook, Twitter, Instagram, and LinkedIn. Covers the fundamental fan-out on write versus fan-out on read trade-off, Facebook's hybrid approach for celebrity and normal users, feed generation pipelines with fan-out services and feed caches, ML-based ranking systems from EdgeRank to modern deep learning models, Redis-based feed storage with cursor-based pagination, real-time update delivery via WebSockets and long polling, and scaling the social graph with billions of edges. This is the single most asked system design interview question at FAANG companies because it tests every distributed systems concept: data modeling, fan-out strategies, caching, ranking algorithms, real-time delivery, and graph storage — all in a single 45-minute session.
The news feed is the central product surface of every major social platform. When a user opens Facebook, Twitter, Instagram, or LinkedIn, the first thing they see is a personalized stream of content selected from millions of possible posts. The feed must feel instant (under 200 milliseconds to render), feel relevant (showing content the user actually cares about), and feel fresh (reflecting activity from seconds ago, not hours). Building this experience at scale is arguably the hardest problem in consumer internet engineering.
Consider the numbers. Facebook has over 3 billion monthly active users. The average user follows hundreds of friends and pages, each producing multiple posts per day. At any given moment, there are tens of thousands of candidate posts that could appear in a single user's feed. The system must evaluate all of these candidates, score them with a machine learning model, assemble the top results, and deliver them to the user's device — all within a few hundred milliseconds. This happens billions of times per day, every day.
Why Interviewers Love This Question
A news feed system touches every distributed systems concept: data modeling (posts, users, follow graphs), fan-out strategies (write-time vs read-time vs hybrid), caching (feed cache, content cache, social graph cache), ranking algorithms (ML scoring, feature extraction, real-time signals), real-time delivery (WebSockets, long polling, server-sent events), and graph storage (adjacency lists, edge tables, graph partitioning). It can be discussed at L4 depth (basic fan-out + chronological feed) or L7 depth (hybrid fan-out with ML ranking and real-time feature pipelines). This range makes it the most versatile and frequently asked system design question at FAANG companies.
Twitter processes over 500 million tweets per day from 400 million monthly active users. Instagram handles over 100 million photo uploads daily. LinkedIn serves feeds to over 900 million professionals. Each platform has evolved its own approach to the feed problem, but they all grapple with the same fundamental trade-offs: latency versus freshness, personalization versus simplicity, write amplification versus read amplification.
Scale numbers across major platforms
Scale Numbers to Remember for Interviews
A feed system at Facebook scale serves 2B+ daily feed requests. Each request evaluates 500+ candidate posts through ML ranking. The fan-out service processes 10M+ new posts per minute during peak hours. The feed cache (Redis) stores pre-computed feeds for all active users, consuming 100+ TB of memory across thousands of Redis instances. Timeline generation latency target: p99 under 500ms. These numbers drive every architectural decision.
The core technical challenge is the fan-out problem. When a user publishes a post, that post must somehow reach every one of their followers' feeds. For a regular user with 500 followers, this means 500 feed updates. For a celebrity with 50 million followers, this means 50 million feed updates from a single post. The naive approach — writing to every follower's feed at post time — works beautifully for regular users but catastrophically fails for celebrities. The opposite approach — computing feeds at read time by pulling from all followed accounts — works for celebrities but is too slow for users who follow thousands of accounts.
The Interview Opening
When asked to design a news feed, strong candidates immediately frame the problem around the fan-out trade-off. Start by saying: "The central architectural decision is fan-out on write versus fan-out on read. Let me walk through both approaches and then explain why production systems use a hybrid." This shows the interviewer you understand the problem deeply before drawing a single box.
This entry walks through the complete design from requirements to production architecture. We cover the data model, the fan-out trade-off in depth, Facebook's actual hybrid approach, the feed generation pipeline, ML-based ranking, feed storage and caching, real-time update delivery, and scaling the social graph. Each section is written at the depth expected in a FAANG L5-L6 system design interview.
Before drawing any architecture diagrams, a strong system design answer begins with clarifying requirements. For a news feed system, the functional requirements define what the feed shows and how users interact with it, while non-functional requirements define the latency, availability, and scale targets that drive every architectural decision.
Functional requirements
Non-functional requirements
The data model is the foundation of the entire system. Every architectural decision — from fan-out strategy to caching to ranking — depends on how we model posts, users, and their relationships.
| Entity | Key Fields | Storage | Notes |
|---|---|---|---|
| Post | post_id (snowflake), author_id, content, media_urls[], created_at, updated_at, post_type | Sharded MySQL/PostgreSQL by author_id | Posts are immutable after creation for caching purposes. Edits create a new version. Media stored in object storage (S3) with URLs in the post record. |
| User | user_id, username, display_name, avatar_url, follower_count, following_count, is_celebrity (boolean) | Sharded MySQL by user_id | The is_celebrity flag (follower_count > threshold) determines fan-out strategy. Threshold is typically 10K-100K followers. |
| Follow Graph | follower_id, followee_id, created_at, relationship_type | Edge table sharded by followee_id, plus a replicated copy sharded by follower_id | Two access patterns: "who does user X follow?" (for fan-out on read) and "who follows user X?" (for fan-out on write). Requires two indexes or a graph database. |
| Feed Entry | user_id, post_id, score, created_at | Redis sorted set per user (feed cache) | Pre-computed feed entries for fan-out on write users. Score is the ML ranking score. TTL of 7 days for inactive users. |
| Activity/Engagement | user_id, post_id, action_type (like/comment/share), created_at | Sharded MySQL + Kafka stream | Engagement events are both stored for analytics and streamed to the ranking feature pipeline for real-time signal updates. |
Post ID Generation
Use a Snowflake-style ID generator for post_id: 41 bits for timestamp (gives 69 years), 10 bits for machine ID (1024 machines), 12 bits for sequence (4096 IDs per millisecond per machine). This gives chronologically sortable IDs that are globally unique without coordination. Twitter invented this approach specifically for tweet IDs, and it eliminates the need for a centralized ID generator.
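The bit layout above can be sketched directly. This is a minimal, single-process illustration of the 41/10/12 split, not Twitter's actual implementation; the epoch value and class name are assumptions for the example.

```python
import threading
import time

# Hypothetical custom epoch (2015-01-01 UTC in ms); real systems pick their own.
EPOCH_MS = 1_420_070_400_000

class SnowflakeGenerator:
    """Sketch of a Snowflake-style ID: 41-bit timestamp | 10-bit machine | 12-bit sequence."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # 10 bits -> 1024 machines
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:
                    # 4096 IDs exhausted this millisecond: spin until the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            # Timestamp in the high bits makes IDs chronologically sortable.
            return (now << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, sorting post IDs numerically sorts them by creation time, which is exactly what the feed cache's sorted sets rely on.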
The Follow Graph Is the Hidden Complexity
The follow graph seems simple (just a table of edges) but it drives the entire system complexity. A bidirectional lookup — both "who do I follow?" and "who follows me?" — is required, and these two access patterns have very different cardinalities. Most users follow hundreds of accounts (small fan-out on read), but some accounts are followed by millions (massive fan-out on write). Sharding the follow graph correctly is critical: shard by followee_id for efficient fan-out on write (find all followers of user X), and maintain a secondary index by follower_id for efficient fan-out on read (find all accounts user X follows).
The activity model captures every user interaction with the feed. Likes, comments, shares, and even implicit signals like scroll dwell time and video watch duration are recorded. These signals serve dual purposes: they are stored in a data warehouse for offline analytics, and they are streamed in real time to the ranking feature pipeline, which updates the ML model's feature store within seconds. This real-time feedback loop is what allows modern feeds to adapt to user behavior within a single session.
The fan-out strategy is the single most important architectural decision in a news feed system. It determines how new posts reach followers' feeds, and it has cascading effects on write amplification, read latency, storage costs, and system complexity. Understanding this trade-off deeply is what separates L4 candidates from L6 candidates in system design interviews.
Fan-out on write (also called push model) means that when a user publishes a post, the system immediately writes that post's ID into the feed cache of every follower. The follower's feed is pre-computed and stored in a sorted set (typically Redis). When a follower opens their feed, the system simply reads the pre-computed sorted set — a single O(log N) Redis read per page. This makes reads extremely fast (single-digit milliseconds) but writes can be expensive if the author has many followers.
Fan-out on read (also called pull model) means that no work is done at write time beyond storing the post in the author's post table. When a follower opens their feed, the system looks up all accounts the user follows, fetches recent posts from each account, merges and ranks them, and returns the result. This makes writes cheap (a single database insert) but reads are expensive because the system must query multiple data sources and merge results in real time.
| Dimension | Fan-Out on Write (Push) | Fan-Out on Read (Pull) |
|---|---|---|
| Write cost | O(followers) writes per post — a user with 10K followers triggers 10K feed cache insertions | O(1) — a single insert into the posts table regardless of follower count |
| Read cost | O(1) — read the pre-computed feed from a single Redis sorted set | O(following) — fetch posts from all N followed accounts, merge, and rank in real time |
| Latency | Read latency: 1-5ms. Write latency: milliseconds to seconds depending on follower count | Read latency: 50-500ms depending on how many accounts the user follows. Write latency: 1ms |
| Storage | High — every post is duplicated across all followers' feed caches. A post seen by 10K users is stored 10K times. | Low — each post is stored once. Feed is computed on the fly from source data. |
| Freshness | Eventual — new posts appear in feeds after the fan-out completes (seconds to minutes for celebrities) | Immediate — reads always fetch the latest posts directly from source tables |
| Celebrity problem | Catastrophic — a single tweet from a user with 30M followers triggers 30M writes, creating massive write amplification | No problem — celebrity post is stored once; reads fetch it alongside other posts |
| Inactive user waste | High — fan-out writes to feeds of users who may never log in again, wasting write bandwidth and storage | None — no computation happens until a user actually requests their feed |
graph TD
subgraph FOW["Fan-Out on Write (Push)"]
A1[User A publishes post] --> FOS1[Fan-Out Service]
FOS1 --> FC1[Follower 1 Feed Cache]
FOS1 --> FC2[Follower 2 Feed Cache]
FOS1 --> FC3[Follower 3 Feed Cache]
FOS1 --> FCN[Follower N Feed Cache]
FC1 --> R1[Read: Single Redis GET]
end
subgraph FOR["Fan-Out on Read (Pull)"]
R2[User B opens feed] --> FG[Fetch Following List]
FG --> P1[Query User X Posts]
FG --> P2[Query User Y Posts]
FG --> P3[Query User Z Posts]
P1 --> MR[Merge + Rank]
P2 --> MR
P3 --> MR
MR --> FEED[Return Feed]
end
Fan-Out on Write vs Fan-Out on Read
The Naive Mistake: Choosing One Approach for Everyone
Many candidates pick fan-out on write or fan-out on read and apply it uniformly to all users. This is a red flag in interviews. Fan-out on write is optimal for the 99% of users who have fewer than 10K followers (fast reads, manageable write cost). Fan-out on read is optimal for the 1% of users who have millions of followers (avoids write amplification). The correct answer is always the hybrid approach, and strong candidates explain the threshold-based routing without being prompted.
The mathematics make the trade-off concrete. Consider a platform with 500 million DAU where the average user follows 200 accounts and posts once per day. With pure fan-out on write, each post triggers 200 feed cache writes on average (the average user has 200 followers). That is 500M posts x 200 writes = 100 billion feed cache writes per day. This is large but manageable with a distributed Redis cluster. Now consider a celebrity with 50 million followers posting a single tweet. That one post alone triggers 50 million writes — equivalent to the write load of 250,000 normal users. Ten celebrities posting simultaneously generates 500 million writes in seconds, which is half the entire day's baseline load concentrated in a burst.
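As a sanity check, the arithmetic above can be verified directly. The inputs are the assumed figures from the text, not measured platform data:

```python
# Assumed inputs from the worked example above.
dau = 500_000_000        # daily active users
posts_per_user = 1       # posts per user per day
avg_followers = 200      # average follower count

# Pure fan-out on write: every post becomes one write per follower.
daily_fanout_writes = dau * posts_per_user * avg_followers
assert daily_fanout_writes == 100_000_000_000  # 100 billion feed cache writes/day

# One celebrity post with 50M followers equals the daily write
# load of this many average users:
celebrity_followers = 50_000_000
assert celebrity_followers // avg_followers == 250_000
```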
With pure fan-out on read, the celebrity problem disappears, but read latency explodes. A user who follows 1,000 accounts needs 1,000 database queries merged in real time. Even with parallel execution and connection pooling, fetching from 1,000 shards and merging results adds 100-500ms of latency to every feed request. At 2 billion daily feed requests, this means 2 trillion database queries per day just for feed generation — a 10x increase in database load compared to fan-out on write.
The Key Interview Insight
When explaining the trade-off, frame it as write amplification versus read amplification. Fan-out on write has high write amplification (one post becomes N writes) but zero read amplification (one read per feed request). Fan-out on read has zero write amplification but high read amplification (one feed request becomes N reads). The hybrid approach minimizes both by choosing the strategy per-user based on their follower count.
Every major social platform at scale uses a hybrid fan-out approach. The concept is straightforward: use fan-out on write for the vast majority of users (who have manageable follower counts), and use fan-out on read for the small percentage of users who have massive follower counts (celebrities, news organizations, influencers). The threshold between the two strategies is a configurable parameter, typically set between 10,000 and 100,000 followers.
When a normal user (below the follower threshold) publishes a post, the fan-out service reads their follower list and writes the post ID into each follower's feed cache (a Redis sorted set). This is the fan-out on write path. Since the user has at most a few thousand followers, the fan-out completes in milliseconds to seconds and the write amplification is bounded.
When a celebrity user (above the follower threshold) publishes a post, the system does not fan out at write time. Instead, the post is simply stored in the celebrity's post table and indexed for fast retrieval. When any user opens their feed, the feed generation service first reads the pre-computed feed from cache (containing posts from normal users that were fanned out at write time), then queries the post tables of any celebrities the user follows to fetch their latest posts, merges these two result sets, applies ML ranking, and returns the combined feed. This is the hybrid read path — it combines the pre-computed cache with on-demand queries for celebrity content.
graph TD
NP[Normal User Posts] --> FS[Fan-Out Service]
FS -->|"follower_count < threshold"| WP[Write to Each Follower Feed Cache]
WP --> FC[Feed Cache - Redis Sorted Sets]
CP[Celebrity Posts] --> PS[Post Store Only]
UR[User Opens Feed] --> FGS[Feed Generation Service]
FGS --> FC
FGS -->|"Fetch celebrity posts on read"| PS
FC --> MG[Merge Normal + Celebrity Posts]
PS --> MG
MG --> RK[ML Ranking Service]
RK --> FEED[Ranked Feed Response]
Hybrid Fan-Out Architecture
Choosing the Celebrity Threshold
The threshold is not a fixed number — it is tuned based on system capacity. Start with 10,000 followers as the initial threshold. Monitor the fan-out service write queue depth and latency. If fan-out write latency exceeds 5 seconds at p99, lower the threshold. If the read-time merge for celebrity posts adds more than 100ms at p50, raise the threshold. Facebook reportedly uses a threshold around 10,000-50,000 followers, while Twitter historically used a lower threshold. The threshold can also be dynamic — lowered during peak events (elections, Super Bowl) when post volume spikes.
The hybrid approach introduces complexity at the read path. When generating a feed, the service must: (1) read the pre-computed feed cache for the user, which contains post IDs from normal followed accounts fanned out at write time, (2) determine which celebrity accounts the user follows by querying the follow graph, (3) fetch the latest N posts from each celebrity account, (4) merge the two sets of posts by timestamp or score, (5) apply ML ranking to the merged candidate set, and (6) return the top K results. Steps 2-3 are the fan-out on read portion, and they add 20-50ms of latency compared to a pure fan-out on write system.
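The six-step read path above can be sketched as follows. All stores are in-memory dicts standing in for Redis and the database, and `rank` is a placeholder for the ML ranking service; the function name and parameters are illustrative, not a real API.

```python
def generate_feed(user_id, feed_cache, follow_graph, celebrity_posts, rank, k=20):
    """Sketch of the hybrid read path: merge the pre-computed cache with
    on-demand celebrity fetches, then rank. All arguments are stand-ins."""
    # (1) Pre-computed feed cache: post IDs fanned out at write time.
    candidates = list(feed_cache.get(user_id, []))
    # (2) Which celebrity accounts does this user follow?
    celebs = [f for f in follow_graph.get(user_id, []) if f in celebrity_posts]
    # (3) Fetch the latest N posts from each celebrity account.
    for celeb in celebs:
        candidates.extend(celebrity_posts[celeb][:100])
    # (4)-(5) Merge both sets and score the combined candidates.
    ranked = sorted(set(candidates), key=rank, reverse=True)
    # (6) Return the top K.
    return ranked[:k]
```

With chronologically sortable post IDs (Snowflake-style), ranking by post ID alone already yields a chronological merge; a real system would pass the ML model's score function instead.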
Optimization strategies for the hybrid read path
The Consistency Challenge
The hybrid approach introduces a subtle consistency issue. Posts from normal users appear in the feed cache after fan-out completes (typically 1-5 seconds). Posts from celebrities appear at read time from the post store (effectively real-time). This means a celebrity post may appear in a user's feed before a normal user's post that was published earlier, because the normal post is still being fanned out. For most products this ordering difference is acceptable because ML ranking reorders posts by score anyway, but it is worth mentioning in an interview to show you understand the trade-off.
An important edge case is users who transition from normal to celebrity status. When a user crosses the follower threshold, the system must stop fanning out their posts to follower feed caches and switch to the celebrity read path. This transition should be gradual: the user's recent posts already in follower feed caches remain there (they will naturally expire via TTL), and new posts are written only to the post store. The follow graph cache is updated to reclassify this account as a celebrity for all followers. The reverse transition (celebrity losing followers below the threshold) is less common but should also be handled gracefully.
The feed generation pipeline is the backbone of the entire system. It orchestrates the flow of a new post from creation through fan-out, caching, ranking, and delivery to the end user. Understanding this pipeline end-to-end is essential for a strong interview answer because it shows you can think about data flow, not just individual components.
graph LR
UP[User Publishes Post] --> API[API Gateway]
API --> PW[Post Write Service]
PW --> DB[(Post Database)]
PW --> KF[Kafka - Post Events Topic]
KF --> FOS[Fan-Out Service]
FOS -->|"Normal user"| REDIS[(Feed Cache - Redis)]
FOS -->|"Celebrity"| SKIP[Skip - stored in Post DB only]
KF --> IDX[Search Indexer]
KF --> AN[Analytics Pipeline]
KF --> NT[Notification Trigger]
USER[User Opens Feed] --> FGS[Feed Generation Service]
FGS --> REDIS
FGS --> CELEB[Celebrity Post Cache]
FGS --> MERGE[Merge Candidates]
MERGE --> RANK[ML Ranking Service]
RANK --> ASSEMBLE[Feed Assembly]
ASSEMBLE --> CDN[CDN - Media Attachments]
ASSEMBLE --> RESP[Feed Response to Client]
End-to-End Feed Generation Pipeline
The pipeline begins when a user publishes a post. The API gateway validates the request, authenticates the user, and forwards it to the Post Write Service. This service writes the post to the primary database (sharded MySQL or PostgreSQL, partitioned by author_id for write locality) and simultaneously publishes a PostCreated event to a Kafka topic. The Kafka event is the trigger for all downstream processing — fan-out, search indexing, analytics, and notification generation all consume from this topic independently.
Fan-out service processing steps
1. Consume PostCreated event from Kafka. Extract author_id and post_id.
2. Look up author_id in the user service to determine follower count and celebrity status.
3. If the author is NOT a celebrity (follower_count < threshold): fetch the complete follower list from the social graph service. For authors with many followers, this list is fetched in batches of 5,000.
4. For each batch of followers, write (post_id, score=timestamp) to each follower's feed cache (Redis sorted set). Use Redis ZADD with the post timestamp as the score for chronological ordering. Pipeline the Redis commands in batches of 100 for throughput.
5. If the author IS a celebrity: do not fan out. The post is already in the Post Database and will be fetched at read time by followers. Optionally update the celebrity post cache (a separate Redis structure) for fast read-time access.
6. Acknowledge the Kafka offset only after all writes complete successfully. If any batch fails, the entire event is retried from the beginning (idempotent writes via ZADD ensure no duplicates).
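The fan-out steps can be sketched as a single handler, with in-memory dicts standing in for Redis sorted sets and the social graph service. Kafka consumption, Redis pipelining, and offset commits (steps 1 and 6) are elided; the function and parameter names are hypothetical.

```python
from collections import defaultdict

CELEB_THRESHOLD = 10_000   # assumed celebrity cutoff from the text
FOLLOWER_BATCH = 5_000     # follower list fetched in batches of 5,000

# Stand-in for the Redis feed caches: user_id -> {post_id: score}.
feed_cache = defaultdict(dict)

def handle_post_created(event, followers_of, follower_count):
    """Sketch of fan-out steps 2-5 for one PostCreated event."""
    author, post, ts = event["author_id"], event["post_id"], event["created_at"]
    # Step 2: the celebrity check decides the path.
    if follower_count(author) >= CELEB_THRESHOLD:
        return 0  # step 5: celebrities are not fanned out
    written = 0
    followers = followers_of(author)
    # Steps 3-4: batch the follower list; writes are idempotent (like ZADD),
    # so a retried event cannot create duplicates.
    for i in range(0, len(followers), FOLLOWER_BATCH):
        for follower in followers[i:i + FOLLOWER_BATCH]:
            feed_cache[follower][post] = ts  # stands in for ZADD feed:{follower}
            written += 1
    return written
```

Returning the write count makes the handler easy to meter; a production worker would report this to the queue-depth and latency monitors that tune the celebrity threshold.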
The fan-out service is the most write-intensive component in the system. At Facebook scale, it processes 10 million or more new posts per minute during peak hours, each triggering an average of 300-500 feed cache writes (average follower count for non-celebrity users). This means the fan-out service generates 3-5 billion Redis writes per minute at peak. To handle this load, the fan-out service runs as a horizontally scaled fleet of stateless workers consuming from Kafka partitions, with the Kafka topic partitioned by author_id to ensure ordering per author.
Fan-Out Service Scaling
Scale the fan-out service by Kafka partition count. Each partition is consumed by exactly one worker in a consumer group. Start with 256 Kafka partitions for the PostCreated topic and scale to 1024+ as load increases. Each worker maintains a Redis connection pool (typically 50-100 connections) and uses Redis pipelining to batch ZADD commands. A single worker can process 50,000 fan-out writes per second. With 256 workers, the cluster sustains 12.8 million fan-out writes per second.
Feed Cache Trimming
Each user's feed cache (Redis sorted set) is trimmed to the most recent 800-1000 entries using ZREMRANGEBYRANK after each ZADD. This bounds memory consumption per user. Users rarely scroll past 200 posts in a session, so keeping 800-1000 provides a buffer for re-ranking and filtering. For users who scroll deeply, the system falls back to a database query for older posts beyond the cache window.
When a user opens their feed, the Feed Generation Service orchestrates the read path. It first reads the user's pre-computed feed from the Redis cache (posts from normal followed accounts). It then identifies celebrity accounts the user follows and fetches their recent posts from the celebrity post cache. These two sets of candidates are merged into a single candidate list, typically containing 500-1000 posts. The candidate list is passed to the ML Ranking Service, which scores each post based on hundreds of features (more detail in the next section). The top 20-50 posts are selected for the first page, and the Feed Assembly service hydrates them with full content, media URLs, engagement counts, and author profiles before returning the response.
The Cold Start Problem
New users have no feed cache and no follow graph. The system must bootstrap their feed with popular content, trending posts, and algorithmically recommended accounts to follow. Similarly, users who return after a long absence may have an expired or stale feed cache. The feed generation service detects cold start conditions (empty or expired cache) and triggers a full fan-out on read from all followed accounts, caching the result for future requests. This cold start path is significantly slower (200-500ms vs 10-50ms for cached feeds) and must be monitored separately.
Chronological feeds are simple to implement but deliver a poor user experience at scale. When a user follows 500 accounts, a chronological feed is dominated by the most prolific posters rather than the most relevant content. Modern feed ranking uses machine learning models to score every candidate post and select the ones most likely to be meaningful to the user. This transformation — from chronological to ranked — is one of the most impactful product changes in social media history.
Facebook's original ranking algorithm was called EdgeRank, introduced around 2010. It used three factors: affinity (how close the viewer is to the content creator, based on interaction history), weight (the type of interaction — comments were weighted higher than likes), and time decay (newer content scored higher). EdgeRank was a hand-tuned linear model with about a dozen features. It was simple to understand and debug, but it could not capture the complex non-linear relationships between hundreds of signals that determine what a user actually wants to see.
The Evolution from EdgeRank to Deep Learning
EdgeRank (2010): ~12 hand-tuned features, linear scoring. Machine Learning v1 (2013): Logistic regression with ~100 features, trained on engagement data. ML v2 (2016): Gradient-boosted decision trees (GBDT) with ~1,000 features, near-real-time training. Deep Learning (2019+): Neural networks with ~10,000+ features including embeddings for text, images, and user behavior. Each generation brought measurably better engagement metrics, but also increased system complexity and computational cost by an order of magnitude.
Modern feed ranking at FAANG companies follows a multi-stage architecture designed to balance accuracy with computational cost. The pipeline has three stages: candidate generation, ranking, and re-ranking. Each stage reduces the number of candidates while increasing the sophistication (and cost) of the scoring model.
Multi-stage ranking pipeline
1. Candidate generation: Retrieve 500-2000 candidate posts from the feed cache (fan-out on write posts) and celebrity post cache (fan-out on read posts). Apply lightweight filters to remove blocked accounts, muted keywords, and already-seen posts. This stage runs in under 10ms.
2. First-pass ranking (lightweight model): Score all candidates using a fast model (logistic regression or small neural network) with 50-100 features. This model runs inference in under 1ms per candidate and reduces the set from 1000+ to 200-300 candidates. Features include: post age, author-viewer interaction count, post type, and pre-computed engagement rate.
3. Full ranking (heavy model): Score the 200-300 candidates using the full deep learning model with 1000+ features including text embeddings, image understanding, user behavior sequences, and real-time engagement signals. Inference takes 5-10ms per candidate, but batching allows the 300 candidates to be scored in 20-50ms total on GPU.
4. Re-ranking and policy layer: Apply diversity rules (no more than 2 consecutive posts from the same author), freshness boosts (ensure recent posts are not buried), and content policy filters (reduce distribution of borderline content). Insert ads at fixed positions. This stage produces the final ranked feed.
5. Select the top 20-30 posts for the first page and return them. Cache the full ranked list for the user's session so that subsequent pagination requests do not re-trigger ranking.
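The funnel structure of the pipeline can be sketched as below. The two scoring callables stand in for the lightweight and heavy ML models; the function name, the simple streak-based diversity rule, and the candidate dict shape are assumptions for the example.

```python
def rank_feed(candidates, seen, light_score, heavy_score, page_size=20):
    """Sketch of the multi-stage ranking funnel: filter, cheap model,
    expensive model, then a diversity re-rank. Models are stand-ins."""
    # Stage 1: candidate generation + lightweight filters (already-seen, etc.).
    pool = [p for p in candidates if p["post_id"] not in seen]
    # Stage 2: first-pass ranking trims 1000+ candidates to a few hundred,
    # so the heavy model never sees the full pool.
    pool.sort(key=light_score, reverse=True)
    pool = pool[:300]
    # Stage 3: full ranking with the heavy model on the survivors only.
    pool.sort(key=heavy_score, reverse=True)
    # Stage 4: diversity rule: at most 2 consecutive posts per author;
    # violators are demoted to the end rather than dropped.
    final, deferred, streak = [], [], 0
    for post in pool:
        same_author = bool(final) and post["author_id"] == final[-1]["author_id"]
        if same_author and streak >= 2:
            deferred.append(post)
            continue
        streak = streak + 1 if same_author else 1
        final.append(post)
    final.extend(deferred)
    # Stage 5: first page only; a real system caches `final` per session
    # so pagination does not re-trigger ranking.
    return final[:page_size]
```

The key design point is that each stage pays a higher per-candidate cost on a smaller set, which is how the pipeline stays inside the 500ms p99 budget.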
Key features used in feed ranking models
Real-Time Feature Updates
The most impactful ranking improvement at Facebook was adding real-time engagement features. When a post is published, its initial engagement rate is zero, and the model must rely on author-level and content-type features. Within minutes, the post accumulates its first likes and comments, and these real-time signals dramatically improve ranking accuracy. The feature pipeline streams engagement events from Kafka to a feature store (typically a high-throughput key-value store like Memcached or a specialized system like Feast) within 5-10 seconds. The ranking model reads these real-time features at inference time, allowing a post's score to evolve rapidly based on actual audience response.
A critical aspect of feed ranking that is often overlooked in interviews is the training pipeline. The ranking model is trained on implicit feedback data: positive labels are posts the user engaged with (liked, commented, clicked), and negative labels are posts the user saw but did not interact with. The model is retrained continuously — at Facebook, the ranking model is updated every few hours using the latest engagement data. This creates a feedback loop: the model's predictions determine what users see, and what users see determines the training data for the next model version. Managing this feedback loop to avoid filter bubbles and echo chambers is an active research problem.
The Ranking Latency Budget
The total feed generation latency budget is typically 500ms at p99. Within this budget, candidate generation takes 10ms, lightweight ranking takes 20ms, full ranking takes 50ms, re-ranking takes 10ms, and feed assembly (hydrating posts with content and media URLs) takes 30ms. That leaves about 380ms for network round trips, cache misses, and retry overhead. Going over the latency budget means users see a loading spinner, which directly reduces feed engagement metrics. Every additional feature or model layer must justify its latency cost with measurable engagement improvement.
The feed cache is the heart of the news feed system's read path. Its design determines whether the system can serve billions of feed requests per day with single-digit millisecond latency. At FAANG scale, the feed cache is almost always built on Redis (or a Redis-compatible system like Twitter's custom Pelikan) because Redis sorted sets provide exactly the data structure needed: a per-user ordered collection of post IDs that supports efficient range queries for pagination.
Each user's feed is stored as a Redis sorted set where the key is the user ID and each member is a post ID with a score representing either the post timestamp (for chronological ordering) or the ML ranking score (for ranked feeds). The ZADD command inserts posts into the feed, ZREVRANGEBYSCORE retrieves a page of posts in descending order, and ZREMRANGEBYRANK trims the set to a maximum size after each insertion.
| Operation | Redis Command | Time Complexity | Use Case |
|---|---|---|---|
| Add post to feed | ZADD feed:{user_id} {score} {post_id} | O(log N) | Fan-out on write: insert new post into follower feed |
| Get feed page | ZREVRANGEBYSCORE feed:{user_id} {max_score} {min_score} LIMIT 0 20 | O(log N + M) | Feed read: retrieve 20 posts starting from cursor |
| Trim feed | ZREMRANGEBYRANK feed:{user_id} 0 -(max_size+1) | O(log N + M) | Keep feed bounded to 800-1000 entries after each insert (the stop index must be -(max_size+1) to keep exactly max_size entries) |
| Remove post | ZREM feed:{user_id} {post_id} | O(log N) | Post deletion: remove from all affected feeds |
| Check feed size | ZCARD feed:{user_id} | O(1) | Monitoring: track feed cache sizes for capacity planning |
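To make the sorted-set semantics concrete without requiring a Redis server, here is a tiny in-memory stand-in that mimics the three core operations from the table. Real code would issue these as commands through a Redis client; this class only exists to demonstrate the behavior.

```python
# In-memory stand-in for a per-user Redis sorted set (demonstration only).

class FeedSortedSet:
    def __init__(self) -> None:
        self._scores: dict[str, float] = {}

    def zadd(self, post_id: str, score: float) -> None:
        self._scores[post_id] = score

    def zrem(self, post_id: str) -> None:
        self._scores.pop(post_id, None)

    def zrevrangebyscore(self, max_score: float, limit: int,
                         exclusive: bool = False) -> list[str]:
        # Descending by score, optionally excluding max_score itself
        # (Redis expresses the exclusive bound as "(score")
        items = sorted(self._scores.items(), key=lambda kv: -kv[1])
        if exclusive:
            items = [kv for kv in items if kv[1] < max_score]
        else:
            items = [kv for kv in items if kv[1] <= max_score]
        return [pid for pid, _ in items[:limit]]

    def trim(self, max_size: int) -> None:
        # Keep only the max_size highest-scored entries (ZREMRANGEBYRANK)
        everything = self.zrevrangebyscore(float("inf"), len(self._scores))
        for pid in everything[max_size:]:
            self.zrem(pid)

feed = FeedSortedSet()
for i in range(10):
    feed.zadd(f"post{i}", float(i))
feed.trim(5)
assert feed.zrevrangebyscore(float("inf"), 5) == [
    "post9", "post8", "post7", "post6", "post5"
]
```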
Cursor-Based Pagination
Never use offset-based pagination (LIMIT offset, count) for feeds. If a user is on page 3 and new posts are inserted at the top, offset-based pagination causes posts to shift and the user sees duplicates or misses posts. Instead, use cursor-based pagination: the client sends the score (timestamp or ranking score) of the last post it received, and the server returns posts with scores lower than the cursor. Redis ZREVRANGEBYSCORE with a max score equal to the cursor naturally supports this. The cursor is opaque to the client — encode it as a base64 string containing the score and post_id (for tie-breaking when scores are equal).
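One way to build the opaque cursor (the exact wire format here is an assumption, not any platform's actual encoding) is to base64-encode the last post's score together with its post_id for tie-breaking:

```python
import base64
import json

# Opaque pagination cursor: base64-encoded score + post_id. The client
# treats it as an uninterpreted token and echoes it back on the next request.

def encode_cursor(score: float, post_id: str) -> str:
    raw = json.dumps({"s": score, "p": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple[float, str]:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    # score becomes the exclusive max for the next ZREVRANGEBYSCORE call;
    # post_id breaks ties when multiple posts share the same score
    return data["s"], data["p"]

cursor = encode_cursor(1718000000.0, "post_42")
assert decode_cursor(cursor) == (1718000000.0, "post_42")
```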
Memory sizing for the feed cache is a critical capacity planning exercise. Consider a platform with 500 million DAU. Each user's feed cache stores 800 entries, where each entry is a post_id (8 bytes) and a score (8 bytes), plus Redis sorted set overhead (~40 bytes per entry). That is approximately 56 bytes per entry x 800 entries = 44.8 KB per user. For 500M users: 44.8 KB x 500M = 22.4 TB of memory. With Redis replication (1 primary + 1 replica), this doubles to approximately 45 TB. A typical Redis instance runs on a machine with 64-128 GB of RAM, so the feed cache requires 350-700 Redis instances, organized into a cluster with consistent hashing for key distribution.
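The arithmetic above is worth being able to reproduce quickly in an interview. The sketch below uses decimal units (1 GB = 10^9 bytes) to match the round numbers in the text:

```python
# Back-of-envelope feed cache sizing (decimal units, matching the text).

entry_bytes = 8 + 8 + 40            # post_id + score + sorted-set overhead
per_user = entry_bytes * 800        # 44,800 bytes ~= 44.8 KB per user
total = per_user * 500_000_000      # 22.4 TB for 500M DAU
with_replica = total * 2            # ~44.8 TB with 1 primary + 1 replica

assert per_user == 44_800
assert total == 22_400_000_000_000  # 22.4 TB

# Instance counts bracket the 64-128 GB machine sizes from the text
instances_128gb = with_replica / (128 * 10**9)
instances_64gb = with_replica / (64 * 10**9)
assert round(instances_128gb) == 350
assert round(instances_64gb) == 700
```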
Feed cache sizing estimates
Cache Invalidation Challenges
When a post is deleted, it must be removed from every follower's feed cache. For a user with 10K followers, this means 10K ZREM operations. For a celebrity, the post was never fanned out, so deletion only requires removing it from the post store and celebrity post cache. When a user unfollows an account, their feed cache should eventually stop showing that account's posts, but immediate removal of all historical posts is expensive. In practice, unfollowed accounts' posts are filtered at read time (excluded during feed generation) and naturally expire from the cache via the trimming mechanism.
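The asymmetry of the deletion path can be sketched as a single branch on the author's follower count. The 10K threshold and the store shapes here are illustrative, not any platform's actual values:

```python
# Sketch of the deletion path: non-celebrity posts need one removal per
# follower feed; celebrity posts were never fanned out, so only the
# celebrity post cache is touched.

CELEB_THRESHOLD = 10_000

def delete_post(author_id: str, post_id: str, followers: list[str],
                feed_caches: dict[str, set], celeb_cache: dict[str, set]) -> int:
    """Returns how many follower feed caches were touched."""
    if len(followers) >= CELEB_THRESHOLD:
        # Celebrity: single removal from the celebrity post cache
        celeb_cache.get(author_id, set()).discard(post_id)
        return 0
    for f in followers:
        feed_caches.get(f, set()).discard(post_id)  # one ZREM per follower
    return len(followers)
```

For unfollows, as the text notes, no eager removal happens at all: read-time filtering plus the trim mechanism handles it lazily.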
Beyond the feed cache, the system requires several additional caching layers. A content cache stores the full post content (text, media URLs, engagement counts) to avoid hitting the post database for every feed hydration. A social graph cache stores follow relationships for fast lookup during fan-out and feed generation. A user profile cache stores author display names, avatars, and verification badges. These caches are typically implemented as separate Redis clusters or Memcached pools, each sized independently based on their access patterns and memory requirements.
Cache Warming Strategy
After a cache node failure or cluster expansion, new nodes start cold with no data. A cache warming job pre-loads feed caches for the most active users (sorted by login recency) from the post database. This prevents a thundering herd of cache misses when thousands of users open their feeds simultaneously after a cache restart. The warming job typically takes 15-30 minutes and is prioritized by user activity level.
A great news feed does not just load content when the user opens the app — it continuously updates with new posts in real time. When a friend publishes a post, the user should see it within seconds without manually refreshing. Implementing this real-time delivery at scale requires carefully choosing the right transport mechanism based on the user's connection state and device capabilities.
There are three fundamental approaches to delivering real-time updates to clients: short polling (the client repeatedly asks the server for new content at fixed intervals), long polling (the client sends a request that the server holds open until new content is available), and WebSockets (a persistent bidirectional connection between client and server). Each has different trade-offs for latency, server resource consumption, and implementation complexity.
| Mechanism | Latency | Server Cost | Best For |
|---|---|---|---|
| Short Polling | High (depends on poll interval, typically 15-60 seconds) | High — most requests return empty responses, wasting server CPU and bandwidth | Legacy clients, simple implementations, very low update frequency |
| Long Polling | Medium (seconds — server responds as soon as new content arrives) | Medium — each client holds one open connection, but no wasted empty responses | Clients that cannot support WebSockets, moderate update frequency |
| WebSocket | Low (sub-second — server pushes immediately upon new content) | Low per-update but high connection overhead — each active client maintains a persistent TCP connection | Active users with high update frequency, real-time feed updates |
| Server-Sent Events (SSE) | Low (sub-second push from server) | Lower than WebSocket — unidirectional, no upgrade handshake, automatic reconnection built into the protocol | Feed updates where only server-to-client push is needed |
graph TD
NP[New Post Published] --> KF[Kafka - Post Events]
KF --> FOS[Fan-Out Service]
FOS --> FC[Feed Cache Update]
KF --> RTS[Real-Time Service]
RTS --> CM{User Connection State?}
CM -->|"WebSocket connected"| WS[Push via WebSocket]
CM -->|"SSE connected"| SSE[Push via SSE]
CM -->|"Not connected"| SKIP2[Skip - user will pull on next open]
WS --> CLIENT1[Active Client - Instant Update]
SSE --> CLIENT2[Web Client - Instant Update]
SKIP2 --> FC2[Feed Cache Ready for Next Pull]
CLIENT3[Inactive User Opens App] -->|"HTTP GET /feed"| PULL[Pull Latest Feed from Cache]
Hybrid Real-Time Delivery Architecture
The production approach at most FAANG companies is a hybrid: WebSocket connections for active users and pull-based loading for inactive users. When a user opens the app, the client establishes a WebSocket connection to the real-time service. This connection remains open for the duration of the session. When new posts are published by accounts the user follows, the real-time service pushes a lightweight notification through the WebSocket containing the post ID and a snippet. The client can then either insert the post directly into the feed (if the user is at the top of their feed) or show a "New posts available" banner that the user can tap to refresh.
WebSocket Connection Management at Scale
At 500M concurrent users, the real-time service maintains 500 million WebSocket connections. Each connection consumes approximately 10-20 KB of memory on the server (TCP buffers, connection metadata, user subscription state). That is 5-10 TB of memory across the WebSocket server fleet. Use a horizontally scaled fleet of stateless WebSocket servers behind a load balancer with sticky sessions (based on user_id hash). When a user connects, the server registers the user_id to server mapping in a distributed presence store (Redis or ZooKeeper). When the fan-out service needs to push an update, it looks up the user's WebSocket server in the presence store and routes the message accordingly.
Long polling serves as a fallback mechanism for clients that cannot maintain WebSocket connections (corporate firewalls, older browsers, certain mobile networks). With long polling, the client sends an HTTP request with a "last seen" cursor. The server checks if new posts exist beyond the cursor. If yes, it responds immediately with the new posts. If no, it holds the connection open for up to 30 seconds, checking periodically for new content. When new content arrives or the timeout expires, the server responds and the client immediately sends a new long-poll request.
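The core of a long-poll handler is a bounded wait on a "new content" signal. In this sketch a `threading.Event` stands in for that server-side signal; a real handler would check the user's feed cache against the cursor instead:

```python
import threading

# Minimal long-poll wait loop: block until new content arrives or the
# timeout expires, then respond either way.

def long_poll(new_content: threading.Event, timeout: float = 30.0) -> str:
    if new_content.wait(timeout):   # unblocks as soon as content arrives
        new_content.clear()
        return "200 new posts"
    # Timeout: respond empty; the client immediately re-issues the poll
    return "204 no content"

signal = threading.Event()
signal.set()                        # content already waiting
assert long_poll(signal, timeout=0.1) == "200 new posts"
assert long_poll(signal, timeout=0.05) == "204 no content"
```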
Real-time delivery optimization techniques
The "New Posts" Banner Pattern
Instead of inserting new posts directly into the feed while the user is reading (which causes jarring layout shifts), the standard UX pattern is to accumulate new posts in a buffer and show a "3 new posts" banner at the top. When the user taps the banner, the buffered posts are inserted at the top with a smooth animation. This pattern is used by Twitter, Facebook, Instagram, and LinkedIn. The buffer is stored client-side and typically caps at 50 posts — beyond that, the banner simply says "Many new posts" and triggers a full feed refresh when tapped.
News feed design is the most frequently asked system design question at Meta, Google, Amazon, Twitter, and LinkedIn. It appears directly ("Design Facebook's news feed," "Design Twitter's home timeline") and indirectly in questions about content recommendation, activity feeds, and social platforms. Interviewers use it to test fan-out strategy knowledge, caching architecture, ML ranking pipelines, real-time delivery mechanisms, and graph storage at scale. At L4-L5, candidates are expected to describe the fan-out trade-off and propose a basic hybrid approach. At L6+, candidates must demonstrate deep knowledge of ML ranking, cache sizing, WebSocket scaling, and graph partitioning, and explain how these components interact under peak load.
Common questions:
Try this question: Ask the interviewer: What is the target scale (DAU, concurrent users, posts per day)? Is this a symmetric graph (mutual friends like Facebook) or asymmetric (followers like Twitter/Instagram)? Do we need real-time updates or is pull-to-refresh acceptable? How important is ranking versus chronological ordering? Are there celebrity-scale accounts with millions of followers?
Strong answer: Drawing the hybrid fan-out architecture from memory with threshold-based routing. Calculating Redis memory for the feed cache with specific numbers (bytes per entry, entries per user, total memory). Explaining the multi-stage ML ranking pipeline (candidate generation, lightweight scoring, full ranking, re-ranking). Discussing the Twitter Bieber Problem as a motivating example for hybrid fan-out. Mentioning cursor-based pagination with Redis ZREVRANGEBYSCORE. Addressing WebSocket scaling with a presence store for routing.
Red flags: Choosing only fan-out on write or only fan-out on read without acknowledging the other approach. Not mentioning the celebrity/high-follower problem. Using offset-based pagination instead of cursor-based. Ignoring ML ranking and proposing only chronological ordering. Not discussing caching at all. Proposing a single database without sharding for the social graph. Not addressing how real-time updates reach active users.
Key takeaways
A user with 50 million followers publishes a post. Walk through exactly what happens in a hybrid fan-out system, from post creation to the post appearing in a follower's feed.
The post is written to the Post Database and a PostCreated event is published to Kafka. The fan-out service consumes the event, looks up the author's follower count, and determines the author exceeds the celebrity threshold (e.g., 10K followers). Because this is a celebrity, the fan-out service does NOT write to any follower's feed cache. Instead, the post ID is written to the celebrity post cache (a Redis sorted set keyed by celebrity user_id). When a follower opens their feed, the feed generation service reads their pre-computed feed cache (containing posts from normal accounts), queries the celebrity post cache for all celebrity accounts the user follows, merges both result sets, passes the merged candidates through the ML ranking service, and returns the top-ranked posts. The 50M-follower post appears alongside normal posts, ranked by the ML model. Total latency overhead from the celebrity on-read fetch is 20-50ms. No write amplification occurred.
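The publish-time routing decision at the heart of this walkthrough can be sketched as one branch (threshold value and store shapes are illustrative):

```python
# Hybrid fan-out routing at publish time: celebrities skip fan-out entirely;
# everyone else is fanned out on write.

CELEBRITY_THRESHOLD = 10_000

def on_post_created(author_id: str, post_id: str, followers: list[str],
                    feed_caches: dict[str, list],
                    celebrity_cache: dict[str, list]) -> int:
    """Returns the number of follower feeds written (write amplification)."""
    if len(followers) >= CELEBRITY_THRESHOLD:
        # Celebrity path: store once; merged into feeds at read time
        celebrity_cache.setdefault(author_id, []).append(post_id)
        return 0
    # Normal path: fan-out on write to every follower's feed cache
    for follower in followers:
        feed_caches.setdefault(follower, []).append(post_id)
    return len(followers)

feeds: dict[str, list] = {}
celebs: dict[str, list] = {}
assert on_post_created("alice", "p1", ["u1", "u2"], feeds, celebs) == 2
assert on_post_created("bieber", "p2", ["u"] * 50_000_000, feeds, celebs) == 0
```

The second call is the 50M-follower case: zero feed writes, one celebrity-cache append, and the cost moves to the read path where it is bounded by the handful of celebrities each user follows.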
Explain why cursor-based pagination is essential for news feeds and describe exactly how you would implement it with Redis sorted sets.
Offset-based pagination (skip N, take M) breaks when new posts are inserted at the top of the feed between page requests. If the user is viewing page 2 and 5 new posts arrive, offset-based pagination shifts all posts down, causing the user to see duplicates when they scroll to page 3. Cursor-based pagination avoids this by using the last seen post's score as an anchor. Implementation: the feed is a Redis sorted set with scores representing timestamps or ranking scores. The first page request uses ZREVRANGEBYSCORE feed:{user_id} +inf -inf LIMIT 0 20 to get the top 20 posts. The response includes the score of the last post as the cursor. The next page request uses ZREVRANGEBYSCORE feed:{user_id} (last_score -inf LIMIT 0 20 (the opening parenthesis makes the range exclusive). For tie-breaking when multiple posts share the same score, encode both the score and post_id in the cursor and use post_id as a secondary sort key.
Your feed ranking model is showing a 5% decrease in engagement rate after a new model deployment. How do you diagnose and resolve this?
First, verify the regression is real and not a measurement artifact by checking if the control group in the A/B test also shows a dip (which would indicate a platform-wide issue, not a model problem). If the regression is model-specific, examine the feature distribution — compare the feature values the new model receives versus the old model to check for feature pipeline issues (stale features, missing values, schema changes). Check the model's prediction distribution: if the new model scores all posts within a narrow range, the ranking loses discriminative power. Examine engagement by content type — the regression may be concentrated in one type (e.g., video posts) where a new feature was introduced. If the root cause is not immediately clear, roll back to the previous model version to stop the bleeding, then investigate offline. Re-evaluate the training data for the new model: was the training set balanced? Was there a data pipeline bug that introduced label noise? Run offline evaluation metrics (NDCG, precision@k) on the new model against a held-out test set to compare with the old model before redeploying.
💡 Analogy
Think of the news feed as a newspaper delivery system. Fan-out on write is like printing a personalized newspaper and delivering it to every subscriber's doorstep each morning. The printing press (fan-out service) runs once per edition (post), and every subscriber (follower) receives their copy in their mailbox (feed cache) without having to do anything. This works perfectly for a local newspaper with a few thousand subscribers, but imagine a national newspaper with 50 million subscribers — the printing and delivery logistics become overwhelming for a single edition. Fan-out on read is like not delivering newspapers at all — instead, every reader walks to the newsstand (post database) each morning and assembles their own paper by picking articles from every section they follow. This puts zero burden on the publisher but creates massive crowds at the newsstand during rush hour. The hybrid approach, which is what production systems use, is to deliver the local newspaper to most homes (fan-out on write for normal users) but keep the nationally popular stories at the newsstand (fan-out on read for celebrities). Most readers get their paper delivered, and only need to stop by the newsstand to pick up a few high-profile stories. The newsstand is well-stocked and fast because it only serves a handful of popular items, not the full catalog.
⚡ Core Idea
A news feed system is a personalized content delivery pipeline that must solve the fan-out problem — how to efficiently distribute each new post to all of its intended recipients. The fundamental tension is between write amplification (pre-computing feeds at publish time) and read amplification (computing feeds at request time). Production systems at FAANG scale use a hybrid strategy that routes posts through fan-out on write or fan-out on read based on the author's follower count, combined with ML-based ranking to select the most relevant posts from hundreds of candidates, all served from an in-memory cache layer to meet sub-second latency targets.
🎯 Why It Matters
The news feed is the single most important product surface for every social platform — it is the first thing users see and the primary driver of engagement, retention, and advertising revenue. Designing a news feed system at scale tests every core distributed systems concept: data modeling, fan-out strategies, caching architectures, machine learning serving, real-time delivery, and graph storage. This is why it is the most frequently asked system design interview question at FAANG companies. Getting the fan-out strategy right directly determines whether the system can scale to billions of users, and getting the ranking right directly determines whether users find the platform valuable enough to return every day.