© 2026 TheSimplifiedTech. All rights reserved.

Interactive Explainer

Designing a Notification System at Scale

A comprehensive deep-dive into designing a production-grade notification system capable of delivering billions of messages per day across push, email, SMS, and in-app channels. Covers event ingestion and fan-out strategies, channel adapter design for APNs, FCM, SMTP, and Twilio, delivery guarantees with idempotency keys and deduplication, user preference management with rate limiting and quiet hours, template engines with i18n and A/B testing, and scaling with priority queues, partitioning, and dead letter queues. This is a core system design interview question at FAANG companies because it tests breadth (event-driven architecture, queue design, third-party integrations) and depth (fan-out optimization, exactly-once semantics, failure isolation) in a single 45-minute session.

🎯 Key Takeaways
A notification system is an event-driven pipeline with five core stages: ingestion, deduplication, preference checking, template rendering, and channel-specific delivery. Each stage must be independently scalable and fault-tolerant, with failures in one stage contained from cascading to others.
Fan-out strategy is the most critical architectural decision: fan-out on write for small recipient lists (< 10K), batched fan-out with rate-controlled waves for large lists (> 100K), and fan-out on read for in-app notifications. The hybrid approach prevents fan-out storms while maintaining low delivery latency.
Priority isolation must be physical, not logical: critical notifications (2FA, security alerts) require dedicated Kafka topics, consumer pools, and provider connections that are never shared with normal-priority traffic. A marketing blast must never delay a password reset code.
At-least-once delivery combined with idempotency key deduplication provides effectively-exactly-once semantics at practical cost. The idempotency key TTL must exceed the maximum retry window to prevent late duplicates. Dead letter queues capture permanently failed notifications for investigation and replay.
User preference management and rate limiting are not afterthoughts — they are core pipeline components that determine whether users keep notifications enabled or disable them permanently. Check preferences on every notification hot path, enforce per-user per-channel rate limits, respect quiet hours by timezone, and always allow critical notifications to bypass rate limits.

~36 min read


Why Notification Systems Are Critical

Notification systems are the connective tissue of every modern application. They are the primary mechanism through which products re-engage users, communicate time-sensitive information, and drive daily active usage. Without notifications, most mobile applications would see their engagement metrics collapse — studies consistently show that users who enable push notifications retain at rates two to three times higher than those who do not.

At FAANG scale, notification systems operate at staggering volumes. Facebook sends over one billion push notifications per day. Google delivers billions of notifications across Gmail, YouTube, and Android system alerts. Amazon sends hundreds of millions of order updates, delivery alerts, and promotional emails daily. These are not simple fire-and-forget messages — each notification must be personalized, deduplicated, delivered through the correct channel, and respectful of user preferences and quiet hours.

Why Interviewers Love This Question

A notification system touches every distributed systems concept: event-driven architecture, message queues, fan-out strategies, third-party API integration (APNs, FCM, SMTP, Twilio), idempotency, rate limiting, template rendering, user preference management, and failure handling. It can be discussed at L4 depth (basic queue + channel adapters) or L7 depth (exactly-once delivery semantics, priority-based routing, celebrity fan-out optimization). This range makes it one of the most versatile system design interview questions.

The four primary notification channels each have fundamentally different characteristics. Push notifications (via Apple Push Notification service and Firebase Cloud Messaging) offer near-instant delivery to mobile devices but are limited in content length and require device tokens. Email provides rich formatting, attachments, and permanence but has variable delivery times and spam filtering risks. SMS offers near-universal reach and high open rates (98% within 3 minutes) but is expensive and heavily regulated. In-app notifications are free and fully controlled but only reach users who are actively using the application.

Business functions of notifications

  • Transactional alerts — Order confirmations, payment receipts, password resets, two-factor authentication codes — time-critical messages that users expect immediately.
  • Social engagement — Friend requests, likes, comments, mentions, shares — the social feedback loop that drives daily active usage on platforms like Facebook, Instagram, and Twitter.
  • System alerts — Service outages, security warnings, account compromises, scheduled maintenance — critical operational messages that protect user trust.
  • Marketing and re-engagement — Promotional offers, abandoned cart reminders, content recommendations — revenue-driving messages that must balance engagement with user tolerance.
  • Real-time updates — Live sports scores, stock price alerts, flight status changes, ride arrival notifications — latency-sensitive messages where seconds matter.
  • Digest and summary — Daily email digests, weekly activity summaries, monthly reports — batched notifications that reduce noise while maintaining awareness.

Scale Numbers to Remember

Facebook: 1B+ push notifications/day. WhatsApp: 100B+ messages/day (many triggering notifications). Amazon SES: 10B+ emails/month. Twilio: 100B+ SMS/year. A mid-size SaaS product (10M MAU) typically sends 50-100M notifications/month across all channels. At 100M notifications/day, the system must sustain roughly 1,200 notifications/second average and 6,000-12,000/second at peak (5-10x burst during events like Black Friday or World Cup).

Understanding these scale numbers is critical because they drive every architectural decision downstream. A notification system for a startup with 10,000 users can run on a single server with direct API calls to SendGrid and Firebase. At 100 million notifications per day, you need distributed queues, channel-specific adapters with independent scaling, retry mechanisms with exponential backoff, and sophisticated deduplication to prevent users from receiving the same notification twice.

The Notification Paradox

Notifications that are too frequent annoy users and drive them to disable permissions entirely. Notifications that are too infrequent fail to re-engage users and the product is forgotten. The optimal notification frequency varies by user, channel, and content type — making preference management and rate limiting just as important as the delivery infrastructure itself.

Requirements & Types of Notifications

Before drawing any architecture diagrams, a strong system design answer begins with clarifying requirements. For a notification system, the functional requirements define what types of notifications the system supports and how users interact with them, while non-functional requirements define the reliability, latency, and scale targets that drive the architecture.

Functional requirements

  • Multi-channel delivery — Support push notifications (iOS/Android), email, SMS, and in-app notifications. Each channel has its own delivery API, content format, and retry semantics.
  • Event-driven ingestion — Accept notification trigger events from multiple upstream services (e.g., order service, social service, security service) via a standardized event schema.
  • User preferences — Allow users to configure per-channel opt-in/opt-out, notification categories (marketing, transactional, social), frequency caps, and quiet hours.
  • Template rendering — Support parameterized templates with personalization (user name, order number), localization (i18n), and A/B test variants.
  • Priority levels — Support at least three priority tiers: critical (2FA codes, security alerts — immediate delivery), high (social interactions — within seconds), and normal (marketing, digests — best effort).
  • Delivery tracking — Track notification lifecycle: created, queued, sent, delivered, opened, clicked, bounced, failed. Expose delivery status via API and dashboards.

Non-functional requirements

  • High availability — The notification pipeline must be 99.99% available. Missed critical notifications (2FA codes, security alerts) directly impact user trust and security.
  • Low latency for critical notifications — Critical notifications must reach the delivery provider within 1 second of the triggering event. Marketing notifications can tolerate minutes of delay.
  • At-least-once delivery — Every notification must be delivered at least once. Exactly-once is ideal but hard; use idempotency keys to achieve effectively-exactly-once semantics.
  • Scalability — Support 100M+ notifications per day across all channels, with 10x burst capacity during peak events (product launches, sporting events, breaking news).
  • Rate limiting — Enforce per-user, per-channel, and global rate limits to prevent notification fatigue and comply with carrier regulations (SMS) and provider limits (APNs).
  • Fault isolation — A failure in one channel (e.g., SMS provider outage) must not affect delivery through other channels. Each channel adapter must be independently deployable and scalable.
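The preference and rate-limiting requirements above can be sketched as a single gate on the hot path. This is an illustrative in-memory sketch (class name, the 5/hour limit, and the quiet-hour defaults are assumptions; a production system would keep the counters in Redis so all workers share state):

```python
import time
from collections import defaultdict, deque
from datetime import datetime
from zoneinfo import ZoneInfo

class NotificationRateLimiter:
    def __init__(self, max_per_hour=5, quiet_start=22, quiet_end=8):
        self.max_per_hour = max_per_hour
        self.quiet_start = quiet_start   # 10 PM local time
        self.quiet_end = quiet_end       # 8 AM local time
        self.sent = defaultdict(deque)   # (user_id, channel) -> send timestamps

    def allow(self, user_id, channel, priority, user_tz="UTC", now=None):
        now = time.time() if now is None else now
        # Critical notifications (2FA, security alerts) always bypass limits.
        if priority == "critical":
            return True
        # Quiet hours: suppress non-critical sends in the user's local night.
        hour = datetime.fromtimestamp(now, ZoneInfo(user_tz)).hour
        if hour >= self.quiet_start or hour < self.quiet_end:
            return False
        # Sliding one-hour window per (user, channel).
        window = self.sent[(user_id, channel)]
        while window and window[0] <= now - 3600:
            window.popleft()
        if len(window) >= self.max_per_hour:
            return False
        window.append(now)
        return True
```

Passing `now` explicitly keeps the check testable; the same shape works with Redis sorted sets per (user, channel) key.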

Back-of-Envelope Estimation

Assume 100M notifications/day across all channels: 40M push, 30M email, 10M SMS, 20M in-app. That is roughly 1,200 notifications/second average. At 10x peak (major event), expect 12,000/second. Each notification record is approximately 1 KB (recipient, channel, template ID, parameters, metadata). Daily storage: 100 GB. Monthly: 3 TB. Delivery log retention (30 days): ~3 TB. The queue must handle 12K messages/second at peak — comfortably within Kafka capacity (millions/second per cluster) and near the practical throughput ceiling of a single RabbitMQ node.
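These numbers can be sanity-checked with a few lines of arithmetic reproducing the estimate above:

```python
# Back-of-envelope check of the numbers in the text.
NOTIFICATIONS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 86_400
RECORD_SIZE_KB = 1

avg_qps = NOTIFICATIONS_PER_DAY / SECONDS_PER_DAY        # ~1,157/sec, i.e. "roughly 1,200"
peak_qps = avg_qps * 10                                  # ~11,600/sec at a 10x burst
daily_storage_gb = NOTIFICATIONS_PER_DAY * RECORD_SIZE_KB / 1_000_000   # 100 GB/day
monthly_storage_tb = daily_storage_gb * 30 / 1_000       # 3 TB/month

print(f"avg: {avg_qps:.0f}/s, peak: {peak_qps:.0f}/s, "
      f"daily: {daily_storage_gb:.0f} GB, monthly: {monthly_storage_tb:.0f} TB")
```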

| Channel | Latency Target | Delivery Guarantee | Cost Per Message | Typical Use Case |
| --- | --- | --- | --- | --- |
| Push (APNs/FCM) | < 1 second | Best effort (device may be offline) | $0 (free) | Social updates, real-time alerts, order status |
| Email (SES/SendGrid) | < 30 seconds | Eventual (SMTP relay chain) | $0.0001 per email | Receipts, newsletters, digests, marketing |
| SMS (Twilio/SNS) | < 5 seconds | Carrier-dependent delivery | $0.0075 per SMS | 2FA codes, critical alerts, delivery confirmations |
| In-App (WebSocket) | < 200 ms | Online users only | $0 (free) | Activity feed, unread badges, live updates |

Priority Inversion Trap

If critical notifications (2FA codes) share the same queue as marketing blasts, a large promotional campaign can delay security-sensitive messages by minutes. Always use separate priority queues: a dedicated high-priority queue for critical notifications that is never starved by bulk traffic. This is not optional — it is a security requirement.

A critical question to ask the interviewer: should the notification system support batching and digest mode? If yes, the architecture needs a scheduler component that accumulates events over a time window (e.g., 1 hour) and consolidates them into a single digest notification. This fundamentally changes the pipeline from a real-time streaming model to a hybrid streaming-plus-batch model.

Notification Event Schema

A well-designed event schema is critical. Include: event_id (UUID for idempotency), event_type (e.g., "order.shipped"), recipient_id, channel_preference (push/email/sms/all), priority (critical/high/normal/low), template_id, template_params (JSON), created_at, and idempotency_key. The idempotency_key prevents duplicate delivery when upstream services retry failed publishes.
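One way to express that schema in code (field names follow the text; types, defaults, and the dataclass shape are illustrative assumptions):

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class NotificationEvent:
    event_type: str                # e.g. "order.shipped"
    recipient_id: str
    idempotency_key: str           # supplied by the producer; stable across retries
    template_id: str
    template_params: dict
    channel_preference: str = "all"   # push / email / sms / all
    priority: str = "normal"          # critical / high / normal / low
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        # Serialize for publishing to the ingestion topic.
        return json.dumps(asdict(self))
```

Note that `idempotency_key` is deliberately a required producer-supplied field: if it were derived from the auto-generated `event_id`, a retried publish would get a fresh key and defeat deduplication.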

High-Level Architecture

The high-level architecture of a notification system follows a classic event-driven pipeline pattern: event producers emit notification triggers, a central notification service validates and enriches them, channel-specific adapters format and deliver messages through external providers, and a feedback loop tracks delivery status. This pipeline must handle millions of notifications per day while maintaining strict ordering guarantees for per-user notification sequences.

graph TB
    subgraph "Event Producers"
        OP["Order Service"] --> MQ["Message Queue<br/>(Kafka)"]
        SP["Social Service"] --> MQ
        SEC["Security Service"] --> MQ
        MKT["Marketing Service"] --> MQ
        SYS["System Alerts"] --> MQ
    end

    subgraph "Notification Service"
        MQ --> VAL["Validator &<br/>Deduplicator"]
        VAL --> PREF["Preference<br/>Checker"]
        PREF --> RL["Rate Limiter"]
        RL --> TMPL["Template<br/>Engine"]
        TMPL --> ROUTER["Channel<br/>Router"]
    end

    subgraph "Priority Queues"
        ROUTER --> PQ1["Critical Queue<br/>(2FA, security)"]
        ROUTER --> PQ2["High Queue<br/>(social, orders)"]
        ROUTER --> PQ3["Normal Queue<br/>(marketing, digests)"]
    end

    subgraph "Channel Adapters"
        PQ1 --> PA["Push Adapter<br/>(APNs / FCM)"]
        PQ1 --> EA["Email Adapter<br/>(SES / SendGrid)"]
        PQ1 --> SA["SMS Adapter<br/>(Twilio)"]
        PQ2 --> PA
        PQ2 --> EA
        PQ2 --> SA
        PQ2 --> IA["In-App Adapter<br/>(WebSocket)"]
        PQ3 --> PA
        PQ3 --> EA
        PQ3 --> IA
    end

    subgraph "Delivery Providers"
        PA --> APNS["Apple APNs"]
        PA --> FCM["Google FCM"]
        EA --> SES["Amazon SES"]
        EA --> SG["SendGrid"]
        SA --> TW["Twilio"]
        IA --> WS["WebSocket<br/>Server"]
    end

    subgraph "Feedback & Analytics"
        APNS --> FB["Delivery<br/>Feedback"]
        FCM --> FB
        SES --> FB
        TW --> FB
        WS --> FB
        FB --> DLQ["Dead Letter<br/>Queue"]
        FB --> METRICS["Metrics &<br/>Dashboards"]
    end

    style VAL fill:#3b82f6,stroke:#333,color:#fff
    style ROUTER fill:#f59e0b,stroke:#333,color:#fff
    style DLQ fill:#ef4444,stroke:#333,color:#fff
    style METRICS fill:#10b981,stroke:#333,color:#fff

End-to-end notification pipeline: event producers publish to Kafka, the notification service validates, deduplicates, checks preferences, rate-limits, renders templates, and routes to priority queues. Channel adapters deliver through external providers. Feedback loops track delivery status.

The notification service is the central orchestration layer. It receives raw events from upstream services and transforms them into deliverable notifications. This transformation involves several steps that must execute in sequence: validation (is the event schema correct?), deduplication (have we already processed this event?), preference checking (does the user want this type of notification on this channel?), rate limiting (has the user exceeded their notification budget?), template rendering (generate the final message content), and channel routing (which adapter should deliver this?).

Core components

  • Event ingestion layer — Kafka topics partitioned by recipient_id ensure ordered processing per user. Multiple consumer groups allow independent scaling of validation, preference checking, and delivery stages.
  • Notification service — Stateless workers that consume events, apply business logic (dedup, preferences, rate limiting, templating), and enqueue delivery tasks to priority queues.
  • Priority queue system — Three Kafka topics (critical, high, normal) with different consumer group configurations. Critical queue has dedicated consumers with aggressive polling intervals.
  • Channel adapters — Independent microservices per channel (push, email, SMS, in-app). Each adapter owns its connection pool, retry logic, and provider-specific formatting.
  • Delivery feedback processor — Consumes webhooks and polling results from delivery providers (APNs feedback, SES bounce notifications, Twilio status callbacks) and updates the delivery log.
  • Dead letter queue — Captures permanently failed notifications for manual inspection and replay. Critical for debugging delivery issues without losing messages.

Partition by Recipient ID

Partitioning Kafka topics by recipient_id guarantees that all notifications for a given user are processed by the same consumer in order. This is essential for deduplication (checking if we already sent this notification) and rate limiting (counting how many notifications this user has received in the current window). Without per-user ordering, race conditions in deduplication and rate limiting become extremely difficult to handle.
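A sketch of that partitioning rule (the MD5 hash here is illustrative; Kafka clients apply their own default key hash, such as murmur2, when a message key is set):

```python
import hashlib

def partition_for(recipient_id: str, num_partitions: int) -> int:
    # Stable hash of the message key: every event for the same user
    # maps to the same partition, so one consumer sees them in order.
    digest = hashlib.md5(recipient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

In practice you rarely implement this yourself: producing with `key=recipient_id` lets the Kafka client perform the equivalent mapping.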

Common Mistake: Synchronous Channel Delivery

A common architectural mistake is having the notification service call delivery providers synchronously (e.g., calling the APNs API inline during event processing). This creates tight coupling: if APNs is slow or down, the entire notification pipeline backs up. Instead, always enqueue delivery tasks to per-channel queues and let independent adapter workers handle delivery asynchronously. This ensures that an SMS provider outage does not delay push notification delivery.

The architecture must support graceful degradation. If the email provider experiences an outage, email notifications should be queued and retried while push and SMS delivery continues unaffected. If the entire notification service is overloaded, the priority queue system ensures that critical notifications (security alerts, 2FA codes) are processed before marketing messages. This layered resilience is what separates a production-grade notification system from a prototype.

Fan-Out Strategies — Push vs Pull

Fan-out is the most computationally expensive operation in a notification system. When an event occurs (a celebrity posts a photo, a product goes on sale, a sports match ends), the system must determine every affected user and generate an individual notification for each one. At FAANG scale, a single event can trigger notifications for tens of millions of users simultaneously. The fan-out strategy determines whether this expansion happens eagerly at write time or lazily at read time.

The fan-out problem in notification systems mirrors the classic news feed fan-out problem, but with a critical difference: notifications are push-based (delivered to users) rather than pull-based (fetched by users). This means the system cannot defer work until the user opens the app — the notification must be delivered proactively, often within seconds of the triggering event.

The Celebrity Problem

When a user with 100 million followers posts content, a naive fan-out-on-write approach would generate 100 million notification records instantly. At 1 KB per record, that is 100 GB of data generated by a single action. This overwhelms the queue system, creates write hotspots, and can delay notifications for other users by minutes. Every notification system at scale must have a strategy for handling high-fan-out events.

Strategy 1: Fan-Out on Write (Push Model)

  1. When a triggering event occurs (e.g., a new post), immediately enumerate all recipients (followers, subscribers, group members).
  2. For each recipient, create an individual notification record with the recipient ID, channel, and rendered content.
  3. Enqueue each notification record into the appropriate priority queue for delivery.
  4. Advantage: delivery latency is predictable — every notification is pre-computed and ready for delivery. No work is deferred to read time.
  5. Disadvantage: a single event with millions of recipients creates a massive write burst. The time to complete fan-out grows linearly with recipient count.
  6. Best for: events with small to medium recipient lists (< 10,000 recipients) and when low delivery latency is critical.

Strategy 2: Fan-Out on Read (Pull Model)

  1. When a triggering event occurs, store a single event record (not per-recipient) in an event log.
  2. When a user opens the app or checks notifications, query the event log for events relevant to that user (based on subscriptions, group memberships).
  3. Generate and render the notification on the fly during the read request.
  4. Advantage: a single event with millions of potential recipients requires only one write. No fan-out burst at write time.
  5. Disadvantage: delivery latency is unpredictable — notifications are only generated when users actively check. Does not work for push notifications, email, or SMS, which are inherently push-based.
  6. Best for: in-app notification feeds where the user is already in the application and pulls their own notification list.

The Hybrid Approach (FAANG Standard)

Production notification systems use a hybrid strategy. For normal users (< 10K followers), fan-out on write with pre-computed recipient lists. For celebrity users (> 100K followers), store the event once and fan-out lazily: push notifications are batched and sent in waves (e.g., 50K per minute over 2 minutes), and in-app notifications use fan-out on read. A threshold parameter (configurable per event type) determines which path is taken. Facebook and Twitter both use this hybrid model.
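The routing decision reduces to a small function. The thresholds below are the article's example values and would be configurable per event type; the return labels are illustrative:

```python
SMALL_LIST_THRESHOLD = 10_000    # below this: pre-compute every record
LARGE_LIST_THRESHOLD = 100_000   # above this: rate-controlled waves

def choose_fanout(recipient_count: int, in_app_only: bool = False) -> str:
    if in_app_only:
        return "fan_out_on_read"      # compute at query time
    if recipient_count < SMALL_LIST_THRESHOLD:
        return "fan_out_on_write"     # eager fan-out, low latency
    if recipient_count > LARGE_LIST_THRESHOLD:
        return "batched_fan_out"      # waves of e.g. 50K/min via a scheduler
    return "fan_out_on_write"         # mid-range: tune per event type
```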

graph TB
    subgraph "Hybrid Fan-Out Decision"
        EVENT["Notification Event"] --> CHECK{"Recipient<br/>Count?"}
        CHECK -->|"< 10K recipients"| FOW["Fan-Out on Write<br/>(pre-compute all)"]
        CHECK -->|"> 100K recipients"| BATCH["Batched Fan-Out<br/>(waves of 50K/min)"]
        CHECK -->|"In-app only"| FOR["Fan-Out on Read<br/>(compute on demand)"]

        FOW --> PQ["Priority Queues"]
        BATCH --> SCHED["Batch Scheduler<br/>(rate-controlled)"]
        SCHED --> PQ
        FOR --> CACHE["Event Cache<br/>(read at query time)"]
    end

    style CHECK fill:#f59e0b,stroke:#333,color:#fff
    style FOW fill:#10b981,stroke:#333,color:#fff
    style BATCH fill:#3b82f6,stroke:#333,color:#fff
    style FOR fill:#8b5cf6,stroke:#333,color:#fff

Hybrid fan-out: small recipient lists fan out immediately, large lists are batched into rate-controlled waves, and in-app notifications defer to read time.

Pre-computing recipient lists is a key optimization for fan-out on write. Rather than querying the follower graph at notification time (which adds latency and database load), maintain a pre-computed recipient list for each notification source. When a user follows or unfollows someone, update the recipient list asynchronously. This trades storage (maintaining the lists) for write-path latency (instant lookup instead of graph traversal).

| Strategy | Write Cost | Read Cost | Delivery Latency | Best For |
| --- | --- | --- | --- | --- |
| Fan-out on write | O(N) per event (N = recipients) | O(1) per delivery | Low and predictable | Small groups, time-critical alerts |
| Fan-out on read | O(1) per event | O(M) per user query (M = subscriptions) | High and variable | In-app feeds, non-urgent updates |
| Hybrid (batched write) | O(N) spread over time | O(1) per delivery | Moderate but controlled | Celebrity accounts, viral events |

Fan-Out Storms and Backpressure

Without backpressure mechanisms, a burst of high-fan-out events (e.g., multiple celebrities posting simultaneously during a major event) can overwhelm the queue system. Implement backpressure at the fan-out stage: limit the rate at which individual notifications are enqueued (e.g., 100K per second per event source). Use a semaphore or token bucket to throttle fan-out workers. Monitor queue depth and automatically slow ingestion when queues exceed 80% capacity.
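The token bucket mentioned above can be sketched in a few lines (illustrative; a production version would share the bucket state across fan-out workers, e.g. in Redis):

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, n: int = 1) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False   # caller should back off: the queue is saturated

# A fan-out worker enqueues only when a token is available, e.g.:
#   if bucket.try_acquire():
#       enqueue(notification)
#   else:
#       sleep_and_retry()
```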

Channel Adapters: Push, Email, SMS, In-App

Channel adapters are the bridge between the notification system and external delivery providers. Each adapter is a specialized microservice that understands the API, content format, rate limits, and error semantics of its delivery provider. The adapter pattern ensures that the core notification service is decoupled from provider-specific logic — you can swap SendGrid for Mailgun or Twilio for Vonage without changing the core pipeline.

Each channel has fundamentally different delivery characteristics. Push notifications travel through proprietary gateways (APNs for iOS, FCM for Android) that maintain persistent connections to devices. Email flows through SMTP relay chains with store-and-forward semantics. SMS routes through carrier networks with per-country regulations and number formatting rules. In-app notifications use WebSocket connections to deliver messages to users who are currently online.

Push notification adapter (APNs / FCM)

  • Device token management — Maintain a mapping of user_id to device tokens. Users may have multiple devices (phone, tablet, watch). Tokens expire or become invalid when users uninstall the app — consume APNs feedback and FCM registration-error responses to prune stale tokens.
  • Payload formatting — APNs payloads are limited to 4 KB (JSON with alert, badge, sound, custom data). FCM payloads have a 4 KB limit for data messages. Adapter must truncate content intelligently and include deep-link URLs for navigation.
  • Connection management — APNs uses HTTP/2 with persistent connections (multiplexing thousands of notifications per connection). FCM uses HTTP/2 or XMPP. Maintain a pool of long-lived connections — creating new connections for each notification is prohibitively expensive.
  • Silent push and background updates — Support content-available push notifications that wake the app in the background to fetch new data without displaying a visible alert. Useful for pre-fetching content before the user opens the app.

Email adapter (SES / SendGrid)

  • SMTP integration — Use provider SDKs (not raw SMTP) for reliability. Amazon SES supports 50K emails/second per account. SendGrid supports burst rates via dedicated IPs. Authenticate with SPF, DKIM, and DMARC to avoid spam filters.
  • Bounce and complaint handling — Process SES bounce notifications (hard bounce = invalid address, soft bounce = mailbox full) and complaint feedback loops (user marked as spam). Remove hard-bounced addresses immediately. Suppress complained addresses for 30 days.
  • IP warming — New sending IPs must be warmed gradually (start with 100 emails/day, double every 2 days) to build sender reputation. Sending 1M emails from a cold IP results in immediate blacklisting.
  • Rendering — Email requires HTML rendering with inline CSS (most email clients strip external stylesheets). Include a plain-text fallback for accessibility and deliverability. Test rendering across Gmail, Outlook, Apple Mail, and Yahoo.

SMS adapter (Twilio / SNS)

  • Number formatting — Normalize all phone numbers to E.164 format (+1234567890). Validate country codes. Handle short codes for marketing and long codes for transactional messages.
  • Carrier regulations — US: register for A2P 10DLC campaigns. EU: comply with GDPR consent requirements. India: DLT registration for commercial SMS. Violation results in message blocking and fines.
  • Message segmentation — SMS messages over 160 characters (GSM-7) or 70 characters (UCS-2 for Unicode) are split into multiple segments, each billed separately. Adapter must calculate segment count and warn if content exceeds cost thresholds.
  • Delivery receipts — Twilio provides delivery status callbacks (queued, sent, delivered, undelivered, failed). Process these asynchronously to update the delivery log. Note that carrier-level delivery confirmation is not 100% reliable.
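The segment math from the bullets above looks like this. This is a simplified sketch: it treats any non-ASCII character as forcing UCS-2, whereas the real GSM-7 alphabet includes some accented characters, and GSM-7 extended characters such as '€' count as two septets:

```python
import math

def sms_segment_count(text: str) -> int:
    # Concatenated messages lose header bytes to the UDH, so multi-part
    # segments hold 153 GSM-7 chars (or 67 UCS-2 chars) instead of 160/70.
    is_unicode = any(ord(ch) > 127 for ch in text)  # crude GSM-7 check
    single, multi = (70, 67) if is_unicode else (160, 153)
    if len(text) <= single:
        return 1
    return math.ceil(len(text) / multi)
```

An adapter would run this before sending and warn when the count pushes per-message cost past a configured threshold.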
Channel comparison — Push (APNs/FCM) vs Email (SES/SendGrid) vs SMS (Twilio) vs In-App (WebSocket):

  • Delivery speed — Push: < 1 second. Email: 1-30 seconds. SMS: 1-10 seconds. In-app: < 200 ms.
  • Payload size — Push: 4 KB. Email: 10 MB (with attachments). SMS: 160 chars (GSM-7). In-app: unlimited (practical: 10 KB).
  • Cost per message — Push: free. Email: $0.0001. SMS: $0.0075. In-app: free.
  • Delivery confirmation — Push: APNs feedback / FCM receipts. Email: bounce/complaint notifications. SMS: carrier delivery receipts. In-app: WebSocket ACK.
  • Offline delivery — Push: queued by provider (up to 28 days). Email: store-and-forward (SMTP). SMS: queued by carrier (24-72 hours). In-app: not possible — the user must be online.
  • Rate limits — Push: APNs has no hard limit (but throttles); FCM defaults to 1,000 msg/sec. Email: SES 50K/sec; SendGrid varies by plan. SMS: Twilio 1 msg/sec per number (10DLC). In-app: bounded by server memory and connection count.

In-App Notification via WebSocket

For in-app notifications, maintain persistent WebSocket connections between the client and a WebSocket gateway. When a notification is ready for an online user, push it directly through the WebSocket. For offline users, store the notification in an unread notifications table and deliver it when the user reconnects. Use a presence service (backed by Redis with TTL-based expiry) to track which users are currently online and connected to which gateway instance.

Provider Failover

Never depend on a single delivery provider. Configure primary and fallback providers for each channel (e.g., SES primary, SendGrid fallback for email). Use a circuit breaker pattern: if the primary provider error rate exceeds 5% over a 1-minute window, automatically route traffic to the fallback. Reset the circuit after 5 minutes and gradually shift traffic back. This provides resilience against provider-specific outages.
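The failover logic above can be sketched as a small circuit breaker. The 5% threshold and 5-minute cooldown come from the text; the minimum sample count and class shape are assumptions for the sketch:

```python
import time

# Minimal circuit breaker for provider failover: trip to the fallback when
# the primary's error rate exceeds the threshold, retry after a cooldown.
class ProviderCircuitBreaker:
    def __init__(self, error_threshold=0.05, min_samples=20, cooldown_s=300):
        self.error_threshold = error_threshold
        self.min_samples = min_samples      # avoid tripping on tiny samples
        self.cooldown_s = cooldown_s
        self.successes = 0
        self.failures = 0
        self.opened_at: float | None = None

    def record(self, ok: bool) -> None:
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_samples and self.failures / total > self.error_threshold:
            self.opened_at = time.time()    # trip: route traffic to fallback
            self.successes = self.failures = 0

    def use_fallback(self) -> bool:
        if self.opened_at is None:
            return False
        if time.time() - self.opened_at > self.cooldown_s:
            self.opened_at = None           # cooldown elapsed: try primary again
            return False
        return True

breaker = ProviderCircuitBreaker()
for _ in range(19):
    breaker.record(ok=True)
breaker.record(ok=False)       # 1 failure / 20 = 5%, not above the threshold
print(breaker.use_fallback())  # -> False
breaker.record(ok=False)       # 2 / 21 ≈ 9.5% > 5%: the breaker trips
print(breaker.use_fallback())  # -> True (route to fallback provider)
```

A production breaker would also ramp traffic back gradually (half-open state) rather than flipping all traffic at once.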

Delivery Guarantees & Deduplication

Delivery guarantees are the most technically challenging aspect of a notification system. Users expect that every important notification reaches them exactly once — they do not want to miss a 2FA code, and they do not want to receive the same order confirmation three times. Achieving this in a distributed system with multiple failure modes (network partitions, consumer crashes, provider timeouts) requires careful engineering of idempotency, deduplication, and retry mechanisms.

The three standard delivery guarantee levels in distributed systems are: at-most-once (fire and forget — fast but may lose messages), at-least-once (retry until acknowledged — no message loss but may duplicate), and exactly-once (each message processed precisely once — the gold standard but expensive to achieve). For notification systems, at-least-once delivery combined with idempotency-based deduplication provides effectively-exactly-once semantics at a practical cost.

Idempotency Keys Are Non-Negotiable

Every notification event must carry an idempotency key — a unique identifier (typically a UUID or a deterministic hash of event_type + recipient_id + event_timestamp) that the notification service uses to detect duplicates. Before processing any notification, check a fast deduplication store (Redis SET with TTL) for the idempotency key. If present, skip processing. If absent, add the key with a TTL (e.g., 24 hours) and proceed. This simple mechanism prevents the vast majority of duplicate notifications.
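The check above is one atomic operation in Redis (`SET key 1 NX EX 86400`). The in-memory stand-in below mirrors that set-if-absent-with-TTL semantics so the sketch is runnable without a Redis server; class and method names are illustrative:

```python
import time

# In-memory stand-in for the Redis SETNX + TTL deduplication check.
class DedupStore:
    def __init__(self):
        self._keys: dict[str, float] = {}  # idempotency_key -> expiry time

    def first_seen(self, idempotency_key: str, ttl_s: float = 86_400) -> bool:
        """True if this key has not been seen within its TTL (proceed);
        False if it is a duplicate (skip processing)."""
        now = time.time()
        expiry = self._keys.get(idempotency_key)
        if expiry is not None and expiry > now:
            return False                    # duplicate: already processed
        self._keys[idempotency_key] = now + ttl_s
        return True

store = DedupStore()
print(store.first_seen("order_shipped:user-7:1718000000"))  # -> True (process)
print(store.first_seen("order_shipped:user-7:1718000000"))  # -> False (skip)
```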

At-Least-Once Delivery with Deduplication

1. Producer publishes the notification event to Kafka with an idempotency_key in the message payload.

2. Consumer reads the event and checks Redis for the idempotency_key using SETNX (set if not exists) with a 24-hour TTL.

3. If SETNX returns false (key already exists), the event is a duplicate — log and skip it. Commit the Kafka offset.

4. If SETNX returns true (new key), process the notification: check preferences, render the template, enqueue to the channel adapter.

5. Channel adapter delivers to the external provider. If the provider returns success, mark the notification as delivered.

6. If the provider returns a retryable error (5xx, timeout), re-enqueue the notification with an incremented retry count and exponential backoff delay (1s, 2s, 4s, 8s, 16s, max 60s).

7. After max retries (typically 5-8), move the notification to the dead letter queue for manual inspection.

8. If the consumer crashes after Redis SETNX but before committing the Kafka offset, the message will be redelivered — but the Redis key prevents reprocessing.


The Retry-Dedup Window Gap

If the idempotency key TTL (24 hours) is shorter than the maximum retry duration, a retried notification could bypass deduplication. For example, if a notification fails and enters retry with exponential backoff, and the total retry time exceeds 24 hours, the Redis key may expire before the final retry — causing a duplicate. Solution: set the idempotency key TTL to max_retry_duration + 24 hours (e.g., 48 hours total). This is a subtle bug that many implementations miss.

Delivery guarantee levels:

  • At-most-once — Send once, do not retry on failure. Failure mode: may lose notifications on provider timeout or crash. Use for: non-critical marketing messages, analytics events.
  • At-least-once — Retry until the provider acknowledges delivery. Failure mode: may send duplicate notifications. Use for: the default for all notification types (with dedup).
  • Effectively exactly-once — At-least-once plus idempotency key deduplication. Failure mode: duplicates only if the dedup store fails (rare). Use for: critical notifications: 2FA, security alerts, financial.

Retry strategies must be channel-aware. Push notification retries should be aggressive (1s, 2s, 4s) because APNs and FCM are generally fast. Email retries should be more patient (1min, 5min, 15min, 1hr) because SMTP delivery can be legitimately slow. SMS retries should respect carrier rate limits — retrying too aggressively can trigger carrier throttling or blocking. Each channel adapter manages its own retry policy independently.

Dead Letter Queue Strategy

The dead letter queue (DLQ) is your safety net. Every notification that exhausts its retry budget ends up here. Build a DLQ consumer dashboard that shows: failure reason distribution (invalid token, provider error, rate limited), notification priority breakdown (are critical notifications hitting the DLQ?), and a one-click replay button that re-enqueues selected notifications. Alerting: trigger a page if any critical-priority notification reaches the DLQ.

Common Mistake: Retrying Non-Retryable Errors

Not all errors are retryable. APNs 410 (device token no longer active) means the user uninstalled the app — retrying will never succeed. Email hard bounces (invalid address) are permanent. SMS to disconnected numbers will never deliver. Channel adapters must classify errors as retryable (5xx, timeout, rate limited) vs non-retryable (4xx client errors, invalid tokens, hard bounces). Retrying non-retryable errors wastes resources and can trigger provider-level throttling.
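A sketch of that classification. The status codes are illustrative of APNs/SES/Twilio-style responses; a real adapter would map each provider's documented error codes explicitly:

```python
# Classify delivery errors as retryable vs non-retryable, as described above.
NON_RETRYABLE = {
    400,  # malformed request: retrying the same payload cannot succeed
    404,  # unknown endpoint / invalid destination
    410,  # APNs: device token no longer active (user uninstalled the app)
}

def is_retryable(status_code: int, timed_out: bool = False) -> bool:
    if timed_out:
        return True                 # network timeout: provider state unknown
    if status_code == 429:
        return True                 # rate limited: retry after backoff
    if status_code in NON_RETRYABLE:
        return False
    if 500 <= status_code < 600:
        return True                 # provider-side failure
    return False                    # remaining 4xx client errors are permanent

print(is_retryable(503))                # -> True (retry with backoff)
print(is_retryable(410))                # -> False (prune the token instead)
print(is_retryable(0, timed_out=True))  # -> True
```

Non-retryable results should trigger cleanup (prune the token, suppress the address) rather than a retry.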

User Preferences & Rate Limiting

User preferences and rate limiting are what separate a notification system that users love from one that they disable entirely. The technical infrastructure for delivering millions of notifications is meaningless if users opt out because they feel spammed. Preference management and rate limiting are not afterthoughts — they are core architectural components that must be designed into the notification pipeline from the start.

User preference data must be accessible on every notification's hot path: the notification service checks preferences before rendering templates or enqueuing delivery tasks. This means preference lookups must be extremely fast (sub-millisecond) and highly available. A preference store failure should fail-closed (suppress the notification) rather than fail-open (deliver unwanted notifications), because a missed notification is recoverable but a user disabling all notifications is often permanent.

Preference storage design

  • Schema — Store preferences as a per-user document: { user_id, global_enabled: bool, channels: { push: bool, email: bool, sms: bool, in_app: bool }, categories: { marketing: bool, social: bool, transactional: bool, security: bool }, quiet_hours: { start: "22:00", end: "07:00", timezone: "America/New_York" }, frequency_caps: { push: 10/day, email: 5/day, sms: 2/day } }.
  • Storage layer — Primary store: DynamoDB or PostgreSQL with a user_id primary key. Cache layer: Redis hash per user with 1-hour TTL. The cache absorbs 99%+ of preference lookups on the notification hot path.
  • Default preferences — New users start with sensible defaults: push enabled, email enabled for transactional, SMS disabled (expensive), marketing frequency capped at 3/week. Defaults are configurable per-market (EU defaults are more restrictive per GDPR).
  • Category granularity — Allow users to control notifications at the category level (social, transactional, marketing) and optionally at the sub-category level (e.g., within social: likes, comments, follows). Too few categories frustrate power users; too many overwhelm casual users.
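A minimal check against the per-user document sketched in the schema bullet above. The lookup order (global, then channel, then category) is one reasonable choice, not a standard, and the fail-closed defaults mirror the text:

```python
# Preference check on the notification hot path, using the schema above.
# Missing fields fail closed (suppress), per the fail-closed guidance.
def allowed_by_preferences(prefs: dict, channel: str, category: str) -> bool:
    if not prefs.get("global_enabled", True):
        return False
    if not prefs.get("channels", {}).get(channel, False):
        return False
    if not prefs.get("categories", {}).get(category, False):
        return False
    return True

prefs = {
    "global_enabled": True,
    "channels": {"push": True, "email": True, "sms": False, "in_app": True},
    "categories": {"marketing": False, "transactional": True, "security": True},
}
print(allowed_by_preferences(prefs, "push", "transactional"))  # -> True
print(allowed_by_preferences(prefs, "push", "marketing"))      # -> False
print(allowed_by_preferences(prefs, "sms", "security"))        # -> False
```

Handling for critical-priority bypasses and quiet hours is omitted here; those checks layer on top of this one.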

Quiet Hours Implementation

Quiet hours (e.g., 10 PM to 7 AM in the user timezone) must suppress non-critical notifications and queue them for delivery at the end of the quiet period. Implementation: store quiet_hours with the user timezone. When the notification service processes a notification, convert current UTC time to the user timezone and check against the quiet window. If inside the quiet window AND the notification priority is not critical, calculate the delay_until timestamp (quiet_end_time in UTC) and enqueue with a scheduled delivery time. Critical notifications (2FA, security alerts) always bypass quiet hours.
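The timezone conversion and window test can be sketched as below. For simplicity this uses `datetime.time` objects where the stored schema uses "22:00" strings, and the function names are assumptions; note the window must handle wrapping past midnight:

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

# Quiet-hours check: convert current UTC time into the user's timezone and
# test membership in a window that may wrap midnight (e.g., 22:00-07:00).
def in_quiet_hours(now_utc: datetime, start: time, end: time, tz: str) -> bool:
    local = now_utc.astimezone(ZoneInfo(tz)).time()
    if start <= end:                       # same-day window, e.g. 13:00-15:00
        return start <= local < end
    return local >= start or local < end   # wraps midnight, e.g. 22:00-07:00

def should_defer(priority: str, now_utc: datetime, prefs: dict) -> bool:
    if priority == "critical":             # 2FA / security always go through
        return False
    q = prefs["quiet_hours"]
    return in_quiet_hours(now_utc, q["start"], q["end"], q["timezone"])

prefs = {"quiet_hours": {"start": time(22, 0), "end": time(7, 0),
                         "timezone": "America/New_York"}}
# 03:30 UTC on Jan 15 is 22:30 in New York (EST) -- inside the quiet window.
now = datetime(2024, 1, 15, 3, 30, tzinfo=timezone.utc)
print(should_defer("normal", now, prefs))    # -> True (delay until 07:00 local)
print(should_defer("critical", now, prefs))  # -> False
```

A deferred notification gets a `delay_until` timestamp equal to the quiet window's end converted back to UTC.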

Rate limiting implementation

1. Define rate limit tiers: per-user-per-channel (e.g., max 10 push/day), per-user-global (e.g., max 30 notifications/day across all channels), and per-category (e.g., max 3 marketing/week).

2. Use Redis counters with sliding window rate limiting. Key format: ratelimit:{user_id}:{channel}:{window}. Increment on each notification. TTL matches the window duration.

3. When a notification is rate-limited, decide the disposition based on priority: critical notifications bypass all rate limits, high-priority notifications are deferred to the next available window, normal notifications are silently dropped with a log entry.

4. Track rate-limit hit rates as a metric. If more than 20% of notifications are being rate-limited, it signals that upstream services are generating too many events — the problem is at the source, not the limiter.

5. Expose rate limit status in the user preference API so the client can show users how many notifications they have remaining in each budget window.
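The steps above can be sketched with an in-memory sliding-window limiter. It mirrors the Redis key shape from the text, with a deque of timestamps per key standing in for the Redis counter so the example is self-contained:

```python
import time
from collections import defaultdict, deque

# In-memory sliding-window rate limiter (Redis stand-in).
class SlidingWindowLimiter:
    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._events: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, channel: str) -> bool:
        key = f"ratelimit:{user_id}:{channel}"  # key shape from the text
        now = time.time()
        events = self._events[key]
        while events and events[0] <= now - self.window_s:
            events.popleft()               # drop timestamps outside the window
        if len(events) >= self.limit:
            return False                   # over budget: defer or drop
        events.append(now)
        return True

limiter = SlidingWindowLimiter(limit=10, window_s=86_400)  # 10 push/day
results = [limiter.allow("user-7", "push") for _ in range(12)]
print(results.count(True), results.count(False))  # -> 10 2
```

In Redis the same effect is usually achieved with a sorted set per key (`ZADD` + `ZREMRANGEBYSCORE` + `ZCARD`) or fixed-window counters with TTLs.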


The Opt-Out Cascade

When a user disables push notifications at the OS level (iOS Settings or Android app settings), the notification system does not receive explicit feedback — APNs simply stops delivering. Tokens may remain valid in the database for days. Detect this by monitoring delivery feedback: if APNs consistently returns "not registered" or FCM returns "not subscribed," mark the user push preference as disabled server-side and stop wasting queue capacity on undeliverable notifications.

Rate limit tiers:

  • Per-user push — Scope: single user, push channel. Window: 24 hours (rolling). Default limit: 10 notifications. When exceeded: queue for the next window, or drop if normal priority.
  • Per-user email — Scope: single user, email channel. Window: 24 hours (rolling). Default limit: 5 emails. When exceeded: consolidate into a digest at the end of the window.
  • Per-user SMS — Scope: single user, SMS channel. Window: 24 hours (rolling). Default limit: 2 messages. When exceeded: drop non-critical, deliver critical only.
  • Per-user global — Scope: single user, all channels. Window: 24 hours (rolling). Default limit: 30 notifications. When exceeded: prioritize critical and high, drop normal.
  • Per-category marketing — Scope: single user, marketing category. Window: 7 days (rolling). Default limit: 3 messages. When exceeded: drop silently — the user has seen enough marketing this week.
  • Global system — Scope: all users, all channels. Window: 1 second. Default limit: 50,000 notifications. When exceeded: apply backpressure on event ingestion — slow down producers.

Do-Not-Disturb (DND) Mode

Beyond quiet hours (time-based), support explicit DND mode where users can silence all non-critical notifications for a defined duration (1 hour, until tomorrow, until I turn it off). DND state is stored in the preference cache with an expiry timestamp. The notification service checks DND before quiet hours — if DND is active and the notification is not critical, suppress it entirely (do not defer, just drop). This gives users a sense of control that reduces the likelihood of them disabling notifications permanently.

Template Engine & Personalization

The template engine is the component that transforms a raw notification event (event_type, recipient_id, template_params) into a fully rendered, channel-specific message ready for delivery. This involves selecting the correct template, injecting personalized data, localizing the content for the recipient's language and locale, selecting the appropriate A/B test variant, and formatting the output for the target channel (HTML for email, JSON for push, plain text for SMS).

Template rendering sits on the critical path of every notification, so it must be fast (< 5ms per render), cacheable (compiled templates), and fault-tolerant (a missing template parameter should produce a degraded message, not a crash). At FAANG scale, the template engine renders millions of notifications per hour across thousands of template variants.

graph LR
    subgraph "Template Rendering Pipeline"
        EVENT["Notification Event<br/>(template_id, params,<br/>recipient_id)"] --> LOOKUP["Template<br/>Lookup"]
        LOOKUP --> LOCALE["Locale<br/>Resolution<br/>(user lang + region)"]
        LOCALE --> AB["A/B Variant<br/>Selection<br/>(experiment cohort)"]
        AB --> RENDER["Template<br/>Rendering<br/>(Handlebars/Mustache)"]
        RENDER --> FORMAT{"Channel<br/>Format?"}
        FORMAT -->|"Push"| PUSH_FMT["JSON Payload<br/>(title, body,<br/>deep_link, badge)"]
        FORMAT -->|"Email"| EMAIL_FMT["HTML + Plain Text<br/>(inline CSS,<br/>unsubscribe link)"]
        FORMAT -->|"SMS"| SMS_FMT["Plain Text<br/>(160 char limit,<br/>short URL)"]
        FORMAT -->|"In-App"| INAPP_FMT["Structured JSON<br/>(title, body, icon,<br/>action_url)"]
    end

    style RENDER fill:#3b82f6,stroke:#333,color:#fff
    style FORMAT fill:#f59e0b,stroke:#333,color:#fff

Template rendering pipeline: resolve locale, select A/B variant, render with Handlebars, then format for the target channel.

Template system design

  • Template storage — Store templates in a versioned repository (Git-backed or database with version history). Each template has a unique template_id, version, locale, channel, and compiled template string. Use Handlebars or Mustache syntax for variable interpolation: "Hi {{user.first_name}}, your order #{{order.id}} has shipped!"
  • Compiled template cache — Pre-compile templates into executable functions at deployment time and cache them in memory. A Handlebars compile step takes 1-5ms; rendering with pre-compiled templates takes < 0.1ms. Cache invalidation: version bump triggers recompilation across all notification service instances.
  • Fallback chain — If a template is missing for the user locale (e.g., fr-CA), fall back to the language base (fr), then to the default locale (en-US). If a template parameter is missing, render with a sensible default rather than crashing: "Hi there" instead of "Hi {{user.first_name}}".
  • Channel-specific rendering — The same notification event produces different outputs per channel. Push: short title (< 50 chars) + body (< 150 chars) + deep_link. Email: full HTML with images, links, and unsubscribe footer. SMS: plain text within 160 characters with a shortened URL. In-app: structured JSON with icon, action, and dismiss behavior.
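The fallback chain and missing-parameter behavior from the bullets above can be sketched as follows. For a self-contained example this uses Python `str.format`-style placeholders where the text assumes Handlebars/Mustache in production; template content and IDs are illustrative:

```python
# Locale fallback chain plus graceful defaults for missing parameters.
TEMPLATES = {
    ("order_shipped", "en-US"): "Hi {first_name}, your order #{order_id} has shipped!",
    ("order_shipped", "fr"):    "Bonjour {first_name}, votre commande #{order_id} est expédiée !",
}
DEFAULT_LOCALE = "en-US"

def resolve_template(template_id: str, locale: str) -> str:
    # Fallback chain: exact locale -> language base -> default locale.
    for candidate in (locale, locale.split("-")[0], DEFAULT_LOCALE):
        if (template_id, candidate) in TEMPLATES:
            return TEMPLATES[(template_id, candidate)]
    raise KeyError(template_id)

class Defaulting(dict):
    # A missing parameter degrades the message instead of crashing the render.
    def __missing__(self, key):
        return "there" if key == "first_name" else ""

def render(template_id: str, locale: str, params: dict) -> str:
    return resolve_template(template_id, locale).format_map(Defaulting(params))

# fr-CA falls back to the fr base template:
print(render("order_shipped", "fr-CA", {"first_name": "Aline", "order_id": 42}))
# de-DE falls back to en-US, and the missing name degrades to "there":
print(render("order_shipped", "de-DE", {"order_id": 42}))
```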

Internationalization (i18n) Strategy

Store template strings in locale-specific files or database rows, keyed by template_id + locale. Use ICU MessageFormat for complex pluralization and gender rules: "{count, plural, =0 {No new notifications} one {# new notification} other {# new notifications}}". This handles the significant grammatical differences between languages (English: 1 notification / 2 notifications; Russian: 1 уведомление / 2 уведомления / 5 уведомлений — three plural forms). Pre-render for the top 10 locales at compile time; render on-demand for long-tail locales.

A/B Testing Notifications

A/B testing notification content is critical for optimizing engagement. Assign users to experiment cohorts (typically via a consistent hash of user_id + experiment_id) and select the corresponding template variant. Track per-variant metrics: open rate, click-through rate, and conversion rate. Run experiments for at least 7 days to account for day-of-week effects. Typical test: two push notification copy variants — "Your order has shipped!" vs "Great news — order #1234 is on its way!" — can yield 15-30% differences in open rates.
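The consistent-hash cohort assignment mentioned above can be sketched like this. Using a cryptographic hash (rather than Python's per-process `hash()`) keeps assignments stable across services and restarts; the even split across variants is an assumption:

```python
import hashlib

# Deterministic experiment cohort assignment via a hash of user + experiment.
def assign_variant(user_id: str, experiment_id: str,
                   variants: tuple = ("A", "B")) -> str:
    digest = hashlib.sha256(f"{user_id}:{experiment_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable 0-99 bucket
    return variants[bucket * len(variants) // 100]     # even split

# The same user always gets the same variant for a given experiment,
# while different experiments shuffle users independently.
print(assign_variant("user-7", "push-copy-test") ==
      assign_variant("user-7", "push-copy-test"))  # -> True
```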

Dynamic content injection allows notifications to include real-time data beyond simple template parameters. For example, an email notification for a flight delay might include current weather at the destination, alternative flight options, and a map showing the current aircraft location. This requires the template engine to call external data services during rendering — which adds latency and failure risk. Use a timeout-with-fallback pattern: attempt to fetch dynamic data with a 500ms timeout; if it fails, render without the dynamic section.

Template Injection Vulnerability

User-generated content (usernames, message text) must be sanitized before injection into templates. A user with the name "{{constructor.constructor(return this.process.exit())}}" could exploit a server-side template injection vulnerability. Always use safe template engines that sandbox expressions (Handlebars, Mustache — which are logic-less by design). Never use eval-based template engines (EJS with unescaped output, Lodash _.template with user input). Escape all user-provided parameters for the target format (HTML entities for email, JSON encoding for push payloads).

Unsubscribe and Compliance

Every marketing email must include a one-click unsubscribe link (CAN-SPAM Act, GDPR). The template engine must automatically append unsubscribe footers to marketing emails. The unsubscribe URL should be a tokenized link that updates the user preference without requiring login: /unsubscribe?token=<signed_jwt_with_user_id_and_category>. Process unsubscribe clicks within 10 business days (legal requirement) — in practice, update preferences immediately.

Scaling, Monitoring & Failure Modes

Scaling a notification system requires careful attention to partitioning strategies, queue management, priority isolation, and failure handling. The notification pipeline has multiple stages (ingestion, processing, delivery), each with different scaling characteristics and bottleneck patterns. A failure in any stage must be contained and not cascade to other stages or other users.

The most common scaling pattern is horizontal partitioning by user ID. Kafka topics are partitioned by recipient_id, ensuring that all notifications for a given user are processed by the same consumer (enabling per-user deduplication and rate limiting). Channel adapters scale independently based on provider throughput limits. The notification service scales based on event ingestion rate.

Partitioning strategies

  • Partition by recipient_id — Hash user_id to a Kafka partition. All notifications for a user go to the same partition, enabling stateful per-user processing (dedup, rate limit) without distributed coordination. Number of partitions: start with 128, scale to 1024 for high throughput.
  • Partition by channel — Separate Kafka topics per channel (notifications.push, notifications.email, notifications.sms). This allows independent consumer group scaling: push adapters can scale to 100 instances while SMS adapters remain at 10 instances.
  • Partition by priority — Separate Kafka topics per priority level (notifications.critical, notifications.high, notifications.normal). Critical consumers poll aggressively (10ms interval), normal consumers poll conservatively (100ms). This ensures priority isolation under load.
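The recipient_id partitioning in the first bullet can be sketched as a stable hash-to-partition mapping. A cryptographic hash (rather than Python's salted `hash()`) keeps the mapping identical across processes and restarts; the 128-partition count comes from the text:

```python
import hashlib

NUM_PARTITIONS = 128  # starting partition count from the text

def partition_for(recipient_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable mapping: the same user always hashes to the same partition,
    # so per-user dedup and rate limiting stay local to one consumer.
    digest = hashlib.md5(recipient_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

print(partition_for("user-7") == partition_for("user-7"))  # -> True
print(0 <= partition_for("user-7") < NUM_PARTITIONS)       # -> True
```

Kafka's default partitioner applies the same idea (murmur2 over the message key), so setting the message key to recipient_id usually suffices.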
graph TB
    subgraph "Scaling Architecture"
        KAFKA["Kafka Cluster<br/>(128 partitions)"] --> CG1["Consumer Group 1<br/>Notification Service<br/>(20 instances)"]
        CG1 --> PQ_C["Critical Topic<br/>(32 partitions)"]
        CG1 --> PQ_H["High Topic<br/>(64 partitions)"]
        CG1 --> PQ_N["Normal Topic<br/>(128 partitions)"]

        PQ_C --> PUSH_C["Push Adapter Pool<br/>(10 instances)<br/>p99 < 100ms"]
        PQ_C --> EMAIL_C["Email Adapter Pool<br/>(5 instances)<br/>p99 < 500ms"]
        PQ_H --> PUSH_H["Push Adapter Pool<br/>(50 instances)"]
        PQ_H --> EMAIL_H["Email Adapter Pool<br/>(20 instances)"]
        PQ_N --> PUSH_N["Push Adapter Pool<br/>(20 instances)"]
        PQ_N --> EMAIL_N["Email Adapter Pool<br/>(30 instances)"]
        PQ_N --> DLQ["Dead Letter Queue"]
    end

    subgraph "Monitoring"
        PUSH_C --> MON["Metrics Pipeline<br/>(Prometheus + Grafana)"]
        EMAIL_C --> MON
        DLQ --> ALERT["PagerDuty Alert<br/>(critical in DLQ)"]
    end

    style PQ_C fill:#ef4444,stroke:#333,color:#fff
    style PQ_H fill:#f59e0b,stroke:#333,color:#fff
    style PQ_N fill:#3b82f6,stroke:#333,color:#fff
    style DLQ fill:#ef4444,stroke:#333,color:#fff
    style ALERT fill:#ef4444,stroke:#333,color:#fff

Scaling architecture: separate Kafka topics per priority level with independent consumer pools per channel. Dead letter queue captures permanently failed notifications with alerting.

SLA Monitoring Dashboard

Build a real-time dashboard tracking: (1) End-to-end latency percentiles (p50, p95, p99) from event ingestion to provider delivery — broken down by channel and priority. (2) Delivery success rate per channel (target: > 99.5% for push, > 98% for email, > 95% for SMS). (3) Queue depth per priority topic — depth growing faster than drain rate is an early warning of capacity problems. (4) Rate limit hit rate — high rates indicate upstream event storms. (5) DLQ depth — any non-zero depth for critical notifications triggers an immediate page.

Failure modes:

  • Kafka broker failure — Detection: consumer lag spike, produce errors. Impact: event ingestion pauses for affected partitions. Mitigation: replication factor 3, automatic leader election, producers retry with idempotent writes enabled.
  • Push provider outage (APNs) — Detection: HTTP 503 errors from APNs, connection resets. Impact: push notifications for iOS users are queued. Mitigation: circuit breaker opens after 5% error rate; queue notifications; APNs queues on its side for up to 28 days.
  • Email provider outage (SES) — Detection: SES API returns 503 or throttling errors. Impact: email delivery delayed. Mitigation: failover to SendGrid via circuit breaker; email is inherently store-and-forward, so delay is tolerable.
  • SMS provider outage (Twilio) — Detection: Twilio API errors, webhook delivery failures. Impact: SMS delivery stops. Mitigation: failover to SNS or Vonage; critical SMS (2FA) gets priority on the failover channel.
  • Notification service crash — Detection: consumer lag increases, health check failures. Impact: processing pauses for assigned partitions. Mitigation: Kafka consumer rebalance assigns partitions to healthy consumers within 30 seconds.
  • Redis (dedup/rate limit) outage — Detection: SETNX timeouts, rate limit bypass. Impact: duplicate notifications or rate limit bypass. Mitigation: fail closed — if Redis is down, suppress non-critical notifications; in-memory fallback for dedup with degraded accuracy.

The Notification Storm

The most dangerous failure mode is a notification storm: a bug in an upstream service or a misconfigured event producer triggers millions of duplicate or erroneous notifications. Prevention: implement a global rate limit at the ingestion layer (max 50K notifications per second system-wide). If exceeded, reject new events with backpressure to producers. Also monitor the ratio of notifications per unique event — if it exceeds the expected fan-out ratio by more than 2x, trigger an automatic circuit breaker that pauses non-critical event processing.

Dead letter queue management is critical for long-term system health. Build an automated DLQ processor that categorizes failures: invalid device tokens (prune from the token store), permanently bounced email addresses (add to suppression list), expired content (discard — a 3-day-old flash sale notification is worthless), and provider errors (schedule for retry during the next low-traffic window). Alert on the DLQ growth rate, not just the absolute depth — a steady growth rate indicates a systemic issue rather than a transient spike.

Capacity Planning Formula

Peak notifications per second = (DAU x avg_notifications_per_user_per_day x peak_multiplier) / 86,400. For 50M DAU, 5 notifications/user/day, 10x peak: (50M x 5 x 10) / 86,400 = ~29,000 notifications/second peak. Each notification requires: 1 Kafka produce, 1 Redis dedup check, 1 Redis preference lookup, 1 template render, 1 Kafka produce to channel topic, 1 provider API call. Total operations per notification: ~6. Total peak operations: ~174,000/second. Size your infrastructure to handle 2x this number for headroom.
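The formula above, executed with the worked numbers from the text (50M DAU, 5 notifications/user/day, 10x peak multiplier, ~6 operations per notification, 2x headroom):

```python
# Capacity planning: peak = (DAU * notifs/user/day * peak_multiplier) / 86,400
DAU = 50_000_000
NOTIFS_PER_USER_PER_DAY = 5
PEAK_MULTIPLIER = 10
OPS_PER_NOTIFICATION = 6   # 2x Kafka produce, 2x Redis, render, provider call
HEADROOM = 2

peak_nps = DAU * NOTIFS_PER_USER_PER_DAY * PEAK_MULTIPLIER / 86_400
peak_ops = peak_nps * OPS_PER_NOTIFICATION

print(f"peak notifications/sec: {peak_nps:,.0f}")             # ~28,935
print(f"peak operations/sec:    {peak_ops:,.0f}")             # ~173,611
print(f"provision for:          {peak_ops * HEADROOM:,.0f}")  # ~347,222
```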

Common Mistake: No Priority Isolation Under Load

If critical and normal notifications share the same queue, a marketing blast (millions of normal-priority messages) will delay 2FA codes and security alerts. This is not a theoretical risk — it happens regularly at companies that skip priority queue separation. The fix is simple: dedicate separate Kafka topics and consumer pools for each priority level. Critical consumers should have guaranteed minimum resources that are never shared with lower priorities, even during peak load.

How this might come up in interviews

Notification system design is a core system design interview question at Meta, Google, Amazon, and Apple, typically asked at L5-L6 level. It tests event-driven architecture, queue management, fan-out optimization, third-party API integration, idempotency, rate limiting, and priority isolation. Interviewers use it to evaluate whether candidates can design systems that handle both the happy path (deliver a message) and the failure modes (provider outages, duplicate events, fan-out storms). At L6+, expect deep dives into exactly-once delivery semantics, celebrity fan-out optimization, and SLA monitoring architecture.

Common questions:

  • L4: Design a basic notification service that sends push notifications and emails. Walk me through the API, queue design, and how you handle failures. [Tests: basic event-driven design, queue usage, retry strategy, API design]
  • L4-L5: How do you ensure a user never receives the same notification twice, even if the upstream service retries the event? [Tests: idempotency keys, deduplication stores, at-least-once vs exactly-once semantics]
  • L5: Your notification system needs to support user preferences (opt-out, quiet hours, frequency caps). Where in the pipeline do you check preferences, and how do you store them for fast lookup? [Tests: preference schema design, cache strategy, pipeline ordering]
  • L5-L6: A user with 10 million followers posts a photo. Walk me through how the notification system handles the fan-out without delaying other users' notifications. [Tests: fan-out strategies, batched delivery, priority isolation, backpressure]
  • L6: Compare push notifications, email, and SMS as delivery channels. When would you use each, and how does channel selection affect your architecture? [Tests: channel trade-off analysis, adapter pattern, provider failover, cost optimization]
  • L6-L7: Design the monitoring and alerting system for a notification pipeline processing 1 billion notifications per day. What metrics do you track, and what are the SLA thresholds? [Tests: SLA definition, metric selection, alerting strategy, capacity planning, failure mode analysis]

Key takeaways

  • A notification system is an event-driven pipeline with five core stages: ingestion, deduplication, preference checking, template rendering, and channel-specific delivery. Each stage must be independently scalable and fault-tolerant, with failures in one stage contained from cascading to others.
  • Fan-out strategy is the most critical architectural decision: fan-out on write for small recipient lists (< 10K), batched fan-out with rate-controlled waves for large lists (> 100K), and fan-out on read for in-app notifications. The hybrid approach prevents fan-out storms while maintaining low delivery latency.
  • Priority isolation must be physical, not logical: critical notifications (2FA, security alerts) require dedicated Kafka topics, consumer pools, and provider connections that are never shared with normal-priority traffic. A marketing blast must never delay a password reset code.
  • At-least-once delivery combined with idempotency key deduplication provides effectively-exactly-once semantics at practical cost. The idempotency key TTL must exceed the maximum retry window to prevent late duplicates. Dead letter queues capture permanently failed notifications for investigation and replay.
  • User preference management and rate limiting are not afterthoughts — they are core pipeline components that determine whether users keep notifications enabled or disable them permanently. Check preferences on every notification hot path, enforce per-user per-channel rate limits, respect quiet hours by timezone, and always allow critical notifications to bypass rate limits.
Before you move on: can you answer these?

A celebrity with 50 million followers posts a photo. Walk through how the notification system handles the fan-out without overwhelming the pipeline.

The system detects that the recipient count exceeds the fan-out threshold (e.g., 10K). Instead of immediate fan-out on write, it uses the batched fan-out strategy: the event is stored once, and a batch scheduler generates notifications in rate-controlled waves of 50K-100K per minute. Push notifications are sent first to recently active users (most likely to engage), then to less active users. In-app notifications use fan-out on read — stored as a single event and materialized when each user opens their feed. The total fan-out takes 8-16 minutes instead of overwhelming the queue in seconds. Critical and high-priority notifications for other users continue flowing through their dedicated priority queues, completely unaffected by the celebrity fan-out.

Your SMS provider (Twilio) experiences a 30-minute outage. How does the notification system handle this without losing any notifications?

The SMS channel adapter detects elevated error rates (5xx responses or timeouts) from Twilio. Once the error rate exceeds 5% over a 1-minute window, the circuit breaker opens, stopping all requests to Twilio. Incoming SMS notifications continue to be consumed from the SMS priority queue but are routed to a failover provider (e.g., Amazon SNS or Vonage). If no failover is configured, notifications are re-enqueued with exponential backoff delays (1 min, 2 min, 4 min, 8 min). Idempotency keys prevent duplicate delivery if an original request actually succeeded but the response was lost. After a cooldown period, the circuit breaker periodically transitions to half-open, sending a small probe fraction (e.g., 10%) of traffic to Twilio; while Twilio is still down, the probes fail and the circuit reopens. Once Twilio recovers around the 30-minute mark, the probes succeed, the circuit fully closes, and traffic returns to Twilio. Notifications that exceeded max retries during the outage sit in the dead letter queue for manual replay.
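A minimal sketch of the closed/open/half-open state machine described here, assuming the 5% error threshold from the text plus an illustrative call count and cooldown; the class name and knobs are assumptions, not a real library API:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker for a channel adapter. States: closed
    (traffic flows), open (all requests rejected), half-open (probing)."""

    def __init__(self, error_threshold=0.05, min_calls=20,
                 cooldown=60.0, clock=time.monotonic):
        self.error_threshold = error_threshold  # open above this error rate
        self.min_calls = min_calls              # don't trip on tiny samples
        self.cooldown = cooldown                # seconds before probing
        self.clock = clock
        self.state = "closed"
        self.calls = 0
        self.errors = 0
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.cooldown:
                self.state = "half_open"   # cooldown elapsed: let a probe through
                return True
            return False                   # route to failover / re-enqueue
        return True

    def record(self, success):
        if self.state == "half_open":
            if success:
                self.state = "closed"      # provider recovered: resume traffic
                self.calls = self.errors = 0
            else:
                self._trip()               # still failing: reopen
            return
        self.calls += 1
        self.errors += 0 if success else 1
        if (self.calls >= self.min_calls
                and self.errors / self.calls > self.error_threshold):
            self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.calls = self.errors = 0
```

A real implementation would use a sliding 1-minute window rather than cumulative counters, and half-open would admit a percentage of traffic rather than single probes, but the state transitions are the same.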

Explain why critical notifications (2FA codes) must have physically separate infrastructure from marketing notifications, and what happens if they share a queue.

If critical and marketing notifications share the same Kafka topic and consumer pool, a large marketing blast (e.g., 10 million promotional emails) fills the queue with low-priority messages. Consumers process messages in order, so 2FA codes and security alerts are trapped behind millions of marketing messages. Even with consumer-side priority sorting, the queue depth causes memory pressure on Kafka brokers, increasing produce latency for all messages. The result: a user waiting for a 2FA code to log into their bank account waits 5-10 minutes instead of 1 second. Physical separation means dedicated Kafka topics (notifications.critical, notifications.normal) with independent broker resources and dedicated consumer pools. The critical consumer pool is sized to drain the critical queue with p99 latency under 1 second, regardless of what is happening on the normal queue. This isolation is not an optimization — it is a security requirement.
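The routing decision itself is tiny; the isolation comes from the dedicated brokers and consumer pools behind each topic name. A sketch using the topic names from the text, with a hypothetical `CRITICAL_KINDS` set (a real service would produce to Kafka via a client library, stubbed out here):

```python
# Hypothetical set of notification kinds that count as critical.
CRITICAL_KINDS = {"2fa", "security_alert", "password_reset"}

def route(notification):
    """Pick the physically separate Kafka topic. Critical traffic never
    shares a topic (or the consumer pool behind it) with bulk sends."""
    if notification["kind"] in CRITICAL_KINDS:
        return "notifications.critical"
    return "notifications.normal"
```

The point is that prioritization happens at produce time, before anything queues: once a 2FA code lands on `notifications.critical`, no volume on `notifications.normal` can sit in front of it.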

🧠Mental Model

💡 Analogy

A notification system works like a national postal service sorting center. Events arrive at the sorting center (notification service) from many senders (event producers). Each piece of mail is inspected: is the recipient on the do-not-mail list (preference check)? Has the recipient already received too much mail today (rate limiting)? The address label is printed in the right language (template rendering). Then the mail is routed to the correct carrier based on urgency and type: a courier for overnight packages (push notifications — fast, direct delivery), standard mail for letters and catalogs (email — reliable, rich content, but slower), a telegram service for urgent short messages (SMS — expensive but immediate and universal), and an internal office mailbox for people who are in the building right now (in-app — free, instant, but only reaches active users). The sorting center maintains a log of every piece of mail sent, tracks delivery confirmations, and has a dead letter office for undeliverable mail that needs human review.

⚡ Core Idea

A notification system is an event-driven pipeline that transforms raw application events into personalized, channel-appropriate messages delivered reliably to the right users at the right time. The core challenge is not sending messages — any developer can call an API. The core challenge is doing it at scale (billions per day) while respecting user preferences, deduplicating across retries, isolating channel failures, and ensuring that critical notifications (2FA, security) are never delayed by bulk traffic. The architecture is defined by three key decisions: fan-out strategy (when to expand recipients), delivery guarantee (how to handle failures), and priority isolation (how to prevent low-priority traffic from starving high-priority traffic).
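The five-stage pipeline from the takeaways can be sketched as a chain of pluggable callables; every name below is illustrative. Ingestion is the caller invoking `process`, and each subsequent stage can independently drop the event:

```python
def process(event, *, dedupe, prefs, render, channels):
    """Illustrative pipeline: dedupe -> preference check -> render -> deliver.
    Each stage is injected so it can scale and fail independently."""
    if not dedupe(event):                    # stage 2: drop retried duplicates
        return None
    channel = prefs(event["user_id"], event["kind"])   # stage 3: preferences
    if channel is None:                      # opted out, rate-limited, quiet hours
        return None
    message = render(event, channel)         # stage 4: template + i18n
    channels[channel](event["user_id"], message)       # stage 5: channel adapter
    return channel
```

In production each stage sits behind a queue rather than a direct function call, which is what lets a slow template renderer or a failing channel adapter back up without stalling ingestion.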

🎯 Why It Matters

Notification system design appears in system design interviews at every FAANG company because it compresses an extraordinary number of distributed systems concepts into a single, well-understood product. In 45 minutes, a candidate must demonstrate competency in event-driven architecture, message queue design, fan-out optimization, third-party API integration, idempotency and deduplication, rate limiting, template rendering with i18n, user preference management, priority queue isolation, circuit breaker patterns, and SLA monitoring. The product is simple to understand (send users messages) but the engineering is deep (exactly-once semantics, celebrity fan-out, quiet hours across timezones). This contrast between simple product and complex engineering is what makes it the perfect calibration question for L5-L7 system design interviews.
