© 2026 TheSimplifiedTech. All rights reserved.

Interactive Explainer

Designing a Notification System at Scale

A comprehensive deep-dive into designing a production-grade notification system capable of delivering billions of messages per day across push, email, SMS, and in-app channels. Covers event ingestion and fan-out strategies, channel adapter design for APNs, FCM, SMTP, and Twilio, delivery guarantees with idempotency keys and deduplication, user preference management with rate limiting and quiet hours, template engines with i18n and A/B testing, and scaling with priority queues, partitioning, and dead letter queues. This is a core system design interview question at FAANG companies because it tests breadth (event-driven architecture, queue design, third-party integrations) and depth (fan-out optimization, exactly-once semantics, failure isolation) in a single 45-minute session.

🎯 Key Takeaways
A notification system is an event-driven pipeline with five core stages: ingestion, deduplication, preference checking, template rendering, and channel-specific delivery. Each stage must be independently scalable and fault-tolerant, with failures in one stage contained from cascading to others.
Fan-out strategy is the most critical architectural decision: fan-out on write for small recipient lists (< 10K), batched fan-out with rate-controlled waves for large lists (> 100K), and fan-out on read for in-app notifications. The hybrid approach prevents fan-out storms while maintaining low delivery latency.
Priority isolation must be physical, not logical: critical notifications (2FA, security alerts) require dedicated Kafka topics, consumer pools, and provider connections that are never shared with normal-priority traffic. A marketing blast must never delay a password reset code.
At-least-once delivery combined with idempotency key deduplication provides effectively-exactly-once semantics at practical cost. The idempotency key TTL must exceed the maximum retry window to prevent late duplicates. Dead letter queues capture permanently failed notifications for investigation and replay.
User preference management and rate limiting are not afterthoughts — they are core pipeline components that determine whether users keep notifications enabled or disable them permanently. Check preferences on every notification hot path, enforce per-user per-channel rate limits, respect quiet hours by timezone, and always allow critical notifications to bypass rate limits.

~36 min read


Why Notification Systems Are Critical

Notification systems are the connective tissue of every modern application. They are the primary mechanism through which products re-engage users, communicate time-sensitive information, and drive daily active usage. Without notifications, most mobile applications would see their engagement metrics collapse — studies consistently show that users who enable push notifications retain at rates two to three times higher than those who do not.

At FAANG scale, notification systems operate at staggering volumes. Facebook sends over one billion push notifications per day. Google delivers billions of notifications across Gmail, YouTube, and Android system alerts. Amazon sends hundreds of millions of order updates, delivery alerts, and promotional emails daily. These are not simple fire-and-forget messages — each notification must be personalized, deduplicated, delivered through the correct channel, and respectful of user preferences and quiet hours.

Why Interviewers Love This Question

A notification system touches every distributed systems concept: event-driven architecture, message queues, fan-out strategies, third-party API integration (APNs, FCM, SMTP, Twilio), idempotency, rate limiting, template rendering, user preference management, and failure handling. It can be discussed at L4 depth (basic queue + channel adapters) or L7 depth (exactly-once delivery semantics, priority-based routing, celebrity fan-out optimization). This range makes it one of the most versatile system design interview questions.

The four primary notification channels each have fundamentally different characteristics. Push notifications (via Apple Push Notification service and Firebase Cloud Messaging) offer near-instant delivery to mobile devices but are limited in content length and require device tokens. Email provides rich formatting, attachments, and permanence but has variable delivery times and spam filtering risks. SMS offers near-universal reach and high open rates (98% within 3 minutes) but is expensive and heavily regulated. In-app notifications are free and fully controlled but only reach users who are actively using the application.

Business functions of notifications

  • Transactional alerts — Order confirmations, payment receipts, password resets, two-factor authentication codes — time-critical messages that users expect immediately.
  • Social engagement — Friend requests, likes, comments, mentions, shares — the social feedback loop that drives daily active usage on platforms like Facebook, Instagram, and Twitter.
  • System alerts — Service outages, security warnings, account compromises, scheduled maintenance — critical operational messages that protect user trust.
  • Marketing and re-engagement — Promotional offers, abandoned cart reminders, content recommendations — revenue-driving messages that must balance engagement with user tolerance.
  • Real-time updates — Live sports scores, stock price alerts, flight status changes, ride arrival notifications — latency-sensitive messages where seconds matter.
  • Digest and summary — Daily email digests, weekly activity summaries, monthly reports — batched notifications that reduce noise while maintaining awareness.

Scale Numbers to Remember

Facebook: 1B+ push notifications/day. WhatsApp: 100B+ messages/day (many triggering notifications). Amazon SES: 10B+ emails/month. Twilio: 100B+ SMS/year. A mid-size SaaS product (10M MAU) typically sends 50-100M notifications/month across all channels. At 100M notifications/day, the system must sustain roughly 1,200 notifications/second average and 6,000-12,000/second at peak (5-10x burst during events like Black Friday or World Cup).

Understanding these scale numbers is critical because they drive every architectural decision downstream. A notification system for a startup with 10,000 users can run on a single server with direct API calls to SendGrid and Firebase. At 100 million notifications per day, you need distributed queues, channel-specific adapters with independent scaling, retry mechanisms with exponential backoff, and sophisticated deduplication to prevent users from receiving the same notification twice.

The Notification Paradox

Notifications that are too frequent annoy users and drive them to disable permissions entirely. Notifications that are too infrequent fail to re-engage users and the product is forgotten. The optimal notification frequency varies by user, channel, and content type — making preference management and rate limiting just as important as the delivery infrastructure itself.

Requirements & Types of Notifications

Before drawing any architecture diagrams, a strong system design answer begins with clarifying requirements. For a notification system, the functional requirements define what types of notifications the system supports and how users interact with them, while non-functional requirements define the reliability, latency, and scale targets that drive the architecture.

Functional requirements

  • Multi-channel delivery — Support push notifications (iOS/Android), email, SMS, and in-app notifications. Each channel has its own delivery API, content format, and retry semantics.
  • Event-driven ingestion — Accept notification trigger events from multiple upstream services (e.g., order service, social service, security service) via a standardized event schema.
  • User preferences — Allow users to configure per-channel opt-in/opt-out, notification categories (marketing, transactional, social), frequency caps, and quiet hours.
  • Template rendering — Support parameterized templates with personalization (user name, order number), localization (i18n), and A/B test variants.
  • Priority levels — Support at least three priority tiers: critical (2FA codes, security alerts — immediate delivery), high (social interactions — within seconds), and normal (marketing, digests — best effort).
  • Delivery tracking — Track notification lifecycle: created, queued, sent, delivered, opened, clicked, bounced, failed. Expose delivery status via API and dashboards.

Non-functional requirements

  • High availability — The notification pipeline must be 99.99% available. Missed critical notifications (2FA codes, security alerts) directly impact user trust and security.
  • Low latency for critical notifications — Critical notifications must reach the delivery provider within 1 second of the triggering event. Marketing notifications can tolerate minutes of delay.
  • At-least-once delivery — Every notification must be delivered at least once. Exactly-once is ideal but hard; use idempotency keys to achieve effectively-exactly-once semantics.
  • Scalability — Support 100M+ notifications per day across all channels, with 10x burst capacity during peak events (product launches, sporting events, breaking news).
  • Rate limiting — Enforce per-user, per-channel, and global rate limits to prevent notification fatigue and comply with carrier regulations (SMS) and provider limits (APNs).
  • Fault isolation — A failure in one channel (e.g., SMS provider outage) must not affect delivery through other channels. Each channel adapter must be independently deployable and scalable.
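The preference and rate-limiting requirements above can be sketched as a single gate on the hot path. This is an illustrative in-memory sketch (class name, the 5/hour limit, and the quiet-hour defaults are assumptions; a production system would keep the counters in Redis so all workers share state):

```python
import time
from collections import defaultdict, deque
from datetime import datetime
from zoneinfo import ZoneInfo

class NotificationRateLimiter:
    def __init__(self, max_per_hour=5, quiet_start=22, quiet_end=8):
        self.max_per_hour = max_per_hour
        self.quiet_start = quiet_start   # 10 PM local time
        self.quiet_end = quiet_end       # 8 AM local time
        self.sent = defaultdict(deque)   # (user_id, channel) -> send timestamps

    def allow(self, user_id, channel, priority, user_tz="UTC", now=None):
        now = time.time() if now is None else now
        # Critical notifications (2FA, security alerts) always bypass limits.
        if priority == "critical":
            return True
        # Quiet hours: suppress non-critical sends in the user's local night.
        hour = datetime.fromtimestamp(now, ZoneInfo(user_tz)).hour
        if hour >= self.quiet_start or hour < self.quiet_end:
            return False
        # Sliding one-hour window per (user, channel).
        window = self.sent[(user_id, channel)]
        while window and window[0] <= now - 3600:
            window.popleft()
        if len(window) >= self.max_per_hour:
            return False
        window.append(now)
        return True
```

Passing `now` explicitly keeps the check testable; the same shape works with Redis sorted sets per (user, channel) key.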

Back-of-Envelope Estimation

Assume 100M notifications/day across all channels: 40M push, 30M email, 10M SMS, 20M in-app. That is roughly 1,200 notifications/second average. At 10x peak (major event), expect 12,000/second. Each notification record is approximately 1 KB (recipient, channel, template ID, parameters, metadata). Daily storage: 100 GB. Monthly: 3 TB. Delivery log retention (30 days): ~3 TB. The queue must handle 12K messages/second at peak — comfortably within Kafka capacity (millions/second per cluster) and near the practical throughput ceiling of a single RabbitMQ node.
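These numbers can be sanity-checked with a few lines of arithmetic reproducing the estimate above:

```python
# Back-of-envelope check of the numbers in the text.
NOTIFICATIONS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 86_400
RECORD_SIZE_KB = 1

avg_qps = NOTIFICATIONS_PER_DAY / SECONDS_PER_DAY        # ~1,157/sec, i.e. "roughly 1,200"
peak_qps = avg_qps * 10                                  # ~11,600/sec at a 10x burst
daily_storage_gb = NOTIFICATIONS_PER_DAY * RECORD_SIZE_KB / 1_000_000   # 100 GB/day
monthly_storage_tb = daily_storage_gb * 30 / 1_000       # 3 TB/month

print(f"avg: {avg_qps:.0f}/s, peak: {peak_qps:.0f}/s, "
      f"daily: {daily_storage_gb:.0f} GB, monthly: {monthly_storage_tb:.0f} TB")
```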

| Channel | Latency Target | Delivery Guarantee | Cost Per Message | Typical Use Case |
| --- | --- | --- | --- | --- |
| Push (APNs/FCM) | < 1 second | Best effort (device may be offline) | $0 (free) | Social updates, real-time alerts, order status |
| Email (SES/SendGrid) | < 30 seconds | Eventual (SMTP relay chain) | $0.0001 per email | Receipts, newsletters, digests, marketing |
| SMS (Twilio/SNS) | < 5 seconds | Carrier-dependent delivery | $0.0075 per SMS | 2FA codes, critical alerts, delivery confirmations |
| In-App (WebSocket) | < 200 ms | Online users only | $0 (free) | Activity feed, unread badges, live updates |

Priority Inversion Trap

If critical notifications (2FA codes) share the same queue as marketing blasts, a large promotional campaign can delay security-sensitive messages by minutes. Always use separate priority queues: a dedicated high-priority queue for critical notifications that is never starved by bulk traffic. This is not optional — it is a security requirement.

A critical question to ask the interviewer: should the notification system support batching and digest mode? If yes, the architecture needs a scheduler component that accumulates events over a time window (e.g., 1 hour) and consolidates them into a single digest notification. This fundamentally changes the pipeline from a real-time streaming model to a hybrid streaming-plus-batch model.

Notification Event Schema

A well-designed event schema is critical. Include: event_id (UUID for idempotency), event_type (e.g., "order.shipped"), recipient_id, channel_preference (push/email/sms/all), priority (critical/high/normal/low), template_id, template_params (JSON), created_at, and idempotency_key. The idempotency_key prevents duplicate delivery when upstream services retry failed publishes.
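One way to express that schema in code (field names follow the text; types, defaults, and the dataclass shape are illustrative assumptions):

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class NotificationEvent:
    event_type: str                # e.g. "order.shipped"
    recipient_id: str
    idempotency_key: str           # supplied by the producer; stable across retries
    template_id: str
    template_params: dict
    channel_preference: str = "all"   # push / email / sms / all
    priority: str = "normal"          # critical / high / normal / low
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        # Serialize for publishing to the ingestion topic.
        return json.dumps(asdict(self))
```

Note that `idempotency_key` is deliberately a required producer-supplied field: if it were derived from the auto-generated `event_id`, a retried publish would get a fresh key and defeat deduplication.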

High-Level Architecture

The high-level architecture of a notification system follows a classic event-driven pipeline pattern: event producers emit notification triggers, a central notification service validates and enriches them, channel-specific adapters format and deliver messages through external providers, and a feedback loop tracks delivery status. This pipeline must handle millions of notifications per day while maintaining strict ordering guarantees for per-user notification sequences.

graph TB
    subgraph "Event Producers"
        OP["Order Service"] --> MQ["Message Queue<br/>(Kafka)"]
        SP["Social Service"] --> MQ
        SEC["Security Service"] --> MQ
        MKT["Marketing Service"] --> MQ
        SYS["System Alerts"] --> MQ
    end

    subgraph "Notification Service"
        MQ --> VAL["Validator &<br/>Deduplicator"]
        VAL --> PREF["Preference<br/>Checker"]
        PREF --> RL["Rate Limiter"]
        RL --> TMPL["Template<br/>Engine"]
        TMPL --> ROUTER["Channel<br/>Router"]
    end

    subgraph "Priority Queues"
        ROUTER --> PQ1["Critical Queue<br/>(2FA, security)"]
        ROUTER --> PQ2["High Queue<br/>(social, orders)"]
        ROUTER --> PQ3["Normal Queue<br/>(marketing, digests)"]
    end

    subgraph "Channel Adapters"
        PQ1 --> PA["Push Adapter<br/>(APNs / FCM)"]
        PQ1 --> EA["Email Adapter<br/>(SES / SendGrid)"]
        PQ1 --> SA["SMS Adapter<br/>(Twilio)"]
        PQ2 --> PA
        PQ2 --> EA
        PQ2 --> SA
        PQ2 --> IA["In-App Adapter<br/>(WebSocket)"]
        PQ3 --> PA
        PQ3 --> EA
        PQ3 --> IA
    end

    subgraph "Delivery Providers"
        PA --> APNS["Apple APNs"]
        PA --> FCM["Google FCM"]
        EA --> SES["Amazon SES"]
        EA --> SG["SendGrid"]
        SA --> TW["Twilio"]
        IA --> WS["WebSocket<br/>Server"]
    end

    subgraph "Feedback & Analytics"
        APNS --> FB["Delivery<br/>Feedback"]
        FCM --> FB
        SES --> FB
        TW --> FB
        WS --> FB
        FB --> DLQ["Dead Letter<br/>Queue"]
        FB --> METRICS["Metrics &<br/>Dashboards"]
    end

    style VAL fill:#3b82f6,stroke:#333,color:#fff
    style ROUTER fill:#f59e0b,stroke:#333,color:#fff
    style DLQ fill:#ef4444,stroke:#333,color:#fff
    style METRICS fill:#10b981,stroke:#333,color:#fff

End-to-end notification pipeline: event producers publish to Kafka, the notification service validates, deduplicates, checks preferences, rate-limits, renders templates, and routes to priority queues. Channel adapters deliver through external providers. Feedback loops track delivery status.

The notification service is the central orchestration layer. It receives raw events from upstream services and transforms them into deliverable notifications. This transformation involves several steps that must execute in sequence: validation (is the event schema correct?), deduplication (have we already processed this event?), preference checking (does the user want this type of notification on this channel?), rate limiting (has the user exceeded their notification budget?), template rendering (generate the final message content), and channel routing (which adapter should deliver this?).

Core components

  • Event ingestion layer — Kafka topics partitioned by recipient_id ensure ordered processing per user. Multiple consumer groups allow independent scaling of validation, preference checking, and delivery stages.
  • Notification service — Stateless workers that consume events, apply business logic (dedup, preferences, rate limiting, templating), and enqueue delivery tasks to priority queues.
  • Priority queue system — Three Kafka topics (critical, high, normal) with different consumer group configurations. Critical queue has dedicated consumers with aggressive polling intervals.
  • Channel adapters — Independent microservices per channel (push, email, SMS, in-app). Each adapter owns its connection pool, retry logic, and provider-specific formatting.
  • Delivery feedback processor — Consumes webhooks and polling results from delivery providers (APNs feedback, SES bounce notifications, Twilio status callbacks) and updates the delivery log.
  • Dead letter queue — Captures permanently failed notifications for manual inspection and replay. Critical for debugging delivery issues without losing messages.

Partition by Recipient ID

Partitioning Kafka topics by recipient_id guarantees that all notifications for a given user are processed by the same consumer in order. This is essential for deduplication (checking if we already sent this notification) and rate limiting (counting how many notifications this user has received in the current window). Without per-user ordering, race conditions in deduplication and rate limiting become extremely difficult to handle.
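A sketch of that partitioning rule (the MD5 hash here is illustrative; Kafka clients apply their own default key hash, such as murmur2, when a message key is set):

```python
import hashlib

def partition_for(recipient_id: str, num_partitions: int) -> int:
    # Stable hash of the message key: every event for the same user
    # maps to the same partition, so one consumer sees them in order.
    digest = hashlib.md5(recipient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

In practice you rarely implement this yourself: producing with `key=recipient_id` lets the Kafka client perform the equivalent mapping.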

Common Mistake: Synchronous Channel Delivery

A common architectural mistake is having the notification service call delivery providers synchronously (e.g., calling the APNs API inline during event processing). This creates tight coupling: if APNs is slow or down, the entire notification pipeline backs up. Instead, always enqueue delivery tasks to per-channel queues and let independent adapter workers handle delivery asynchronously. This ensures that an SMS provider outage does not delay push notification delivery.

The architecture must support graceful degradation. If the email provider experiences an outage, email notifications should be queued and retried while push and SMS delivery continues unaffected. If the entire notification service is overloaded, the priority queue system ensures that critical notifications (security alerts, 2FA codes) are processed before marketing messages. This layered resilience is what separates a production-grade notification system from a prototype.

Fan-Out Strategies — Push vs Pull

Fan-out is the most computationally expensive operation in a notification system. When an event occurs (a celebrity posts a photo, a product goes on sale, a sports match ends), the system must determine every affected user and generate an individual notification for each one. At FAANG scale, a single event can trigger notifications for tens of millions of users simultaneously. The fan-out strategy determines whether this expansion happens eagerly at write time or lazily at read time.

The fan-out problem in notification systems mirrors the classic news feed fan-out problem, but with a critical difference: notifications are push-based (delivered to users) rather than pull-based (fetched by users). This means the system cannot defer work until the user opens the app — the notification must be delivered proactively, often within seconds of the triggering event.

The Celebrity Problem

When a user with 100 million followers posts content, a naive fan-out-on-write approach would generate 100 million notification records instantly. At 1 KB per record, that is 100 GB of data generated by a single action. This overwhelms the queue system, creates write hotspots, and can delay notifications for other users by minutes. Every notification system at scale must have a strategy for handling high-fan-out events.

Strategy 1: Fan-Out on Write (Push Model)

  1. When a triggering event occurs (e.g., a new post), immediately enumerate all recipients (followers, subscribers, group members).
  2. For each recipient, create an individual notification record with the recipient ID, channel, and rendered content.
  3. Enqueue each notification record into the appropriate priority queue for delivery.
  4. Advantage: delivery latency is predictable — every notification is pre-computed and ready for delivery. No work is deferred to read time.
  5. Disadvantage: a single event with millions of recipients creates a massive write burst. The time to complete fan-out grows linearly with recipient count.
  6. Best for: events with small to medium recipient lists (< 10,000 recipients) and when low delivery latency is critical.

Strategy 2: Fan-Out on Read (Pull Model)

  1. When a triggering event occurs, store a single event record (not per-recipient) in an event log.
  2. When a user opens the app or checks notifications, query the event log for events relevant to that user (based on subscriptions, group memberships).
  3. Generate and render the notification on the fly during the read request.
  4. Advantage: a single event with millions of potential recipients requires only one write. No fan-out burst at write time.
  5. Disadvantage: delivery latency is unpredictable — notifications are only generated when users actively check. Does not work for push notifications, email, or SMS, which are inherently push-based.
  6. Best for: in-app notification feeds where the user is already in the application and pulls their own notification list.

The Hybrid Approach (FAANG Standard)

Production notification systems use a hybrid strategy. For normal users (< 10K followers), fan-out on write with pre-computed recipient lists. For celebrity users (> 100K followers), store the event once and fan-out lazily: push notifications are batched and sent in waves (e.g., 50K per minute over 2 minutes), and in-app notifications use fan-out on read. A threshold parameter (configurable per event type) determines which path is taken. Facebook and Twitter both use this hybrid model.
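The routing decision reduces to a small function. The thresholds below are the article's example values and would be configurable per event type; the return labels are illustrative:

```python
SMALL_LIST_THRESHOLD = 10_000    # below this: pre-compute every record
LARGE_LIST_THRESHOLD = 100_000   # above this: rate-controlled waves

def choose_fanout(recipient_count: int, in_app_only: bool = False) -> str:
    if in_app_only:
        return "fan_out_on_read"      # compute at query time
    if recipient_count < SMALL_LIST_THRESHOLD:
        return "fan_out_on_write"     # eager fan-out, low latency
    if recipient_count > LARGE_LIST_THRESHOLD:
        return "batched_fan_out"      # waves of e.g. 50K/min via a scheduler
    return "fan_out_on_write"         # mid-range: tune per event type
```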

graph TB
    subgraph "Hybrid Fan-Out Decision"
        EVENT["Notification Event"] --> CHECK{"Recipient<br/>Count?"}
        CHECK -->|"< 10K recipients"| FOW["Fan-Out on Write<br/>(pre-compute all)"]
        CHECK -->|"> 100K recipients"| BATCH["Batched Fan-Out<br/>(waves of 50K/min)"]
        CHECK -->|"In-app only"| FOR["Fan-Out on Read<br/>(compute on demand)"]

        FOW --> PQ["Priority Queues"]
        BATCH --> SCHED["Batch Scheduler<br/>(rate-controlled)"]
        SCHED --> PQ
        FOR --> CACHE["Event Cache<br/>(read at query time)"]
    end

    style CHECK fill:#f59e0b,stroke:#333,color:#fff
    style FOW fill:#10b981,stroke:#333,color:#fff
    style BATCH fill:#3b82f6,stroke:#333,color:#fff
    style FOR fill:#8b5cf6,stroke:#333,color:#fff

Hybrid fan-out: small recipient lists fan out immediately, large lists are batched into rate-controlled waves, and in-app notifications defer to read time.

Pre-computing recipient lists is a key optimization for fan-out on write. Rather than querying the follower graph at notification time (which adds latency and database load), maintain a pre-computed recipient list for each notification source. When a user follows or unfollows someone, update the recipient list asynchronously. This trades storage (maintaining the lists) for write-path latency (instant lookup instead of graph traversal).

| Strategy | Write Cost | Read Cost | Delivery Latency | Best For |
| --- | --- | --- | --- | --- |
| Fan-out on write | O(N) per event (N = recipients) | O(1) per delivery | Low and predictable | Small groups, time-critical alerts |
| Fan-out on read | O(1) per event | O(M) per user query (M = subscriptions) | High and variable | In-app feeds, non-urgent updates |
| Hybrid (batched write) | O(N) spread over time | O(1) per delivery | Moderate but controlled | Celebrity accounts, viral events |

Fan-Out Storms and Backpressure

Without backpressure mechanisms, a burst of high-fan-out events (e.g., multiple celebrities posting simultaneously during a major event) can overwhelm the queue system. Implement backpressure at the fan-out stage: limit the rate at which individual notifications are enqueued (e.g., 100K per second per event source). Use a semaphore or token bucket to throttle fan-out workers. Monitor queue depth and automatically slow ingestion when queues exceed 80% capacity.
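The token bucket mentioned above can be sketched in a few lines (illustrative; a production version would share the bucket state across fan-out workers, e.g. in Redis):

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, n: int = 1) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False   # caller should back off: the queue is saturated

# A fan-out worker enqueues only when a token is available, e.g.:
#   if bucket.try_acquire():
#       enqueue(notification)
#   else:
#       sleep_and_retry()
```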

Channel Adapters: Push, Email, SMS, In-App

Channel adapters are the bridge between the notification system and external delivery providers. Each adapter is a specialized microservice that understands the API, content format, rate limits, and error semantics of its delivery provider. The adapter pattern ensures that the core notification service is decoupled from provider-specific logic — you can swap SendGrid for Mailgun or Twilio for Vonage without changing the core pipeline.

Each channel has fundamentally different delivery characteristics. Push notifications travel through proprietary gateways (APNs for iOS, FCM for Android) that maintain persistent connections to devices. Email flows through SMTP relay chains with store-and-forward semantics. SMS routes through carrier networks with per-country regulations and number formatting rules. In-app notifications use WebSocket connections to deliver messages to users who are currently online.

Push notification adapter (APNs / FCM)

  • Device token management — Maintain a mapping of user_id to device tokens. Users may have multiple devices (phone, tablet, watch). Tokens expire or become invalid when users uninstall the app — consume APNs feedback and FCM registration-error responses to prune stale tokens.
  • Payload formatting — APNs payloads are limited to 4 KB (JSON with alert, badge, sound, custom data). FCM payloads have a 4 KB limit for data messages. Adapter must truncate content intelligently and include deep-link URLs for navigation.
  • Connection management — APNs uses HTTP/2 with persistent connections (multiplexing thousands of notifications per connection). FCM uses HTTP/2 or XMPP. Maintain a pool of long-lived connections — creating new connections for each notification is prohibitively expensive.
  • Silent push and background updates — Support content-available push notifications that wake the app in the background to fetch new data without displaying a visible alert. Useful for pre-fetching content before the user opens the app.

Email adapter (SES / SendGrid)

  • SMTP integration — Use provider SDKs (not raw SMTP) for reliability. Amazon SES supports 50K emails/second per account. SendGrid supports burst rates via dedicated IPs. Authenticate with SPF, DKIM, and DMARC to avoid spam filters.
  • Bounce and complaint handling — Process SES bounce notifications (hard bounce = invalid address, soft bounce = mailbox full) and complaint feedback loops (user marked as spam). Remove hard-bounced addresses immediately. Suppress complained addresses for 30 days.
  • IP warming — New sending IPs must be warmed gradually (start with 100 emails/day, double every 2 days) to build sender reputation. Sending 1M emails from a cold IP results in immediate blacklisting.
  • Rendering — Email requires HTML rendering with inline CSS (most email clients strip external stylesheets). Include a plain-text fallback for accessibility and deliverability. Test rendering across Gmail, Outlook, Apple Mail, and Yahoo.

SMS adapter (Twilio / SNS)

  • Number formatting — Normalize all phone numbers to E.164 format (+1234567890). Validate country codes. Handle short codes for marketing and long codes for transactional messages.
  • Carrier regulations — US: register for A2P 10DLC campaigns. EU: comply with GDPR consent requirements. India: DLT registration for commercial SMS. Violation results in message blocking and fines.
  • Message segmentation — SMS messages over 160 characters (GSM-7) or 70 characters (UCS-2 for Unicode) are split into multiple segments, each billed separately. Adapter must calculate segment count and warn if content exceeds cost thresholds.
  • Delivery receipts — Twilio provides delivery status callbacks (queued, sent, delivered, undelivered, failed). Process these asynchronously to update the delivery log. Note that carrier-level delivery confirmation is not 100% reliable.
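The segment math from the bullets above looks like this. This is a simplified sketch: it treats any non-ASCII character as forcing UCS-2, whereas the real GSM-7 alphabet includes some accented characters, and GSM-7 extended characters such as '€' count as two septets:

```python
import math

def sms_segment_count(text: str) -> int:
    # Concatenated messages lose header bytes to the UDH, so multi-part
    # segments hold 153 GSM-7 chars (or 67 UCS-2 chars) instead of 160/70.
    is_unicode = any(ord(ch) > 127 for ch in text)  # crude GSM-7 check
    single, multi = (70, 67) if is_unicode else (160, 153)
    if len(text) <= single:
        return 1
    return math.ceil(len(text) / multi)
```

An adapter would run this before sending and warn when the count pushes per-message cost past a configured threshold.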
Channel comparison — Push (APNs/FCM) vs Email (SES/SendGrid) vs SMS (Twilio) vs In-App (WebSocket):

  • Delivery speed — Push: < 1 second. Email: 1-30 seconds. SMS: 1-10 seconds. In-app: < 200 ms.
  • Payload size — Push: 4 KB. Email: 10 MB (with attachments). SMS: 160 chars (GSM-7). In-app: unlimited (practical: 10 KB).
  • Cost per message — Push: free. Email: $0.0001. SMS: $0.0075. In-app: free.
  • Delivery confirmation — Push: APNs feedback / FCM receipts. Email: bounce/complaint notifications. SMS: carrier delivery receipts. In-app: WebSocket ACK.
  • Offline delivery — Push: queued by provider (up to 28 days). Email: store-and-forward (SMTP). SMS: queued by carrier (24-72 hours). In-app: not possible — the user must be online.
  • Rate limits — Push: APNs has no hard limit (but throttles); FCM defaults to 1,000 msg/sec. Email: SES 50K/sec; SendGrid varies by plan. SMS: Twilio 1 msg/sec per number (10DLC). In-app: bounded by server memory and connection count.

In-App Notification via WebSocket

For in-app notifications, maintain persistent WebSocket connections between the client and a WebSocket gateway. When a notification is ready for an online user, push it directly through the WebSocket. For offline users, store the notification in an unread notifications table and deliver it when the user reconnects. Use a presence service (backed by Redis with TTL-based expiry) to track which users are currently online and connected to which gateway instance.

Provider Failover

Never depend on a single delivery provider. Configure primary and fallback providers for each channel (e.g., SES primary, SendGrid fallback for email). Use a circuit breaker pattern: if the primary provider error rate exceeds 5% over a 1-minute window, automatically route traffic to the fallback. Reset the circuit after 5 minutes and gradually shift traffic back. This provides resilience against provider-specific outages.
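The failover logic above can be sketched as a small circuit breaker. The 5% threshold and 5-minute cooldown come from the text; the minimum sample count and class shape are assumptions for the sketch:

```python
import time

# Minimal circuit breaker for provider failover: trip to the fallback when
# the primary's error rate exceeds the threshold, retry after a cooldown.
class ProviderCircuitBreaker:
    def __init__(self, error_threshold=0.05, min_samples=20, cooldown_s=300):
        self.error_threshold = error_threshold
        self.min_samples = min_samples      # avoid tripping on tiny samples
        self.cooldown_s = cooldown_s
        self.successes = 0
        self.failures = 0
        self.opened_at: float | None = None

    def record(self, ok: bool) -> None:
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_samples and self.failures / total > self.error_threshold:
            self.opened_at = time.time()    # trip: route traffic to fallback
            self.successes = self.failures = 0

    def use_fallback(self) -> bool:
        if self.opened_at is None:
            return False
        if time.time() - self.opened_at > self.cooldown_s:
            self.opened_at = None           # cooldown elapsed: try primary again
            return False
        return True

breaker = ProviderCircuitBreaker()
for _ in range(19):
    breaker.record(ok=True)
breaker.record(ok=False)       # 1 failure / 20 = 5%, not above the threshold
print(breaker.use_fallback())  # -> False
breaker.record(ok=False)       # 2 / 21 ≈ 9.5% > 5%: the breaker trips
print(breaker.use_fallback())  # -> True (route to fallback provider)
```

A production breaker would also ramp traffic back gradually (half-open state) rather than flipping all traffic at once.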

Delivery Guarantees & Deduplication

Delivery guarantees are the most technically challenging aspect of a notification system. Users expect that every important notification reaches them exactly once — they do not want to miss a 2FA code, and they do not want to receive the same order confirmation three times. Achieving this in a distributed system with multiple failure modes (network partitions, consumer crashes, provider timeouts) requires careful engineering of idempotency, deduplication, and retry mechanisms.

The three standard delivery guarantee levels in distributed systems are: at-most-once (fire and forget — fast but may lose messages), at-least-once (retry until acknowledged — no message loss but may duplicate), and exactly-once (each message processed precisely once — the gold standard but expensive to achieve). For notification systems, at-least-once delivery combined with idempotency-based deduplication provides effectively-exactly-once semantics at a practical cost.

Idempotency Keys Are Non-Negotiable

Every notification event must carry an idempotency key — a unique identifier (typically a UUID or a deterministic hash of event_type + recipient_id + event_timestamp) that the notification service uses to detect duplicates. Before processing any notification, check a fast deduplication store (Redis SET with TTL) for the idempotency key. If present, skip processing. If absent, add the key with a TTL (e.g., 24 hours) and proceed. This simple mechanism prevents the vast majority of duplicate notifications.
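The check above is one atomic operation in Redis (`SET key 1 NX EX 86400`). The in-memory stand-in below mirrors that set-if-absent-with-TTL semantics so the sketch is runnable without a Redis server; class and method names are illustrative:

```python
import time

# In-memory stand-in for the Redis SETNX + TTL deduplication check.
class DedupStore:
    def __init__(self):
        self._keys: dict[str, float] = {}  # idempotency_key -> expiry time

    def first_seen(self, idempotency_key: str, ttl_s: float = 86_400) -> bool:
        """True if this key has not been seen within its TTL (proceed);
        False if it is a duplicate (skip processing)."""
        now = time.time()
        expiry = self._keys.get(idempotency_key)
        if expiry is not None and expiry > now:
            return False                    # duplicate: already processed
        self._keys[idempotency_key] = now + ttl_s
        return True

store = DedupStore()
print(store.first_seen("order_shipped:user-7:1718000000"))  # -> True (process)
print(store.first_seen("order_shipped:user-7:1718000000"))  # -> False (skip)
```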

At-Least-Once Delivery with Deduplication

1. Producer publishes the notification event to Kafka with an idempotency_key in the message payload.

2. Consumer reads the event and checks Redis for the idempotency_key using SETNX (set if not exists) with a 24-hour TTL.

3. If SETNX returns false (key already exists), the event is a duplicate — log and skip it. Commit the Kafka offset.

4. If SETNX returns true (new key), process the notification: check preferences, render the template, enqueue to the channel adapter.

5. Channel adapter delivers to the external provider. If the provider returns success, mark the notification as delivered.

6. If the provider returns a retryable error (5xx, timeout), re-enqueue the notification with an incremented retry count and exponential backoff delay (1s, 2s, 4s, 8s, 16s, max 60s).

7. After max retries (typically 5-8), move the notification to the dead letter queue for manual inspection.

8. If the consumer crashes after Redis SETNX but before committing the Kafka offset, the message will be redelivered — but the Redis key prevents reprocessing.


The Retry-Dedup Window Gap

If the idempotency key TTL (24 hours) is shorter than the maximum retry duration, a retried notification could bypass deduplication. For example, if a notification fails and enters retry with exponential backoff, and the total retry time exceeds 24 hours, the Redis key may expire before the final retry — causing a duplicate. Solution: set the idempotency key TTL to max_retry_duration + 24 hours (e.g., 48 hours total). This is a subtle bug that many implementations miss.

Delivery guarantee levels:

  • At-most-once — Send once, do not retry on failure. Failure mode: may lose notifications on provider timeout or crash. Use for: non-critical marketing messages, analytics events.
  • At-least-once — Retry until the provider acknowledges delivery. Failure mode: may send duplicate notifications. Use for: the default for all notification types (with dedup).
  • Effectively exactly-once — At-least-once plus idempotency key deduplication. Failure mode: duplicates only if the dedup store fails (rare). Use for: critical notifications: 2FA, security alerts, financial.

Retry strategies must be channel-aware. Push notification retries should be aggressive (1s, 2s, 4s) because APNs and FCM are generally fast. Email retries should be more patient (1min, 5min, 15min, 1hr) because SMTP delivery can be legitimately slow. SMS retries should respect carrier rate limits — retrying too aggressively can trigger carrier throttling or blocking. Each channel adapter manages its own retry policy independently.

Dead Letter Queue Strategy

The dead letter queue (DLQ) is your safety net. Every notification that exhausts its retry budget ends up here. Build a DLQ consumer dashboard that shows: failure reason distribution (invalid token, provider error, rate limited), notification priority breakdown (are critical notifications hitting the DLQ?), and a one-click replay button that re-enqueues selected notifications. Alerting: trigger a page if any critical-priority notification reaches the DLQ.

Common Mistake: Retrying Non-Retryable Errors

Not all errors are retryable. APNs 410 (device token no longer active) means the user uninstalled the app — retrying will never succeed. Email hard bounces (invalid address) are permanent. SMS to disconnected numbers will never deliver. Channel adapters must classify errors as retryable (5xx, timeout, rate limited) vs non-retryable (4xx client errors, invalid tokens, hard bounces). Retrying non-retryable errors wastes resources and can trigger provider-level throttling.
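A sketch of that classification. The status codes are illustrative of APNs/SES/Twilio-style responses; a real adapter would map each provider's documented error codes explicitly:

```python
# Classify delivery errors as retryable vs non-retryable, as described above.
NON_RETRYABLE = {
    400,  # malformed request: retrying the same payload cannot succeed
    404,  # unknown endpoint / invalid destination
    410,  # APNs: device token no longer active (user uninstalled the app)
}

def is_retryable(status_code: int, timed_out: bool = False) -> bool:
    if timed_out:
        return True                 # network timeout: provider state unknown
    if status_code == 429:
        return True                 # rate limited: retry after backoff
    if status_code in NON_RETRYABLE:
        return False
    if 500 <= status_code < 600:
        return True                 # provider-side failure
    return False                    # remaining 4xx client errors are permanent

print(is_retryable(503))                # -> True (retry with backoff)
print(is_retryable(410))                # -> False (prune the token instead)
print(is_retryable(0, timed_out=True))  # -> True
```

Non-retryable results should trigger cleanup (prune the token, suppress the address) rather than a retry.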

User Preferences & Rate Limiting

User preferences and rate limiting are what separate a notification system that users love from one that they disable entirely. The technical infrastructure for delivering millions of notifications is meaningless if users opt out because they feel spammed. Preference management and rate limiting are not afterthoughts — they are core architectural components that must be designed into the notification pipeline from the start.

User preference data must be accessible on every notification's hot path: the notification service checks preferences before rendering templates or enqueuing delivery tasks. This means preference lookups must be extremely fast (sub-millisecond) and highly available. A preference store failure should fail-closed (suppress the notification) rather than fail-open (deliver unwanted notifications), because a missed notification is recoverable but a user disabling all notifications is often permanent.

Preference storage design

  • Schema — Store preferences as a per-user document: { user_id, global_enabled: bool, channels: { push: bool, email: bool, sms: bool, in_app: bool }, categories: { marketing: bool, social: bool, transactional: bool, security: bool }, quiet_hours: { start: "22:00", end: "07:00", timezone: "America/New_York" }, frequency_caps: { push: 10/day, email: 5/day, sms: 2/day } }.
  • Storage layer — Primary store: DynamoDB or PostgreSQL with a user_id primary key. Cache layer: Redis hash per user with 1-hour TTL. The cache absorbs 99%+ of preference lookups on the notification hot path.
  • Default preferences — New users start with sensible defaults: push enabled, email enabled for transactional, SMS disabled (expensive), marketing frequency capped at 3/week. Defaults are configurable per-market (EU defaults are more restrictive per GDPR).
  • Category granularity — Allow users to control notifications at the category level (social, transactional, marketing) and optionally at the sub-category level (e.g., within social: likes, comments, follows). Too few categories frustrate power users; too many overwhelm casual users.
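A minimal check against the per-user document sketched in the schema bullet above. The lookup order (global, then channel, then category) is one reasonable choice, not a standard, and the fail-closed defaults mirror the text:

```python
# Preference check on the notification hot path, using the schema above.
# Missing fields fail closed (suppress), per the fail-closed guidance.
def allowed_by_preferences(prefs: dict, channel: str, category: str) -> bool:
    if not prefs.get("global_enabled", True):
        return False
    if not prefs.get("channels", {}).get(channel, False):
        return False
    if not prefs.get("categories", {}).get(category, False):
        return False
    return True

prefs = {
    "global_enabled": True,
    "channels": {"push": True, "email": True, "sms": False, "in_app": True},
    "categories": {"marketing": False, "transactional": True, "security": True},
}
print(allowed_by_preferences(prefs, "push", "transactional"))  # -> True
print(allowed_by_preferences(prefs, "push", "marketing"))      # -> False
print(allowed_by_preferences(prefs, "sms", "security"))        # -> False
```

Handling for critical-priority bypasses and quiet hours is omitted here; those checks layer on top of this one.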

Quiet Hours Implementation

Quiet hours (e.g., 10 PM to 7 AM in the user timezone) must suppress non-critical notifications and queue them for delivery at the end of the quiet period. Implementation: store quiet_hours with the user timezone. When the notification service processes a notification, convert current UTC time to the user timezone and check against the quiet window. If inside the quiet window AND the notification priority is not critical, calculate the delay_until timestamp (quiet_end_time in UTC) and enqueue with a scheduled delivery time. Critical notifications (2FA, security alerts) always bypass quiet hours.
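The timezone conversion and window test can be sketched as below. For simplicity this uses `datetime.time` objects where the stored schema uses "22:00" strings, and the function names are assumptions; note the window must handle wrapping past midnight:

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

# Quiet-hours check: convert current UTC time into the user's timezone and
# test membership in a window that may wrap midnight (e.g., 22:00-07:00).
def in_quiet_hours(now_utc: datetime, start: time, end: time, tz: str) -> bool:
    local = now_utc.astimezone(ZoneInfo(tz)).time()
    if start <= end:                       # same-day window, e.g. 13:00-15:00
        return start <= local < end
    return local >= start or local < end   # wraps midnight, e.g. 22:00-07:00

def should_defer(priority: str, now_utc: datetime, prefs: dict) -> bool:
    if priority == "critical":             # 2FA / security always go through
        return False
    q = prefs["quiet_hours"]
    return in_quiet_hours(now_utc, q["start"], q["end"], q["timezone"])

prefs = {"quiet_hours": {"start": time(22, 0), "end": time(7, 0),
                         "timezone": "America/New_York"}}
# 03:30 UTC on Jan 15 is 22:30 in New York (EST) -- inside the quiet window.
now = datetime(2024, 1, 15, 3, 30, tzinfo=timezone.utc)
print(should_defer("normal", now, prefs))    # -> True (delay until 07:00 local)
print(should_defer("critical", now, prefs))  # -> False
```

A deferred notification gets a `delay_until` timestamp equal to the quiet window's end converted back to UTC.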

Rate limiting implementation

1. Define rate limit tiers: per-user-per-channel (e.g., max 10 push/day), per-user-global (e.g., max 30 notifications/day across all channels), and per-category (e.g., max 3 marketing/week).

2. Use Redis counters with sliding window rate limiting. Key format: ratelimit:{user_id}:{channel}:{window}. Increment on each notification. TTL matches the window duration.

3. When a notification is rate-limited, decide the disposition based on priority: critical notifications bypass all rate limits, high-priority notifications are deferred to the next available window, normal notifications are silently dropped with a log entry.

4. Track rate-limit hit rates as a metric. If more than 20% of notifications are being rate-limited, it signals that upstream services are generating too many events — the problem is at the source, not the limiter.

5. Expose rate limit status in the user preference API so the client can show users how many notifications they have remaining in each budget window.
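The steps above can be sketched with an in-memory sliding-window limiter. It mirrors the Redis key shape from the text, with a deque of timestamps per key standing in for the Redis counter so the example is self-contained:

```python
import time
from collections import defaultdict, deque

# In-memory sliding-window rate limiter (Redis stand-in).
class SlidingWindowLimiter:
    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._events: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, channel: str) -> bool:
        key = f"ratelimit:{user_id}:{channel}"  # key shape from the text
        now = time.time()
        events = self._events[key]
        while events and events[0] <= now - self.window_s:
            events.popleft()               # drop timestamps outside the window
        if len(events) >= self.limit:
            return False                   # over budget: defer or drop
        events.append(now)
        return True

limiter = SlidingWindowLimiter(limit=10, window_s=86_400)  # 10 push/day
results = [limiter.allow("user-7", "push") for _ in range(12)]
print(results.count(True), results.count(False))  # -> 10 2
```

In Redis the same effect is usually achieved with a sorted set per key (`ZADD` + `ZREMRANGEBYSCORE` + `ZCARD`) or fixed-window counters with TTLs.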


The Opt-Out Cascade

When a user disables push notifications at the OS level (iOS Settings or Android app settings), the notification system does not receive explicit feedback — APNs simply stops delivering. Tokens may remain valid in the database for days. Detect this by monitoring delivery feedback: if APNs consistently returns "not registered" or FCM returns "not subscribed," mark the user push preference as disabled server-side and stop wasting queue capacity on undeliverable notifications.

Rate limit tiers:

  • Per-user push — Scope: single user, push channel. Window: 24 hours (rolling). Default limit: 10 notifications. When exceeded: queue for the next window, or drop if normal priority.
  • Per-user email — Scope: single user, email channel. Window: 24 hours (rolling). Default limit: 5 emails. When exceeded: consolidate into a digest at the end of the window.
  • Per-user SMS — Scope: single user, SMS channel. Window: 24 hours (rolling). Default limit: 2 messages. When exceeded: drop non-critical, deliver critical only.
  • Per-user global — Scope: single user, all channels. Window: 24 hours (rolling). Default limit: 30 notifications. When exceeded: prioritize critical and high, drop normal.
  • Per-category marketing — Scope: single user, marketing category. Window: 7 days (rolling). Default limit: 3 messages. When exceeded: drop silently — the user has seen enough marketing this week.
  • Global system — Scope: all users, all channels. Window: 1 second. Default limit: 50,000 notifications. When exceeded: apply backpressure on event ingestion — slow down producers.

Do-Not-Disturb (DND) Mode

Beyond quiet hours (time-based), support explicit DND mode where users can silence all non-critical notifications for a defined duration (1 hour, until tomorrow, until I turn it off). DND state is stored in the preference cache with an expiry timestamp. The notification service checks DND before quiet hours — if DND is active and the notification is not critical, suppress it entirely (do not defer, just drop). This gives users a sense of control that reduces the likelihood of them disabling notifications permanently.

Template Engine & Personalization

The template engine is the component that transforms a raw notification event (event_type, recipient_id, template_params) into a fully rendered, channel-specific message ready for delivery. This involves selecting the correct template, injecting personalized data, localizing the content for the recipient's language and locale, selecting the appropriate A/B test variant, and formatting the output for the target channel (HTML for email, JSON for push, plain text for SMS).

Template rendering sits on the critical path of every notification, so it must be fast (< 5ms per render), cacheable (compiled templates), and fault-tolerant (a missing template parameter should produce a degraded message, not a crash). At FAANG scale, the template engine renders millions of notifications per hour across thousands of template variants.

graph LR
    subgraph "Template Rendering Pipeline"
        EVENT["Notification Event<br/>(template_id, params,<br/>recipient_id)"] --> LOOKUP["Template<br/>Lookup"]
        LOOKUP --> LOCALE["Locale<br/>Resolution<br/>(user lang + region)"]
        LOCALE --> AB["A/B Variant<br/>Selection<br/>(experiment cohort)"]
        AB --> RENDER["Template<br/>Rendering<br/>(Handlebars/Mustache)"]
        RENDER --> FORMAT{"Channel<br/>Format?"}
        FORMAT -->|"Push"| PUSH_FMT["JSON Payload<br/>(title, body,<br/>deep_link, badge)"]
        FORMAT -->|"Email"| EMAIL_FMT["HTML + Plain Text<br/>(inline CSS,<br/>unsubscribe link)"]
        FORMAT -->|"SMS"| SMS_FMT["Plain Text<br/>(160 char limit,<br/>short URL)"]
        FORMAT -->|"In-App"| INAPP_FMT["Structured JSON<br/>(title, body, icon,<br/>action_url)"]
    end

    style RENDER fill:#3b82f6,stroke:#333,color:#fff
    style FORMAT fill:#f59e0b,stroke:#333,color:#fff

Template rendering pipeline: resolve locale, select A/B variant, render with Handlebars, then format for the target channel.

Template system design

  • Template storage — Store templates in a versioned repository (Git-backed or database with version history). Each template has a unique template_id, version, locale, channel, and compiled template string. Use Handlebars or Mustache syntax for variable interpolation: "Hi {{user.first_name}}, your order #{{order.id}} has shipped!"
  • Compiled template cache — Pre-compile templates into executable functions at deployment time and cache them in memory. A Handlebars compile step takes 1-5ms; rendering with pre-compiled templates takes < 0.1ms. Cache invalidation: version bump triggers recompilation across all notification service instances.
  • Fallback chain — If a template is missing for the user locale (e.g., fr-CA), fall back to the language base (fr), then to the default locale (en-US). If a template parameter is missing, render with a sensible default rather than crashing: "Hi there" instead of "Hi {{user.first_name}}".
  • Channel-specific rendering — The same notification event produces different outputs per channel. Push: short title (< 50 chars) + body (< 150 chars) + deep_link. Email: full HTML with images, links, and unsubscribe footer. SMS: plain text within 160 characters with a shortened URL. In-app: structured JSON with icon, action, and dismiss behavior.
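The fallback chain and missing-parameter behavior from the bullets above can be sketched as follows. For a self-contained example this uses Python `str.format`-style placeholders where the text assumes Handlebars/Mustache in production; template content and IDs are illustrative:

```python
# Locale fallback chain plus graceful defaults for missing parameters.
TEMPLATES = {
    ("order_shipped", "en-US"): "Hi {first_name}, your order #{order_id} has shipped!",
    ("order_shipped", "fr"):    "Bonjour {first_name}, votre commande #{order_id} est expédiée !",
}
DEFAULT_LOCALE = "en-US"

def resolve_template(template_id: str, locale: str) -> str:
    # Fallback chain: exact locale -> language base -> default locale.
    for candidate in (locale, locale.split("-")[0], DEFAULT_LOCALE):
        if (template_id, candidate) in TEMPLATES:
            return TEMPLATES[(template_id, candidate)]
    raise KeyError(template_id)

class Defaulting(dict):
    # A missing parameter degrades the message instead of crashing the render.
    def __missing__(self, key):
        return "there" if key == "first_name" else ""

def render(template_id: str, locale: str, params: dict) -> str:
    return resolve_template(template_id, locale).format_map(Defaulting(params))

# fr-CA falls back to the fr base template:
print(render("order_shipped", "fr-CA", {"first_name": "Aline", "order_id": 42}))
# de-DE falls back to en-US, and the missing name degrades to "there":
print(render("order_shipped", "de-DE", {"order_id": 42}))
```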

Internationalization (i18n) Strategy

Store template strings in locale-specific files or database rows, keyed by template_id + locale. Use ICU MessageFormat for complex pluralization and gender rules: "{count, plural, =0 {No new notifications} one {# new notification} other {# new notifications}}". This handles the significant grammatical differences between languages (English: 1 notification / 2 notifications; Russian: 1 уведомление / 2 уведомления / 5 уведомлений — three plural forms). Pre-render for the top 10 locales at compile time; render on-demand for long-tail locales.

A/B Testing Notifications

A/B testing notification content is critical for optimizing engagement. Assign users to experiment cohorts (typically via a consistent hash of user_id + experiment_id) and select the corresponding template variant. Track per-variant metrics: open rate, click-through rate, and conversion rate. Run experiments for at least 7 days to account for day-of-week effects. Typical test: two push notification copy variants — "Your order has shipped!" vs "Great news — order #1234 is on its way!" — can yield 15-30% differences in open rates.
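The consistent-hash cohort assignment mentioned above can be sketched like this. Using a cryptographic hash (rather than Python's per-process `hash()`) keeps assignments stable across services and restarts; the even split across variants is an assumption:

```python
import hashlib

# Deterministic experiment cohort assignment via a hash of user + experiment.
def assign_variant(user_id: str, experiment_id: str,
                   variants: tuple = ("A", "B")) -> str:
    digest = hashlib.sha256(f"{user_id}:{experiment_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable 0-99 bucket
    return variants[bucket * len(variants) // 100]     # even split

# The same user always gets the same variant for a given experiment,
# while different experiments shuffle users independently.
print(assign_variant("user-7", "push-copy-test") ==
      assign_variant("user-7", "push-copy-test"))  # -> True
```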

Dynamic content injection allows notifications to include real-time data beyond simple template parameters. For example, an email notification for a flight delay might include current weather at the destination, alternative flight options, and a map showing the current aircraft location. This requires the template engine to call external data services during rendering — which adds latency and failure risk. Use a timeout-with-fallback pattern: attempt to fetch dynamic data with a 500ms timeout; if it fails, render without the dynamic section.

Template Injection Vulnerability

User-generated content (usernames, message text) must be sanitized before injection into templates. A user with the name "{{constructor.constructor(return this.process.exit())}}" could exploit a server-side template injection vulnerability. Always use safe template engines that sandbox expressions (Handlebars, Mustache — which are logic-less by design). Never use eval-based template engines (EJS with unescaped output, Lodash _.template with user input). Escape all user-provided parameters for the target format (HTML entities for email, JSON encoding for push payloads).

Unsubscribe and Compliance

Every marketing email must include a one-click unsubscribe link (CAN-SPAM Act, GDPR). The template engine must automatically append unsubscribe footers to marketing emails. The unsubscribe URL should be a tokenized link that updates the user preference without requiring login: /unsubscribe?token=<signed_jwt_with_user_id_and_category>. Process unsubscribe clicks within 10 business days (legal requirement) — in practice, update preferences immediately.

Scaling, Monitoring & Failure Modes

Scaling a notification system requires careful attention to partitioning strategies, queue management, priority isolation, and failure handling. The notification pipeline has multiple stages (ingestion, processing, delivery), each with different scaling characteristics and bottleneck patterns. A failure in any stage must be contained and not cascade to other stages or other users.

The most common scaling pattern is horizontal partitioning by user ID. Kafka topics are partitioned by recipient_id, ensuring that all notifications for a given user are processed by the same consumer (enabling per-user deduplication and rate limiting). Channel adapters scale independently based on provider throughput limits. The notification service scales based on event ingestion rate.

Partitioning strategies

  • Partition by recipient_id — Hash user_id to a Kafka partition. All notifications for a user go to the same partition, enabling stateful per-user processing (dedup, rate limit) without distributed coordination. Number of partitions: start with 128, scale to 1024 for high throughput.
  • Partition by channel — Separate Kafka topics per channel (notifications.push, notifications.email, notifications.sms). This allows independent consumer group scaling: push adapters can scale to 100 instances while SMS adapters remain at 10 instances.
  • Partition by priority — Separate Kafka topics per priority level (notifications.critical, notifications.high, notifications.normal). Critical consumers poll aggressively (10ms interval), normal consumers poll conservatively (100ms). This ensures priority isolation under load.
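The recipient_id partitioning in the first bullet can be sketched as a stable hash-to-partition mapping. A cryptographic hash (rather than Python's salted `hash()`) keeps the mapping identical across processes and restarts; the 128-partition count comes from the text:

```python
import hashlib

NUM_PARTITIONS = 128  # starting partition count from the text

def partition_for(recipient_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable mapping: the same user always hashes to the same partition,
    # so per-user dedup and rate limiting stay local to one consumer.
    digest = hashlib.md5(recipient_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

print(partition_for("user-7") == partition_for("user-7"))  # -> True
print(0 <= partition_for("user-7") < NUM_PARTITIONS)       # -> True
```

Kafka's default partitioner applies the same idea (murmur2 over the message key), so setting the message key to recipient_id usually suffices.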
graph TB
    subgraph "Scaling Architecture"
        KAFKA["Kafka Cluster<br/>(128 partitions)"] --> CG1["Consumer Group 1<br/>Notification Service<br/>(20 instances)"]
        CG1 --> PQ_C["Critical Topic<br/>(32 partitions)"]
        CG1 --> PQ_H["High Topic<br/>(64 partitions)"]
        CG1 --> PQ_N["Normal Topic<br/>(128 partitions)"]

        PQ_C --> PUSH_C["Push Adapter Pool<br/>(10 instances)<br/>p99 < 100ms"]
        PQ_C --> EMAIL_C["Email Adapter Pool<br/>(5 instances)<br/>p99 < 500ms"]
        PQ_H --> PUSH_H["Push Adapter Pool<br/>(50 instances)"]
        PQ_H --> EMAIL_H["Email Adapter Pool<br/>(20 instances)"]
        PQ_N --> PUSH_N["Push Adapter Pool<br/>(20 instances)"]
        PQ_N --> EMAIL_N["Email Adapter Pool<br/>(30 instances)"]
        PQ_N --> DLQ["Dead Letter Queue"]
    end

    subgraph "Monitoring"
        PUSH_C --> MON["Metrics Pipeline<br/>(Prometheus + Grafana)"]
        EMAIL_C --> MON
        DLQ --> ALERT["PagerDuty Alert<br/>(critical in DLQ)"]
    end

    style PQ_C fill:#ef4444,stroke:#333,color:#fff
    style PQ_H fill:#f59e0b,stroke:#333,color:#fff
    style PQ_N fill:#3b82f6,stroke:#333,color:#fff
    style DLQ fill:#ef4444,stroke:#333,color:#fff
    style ALERT fill:#ef4444,stroke:#333,color:#fff

Scaling architecture: separate Kafka topics per priority level with independent consumer pools per channel. Dead letter queue captures permanently failed notifications with alerting.

SLA Monitoring Dashboard

Build a real-time dashboard tracking: (1) End-to-end latency percentiles (p50, p95, p99) from event ingestion to provider delivery — broken down by channel and priority. (2) Delivery success rate per channel (target: > 99.5% for push, > 98% for email, > 95% for SMS). (3) Queue depth per priority topic — depth growing faster than drain rate is an early warning of capacity problems. (4) Rate limit hit rate — high rates indicate upstream event storms. (5) DLQ depth — any non-zero depth for critical notifications triggers an immediate page.

Failure modes:

  • Kafka broker failure — Detection: consumer lag spike, produce errors. Impact: event ingestion pauses for affected partitions. Mitigation: replication factor 3, automatic leader election, producers retry with idempotent writes enabled.
  • Push provider outage (APNs) — Detection: HTTP 503 errors from APNs, connection resets. Impact: push notifications for iOS users are queued. Mitigation: circuit breaker opens after 5% error rate; queue notifications; APNs queues on its side for up to 28 days.
  • Email provider outage (SES) — Detection: SES API returns 503 or throttling errors. Impact: email delivery delayed. Mitigation: failover to SendGrid via circuit breaker; email is inherently store-and-forward, so delay is tolerable.
  • SMS provider outage (Twilio) — Detection: Twilio API errors, webhook delivery failures. Impact: SMS delivery stops. Mitigation: failover to SNS or Vonage; critical SMS (2FA) gets priority on the failover channel.
  • Notification service crash — Detection: consumer lag increases, health check failures. Impact: processing pauses for assigned partitions. Mitigation: Kafka consumer rebalance assigns partitions to healthy consumers within 30 seconds.
  • Redis (dedup/rate limit) outage — Detection: SETNX timeouts, rate limit bypass. Impact: duplicate notifications or rate limit bypass. Mitigation: fail closed — if Redis is down, suppress non-critical notifications; in-memory fallback for dedup with degraded accuracy.

The Notification Storm

The most dangerous failure mode is a notification storm: a bug in an upstream service or a misconfigured event producer triggers millions of duplicate or erroneous notifications. Prevention: implement a global rate limit at the ingestion layer (max 50K notifications per second system-wide). If exceeded, reject new events with backpressure to producers. Also monitor the ratio of notifications per unique event — if it exceeds the expected fan-out ratio by more than 2x, trigger an automatic circuit breaker that pauses non-critical event processing.

Dead letter queue management is critical for long-term system health. Build an automated DLQ processor that categorizes failures: invalid device tokens (prune from the token store), permanently bounced email addresses (add to suppression list), expired content (discard — a 3-day-old flash sale notification is worthless), and provider errors (schedule for retry during the next low-traffic window). Alert on the DLQ growth rate, not just the absolute depth — a steady growth rate indicates a systemic issue rather than a transient spike.

Capacity Planning Formula

Peak notifications per second = (DAU x avg_notifications_per_user_per_day x peak_multiplier) / 86,400. For 50M DAU, 5 notifications/user/day, 10x peak: (50M x 5 x 10) / 86,400 = ~29,000 notifications/second peak. Each notification requires: 1 Kafka produce, 1 Redis dedup check, 1 Redis preference lookup, 1 template render, 1 Kafka produce to channel topic, 1 provider API call. Total operations per notification: ~6. Total peak operations: ~174,000/second. Size your infrastructure to handle 2x this number for headroom.
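The formula above, executed with the worked numbers from the text (50M DAU, 5 notifications/user/day, 10x peak multiplier, ~6 operations per notification, 2x headroom):

```python
# Capacity planning: peak = (DAU * notifs/user/day * peak_multiplier) / 86,400
DAU = 50_000_000
NOTIFS_PER_USER_PER_DAY = 5
PEAK_MULTIPLIER = 10
OPS_PER_NOTIFICATION = 6   # 2x Kafka produce, 2x Redis, render, provider call
HEADROOM = 2

peak_nps = DAU * NOTIFS_PER_USER_PER_DAY * PEAK_MULTIPLIER / 86_400
peak_ops = peak_nps * OPS_PER_NOTIFICATION

print(f"peak notifications/sec: {peak_nps:,.0f}")             # ~28,935
print(f"peak operations/sec:    {peak_ops:,.0f}")             # ~173,611
print(f"provision for:          {peak_ops * HEADROOM:,.0f}")  # ~347,222
```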

Common Mistake: No Priority Isolation Under Load

If critical and normal notifications share the same queue, a marketing blast (millions of normal-priority messages) will delay 2FA codes and security alerts. This is not a theoretical risk — it happens regularly at companies that skip priority queue separation. The fix is simple: dedicate separate Kafka topics and consumer pools for each priority level. Critical consumers should have guaranteed minimum resources that are never shared with lower priorities, even during peak load.

How this might come up in interviews

Notification system design is a core system design interview question at Meta, Google, Amazon, and Apple, typically asked at L5-L6 level. It tests event-driven architecture, queue management, fan-out optimization, third-party API integration, idempotency, rate limiting, and priority isolation. Interviewers use it to evaluate whether candidates can design systems that handle both the happy path (deliver a message) and the failure modes (provider outages, duplicate events, fan-out storms). At L6+, expect deep dives into exactly-once delivery semantics, celebrity fan-out optimization, and SLA monitoring architecture.

Common questions:

  • L4: Design a basic notification service that sends push notifications and emails. Walk me through the API, queue design, and how you handle failures. [Tests: basic event-driven design, queue usage, retry strategy, API design]
  • L4-L5: How do you ensure a user never receives the same notification twice, even if the upstream service retries the event? [Tests: idempotency keys, deduplication stores, at-least-once vs exactly-once semantics]
  • L5: Your notification system needs to support user preferences (opt-out, quiet hours, frequency caps). Where in the pipeline do you check preferences, and how do you store them for fast lookup? [Tests: preference schema design, cache strategy, pipeline ordering]
  • L5-L6: A user with 10 million followers posts a photo. Walk me through how the notification system handles the fan-out without delaying other users' notifications. [Tests: fan-out strategies, batched delivery, priority isolation, backpressure]
  • L6: Compare push notifications, email, and SMS as delivery channels. When would you use each, and how does channel selection affect your architecture? [Tests: channel trade-off analysis, adapter pattern, provider failover, cost optimization]
  • L6-L7: Design the monitoring and alerting system for a notification pipeline processing 1 billion notifications per day. What metrics do you track, and what are the SLA thresholds? [Tests: SLA definition, metric selection, alerting strategy, capacity planning, failure mode analysis]

Key takeaways

  • A notification system is an event-driven pipeline with five core stages: ingestion, deduplication, preference checking, template rendering, and channel-specific delivery. Each stage must be independently scalable and fault-tolerant, with failures in one stage contained from cascading to others.
  • Fan-out strategy is the most critical architectural decision: fan-out on write for small recipient lists (< 10K), batched fan-out with rate-controlled waves for large lists (> 100K), and fan-out on read for in-app notifications. The hybrid approach prevents fan-out storms while maintaining low delivery latency.
  • Priority isolation must be physical, not logical: critical notifications (2FA, security alerts) require dedicated Kafka topics, consumer pools, and provider connections that are never shared with normal-priority traffic. A marketing blast must never delay a password reset code.
  • At-least-once delivery combined with idempotency key deduplication provides effectively-exactly-once semantics at practical cost. The idempotency key TTL must exceed the maximum retry window to prevent late duplicates. Dead letter queues capture permanently failed notifications for investigation and replay.
  • User preference management and rate limiting are not afterthoughts — they are core pipeline components that determine whether users keep notifications enabled or disable them permanently. Check preferences on every notification hot path, enforce per-user per-channel rate limits, respect quiet hours by timezone, and always allow critical notifications to bypass rate limits.
Before you move on: can you answer these?

A celebrity with 50 million followers posts a photo. Walk through how the notification system handles the fan-out without overwhelming the pipeline.

The system detects that the recipient count exceeds the fan-out threshold (e.g., 10K). Instead of immediate fan-out on write, it uses the batched fan-out strategy: the event is stored once, and a batch scheduler generates notifications in rate-controlled waves of 50K-100K per minute. Push notifications are sent first to recently active users (most likely to engage), then to less active users. In-app notifications use fan-out on read — stored as a single event and materialized when each user opens their feed. The total fan-out takes 8-16 minutes instead of overwhelming the queue in seconds. Critical and high-priority notifications for other users continue flowing through their dedicated priority queues, completely unaffected by the celebrity fan-out.

Your SMS provider (Twilio) experiences a 30-minute outage. How does the notification system handle this without losing any notifications?

The SMS channel adapter detects elevated error rates (5xx responses or timeouts) from Twilio. Once the error rate exceeds 5% over a 1-minute window, the circuit breaker opens, stopping all requests to Twilio. Incoming SMS notifications continue to be consumed from the SMS priority queue but are routed to a failover provider (e.g., Amazon SNS or Vonage). If no failover is configured, notifications are re-enqueued with exponential backoff delays (1 min, 2 min, 4 min, 8 min). Idempotency keys prevent duplicate delivery if an original request actually succeeded but the response was lost. After a cooldown period, the circuit breaker periodically transitions to half-open, sending a small probe fraction (e.g., 10%) of traffic to Twilio; while Twilio is still down, the probes fail and the circuit reopens. Once Twilio recovers around the 30-minute mark, the probes succeed, the circuit fully closes, and traffic returns to Twilio. Notifications that exceeded max retries during the outage sit in the dead letter queue for manual replay.
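A minimal sketch of the closed/open/half-open state machine described here, assuming the 5% error threshold from the text plus an illustrative call count and cooldown; the class name and knobs are assumptions, not a real library API:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker for a channel adapter. States: closed
    (traffic flows), open (all requests rejected), half-open (probing)."""

    def __init__(self, error_threshold=0.05, min_calls=20,
                 cooldown=60.0, clock=time.monotonic):
        self.error_threshold = error_threshold  # open above this error rate
        self.min_calls = min_calls              # don't trip on tiny samples
        self.cooldown = cooldown                # seconds before probing
        self.clock = clock
        self.state = "closed"
        self.calls = 0
        self.errors = 0
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.cooldown:
                self.state = "half_open"   # cooldown elapsed: let a probe through
                return True
            return False                   # route to failover / re-enqueue
        return True

    def record(self, success):
        if self.state == "half_open":
            if success:
                self.state = "closed"      # provider recovered: resume traffic
                self.calls = self.errors = 0
            else:
                self._trip()               # still failing: reopen
            return
        self.calls += 1
        self.errors += 0 if success else 1
        if (self.calls >= self.min_calls
                and self.errors / self.calls > self.error_threshold):
            self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.calls = self.errors = 0
```

A real implementation would use a sliding 1-minute window rather than cumulative counters, and half-open would admit a percentage of traffic rather than single probes, but the state transitions are the same.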

Explain why critical notifications (2FA codes) must have physically separate infrastructure from marketing notifications, and what happens if they share a queue.

If critical and marketing notifications share the same Kafka topic and consumer pool, a large marketing blast (e.g., 10 million promotional emails) fills the queue with low-priority messages. Consumers process messages in order, so 2FA codes and security alerts are trapped behind millions of marketing messages. Even with consumer-side priority sorting, the queue depth causes memory pressure on Kafka brokers, increasing produce latency for all messages. The result: a user waiting for a 2FA code to log into their bank account waits 5-10 minutes instead of 1 second. Physical separation means dedicated Kafka topics (notifications.critical, notifications.normal) with independent broker resources and dedicated consumer pools. The critical consumer pool is sized to drain the critical queue with p99 latency under 1 second, regardless of what is happening on the normal queue. This isolation is not an optimization — it is a security requirement.
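The routing decision itself is tiny; the isolation comes from the dedicated brokers and consumer pools behind each topic name. A sketch using the topic names from the text, with a hypothetical `CRITICAL_KINDS` set (a real service would produce to Kafka via a client library, stubbed out here):

```python
# Hypothetical set of notification kinds that count as critical.
CRITICAL_KINDS = {"2fa", "security_alert", "password_reset"}

def route(notification):
    """Pick the physically separate Kafka topic. Critical traffic never
    shares a topic (or the consumer pool behind it) with bulk sends."""
    if notification["kind"] in CRITICAL_KINDS:
        return "notifications.critical"
    return "notifications.normal"
```

The point is that prioritization happens at produce time, before anything queues: once a 2FA code lands on `notifications.critical`, no volume on `notifications.normal` can sit in front of it.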

🧠Mental Model

💡 Analogy

A notification system works like a national postal service sorting center. Events arrive at the sorting center (notification service) from many senders (event producers). Each piece of mail is inspected: is the recipient on the do-not-mail list (preference check)? Has the recipient already received too much mail today (rate limiting)? The address label is printed in the right language (template rendering). Then the mail is routed to the correct carrier based on urgency and type: a courier for overnight packages (push notifications — fast, direct delivery), standard mail for letters and catalogs (email — reliable, rich content, but slower), a telegram service for urgent short messages (SMS — expensive but immediate and universal), and an internal office mailbox for people who are in the building right now (in-app — free, instant, but only reaches active users). The sorting center maintains a log of every piece of mail sent, tracks delivery confirmations, and has a dead letter office for undeliverable mail that needs human review.

⚡ Core Idea

A notification system is an event-driven pipeline that transforms raw application events into personalized, channel-appropriate messages delivered reliably to the right users at the right time. The core challenge is not sending messages — any developer can call an API. The core challenge is doing it at scale (billions per day) while respecting user preferences, deduplicating across retries, isolating channel failures, and ensuring that critical notifications (2FA, security) are never delayed by bulk traffic. The architecture is defined by three key decisions: fan-out strategy (when to expand recipients), delivery guarantee (how to handle failures), and priority isolation (how to prevent low-priority traffic from starving high-priority traffic).
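The five-stage pipeline from the takeaways can be sketched as a chain of pluggable callables; every name below is illustrative. Ingestion is the caller invoking `process`, and each subsequent stage can independently drop the event:

```python
def process(event, *, dedupe, prefs, render, channels):
    """Illustrative pipeline: dedupe -> preference check -> render -> deliver.
    Each stage is injected so it can scale and fail independently."""
    if not dedupe(event):                    # stage 2: drop retried duplicates
        return None
    channel = prefs(event["user_id"], event["kind"])   # stage 3: preferences
    if channel is None:                      # opted out, rate-limited, quiet hours
        return None
    message = render(event, channel)         # stage 4: template + i18n
    channels[channel](event["user_id"], message)       # stage 5: channel adapter
    return channel
```

In production each stage sits behind a queue rather than a direct function call, which is what lets a slow template renderer or a failing channel adapter back up without stalling ingestion.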

🎯 Why It Matters

Notification system design appears in system design interviews at every FAANG company because it compresses an extraordinary number of distributed systems concepts into a single, well-understood product. In 45 minutes, a candidate must demonstrate competency in event-driven architecture, message queue design, fan-out optimization, third-party API integration, idempotency and deduplication, rate limiting, template rendering with i18n, user preference management, priority queue isolation, circuit breaker patterns, and SLA monitoring. The product is simple to understand (send users messages) but the engineering is deep (exactly-once semantics, celebrity fan-out, quiet hours across timezones). This contrast between simple product and complex engineering is what makes it the perfect calibration question for L5-L7 system design interviews.
