Realtime APIs: WebSockets, SSE, and Long Polling

On this page

The problem: HTTP only speaks when spoken to
A mental model: mailbox, held call, intercom, phone
What a scaled WebSocket layer actually looks like
Choosing: a side-by-side comparison
Code: a minimal SSE endpoint
Code: a WebSocket server with a heartbeat
Scaling stateful connections and the fan-out layer
Auth on long-lived connections
Where managed services fit
Common mistakes that cost hours
Takeaways and where to go next

TL;DR

Match the transport to the job: polling, SSE, and WebSockets sit on a clear progression from one-way updates to full duplex. The real work is scaling long-lived connections with heartbeats, a fan-out layer, and auth that is re-checked for as long as the connection stays open.

The problem: HTTP only speaks when spoken to

Who this is for

You can build a normal request/response API, but now the product wants live updates: a chat where messages appear instantly, a dashboard that ticks, a 'your order is ready' toast. You have heard the words WebSocket and SSE, but you are not sure which one you need, or what happens when you have ten thousand connections open at once. This is the map.

Plain HTTP has one rule that shapes everything else: the client asks, the server answers, and then the exchange is over. The server has no way to tap a client on the shoulder and say 'hey, something changed.' For a blog or a checkout flow that is fine: the user clicks, you respond. But the moment you need the *server* to start the conversation, you are fighting the grain of the protocol, and every realtime technique below is a different way of working around that one limitation.

There are really only four moves. You can ask over and over (polling), ask and let the server hold the line until it has news (long polling), open a one-way pipe the server streams down (Server-Sent Events), or open a two-way pipe both sides talk over (WebSockets). Picking the wrong one rarely crashes anything; instead it causes a slow bleed of cost, latency, and 3am pages. So let us make it concrete first.
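To put a rough number on that bleed for plain polling, here is a back-of-the-envelope sketch in TypeScript. pollingLoad is a hypothetical helper, and the inputs (10,000 clients, a 5-second interval, about two real updates per client per hour) are illustrative assumptions rather than measurements.

```typescript
// Back-of-the-envelope cost of plain polling.
// All inputs are illustrative assumptions, not measurements.
interface PollingLoad {
  requestsPerSecond: number; // total load the whole fleet of clients generates
  emptyFraction: number; // share of polls that come back with 'nothing yet'
}

function pollingLoad(
  clients: number,
  intervalSeconds: number,
  eventsPerClientPerHour: number,
): PollingLoad {
  const requestsPerSecond = clients / intervalSeconds;
  const pollsPerClientPerHour = 3600 / intervalSeconds;
  // Each event can make at most one poll useful; every other poll is empty.
  const useful = Math.min(eventsPerClientPerHour, pollsPerClientPerHour);
  const emptyFraction = 1 - useful / pollsPerClientPerHour;
  return { requestsPerSecond, emptyFraction };
}

// 10,000 clients polling every 5 s, each receiving ~2 updates an hour.
const load = pollingLoad(10_000, 5, 2);
console.log(load.requestsPerSecond); // 2000
console.log((load.emptyFraction * 100).toFixed(1) + "% empty"); // 99.7% empty
```

The other three techniques avoid most of those empty round trips by holding a connection open per client instead.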

A mental model: mailbox, held call, intercom, phone

Realtime is just deciding who is allowed to start talking, and how long the line stays open.

Walking to the mailbox every 5 minutes to check for post → Polling: the client re-requests on a timer, mostly to hear 'nothing yet'

Calling, and the operator says 'hold the line, I'll tell you the moment it arrives' → Long polling: the request hangs open until there is news, then the client asks again

A one-way intercom in a flat: the lobby buzzes you, you cannot buzz back → SSE: the server streams events down a single open channel, in one direction

A phone call: either person can speak at any time → WebSocket: full duplex, both sides push whenever they like

Four ways to find out something changed, from most wasteful to most powerful.

Notice the trade running through the list: more power costs more held-open state. A mailbox check is stateless and cheap, but it is mostly wasted trips. A phone call is instant in both directions, but someone has to keep the line up the whole time, and that someone is your server.
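The held call is the least obvious of the four to picture in code. The sketch below shows its core, assuming a single Node process and an in-memory hub: each waiting request is parked as a pending promise and released either by news or by a timeout, after which the client simply asks again. LongPollHub and the 30-second hold are hypothetical choices, not a library API.

```typescript
// Minimal in-memory long-polling hub (hypothetical helper, single process).
// Each held request is a pending promise resolved by news or by a timeout.
type Waiter = (msg: string | null) => void;

class LongPollHub {
  private waiters = new Set<Waiter>();

  // Call from a request handler. Resolves with the next message, or with
  // null on timeout so the handler can answer empty and the client re-asks.
  wait(timeoutMs: number): Promise<string | null> {
    return new Promise((resolve) => {
      const waiter: Waiter = (msg) => {
        clearTimeout(timer); // no-op if the timeout already fired
        this.waiters.delete(waiter);
        resolve(msg);
      };
      const timer = setTimeout(() => waiter(null), timeoutMs);
      this.waiters.add(waiter);
    });
  }

  // Release every held request with the same message; returns how many.
  publish(msg: string): number {
    const held = [...this.waiters];
    for (const waiter of held) waiter(msg);
    return held.length;
  }
}

const hub = new LongPollHub();
hub.wait(30_000).then((msg) => console.log("held request answered:", msg));
hub.publish("your order is ready"); // resolves the held request immediately
```

This sketch drops anything published while no request is parked. A real long-polling endpoint usually has the client send a cursor (for example, the id of the last message it saw) so the server can answer at once with anything it missed between requests.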

What a scaled WebSocket layer actually looks like

The single-server demo is easy. The moment you run two or more servers, a hard truth shows up: a connection lives on one specific box. If Alice is connected to server A and Bob is on server B, and Alice sends a message for Bob, server A has no idea Bob even exists. You need two things: a way to keep each client glued to its server (sticky sessions), and a shared bus so any server can fan a message out to whichever server holds the recipient (pub/sub, usually Redis).

Clients stick to a WS server via the load balancer; servers fan messages out through a Redis pub/sub bus so any server can reach any connection.

1. Client connects: Browser A opens a WebSocket. The load balancer routes it to WS Server 1 and pins the connection there (affinity) so every frame keeps landing on the same box.
2. A message arrives: Browser A sends a chat message for Browser B. WS Server 1 receives it but does not hold B's socket.
3. Publish to the bus: Server 1 publishes the message to a Redis channel (e.g. 'room:42') instead of trying to find B itself.
4. Fan-out: every WS server subscribes to the channels its clients care about. Redis pushes the message to every subscriber of that channel, and the one holding B (Server 2) acts on it.
5. Deliver: Server 2 writes the frame down B's socket. Neither server needed to know the cluster's full topology; the bus decoupled them.

Choosing: a side-by-side comparison

Before any code, pick the simplest tool that covers your direction of data flow. If the server only needs to *tell* clients things, you do not need a WebSocket, and SSE will save you a lot of operational pain.

	Polling	Long polling	SSE	WebSockets
Direction	Client pull	Client pull (held)	Server → client	Full duplex
Transport	Repeated HTTP	Held HTTP	One HTTP stream	Upgraded TCP
Reconnection	Trivial (next poll)	Re-issue request	Built-in (auto)	You build it
Proxy / firewall	Friendly	Friendly	Friendly (HTTP)	Can be blocked
Best for	Rare, cheap checks	Legacy / fallback	Feeds, notifications	Chat, collab, games

The same four techniques, compared on the dimensions that actually decide the call.

The default is smaller than you think

If your answer to 'does the client ever need to send data over the live channel?' is no, reach for SSE first. It rides on plain HTTP, reconnects itself, passes through proxies, and is a fraction of the code. Save WebSockets for genuine two-way traffic.

Code: a minimal SSE endpoint

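SSE is just an HTTP response with the content type text/event-stream that you never finish writing. Each event is one or more field lines (data:, plus optional event: and id:) followed by a blank line. The browser's EventSource reconnects on its own and, if your events carry an id, sends the last one it received in a Last-Event-ID header, so the server can resume from there.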

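```typescript
// sse-server.ts, Node http, no framework needed
import { createServer } from "node:http";

createServer((req, res) => {
  if (req.url !== "/events") {
    res.writeHead(404).end();
    return;
  }

  // Content-Type is what makes this an event stream; no-cache keeps
  // intermediaries from caching it, and keep-alive (the HTTP/1.1 default)
  // is commonly set explicitly.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  res.flushHeaders(); // send headers now instead of waiting for the first event

  // After a drop, EventSource sends the last id it saw as Last-Event-ID.
  // This demo just continues counting from it; a real feed would replay
  // any buffered events after that id.
  const lastId = Number(req.headers["last-event-id"]);
  let id = Number.isFinite(lastId) ? lastId : 0;

  const tick = setInterval(() => {
    id += 1;
    res.write(`id: ${id}\n`);
    res.write(`event: price\n`);
    res.write(`data: ${JSON.stringify({ symbol: "ACME", px: 100 + id })}\n\n`);
  }, 1000);

  // Comment lines (starting with ':') act as heartbeats to keep proxies open.
  const beat = setInterval(() => res.write(": ping\n\n"), 15000);

  // The response's 'close' event fires when the client disconnects.
  res.on("close", () => {
    clearInterval(tick);
    clearInterval(beat);
  });
}).listen(3001);
```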

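```javascript
// browser, EventSource reconnects on its own.
const es = new EventSource("/events");
es.addEventListener("price", (e) => {
  const { symbol, px } = JSON.parse(e.data);
  console.log(symbol, px);
});
es.onerror = () => {
  // CONNECTING: a network drop the browser is already retrying.
  // CLOSED: it gave up (e.g. a non-200 status or the wrong Content-Type).
  if (es.readyState === EventSource.CLOSED) console.log("gave up, not retrying");
  else console.log("dropped, EventSource will retry automatically");
};
```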
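The server above only sends single-line JSON, so each event needs exactly one data: line. If a payload can contain newlines, the event stream format requires each line to get its own data: prefix, and the browser rejoins them with a newline. A minimal formatter, as a sketch: formatSseEvent and the SseEvent shape are hypothetical names, and it assumes id and event values never contain newlines.

```typescript
// Serialize one Server-Sent Event (hypothetical helper).
// Assumes id and event contain no newlines; only data is split.
interface SseEvent {
  data: string;
  event?: string; // omitted -> the browser dispatches a plain "message" event
  id?: string | number; // echoed back as Last-Event-ID on reconnect
  retry?: number; // suggested reconnection delay in ms
}

function formatSseEvent({ data, event, id, retry }: SseEvent): string {
  let out = "";
  if (id !== undefined) out += `id: ${id}\n`;
  if (event !== undefined) out += `event: ${event}\n`;
  if (retry !== undefined) out += `retry: ${retry}\n`;
  // CRLF, LF, and CR all end a line in the stream, so split on any of them;
  // otherwise a stray newline in the payload would corrupt the framing.
  for (const line of data.split(/\r\n|\r|\n/)) out += `data: ${line}\n`;
  return out + "\n"; // the blank line terminates the event
}

console.log(JSON.stringify(formatSseEvent({ id: 7, event: "price", data: "line one\nline two" })));
// "id: 7\nevent: price\ndata: line one\ndata: line two\n\n"
```

With a helper like this, the three res.write calls in the server collapse into a single write of formatSseEvent({ id, event: "price", data: JSON.stringify(...) }).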

Code: a WebSocket server with a heartbeat

WebSockets give you two-way traffic, but the protocol will *not* tell you promptly when a client silently disappears (laptop lid closed, phone losing signal in a tunnel): no close frame is ever sent, and an idle TCP connection can sit half-open for a very long time. The fix is a ping/pong heartbeat: ping every client on a timer, and reap any that did not pong back. Without it you slowly accumulate dead sockets that eat memory and 'deliver' messages into the void.

```typescript
// ws-server.ts, using the 'ws' library
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 3002 });

// Sockets that answered since the last sweep. A WeakSet avoids hanging
// ad-hoc properties (and 'as any' casts) off the socket objects.
const alive = new WeakSet<object>();

wss.on("connection", (ws) => {
  // Mark alive on connect; each pong marks it alive again.
  alive.add(ws);
  ws.on("pong", () => alive.add(ws));

  ws.on("message", (raw) => {
    // Backpressure check: if the client can't keep up, don't pile on.
    if (ws.bufferedAmount > 1_000_000) return; // 1 MB queued, drop or slow down
    ws.send(`echo: ${raw}`);
  });
});

// The heartbeat sweep: ping everyone, kill the silent ones.
const sweep = setInterval(() => {
  for (const ws of wss.clients) {
    if (!alive.has(ws)) {
      ws.terminate(); // never ponged since the last sweep, it's gone
      continue;
    }
    alive.delete(ws);
    ws.ping();
  }
}, 30000);

wss.on("close", () => clearInterval(sweep));
```

```javascript
// browser, WebSockets do NOT auto-reconnect. You build it, with backoff.
function connect(attempt = 0) {
  const ws = new WebSocket("wss://api.example.com/ws");

  ws.onopen = () => {
    attempt = 0; // reset once we're healthy
  };

  ws.onmessage = (e) => console.log("recv", e.data);

  ws.onclose = () => {
    // Exponential backoff + jitter so a server restart doesn't get a thundering herd.
    const base = Math.min(1000 * 2 ** attempt, 30000);
    const delay = base / 2 + Math.random() * (base / 2);
    setTimeout(() => connect(attempt + 1), delay);
  };
}
connect();
```

Scaling stateful connections and the fan-out layer

Connections are state, and state is the hard part

A stateless REST box can be replaced or scaled freely because no request 'lives' anywhere. A WebSocket box is the opposite: every open socket is in-memory state pinned to that one process. Restart it and you drop every connection. This single fact drives every decision below.

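Sticky sessions keep each client glued to the server holding its session. An upgraded WebSocket is a single TCP connection, so it already stays on one box for its lifetime; affinity matters when a session spans several HTTP requests, such as the long-polling fallback some libraries (Socket.IO, for example) use before or instead of upgrading. See load balancing and autoscaling, explained.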
A pub/sub bus (Redis, NATS, or a managed broker) decouples senders from receivers. Servers publish to channels; whichever server holds the recipient is subscribed and delivers. This is what lets you run more than one box.
Connection limits are real. A single node handles tens of thousands of idle sockets, but each costs file descriptors and memory. Plan capacity in *connections*, not requests/sec, and set OS ulimits accordingly.
Autoscaling is harder because scaling *in* kills live connections. Drain gracefully: stop accepting new sockets, let clients reconnect elsewhere via backoff, then terminate.
Lighten each connection by leaning on caching strategies so the live channel only carries deltas, not full payloads your cache already holds.
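Planning in connections rather than requests/sec can be a few lines of arithmetic. A sketch, where every input is an assumption to replace with your own load-test numbers: nodesNeeded and fitsAfterLosingOneNode are hypothetical helpers, the 20,000-connections-per-node ceiling is illustrative, and the 70% utilization target is an arbitrary choice that leaves headroom.

```typescript
// Capacity planning in connections, not requests/sec.
// Every number here is an illustrative assumption: measure your own
// per-node ceiling (file descriptors, memory per socket) under load.
function nodesNeeded(
  peakConnections: number,
  perNodeCeiling: number, // connections one node sustained in your load test
  targetUtilization = 0.7, // headroom so nodes aren't run at their limit
): number {
  return Math.ceil(peakConnections / (perNodeCeiling * targetUtilization));
}

// When one node is drained, its clients reconnect onto the survivors.
function fitsAfterLosingOneNode(
  peakConnections: number,
  nodes: number,
  perNodeCeiling: number,
): boolean {
  return nodes > 1 && peakConnections / (nodes - 1) <= perNodeCeiling;
}

const nodes = nodesNeeded(100_000, 20_000); // ~14,000 usable per node
console.log(nodes); // 8
console.log(fitsAfterLosingOneNode(100_000, nodes, 20_000)); // true
```

The second check matters because of draining: when a node is scaled in, its clients reconnect onto the remaining nodes, which must absorb them without crossing their ceiling.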

Auth on long-lived connections

Normal APIs re-check your token on every request. A WebSocket is checked once, at the handshake, then it can stay open for hours. That gap is the classic mistake: a user's token is revoked, their plan is downgraded, they are kicked from a room, and yet their socket happily keeps streaming because nobody ever re-checks.

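Authenticate at the handshake with a short-lived token. Either pass it as a query parameter and validate it before accepting the upgrade, or send it as the first message and treat the socket as unauthenticated (no subscriptions, no deliveries) until it checks out.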
Re-authorize on the actions that matter. Before delivering to a room, confirm the user is still a member; do not trust the membership snapshot from connect time.
Expire the connection when the token would have expired. Close the socket and force a reconnect with a fresh token rather than trusting it forever.
Revocation needs a push too. When you ban a user, publish a 'kick' event on the bus so whichever server holds their socket closes it immediately.
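The four points above can be sketched for a single connection using only node:crypto. This is a sketch under stated assumptions, not a drop-in auth layer: the token format (a base64url JSON payload plus an HMAC-SHA256 signature), the SECRET value, AuthedConnection, and the isMember lookup are all hypothetical, and a real system would normally reuse its existing JWT or session library. The comments map each piece to the points in order.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical dev-only signing key; load a real one from config.
const SECRET = "dev-only-secret";

interface Claims {
  userId: string;
  exp: number; // expiry, ms since the Unix epoch
}

function sign(claims: Claims): string {
  const body = Buffer.from(JSON.stringify(claims)).toString("base64url");
  const sig = createHmac("sha256", SECRET).update(body).digest("base64url");
  return `${body}.${sig}`;
}

// Point 1: check signature and expiry before accepting the upgrade.
function verify(token: string, now: number): Claims | null {
  const [body, sig] = token.split(".");
  if (!body || !sig) return null;
  const expected = createHmac("sha256", SECRET).update(body).digest("base64url");
  const given = Buffer.from(sig);
  const want = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== want.length || !timingSafeEqual(given, want)) return null;
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as Claims;
  return claims.exp > now ? claims : null;
}

class AuthedConnection {
  readonly claims: Claims;
  closed = false;

  constructor(claims: Claims) {
    this.claims = claims;
  }

  // Point 2: re-check membership at delivery time, not at connect time.
  canDeliver(room: string, isMember: (userId: string, room: string) => boolean): boolean {
    return !this.closed && isMember(this.claims.userId, room);
  }

  // Point 3: how long the server should wait before closing the socket itself.
  msUntilExpiry(now: number): number {
    return Math.max(0, this.claims.exp - now);
  }

  // Point 4: a 'kick' event arriving over the bus closes the socket at once.
  onKick(userId: string): void {
    if (userId === this.claims.userId) this.closed = true;
  }
}

const now = Date.now();
const token = sign({ userId: "alice", exp: now + 5 * 60_000 }); // 5-minute token
const claims = verify(token, now);
if (claims) {
  const conn = new AuthedConnection(claims);
  // With the 'ws' library you might then schedule, for example:
  // setTimeout(() => ws.close(4001, "token expired"), conn.msUntilExpiry(Date.now()));
  console.log(conn.msUntilExpiry(now)); // 300000
}
```

The key design choice is that canDeliver consults the membership lookup on every delivery, so a user removed from a room stops receiving its messages immediately, even though their socket stays open until the token expires or a kick arrives.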

Where managed services fit

Everything above (sticky routing, the pub/sub bus, heartbeats, reconnection, presence, scaling) is genuinely hard to run well. Managed realtime services exist precisely so you do not have to. They terminate the connections, fan out the messages, and hand you an SDK; you publish and subscribe.

Option	What it gives you	Trade-off
Pusher / Ably	Channels, presence, history, SDKs	Per-message/connection pricing; vendor lock-in
API Gateway WebSockets	Serverless WS, no servers to run	Routes to Lambda; cold starts; AWS-shaped
Self-hosted (ws + Redis)	Full control, no per-msg cost	You own scaling, ops, and the 3am pages

Roughly where the common managed options sit.

Buy time, build leverage

Reach for a managed service when realtime is a feature rather than your product, or when you need it shipped this quarter. Build it yourself when scale makes the per-message bill hurt, or when the realtime layer *is* the differentiator. Either way, understand the parts: the abstractions leak under load.

Common mistakes that cost hours

1. No heartbeat. Dead clients linger as zombie sockets, leaking memory and silently swallowing the messages you 'send' them. Ping/pong and reap.
2. No backpressure handling. A slow client's outbound buffer grows without bound until the server OOMs. Check bufferedAmount (or your library's equivalent) and shed or slow down.
3. Auth only at the handshake. A revoked user keeps streaming for hours. Re-authorize on sensitive actions and expire the connection with the token.
4. Assuming one server. It works beautifully on your laptop and breaks the instant you scale to two boxes because there is no shared bus. Design for fan-out from day one.
5. Using WebSockets for one-way data. If the client never sends, you took on duplex complexity for nothing. SSE would have reconnected itself and sailed through proxies.

Takeaways and where to go next

The whole article in seven lines

HTTP can't start the conversation; every realtime technique works around that.
Order of power and cost: polling → long polling → SSE → WebSockets.
One-way (feeds, notifications)? Use SSE. Two-way (chat, collab, games)? Use WebSockets.
Connections are in-memory state pinned to one box: that's what makes scaling hard.
Two or more boxes need sticky sessions + a pub/sub bus (Redis) to fan messages out.
Always add a heartbeat, backpressure handling, and client-side reconnect with backoff.
Re-check auth after the handshake; a token validated once can outlive its permissions.

Realtime sits on top of the same fundamentals as everything else you scale. Once your fan-out works, the next questions are about capacity and failure, which is where the broader scaling playbook comes in.

Get the routing layer right: load balancing and autoscaling, explained.
Keep the live channel thin: caching strategies.
Zoom out to the bigger picture: scalability principles.
Practice the networking underneath it all in the networking lab.


Check your understanding

1. What single limitation of plain HTTP does every realtime technique in the article work around?

2. When does the article say you should reach for SSE rather than WebSockets?

Frequently asked questions

Why can't plain HTTP push updates to clients on its own?

Plain HTTP has one rule: the client asks, the server answers, and the exchange is over. The server has no way to tap a client on the shoulder and say 'something changed', so every realtime technique is a different way of working around that limitation.

When should I choose SSE instead of WebSockets?

If the server only needs to tell clients things and the client never needs to send data over the live channel, reach for SSE first. It rides on plain HTTP, reconnects itself, and passes through proxies, which saves a lot of operational pain compared to a WebSocket.

What breaks when I run WebSockets across more than one server?

A connection lives on one specific box, so if Alice is on server A and Bob is on server B, server A has no idea Bob exists. You need sticky sessions to keep each client glued to its server, and a shared pub/sub bus, usually Redis, so any server can fan a message out to whichever server holds the recipient.

What are the four ways to do realtime, from cheapest to most powerful?

Polling asks over and over, long polling asks and lets the server hold the line until it has news, Server-Sent Events open a one-way pipe the server streams down, and WebSockets open a two-way pipe both sides talk over. The trade-off is that more power costs more held-open state on your server.


Want to go deeper?

This article covers concepts taught hands-on in the Cloud Engineer and DevOps career paths, with real terminal labs, production scenarios, and structured lessons.


