How to push data to clients without hammering your server. The progression from polling to WebSockets, when each one is the right tool, and how to scale long-lived connections without falling over.
You can build a normal request/response API, but now the product wants live updates: a chat that shows typing in real time, a dashboard that ticks, a 'your order is ready' toast. You have heard the words **WebSocket** and **SSE**, and you are not sure which one you need or what happens when you have ten thousand of them open at once. This is the map.
Plain HTTP has one rule that shapes everything else: the client asks, the server answers, and then the exchange is done. The server has no way to tap a client on the shoulder and say 'hey, something changed.' For a blog or a checkout flow that is fine: the user clicks, you respond. But the moment you need the *server* to start the conversation, you are fighting the grain of the protocol, and every realtime technique below is a different way of working around that one limitation.
There are really only four moves. You can ask over and over (polling), you can ask and let the server hold the line until it has news (long polling), you can open a one-way pipe the server streams down (Server-Sent Events), or you can open a two-way pipe both sides talk over (WebSockets). Picking the wrong one does not usually cause a crash; it causes a slow bleed of cost, latency, and 3am pages. So let us make it concrete first.
A mental model: mailbox, held call, intercom, phone
Realtime is just deciding who is allowed to start talking, and how long the line stays open.
| Analogy | Technique | What happens |
|---|---|---|
| Walking to the mailbox every 5 minutes to check for post | Polling | Client re-requests on a timer, mostly to hear 'nothing yet' |
| Calling and the operator says 'hold the line, I'll tell you the moment it arrives' | Long polling | Request hangs open until there is news, then reconnects |
| A one-way intercom in a flat: the lobby buzzes you, you cannot buzz back | SSE | Server streams events down a single open channel, one direction |
| A phone call: either person can speak at any time | WebSocket | Full-duplex, both sides push whenever they like |
Four ways to find out something changed, from most wasteful to most powerful.
Notice the trade running through the list: more power costs more held-open state. A mailbox check is stateless and cheap, but it is mostly wasted trips. A phone call is instant in both directions, but someone has to keep the line up the whole time, and that someone is your server.
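To make the 'held call' concrete, here is a minimal long-polling sketch, not a production implementation. The names `LongPollHub`, `wait`, and `publish` are hypothetical. It assumes your request handler parks each incoming request with `wait`, and some other code path calls `publish` when there is news.

```typescript
// Hypothetical helper: parks long-poll requests until news arrives or a timeout fires.
type Waiter<T> = {
  done: (value: T | null) => void;
  timer: ReturnType<typeof setTimeout>;
};

class LongPollHub<T> {
  private waiters: Waiter<T>[] = [];

  // Park one request. `done` receives the news, or null if the timeout won.
  wait(timeoutMs: number, done: (value: T | null) => void): void {
    const waiter: Waiter<T> = {
      done,
      timer: setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== waiter);
        done(null); // nothing happened; the client simply asks again
      }, timeoutMs),
    };
    this.waiters.push(waiter);
  }

  // Answer every parked request at once. Returns how many were released.
  publish(value: T): number {
    const parked = this.waiters;
    this.waiters = [];
    parked.forEach((w) => {
      clearTimeout(w.timer);
      w.done(value);
    });
    return parked.length;
  }
}
```

In a real handler, `done` would write the HTTP response and end it, after which the client immediately issues the next request. That re-request loop is the 'then reconnects' part of the table above.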
What a scaled WebSocket layer actually looks like
The single-server demo is easy. The moment you run two or more servers, a hard truth shows up: a connection lives on one specific box. If Alice is connected to server A and Bob is on server B, and Alice sends a message for Bob, server A has no idea Bob even exists. You need two things: a way to keep each client glued to its server (sticky sessions), and a shared bus so any server can fan a message out to whichever server holds the recipient (pub/sub, usually Redis).
Clients stick to a WS server via the load balancer; servers fan messages out through a Redis pub/sub bus so any server can reach any connection.
1. Client connects. Browser A opens a WebSocket. The load balancer routes it to WS Server 1 and pins the connection there (affinity) so every frame keeps landing on the same box.
2. A message arrives. Browser A sends a chat message for Browser B. WS Server 1 receives it but does not hold B's socket.
3. Publish to the bus. Server 1 publishes the message to a Redis channel (e.g. 'room:42') instead of trying to find B itself.
4. Fan-out. Every WS server is subscribed to the relevant channels. Redis pushes the message to all of them; the one holding B (Server 2) acts on it.
5. Deliver. Server 2 writes the frame down B's socket. Neither server needed to know the cluster's full topology; the bus decoupled them.
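The five steps can be sketched in a few dozen lines. This is an illustration under loud assumptions: `InMemoryBus` is an in-process stand-in for Redis pub/sub (a real deployment would use a Redis or NATS client), and `WsNode`, `attach`, and `onClientMessage` are hypothetical names, with each 'socket' reduced to a function that writes a frame.

```typescript
// In-memory stand-in for the pub/sub bus. It only mimics publish/subscribe.
type BusHandler = (message: string) => void;

class InMemoryBus {
  private channels = new Map<string, BusHandler[]>();

  subscribe(channel: string, handler: BusHandler): void {
    const handlers = this.channels.get(channel) ?? [];
    handlers.push(handler);
    this.channels.set(channel, handlers);
  }

  publish(channel: string, message: string): void {
    (this.channels.get(channel) ?? []).forEach((handler) => handler(message));
  }
}

interface Envelope {
  to: string;   // recipient user id
  body: string; // frame to write down the recipient's socket
}

// One WS server process. `sockets` maps a user id to a frame writer.
class WsNode {
  private sockets = new Map<string, (frame: string) => void>();
  private bus: InMemoryBus;

  constructor(bus: InMemoryBus, channel: string) {
    this.bus = bus;
    // Step 4: every node subscribes to the channels it cares about.
    bus.subscribe(channel, (raw) => this.deliverIfLocal(raw));
  }

  // Step 1: the load balancer pinned this user's connection to this node.
  attach(userId: string, send: (frame: string) => void): void {
    this.sockets.set(userId, send);
  }

  // Steps 2-3: a client message arrives; publish it instead of hunting for the recipient.
  onClientMessage(channel: string, envelope: Envelope): void {
    this.bus.publish(channel, JSON.stringify(envelope));
  }

  // Step 5: only the node that holds the recipient's socket writes the frame.
  private deliverIfLocal(raw: string): void {
    const envelope = JSON.parse(raw) as Envelope;
    const send = this.sockets.get(envelope.to);
    if (send) send(envelope.body);
  }
}

const bus = new InMemoryBus();
const server1 = new WsNode(bus, "room:42");
const server2 = new WsNode(bus, "room:42");
const inboxA: string[] = [];
const inboxB: string[] = [];
server1.attach("A", (frame) => { inboxA.push(frame); });
server2.attach("B", (frame) => { inboxB.push(frame); });
server1.onClientMessage("room:42", { to: "B", body: "hi B" });
console.log(inboxB, inboxA);
```

After the last call, `inboxB` holds the message and `inboxA` stays empty, even though Server 1 never looked up where B lives.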
Choosing: a side-by-side comparison
Before any code, pick the simplest tool that covers your direction of data flow. If the server only needs to *tell* clients things, you do not need a WebSocket, and SSE will save you a lot of operational pain.
| | Polling | Long polling | SSE | WebSockets |
|---|---|---|---|---|
| Direction | Client pull | Client pull (held) | Server → client | Full duplex |
| Transport | Repeated HTTP | Held HTTP | One HTTP stream | Upgraded TCP |
| Reconnection | Trivial (next poll) | Re-issue request | Built-in (auto) | You build it |
| Proxy / firewall | Friendly | Friendly | Friendly (HTTP) | Can be blocked |
| Best for | Rare, cheap checks | Legacy / fallback | Feeds, notifications | Chat, collab, games |
The same four techniques, compared on the dimensions that actually decide the call.
The default is smaller than you think
If your answer to 'does the client ever need to send data over the live channel?' is no, reach for **SSE** first. It rides on plain HTTP, reconnects itself, passes through proxies, and is a fraction of the code. Save WebSockets for genuine two-way traffic.
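That rule of thumb is small enough to write down as a function. This is only a sketch of the article's default, with hypothetical names (`LiveNeeds`, `chooseTransport`). It deliberately leaves out long polling, which the comparison treats as a legacy fallback rather than a first choice.

```typescript
type Transport = "polling" | "sse" | "websocket";

interface LiveNeeds {
  // Must the server deliver updates without being asked?
  serverPushes: boolean;
  // Does the client ever send data over the live channel itself?
  clientSendsOnLiveChannel: boolean;
}

function chooseTransport(needs: LiveNeeds): Transport {
  if (needs.clientSendsOnLiveChannel) return "websocket"; // genuine two-way traffic
  if (needs.serverPushes) return "sse";                   // one-way: server tells, client listens
  return "polling";                                       // rare, cheap checks
}

// A notification feed: the server pushes, the client only listens.
console.log(chooseTransport({ serverPushes: true, clientSendsOnLiveChannel: false }));
```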
Code: a minimal SSE endpoint
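SSE is just an HTTP response with the content type text/event-stream that you never finish writing. Each event is one or more field lines (data:, plus optional id: and event:) terminated by a blank line. The browser's EventSource reconnects on its own and sends the last id it saw back in a Last-Event-ID request header, so the server can resume from that point if it keeps enough history to replay.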
typescript
// sse-server.ts, Node http, no framework needed
import { createServer } from "node:http";
createServer((req, res) => {
  if (req.url !== "/events") {
    res.writeHead(404).end();
    return;
  }
  // The three headers that make it a stream, not a response.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  let id = 0;
  const tick = setInterval(() => {
    id += 1;
    // id: lets the browser resume after a drop via Last-Event-ID.
    res.write(`id: ${id}\n`);
    res.write(`event: price\n`);
    res.write(`data: ${JSON.stringify({ symbol: "ACME", px: 100 + id })}\n\n`);
  }, 1000);
  // Comment lines (starting with ':') act as heartbeats to keep proxies open.
  const beat = setInterval(() => res.write(": ping\n\n"), 15000);
  req.on("close", () => {
    clearInterval(tick);
    clearInterval(beat);
  });
}).listen(3001);
javascript
// browser, EventSource reconnects on its own.
const es = new EventSource("/events");
es.addEventListener("price", (e) => {
  const { symbol, px } = JSON.parse(e.data);
  console.log(symbol, px);
});
es.onerror = () => console.log("dropped, EventSource will retry automatically");
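The server above emits ids but never reads Last-Event-ID, so a reconnecting client simply starts from the next tick. If you want real resumption, the server has to keep some history and replay it. Here is a minimal sketch of that piece, assuming an in-memory buffer of recent events. `StoredEvent`, `formatSseEvent`, and `replayFrom` are hypothetical names, not part of any library.

```typescript
interface StoredEvent {
  id: number;
  event: string;
  data: string;
}

// Serialize one event in the text/event-stream wire format.
// Multi-line payloads need one "data:" line per line of text.
function formatSseEvent(e: StoredEvent): string {
  const dataLines = e.data
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  return `id: ${e.id}\nevent: ${e.event}\n${dataLines}\n\n`;
}

// Given the Last-Event-ID header from a reconnecting client, return the
// buffered events it missed. A missing or non-numeric header means a fresh
// client, so there is nothing to replay.
function replayFrom(buffer: StoredEvent[], lastEventIdHeader: string | undefined): StoredEvent[] {
  const lastId = parseInt(lastEventIdHeader ?? "", 10);
  if (isNaN(lastId)) return [];
  return buffer.filter((e) => e.id > lastId);
}

// Example: the client last saw id 2 before the connection dropped.
const recent: StoredEvent[] = [1, 2, 3, 4].map((id) => ({
  id,
  event: "price",
  data: JSON.stringify({ symbol: "ACME", px: 100 + id }),
}));
const missed = replayFrom(recent, "2");
console.log(missed.map(formatSseEvent).join(""));
```

In the Node handler you would read the header as `req.headers["last-event-id"]`, write the replayed frames first, and then continue with live events. How much history to keep is a product decision, since the buffer costs memory on every server.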
Code: a WebSocket server with a heartbeat
WebSockets give you two-way traffic, but the protocol will *not* tell you when a client silently disappears (a laptop lid closes, a phone goes into a tunnel). The fix is a ping/pong heartbeat: ping every client on a timer, and reap any that did not pong back since the previous sweep. Without it you slowly accumulate dead sockets that eat memory and 'deliver' messages into the void.
typescript
// ws-server.ts, using the 'ws' library
import { WebSocketServer } from "ws";
const wss = new WebSocketServer({ port: 3002 });
wss.on("connection", (ws) => {
  // Mark alive; each pong flips it back to true.
  (ws as any).isAlive = true;
  ws.on("pong", () => ((ws as any).isAlive = true));
  ws.on("message", (raw) => {
    // Backpressure check: if the client can't keep up, don't pile on.
    if (ws.bufferedAmount > 1_000_000) return; // 1 MB queued, drop or slow down
    ws.send(`echo: ${raw}`);
  });
});
// The heartbeat sweep: ping everyone, kill the silent ones.
const sweep = setInterval(() => {
  for (const ws of wss.clients) {
    if (!(ws as any).isAlive) {
      ws.terminate(); // never ponged, it's gone
      continue;
    }
    (ws as any).isAlive = false;
    ws.ping();
  }
}, 30000);
wss.on("close", () => clearInterval(sweep));
javascript
// browser, WebSockets do NOT auto-reconnect. You build it, with backoff.
function connect(attempt = 0) {
  const ws = new WebSocket("wss://api.example.com/ws");
  ws.onopen = () => {
    attempt = 0; // reset once we're healthy
  };
  ws.onmessage = (e) => console.log("recv", e.data);
  ws.onclose = () => {
    // Exponential backoff + jitter so a server restart doesn't get a thundering herd.
    const base = Math.min(1000 * 2 ** attempt, 30000);
    const delay = base / 2 + Math.random() * (base / 2);
    setTimeout(() => connect(attempt + 1), delay);
  };
}
connect();
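It helps to see what those two backoff lines actually produce. The sketch below lifts the same formula into a pure function so the numbers can be checked. `backoffDelay` is a hypothetical name, and the injectable `random` parameter exists only to make the result deterministic for the example.

```typescript
// Same formula as the client above: exponential base capped at 30 s,
// then a delay drawn from the upper half of that base.
function backoffDelay(attempt: number, random: () => number = Math.random): number {
  const base = Math.min(1000 * 2 ** attempt, 30000);
  return base / 2 + random() * (base / 2);
}

// Lower and upper bounds per attempt (random() is always below 1,
// so the upper bound is approached but never reached).
for (const attempt of [0, 1, 2, 3, 4, 5]) {
  const low = backoffDelay(attempt, () => 0);
  const high = backoffDelay(attempt, () => 1);
  console.log(`attempt ${attempt}: ${low} ms to just under ${high} ms`);
}
```

The first retry waits between 500 ms and 1 s, the fourth between 4 s and 8 s, and from the sixth onward the cap holds every retry between 15 s and 30 s. The jitter is what spreads a crowd of clients across that window instead of having them all return on the same tick.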
Scaling stateful connections and the fan-out layer
Connections are state, and state is the hard part
A stateless REST box can be replaced or scaled freely because no request 'lives' anywhere. A WebSocket box is the opposite: every open socket is in-memory state pinned to that one process. Restart it and you drop every connection. This single fact drives every decision below.
Sticky sessions keep each client glued to the server holding its socket. WebSockets need this because the connection cannot hop boxes mid-stream. See load balancing and autoscaling, explained.
A pub/sub bus (Redis, NATS, or a managed broker) decouples senders from receivers. Servers publish to channels; whichever server holds the recipient is subscribed and delivers. This is what lets you run more than one box.
Connection limits are real. A single node can typically hold tens of thousands of idle sockets, but each one costs a file descriptor and memory. Plan capacity in *connections*, not requests/sec, and raise OS limits (such as ulimit -n) accordingly.
Autoscaling is harder because scaling *in* kills live connections. Drain gracefully: stop accepting new sockets, let clients reconnect elsewhere via backoff, then terminate.
Lighten each connection by leaning on caching strategies so the live channel only carries deltas, not full payloads your cache already holds.
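The drain step from the list above is worth sketching, because it is the part autoscalers get wrong by default. This is a minimal illustration with hypothetical names (`ConnectionPool`, `tryAccept`, `drain`) and a socket reduced to a `close` method. With a real server you would call it from your shutdown signal handler before the process exits.

```typescript
interface DrainableSocket {
  close(code: number, reason: string): void;
}

class ConnectionPool {
  private accepting = true;
  private open: DrainableSocket[] = [];

  // Returns false once draining has begun, so the caller can refuse the upgrade
  // and let the load balancer send the client to another box.
  tryAccept(socket: DrainableSocket): boolean {
    if (!this.accepting) return false;
    this.open.push(socket);
    return true;
  }

  // Call when a socket closes on its own.
  release(socket: DrainableSocket): void {
    this.open = this.open.filter((s) => s !== socket);
  }

  // Stop accepting, then ask every client to leave. 1001 is the standard
  // "going away" close code; clients reconnect elsewhere via their backoff.
  drain(): number {
    this.accepting = false;
    const closing = this.open;
    this.open = [];
    closing.forEach((s) => s.close(1001, "server draining"));
    return closing.length;
  }
}
```

On a box holding many thousands of sockets you would probably close them in batches over several seconds rather than all at once, so the rest of the fleet does not absorb every reconnect in the same instant.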
Auth on long-lived connections
Normal APIs re-check your token on every request. A WebSocket is checked once, at the handshake, then it can stay open for hours. That gap is the classic mistake: a user's token is revoked, their plan is downgraded, they are kicked from a room, and yet their socket happily keeps streaming because nobody ever re-checks.
Authenticate at the handshake, but with a short-lived token. Either pass it as a query param and validate it before accepting the upgrade, or send it as the first message and validate it before processing anything else.
Re-authorize on the actions that matter. Before delivering to a room, confirm the user is still a member; do not trust the membership snapshot from connect time.
Expire the connection when the token would have expired. Close the socket and force a reconnect with a fresh token rather than trusting it forever.
Revocation needs a push too. When you ban a user, publish a 'kick' event on the bus so whichever server holds their socket closes it immediately.
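The last two points, expiry and revocation, can share one small registry. The sketch below is an assumption-heavy illustration: `SessionRegistry`, `sweepExpired`, and `kick` are hypothetical names, the close codes are app-defined values picked from the 4000-4999 private-use range, and the socket is reduced to a `close` method.

```typescript
interface AuthedSocket {
  close(code: number, reason: string): void;
}

interface Session {
  userId: string;
  expiresAtMs: number; // expiry of the token presented at the handshake
  socket: AuthedSocket;
}

// Hypothetical app-defined close codes (4000-4999 is reserved for private use).
const CLOSE_TOKEN_EXPIRED = 4001;
const CLOSE_KICKED = 4003;

class SessionRegistry {
  private sessions: Session[] = [];

  add(session: Session): void {
    this.sessions.push(session);
  }

  // Run on a timer: close sockets whose token would have expired by now,
  // forcing the client to reconnect with a fresh one.
  sweepExpired(nowMs: number): number {
    return this.closeWhere((s) => s.expiresAtMs <= nowMs, CLOSE_TOKEN_EXPIRED, "token expired");
  }

  // Call from the bus subscriber when a 'kick' event names this user.
  kick(userId: string): number {
    return this.closeWhere((s) => s.userId === userId, CLOSE_KICKED, "access revoked");
  }

  private closeWhere(match: (s: Session) => boolean, code: number, reason: string): number {
    const doomed = this.sessions.filter(match);
    this.sessions = this.sessions.filter((s) => !match(s));
    doomed.forEach((s) => s.socket.close(code, reason));
    return doomed.length;
  }
}
```

Every server runs its own registry and subscribes to the kick channel, so a ban published once reaches whichever box holds the socket. A distinct close code lets the client tell 'fetch a new token and reconnect' apart from 'do not come back'.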
Where managed services fit
Everything above (sticky routing, the pub/sub bus, heartbeats, reconnection, presence, scaling) is genuinely hard to run well. Managed realtime services exist precisely so you do not have to. They terminate the connections, fan out the messages, and hand you an SDK; you publish and subscribe.
| Option | What it gives you | Trade-off |
|---|---|---|
| Pusher / Ably | Channels, presence, history, SDKs | Per-message/connection pricing; vendor lock-in |
| API Gateway WebSockets | Serverless WS, no servers to run | Routes to Lambda; cold starts; AWS-shaped |
| Self-hosted (ws + Redis) | Full control, no per-msg cost | You own scaling, ops, and the 3am pages |
Roughly where the common managed options sit.
Buy time, build leverage
Reach for a managed service when realtime is a feature, not your product, or when you need it shipped this quarter. Build it yourself when scale makes the per-message bill hurt, or when the realtime layer *is* the differentiator. Either way, understand the parts, because the abstractions leak under load.
Common mistakes that cost hours
No heartbeat. Dead clients linger as zombie sockets, leaking memory and silently swallowing the messages you 'send' them. Ping/pong and reap.
No backpressure handling. A slow client's outbound buffer grows without bound until the server OOMs. Check bufferedAmount (or your library's equivalent) and shed or slow down.
Auth only at the handshake. A revoked user keeps streaming for hours. Re-authorize on sensitive actions and expire the connection with the token.
Assuming one server. It works beautifully on your laptop and breaks the instant you scale to two boxes because there is no shared bus. Design for fan-out from day one.
Using WebSockets for one-way data. If the client never sends, you took on duplex complexity for nothing. SSE would have reconnected itself and sailed through proxies.
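For the backpressure mistake in particular, 'shed or slow down' is easier to reason about as an explicit policy. The sketch below is one possible policy, not the only one. `sendWithBackpressure` is a hypothetical helper, both thresholds are made-up numbers to tune, and the close code is an app-defined value. The only thing it relies on from a real socket is `bufferedAmount`, the count of bytes queued but not yet flushed.

```typescript
interface OutboundSocket {
  bufferedAmount: number; // bytes queued but not yet flushed to the network
  send(data: string): void;
  close(code: number, reason: string): void;
}

type SendOutcome = "sent" | "dropped" | "closed";

// Hypothetical thresholds; tune them to your message sizes and memory budget.
const SOFT_LIMIT_BYTES = 1000000; // above this, shed the message
const HARD_LIMIT_BYTES = 8000000; // above this, give up on the client

function sendWithBackpressure(socket: OutboundSocket, data: string): SendOutcome {
  if (socket.bufferedAmount > HARD_LIMIT_BYTES) {
    // The client is hopelessly behind; free the memory and let it reconnect.
    socket.close(4008, "client too slow"); // hypothetical app-defined code
    return "closed";
  }
  if (socket.bufferedAmount > SOFT_LIMIT_BYTES) {
    return "dropped"; // skip this message rather than grow the queue
  }
  socket.send(data);
  return "sent";
}
```

Dropping suits data where only the latest value matters, such as prices or cursor positions. For messages that must not be lost, such as chat, you would more likely close the socket early and rely on the client to reconnect and catch up from stored history.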
Takeaways and where to go next
The whole article in seven lines
Over plain HTTP the server can't start the conversation; every realtime technique works around that.
Order of power and cost: polling → long polling → SSE → WebSockets.
One-way (feeds, notifications)? Use SSE. Two-way (chat, collab, games)? Use WebSockets.
Connections are in-memory state pinned to one box: that's what makes scaling hard.
Two boxes need sticky sessions + a pub/sub bus (Redis) to fan messages out.
Always add a heartbeat, backpressure handling, and client-side reconnect with backoff.
Re-check auth after the handshake; a token validated once can outlive its permissions.
Realtime sits on top of the same fundamentals as everything else you scale. Once your fan-out works, the next questions are about capacity and failure, which is where the broader scaling playbook comes in.