You’re building a notification bell. The product team wants it to feel “real-time.” You reach for WebSockets because that’s what every blog post tells you. Six months later, you’re debugging a connection manager that handles reconnections, heartbeats, load balancer stickiness, and auth token refresh - all for a feature where a 5-second delay would have been perfectly fine.

The problem isn’t picking the wrong tool. It’s not understanding the tradeoffs before picking.

Short Polling

The simplest approach. Your client asks the server for updates on a fixed interval.

setInterval(async () => {
  const res = await fetch('/api/notifications');
  const data = await res.json();
  updateUI(data);
}, 5000); // every 5 seconds; if a request outlasts the interval, calls can overlap

The server processes each request independently. No state, no open connections, no special infrastructure.

What happens under the hood:

  1. Client sends HTTP request
  2. Server checks for new data
  3. Server responds immediately (even if nothing changed)
  4. Client waits, then repeats

When it works well:

  • Dashboards that refresh every 30-60 seconds
  • Checking the status of a background job
  • Weather or stock tickers with acceptable staleness
  • Low user counts where wasted requests don’t matter

The cost you pay:

Most responses return nothing new. If you have 10,000 users polling every 5 seconds, that’s 2,000 requests/second hitting your server - and 90%+ of them get back an empty response. You’re paying for compute, bandwidth, and database reads that produce no value.

Users     Interval   Requests/sec   % Useful (estimate)
1,000     5s         200            ~5%
10,000    5s         2,000          ~5%
100,000   5s         20,000         ~5%
100,000   30s        3,333          ~15%

Increasing the interval reduces load but increases latency. That’s the fundamental tension with polling.
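The request-rate column is simple arithmetic you can sanity-check yourself (the "% useful" figures are illustrative assumptions, not measurements):

```javascript
// Steady-state request rate for N clients each polling every `intervalSec`.
function pollingLoad(users, intervalSec) {
  return users / intervalSec; // requests per second
}

pollingLoad(10000, 5);   // 2000 requests/sec
pollingLoad(100000, 30); // ~3333 requests/sec
```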

Long Polling

Long polling flips the model. Instead of the server responding immediately with “nothing new,” it holds the connection open until there’s actually something to send.

async function subscribe() {
  try {
    const res = await fetch('/api/notifications/subscribe');
    // 204 means the server timed out with nothing new - just reconnect
    if (res.status !== 204) {
      const data = await res.json();
      updateUI(data);
    }
  } catch (err) {
    // network error - wait briefly, then retry
    await new Promise(r => setTimeout(r, 1000));
  }
  // immediately reconnect for the next update
  subscribe();
}

subscribe();

On the server side (Node.js example):

app.get('/api/notifications/subscribe', async (req, res) => {
  const userId = req.user.id;

  // wait up to 30 seconds for new data
  const data = await waitForUpdate(userId, { timeout: 30000 });

  if (data) {
    res.json(data);
  } else {
    // timeout - respond empty so client reconnects
    res.status(204).end();
  }
});

What happens under the hood:

  1. Client sends HTTP request
  2. Server holds the connection open
  3. When new data exists (or timeout hits), server responds
  4. Client immediately sends a new request

Why this is better than short polling:

The server only responds when there’s something worth sending. No wasted empty responses. The client gets updates almost instantly - latency is limited only by how fast the server detects the change and flushes the response.

The catch:

Each held connection consumes a server thread or socket. With 10,000 users, you have 10,000 open HTTP connections sitting idle. Traditional thread-per-request servers (older Java, PHP) struggle here. Event-loop servers (Node.js, Go, Nginx) handle it better, but it’s still a resource you’re holding.

Load balancers also need attention. If a long-poll request is sitting open for 30 seconds, your LB timeout must be longer than that. And if a server restarts, every connected client reconnects simultaneously - a thundering herd problem.
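The standard mitigation for that thundering herd is to add jitter to client reconnect delays, so a restarted server sees reconnections spread over a window rather than all in the same second. A sketch of "full jitter" exponential backoff:

```javascript
// Delay before reconnect attempt n: a random point in [0, min(cap, base * 2^n)].
// The randomness is what spreads simultaneous reconnects apart.
function backoffDelay(attempt, base = 1000, cap = 30000) {
  const window = Math.min(cap, base * 2 ** attempt);
  return Math.random() * window;
}

// Usage in the long-poll client's catch block (attempt resets on success):
// await new Promise(r => setTimeout(r, backoffDelay(attempt++)));
```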

When it works well:

  • Chat applications at moderate scale
  • Notification systems where near-instant delivery matters
  • Collaborative editing (Google Docs has historically relied on long polling)
  • When you can’t use WebSockets (corporate proxies, restrictive firewalls)

WebSockets

WebSockets establish a persistent, full-duplex connection between client and server. Both sides can send messages at any time without the overhead of HTTP headers on every exchange.

function connectWebSocket() {
  const ws = new WebSocket('wss://api.example.com/ws');

  ws.onopen = () => {
    ws.send(JSON.stringify({ type: 'subscribe', channel: 'notifications' }));
  };

  ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    updateUI(data);
  };

  ws.onclose = () => {
    // reconnect after a delay
    setTimeout(connectWebSocket, 2000);
  };
}

connectWebSocket();

Server side (Node.js with ws):

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws, req) => {
  const userId = authenticate(req);

  ws.on('message', (msg) => {
    // msg arrives as a Buffer - convert before parsing
    const parsed = JSON.parse(msg.toString());
    if (parsed.type === 'subscribe') {
      addToChannel(userId, parsed.channel, ws);
    }
  });

  ws.on('close', () => {
    removeFromAllChannels(userId);
  });
});

// broadcast to all subscribers of a channel
function broadcast(channel, data) {
  const subscribers = getChannelSubscribers(channel);
  const payload = JSON.stringify(data);
  subscribers.forEach(ws => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(payload);
    }
  });
}

What happens under the hood:

  1. Client sends HTTP upgrade request
  2. Server accepts, connection upgrades from HTTP to WebSocket
  3. Both sides can send messages freely over TCP
  4. Connection stays open until either side closes it

Where WebSockets shine:

  • Multiplayer games (every millisecond matters)
  • Live trading platforms
  • Chat at scale (Slack, Discord)
  • Collaborative real-time editing
  • Live sports scores or auction bidding

The complexity you take on:

WebSockets are stateful. That changes everything.

Connection management. You need heartbeats (ping/pong) to detect dead connections. You need reconnection logic with exponential backoff. You need to handle auth token expiry on long-lived connections.
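With the ws library, the server side of that heartbeat is typically built on the protocol's ping/pong frames: mark a socket alive whenever it answers a ping, and terminate any socket that missed one. A sketch (it works against anything with the same clients/event shape as a ws server):

```javascript
// Ping every socket periodically; a socket that never answered the
// previous ping with a pong is considered dead and terminated.
function startHeartbeat(wss, intervalMs = 30000) {
  wss.on('connection', (ws) => {
    ws.isAlive = true;
    ws.on('pong', () => { ws.isAlive = true; });
  });

  const timer = setInterval(() => {
    for (const ws of wss.clients) {
      if (!ws.isAlive) { ws.terminate(); continue; }
      ws.isAlive = false; // cleared until the next pong arrives
      ws.ping();
    }
  }, intervalMs);

  wss.on('close', () => clearInterval(timer));
  return timer;
}
```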

Scaling. If User A is connected to Server 1 and User B is connected to Server 2, how does A’s message reach B? You need a pub/sub layer - Redis, Kafka, or NATS - sitting behind your WebSocket servers to fan out messages.

Client A ↔ WS Server 1 ↔ Redis Pub/Sub ↔ WS Server 2 ↔ Client B

Load balancing. Sticky sessions or a separate connection routing layer. Standard round-robin HTTP load balancing doesn’t work because connections are persistent.

Monitoring. HTTP gives you request/response metrics for free. With WebSockets, you need custom instrumentation - connection counts, message rates, error rates, latency per message type.

The Comparison

Factor                   Short Polling               Long Polling               WebSockets
Latency                  High (interval-dependent)   Low (~instant)             Lowest (true real-time)
Server load              High (wasted requests)      Medium (held connections)  Low (per-message)
Complexity               Trivial                     Moderate                   High
Scaling difficulty       Easy (stateless)            Medium                     Hard (stateful)
Bidirectional            No (request/response only)  No (server push only)      Yes
Firewall/proxy friendly  Yes                         Mostly                     Sometimes no
Connection overhead      New TCP + TLS per request   New TCP + TLS per cycle    One TCP + TLS, then frames
Browser support          Universal                   Universal                  Universal (modern)

A Decision Framework

Start with three questions:

1. How fresh does the data need to be?

If 30-second staleness is acceptable, short polling is your answer. Ship it and move on. Don’t over-engineer.

2. Is communication one-way or two-way?

If only the server needs to push updates to the client (notifications, live scores, dashboards), long polling or Server-Sent Events (SSE) are simpler than WebSockets. You don’t need a bidirectional channel for a one-way data flow.
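For that one-way case, SSE is worth a quick look: the server holds a response open with Content-Type: text/event-stream and writes newline-delimited "data:" frames, and the browser's built-in EventSource handles parsing and automatic reconnection. A sketch (the route and helper names are illustrative, not from any particular API):

```javascript
// Each SSE event is a plain-text frame: "data: <payload>\n\n".
function sseFrame(data) {
  return `data: ${JSON.stringify(data)}\n\n`;
}

// Hypothetical Express route:
// app.get('/api/notifications/stream', (req, res) => {
//   res.setHeader('Content-Type', 'text/event-stream');
//   res.setHeader('Cache-Control', 'no-cache');
//   res.flushHeaders();
//   onUpdate(req.user.id, (data) => res.write(sseFrame(data)));
// });

// Browser client - EventSource reconnects by itself on drops:
// const es = new EventSource('/api/notifications/stream');
// es.onmessage = (e) => updateUI(JSON.parse(e.data));
```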

3. How many concurrent users and how frequent are updates?

Scenario                                     Recommendation
< 1K users, updates every 30s+               Short polling
1K-50K users, near-instant updates needed    Long polling or SSE
50K+ users, frequent bidirectional messages  WebSockets
Multiplayer game, live collaboration         WebSockets (no alternative)
Background job status check                  Short polling
Chat system at scale                         WebSockets
Notification bell on a dashboard             Long polling

Real-World Examples

Uber driver location updates: WebSockets. Riders see driver movement in real-time, and both rider and driver apps send frequent location pings. Bidirectional, high frequency, latency-sensitive.

GitHub Actions build status: Short polling. The UI polls the API every few seconds to check if your build passed. Acceptable latency, simple implementation, stateless.

Slack messages: WebSockets. Real-time messaging across channels, typing indicators, presence status - all require persistent bidirectional connections with pub/sub fan-out.

Email inbox count (like Gmail’s unread badge): Long polling. You want near-instant updates when a new email arrives, but communication is one-way. Gmail has historically used a hanging-GET long-polling technique for exactly this.

Stock price on a portfolio dashboard: SSE or long polling. One-way server push, moderate frequency. WebSockets would work but add unnecessary complexity.

The One Mistake Everyone Makes

Choosing WebSockets because it sounds impressive in a system design interview. The interviewer isn’t testing whether you know WebSockets exist. They’re testing whether you can evaluate tradeoffs.

Starting with short polling and upgrading when you hit its limits is almost always the right call in production. Twitter ran on polling for years. Facebook’s chat started with long polling. They moved to WebSockets when - and only when - scale demanded it.

The best architecture is the one that solves today’s problem without creating tomorrow’s operational nightmare.

Bottom Line

Short polling is dumb but reliable. Long polling is clever and efficient for one-way push. WebSockets are powerful but operationally expensive. Match the tool to the actual latency and scale requirements - not to what feels most “real-time.” Most features that feel like they need WebSockets work perfectly fine with long polling or even a 5-second poll.