1. Requirements & Scope (5 min)
Functional Requirements
- Users can send text messages to a live stream’s chat in real-time (< 200ms delivery to other viewers)
- All viewers of a stream see messages in a consistent chronological order
- Support rate limiting per user (e.g., 1 message every 2 seconds) and configurable slow mode (streamer sets interval)
- Moderation tools: delete messages, ban users, assign moderators, auto-filter spam/profanity
- Persist chat history so users joining late can load recent messages (last 200 messages on join)
Non-Functional Requirements
- Availability: 99.95% — chat can briefly degrade (delay messages) but should not go fully offline during a stream
- Latency: < 200ms from send to display for 99th percentile viewers. For popular streams (100K+ viewers), < 500ms is acceptable.
- Consistency: Total ordering per stream is required: all viewers must see messages in the same order, and messages from the same user must appear in order.
- Scale: 100K concurrent streams. Top streams have 500K concurrent viewers. 50K messages/sec globally across all streams. Top streams receive 5,000 messages/sec.
- Durability: Chat history persisted for at least 30 days for moderation review and replay.
2. Estimation (3 min)
Traffic
- 100K concurrent live streams, 50M total concurrent viewers
- 50,000 messages/sec globally (write)
- Fan-out: each message is delivered to all viewers of that stream
- Average stream: 500 viewers. Top stream: 500K viewers.
- Fan-out volume: 50,000 messages/sec × average 500 recipients = 25M message deliveries/sec
Storage
- Average message: 200 bytes (user_id, stream_id, text, timestamp, metadata)
- 50,000 messages/sec × 200 bytes = 10 MB/sec = 864 GB/day
- 30-day retention = ~26 TB (compressible to ~5 TB with gzip)
Connection State
- 50M concurrent WebSocket connections
- Each connection: ~10 KB memory overhead (buffers, metadata)
- Total: 50M × 10 KB = 500 GB of connection memory
- At 500K connections per server = 100 WebSocket servers
Bandwidth
- Average message delivered: 200 bytes × 25M deliveries/sec = 5 GB/sec outbound bandwidth
- Top stream (500K viewers × 5,000 msg/sec): 200 bytes × 500K × 5,000 = 500 GB/sec, impossible to deliver in full
- Solution: batch and cap. Deliver at most 50 messages per 200ms window per viewer (batching alone does not cut payload bytes; at 5,000 msg/sec sampling is unavoidable, and no client can render every message anyway) = 10 KB per batch per viewer. 500K × 10 KB × 5/sec = 25 GB/sec. Still high, so edge-level fan-out is needed.
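The batched-delivery arithmetic can be sanity-checked in a few lines (Python here; the constants are the ones above):

```python
# Back-of-envelope check of the batched-delivery numbers above.
MSG_BYTES = 200          # average message size
BATCH_MSGS = 50          # messages per 200ms window
BATCHES_PER_SEC = 5      # one flush every 200ms
TOP_VIEWERS = 500_000

batch_bytes = BATCH_MSGS * MSG_BYTES                # 10 KB per batch
per_viewer_bps = batch_bytes * BATCHES_PER_SEC      # 50 KB/sec per viewer
total_bps = per_viewer_bps * TOP_VIEWERS            # 25 GB/sec for the top stream
```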
3. API Design (3 min)
REST Endpoints
// Send a chat message
POST /v1/streams/{stream_id}/messages
Headers: Authorization: Bearer <token>
Body: {
"text": "Great play!",
"reply_to": "msg_abc" // optional, for threaded replies
}
Response 201: {
"message_id": "msg_xyz",
"timestamp": 1708632000123
}
// Load recent messages (on join)
GET /v1/streams/{stream_id}/messages?limit=200&before=<cursor>
Response 200: {
"messages": [...],
"cursor": "msg_abc"
}
// Delete a message (moderator)
DELETE /v1/streams/{stream_id}/messages/{message_id}
Headers: Authorization: Bearer <moderator_token>
// Ban a user from chat
POST /v1/streams/{stream_id}/bans
Body: { "user_id": "user_123", "duration": 600 } // 10 min timeout
// Set slow mode
PUT /v1/streams/{stream_id}/settings
Body: { "slow_mode_seconds": 5 }
WebSocket Protocol
// Client connects
WS /v1/streams/{stream_id}/chat?token=<auth_token>
// Server → Client: new messages (batched)
{
"type": "messages",
"data": [
{ "id": "msg_1", "user": "alice", "text": "Hello!", "ts": 1708632000123 },
{ "id": "msg_2", "user": "bob", "text": "GG!", "ts": 1708632000456 }
]
}
// Server → Client: message deleted
{ "type": "delete", "message_id": "msg_1" }
// Server → Client: user banned
{ "type": "ban", "user_id": "user_123", "duration": 600 }
// Client → Server: heartbeat (every 30s)
{ "type": "ping" }
4. Data Model (3 min)
Messages (Cassandra — partitioned by stream_id, clustered by timestamp)
Table: messages
stream_id (PK) | varchar
message_id (CK) | varchar -- time-based UUID (sortable)
user_id | varchar
text | varchar(500)
reply_to | varchar -- nullable
is_deleted | boolean
created_at | timestamp
Stream Settings (Redis Hash)
Key: stream:{stream_id}:settings
Fields:
slow_mode_seconds | int (0 = disabled)
subscribers_only | boolean
emote_only | boolean
Bans (Redis Sorted Set with expiry scores + PostgreSQL for permanent bans)
Key: stream:{stream_id}:bans
Type: Sorted Set
Member: user_id
Score: ban_expiry_timestamp
Rate Limit State (Redis)
Key: ratelimit:{stream_id}:{user_id}
Value: last_message_timestamp
TTL: slow_mode_seconds
Why Cassandra for Messages?
- Write-optimized (append-only, log-structured)
- Partitioned by stream_id — all messages for a stream are co-located
- Clustering by message_id (time-based UUID) gives natural chronological ordering
- Handles 50K writes/sec easily with horizontal scaling
- TTL support for automatic 30-day cleanup
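The time-sortable message_id can be illustrated with a ULID-style sketch (the exact production ID scheme is an assumption here; any fixed-width, timestamp-prefixed ID gives the same clustering behavior):

```python
import os
import time

def make_message_id() -> str:
    """Hypothetical time-sortable ID: a zero-padded millisecond timestamp
    prefix in hex, plus random bits for uniqueness. Because the width is
    fixed, lexicographic order matches chronological order, which is what
    the Cassandra clustering key relies on."""
    ts_ms = int(time.time() * 1000)
    return f"{ts_ms:012x}{os.urandom(8).hex()}"
```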
Why Redis for Real-Time State?
- Sub-millisecond lookups for rate limiting, ban checks, and settings
- Pub/Sub for cross-server message fan-out
- Sorted sets for efficient ban expiry checks
5. High-Level Design (12 min)
Message Send Flow
Client (sender)
→ WebSocket Connection Server (WS Server)
→ 1. Rate limit check (Redis: last message timestamp)
If too soon → reject with "slow down" error
→ 2. Ban check (Redis: is user_id in ban set?)
If banned → reject
→ 3. Spam/profanity filter (in-process rules + ML service)
If spam → silently drop or shadow-ban
→ 4. Assign message_id (time-based UUID) + server timestamp
→ 5. Publish to Message Bus (Kafka topic: chat.{stream_id})
→ 6. Return ACK to sender
Message Bus (Kafka)
→ Chat Dispatcher Service:
→ 1. Persist message to Cassandra
→ 2. Determine which WS Servers hold viewers of this stream
(lookup: Redis set stream:{stream_id}:servers)
→ 3. Publish to each WS Server via Redis Pub/Sub channel
Channel: ws_server:{server_id}:messages
WS Server (receivers)
→ 1. Receive message from Redis Pub/Sub
→ 2. Buffer message (batch: up to 50 messages or 200ms, whichever first)
→ 3. Send batched messages to all local WebSocket connections for that stream
Components
- WebSocket Connection Servers: Maintain persistent connections with clients. Stateful — each server knows which streams its connected clients are watching. Horizontally scaled (100+ servers). Sticky routing by stream_id hash to concentrate viewers.
- Message Bus (Kafka): Durable message queue. One topic per popular stream, shared topics for smaller streams. Provides ordering guarantee within a partition (one stream = one partition).
- Chat Dispatcher Service: Consumes from Kafka, persists to Cassandra, fans out to WS Servers. Stateless workers, scaled by partition count.
- Redis Cluster: Rate limiting, ban lists, stream settings, pub/sub for WS server fan-out, stream→server mapping.
- Cassandra Cluster: Persistent chat history. Read path for “load recent messages” on viewer join.
- Spam/Moderation Service: Real-time text classification. Regex rules (profanity word list) + ML model (spam detection). Called synchronously on message send (< 10ms budget).
- Stream Registry: Tracks which WS servers have viewers for each stream. Updated when connections are established/dropped.
Connection Management
Viewer joins stream:
1. Client establishes WebSocket to assigned WS Server (via load balancer)
2. WS Server registers: SADD stream:{stream_id}:servers {server_id}
3. WS Server loads last 200 messages from Cassandra for initial hydration
4. Client starts receiving live messages via WebSocket
Viewer leaves / disconnects:
1. WS Server detects close/timeout
2. Decrement local viewer count for stream
3. If no more local viewers for stream:
SREM stream:{stream_id}:servers {server_id}
Unsubscribe from Redis Pub/Sub channel
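The registry bookkeeping above, sketched in-memory (the real design keeps the stream-to-server mapping in the Redis set stream:{stream_id}:servers; the SADD/SREM equivalents are marked in comments):

```python
from collections import defaultdict

class StreamRegistry:
    """Tracks which WS servers currently hold viewers for each stream."""

    def __init__(self):
        self.servers = defaultdict(set)        # stream_id -> {server_id}
        self.local_viewers = defaultdict(int)  # (server_id, stream_id) -> count

    def join(self, server_id: str, stream_id: str) -> None:
        self.local_viewers[(server_id, stream_id)] += 1
        self.servers[stream_id].add(server_id)          # SADD equivalent

    def leave(self, server_id: str, stream_id: str) -> None:
        key = (server_id, stream_id)
        self.local_viewers[key] -= 1
        if self.local_viewers[key] <= 0:                # no local viewers left
            del self.local_viewers[key]
            self.servers[stream_id].discard(server_id)  # SREM equivalent
```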
6. Deep Dives (15 min)
Deep Dive 1: Fan-Out at Scale — Handling 500K Viewers
The hardest problem: a top stream has 500K concurrent viewers receiving 5,000 messages/sec. Naive fan-out (send each message individually to 500K connections) is 2.5 billion message deliveries per second. Impossible.
Solution: Hierarchical Fan-Out with Batching
Layer 1: Message Bus (Kafka)
→ Single message enters for stream_id = "top_stream"
Layer 2: Chat Dispatcher
→ Identifies 50 WS Servers holding viewers for this stream (10K viewers each)
→ Publishes message to 50 Redis Pub/Sub channels (one per server)
Layer 3: WS Server (per-server batching)
→ Receives message from Redis Pub/Sub
  → Buffers for up to 200ms, capping each batch at 50 messages (excess messages are sampled out; at 5,000 msg/sec no viewer can read them all anyway)
→ Serializes batch ONCE (single JSON encode)
→ Sends identical bytes to all 10,000 local WebSocket connections
(kernel-level optimization: same buffer, different file descriptors)
Bandwidth optimization at Layer 3:
- 50 messages × 200 bytes = 10 KB per batch
- 10K connections × 10 KB = 100 MB per batch per server
- 5 batches/sec = 500 MB/sec per server → achievable with 10 Gbps NIC
Further optimization for mega-streams (1M+ viewers):
- Edge fan-out: Use CDN edge servers as WebSocket terminators
- Each edge PoP maintains one upstream connection to origin
- Origin sends to 200 edge PoPs, each edge PoP fans out to 5K local viewers
- Total origin bandwidth: 200 × 10 KB × 5/sec = 10 MB/sec (trivial)
Message compression:
- Enable WebSocket per-message deflate (RFC 7692)
- Chat messages are highly compressible text → 60-70% reduction
- Use shared compression dictionary per stream (message patterns repeat)
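Per-message deflate itself is negotiated by the WebSocket stack, but the effect of a shared dictionary is easy to demonstrate with zlib's zdict parameter. The dictionary contents here are purely illustrative:

```python
import zlib

# Hypothetical shared dictionary: strings that recur across a stream's chat.
SHARED_DICT = b'{"id": "msg_", "user": "", "text": "", "ts": 17} GG LUL'

def compress(payload: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(level=6, zdict=zdict)
    return c.compress(payload) + c.flush()

def decompress(blob: bytes, zdict: bytes) -> bytes:
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# A repetitive JSON batch, as chat batches tend to be.
batch = b'[{"id": "msg_1", "user": "alice", "text": "GG", "ts": 1708632000123}]' * 10
```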
Deep Dive 2: Message Ordering Guarantees
Users expect to see messages in the same order. If Alice says “knock knock” and Bob says “who’s there?”, everyone should see them in that order.
Ordering strategy:
1. Kafka partition per stream guarantees total order for all messages
in a single stream.
2. Message ordering pipeline:
Sender → WS Server → Kafka (single partition per stream) → Dispatcher → Fan-out
Because Kafka is single-partition-per-stream, messages are processed
and dispatched in the exact order they were published.
3. Server-assigned timestamps:
- Do NOT trust client timestamps (clients have clock skew)
- The first WS Server that receives the message assigns a server timestamp
- Kafka offset provides the definitive ordering
4. Batch delivery preserves order:
- Batches are assembled in Kafka consumption order
- Client renders messages in the order received within each batch
Trade-off: ordering vs. throughput:
- Single Kafka partition per stream = total ordering, but max throughput is bounded by a single broker (~50K msg/sec per partition, which is enough for even the busiest streams)
- If a stream exceeds this, we can shard messages by message_id hash across multiple partitions, accept per-partition ordering (not total ordering), and use client-side timestamp sorting. Acceptable because at 50K msg/sec, humans cannot perceive individual message order anyway — it’s a wall of text scrolling by.
Handling out-of-order delivery at the edge:
- Each message carries a monotonically increasing sequence number per stream
- Client maintains last_seen_seq and buffers out-of-order messages for up to 500ms
- If a gap is not filled within 500ms, skip it (message was likely deleted by moderation)
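The client-side gap handling can be sketched as follows. Time is passed in explicitly to keep the sketch testable, and the 500ms timeout matches the text:

```python
class SequencedReceiver:
    """Deliver messages in sequence order, buffer out-of-order arrivals,
    and skip a gap once its deadline passes."""

    def __init__(self, gap_timeout: float = 0.5):
        self.gap_timeout = gap_timeout
        self.last_seen_seq = 0
        self.pending: dict[int, dict] = {}
        self.gap_since: float | None = None   # when the current gap was first noticed

    def receive(self, seq: int, msg: dict, now: float) -> list[dict]:
        if seq <= self.last_seen_seq:
            return []                         # duplicate or already-skipped
        self.pending[seq] = msg
        return self._drain(now)

    def _drain(self, now: float) -> list[dict]:
        delivered: list[dict] = []
        while True:
            nxt = self.last_seen_seq + 1
            if nxt in self.pending:
                delivered.append(self.pending.pop(nxt))
                self.last_seen_seq = nxt
                self.gap_since = None
            elif self.pending:                # a gap before buffered messages
                if self.gap_since is None:
                    self.gap_since = now      # start the gap clock
                    break
                if now - self.gap_since >= self.gap_timeout:
                    self.last_seen_seq = nxt  # skip: likely moderator-deleted
                    self.gap_since = None
                    continue
                break
            else:
                self.gap_since = None
                break
        return delivered
```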
Deep Dive 3: Rate Limiting, Spam, and Moderation
Per-user rate limiting:
On message send:
1. Check Redis: GET ratelimit:{stream_id}:{user_id}
2. If key exists and (now - stored_timestamp) < slow_mode_seconds:
Reject with: { "error": "slow_mode", "retry_after": remaining_seconds }
3. If allowed:
SET ratelimit:{stream_id}:{user_id} {now} EX {slow_mode_seconds}
Slow mode is per-stream, configured by the streamer. All users in the stream must wait N seconds between messages. Moderators and subscribers can be exempt.
Spam detection pipeline (< 10ms budget):
Layer 1: Regex rules (< 1ms)
- Profanity word list (with l33t speak variants: "sh1t", "a$$")
- URL detection (block links from non-subscribers)
- Repeated character detection ("AAAAAAAAA")
- Duplicate message detection (hash last 10 messages per user)
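Layer 1 might look like the following sketch; the word list, patterns, and duplicate-hash scheme are illustrative, not a production filter:

```python
import re

PROFANITY = re.compile(r"\b(?:sh[i1]t|a[s$]{2})", re.IGNORECASE)  # tiny sample list
URL = re.compile(r"https?://\S+", re.IGNORECASE)
REPEATS = re.compile(r"(.)\1{7,}")        # 8+ of the same character in a row

def layer1_flags(text: str, recent_hashes: set[int]) -> list[str]:
    """Return the rule names a message trips. recent_hashes holds hashes of
    the user's recent messages for duplicate detection."""
    flags = []
    if PROFANITY.search(text):
        flags.append("profanity")
    if URL.search(text):
        flags.append("url")
    if REPEATS.search(text):
        flags.append("repeated_chars")
    h = hash(text.strip().lower())
    if h in recent_hashes:
        flags.append("duplicate")
    recent_hashes.add(h)
    return flags
```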
Layer 2: ML classifier (< 5ms)
- Lightweight model (logistic regression or small neural net)
- Features: message text embeddings, user account age, message frequency,
character entropy, caps ratio
- Output: spam probability
- Threshold: > 0.9 = auto-delete, 0.7-0.9 = shadow-ban (only sender sees it)
Layer 3: Community-based (async)
- If N users report a message → auto-delete + flag for review
- Moderators see flagged messages in a queue
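Two of the Layer 2 features (caps ratio and character entropy) can be computed as follows; the real model's exact feature definitions are assumptions here:

```python
import math
from collections import Counter

def spam_features(text: str) -> dict[str, float]:
    """Caps ratio over letters, and Shannon entropy in bits per character.
    Low entropy plus high caps ratio is typical of spam like 'AAAAAAAA'."""
    letters = [c for c in text if c.isalpha()]
    caps_ratio = (sum(c.isupper() for c in letters) / len(letters)
                  if letters else 0.0)
    n = len(text)
    entropy = (-sum((k / n) * math.log2(k / n) for k in Counter(text).values())
               if n else 0.0)
    return {"caps_ratio": caps_ratio, "entropy": entropy}
```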
Moderation actions:
Delete message:
1. Moderator sends DELETE /messages/{id}
2. Set is_deleted = true in Cassandra
3. Publish deletion event to all WS Servers for that stream
4. Clients remove message from UI
Ban user:
1. ZADD stream:{stream_id}:bans {expiry_timestamp} {user_id}
2. Publish ban event to all WS Servers
3. WS Servers forcibly close WebSocket for banned user
4. Any new connection attempt checks ban set on handshake
AutoMod:
- Streamer configures blocked terms (custom word list)
- Stored in Redis: stream:{stream_id}:blocklist
- Checked at Layer 1 of spam detection
- Can set to "hold for review" instead of auto-delete
7. Extensions (2 min)
- Emotes and rich media: Support custom emotes (per-channel subscriber emotes like Twitch). Store emote metadata in a CDN-backed catalog. Client renders emote codes as images. Add sticker/GIF support with content moderation.
- Chat replay for VODs: After a live stream ends, the chat history is timestamped relative to the video. VOD player syncs chat replay with video playback position. Requires message timestamps aligned to stream start time.
- Subscriber-only and follower-only mode: Gated chat modes where only subscribers or followers of N days can send messages. Reduces spam in popular streams. Checked at rate limit layer via user metadata lookup.
- Polls and predictions: Streamer creates a poll, viewers vote via chat commands or UI buttons. Aggregate votes in Redis (HINCRBY). Display results in real-time overlay. Predictions add a points/currency betting mechanic.
- Multi-language chat rooms: Split chat by detected language. Viewers opt into language-specific rooms. Optional real-time translation (async ML pipeline, slight delay). Each language room is a separate Kafka partition.