1. Requirements & Scope (5 min)
Functional Requirements
- Users can send text messages to a live stream’s chat in real-time (< 200ms delivery to other viewers)
- All viewers of a stream see messages in a consistent chronological order
- Support rate limiting per user (e.g., 1 message every 2 seconds) and configurable slow mode (streamer sets interval)
- Moderation tools: delete messages, ban users, assign moderators, auto-filter spam/profanity
- Persist chat history so users joining late can load recent messages (last 200 messages on join)
Non-Functional Requirements
- Availability: 99.95% — chat can briefly degrade (delay messages) but should not go fully offline during a stream
- Latency: < 200ms from send to display for 99th percentile viewers. For popular streams (100K+ viewers), < 500ms is acceptable.
- Consistency: Total ordering per stream is required: all viewers must see messages in the same order, and messages from the same user must appear in order.
- Scale: 100K concurrent streams. Top streams have 500K concurrent viewers. 50K messages/sec globally across all streams. Top streams receive 5,000 messages/sec.
- Durability: Chat history persisted for at least 30 days for moderation review and replay.
2. Estimation (3 min)
Traffic
- 100K concurrent live streams, 50M total concurrent viewers
- 50,000 messages/sec globally (write)
- Fan-out: each message is delivered to all viewers of that stream
- Average stream: 500 viewers. Top stream: 500K viewers.
- Fan-out volume: 50,000 messages/sec × average 500 recipients = 25M message deliveries/sec
Storage
- Average message: 200 bytes (user_id, stream_id, text, timestamp, metadata)
- 50,000 messages/sec × 200 bytes = 10 MB/sec = 864 GB/day
- 30-day retention = ~26 TB (compressible to ~5 TB with gzip)
Connection State
- 50M concurrent WebSocket connections
- Each connection: ~10 KB memory overhead (buffers, metadata)
- Total: 50M × 10 KB = 500 GB of connection memory
- At 500K connections per server = 100 WebSocket servers
Bandwidth
- Average message delivered: 200 bytes × 25M deliveries/sec = 5 GB/sec outbound bandwidth
- Top stream (500K viewers × 5,000 msg/sec): 200 bytes × 500K × 5,000 = 500 GB/sec, impossible to deliver in full
- Solution: batch and cap. Deliver at most 50 messages per 200ms window per viewer (batching alone does not cut payload bytes; at 5,000 msg/sec sampling is unavoidable, and no client can render every message anyway) = 10 KB per batch per viewer. 500K × 10 KB × 5/sec = 25 GB/sec. Still high, so edge-level fan-out is needed.
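The batched-delivery arithmetic can be sanity-checked in a few lines (Python here; the constants are the ones above):

```python
# Back-of-envelope check of the batched-delivery numbers above.
MSG_BYTES = 200          # average message size
BATCH_MSGS = 50          # messages per 200ms window
BATCHES_PER_SEC = 5      # one flush every 200ms
TOP_VIEWERS = 500_000

batch_bytes = BATCH_MSGS * MSG_BYTES                # 10 KB per batch
per_viewer_bps = batch_bytes * BATCHES_PER_SEC      # 50 KB/sec per viewer
total_bps = per_viewer_bps * TOP_VIEWERS            # 25 GB/sec for the top stream
```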
3. API Design (3 min)
REST Endpoints
// Send a chat message
POST /v1/streams/{stream_id}/messages
Headers: Authorization: Bearer <token>
Body: {
"text": "Great play!",
"reply_to": "msg_abc" // optional, for threaded replies
}
Response 201: {
"message_id": "msg_xyz",
"timestamp": 1708632000123
}
// Load recent messages (on join)
GET /v1/streams/{stream_id}/messages?limit=200&before=<cursor>
Response 200: {
"messages": [...],
"cursor": "msg_abc"
}
// Delete a message (moderator)
DELETE /v1/streams/{stream_id}/messages/{message_id}
Headers: Authorization: Bearer <moderator_token>
// Ban a user from chat
POST /v1/streams/{stream_id}/bans
Body: { "user_id": "user_123", "duration": 600 } // 10 min timeout
// Set slow mode
PUT /v1/streams/{stream_id}/settings
Body: { "slow_mode_seconds": 5 }
WebSocket Protocol
// Client connects
WS /v1/streams/{stream_id}/chat?token=<auth_token>
// Server → Client: new messages (batched)
{
"type": "messages",
"data": [
{ "id": "msg_1", "user": "alice", "text": "Hello!", "ts": 1708632000123 },
{ "id": "msg_2", "user": "bob", "text": "GG!", "ts": 1708632000456 }
]
}
// Server → Client: message deleted
{ "type": "delete", "message_id": "msg_1" }
// Server → Client: user banned
{ "type": "ban", "user_id": "user_123", "duration": 600 }
// Client → Server: heartbeat (every 30s)
{ "type": "ping" }
4. Data Model (3 min)
Messages (Cassandra — partitioned by stream_id, clustered by timestamp)
Table: messages
stream_id (PK) | varchar
message_id (CK) | varchar -- time-based UUID (sortable)
user_id | varchar
text | varchar(500)
reply_to | varchar -- nullable
is_deleted | boolean
created_at | timestamp
Stream Settings (Redis Hash)
Key: stream:{stream_id}:settings
Fields:
slow_mode_seconds | int (0 = disabled)
subscribers_only | boolean
emote_only | boolean
Bans (Redis Sorted Set with expiry scores + PostgreSQL for permanent bans)
Key: stream:{stream_id}:bans
Type: Sorted Set
Member: user_id
Score: ban_expiry_timestamp
Rate Limit State (Redis)
Key: ratelimit:{stream_id}:{user_id}
Value: last_message_timestamp
TTL: slow_mode_seconds
Why Cassandra for Messages?
- Write-optimized (append-only, log-structured)
- Partitioned by stream_id — all messages for a stream are co-located
- Clustering by message_id (time-based UUID) gives natural chronological ordering
- Handles 50K writes/sec easily with horizontal scaling
- TTL support for automatic 30-day cleanup
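The time-sortable message_id can be illustrated with a ULID-style sketch (the exact production ID scheme is an assumption here; any fixed-width, timestamp-prefixed ID gives the same clustering behavior):

```python
import os
import time

def make_message_id() -> str:
    """Hypothetical time-sortable ID: a zero-padded millisecond timestamp
    prefix in hex, plus random bits for uniqueness. Because the width is
    fixed, lexicographic order matches chronological order, which is what
    the Cassandra clustering key relies on."""
    ts_ms = int(time.time() * 1000)
    return f"{ts_ms:012x}{os.urandom(8).hex()}"
```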
Why Redis for Real-Time State?
- Sub-millisecond lookups for rate limiting, ban checks, and settings
- Pub/Sub for cross-server message fan-out
- Sorted sets for efficient ban expiry checks
5. High-Level Design (12 min)
Message Send Flow
Client (sender)
→ WebSocket Connection Server (WS Server)
→ 1. Rate limit check (Redis: last message timestamp)
If too soon → reject with "slow down" error
→ 2. Ban check (Redis: is user_id in ban set?)
If banned → reject
→ 3. Spam/profanity filter (in-process rules + ML service)
If spam → silently drop or shadow-ban
→ 4. Assign message_id (time-based UUID) + server timestamp
→ 5. Publish to Message Bus (Kafka topic: chat.{stream_id})
→ 6. Return ACK to sender
Message Bus (Kafka)
→ Chat Dispatcher Service:
→ 1. Persist message to Cassandra
→ 2. Determine which WS Servers hold viewers of this stream
(lookup: Redis set stream:{stream_id}:servers)
→ 3. Publish to each WS Server via Redis Pub/Sub channel
Channel: ws_server:{server_id}:messages
WS Server (receivers)
→ 1. Receive message from Redis Pub/Sub
→ 2. Buffer message (batch: up to 50 messages or 200ms, whichever first)
→ 3. Send batched messages to all local WebSocket connections for that stream
Components
- WebSocket Connection Servers: Maintain persistent connections with clients. Stateful — each server knows which streams its connected clients are watching. Horizontally scaled (100+ servers). Sticky routing by stream_id hash to concentrate viewers.
- Message Bus (Kafka): Durable message queue. One topic per popular stream, shared topics for smaller streams. Provides ordering guarantee within a partition (one stream = one partition).
- Chat Dispatcher Service: Consumes from Kafka, persists to Cassandra, fans out to WS Servers. Stateless workers, scaled by partition count.
- Redis Cluster: Rate limiting, ban lists, stream settings, pub/sub for WS server fan-out, stream→server mapping.
- Cassandra Cluster: Persistent chat history. Read path for “load recent messages” on viewer join.
- Spam/Moderation Service: Real-time text classification. Regex rules (profanity word list) + ML model (spam detection). Called synchronously on message send (< 10ms budget).
- Stream Registry: Tracks which WS servers have viewers for each stream. Updated when connections are established/dropped.
Connection Management
Viewer joins stream:
1. Client establishes WebSocket to assigned WS Server (via load balancer)
2. WS Server registers: SADD stream:{stream_id}:servers {server_id}
3. WS Server loads last 200 messages from Cassandra for initial hydration
4. Client starts receiving live messages via WebSocket
Viewer leaves / disconnects:
1. WS Server detects close/timeout
2. Decrement local viewer count for stream
3. If no more local viewers for stream:
SREM stream:{stream_id}:servers {server_id}
Unsubscribe from Redis Pub/Sub channel
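The registry bookkeeping above, sketched in-memory (the real design keeps the stream-to-server mapping in the Redis set stream:{stream_id}:servers; the SADD/SREM equivalents are marked in comments):

```python
from collections import defaultdict

class StreamRegistry:
    """Tracks which WS servers currently hold viewers for each stream."""

    def __init__(self):
        self.servers = defaultdict(set)        # stream_id -> {server_id}
        self.local_viewers = defaultdict(int)  # (server_id, stream_id) -> count

    def join(self, server_id: str, stream_id: str) -> None:
        self.local_viewers[(server_id, stream_id)] += 1
        self.servers[stream_id].add(server_id)          # SADD equivalent

    def leave(self, server_id: str, stream_id: str) -> None:
        key = (server_id, stream_id)
        self.local_viewers[key] -= 1
        if self.local_viewers[key] <= 0:                # no local viewers left
            del self.local_viewers[key]
            self.servers[stream_id].discard(server_id)  # SREM equivalent
```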
6. Deep Dives (15 min)
Deep Dive 1: Fan-Out at Scale — Handling 500K Viewers
The hardest problem: a top stream has 500K concurrent viewers receiving 5,000 messages/sec. Naive fan-out (send each message individually to 500K connections) is 2.5 billion message deliveries per second. Impossible.
Solution: Hierarchical Fan-Out with Batching
Layer 1: Message Bus (Kafka)
→ Single message enters for stream_id = "top_stream"
Layer 2: Chat Dispatcher
→ Identifies 50 WS Servers holding viewers for this stream (10K viewers each)
→ Publishes message to 50 Redis Pub/Sub channels (one per server)
Layer 3: WS Server (per-server batching)
→ Receives message from Redis Pub/Sub
  → Buffers for up to 200ms, capping each batch at 50 messages (excess messages are sampled out; at 5,000 msg/sec no viewer can read them all anyway)
→ Serializes batch ONCE (single JSON encode)
→ Sends identical bytes to all 10,000 local WebSocket connections
(kernel-level optimization: same buffer, different file descriptors)
Bandwidth optimization at Layer 3:
- 50 messages × 200 bytes = 10 KB per batch
- 10K connections × 10 KB = 100 MB per batch per server
- 5 batches/sec = 500 MB/sec per server → achievable with 10 Gbps NIC
Further optimization for mega-streams (1M+ viewers):
- Edge fan-out: Use CDN edge servers as WebSocket terminators
- Each edge PoP maintains one upstream connection to origin
- Origin sends to 200 edge PoPs, each edge PoP fans out to 5K local viewers
- Total origin bandwidth: 200 × 10 KB × 5/sec = 10 MB/sec (trivial)
Message compression:
- Enable WebSocket per-message deflate (RFC 7692)
- Chat messages are highly compressible text → 60-70% reduction
- Use shared compression dictionary per stream (message patterns repeat)
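Per-message deflate itself is negotiated by the WebSocket stack, but the effect of a shared dictionary is easy to demonstrate with zlib's zdict parameter. The dictionary contents here are purely illustrative:

```python
import zlib

# Hypothetical shared dictionary: strings that recur across a stream's chat.
SHARED_DICT = b'{"id": "msg_", "user": "", "text": "", "ts": 17} GG LUL'

def compress(payload: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(level=6, zdict=zdict)
    return c.compress(payload) + c.flush()

def decompress(blob: bytes, zdict: bytes) -> bytes:
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# A repetitive JSON batch, as chat batches tend to be.
batch = b'[{"id": "msg_1", "user": "alice", "text": "GG", "ts": 1708632000123}]' * 10
```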
Deep Dive 2: Message Ordering Guarantees
Users expect to see messages in the same order. If Alice says “knock knock” and Bob says “who’s there?”, everyone should see them in that order.
Ordering strategy:
1. Kafka partition per stream guarantees total order for all messages
in a single stream.
2. Message ordering pipeline:
Sender → WS Server → Kafka (single partition per stream) → Dispatcher → Fan-out
Because Kafka is single-partition-per-stream, messages are processed
and dispatched in the exact order they were published.
3. Server-assigned timestamps:
- Do NOT trust client timestamps (clients have clock skew)
- The first WS Server that receives the message assigns a server timestamp
- Kafka offset provides the definitive ordering
4. Batch delivery preserves order:
- Batches are assembled in Kafka consumption order
- Client renders messages in the order received within each batch
Trade-off: ordering vs. throughput:
- Single Kafka partition per stream = total ordering, but max throughput is bounded by a single broker (~50K msg/sec per partition, which is enough for even the busiest streams)
- If a stream exceeds this, we can shard messages by message_id hash across multiple partitions, accept per-partition ordering (not total ordering), and use client-side timestamp sorting. Acceptable because at 50K msg/sec, humans cannot perceive individual message order anyway — it’s a wall of text scrolling by.
Handling out-of-order delivery at the edge:
- Each message carries a monotonically increasing sequence number per stream
- Client maintains last_seen_seq and buffers out-of-order messages for up to 500ms
- If a gap is not filled within 500ms, skip it (message was likely deleted by moderation)
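The client-side gap handling can be sketched as follows. Time is passed in explicitly to keep the sketch testable, and the 500ms timeout matches the text:

```python
class SequencedReceiver:
    """Deliver messages in sequence order, buffer out-of-order arrivals,
    and skip a gap once its deadline passes."""

    def __init__(self, gap_timeout: float = 0.5):
        self.gap_timeout = gap_timeout
        self.last_seen_seq = 0
        self.pending: dict[int, dict] = {}
        self.gap_since: float | None = None   # when the current gap was first noticed

    def receive(self, seq: int, msg: dict, now: float) -> list[dict]:
        if seq <= self.last_seen_seq:
            return []                         # duplicate or already-skipped
        self.pending[seq] = msg
        return self._drain(now)

    def _drain(self, now: float) -> list[dict]:
        delivered: list[dict] = []
        while True:
            nxt = self.last_seen_seq + 1
            if nxt in self.pending:
                delivered.append(self.pending.pop(nxt))
                self.last_seen_seq = nxt
                self.gap_since = None
            elif self.pending:                # a gap before buffered messages
                if self.gap_since is None:
                    self.gap_since = now      # start the gap clock
                    break
                if now - self.gap_since >= self.gap_timeout:
                    self.last_seen_seq = nxt  # skip: likely moderator-deleted
                    self.gap_since = None
                    continue
                break
            else:
                self.gap_since = None
                break
        return delivered
```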
Deep Dive 3: Rate Limiting, Spam, and Moderation
Per-user rate limiting:
On message send:
1. Check Redis: GET ratelimit:{stream_id}:{user_id}
2. If key exists and (now - stored_timestamp) < slow_mode_seconds:
Reject with: { "error": "slow_mode", "retry_after": remaining_seconds }
3. If allowed:
SET ratelimit:{stream_id}:{user_id} {now} EX {slow_mode_seconds}
Slow mode is per-stream, configured by the streamer. All users in the stream must wait N seconds between messages. Moderators and subscribers can be exempt.
Spam detection pipeline (< 10ms budget):
Layer 1: Regex rules (< 1ms)
- Profanity word list (with l33t speak variants: "sh1t", "a$$")
- URL detection (block links from non-subscribers)
- Repeated character detection ("AAAAAAAAA")
- Duplicate message detection (hash last 10 messages per user)
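Layer 1 might look like the following sketch; the word list, patterns, and duplicate-hash scheme are illustrative, not a production filter:

```python
import re

PROFANITY = re.compile(r"\b(?:sh[i1]t|a[s$]{2})", re.IGNORECASE)  # tiny sample list
URL = re.compile(r"https?://\S+", re.IGNORECASE)
REPEATS = re.compile(r"(.)\1{7,}")        # 8+ of the same character in a row

def layer1_flags(text: str, recent_hashes: set[int]) -> list[str]:
    """Return the rule names a message trips. recent_hashes holds hashes of
    the user's recent messages for duplicate detection."""
    flags = []
    if PROFANITY.search(text):
        flags.append("profanity")
    if URL.search(text):
        flags.append("url")
    if REPEATS.search(text):
        flags.append("repeated_chars")
    h = hash(text.strip().lower())
    if h in recent_hashes:
        flags.append("duplicate")
    recent_hashes.add(h)
    return flags
```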
Layer 2: ML classifier (< 5ms)
- Lightweight model (logistic regression or small neural net)
- Features: message text embeddings, user account age, message frequency,
character entropy, caps ratio
- Output: spam probability
- Threshold: > 0.9 = auto-delete, 0.7-0.9 = shadow-ban (only sender sees it)
Layer 3: Community-based (async)
- If N users report a message → auto-delete + flag for review
- Moderators see flagged messages in a queue
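Two of the Layer 2 features (caps ratio and character entropy) can be computed as follows; the real model's exact feature definitions are assumptions here:

```python
import math
from collections import Counter

def spam_features(text: str) -> dict[str, float]:
    """Caps ratio over letters, and Shannon entropy in bits per character.
    Low entropy plus high caps ratio is typical of spam like 'AAAAAAAA'."""
    letters = [c for c in text if c.isalpha()]
    caps_ratio = (sum(c.isupper() for c in letters) / len(letters)
                  if letters else 0.0)
    n = len(text)
    entropy = (-sum((k / n) * math.log2(k / n) for k in Counter(text).values())
               if n else 0.0)
    return {"caps_ratio": caps_ratio, "entropy": entropy}
```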
Moderation actions:
Delete message:
1. Moderator sends DELETE /messages/{id}
2. Set is_deleted = true in Cassandra
3. Publish deletion event to all WS Servers for that stream
4. Clients remove message from UI
Ban user:
1. ZADD stream:{stream_id}:bans {expiry_timestamp} {user_id}
2. Publish ban event to all WS Servers
3. WS Servers forcibly close WebSocket for banned user
4. Any new connection attempt checks ban set on handshake
AutoMod:
- Streamer configures blocked terms (custom word list)
- Stored in Redis: stream:{stream_id}:blocklist
- Checked at Layer 1 of spam detection
- Can set to "hold for review" instead of auto-delete
7. Extensions (2 min)
- Emotes and rich media: Support custom emotes (per-channel subscriber emotes like Twitch). Store emote metadata in a CDN-backed catalog. Client renders emote codes as images. Add sticker/GIF support with content moderation.
- Chat replay for VODs: After a live stream ends, the chat history is timestamped relative to the video. VOD player syncs chat replay with video playback position. Requires message timestamps aligned to stream start time.
- Subscriber-only and follower-only mode: Gated chat modes where only subscribers or followers of N days can send messages. Reduces spam in popular streams. Checked at rate limit layer via user metadata lookup.
- Polls and predictions: Streamer creates a poll, viewers vote via chat commands or UI buttons. Aggregate votes in Redis (HINCRBY). Display results in real-time overlay. Predictions add a points/currency betting mechanic.
- Multi-language chat rooms: Split chat by detected language. Viewers opt into language-specific rooms. Optional real-time translation (async ML pipeline, slight delay). Each language room is a separate Kafka partition.