1. Requirements & Scope (5 min)

Functional Requirements

  1. 1-on-1 messaging between users (text messages)
  2. Group messaging (up to 256 members)
  3. Message delivery status: sent, delivered, read
  4. Online/offline presence indicators
  5. Message history — persistent, accessible from any device

Non-Functional Requirements

  • Availability: 99.99% — messaging is real-time communication; downtime is immediately felt
  • Latency: Message delivery < 200ms end-to-end for online recipients
  • Consistency: Messages must be ordered correctly within a conversation. No message loss. Effectively exactly-once delivery: at-least-once transport plus deduplication, so no duplicate messages are shown.
  • Scale: 2B registered users, 500M DAU, 100B messages/day
  • Durability: Messages are persistent — stored forever (or until user deletes)

2. Estimation (3 min)

Traffic

  • 100B messages/day ÷ ~100K sec/day (86,400, rounded) ≈ 1M messages/sec
  • Peak: 3x → 3M messages/sec
  • This is extremely high write throughput

Storage

  • Average message: 100 bytes (text + metadata)
  • 100B messages/day × 100 bytes = 10TB/day
  • Per year: ~3.6PB
  • With media (images, voice notes): 10x → ~36PB/year

Connections

  • 500M DAU with persistent WebSocket connections
  • Average user online 4 hours/day → at any time ~83M concurrent connections
  • Peak: ~150M concurrent WebSocket connections
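The arithmetic above can be checked with a few lines of Python (86,400 sec/day is rounded to ~100K in the text):

```python
# Back-of-envelope numbers from the estimates above, Python as a calculator.
SECONDS_PER_DAY = 86_400
MESSAGES_PER_DAY = 100e9
DAU = 500e6
AVG_ONLINE_HOURS = 4

msgs_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY        # ~1.16M (quoted as ~1M)
peak_msgs_per_sec = 3 * msgs_per_sec                     # ~3.5M at 3x peak
storage_per_day_tb = MESSAGES_PER_DAY * 100 / 1e12       # 10 TB/day at 100 B/msg
storage_per_year_pb = storage_per_day_tb * 365 / 1000    # ~3.65 PB/yr
concurrent_conns = DAU * AVG_ONLINE_HOURS / 24           # ~83M average concurrent

print(f"{msgs_per_sec / 1e6:.2f}M msg/s, peak {peak_msgs_per_sec / 1e6:.2f}M")
print(f"{storage_per_day_tb:.0f} TB/day, {storage_per_year_pb:.2f} PB/yr")
print(f"{concurrent_conns / 1e6:.0f}M average concurrent connections")
```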

3. API Design (3 min)

REST (for non-realtime operations)

POST /api/v1/messages/send
  Body: {
    "conversation_id": "conv_abc",
    "content": "Hello!",
    "type": "text",
    "client_message_id": "cm_uuid_123"  // idempotency key
  }
  Response 201: {
    "message_id": "m_server_456",
    "timestamp": "2026-02-22T18:00:00.123Z"
  }

GET /api/v1/conversations
  Response 200: [
    {
      "conversation_id": "conv_abc",
      "type": "1on1",
      "participants": ["u_1", "u_2"],
      "last_message": { "content": "Hello!", "timestamp": "..." },
      "unread_count": 3
    }
  ]

GET /api/v1/conversations/{conv_id}/messages?cursor={cursor}&limit=50
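A minimal sketch of the idempotent send handler, with in-memory dicts standing in for the Redis dedup cache and the Cassandra store (all names here are illustrative, not a prescribed API):

```python
import uuid
from datetime import datetime, timezone

# Stand-ins for the dedup cache (Redis in the real design) and the
# message store (Cassandra). Illustrative only.
dedup_cache: dict[str, str] = {}   # client_message_id -> server message_id
message_store: list[dict] = []

def send_message(conversation_id: str, content: str, client_message_id: str) -> dict:
    """Handle POST /api/v1/messages/send with idempotency."""
    # A retry with the same client_message_id returns the original message_id.
    if client_message_id in dedup_cache:
        return {"message_id": dedup_cache[client_message_id], "duplicate": True}
    message_id = f"m_{uuid.uuid4().hex[:8]}"
    message_store.append({
        "message_id": message_id,
        "conversation_id": conversation_id,
        "content": content,
    })
    dedup_cache[client_message_id] = message_id
    return {
        "message_id": message_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "duplicate": False,
    }

first = send_message("conv_abc", "Hello!", "cm_uuid_123")
retry = send_message("conv_abc", "Hello!", "cm_uuid_123")  # network retry
assert retry["message_id"] == first["message_id"]          # deduplicated
```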

WebSocket (for real-time)

// Client → Server
{ "type": "message", "conversation_id": "conv_abc", "content": "Hello!", "client_id": "cm_uuid_123" }
{ "type": "typing", "conversation_id": "conv_abc" }
{ "type": "ack", "message_id": "m_456" }        // delivery receipt
{ "type": "read", "conversation_id": "conv_abc", "up_to": "m_456" }  // read receipt

// Server → Client
{ "type": "message", "message_id": "m_456", "from": "u_2", "conversation_id": "conv_abc", "content": "Hi!", "timestamp": "..." }
{ "type": "delivered", "message_id": "m_123" }   // your message was delivered
{ "type": "read", "conversation_id": "conv_abc", "by": "u_2", "up_to": "m_456" }
{ "type": "typing", "conversation_id": "conv_abc", "user": "u_2" }
{ "type": "presence", "user": "u_2", "status": "online" }

Key decisions:

  • client_message_id for idempotency — if client retries due to network issues, server deduplicates
  • Read receipts as “up_to” marker — not per-message, reduces traffic: “I’ve read everything up to message X”
  • WebSocket for all real-time events: messages, typing, presence, delivery/read receipts
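One way a Chat Server might dispatch the client-to-server frames above, sketched with stub handlers (handler names and return strings are illustrative, not a prescribed API):

```python
import json

# Stub handlers for each frame type; in the real system these would
# store/route messages, broadcast typing, and update receipt state.
def on_message(frame): return f"store+route {frame['client_id']}"
def on_typing(frame): return f"broadcast typing in {frame['conversation_id']}"
def on_ack(frame): return f"mark delivered {frame['message_id']}"
def on_read(frame): return f"mark read up to {frame['up_to']}"

HANDLERS = {"message": on_message, "typing": on_typing, "ack": on_ack, "read": on_read}

def dispatch(raw: str) -> str:
    frame = json.loads(raw)
    handler = HANDLERS.get(frame["type"])
    if handler is None:
        raise ValueError(f"unknown frame type: {frame['type']}")
    return handler(frame)

print(dispatch('{"type": "ack", "message_id": "m_456"}'))
```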

4. Data Model (3 min)

Conversations & Participants (PostgreSQL; conversations sharded by conversation_id, participants by user_id so the "my conversations" query is single-shard)

Table: conversations
  conversation_id  (PK) | bigint
  type                   | enum('1on1', 'group')
  name                   | varchar(100)  -- for groups
  created_at             | timestamptz

Table: participants
  conversation_id        | bigint
  user_id                | bigint
  last_read_message_id   | bigint
  muted_until            | timestamptz, nullable
  PK: (conversation_id, user_id)
  Index: (user_id, conversation_id)  -- "my conversations" query

Messages (Cassandra, partitioned by conversation_id)

Table: messages
  conversation_id  (partition key) | bigint
  message_id       (clustering key, DESC) | bigint (Snowflake ID = time-ordered)
  sender_id                        | bigint
  content                          | text
  type                             | text ('text', 'image', 'voice')
  created_at                       | timestamp

Why Cassandra for messages?

  • 1M writes/sec requires horizontally scalable write throughput
  • Access pattern is always: “get messages for conversation X, ordered by time” — perfect for Cassandra’s partition + clustering key model
  • No cross-conversation queries needed (no joins)

Connection Registry (Redis)

Key: conn:{user_id}
Value: { "server_id": "ws-server-42", "connected_at": "..." }
TTL: refreshed by heartbeats; expires automatically if the Chat Server dies (explicit DEL on clean disconnect)

5. High-Level Design (12 min)

Message Send Path

Sender's device → WebSocket → Chat Server (sender's connection)
  → Chat Service:
    1. Validate message, dedup by client_message_id
    2. Write to Cassandra (messages table)
    3. Update conversation metadata (last_message) in PostgreSQL
    4. Publish to Kafka: topic messages, keyed by conversation_id (key-based partitioning preserves per-conversation order; a topic per conversation would not scale)
  → Kafka → Routing Service:
    → For each recipient in the conversation:
      → Check Redis: is recipient online?
      → Online: find their Chat Server → push via WebSocket
      → Offline: store in undelivered queue (Redis list per user), then trigger Push Notification Service (APNs/FCM)
  → Sender receives "sent" ack
  → Recipient's Chat Server receives message → pushes to client
  → Recipient sends "delivered" ack → routed back to sender
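The send path above, sketched as one orchestration function; the action strings stand in for the actual Cassandra, PostgreSQL, Kafka, Redis, and push-notification calls (dedup, step 1, is shown separately in the API section):

```python
# Orchestration sketch of steps 2-4 plus routing. Action strings are
# illustrative stand-ins for real storage and delivery calls.
def send_path(message: dict, participants: list[str], online: set[str]) -> dict:
    actions = []
    actions.append("write:cassandra")           # 2. persist the message
    actions.append("update:postgres-metadata")  # 3. last_message, timestamps
    actions.append(f"publish:kafka:messages:{message['conversation_id']}")  # 4
    for user in participants:
        if user == message["sender"]:
            continue
        if user in online:
            actions.append(f"push:websocket:{user}")
        else:
            actions.append(f"queue:redis-undelivered:{user}")
            actions.append(f"notify:apns-fcm:{user}")
    return {"ack_to_sender": "sent", "actions": actions}

result = send_path(
    {"conversation_id": "conv_abc", "sender": "u_1", "content": "Hello!"},
    participants=["u_1", "u_2", "u_3"],
    online={"u_2"},  # u_3 is offline
)
```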

Message Read Path (History)

Client opens conversation
  → REST API: GET /conversations/{id}/messages?cursor=...
  → Chat Service → Cassandra: query messages partition
  → Return paginated results
  → Client also receives real-time messages via WebSocket

Components

  1. Chat Servers: Maintain WebSocket connections, handle real-time message routing
  2. Chat Service: Business logic — validation, storage, orchestration
  3. Routing Service: Determines how to reach each recipient (online → push, offline → notification)
  4. PostgreSQL: Conversation metadata, user profiles, participant lists
  5. Cassandra: Message storage — optimized for time-series write-heavy workloads
  6. Redis: Connection registry (which user is on which chat server), undelivered message queues, presence cache
  7. Kafka: Message event stream — decouples write from delivery
  8. Push Notification Service: APNs/FCM integration for offline users
  9. Load Balancer (L4): TCP-level balancing for WebSocket connections (sticky sessions)

6. Deep Dives (15 min)

Deep Dive 1: Message Ordering and Delivery Guarantees

Problem: At 1M messages/sec with distributed servers, how do we guarantee messages are ordered correctly and delivered exactly once?

Ordering within a conversation:

  • Each message gets a Snowflake ID (time-based, globally unique, monotonically increasing per server)
  • Within a single conversation, messages from different senders may arrive at different servers at slightly different times
  • We use the Snowflake ID as the canonical order — it’s the timestamp when the message was received by the Chat Service
  • Client displays messages sorted by this ID
  • If two messages have IDs 1ms apart, the order might be “wrong” from the users’ perspective, but this is acceptable and matches how every major messaging app works
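A minimal Snowflake-style generator, assuming the common 41/10/12-bit layout (millisecond timestamp, server id, per-millisecond sequence); the custom epoch is illustrative, and the exact bit layout is an assumption rather than something the design above specifies:

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC, an illustrative custom epoch

class Snowflake:
    """Time-ordered, unique, monotonic-per-server 64-bit IDs."""

    def __init__(self, server_id: int):
        assert 0 <= server_id < 1024  # 10 bits
        self.server_id = server_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence
                if self.seq == 0:                  # exhausted this millisecond
                    while now <= self.last_ms:     # spin to the next ms
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            return (now << 22) | (self.server_id << 12) | self.seq

gen = Snowflake(server_id=42)
a, b = gen.next_id(), gen.next_id()
assert a < b  # sortable, so clients can order by ID within a conversation
```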

Exactly-once delivery:

  • Client assigns a client_message_id (UUID) to each message before sending
  • Server stores this ID in a dedup cache (Redis, TTL 24h)
  • If the same client_message_id arrives again (network retry), server returns the existing message_id without re-processing
  • On the receiving side: client tracks the last received message_id per conversation. Server only sends messages with higher IDs.

Delivery guarantees (at-least-once → exactly-once):

  1. Sender → Server: acknowledged when written to Cassandra + Kafka (persistent)
  2. Server → Recipient: push via WebSocket, wait for client ACK
  3. No ACK within 30 seconds → retry push
  4. Recipient still offline → message stays in Redis undelivered queue → delivered on next connect
  5. Client-side dedup by message_id prevents displaying duplicates
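The receive side of this scheme in miniature: client-side dedup by message_id makes the server's re-pushes after a missed ACK safe, which is what turns at-least-once transport into exactly-once display (class and method names are illustrative):

```python
# Client-side inbox: tracks seen message_ids so server retries never
# produce duplicate messages on screen.
class ClientInbox:
    def __init__(self):
        self.seen: set[str] = set()
        self.displayed: list[str] = []

    def on_push(self, message_id: str, content: str) -> str:
        if message_id in self.seen:
            return "ack"          # duplicate: re-ack, do not re-display
        self.seen.add(message_id)
        self.displayed.append(content)
        return "ack"

inbox = ClientInbox()
inbox.on_push("m_456", "Hi!")
inbox.on_push("m_456", "Hi!")      # server retried after a missed ACK
assert inbox.displayed == ["Hi!"]  # displayed exactly once
```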

Deep Dive 2: Handling 150M Concurrent WebSocket Connections

Problem: Each WebSocket connection is a persistent TCP connection consuming memory (40-50KB per connection for buffers). 150M connections × 50KB = 7.5TB of memory just for connection state.

Architecture:

  • Fleet of Chat Servers: each handles ~100K connections at ~100KB each (kernel buffers plus application state) ≈ 10GB RAM per server
  • Need: 150M / 100K = 1,500 Chat Servers
  • Deployed across multiple availability zones and regions

Connection routing:

  • When a message needs to be delivered to user X, the Routing Service needs to know: which Chat Server is user X connected to?
  • Redis stores the mapping: conn:{user_id} → chat-server-42
  • When user connects: SETEX in Redis with TTL
  • When user disconnects: DEL from Redis
  • Heartbeat every 30 seconds to detect dead connections

Sticky sessions:

  • L4 load balancer (not L7) for WebSocket connections
  • Once connected, all traffic for that TCP connection goes to the same Chat Server
  • If a Chat Server dies, clients reconnect (they’re designed for this — mobile networks drop connections constantly)

Multi-region:

  • Chat Servers deployed in each region (US, EU, Asia, etc.)
  • Messages between users in different regions: Kafka cross-region replication (MirrorMaker)
  • Latency: intra-region < 50ms, cross-region < 200ms

Deep Dive 3: Group Messaging

Problem: A group with 256 members means each message must be delivered to 255 other users. A busy group with 100 messages/min → 25,500 deliveries/min for one group.

Fan-out approach:

  1. Sender sends message to Chat Service
  2. Chat Service writes to Cassandra (one write — message is stored once per conversation, not per recipient)
  3. Publishes to Kafka: topic group_messages, keyed by conversation_id
  4. Routing Service reads participant list (cached in Redis)
  5. For each participant: check online status → push or notify

Optimization — Group message batching:

  • Don’t fan out to all 255 members individually in the Routing Service
  • Instead, Chat Servers subscribe to group topics
  • Each Chat Server checks: “do I have any connected users who are in this group?”
  • If yes, push to those users. If no, skip entirely.
  • This reduces cross-server traffic dramatically because each Chat Server only needs to handle its own connected users.
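The per-server filter is just a set intersection between a server's connected users and the group's membership; a sketch with illustrative data shapes:

```python
# Each Chat Server, on receiving a group message from the topic, pushes
# only to its own connected users who are group members.
def deliver_on_server(connected_users: set[str], group_members: set[str],
                      sender: str, message: str) -> list[str]:
    targets = (connected_users & group_members) - {sender}
    return [f"push:{user}:{message}" for user in sorted(targets)]

members = {"u_1", "u_2", "u_5", "u_9"}
# Server A holds the sender u_1 plus u_2; server B holds u_5; u_9 is offline.
server_a = deliver_on_server({"u_1", "u_2", "u_3"}, members, "u_1", "hello group")
server_b = deliver_on_server({"u_5", "u_7"}, members, "u_1", "hello group")
assert server_a == ["push:u_2:hello group"]
assert server_b == ["push:u_5:hello group"]
```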

Unread counts:

  • Maintain last_read_message_id per user per conversation
  • Unread count = SELECT count(*) FROM messages WHERE conversation_id = X AND message_id > last_read_message_id
  • This maps to a range scan on the clustering key in Cassandra, cheap for small ranges; cap the displayed count (e.g. "99+") so long-idle conversations don't force large scans
  • Cache frequently accessed unread counts in Redis
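A sketch of the unread computation, with a sorted in-memory id list standing in for the Cassandra partition and bisect mimicking the clustering-key range scan:

```python
import bisect

# message_ids for one conversation, ascending (Snowflake IDs are
# time-ordered, so this mirrors the partition's clustering order).
def unread_count(message_ids: list[int], last_read: int) -> int:
    """Count messages with message_id > last_read_message_id."""
    return len(message_ids) - bisect.bisect_right(message_ids, last_read)

ids = [101, 102, 105, 107, 110]
assert unread_count(ids, last_read=105) == 2  # 107 and 110 unread
assert unread_count(ids, last_read=0) == 5    # nothing read yet
```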

7. Extensions (2 min)

  • End-to-end encryption (E2EE): Signal Protocol for 1-on-1 (X3DH key agreement plus the Double Ratchet). For groups: Sender Keys protocol. Server stores only ciphertext — cannot read messages. Key challenge: multi-device sync (each device has its own key pair).
  • Media messages: Images, videos, voice notes → upload to S3 first, send a message with the S3 URL. Thumbnail generation for inline preview.
  • Message search: Client-side search for E2EE (server can’t index encrypted content). For non-E2EE: Elasticsearch index on message content.
  • Voice/Video calls: WebRTC for peer-to-peer. TURN/STUN servers for NAT traversal. SFU (Selective Forwarding Unit) for group calls.
  • Message reactions: Lightweight — just an update to the message record. Fan-out same as regular messages.