1. Requirements & Scope (5 min)

Functional Requirements

  1. Users can post tweets (280 chars, text only for scope)
  2. Users can follow/unfollow other users
  3. Users can view their home timeline (tweets from followed users)
  4. Users can like and retweet
  5. Users can search for tweets

Non-Functional Requirements

  • Availability: 99.99% — timeline is the core product
  • Latency: Timeline load < 200ms at p99, tweet post < 500ms
  • Consistency: Eventual consistency for the timeline (a few seconds stale is fine). Read-your-writes consistency for the author (post → refresh → see your own tweet).
  • Scale: 500M DAU, 200M tweets/day. Average user follows 200 people and loads timeline 10 times/day.

2. Estimation (3 min)

Writes

  • 200M tweets/day ÷ 100K = 2,000 tweets/sec, peak 10,000/sec
  • 1B likes/day → 10,000 likes/sec

Reads (Timeline)

  • 500M DAU × 10 loads/day = 5B loads/day ÷ 100K = 50,000 timeline reads/sec
  • Peak: 250,000/sec
  • Read-to-write ratio: 25:1 (timeline reads vs tweet writes)

Storage

  • Tweet: 280 chars × 2 bytes + metadata ~300 bytes = ~860 bytes → round to 1KB
  • 200M/day × 1KB = 200GB/day → 73TB/year for tweets alone
  • User metadata, follows, likes: additional ~20TB/year

Fan-out calculation

  • Average user has 200 followers
  • 200M tweets/day × 200 followers = 40B fan-out writes/day
  • Celebrity with 50M followers posting → 50M cache writes for one tweet
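The estimates above can be sanity-checked with a few lines of arithmetic (rounding a day to 100K seconds, as the estimates do):

```python
SECONDS_PER_DAY = 100_000  # rounded from 86,400 for mental math

tweets_per_day = 200_000_000
dau = 500_000_000
loads_per_user = 10
avg_followers = 200
bytes_per_tweet = 1_000  # ~860 bytes rounded up to 1KB

tweet_writes_per_sec = tweets_per_day // SECONDS_PER_DAY            # 2,000
timeline_reads_per_sec = dau * loads_per_user // SECONDS_PER_DAY    # 50,000
fanout_writes_per_day = tweets_per_day * avg_followers              # 40 billion
storage_per_day_gb = tweets_per_day * bytes_per_tweet / 1e9         # ~200 GB
```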

3. API Design (3 min)

POST /api/v1/tweets
  Body: { "content": "Hello world!", "reply_to": "t_abc" }  // reply_to optional
  Response 201: { "tweet_id": "t_xyz", "created_at": "..." }

GET /api/v1/timeline?cursor={cursor}&limit=20
  Response 200: {
    "tweets": [
      {
        "tweet_id": "t_xyz",
        "user": { "id": "u_1", "username": "chirag", "display_name": "Chirag", "avatar": "..." },
        "content": "Hello world!",
        "like_count": 42,
        "retweet_count": 7,
        "reply_count": 3,
        "liked_by_me": false,
        "retweeted_by_me": false,
        "created_at": "2026-02-22T18:00:00Z"
      }
    ],
    "next_cursor": "1708632000_t_abc"
  }

POST /api/v1/tweets/{tweet_id}/like
DELETE /api/v1/tweets/{tweet_id}/like

POST /api/v1/tweets/{tweet_id}/retweet
DELETE /api/v1/tweets/{tweet_id}/retweet

POST /api/v1/users/{user_id}/follow
DELETE /api/v1/users/{user_id}/follow

GET /api/v1/search?q={query}&cursor={cursor}
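The next_cursor in the sample timeline response looks like {created_at}_{tweet_id} (e.g. "1708632000_t_abc"). A minimal sketch of the cursor helpers, assuming that format:

```python
# Hypothetical cursor helpers: the cursor packs the last returned item's
# created_at (unix seconds) and tweet_id so the next page can resume after it.

def encode_cursor(created_at: int, tweet_id: str) -> str:
    return f"{created_at}_{tweet_id}"

def decode_cursor(cursor: str) -> tuple[int, str]:
    ts, tweet_id = cursor.split("_", 1)  # split once: tweet_ids contain "_"
    return int(ts), tweet_id
```

Splitting on the first underscore only matters because tweet IDs like "t_abc" themselves contain underscores.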

4. Data Model (3 min)

User & Social Graph (PostgreSQL, sharded by user_id)

Table: users
  user_id       (PK) | bigint
  username             | varchar(15), unique
  display_name         | varchar(50)
  bio                  | varchar(160)
  follower_count       | int (denormalized)
  following_count      | int (denormalized)

Table: follows
  follower_id          | bigint
  followee_id          | bigint
  created_at           | timestamptz
  PK: (follower_id, followee_id)
  Index: (followee_id)  -- for "who follows me"

Tweets (Cassandra or DynamoDB)

Table: tweets
  tweet_id      (PK) | bigint (Snowflake ID)
  user_id              | bigint
  content              | text
  reply_to             | bigint, nullable
  like_count           | int (denormalized, eventually consistent)
  retweet_count        | int
  created_at           | timestamp
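Since tweet_id is a Snowflake ID, it sorts by creation time for free. A sketch of the ID layout, following Twitter's published scheme (41-bit millisecond timestamp since a custom epoch, 10-bit worker ID, 12-bit sequence):

```python
EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch (Nov 2010)

def make_snowflake(ts_ms: int, worker_id: int, sequence: int) -> int:
    # [41 bits timestamp][10 bits worker][12 bits sequence]
    return ((ts_ms - EPOCH_MS) << 22) | (worker_id << 12) | sequence

def snowflake_timestamp_ms(snowflake_id: int) -> int:
    return (snowflake_id >> 22) + EPOCH_MS
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts tweets chronologically, which is what the timeline cache relies on.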

Timeline Cache (Redis)

Key: timeline:{user_id}
Type: Sorted Set
Members: tweet_ids, scored by created_at timestamp
Max size: 800 entries (older tweets evicted)
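The sorted-set semantics (ZADD, then trim to the newest 800) can be modeled in memory; a real deployment would issue the equivalent Redis commands against timeline:{user_id}:

```python
# In-memory model of the timeline sorted set: ZADD + ZREMRANGEBYRANK trim.

MAX_TIMELINE = 800

def zadd_trim(timeline: list[tuple[float, str]], score: float, tweet_id: str) -> None:
    timeline.append((score, tweet_id))
    timeline.sort()                # ascending by score (created_at timestamp)
    del timeline[:-MAX_TIMELINE]   # drop oldest entries beyond the cap

tl: list[tuple[float, str]] = []
for i in range(1000):
    zadd_trim(tl, float(i), f"t_{i}")
# tl now holds only the 800 newest entries: t_200 .. t_999
```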

Likes (Cassandra)

Table: likes
  tweet_id  (partition key) | bigint
  user_id   (clustering key) | bigint
  created_at                | timestamp

Table: user_likes  (reverse index for "tweets I liked")
  user_id   (partition key) | bigint
  tweet_id  (clustering key) | bigint

5. High-Level Design (12 min)

Tweet Post Path

Client → Load Balancer → Tweet Service
  → Validate (length, content policy)
  → Write to Cassandra (tweets table)
  → Return success to client (fast ack)
  → Publish to Kafka: tweet_events topic
  → Fan-out Service consumes from Kafka:
    → Fetch follower list for the author
    → For each follower:
      → ZADD timeline:{follower_id} {timestamp} {tweet_id} in Redis
      → ZREMRANGEBYRANK to keep max 800 entries
    → For celebrities (> 10K followers): skip fan-out (pull model)
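The consumer's core loop can be sketched as follows (names hypothetical; a real worker consumes tweet_events from Kafka and writes to Redis, here the graph and timelines are plain dicts):

```python
CELEBRITY_THRESHOLD = 10_000

def fan_out(tweet_id: str, author_id: str, ts: float,
            followers: dict[str, list[str]],
            timelines: dict[str, list[tuple[float, str]]]) -> int:
    """Push one tweet into follower timelines; returns number of writes."""
    flw = followers.get(author_id, [])
    if len(flw) > CELEBRITY_THRESHOLD:
        return 0  # celebrity: pull model, no fan-out writes
    for follower_id in flw:
        # Equivalent of ZADD timeline:{follower_id} {ts} {tweet_id}
        timelines.setdefault(follower_id, []).append((ts, tweet_id))
    return len(flw)
```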

Timeline Read Path

Client → Load Balancer → Timeline Service
  → Read timeline:{user_id} from Redis
    → ZREVRANGE with cursor-based pagination
  → For each tweet_id, batch fetch tweet data:
    → Redis cache first (hot tweets)
    → Cassandra for cache misses
  → Merge with celebrity tweets (pull):
    → Get list of celebrities the user follows
    → Fetch their recent tweets
    → Merge + sort by timestamp
  → Hydrate: add user info, liked_by_me, retweeted_by_me
  → Return assembled timeline
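The merge step above, sketched under assumed shapes (each input list already sorted newest-first, as Redis ZREVRANGE and the celebrity fetches would return them):

```python
import heapq

def assemble(precomputed: list[tuple[float, str]],
             celebrity_lists: list[list[tuple[float, str]]],
             limit: int = 20) -> list[str]:
    """K-way merge of (timestamp, tweet_id) lists, newest first."""
    merged = heapq.merge(precomputed, *celebrity_lists,
                         key=lambda entry: entry[0], reverse=True)
    return [tweet_id for _, tweet_id in list(merged)[:limit]]
```

Since the average user follows only a handful of celebrities, this is a merge of ~6 short, already-sorted lists, which is why the hybrid model's read cost stays low.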

Search Path

Client → Search Service → Elasticsearch
  → Full-text search on tweet content
  → Filter by time range, user, engagement metrics
  → Return results ranked by relevance + recency

Components

  1. Tweet Service: Write path — validates and stores tweets
  2. Timeline Service: Read path — assembles personalized timelines
  3. Fan-out Service: Async — distributes tweets to followers’ cached timelines
  4. Search Service: Elasticsearch-backed tweet search
  5. Cassandra: Tweet storage, likes, retweets
  6. PostgreSQL: User profiles, social graph (follows)
  7. Redis Cluster: Timeline caches, hot tweet cache, user data cache
  8. Kafka: Event stream — tweet events, like events, follow events
  9. CDN: Cache user avatars, media attachments

6. Deep Dives (15 min)

Deep Dive 1: Fan-out Strategy (The Celebrity Problem)

This is THE defining problem of Twitter’s architecture.

Push model (fan-out-on-write):

  • When user posts, immediately write to all followers’ timeline caches
  • Advantage: Timeline read is O(1) — just read from Redis
  • Disadvantage: If user has 50M followers → 50M writes for one tweet
  • At 1 tweet/minute from a celebrity → 50M writes/minute → unsustainable

Pull model (fan-out-on-read):

  • Don’t pre-compute timelines. When user loads timeline, fetch recent tweets from all followed users, merge, sort, return.
  • Advantage: No fan-out cost on write
  • Disadvantage: Timeline read is O(N) where N = number of followed users. If user follows 500 people → 500 queries to assemble one timeline. At 50K timeline reads/sec → 25M queries/sec.

Hybrid model (what Twitter actually does):

  • Threshold: 10K followers
  • Below 10K: push model (fan-out-on-write to followers’ Redis timelines)
  • Above 10K: pull model (fetched at read time and merged)
  • Average user follows ~5 celebrities → at read time, merge 5 celebrity tweet lists with pre-computed timeline
  • This is ~5 additional Redis reads, negligible vs the push cost

Fan-out service scaling:

  • 200M tweets/day with avg 200 followers = 40B fan-out writes/day
  • Minus celebrity tweets (pull model) → maybe 30B actual writes/day
  • 30B / 100K sec/day = 300K Redis writes/sec for fan-out alone
  • Redis cluster with 100+ nodes handles this comfortably

Deep Dive 2: Timeline Ranking

Simple reverse-chronological timeline is fine for v1, but a ranked timeline significantly improves engagement.

Ranking approach:

  1. Candidate generation: Pull 800 candidate tweets from the Redis timeline + celebrity tweets
  2. Feature extraction: For each candidate, extract features:
    • Recency (exponential decay)
    • Author-viewer engagement: how often has this user liked/retweeted this author’s tweets?
    • Tweet popularity: like_count, retweet_count, reply_count (normalized)
    • Content type: replies vs original tweets vs retweets
    • Social proof: “Chirag liked this” (engagement by accounts the viewer follows)
  3. Scoring: Lightweight ML model (logistic regression or small neural net) predicts P(engagement) for each candidate
  4. Ranking + diversity: Sort by score, then apply diversity rules (don’t show 5 tweets from same author in a row)
  5. Return top 20
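A toy version of steps 3–4 (weights and the diversity cap are illustrative, not Twitter's actual model):

```python
import math

def score(tweet: dict, now: float, half_life_s: float = 6 * 3600) -> float:
    """Recency decay times a log-dampened engagement signal."""
    age = now - tweet["created_at"]
    recency = math.exp(-age * math.log(2) / half_life_s)
    engagement = math.log1p(tweet["like_count"] + 2 * tweet["retweet_count"])
    return recency * (1 + engagement)

def rank(candidates: list[dict], now: float, top_k: int = 20,
         max_run: int = 2) -> list[dict]:
    ordered = sorted(candidates, key=lambda t: score(t, now), reverse=True)
    out, run_author, run = [], None, 0
    for t in ordered:
        if t["user_id"] == run_author:
            run += 1
            if run > max_run:   # diversity rule: cap runs from one author
                continue
        else:
            run_author, run = t["user_id"], 1
        out.append(t)
        if len(out) == top_k:
            break
    return out
```

In production the score function would be the ML model's predicted P(engagement); the surrounding candidate-sort-diversify loop stays the same shape.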

Performance budget: All of this must complete in < 100ms (within the 200ms p99 target):

  • Redis reads: 10ms
  • Feature extraction: 20ms (pre-computed features in Redis)
  • ML scoring: 30ms (serving 800 candidates through a model)
  • Sorting + selection: 5ms

Infrastructure: Feature store in Redis (user-user engagement scores, tweet popularity, etc.) updated by streaming pipeline from Kafka.

Deep Dive 3: Search Architecture

Elasticsearch setup:

  • Index: tweets with fields: content (full-text), user_id, created_at, engagement_score
  • Sharding: by time range (monthly indexes). This lets us efficiently query “recent tweets” and drop old indexes.
  • Replication: 2 replicas for read throughput

Indexing pipeline:

  • Kafka (tweet_events) → Search Indexer → Elasticsearch
  • Near-real-time: tweets are searchable within ~5 seconds of posting
  • Indexer handles backpressure: if ES is slow, Kafka buffers

Search ranking:

  • Default: relevance (BM25) × recency boost × engagement score
  • Engagement score = normalized(likes + 2×retweets + 3×replies) — weighted toward discussion
  • Personalization: boost results from users you follow or have engaged with
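The engagement score above, with log normalization as one plausible choice for squashing heavy-tailed counts:

```python
import math

def engagement_score(likes: int, retweets: int, replies: int) -> float:
    raw = likes + 2 * retweets + 3 * replies   # weighted toward discussion
    return math.log1p(raw)                     # normalize heavy-tailed counts
```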

Trending topics:

  • Streaming pipeline: Kafka → Flink/Spark Streaming
  • Count hashtag mentions in sliding 1-hour windows
  • Compute velocity (acceleration of mentions, not just volume)
  • Geographic segmentation: trending in India vs trending globally
  • Cache top 50 trending topics in Redis, refresh every 5 minutes
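A toy sliding-window counter illustrating the velocity idea (the 10-minute "recent" split is an illustrative choice; production would run this as stateful Flink/Spark operators):

```python
from collections import defaultdict, deque

WINDOW_S = 3600   # 1-hour sliding window
RECENT_S = 600    # "recent" slice used to measure acceleration

class TrendCounter:
    def __init__(self) -> None:
        self.events: dict[str, deque] = defaultdict(deque)

    def add(self, tag: str, ts: float) -> None:
        q = self.events[tag]
        q.append(ts)
        while q and q[0] < ts - WINDOW_S:
            q.popleft()  # evict mentions older than the window

    def velocity(self, tag: str, now: float) -> float:
        """Recent mentions relative to older ones: acceleration, not volume."""
        q = self.events[tag]
        recent = sum(1 for t in q if t >= now - RECENT_S)
        older = len(q) - recent
        return recent / (older + 1)
```

A hashtag with steady mentions scores low; one whose mentions cluster in the last few minutes scores high, which is the "acceleration, not just volume" signal.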

7. Extensions (2 min)

  • Notifications: Fan-out service also publishes to notification pipeline. Types: likes, retweets, follows, mentions. Batch low-priority notifications (e.g., “5 people liked your tweet”).
  • Media support: Images/videos uploaded to S3, transcoded, served via CDN. Tweets store media URLs as metadata. Video requires HLS transcoding pipeline.
  • Thread support: Tweets linked via reply_to forming a tree. Thread view requires recursive fetch or denormalized thread table.
  • Spaces (live audio): WebRTC-based, SFU for multi-speaker. Completely separate infrastructure.
  • DMs: Separate messaging system (see Messenger/WhatsApp design).
  • Ads: Inject sponsored tweets into timeline at specific positions. Separate ad auction + targeting pipeline.