1. Requirements & Scope (5 min)

Functional Requirements

  1. Users can post tweets (280 chars, text only for scope)
  2. Users can follow/unfollow other users
  3. Users can view their home timeline (tweets from followed users)
  4. Users can like and retweet
  5. Users can search for tweets

Non-Functional Requirements

  • Availability: 99.99% — timeline is the core product
  • Latency: Timeline load < 200ms at p99, tweet post < 500ms
  • Consistency: Eventual consistency for the timeline (a few seconds stale is fine). Read-your-writes consistency for the author (post → refresh → see your own tweet).
  • Scale: 500M DAU, 200M tweets/day. Average user follows 200 people and loads timeline 10 times/day.

2. Estimation (3 min)

Writes

  • 200M tweets/day ÷ 100K = 2,000 tweets/sec, peak 10,000/sec
  • 1B likes/day → 10,000 likes/sec

Reads (Timeline)

  • 500M DAU × 10 loads/day = 5B loads/day ÷ 100K = 50,000 timeline reads/sec
  • Peak: 250,000/sec
  • Read-to-write ratio: 25:1 (timeline reads vs tweet writes)

Storage

  • Tweet: 280 chars × 2 bytes + metadata ~300 bytes = ~860 bytes → round to 1KB
  • 200M/day × 1KB = 200GB/day → 73TB/year for tweets alone
  • User metadata, follows, likes: additional ~20TB/year

Fan-out calculation

  • Average user has 200 followers
  • 200M tweets/day × 200 followers = 40B fan-out writes/day
  • Celebrity with 50M followers posting → 50M cache writes for one tweet
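The estimates above can be sanity-checked with a few lines of arithmetic (rounding a day to 100K seconds, as the estimates do):

```python
SECONDS_PER_DAY = 100_000  # rounded from 86,400 for mental math

tweets_per_day = 200_000_000
dau = 500_000_000
loads_per_user = 10
avg_followers = 200
bytes_per_tweet = 1_000  # ~860 bytes rounded up to 1KB

tweet_writes_per_sec = tweets_per_day // SECONDS_PER_DAY            # 2,000
timeline_reads_per_sec = dau * loads_per_user // SECONDS_PER_DAY    # 50,000
fanout_writes_per_day = tweets_per_day * avg_followers              # 40 billion
storage_per_day_gb = tweets_per_day * bytes_per_tweet / 1e9         # ~200 GB
```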

3. API Design (3 min)

POST /api/v1/tweets
  Body: { "content": "Hello world!", "reply_to": "t_abc" }  // reply_to optional
  Response 201: { "tweet_id": "t_xyz", "created_at": "..." }

GET /api/v1/timeline?cursor={cursor}&limit=20
  Response 200: {
    "tweets": [
      {
        "tweet_id": "t_xyz",
        "user": { "id": "u_1", "username": "chirag", "display_name": "Chirag", "avatar": "..." },
        "content": "Hello world!",
        "like_count": 42,
        "retweet_count": 7,
        "reply_count": 3,
        "liked_by_me": false,
        "retweeted_by_me": false,
        "created_at": "2026-02-22T18:00:00Z"
      }
    ],
    "next_cursor": "1708632000_t_abc"
  }

POST /api/v1/tweets/{tweet_id}/like
DELETE /api/v1/tweets/{tweet_id}/like

POST /api/v1/tweets/{tweet_id}/retweet
DELETE /api/v1/tweets/{tweet_id}/retweet

POST /api/v1/users/{user_id}/follow
DELETE /api/v1/users/{user_id}/follow

GET /api/v1/search?q={query}&cursor={cursor}
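The next_cursor in the sample timeline response looks like {created_at}_{tweet_id} (e.g. "1708632000_t_abc"). A minimal sketch of the cursor helpers, assuming that format:

```python
# Hypothetical cursor helpers: the cursor packs the last returned item's
# created_at (unix seconds) and tweet_id so the next page can resume after it.

def encode_cursor(created_at: int, tweet_id: str) -> str:
    return f"{created_at}_{tweet_id}"

def decode_cursor(cursor: str) -> tuple[int, str]:
    ts, tweet_id = cursor.split("_", 1)  # split once: tweet_ids contain "_"
    return int(ts), tweet_id
```

Splitting on the first underscore only matters because tweet IDs like "t_abc" themselves contain underscores.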

4. Data Model (3 min)

User & Social Graph (PostgreSQL, sharded by user_id)

Table: users
  user_id       (PK) | bigint
  username             | varchar(15), unique
  display_name         | varchar(50)
  bio                  | varchar(160)
  follower_count       | int (denormalized)
  following_count      | int (denormalized)

Table: follows
  follower_id          | bigint
  followee_id          | bigint
  created_at           | timestamptz
  PK: (follower_id, followee_id)
  Index: (followee_id)  -- for "who follows me"

Tweets (Cassandra or DynamoDB)

Table: tweets
  tweet_id      (PK) | bigint (Snowflake ID)
  user_id              | bigint
  content              | text
  reply_to             | bigint, nullable
  like_count           | int (denormalized, eventually consistent)
  retweet_count        | int
  created_at           | timestamp
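Since tweet_id is a Snowflake ID, it sorts by creation time for free. A sketch of the ID layout, following Twitter's published scheme (41-bit millisecond timestamp since a custom epoch, 10-bit worker ID, 12-bit sequence):

```python
EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch (Nov 2010)

def make_snowflake(ts_ms: int, worker_id: int, sequence: int) -> int:
    # [41 bits timestamp][10 bits worker][12 bits sequence]
    return ((ts_ms - EPOCH_MS) << 22) | (worker_id << 12) | sequence

def snowflake_timestamp_ms(snowflake_id: int) -> int:
    return (snowflake_id >> 22) + EPOCH_MS
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts tweets chronologically, which is what the timeline cache relies on.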

Timeline Cache (Redis)

Key: timeline:{user_id}
Type: Sorted Set
Members: tweet_ids, scored by created_at timestamp
Max size: 800 entries (older tweets evicted)
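The sorted-set semantics (ZADD, then trim to the newest 800) can be modeled in memory; a real deployment would issue the equivalent Redis commands against timeline:{user_id}:

```python
# In-memory model of the timeline sorted set: ZADD + ZREMRANGEBYRANK trim.

MAX_TIMELINE = 800

def zadd_trim(timeline: list[tuple[float, str]], score: float, tweet_id: str) -> None:
    timeline.append((score, tweet_id))
    timeline.sort()                # ascending by score (created_at timestamp)
    del timeline[:-MAX_TIMELINE]   # drop oldest entries beyond the cap

tl: list[tuple[float, str]] = []
for i in range(1000):
    zadd_trim(tl, float(i), f"t_{i}")
# tl now holds only the 800 newest entries: t_200 .. t_999
```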

Likes (Cassandra)

Table: likes
  tweet_id  (partition key) | bigint
  user_id   (clustering key) | bigint
  created_at                | timestamp

Table: user_likes  (reverse index for "tweets I liked")
  user_id   (partition key) | bigint
  tweet_id  (clustering key) | bigint

5. High-Level Design (12 min)

Tweet Post Path

Client → Load Balancer → Tweet Service
  → Validate (length, content policy)
  → Write to Cassandra (tweets table)
  → Return success to client (fast ack)
  → Publish to Kafka: tweet_events topic
  → Fan-out Service consumes from Kafka:
    → Fetch follower list for the author
    → For each follower:
      → ZADD timeline:{follower_id} {timestamp} {tweet_id} in Redis
      → ZREMRANGEBYRANK to keep max 800 entries
    → For celebrities (> 10K followers): skip fan-out (pull model)
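The consumer's core loop can be sketched as follows (names hypothetical; a real worker consumes tweet_events from Kafka and writes to Redis, here the graph and timelines are plain dicts):

```python
CELEBRITY_THRESHOLD = 10_000

def fan_out(tweet_id: str, author_id: str, ts: float,
            followers: dict[str, list[str]],
            timelines: dict[str, list[tuple[float, str]]]) -> int:
    """Push one tweet into follower timelines; returns number of writes."""
    flw = followers.get(author_id, [])
    if len(flw) > CELEBRITY_THRESHOLD:
        return 0  # celebrity: pull model, no fan-out writes
    for follower_id in flw:
        # Equivalent of ZADD timeline:{follower_id} {ts} {tweet_id}
        timelines.setdefault(follower_id, []).append((ts, tweet_id))
    return len(flw)
```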

Timeline Read Path

Client → Load Balancer → Timeline Service
  → Read timeline:{user_id} from Redis
    → ZREVRANGE with cursor-based pagination
  → For each tweet_id, batch fetch tweet data:
    → Redis cache first (hot tweets)
    → Cassandra for cache misses
  → Merge with celebrity tweets (pull):
    → Get list of celebrities the user follows
    → Fetch their recent tweets
    → Merge + sort by timestamp
  → Hydrate: add user info, liked_by_me, retweeted_by_me
  → Return assembled timeline
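The merge step above, sketched under assumed shapes (each input list already sorted newest-first, as Redis ZREVRANGE and the celebrity fetches would return them):

```python
import heapq

def assemble(precomputed: list[tuple[float, str]],
             celebrity_lists: list[list[tuple[float, str]]],
             limit: int = 20) -> list[str]:
    """K-way merge of (timestamp, tweet_id) lists, newest first."""
    merged = heapq.merge(precomputed, *celebrity_lists,
                         key=lambda entry: entry[0], reverse=True)
    return [tweet_id for _, tweet_id in list(merged)[:limit]]
```

Since the average user follows only a handful of celebrities, this is a merge of ~6 short, already-sorted lists, which is why the hybrid model's read cost stays low.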

Search Path

Client → Search Service → Elasticsearch
  → Full-text search on tweet content
  → Filter by time range, user, engagement metrics
  → Return results ranked by relevance + recency

Components

  1. Tweet Service: Write path — validates and stores tweets
  2. Timeline Service: Read path — assembles personalized timelines
  3. Fan-out Service: Async — distributes tweets to followers’ cached timelines
  4. Search Service: Elasticsearch-backed tweet search
  5. Cassandra: Tweet storage, likes, retweets
  6. PostgreSQL: User profiles, social graph (follows)
  7. Redis Cluster: Timeline caches, hot tweet cache, user data cache
  8. Kafka: Event stream — tweet events, like events, follow events
  9. CDN: Cache user avatars, media attachments

6. Deep Dives (15 min)

Deep Dive 1: Fan-out Strategy (The Celebrity Problem)

This is THE defining problem of Twitter’s architecture.

Push model (fan-out-on-write):

  • When user posts, immediately write to all followers’ timeline caches
  • Advantage: Timeline read is O(1) — just read from Redis
  • Disadvantage: If user has 50M followers → 50M writes for one tweet
  • At 1 tweet/minute from a celebrity → 50M writes/minute → unsustainable

Pull model (fan-out-on-read):

  • Don’t pre-compute timelines. When user loads timeline, fetch recent tweets from all followed users, merge, sort, return.
  • Advantage: No fan-out cost on write
  • Disadvantage: Timeline read is O(N) where N = number of followed users. If user follows 500 people → 500 queries to assemble one timeline. At 50K timeline reads/sec → 25M queries/sec.

Hybrid model (what Twitter actually does):

  • Threshold: 10K followers
  • Below 10K: push model (fan-out-on-write to followers’ Redis timelines)
  • Above 10K: pull model (fetched at read time and merged)
  • Average user follows ~5 celebrities → at read time, merge 5 celebrity tweet lists with pre-computed timeline
  • This is ~5 additional Redis reads, negligible vs the push cost

Fan-out service scaling:

  • 200M tweets/day with avg 200 followers = 40B fan-out writes/day
  • Minus celebrity tweets (pull model) → maybe 30B actual writes/day
  • 30B / 100K sec/day = 300K Redis writes/sec for fan-out alone
  • Redis cluster with 100+ nodes handles this comfortably

Deep Dive 2: Timeline Ranking

Simple reverse-chronological timeline is fine for v1, but a ranked timeline significantly improves engagement.

Ranking approach:

  1. Candidate generation: Pull 800 candidate tweets from the Redis timeline + celebrity tweets
  2. Feature extraction: For each candidate, extract features:
    • Recency (exponential decay)
    • Author-viewer engagement: how often has this user liked/retweeted this author’s tweets?
    • Tweet popularity: like_count, retweet_count, reply_count (normalized)
    • Content type: replies vs original tweets vs retweets
    • Social proof: “Chirag liked this” (engagement by accounts the viewer follows)
  3. Scoring: Lightweight ML model (logistic regression or small neural net) predicts P(engagement) for each candidate
  4. Ranking + diversity: Sort by score, then apply diversity rules (don’t show 5 tweets from same author in a row)
  5. Return top 20
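A toy version of steps 3–4 (weights and the diversity cap are illustrative, not Twitter's actual model):

```python
import math

def score(tweet: dict, now: float, half_life_s: float = 6 * 3600) -> float:
    """Recency decay times a log-dampened engagement signal."""
    age = now - tweet["created_at"]
    recency = math.exp(-age * math.log(2) / half_life_s)
    engagement = math.log1p(tweet["like_count"] + 2 * tweet["retweet_count"])
    return recency * (1 + engagement)

def rank(candidates: list[dict], now: float, top_k: int = 20,
         max_run: int = 2) -> list[dict]:
    ordered = sorted(candidates, key=lambda t: score(t, now), reverse=True)
    out, run_author, run = [], None, 0
    for t in ordered:
        if t["user_id"] == run_author:
            run += 1
            if run > max_run:   # diversity rule: cap runs from one author
                continue
        else:
            run_author, run = t["user_id"], 1
        out.append(t)
        if len(out) == top_k:
            break
    return out
```

In production the score function would be the ML model's predicted P(engagement); the surrounding candidate-sort-diversify loop stays the same shape.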

Performance budget: All of this must complete in < 100ms (within the 200ms p99 target):

  • Redis reads: 10ms
  • Feature extraction: 20ms (pre-computed features in Redis)
  • ML scoring: 30ms (serving 800 candidates through a model)
  • Sorting + selection: 5ms

Infrastructure: Feature store in Redis (user-user engagement scores, tweet popularity, etc.) updated by streaming pipeline from Kafka.

Deep Dive 3: Search Architecture

Elasticsearch setup:

  • Index: tweets with fields: content (full-text), user_id, created_at, engagement_score
  • Sharding: by time range (monthly indexes). This lets us efficiently query “recent tweets” and drop old indexes.
  • Replication: 2 replicas for read throughput

Indexing pipeline:

  • Kafka (tweet_events) → Search Indexer → Elasticsearch
  • Near-real-time: tweets are searchable within ~5 seconds of posting
  • Indexer handles backpressure: if ES is slow, Kafka buffers

Search ranking:

  • Default: relevance (BM25) × recency boost × engagement score
  • Engagement score = normalized(likes + 2×retweets + 3×replies) — weighted toward discussion
  • Personalization: boost results from users you follow or have engaged with
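The engagement score above, with log normalization as one plausible choice for squashing heavy-tailed counts:

```python
import math

def engagement_score(likes: int, retweets: int, replies: int) -> float:
    raw = likes + 2 * retweets + 3 * replies   # weighted toward discussion
    return math.log1p(raw)                     # normalize heavy-tailed counts
```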

Trending topics:

  • Streaming pipeline: Kafka → Flink/Spark Streaming
  • Count hashtag mentions in sliding 1-hour windows
  • Compute velocity (acceleration of mentions, not just volume)
  • Geographic segmentation: trending in India vs trending globally
  • Cache top 50 trending topics in Redis, refresh every 5 minutes
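A toy sliding-window counter illustrating the velocity idea (the 10-minute "recent" split is an illustrative choice; production would run this as stateful Flink/Spark operators):

```python
from collections import defaultdict, deque

WINDOW_S = 3600   # 1-hour sliding window
RECENT_S = 600    # "recent" slice used to measure acceleration

class TrendCounter:
    def __init__(self) -> None:
        self.events: dict[str, deque] = defaultdict(deque)

    def add(self, tag: str, ts: float) -> None:
        q = self.events[tag]
        q.append(ts)
        while q and q[0] < ts - WINDOW_S:
            q.popleft()  # evict mentions older than the window

    def velocity(self, tag: str, now: float) -> float:
        """Recent mentions relative to older ones: acceleration, not volume."""
        q = self.events[tag]
        recent = sum(1 for t in q if t >= now - RECENT_S)
        older = len(q) - recent
        return recent / (older + 1)
```

A hashtag with steady mentions scores low; one whose mentions cluster in the last few minutes scores high, which is the "acceleration, not just volume" signal.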

7. Extensions (2 min)

  • Notifications: Fan-out service also publishes to notification pipeline. Types: likes, retweets, follows, mentions. Batch low-priority notifications (e.g., “5 people liked your tweet”).
  • Media support: Images/videos uploaded to S3, transcoded, served via CDN. Tweets store media URLs as metadata. Video requires HLS transcoding pipeline.
  • Thread support: Tweets linked via reply_to forming a tree. Thread view requires recursive fetch or denormalized thread table.
  • Spaces (live audio): WebRTC-based, SFU for multi-speaker. Completely separate infrastructure.
  • DMs: Separate messaging system (see Messenger/WhatsApp design).
  • Ads: Inject sponsored tweets into timeline at specific positions. Separate ad auction + targeting pipeline.