1. Requirements & Scope (5 min)
Functional Requirements
- Users can post tweets (280 chars, text only for scope)
- Users can follow/unfollow other users
- Users can view their home timeline (tweets from followed users)
- Users can like and retweet
- Users can search for tweets
Non-Functional Requirements
- Availability: 99.99% — timeline is the core product
- Latency: Timeline load < 200ms at p99, tweet post < 500ms
- Consistency: Eventual consistency for timeline (a few seconds stale is fine). Read-your-writes consistency for the author's own tweets (post → refresh → see it).
- Scale: 500M DAU, 200M tweets/day. Average user follows 200 people and loads timeline 10 times/day.
2. Estimation (3 min)
Writes
- 200M tweets/day ÷ ~100K sec/day ≈ 2,000 tweets/sec; peak (5× average) ≈ 10,000/sec
- 1B likes/day → 10,000 likes/sec
Reads (Timeline)
- 500M DAU × 10 loads/day = 5B loads/day ÷ ~100K sec/day ≈ 50,000 timeline reads/sec
- Peak: 250,000/sec
- Read-to-write ratio: 25:1
Storage
- Tweet: 280 chars × 2 bytes + metadata ~300 bytes = ~860 bytes → round to 1KB
- 200M/day × 1KB = 200GB/day → 73TB/year for tweets alone
- User metadata, follows, likes: additional ~20TB/year
Fan-out calculation
- Average user follows 200 people, so the average account also has ~200 followers (in aggregate, the graph's in- and out-degree averages match)
- 200M tweets/day × 200 followers = 40B fan-out writes/day
- Celebrity with 50M followers posting → 50M cache writes for one tweet
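The arithmetic above can be restated in a few lines of Python, using the section's own inputs and the common ~100K-seconds-per-day rounding of 86,400:

```python
# Back-of-envelope numbers from the estimates above.
SECONDS_PER_DAY = 100_000  # common rounding of 86,400 for mental math

tweets_per_day = 200_000_000
likes_per_day = 1_000_000_000
dau = 500_000_000
loads_per_user_per_day = 10
avg_followers = 200

tweet_writes_per_sec = tweets_per_day // SECONDS_PER_DAY                   # 2,000
likes_per_sec = likes_per_day // SECONDS_PER_DAY                           # 10,000
timeline_reads_per_sec = dau * loads_per_user_per_day // SECONDS_PER_DAY   # 50,000
read_write_ratio = timeline_reads_per_sec // tweet_writes_per_sec          # 25:1
fanout_writes_per_day = tweets_per_day * avg_followers                     # 40B
storage_per_day_gb = tweets_per_day * 1_000 / 1e9                          # ~200 GB at ~1KB/tweet
```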
3. API Design (3 min)
POST /api/v1/tweets
Body: { "content": "Hello world!", "reply_to": "t_abc" } // reply_to optional
Response 201: { "tweet_id": "t_xyz", "created_at": "..." }
GET /api/v1/timeline?cursor={cursor}&limit=20
Response 200: {
"tweets": [
{
"tweet_id": "t_xyz",
"user": { "id": "u_1", "username": "chirag", "display_name": "Chirag", "avatar": "..." },
"content": "Hello world!",
"like_count": 42,
"retweet_count": 7,
"reply_count": 3,
"liked_by_me": false,
"retweeted_by_me": false,
"created_at": "2026-02-22T18:00:00Z"
}
],
"next_cursor": "1708632000_t_abc"
}
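The sample next_cursor value ("1708632000_t_abc") suggests a timestamp-plus-tweet-id cursor, so the next page can resume with a strict "older than" condition. A minimal sketch, assuming that format:

```python
# Opaque cursor sketch: epoch seconds of the last returned tweet plus its id.
# The exact encoding is an assumption based on the sample value above.

def encode_cursor(created_at_epoch: int, tweet_id: str) -> str:
    return f"{created_at_epoch}_{tweet_id}"

def decode_cursor(cursor: str) -> tuple[int, str]:
    ts, tweet_id = cursor.split("_", 1)  # split once: tweet ids may contain "_"
    return int(ts), tweet_id

cursor = encode_cursor(1708632000, "t_abc")
assert cursor == "1708632000_t_abc"
assert decode_cursor(cursor) == (1708632000, "t_abc")
```

Tying the cursor to (timestamp, id) rather than an offset keeps pagination stable while new tweets arrive at the head of the timeline.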
POST /api/v1/tweets/{tweet_id}/like
DELETE /api/v1/tweets/{tweet_id}/like
POST /api/v1/tweets/{tweet_id}/retweet
DELETE /api/v1/tweets/{tweet_id}/retweet
POST /api/v1/users/{user_id}/follow
DELETE /api/v1/users/{user_id}/follow
GET /api/v1/search?q={query}&cursor={cursor}
4. Data Model (3 min)
User & Social Graph (PostgreSQL, sharded by user_id)
Table: users
user_id (PK) | bigint
username | varchar(15), unique
display_name | varchar(50)
bio | varchar(160)
follower_count | int (denormalized)
following_count | int (denormalized)
Table: follows
follower_id | bigint
followee_id | bigint
created_at | timestamptz
PK: (follower_id, followee_id)
Index: (followee_id) -- for "who follows me"
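The follows table and its reverse index can be exercised with SQLite standing in for the sharded PostgreSQL (the DDL below is illustrative, not the production schema):

```python
import sqlite3

# Minimal sketch of the follows table: composite PK for "one edge per pair",
# plus the reverse index that answers "who follows me" without a full scan.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE follows (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    created_at  TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX idx_follows_followee ON follows (followee_id);
""")
db.execute("INSERT INTO follows (follower_id, followee_id) VALUES (1, 2), (3, 2)")
followers_of_2 = sorted(r[0] for r in db.execute(
    "SELECT follower_id FROM follows WHERE followee_id = ?", (2,)))
print(followers_of_2)  # [1, 3]
```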
Tweets (Cassandra or DynamoDB)
Table: tweets
tweet_id (PK) | bigint (Snowflake ID)
user_id | bigint
content | text
reply_to | bigint, nullable
like_count | int (denormalized, eventually consistent)
retweet_count | int
created_at | timestamp
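Because tweet_id is a Snowflake ID, ids are time-sortable, which is what lets the timeline cache score by timestamp and storage sort by id. A sketch of a Snowflake-style generator, assuming the classic 41/10/12-bit layout (millisecond timestamp / worker id / sequence):

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657  # Twitter's published custom epoch (2010-11-04)

class Snowflake:
    """Sketch of a Snowflake-style id generator (assumed 41/10/12 layout)."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # 10 bits of worker id
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # 4096 ids per ms per worker
                if self.seq == 0:                  # sequence exhausted: spin to next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq

gen = Snowflake(worker_id=7)
a, b = gen.next_id(), gen.next_id()
assert a < b  # ids are monotonically increasing, hence time-sortable
```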
Timeline Cache (Redis)
Key: timeline:{user_id}
Type: Sorted Set
Members: tweet_ids, scored by created_at timestamp
Max size: 800 entries (older tweets evicted)
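A minimal in-memory stand-in for this sorted-set cache, mirroring the ZADD / ZREMRANGEBYRANK / ZREVRANGE semantics described above (a sketch, not the Redis client API):

```python
import bisect

MAX_TIMELINE = 800  # matches the cache cap above

class TimelineCache:
    """In-memory model of timeline:{user_id} as a sorted set of tweet ids."""

    def __init__(self):
        self._entries: dict[int, list[tuple[int, int]]] = {}  # user -> [(score, tweet_id)]

    def push(self, user_id: int, created_at: int, tweet_id: int) -> None:
        tl = self._entries.setdefault(user_id, [])
        bisect.insort(tl, (created_at, tweet_id))  # ZADD: keep sorted by timestamp
        if len(tl) > MAX_TIMELINE:
            del tl[0]                              # ZREMRANGEBYRANK: evict the oldest

    def newest(self, user_id: int, limit: int = 20) -> list[int]:
        tl = self._entries.get(user_id, [])
        return [tid for _, tid in reversed(tl[-limit:])]  # ZREVRANGE 0 limit-1

cache = TimelineCache()
for i in range(900):
    cache.push(42, created_at=i, tweet_id=1000 + i)
assert len(cache._entries[42]) == 800           # capped at 800 entries
assert cache.newest(42, 3) == [1899, 1898, 1897]  # newest-first
```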
Likes (Cassandra)
Table: likes
tweet_id (partition key) | bigint
user_id (clustering key) | bigint
created_at | timestamp
Table: user_likes (reverse index for "tweets I liked")
user_id (partition key) | bigint
tweet_id (clustering key) | bigint
5. High-Level Design (12 min)
Tweet Post Path
Client → Load Balancer → Tweet Service
→ Validate (length, content policy)
→ Write to Cassandra (tweets table)
→ Return success to client (fast ack)
→ Publish to Kafka: tweet_events topic
→ Fan-out Service consumes from Kafka:
→ Fetch follower list for the author
→ For each follower:
→ ZADD timeline:{follower_id} {timestamp} {tweet_id} in Redis
→ ZREMRANGEBYRANK to keep max 800 entries
→ For celebrities (> 10K followers): skip fan-out (pull model)
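The fan-out consumer's branch on follower count can be sketched as below; get_followers and push_to_timeline are hypothetical injected dependencies standing in for the social-graph lookup and the Redis write:

```python
CELEBRITY_THRESHOLD = 10_000  # matches the hybrid-model threshold above

def handle_tweet_event(event, get_followers, push_to_timeline) -> int:
    """event: {'tweet_id', 'author_id', 'created_at'} consumed from Kafka.
    Returns the number of timelines written."""
    followers = get_followers(event["author_id"])
    if len(followers) > CELEBRITY_THRESHOLD:
        return 0  # celebrity: skip fan-out; readers pull these tweets instead
    for follower_id in followers:
        push_to_timeline(follower_id, event["created_at"], event["tweet_id"])
    return len(followers)

# Tiny in-memory demo with stub dependencies:
timelines: dict[int, list] = {}
fanned = handle_tweet_event(
    {"tweet_id": 1, "author_id": 9, "created_at": 100},
    get_followers=lambda uid: [2, 3, 4],
    push_to_timeline=lambda f, ts, tid: timelines.setdefault(f, []).append((ts, tid)),
)
assert fanned == 3 and timelines[3] == [(100, 1)]
```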
Timeline Read Path
Client → Load Balancer → Timeline Service
→ Read timeline:{user_id} from Redis
→ ZREVRANGE with cursor-based pagination
→ For each tweet_id, batch fetch tweet data:
→ Redis cache first (hot tweets)
→ Cassandra for cache misses
→ Merge with celebrity tweets (pull):
→ Get list of celebrities the user follows
→ Fetch their recent tweets
→ Merge + sort by timestamp
→ Hydrate: add user info, liked_by_me, retweeted_by_me
→ Return assembled timeline
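The merge step above is a k-way merge of lists that are each already newest-first (the cached timeline plus a handful of celebrity tweet lists); a sketch using heapq.merge:

```python
import heapq
from itertools import islice

def merge_timeline(cached, celebrity_lists, limit=20):
    """Each input is a list of (created_at, tweet_id) pairs sorted newest-first.
    heapq.merge preserves that order across all lists lazily, so we only pay
    for the `limit` entries we actually return."""
    merged = heapq.merge(cached, *celebrity_lists, key=lambda e: -e[0])
    return [tweet_id for _, tweet_id in islice(merged, limit)]

cached = [(105, 11), (100, 10), (90, 9)]       # precomputed timeline from Redis
celebs = [[(110, 21), (95, 20)]]               # one celebrity's recent tweets
assert merge_timeline(cached, celebs, limit=4) == [21, 11, 10, 20]
```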
Search Path
Client → Search Service → Elasticsearch
→ Full-text search on tweet content
→ Filter by time range, user, engagement metrics
→ Return results ranked by relevance + recency
Components
- Tweet Service: Write path — validates and stores tweets
- Timeline Service: Read path — assembles personalized timelines
- Fan-out Service: Async — distributes tweets to followers’ cached timelines
- Search Service: Elasticsearch-backed tweet search
- Cassandra: Tweet storage, likes, retweets
- PostgreSQL: User profiles, social graph (follows)
- Redis Cluster: Timeline caches, hot tweet cache, user data cache
- Kafka: Event stream — tweet events, like events, follow events
- CDN: Cache user avatars, media attachments
6. Deep Dives (15 min)
Deep Dive 1: Fan-out Strategy (The Celebrity Problem)
This is THE defining problem of Twitter’s architecture.
Push model (fan-out-on-write):
- When user posts, immediately write to all followers’ timeline caches
- Advantage: Timeline read is O(1) — just read from Redis
- Disadvantage: If user has 50M followers → 50M writes for one tweet
- At 1 tweet/minute from a celebrity → 50M writes/minute → unsustainable
Pull model (fan-out-on-read):
- Don’t pre-compute timelines. When user loads timeline, fetch recent tweets from all followed users, merge, sort, return.
- Advantage: No fan-out cost on write
- Disadvantage: Timeline read is O(N) where N = number of followed users. If user follows 500 people → 500 queries to assemble one timeline. At 50K timeline reads/sec → 25M queries/sec.
Hybrid model (what Twitter actually does):
- Threshold: 10K followers
- Below 10K: push model (fan-out-on-write to followers’ Redis timelines)
- Above 10K: pull model (fetched at read time and merged)
- Average user follows ~5 celebrities → at read time, merge 5 celebrity tweet lists with pre-computed timeline
- This is ~5 additional Redis reads, negligible vs the push cost
Fan-out service scaling:
- 200M tweets/day with avg 200 followers = 40B fan-out writes/day
- Minus celebrity tweets (pull model) → maybe 30B actual writes/day
- 30B / 100K sec/day = 300K Redis writes/sec for fan-out alone
- Redis cluster with 100+ nodes handles this comfortably
Deep Dive 2: Timeline Ranking
Simple reverse-chronological timeline is fine for v1, but a ranked timeline significantly improves engagement.
Ranking approach:
- Candidate generation: Pull 800 candidate tweets from the Redis timeline + celebrity tweets
- Feature extraction: For each candidate, extract features:
- Recency (exponential decay)
- Author-viewer engagement: how often has this user liked/retweeted this author’s tweets?
- Tweet popularity: like_count, retweet_count, reply_count (normalized)
- Content type: replies vs original tweets vs retweets
- Social proof: “Chirag liked this” (follower engagement)
- Scoring: Lightweight ML model (logistic regression or small neural net) predicts P(engagement) for each candidate
- Ranking + diversity: Sort by score, then apply diversity rules (don’t show 5 tweets from same author in a row)
- Return top 20
Performance budget: All of this must complete in < 100ms (within the 200ms p99 target):
- Redis reads: 10ms
- Feature extraction: 20ms (pre-computed features in Redis)
- ML scoring: 30ms (serving 800 candidates through a model)
- Sorting + selection: 5ms
Infrastructure: Feature store in Redis (user-user engagement scores, tweet popularity, etc.) updated by streaming pipeline from Kafka.
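A sketch of the scoring step, with hand-picked weights standing in for the trained model; the weights, the one-hour decay constant, and the feature names are pure assumptions:

```python
import math

def score(candidate: dict, now: float) -> float:
    """Toy logistic-regression-style scorer: features -> P(engagement)."""
    recency = math.exp(-(now - candidate["created_at"]) / 3600)  # 1h exponential decay
    affinity = candidate["viewer_author_affinity"]               # precomputed, 0..1
    popularity = math.log1p(candidate["like_count"] + 2 * candidate["retweet_count"])
    z = 1.5 * recency + 2.0 * affinity + 0.5 * popularity - 1.0  # assumed weights
    return 1 / (1 + math.exp(-z))                                # sigmoid

fresh = {"created_at": 995_000, "viewer_author_affinity": 0.8,
         "like_count": 10, "retweet_count": 2}
stale = {"created_at": 900_000, "viewer_author_affinity": 0.8,
         "like_count": 10, "retweet_count": 2}
# With identical engagement features, the fresher tweet scores higher:
assert score(fresh, now=1_000_000) > score(stale, now=1_000_000)
```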
Deep Dive 3: Search Architecture
Elasticsearch setup:
- Index: tweets with fields: content (full-text), user_id, created_at, engagement_score
- Sharding: by time range (monthly indexes). This lets us efficiently query “recent tweets” and drop old indexes.
- Replication: 2 replicas for read throughput
Indexing pipeline:
- Kafka (tweet_events) → Search Indexer → Elasticsearch
- Near-real-time: tweets are searchable within ~5 seconds of posting
- Indexer handles backpressure: if ES is slow, Kafka buffers
Search ranking:
- Default: relevance (BM25) × recency boost × engagement score
- Engagement score = normalized(likes + 2×retweets + 3×replies) — weighted toward discussion
- Personalization: boost results from users you follow or have engaged with
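The default ranking can be restated directly; the log normalizer and the recency-boost shape below are assumptions (BM25 itself comes from Elasticsearch):

```python
import math

def engagement_score(likes: int, retweets: int, replies: int) -> float:
    # Weighted toward discussion, log as a simple normalizer (assumed).
    return math.log1p(likes + 2 * retweets + 3 * replies)

def search_rank(bm25: float, age_hours: float,
                likes: int, retweets: int, replies: int) -> float:
    recency_boost = 1 / (1 + age_hours / 24)  # assumed shape: halves after one day
    return bm25 * recency_boost * (1 + engagement_score(likes, retweets, replies))

# Same relevance and engagement, fresher tweet wins:
assert search_rank(10.0, 1, 100, 10, 5) > search_rank(10.0, 48, 100, 10, 5)
```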
Trending topics:
- Streaming pipeline: Kafka → Flink/Spark Streaming
- Count hashtag mentions in sliding 1-hour windows
- Compute velocity (acceleration of mentions, not just volume)
- Geographic segmentation: trending in India vs trending globally
- Cache top 50 trending topics in Redis, refresh every 5 minutes
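The velocity idea (acceleration of mentions, not just volume) in miniature, comparing adjacent count windows for one hashtag:

```python
from collections import deque

def velocity(counts_by_window: deque) -> float:
    """Relative growth rate between the two most recent windows for a hashtag.
    counts_by_window holds per-window mention counts, oldest first."""
    if len(counts_by_window) < 2:
        return 0.0
    prev, curr = counts_by_window[-2], counts_by_window[-1]
    return (curr - prev) / max(prev, 1)

steady = deque([1000, 1000])  # high volume but flat -> not trending
surging = deque([50, 400])    # low volume but accelerating -> trending
assert velocity(surging) > velocity(steady)
```

Ranking by this ratio rather than raw counts is what surfaces a sudden small topic over a perennially popular one.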
7. Extensions (2 min)
- Notifications: Fan-out service also publishes to notification pipeline. Types: likes, retweets, follows, mentions. Batch low-priority notifications (e.g., “5 people liked your tweet”).
- Media support: Images/videos uploaded to S3, transcoded, served via CDN. Tweets store media URLs as metadata. Video requires HLS transcoding pipeline.
- Thread support: Tweets linked via reply_to, forming a tree. Thread view requires a recursive fetch or a denormalized thread table.
- Spaces (live audio): WebRTC-based, SFU for multi-speaker. Completely separate infrastructure.
- DMs: Separate messaging system (see Messenger/WhatsApp design).
- Ads: Inject sponsored tweets into timeline at specific positions. Separate ad auction + targeting pipeline.