1. Requirements & Scope (5 min)
Functional Requirements
- Users can upload photos with captions
- Users can follow/unfollow other users
- Users can view a personalized news feed (photos from people they follow)
- Users can like and comment on photos
- Users can view any user’s profile (grid of their photos)
Non-Functional Requirements
- Availability: 99.99% — social feeds being down is immediately visible to millions
- Latency: Feed load < 300ms at p99, photo upload acknowledgment < 2s
- Consistency: Eventual consistency for feed (2-5 seconds stale is fine). Strong consistency for uploads (post → refresh → must see it)
- Scale: 500M DAU, 50M photo uploads/day, average user views feed 10 times/day
- Storage: Photos are large; need cost-efficient media storage
2. Estimation (3 min)
Traffic
- Uploads: 50M/day ÷ ~100K sec/day = 500 writes/sec, peak ~2,500/sec
- Feed reads: 500M × 10/day = 5B reads/day ÷ ~100K sec/day = 50,000 reads/sec, peak 250,000/sec
- Read-to-write ratio: 100:1 — extremely read-heavy
Storage
- Average photo: 2MB original, store 4 sizes (thumbnail 50KB, small 200KB, medium 500KB, large 2MB) = ~2.75MB total per photo
- 50M photos/day × 2.75MB = 137TB/day
- Per year: ~50PB — this is a massive storage system
Bandwidth
- Feed: 10 photos per load × 500KB avg = 5MB per feed load
- 50,000 feeds/sec × 5MB = 250GB/sec sustained read bandwidth (over 1TB/sec at the 250K/sec peak)
- CDN is absolutely critical
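The back-of-the-envelope math above can be double-checked in a few lines (a sketch; 100K is the usual round-number stand-in for 86,400 seconds/day, and peak is assumed to be 5× average):

```python
# Capacity estimates from the numbers above; 100K ≈ seconds in a day.
SECONDS_PER_DAY = 100_000

uploads_per_day = 50_000_000
write_qps = uploads_per_day // SECONDS_PER_DAY           # 500 writes/sec
peak_write_qps = write_qps * 5                           # 2,500/sec

dau = 500_000_000
feed_loads_per_day = dau * 10                            # 5B reads/day
read_qps = feed_loads_per_day // SECONDS_PER_DAY         # 50,000 reads/sec
peak_read_qps = read_qps * 5                             # 250,000/sec

mb_per_photo = 0.05 + 0.2 + 0.5 + 2.0                    # 4 sizes ≈ 2.75MB
storage_tb_per_day = uploads_per_day * mb_per_photo / 1_000_000   # ≈ 137.5 TB
storage_pb_per_year = storage_tb_per_day * 365 / 1_000            # ≈ 50 PB

feed_mb = 10 * 0.5                                       # 10 photos × 500KB
read_bw_gb_per_sec = read_qps * feed_mb / 1_000          # 250 GB/sec sustained
```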
3. API Design (3 min)
POST /api/v1/photos
Content-Type: multipart/form-data
Body: {
  "photo": <binary>,
  "caption": "Sunset vibes",
  "location": "Mumbai, India"  // optional
}
Response 201: {
  "photo_id": "p_abc123",
  "urls": {
    "thumbnail": "https://cdn.ig.com/thumb/p_abc123.jpg",
    "full": "https://cdn.ig.com/full/p_abc123.jpg"
  }
}
GET /api/v1/feed?cursor={cursor}&limit=10
Response 200: {
  "photos": [
    {
      "photo_id": "p_abc123",
      "user": { "id": "u_1", "username": "chirag", "avatar": "..." },
      "caption": "Sunset vibes",
      "photo_url": "https://cdn.ig.com/med/p_abc123.jpg",
      "like_count": 2847,
      "comment_count": 43,
      "liked_by_me": true,
      "created_at": "2026-02-22T18:00:00Z"
    }
  ],
  "next_cursor": "ts_1708632000"
}
POST /api/v1/photos/{photo_id}/like
DELETE /api/v1/photos/{photo_id}/like
POST /api/v1/users/{user_id}/follow
DELETE /api/v1/users/{user_id}/follow
GET /api/v1/users/{user_id}/photos?cursor={cursor}&limit=30
Key decisions:
- Cursor-based pagination using timestamp — handles real-time insertions correctly
- Pre-signed upload URLs alternative: instead of uploading through our API, return a pre-signed S3 URL for direct client→S3 upload, reducing server load
- liked_by_me included in feed response to avoid N+1 queries on the client
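The cursor decision above can be sketched in a few lines (a minimal in-memory illustration; `photos` and `feed_page` are hypothetical names, with the cursor being the `created_at` timestamp of the last item returned):

```python
# Timestamp-cursor pagination over a reverse-chronological feed.
# Unlike offset pagination, posts inserted at the head while the user
# scrolls never shift the window, so clients never see duplicates.
from typing import Optional

def feed_page(photos: list, cursor: Optional[int], limit: int = 10):
    """Return photos strictly older than `cursor`, plus the next cursor."""
    ordered = sorted(photos, key=lambda p: p["created_at"], reverse=True)
    if cursor is not None:
        ordered = [p for p in ordered if p["created_at"] < cursor]
    page = ordered[:limit]
    next_cursor = page[-1]["created_at"] if len(page) == limit else None
    return page, next_cursor
```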
4. Data Model (3 min)
Users & Follows (PostgreSQL)
Table: users
user_id (PK) | bigint
username | varchar(30), unique
avatar_url | text
bio | varchar(300)
follower_count | int (denormalized)
following_count | int (denormalized)
Table: follows
follower_id | bigint (FK → users)
followee_id | bigint (FK → users)
created_at | timestamptz
PK: (follower_id, followee_id)
Index: (followee_id, follower_id) -- for "who follows me" queries
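The two access paths this table serves can be demonstrated concretely; this sketch uses stdlib sqlite3 as a stand-in for the PostgreSQL schema above (the DDL is simplified accordingly):

```python
# follows table: PK serves "who do I follow", secondary index serves
# "who follows me" (sqlite3 stand-in for the PostgreSQL design).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,
        followee_id INTEGER NOT NULL,
        created_at  TEXT NOT NULL DEFAULT (datetime('now')),
        PRIMARY KEY (follower_id, followee_id)
    )
""")
db.execute("CREATE INDEX idx_followers ON follows (followee_id, follower_id)")

db.executemany(
    "INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (3, 2)],
)

# "Who do I follow?" -- served by the primary key
following = [r[0] for r in db.execute(
    "SELECT followee_id FROM follows WHERE follower_id = ? ORDER BY followee_id",
    (1,))]

# "Who follows me?" -- served by the secondary index
followers = [r[0] for r in db.execute(
    "SELECT follower_id FROM follows WHERE followee_id = ? ORDER BY follower_id",
    (2,))]
```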
Photos (PostgreSQL + S3)
Table: photos
photo_id (PK) | bigint
user_id | bigint (FK → users)
caption | text
s3_key | varchar(200)
location | varchar(100)
like_count | int (denormalized)
comment_count | int (denormalized)
created_at | timestamptz
Index: (user_id, created_at DESC) -- for profile grid
Feed (Redis)
Key: feed:{user_id}
Value: Sorted set of photo_ids, scored by created_at timestamp
TTL: 7 days (rebuild from DB on cold start)
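The sorted-set semantics the feed cache relies on (ZADD to insert, ZREVRANGE to read newest-first) can be sketched with a plain dict standing in for Redis:

```python
# In-memory stand-in for feed:{user_id}: a dict of photo_id -> score,
# where the score is the created_at timestamp (Redis sorted-set style).
from collections import defaultdict

feeds = defaultdict(dict)  # user_id -> {photo_id: created_at}

def zadd_feed(user_id, photo_id, created_at):
    feeds[user_id][photo_id] = created_at     # ZADD feed:{user_id}

def feed_top(user_id, n=10):
    members = feeds[user_id]                  # ZREVRANGE feed:{user_id} 0 n-1
    return sorted(members, key=members.get, reverse=True)[:n]
```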
Likes (Cassandra)
Table: likes
photo_id (partition key) | bigint
user_id (clustering key)| bigint
created_at | timestamp
Why Cassandra for likes? High write volume (tens of thousands of likes/sec at peak, far exceeding photo uploads), simple access patterns (check whether a user liked a photo, list a photo's likers), and an append-heavy workload.
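The two lookups the partition/clustering layout serves can be sketched with a dict keyed by partition (photo_id) holding its clustering keys (user_ids); this is an illustration of the access pattern, not a Cassandra client:

```python
# likes: photo_id is the partition key, user_id the clustering key, so
# both queries below touch exactly one partition.
from collections import defaultdict

likes = defaultdict(set)  # photo_id -> set of user_ids

def like(photo_id, user_id):
    likes[photo_id].add(user_id)          # INSERT INTO likes ...

def liked_by(photo_id, user_id):
    return user_id in likes[photo_id]     # point read within one partition

def likers(photo_id):
    return set(likes[photo_id])           # scan one partition
```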
5. High-Level Design (12 min)
Photo Upload Path
Client → Load Balancer → Upload Service
→ Get pre-signed S3 URL → Client uploads directly to S3
→ Upload Service notified (S3 event / callback)
→ Image Processing Service (async via SQS/Kafka):
→ Resize to 4 sizes (thumbnail, small, medium, large)
→ Store resized images in S3
→ Push to CDN
→ Write metadata to PostgreSQL
→ Trigger fan-out: Kafka → Feed Service → update followers' Redis feeds
Feed Read Path
Client → CDN (cache feed API? usually no — personalized)
→ Load Balancer → Feed Service
→ Read from Redis sorted set (feed:{user_id})
→ Get top N photo_ids
→ Batch fetch photo metadata from PostgreSQL (or cache)
→ Batch fetch "liked_by_me" from Cassandra
→ Assemble response
→ Redis miss → Fall back to fan-out-on-read (pull from followees)
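The assembly step above can be sketched end to end; the three stores are stubbed as dicts (stand-ins for Redis, one batched PostgreSQL query, and one batched Cassandra read), and `assemble_feed` is a hypothetical name:

```python
# Feed assembly: ids from the feed cache, one batch metadata fetch,
# one batch liked_by_me fetch -- no per-photo round trips.
def assemble_feed(user_id, cached_ids, photo_meta, likes_by_photo, limit=10):
    """cached_ids: photo_ids from Redis, newest first.
    photo_meta: photo_id -> metadata row (batched SQL stand-in).
    likes_by_photo: photo_id -> set of liker user_ids (Cassandra stand-in)."""
    page = cached_ids[:limit]
    return [
        {
            **photo_meta[pid],
            "liked_by_me": user_id in likes_by_photo.get(pid, set()),
        }
        for pid in page
    ]
```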
Components
- Upload Service: Handles photo upload coordination
- Image Processing Service: Async resizing pipeline (SQS → Lambda or dedicated workers)
- Feed Service: Generates and serves personalized feeds
- Fan-out Service: Pushes new posts to followers’ feed caches
- PostgreSQL: User/photo metadata (primary + read replicas)
- Cassandra: Likes, comments (high write throughput)
- Redis Cluster: Pre-computed feed caches
- S3: Photo storage (all sizes)
- CDN (CloudFront): Serve photos — this handles 99%+ of photo bandwidth
- Kafka: Event stream for fan-out, analytics, notifications
6. Deep Dives (15 min)
Deep Dive 1: Feed Generation (Fan-out Strategy)
The problem: When user A posts a photo, all of A’s followers need to see it in their feed. If A has 10M followers, that’s 10M Redis writes for a single post.
Hybrid approach (push + pull):
- Regular users (< 10K followers): fan-out-on-write (push model)
  - When they post, immediately push the photo_id to all followers' Redis feeds
  - ZADD feed:{follower_id} {timestamp} {photo_id} for each follower
  - Fast because it is < 10K writes per post
  - Followers get near-instant feed updates
- Celebrity users (> 10K followers): fan-out-on-read (pull model)
  - When they post, DON'T update followers' feeds
  - Instead, maintain a celebrity_posts:{user_id} sorted set in Redis
  - When a follower loads their feed, merge their pre-computed feed with a real-time fetch of posts from the celebrities they follow
  - Feed Service does a ZUNIONSTORE of feed:{user_id} plus celebrity_posts:{celebrity_id} for each followed celebrity
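The merge at read time (the ZUNIONSTORE step) can be sketched in plain Python, with each sorted set represented as a photo_id → timestamp dict:

```python
# Hybrid read: union the pre-computed feed with posts pulled live from
# each followed celebrity, then take the newest N by timestamp score.
def merged_feed(precomputed, celebrity_posts, n=10):
    """precomputed: photo_id -> created_at (the user's cached feed).
    celebrity_posts: list of such dicts, one per followed celebrity."""
    union = dict(precomputed)
    for posts in celebrity_posts:
        union.update(posts)
    return sorted(union, key=union.get, reverse=True)[:n]
```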
Feed ranking: The basic feed is reverse-chronological. For an Instagram-like ranked feed, add a lightweight ML scoring layer:
- Features: recency, user-author engagement history, photo popularity
- Score each candidate post, sort by score, return top N
- This runs in < 50ms using pre-computed features stored in Redis
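A lightweight scoring pass over those features might look like the following sketch; the feature names and weights here are illustrative, not a production model:

```python
# Hypothetical linear scoring over pre-computed features: recency decay,
# user-author affinity, and log-damped popularity.
import math

def score(post, now):
    age_hours = max((now - post["created_at"]) / 3600, 0.0)
    recency = math.exp(-age_hours / 24)             # decays over ~a day
    affinity = post.get("author_affinity", 0.0)     # engagement history
    popularity = math.log1p(post.get("like_count", 0))
    return 2.0 * recency + 1.5 * affinity + 0.5 * popularity

def rank(posts, n=10, now=0.0):
    return sorted(posts, key=lambda p: score(p, now), reverse=True)[:n]
```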
Deep Dive 2: Image Processing Pipeline
Upload flow in detail:
- Client requests pre-signed S3 URL from Upload Service
- Client uploads original photo directly to S3 (bypasses our servers entirely)
- S3 event notification triggers Image Processing pipeline
- Pipeline generates 4 sizes:
- Thumbnail: 150×150, 50KB (profile grid, notifications)
- Small: 320×320, 200KB (small screens)
- Medium: 640×640, 500KB (feed view)
- Large: 1080×1080, 2MB (full-screen view)
- All sizes uploaded to S3 with deterministic keys: {size}/{photo_id}.jpg
- CDN pre-warmed for the medium size (most requested)
Processing at scale:
- 500 uploads/sec → 500 image processing jobs/sec
- Each job: ~2 seconds of CPU (resize 4 times)
- Need: ~1000 processing cores continuously
- Solution: Auto-scaling worker fleet or AWS Lambda (scales to thousands of concurrent executions)
Failure handling: If image processing fails (corrupt image, OOM):
- Retry 3 times with exponential backoff
- Dead letter queue for persistent failures
- Photo shows as “processing” in UI until complete
- Alert on DLQ depth > threshold
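The retry policy above can be sketched as a worker-side helper (a simplified illustration; `process_with_retries` and its parameters are hypothetical names, and a real DLQ would be an SQS queue rather than a list):

```python
# Retry with exponential backoff, then dead-letter after three failures.
import time

def process_with_retries(job, handler, dlq, attempts=3, base_delay=0.01):
    for attempt in range(attempts):
        try:
            handler(job)
            return True
        except Exception:
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # 10ms, 20ms, ...
    dlq.append(job)   # persistent failure: park it and alert on DLQ depth
    return False
```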
Deep Dive 3: CDN Strategy for Photo Delivery
Photos account for 99%+ of bandwidth. Without CDN, we’d need 250GB/sec of origin bandwidth.
CDN architecture:
- Multi-CDN: Use CloudFront + Cloudflare for redundancy and cost optimization
- Cache tiers: Edge → Regional → Origin Shield → S3
- Cache hit ratio target: > 98% (most photos are viewed within first 24 hours of posting)
Cache key design: /{size}/{photo_id}.jpg
- Deterministic keys → no cache invalidation needed (photos are immutable once processed)
- Different sizes cached independently (thumbnail has much higher hit ratio than large)
Cost optimization:
- Photos older than 30 days: move to S3 Infrequent Access tier (50% cheaper)
- Photos older than 1 year: move to S3 Glacier (90% cheaper)
- Access pattern: 80% of CDN traffic is for photos < 7 days old
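The age-based tiering rule reduces to a simple mapping (in practice an S3 lifecycle policy does this declaratively; the thresholds here mirror the notes above):

```python
# Storage class by photo age; transitions would be driven by an S3
# lifecycle configuration rather than application code.
def storage_class(age_days):
    if age_days > 365:
        return "GLACIER"        # ~90% cheaper, rarely accessed
    if age_days > 30:
        return "STANDARD_IA"    # ~50% cheaper, infrequent access
    return "STANDARD"           # hot tier: most traffic is < 7 days old
```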
7. Extensions (2 min)
- Stories: Ephemeral content (24-hour TTL). Separate storage and feed pipeline. Higher write volume but short retention reduces storage costs.
- Reels/Video: Fundamentally different pipeline — video transcoding (HLS/DASH), adaptive bitrate streaming, higher storage and bandwidth costs.
- Explore/Discovery: Content recommendation engine using collaborative filtering + content-based features. Separate from the follow-based feed.
- Direct messages: Real-time messaging (WebSocket-based). Separate service entirely. See the Messenger/WhatsApp design.
- Push notifications: Fan-out service publishes to notification queue when a user posts. APNs/FCM for delivery.
- Content moderation: ML pipeline to detect nudity, violence, spam. Runs async post-upload, can remove content retroactively.