1. Requirements & Scope (5 min)
Functional Requirements
- Users can upload videos
- Users can stream/watch videos (adaptive bitrate)
- Users can search for videos
- Personalized home feed (recommended videos)
- Video metadata: title, description, view count, likes, comments
Non-Functional Requirements
- Availability: 99.99% — video playback must be rock-solid
- Latency: Video playback start < 2 seconds. Search results < 300ms. Home feed < 500ms.
- Consistency: View counts and likes can be eventually consistent (seconds of delay acceptable). Video availability after upload: within minutes (transcoding pipeline).
- Scale: 2B MAU, 1B videos watched/day, 500K video uploads/day
- Bandwidth: This is a bandwidth-dominated system — video streaming is 80%+ of internet traffic
2. Estimation (3 min)
Storage
- 500K uploads/day × avg 5 minutes × 10MB/min (original) = 25TB/day raw uploads
- After transcoding (5 resolutions × 3 codecs): ~5x storage multiplier = 125TB/day
- Per year: ~45PB — massive storage system
Bandwidth
- 1B video views/day, avg 5 min watch time, avg bitrate 3Mbps
- Concurrent viewers (assume 10% of the 2B MAU online at peak): 200M concurrent
- 200M × 3Mbps = 600Tbps peak bandwidth
- Even with CDN (95%+ cached), origin bandwidth: 30Tbps
Traffic
- Upload: 500K/day ÷ ~100K sec/day ≈ 5 uploads/sec (low, but each is large and long-running)
- Video plays: 1B/day ÷ ~100K sec/day ≈ 10,000 play starts/sec
- Search: assume 500M/day ≈ 5,000 searches/sec
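As a sanity check, the arithmetic behind these figures can be replayed in a few lines (all inputs are the assumptions stated above; nothing new is introduced):

```python
# Back-of-envelope checks for the estimates above.
uploads_per_day = 500_000
raw_tb_per_day = uploads_per_day * 5 * 10 / 1_000_000  # 5 min x 10 MB/min, MB -> TB
print(raw_tb_per_day)                    # 25.0 TB/day raw uploads
print(raw_tb_per_day * 5)                # 125.0 TB/day after 5x transcoding multiplier
print(raw_tb_per_day * 5 * 365 / 1000)   # ~45.6 PB/year

concurrent = 200_000_000                 # assumed 10% of 2B MAU at peak
peak_tbps = concurrent * 3 / 1_000_000   # 3 Mbps each, Mb -> Tb
print(peak_tbps)                         # 600.0 Tbps peak
print(peak_tbps * 0.05)                  # 30.0 Tbps to origin at 95% CDN hit

views_per_day = 1_000_000_000
print(views_per_day / 100_000)           # 10000 play starts/sec (100K ~ sec/day)
```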
3. API Design (3 min)
// Upload flow (chunked, resumable)
POST /api/v1/videos/upload/init
Body: { "title": "My Video", "description": "...", "filename": "video.mp4" }
Response 200: { "upload_id": "up_123", "upload_url": "https://upload.yt.com/up_123" }
PUT /upload/{upload_id}
Headers: Content-Range: bytes 0-5242879/*
Body: <binary chunk>
Response 308: { "next_offset": 5242880 } // 308 "Resume Incomplete" (Google resumable-upload convention)
POST /api/v1/videos/upload/{upload_id}/complete
Response 202: { "video_id": "v_abc", "status": "processing" }
// Playback
GET /api/v1/videos/{video_id}
Response 200: {
"video_id": "v_abc",
"title": "My Video",
"description": "...",
"channel": { "id": "c_1", "name": "TechChannel", "subscriber_count": 1000000 },
"view_count": 1500000,
"like_count": 50000,
"duration": 300,
"stream_urls": {
"dash": "https://cdn.yt.com/v_abc/manifest.mpd",
"hls": "https://cdn.yt.com/v_abc/master.m3u8"
},
"thumbnails": { "default": "...", "high": "..." },
"published_at": "2026-02-22T12:00:00Z"
}
GET /api/v1/feed?cursor={cursor}&limit=20
GET /api/v1/search?q={query}&cursor={cursor}
POST /api/v1/videos/{video_id}/like
POST /api/v1/videos/{video_id}/view // fire-and-forget, for analytics
Key decisions:
- Resumable uploads — video files are large (GB); network interruptions are common
- DASH/HLS manifests — client-side adaptive bitrate selection
- 202 Accepted for upload complete — video needs transcoding before it’s playable
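The chunking side of the resumable upload can be sketched in a few lines. This is a minimal illustration, assuming the 5 MiB chunk size from the Content-Range example above; the actual HTTP calls and retry policy are omitted, only the range bookkeeping is shown:

```python
# Sketch of chunk-range computation for the resumable upload API above.
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB, matching the Content-Range example

def chunk_ranges(total_size: int, chunk_size: int = CHUNK_SIZE):
    """Yield (start, end_inclusive, content_range_header) per chunk."""
    offset = 0
    while offset < total_size:
        end = min(offset + chunk_size, total_size) - 1
        yield offset, end, f"bytes {offset}-{end}/*"
        offset = end + 1

# First chunk of a 20 MiB file, as in the PUT example above:
start, end, header = next(chunk_ranges(20 * 1024 * 1024))
print(header)  # bytes 0-5242879/*
```

On interruption, the client resumes from the server's reported next_offset instead of restarting the whole upload, which is the point of the 308 response.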
4. Data Model (3 min)
Video Metadata (PostgreSQL, sharded by video_id)
Table: videos
video_id (PK) | bigint
channel_id | bigint
title | varchar(200)
description | text
duration_sec | int
status | enum('processing', 'ready', 'failed', 'removed')
storage_key | varchar(200) -- S3 prefix for all renditions
view_count | bigint (denormalized, eventually consistent)
like_count | bigint
published_at | timestamptz
Index: (channel_id, published_at DESC)
Video Files (S3 + CDN)
S3 structure:
videos/{video_id}/original.mp4
videos/{video_id}/360p/segment_001.ts ... segment_N.ts
videos/{video_id}/720p/segment_001.ts ... segment_N.ts
videos/{video_id}/1080p/segment_001.ts ... segment_N.ts
videos/{video_id}/manifest.mpd (DASH)
videos/{video_id}/master.m3u8 (HLS)
View Counts (Kafka → Redis → periodic flush to PostgreSQL)
Real-time view counting via streaming, not direct DB writes. Batch-update PostgreSQL every minute.
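A minimal sketch of this counting flow, with a Counter standing in for Redis increments and flush() standing in for the minutely batched UPDATE (key and table names here are illustrative):

```python
# Sketch of the Kafka -> Redis -> PostgreSQL view-count flow.
from collections import Counter

pending = Counter()  # Redis equivalent: HINCRBY views:pending {video_id} 1

def record_view(video_id: str) -> None:
    pending[video_id] += 1  # hot path: O(1) increment, no DB write

def flush() -> dict:
    """Run every minute: drain counters into one batched DB update."""
    batch = dict(pending)
    pending.clear()
    # e.g. UPDATE videos SET view_count = view_count + %(n)s WHERE video_id = %(id)s
    return batch

for _ in range(3):
    record_view("v_abc")
record_view("v_xyz")
print(flush())  # {'v_abc': 3, 'v_xyz': 1}
```

The design choice: a lost flush loses at most a minute of counts, which the eventual-consistency requirement explicitly tolerates.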
Search Index (Elasticsearch)
Index: videos
Fields: title (full-text), description (full-text), tags, channel_name, published_at, view_count
5. High-Level Design (12 min)
Upload & Processing Path
Client → Upload Service (chunked, resumable)
→ Write chunks to S3 (temporary bucket)
→ On complete: publish to Kafka (video_uploaded event)
→ Transcoding Service (fleet of GPU workers) consumes event:
→ Download original from S3
→ Transcode to multiple resolutions: 240p, 360p, 720p, 1080p, 4K
→ Multiple codecs: H.264 (compatibility), VP9 (efficiency), AV1 (next-gen)
→ Segment into 4-6 second chunks (for adaptive streaming)
→ Generate DASH manifest (.mpd) and HLS playlist (.m3u8)
→ Upload all renditions to S3 (permanent bucket)
→ Generate thumbnails (every 10 seconds for preview strip)
→ Update video status to 'ready' in PostgreSQL
→ Push to CDN (pre-warm popular regions)
→ Index in Elasticsearch
→ Trigger recommendation pipeline
Video Playback Path
Client → CDN (CloudFront / Akamai / own CDN)
→ Client requests manifest: GET /v_abc/master.m3u8
→ Client reads manifest, selects initial quality based on bandwidth
→ Client requests video segments: GET /v_abc/720p/segment_005.ts
→ CDN cache hit (99%+) → stream directly
→ CDN cache miss → fetch from S3 origin → cache → stream
→ Client adaptively switches quality mid-stream based on bandwidth
Home Feed / Recommendation Path
Client → Feed Service
→ Recommendation Engine:
→ Candidate generation (1000s of candidates from multiple sources)
→ Ranking model (score each candidate)
→ Filtering (already watched, content policy)
→ Diversification
→ Return top 20 videos with metadata
→ Pre-fetch thumbnails via CDN
Components
- Upload Service: Handles chunked, resumable video uploads
- Transcoding Service: GPU worker fleet for video processing
- Video Service: Metadata CRUD, view counting
- Feed/Recommendation Service: Personalized home feed
- Search Service: Elasticsearch-backed video search
- PostgreSQL: Video metadata, channel info, user profiles
- S3: Video file storage (originals + all renditions)
- CDN (multi-CDN): Video delivery — this is 95%+ of all traffic
- Kafka: Event streaming (uploads, views, engagements)
- Redis: View count aggregation, recommendation feature store, session cache
- Elasticsearch: Video search index
6. Deep Dives (15 min)
Deep Dive 1: Video Transcoding Pipeline
The challenge: 500K videos/day, each needing 5 resolutions × 3 codecs = 15 renditions per video. That’s 7.5M transcoding jobs/day.
Architecture:
S3 Event → SQS Queue → Transcoding Workers (auto-scaling GPU fleet)
How transcoding works:
- Split: Divide the original video into 4-6 second segments (GOP-aligned splitting)
- Transcode in parallel: Each segment can be transcoded independently → massive parallelism
- A 10-minute video = ~150 segments
- Each segment × 15 renditions = 2,250 transcoding tasks
- Distributed across worker fleet → 10-minute video fully transcoded in ~2 minutes
- Assemble: Generate manifest files (DASH/HLS) pointing to all segment files
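The fan-out above can be made concrete by enumerating the job matrix. A sketch, assuming 4-second segments (the text allows 4-6s) and the resolution/codec lists from this design:

```python
# Sketch of the transcode fan-out: one job per (segment, resolution, codec).
from itertools import product

RESOLUTIONS = ["240p", "360p", "720p", "1080p", "4k"]
CODECS = ["h264", "vp9", "av1"]
SEGMENT_SECONDS = 4  # 4-6s in the text; 4s assumed here

def transcode_tasks(video_id: str, duration_sec: int) -> list[dict]:
    segments = range(1, duration_sec // SEGMENT_SECONDS + 1)
    return [
        {"video_id": video_id, "segment": s, "resolution": r, "codec": c}
        for s, r, c in product(segments, RESOLUTIONS, CODECS)
    ]

tasks = transcode_tasks("v_abc", 600)  # 10-minute video
print(len(tasks))  # 2250 tasks: 150 segments x 5 resolutions x 3 codecs
```

Each task is independent, which is exactly what lets the worker fleet finish a 10-minute video in ~2 minutes.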
Adaptive Bitrate Streaming (ABR):
- Client downloads manifest listing all available quality levels
- Client monitors download speed in real time
- If bandwidth drops: next segment fetched at lower quality
- If bandwidth improves: switch up
- Seamless quality transitions because segments are small (4-6 seconds)
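A toy version of the client's ABR decision, picking the highest rendition whose bitrate fits within a safety fraction of measured throughput. The bitrate ladder and 0.8 safety factor are illustrative assumptions, not values from the design:

```python
# Toy client-side ABR: choose quality for the next segment.
LADDER = [("240p", 0.4), ("360p", 0.8), ("720p", 2.5), ("1080p", 5.0)]  # Mbps

def pick_quality(measured_mbps: float, safety: float = 0.8) -> str:
    budget = measured_mbps * safety    # leave headroom for bandwidth dips
    chosen = LADDER[0][0]              # always fall back to lowest quality
    for name, mbps in LADDER:
        if mbps <= budget:
            chosen = name
    return chosen

print(pick_quality(4.0))   # 720p (budget 3.2 Mbps fits 2.5, not 5.0)
print(pick_quality(1.5))   # 360p
print(pick_quality(0.3))   # 240p (fallback)
```

Real players (e.g. buffer-based algorithms) also weigh buffer occupancy, but the bandwidth-fit rule is the core idea.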
Codec strategy:
- H.264: Universal compatibility (every device/browser)
- VP9: 30-50% better compression than H.264 (Chrome, Android)
- AV1: 30% better than VP9 but expensive to encode (only for popular videos)
- Decision: Transcode to H.264 + VP9 immediately. AV1 only for videos that exceed 10K views (cost-effective).
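That decision rule, written out (the 10K threshold comes from the text; the function name is illustrative):

```python
# Codec policy: H.264 + VP9 at upload, AV1 only once a video proves popular.
AV1_VIEW_THRESHOLD = 10_000

def codecs_for(view_count: int) -> list[str]:
    codecs = ["h264", "vp9"]           # transcoded immediately on upload
    if view_count > AV1_VIEW_THRESHOLD:
        codecs.append("av1")           # expensive encode, amortized by views
    return codecs

print(codecs_for(500))     # ['h264', 'vp9']
print(codecs_for(50_000))  # ['h264', 'vp9', 'av1']
```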
Deep Dive 2: CDN Architecture for Video at Scale
600Tbps peak bandwidth cannot come from origin servers. CDN handles 95%+.
Multi-CDN strategy:
- Use 3+ CDN providers (CloudFront, Akamai, Cloudflare)
- DNS-based routing: direct users to the fastest CDN based on geo + real-time performance metrics
- If one CDN has issues in a region, automatically shift traffic to another
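A sketch of the routing decision: prefer the healthy CDN with the best recent latency for the user's region. The metric values are illustrative; in practice they come from real-user monitoring per region:

```python
# Latency-aware multi-CDN selection with health-based failover.
def pick_cdn(region: str, metrics: dict) -> str:
    """metrics maps CDN name -> {'p95_ms': ..., 'healthy': ...} for one region."""
    candidates = [
        (stats["p95_ms"], name)
        for name, stats in metrics.items()
        if stats["healthy"]
    ]
    if not candidates:
        raise RuntimeError(f"no healthy CDN for region {region}")
    return min(candidates)[1]          # lowest p95 wins

metrics_eu = {
    "cloudfront": {"p95_ms": 45, "healthy": True},
    "akamai":     {"p95_ms": 38, "healthy": True},
    "cloudflare": {"p95_ms": 30, "healthy": False},  # regional incident
}
print(pick_cdn("eu-west", metrics_eu))  # akamai
```

In the outage scenario above, the fastest CDN is unhealthy in the region, so traffic shifts to the next-best automatically.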
Cache hierarchy:
- Edge PoPs (hundreds of locations): Cache popular content. Cache hit ratio: ~85%
- Regional caches (tens of locations): Catch misses from edge. Hit ratio: ~95% cumulative
- Origin shield (3-5 locations): Last cache layer before S3. Hit ratio: ~99%
- S3 origin: Only serves ~1% of requests
Popularity-based caching:
- Newly uploaded videos: pre-warm in CDN for the uploader’s region
- Viral videos (rapid view growth): proactively push to all edge PoPs
- Long-tail videos (old, few views): only cached on demand, evicted quickly
Cost optimization:
- Video delivery is the largest cost center ($0.02-0.08/GB depending on CDN)
- At ~112PB/day delivered (1B views × ~112MB each), even the low-end rate ($0.02/GB) is over $2M/day
- Peering: establish direct peering agreements with ISPs (bypass CDN for top ISPs)
- Own CDN: for the largest platforms (YouTube, Netflix), building custom CDN infrastructure is cost-effective at this scale
- Encoding efficiency: VP9/AV1 reduces bandwidth 30-50% → direct cost savings
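The cost arithmetic behind these optimizations, using the view count, watch time, and bitrate assumed in the estimation section (real CDN contracts vary widely, so treat this as order-of-magnitude only):

```python
# Rough delivery-cost estimate from the numbers above.
views_per_day = 1_000_000_000
watch_seconds = 5 * 60        # avg 5 min watch time
bitrate_mbps = 3              # avg bitrate

gb_per_view = bitrate_mbps * watch_seconds / 8 / 1000  # Mb -> MB -> GB
daily_gb = views_per_day * gb_per_view
print(round(gb_per_view, 4))  # 0.1125 GB per average view
print(daily_gb / 1e6)         # 112.5 PB delivered per day

for rate in (0.02, 0.08):     # $/GB range from the text
    print(f"${daily_gb * rate / 1e6:.2f}M/day at ${rate}/GB")
```

At millions of dollars per day, a 30-50% codec-efficiency gain or a peering agreement that bypasses CDN fees pays for itself quickly.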
Deep Dive 3: Recommendation System (High Level)
Multi-stage pipeline:
1. Candidate generation (< 50ms):
- Collaborative filtering: “Users who watched X also watched Y”
- Content-based: similar titles, same channel, same topic
- Social: what are people you follow watching?
- Trending: globally or regionally popular videos
- Result: ~5,000 candidate videos
2. Ranking (< 50ms):
- Deep neural network predicts: P(click), P(watch > 50%), P(like), P(subscribe after watching)
- Features: user watch history, user demographics, video metadata, video engagement stats, time of day, device type
- Combined score: weighted sum of predicted outcomes
- Result: scored candidates sorted by predicted engagement
3. Filtering and business rules (< 10ms):
- Remove already watched (from user history in Redis)
- Remove content policy violations
- Apply diversity: no more than 2 videos from same channel in top 20
- Apply freshness: boost recent uploads
4. Serving:
- Pre-compute recommendations for active users (batch pipeline, updated hourly)
- Real-time re-ranking based on current session (what they just watched)
- Cache in Redis under key recommendations:{user_id}
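The ranking, filtering, and diversity stages can be sketched end to end. The already-watched check and the 2-per-channel cap follow the rules above; the scoring is a stub standing in for the neural ranker, and all field names are illustrative:

```python
# Sketch of recommendation stages 2-3: rank, filter, diversify, truncate.
def recommend(user: dict, candidates: list[dict], top_k: int = 20) -> list[str]:
    # Ranking: sort by predicted-engagement score (stub for the DNN ranker)
    scored = sorted(candidates, key=lambda v: v["score"], reverse=True)
    # Filtering + business rules
    feed, per_channel = [], {}
    for v in scored:
        if v["video_id"] in user["watched"]:
            continue                        # remove already watched
        ch = v["channel_id"]
        if per_channel.get(ch, 0) >= 2:     # diversity: max 2 per channel
            continue
        per_channel[ch] = per_channel.get(ch, 0) + 1
        feed.append(v["video_id"])
        if len(feed) == top_k:
            break
    return feed

user = {"watched": {"v1"}}
candidates = [
    {"video_id": "v1", "channel_id": "c1", "score": 0.9},  # already watched
    {"video_id": "v2", "channel_id": "c1", "score": 0.8},
    {"video_id": "v3", "channel_id": "c1", "score": 0.7},
    {"video_id": "v4", "channel_id": "c1", "score": 0.6},  # third from c1: capped
    {"video_id": "v5", "channel_id": "c2", "score": 0.5},
]
print(recommend(user, candidates))  # ['v2', 'v3', 'v5']
```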
7. Extensions (2 min)
- Live streaming: Completely different infrastructure — RTMP ingest, real-time transcoding (no batch), low-latency HLS/DASH (~3-5 second delay), WebRTC for sub-second latency.
- Comments: Separate service with its own data store. Threaded comments, moderation pipeline, spam detection.
- Monetization: Ad insertion at server-side (SSAI) for pre-roll, mid-roll, post-roll. Ad auction + targeting pipeline.
- Content moderation: ML pipeline for detecting policy violations in video and audio. Runs post-upload, can take down content retroactively.
- Offline downloads: DRM-protected downloads (Widevine, FairPlay). Pre-download select quality, encrypt with per-device key.
- Multi-language: Auto-generated subtitles (speech-to-text), auto-translation, subtitle rendering on client.