1. Requirements & Scope (5 min)
Functional Requirements
- Users can upload videos
- Users can stream/watch videos (adaptive bitrate)
- Users can search for videos
- Personalized home feed (recommended videos)
- Video metadata: title, description, view count, likes, comments
Non-Functional Requirements
- Availability: 99.99% — video playback must be rock-solid
- Latency: Video playback start < 2 seconds. Search results < 300ms. Home feed < 500ms.
- Consistency: View counts and likes can be eventually consistent (seconds of delay acceptable). Video availability after upload: within minutes (transcoding pipeline).
- Scale: 2B MAU, 1B videos watched/day, 500K video uploads/day
- Bandwidth: This is a bandwidth-dominated system — video streaming is 80%+ of internet traffic
2. Estimation (3 min)
Storage
- 500K uploads/day × avg 5 minutes × 10MB/min (original) = 25TB/day raw uploads
- After transcoding (5 resolutions × 3 codecs): ~5x storage multiplier = 125TB/day
- Per year: ~45PB — massive storage system
Bandwidth
- 1B video views/day, avg 5 min watch time, avg bitrate 3Mbps
- Concurrent viewers (assume 10% of the 2B MAU online at peak): 200M concurrent
- 200M × 3Mbps = 600Tbps peak bandwidth
- Even with CDN (95%+ cached), origin bandwidth: 30Tbps
Traffic
- Upload: 500K/day ÷ ~100K sec/day ≈ 5 uploads/sec (low, but each is large and long-running)
- Video plays: 1B/day ÷ ~100K sec/day ≈ 10,000 play starts/sec
- Search: assume 500M/day ≈ 5,000 searches/sec
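As a sanity check, the arithmetic behind these figures can be replayed in a few lines (all inputs are the assumptions stated above; nothing new is introduced):

```python
# Back-of-envelope checks for the estimates above.
uploads_per_day = 500_000
raw_tb_per_day = uploads_per_day * 5 * 10 / 1_000_000  # 5 min x 10 MB/min, MB -> TB
print(raw_tb_per_day)                    # 25.0 TB/day raw uploads
print(raw_tb_per_day * 5)                # 125.0 TB/day after 5x transcoding multiplier
print(raw_tb_per_day * 5 * 365 / 1000)   # ~45.6 PB/year

concurrent = 200_000_000                 # assumed 10% of 2B MAU at peak
peak_tbps = concurrent * 3 / 1_000_000   # 3 Mbps each, Mb -> Tb
print(peak_tbps)                         # 600.0 Tbps peak
print(peak_tbps * 0.05)                  # 30.0 Tbps to origin at 95% CDN hit

views_per_day = 1_000_000_000
print(views_per_day / 100_000)           # 10000 play starts/sec (100K ~ sec/day)
```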
3. API Design (3 min)
// Upload flow (chunked, resumable)
POST /api/v1/videos/upload/init
Body: { "title": "My Video", "description": "...", "filename": "video.mp4" }
Response 200: { "upload_id": "up_123", "upload_url": "https://upload.yt.com/up_123" }
PUT /upload/{upload_id}
Headers: Content-Range: bytes 0-5242879/*
Body: <binary chunk>
Response 308: { "next_offset": 5242880 } // 308 "Resume Incomplete" (Google resumable-upload convention)
POST /api/v1/videos/upload/{upload_id}/complete
Response 202: { "video_id": "v_abc", "status": "processing" }
// Playback
GET /api/v1/videos/{video_id}
Response 200: {
"video_id": "v_abc",
"title": "My Video",
"description": "...",
"channel": { "id": "c_1", "name": "TechChannel", "subscriber_count": 1000000 },
"view_count": 1500000,
"like_count": 50000,
"duration": 300,
"stream_urls": {
"dash": "https://cdn.yt.com/v_abc/manifest.mpd",
"hls": "https://cdn.yt.com/v_abc/master.m3u8"
},
"thumbnails": { "default": "...", "high": "..." },
"published_at": "2026-02-22T12:00:00Z"
}
GET /api/v1/feed?cursor={cursor}&limit=20
GET /api/v1/search?q={query}&cursor={cursor}
POST /api/v1/videos/{video_id}/like
POST /api/v1/videos/{video_id}/view // fire-and-forget, for analytics
Key decisions:
- Resumable uploads — video files are large (GB); network interruptions are common
- DASH/HLS manifests — client-side adaptive bitrate selection
- 202 Accepted for upload complete — video needs transcoding before it’s playable
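The chunking side of the resumable upload can be sketched in a few lines. This is a minimal illustration, assuming the 5 MiB chunk size from the Content-Range example above; the actual HTTP calls and retry policy are omitted, only the range bookkeeping is shown:

```python
# Sketch of chunk-range computation for the resumable upload API above.
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB, matching the Content-Range example

def chunk_ranges(total_size: int, chunk_size: int = CHUNK_SIZE):
    """Yield (start, end_inclusive, content_range_header) per chunk."""
    offset = 0
    while offset < total_size:
        end = min(offset + chunk_size, total_size) - 1
        yield offset, end, f"bytes {offset}-{end}/*"
        offset = end + 1

# First chunk of a 20 MiB file, as in the PUT example above:
start, end, header = next(chunk_ranges(20 * 1024 * 1024))
print(header)  # bytes 0-5242879/*
```

On interruption, the client resumes from the server's reported next_offset instead of restarting the whole upload, which is the point of the 308 response.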
4. Data Model (3 min)
Video Metadata (PostgreSQL, sharded by video_id)
Table: videos
video_id (PK) | bigint
channel_id | bigint
title | varchar(200)
description | text
duration_sec | int
status | enum('processing', 'ready', 'failed', 'removed')
storage_key | varchar(200) -- S3 prefix for all renditions
view_count | bigint (denormalized, eventually consistent)
like_count | bigint
published_at | timestamptz
Index: (channel_id, published_at DESC)
Video Files (S3 + CDN)
S3 structure:
videos/{video_id}/original.mp4
videos/{video_id}/360p/segment_001.ts ... segment_N.ts
videos/{video_id}/720p/segment_001.ts ... segment_N.ts
videos/{video_id}/1080p/segment_001.ts ... segment_N.ts
videos/{video_id}/manifest.mpd (DASH)
videos/{video_id}/master.m3u8 (HLS)
View Counts (Kafka → Redis → periodic flush to PostgreSQL)
Real-time view counting via streaming, not direct DB writes. Batch-update PostgreSQL every minute.
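A minimal sketch of this counting flow, with a Counter standing in for Redis increments and flush() standing in for the minutely batched UPDATE (key and table names here are illustrative):

```python
# Sketch of the Kafka -> Redis -> PostgreSQL view-count flow.
from collections import Counter

pending = Counter()  # Redis equivalent: HINCRBY views:pending {video_id} 1

def record_view(video_id: str) -> None:
    pending[video_id] += 1  # hot path: O(1) increment, no DB write

def flush() -> dict:
    """Run every minute: drain counters into one batched DB update."""
    batch = dict(pending)
    pending.clear()
    # e.g. UPDATE videos SET view_count = view_count + %(n)s WHERE video_id = %(id)s
    return batch

for _ in range(3):
    record_view("v_abc")
record_view("v_xyz")
print(flush())  # {'v_abc': 3, 'v_xyz': 1}
```

The design choice: a lost flush loses at most a minute of counts, which the eventual-consistency requirement explicitly tolerates.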
Search Index (Elasticsearch)
Index: videos
Fields: title (full-text), description (full-text), tags, channel_name, published_at, view_count
5. High-Level Design (12 min)
Upload & Processing Path
Client → Upload Service (chunked, resumable)
→ Write chunks to S3 (temporary bucket)
→ On complete: publish to Kafka (video_uploaded event)
→ Transcoding Service (fleet of GPU workers) consumes event:
→ Download original from S3
→ Transcode to multiple resolutions: 240p, 360p, 720p, 1080p, 4K
→ Multiple codecs: H.264 (compatibility), VP9 (efficiency), AV1 (next-gen)
→ Segment into 4-6 second chunks (for adaptive streaming)
→ Generate DASH manifest (.mpd) and HLS playlist (.m3u8)
→ Upload all renditions to S3 (permanent bucket)
→ Generate thumbnails (every 10 seconds for preview strip)
→ Update video status to 'ready' in PostgreSQL
→ Push to CDN (pre-warm popular regions)
→ Index in Elasticsearch
→ Trigger recommendation pipeline
Video Playback Path
Client → CDN (CloudFront / Akamai / own CDN)
→ Client requests manifest: GET /v_abc/master.m3u8
→ Client reads manifest, selects initial quality based on bandwidth
→ Client requests video segments: GET /v_abc/720p/segment_005.ts
→ CDN cache hit (99%+) → stream directly
→ CDN cache miss → fetch from S3 origin → cache → stream
→ Client adaptively switches quality mid-stream based on bandwidth
Home Feed / Recommendation Path
Client → Feed Service
→ Recommendation Engine:
→ Candidate generation (1000s of candidates from multiple sources)
→ Ranking model (score each candidate)
→ Filtering (already watched, content policy)
→ Diversification
→ Return top 20 videos with metadata
→ Pre-fetch thumbnails via CDN
Components
- Upload Service: Handles chunked, resumable video uploads
- Transcoding Service: GPU worker fleet for video processing
- Video Service: Metadata CRUD, view counting
- Feed/Recommendation Service: Personalized home feed
- Search Service: Elasticsearch-backed video search
- PostgreSQL: Video metadata, channel info, user profiles
- S3: Video file storage (originals + all renditions)
- CDN (multi-CDN): Video delivery — this is 95%+ of all traffic
- Kafka: Event streaming (uploads, views, engagements)
- Redis: View count aggregation, recommendation feature store, session cache
- Elasticsearch: Video search index
6. Deep Dives (15 min)
Deep Dive 1: Video Transcoding Pipeline
The challenge: 500K videos/day, each needing 5 resolutions × 3 codecs = 15 renditions per video. That’s 7.5M transcoding jobs/day.
Architecture:
S3 Event → SQS Queue → Transcoding Workers (auto-scaling GPU fleet)
How transcoding works:
- Split: Divide the original video into 4-6 second segments (GOP-aligned splitting)
- Transcode in parallel: Each segment can be transcoded independently → massive parallelism
- A 10-minute video = ~150 segments
- Each segment × 15 renditions = 2,250 transcoding tasks
- Distributed across worker fleet → 10-minute video fully transcoded in ~2 minutes
- Assemble: Generate manifest files (DASH/HLS) pointing to all segment files
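The fan-out above can be made concrete by enumerating the job matrix. A sketch, assuming 4-second segments (the text allows 4-6s) and the resolution/codec lists from this design:

```python
# Sketch of the transcode fan-out: one job per (segment, resolution, codec).
from itertools import product

RESOLUTIONS = ["240p", "360p", "720p", "1080p", "4k"]
CODECS = ["h264", "vp9", "av1"]
SEGMENT_SECONDS = 4  # 4-6s in the text; 4s assumed here

def transcode_tasks(video_id: str, duration_sec: int) -> list[dict]:
    segments = range(1, duration_sec // SEGMENT_SECONDS + 1)
    return [
        {"video_id": video_id, "segment": s, "resolution": r, "codec": c}
        for s, r, c in product(segments, RESOLUTIONS, CODECS)
    ]

tasks = transcode_tasks("v_abc", 600)  # 10-minute video
print(len(tasks))  # 2250 tasks: 150 segments x 5 resolutions x 3 codecs
```

Each task is independent, which is exactly what lets the worker fleet finish a 10-minute video in ~2 minutes.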
Adaptive Bitrate Streaming (ABR):
- Client downloads manifest listing all available quality levels
- Client monitors download speed in real time
- If bandwidth drops: next segment fetched at lower quality
- If bandwidth improves: switch up
- Seamless quality transitions because segments are small (4-6 seconds)
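A toy version of the client's ABR decision, picking the highest rendition whose bitrate fits within a safety fraction of measured throughput. The bitrate ladder and 0.8 safety factor are illustrative assumptions, not values from the design:

```python
# Toy client-side ABR: choose quality for the next segment.
LADDER = [("240p", 0.4), ("360p", 0.8), ("720p", 2.5), ("1080p", 5.0)]  # Mbps

def pick_quality(measured_mbps: float, safety: float = 0.8) -> str:
    budget = measured_mbps * safety    # leave headroom for bandwidth dips
    chosen = LADDER[0][0]              # always fall back to lowest quality
    for name, mbps in LADDER:
        if mbps <= budget:
            chosen = name
    return chosen

print(pick_quality(4.0))   # 720p (budget 3.2 Mbps fits 2.5, not 5.0)
print(pick_quality(1.5))   # 360p
print(pick_quality(0.3))   # 240p (fallback)
```

Real players (e.g. buffer-based algorithms) also weigh buffer occupancy, but the bandwidth-fit rule is the core idea.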
Codec strategy:
- H.264: Universal compatibility (every device/browser)
- VP9: 30-50% better compression than H.264 (Chrome, Android)
- AV1: 30% better than VP9 but expensive to encode (only for popular videos)
- Decision: Transcode to H.264 + VP9 immediately. AV1 only for videos that exceed 10K views (cost-effective).
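That decision rule, written out (the 10K threshold comes from the text; the function name is illustrative):

```python
# Codec policy: H.264 + VP9 at upload, AV1 only once a video proves popular.
AV1_VIEW_THRESHOLD = 10_000

def codecs_for(view_count: int) -> list[str]:
    codecs = ["h264", "vp9"]           # transcoded immediately on upload
    if view_count > AV1_VIEW_THRESHOLD:
        codecs.append("av1")           # expensive encode, amortized by views
    return codecs

print(codecs_for(500))     # ['h264', 'vp9']
print(codecs_for(50_000))  # ['h264', 'vp9', 'av1']
```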
Deep Dive 2: CDN Architecture for Video at Scale
600Tbps peak bandwidth cannot come from origin servers. CDN handles 95%+.
Multi-CDN strategy:
- Use 3+ CDN providers (CloudFront, Akamai, Cloudflare)
- DNS-based routing: direct users to the fastest CDN based on geo + real-time performance metrics
- If one CDN has issues in a region, automatically shift traffic to another
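A sketch of the routing decision: prefer the healthy CDN with the best recent latency for the user's region. The metric values are illustrative; in practice they come from real-user monitoring per region:

```python
# Latency-aware multi-CDN selection with health-based failover.
def pick_cdn(region: str, metrics: dict) -> str:
    """metrics maps CDN name -> {'p95_ms': ..., 'healthy': ...} for one region."""
    candidates = [
        (stats["p95_ms"], name)
        for name, stats in metrics.items()
        if stats["healthy"]
    ]
    if not candidates:
        raise RuntimeError(f"no healthy CDN for region {region}")
    return min(candidates)[1]          # lowest p95 wins

metrics_eu = {
    "cloudfront": {"p95_ms": 45, "healthy": True},
    "akamai":     {"p95_ms": 38, "healthy": True},
    "cloudflare": {"p95_ms": 30, "healthy": False},  # regional incident
}
print(pick_cdn("eu-west", metrics_eu))  # akamai
```

In the outage scenario above, the fastest CDN is unhealthy in the region, so traffic shifts to the next-best automatically.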
Cache hierarchy:
- Edge PoPs (hundreds of locations): Cache popular content. Cache hit ratio: ~85%
- Regional caches (tens of locations): Catch misses from edge. Hit ratio: ~95% cumulative
- Origin shield (3-5 locations): Last cache layer before S3. Hit ratio: ~99%
- S3 origin: Only serves ~1% of requests
Popularity-based caching:
- Newly uploaded videos: pre-warm in CDN for the uploader’s region
- Viral videos (rapid view growth): proactively push to all edge PoPs
- Long-tail videos (old, few views): only cached on demand, evicted quickly
Cost optimization:
- Video delivery is the largest cost center ($0.02-0.08/GB depending on CDN)
- At ~112PB/day delivered (1B views × ~112MB each), even the low-end rate ($0.02/GB) is over $2M/day
- Peering: establish direct peering agreements with ISPs (bypass CDN for top ISPs)
- Own CDN: for the largest platforms (YouTube, Netflix), building custom CDN infrastructure is cost-effective at this scale
- Encoding efficiency: VP9/AV1 reduces bandwidth 30-50% → direct cost savings
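The cost arithmetic behind these optimizations, using the view count, watch time, and bitrate assumed in the estimation section (real CDN contracts vary widely, so treat this as order-of-magnitude only):

```python
# Rough delivery-cost estimate from the numbers above.
views_per_day = 1_000_000_000
watch_seconds = 5 * 60        # avg 5 min watch time
bitrate_mbps = 3              # avg bitrate

gb_per_view = bitrate_mbps * watch_seconds / 8 / 1000  # Mb -> MB -> GB
daily_gb = views_per_day * gb_per_view
print(round(gb_per_view, 4))  # 0.1125 GB per average view
print(daily_gb / 1e6)         # 112.5 PB delivered per day

for rate in (0.02, 0.08):     # $/GB range from the text
    print(f"${daily_gb * rate / 1e6:.2f}M/day at ${rate}/GB")
```

At millions of dollars per day, a 30-50% codec-efficiency gain or a peering agreement that bypasses CDN fees pays for itself quickly.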
Deep Dive 3: Recommendation System (High Level)
Multi-stage pipeline:
1. Candidate generation (< 50ms):
- Collaborative filtering: “Users who watched X also watched Y”
- Content-based: similar titles, same channel, same topic
- Social: what are people you follow watching?
- Trending: globally or regionally popular videos
- Result: ~5,000 candidate videos
2. Ranking (< 50ms):
- Deep neural network predicts: P(click), P(watch > 50%), P(like), P(subscribe after watching)
- Features: user watch history, user demographics, video metadata, video engagement stats, time of day, device type
- Combined score: weighted sum of predicted outcomes
- Result: scored candidates sorted by predicted engagement
3. Filtering and business rules (< 10ms):
- Remove already watched (from user history in Redis)
- Remove content policy violations
- Apply diversity: no more than 2 videos from same channel in top 20
- Apply freshness: boost recent uploads
4. Serving:
- Pre-compute recommendations for active users (batch pipeline, updated hourly)
- Real-time re-ranking based on current session (what they just watched)
- Cache in Redis under key recommendations:{user_id}
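The ranking, filtering, and diversity stages can be sketched end to end. The already-watched check and the 2-per-channel cap follow the rules above; the scoring is a stub standing in for the neural ranker, and all field names are illustrative:

```python
# Sketch of recommendation stages 2-3: rank, filter, diversify, truncate.
def recommend(user: dict, candidates: list[dict], top_k: int = 20) -> list[str]:
    # Ranking: sort by predicted-engagement score (stub for the DNN ranker)
    scored = sorted(candidates, key=lambda v: v["score"], reverse=True)
    # Filtering + business rules
    feed, per_channel = [], {}
    for v in scored:
        if v["video_id"] in user["watched"]:
            continue                        # remove already watched
        ch = v["channel_id"]
        if per_channel.get(ch, 0) >= 2:     # diversity: max 2 per channel
            continue
        per_channel[ch] = per_channel.get(ch, 0) + 1
        feed.append(v["video_id"])
        if len(feed) == top_k:
            break
    return feed

user = {"watched": {"v1"}}
candidates = [
    {"video_id": "v1", "channel_id": "c1", "score": 0.9},  # already watched
    {"video_id": "v2", "channel_id": "c1", "score": 0.8},
    {"video_id": "v3", "channel_id": "c1", "score": 0.7},
    {"video_id": "v4", "channel_id": "c1", "score": 0.6},  # third from c1: capped
    {"video_id": "v5", "channel_id": "c2", "score": 0.5},
]
print(recommend(user, candidates))  # ['v2', 'v3', 'v5']
```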
7. Extensions (2 min)
- Live streaming: Completely different infrastructure — RTMP ingest, real-time transcoding (no batch), low-latency HLS/DASH (~3-5 second delay), WebRTC for sub-second latency.
- Comments: Separate service with its own data store. Threaded comments, moderation pipeline, spam detection.
- Monetization: Ad insertion at server-side (SSAI) for pre-roll, mid-roll, post-roll. Ad auction + targeting pipeline.
- Content moderation: ML pipeline for detecting policy violations in video and audio. Runs post-upload, can take down content retroactively.
- Offline downloads: DRM-protected downloads (Widevine, FairPlay). Pre-download select quality, encrypt with per-device key.
- Multi-language: Auto-generated subtitles (speech-to-text), auto-translation, subtitle rendering on client.