1. Requirements & Scope (5 min)

Functional Requirements

  1. Users can upload videos
  2. Users can stream/watch videos (adaptive bitrate)
  3. Users can search for videos
  4. Personalized home feed (recommended videos)
  5. Video metadata: title, description, view count, likes, comments

Non-Functional Requirements

  • Availability: 99.99% — video playback must be rock-solid
  • Latency: Video playback start < 2 seconds. Search results < 300ms. Home feed < 500ms.
  • Consistency: View counts and likes can be eventually consistent (seconds of delay acceptable). Video availability after upload: within minutes (transcoding pipeline).
  • Scale: 2B MAU, 1B videos watched/day, 500K video uploads/day
  • Bandwidth: This is a bandwidth-dominated system — video streaming is 80%+ of internet traffic

2. Estimation (3 min)

Storage

  • 500K uploads/day × avg 5 minutes × 10MB/min (original) = 25TB/day raw uploads
  • After transcoding (5 resolutions × 3 codecs = 15 renditions): ~5x storage multiplier (lower-resolution renditions are far smaller than the original) = 125TB/day
  • Per year: ~45PB — massive storage system

Bandwidth

  • 1B video views/day, avg 5 min watch time, avg bitrate 3Mbps
  • 1B views × 5 min = 5B watch-minutes/day → average concurrency ≈ 5B ÷ 1,440 min ≈ 3.5M viewers
  • Assume peak ≈ 3× average: ~10M concurrent viewers
  • 10M × 3Mbps = 30Tbps peak bandwidth
  • Even with CDN (95%+ cached), origin bandwidth: ~1.5Tbps

Traffic

  • Upload: 500K/day ÷ ~100K sec/day ≈ 5 uploads/sec (low, but each is large and long-running)
  • Video plays: 1B/day ÷ ~100K sec/day ≈ 10,000 play starts/sec
  • Search: assume 500M/day ≈ 5,000 searches/sec
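The storage and traffic figures above are easy to sanity-check in a few lines (the ~100K seconds/day rounding is the same mental-math shortcut used in the Traffic bullets):

```python
# Back-of-envelope check of the estimates above. 86,400 seconds/day is
# rounded to 100K for mental math, as in the Traffic section.
SECONDS_PER_DAY = 100_000

uploads_per_day = 500_000
raw_tb_per_day = uploads_per_day * 5 * 10 / 1_000_000   # 5 min x 10 MB/min, MB -> TB
transcoded_tb_per_day = raw_tb_per_day * 5              # ~5x rendition multiplier
pb_per_year = transcoded_tb_per_day * 365 / 1_000       # TB -> PB

upload_rps = uploads_per_day / SECONDS_PER_DAY          # uploads per second
play_rps = 1_000_000_000 / SECONDS_PER_DAY              # play starts per second
search_rps = 500_000_000 / SECONDS_PER_DAY              # searches per second
```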

3. API Design (3 min)

// Upload flow (chunked, resumable)
POST /api/v1/videos/upload/init
  Body: { "title": "My Video", "description": "...", "filename": "video.mp4" }
  Response 200: { "upload_id": "up_123", "upload_url": "https://upload.yt.com/up_123" }

PUT /upload/{upload_id}
  Headers: Content-Range: bytes 0-5242879/*
  Body: <binary chunk>
  Response 308: { "next_offset": 5242880 }

POST /api/v1/videos/upload/{upload_id}/complete
  Response 202: { "video_id": "v_abc", "status": "processing" }

// Playback
GET /api/v1/videos/{video_id}
  Response 200: {
    "video_id": "v_abc",
    "title": "My Video",
    "description": "...",
    "channel": { "id": "c_1", "name": "TechChannel", "subscriber_count": 1000000 },
    "view_count": 1500000,
    "like_count": 50000,
    "duration": 300,
    "stream_urls": {
      "dash": "https://cdn.yt.com/v_abc/manifest.mpd",
      "hls": "https://cdn.yt.com/v_abc/master.m3u8"
    },
    "thumbnails": { "default": "...", "high": "..." },
    "published_at": "2026-02-22T12:00:00Z"
  }

GET /api/v1/feed?cursor={cursor}&limit=20
GET /api/v1/search?q={query}&cursor={cursor}

POST /api/v1/videos/{video_id}/like
POST /api/v1/videos/{video_id}/view   // fire-and-forget, for analytics

Key decisions:

  • Resumable uploads — video files are large (GB); network interruptions are common
  • DASH/HLS manifests — client-side adaptive bitrate selection
  • 202 Accepted for upload complete — video needs transcoding before it’s playable
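The client-side chunking behind the resumable upload endpoints can be sketched as a pure range generator (the 5 MiB chunk size and helper name are assumptions, not part of the API contract; on interruption the client would resume from the `next_offset` returned by the 308 response):

```python
# Sketch of resumable-upload chunking: yield the byte ranges and
# Content-Range headers the client would send with each PUT.
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB, matching the example Content-Range above

def chunk_ranges(total_size: int, chunk_size: int = CHUNK_SIZE):
    """Yield (start, end_inclusive, content_range_header) per chunk."""
    offset = 0
    while offset < total_size:
        end = min(offset + chunk_size, total_size) - 1
        # '*' = total size not declared until the final chunk
        total = total_size if end == total_size - 1 else "*"
        yield offset, end, f"bytes {offset}-{end}/{total}"
        offset = end + 1

ranges = list(chunk_ranges(12 * 1024 * 1024))   # a 12 MiB file -> 3 chunks
```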

4. Data Model (3 min)

Video Metadata (PostgreSQL, sharded by video_id)

Table: videos
  video_id       (PK) | bigint
  channel_id           | bigint
  title                | varchar(200)
  description          | text
  duration_sec         | int
  status               | enum('processing', 'ready', 'failed', 'removed')
  storage_key          | varchar(200)  -- S3 prefix for all renditions
  view_count           | bigint (denormalized, eventually consistent)
  like_count           | bigint
  published_at         | timestamptz
  Index: (channel_id, published_at DESC)

Video Files (S3 + CDN)

S3 structure:
  Note: .ts segments are the classic HLS format; packaging segments as CMAF fMP4 instead lets a single set of media files be referenced by both the DASH and HLS manifests.
  videos/{video_id}/original.mp4
  videos/{video_id}/360p/segment_001.ts ... segment_N.ts
  videos/{video_id}/720p/segment_001.ts ... segment_N.ts
  videos/{video_id}/1080p/segment_001.ts ... segment_N.ts
  videos/{video_id}/manifest.mpd  (DASH)
  videos/{video_id}/master.m3u8   (HLS)

View Counts (Kafka → Redis → periodic flush to PostgreSQL)

Real-time view counting via streaming, not direct DB writes. Batch-update PostgreSQL every minute.
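A minimal sketch of the aggregation idea: buffer increments in memory (standing in for Redis `INCR`) and apply them to the database in one batch per interval. The class and method names are illustrative, not from the source:

```python
# Buffer view increments, then flush to the store in one batch per interval.
from collections import Counter

class ViewCounter:
    def __init__(self):
        self.pending = Counter()     # video_id -> views since last flush

    def record_view(self, video_id: str):
        self.pending[video_id] += 1  # Redis INCR in the real system

    def flush(self, db: dict):
        """Apply buffered counts in one batch, then reset the buffer."""
        for video_id, n in self.pending.items():
            # UPDATE videos SET view_count = view_count + n in PostgreSQL
            db[video_id] = db.get(video_id, 0) + n
        self.pending.clear()

db = {"v_abc": 1_500_000}
counter = ViewCounter()
for _ in range(3):
    counter.record_view("v_abc")
counter.flush(db)
```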

Search Index (Elasticsearch)

Index: videos
Fields: title (full-text), description (full-text), tags, channel_name, published_at, view_count

5. High-Level Design (12 min)

Upload & Processing Path

Client → Upload Service (chunked, resumable)
  → Write chunks to S3 (temporary bucket)
  → On complete: publish to Kafka (video_uploaded event)
  → Transcoding Service (fleet of GPU workers) consumes event:
    → Download original from S3
    → Transcode to multiple resolutions: 240p, 360p, 720p, 1080p, 4K
    → Multiple codecs: H.264 (compatibility), VP9 (efficiency), AV1 (next-gen)
    → Segment into 4-6 second chunks (for adaptive streaming)
    → Generate DASH manifest (.mpd) and HLS playlist (.m3u8)
    → Upload all renditions to S3 (permanent bucket)
    → Generate thumbnails (every 10 seconds for preview strip)
  → Update video status to 'ready' in PostgreSQL
  → Push to CDN (pre-warm popular regions)
  → Index in Elasticsearch
  → Trigger recommendation pipeline

Video Playback Path

Client → CDN (CloudFront / Akamai / own CDN)
  → Client requests manifest: GET /v_abc/master.m3u8
  → Client reads manifest, selects initial quality based on bandwidth
  → Client requests video segments: GET /v_abc/720p/segment_005.ts
  → CDN cache hit (99%+) → stream directly
  → CDN cache miss → fetch from S3 origin → cache → stream
  → Client adaptively switches quality mid-stream based on bandwidth

Home Feed / Recommendation Path

Client → Feed Service
  → Recommendation Engine:
    → Candidate generation (1000s of candidates from multiple sources)
    → Ranking model (score each candidate)
    → Filtering (already watched, content policy)
    → Diversification
  → Return top 20 videos with metadata
  → Pre-fetch thumbnails via CDN

Components

  1. Upload Service: Handles chunked, resumable video uploads
  2. Transcoding Service: GPU worker fleet for video processing
  3. Video Service: Metadata CRUD, view counting
  4. Feed/Recommendation Service: Personalized home feed
  5. Search Service: Elasticsearch-backed video search
  6. PostgreSQL: Video metadata, channel info, user profiles
  7. S3: Video file storage (originals + all renditions)
  8. CDN (multi-CDN): Video delivery — this is 95%+ of all traffic
  9. Kafka: Event streaming (uploads, views, engagements)
  10. Redis: View count aggregation, recommendation feature store, session cache
  11. Elasticsearch: Video search index

6. Deep Dives (15 min)

Deep Dive 1: Video Transcoding Pipeline

The challenge: 500K videos/day, each needing 5 resolutions × 3 codecs = 15 renditions per video. That’s 7.5M transcoding jobs/day.

Architecture:

S3 Event → SQS Queue → Transcoding Workers (auto-scaling GPU fleet)

How transcoding works:

  1. Split: Divide the original video into 4-6 second segments (GOP-aligned splitting)
  2. Transcode in parallel: Each segment can be transcoded independently → massive parallelism
    • A 10-minute video = ~150 segments
    • Each segment × 15 renditions = 2,250 transcoding tasks
    • Distributed across worker fleet → 10-minute video fully transcoded in ~2 minutes
  3. Assemble: Generate manifest files (DASH/HLS) pointing to all segment files
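The split/transcode fan-out above amounts to expanding one video into (segments × renditions) independent tasks. A sketch, with the rendition ladder and 4-second segments matching the numbers above (task tuples are illustrative):

```python
# Expand one uploaded video into its independent transcode tasks.
import math

RESOLUTIONS = ["240p", "360p", "720p", "1080p", "4k"]
CODECS = ["h264", "vp9", "av1"]
SEGMENT_SECONDS = 4

def transcode_tasks(video_id: str, duration_sec: int):
    n_segments = math.ceil(duration_sec / SEGMENT_SECONDS)
    return [
        (video_id, seg, res, codec)
        for seg in range(n_segments)
        for res in RESOLUTIONS
        for codec in CODECS
    ]

tasks = transcode_tasks("v_abc", 600)   # a 10-minute video: 150 segments x 15 renditions
```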

Adaptive Bitrate Streaming (ABR):

  • Client downloads manifest listing all available quality levels
  • Client monitors download speed in real time
  • If bandwidth drops: next segment fetched at lower quality
  • If bandwidth improves: switch up
  • Seamless quality transitions because segments are small (4-6 seconds)
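The client's quality choice can be sketched as a simple rule: pick the highest rendition whose bitrate fits within a safety fraction of measured throughput. The bitrate ladder and the 0.8 safety factor are assumptions for illustration:

```python
# Pick the highest rendition that fits within a fraction of measured bandwidth.
BITRATE_LADDER_KBPS = {"240p": 400, "360p": 800, "720p": 2_500, "1080p": 5_000, "4k": 15_000}
SAFETY = 0.8   # headroom so one slow segment doesn't stall playback

def pick_quality(measured_kbps: float) -> str:
    budget = measured_kbps * SAFETY
    fitting = [(kbps, q) for q, kbps in BITRATE_LADDER_KBPS.items() if kbps <= budget]
    if not fitting:
        # nothing fits: fall back to the lowest rung rather than stall
        return min(BITRATE_LADDER_KBPS.items(), key=lambda kv: kv[1])[0]
    return max(fitting)[1]
```

Re-running this before each 4-6 second segment fetch is what produces the seamless up/down switching described above.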

Codec strategy:

  • H.264: Universal compatibility (every device/browser)
  • VP9: 30-50% better compression than H.264 (Chrome, Android)
  • AV1: 30% better than VP9 but expensive to encode (only for popular videos)
  • Decision: Transcode to H.264 + VP9 immediately. AV1 only for videos that exceed 10K views (cost-effective).
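The codec decision reduces to a small policy function (the function name is illustrative; the 10K-view threshold is the one stated above):

```python
# Codec policy: H.264 + VP9 always; AV1 backfilled only for popular videos.
AV1_VIEW_THRESHOLD = 10_000

def codecs_for(view_count: int, at_upload: bool = False) -> list:
    codecs = ["h264", "vp9"]          # compatibility + efficiency, from day one
    if not at_upload and view_count > AV1_VIEW_THRESHOLD:
        codecs.append("av1")          # encode cost only pays off at scale
    return codecs
```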

Deep Dive 2: CDN Architecture for Video at Scale

Peak bandwidth at this scale cannot come from origin servers. The CDN handles 95%+ of delivery.

Multi-CDN strategy:

  • Use 3+ CDN providers (CloudFront, Akamai, Cloudflare)
  • DNS-based routing: direct users to the fastest CDN based on geo + real-time performance metrics
  • If one CDN has issues in a region, automatically shift traffic to another
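The DNS-based routing decision can be sketched as picking the provider with the best recent latency per region, skipping any provider flagged unhealthy there (metric values are made up for illustration):

```python
# Route a region to the healthy CDN with the lowest recent latency.
def pick_cdn(region: str, latency_ms: dict, unhealthy: set) -> str:
    candidates = {
        cdn: ms for cdn, ms in latency_ms[region].items()
        if (cdn, region) not in unhealthy
    }
    return min(candidates, key=candidates.get)

metrics = {"eu-west": {"cloudfront": 28, "akamai": 22, "cloudflare": 25}}
best = pick_cdn("eu-west", metrics, unhealthy=set())
failover = pick_cdn("eu-west", metrics, unhealthy={("akamai", "eu-west")})
```

In production the health set and latency table would be fed by real-user measurements, refreshed continuously.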

Cache hierarchy:

  1. Edge PoPs (hundreds of locations): Cache popular content. Cache hit ratio: ~85%
  2. Regional caches (tens of locations): Catch misses from edge. Hit ratio: ~95% cumulative
  3. Origin shield (3-5 locations): Last cache layer before S3. Hit ratio: ~99%
  4. S3 origin: Only serves ~1% of requests
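The cumulative hit ratios above imply a per-tier hit rate on the traffic each tier actually sees, which is worth checking:

```python
# Derive per-tier behavior from the cumulative hit ratios listed above.
edge, regional, shield = 0.85, 0.95, 0.99    # cumulative hit ratios

origin_fraction = 1 - shield                              # ~1% of requests reach S3
regional_local_hit = (regional - edge) / (1 - edge)       # hit rate on edge misses
shield_local_hit = (shield - regional) / (1 - regional)   # hit rate on regional misses
```

So the regional tier serves about two-thirds of the edge misses it sees, and the shield about 80% of what leaks past the regional caches.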

Popularity-based caching:

  • Newly uploaded videos: pre-warm in CDN for the uploader’s region
  • Viral videos (rapid view growth): proactively push to all edge PoPs
  • Long-tail videos (old, few views): only cached on demand, evicted quickly

Cost optimization:

  • Video delivery is the largest cost center ($0.02-0.08/GB depending on CDN)
  • At the peak bandwidth estimated above, the egress bill is enormous
  • Peering: establish direct peering agreements with ISPs (bypass CDN for top ISPs)
  • Own CDN: for the largest platforms (YouTube, Netflix), building custom CDN infrastructure is cost-effective at this scale
  • Encoding efficiency: VP9/AV1 reduces bandwidth 30-50% → direct cost savings

Deep Dive 3: Recommendation System (High Level)

Multi-stage pipeline:

  1. Candidate generation (< 50ms):

    • Collaborative filtering: “Users who watched X also watched Y”
    • Content-based: similar titles, same channel, same topic
    • Social: what are people you follow watching?
    • Trending: globally or regionally popular videos
    • Result: ~5,000 candidate videos
  2. Ranking (< 50ms):

    • Deep neural network predicts: P(click), P(watch > 50%), P(like), P(subscribe after watching)
    • Features: user watch history, user demographics, video metadata, video engagement stats, time of day, device type
    • Combined score: weighted sum of predicted outcomes
    • Result: scored candidates sorted by predicted engagement
  3. Filtering and business rules (< 10ms):

    • Remove already watched (from user history in Redis)
    • Remove content policy violations
    • Apply diversity: no more than 2 videos from same channel in top 20
    • Apply freshness: boost recent uploads
  4. Serving:

    • Pre-compute recommendations for active users (batch pipeline, updated hourly)
    • Real-time re-ranking based on current session (what they just watched)
    • Cache in Redis: recommendations:{user_id}
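The four stages above can be sketched end-to-end on toy data: rank scored candidates, drop already-watched videos, apply the per-channel cap of 2, and return the top N (the function shape and the data are illustrative; real scores come from the ranking model):

```python
# Rank -> filter watched -> per-channel diversity cap -> top N.
def recommend(candidates, scores, watched, top_n=20, per_channel_cap=2):
    """candidates: list of (video_id, channel_id); scores: video_id -> float."""
    ranked = sorted(
        (c for c in candidates if c[0] not in watched),
        key=lambda c: scores[c[0]],
        reverse=True,
    )
    feed, per_channel = [], {}
    for video_id, channel_id in ranked:
        if per_channel.get(channel_id, 0) >= per_channel_cap:
            continue                  # at most 2 videos per channel in the feed
        feed.append(video_id)
        per_channel[channel_id] = per_channel.get(channel_id, 0) + 1
        if len(feed) == top_n:
            break
    return feed

feed = recommend(
    candidates=[("v1", "c1"), ("v2", "c1"), ("v3", "c1"), ("v4", "c2"), ("v5", "c2")],
    scores={"v1": 0.9, "v2": 0.8, "v3": 0.7, "v4": 0.6, "v5": 0.5},
    watched={"v4"},
    top_n=3,
)
```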

7. Extensions (2 min)

  • Live streaming: Completely different infrastructure — RTMP ingest, real-time transcoding (no batch), low-latency HLS/DASH (~3-5 second delay), WebRTC for sub-second latency.
  • Comments: Separate service with its own data store. Threaded comments, moderation pipeline, spam detection.
  • Monetization: Ad insertion at server-side (SSAI) for pre-roll, mid-roll, post-roll. Ad auction + targeting pipeline.
  • Content moderation: ML pipeline for detecting policy violations in video and audio. Runs post-upload, can take down content retroactively.
  • Offline downloads: DRM-protected downloads (Widevine, FairPlay). Pre-download select quality, encrypt with per-device key.
  • Multi-language: Auto-generated subtitles (speech-to-text), auto-translation, subtitle rendering on client.