1. Requirements & Scope (5 min)
Functional Requirements
- Create polls — Creators/advertisers can define survey questions (multiple choice, single select, rating scale) and attach them to specific timestamps in a video
- Render polls mid-roll — Display a non-intrusive overlay poll at the configured timestamp during video playback, without pausing or blocking the video
- Collect votes — Record user responses in real time, enforce one vote per user per poll, and allow changing a vote before the poll closes
- Real-time results — Show live vote counts / percentages to the user after they vote (instant feedback)
- Analytics dashboard — Provide creators with detailed poll analytics: response rate, demographic breakdown, completion funnel, and A/B test results for different poll placements
Non-Functional Requirements
- Availability: 99.95% — a poll failing to render is a missed data collection opportunity, but not as critical as video playback itself
- Latency: Poll UI must render within 200ms of the trigger timestamp. Vote submission must ACK within 100ms (perceived instant).
- Consistency: Votes must be counted exactly once. Read-after-write consistency for a user seeing their own vote. Aggregate counts can be eventually consistent (1-2 second delay is fine).
- Scale: YouTube has 800M daily active viewers, 500M hours of video watched/day. If 5% of videos carry polls, 60% of those viewers see the poll, and 30% of impressions convert to votes → ~192M poll impressions/day → ~6,700 poll renders/sec and ~2,000 votes/sec at peak (detailed in Estimation below).
- Durability: Every vote must be durably stored. Zero data loss on votes.
2. Estimation (3 min)
Traffic
- Daily active viewers: 800M
- Videos watched per viewer: ~8/day → 6.4B video views/day
- Videos with polls: 5% → 320M poll-eligible views/day
- Poll impression rate (viewer sees the poll): 60% → 192M poll impressions/day
- Vote rate (viewer actually votes): 30% of impressions → 57.6M votes/day
- Vote QPS: 57.6M / 86,400 ≈ 700 votes/sec (average); × 3 (peak multiplier) ≈ 2,000 votes/sec (peak)
- Peak poll render QPS: 192M / 86400 × 3 ≈ 6,700 renders/sec (peak)
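The traffic arithmetic above can be sanity-checked in a few throwaway lines (the 3× peak multiplier is the assumption stated in the bullets):

```python
DAU = 800_000_000                    # daily active viewers
video_views = DAU * 8                # ~8 videos per viewer per day
poll_eligible = video_views * 0.05   # 5% of videos carry a poll
impressions = poll_eligible * 0.60   # 60% of eligible views render the poll
votes = impressions * 0.30           # 30% of impressions convert to a vote

avg_votes_per_sec = votes / 86_400
peak_votes_per_sec = avg_votes_per_sec * 3       # assumed 3x peak multiplier
avg_renders_per_sec = impressions / 86_400
peak_renders_per_sec = avg_renders_per_sec * 3

print(f"impressions/day:  {impressions:,.0f}")        # 192,000,000
print(f"votes/day:        {votes:,.0f}")              # 57,600,000
print(f"peak votes/sec:   {peak_votes_per_sec:,.0f}")    # 2,000
print(f"peak renders/sec: {peak_renders_per_sec:,.0f}")  # 6,667
```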
Storage
- Poll definitions: 10M active polls × 2 KB (question, options, targeting rules, schedule) = 20 GB — easily fits in a relational DB
- Votes: 57.6M votes/day × 365 days × 3 years retention = 63B votes
- Each vote: poll_id (8B) + user_id (8B) + option_id (4B) + timestamp (8B) + metadata (32B) ≈ 60 bytes
- Total: 63B × 60B = 3.78 TB — manageable with partitioned storage
- Aggregated counts: Per-poll, per-option counters. 10M polls × 5 options × 16B = 800 MB — trivially small, lives in Redis
Bandwidth
- Poll render payload: ~5 KB (question text, options, styling, targeting metadata)
- 6,700 renders/sec × 5 KB = 33.5 MB/s — negligible compared to video streaming bandwidth
3. API Design (3 min)
Creator-Facing APIs
// Create a poll attached to a video
POST /api/v1/videos/{video_id}/polls
Body: {
"question": "What feature should we build next?",
"type": "single_select", // single_select | multi_select | rating
"options": ["Dark mode", "Offline support", "AI search", "Better perf"],
"trigger_time_sec": 145, // show at 2:25 in the video
"display_duration_sec": 15, // auto-dismiss after 15s
"targeting": {
"geo": ["US", "CA", "GB"],
"demographics": { "age_min": 18, "age_max": 45 },
"sample_pct": 10 // only show to 10% of viewers (A/B test)
},
"close_after_hours": 168 // stop accepting votes after 7 days
}
→ 201 { "poll_id": "p_abc123", ... }
// Get poll analytics
GET /api/v1/polls/{poll_id}/analytics
→ 200 {
"impressions": 145230,
"votes": 43120,
"response_rate": 0.297,
"results": [
{ "option": "Dark mode", "votes": 18200, "pct": 42.2 },
{ "option": "AI search", "votes": 12500, "pct": 29.0 },
...
],
"demographics": { ... },
"ab_test": { "variant_a_response_rate": 0.31, "variant_b_response_rate": 0.26 }
}
Viewer-Facing APIs
// Fetch polls for a video (called when video starts playing)
GET /api/v1/videos/{video_id}/polls?viewer_id={uid}
→ 200 {
"polls": [
{
"poll_id": "p_abc123",
"trigger_time_sec": 145,
"question": "What feature should we build next?",
"options": [...],
"user_vote": null // or option_id if already voted
}
]
}
// Submit a vote
POST /api/v1/polls/{poll_id}/vote
Body: { "option_id": "opt_2", "viewer_id": "u_xyz" }
→ 200 { "results": { "opt_1": 42.2, "opt_2": 29.0, ... }, "total_votes": 43121 }
// Change vote (idempotent PUT)
PUT /api/v1/polls/{poll_id}/vote
Body: { "option_id": "opt_3", "viewer_id": "u_xyz" }
→ 200 { "results": { ... } }
Key Decisions
- Pre-fetch polls at video start: The player fetches all polls for the video when playback begins. This avoids a network request at the exact trigger timestamp (which could cause a visible delay).
- Vote response includes live results: After voting, the user immediately sees percentages. This is the “social proof” hook that drives engagement.
- Targeting evaluated client-side: The server sends all eligible polls plus targeting rules. The client-side SDK evaluates targeting (geo, demographics, A/B bucket) to avoid an extra server round-trip at trigger time.
4. Data Model (3 min)
Polls Table (PostgreSQL — strong consistency for definitions)
| Column | Type | Notes |
|---|---|---|
| poll_id | UUID (PK) | Globally unique |
| video_id | VARCHAR(11) | YouTube video ID, indexed |
| creator_id | BIGINT | FK to creator accounts |
| question | TEXT | Poll question text |
| type | ENUM | single_select, multi_select, rating |
| options | JSONB | Array of {id, text, display_order} |
| trigger_time_sec | INT | Seconds into the video |
| display_duration_sec | INT | How long to show the overlay |
| targeting | JSONB | Geo, demographics, sample_pct, etc. |
| status | ENUM | draft, active, paused, closed |
| close_at | TIMESTAMP | When to stop accepting votes |
| created_at | TIMESTAMP | |
Votes Table (Cassandra — high write throughput, partitioned by poll_id)
| Column | Type | Notes |
|---|---|---|
| poll_id | UUID (partition key) | |
| viewer_id | BIGINT (clustering key) | Ensures uniqueness: one vote per user per poll |
| option_id | UUID | Which option they chose |
| voted_at | TIMESTAMP | |
| metadata | MAP<TEXT,TEXT> | Device, geo, referrer, A/B variant |
Why Cassandra for votes?
- Writes dominate reads (every vote is a write; reads are aggregated separately)
- Partition by poll_id: all votes for a poll are co-located → efficient aggregation
- Built-in upsert semantics (INSERT with same PK = update) → natural dedup for “change vote”
- Scales horizontally to handle the ~2K peak votes/sec with ample headroom
Vote Aggregates (Redis — real-time counters)
| Key | Type | Example |
|---|---|---|
| poll:{poll_id}:counts | Hash | { "opt_1": 18200, "opt_2": 12500, "opt_3": 8400, "opt_4": 4020 } |
| poll:{poll_id}:total | Integer | 43120 |
| poll:{poll_id}:voted:{viewer_id} | String | "opt_2" (used for dedup check, TTL = poll close time) |
Analytics Store (ClickHouse — OLAP for dashboard queries)
- Materialized from the Kafka vote stream
- Columns: poll_id, video_id, creator_id, option_id, viewer_id, voted_at, geo, age_bucket, device, ab_variant
- Queries: response rate by demographic, time-series of votes, A/B test significance
5. High-Level Design (12 min)
Architecture Overview
┌──────────────────────────────────────────────────────────────────┐
│ YouTube Video Player │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ Video Stream │ │ Poll SDK │ │ Poll Overlay UI │ │
│ │ (HLS/DASH) │ │ (pre-fetched │ │ (renders at trigger │ │
│ │ │ │ poll data) │ │ timestamp, shows │ │
│ │ │ │ │ │ results after vote) │ │
│ └──────────────┘ └──────┬───────┘ └────────────────────────┘ │
│ │ │
└───────────────────────────┼───────────────────────────────────────┘
│ HTTPS
▼
┌─────────────┐
│ API GW / │
│ LB (L7) │
└──────┬──────┘
│
┌─────────────┼──────────────┐
▼ ▼ ▼
┌────────────┐ ┌──────────┐ ┌────────────┐
│ Poll Read │ │ Vote │ │ Creator │
│ Service │ │ Service │ │ Analytics │
│ │ │ │ │ Service │
└─────┬──────┘ └────┬─────┘ └─────┬──────┘
│ │ │
┌────┘ ┌────┼────┐ │
▼ ▼ ▼ ▼ ▼
┌─────────┐ ┌───────┐ │ ┌──────┐ ┌──────────┐
│PostgreSQL│ │ Redis │ │ │Kafka │ │ClickHouse│
│ (polls) │ │(counts│ │ │(vote │ │(analytics│
│ │ │+dedup)│ │ │stream│ │ OLAP) │
└─────────┘ └───────┘ │ └──┬───┘ └──────────┘
▼ │
┌──────────┤
│Cassandra ││
│ (votes) ││
└──────────┘│
▲ │
│ ▼
┌──────────────┐
│ Vote Consumer│
│ (aggregation │
│ + analytics │
│ pipeline) │
└──────────────┘
Component Breakdown
1. Poll SDK (Client-Side)
- Embedded in the YouTube player (web, iOS, Android)
- On video load: fetches all polls for the video via Poll Read Service
- Evaluates targeting rules locally (geo from IP, demographics from user profile, A/B bucket from hash(user_id + poll_id))
- At trigger timestamp: renders non-intrusive overlay (bottom-third of screen, semi-transparent)
- Handles vote submission, optimistic UI update (show results immediately), and retry on failure
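The deterministic A/B bucketing the SDK relies on can be sketched as follows (a minimal illustration; the actual hash function is not specified in this design, so SHA-256 is an assumption):

```python
import hashlib

def ab_bucket(user_id: str, poll_id: str) -> int:
    """Deterministic bucket in [0, 100): the same user + poll pair
    always lands in the same bucket, on any device."""
    digest = hashlib.sha256(f"{user_id}:{poll_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

def should_show(user_id: str, poll_id: str, sample_pct: int) -> bool:
    """Apply the poll's sample_pct targeting rule client-side."""
    return ab_bucket(user_id, poll_id) < sample_pct
```

Because the bucket is derived from stable identifiers rather than random state, a viewer who qualifies for a sampled poll on their phone also qualifies on their laptop.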
2. Poll Read Service
- Serves GET /videos/{video_id}/polls — fetches poll definitions from PostgreSQL (cached in Redis/CDN with a 60s TTL)
- Checks if the viewer has already voted (Redis lookup) and includes their previous vote in the response
- Stateless, horizontally scalable
3. Vote Service
- Handles vote submission: dedup check in Redis, write to Kafka, update Redis counters atomically
- Flow: check poll:{poll_id}:voted:{viewer_id} → if it exists, this is a vote change (decrement the old option, increment the new one) → HINCRBY on the counts hash → publish to Kafka → ACK to client
- Returns updated aggregate counts in the response
4. Kafka Vote Stream
- Durable log of all vote events
- Consumed by: (a) Cassandra writer for persistent vote storage, (b) ClickHouse sink for analytics, (c) real-time aggregation for any downstream systems
- Partitioned by poll_id for ordering guarantees within a poll
5. Creator Analytics Service
- Serves the creator dashboard with poll performance data
- Queries ClickHouse for complex analytics (response rate by geo, demographic breakdown, A/B test results)
- Pre-computes hourly/daily rollups for fast dashboard loads
Request Flow: Viewer Votes on a Poll
Player SDK → API Gateway → Vote Service
Vote Service:
1. Validate: poll exists, not closed, option_id valid
2. Redis: GET poll:{p_abc}:voted:{u_xyz}
→ null (first vote) or "opt_1" (changing vote)
3. Redis Pipeline (atomic):
- SET poll:{p_abc}:voted:{u_xyz} = "opt_2" EX {ttl}
- HINCRBY poll:{p_abc}:counts opt_2 1
- (if changing) HINCRBY poll:{p_abc}:counts opt_1 -1
4. Kafka: produce VoteEvent{poll_id, viewer_id, option_id, timestamp, metadata}
5. Redis: HGETALL poll:{p_abc}:counts → {opt_1: 18200, opt_2: 12501, ...}
6. Return 200 with live results to client
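The flow above can be sketched with an in-memory dict standing in for Redis, purely for illustration (the real service would run the check-and-set steps as a Lua script or MULTI block so they stay atomic under concurrency):

```python
from collections import defaultdict

# In-memory stand-ins for the Redis structures described above.
voted = {}                                       # (poll_id, viewer_id) -> option_id
counts = defaultdict(lambda: defaultdict(int))   # poll_id -> {option_id: votes}

def submit_vote(poll_id: str, viewer_id: str, option_id: str) -> dict:
    """Steps 2-3 and 5 of the request flow: dedup check, counter
    update (including vote changes), and live results."""
    previous = voted.get((poll_id, viewer_id))   # step 2: dedup lookup
    if previous == option_id:
        return dict(counts[poll_id])             # idempotent retry: no-op
    if previous is not None:
        counts[poll_id][previous] -= 1           # vote change: decrement old option
    voted[(poll_id, viewer_id)] = option_id      # step 3: record the vote
    counts[poll_id][option_id] += 1              # step 3: increment new option
    # step 4 (omitted here): produce VoteEvent to Kafka for durable storage
    return dict(counts[poll_id])                 # step 5: live results

submit_vote("p_abc", "u_xyz", "opt_2")           # first vote
submit_vote("p_abc", "u_xyz", "opt_2")           # retry: counts unchanged
results = submit_vote("p_abc", "u_xyz", "opt_1") # vote change: opt_2 -1, opt_1 +1
```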
6. Deep Dives (15 min)
Deep Dive 1: Preventing Duplicate Votes & Vote Integrity
The Problem: A user could submit the same vote multiple times due to network retries, client bugs, or intentional abuse. We must ensure exactly-once semantics for votes.
Layer 1: Client-Side Dedup
- After submitting a vote, the SDK stores poll_id → option_id in local storage
- On subsequent page loads or video replays, the SDK checks local storage before rendering the poll
- If already voted, it shows results instead of the voting UI
- This prevents accidental double-votes but is easily bypassed (clear storage, different device)
Layer 2: Redis Dedup (Real-Time)
- Key: poll:{poll_id}:voted:{viewer_id} with a TTL matching the poll close time
- Before processing a vote, Vote Service checks this key
- If the key exists with the same option → idempotent, return current results
- If the key exists with a different option → treat as a vote change (atomic swap)
- If the key doesn’t exist → new vote, proceed
Layer 3: Cassandra Upsert (Durable)
- Cassandra partition key: (poll_id), clustering key: (viewer_id)
- INSERT/UPDATE with same (poll_id, viewer_id) is an upsert — no duplicates at the storage level
- This is the source of truth for vote integrity
Handling Redis Failure:
- If Redis is down, fall back to Cassandra for dedup (slower but correct)
- Read from Cassandra: SELECT option_id FROM votes WHERE poll_id = ? AND viewer_id = ?
- This adds ~5ms of latency but maintains correctness
Preventing Bot/Abuse Votes:
- Require authenticated users only (no anonymous voting)
- Rate limit: max 10 vote submissions per user per minute across all polls
- Behavioral signals: if a user votes on 100 polls in 1 minute, flag as bot
- For high-stakes polls (advertiser surveys), require CAPTCHA verification on suspicious accounts
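The per-user rate limit ("max 10 vote submissions per minute") can be enforced with a sliding-window check. A sketch with an in-memory store (production would use a Redis sorted set or token bucket per user):

```python
from collections import defaultdict, deque

WINDOW_SEC = 60
MAX_VOTES = 10

_recent = defaultdict(deque)   # viewer_id -> timestamps of recent submissions

def allow_vote(viewer_id: str, now: float) -> bool:
    """Sliding-window limiter: at most MAX_VOTES submissions per
    viewer within any WINDOW_SEC interval. Call with now=time.time()."""
    window = _recent[viewer_id]
    while window and now - window[0] >= WINDOW_SEC:
        window.popleft()               # evict timestamps outside the window
    if len(window) >= MAX_VOTES:
        return False                   # over the limit: reject this submission
    window.append(now)
    return True
```

Votes 1-10 within a minute pass; the 11th is rejected until the oldest timestamp ages out of the window.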
Deep Dive 2: Real-Time Vote Aggregation & Response Rate Optimization
Real-Time Aggregation Architecture:
The challenge is providing live vote counts to millions of concurrent viewers while maintaining accuracy.
Vote arrives → Redis HINCRBY (atomic counter increment)
↓
Redis Hash: poll:{poll_id}:counts
{ "opt_1": 18200, "opt_2": 12501, "opt_3": 8400, "opt_4": 4020 }
↓
On each vote response, return HGETALL → client shows live %
Why Redis counters work at this scale:
- HINCRBY is O(1), atomic, and ~0.1ms on a single Redis instance
- A single Redis shard handles 100K+ HINCRBY/sec easily
- Even the most viral poll won’t exceed 10K votes/sec (human click speed is the bottleneck)
- Hot poll sharding: if a single poll exceeds Redis throughput, shard counters across N Redis instances and sum on read
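Hot-poll counter sharding, sketched in miniature with dicts standing in for the Redis instances (the shard chosen per write can be arbitrary, since increments commute and the read sums them):

```python
import random
from collections import defaultdict

NUM_SHARDS = 4   # assumed number of Redis instances holding sub-counters

# shard index -> {(poll_id, option_id): count}; each dict stands in
# for the counts hash on one Redis instance
shards = [defaultdict(int) for _ in range(NUM_SHARDS)]

def incr(poll_id: str, option_id: str) -> None:
    """Spread writes across shards so no single instance goes hot."""
    shard = random.randrange(NUM_SHARDS)
    shards[shard][(poll_id, option_id)] += 1

def read_count(poll_id: str, option_id: str) -> int:
    """Sum the sub-counters on read; the total is exact once all
    writes have landed, because increments commute."""
    return sum(s[(poll_id, option_id)] for s in shards)

for _ in range(1000):
    incr("p_hot", "opt_1")
print(read_count("p_hot", "opt_1"))   # 1000
```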
Response Rate Optimization:
Response rate is the key metric. Higher response rate = more valuable data. Techniques:
- Timing optimization: Don’t show the poll in the first 10 seconds (viewer hasn’t engaged yet) or last 10 seconds (about to leave). Sweet spot: 30-60% through the video.
- Visual treatment:
  - Semi-transparent overlay on the bottom third — doesn’t block content
  - Subtle entrance animation (slide up) to catch attention without being jarring
  - Auto-dismiss after 15 seconds if no interaction (don’t annoy the viewer)
  - Show a small “1 question” teaser 3 seconds before the full poll appears
- Social proof: Show “X people have voted” before the user votes. This creates a bandwagon effect.
- Post-vote reward: After voting, show the results with a satisfying animation. This trains users that voting has an immediate payoff.
- A/B testing framework:
  - Each poll can define sample_pct and targeting variants
  - Hash(user_id + poll_id) % 100 determines the A/B bucket (deterministic, consistent across devices)
  - Test variables: trigger time, display duration, visual treatment, question wording
  - Track response rate per variant, compute statistical significance (chi-squared test), auto-promote the winning variant
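The chi-squared significance check for two variants fits in a few lines. A sketch using a 2x2 contingency table (voted / didn't vote per variant) and the 3.841 critical value for p < 0.05 at one degree of freedom; the 2,000-impression sample size is an assumption for illustration:

```python
def chi_squared_2x2(votes_a: int, impressions_a: int,
                    votes_b: int, impressions_b: int) -> float:
    """Pearson chi-squared statistic for the 2x2 table
    (variant A, variant B) x (voted, did not vote)."""
    table = [
        [votes_a, impressions_a - votes_a],
        [votes_b, impressions_b - votes_b],
    ]
    total = impressions_a + impressions_b
    chi2 = 0.0
    for row in table:
        row_sum = sum(row)
        for j in range(2):
            col_sum = table[0][j] + table[1][j]
            expected = row_sum * col_sum / total
            chi2 += (row[j] - expected) ** 2 / expected
    return chi2

# Response rates from the analytics example: A = 31%, B = 26%,
# with an assumed 2,000 impressions per variant
stat = chi_squared_2x2(620, 2000, 520, 2000)
significant = stat > 3.841   # chi-squared critical value, p < 0.05, df = 1
```

With these numbers the statistic is well above the critical value, so the 31% vs 26% gap would count as significant and variant A could be auto-promoted.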
Engagement Metrics Pipeline:
Poll rendered (impression event) → Kafka
User interacts (vote event) → Kafka
User dismisses (dismiss event) → Kafka
Video continues past poll → Kafka
All events → ClickHouse → Materialized views:
- response_rate = votes / impressions
- dismiss_rate = dismissals / impressions
- completion_impact = avg(watch_time_with_poll) vs avg(watch_time_without_poll)
Deep Dive 3: Integration with Ads System & Advertiser Surveys
The Problem: YouTube’s ad system is a multi-billion dollar revenue engine. Advertiser surveys (Brand Lift studies) must integrate seamlessly without disrupting ad delivery, frequency capping, or revenue optimization.
Advertiser Survey Flow:
- Advertiser creates a Brand Lift campaign: “Did you see an ad for Product X in the last 7 days?”
- System identifies two groups: exposed (saw the ad) and control (didn’t see the ad)
- Both groups see the same survey → difference in responses = brand lift
Integration Architecture:
Ad Server → "User U saw Ad A at time T"
↓
Exposure Log (BigQuery)
↓
Survey Targeting Service:
- For Brand Lift: select exposed + control users
- For creator polls: use creator-defined targeting
- For YouTube research: random sampling
↓
Poll Read Service → includes survey in video polls
Key Integration Constraints:
- Frequency capping: A user should see at most one survey per session, and no more than one every 7 days. This is enforced by the Poll Read Service checking last_survey_shown:{viewer_id} in Redis.
- Ad pod conflict: Don’t show a survey during an ad break. The player SDK coordinates with the ad SDK to avoid overlapping UI elements.
- Revenue priority: If showing a survey would displace a paid ad, the ad wins. Surveys are lower priority in the ad auction.
- Control group integrity: Control group users must never see the ad. The ad server and survey system share an exclusion list.
- Statistical rigor: Brand Lift surveys require minimum sample sizes (typically 2,000 exposed + 2,000 control) for statistical significance. The system tracks sample sizes and stops collecting once significance is achieved (sequential testing).
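The lift computation itself is a difference of response rates between the exposed and control groups. A minimal sketch (the respondent counts are made up for illustration; only the 2,000-per-group minimum comes from the text above):

```python
def brand_lift(exposed_yes: int, exposed_n: int,
               control_yes: int, control_n: int) -> float:
    """Absolute brand lift: exposed 'yes' rate minus control 'yes' rate."""
    return exposed_yes / exposed_n - control_yes / control_n

# e.g. 2,000 respondents per group (the minimum sample size above)
lift = brand_lift(640, 2000, 500, 2000)
print(f"{lift:.1%}")   # 7.0% absolute lift in ad recall
```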
7. Extensions (2 min)
- Multi-language support: Auto-translate poll questions based on viewer locale using a translation service, with creator approval for machine translations before they go live. Store translations as a JSONB map in the polls table.
- Polls in live streams: For live/premiere content, enable real-time polls that creators can trigger from their dashboard. Uses WebSocket push instead of pre-fetch. Results update in real time for all viewers simultaneously (a shared social experience).
- Gamification & rewards: Award viewers points/badges for participating in polls. Track streaks (voted in 5 polls this week). Drives habitual engagement with surveys and increases long-term response rates.
- Content-aware poll suggestions: Use ML to analyze the video content (transcript, visual segments) and suggest relevant poll questions to the creator. “Your video mentions 3 products — want to ask viewers which they prefer?”
- Cross-video poll campaigns: Allow creators to run a poll campaign across multiple videos (same question, aggregated results). Useful for ongoing audience feedback like “What series should I start next?” with responses collected over a month of uploads.