1. Requirements & Scope (5 min)

Functional Requirements

  1. Create polls — Creators/advertisers can define survey questions (multiple choice, single select, rating scale) and attach them to specific timestamps in a video
  2. Render polls mid-roll — Display a non-intrusive overlay poll at the configured timestamp during video playback, without pausing or blocking the video
  3. Collect votes — Record user responses in real time, enforce one vote per user per poll, and allow changing a vote before the poll closes
  4. Real-time results — Show live vote counts / percentages to the user after they vote (instant feedback)
  5. Analytics dashboard — Provide creators with detailed poll analytics: response rate, demographic breakdown, completion funnel, and A/B test results for different poll placements

Non-Functional Requirements

  • Availability: 99.95% — a poll failing to render is a missed data collection opportunity, but not as critical as video playback itself
  • Latency: Poll UI must render within 200ms of the trigger timestamp. Vote submission must ACK within 100ms (perceived instant).
  • Consistency: Votes must be counted exactly once. Read-after-write consistency for a user seeing their own vote. Aggregate counts can be eventually consistent (1-2 second delay is fine).
  • Scale: YouTube has ~800M daily active viewers and ~500M hours of video watched/day. With 5% of videos carrying polls, a 60% impression rate, and 30% of impressions converting to votes → ~192M poll impressions/day → ~6.7K poll renders/sec and ~2K votes/sec at peak (derived in Estimation below)
  • Durability: Every vote must be durably stored. Zero data loss on votes.

2. Estimation (3 min)

Traffic

  • Daily active viewers: 800M
  • Videos watched per viewer: ~8/day → 6.4B video views/day
  • Videos with polls: 5% → 320M poll-eligible views/day
  • Poll impression rate (viewer sees the poll): 60% → 192M poll impressions/day
  • Vote rate (viewer actually votes): 30% of impressions → 57.6M votes/day
  • Vote QPS: 57.6M / 86,400 ≈ 670 votes/sec (average); × 3 (peak multiplier) ≈ 2,000 votes/sec (peak)
  • Peak poll render QPS: 192M / 86400 × 3 ≈ 6,700 renders/sec (peak)

Storage

  • Poll definitions: 10M active polls × 2 KB (question, options, targeting rules, schedule) = 20 GB — easily fits in a relational DB
  • Votes: 57.6M votes/day × 365 days × 3 years retention = 63B votes
    • Each vote: poll_id (8B) + user_id (8B) + option_id (4B) + timestamp (8B) + metadata (32B) ≈ 60 bytes
    • Total: 63B votes × 60 bytes ≈ 3.8 TB — manageable with partitioned storage
  • Aggregated counts: Per-poll, per-option counters. 10M polls × 5 options × 16B = 800 MB — trivially small, lives in Redis

Bandwidth

  • Poll render payload: ~5 KB (question text, options, styling, targeting metadata)
  • 6,700 renders/sec × 5 KB = 33.5 MB/s — negligible compared to video streaming bandwidth
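These estimates can be sanity-checked with a few lines of arithmetic (the inputs are exactly the assumptions listed in this section):

```python
# Back-of-envelope sanity check for the traffic, storage, and bandwidth
# estimates above. All inputs are the assumptions stated in this section.

DAU = 800e6                      # daily active viewers
VIEWS_PER_VIEWER = 8
POLL_VIDEO_PCT = 0.05            # 5% of videos carry a poll
IMPRESSION_RATE = 0.60           # viewer actually sees the poll
VOTE_RATE = 0.30                 # viewer votes, given an impression
PEAK_MULTIPLIER = 3
SECONDS_PER_DAY = 86_400

views_per_day = DAU * VIEWS_PER_VIEWER                 # 6.4B views/day
poll_eligible = views_per_day * POLL_VIDEO_PCT         # 320M poll-eligible views
impressions = poll_eligible * IMPRESSION_RATE          # 192M impressions/day
votes = impressions * VOTE_RATE                        # 57.6M votes/day

avg_vote_qps = votes / SECONDS_PER_DAY                 # ~670/sec average
peak_vote_qps = avg_vote_qps * PEAK_MULTIPLIER         # ~2,000/sec peak
peak_render_qps = impressions / SECONDS_PER_DAY * PEAK_MULTIPLIER  # ~6,700/sec

# Storage: 3 years of votes at ~60 bytes each
total_votes = votes * 365 * 3                          # ~63B rows
vote_storage_tb = total_votes * 60 / 1e12              # ~3.8 TB

# Bandwidth: 5 KB render payload at peak render QPS
peak_bandwidth_mb_s = peak_render_qps * 5e3 / 1e6      # ~33 MB/s
```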

3. API Design (3 min)

Creator-Facing APIs

// Create a poll attached to a video
POST /api/v1/videos/{video_id}/polls
  Body: {
    "question": "What feature should we build next?",
    "type": "single_select",           // single_select | multi_select | rating
    "options": ["Dark mode", "Offline support", "AI search", "Better perf"],
    "trigger_time_sec": 145,           // show at 2:25 in the video
    "display_duration_sec": 15,        // auto-dismiss after 15s
    "targeting": {
      "geo": ["US", "CA", "GB"],
      "demographics": { "age_min": 18, "age_max": 45 },
      "sample_pct": 10                 // only show to 10% of viewers (A/B test)
    },
    "close_after_hours": 168           // stop accepting votes after 7 days
  }
  → 201 { "poll_id": "p_abc123", ... }

// Get poll analytics
GET /api/v1/polls/{poll_id}/analytics
  → 200 {
    "impressions": 145230,
    "votes": 43120,
    "response_rate": 0.297,
    "results": [
      { "option": "Dark mode", "votes": 18200, "pct": 42.2 },
      { "option": "AI search", "votes": 12500, "pct": 29.0 },
      ...
    ],
    "demographics": { ... },
    "ab_test": { "variant_a_response_rate": 0.31, "variant_b_response_rate": 0.26 }
  }

Viewer-Facing APIs

// Fetch polls for a video (called when video starts playing)
GET /api/v1/videos/{video_id}/polls?viewer_id={uid}
  → 200 {
    "polls": [
      {
        "poll_id": "p_abc123",
        "trigger_time_sec": 145,
        "question": "What feature should we build next?",
        "options": [...],
        "user_vote": null              // or option_id if already voted
      }
    ]
  }

// Submit a vote
POST /api/v1/polls/{poll_id}/vote
  Body: { "option_id": "opt_2", "viewer_id": "u_xyz" }
  → 200 { "results": { "opt_1": 42.2, "opt_2": 29.0, ... }, "total_votes": 43121 }

// Change vote (idempotent PUT)
PUT /api/v1/polls/{poll_id}/vote
  Body: { "option_id": "opt_3", "viewer_id": "u_xyz" }
  → 200 { "results": { ... } }

Key Decisions

  • Pre-fetch polls at video start: The player fetches all polls for the video when playback begins. This avoids a network request at the exact trigger timestamp (which could cause a visible delay).
  • Vote response includes live results: After voting, the user immediately sees percentages. This is the “social proof” hook that drives engagement.
  • Targeting evaluated client-side: The server sends all eligible polls plus targeting rules. The client-side SDK evaluates targeting (geo, demographics, A/B bucket) to avoid an extra server round-trip at trigger time.
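The deterministic A/B bucketing that client-side targeting relies on can be sketched as follows (function names are illustrative, not the actual SDK API; the key property is that a stable hash of user_id + poll_id keeps buckets consistent across devices):

```python
import hashlib

def ab_bucket(user_id: str, poll_id: str) -> int:
    """Deterministic bucket in [0, 100): the same user + poll pair
    always lands in the same bucket, on any device."""
    digest = hashlib.sha256(f"{user_id}:{poll_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

def is_sampled(user_id: str, poll_id: str, sample_pct: int) -> bool:
    """Client-side sample gate: show the poll only to sample_pct percent
    of viewers, per the targeting rules sent with the poll definition."""
    return ab_bucket(user_id, poll_id) < sample_pct

# Stable across calls (and therefore across devices):
assert ab_bucket("u_xyz", "p_abc123") == ab_bucket("u_xyz", "p_abc123")
```

Using a cryptographic hash (rather than Python's built-in `hash`) matters here: the bucket must be reproducible across processes, platforms, and app versions.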

4. Data Model (3 min)

Polls Table (PostgreSQL — strong consistency for definitions)

Column                Type         Notes
poll_id               UUID         Primary key, globally unique
video_id              VARCHAR(11)  YouTube video ID, indexed
creator_id            BIGINT       FK to creator accounts
question              TEXT         Poll question text
type                  ENUM         single_select, multi_select, rating
options               JSONB        Array of {id, text, display_order}
trigger_time_sec      INT          Seconds into the video
display_duration_sec  INT          How long to show the overlay
targeting             JSONB        Geo, demographics, sample_pct, etc.
status                ENUM         draft, active, paused, closed
close_at              TIMESTAMP    When to stop accepting votes
created_at            TIMESTAMP

Votes Table (Cassandra — high write throughput, partitioned by poll_id)

Column     Type            Notes
poll_id    UUID            Partition key
viewer_id  BIGINT          Clustering key; ensures one vote per user per poll
option_id  UUID            Which option they chose
voted_at   TIMESTAMP
metadata   MAP<TEXT,TEXT>  Device, geo, referrer, A/B variant

Why Cassandra for votes?

  • Writes dominate reads (every vote is a write; reads are aggregated separately)
  • Partition by poll_id: all votes for a poll are co-located → efficient aggregation
  • Built-in upsert semantics (INSERT with same PK = update) → natural dedup for “change vote”
  • Scales horizontally to absorb the ~2K votes/sec peak with plenty of headroom

Vote Aggregates (Redis — real-time counters)

Key                               Type     Example
poll:{poll_id}:counts             Hash     { "opt_1": 18200, "opt_2": 12500, "opt_3": 8400, "opt_4": 4020 }
poll:{poll_id}:total              Integer  43120
poll:{poll_id}:voted:{viewer_id}  String   "opt_2" (used for dedup check; TTL = poll close time)

Analytics Store (ClickHouse — OLAP for dashboard queries)

  • Materialized from the Kafka vote stream
  • Columns: poll_id, video_id, creator_id, option_id, viewer_id, voted_at, geo, age_bucket, device, ab_variant
  • Queries: response rate by demographic, time-series of votes, A/B test significance

5. High-Level Design (12 min)

Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│                     YouTube Video Player                          │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │ Video Stream │  │ Poll SDK     │  │ Poll Overlay UI        │  │
│  │ (HLS/DASH)   │  │ (pre-fetched │  │ (renders at trigger    │  │
│  │              │  │  poll data)  │  │  timestamp, shows      │  │
│  │              │  │              │  │  results after vote)   │  │
│  └──────────────┘  └──────┬───────┘  └────────────────────────┘  │
│                           │                                       │
└───────────────────────────┼───────────────────────────────────────┘
                            │ HTTPS
                            ▼
                     ┌─────────────┐
                     │   API GW /   │
                     │   LB (L7)   │
                     └──────┬──────┘
                            │
              ┌─────────────┼──────────────┐
              ▼             ▼              ▼
       ┌────────────┐ ┌──────────┐  ┌────────────┐
       │ Poll Read  │ │ Vote     │  │ Creator    │
       │ Service    │ │ Service  │  │ Analytics  │
       │            │ │          │  │ Service    │
       └─────┬──────┘ └────┬─────┘  └─────┬──────┘
             │              │              │
         ┌────┘        ┌─────┴────┐         └┐
         ▼             ▼          ▼          ▼
    ┌──────────┐  ┌─────────┐ ┌───────┐ ┌──────────┐
    │PostgreSQL│  │  Redis  │ │ Kafka │ │ClickHouse│
    │ (polls)  │  │ (counts │ │ (vote │ │(analytics│
    │          │  │ +dedup) │ │stream)│ │  OLAP)   │
    └──────────┘  └─────────┘ └───┬───┘ └──────────┘
                                  │ consumed by
                                  ▼
                           ┌──────────────┐
                           │ Vote Consumer│
                           │ (aggregation │
                           │  + analytics │
                           │  pipeline)   │
                           └──────┬───────┘
                                  │ writes votes
                                  ▼
                           ┌──────────────┐
                           │  Cassandra   │
                           │   (votes)    │
                           └──────────────┘

Component Breakdown

1. Poll SDK (Client-Side)

  • Embedded in the YouTube player (web, iOS, Android)
  • On video load: fetches all polls for the video via Poll Read Service
  • Evaluates targeting rules locally (geo from IP, demographics from user profile, A/B bucket from hash(user_id + poll_id))
  • At trigger timestamp: renders non-intrusive overlay (bottom-third of screen, semi-transparent)
  • Handles vote submission, optimistic UI update (show results immediately), and retry on failure
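The trigger behavior above can be modelled as a small scheduler over the pre-fetched poll list (a simplified sketch; the real SDK hooks into the player's time-update events, and the class and field names here are invented):

```python
from dataclasses import dataclass, field

@dataclass
class Poll:
    poll_id: str
    trigger_time_sec: int
    display_duration_sec: int

@dataclass
class PollScheduler:
    """Decides which pre-fetched poll (if any) to overlay at the current
    playback position. Simplified: one poll at a time, shown at most once
    per session so a seek-back does not re-trigger it."""
    polls: list
    shown: set = field(default_factory=set)

    def poll_to_render(self, playback_sec: float):
        for p in self.polls:
            if p.poll_id in self.shown:
                continue  # already rendered once this session
            start, end = p.trigger_time_sec, p.trigger_time_sec + p.display_duration_sec
            if start <= playback_sec < end:
                self.shown.add(p.poll_id)
                return p
        return None

sched = PollScheduler(polls=[Poll("p_abc123", 145, 15)])
assert sched.poll_to_render(100.0) is None               # before trigger
assert sched.poll_to_render(146.2).poll_id == "p_abc123" # inside display window
assert sched.poll_to_render(150.0) is None               # not re-shown
```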

2. Poll Read Service

  • Serves GET /videos/{video_id}/polls — fetches poll definitions from PostgreSQL (cached in Redis/CDN with 60s TTL)
  • Checks if the viewer has already voted (Redis lookup) and includes their previous vote in the response
  • Stateless, horizontally scalable

3. Vote Service

  • Handles vote submission: dedup check in Redis, write to Kafka, update Redis counters atomically
  • Flow: Check poll:{poll_id}:voted:{viewer_id} → if exists, it’s a vote change (decrement old option, increment new) → HINCRBY on counts hash → publish to Kafka → ACK to client
  • Returns updated aggregate counts in the response

4. Kafka Vote Stream

  • Durable log of all vote events
  • Consumed by: (a) Cassandra writer for persistent vote storage, (b) ClickHouse sink for analytics, (c) real-time aggregation for any downstream systems
  • Partitioned by poll_id for ordering guarantees within a poll

5. Creator Analytics Service

  • Serves the creator dashboard with poll performance data
  • Queries ClickHouse for complex analytics (response rate by geo, demographic breakdown, A/B test results)
  • Pre-computes hourly/daily rollups for fast dashboard loads

Request Flow: Viewer Votes on a Poll

Player SDK → API Gateway → Vote Service
Vote Service:
  1. Validate: poll exists, not closed, option_id valid
  2. Redis: GET poll:{p_abc}:voted:{u_xyz}
     → null (first vote) or "opt_1" (changing vote)
  3. Redis Pipeline (atomic):
     - SET poll:{p_abc}:voted:{u_xyz} = "opt_2" EX {ttl}
     - HINCRBY poll:{p_abc}:counts opt_2 1
     - (if changing) HINCRBY poll:{p_abc}:counts opt_1 -1
  4. Kafka: produce VoteEvent{poll_id, viewer_id, option_id, timestamp, metadata}
  5. Redis: HGETALL poll:{p_abc}:counts → {opt_1: 18200, opt_2: 12501, ...}
  6. Return 200 with live results to client
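The six steps can be modelled end to end with in-memory stand-ins for Redis and Kafka (an illustrative sketch of the dedup / swap / increment logic, not production code):

```python
from collections import defaultdict

class VoteService:
    """In-memory model of steps 1-6: validate, dedup lookup, atomic
    counter swap on vote change, event publish, live results returned."""

    def __init__(self, valid_options):
        self.valid_options = valid_options   # stands in for PostgreSQL poll defs
        self.voted = {}                      # poll:{id}:voted:{viewer} keys
        self.counts = defaultdict(lambda: defaultdict(int))  # counts hashes
        self.kafka_log = []                  # durable vote stream stand-in

    def vote(self, poll_id, viewer_id, option_id):
        if option_id not in self.valid_options[poll_id]:
            raise ValueError("invalid option")       # step 1: validate
        prev = self.voted.get((poll_id, viewer_id))  # step 2: dedup lookup
        if prev == option_id:
            return dict(self.counts[poll_id])        # idempotent retry: no-op
        # step 3: "pipeline" - swap the voted marker and adjust counters together
        self.voted[(poll_id, viewer_id)] = option_id
        self.counts[poll_id][option_id] += 1
        if prev is not None:
            self.counts[poll_id][prev] -= 1          # vote change: decrement old
        # step 4: publish for the Cassandra / ClickHouse consumers
        self.kafka_log.append((poll_id, viewer_id, option_id))
        return dict(self.counts[poll_id])            # steps 5-6: live results

svc = VoteService({"p_abc": {"opt_1", "opt_2"}})
assert svc.vote("p_abc", "u_xyz", "opt_1") == {"opt_1": 1}
assert svc.vote("p_abc", "u_xyz", "opt_1") == {"opt_1": 1}              # retry
assert svc.vote("p_abc", "u_xyz", "opt_2") == {"opt_1": 0, "opt_2": 1}  # change
```

In the real service the marker SET and the two HINCRBYs go through one Redis pipeline so a crash cannot leave the counters out of step with the dedup key.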

6. Deep Dives (15 min)

Deep Dive 1: Preventing Duplicate Votes & Vote Integrity

The Problem: A user could submit the same vote multiple times due to network retries, client bugs, or intentional abuse. We must ensure exactly-once semantics for votes.

Layer 1: Client-Side Dedup

  • After submitting a vote, the SDK stores poll_id → option_id in local storage
  • On subsequent page loads or video replays, the SDK checks local storage before rendering the poll
  • If already voted, it shows results instead of the voting UI
  • This prevents accidental double-votes but is easily bypassed (clear storage, different device)

Layer 2: Redis Dedup (Real-Time)

  • Key: poll:{poll_id}:voted:{viewer_id} with TTL matching poll close time
  • Before processing a vote, Vote Service checks this key
  • If the key exists with the same option → idempotent, return current results
  • If the key exists with a different option → treat as a vote change (atomic swap)
  • If the key doesn’t exist → new vote, proceed

Layer 3: Cassandra Upsert (Durable)

  • Cassandra partition key: (poll_id), clustering key: (viewer_id)
  • INSERT/UPDATE with same (poll_id, viewer_id) is an upsert — no duplicates at the storage level
  • This is the source of truth for vote integrity

Handling Redis Failure:

  • If Redis is down, fall back to Cassandra for dedup (slower but correct)
  • Read from Cassandra: SELECT option_id FROM votes WHERE poll_id = ? AND viewer_id = ?
  • This adds ~5ms latency but maintains correctness
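The fallback can be sketched as a thin wrapper: try the Redis key first, and on failure read the authoritative row from Cassandra (both stores simulated as dicts here; names are illustrative):

```python
class RedisDown(Exception):
    """Simulated Redis outage."""

def lookup_previous_vote(poll_id, viewer_id, redis_store, cassandra_store,
                         redis_healthy=True):
    """Layer 2 first (fast path); Layer 3 on Redis failure (slower but
    correct, since Cassandra is the source of truth for vote integrity)."""
    try:
        if not redis_healthy:
            raise RedisDown()
        return redis_store.get((poll_id, viewer_id))
    except RedisDown:
        # ~5ms penalty: SELECT option_id FROM votes WHERE poll_id=? AND viewer_id=?
        return cassandra_store.get((poll_id, viewer_id))

redis = {("p1", "u1"): "opt_2"}
cassandra = {("p1", "u1"): "opt_2", ("p1", "u2"): "opt_1"}
assert lookup_previous_vote("p1", "u1", redis, cassandra) == "opt_2"
assert lookup_previous_vote("p1", "u2", redis, cassandra,
                            redis_healthy=False) == "opt_1"
```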

Preventing Bot/Abuse Votes:

  • Require authenticated users only (no anonymous voting)
  • Rate limit: max 10 vote submissions per user per minute across all polls
  • Behavioral signals: if a user votes on 100 polls in 1 minute, flag as bot
  • For high-stakes polls (advertiser surveys), require CAPTCHA verification on suspicious accounts
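The per-user rate limit ("max 10 vote submissions per minute") can be sketched as a sliding window over submission timestamps (illustrative; the production limiter would live in Redis so the window is shared across Vote Service instances):

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` vote submissions per user in any rolling
    `window_sec` window, across all polls."""

    def __init__(self, limit=10, window_sec=60):
        self.limit = limit
        self.window_sec = window_sec
        self.events = defaultdict(deque)   # viewer_id -> submission timestamps

    def allow(self, viewer_id, now):
        q = self.events[viewer_id]
        while q and q[0] <= now - self.window_sec:
            q.popleft()                    # drop events outside the window
        if len(q) >= self.limit:
            return False                   # over limit: reject and flag
        q.append(now)
        return True

rl = SlidingWindowLimiter(limit=10, window_sec=60)
assert all(rl.allow("u_xyz", t) for t in range(10))   # first 10 pass
assert not rl.allow("u_xyz", 10)                      # 11th in the minute blocked
assert rl.allow("u_xyz", 75)                          # window slides, allowed again
```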

Deep Dive 2: Real-Time Vote Aggregation & Response Rate Optimization

Real-Time Aggregation Architecture:

The challenge is providing live vote counts to millions of concurrent viewers while maintaining accuracy.

Vote arrives → Redis HINCRBY (atomic counter increment)
                    ↓
              Redis Hash: poll:{poll_id}:counts
              { "opt_1": 18200, "opt_2": 12501, "opt_3": 8400, "opt_4": 4020 }
                    ↓
              On each vote response, return HGETALL → client shows live %

Why Redis counters work at this scale:

  • HINCRBY is O(1), atomic, and ~0.1ms on a single Redis instance
  • A single Redis shard handles 100K+ HINCRBY/sec easily
  • Even the most viral poll won’t exceed 10K votes/sec (human click speed is the bottleneck)
  • Hot poll sharding: if a single poll exceeds Redis throughput, shard counters across N Redis instances and sum on read
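The hot-poll sharding idea in the last bullet looks like this in miniature (an in-memory model; in production each shard would be a counter key on a different Redis instance):

```python
import random

class ShardedCounter:
    """Spread increments for one hot poll option across N shards so no
    single Redis instance takes all the writes; reads sum the shards."""

    def __init__(self, num_shards=8):
        self.shards = [0] * num_shards

    def incr(self, amount=1):
        # pick a random shard; hashing viewer_id to a shard works too
        self.shards[random.randrange(len(self.shards))] += amount

    def value(self):
        # read path: one MGET across shard keys, then sum
        return sum(self.shards)

c = ShardedCounter(num_shards=8)
for _ in range(10_000):
    c.incr()
assert c.value() == 10_000   # total is exact regardless of shard spread
```

The trade-off: writes scale linearly with shard count, while reads pay a small fan-out cost, which is fine because read results are served from the vote response anyway.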

Response Rate Optimization:

Response rate is the key metric. Higher response rate = more valuable data. Techniques:

  1. Timing optimization: Don’t show the poll in the first 10 seconds (viewer hasn’t engaged yet) or last 10 seconds (about to leave). Sweet spot: 30-60% through the video.

  2. Visual treatment:

    • Semi-transparent overlay on the bottom third — doesn’t block content
    • Subtle entrance animation (slide up) to catch attention without being jarring
    • Auto-dismiss after 15 seconds if no interaction (don’t annoy the viewer)
    • Show a small “1 question” teaser 3 seconds before the full poll appears
  3. Social proof: Show “X people have voted” before the user votes. This creates a bandwagon effect.

  4. Post-vote reward: After voting, show the results with a satisfying animation. This trains users that voting has an immediate payoff.

  5. A/B testing framework:

    • Each poll can define sample_pct and targeting variants
    • Hash(user_id + poll_id) % 100 determines the A/B bucket (deterministic, consistent across devices)
    • Test variables: trigger time, display duration, visual treatment, question wording
    • Track response rate per variant, compute statistical significance (chi-squared test), auto-promote the winning variant
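The significance check can be a plain Pearson chi-squared test on the 2×2 table of votes vs. non-votes per variant (a sketch; a real framework would also apply the sequential-testing corrections needed when you peek at results repeatedly):

```python
def chi_squared_2x2(votes_a, n_a, votes_b, n_b):
    """Pearson chi-squared statistic comparing two response rates.
    Rows are variants; columns are voted / did not vote. Compare the
    statistic against 3.84 for p < 0.05 at 1 degree of freedom."""
    table = [[votes_a, n_a - votes_a],
             [votes_b, n_b - votes_b]]
    total = n_a + n_b
    col_totals = [votes_a + votes_b, total - votes_a - votes_b]
    row_totals = [n_a, n_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Variant A: 31% of 2,000 impressions voted; variant B: 26% of 2,000
stat = chi_squared_2x2(620, 2000, 520, 2000)
assert stat > 3.84   # significant at p < 0.05, so promote variant A
```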

Engagement Metrics Pipeline:

Poll rendered (impression event) → Kafka
User interacts (vote event)     → Kafka
User dismisses (dismiss event)  → Kafka
Video continues past poll        → Kafka

All events → ClickHouse → Materialized views:
  - response_rate = votes / impressions
  - dismiss_rate = dismissals / impressions
  - completion_impact = avg(watch_time_with_poll) vs avg(watch_time_without_poll)

Deep Dive 3: Integration with Ads System & Advertiser Surveys

The Problem: YouTube’s ad system is a multi-billion dollar revenue engine. Advertiser surveys (Brand Lift studies) must integrate seamlessly without disrupting ad delivery, frequency capping, or revenue optimization.

Advertiser Survey Flow:

  1. Advertiser creates a Brand Lift campaign: “Did you see an ad for Product X in the last 7 days?”
  2. System identifies two groups: exposed (saw the ad) and control (didn’t see the ad)
  3. Both groups see the same survey → difference in responses = brand lift

Integration Architecture:

Ad Server → "User U saw Ad A at time T"
                    ↓
            Exposure Log (BigQuery)
                    ↓
Survey Targeting Service:
  - For Brand Lift: select exposed + control users
  - For creator polls: use creator-defined targeting
  - For YouTube research: random sampling
                    ↓
Poll Read Service → includes survey in video polls

Key Integration Constraints:

  • Frequency capping: A user sees at most one survey per session, and no more than one every 7 days. The Poll Read Service enforces this by checking last_survey_shown:{viewer_id} in Redis.
  • Ad pod conflict: Don’t show a survey during an ad break. The player SDK coordinates with the ad SDK to avoid overlapping UI elements.
  • Revenue priority: If showing a survey would displace a paid ad, the ad wins. Surveys are lower priority in the ad auction.
  • Control group integrity: Control group users must never see the ad. The ad server and survey system share an exclusion list.
  • Statistical rigor: Brand Lift surveys require minimum sample sizes (typically 2,000 exposed + 2,000 control) for statistical significance. The system tracks sample sizes and stops collecting once significance is achieved (sequential testing).
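The lift computation and minimum-sample gate reduce to a few lines (illustrative; real Brand Lift reporting adds confidence intervals and the sequential testing mentioned above):

```python
def brand_lift(exposed_yes, exposed_n, control_yes, control_n,
               min_sample=2000):
    """Absolute and relative lift between exposed and control groups.
    Returns None until both groups reach the minimum sample size,
    signalling the system to keep collecting responses."""
    if exposed_n < min_sample or control_n < min_sample:
        return None
    exposed_rate = exposed_yes / exposed_n
    control_rate = control_yes / control_n
    absolute = exposed_rate - control_rate
    relative = absolute / control_rate if control_rate else float("inf")
    return {"absolute_lift": absolute, "relative_lift": relative}

assert brand_lift(300, 1500, 100, 1500) is None   # under-sampled, keep going
result = brand_lift(600, 2000, 400, 2000)         # 30% vs 20% answered "yes"
assert abs(result["absolute_lift"] - 0.10) < 1e-9
assert abs(result["relative_lift"] - 0.50) < 1e-9
```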

7. Extensions (2 min)

  • Multi-language support: Auto-translate poll questions based on viewer locale using a translation service, with creator approval for machine translations before they go live. Store translations as a JSONB map in the polls table.
  • Polls in live streams: For live/premiere content, enable real-time polls that creators can trigger from their dashboard. Uses WebSocket push instead of pre-fetch. Results update in real time for all viewers simultaneously (a shared social experience).
  • Gamification & rewards: Award viewers points/badges for participating in polls. Track streaks (voted in 5 polls this week). Drives habitual engagement with surveys and increases long-term response rates.
  • Content-aware poll suggestions: Use ML to analyze the video content (transcript, visual segments) and suggest relevant poll questions to the creator. “Your video mentions 3 products — want to ask viewers which they prefer?”
  • Cross-video poll campaigns: Allow creators to run a poll campaign across multiple videos (same question, aggregated results). Useful for ongoing audience feedback like “What series should I start next?” with responses collected over a month of uploads.