1. Requirements & Scope (5 min)

Functional Requirements

  1. Given a long URL, generate a short, unique URL
  2. Given a short URL, redirect to the original long URL
  3. Users can optionally set a custom alias
  4. Links expire after a configurable TTL (default: 5 years)
  5. Analytics: track click count per short URL

Non-Functional Requirements

  • Availability: 99.99% — redirects must always work; this is on the critical path of every click
  • Latency: Redirect in < 10ms at p99 (just a lookup + 301)
  • Consistency: Eventual consistency is fine for analytics. Strong consistency for URL creation (no duplicate short codes)
  • Scale: 100M new URLs/day, 10:1 read-to-write ratio → 1B redirects/day
  • Durability: URLs must not be lost — a broken short link is permanent reputation damage

2. Estimation (3 min)

Write (URL creation)

  • 100M URLs/day ÷ ~100K sec/day (86,400 s, rounded up for easy math) = ~1,000 writes/sec
  • Peak: 5x → 5,000 writes/sec

Read (redirects)

  • 1B redirects/day ÷ 100K = ~10,000 reads/sec
  • Peak: 50,000 reads/sec

Storage

  • Each record: short code (7 bytes) + long URL (avg 200 bytes) + metadata (50 bytes) ≈ 250 bytes
  • 100M/day × 365 × 5 years = 182.5B records
  • 182.5B × 250 bytes = ~45 TB over 5 years

Short code space

  • Base62 (a-z, A-Z, 0-9), 7 characters = 62^7 = 3.5 trillion unique codes — more than enough
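These back-of-envelope numbers are easy to sanity-check in a few lines (using the same rounded 100K seconds/day):

```python
# Quick sanity check of the estimation section.

SECONDS_PER_DAY = 100_000          # 86,400 rounded up for easy math
urls_per_day = 100_000_000
redirects_per_day = 1_000_000_000  # 10:1 read-to-write ratio

write_qps = urls_per_day // SECONDS_PER_DAY       # ~1,000
read_qps = redirects_per_day // SECONDS_PER_DAY   # ~10,000

bytes_per_record = 250             # 7B code + ~200B URL + ~50B metadata, rounded
records_5y = urls_per_day * 365 * 5               # 182.5B records
storage_tb = records_5y * bytes_per_record / 1e12 # ~45.6 TB

code_space = 62 ** 7               # ~3.5 trillion 7-char Base62 codes

print(write_qps, read_qps, records_5y, round(storage_tb, 1), code_space)
```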

3. API Design (3 min)

POST /api/v1/shorten
  Headers: Authorization: Bearer <api_key>
  Body: {
    "long_url": "https://example.com/very/long/path",
    "custom_alias": "my-link",     // optional
    "ttl_days": 365                // optional, default 1825
  }
  Response 201: {
    "short_url": "https://tiny.url/aB3kX9p",
    "short_code": "aB3kX9p",
    "expires_at": "2031-02-22T00:00:00Z"
  }

GET /{short_code}
  Response 301: Location: https://example.com/very/long/path
  Response 404: { "error": "URL not found or expired" }

GET /api/v1/stats/{short_code}
  Headers: Authorization: Bearer <api_key>
  Response 200: {
    "short_code": "aB3kX9p",
    "long_url": "https://example.com/...",
    "total_clicks": 142857,
    "created_at": "2026-02-22T00:00:00Z",
    "expires_at": "2031-02-22T00:00:00Z"
  }

Key decisions:

  • 301 (permanent redirect) for SEO and browser caching. Use 302 if we need to track every single click (301 lets browser skip us on subsequent visits)
  • API key required for creation (prevents abuse), not for redirects
  • Rate limiting: 100 creations/min per API key
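One common way to enforce the 100-creations/min limit is a token bucket per API key; a minimal sketch (in a real gateway this state would live in Redis, not process memory, and the class name here is illustrative):

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per `per` seconds per key (e.g., 100/min)."""

    def __init__(self, rate: float, per: float):
        self.capacity = rate
        self.refill_rate = rate / per      # tokens added per second
        self.state = {}                    # api_key -> (tokens, last_seen_ts)

    def allow(self, api_key: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self.state.get(api_key, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens >= 1:
            self.state[api_key] = (tokens - 1, now)
            return True
        self.state[api_key] = (tokens, now)
        return False
```

Usage: `bucket = TokenBucket(100, 60)`; call `bucket.allow(api_key)` on each POST and return 429 when it comes back False.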

4. Data Model (3 min)

Primary Store: Key-Value (DynamoDB or Cassandra)

Table: urls
  short_code  (PK)    | string, 7 chars
  long_url             | string, up to 2048 chars
  user_id              | string (who created it)
  created_at           | timestamp
  expires_at           | timestamp
  click_count          | bigint (eventually consistent)

Why NoSQL? Access pattern is purely key-value: given short_code, return long_url. No joins, no complex queries. DynamoDB gives single-digit ms reads at any scale with partition key lookups.

Custom Alias Handling

Custom aliases go into the same table. Before inserting, do a conditional write (PutItem with condition attribute_not_exists(short_code)) — atomic, no race conditions.
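In boto3 this is a single put_item call with ConditionExpression="attribute_not_exists(short_code)". The semantics can be sketched with an in-memory stand-in (the dict simulates the table; in production the check and the insert are one atomic DynamoDB operation):

```python
class AliasTakenError(Exception):
    """Maps to HTTP 409 Conflict."""

def conditional_put(table: dict, short_code: str, long_url: str) -> None:
    # Mirrors DynamoDB's attribute_not_exists(short_code) condition:
    # reject if the code exists, otherwise insert.
    if short_code in table:
        raise AliasTakenError(short_code)
    table[short_code] = long_url
```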

Analytics (separate store)

Table: click_events (append-only, Kafka → ClickHouse/Druid)
  short_code  | string
  timestamp   | timestamp
  country     | string (from IP)
  referrer    | string
  user_agent  | string

5. High-Level Design (12 min)

Write Path (URL Creation)

Client → API Gateway (rate limit + auth)
  → URL Service
    → Generate short code (Snowflake ID → Base62 encode)
    → Check for collision (conditional write to DynamoDB)
    → Write to DynamoDB
    → Invalidate cache (if overwriting expired code)
  → Return short URL

Read Path (Redirect)

Client → CDN/Edge (check cache)
  Cache hit → 301 redirect (done)
  Cache miss → Load Balancer → Redirect Service
    → Check Redis cache
      Hit → 301 redirect, async log click
      Miss → Query DynamoDB
        Found → Populate Redis, 301 redirect, async log click
        Not found / expired → 404
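The cache-miss branch above is classic cache-aside; a sketch with plain dicts standing in for Redis and DynamoDB (function name and return shape are illustrative):

```python
def redirect(short_code: str, cache: dict, db: dict):
    """Return (status, location). Cache-aside: populate the cache on a DB hit."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return 301, long_url         # cache hit
    long_url = db.get(short_code)
    if long_url is None:
        return 404, None             # unknown (or already deleted)
    cache[short_code] = long_url     # warm the cache for subsequent lookups
    return 301, long_url
```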

Click Analytics Path

Redirect Service → Kafka (async, fire-and-forget)
  → Click Analytics Consumer → ClickHouse
  → Periodic aggregation → Update click_count in DynamoDB

Components

  1. API Gateway: Rate limiting, API key auth, request validation
  2. URL Service: Handles creation, generates short codes
  3. Redirect Service: Handles lookups, optimized for latency
  4. Redis Cluster: Cache layer for hot URLs (LRU eviction)
  5. DynamoDB: Primary store (or Cassandra for self-hosted)
  6. Kafka: Async click event stream
  7. ClickHouse: Click analytics (time-series optimized)
  8. CDN: Cache redirects at edge for extremely hot URLs

6. Deep Dives (15 min)

Deep Dive 1: Short Code Generation (Avoiding Collisions)

Approach: Distributed unique ID + Base62 encoding

Use a Snowflake-like ID generator:

  • 41 bits: timestamp (69 years)
  • 10 bits: machine ID (1024 machines)
  • 12 bits: sequence number (4096 per ms per machine)

Convert the ID to Base62. Caveat: 62^7 ≈ 2^41.7, so a full 63-bit Snowflake ID can need up to 11 Base62 characters. To keep 7-character codes, shrink the layout so the ID fits in ~41 bits (e.g., a seconds-granularity timestamp and fewer machine/sequence bits), or accept codes of up to 11 characters.
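Base62 encoding is a plain change of base. A sketch, which also shows how code length tracks ID width: values below 62^7 fit in 7 characters, while a full 63-bit ID can need 11:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))
```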

Why not hash the URL? Truncating an MD5/SHA-256 digest to 7 Base62 chars gives a space of 62^7 ≈ 3.5 trillion; by the birthday paradox, collision probability reaches ~50% after only ~1.2 × √(62^7) ≈ 2.2M entries. At 100M URLs/day, we’d see the first collisions within the first hour. We’d need collision detection + retry, which adds latency and complexity.
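The birthday-paradox figure follows the standard approximation p ≈ 1 − e^(−k(k−1)/2N); a quick check (the 1.1774 constant is √(2 ln 2), the multiplier for the 50% point):

```python
import math

def collision_probability(k: int, space: int) -> float:
    """Approximate P(at least one collision) after k random draws from `space`."""
    return 1 - math.exp(-k * (k - 1) / (2 * space))

space = 62 ** 7                            # ~3.52e12 seven-char codes
k_half = round(1.1774 * math.sqrt(space))  # ~2.2M draws for p ~ 0.5
```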

Why not a counter? A single auto-increment counter is a SPOF and bottleneck. Distributed counters (like ZooKeeper-backed ranges) work but add operational complexity. Snowflake IDs give us uniqueness without coordination.

Custom alias handling: User-chosen aliases bypass the ID generator. We do a conditional write to DynamoDB — if the alias already exists, reject with 409 Conflict.

Deep Dive 2: Caching Strategy

Multi-tier caching:

  1. CDN/Edge cache (Cloudflare, CloudFront): Cache 301 redirects with Cache-Control: max-age=86400. For extremely popular links (>1M clicks/day), the CDN absorbs >99% of traffic. Trade-off: we lose per-click analytics accuracy (browser + CDN cache means we don’t see every click).

  2. Redis cluster: LRU cache with ~100GB capacity. At 250 bytes per entry, that’s ~400M URLs cached — covers all URLs accessed in the last ~week.

  3. DynamoDB DAX (optional): In-memory cache integrated with DynamoDB for microsecond reads on cache miss from Redis.

Cache invalidation:

  • On expiry: TTL-based eviction in Redis (set Redis TTL = URL expires_at - now)
  • On delete: Explicit invalidation (publish to Kafka → all cache nodes evict)
  • Hot key protection: Popular URLs get replicated across multiple Redis nodes to avoid hotspot
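The TTL in the first bullet is just expires_at − now, clamped at zero; a sketch of the value you would pass to Redis SETEX (helper name is illustrative):

```python
from datetime import datetime, timezone

def redis_ttl_seconds(expires_at: datetime, now: datetime) -> int:
    """TTL to pass to SETEX so the cache entry dies with the URL."""
    return max(0, int((expires_at - now).total_seconds()))
```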

Analytics trade-off: If we use 301 redirects + CDN caching, we lose granular click tracking because the browser/CDN won’t hit our servers on subsequent clicks. Options:

  • Use 302 (temporary redirect) for analytics-critical links — forces browser to always go through us
  • Accept approximate analytics (good enough for most use cases)
  • Hybrid: 301 by default, 302 for premium users who need exact analytics

Deep Dive 3: Handling Expiry at Scale

With 182.5B records over 5 years, we can’t scan the entire table to find expired URLs.

DynamoDB TTL: Built-in feature. Set a ttl attribute (Unix timestamp). DynamoDB automatically deletes expired items within 48 hours of expiry. Deleted items are published to DynamoDB Streams → we consume this to also evict from Redis.

For Cassandra: Use TTL on insert: INSERT INTO urls (...) VALUES (...) USING TTL 157680000 (5 years in seconds). Cassandra handles tombstone cleanup during compaction.

Redirect behavior: Even before the background cleanup runs, the redirect service checks expires_at on every lookup. Expired → 404, regardless of whether the row has been physically deleted yet.
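A sketch of that lazy check, assuming expires_at is stored as a Unix timestamp (as the DynamoDB TTL feature requires):

```python
import time

def resolve(record, now=None):
    """Return the long URL, or None (-> 404) if the record is missing or expired."""
    now = time.time() if now is None else now
    if record is None or record["expires_at"] <= now:
        return None   # expired rows may still exist physically; treat as gone
    return record["long_url"]
```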


7. Extensions (2 min)

  • Abuse prevention: Blacklist malicious URLs via a Google Safe Browsing API check on creation. Rate limit by IP (not just API key) to slow attackers mass-creating short links for use as phishing vectors.
  • Global deployment: Multi-region DynamoDB Global Tables for < 50ms writes worldwide. Redis clusters per region. Geo-route users to nearest edge.
  • Custom domains: Allow users to use their own domain (brand.co/link) — just a CNAME + configuration in our system.
  • A/B testing: Allow multiple destination URLs per short code with traffic splitting (50/50, 90/10, etc.).
  • Monitoring: Track redirect latency p50/p95/p99, cache hit ratio, short code generation rate, DynamoDB consumed capacity, Kafka lag.