1. Requirements & Scope (5 min)

Functional Requirements

  1. Given a long URL, generate a short, unique URL
  2. Given a short URL, redirect to the original long URL
  3. Users can optionally set a custom alias
  4. Links expire after a configurable TTL (default: 5 years)
  5. Analytics: track click count per short URL

Non-Functional Requirements

  • Availability: 99.99% — redirects must always work; this is on the critical path of every click
  • Latency: Redirect in < 10ms at p99 (just a lookup + 301)
  • Consistency: Eventual consistency is fine for analytics. Strong consistency for URL creation (no duplicate short codes)
  • Scale: 100M new URLs/day, 10:1 read-to-write ratio → 1B redirects/day
  • Durability: URLs must not be lost — a broken short link is permanent reputation damage

2. Estimation (3 min)

Write (URL creation)

  • 100M URLs/day ÷ ~100K sec/day (86,400 s, rounded up for easy math) = ~1,000 writes/sec
  • Peak: 5x → 5,000 writes/sec

Read (redirects)

  • 1B redirects/day ÷ 100K = ~10,000 reads/sec
  • Peak: 50,000 reads/sec

Storage

  • Each record: short code (7 bytes) + long URL (avg 200 bytes) + metadata (50 bytes) ≈ 250 bytes
  • 100M/day × 365 × 5 years = 182.5B records
  • 182.5B × 250 bytes = ~45 TB over 5 years

Short code space

  • Base62 (a-z, A-Z, 0-9), 7 characters = 62^7 = 3.5 trillion unique codes — more than enough
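These back-of-envelope numbers are easy to sanity-check in a few lines (using the same rounded 100K seconds/day):

```python
# Quick sanity check of the estimation section.

SECONDS_PER_DAY = 100_000          # 86,400 rounded up for easy math
urls_per_day = 100_000_000
redirects_per_day = 1_000_000_000  # 10:1 read-to-write ratio

write_qps = urls_per_day // SECONDS_PER_DAY       # ~1,000
read_qps = redirects_per_day // SECONDS_PER_DAY   # ~10,000

bytes_per_record = 250             # 7B code + ~200B URL + ~50B metadata, rounded
records_5y = urls_per_day * 365 * 5               # 182.5B records
storage_tb = records_5y * bytes_per_record / 1e12 # ~45.6 TB

code_space = 62 ** 7               # ~3.5 trillion 7-char Base62 codes

print(write_qps, read_qps, records_5y, round(storage_tb, 1), code_space)
```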

3. API Design (3 min)

POST /api/v1/shorten
  Headers: Authorization: Bearer <api_key>
  Body: {
    "long_url": "https://example.com/very/long/path",
    "custom_alias": "my-link",     // optional
    "ttl_days": 365                // optional, default 1825
  }
  Response 201: {
    "short_url": "https://tiny.url/aB3kX9p",
    "short_code": "aB3kX9p",
    "expires_at": "2031-02-22T00:00:00Z"
  }

GET /{short_code}
  Response 301: Location: https://example.com/very/long/path
  Response 404: { "error": "URL not found or expired" }

GET /api/v1/stats/{short_code}
  Headers: Authorization: Bearer <api_key>
  Response 200: {
    "short_code": "aB3kX9p",
    "long_url": "https://example.com/...",
    "total_clicks": 142857,
    "created_at": "2026-02-22T00:00:00Z",
    "expires_at": "2031-02-22T00:00:00Z"
  }

Key decisions:

  • 301 (permanent redirect) for SEO and browser caching. Use 302 if we need to track every single click (301 lets browser skip us on subsequent visits)
  • API key required for creation (prevents abuse), not for redirects
  • Rate limiting: 100 creations/min per API key
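One common way to enforce the 100-creations/min limit is a token bucket per API key; a minimal sketch (in a real gateway this state would live in Redis, not process memory, and the class name here is illustrative):

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per `per` seconds per key (e.g., 100/min)."""

    def __init__(self, rate: float, per: float):
        self.capacity = rate
        self.refill_rate = rate / per      # tokens added per second
        self.state = {}                    # api_key -> (tokens, last_seen_ts)

    def allow(self, api_key: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self.state.get(api_key, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens >= 1:
            self.state[api_key] = (tokens - 1, now)
            return True
        self.state[api_key] = (tokens, now)
        return False
```

Usage: `bucket = TokenBucket(100, 60)`; call `bucket.allow(api_key)` on each POST and return 429 when it comes back False.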

4. Data Model (3 min)

Primary Store: Key-Value (DynamoDB or Cassandra)

Table: urls
  short_code  (PK)    | string, 7 chars
  long_url             | string, up to 2048 chars
  user_id              | string (who created it)
  created_at           | timestamp
  expires_at           | timestamp
  click_count          | bigint (eventually consistent)

Why NoSQL? Access pattern is purely key-value: given short_code, return long_url. No joins, no complex queries. DynamoDB gives single-digit ms reads at any scale with partition key lookups.

Custom Alias Handling

Custom aliases go into the same table. Before inserting, do a conditional write (PutItem with condition attribute_not_exists(short_code)) — atomic, no race conditions.
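In boto3 this is a single put_item call with ConditionExpression="attribute_not_exists(short_code)". The semantics can be sketched with an in-memory stand-in (the dict simulates the table; in production the check and the insert are one atomic DynamoDB operation):

```python
class AliasTakenError(Exception):
    """Maps to HTTP 409 Conflict."""

def conditional_put(table: dict, short_code: str, long_url: str) -> None:
    # Mirrors DynamoDB's attribute_not_exists(short_code) condition:
    # reject if the code exists, otherwise insert.
    if short_code in table:
        raise AliasTakenError(short_code)
    table[short_code] = long_url
```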

Analytics (separate store)

Table: click_events (append-only, Kafka → ClickHouse/Druid)
  short_code  | string
  timestamp   | timestamp
  country     | string (from IP)
  referrer    | string
  user_agent  | string

5. High-Level Design (12 min)

Write Path (URL Creation)

Client → API Gateway (rate limit + auth)
  → URL Service
    → Generate short code (Snowflake ID → Base62 encode)
    → Check for collision (conditional write to DynamoDB)
    → Write to DynamoDB
    → Invalidate cache (if overwriting expired code)
  → Return short URL

Read Path (Redirect)

Client → CDN/Edge (check cache)
  Cache hit → 301 redirect (done)
  Cache miss → Load Balancer → Redirect Service
    → Check Redis cache
      Hit → 301 redirect, async log click
      Miss → Query DynamoDB
        Found → Populate Redis, 301 redirect, async log click
        Not found / expired → 404
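The cache-miss branch above is classic cache-aside; a sketch with plain dicts standing in for Redis and DynamoDB (function name and return shape are illustrative):

```python
def redirect(short_code: str, cache: dict, db: dict):
    """Return (status, location). Cache-aside: populate the cache on a DB hit."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return 301, long_url         # cache hit
    long_url = db.get(short_code)
    if long_url is None:
        return 404, None             # unknown (or already deleted)
    cache[short_code] = long_url     # warm the cache for subsequent lookups
    return 301, long_url
```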

Click Analytics Path

Redirect Service → Kafka (async, fire-and-forget)
  → Click Analytics Consumer → ClickHouse
  → Periodic aggregation → Update click_count in DynamoDB

Components

  1. API Gateway: Rate limiting, API key auth, request validation
  2. URL Service: Handles creation, generates short codes
  3. Redirect Service: Handles lookups, optimized for latency
  4. Redis Cluster: Cache layer for hot URLs (LRU eviction)
  5. DynamoDB: Primary store (or Cassandra for self-hosted)
  6. Kafka: Async click event stream
  7. ClickHouse: Click analytics (time-series optimized)
  8. CDN: Cache redirects at edge for extremely hot URLs

6. Deep Dives (15 min)

Deep Dive 1: Short Code Generation (Avoiding Collisions)

Approach: Distributed unique ID + Base62 encoding

Use a Snowflake-like ID generator:

  • 41 bits: timestamp (69 years)
  • 10 bits: machine ID (1024 machines)
  • 12 bits: sequence number (4096 per ms per machine)

Convert the ID to Base62. Caveat: 62^7 ≈ 2^41.7, so a full 63-bit Snowflake ID can need up to 11 Base62 characters. To keep 7-character codes, shrink the layout so the ID fits in ~41 bits (e.g., a seconds-granularity timestamp and fewer machine/sequence bits), or accept codes of up to 11 characters.
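Base62 encoding is a plain change of base. A sketch, which also shows how code length tracks ID width: values below 62^7 fit in 7 characters, while a full 63-bit ID can need 11:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))
```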

Why not hash the URL? Truncating an MD5/SHA-256 digest to 7 Base62 chars gives a space of 62^7 ≈ 3.5 trillion; by the birthday paradox, collision probability reaches ~50% after only ~1.2 × √(62^7) ≈ 2.2M entries. At 100M URLs/day, we’d see the first collisions within the first hour. We’d need collision detection + retry, which adds latency and complexity.
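The birthday-paradox figure follows the standard approximation p ≈ 1 − e^(−k(k−1)/2N); a quick check (the 1.1774 constant is √(2 ln 2), the multiplier for the 50% point):

```python
import math

def collision_probability(k: int, space: int) -> float:
    """Approximate P(at least one collision) after k random draws from `space`."""
    return 1 - math.exp(-k * (k - 1) / (2 * space))

space = 62 ** 7                            # ~3.52e12 seven-char codes
k_half = round(1.1774 * math.sqrt(space))  # ~2.2M draws for p ~ 0.5
```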

Why not a counter? A single auto-increment counter is a SPOF and bottleneck. Distributed counters (like ZooKeeper-backed ranges) work but add operational complexity. Snowflake IDs give us uniqueness without coordination.

Custom alias handling: User-chosen aliases bypass the ID generator. We do a conditional write to DynamoDB — if the alias already exists, reject with 409 Conflict.

Deep Dive 2: Caching Strategy

Multi-tier caching:

  1. CDN/Edge cache (Cloudflare, CloudFront): Cache 301 redirects with Cache-Control: max-age=86400. For extremely popular links (>1M clicks/day), the CDN absorbs >99% of traffic. Trade-off: we lose per-click analytics accuracy (browser + CDN cache means we don’t see every click).

  2. Redis cluster: LRU cache with ~100GB capacity. At 250 bytes per entry, that’s ~400M URLs cached — covers all URLs accessed in the last ~week.

  3. DynamoDB DAX (optional): In-memory cache integrated with DynamoDB for microsecond reads on cache miss from Redis.

Cache invalidation:

  • On expiry: TTL-based eviction in Redis (set Redis TTL = URL expires_at - now)
  • On delete: Explicit invalidation (publish to Kafka → all cache nodes evict)
  • Hot key protection: Popular URLs get replicated across multiple Redis nodes to avoid hotspot
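The TTL in the first bullet is just expires_at − now, clamped at zero; a sketch of the value you would pass to Redis SETEX (helper name is illustrative):

```python
from datetime import datetime, timezone

def redis_ttl_seconds(expires_at: datetime, now: datetime) -> int:
    """TTL to pass to SETEX so the cache entry dies with the URL."""
    return max(0, int((expires_at - now).total_seconds()))
```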

Analytics trade-off: If we use 301 redirects + CDN caching, we lose granular click tracking because the browser/CDN won’t hit our servers on subsequent clicks. Options:

  • Use 302 (temporary redirect) for analytics-critical links — forces browser to always go through us
  • Accept approximate analytics (good enough for most use cases)
  • Hybrid: 301 by default, 302 for premium users who need exact analytics

Deep Dive 3: Handling Expiry at Scale

With 182.5B records over 5 years, we can’t scan the entire table to find expired URLs.

DynamoDB TTL: Built-in feature. Set a ttl attribute (Unix timestamp). DynamoDB automatically deletes expired items within 48 hours of expiry. Deleted items are published to DynamoDB Streams → we consume this to also evict from Redis.

For Cassandra: Use TTL on insert: INSERT INTO urls (...) VALUES (...) USING TTL 157680000 (5 years in seconds). Cassandra handles tombstone cleanup during compaction.

Redirect behavior: Even before the background cleanup runs, the redirect service checks expires_at on every lookup. Expired → 404, regardless of whether the row has been physically deleted yet.
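A sketch of that lazy check, assuming expires_at is stored as a Unix timestamp (as the DynamoDB TTL feature requires):

```python
import time

def resolve(record, now=None):
    """Return the long URL, or None (-> 404) if the record is missing or expired."""
    now = time.time() if now is None else now
    if record is None or record["expires_at"] <= now:
        return None   # expired rows may still exist physically; treat as gone
    return record["long_url"]
```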


7. Extensions (2 min)

  • Abuse prevention: Blacklist malicious URLs via a Google Safe Browsing API check on creation. Rate limit by IP (not just API key) to slow attackers mass-creating short links for use as phishing vectors.
  • Global deployment: Multi-region DynamoDB Global Tables for < 50ms writes worldwide. Redis clusters per region. Geo-route users to nearest edge.
  • Custom domains: Allow users to use their own domain (brand.co/link) — just a CNAME + configuration in our system.
  • A/B testing: Allow multiple destination URLs per short code with traffic splitting (50/50, 90/10, etc.).
  • Monitoring: Track redirect latency p50/p95/p99, cache hit ratio, short code generation rate, DynamoDB consumed capacity, Kafka lag.