1. Requirements & Scope (5 min)
Functional Requirements
- Given a long URL, generate a short, unique URL
- Given a short URL, redirect to the original long URL
- Users can optionally set a custom alias
- Links expire after a configurable TTL (default: 5 years)
- Analytics: track click count per short URL
Non-Functional Requirements
- Availability: 99.99% — redirects must always work; this is on the critical path of every click
- Latency: Redirect in < 10ms at p99 (just a lookup + 301)
- Consistency: Eventual consistency is fine for analytics. Strong consistency for URL creation (no duplicate short codes)
- Scale: 100M new URLs/day, 10:1 read-to-write ratio → 1B redirects/day
- Durability: URLs must not be lost — a broken short link is permanent reputation damage
2. Estimation (3 min)
Write (URL creation)
- 100M URLs/day ÷ ~100K sec/day (86,400, rounded for mental math) = ~1,000 writes/sec
- Peak: 5x → 5,000 writes/sec
Read (redirects)
- 1B redirects/day ÷ 100K = ~10,000 reads/sec
- Peak: 50,000 reads/sec
Storage
- Each record: short code (7 bytes) + long URL (avg 200 bytes) + metadata (50 bytes) ≈ 250 bytes
- 100M/day × 365 × 5 years = 182.5B records
- 182.5B × 250 bytes = ~45 TB over 5 years
Short code space
- Base62 (a-z, A-Z, 0-9), 7 characters = 62^7 = 3.5 trillion unique codes — more than enough
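The estimates above can be sanity-checked with a few lines of arithmetic (a quick sketch; 100K sec/day is the usual rounding of 86,400):

```python
# Back-of-envelope check for the traffic, storage, and keyspace estimates.

SECONDS_PER_DAY = 100_000                            # rounded from 86,400

writes_per_sec = 100_000_000 / SECONDS_PER_DAY       # ~1,000 writes/sec
reads_per_sec = 1_000_000_000 / SECONDS_PER_DAY      # ~10,000 reads/sec

records_5y = 100_000_000 * 365 * 5                   # 182.5B records
storage_tb = records_5y * 250 / 1e12                 # ~45.6 TB at 250 bytes/record

keyspace = 62 ** 7                                   # ~3.5 trillion 7-char codes

print(writes_per_sec, reads_per_sec, records_5y, round(storage_tb, 1), keyspace)
```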
3. API Design (3 min)
POST /api/v1/shorten
Headers: Authorization: Bearer <api_key>
Body: {
"long_url": "https://example.com/very/long/path",
"custom_alias": "my-link", // optional
"ttl_days": 365 // optional, default 1825
}
Response 201: {
"short_url": "https://tiny.url/aB3kX9p",
"short_code": "aB3kX9p",
"expires_at": "2031-02-22T00:00:00Z"
}
GET /{short_code}
Response 301: Location: https://example.com/very/long/path
Response 404: { "error": "URL not found or expired" }
GET /api/v1/stats/{short_code}
Headers: Authorization: Bearer <api_key>
Response 200: {
"short_code": "aB3kX9p",
"long_url": "https://example.com/...",
"total_clicks": 142857,
"created_at": "2026-02-22T00:00:00Z",
"expires_at": "2031-02-22T00:00:00Z"
}
Key decisions:
- 301 (permanent redirect) for SEO and browser caching. Use 302 if we need to track every single click (301 lets browser skip us on subsequent visits)
- API key required for creation (prevents abuse), not for redirects
- Rate limiting: 100 creations/min per API key
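The per-key limit can be sketched as a fixed-window counter (in-process and illustrative; a real gateway would keep the counters in Redis via INCR + EXPIRE):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow up to `limit` creations per `window` seconds per API key."""

    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # (api_key, window_index) -> count

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        bucket = (api_key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False  # caller responds 429 Too Many Requests
        self.counts[bucket] += 1
        return True
```

Fixed windows allow brief bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of a little more state.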
4. Data Model (3 min)
Primary Store: Key-Value (DynamoDB or Cassandra)
Table: urls
short_code (PK) | string, 7 chars
long_url | string, up to 2048 chars
user_id | string (who created it)
created_at | timestamp
expires_at | timestamp
click_count | bigint (eventually consistent)
Why NoSQL? Access pattern is purely key-value: given short_code, return long_url. No joins, no complex queries. DynamoDB gives single-digit ms reads at any scale with partition key lookups.
Custom Alias Handling
Custom aliases go into the same table. Before inserting, do a conditional write (PutItem with condition attribute_not_exists(short_code)) — atomic, no race conditions.
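The conditional-write semantics can be sketched in-process (DynamoDB performs the check and the write as one atomic server-side step via `PutItem` with `ConditionExpression="attribute_not_exists(short_code)"`; the dict here is only a stand-in):

```python
class AliasTaken(Exception):
    """Maps to HTTP 409 Conflict."""

def put_if_absent(table: dict, short_code: str, long_url: str) -> None:
    # DynamoDB evaluates the condition and the write atomically, so two
    # concurrent requests for the same alias cannot both succeed.
    if short_code in table:
        raise AliasTaken(short_code)
    table[short_code] = {"short_code": short_code, "long_url": long_url}
```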
Analytics (separate store)
Table: click_events (append-only, Kafka → ClickHouse/Druid)
short_code | string
timestamp | timestamp
country | string (from IP)
referrer | string
user_agent | string
5. High-Level Design (12 min)
Write Path (URL Creation)
Client → API Gateway (rate limit + auth)
→ URL Service
→ Generate short code (Snowflake ID → Base62 encode)
→ Check for collision (conditional write to DynamoDB)
→ Write to DynamoDB
→ Invalidate cache (if overwriting expired code)
→ Return short URL
Read Path (Redirect)
Client → CDN/Edge (check cache)
Cache hit → 301 redirect (done)
Cache miss → Load Balancer → Redirect Service
→ Check Redis cache
Hit → 301 redirect, async log click
Miss → Query DynamoDB
Found → Populate Redis, 301 redirect, async log click
Not found / expired → 404
Click Analytics Path
Redirect Service → Kafka (async, fire-and-forget)
→ Click Analytics Consumer → ClickHouse
→ Periodic aggregation → Update click_count in DynamoDB
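The fire-and-forget handoff can be sketched with an in-process queue (a stand-in for the Kafka producer; the point is that the redirect hot path enqueues and returns without waiting on analytics):

```python
import queue
import threading

events = queue.Queue()

def log_click(short_code: str, country: str) -> None:
    """Called from the redirect path: enqueue and return immediately.
    With a real producer this would be a non-blocking send to a 'clicks' topic."""
    events.put({"short_code": short_code, "country": country})

def consumer(sink: list, stop: threading.Event) -> None:
    """Drains events into the analytics store (stand-in for ClickHouse)."""
    while not stop.is_set() or not events.empty():
        try:
            sink.append(events.get(timeout=0.1))
        except queue.Empty:
            pass
```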
Components
- API Gateway: Rate limiting, API key auth, request validation
- URL Service: Handles creation, generates short codes
- Redirect Service: Handles lookups, optimized for latency
- Redis Cluster: Cache layer for hot URLs (LRU eviction)
- DynamoDB: Primary store (or Cassandra for self-hosted)
- Kafka: Async click event stream
- ClickHouse: Click analytics (time-series optimized)
- CDN: Cache redirects at edge for extremely hot URLs
6. Deep Dives (15 min)
Deep Dive 1: Short Code Generation (Avoiding Collisions)
Approach: Pre-generated ID + Base62 encoding
Use a Snowflake-like ID generator:
- 41 bits: timestamp (69 years)
- 10 bits: machine ID (1024 machines)
- 12 bits: sequence number (4096 per ms per machine)
Base62-encode the ID. Caveat: a full 63-bit ID needs up to 11 Base62 characters, not 7 — to stay at 7 characters the ID must fit in ~41 bits (62^7 ≈ 3.5T), so either shrink the bit budget (custom epoch, fewer machine/sequence bits) or accept codes up to 11 characters.
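A sketch of the generator and encoder (single-process; machine-ID assignment and clock-skew handling are elided, and the epoch value is illustrative). Running the encoder also shows why a full 63-bit ID needs up to 11 characters:

```python
import string
import threading
import time

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits  # Base62

def base62(n: int) -> str:
    """Encode a non-negative integer in Base62."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

class Snowflake:
    """41-bit ms timestamp | 10-bit machine id | 12-bit sequence."""
    EPOCH = 1704067200000  # custom epoch (2024-01-01), an illustrative choice

    def __init__(self, machine_id: int):
        self.machine_id = machine_id & 0x3FF
        self.seq = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            ms = int(time.time() * 1000) - self.EPOCH
            if ms == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # wraps; a real impl waits for next ms
            else:
                self.seq = 0
                self.last_ms = ms
            return (ms << 22) | (self.machine_id << 12) | self.seq
```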
Why not hash the URL? MD5/SHA-256 of the URL truncated to 7 chars hits ~50% collision probability at around √(62^7) ≈ 2M entries (birthday paradox). At 100M URLs/day we’d see collisions within the first hour. We’d need collision detection + retry, which adds latency and complexity.
Why not a counter? A single auto-increment counter is a SPOF and bottleneck. Distributed counters (like ZooKeeper-backed ranges) work but add operational complexity. Snowflake IDs give us uniqueness without coordination.
Custom alias handling: User-chosen aliases bypass the ID generator. We do a conditional write to DynamoDB — if the alias already exists, reject with 409 Conflict.
Deep Dive 2: Caching Strategy
Multi-tier caching:
- CDN/Edge cache (Cloudflare, CloudFront): Cache 301 redirects with Cache-Control: max-age=86400. For extremely popular links (>1M clicks/day), the CDN absorbs >99% of traffic. Trade-off: we lose per-click analytics accuracy (browser + CDN cache means we don’t see every click).
- Redis cluster: LRU cache with ~100GB capacity. At 250 bytes per entry, that’s ~400M URLs cached — covers all URLs accessed in the last ~week.
- DynamoDB DAX (optional): In-memory cache integrated with DynamoDB for microsecond reads on cache miss from Redis.
Cache invalidation:
- On expiry: TTL-based eviction in Redis (set Redis TTL = URL expires_at - now)
- On delete: Explicit invalidation (publish to Kafka → all cache nodes evict)
- Hot key protection: Popular URLs get replicated across multiple Redis nodes to avoid hotspot
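The expiry-aligned fill can be sketched with a dict standing in for Redis (with a real client this is simply SET with an EX argument equal to the remaining lifetime):

```python
import time

def cache_url(cache: dict, short_code: str, long_url: str, expires_at: float) -> None:
    """Cache-aside fill: the cache TTL is aligned to the URL's own expiry,
    so an entry vanishes from the cache no later than the URL itself."""
    ttl = int(expires_at - time.time())
    if ttl > 0:
        cache[short_code] = (long_url, time.time() + ttl)

def cache_get(cache: dict, short_code: str):
    entry = cache.get(short_code)
    if entry is None:
        return None                      # miss -> fall through to DynamoDB
    long_url, deadline = entry
    if time.time() >= deadline:
        del cache[short_code]            # lazy eviction, as Redis TTL would do
        return None
    return long_url
```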
Analytics trade-off: If we use 301 redirects + CDN caching, we lose granular click tracking because the browser/CDN won’t hit our servers on subsequent clicks. Options:
- Use 302 (temporary redirect) for analytics-critical links — forces browser to always go through us
- Accept approximate analytics (good enough for most use cases)
- Hybrid: 301 by default, 302 for premium users who need exact analytics
Deep Dive 3: Handling Expiry at Scale
With 182.5B records over 5 years, we can’t scan the entire table to find expired URLs.
DynamoDB TTL: Built-in feature. Set a ttl attribute (Unix timestamp). DynamoDB automatically deletes expired items within 48 hours of expiry. Deleted items are published to DynamoDB Streams → we consume this to also evict from Redis.
For Cassandra: Use TTL on insert: INSERT INTO urls (...) VALUES (...) USING TTL 157680000 (5 years in seconds). Cassandra handles tombstone cleanup during compaction.
Redirect behavior: Even before the background cleanup runs, the redirect service checks expires_at on every lookup. Expired → 404, regardless of whether the row has been physically deleted yet.
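The lookup-time guard is a one-line check in the redirect path (a sketch; the dict stands in for the DynamoDB read):

```python
import time

def resolve(db: dict, short_code: str):
    """Return the long URL, or None (-> 404) if missing or expired.
    Checking expires_at here means correctness never depends on how
    quickly the background TTL cleanup physically deletes the row."""
    row = db.get(short_code)
    if row is None or row["expires_at"] <= time.time():
        return None
    return row["long_url"]
```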
7. Extensions (2 min)
- Abuse prevention: Blacklist malicious URLs via a Google Safe Browsing API check on creation. Rate limit by IP (not just API key) to stop mass creation of shortened URLs for use as phishing vectors.
- Global deployment: Multi-region DynamoDB Global Tables for < 50ms writes worldwide. Redis clusters per region. Geo-route users to nearest edge.
- Custom domains: Allow users to use their own domain (brand.co/link) — just a CNAME + configuration in our system.
- A/B testing: Allow multiple destination URLs per short code with traffic splitting (50/50, 90/10, etc.).
- Monitoring: Track redirect latency p50/p95/p99, cache hit ratio, short code generation rate, DynamoDB consumed capacity, Kafka lag.