Everyone thinks the URL shortener is a trivial problem. “It’s a hash map. Store long URL, return short URL, done.” Then the interviewer asks: how do you generate the key, how do you avoid collisions, what happens when one popular link gets 50,000 redirects a second, and how do you serve that redirect in under 10ms across the globe. Now it’s a real system.

The whole problem is deceptively read-heavy and deceptively about one decision: how you mint short keys. Get the key generation wrong and everything downstream (collisions, hot shards, wasted storage) gets worse. Get it right and the rest is caching and sharding you already know.

This is the classic. Let me do it properly.

Functional Requirements (FR)

In scope:

  • Shorten a long URL. Given a long URL, return a short URL like https://sho.rt/aB3xK9.
  • Redirect. Given a short URL, redirect (HTTP 301/302) to the original long URL.
  • Custom aliases. Optionally let a user pick their own key, e.g. sho.rt/my-launch.
  • Expiration. Optionally let a link expire at a given time (or default TTL).
  • Basic analytics. Click counts per short link. Treat this as best-effort, not transactional.

Explicitly out of scope (say this out loud in the interview so you control the scope):

  • User accounts, auth, and quota management beyond a simple API key.
  • Link preview / safe-browsing / malware scanning (mention it exists, do not design it).
  • Editing the destination of an existing short link (links are immutable once created; this avoids a whole class of cache-invalidation problems).
  • Real-time analytics dashboards. We will capture raw events; aggregation is a separate offline pipeline.

The one decision that drives the design: this is a read-heavy system. Reads (redirects) dominate writes (creates) by roughly 100:1. Every choice should optimize the redirect path.

Non-Functional Requirements (NFR)

  • Scale: ~100M new URLs created per day. Read:write ratio ~100:1, so ~10B redirects per day.
  • Latency: redirect p99 under 10ms at the service (excluding network RTT to the user). Creation can be slower, p99 under 100ms.
  • Availability: 99.99% for the redirect path. A redirect failing is a broken link on someone else’s site - it is the most visible failure mode. Creation can tolerate slightly lower availability.
  • Consistency: for a given short key, the mapping is immutable, so we only need read-your-writes on create and can serve redirects from eventually-consistent replicas and caches. We never have to worry about a key pointing to two different URLs over time.
  • Durability: a short link must never be lost. If sho.rt/aB3xK9 is printed on a billboard, losing that row is catastrophic. We need durable, replicated storage.
  • No collisions: two different long URLs must never receive the same short key (unless we deliberately dedupe identical long URLs).

Back-of-the-Envelope Estimation (BoE)

Let me put real numbers down so the architecture is grounded.

Writes (URL creation):

100M new URLs / day
= 100,000,000 / 86,400 seconds
≈ 1,160 writes/sec  (call it ~1.2K WPS average)
Peak (3x average) ≈ 3.5K WPS

Reads (redirects):

Read:write = 100:1
≈ 116,000 reads/sec average
Peak (3x) ≈ 350K RPS

That peak read number is the whole ballgame. 350K redirects/sec cannot hit a database. This is why caching is not optional.

Storage:

Per row, roughly:

short_key        7 bytes
long_url         ~200 bytes (avg; can be up to 2KB)
created_at       8 bytes
expires_at       8 bytes
creator_id       8 bytes
metadata/overhead ~70 bytes
-------------------------------------
≈ 300 bytes/row, round to 500 bytes with index overhead

Per year:

100M/day * 365 = 36.5B new URLs/year
36.5B * 500 bytes ≈ 18.25 TB/year

Over 5 years, ~90TB. Comfortable for a sharded store. Storage is cheap; the key space is the real constraint (covered below).

Bandwidth:

Redirect response is tiny: HTTP 301 with a Location header ≈ 500 bytes.
Read bandwidth at peak: 350K * 500 bytes ≈ 175 MB/sec
Write bandwidth: 3.5K * 700 bytes (request body) ≈ 2.5 MB/sec

Negligible. This is a latency and request-rate problem, not a bandwidth problem.

Cache size:

The classic 80/20 rule: 20% of links generate 80% of traffic. Daily redirects ≈ 10B. Cache the hot 20% of one day’s reads.

Cache the hot working set. Suppose ~20% of distinct daily-accessed links are hot.
Distinct links touched/day ≈ a few hundred million, but hot set is smaller.
Cache ~100M entries * (7 byte key + 200 byte URL + overhead ≈ 250 bytes)
≈ 25 GB

Round up to ~50-75GB of cache, easily handled by a small Redis cluster. Hit rate target: >95% on the redirect path.

Key space sanity check (base62, length 7):

62^7 = 3.5 trillion keys
At 100M/day we consume 36.5B/year.
3.5T / 36.5B ≈ 95 years of runway at length 7.

So 7 characters is the right length. Length 6 (62^6 ≈ 56B) gives barely 1.5 years - too short. Length 7 it is.

High-Level Design (HLD)

Two distinct paths: a write path (create short URL) and a read path (redirect). They have wildly different traffic and SLAs, so I separate them logically even if they share infrastructure.

                          ┌─────────────────────────┐
                          │        Clients           │
                          │ (browsers, API callers)  │
                          └────────────┬─────────────┘
                                       │
                          ┌────────────▼─────────────┐
                          │   CDN / Edge (redirects)  │  301 cached at edge
                          └────────────┬─────────────┘
                                       │ miss
                          ┌────────────▼─────────────┐
                          │      Load Balancer        │
                          └──────┬───────────────┬────┘
                                 │               │
                  WRITE path     │               │  READ path
                  ┌──────────────▼──┐       ┌─────▼────────────┐
                  │  Create Service  │       │ Redirect Service │
                  │  (stateless)     │       │  (stateless)     │
                  └───┬──────────┬───┘       └────────┬─────────┘
                      │          │                    │
            ┌─────────▼──┐  ┌────▼─────────┐   ┌──────▼───────┐
            │ Key Gen    │  │ Write to DB  │   │ Redis Cache  │
            │ Service    │  │ (sharded)    │   │ (hot links)  │
            │ (counter)  │  └────┬─────────┘   └──────┬───────┘
            └────────────┘       │                    │ miss
                                 │            ┌────────▼─────────┐
                          ┌──────▼────────────▼──────────────┐
                          │     Sharded Key-Value Store        │
                          │   (short_key -> long_url)          │
                          │   shard by hash(short_key)         │
                          └──────────────┬─────────────────────┘
                                         │ async click events
                                  ┌──────▼────────┐
                                  │ Message Queue │ (Kafka)
                                  └──────┬────────┘
                                         │
                                  ┌──────▼────────┐
                                  │ Analytics     │
                                  │ (offline agg) │
                                  └───────────────┘

Write flow (create):

  1. Client POST /shorten with the long URL.
  2. Create Service validates the URL (scheme, length, basic sanity).
  3. Create Service requests a unique ID from the Key Generation Service.
  4. The ID is base62-encoded into a 7-char short key.
  5. The mapping short_key -> long_url is written to the sharded store (durable, replicated).
  6. The short URL is returned. Optionally warm the cache.

Read flow (redirect) - the hot path:

  1. Client requests GET sho.rt/aB3xK9.
  2. CDN/edge may already hold a cached 301 for popular links. Hit -> return immediately.
  3. On miss, Load Balancer routes to a Redirect Service instance.
  4. Redirect Service checks Redis. Hit (>95% of the time) -> return 301 with Location.
  5. On cache miss, read from the sharded store using short_key, populate cache, return 301.
  6. Asynchronously emit a click event to Kafka (fire-and-forget; never block the redirect on analytics).

The key insight is in step 6: analytics must never be in the critical path. A redirect that waits on an analytics write is a redirect that fails when analytics is down.

Component Deep Dive

The two hard parts are (1) key generation and (2) serving redirects at 350K RPS with single-digit-millisecond latency. I will walk each from naive to scalable.

1. Key Generation

This is the heart of the problem. Three candidate strategies. Let me go through each the way you would in an interview - start naive, break it, evolve.

Approach A: Hash the long URL (the naive instinct)

The obvious first idea: short_key = base62(md5(long_url))[:7].

import hashlib, base64
def shorten(long_url):
    h = hashlib.md5(long_url.encode()).digest()
    return base64.urlsafe_b64encode(h)[:7].decode()

Where it breaks:

  • Collisions. Truncating a 128-bit hash to 7 base62 chars (~42 bits) creates collisions. By the birthday bound, with billions of URLs you will hit collisions. Now every write must do a read-before-write to check “does this key exist with a different URL?” and rehash on conflict. That read-before-write at 3.5K WPS against a sharded DB is expensive and racy.
  • No natural dedup benefit. People assume hashing identical URLs to the same key is a feature. But you usually want different short links for the same destination (different owners, different expiry, different analytics). Hash-based keys fight that.
  • Predictable / enumerable if you are not careful, but that is secondary.

Hashing is the trap answer. It seems elegant and is actually the most operationally painful because of collision handling.

Approach B: Auto-increment counter + base62 (the clean idea)

Keep a global counter. Each new URL gets the next integer. Encode that integer in base62 to get the short key.

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num):
    if num == 0:
        return ALPHABET[0]
    s = []
    while num:
        num, rem = divmod(num, 62)
        s.append(ALPHABET[rem])
    return "".join(reversed(s))

Counter value 12345678901 becomes a short, unique, collision-free key. No collisions ever, because each integer is unique by construction. No read-before-write. This is the right core idea.

Where the naive version breaks: the single counter is a single point of failure and a write bottleneck. A SELECT ... FOR UPDATE; UPDATE counter SET val = val + 1 on one row cannot sustain 3.5K WPS, and if that row’s database dies, creation stops globally. We have moved the whole system’s availability onto one integer.

Approach C: Distributed counter via ranges (the scalable fix)

Do not hand out IDs one at a time from a global row. Hand out ranges (batches).

A central coordinator (a small, highly-available service backed by ZooKeeper/etcd, or a single auto-increment row hit rarely) allocates blocks of, say, 1,000 or 100,000 IDs to each Create Service instance. The instance serves those IDs locally from memory and only goes back to the coordinator when its block runs low.

Coordinator holds: next_block_start = 5,000,000,000

Create Service A asks for a block  -> gets [5,000,000,000 .. 5,000,099,999]
Create Service B asks for a block  -> gets [5,000,100,000 .. 5,000,199,999]

Each instance hands out IDs from its block with a local atomic counter.
When a block is ~90% used, it pre-fetches the next block (no stall).

Why this works:

  • The coordinator is hit once per 100,000 creates instead of once per create. At 3.5K WPS that is roughly one coordinator call every ~30 seconds. Trivial load, easy to make HA.
  • Each instance generates IDs from RAM at memory speed. No DB on the create hot path for ID minting.
  • IDs are globally unique because blocks never overlap.
  • If an instance crashes mid-block, we waste the rest of that block. That is fine - we have 3.5 trillion keys; burning a few thousand is irrelevant. We never reuse, which keeps uniqueness simple.

One caveat: predictability. Sequential counters produce sequential, enumerable keys (aB3xK8, aB3xK9, …). An attacker can scrape your whole link space. If that matters, scramble the integer before encoding with a fixed bijective permutation (e.g. multiply by a large odd constant mod 62^7, or Feistel-encrypt the integer). This keeps uniqueness and collision-freedom while making keys non-sequential. You get the best of both: counter guarantees, hash-like opacity.

This is the answer: distributed counter + base62, with optional bijective scrambling. Decision summary:

Strategy Collisions Hot-path DB read Availability Notes
Hash + truncate Yes (must check) Read-before-write OK Operationally painful
Single counter None None SPOF / bottleneck Clean but does not scale
Distributed range counter None None (RAM) HA via blocks The answer

Custom aliases

Custom aliases bypass the counter. The user supplies the key directly, so here we do need a uniqueness check: INSERT ... IF NOT EXISTS on the primary key. Because the short_key is the primary key of a sharded store, this is a single-shard conditional insert - cheap and atomic. To avoid custom aliases ever colliding with generated keys, reserve generated keys to length 7 from the counter space and either require custom aliases to be a different length/charset, or just let the conditional insert reject a clash and ask the user to pick another.

2. The Redirect Hot Path (350K RPS, sub-10ms)

The naive version: every redirect does SELECT long_url FROM urls WHERE short_key = ?.

Where it breaks: 350K RPS of point reads against any single database, even sharded, blows past connection limits and disk IOPS, and you cannot hold p99 under 10ms when a request hits disk. Redirects are read 100x more than they are written and most traffic concentrates on a small hot set, so this is a textbook cache problem.

Caching layers

  1. CDN / edge cache. Because a short link is immutable, the 301 response is perfectly cacheable. Put the redirect behind a CDN keyed on the short key. The hottest links (the billboard URL, the viral tweet) get served entirely at the edge, never touching your origin. Set a long Cache-Control max-age since the mapping never changes.

  2. Application cache (Redis). On CDN miss, the Redirect Service checks Redis: GET short_key. With a >95% hit rate and ~50-75GB of cache, almost everything resolves here in well under 1ms.

  3. Database (sharded KV). Only true long-tail misses reach the store. On a miss, read, then write back into Redis with a TTL so the next request is a hit (read-through cache).

Eviction: LRU. The hot working set is small relative to total links, and LRU naturally keeps the trending links resident.

301 vs 302 - a real trade-off:

Code Meaning Effect Use when
301 Permanent Browser caches the redirect; future clicks skip your server Max performance, but you lose per-click analytics after the first
302 Temporary Browser hits your server every time You want accurate click analytics

If analytics matter, use 302 (or 307) so every click comes back to you. If raw redirect performance matters more and you accept that browsers will cache, use 301. Most production shorteners use 302 precisely so they keep counting clicks.

Hot key problem

What if one link is so popular it overwhelms the single Redis shard that holds it (the viral-tweet scenario, 50K RPS for one key)? A single Redis node has a per-key throughput ceiling. Fixes:

  • Edge/CDN absorbs it - this is exactly what the CDN layer is for; a 301/long-cache 302 on a single super-hot key is served at the edge.
  • Local in-process cache on each Redirect Service instance (a tiny LRU of the top few thousand keys, TTL of a few seconds). 50K RPS spread across 50 service instances is 1K RPS each, trivially served from local memory.
  • Key replication: store the hottest keys on multiple Redis nodes and pick one at random per request.

Combination of edge cache + per-instance local cache solves the hot key problem completely.

3. Analytics Without Slowing Redirects

Naive: increment a click_count column on every redirect. Breaks because that is a write on the read path - at 350K RPS you have just turned your read-heavy system into a write-heavy one, and the hot link’s counter row becomes a contention point.

Fix: the Redirect Service emits a click event (short_key, timestamp, geo, referrer) to Kafka asynchronously and returns the redirect immediately. A downstream consumer aggregates clicks in batches and writes rollups to an analytics store (e.g. ClickHouse or a time-series DB). Counts are eventually consistent, which is fine - nobody needs exact-to-the-millisecond click counts, and we never block a redirect on a write.

API Design & Data Schema

API

POST /api/v1/shorten
Body:
{
  "long_url": "https://example.com/some/very/long/path?x=1",
  "custom_alias": "my-launch",      // optional
  "expires_at": "2027-01-01T00:00:00Z"  // optional
}
Response 201:
{
  "short_url": "https://sho.rt/aB3xK9",
  "short_key": "aB3xK9",
  "long_url": "https://example.com/some/very/long/path?x=1",
  "expires_at": "2027-01-01T00:00:00Z"
}
Errors:
  400  invalid URL
  409  custom_alias already taken
  429  rate limit exceeded
GET /{short_key}
Response 302:
  Location: https://example.com/some/very/long/path?x=1
  Cache-Control: private, max-age=0   (302, to keep counting clicks)
Errors:
  404  unknown or expired key
  410  key expired (optional, instead of 404)
GET /api/v1/stats/{short_key}
Response 200:
{
  "short_key": "aB3xK9",
  "total_clicks": 184231,
  "clicks_last_24h": 5120,
  "top_referrers": [...]
}

Data store choice: NoSQL key-value

The core access pattern is a primary-key lookup: short_key -> long_url. There are no joins, no range scans on the hot path, and we need horizontal scaling to ~100TB with predictable single-key latency. That is the exact sweet spot for a distributed key-value / wide-column store like DynamoDB or Cassandra, not a relational database.

Why not SQL: a single Postgres instance cannot hold 90TB or serve this read rate, and once you shard SQL manually you lose its main advantage (joins, transactions across rows) while keeping its operational weight. We do not need cross-row transactions here - each short link is independent and immutable. NoSQL with short_key as the partition key gives us automatic sharding and the latency profile we want.

The one place SQL-style guarantees help is the rarely-hit counter coordinator, which can be a single small relational row or, better, an etcd/ZooKeeper sequence - tiny data, high consistency needs.

Primary table: urls (key-value)

Partition key: short_key   (string, 7 chars)

short_key     STRING   PK     -- "aB3xK9"
long_url      STRING          -- the destination
created_at    TIMESTAMP
expires_at    TIMESTAMP NULL  -- TTL; store can auto-evict
creator_id    STRING NULL
is_custom     BOOLEAN

Index: primary key only (short_key). That is the only access path on the hot path.
TTL: enable native TTL on expires_at so expired rows are reaped automatically.

Counter coordinator (tiny, separate, strongly consistent)

counter_blocks
  block_id        BIGINT PK
  next_block_start BIGINT     -- monotonically increasing
  -- one logical row, updated once per ~100K creates

Analytics rollup (separate store, e.g. ClickHouse / time-series)

clicks
  short_key     STRING
  ts_bucket     TIMESTAMP   -- e.g. hourly bucket
  click_count   BIGINT
  geo           STRING
  referrer      STRING
PK / order by: (short_key, ts_bucket)

Keeping analytics in its own store means heavy analytical aggregation never touches the redirect store.

Bottlenecks & Scaling

Where it breaks first, in order, and the fix for each:

1. The redirect read rate (breaks first). 350K RPS cannot hit the DB. Fix: layered caching - CDN at the edge, Redis app cache (>95% hit rate), per-instance local LRU for hot keys, read-through on miss. The DB only sees long-tail traffic.

2. The key generator as SPOF / bottleneck. Fix: distributed range allocation. The coordinator is hit once per block (~every 30s), making it trivial to run as a small HA cluster (etcd/ZooKeeper). Instances pre-fetch the next block before exhausting the current one, so there is no stall. Lost blocks on crash are simply burned - the key space is enormous.

3. Database storage and write throughput. Fix: shard by short_key (hash partitioning). Because keys are uniformly distributed (especially with base62 of a counter), load spreads evenly across shards - no natural hot shard on writes. Add read replicas per shard for durability and to absorb cache-miss reads. Replication factor 3 for durability.

Shard key choice: short_key. Reads are always by short_key, so this is the natural partition key and every redirect read is a single-shard, single-key lookup. Sharding by created_at or creator_id would be wrong - it would scatter the by-key reads across shards.

4. Hot key (single viral link). Fix: CDN absorbs the bulk; per-instance local cache spreads the rest across all service nodes; optionally replicate the hottest keys across multiple cache nodes. Covered in the deep dive.

5. Analytics writes contaminating the read path. Fix: emit click events to Kafka asynchronously, aggregate offline. Redirects never block on a write. Counts are eventually consistent.

6. Single-region latency for global users. Fix: deploy Redirect Service + cache in multiple regions; route via GeoDNS/anycast. Because links are immutable, multi-region replication is easy - no write conflicts to reconcile. The write path can stay primary-region with async replication out; a brief replication lag only means a just-created link might 404 in a far region for a second, which we mitigate by serving creates with read-your-writes from the primary and warming caches on create.

7. Single points of failure. Stateless Create and Redirect services behind load balancers (scale horizontally, any instance handles any request). Coordinator runs as an HA quorum. Cache and DB are replicated. No single box whose loss stops redirects.

Wrap-Up

The trade-offs that define this design:

  • Distributed counter + base62 over hashing. We trade a tiny central coordinator (hit rarely, made HA) for zero collisions and zero read-before-write. With bijective scrambling we also kill key enumeration. This is the single most important call.
  • NoSQL key-value over SQL. The access pattern is pure primary-key lookup at massive scale with no joins, so we take automatic sharding and predictable latency and give up relational features we do not need.
  • Cache everything, immutability makes it safe. Because mappings never change, we cache aggressively at the edge, in Redis, and in-process, with no invalidation headaches. This is what makes 350K RPS at sub-10ms feasible.
  • Analytics off the hot path. Async events to Kafka, eventual counts. We refuse to let a write or an analytics outage break a redirect.
  • 302 over 301 when click analytics matter, accepting more origin traffic to keep counting.

One-line summary: a read-heavy, immutable key-value system where a distributed range-allocated counter mints collision-free base62 keys, and layered edge plus Redis plus in-process caching serves redirects at hundreds of thousands of requests per second in single-digit milliseconds, with analytics pushed entirely off the critical path.