1. Requirements & Scope (5 min)
Functional Requirements
- As the user types in a search box, show top 5-10 suggestions matching the prefix
- Suggestions should be ranked by popularity (search frequency)
- Suggestions update as new search trends emerge (near-real-time)
- Support personalization (user’s own search history weighted higher)
- Handle typos/fuzzy matching (optional stretch goal)
Non-Functional Requirements
- Availability: 99.99% — broken autocomplete degrades the entire search experience
- Latency: Suggestions must appear in < 50ms at p99 (users expect instant response as they type)
- Consistency: Eventual consistency is fine — if a new trending search takes a few minutes to appear in suggestions, that’s acceptable
- Scale: 10B search queries/day, suggestions requested on every keystroke → 50B+ suggestion requests/day (average query = 5 keystrokes)
2. Estimation (3 min)
Traffic
- 50B suggestion requests/day ÷ ~100K seconds/day (86,400 rounded up) ≈ 500,000 requests/sec
- Peak: 3x → 1.5M requests/sec
- This is an extremely high-QPS system — latency and throughput are everything
Data Size
- Vocabulary: ~5M unique search phrases (with frequency counts)
- Each phrase: avg 30 chars + frequency count + metadata = ~100 bytes
- Total vocabulary: 5M × 100 bytes = 500MB — fits entirely in memory
Key Insight
This is a memory-bound, latency-critical system. The entire dataset fits in RAM. The challenge is serving 1.5M QPS at < 50ms consistently.
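The napkin math above can be reproduced in a few lines (seconds per day rounded to 100K, as in the traffic estimate):

```python
# Back-of-envelope numbers from the estimates above.
seconds_per_day = 100_000                  # 86,400 rounded up for napkin math

requests_per_day = 50_000_000_000          # 50B suggestion requests/day
avg_qps = requests_per_day / seconds_per_day
peak_qps = avg_qps * 3                     # 3x daily peak

phrases = 5_000_000                        # unique search phrases
bytes_per_phrase = 100                     # ~30 chars + count + metadata
vocab_mb = phrases * bytes_per_phrase / 1_000_000

print(f"{avg_qps:,.0f} avg QPS, {peak_qps:,.0f} peak QPS, {vocab_mb:.0f} MB vocabulary")
```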
3. API Design (3 min)
GET /api/v1/suggest?q={prefix}&limit=10&user_id={user_id}
Response 200: {
  "suggestions": [
    { "text": "system design interview", "score": 9850 },
    { "text": "system design primer", "score": 7200 },
    { "text": "system design patterns", "score": 5100 },
    { "text": "system design youtube", "score": 4800 },
    { "text": "system design book", "score": 3200 }
  ]
}
// Internal: update search frequency (called by Search Service)
POST /api/v1/suggest/record
Body: { "query": "system design interview" }
Response 204
Key decisions:
- No authentication required — autocomplete is called on every keystroke, adding auth overhead would kill latency
- Client-side debouncing — client should wait 100-200ms after last keystroke before requesting (reduces QPS by 3-5x)
- Limit parameter — default 5-10 suggestions, never more than 20
- User context optional — for personalization (boost user’s recent searches)
4. Data Model (3 min)
Trie (in-memory, per-server)
Trie Node:
char | single character
children | map<char, TrieNode>
is_end | boolean (marks end of a complete phrase)
top_k | list<(phrase, score)> -- pre-computed top-K for this prefix
Pre-computed top-K at each node:
- Each trie node stores the top 10 suggestions for its prefix
- When user types “sys”, we look up the node at s→y→s and immediately return its top_k
- This makes lookup O(L) where L = prefix length — typically 3-5 characters → ~5 pointer traversals = sub-microsecond
Why not query-time sorting?
- For prefix “s”, there might be 500K matching phrases
- Sorting 500K items per request at 1.5M QPS is impossible
- Pre-computing top-K at build time moves the cost offline
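A minimal sketch of this data model (illustrative names, not a production layout): each node carries its pre-computed top_k, so lookup is a plain O(L) walk with no query-time sorting. Here top_k is maintained incrementally at build time for simplicity.

```python
from dataclasses import dataclass, field

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)   # char -> TrieNode
    is_end: bool = False
    top_k: list = field(default_factory=list)      # [(phrase, score)], sorted desc

class SuggestTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, phrase: str, score: int, k: int = 10):
        """Build-time only: update top-K at every node along the path."""
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
            node.top_k.append((phrase, score))
            node.top_k.sort(key=lambda e: -e[1])
            del node.top_k[k:]                     # keep only the K best
        node.is_end = True

    def suggest(self, prefix: str) -> list:
        """Hot path: O(L) pointer traversals, then return the cached list."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return node.top_k

trie = SuggestTrie()
trie.insert("system design interview", 9850)
trie.insert("system design primer", 7200)
print(trie.suggest("sys"))   # both phrases, highest score first
```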
Frequency Data (Kafka → Redis → Batch Rebuild)
Redis (real-time frequency tracking):
Key: freq:{normalized_query}
Value: count (incremented on each search)
Aggregated data (periodic snapshot):
query_text | frequency | updated_at
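The frequency-tracking shape can be sketched as follows. With redis-py this would be `r.incr(f"freq:{normalized}")` against a real instance; an in-memory Counter stands in here so the normalization and key layout are visible.

```python
import re
from collections import Counter

freq = Counter()  # stand-in for Redis; INCR on a Redis key is atomic

def normalize(query: str) -> str:
    # Lowercase and collapse whitespace so "System  Design" and
    # "system design" increment the same counter.
    return re.sub(r"\s+", " ", query.strip().lower())

def record_search(query: str) -> int:
    key = f"freq:{normalize(query)}"
    freq[key] += 1
    return freq[key]

record_search("System Design Interview")
record_search("system  design interview")
print(freq["freq:system design interview"])  # 2
```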
5. High-Level Design (12 min)
Suggestion Serving Path (hot path — must be fast)
Client types "sys"
→ Client debounces (100ms)
→ GET /suggest?q=sys
→ Load Balancer → Suggestion Server
→ Trie lookup: s → y → s → return top_k
→ (Optional) Merge with personalized results from user history
→ Return top 10 suggestions
→ Total time: < 5ms
Data Update Path (offline/near-real-time)
User performs a search → Search Service
→ Publish to Kafka: search_queries topic
→ Real-time aggregator (Flink/Spark Streaming):
→ Count query frequencies in sliding windows (1h, 24h)
→ Write updated frequencies to Redis
→ Periodic Trie Rebuilder (every 15-30 minutes):
→ Read all query frequencies from Redis
→ Build new Trie with pre-computed top-K at each node
→ Serialize Trie → Publish to Trie Distribution Service
→ Each Suggestion Server downloads the new Trie, swaps it in atomically
Components
- Suggestion Servers (stateful): Each holds the complete Trie in memory. Horizontal scaling = more replicas.
- Trie Builder (periodic batch job): Builds new Trie from frequency data. Runs every 15-30 minutes.
- Frequency Aggregator (Flink/Spark Streaming): Processes search events from Kafka, maintains frequency counts in Redis.
- Redis: Real-time frequency counts, user search history (for personalization)
- Kafka: Search event stream
- Trie Distribution Service (S3 or internal blob store): Stores serialized Trie snapshots. Suggestion Servers pull new versions.
6. Deep Dives (15 min)
Deep Dive 1: Trie Design and Optimization
Basic Trie → Compressed Trie (Patricia Trie):
- Many paths in a Trie have no branching (e.g., “system” = s→y→s→t→e→m with no branches)
- Compressed Trie merges single-child paths: “system” stored as one node
- Memory reduction: 40-60% fewer nodes
Pre-computed top-K propagation:
- Build bottom-up: leaf nodes have their own frequency
- At each internal node, top-K is the union of its children’s top-K, re-sorted
- Space: each node stores 10 entries × ~50 bytes = 500 bytes
- For 5M unique queries with avg 30 chars → ~150M character positions as an upper bound; prefix sharing and path compression bring this down to ~50M nodes
- 50M × 500 bytes of top-K = 25GB per server — fits in a mid-range server’s RAM
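The bottom-up propagation step can be sketched as a K-way merge: each child’s top-K list is already sorted by score, so the parent’s top-K is the K best entries across them (names here are illustrative).

```python
import heapq

def merge_top_k(children_top_k, k=10):
    """Parent's top-K = K best entries across children's sorted top-K lists."""
    # heapq.merge consumes ascending streams, so merge on negated score.
    merged = heapq.merge(*children_top_k, key=lambda e: -e[1])
    out, seen = [], set()
    for phrase, score in merged:
        if phrase not in seen:       # each phrase lives under one child,
            seen.add(phrase)         # but dedupe defensively
            out.append((phrase, score))
        if len(out) == k:
            break
    return out

left  = [("system design interview", 9850), ("system design book", 3200)]
right = [("system design primer", 7200)]
print(merge_top_k([left, right], k=2))
```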
Atomic Trie swap:
- Build new Trie in a background thread
- When complete, atomically swap the pointer (read pointer → new Trie)
- Old Trie is garbage collected after all in-flight requests complete
- Zero-downtime updates
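The swap itself reduces to replacing one reference. A sketch (assuming CPython, where a reference assignment is atomic; other runtimes would use an atomic reference type):

```python
import threading

class TrieHolder:
    """Requests read the active trie through one reference; the rebuild
    thread replaces that reference in a single step."""
    def __init__(self, trie):
        self._active = trie
        self._lock = threading.Lock()   # serializes writers, not readers

    def get(self):
        return self._active

    def swap(self, new_trie):
        with self._lock:
            self._active = new_trie     # in-flight requests keep the old trie
                                        # alive via their own reference; it is
                                        # garbage collected once they finish

holder = TrieHolder({"version": 1})
snapshot = holder.get()                 # a request grabs the current trie
holder.swap({"version": 2})             # rebuild completes mid-request
print(snapshot["version"], holder.get()["version"])  # 1 2
```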
Sharding for larger vocabularies:
- If vocabulary exceeds single-server memory, shard by first character (or first 2 chars)
- 26 shards for a-z, or 676 shards for aa-zz
- Request routing: hash first char(s) of prefix → correct shard
- But for most systems, 5M unique queries at 25GB is fine for a single server with replicas
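Shard routing is deterministic from the prefix itself; a sketch for the 26-shard layout, with a catch-all shard (an assumption here) for prefixes outside a-z:

```python
def route(prefix: str) -> str:
    """Pick the shard owning the subtree for this prefix's first character."""
    first = prefix[0].lower()
    if first.isascii() and first.isalpha():
        return f"shard-{first}"
    return "shard-misc"          # digits, punctuation, non-Latin scripts

print(route("system design"))    # shard-s
print(route("2026 election"))    # shard-misc
```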
Deep Dive 2: Trending Queries and Freshness
The problem: A static Trie built from historical data won’t surface breaking news or trending searches. “Election results 2026” should appear within minutes of becoming a trending search, not after the next 30-minute Trie rebuild.
Solution: Two-layer architecture
- Static Trie (rebuilt every 30 min): Contains all queries with stable frequency. This is the primary data source.
- Trending cache (updated every 1-2 min): Kafka → Flink streaming aggregation with a sliding 1-hour window. Detect queries with abnormal velocity (frequency / historical_avg > 5x). Store trending queries in a small Redis sorted set.
- At query time: Merge results from static Trie + trending cache. Trending queries get a boost factor in scoring.
Decay function: Trending score = recent_count × decay_factor^(minutes_since_first_spike). This ensures trending queries rise fast and also fade naturally.
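The decay function above in code form (the 0.95 decay factor is an illustrative choice, not from the text):

```python
def trending_score(recent_count: int, minutes_since_spike: float,
                   decay_factor: float = 0.95) -> float:
    """Score spikes on a burst of searches, then fades exponentially."""
    return recent_count * decay_factor ** minutes_since_spike

fresh = trending_score(10_000, minutes_since_spike=0)    # full score at spike
hour  = trending_score(10_000, minutes_since_spike=60)   # ~5% of peak an hour later
print(f"{fresh:.0f} {hour:.0f}")
```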
Content filtering: Trending queries pass through a filter before being shown:
- Blocklist of offensive terms
- Adult content filter
- Spam detection (bot-driven query patterns)
Deep Dive 3: Personalization
User search history:
- Store last 1000 searches per user in Redis: user_history:{user_id} (sorted set by timestamp)
- On suggestion request with user_id:
  - Get top-K from Trie (global popularity)
  - Get matching queries from user’s history
  - Merge with weighted scoring: final_score = 0.7 × global_score + 0.3 × personal_score
- Personal score = recency × frequency (how often and how recently the user searched this)
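The weighted merge can be sketched as below, using the 0.7/0.3 split from the formula above (and assuming both score sources are pre-normalized to comparable ranges):

```python
def merge_personal(global_results, personal_results, limit=5):
    """Blend global trie scores with the user's personal history scores."""
    scores = {}
    for text, s in global_results:
        scores[text] = 0.7 * s
    for text, s in personal_results:
        scores[text] = scores.get(text, 0.0) + 0.3 * s
    return sorted(scores.items(), key=lambda e: -e[1])[:limit]

global_top = [("system design interview", 100), ("system design primer", 80)]
personal   = [("system design primer", 90)]   # user searched this recently
print(merge_personal(global_top, personal))   # primer outranks interview
```

A query in the user’s own history can outrank a globally more popular one, which is exactly the intended personalization effect.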
Privacy consideration:
- User can clear search history → delete Redis key
- Search history stored per-user, not shared
- Personalization is optional (works without user_id)
Edge case — cold start:
- New user with no history → pure global suggestions
- As they search more, personalization kicks in gradually
7. Extensions (2 min)
- Fuzzy matching: Handle typos using edit distance. Build a secondary index of common misspellings → correct query mappings. At query time, if exact prefix has few results, try fuzzy alternatives.
- Multi-language: Separate Tries per language. Detect language from input characters or user locale.
- Category-aware suggestions: “apple” → suggest “apple stock price” (finance), “apple macbook” (tech), “apple pie recipe” (food) with category labels.
- Query completion vs entity suggestion: Alongside query completions, show entity cards (e.g., typing “barack” → show “Barack Obama” with a photo, Wikipedia snippet). Requires an entity knowledge graph.
- Offline/client-side: For mobile apps, ship a compressed Trie of top 100K queries in the app binary. Serve suggestions locally when offline. Sync with server when online.