You walk into a system design interview. The interviewer says: “Design a URL shortener” or “Design WhatsApp” or “Design a distributed task scheduler.”
You have 45 minutes. The clock starts.
What do most candidates do? They grab the marker and start drawing boxes. API gateway here, database there, a cache somewhere in the middle. Fifteen minutes in, the interviewer asks “what happens when a server goes down?” and the whole design falls apart because nobody talked about availability requirements upfront.
The single biggest differentiator in system design interviews isn’t technical depth. It’s structure. A well-structured interview where you systematically build up from requirements to a complete design will always beat a chaotic whiteboard session where you’re randomly jumping between components.
Here’s the exact structure, phase by phase, with time allocations for a 45-minute round.
The Overview
| Phase | Time | What You’re Doing |
|---|---|---|
| 1. Requirements & Scope | 5 min | Nail down what you’re building |
| 2. Estimation & Constraints | 3-5 min | Understand the scale |
| 3. API Design | 3-5 min | Define the contract |
| 4. Data Model | 3-5 min | Design the storage |
| 5. High-Level Design | 10-12 min | Draw the architecture |
| 6. Deep Dives | 10-12 min | Go deep on 2-3 hard problems |
| 7. Wrap-Up & Extensions | 2-3 min | Show you think beyond the immediate ask |
The first four phases are foundation. The last three are where you actually design. Most candidates spend too much time on foundation or skip it entirely. Neither works. You need a fast, thorough foundation so you have maximum time for the actual design where the interesting decisions live.
A note on ordering: This structure puts API Design and Data Model before the High-Level Design. That’s intentional for most product-design problems (Design Twitter, Design Uber, Design WhatsApp) — knowing the endpoints and data shapes first makes the architecture feel deliberate rather than invented on the fly. But for infrastructure-focused problems (Design a Rate Limiter, Design a Distributed Cache, Design a Task Scheduler), the API is often trivial and the interesting part is component interaction. In those cases, sketch the HLD first, then fill in the API and data model as you go. Read the problem and adapt.
Phase 1: Requirements & Scope (5 minutes)
This phase exists to prevent you from designing the wrong system.
The interviewer gave you a vague prompt on purpose. “Design Twitter” could mean the tweet timeline, the notification system, the search infrastructure, or the ad targeting pipeline. These are completely different systems. If you don’t scope it, you’ll design something broad and shallow, which is exactly what the interviewer doesn’t want.
Functional Requirements
Ask clarifying questions to narrow the scope to 3-5 core features. The goal is to agree on exactly what the system does.
Good questions:
- “Are we focusing on the write path (posting tweets) or the read path (timeline generation), or both?”
- “Do we need to handle media uploads, or just text?”
- “Should the system support real-time updates, or is polling acceptable?”
- “Are we designing for a single region or globally distributed?”
What you should end up with: A clear, numbered list of 3-5 features that you and the interviewer agree on.
Example for “Design Twitter”:
- User can post a tweet (text only, 280 chars)
- User can follow/unfollow other users
- User can view their home timeline (tweets from people they follow)
- Timeline should be near-real-time (within seconds of posting)
Notice what’s not on this list: search, trending, DMs, notifications, media, ads. You explicitly scoped those out. If the interviewer wants to include any of them, they’ll tell you. If they don’t, you just saved yourself from designing five systems in 45 minutes.
Non-Functional Requirements
This is where you quantify the “how well” instead of the “what.” State these proactively - don’t wait to be asked.
- Availability: “I’d target 99.99% uptime. A social feed being down is immediately visible to millions.”
- Latency: “Timeline loads should be under 200ms at p99. Users expect near-instant feed loading.”
- Consistency: “I’d accept eventual consistency for the timeline. It’s okay if a tweet takes 2-3 seconds to appear in all followers’ feeds. But the write path should be strongly consistent - if you post a tweet and refresh, you must see it.”
- Scale: “Let’s assume 500M DAU, with each user viewing their timeline ~10 times a day.”
The key here is trade-offs. When you say “eventual consistency for the read path,” you’re not just stating a requirement. You’re signaling that you understand the CAP theorem and have made a deliberate architectural choice. The interviewer now knows you’ll choose AP over CP for the timeline service.
Common Mistake
Don’t spend 10 minutes here playing 20 questions. Five minutes max. If you’re unsure about something, state your assumption and move on: “I’ll assume we’re designing for a global audience of 500M DAU. I’ll revisit if that changes the design significantly.”
Phase 2: Estimation & Constraints (3-5 minutes)
You’ve scoped the system. Now you need to know how big it is, because the architecture for 1,000 users is fundamentally different from the architecture for 1 billion users. For a deep dive on estimation techniques, see the back-of-envelope calculation guide.
Estimate these (only the ones relevant to your system):
- QPS (Queries per Second): How many reads and writes per second?
- Storage: How much data over 5 years?
- Bandwidth: What’s the data transfer rate?
Quick Example (Twitter Timeline)
Writes:
- 500M DAU, let’s say 1% post daily = 5M tweets/day
- 5M / 100K seconds per day = 50 tweets/second (write QPS)
- Peak: 5x average = 250 tweets/second
Reads:
- 500M DAU x 10 timeline loads/day = 5B reads/day
- 5B / 100K = 50,000 reads/second (read QPS)
- Peak: 250,000 reads/second
Storage:
- Each tweet: ~300 bytes (text + metadata)
- 5M tweets/day x 300 bytes = 1.5 GB/day
- Per year: ~550 GB for tweets alone (~2.7 TB over 5 years)
What this tells you: Read-heavy system (1000:1 read-to-write ratio). You need aggressive caching and probably pre-computed timelines. The write path is relatively light. This directly shapes your architecture.
Don’t spend more than 3-5 minutes here. Round aggressively. The point isn’t precision - it’s knowing the order of magnitude so you can make the right architectural calls.
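The arithmetic above is simple enough to sanity-check in a few lines. A minimal sketch — all the traffic numbers are the interview assumptions from this example, not real Twitter figures:

```python
# Back-of-envelope estimation for the Twitter-timeline example.
# Every input here is an interview assumption, not a real Twitter figure.

SECONDS_PER_DAY = 100_000  # ~86,400, rounded up for easy mental math

dau = 500_000_000
tweets_per_day = int(dau * 0.01)   # 1% of users post daily -> 5M tweets/day
timeline_loads = dau * 10          # 10 timeline loads per user -> 5B reads/day

write_qps = tweets_per_day // SECONDS_PER_DAY   # 50
read_qps = timeline_loads // SECONDS_PER_DAY    # 50,000
peak_factor = 5                                 # rough peak-to-average multiplier

storage_per_day_gb = tweets_per_day * 300 / 1e9  # ~300 bytes/tweet -> 1.5 GB/day

print(f"write QPS: {write_qps} (peak {write_qps * peak_factor})")
print(f"read QPS:  {read_qps} (peak {read_qps * peak_factor})")
print(f"read:write ratio: {read_qps // write_qps}:1")
print(f"storage: {storage_per_day_gb:.1f} GB/day")
```

The 1000:1 ratio falls straight out of the inputs, which is the whole point: the conclusion (cache aggressively, pre-compute timelines) is forced by the numbers, not by taste.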
Phase 3: API Design (3-5 minutes)
Before you draw a single box, define the contract. What does the client send, and what does it get back?
This forces you to think about the system from the user’s perspective before diving into internals. It also gives you a clear foundation to build on - every API endpoint will eventually map to a flow through your architecture.
```
POST /v1/tweets
  Body:     { text: string, media_ids?: string[] }
  Response: { tweet_id: string, created_at: timestamp }

GET /v1/timeline?cursor={cursor}&limit=20
  Response: { tweets: Tweet[], next_cursor: string }

POST /v1/follow
  Body:     { target_user_id: string }
  Response: { status: "ok" }
```
Key decisions to call out:
- Cursor-based pagination over offset-based (more efficient for feeds, handles insertions/deletions)
- Idempotency keys for write operations if needed
- Rate limiting considerations (mention it, don’t design it)
You don’t need to design every endpoint. Cover the core 3-4 that map to your functional requirements. The interviewer can see that you think in terms of API contracts, which signals real-world design experience.
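Cursor-based pagination is worth being able to sketch on demand. One common approach — an assumption here, not something the endpoints above mandate — is to encode the last item's sort key into an opaque, URL-safe token:

```python
import base64
import json

def encode_cursor(created_at: int, tweet_id: str) -> str:
    """Pack the last returned item's sort key into an opaque cursor."""
    raw = json.dumps({"t": created_at, "id": tweet_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple:
    """Recover (created_at, tweet_id) from the opaque cursor."""
    data = json.loads(base64.urlsafe_b64decode(cursor))
    return data["t"], data["id"]

# The next-page query becomes "WHERE (created_at, tweet_id) < cursor",
# which stays correct even if rows are inserted or deleted between
# requests - unlike OFFSET, which silently shifts.
cursor = encode_cursor(1700000000, "tw_123")
```

This is why cursors handle insertions and deletions gracefully: the query is anchored to a value, not to a position.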
Phase 4: Data Model (3-5 minutes)
Now define what you’re storing and where.
```
Users table:
  user_id (PK), username, created_at

Tweets table:
  tweet_id (PK), user_id (FK), content, created_at

Follows table:
  follower_id, followee_id, created_at
  PK: (follower_id, followee_id)
```
Key decisions to call out:
- SQL vs NoSQL: “Tweets are append-heavy and we need fast reads by user. I’d use a NoSQL store like Cassandra for tweets, with user_id as the partition key. The Follows table is relational and has simple access patterns, so SQL (PostgreSQL) works fine.” For throughput numbers to back this decision, see the database ops/sec guide.
- Indexes: What queries will you run? Index accordingly.
- Denormalization: Will you denormalize for read performance? State the trade-off.
Don’t over-design this. You’re showing the interviewer you think about storage and access patterns before drawing architecture diagrams. The schema will evolve as you build the high-level design.
Phase 5: High-Level Design (10-12 minutes)
This is the main event. You have requirements, constraints, APIs, and a data model. Now draw the architecture.
Start Simple
Begin with a basic flow: client → load balancer → service → database. Then evolve it based on your requirements and constraints.
Don’t start with 15 boxes. Start with 3, then add components as you encounter problems. This shows the interviewer your design is driven by requirements, not by a memorized template.
Build Incrementally
Walk through each core flow:
Write flow (posting a tweet):
- Client → Load Balancer → Tweet Service → Database (write)
- “But we have 50K read QPS. We can’t query the database for every timeline load.”
- Add Timeline Service + Cache (Redis/Memcached) for pre-computed timelines
- “When a user posts a tweet, we need to update all followers’ timelines.”
- Add Message Queue (Kafka) → Fan-out Service that pushes the tweet to each follower’s cached timeline
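The write flow above can be sketched end to end. This is a single-process sketch with an in-memory queue standing in for Kafka; the data shapes and names are illustrative assumptions, not a real implementation:

```python
import queue
import threading

db = []                                      # stands in for the tweets table
timelines = {}                               # follower_id -> cached timeline (newest first)
followers = {"alice": ["bob", "carol"]}      # author -> followers
fanout_queue = queue.Queue()                 # stands in for Kafka

def post_tweet(user_id, text):
    """Write path: persist, enqueue fan-out, return immediately."""
    tweet = {"id": f"tw_{len(db)}", "user": user_id, "text": text}
    db.append(tweet)            # durable write
    fanout_queue.put(tweet)     # async fan-out; does not block the caller
    return tweet["id"]

def fanout_worker():
    """Consumes fan-out events and pushes into each follower's cached timeline."""
    while True:
        tweet = fanout_queue.get()
        for f in followers.get(tweet["user"], []):
            timelines.setdefault(f, []).insert(0, tweet["id"])
        fanout_queue.task_done()

threading.Thread(target=fanout_worker, daemon=True).start()
tweet_id = post_tweet("alice", "hello")   # returns without waiting for fan-out
fanout_queue.join()                        # (demo only) wait for fan-out to drain
```

The key property to narrate: `post_tweet` returns as soon as the write is durable and the event is enqueued; the expensive fan-out happens off the critical path.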
Read flow (loading timeline):
- Client → Load Balancer → Timeline Service → Cache hit? → Return
- Cache miss? → Query database, build timeline, cache it, return
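The read flow is the classic cache-aside pattern. A sketch with a plain dict standing in for Redis (the function names are illustrative; a counter shows the cache absorbing repeat reads):

```python
db_calls = 0    # counts database hits, to demonstrate the cache working
cache = {}      # stands in for Redis: user_id -> cached timeline

def load_timeline_from_db(user_id):
    """Expensive fallback: in the real system, a merge over tweets + follows."""
    global db_calls
    db_calls += 1
    return ["tw_2", "tw_1", "tw_0"]

def get_timeline(user_id):
    timeline = cache.get(user_id)         # 1. try the cache
    if timeline is None:                  # 2. miss: rebuild from the database
        timeline = load_timeline_from_db(user_id)
        cache[user_id] = timeline         # 3. populate for the next reader
    return timeline

get_timeline("bob")   # miss: one database query, cache warmed
get_timeline("bob")   # hit: served from cache, no database query
```

With a 1000:1 read-to-write ratio, nearly every request takes the three-line hot path; the database only sees cold-cache rebuilds.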
Call Out Decisions
As you add each component, explain why:
- “I’m adding a message queue here because fan-out is expensive and we don’t want it blocking the write path. The user should get a confirmation that their tweet was posted without waiting for all followers to be updated.”
- “I’m using a pull-based model for users with millions of followers (celebrities) and push-based for regular users. Pure push doesn’t scale when one user has 50M followers.”
This is where the interviewer judges your design thinking. They don’t care that you know what Kafka is. They care that you can articulate why Kafka is the right choice here and what happens if you don’t use it.
The Diagram
Your diagram should have:
- Clear data flow arrows (read path vs write path, ideally different colors)
- Component names with technology choices where relevant
- Data stores with their type (SQL, NoSQL, Cache, Blob store)
- Asynchronous boundaries clearly marked (queues, event streams)
Keep it clean. A readable diagram with 8 well-placed components beats a cluttered one with 20 boxes connected by spaghetti arrows.
Phase 6: Deep Dives (10-12 minutes)
This is where you prove you’re not just drawing boxes from a template. Pick 2-3 hard problems in your design and go deep. The interviewer will often guide you here, but it’s better if you proactively identify the interesting problems.
What Makes a Good Deep Dive?
The hard problems that require trade-offs. Not “how does a load balancer work” but “how do we handle the celebrity fan-out problem” or “how do we ensure exactly-once delivery for notifications.”
For the Twitter example, strong deep dives would be:
1. Fan-out Strategy (Push vs Pull vs Hybrid)
“For users with fewer than 10K followers, we use push (fan-out-on-write): when they tweet, we immediately push to all followers’ cached timelines. This is fast for reads - the timeline is pre-computed.
For celebrities with millions of followers, we use pull (fan-out-on-read): we don’t pre-compute. When someone loads their timeline, we merge their pre-computed timeline with a real-time fetch of celebrity tweets they follow. This avoids writing to millions of cache entries for every celebrity tweet.
The threshold (10K) is tunable. We’d monitor and adjust based on fan-out latency.”
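The hybrid strategy reduces to one branch at write time plus a merge at read time. A minimal sketch — the threshold and function names are assumptions taken from the discussion above:

```python
CELEBRITY_THRESHOLD = 10_000   # tunable; adjust based on observed fan-out latency

def fanout_strategy(follower_count: int) -> str:
    """Push (fan-out-on-write) for regular users, pull (fan-out-on-read)
    for celebrities, so one tweet never triggers millions of cache writes."""
    return "pull" if follower_count >= CELEBRITY_THRESHOLD else "push"

def build_timeline(precomputed, celebrity_tweets, limit=20):
    """Read path: merge the cached timeline with a live fetch of celebrity
    tweets, newest first. Each entry is a (timestamp, tweet_id) pair."""
    merged = sorted(precomputed + celebrity_tweets, reverse=True)
    return merged[:limit]
```

Usage: `build_timeline([(3, "a"), (1, "b")], [(2, "celeb")])` interleaves the celebrity tweet into the pre-computed feed by timestamp, which is exactly the read-time cost the pull model trades for cheap writes.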
2. Cache Design
“Each user’s timeline is stored in Redis as a sorted set, scored by timestamp. When a new tweet is fanned out, we ZADD it. For reads, we ZREVRANGE with cursor-based pagination.
Cache eviction: we keep the most recent 800 tweets per user. If someone scrolls beyond that, we fall back to a database query.
Cache invalidation: when a user unfollows someone, we need to remove that person’s tweets from the cached timeline. This is expensive, so we do it lazily - the next timeline rebuild will exclude them.”
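The sorted-set design maps cleanly onto code. A self-contained sketch that mimics the Redis ZADD/ZREVRANGE semantics with a plain sorted list, so it runs without a Redis server; the 800-entry cap comes from the description above:

```python
import bisect

MAX_TIMELINE = 800   # keep only the most recent N tweets per user

class TimelineCache:
    """Per-user timeline as a (score, member) sorted set, mimicking Redis:
    zadd ~ ZADD, latest ~ ZREVRANGE, the trim ~ ZREMRANGEBYRANK."""

    def __init__(self):
        self._sets = {}   # user_id -> list of (timestamp, tweet_id), ascending

    def zadd(self, user_id, timestamp, tweet_id):
        entries = self._sets.setdefault(user_id, [])
        bisect.insort(entries, (timestamp, tweet_id))   # keep sorted by score
        if len(entries) > MAX_TIMELINE:                 # evict the oldest
            del entries[: len(entries) - MAX_TIMELINE]

    def latest(self, user_id, limit=20):
        """Newest first, like ZREVRANGE 0 limit-1."""
        return [tid for _, tid in reversed(self._sets.get(user_id, [])[-limit:])]

cache = TimelineCache()
cache.zadd("bob", 100, "tw_1")
cache.zadd("bob", 300, "tw_3")
cache.zadd("bob", 200, "tw_2")
```

After those three inserts, `cache.latest("bob", 2)` returns the two newest tweets regardless of insertion order, which is what the score-by-timestamp design buys you.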
3. Availability & Failure Handling
“What if the fan-out service goes down? The message queue (Kafka) retains messages for 7 days. When the service recovers, it resumes from where it left off. No tweets are lost.
What if Redis goes down? We run a Redis cluster with replication. If a primary fails, a replica is promoted automatically. For the cold-cache case, we rebuild from the database - slower for the first few requests, but recoverable.”
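The no-loss claim rests on consumer offsets: the log retains events, and the consumer only advances its committed position after processing each one. A minimal simulation — the in-memory list stands in for a Kafka topic, and the names are illustrative:

```python
log = ["tw_0", "tw_1", "tw_2", "tw_3"]   # retained events (the Kafka topic)
committed_offset = 0                      # last durably committed position
processed = []

def run_consumer(crash_after=None):
    """Process events from the committed offset; commit only after each
    success. Gives at-least-once delivery across crashes."""
    global committed_offset
    handled = 0
    while committed_offset < len(log):
        if crash_after is not None and handled == crash_after:
            return                        # simulate the fan-out service dying
        processed.append(log[committed_offset])
        committed_offset += 1             # commit after processing
        handled += 1

run_consumer(crash_after=2)   # crashes after handling tw_0, tw_1
run_consumer()                # restart: resumes from offset 2, nothing lost
```

Commit-after-process means a crash between processing and committing replays that event, so downstream handlers should be idempotent - a good thing to say out loud in the interview.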
How to Choose Deep Dives
Prioritize problems where:
- The naive solution doesn’t scale (fan-out)
- There are genuine trade-offs with no single right answer (consistency vs latency)
- Failure handling is non-obvious (what happens when X goes down?)
If the interviewer asks you to go deep on something specific, follow their lead - they’re probing for a particular signal.
Phase 7: Wrap-Up & Extensions (2-3 minutes)
In the final 2-3 minutes, briefly mention extensions you’d consider with more time. This shows breadth and forward thinking.
- “For monitoring, I’d add distributed tracing across all services and track p50/p95/p99 latencies per component.”
- “For abuse prevention, I’d add rate limiting at the API gateway and a content moderation pipeline on the write path.”
- “For global users, I’d set up multi-region deployment with geo-routing. Timeline caches would be regional, with cross-region replication for the source-of-truth database.”
Don’t design these. Just mention them. You’re showing the interviewer you know the system isn’t complete and you have the breadth to identify what’s missing.
The Meta-Game
Beyond the structure itself, here are the things that separate a good interview from a great one:
Drive the Conversation
Don’t wait for the interviewer to ask questions. State your plan upfront: “I’ll start with requirements, do a quick estimation, define the API, then build the high-level design and deep dive into 2-3 interesting problems.”
This tells the interviewer you have a framework and you’re going to use their time efficiently. They’ll let you drive.
Think Out Loud
The interviewer can’t evaluate what they can’t hear. When you’re deciding between two approaches, say it:
“I’m choosing between a relational database and a NoSQL store for tweets. The access pattern is primarily key-value lookups by tweet_id and range queries by user_id + time. Write throughput is modest (250 QPS at peak), but reads dominate. Given these patterns, I’d lean toward Cassandra with user_id as the partition key and created_at as the clustering column. This gives us efficient writes and time-ordered reads per user.”
Anchor Every Decision to a Requirement
Every component you add should trace back to a requirement or constraint from Phase 1 and 2.
- “I’m adding a cache because our read QPS is 50K, and the database can’t handle that directly.” (traces to estimation)
- “I’m using eventual consistency here because we agreed the timeline can be 2-3 seconds stale.” (traces to NFRs)
- “I’m adding a dead letter queue because we said 99.99% availability, so we can’t silently drop failed messages.” (traces to NFRs)
This is what makes a design feel deliberate instead of template-driven.
Handle “What If” Questions
Interviewers will throw wrenches: “What if this service goes down?” “What if the database becomes a bottleneck?” “What if we need to support 10x the current load?”
The framework for answering these:
- Identify the impact: “If the fan-out service goes down, new tweets stop appearing in followers’ timelines. But existing cached timelines still serve reads.”
- Immediate mitigation: “Kafka retains messages, so when the service recovers, it catches up. No data loss.”
- Long-term solution: “We’d run multiple instances of the fan-out service in different availability zones. If one goes down, the others handle the load.”
Know When to Go Shallow vs Deep
You can’t go deep on everything in 45 minutes. Explicitly tell the interviewer what you’re going deep on and what you’re treating as a black box:
“I’m going to treat the notification system as a separate service that consumes from our event stream. I won’t design its internals today, but I can if you’d like.”
This shows maturity. You know what’s important, you know what can be deferred, and you’re giving the interviewer the option to redirect if they disagree.
Quick Reference: The 45-Minute Checklist
- [0:00-5:00] Scoped to 3-5 functional requirements, stated NFRs with numbers, agreed with interviewer
- [5:00-10:00] Estimated QPS (read + write), storage, bandwidth; identified read-heavy vs write-heavy
- [10:00-13:00] Defined 3-4 core API endpoints with request/response shapes
- [13:00-18:00] Designed data model, chose SQL vs NoSQL with reasoning, identified key indexes
- [18:00-30:00] Drew high-level architecture incrementally, walked through read + write flows, justified every component
- [30:00-42:00] Deep-dived into 2-3 hard problems with trade-off analysis
- [42:00-45:00] Mentioned monitoring, security, and scaling extensions
The exact times will flex depending on the problem and interviewer. Some interviewers want to spend 20 minutes on deep dives. Some want more time on data modeling. Read the room and adjust. But the order stays the same: requirements → estimation → API → data model → architecture → deep dives → extensions.
The structure isn’t a straitjacket. It’s a scaffold. It keeps you from wandering aimlessly while giving you the flexibility to go deep where it matters.