“How many requests per second should we design for?”
This question shows up in every system design round. And most candidates handle it in one of two ways:
- Skip it entirely. “Let’s just say it’s a lot.” Then they draw boxes and hope for the best.
- Over-engineer it. They pull out exact division, multiply everything precisely, and burn five minutes on arithmetic that adds zero architectural signal.
Both are wrong. Back-of-envelope estimation isn’t about getting the right number. It’s about getting the right order of magnitude so you can make informed architectural decisions.
The difference between 100 QPS and 100,000 QPS isn’t just a bigger number. It’s a fundamentally different system. One fits on a single server. The other needs distributed caching, load balancing, and database sharding. The estimation tells you which world you’re in.
The Reference Numbers
Before doing any calculation, you need a mental toolkit of reference numbers. You don’t need to memorize these exactly. Round numbers are fine. The point is having anchors.
Time
| Unit | Value |
|---|---|
| 1 day | ~100,000 seconds (86,400, round to 100K) |
| 1 month | ~2.5 million seconds |
| 1 year | ~30 million seconds |
Use 100K seconds/day for everything. It makes division trivial.
Data Size
| Unit | Value |
|---|---|
| 1 character (ASCII) | 1 byte |
| 1 character (UTF-8, avg) | 2-3 bytes |
| A tweet (280 chars) | ~0.5 KB with metadata |
| A typical JSON API response | 1-10 KB |
| A photo (compressed) | 200 KB - 1 MB |
| A short video (1 min, compressed) | 5-10 MB |
| A high-res image | 2-5 MB |
Scale Prefixes
| Prefix | Value | Shorthand |
|---|---|---|
| Kilo | 10^3 | Thousand |
| Mega | 10^6 | Million |
| Giga | 10^9 | Billion |
| Tera | 10^12 | Trillion |
| Peta | 10^15 | Quadrillion |
Quick conversions:
- 1 million seconds = ~12 days
- 1 billion seconds = ~32 years
- 1 TB = 1,000 GB = 1,000,000 MB
Latency (Order of Magnitude)
| Operation | Latency |
|---|---|
| L1 cache reference | ~1 ns |
| L2 cache reference | ~4 ns |
| Main memory reference | ~100 ns |
| SSD random read | ~100 µs |
| Round trip within same datacenter | ~0.5 ms |
| HDD seek | ~10 ms |
| Round trip cross-continent | ~100-150 ms |
| Packet round trip CA to Netherlands | ~150 ms |
Real-World Scale References
| Service | Approximate Scale |
|---|---|
| Google Search | ~100K QPS |
| Twitter | ~500M tweets/day |
| WhatsApp | ~100B messages/day |
| YouTube | ~500 hours of video uploaded/minute |
| Instagram | ~100M photos uploaded/day |
| Netflix | ~250M subscribers, ~1B hours streamed/week |
These aren’t exact. They’re directional. When someone says “design a system like Twitter,” you now know the ballpark.
The Estimation Framework
Every estimation follows the same four steps:
Step 1: Anchor on Users
Start with DAU (Daily Active Users). If the interviewer doesn’t give a number, ask. If they say “assume reasonable scale,” pick something concrete:
- Small/startup: 1M DAU
- Medium: 10M-50M DAU
- Large (Twitter/Instagram scale): 100M-500M DAU
Step 2: Estimate Actions Per User
How many times does an average user perform the core action per day?
- Social media post: 0.1-1 per day (most users lurk, few post)
- Messages sent: 10-50 per day
- Searches: 5-10 per day
- Feed refreshes: 10-20 per day
- URL shortener: 0.1 per day (most users click, not create)
Step 3: Calculate QPS
Total daily actions = DAU x actions per user
QPS = Total daily actions / 100,000 (seconds in a day)
Peak QPS = QPS x 2-3 (for traffic spikes)
Step 4: Estimate Storage
Storage per day = Total daily actions x size per action
Storage per year = Storage per day x 365
Total storage = Storage per year x retention period
That’s it. Four steps. Should take 60-90 seconds.
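The four steps above can be sketched as a short function. This is a minimal illustration, not a real capacity planner; the inputs in the example call (10M DAU, 5 actions/user, 2 KB/action) are made-up numbers:

```python
SECONDS_PER_DAY = 100_000  # 86,400 rounded up so division stays trivial

def estimate(dau, actions_per_user, bytes_per_action,
             peak_factor=3, retention_years=5):
    """Back-of-envelope estimate: QPS, peak QPS, and storage."""
    daily_actions = dau * actions_per_user           # Step 2
    qps = daily_actions / SECONDS_PER_DAY            # Step 3
    peak_qps = qps * peak_factor                     # Step 3 (spikes)
    storage_per_day = daily_actions * bytes_per_action  # Step 4
    total_storage = storage_per_day * 365 * retention_years
    return {
        "qps": qps,
        "peak_qps": peak_qps,
        "storage_per_day_gb": storage_per_day / 1e9,
        "total_storage_tb": total_storage / 1e12,
    }

# 10M DAU, 5 actions/user/day, 2 KB per action
print(estimate(dau=10_000_000, actions_per_user=5, bytes_per_action=2_000))
```

In an interview you do this in your head, of course; the function just makes the shape of the calculation explicit.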
Worked Examples
Example 1: URL Shortener
Given: 100M DAU
Reads vs. Writes:
- Write: each user creates ~0.1 short URLs/day = 10M writes/day
- Read: each short URL gets clicked ~10x = 100M reads/day
QPS:
- Write QPS: 10M / 100K = 100 writes/sec
- Read QPS: 100M / 100K = 1,000 reads/sec
- Peak read QPS: ~3,000 reads/sec
Storage:
- Each record: short URL (7 chars) + long URL (~200 chars) + metadata = ~500 bytes
- Daily: 10M x 500 bytes = 5 GB/day
- Yearly: 5 GB x 365 = ~1.8 TB/year
- 5-year retention: ~9 TB total
What this tells you:
- Read-heavy (10:1 ratio) -> caching is essential
- 3K peak QPS -> single database can handle this with read replicas
- 9 TB -> fits in a single well-provisioned database, but consider partitioning for growth
- This is not a massive-scale problem. No need for complex distributed architecture.
Example 2: Chat System (WhatsApp-scale)
Given: 500M DAU
Messages:
- Average user sends 40 messages/day
- Total: 500M x 40 = 20B messages/day
QPS:
- Message QPS: 20B / 100K = 200,000 writes/sec
- Peak: ~500,000 writes/sec
Storage:
- Average message: 100 bytes (text) + 200 bytes (metadata) = ~300 bytes
- Daily: 20B x 300 bytes = 6 TB/day
- Yearly: ~2 PB/year
Bandwidth:
- Incoming: 200K messages/sec x 300 bytes = 60 MB/sec
- With media (10% of messages have a 200KB image): 20K x 200KB = 4 GB/sec
What this tells you:
- 500K peak writes/sec -> single database won’t work. Need horizontal sharding.
- 2 PB/year -> need a distributed storage system (not a single RDBMS)
- 4 GB/sec bandwidth for media -> need CDN, object storage (S3-style)
- This is a massive-scale problem. Every component needs horizontal scaling.
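The chat-system arithmetic above, including the media bandwidth, can be checked in a few lines. The 10%-of-messages-carry-a-200KB-image mix is the same assumption used in the text:

```python
SECONDS_PER_DAY = 100_000

messages_per_day = 500_000_000 * 40             # 500M DAU x 40 msgs = 20B/day
write_qps = messages_per_day / SECONDS_PER_DAY  # 200K writes/sec

text_bandwidth = write_qps * 300        # ~300 bytes of text + metadata per message
media_qps = write_qps * 0.10            # 10% of messages carry an image
media_bandwidth = media_qps * 200_000   # ~200 KB per image

print(f"write QPS:       {write_qps:,.0f}")
print(f"text bandwidth:  {text_bandwidth / 1e6:.0f} MB/sec")
print(f"media bandwidth: {media_bandwidth / 1e9:.0f} GB/sec")
```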
Example 3: Twitter-like Feed
Given: 200M DAU
Write path (posting):
- 1% of users post per day = 2M posts/day
- Post QPS: 2M / 100K = 20 writes/sec (surprisingly low!)
Read path (feed):
- Average user refreshes feed 10x/day = 2B feed requests/day
- Feed QPS: 2B / 100K = 20,000 reads/sec
- Peak: ~50,000 reads/sec
Fan-out:
- Average user has 200 followers
- Each post fans out to 200 timelines
- Fan-out operations/sec: 20 x 200 = 4,000 timeline writes/sec
- Celebrity with 10M followers: single post = 10M timeline writes. This is the fan-out problem.
Storage:
- Each post: ~1 KB (text + metadata)
- Daily: 2M x 1KB = 2 GB/day (posts are small!)
- Timeline cache per user: last 200 posts x 1KB = 200 KB
- Total timeline cache: 200M x 200KB = 40 TB
What this tells you:
- Write QPS is tiny (20/sec). The challenge isn’t writing posts.
- Read QPS is massive (50K/sec). Caching is critical.
- Fan-out is the real problem. A celebrity post triggers millions of writes.
- Need hybrid approach: fan-out-on-write for normal users, fan-out-on-read for celebrities.
- 40 TB timeline cache -> Redis cluster with sharding.
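The hybrid fan-out decision reduces to a threshold check. Here is a sketch; the 100K-follower cutoff is a made-up tuning knob (real systems pick it empirically), not a number from the text:

```python
CELEBRITY_THRESHOLD = 100_000  # hypothetical cutoff for "celebrity"

def fanout_strategy(follower_count):
    """Fan out on write for normal users; defer to read time for celebrities."""
    if follower_count >= CELEBRITY_THRESHOLD:
        # Merge celebrity posts into feeds at read time instead of
        # writing to millions of timelines per post.
        return "fan-out-on-read"
    # Push the post into each follower's cached timeline at write time.
    return "fan-out-on-write"

print(fanout_strategy(200))          # typical user with ~200 followers
print(fanout_strategy(10_000_000))   # celebrity with 10M followers
```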
The Rounding Rules
Speed matters more than precision. Here are the shortcuts:
Round everything to the nearest power of 10.
- 86,400 seconds in a day? Use 100,000.
- 365 days in a year? Use 400 for easy math (or 12 months x 30 days = 360).
- 1,048,576 bytes in a MB? Use 1,000,000.
Use 2x-3x for peak traffic. Most systems see 2-3x average traffic during peaks. For spiky systems (e-commerce during sales, sports during live events), use 5-10x.
Round storage up, not down. It’s better to over-provision storage than under-provision. Storage is cheap. Running out of storage at 3 AM is not.
State the ratio, not just the number. “1,000 reads/sec and 100 writes/sec” is more useful than “1,100 total QPS.” The 10:1 ratio tells you to optimize for reads.
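The power-of-10 rule is mechanical enough to write down as a throwaway helper, if only to see what it does to the usual constants:

```python
import math

def round_to_power_of_10(x):
    """Snap a positive value to the nearest power of 10."""
    return 10 ** round(math.log10(x))

print(round_to_power_of_10(86_400))     # seconds in a day -> 100000
print(round_to_power_of_10(1_048_576))  # bytes in a MiB   -> 1000000
```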
Common Mistakes
1. Spending Too Long
The estimation should take 60-90 seconds. If you’re doing long division on the whiteboard, you’ve lost the plot. Round aggressively and move on.
2. Not Separating Reads and Writes
“Total QPS is 10,000.” That’s incomplete. A system with 9,000 reads and 1,000 writes is architected completely differently from one with 5,000 reads and 5,000 writes. Always split them.
3. Forgetting Peak Traffic
Average QPS is meaningless for capacity planning. Systems don’t fail at average load. They fail at peak. Always multiply by 2-3x (or more for spiky workloads).
4. Ignoring the Fan-Out Effect
A social media post doesn’t create one write. It creates N writes, where N is the number of followers. A user with 1M followers creates 1M fan-out writes from a single post. This is often the bottleneck, not the ingestion rate.
5. Getting Lost in the Math
The interviewer doesn’t care if the answer is 1,847 QPS or 2,000 QPS. Both lead to the same architecture. What matters is: “This is in the low thousands, so a single server with caching can handle it.” That’s the insight. The number is just a vehicle.
6. Not Connecting Estimation to Architecture
The worst thing you can do is calculate numbers and then ignore them. Every number should lead to a decision:
| Estimation | Architectural Signal |
|---|---|
| < 1K QPS | Single server, maybe with read replica |
| 1K-10K QPS | Load balancer + multiple app servers + read replicas |
| 10K-100K QPS | Horizontal scaling, caching layer (Redis/Memcached), possibly sharding |
| 100K+ QPS | Distributed system, CDN, database sharding, message queues |
| < 1 TB storage | Single database instance |
| 1-10 TB | Consider partitioning, compression |
| 10-100 TB | Sharded database or distributed storage |
| 100 TB+ | Distributed file system (HDFS, S3), data lake architecture |
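The QPS half of that table is just a ladder of thresholds. A sketch, using the same rough, directional cutoffs (the strings are shorthand, not prescriptions):

```python
def architecture_signal(peak_qps):
    """Map peak QPS to the rough architecture tier from the table above."""
    if peak_qps < 1_000:
        return "single server, maybe a read replica"
    if peak_qps < 10_000:
        return "load balancer + multiple app servers + read replicas"
    if peak_qps < 100_000:
        return "horizontal scaling + caching layer, possibly sharding"
    return "distributed system: CDN, sharding, message queues"

print(architecture_signal(3_000))     # URL shortener peak from Example 1
print(architecture_signal(500_000))   # chat-system peak from Example 2
```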
The Quick-Reference Cheat Sheet
When you’re in the interview and need to move fast:
Users to QPS:
QPS = (DAU x actions_per_user) / 100,000
Storage per year:
Storage = DAU x actions_per_user x bytes_per_action x 365
Bandwidth:
Bandwidth = QPS x bytes_per_request
Machines needed (rough):
A single modern server handles ~10K-50K simple requests/sec
Machines = Peak QPS / 10,000 (conservative)
Cache size:
Follow the 80/20 rule: 20% of data serves 80% of reads
Cache = 0.2 x daily_read_data
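The machine-count and cache-size rules translate directly to code. The 10K requests/sec-per-server figure is the conservative end of the cheat sheet's range; real per-server capacity varies wildly with the workload:

```python
import math

def machines_needed(peak_qps, per_server_qps=10_000):
    """Conservative server count: divide peak QPS by per-server capacity."""
    return math.ceil(peak_qps / per_server_qps)

def cache_size_bytes(daily_read_data_bytes, hot_fraction=0.2):
    """80/20 rule of thumb: ~20% of the data serves ~80% of reads."""
    return daily_read_data_bytes * hot_fraction

print(machines_needed(500_000))                 # chat-system peak from Example 2
print(cache_size_bytes(1e12) / 1e9, "GB")       # 1 TB of daily reads -> 200 GB cache
```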
Closing Thought
Back-of-envelope estimation is not a math exercise. It’s a calibration tool. The goal is to spend 60 seconds understanding the scale of the problem so that every architectural decision that follows is grounded in reality.
A system designed for 100 QPS looks nothing like a system designed for 100,000 QPS. The estimation is what tells you which one to build. Get the order of magnitude right, connect it to architecture, and move on. That’s all there is to it.