“How many requests per second should we design for?”

This question shows up in every system design round. And most candidates handle it in one of two ways:

  1. Skip it entirely. “Let’s just say it’s a lot.” Then they draw boxes and hope for the best.
  2. Over-engineer it. They pull out exact division, multiply everything precisely, and burn five minutes on arithmetic that adds zero architectural signal.

Both are wrong. Back-of-envelope estimation isn’t about getting the right number. It’s about getting the right order of magnitude so you can make informed architectural decisions.

The difference between 100 QPS and 100,000 QPS isn’t just a bigger number. It’s a fundamentally different system. One fits on a single server. The other needs distributed caching, load balancing, and database sharding. The estimation tells you which world you’re in.

The Reference Numbers

Before doing any calculation, you need a mental toolkit of reference numbers. You don’t need to memorize these exactly. Round numbers are fine. The point is having anchors.

Time

Unit Value
1 day ~100,000 seconds (86,400, round to 100K)
1 month ~2.5 million seconds
1 year ~30 million seconds

Use 100K seconds/day for everything. It makes division trivial.

Data Size

Unit Value
1 character (ASCII) 1 byte
1 character (UTF-8, avg) 2-3 bytes
A tweet (280 chars) ~0.5 KB with metadata
A typical JSON API response 1-10 KB
A photo (compressed) 200 KB - 1 MB
A short video (1 min, compressed) 5-10 MB
A high-res image 2-5 MB

Scale Prefixes

Prefix Value Shorthand
Kilo 10^3 Thousand
Mega 10^6 Million
Giga 10^9 Billion
Tera 10^12 Trillion
Peta 10^15 Quadrillion

Quick conversions:

  • 1 million seconds = ~12 days
  • 1 billion seconds = ~32 years
  • 1 TB = 1,000 GB = 1,000,000 MB
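These conversions are easy to sanity-check with a throwaway snippet (here using the exact 86,400 seconds/day rather than the rounded 100K):

```python
SECONDS_PER_DAY = 86_400                 # exact, not the rounded 100K
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365

print(1_000_000 / SECONDS_PER_DAY)       # ~11.6 -> "about 12 days"
print(1_000_000_000 / SECONDS_PER_YEAR)  # ~31.7 -> "about 32 years"
```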

Latency (Order of Magnitude)

Operation Latency
L1 cache reference ~1 ns
L2 cache reference ~4 ns
Main memory reference ~100 ns
SSD random read ~100 us
Round trip within same datacenter ~0.5 ms
HDD seek ~10 ms
Round trip cross-continent (e.g., CA to Netherlands) ~100-150 ms

Real-World Scale References

Service Approximate Scale
Google Search ~100K QPS
Twitter ~500M tweets/day (~6K tweets/sec average)
WhatsApp ~100B messages/day
YouTube ~500 hours of video uploaded/minute
Instagram ~100M photos uploaded/day
Netflix ~250M subscribers, ~1B hours streamed/week

These aren’t exact. They’re directional. When someone says “design a system like Twitter,” you now know the ballpark.

The Estimation Framework

Every estimation follows the same four steps:

Step 1: Anchor on Users

Start with DAU (Daily Active Users). If the interviewer doesn’t give a number, ask. If they say “assume reasonable scale,” pick something concrete:

  • Small/startup: 1M DAU
  • Medium: 10M-50M DAU
  • Large (Twitter/Instagram scale): 100M-500M DAU

Step 2: Estimate Actions Per User

How many times does an average user perform the core action per day?

  • Social media post: 0.1-1 per day (most users lurk, few post)
  • Messages sent: 10-50 per day
  • Searches: 5-10 per day
  • Feed refreshes: 10-20 per day
  • URL shortener: 0.1 per day (most users click, not create)

Step 3: Calculate QPS

Total daily actions = DAU x actions per user
QPS = Total daily actions / 100,000 (seconds in a day)
Peak QPS = QPS x 2-3 (for traffic spikes)

Step 4: Estimate Storage

Storage per day = Total daily actions x size per action
Storage per year = Storage per day x 365
Total storage = Storage per year x retention period

That’s it. Four steps. Should take 60-90 seconds.
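The four steps collapse into a few lines of arithmetic. A minimal Python sketch (the function name, the default 3x peak factor, and the sample numbers are illustrative, not prescribed):

```python
SECONDS_PER_DAY = 100_000  # 86,400 rounded up for easy mental division

def estimate(dau, actions_per_user, bytes_per_action,
             peak_factor=3, retention_years=1):
    """Back-of-envelope estimate: (QPS, peak QPS, total storage in bytes)."""
    daily_actions = dau * actions_per_user               # Steps 1-2
    qps = daily_actions / SECONDS_PER_DAY                # Step 3
    storage = daily_actions * bytes_per_action * 365 * retention_years  # Step 4
    return qps, qps * peak_factor, storage

# e.g., 10M DAU, 5 searches/user/day, ~2 KB logged per search
qps, peak_qps, yearly_bytes = estimate(10_000_000, 5, 2_000)
# qps = 500, peak_qps = 1,500, yearly_bytes = 36.5 TB
```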

Worked Examples

Example 1: URL Shortener

Given: 100M DAU

Reads vs. Writes:

  • Write: each user creates ~0.1 short URLs/day = 10M writes/day
  • Read: each short URL gets clicked ~10x = 100M reads/day

QPS:

  • Write QPS: 10M / 100K = 100 writes/sec
  • Read QPS: 100M / 100K = 1,000 reads/sec
  • Peak read QPS: ~3,000 reads/sec

Storage:

  • Each record: short URL (7 chars) + long URL (~200 chars) + metadata = ~500 bytes
  • Daily: 10M x 500 bytes = 5 GB/day
  • Yearly: 5 GB x 365 = ~1.8 TB/year
  • 5-year retention: ~9 TB total
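The same arithmetic as a quick script, with the numbers copied from the example above:

```python
SECONDS_PER_DAY = 100_000
dau = 100_000_000

writes_per_day = int(dau * 0.1)      # ~0.1 creates per user
reads_per_day = writes_per_day * 10  # each short URL clicked ~10x

write_qps = writes_per_day // SECONDS_PER_DAY  # 100 writes/sec
read_qps = reads_per_day // SECONDS_PER_DAY    # 1,000 reads/sec
peak_read_qps = read_qps * 3                   # 3,000 reads/sec

bytes_per_record = 500
daily_gb = writes_per_day * bytes_per_record / 1e9  # 5 GB/day
five_year_tb = daily_gb * 365 * 5 / 1e3             # ~9.1 TB
```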

What this tells you:

  • Read-heavy (10:1 ratio) -> caching is essential
  • 3K peak QPS -> single database can handle this with read replicas
  • 9 TB -> fits in a single well-provisioned database, but consider partitioning for growth
  • This is not a massive-scale problem. No need for complex distributed architecture.

Example 2: Chat System (WhatsApp-scale)

Given: 500M DAU

Messages:

  • Average user sends 40 messages/day
  • Total: 500M x 40 = 20B messages/day

QPS:

  • Message QPS: 20B / 100K = 200,000 writes/sec
  • Peak: ~500,000 writes/sec

Storage:

  • Average message: 100 bytes (text) + 200 bytes (metadata) = ~300 bytes
  • Daily: 20B x 300 bytes = 6 TB/day
  • Yearly: ~2 PB/year

Bandwidth:

  • Incoming: 200K messages/sec x 300 bytes = 60 MB/sec
  • With media (10% of messages have a 200KB image): 20K x 200KB = 4 GB/sec
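The chat-system numbers check out the same way (the 40 messages/user, 300 bytes/message, and 10% media share are the figures from the example above):

```python
SECONDS_PER_DAY = 100_000
dau = 500_000_000

messages_per_day = dau * 40                   # 20B messages/day
msg_qps = messages_per_day / SECONDS_PER_DAY  # 200,000 writes/sec

bytes_per_msg = 300
daily_tb = messages_per_day * bytes_per_msg / 1e12  # 6 TB/day
yearly_pb = daily_tb * 365 / 1e3                    # ~2.2 PB/year

text_bw_mb = msg_qps * bytes_per_msg / 1e6  # 60 MB/sec of text
media_qps = msg_qps * 0.10                  # 20K media messages/sec
media_bw_gb = media_qps * 200_000 / 1e9     # 4 GB/sec of media
```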

What this tells you:

  • 500K peak writes/sec -> single database won’t work. Need horizontal sharding.
  • 2 PB/year -> need a distributed storage system (not a single RDBMS)
  • 4 GB/sec bandwidth for media -> need CDN, object storage (S3-style)
  • This is a massive-scale problem. Every component needs horizontal scaling.

Example 3: Twitter-like Feed

Given: 200M DAU

Write path (posting):

  • 1% of users post per day = 2M posts/day
  • Post QPS: 2M / 100K = 20 writes/sec (surprisingly low!)

Read path (feed):

  • Average user refreshes feed 10x/day = 2B feed requests/day
  • Feed QPS: 2B / 100K = 20,000 reads/sec
  • Peak: ~50,000 reads/sec

Fan-out:

  • Average user has 200 followers
  • Each post fans out to 200 timelines
  • Fan-out operations/sec: 20 x 200 = 4,000 timeline writes/sec
  • Celebrity with 10M followers: single post = 10M timeline writes. This is the fan-out problem.

Storage:

  • Each post: ~1 KB (text + metadata)
  • Daily: 2M x 1KB = 2 GB/day (posts are small!)
  • Timeline cache per user: last 200 posts x 1KB = 200 KB
  • Total timeline cache: 200M x 200KB = 40 TB

What this tells you:

  • Write QPS is tiny (20/sec). The challenge isn’t writing posts.
  • Read QPS is massive (50K/sec). Caching is critical.
  • Fan-out is the real problem. A celebrity post triggers millions of writes.
  • Need hybrid approach: fan-out-on-write for normal users, fan-out-on-read for celebrities.
  • 40 TB timeline cache -> Redis cluster with sharding.
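The hybrid approach above comes down to a threshold check at post time. A minimal in-memory sketch (the 10K-follower cutoff, the function names, and the dict-backed "cache" are illustrative assumptions; production systems would use something like a sharded Redis cluster):

```python
# Illustrative in-memory stand-ins for the timeline cache.
timelines = {}     # follower_id -> list of post_ids (fan-out-on-write)
hot_posts = set()  # celebrity posts, merged lazily at read time

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; tuned per system

def deliver_post(post_id, follower_ids):
    """Hybrid fan-out: push for normal users, pull for celebrities."""
    if len(follower_ids) < CELEBRITY_THRESHOLD:
        # Fan-out-on-write: ~200 timeline writes for an average user.
        for fid in follower_ids:
            timelines.setdefault(fid, []).append(post_id)
    else:
        # Fan-out-on-read: skip millions of writes for one celebrity post.
        hot_posts.add(post_id)

def read_timeline(follower_id, followed_celebrity_posts):
    """Merge the precomputed timeline with hot posts at read time."""
    cached = timelines.get(follower_id, [])
    return cached + [p for p in followed_celebrity_posts if p in hot_posts]
```

The threshold trades write amplification against read latency: lower it and more reads pay the merge cost; raise it and more posts trigger huge fan-out writes.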

The Rounding Rules

Speed matters more than precision. Here are the shortcuts:

Round everything to the nearest power of 10.

  • 86,400 seconds in a day? Use 100,000.
  • 365 days in a year? Use 400 (or round down to 300 for even quicker math).
  • 1,048,576 bytes in a MB? Use 1,000,000.

Use 2x-3x for peak traffic. Most systems see 2-3x average traffic during peaks. For spiky systems (e-commerce during sales, sports during live events), use 5-10x.

Round storage up, not down. It’s better to over-provision storage than under-provision. Storage is cheap. Running out of storage at 3 AM is not.

State the ratio, not just the number. “1,000 reads/sec and 100 writes/sec” is more useful than “1,100 total QPS.” The 10:1 ratio tells you to optimize for reads.

Common Mistakes

1. Spending Too Long

The estimation should take 60-90 seconds. If you’re doing long division on the whiteboard, you’ve lost the plot. Round aggressively and move on.

2. Not Separating Reads and Writes

“Total QPS is 10,000.” That’s incomplete. A system with 9,000 reads and 1,000 writes is architected completely differently from one with 5,000 reads and 5,000 writes. Always split them.

3. Forgetting Peak Traffic

Average QPS is meaningless for capacity planning. Systems don’t fail at average load. They fail at peak. Always multiply by 2-3x (or more for spiky workloads).

4. Ignoring the Fan-Out Effect

A social media post doesn’t create one write. It creates N writes, where N is the number of followers. A user with 1M followers creates 1M fan-out writes from a single post. This is often the bottleneck, not the ingestion rate.

5. Getting Lost in the Math

The interviewer doesn’t care if the answer is 1,847 QPS or 2,000 QPS. Both lead to the same architecture. What matters is: “This is in the low thousands, so a single server with caching can handle it.” That’s the insight. The number is just a vehicle.

6. Not Connecting Estimation to Architecture

The worst thing you can do is calculate numbers and then ignore them. Every number should lead to a decision:

Estimation Architectural Signal
< 1K QPS -> Single server, maybe with read replica
1K-10K QPS -> Load balancer + multiple app servers + read replicas
10K-100K QPS -> Horizontal scaling, caching layer (Redis/Memcached), possibly sharding
100K+ QPS -> Distributed system, CDN, database sharding, message queues
< 1 TB storage -> Single database instance
1-10 TB -> Consider partitioning, compression
10-100 TB -> Sharded database or distributed storage
100 TB+ -> Distributed file system (HDFS, S3), data lake architecture
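The QPS half of that mapping is mechanical enough to express as a lookup. A sketch with the thresholds copied from the table (the tier descriptions are abbreviated):

```python
def qps_signal(peak_qps):
    """Translate peak QPS into the architectural tier from the table."""
    if peak_qps < 1_000:
        return "single server, maybe with read replica"
    if peak_qps < 10_000:
        return "load balancer + app servers + read replicas"
    if peak_qps < 100_000:
        return "horizontal scaling + caching layer, possibly sharding"
    return "distributed system: CDN, sharding, message queues"

# The URL shortener's ~3K peak reads vs. the chat system's ~500K writes
# land two tiers apart:
qps_signal(3_000)    # load-balancer tier
qps_signal(500_000)  # fully distributed tier
```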

The Quick-Reference Cheat Sheet

When you’re in the interview and need to move fast:

Users to QPS:

QPS = (DAU x actions_per_user) / 100,000

Storage per year:

Storage = DAU x actions_per_user x bytes_per_action x 365

Bandwidth:

Bandwidth = QPS x bytes_per_request

Machines needed (rough):

A single modern server handles ~10K-50K simple requests/sec
Machines = Peak QPS / 10,000 (conservative)

Cache size:

Follow the 80/20 rule: 20% of data serves 80% of reads
Cache = 0.2 x daily_read_data
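The last two formulas translate directly into code. A sketch using the rough figures above (~10K requests/sec per server, 80/20 hot-data split):

```python
def machines_needed(peak_qps, per_machine_qps=10_000):
    """Conservative: ~10K simple requests/sec per modern server."""
    return -(-peak_qps // per_machine_qps)  # ceiling division

def cache_size_bytes(daily_read_bytes, hot_fraction=0.2):
    """80/20 rule: ~20% of the data serves ~80% of reads."""
    return daily_read_bytes * hot_fraction

machines_needed(500_000)             # 50 machines for chat-scale peak writes
cache_size_bytes(6_000_000_000_000)  # 1.2 TB of hot data for 6 TB/day of reads
```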

Closing Thought

Back-of-envelope estimation is not a math exercise. It’s a calibration tool. The goal is to spend 60 seconds understanding the scale of the problem so that every architectural decision that follows is grounded in reality.

A system designed for 100 QPS looks nothing like a system designed for 100,000 QPS. The estimation is what tells you which one to build. Get the order of magnitude right, connect it to architecture, and move on. That’s all there is to it.