You’ve nailed the functional requirements. You know what the system does. Now comes the question that separates architects from feature builders: under what constraints does it do it?
This is where non-functional requirements come in. And this is where most candidates say something like:
“The system should be highly available, scalable, and have low latency.”
That’s not an NFR. That’s a wish list. Every system should be “highly available.” Saying it adds zero information. The interviewer is waiting for you to quantify, prioritise, and make trade-offs - because NFRs are where trade-offs live.
What NFRs Actually Are
Functional requirements define what the system does. Non-functional requirements define how well it does it, and what it sacrifices to get there.
NFRs are constraints on the system’s quality attributes. They answer questions like:
- How fast must it respond?
- How many users must it support simultaneously?
- What happens when a server goes down?
- Can the data be stale, or must it be real-time?
- How much data will we store in a year?
Here’s the critical insight: NFRs force architectural trade-offs. You can’t have maximum consistency and maximum availability. You can’t have sub-millisecond latency and strong durability guarantees. Every NFR you state either opens or closes architectural options.
When you say “highly available,” the interviewer hears nothing. When you say “99.99% uptime, which means we can tolerate about 52 minutes of downtime per year, and we’ll prioritise availability over strong consistency for the read path,” the interviewer hears someone who designs real systems.
The NFR Framework
Here’s how to structure non-functional requirements. Not all categories apply to every problem - pick the ones that matter and go deep on those.
1. Scale: How Big Is This System?
Before you can make any architectural decision, you need to understand the order of magnitude. This isn’t about exact math - it’s about knowing whether you’re designing for thousands or billions. For a detailed walkthrough of estimation techniques, see the back-of-envelope calculation guide.
What to estimate:
- Users: DAU (Daily Active Users), peak concurrent users
- Throughput: Requests per second (read and write separately)
- Storage: Data generated per day/month/year
- Bandwidth: Data transferred per second
How to do it in an interview:
Start with a single anchor number and derive everything else.
Example - URL Shortener:
“Let’s say we have 100M DAU. If each user creates 1 short URL per day on average, that’s ~100M writes/day, or roughly 1,200 writes/sec. But reads are much higher - say each URL is clicked 10x on average - so 12,000 reads/sec. This is a read-heavy system, roughly 10:1 read-to-write ratio.”
“For storage: each URL mapping is maybe 500 bytes (short code + long URL + metadata). At 100M new URLs per day, that’s ~50 GB/day, or ~18 TB/year. If we keep URLs for 5 years, we’re looking at ~90 TB total.”
This takes 60 seconds and tells the interviewer three critical things:
- It’s read-heavy → caching matters, write optimisation is secondary
- 12K reads/sec → a single database won’t cut it, need replicas or cache (see database ops/sec benchmarks for reference)
- 90 TB → need to think about storage strategy, partitioning, cleanup
Anti-pattern: Spending five minutes doing exact arithmetic. Round aggressively. 86,400 seconds in a day? Call it 100K. Close enough. The point is the order of magnitude, not the decimal.
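The URL-shortener estimate above is just a few lines of arithmetic. A minimal sketch, using the same anchor numbers (100M DAU, 1 write/user/day, 10:1 reads, 500-byte records):

```python
# Back-of-envelope estimate for the URL shortener, using the numbers above.
SECONDS_PER_DAY = 86_400          # round to ~100K when doing this mentally

dau = 100_000_000                 # 100M daily active users (the anchor number)
writes_per_day = dau * 1          # 1 short URL created per user per day
reads_per_write = 10              # each URL clicked ~10x on average

writes_per_sec = writes_per_day / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * reads_per_write

record_bytes = 500                # short code + long URL + metadata
storage_per_day_gb = writes_per_day * record_bytes / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1e3
storage_5y_tb = storage_per_year_tb * 5

print(f"{writes_per_sec:,.0f} writes/s, {reads_per_sec:,.0f} reads/s")
print(f"{storage_per_day_gb:.0f} GB/day, ~{storage_5y_tb:.0f} TB over 5 years")
```

Derive everything from one anchor; if the interviewer changes the anchor (say, 10M DAU), every downstream number scales with it.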
2. Latency: How Fast Must It Be?
Not everything needs to be fast. The question is: what operations need to be fast, and how fast is fast enough?
Break latency into tiers:
| Tier | Target | Examples |
|---|---|---|
| Real-time | < 100ms | Search autocomplete, feed loading, message delivery |
| Interactive | 100ms – 1s | Page loads, API responses, posting content |
| Background | 1s – minutes | Email delivery, notification fanout, analytics |
| Batch | Minutes – hours | Report generation, data pipeline, recommendations |
Stating the tier for each major operation tells the interviewer which paths are performance-critical and which can be async.
Example - Chat System:
“Message delivery should be real-time - under 200ms for online recipients. Message history loading is interactive - under 500ms for a page of 50 messages. Read receipts can be background - a few seconds is fine. Search across message history is interactive for recent messages but can be slower (1-2s) for older messages.”
Now the interviewer knows:
- Delivery path → WebSocket, in-memory routing, no disk on the hot path
- History → indexed, paginated, probably cached
- Read receipts → can be batched, eventually consistent
- Search → separate service, probably Elasticsearch, can be eventually indexed
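Per-operation targets like these are easy to make explicit as SLO budgets that monitoring can check against. A sketch, with the budget values taken from the chat-system example above (the operation names are illustrative):

```python
# Per-operation latency budgets in milliseconds, from the targets above.
SLO_MS = {
    "message_delivery": 200,   # real-time
    "history_page": 500,       # interactive
    "read_receipt": 5_000,     # background: a few seconds is fine
    "search_recent": 1_000,    # interactive
    "search_old": 2_000,       # slower tier for older messages
}

def within_slo(operation: str, observed_ms: float) -> bool:
    """Check an observed latency against its budget."""
    return observed_ms <= SLO_MS[operation]

print(within_slo("message_delivery", 150))  # True: under the 200ms budget
print(within_slo("history_page", 800))      # False: over the 500ms budget
```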
3. Availability: How Much Downtime Is Acceptable?
Availability is measured in “nines.” Each additional nine is exponentially harder (and more expensive) to achieve.
| Availability | Downtime/year | Downtime/month |
|---|---|---|
| 99% (two nines) | 3.65 days | 7.3 hours |
| 99.9% (three nines) | 8.76 hours | 43.8 minutes |
| 99.99% (four nines) | 52.6 minutes | 4.38 minutes |
| 99.999% (five nines) | 5.26 minutes | 26.3 seconds |
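The table is worth being able to reproduce on demand: the downtime budget is just `(1 - availability) × time period`. A quick sketch:

```python
def downtime_budget(availability: float) -> tuple[float, float]:
    """Return (minutes/year, minutes/month) of allowed downtime."""
    unavailable = 1 - availability
    per_year_min = unavailable * 365 * 24 * 60
    per_month_min = per_year_min / 12
    return per_year_min, per_month_min

for nines in (0.99, 0.999, 0.9999, 0.99999):
    year_min, month_min = downtime_budget(nines)
    print(f"{nines:.3%}: {year_min:,.1f} min/year, {month_min:.2f} min/month")
```

Four nines comes out to roughly 52.6 minutes per year, matching the table: that's the entire annual budget for deploys gone wrong, failovers, and dependency outages combined.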
Most systems target 99.9% to 99.99%. The question to ask yourself is: what is the blast radius of downtime?
- A social media feed being down for 10 minutes? Annoying, not catastrophic. Three nines is fine.
- A payment system being down for 10 minutes? Revenue loss, broken transactions. Four nines minimum.
- Air traffic control? Five nines. Different universe entirely.
How to state it:
“We should target 99.99% availability for the write path - users must always be able to create short URLs. For the redirect path, 99.9% is acceptable since a brief outage means users see an error page, not data loss. The analytics dashboard can tolerate even lower availability - 99% is fine.”
Notice how different parts of the same system have different availability targets. This is senior-level thinking. It means you can invest differently - heavy redundancy on the write path, simpler architecture for analytics.
4. Consistency: Can the Data Be Stale?
This is the CAP theorem in practice - not the textbook version, but the real-world question: when a user writes data, how quickly must other users see it?
Three consistency models you should know cold:
Strong consistency: Every read sees the most recent write. Required when correctness matters more than speed.
- Bank balance after a transfer
- Inventory count for the last item in stock
- User authentication state (logged in / logged out)
Eventual consistency: Reads may return stale data, but will converge. Acceptable when slight staleness doesn’t cause harm.
- Social media feed (a post appearing 2 seconds late is fine)
- Like counts (showing 999 instead of 1000 briefly is harmless)
- Notification delivery (a few seconds delay is expected)
Read-after-write consistency: A user always sees their own writes immediately, but other users may see them with a delay.
- User updates their profile → they see the change instantly
- User posts a tweet → it appears on their timeline immediately, followers see it seconds later
How to state it:
“For the URL shortener, we need strong consistency on the write path - after creating a short URL, it must immediately resolve. We can’t have a window where the URL returns 404. For analytics (click counts), eventual consistency is fine - the count can lag by a few seconds.”
This directly drives your database choice and replication strategy. Strong consistency on writes? You need synchronous replication or a single-leader setup. Eventual consistency on reads? You can use read replicas and caching aggressively.
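One common way to get read-after-write consistency on top of read replicas is to pin a user's reads to the leader for a short window after they write, long enough to cover replication lag. A minimal sketch; the window length and backend names are assumptions, not prescriptions:

```python
import time

REPLICATION_LAG_BUDGET = 5.0  # seconds; assumed upper bound on replica lag

class Router:
    """Route reads to a replica, except shortly after the same user wrote."""

    def __init__(self) -> None:
        self.last_write_at: dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        self.last_write_at[user_id] = time.monotonic()

    def pick_backend(self, user_id: str) -> str:
        wrote = self.last_write_at.get(user_id)
        if wrote is not None and time.monotonic() - wrote < REPLICATION_LAG_BUDGET:
            return "leader"    # user just wrote: serve their own data fresh
        return "replica"       # otherwise a possibly-stale replica is fine

router = Router()
router.record_write("alice")
print(router.pick_backend("alice"))  # "leader"
print(router.pick_backend("bob"))    # "replica"
```

Other users still read from replicas, which is exactly the tweet-timeline behaviour described above: the author sees their post instantly, followers a few seconds later.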
5. Durability: Can We Lose Data?
Durability is about data surviving failures. It’s separate from availability - a system can be unavailable (you can’t access it) but still durable (your data is safe on disk, waiting for recovery).
Questions to ask:
- Is any data loss acceptable? (Usually no for user-generated content, yes for ephemeral data like sessions)
- What’s the RPO (Recovery Point Objective)? If the system crashes, how much recent data can we afford to lose?
- What’s the RTO (Recovery Time Objective)? How quickly must the system recover?
Example - File Storage System:
“Zero data loss for uploaded files - RPO of zero. Files must be replicated across at least three availability zones before we acknowledge the upload as successful. For metadata (tags, descriptions), RPO of a few seconds is acceptable - we can use async replication. RTO for the overall system is under 5 minutes - we need automated failover.”
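The "replicate before acknowledging" rule above is the core of an RPO-zero write path: the client only gets a success response once enough copies exist. A sketch of the acknowledgement logic, where `replicate_to` stands in for a real storage client call:

```python
REQUIRED_COPIES = 3  # replicas in distinct availability zones before we ack

def replicate_to(zone: str, blob: bytes) -> bool:
    # Placeholder for a real replication call; assume every zone accepts it.
    return True

def durable_put(blob: bytes, zones: list[str]) -> bool:
    """Acknowledge the upload only once the RPO-zero requirement is met."""
    confirmed = sum(1 for zone in zones if replicate_to(zone, blob))
    return confirmed >= REQUIRED_COPIES

print(durable_put(b"file-bytes", ["az-1", "az-2", "az-3"]))  # True
print(durable_put(b"file-bytes", ["az-1", "az-2"]))          # False: not durable yet
```

The async-replicated metadata path skips this synchronous count, which is exactly why its RPO is "a few seconds" rather than zero.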
6. Security and Privacy
Don’t spend your entire interview on security, but show awareness where it matters:
- Authentication and authorisation: Who can access what? Multi-tenancy isolation?
- Data encryption: At rest? In transit? End-to-end?
- PII handling: Any personal data subject to GDPR/compliance?
- Rate limiting: Protection against abuse?
One sentence is often enough:
“All data encrypted in transit via TLS. User PII encrypted at rest. API endpoints protected with rate limiting - 100 requests per minute per user. Short URL generation requires authentication to prevent abuse.”
7. Operational Requirements
These are often forgotten but they signal production experience:
- Monitoring: What metrics matter? (p99 latency, error rate, queue depth)
- Observability: Distributed tracing, structured logging
- Deployment: Zero-downtime deploys? Blue-green? Canary?
- Disaster recovery: Multi-region? Backup strategy?
You don’t need to design these - just mention them. It shows you’ve run systems in production, not just drawn them on whiteboards.
The NFR Trade-Off Matrix
The real power move is showing that you understand NFRs aren’t independent - they trade off against each other. Here are the big ones:
Consistency vs. Availability (CAP Theorem)
- Stronger consistency → more coordination → higher latency, lower availability
- Higher availability → accept stale reads → weaker consistency
Latency vs. Durability
- Faster writes → write to memory, acknowledge immediately → risk data loss on crash
- Safer writes → write to disk, replicate, then acknowledge → slower response
Latency vs. Consistency
- Faster reads → serve from nearest cache/replica → possibly stale
- Fresh reads → read from leader/source of truth → higher latency
Cost vs. Availability
- Higher availability → more redundancy, more regions → higher infrastructure cost
- Lower cost → fewer replicas, single region → higher risk of downtime
When you state an NFR, follow it with the trade-off you’re accepting:
“We’ll prioritise availability over consistency for the read path. This means a user might see a like count that’s a few seconds stale, but the feed never goes down. For the write path, we’ll prioritise consistency - a user must see their own post immediately after creating it.”
This is the hallmark of architectural thinking. You’re not saying “we want everything” - you’re saying “we choose this and accept that.”
Full Example: Design a Payment System
Let’s walk through NFRs for a system where they really matter.
Interviewer: “Design a payment processing system.”
Scale
“Assuming 10M transactions per day, that’s roughly 115 transactions per second on average. At peak (2-3x average), we’re looking at 300-400 TPS. Each transaction record is about 1 KB (amount, currency, payer, payee, status, timestamps, metadata). That’s ~10 GB/day, or ~3.6 TB/year. We need to retain transaction data for 7 years for compliance, so ~25 TB total.”
Latency
“Payment initiation must respond within 1 second - the user clicks ‘Pay’ and sees confirmation (or failure) quickly. Payment processing (actual fund movement) can happen in the background (seconds to minutes) - this is normal for banking. Payment status queries must be under 200ms - users check ‘did it go through?’ frequently.”
Availability
“99.99% availability on the payment initiation path. Users must always be able to start a payment. For the processing pipeline, 99.9% is acceptable - a brief processing delay is normal. The reporting/dashboard can tolerate 99%.”
Consistency
“Strong consistency for the transaction ledger - we cannot have phantom transactions or double charges. This is non-negotiable. Read-after-write consistency for payment status - after a payment is processed, the user must see the updated status immediately. Eventual consistency for aggregate reporting (daily totals, merchant dashboards) - a lag of minutes is acceptable.”
Durability
“Zero data loss for transaction records. RPO of zero - every committed transaction must survive any single point of failure. We’ll use synchronous replication to at least two availability zones before acknowledging a transaction. RTO of under 2 minutes with automated failover.”
Security
“All traffic over TLS. Payment card data is never stored - we tokenise via a PCI-compliant provider. Transaction data encrypted at rest. Strict access control - no engineer can access production payment data without audit trail. Rate limiting to prevent card testing attacks.”
The Trade-Offs Being Made
“The choice here is consistency over availability for the write path - if we can’t guarantee the transaction is recorded, we reject it rather than risk inconsistency. This means during a network partition, some payments will fail rather than proceed in an uncertain state. For the read path (checking status), the choice is availability over consistency - it’s better to show a slightly stale status than to be unable to show anything.”
Notice how every NFR directly shapes the architecture:
- Strong consistency → single-leader database for the ledger, no caching on the write path
- 400 TPS → manageable for a well-tuned relational database, no need for NoSQL
- Zero RPO → synchronous replication, not async
- 99.99% → multi-AZ deployment, automated failover, no single points of failure
- Tokenisation → third-party payment processor integration
Common Anti-Patterns
1. The Buzzword Bingo
“Scalable, available, reliable, fault-tolerant, secure, performant.” You’ve said six words and communicated nothing. Every one of these needs a number or a trade-off attached to it.
2. The Over-Estimation
“We need to handle 1 billion requests per second.” Really? Even Google Search handles ~100K QPS. Be realistic. If you’re designing a chat app for a startup, 10K concurrent users is more honest - and it changes the architecture entirely (you probably don’t need Kafka).
3. The Under-Estimation
The opposite problem. “Maybe 100 users.” If the interviewer is asking you to design it, they expect scale. Ask: “What scale should I design for - thousands, millions, or billions of users?” Let the interviewer set the anchor.
4. The Consistency Cop-Out
“We’ll use eventual consistency everywhere.” This avoids making hard choices. Some operations need strong consistency. Show that you know which ones and why.
5. Ignoring the Read/Write Split
Treating all traffic the same. Most systems are either read-heavy or write-heavy. This distinction drives caching strategy, database topology, and replication model. Always state the ratio.
6. NFRs Without Numbers
“Low latency.” How low? “High throughput.” How high? “Minimal downtime.” How minimal? If you can’t put a number on it, you haven’t defined it - you’ve just described a vague aspiration.
The Checklist
Use this before you start drawing boxes.
- Scale estimated - DAU, RPS (read and write separately), storage per year
- Read/write ratio stated explicitly
- Latency targets set per operation tier (real-time, interactive, background, batch)
- Availability target stated with nines, different targets for different paths if needed
- Consistency model chosen per data path (strong, eventual, read-after-write)
- Trade-offs articulated - what you’re sacrificing and why
- Durability requirements stated - RPO and RTO for critical data
- Security and compliance mentioned where relevant
- Numbers are realistic - you haven’t over/under-estimated by orders of magnitude
- Every NFR maps to an architectural decision - if it doesn’t change the design, it’s noise
How FRs and NFRs Work Together
Your FRs tell you what components to build. Your NFRs tell you how to build them.
| FR | NFR | Architectural Decision |
|---|---|---|
| User can send a message | Delivery under 200ms | WebSocket, not HTTP polling |
| User can search message history | Results under 1s for recent, 2s for old | Elasticsearch with time-partitioned indices |
| Messages are never lost | Zero RPO | Write-ahead log, synchronous replication |
| Support 10M concurrent users | 99.99% availability | Multi-region, horizontal scaling, no SPOF |
When you present your requirements this way - FRs paired with NFRs, each pair driving a specific architectural choice - the interviewer sees exactly what they’re looking for: a structured thinker who designs from requirements, not from memorised templates.
Closing Thought
NFRs are where senior engineers earn their keep. Anyone can list features. The hard part is deciding: how available? How consistent? How fast? And most importantly - what are we willing to give up?
Once you’ve defined these constraints, pair them with your functional requirements and feed them into a structured interview framework to drive the rest of your design.
Every architectural diagram you’ve ever seen is an answer to these questions. The boxes aren’t the architecture. The trade-offs between the boxes are. And those trade-offs start here, in the first five minutes, when you define the constraints that make the system real.