You’re in a system design interview. You say “we’ll use PostgreSQL” and immediately follow it with “and we’ll shard it across 16 nodes.” The interviewer asks: “How much traffic are you expecting?” You don’t have a number. You just sharded because it sounded like the senior thing to do.

Here’s the thing - most teams shard too early. A single PostgreSQL node can handle far more than people think. The decision to shard should come from actual numbers, not vibes. This post gives you those numbers.

The Numbers You Need

Before diving into each database, here’s the reference table you’ll keep coming back to. These are realistic single-node numbers on reasonable hardware (8-16 cores, 64GB RAM, NVMe SSD) - not marketing benchmarks on 128-core machines.

| Database | Reads/sec | Writes/sec | Practical Data Limit | Latency (P50) |
|---|---|---|---|---|
| PostgreSQL | 100K-200K | 20K-70K | 500GB-4TB | 1-5ms |
| Redis | 100K-500K | 100K-400K | 25-50GB | <1ms |
| MongoDB | 20K-67K | 10K-30K | 256GB-1TB | 1-5ms |
| DynamoDB | 3K/partition | 1K/partition | Unlimited (managed) | 5-10ms |
| Cassandra | 15K-25K/node | 30K/node | 500GB-1TB/node | 1-5ms |

These numbers assume indexed lookups or simple key-value access - not full table scans or complex joins. Your actual numbers depend on query complexity, data model, and indexing.
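The table can double as a back-of-envelope check. A minimal sketch, using illustrative mid-range thresholds derived from the table above (these are rules of thumb, not hard limits):

```python
# Rough single-node capacity check. Thresholds are illustrative
# mid-range figures from the reference table, not hard limits.
LIMITS = {
    # (reads/sec, writes/sec, practical data limit in GB)
    "postgresql": (150_000, 45_000, 1_000),
    "redis":      (300_000, 250_000, 50),
    "mongodb":    (45_000, 20_000, 500),
}

def fits_single_node(db, reads_per_sec, writes_per_sec, data_gb):
    """Return True if the workload plausibly fits one node of `db`."""
    max_reads, max_writes, max_gb = LIMITS[db]
    return (reads_per_sec <= max_reads
            and writes_per_sec <= max_writes
            and data_gb <= max_gb)

# 5K reads/sec, 1K writes/sec, 80GB: comfortably single-node Postgres.
print(fits_single_node("postgresql", 5_000, 1_000, 80))   # True
# 60K sustained writes/sec: past the comfortable single-node range.
print(fits_single_node("postgresql", 5_000, 60_000, 80))  # False
```

Run your own numbers through something like this before any sharding conversation.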

PostgreSQL

The database that keeps surprising people with how far it scales on a single node. The PostgreSQL documentation covers tuning parameters in detail.

Throughput:

  • Simple SELECT by primary key: 100,000-200,000 ops/sec
  • Read-only pgbench on tuned hardware: 1,000,000+ TPS (synthetic, but shows the ceiling)
  • Mixed read-write (TPC-B style): 20,000-70,000 TPS
  • Single-row INSERTs: 10,000-25,000/sec (network-bound). Batched inserts: 100,000+/sec
  • Analytical scans: 5-10 million rows/sec per core
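The gap between single-row and batched inserts is mostly per-statement round trips. A minimal illustration of the batching idea, building one multi-row INSERT instead of N single-row statements (string construction is for illustration only; real code should use the driver's parameterized batch API, such as psycopg2's `execute_values`):

```python
# Build one multi-row INSERT instead of N single-row statements.
# Illustration only: production code should use the driver's
# parameterized batch API, never string formatting.
def batched_insert_sql(table, columns, rows):
    cols = ", ".join(columns)
    values = ", ".join(
        "(" + ", ".join(str(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({cols}) VALUES {values};"

sql = batched_insert_sql("events", ["user_id", "score"],
                         [(1, 10), (2, 20), (3, 30)])
print(sql)
# INSERT INTO events (user_id, score) VALUES (1, 10), (2, 20), (3, 30);
```

One statement, one round trip, three rows: that is where the 10x difference between 10K-25K/sec and 100K+/sec comes from.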

Memory and storage:

  • Best performance when working set (active indexes + hot data) fits in shared_buffers + OS page cache
  • With 64GB RAM, you can comfortably serve 50-200GB of hot data
  • Maximum table size: 32TB. Maximum database size: no hard limit
  • Performance degrades when queries regularly hit disk instead of cache

When to shard:

  • Most workloads run fine on a single node up to 500GB-1TB
  • Vertical scaling + read replicas + connection pooling (PgBouncer) + table partitioning can push this further
  • Notion ran on a single PostgreSQL instance serving 30 million users before they needed to shard
  • Consider sharding at 1TB+ of data with heavy write loads, or when you sustain more than ~50,000 write TPS

Before you shard, try these first:

  1. Connection pooling (PgBouncer) - most Postgres bottlenecks are connection limits, not CPU
  2. Read replicas for read-heavy workloads
  3. Table partitioning for large tables (time-series data, logs)
  4. Better indexing - a missing index is cheaper to add than a shard is to manage
  5. VACUUM tuning - bloated tables kill performance silently

Redis

The fastest option on this list - because everything lives in memory. The Redis documentation covers benchmarking and configuration.

Throughput:

  • GET/SET without pipelining: 70,000-100,000 ops/sec
  • GET/SET with pipelining (16 commands): 400,000-500,000 ops/sec
  • Redis 8.x with IO threads (16 cores): up to 3.5 million ops/sec with pipelining
  • Sub-millisecond latency for virtually all operations
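The pipelining numbers above are mostly a round-trip story: without pipelining, each command on a connection pays one network round trip. A back-of-envelope model (the 0.1ms RTT is an assumed same-datacenter figure, not a measurement):

```python
# Unpipelined Redis throughput on one connection is bounded by network
# round trips: one command per RTT. Pipelining sends a batch of
# commands per round trip, multiplying throughput by the batch size.
def max_ops_per_sec(rtt_ms, pipeline_depth=1):
    round_trips_per_sec = 1000 / rtt_ms
    return round_trips_per_sec * pipeline_depth

rtt = 0.1  # assumed same-datacenter round trip, in milliseconds
print(max_ops_per_sec(rtt))      # ~10K ops/sec on a single connection
print(max_ops_per_sec(rtt, 16))  # ~160K ops/sec with 16-deep pipelining
```

This is why the benchmark numbers jump 4-5x with pipelining: the server was never the bottleneck, the network was.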

Memory and storage:

  • Everything lives in RAM, so your maximum dataset size is bounded by available memory
  • With 64GB RAM, expect 30-50GB of usable storage (after accounting for fragmentation, metadata overhead, and persistence buffers)
  • Each key has ~50-70 bytes of metadata overhead
  • Persistence options: RDB snapshots (periodic) or AOF (append-only file, near-real-time)
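The per-key overhead matters when you store millions of small values. A rough capacity estimate (the 64-byte overhead is taken from the 50-70 byte range above; 48GB usable is an assumption for a 64GB machine):

```python
# How many keys fit in a given amount of RAM, given per-key metadata
# overhead (~50-70 bytes, per the list above) plus the value itself.
def keys_that_fit(ram_bytes, value_bytes, overhead_bytes=64):
    return ram_bytes // (value_bytes + overhead_bytes)

ram = 48 * 1024**3              # ~48GB usable on a 64GB machine
print(keys_that_fit(ram, 256))  # 256-byte values: ~161 million keys
```

Note that for tiny values the metadata dominates: at 16-byte values, a fifth of your RAM is spent on overhead alone.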

When to shard (Redis Cluster):

  • A single instance gets tight above ~25GB of data or ~100K unpipelined ops/sec
  • Redis Cluster splits the keyspace across nodes when a single instance's memory or throughput is exhausted

Common mistake: Using Redis as a primary database instead of a cache. If your Redis data exceeds 50GB, ask yourself whether this data should be in a disk-based database with Redis as a caching layer in front.

MongoDB

Flexible schema, decent throughput, but the sharding decision is more nuanced than people think. The MongoDB sharding documentation is worth reading before making decisions.

Throughput:

  • Document lookups by indexed field: 20,000-50,000 ops/sec
  • When working set fits in WiredTiger cache: up to 67,000 ops/sec
  • Single-document inserts: 10,000-30,000 ops/sec
  • MongoDB 8.0 improved query performance by ~45% over 7.0 on large datasets

Memory and storage:

  • WiredTiger cache: 50% of RAM minus 1GB (default). With 64GB RAM, that’s ~31GB cache
  • Performance is best when working set fits in WiredTiger cache
  • Snappy compression (default) gives 2-4x compression ratio
  • Max document size: 16MB
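The default cache-size rule above (50% of RAM minus 1GB, with a 256MB floor) is easy to compute. A sketch:

```python
# Default WiredTiger cache size: the larger of 50% of (RAM - 1GB)
# or 256MB, per the rule quoted above.
def wiredtiger_cache_gb(ram_gb):
    return max(0.5 * (ram_gb - 1), 0.256)

print(wiredtiger_cache_gb(64))  # 31.5 -> matches the ~31GB figure above
print(wiredtiger_cache_gb(1))   # 0.256 -> the 256MB floor
```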

When to shard:

  • Single replica set handles 256-512GB comfortably
  • The commonly cited threshold: shard when data exceeds 256GB or write throughput saturates a single primary
  • MongoDB Atlas supports up to 4TB per shard, but 1TB per shard is the recommended operational limit

Before you shard, try these first:

  1. Better indexes - MongoDB’s explain() is your best friend
  2. Read preference: route reads to secondaries for read-heavy workloads
  3. Schema design - embedding vs referencing makes a huge performance difference
  4. WiredTiger cache tuning

DynamoDB

Different category entirely - it’s a managed service that handles sharding for you. The DynamoDB developer guide explains capacity modes in depth. But you still need to understand the numbers.

Throughput:

  • Per partition: 3,000 read capacity units (RCU) and 1,000 write capacity units (WCU)
  • Eventually consistent reads double read capacity (6,000/partition)
  • Default per-table limit: 40,000 RCU and 40,000 WCU (can be increased)
  • On-demand mode starts at 4,000 WCU / 12,000 RCU and auto-scales to 2x previous peak
  • The service handles 89 million requests/sec globally across all customers

Memory and storage:

  • Each partition holds up to 10GB
  • Max item size: 400KB (much smaller than MongoDB’s 16MB)
  • Table size: unlimited - partitions are added automatically
  • Latency: single-digit milliseconds. With DAX (in-memory cache): microsecond reads
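The per-partition limits combine into a rough partition-count estimate: DynamoDB needs enough partitions to cover your RCU, WCU, and storage simultaneously, whichever is largest. A sketch of that arithmetic (illustrative only; real partition management is internal to the service):

```python
import math

# Rough partition count: enough partitions to satisfy throughput
# (3,000 RCU / 1,000 WCU each) and storage (10GB each) at once.
# Illustrative only; actual partition management is internal to DynamoDB.
def estimated_partitions(rcu, wcu, size_gb):
    return math.ceil(max(rcu / 3000, wcu / 1000, size_gb / 10))

print(estimated_partitions(6000, 500, 15))  # 2 -> driven by reads
print(estimated_partitions(1000, 4000, 5))  # 4 -> driven by writes
```

Notice the asymmetry: a write-heavy table needs three times as many partitions as an equally read-heavy one.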

The catch - hot partitions:

DynamoDB auto-shards, but all traffic for a partition key hits one partition. If your partition key is date and everyone queries today’s data, you’ll hit the 3,000 RCU / 1,000 WCU limit on that single partition while other partitions sit idle.

Bad:  partition_key = "2026-03-14"  (all traffic hits one partition)
Good: partition_key = "user_12345"  (traffic spreads across partitions)
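You can see the effect with a quick simulation: hash each key to one of N partitions and count where the traffic lands (the keys and the MD5-based hash are hypothetical; DynamoDB's internal hashing differs, but the distribution argument is the same):

```python
import hashlib
from collections import Counter

# Map a partition key to one of `n` partitions via a stable hash.
# Hypothetical scheme for illustration; DynamoDB's internal hash differs.
def partition_for(key, n=8):
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n

# 10,000 requests for today's date: every one lands on one partition.
hot = Counter(partition_for("2026-03-14") for _ in range(10_000))
print(len(hot))  # 1 -> a single partition takes all the traffic

# 10,000 requests keyed by user id: traffic spreads across partitions.
spread = Counter(partition_for(f"user_{i}") for i in range(10_000))
print(len(spread))  # 8 -> all partitions share the load
```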

When DynamoDB makes sense:

  • You need zero operational overhead for scaling
  • Access patterns are simple (key-value or key-range lookups)
  • You can model your data around partition keys effectively
  • You’re already in the AWS ecosystem

When it doesn’t: Complex queries, joins, aggregations, or ad-hoc analytics. You’ll pay a fortune in RCUs scanning tables.

Cassandra

Built for writes. Built for distribution. Not built for single-node deployments. The Cassandra documentation covers its architecture and tuning.

Throughput (per node):

  • Writes: ~30,000 ops/sec (mean latency 1.7ms, P99 5.5ms)
  • Reads: 15,000-25,000 ops/sec
  • Cassandra 4.0 improved throughput by 25-33% over 3.11
  • Write-optimized by design: append-only log-structured merge tree means writes are always fast

Memory and storage:

  • Modest RAM needs: 16-64GB recommended per node
  • Heap: 8-16GB. Remaining RAM used for OS page cache
  • Recommended data per node: 500GB-1TB
  • Maximum: 3-5TB per node (but bootstrap and repair times become painful beyond 1TB)
  • Leave 25-50% free disk space for compaction overhead
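The per-node limits above translate into cluster sizing once you factor in replication: usable capacity is raw capacity divided by the replication factor. A sketch (RF=3 and 1TB/node are the assumptions, matching the recommendations above):

```python
import math

# Nodes needed for a target dataset, given per-node data limits and
# replication factor. RF=3 is the common production default.
def nodes_needed(data_tb, per_node_tb=1.0, replication_factor=3):
    raw_tb = data_tb * replication_factor   # every row is stored RF times
    return max(3, math.ceil(raw_tb / per_node_tb))  # 3-node minimum

print(nodes_needed(10))   # 30 -> 10TB at RF=3, 1TB per node
print(nodes_needed(0.5))  # 3  -> the minimum recommended cluster size
```

This is why Cassandra only pays off at real scale: 10TB of logical data means 30 nodes to operate, not 10.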

Cassandra is always distributed:

Unlike the other databases on this list, Cassandra is designed to run as a cluster from day one. A single-node Cassandra deployment is not a recommended production configuration. Minimum recommended: 3 nodes.

When Cassandra makes sense:

  • Write-heavy workloads (logging, time-series, IoT sensor data)
  • Multi-region deployments with tunable consistency
  • Very large datasets (10TB+) that need to scale linearly
  • You’re okay with limited query flexibility (no joins, limited aggregations)

When it doesn’t: Read-heavy workloads with complex queries. Cassandra’s read path (checking multiple SSTables, bloom filters, merge) is fundamentally slower than its write path.

The Decision Framework

Before reaching for sharding, walk through this checklist:

Step 1: Know your actual numbers

| Question | How to find out |
|---|---|
| Current ops/sec? | Application metrics, database slow query log |
| Data size? | pg_database_size(), db.stats(), CloudWatch |
| Growth rate? | Monthly data size trend over the last 6 months |
| Read vs write ratio? | Application metrics or database audit logs |
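Once you have those four numbers, the most useful derived figure is runway: how long until you cross a scaling threshold at your current growth rate. A sketch:

```python
# Months of runway before hitting a data-size threshold, given current
# size and a monthly growth figure (from the 6-month trend above).
def runway_months(current_gb, limit_gb, growth_gb_per_month):
    if growth_gb_per_month <= 0:
        return float("inf")  # not growing: no deadline
    return (limit_gb - current_gb) / growth_gb_per_month

# 200GB today, growing 25GB/month, planning against a 1TB threshold:
print(runway_months(200, 1000, 25))  # 32.0 months -> no rush to shard
```

Nearly three years of runway is a very different conversation than "we need to shard this quarter."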

Step 2: Are you at the limit?

| Database | You’re fine | Getting tight | Time to scale |
|---|---|---|---|
| PostgreSQL | <10K mixed TPS, <200GB | 10K-50K TPS, 200GB-1TB | >50K write TPS, >1TB |
| Redis | <50K ops/sec, <10GB | 50K-100K ops/sec, 10-25GB | >100K ops/sec, >25GB |
| MongoDB | <10K ops/sec, <100GB | 10K-30K ops/sec, 100-256GB | >30K ops/sec, >256GB |
| DynamoDB | <1K WCU/partition | Approaching 1K WCU/partition | Hot partition throttling |
| Cassandra | <10K writes/node | 10K-25K writes/node | >25K writes/node, >1TB/node |

Step 3: Try cheaper alternatives first

Most “we need to shard” conversations should actually be:

  • “We need better indexes”
  • “We need a read replica”
  • “We need connection pooling”
  • “We need to cache hot data in Redis”
  • “We need to archive cold data”

Sharding adds permanent operational complexity. Every query needs routing logic. Cross-shard joins don’t exist. Schema migrations happen N times instead of once. Rebalancing shards is a multi-day operation.

Real-World Scale References

To calibrate your intuition, here’s what well-known companies run on what:

| Company | Database | Scale | Notes |
|---|---|---|---|
| Notion | PostgreSQL | 30M users, single node (until 2021) | Sharded only when tables hit multiple TB |
| Instagram | PostgreSQL | Billions of rows | Sharded, but started single-node |
| Discord | Cassandra (migrated to ScyllaDB) | Trillions of messages | Write-heavy, needed Cassandra’s write throughput |
| Uber | Cassandra + Redis | Millions of trips/day | Cassandra for durable writes, Redis for real-time |
| Lyft | DynamoDB | Millions of rides | Chose DynamoDB for zero-ops scaling |
| Coinbase | MongoDB (migrated to PostgreSQL) | Millions of transactions | Hit MongoDB limits, moved to Postgres |

Why Not Just Use Postgres For Everything?

If Postgres handles 100K+ reads and 20K-70K writes per second on a single node, supports JSON, full-text search, and even time-series with extensions - why bother with anything else?

For most teams, you shouldn’t. Postgres covers 90% of use cases. But here’s the honest 10% where it doesn’t:

| Problem | Why Postgres Struggles | What to Use Instead |
|---|---|---|
| Sub-millisecond cache reads | Disk-based, can’t guarantee <1ms | Redis |
| 100K+ writes/sec sustained | Single-node write scaling hits a wall, sharding is painful | Cassandra, DynamoDB |
| Schema changing every sprint | ALTER TABLE on a 500GB table can lock for minutes | MongoDB |
| Multi-region active-active writes | Postgres multi-master is still immature | Cassandra, DynamoDB, CockroachDB |
| Zero-ops at any scale | Postgres needs vacuuming, connection pooling, replica management, index tuning | DynamoDB |
| 10TB+ with linear scaling | Postgres at 10TB is a full-time DBA job | Cassandra, DynamoDB |
| Session/rate-limit storage | Overkill, and you need TTL expiry built-in | Redis |

The pragmatic approach: Start with Postgres. Add Redis as a cache layer when you need sub-millisecond reads or session storage. That combination handles more scale than 95% of companies will ever need. Only reach for Cassandra, DynamoDB, or MongoDB when you hit a specific wall that Postgres genuinely can’t solve - not because a blog post told you “NoSQL scales better.”

Instagram, Notion, and OpenAI all run on Postgres. If it’s good enough for them at their scale, it’s probably good enough for your startup.

The One Mistake Everyone Makes

Sharding on day one because the system design textbook said so. In reality:

  • A well-tuned PostgreSQL node handles more traffic than 95% of startups will ever see
  • Adding a read replica doubles your read capacity with near-zero complexity
  • Redis in front of your database eliminates most read scaling problems
  • Vertical scaling (bigger machine) is underrated - doubling your cores is cheaper than managing a distributed database

The right time to shard is when you’ve exhausted vertical scaling, read replicas, caching, and query optimization - and you still can’t keep up. For most teams, that day never comes.

Bottom Line

Know the numbers before you architect. A single PostgreSQL node does 100K+ reads/sec. Redis does 500K+ with pipelining. MongoDB handles 256GB before blinking. These are big numbers - bigger than most applications will ever need. Start simple, measure everything, and scale only when the metrics tell you to - not when your gut does.