“Redis is single-threaded.” You’ve heard this in every system design interview, every blog post, every tech talk. It was true in 2009. It’s not true anymore - and hasn’t been since 2020. But the way Redis adopted multithreading is more interesting than just flipping a switch. It’s a story about knowing exactly where your bottleneck is and only parallelizing that part.

Why Single-Threaded Was the Right Call

When Salvatore Sanfilippi built Redis in 2009, he made what seemed like a strange choice - a single-threaded event loop for a high-performance database. But it wasn’t strange at all. It was the smartest possible design for an in-memory data store.

The core insight: When all data lives in RAM and most operations are O(1) or O(log N), CPU time per operation is tiny. A single GET takes about 0.1 microseconds to execute. The bottleneck was never computation - it was coordination.

A single thread eliminates:

  • Locks and synchronization. INCR, LPUSH, and every other command is inherently atomic. No mutexes needed. An uncontended mutex costs 40-100 CPU cycles. Under contention, 10,000+ cycles
  • Context switching. Thread context switches cost thousands of CPU cycles including kernel transitions and state restoration. A single thread never pays this
  • Cache line ping-ponging. Multi-threaded access to shared data forces CPU cache invalidation across cores via MESI protocol. L1 cache hit: ~1.2ns. Main memory access after a cache miss: 60-100ns. That’s a 50x penalty
  • Race conditions. Zero concurrency bugs. Zero. The codebase is dramatically simpler to reason about, debug, and maintain

The Event Loop

Redis uses epoll (Linux) or kqueue (macOS) to monitor thousands of connections with a single thread:

while (server is running) {
    process time events (expiry, background tasks)
    process file events (client reads/writes)
}

One thread, one loop, processing one command at a time. And it hits 80,000-100,000 ops/sec without pipelining. For 2009, that was screaming fast. For most applications today, it’s still more than enough.

Where Single-Threaded Hit a Wall

As Redis became the default caching layer for millions of applications, workloads grew. And the bottleneck revealed itself - not where you’d expect.

The CPU wasn’t busy executing commands. It was busy moving bytes.

For a simple GET key returning a small value, the actual data lookup takes ~0.1 microseconds. But the network I/O around it - reading from the socket, parsing the RESP protocol, writing the response - consumes 10-100x more CPU time than the command itself.

[read from socket]     ~1-5 microseconds     ← bottleneck
[parse RESP protocol]  ~0.5-2 microseconds   ← bottleneck
[execute GET command]  ~0.1 microseconds      ← fast
[write response]       ~1-5 microseconds     ← bottleneck

At 100,000 ops/sec, a single core is saturated - not by data operations, but by network I/O. The main thread spends ~90% of its time on socket reads/writes and protocol parsing, and ~10% actually executing commands.

This meant the single-threaded model was leaving 7+ CPU cores idle on a modern server while one core was maxed out doing I/O work.

Redis 4.0 (2017) - The First Threads

The first multithreading in Redis wasn’t about performance. It was about not blocking your entire server for 10 seconds.

The problem: Deleting a key with 50 million elements via DEL blocked the entire server. No reads, no writes, nothing - until all 50 million elements were freed. Salvatore described it: “if you send Redis ‘DEL mykey’ and your key happens to have 50 million objects, the server will block for seconds.”

The solution: Background threads for expensive memory operations.

DEL bigkey           # blocks main thread until done (old behavior)
UNLINK bigkey        # removes key instantly, frees memory in background
FLUSHALL ASYNC       # clears database in background
FLUSHDB ASYNC        # same, per-database

UNLINK is clever. It checks the deallocation cost of an object. If it’s small (a simple string), it frees immediately like DEL. If it’s large (a hash with millions of fields), it removes the key from the keyspace instantly - making it invisible to all clients - then ships the actual memory reclamation to a background thread.

This required significant refactoring. Salvatore rewrote ~800 lines of “highly bug-sensitive” code over multiple weeks to eliminate shared object references from aggregate data types, making it safe to free them from a different thread.

Configuration options added:

lazyfree-lazy-eviction yes     # async eviction when maxmemory hit
lazyfree-lazy-expire yes       # async deletion of expired keys
lazyfree-lazy-server-del yes   # implicit DEL behaves like UNLINK

This was a surgical change - background threads only for memory deallocation, nothing else. Command execution stayed single-threaded.

Redis 6.0 (2020) - IO Threads

This was the big one. Redis 6.0 introduced IO threads - dedicated threads for reading from and writing to client sockets, while keeping command execution on the main thread.

The Architecture

              READ PHASE                    EXECUTE              WRITE PHASE
         ┌──────────────────┐          ┌─────────────┐     ┌──────────────────┐
Clients ─→ IO Thread 1      │          │             │     │ IO Thread 1      ├─→ Clients
Clients ─→ IO Thread 2      ├─ parse ─→│ Main Thread ├─ ─ →│ IO Thread 2      ├─→ Clients
Clients ─→ IO Thread 3      │          │ (commands)  │     │ IO Thread 3      ├─→ Clients
Clients ─→ Main (as IO 0)   │          │             │     │ Main (as IO 0)   ├─→ Clients
         └──────────────────┘          └─────────────┘     └──────────────────┘

What IO threads do:

  • Read data from client sockets (read() syscall)
  • Parse the RESP protocol into commands
  • Write responses back to client sockets (write() syscall)

What the main thread does:

  • Accept new connections
  • Distribute clients to IO threads (round-robin)
  • Execute ALL commands (data access stays single-threaded)
  • Generate responses

The Synchronous Drain Model

Redis 6.0’s approach was straightforward but had a limitation:

  1. Main thread distributes pending clients across IO threads (round-robin)
  2. IO threads read and parse (or write) in parallel
  3. IO threads signal completion via atomic counters
  4. Main thread busy-polls - spinning in a loop checking if all IO threads are done
  5. Main thread executes all parsed commands
  6. Repeat for the write phase

The busy-polling was a deliberate tradeoff. It eliminated the latency of condition variables or mutexes (no kernel involvement), but it burned CPU cycles while waiting. IO threads that had no work would spin for up to 1 million iterations before sleeping.

Lock-free synchronization using just three shared variables:

  • io_threads_pending[id] - atomic counter per thread
  • io_threads_op - global flag: READ or WRITE
  • io_threads_list[id] - per-thread client queue

No mutexes for the hot path. The main thread populates queues, IO threads consume only their own queue.

Configuration

io-threads 4          # number of IO threads (default: 1 = disabled)
io-threads-do-reads yes  # also thread read operations (default: no)

Recommended settings, from redis.conf:

  • 4-core machine: 2-3 IO threads
  • 8-core machine: 6 IO threads
  • “Using more than 8 threads is unlikely to help much”

Limitation: IO threads didn’t work with TLS connections in Redis 6.x.

Redis 7.x (2022) - Holding Pattern

Redis 7 didn’t change the IO threading model. The major IO overhaul was saved for Redis 8.

The most notable threading development for Redis 7 came from AWS, not Redis itself. AWS ElastiCache implemented Enhanced IO Multiplexing on top of Redis 7, where each IO thread pipelines commands from multiple clients into the engine. AWS reported up to 72% better throughput with this enhancement.

Redis 8.0 (2025) - The Async Rewrite

Redis 8 fundamentally redesigned IO threads, fixing three critical problems with the 6.0 model:

Problem 1: Synchronous blocking. The main thread blocked waiting for ALL IO threads to finish. One slow thread stalled everything.

Problem 2: CPU waste. Busy-polling burned CPU cycles. With 8 IO threads, 7 might finish quickly and spin uselessly waiting for the 8th.

Problem 3: No TLS support. Race conditions with TLS connections made IO threads incompatible with encrypted traffic.

The New Model - Event-Driven

         ┌──────────────────────────┐
Clients ─→ IO Thread 1 (own loop)  ├──eventfd──→ ┌─────────────┐
         └──────────────────────────┘              │             │
         ┌──────────────────────────┐              │ Main Thread │
Clients ─→ IO Thread 2 (own loop)  ├──eventfd──→ │ (execute +  │
         └──────────────────────────┘              │  respond)   │
         ┌──────────────────────────┐              │             │
Clients ─→ IO Thread 3 (own loop)  ├──eventfd──→ └──────┬──────┘
         └──────────────────────────┘                     │
                    ↑                                     │
                    └────── responses sent back ──────────┘

Each IO thread now runs its own independent event loop. No more busy-polling. No more synchronous barriers. Communication between IO threads and the main thread uses eventfd or pipes with proper mutex protection.

Key changes:

  • Least-loaded distribution instead of round-robin. Clients are assigned to the IO thread with the fewest active clients
  • TLS fully works - all TLS operations moved entirely to IO threads
  • No busy-polling - threads sleep when idle, wake on events
  • Certain clients stay on the main thread: replicas, pub/sub subscribers, clients in blocking commands, Lua debug sessions

Redis 8.2-8.6 (2025-2026) - Pushing the Ceiling

Each minor release squeezed out more performance from the async IO model:

Redis 8.2:

  • 49% throughput improvement over 8.0 with 8 IO threads
  • Exceeds 1,000,000 ops/sec on a single instance (80% reads, 20% writes)

Redis 8.4:

  • 30% throughput increase for caching workloads (90% GET, 10% SET) over 8.2

Redis 8.6:

  • 3,500,000 ops/sec with 11 IO threads, pipeline size 16, on a 16-core machine
  • Over 5x throughput compared to Redis 7.2 on a single node
  • Up to 35% lower latency for sorted set commands

The Numbers Over Time

Version     Model                            Ops/sec (no pipeline)   Ops/sec (pipeline 16)
Redis 5.x   Single-threaded                  ~100K                   ~500K
Redis 6.0   IO threads (sync drain)          ~200K                   ~900K
Redis 7.2   IO threads (sync drain)          ~200K                   ~900K
Redis 8.0   IO threads (async)               ~300K                   ~1.2M
Redis 8.2   IO threads (async, optimized)    ~450K                   ~1.5M+
Redis 8.6   IO threads (async, 11 threads)   ~700K+                  ~3.5M

From 100K to 3.5M ops/sec - a 35x improvement across nearly a decade of threading evolution.

Why Redis Refuses to Go Fully Multi-Threaded

With IO threads proving so effective, why not go all the way? Make command execution multi-threaded too?

Redis’s official answer: multi-process beats multi-threaded for their use case.

The Math

Multiple Redis instances on one machine scale almost linearly:

Instances   Throughput scaling
1           1x
2           1.98x
4           3.95x
8           7.89x

Near-perfect linear scaling with zero shared state, zero locks, zero complexity.

The Problems with Full Multithreading

Lock overhead. Every data structure - skiplists, hash tables, sorted sets, streams - would need thread-safe access. Adding locks to every GET and SET would eat a significant chunk of the performance gained from parallelism.

NUMA penalties. On multi-socket servers, threads accessing memory on the wrong NUMA node lose up to 80% performance. Multiple smaller processes naturally partition across NUMA nodes.

Copy-on-write costs. Redis forks for RDB snapshots, AOF rewrites, and replication. A single 50GB process forking causes massive COW memory overhead. Multiple 25GB processes fork independently with manageable overhead.

Complexity. Redis’s simplicity is a feature. Making every data structure thread-safe would be a fundamental rewrite - not an incremental improvement.

The Competitors Who Went Multi-Threaded

KeyDB (acquired by Snap): A Redis fork that runs the event loop on multiple threads, guarded by a spinlock on the core hash table. At 4 cores, roughly 66% more ops/sec than single-threaded Redis. But recent benchmarks show Redis 8 with IO threads matching or exceeding KeyDB.

Dragonfly: A ground-up rewrite using shared-nothing multi-threading. Each thread owns a slice of the keyspace (no locks). Claims 25x throughput over single-process Redis.

Redis’s response: A 40-shard Redis Cluster on 64 vCPUs outperformed single-process Dragonfly by 18-43% while using only 40 of 64 cores. Their argument: comparing single-process Redis to multi-threaded Dragonfly is unfair because “this is not how Redis was designed to be used.”

The honest truth: if you have 64 cores and need maximum throughput from a single machine, Redis Cluster with multiple shards is the intended path. IO threads handle the network bottleneck. Multiple processes handle the compute scaling.

Configuration Best Practices

When to enable IO threads:

  • You have 4+ cores available
  • Your workload is network-IO bound (high QPS, simple commands like GET/SET)
  • Many concurrent non-pipelined connections
  • Single-core CPU is saturating while others are idle

When IO threads won’t help:

  • Already using heavy pipelining (main thread becomes the bottleneck)
  • Complex commands (CPU-bound on main thread)
  • Low connection count
  • Fewer than 4 cores

Recommended thread counts:

Cores available   IO threads setting
2                 Don't enable
4                 2-3
8                 6
16                8-11
32+               8-12 (diminishing returns)

Always leave at least 1 core for the OS and background tasks.

CPU pinning for maximum performance:

# Pin Redis to specific cores
numactl -C 2-15 redis-server --io-threads 11

# Keep cores 0-1 for OS and IRQ handling

Monitor with:

# See per-thread CPU usage
top -H -p $(pidof redis-server)

# Redis 8+ thread info
redis-cli INFO THREADS

The Timeline

2009  Redis created - single-threaded event loop
      "The bottleneck is memory, not CPU"

2017  Redis 4.0 - background threads for UNLINK/FLUSH
      "Don't block the main thread for 10 seconds"

2020  Redis 6.0 - IO threads (sync drain model)
      "Network I/O is 90% of CPU usage, parallelize it"

2022  Redis 7.x - no IO threading changes
      AWS adds Enhanced IO Multiplexing on top

2025  Redis 8.0 - async IO threads rewrite
      "Event-driven, not busy-polling"

2025  Redis 8.2 - 1M+ ops/sec single instance

2026  Redis 8.6 - 3.5M ops/sec, 5x over Redis 7
      "IO threads are mature"

Bottom Line

Redis was never “just” single-threaded. It was single-threaded where it mattered (command execution) and gradually added threads where the bottleneck actually was (network I/O, memory deallocation). The result is a system that went from 100K ops/sec to 3.5M ops/sec without sacrificing the simplicity and determinism that made it fast in the first place. The next time someone says “Redis is single-threaded” in an interview, you can tell them the full story.