Distributed databases face a fundamental tradeoff: during a network partition you can have consistency or availability, but not both (the CAP theorem). Spanner chose consistency - and paired it with availability so high that, before Google published the Spanner paper in 2012, the combination was widely assumed to be impractical. Spanner doesn’t violate CAP; it makes partitions rare enough that the tradeoff almost never bites.

Understanding how it works reveals something important about distributed systems design - sometimes the “impossible” is just “requires hardware nobody else built.”

The Core Problem

In a distributed database spanning multiple datacenters, two operations happen simultaneously: a write to a row in Iowa and a read of that row from Singapore. The read should see the write if it happens “after” it - but what does “after” mean when the two operations are happening in locations 15,000km apart?

Traditional distributed systems solve this with two-phase locking (2PL). To read a value, you acquire a lock. Writes require exclusive locks. This guarantees consistency but creates bottlenecks - everything waits on locks - and causes deadlocks that must be detected and resolved.

Spanner’s approach: make time itself reliable. Read-write transactions still take locks internally, but reads - the bulk of most workloads - need no locks at all.

TrueTime

Every Google datacenter has GPS receivers and atomic clocks. TrueTime is the API that exposes time with explicit uncertainty bounds:

TrueTime.now() returns: TTinterval{earliest, latest}

Instead of returning “the current time is exactly 14:32:05.123456789,” TrueTime returns “the current time is between 14:32:05.123456000 and 14:32:05.123457100.”

The interval is typically 1-7 milliseconds wide. That small window of uncertainty is the key to everything.

How Consistency Works Without Locking

Spanner assigns each transaction a commit timestamp: a real wall-clock time that is guaranteed to be:

  1. Greater than the commit timestamp of any transaction that committed before this one started
  2. Within the TrueTime uncertainty interval of “now” at the moment of commit

The commit protocol works like this:

  1. A transaction’s operations are prepared and it is ready to commit
  2. Spanner calls TrueTime.now(), gets the interval [T_early, T_late], and assigns the commit timestamp s = T_late
  3. Spanner waits until TrueTime.now().earliest > s (the “commit wait”)
  4. The commit is made visible

The commit wait - typically a few milliseconds, on the order of the TrueTime uncertainty interval - ensures that by the time any other node can observe the commit, the commit timestamp is definitively in the past for every node in the world.
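The commit-wait loop can be sketched in a few lines. The fixed uncertainty bound and the simple spin-wait are illustrative assumptions, not Spanner’s implementation:

```python
import time

EPSILON_S = 0.004  # assumed fixed TrueTime uncertainty (illustrative)

def tt_now():
    """Current time as an (earliest, latest) interval."""
    t = time.time()
    return (t - EPSILON_S, t + EPSILON_S)

def commit(apply_writes):
    # 1. Pick the commit timestamp: the latest time "now" could be.
    _, s = tt_now()
    # 2. Commit wait: block until s is definitely in the past everywhere,
    #    i.e. even the earliest possible current time exceeds s.
    while tt_now()[0] <= s:
        time.sleep(0.0005)
    # 3. Only now make the writes visible at timestamp s.
    apply_writes(s)
    return s
```

Note the wait lasts roughly twice the uncertainty bound: the timestamp starts at the top of the interval, and we wait until even the bottom of the interval has passed it.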

Any subsequent read with a timestamp after the commit timestamp will see the write. This is called “external consistency” or “strict serializability” - the gold standard of distributed consistency.

Why This Is Remarkable

Every distributed database that achieved comparable consistency before Spanner did it through explicit coordination for ordering: consensus rounds, distributed locks, or a global sequence-number server. All of these create coordination bottlenecks.

Spanner still uses Paxos for replication (and two-phase commit across Paxos groups), but it avoids coordination for *ordering* by making time reliable enough to serve as the ordering mechanism. If every datacenter agrees on “what time it is” within a bounded error, you can use timestamps as a causal order without extra cross-datacenter messages for each transaction.

The catch: this only works if you have atomic clocks and GPS receivers in every datacenter - a requirement that, at the time, only Google could satisfy. AWS (with its Time Sync Service), Azure, and cloud providers generally have since improved time synchronization, though none match Google’s TrueTime bounds.

The Practical Architecture

Spanner’s data layout:

  • Data is stored in tablets (like Bigtable tablets), distributed across Paxos groups
  • Each Paxos group spans at least three replicas across different zones/regions
  • A leader replica handles writes and coordinates the TrueTime commit protocol
  • Any replica can serve reads with timestamps in the “safe range” (past the commit wait window)

For reads:

  • Snapshot reads at a specific timestamp can be served from any replica without any coordination - they just check the timestamp against the local replica’s data
  • Strong reads (reads of the latest committed data) are assigned a timestamp of TrueTime.now().latest, and may briefly wait until the serving replica has applied all writes up to that timestamp

This means read-heavy workloads distribute globally without any cross-datacenter coordination. A read in Singapore of data written in Iowa requires no network round-trips to Iowa.
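One way to picture a lock-free snapshot read is a replica that keeps multi-versioned data plus a “safe timestamp” below which it is fully caught up. This is a simplified sketch with illustrative names, not Spanner’s actual replica logic:

```python
class Replica:
    """Toy replica serving lock-free MVCC snapshot reads."""

    def __init__(self):
        self.t_safe = 0.0   # all writes with ts <= t_safe have been applied
        self.versions = {}  # key -> list of (commit_ts, value)

    def apply_write(self, key, value, commit_ts):
        """Apply a replicated write and advance the safe timestamp."""
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.t_safe = max(self.t_safe, commit_ts)

    def snapshot_read(self, key, read_ts):
        """Serve a read at read_ts with no locks and no coordination."""
        # If the replica hasn't caught up past read_ts, it must wait
        # (or the client can be redirected to a fresher replica).
        if read_ts > self.t_safe:
            raise RuntimeError("replica not caught up; wait for t_safe")
        # Return the newest version at or before read_ts.
        candidates = [(ts, v) for ts, v in self.versions.get(key, [])
                      if ts <= read_ts]
        return max(candidates)[1] if candidates else None
```

Because versions are immutable once written and the timestamp fully determines the answer, two replicas on opposite sides of the planet return identical results for the same (key, read_ts) pair.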

The Latency Profile

Operation                      Latency
Single-region read             <1ms
Multi-region read (snapshot)   <10ms
Single-region write (commit)   5-15ms (includes commit wait)
Multi-region write (commit)    20-100ms (Paxos consensus across regions)

Multi-region write latency is dominated by the speed of light between datacenters plus the commit wait. You cannot get much faster than 20-30ms for a write that must be durable in multiple US regions.

Spanner vs CockroachDB vs YugabyteDB

CockroachDB and YugabyteDB are Spanner-inspired systems available outside Google:

Property                Spanner                  CockroachDB                  YugabyteDB
Clock source            TrueTime (GPS/atomic)    HLC (hybrid logical clock)   HLC
Consistency guarantee   External consistency     Serializable                 Serializable
Commit wait required?   Yes (TrueTime)           No (HLC-based)               No (HLC-based)
Uncertainty window      1-7ms                    Higher (NTP-based)           Higher

HLC (Hybrid Logical Clock) achieves similar consistency guarantees without hardware clocks by using a combination of physical time and logical counters. The consistency guarantees are slightly weaker (serializable rather than strict serializable) but work on commodity hardware.
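A minimal hybrid logical clock can be sketched like this (simplified and illustrative, not either database’s actual code). Each timestamp is a (physical, counter) pair; the counter breaks ties when the physical clock hasn’t advanced or lags behind a remote node:

```python
import time

class HLC:
    """Minimal hybrid logical clock: (physical time, logical counter)."""

    def __init__(self, physical=time.time):
        self.pt = physical  # physical clock source (injectable for tests)
        self.l = 0.0        # max physical time observed so far
        self.c = 0          # logical counter for ties

    def now(self):
        """Timestamp a local or send event."""
        pt = self.pt()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        rl, rc = remote
        pt = self.pt()
        if pt > self.l and pt > rl:
            self.l, self.c = pt, 0
        elif rl > self.l:
            self.l, self.c = rl, rc + 1
        elif self.l > rl:
            self.c += 1
        else:  # self.l == rl
            self.c = max(self.c, rc) + 1
        return (self.l, self.c)
```

Timestamps compare as tuples, so causally related events always order correctly even when physical clocks disagree - the price is that, unlike TrueTime, an HLC timestamp says nothing about real wall-clock order between causally unrelated events.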

When You Actually Need This

Spanner solves a specific class of problems:

  • Financial systems that need globally consistent account balances
  • Inventory systems where overselling is catastrophic
  • Any globally distributed write-heavy workload that cannot tolerate eventual consistency

For most applications, a single-region Postgres with read replicas is sufficient and much simpler. The Spanner approach is engineering for a problem space that requires genuine global consistency, which is a smaller category than “globally deployed application.”

Bottom Line

Spanner’s insight is that clock synchronization can replace distributed coordination as a consistency mechanism. TrueTime makes this work in practice by bounding the uncertainty in wall-clock time to a small enough interval that a short commit wait provides ironclad ordering guarantees.

The architecture is elegant and the consistency guarantees are the strongest in the industry. The practical lesson for most engineers is the design principle: when you need coordination between distributed components, ask whether improving the reliability of a shared resource (in this case, time) can replace explicit coordination protocols.