1. Requirements & Scope (5 min)
Functional Requirements
- Rider requests a ride (pickup location, destination) and the system finds the best available driver within seconds
- Drivers continuously report their GPS location; the system maintains a real-time view of all available drivers
- Matching algorithm considers distance, ETA, driver rating, vehicle type, and supply-demand balance
- Support ride types: standard, pool/shared, premium, XL — each with different matching criteria
- Handle surge pricing zones and dynamically adjust pricing based on supply-demand ratio
Non-Functional Requirements
- Availability: 99.99% — riders cannot request rides if matching is down. Revenue loss is immediate.
- Latency: Match a rider to a driver within 3-5 seconds. Location updates processed within 1 second.
- Consistency: A driver must never be matched to two riders simultaneously (strong consistency on driver state). Ride pricing can be eventually consistent.
- Scale: 5M concurrent drivers, 20M rides/day, ~1.25M location updates/sec globally (5M drivers reporting every 4 seconds — see Estimation below).
- Freshness: Driver location must be accurate within 5 seconds for meaningful ETA calculations.
2. Estimation (3 min)
Traffic
- Location updates: 5M drivers × 1 update/4 seconds = 1.25M updates/sec
- Ride requests: 20M rides/day = ~230 rides/sec (peak 5x = 1,150/sec)
- Match queries: Each ride request queries nearby drivers → geospatial index lookup (1,150 QPS at peak)
- ETA calculations: Per match attempt, compute ETA for ~10 candidate drivers = 11,500 ETA computations/sec at peak
Storage
- Driver locations: 5M drivers × 64 bytes (id + lat/lng + timestamp + status) = 320MB (fits in memory)
- Ride history: 20M rides/day × 1KB × 365 = 7.3TB/year
- Geospatial index: In-memory spatial index of 5M points = ~500MB
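The arithmetic above can be sanity-checked with a short script (constants taken straight from this section; 1KB per ride is approximated as 1,000 bytes):

```python
# Back-of-envelope check of the traffic and storage estimates above.
DRIVERS = 5_000_000
UPDATE_INTERVAL_S = 4
RIDES_PER_DAY = 20_000_000
PEAK_FACTOR = 5
SECONDS_PER_DAY = 86_400

location_updates_per_sec = DRIVERS // UPDATE_INTERVAL_S           # 1.25M/sec
ride_requests_per_sec = RIDES_PER_DAY // SECONDS_PER_DAY          # ~230/sec
peak_requests_per_sec = ride_requests_per_sec * PEAK_FACTOR       # ~1,150/sec

driver_state_bytes = DRIVERS * 64                                 # 320 MB
ride_history_bytes_per_year = RIDES_PER_DAY * 1_000 * 365         # ~7.3 TB

print(location_updates_per_sec, peak_requests_per_sec, driver_state_bytes)
```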
Key Insight
This is a real-time geospatial matching problem. The core challenge is maintaining an up-to-date spatial index of 5M moving points with 1.25M updates/sec, then efficiently querying it to find optimal matches. The matching itself is a constrained optimization problem (minimize total wait time across all concurrent requests, not just each individual one).
3. API Design (3 min)
Driver Location Updates
POST /v1/drivers/{driver_id}/location
Body: { "lat": 37.7749, "lng": -122.4194, "heading": 270,
"speed_mph": 25, "timestamp": 1708632060, "status": "available" }
Response 200: { "ack": true }
// Sent every 4 seconds when driver app is active
Ride Request
POST /v1/rides/request
Body: {
"rider_id": "rider_123",
"pickup": { "lat": 37.7749, "lng": -122.4194, "address": "123 Market St" },
"destination": { "lat": 37.7849, "lng": -122.4094, "address": "456 Mission St" },
"ride_type": "standard", // standard, pool, premium, xl
"payment_method_id": "pm_abc"
}
Response 202: { "ride_id": "ride_xyz", "status": "matching",
"estimated_pickup_eta": 180, "surge_multiplier": 1.2 }
Match Notification (push to driver)
// Server → Driver via WebSocket / push notification
{
"type": "ride_offer",
"ride_id": "ride_xyz",
"pickup": { "lat": 37.7749, "lng": -122.4194 },
"destination": { "lat": 37.7849, "lng": -122.4094 },
"estimated_fare": "$12-15",
"accept_timeout_seconds": 15
}
// Driver accepts/declines
POST /v1/rides/{ride_id}/respond
Body: { "driver_id": "driver_456", "action": "accept" }
Key Decisions
- Ride request returns 202 (Accepted) immediately — matching happens async, result pushed via WebSocket
- Driver has 15 seconds to accept before the offer goes to the next candidate
- Location updates are fire-and-forget (no response body needed) to minimize latency
4. Data Model (3 min)
Real-Time Driver State (Redis + Geospatial Index)
// Redis GeoSet for spatial queries
GEOADD drivers:available {lng} {lat} {driver_id}
GEORADIUS drivers:available {lng} {lat} 5 km COUNT 20 ASC
// Redis Hash for driver metadata
HSET driver:{driver_id}
status: "available" // available, on_trip, offline
lat: 37.7749
lng: -122.4194
heading: 270
speed: 25
vehicle_type: "standard"
rating: 4.85
last_update: 1708632060
current_ride_id: null
Rides (PostgreSQL — ACID for ride lifecycle)
Table: rides
ride_id (PK) | uuid
rider_id (FK) | uuid
driver_id (FK) | uuid -- NULL until matched
status | enum('matching', 'driver_assigned', 'en_route_pickup',
'arrived', 'in_progress', 'completed', 'cancelled')
pickup_lat | decimal(9,6)
pickup_lng | decimal(9,6)
dest_lat | decimal(9,6)
dest_lng | decimal(9,6)
ride_type | enum('standard', 'pool', 'premium', 'xl')
surge_multiplier | decimal(3,2)
fare_cents | int -- NULL until completed
requested_at | timestamp
matched_at | timestamp
completed_at | timestamp
Geospatial Index (H3 or S2 cells, in-memory)
// H3 resolution 9 (~174m hexagons) for fine-grained driver lookup
Map<H3CellIndex, Set<DriverId>>
// On location update:
//   1. Remove driver from old cell
//   2. Add driver to new cell
// On query:
//   Expand to neighboring cells for radius search
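A minimal sketch of the cell-index maintenance, using a flat lat/lng grid as a stand-in for real H3 cells (the `cell_of` bucketing is an illustrative assumption, not H3's actual indexing):

```python
from collections import defaultdict

CELL_DEG = 0.002  # ~200m buckets — a rough stand-in for H3 resolution 9

def cell_of(lat: float, lng: float) -> tuple:
    """Map a coordinate to a grid cell (toy replacement for h3.latlng_to_cell)."""
    return (int(lat / CELL_DEG), int(lng / CELL_DEG))

class DriverIndex:
    def __init__(self):
        self.cell_to_drivers = defaultdict(set)  # Map<cell, Set<driver_id>>
        self.driver_to_cell = {}

    def update_location(self, driver_id: str, lat: float, lng: float):
        new_cell = cell_of(lat, lng)
        old_cell = self.driver_to_cell.get(driver_id)
        if old_cell == new_cell:
            return  # still in the same cell — nothing to move
        if old_cell is not None:
            self.cell_to_drivers[old_cell].discard(driver_id)  # remove from old cell
        self.cell_to_drivers[new_cell].add(driver_id)          # add to new cell
        self.driver_to_cell[driver_id] = new_cell

idx = DriverIndex()
idx.update_location("d1", 37.7749, -122.4194)
idx.update_location("d1", 37.7790, -122.4194)  # driver moved into another cell
```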
Why Redis GeoSet + H3?
- Redis GeoSet provides built-in GEORADIUS for quick nearest-driver lookups
- H3 hexagonal grid used as a secondary index for supply-demand heat maps and surge pricing zones
- PostgreSQL for ride records (durability and relational queries for billing, analytics)
5. High-Level Design (12 min)
Architecture
Driver path:
  Driver App → (location updates every 4s) → Location Service
    → Kafka (location-updates topic)
    → Location Processor:
      1. Update Redis GeoSet
      2. Update H3 supply map
      3. Update driver metadata

Rider path:
  Rider App → (ride request) → Ride Service
    → Matching Service:
      → Find candidates (geospatial query)
      → Rank candidates (scoring)
      → Dispatch to best driver
      → Wait for accept/decline
        → If declined or timed out → next candidate
        → If accepted → ride confirmed

Supply-Demand Service
  → Reads H3 supply map
  → Reads ride request rate
  → Computes surge multiplier per zone

Notification Service
  → Push ride offer to driver
  → Push match result to rider

ETA Service
  → OSRM / Valhalla routing engine
  → Computes driving time from driver to pickup
  → Considers real-time traffic data
Matching Flow (Critical Path)
Rider requests ride
→ Ride Service creates ride record (status: matching)
→ Matching Service triggered:
1. Query Redis GeoSet: GEORADIUS from pickup, 5km radius, top 20 nearest
2. Filter: only drivers with matching vehicle_type, status=available
3. For each candidate (parallel):
→ ETA Service: compute drive time from driver location to pickup
4. Score each candidate:
score = w1/ETA + w2*driver_rating + w3*acceptance_rate - w4*surge_distance
5. Sort by score, pick top candidate
6. Send ride offer to driver #1 via WebSocket
7. Start 15-second timer
→ If accepted: lock driver (status=on_trip), notify rider
→ If declined/timeout: offer to driver #2
→ After 3 declines: expand radius to 10km, repeat
→ After 5 total attempts: notify rider "no drivers available"
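Steps 4-5 of the flow above can be sketched as follows; the weights, field names, and candidate values are illustrative assumptions (the surge-distance term is omitted for brevity):

```python
def score(candidate: dict, w1=100.0, w2=1.0, w3=2.0) -> float:
    """Higher is better: favor low ETA, high rating, high acceptance rate.
    Weights are made up for illustration; real weights would be tuned."""
    return (w1 / candidate["eta_s"]
            + w2 * candidate["rating"]
            + w3 * candidate["acceptance_rate"])

def rank_candidates(candidates: list) -> list:
    """Score every candidate, then sort best-first to get the offer order."""
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"driver_id": "d1", "eta_s": 180, "rating": 4.9, "acceptance_rate": 0.95},
    {"driver_id": "d2", "eta_s": 120, "rating": 4.7, "acceptance_rate": 0.80},
    {"driver_id": "d3", "eta_s": 300, "rating": 5.0, "acceptance_rate": 0.99},
]
offer_order = [c["driver_id"] for c in rank_candidates(candidates)]
```

With these toy weights, a slightly slower but more reliable driver can outrank the lowest-ETA one, which is exactly why the score is a weighted blend rather than pure ETA.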
Components
- Location Service: Ingests 1.25M location updates/sec. Publishes to Kafka for async processing. Stateless, horizontally scaled.
- Location Processor: Consumes Kafka, updates Redis GeoSet and H3 supply map. Partitioned by city/region.
- Ride Service: Manages ride lifecycle state machine. Saga coordinator for the match-dispatch-payment flow.
- Matching Service: Core intelligence. Queries geospatial index, runs scoring algorithm, orchestrates the offer-accept cycle.
- ETA Service: Wraps OSRM (Open Source Routing Machine) with real-time traffic adjustments. Caches common routes.
- Supply-Demand Service: Computes surge multipliers per H3 cell by comparing available drivers to incoming ride requests over a rolling 5-minute window.
- Notification Service: WebSocket connections to all active drivers and riders. Delivers match offers and status updates.
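The ETA Service's route caching could look like the sketch below: a short-TTL cache keyed by coarse origin/destination buckets (the ~1km bucketing and 60s TTL are assumptions; the routing engine is treated as an opaque callable):

```python
import time

class EtaCache:
    """Cache ETAs by coarse ~1km origin/destination buckets for a short TTL,
    since traffic-adjusted ETAs go stale quickly."""

    def __init__(self, compute_eta, ttl_s: float = 60.0, clock=time.monotonic):
        self.compute_eta = compute_eta  # e.g. a call into OSRM/Valhalla
        self.ttl_s = ttl_s
        self.clock = clock
        self._cache = {}

    def _bucket(self, lat, lng):
        return (round(lat, 2), round(lng, 2))  # ~1km grid bucket

    def eta(self, origin, dest):
        key = (self._bucket(*origin), self._bucket(*dest))
        hit = self._cache.get(key)
        now = self.clock()
        if hit and now - hit[1] < self.ttl_s:
            return hit[0]  # fresh cached ETA — skip the routing engine
        value = self.compute_eta(origin, dest)
        self._cache[key] = (value, now)
        return value
```

Nearby pickups within the same bucket reuse one routing computation, which matters at ~11.5K ETA computations/sec at peak.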
6. Deep Dives (15 min)
Deep Dive 1: Geospatial Indexing with H3 and Efficient Nearest-Driver Search
The problem: Finding the nearest available drivers among 5M moving points, where each point moves every 4 seconds. Traditional R-tree spatial indexes degrade with high update rates.
Solution: H3 hexagonal grid + Redis GeoSet hybrid
H3 indexing:
- Uber’s H3 system divides the earth into hexagonal cells at multiple resolutions
- Resolution 7 (~5.16 km² per cell): used for surge pricing zones
- Resolution 9 (~0.105 km² per cell, ~174 m edge length): used for driver lookup
- Each driver is assigned to exactly one H3 cell at resolution 9
Driver lookup algorithm:
1. Convert pickup location to H3 cell at resolution 9
2. Get the cell's k-ring neighbors (k=1 → 7 cells, k=2 → 19 cells, k=3 → 37 cells)
3. For each cell, get all available drivers (O(1) lookup per cell)
4. Compute straight-line distance to filter obvious non-candidates
5. For top 10-15 candidates, compute actual driving ETA via routing engine
6. If fewer than 5 candidates found, expand k-ring (k++)
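The widening search in steps 2-3 and 6 can be sketched with a square grid standing in for H3's hexagons (Chebyshev rings are an assumption here; a real implementation would use H3's `grid_disk`):

```python
def k_ring(cell, k):
    """All grid cells within Chebyshev distance k — a square stand-in
    for H3's hexagonal k-ring."""
    cx, cy = cell
    return [(cx + dx, cy + dy)
            for dx in range(-k, k + 1) for dy in range(-k, k + 1)]

def find_candidates(cell_to_drivers, pickup_cell, min_candidates=5, max_k=4):
    """Gather drivers from the ring, widening k until enough are found."""
    found = set()
    k = 1
    while k <= max_k:
        found = set()
        for cell in k_ring(pickup_cell, k):
            found |= cell_to_drivers.get(cell, set())  # O(1) lookup per cell
        if len(found) >= min_candidates:
            return found, k
        k += 1  # too few candidates: expand the ring
    return found, max_k

cells = {(0, 1): {"a"}, (1, 0): {"b"}, (2, 0): {"c"},
         (0, 2): {"d"}, (-2, 0): {"e"}, (0, -2): {"f"}}
found, k = find_candidates(cells, (0, 0))
# k=1 yields only 2 drivers, so the search widens to k=2 and finds all 6
```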
Why H3 over S2 or geohash?
- Hexagonal cells have uniform distance from center to all edges (unlike square geohash cells where corners are sqrt(2) farther)
- All neighbors are equidistant (6 neighbors, each sharing an edge) — cleaner radius expansion
- H3 is what Uber actually uses in production, battle-tested for ride matching
Update throughput:
- Each location update: 1 Redis GEOADD (O(log N)) + 1 H3 cell reassignment (O(1) hash map update)
- 1.25M updates/sec distributed across 10 city-level shards = 125K updates/shard/sec
- Redis benchmarks show 500K+ GEOADD/sec on a single instance — well within budget
Deep Dive 2: Matching Algorithm — Closest vs. ETA-Based vs. Global Optimization
Closest driver (naive):
Sort drivers by straight-line distance to pickup → offer to closest
- Problem: Closest driver may be across a river, on a highway, or facing the wrong direction. A driver 1km away but heading toward the rider may arrive faster than one 500m away driving in the opposite direction.
ETA-based matching (Uber’s approach):
For each candidate driver:
ETA = routing_engine.compute_time(driver.location, pickup.location, traffic=real_time)
score = -ETA // minimize ETA
Pick driver with lowest ETA
- Better than distance but still greedy — optimizes each ride independently.
Batched global optimization (advanced):
Every 2 seconds, collect all unmatched ride requests in a city:
requests = [R1, R2, R3, ...]
available_drivers = [D1, D2, D3, ...]
Solve assignment problem:
Minimize total_wait_time = sum(ETA(Ri, Dj)) for all matched pairs (Ri, Dj)
Subject to: each driver assigned to at most 1 ride, each ride gets exactly 1 driver
Algorithm: Hungarian algorithm or auction algorithm
- Hungarian: O(n^3) — works for batches of ~100 requests
- Auction algorithm: faster in practice for sparse assignment matrices
Why batched matching is better:
Example with 2 riders (A, B) and 2 drivers (X, Y):
ETA(A→X) = 3 min, ETA(A→Y) = 10 min
ETA(B→X) = 4 min, ETA(B→Y) = 5 min
Greedy, A first: A takes X (3 min), B takes Y (5 min) → total = 8 min ✓
Greedy, B first: B takes X (4 min), A takes Y (10 min) → total = 14 min ✗
Batched optimal: A→X, B→Y → total = 8 min regardless of arrival order
Greedy results depend on which request arrived first; batching removes that sensitivity.
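A tiny brute-force check of the greedy-vs-batched comparison (the ETA matrix is a hand-picked illustration; real batches would use the Hungarian or auction algorithm rather than enumerating permutations):

```python
from itertools import permutations

etas = {  # ETA(rider, driver) in minutes — illustrative numbers
    ("A", "X"): 3, ("A", "Y"): 10,
    ("B", "X"): 4, ("B", "Y"): 5,
}
riders, drivers = ["A", "B"], ["X", "Y"]

def greedy_total(order):
    """Match riders one at a time in arrival order,
    each taking the best still-free driver."""
    free, total = set(drivers), 0
    for r in order:
        best = min(free, key=lambda d: etas[(r, d)])
        total += etas[(r, best)]
        free.remove(best)
    return total

def optimal_total():
    """Batched: enumerate every full assignment, take the minimum total wait."""
    return min(sum(etas[(r, d)] for r, d in zip(riders, perm))
               for perm in permutations(drivers))

print(greedy_total(["A", "B"]), greedy_total(["B", "A"]), optimal_total())
```

Greedy totals 8 or 14 minutes depending on arrival order; the batched optimum is always 8.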
Production approach: hybrid
- Immediate requests (rider waiting > 5 seconds) → greedy ETA-based matching
- Fresh requests (within 2-second batch window) → batched optimization
- This bounds worst-case wait time while improving average
Deep Dive 3: Supply-Demand Balancing and Surge Pricing
The problem: At 2 AM, there are few drivers. During a concert ending, there are 5,000 ride requests in 5 minutes from one location. Without surge pricing, all riders wait 30+ minutes.
Surge computation:
For each H3 cell at resolution 7 (city-block level):
supply = count(available_drivers in cell)
demand = count(ride_requests in last 5 minutes)
surge_multiplier = max(1.0, demand / (supply * capacity_factor))
capacity_factor = 3 (each driver can serve ~3 rides per hour)
Example: 10 drivers, 50 requests → 50 / (10 * 3) = 1.67x surge
Capped at 3.0x to prevent PR disasters
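The surge formula above as a small function (the 1.0x floor, 3.0x cap, and capacity factor of 3 come from this section; the zero-supply behavior is an assumption):

```python
def surge_multiplier(available_drivers: int, requests_last_5m: int,
                     capacity_factor: float = 3.0, cap: float = 3.0) -> float:
    """demand / (supply * capacity_factor), floored at 1.0 and capped at 3.0."""
    if available_drivers == 0:
        return cap  # no supply at all: assume the maximum allowed surge applies
    raw = requests_last_5m / (available_drivers * capacity_factor)
    return round(min(cap, max(1.0, raw)), 2)

surge_multiplier(10, 50)   # the worked example: 50 / (10 * 3) = 1.67x
surge_multiplier(100, 50)  # ample supply → floored at 1.0x
surge_multiplier(2, 100)   # extreme imbalance → capped at 3.0x
```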
Dynamic supply repositioning:
- When surge is detected in a zone, send “demand heat map” to nearby idle drivers
- Nudge message: “Higher earnings in Financial District — 1.8x surge”
- Predictive: use historical data (events, weather, time of day) to pre-position drivers
- Stadium event ending at 10 PM → start nudging drivers to the area at 9:45 PM
Surge pricing mechanics:
Rider requests ride in surge zone:
1. Show surge multiplier upfront: "1.5x surge pricing — estimated fare $18-22"
2. Rider must confirm ("I accept the higher fare")
3. Surge multiplier locked at time of request (doesn't change during matching)
4. Driver sees earnings boost: base_fare * surge_multiplier
Fairness safeguards:
- Surge multiplier smoothed over 5-minute windows (no sudden 1.0x → 3.0x spikes)
- Maximum surge cap: 3.0x (configurable per city/regulation)
- During natural disasters: surge automatically disabled (regulatory requirement)
- Wait time guarantee: if surge is active, estimated pickup time must be under 10 minutes
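One way to implement the smoothing safeguard is to bound how far the published multiplier can move per update window; this rate-limit approach and the 0.25x step size are assumptions, not a stated mechanism:

```python
def smooth_surge(previous: float, target: float, max_step: float = 0.25) -> float:
    """Move the published multiplier toward the computed target by at most
    max_step per 5-minute window, so 1.0x can't jump straight to 3.0x."""
    delta = max(-max_step, min(max_step, target - previous))
    return round(previous + delta, 2)

# A sudden demand spike: the raw target jumps from 1.0x to 3.0x,
# but the published surge ramps up gradually.
published = 1.0
ramp = []
for _ in range(4):
    published = smooth_surge(published, 3.0)
    ramp.append(published)
# ramp climbs in 0.25x steps instead of spiking
```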
7. Extensions (2 min)
- Ride pooling (UberPool): Match multiple riders heading in similar directions to the same driver. Solve a vehicle routing problem (VRP) — group requests with compatible routes, minimize total detour. Offer riders a discount (30%) for sharing.
- Driver ETA prediction with ML: Replace static routing with ML model trained on historical trip data. Features: time of day, weather, events, road conditions. Predict actual arrival time more accurately than shortest-path algorithms.
- Multi-modal matching: Extend to bikes, scooters, and public transit. Rider requests “cheapest option” → system returns options: UberX ($12, 8 min), scooter ($3, 15 min), bus+walk ($2.50, 25 min).
- Scheduled rides: Rider books a ride for 6 AM tomorrow. System pre-assigns a driver 15 minutes before, guaranteeing availability. Requires demand forecasting and driver incentive system.
- International expansion: Each city runs as an independent cell (its own Redis, matching service). No cross-city dependency. Deploy in new cities by spinning up a city-cell and configuring local parameters (base fare, surge caps, vehicle types).