1. Requirements & Scope (5 min)
Functional Requirements
- Create price alerts — users set alerts on instruments with conditions: price above/below a threshold, or percentage change from a reference price (e.g., “alert me if RELIANCE drops 5% from ₹2500”)
- Real-time matching — alerts must be evaluated against every incoming price tick and trigger within seconds of the condition being met
- Multi-channel notification — triggered alerts are delivered via push notification, SMS, and/or email based on user preference
- Alert lifecycle management — alerts have states: active → triggered → (snoozed | expired | deleted). Users can snooze (re-arm after cooldown), edit, or delete alerts.
- Alert dashboard — users can list, filter, and manage all their alerts (active, triggered history, expired)
Non-Functional Requirements
- Availability: 99.95% — missing an alert during a market crash is unacceptable. Degraded mode: delay delivery, but never lose a triggered alert.
- Latency: < 5 seconds from price crossing the threshold to notification delivery (push). < 30 seconds for SMS/email.
- Scale: 100M total alerts across 10M users. 500K price ticks/sec ingested from market data feed. Average 10 alerts per user.
- Throughput: During volatile markets, up to 1M alerts could trigger within a 1-minute window (e.g., broad market crash).
- Durability: Triggered alerts must be persisted before notification is attempted. At-least-once delivery guarantee for notifications.
2. Estimation (3 min)
Traffic
- Price ticks ingested: 5,000 instruments × 100 ticks/sec = 500K ticks/sec
- Alert evaluations: Each tick must be checked against all active alerts for that instrument. Average 20K active alerts per instrument (100M alerts / 5,000 instruments) → 500K × 20K = 10B comparisons/sec (naive approach — this is why efficient matching is critical)
- Alert triggers: Normal day: ~500K alerts trigger/day. Volatile day: ~5M alerts trigger/day.
- Alert creation: ~1M new alerts/day, ~500K deletions/day
Storage
- Alert records: 100M alerts × 200 bytes = 20 GB — fits in memory for hot path
- Alert history: 500K triggers/day × 300 bytes = 150 MB/day → ~55 GB/year
- Notification logs: 1.5M notifications/day (3 channels avg per trigger) × 200 bytes = 300 MB/day
Key Insight
The core challenge is efficient matching: 500K ticks/sec against 100M alerts. Naive O(alerts_per_instrument) scan per tick is 10B comparisons/sec — too expensive. We need a data structure that answers “which alerts are triggered by price X?” in O(log N + K) where K is the number of triggered alerts. Sorted sets (by trigger price) or interval trees make this possible.
3. API Design (3 min)
Alert CRUD
POST /api/v1/alerts
Headers: Authorization: Bearer <token>
Body: {
"instrument_id": "NSE:RELIANCE",
"condition": "PRICE_ABOVE", // PRICE_ABOVE | PRICE_BELOW | PCT_CHANGE_UP | PCT_CHANGE_DOWN
"threshold": 2600.00, // target price for PRICE_ABOVE/BELOW
"reference_price": 2500.00, // base price for PCT_CHANGE (optional, defaults to current LTP)
"pct_threshold": 5.0, // for PCT_CHANGE conditions
"channels": ["push", "email"], // notification channels
"expires_at": "2025-06-30T15:30:00Z", // optional expiry
"note": "Breakout above resistance" // user's personal note
}
Response 201: {
"alert_id": "alert_abc123",
"status": "ACTIVE",
"created_at": "2025-02-15T10:30:00Z"
}
GET /api/v1/alerts?status=ACTIVE&instrument_id=NSE:RELIANCE — list alerts with filters
GET /api/v1/alerts/{alert_id} — get alert detail
PUT /api/v1/alerts/{alert_id} — modify alert (threshold, channels)
DELETE /api/v1/alerts/{alert_id} — delete alert
POST /api/v1/alerts/{alert_id}/snooze
Body: { "duration_minutes": 30 } — snooze triggered alert
Alert History
GET /api/v1/alerts/history?from=2025-02-01&to=2025-02-15 — triggered alert history
Key Decisions
- Percentage change alerts require a reference_price — this is captured at alert creation time (not recalculated). Otherwise “5% drop” is ambiguous (5% from when?).
- Channel selection is per-alert, not global — a user might want push for price-above alerts but SMS for crash alerts.
- Expiry is optional — alerts without expiry live until triggered or manually deleted. We run a daily cleanup job to expire alerts on instruments that have been delisted.
4. Data Model (3 min)
Alerts Table (PostgreSQL — source of truth)
alerts
├── alert_id UUID PRIMARY KEY
├── user_id BIGINT NOT NULL (FK → users)
├── instrument_id VARCHAR(30) NOT NULL -- "NSE:RELIANCE"
├── condition ENUM('PRICE_ABOVE','PRICE_BELOW','PCT_CHANGE_UP','PCT_CHANGE_DOWN')
├── threshold DECIMAL(12,2) NOT NULL -- computed target price (even for % alerts)
├── reference_price DECIMAL(12,2) -- original reference for % alerts
├── pct_threshold DECIMAL(5,2) -- NULL if not a % alert
├── channels VARCHAR(50) NOT NULL -- comma-separated: "push,email,sms"
├── status ENUM('ACTIVE','TRIGGERED','SNOOZED','EXPIRED','DELETED')
├── note TEXT
├── snooze_until TIMESTAMP -- NULL unless snoozed
├── expires_at TIMESTAMP -- NULL = no expiry
├── triggered_at TIMESTAMP -- NULL until triggered
├── triggered_price DECIMAL(12,2) -- actual price that triggered
├── created_at TIMESTAMP NOT NULL
├── updated_at TIMESTAMP NOT NULL
├── INDEX (instrument_id, status, threshold) -- critical for range queries
├── INDEX (user_id, status) -- for user's alert dashboard
├── INDEX (status, expires_at) -- for expiry cleanup job
└── INDEX (status, snooze_until) -- for snooze re-activation
Trigger History Table
alert_triggers
├── trigger_id UUID PRIMARY KEY
├── alert_id UUID (FK → alerts)
├── user_id BIGINT
├── instrument_id VARCHAR(30)
├── condition VARCHAR(20)
├── threshold DECIMAL(12,2)
├── triggered_price DECIMAL(12,2)
├── triggered_at TIMESTAMP NOT NULL
├── notifications JSONB -- delivery status per channel
└── INDEX (user_id, triggered_at DESC)
In-Memory Alert Index (Redis Sorted Sets — hot path for matching)
Key: alerts:above:{instrument_id}
Score: threshold price
Value: alert_id
Key: alerts:below:{instrument_id}
Score: threshold price (stored raw — ZRANGEBYSCORE can query from either end, so no negation is needed)
Value: alert_id
Example:
ZADD alerts:above:NSE:RELIANCE 2600.00 "alert_abc123"
ZADD alerts:above:NSE:RELIANCE 2700.00 "alert_def456"
On tick LTP = 2650:
ZRANGEBYSCORE alerts:above:NSE:RELIANCE -inf 2650 → returns ["alert_abc123"]
(all alerts with threshold ≤ 2650 are triggered)
Why This Dual-Store Approach?
- PostgreSQL is the durable source of truth — all CRUD operations go here first
- Redis sorted sets provide O(log N + K) matching — we can find all triggered alerts for a price tick without scanning all alerts for that instrument
- On alert creation: write to PostgreSQL, then add to Redis sorted set
- On alert trigger: remove from Redis, update PostgreSQL status, enqueue notification
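The create path of this dual-write can be sketched as follows. This is illustrative only: `db` and `redis` stand in for real clients, `create_alert` is a hypothetical helper, and the `zadd(key, {member: score})` call follows the redis-py style.

```python
def create_alert(db, redis, alert):
    """Dual-write on alert creation: PostgreSQL first (source of truth),
    then the Redis sorted set used by the hot matching path.
    `db` and `redis` are stand-ins for real clients; a reconciliation job
    covers the window where the Redis write fails after the DB commit."""
    db.insert("alerts", alert)  # durable write first
    side = "above" if alert["condition"] in ("PRICE_ABOVE", "PCT_CHANGE_UP") else "below"
    key = f"alerts:{side}:{alert['instrument_id']}"
    redis.zadd(key, {alert["alert_id"]: alert["threshold"]})  # redis-py style mapping
    return key
```

Writing PostgreSQL first means a crash between the two writes leaves an alert that exists durably but is invisible to the matcher — recoverable, unlike the reverse ordering.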
5. High-Level Design (12 min)
System Architecture
┌──────────────┐
│ Exchange │
│ Market Data │
│ Feed │
└──────┬───────┘
│ 500K ticks/sec
▼
┌──────────────────┐ ┌─────────────────────────┐
│ Market Data │────▶│ Kafka │
│ Ingestion │ │ (topic: price-ticks │
│ Service │ │ partitioned by │
└──────────────────┘ │ instrument_group) │
└────────────┬──────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Alert Matcher │ │ Alert Matcher │ │ Alert Matcher │
│ Worker 1 │ │ Worker 2 │ │ Worker N │
│ (instruments │ │ (instruments │ │ (instruments │
│ A-F) │ │ G-M) │ │ N-Z) │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
│ Redis Sorted Sets (per instrument)
│ ┌─────────────────────────┐
└───▶│ alerts:above:RELIANCE │
│ alerts:below:RELIANCE │
│ alerts:above:INFY │
│ ... │
└─────────────────────────┘
│
triggered alerts
│
▼
┌─────────────────────────┐
│ Kafka │
│ (topic: triggered-alerts│
│ partitioned by user_id)│
└────────────┬─────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Notification │ │ Notification │ │ Notification │
│ Worker 1 │ │ Worker 2 │ │ Worker N │
│ (push) │ │ (SMS) │ │ (email) │
└───────────────┘ └───────────────┘ └───────────────┘
│ │ │
▼ ▼ ▼
FCM / APNs Twilio / SNS SES / SendGrid
┌──────────────────┐ ┌──────────────────┐
│ API Service │────▶│ PostgreSQL │
│ (CRUD for alerts)│ │ (alerts table) │
│ │────▶│ │
└──────────────────┘ └──────────────────┘
│
└───────────────▶ Redis (sync alert to sorted set on create/update/delete)
Component Breakdown
1. Market Data Ingestion Service
- Receives raw tick data from exchange feed (UDP multicast or TCP)
- Normalizes format, adds receive timestamp
- Publishes to the Kafka price-ticks topic, partitioned by instrument group (hash of instrument_id mod N)
- Stateless — can be horizontally scaled with exchange feed multicast
2. Alert Matcher Workers (Core Engine)
- Each worker consumes from one or more Kafka partitions (owns a subset of instruments)
- On each tick for instrument X with LTP P:
  - ZRANGEBYSCORE alerts:above:X -inf P → all “price above” alerts where threshold ≤ P
  - ZRANGEBYSCORE alerts:below:X P +inf → all “price below” alerts where threshold ≥ P
- For each matched alert: remove from sorted set, publish to the triggered-alerts Kafka topic
- Stateless workers — Kafka consumer group handles rebalancing on failure
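The per-tick loop above can be sketched in Python. This is a sketch under assumptions: `redis` and `producer` are hypothetical client objects (the `zrangebyscore`/`zrem` calls follow the redis-py style), and `process_tick` is an illustrative name.

```python
def process_tick(redis, producer, instrument_id, ltp):
    """One matcher-worker iteration: find alerts triggered by this tick,
    claim each one atomically with ZREM, and hand off to Kafka.
    Returns the alert_ids triggered by this tick."""
    triggered = []
    # 'price above' alerts fire once LTP >= threshold
    for alert_id in redis.zrangebyscore(f"alerts:above:{instrument_id}", "-inf", ltp):
        if redis.zrem(f"alerts:above:{instrument_id}", alert_id):  # atomic claim
            triggered.append(alert_id)
    # 'price below' alerts fire once LTP <= threshold
    for alert_id in redis.zrangebyscore(f"alerts:below:{instrument_id}", ltp, "+inf"):
        if redis.zrem(f"alerts:below:{instrument_id}", alert_id):
            triggered.append(alert_id)
    for alert_id in triggered:
        producer.send("triggered-alerts", alert_id)  # durable hand-off
    return triggered
```

The ZREM return value is what makes duplicate suppression work: only the worker whose removal succeeds publishes to Kafka.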
3. Alert API Service
- Standard REST API for CRUD operations
- On create: write to PostgreSQL, then sync to the Redis sorted set (Redis cannot join the database transaction, so the outbox pattern plus periodic reconciliation covers failed syncs)
- On delete: remove from PostgreSQL → remove from Redis sorted set
- On modify: update PostgreSQL → remove old entry from Redis → add new entry to Redis
- Handles user authentication, input validation, and rate limiting (100 alerts per user max)
4. Notification Dispatcher
- Consumes from the triggered-alerts Kafka topic (partitioned by user_id to prevent duplicate sends)
- For each triggered alert:
- Look up user’s notification preferences and tokens (cached in Redis)
- Fan out to channel-specific queues: push, SMS, email
- Each channel has its own retry policy and circuit breaker
- Push: FCM (Android) + APNs (iOS) — < 5 second delivery
- SMS: Twilio with fallback to SNS — < 30 second delivery
- Email: SES with SendGrid fallback — < 30 second delivery
5. Alert Lifecycle Manager (Background Jobs)
- Expiry job: Runs every minute — finds alerts where expires_at < now() and status = ACTIVE → marks EXPIRED, removes from Redis
- Snooze re-activation: Runs every minute — finds alerts where snooze_until < now() and status = SNOOZED → marks ACTIVE, re-adds to Redis sorted set
- Cleanup job: Daily — archives old triggered alerts to cold storage (S3 + Athena)
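The expiry job can be sketched as below. A sketch only: `db` and `redis` are hypothetical clients, and `run_expiry_job` is an illustrative name.

```python
def run_expiry_job(db, redis, now):
    """Expiry sweep from the Lifecycle Manager: mark expired alerts in
    PostgreSQL and evict them from the hot-path sorted sets."""
    rows = db.query(
        "SELECT alert_id, instrument_id, condition FROM alerts "
        "WHERE status = 'ACTIVE' AND expires_at < %s", (now,))
    for alert_id, instrument_id, condition in rows:
        side = "above" if condition in ("PRICE_ABOVE", "PCT_CHANGE_UP") else "below"
        redis.zrem(f"alerts:{side}:{instrument_id}", alert_id)  # stop matching first
        db.execute(
            "UPDATE alerts SET status = 'EXPIRED', updated_at = %s WHERE alert_id = %s",
            (now, alert_id))
    return len(rows)
```

Evicting from Redis before the status update is the safe ordering: a crash in between leaves an EXPIRED-pending alert that simply never fires, rather than one that fires after expiry.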
6. Deep Dives (15 min)
Deep Dive 1: Efficient Alert Matching — Sorted Sets vs. Interval Trees
The Problem: We receive 500K ticks/sec. Each tick for instrument X needs to find all alerts triggered by price P. With 100M alerts across 5,000 instruments, that’s ~20K alerts per instrument on average. We cannot scan all 20K alerts per tick — that’s 10B comparisons/sec.
Approach 1: Redis Sorted Sets (Chosen)
For each instrument, maintain two sorted sets:
alerts:above:{instrument_id} — sorted by threshold (ascending)
alerts:below:{instrument_id} — also sorted by threshold ascending; ZRANGEBYSCORE queries the upper end directly, so no score negation is needed
On tick with LTP = P for instrument X:
# Find all "price above" alerts where threshold ≤ P (condition met)
triggered_above = ZRANGEBYSCORE alerts:above:X -inf P
# Find all "price below" alerts where threshold ≥ P (condition met)
triggered_below = ZRANGEBYSCORE alerts:below:X P +inf
# Remove triggered alerts atomically
for alert_id in triggered_above + triggered_below:
ZREM alerts:above:X alert_id (or alerts:below:X)
Complexity: O(log N + K) per tick where N = alerts for that instrument, K = triggered alerts. On a normal tick, K = 0 or 1, so this is effectively O(log N): log₂(20K) ≈ 15 comparisons. Excellent.
Why not Interval Trees?
- Interval trees are ideal when alerts define ranges (e.g., “alert me when price is between 2500 and 2600”)
- Our alerts are threshold-based (one-sided conditions), which maps perfectly to sorted set range queries
- Redis sorted sets are battle-tested, replicated, and persistent — no need to build custom in-memory data structures
- If we needed range alerts in the future, we could model them as two entries in opposite sorted sets
Approach 2: In-Memory Sorted Array (Alternative for extreme performance)
For ultra-low-latency matching (sub-microsecond), we could maintain an in-memory sorted array per instrument in the matcher worker’s heap:
above_alerts = [(2400, alert1), (2500, alert2), (2600, alert3), ...]
Binary search for the insertion point of P, then all alerts to the left are triggered. This avoids network round-trips to Redis entirely.
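That in-memory alternative can be sketched with the stdlib `bisect` module (illustrative only — the chosen design keeps this index in Redis; `AboveIndex` is a hypothetical name):

```python
import bisect

class AboveIndex:
    """In-heap index of 'price above' alerts for one instrument:
    two parallel lists kept sorted by threshold."""
    def __init__(self):
        self._scores = []  # sorted thresholds
        self._ids = []     # alert_ids, parallel to _scores

    def add(self, threshold, alert_id):
        i = bisect.bisect_left(self._scores, threshold)
        self._scores.insert(i, threshold)
        self._ids.insert(i, alert_id)

    def match(self, ltp):
        """Pop every alert with threshold <= ltp: O(log N) search + O(K) removal."""
        k = bisect.bisect_right(self._scores, ltp)
        triggered = self._ids[:k]
        del self._scores[:k]
        del self._ids[:k]
        return triggered
```

A symmetric `BelowIndex` would bisect from the other end for “price below” alerts.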
Trade-off:
| Aspect | Redis Sorted Sets | In-Memory Arrays |
|---|---|---|
| Latency | ~0.5ms (network hop) | ~1μs |
| Consistency | Shared across workers | Per-worker, needs sync |
| Failure recovery | Persistent + replicated | Lost on crash |
| Operational cost | Managed Redis | Custom code |
| Our choice | Yes — simplicity wins | Only if Redis becomes bottleneck |
Handling Percentage Change Alerts:
Percentage alerts are converted to absolute price thresholds at creation time:
reference_price = 2500, pct_threshold = 5%, condition = PCT_CHANGE_UP
→ threshold = 2500 × 1.05 = 2625.00
→ stored in alerts:above:RELIANCE with score 2625.00
This means percentage alerts are matched identically to absolute price alerts — no special handling needed in the hot path.
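The conversion can be sketched as follows (field names follow the API above; `pct_to_threshold` is an illustrative helper name):

```python
def pct_to_threshold(condition, reference_price, pct_threshold):
    """Convert a percentage-change alert to an absolute price threshold at
    creation time; the hot path then treats it like any PRICE_ABOVE/BELOW alert."""
    if condition == "PCT_CHANGE_UP":
        return round(reference_price * (1 + pct_threshold / 100.0), 2)
    if condition == "PCT_CHANGE_DOWN":
        return round(reference_price * (1 - pct_threshold / 100.0), 2)
    raise ValueError(f"not a percentage condition: {condition}")
```

For example, `pct_to_threshold("PCT_CHANGE_UP", 2500.0, 5.0)` yields 2625.00, matching the worked example above.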
Deep Dive 2: Deduplication and Alert Trigger Guarantees
Problem 1: Duplicate Triggers
When a price oscillates around a threshold (e.g., price bounces between 2599 and 2601 repeatedly), a “price above 2600” alert should trigger exactly once, not on every tick above 2600.
Solution: Remove the alert from the Redis sorted set atomically on first trigger using ZREM. Since ZREM is atomic in Redis, even if two ticks at 2601 arrive simultaneously on different threads, only one ZREM will return 1 (success) — the other returns 0. Only the successful ZREM enqueues the notification.
# Atomic check-and-remove
removed = ZREM alerts:above:NSE:RELIANCE alert_abc123
if removed == 1:
# We won the race — enqueue notification
publish_to_kafka("triggered-alerts", alert_abc123)
else:
# Another worker already triggered this — skip
pass
Problem 2: At-Least-Once Delivery
What if the matcher worker crashes after ZREM but before publishing to Kafka? The alert is removed from Redis but the notification is never sent.
Solution: Outbox Pattern
- Matcher worker does ZREM from Redis
- If successful, writes a trigger record to the PostgreSQL alert_triggers table with notification_status = PENDING
- A separate Notification Dispatcher reads PENDING triggers and sends notifications
- After successful delivery, marks notification_status = SENT
- A sweep job runs every 30 seconds, picks up any PENDING triggers older than 30 seconds (indicating the dispatcher missed them), and re-enqueues them
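The sweep step can be sketched as below. Assumptions: a flat `notification_status` column (the per-channel JSONB detail is elided), a hypothetical `db` client with an `enqueue` helper, and the illustrative name `resweep_pending`.

```python
from datetime import datetime, timedelta

def resweep_pending(db, now, stale_seconds=30):
    """Sweep job: find triggers still PENDING after `stale_seconds` and
    hand them back to the dispatcher, giving at-least-once delivery."""
    cutoff = now - timedelta(seconds=stale_seconds)
    rows = db.query(
        "SELECT trigger_id FROM alert_triggers "
        "WHERE notification_status = 'PENDING' AND triggered_at < %s", (cutoff,))
    for (trigger_id,) in rows:
        db.enqueue("triggered-alerts", trigger_id)  # re-enqueue for the dispatcher
    return len(rows)
```

Because the dispatcher may also still hold the original message, the sweep can cause duplicate sends — acceptable under the at-least-once guarantee stated in the NFRs.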
But wait — step 2 writes to PostgreSQL which is slow in the hot path!
Optimization: Instead of synchronous PostgreSQL write, we publish to Kafka triggered-alerts topic (durable, replicated). The Kafka consumer writes to PostgreSQL and sends notifications. If the matcher crashes after ZREM but before Kafka publish, we detect the inconsistency during the periodic reconciliation:
Reconciliation Job (every 5 minutes):
For each instrument:
active_alerts_in_redis = ZRANGE alerts:above:X 0 -1
active_alerts_in_postgres = SELECT alert_id FROM alerts
WHERE instrument_id = X AND status = 'ACTIVE'
# Alerts in PostgreSQL but not in Redis = potentially lost triggers
missing = active_alerts_in_postgres - active_alerts_in_redis
for alert_id in missing:
re-evaluate against current LTP
if condition met → mark triggered and send notification
else → re-add to Redis sorted set (was incorrectly removed)
Problem 3: Thundering Herd During Market Crash
If the market drops 10% in 5 minutes, millions of “price below” alerts trigger simultaneously. The notification system gets overwhelmed.
Solution: Backpressure + Priority Queuing
- The Kafka triggered-alerts topic absorbs the burst (Kafka can handle millions of messages/sec)
- Notification workers process at their max throughput (bounded by FCM/Twilio rate limits)
- Prioritize push notifications (instant) over SMS (expensive, slow) over email (batched)
- During burst: batch email notifications into a single digest (“5 of your alerts triggered”)
- SMS circuit breaker: if Twilio returns 429s, back off exponentially and queue SMS for later delivery
- SLA during crash: push within 10s, SMS within 2 min, email within 5 min (relaxed from normal SLA)
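The SMS back-off behavior can be sketched as exponential backoff with full jitter. A sketch under assumptions: `send_fn` stands in for the provider client and is assumed to return True on success and False on a 429/throttle response; `send_with_backoff` is an illustrative name.

```python
import random
import time

def send_with_backoff(send_fn, payload, max_attempts=5, base_delay=0.5):
    """Retry a throttled channel (e.g., SMS via Twilio) with exponential
    backoff and full jitter; give up after max_attempts."""
    for attempt in range(max_attempts):
        if send_fn(payload):
            return True
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))  # full jitter
    return False  # give up for now; leave the message queued for later delivery
```

Full jitter (a random sleep in [0, base · 2^attempt]) spreads retries out, which matters precisely during the thundering-herd scenario when many workers are backing off at once.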
Deep Dive 3: Scaling the Alert Index and Multi-Region Considerations
Problem: 100M alerts in Redis — memory and performance
100M alerts × ~100 bytes (sorted set entry) = 10 GB in Redis. A single Redis instance can hold this, but ZRANGEBYSCORE performance degrades with very large sorted sets, and we need high availability.
Solution: Sharded Redis Cluster
Shard by instrument_id:
Shard 1: instruments A-D (hash range 0-3999)
Shard 2: instruments E-K (hash range 4000-7999)
...
Shard N: instruments T-Z (hash range 36000-40000)
Each shard holds only its slice of the ~10 GB total (a few hundred MB to a few GB), comfortably within a single Redis node’s capacity (64 GB typical). Each shard has a replica for failover.
The Alert Matcher Workers are co-located with their Redis shard — Worker 1 reads ticks for instruments A-D and queries Shard 1. No cross-shard queries needed.
Cold Start / Redis Failure Recovery:
If a Redis shard goes down and the replica takes over with stale data:
- The replica becomes primary (automatic failover via Redis Sentinel)
- A background job immediately starts reconciling alerts from PostgreSQL: SELECT alert_id, instrument_id, condition, threshold FROM alerts WHERE status = 'ACTIVE' AND instrument_id IN (instruments on this shard)
- Rebuild the sorted sets from this query (~2M alerts at ~1K inserts/ms ≈ 2 seconds)
- During rebuild, ticks are buffered in Kafka (Kafka retention = 1 hour). After rebuild, replay buffered ticks.
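The rebuild step can be sketched as below. Assumptions: `db` and `redis` are hypothetical clients, `zadd` takes a redis-py-style mapping, and `rebuild_shard` is an illustrative name.

```python
def rebuild_shard(db, redis, instrument_ids):
    """Rebuild a failed shard's sorted sets from the PostgreSQL source of
    truth. One bulk ZADD per sorted set keeps the rebuild within the
    seconds-scale budget estimated above."""
    rows = db.query(
        "SELECT alert_id, instrument_id, condition, threshold FROM alerts "
        "WHERE status = 'ACTIVE' AND instrument_id = ANY(%s)", (instrument_ids,))
    by_key = {}
    for alert_id, instrument_id, condition, threshold in rows:
        side = "above" if condition in ("PRICE_ABOVE", "PCT_CHANGE_UP") else "below"
        by_key.setdefault(f"alerts:{side}:{instrument_id}", {})[alert_id] = threshold
    for key, mapping in by_key.items():
        redis.zadd(key, mapping)  # redis-py style bulk insert
    return sum(len(m) for m in by_key.values())
```

Grouping entries by key before writing turns millions of single-member ZADDs into one bulk call per sorted set, which is what makes the two-second estimate plausible.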
Multi-Region:
- Market data feeds are region-specific (NSE feed in India, NYSE feed in US)
- Alert matching runs in the same region as the market data feed (minimize latency)
- Notification dispatch can be multi-region: push/SMS are routed to the provider closest to the user
- PostgreSQL uses streaming replication to a DR region (async, ~1s lag acceptable since it’s the notification path, not the matching path)
7. Extensions (2 min)
- Complex alert conditions: Support AND/OR combinations of conditions (e.g., “alert me if RELIANCE > 2600 AND NIFTY > 22000”). Requires a small rule engine that tracks partial matches across instruments. Implementation: maintain a pending_conditions map in Redis — when one leg triggers, check if all legs are satisfied.
- Volume-based alerts: “Alert me if RELIANCE trading volume exceeds 10M shares in 1 hour.” Requires maintaining rolling window counters (sliding window or tumbling window in a stream processor like Flink). Fundamentally different from price-threshold matching — needs a separate evaluation pipeline.
- Alert templates and social sharing: Users can create and share alert configurations (“Nifty crash detector” = set of 5 alerts on index constituents). Templates stored as JSON blueprints, instantiated per user. Popular templates surface via a discovery feed.
- Smart alerts with ML: Instead of static thresholds, use anomaly detection to alert on unusual price movements (e.g., “stock moved 3 standard deviations in 5 minutes”). Requires a feature store with rolling statistics (mean, stddev, VWAP) per instrument, updated in real-time.
- Rate limiting and abuse prevention: Cap alerts per user (100 active alerts). Detect and block alert spam (user creating and deleting thousands of alerts to probe the system). Implement progressive delays for high-frequency alert modifications. Log all alert mutations for audit compliance.