1. Requirements & Scope (5 min)

Functional Requirements

  1. Sellers can onboard, list products with images/descriptions/pricing, and manage inventory
  2. Buyers can search/browse products, add to cart, and place orders with payment
  3. Platform handles payment splitting — holds funds in escrow, pays seller after delivery confirmation
  4. Buyers and sellers can leave reviews/ratings on completed transactions
  5. Platform detects and prevents fraudulent listings, fake reviews, and payment fraud

Non-Functional Requirements

  • Availability: 99.99% — downtime directly means lost revenue for both platform and sellers
  • Latency: Search results < 200ms, product page load < 100ms, checkout < 500ms
  • Consistency: Inventory must be strongly consistent (no overselling). Order/payment state must be exactly-once. Reviews can be eventually consistent.
  • Scale: 50M products, 10M daily active buyers, 500K active sellers, 2M orders/day
  • Durability: Zero tolerance for lost orders or payment records. All financial data replicated across regions.

2. Estimation (3 min)

Traffic

  • Browse/Search: 10M DAU × 20 searches/day = 200M searches/day = ~2,300 QPS (peak 3x = 7,000 QPS)
  • Product page views: 10M DAU × 30 views/day = 300M/day = ~3,500 QPS (peak 10,000 QPS)
  • Orders: 2M orders/day = ~23 orders/sec (peak 5x during flash sales = 115/sec)
  • Listings: 500K sellers × 2 updates/day = 1M write ops/day = ~12 QPS

Storage

  • Product catalog: 50M products × 5KB metadata = 250GB
  • Product images: 50M products × 5 images × 500KB = 125TB (object storage)
  • Orders: 2M/day × 2KB × 365 days × 3 years = ~4TB
  • Reviews: 50M reviews × 1KB = 50GB
  • User profiles: 10M buyers + 500K sellers × 2KB = ~21GB

Key Insight

This is a read-heavy system (100:1 read-to-write ratio for catalog browsing). The hard problems are search relevance at scale, inventory consistency during concurrent purchases, and payment orchestration with escrow.


3. API Design (3 min)

Product Catalog

POST /v1/sellers/{seller_id}/products
  Body: { title, description, category_id, price, currency,
          images: [presigned_url_refs], inventory_count,
          attributes: { size, color, weight, ... } }
  Response 201: { product_id, status: "pending_review" }

GET /v1/products/{product_id}
  Response 200: { product_id, title, description, price, seller,
                  images, rating_avg, review_count, inventory_status }

GET /v1/search?q=...&category=...&price_min=...&price_max=...&sort=relevance&page=1
  Response 200: { results: [...], facets: { categories, price_ranges }, total, next_page }

Orders & Payments

POST /v1/cart/items
  Body: { product_id, quantity }

POST /v1/orders
  Body: { cart_id, shipping_address_id, payment_method_id }
  Response 201: { order_id, status: "payment_pending", estimated_delivery }
  // Triggers: inventory reservation, payment authorization, escrow hold

POST /v1/orders/{order_id}/confirm-delivery
  // Triggers: escrow release to seller minus platform commission

Reviews

POST /v1/orders/{order_id}/reviews
  Body: { rating: 1-5, title, body, images: [] }
  // Only allowed after delivery confirmation

Key Decisions

  • Presigned URLs for image upload (bypass API servers for large payloads)
  • Cart is server-side (persisted in Redis with TTL) to support multi-device
  • Inventory is reserved at order creation, not at add-to-cart (reduces abandoned reservation load)

4. Data Model (3 min)

Products (PostgreSQL — relational integrity for catalog)

Table: products
  product_id       (PK) | uuid
  seller_id        (FK) | uuid
  title                 | varchar(200)
  description           | text
  category_id      (FK) | int
  price_cents           | bigint
  currency              | varchar(3)
  inventory_count       | int           -- decremented atomically
  status                | enum('draft', 'pending_review', 'active', 'suspended')
  created_at            | timestamp
  updated_at            | timestamp

Table: product_images
  image_id         (PK) | uuid
  product_id       (FK) | uuid
  s3_key                | varchar(500)
  position              | int
  width                 | int
  height                | int

Table: product_attributes
  product_id       (FK) | uuid
  attribute_key         | varchar(50)   -- 'color', 'size', etc.
  attribute_value       | varchar(200)
  (composite PK: product_id + attribute_key)

Orders (PostgreSQL — ACID for financial data)

Table: orders
  order_id         (PK) | uuid
  buyer_id         (FK) | uuid
  seller_id        (FK) | uuid
  status                | enum('pending', 'paid', 'shipped', 'delivered', 'refunded', 'disputed')
  total_cents           | bigint
  platform_fee_cents    | bigint
  payment_intent_id     | varchar(100)  -- Stripe/payment provider reference
  escrow_status         | enum('held', 'released', 'refunded')
  shipping_address      | jsonb
  created_at            | timestamp

Table: order_items
  order_item_id    (PK) | uuid
  order_id         (FK) | uuid
  product_id       (FK) | uuid
  quantity              | int
  unit_price_cents      | bigint        -- snapshot at time of purchase

Reviews (PostgreSQL)

Table: reviews
  review_id        (PK) | uuid
  order_id         (FK) | uuid          -- one review per order item
  product_id       (FK) | uuid
  buyer_id         (FK) | uuid
  rating                | smallint      -- 1-5
  title                 | varchar(200)
  body                  | text
  created_at            | timestamp
  verified_purchase     | boolean

Search Index (Elasticsearch)

  • Denormalized product data: title, description, category path, attributes, price, seller rating, review count
  • Updated via CDC (change data capture) from PostgreSQL

Why PostgreSQL + Elasticsearch?

  • PostgreSQL provides ACID guarantees for inventory, orders, payments (the data that cannot be wrong)
  • Elasticsearch provides full-text search, faceted filtering, and relevance scoring at scale
  • CDC pipeline ensures search index stays within seconds of source of truth

5. High-Level Design (12 min)

Architecture

Buyer App / Web
  → CDN (product images, static assets)
  → API Gateway (auth, rate limiting)
    → Product Service → PostgreSQL (catalog) + Elasticsearch (search)
    → Cart Service → Redis (session-scoped cart)
    → Order Service → PostgreSQL (orders)
    → Payment Service → Payment Provider (Stripe) + Escrow Ledger
    → Review Service → PostgreSQL (reviews)
    → Notification Service → Push / Email / SMS

Seller App / Web
  → API Gateway
    → Seller Service → PostgreSQL (seller profiles, onboarding)
    → Product Service (listing management)
    → Analytics Service → ClickHouse (sales analytics)
    → Payout Service → Payment Provider (payouts to seller bank)

Background Workers:
  → Image Processing Pipeline: S3 upload → Lambda → resize/compress → CDN invalidation
  → Search Indexer: PostgreSQL CDC → Kafka → Elasticsearch consumer
  → Fraud Detection: Order events → Kafka → ML scoring pipeline
  → Review Aggregation: New review → recompute product avg rating (async)

Order Flow (Critical Path)

Buyer clicks "Buy Now"
  → Order Service: create order (status: pending)
  → Inventory Service: SELECT ... FOR UPDATE; decrement inventory (atomic)
    → If out of stock → reject order, notify buyer
  → Payment Service: authorize payment via Stripe
    → If payment fails → release inventory reservation
  → Payment Service: capture payment → hold in escrow
  → Order Service: update status to "paid"
  → Notification: notify seller of new order
  → Seller ships → carrier tracking updates status
  → Buyer confirms delivery (or auto-confirm after 14 days)
  → Escrow Service: release funds to seller minus 15% platform commission
  → Payout Service: batch payout to seller bank account (daily)

Components

  1. API Gateway: Authentication, rate limiting, request routing. Kong or custom.
  2. Product Service: CRUD for listings, manages product lifecycle (draft → review → active).
  3. Search Service: Wraps Elasticsearch, handles query parsing, ranking, and faceted search.
  4. Cart Service: Redis-backed ephemeral cart with 7-day TTL.
  5. Order Service: Orchestrates the order state machine. Saga pattern for distributed transaction.
  6. Payment Service: Integrates with Stripe Connect for marketplace payments and escrow.
  7. Escrow Ledger: Double-entry bookkeeping for all money movement. Append-only.
  8. Review Service: Manages reviews, computes aggregates, detects fake review patterns.
  9. Fraud Service: ML-based scoring for listings, orders, and reviews.
  10. Notification Service: Multi-channel (email, push, SMS) with templating and delivery tracking.

6. Deep Dives (15 min)

Deep Dive 1: Search Ranking and Relevance

The problem: With 50M products, returning relevant results is the single biggest driver of conversion. A naive text-match search is useless.

Multi-signal ranking model:

score = w1 * text_relevance    // BM25 from Elasticsearch
      + w2 * conversion_rate   // historical clicks → purchases for this query
      + w3 * seller_quality    // seller rating, fulfillment speed, return rate
      + w4 * recency           // newer listings get a small boost
      + w5 * price_competitiveness  // compared to similar products
      - w6 * fraud_score       // penalize suspicious listings

Implementation:

  1. Query understanding: Tokenize, spell-correct, expand synonyms, detect category intent. “iPhone 15 case” → category: phone_cases, brand: Apple, model: iPhone 15.
  2. Recall phase: Elasticsearch BM25 + category filter → retrieve top 1000 candidates.
  3. Ranking phase: Apply ML re-ranker (LightGBM or neural) using features above. Score each of 1000 candidates → return top 50.
  4. Personalization: Boost categories and price ranges the user has historically purchased from. Light collaborative filtering.

Indexing pipeline:

PostgreSQL (product table)
  → Debezium CDC → Kafka (product-updates topic)
  → Search Indexer Consumer → Elasticsearch (bulk index)
  Latency: product update visible in search within 5-10 seconds

Trade-off: Real-time indexing (sub-second) vs. batch indexing (cheaper). We choose near-real-time (5-10s) as a compromise — fresh enough for inventory changes, but batched enough to avoid overwhelming Elasticsearch with single-doc updates.

Deep Dive 2: Inventory Consistency (Preventing Overselling)

The problem: During flash sales, 10,000 buyers may try to purchase the last 100 units simultaneously. We cannot oversell.

Approach: Pessimistic locking at the database level

BEGIN;
  SELECT inventory_count FROM products
    WHERE product_id = 'abc' FOR UPDATE;  -- row-level lock

  -- Application checks: if inventory_count >= requested_quantity
  UPDATE products SET inventory_count = inventory_count - 1
    WHERE product_id = 'abc'
    AND inventory_count >= 1;  -- double-check in WHERE clause

  -- If affected rows = 0 → out of stock
COMMIT;

Why not Redis for inventory?

  • Redis is fast but lacks durable transactions. If Redis crashes between decrement and order creation, we lose the count.
  • PostgreSQL’s FOR UPDATE + the order insert can live in the same ACID transaction.
  • For flash sales (extreme concurrency on one product), we add a Redis-based semaphore as a pre-check to shed load before hitting PostgreSQL.

Flash sale pattern:

Request → Redis pre-check (DECR counter, if < 0 → instant reject)
  → Only requests that pass Redis check hit PostgreSQL
  → PostgreSQL does authoritative decrement
  → On failure → increment Redis counter back

This reduces PostgreSQL load from 10,000 concurrent requests to ~100 (the actual inventory), while Redis handles the 9,900 rejections at microsecond latency.

Reservation and timeout:

  • Inventory is reserved when order is created (not at add-to-cart)
  • If payment fails within 10 minutes, reservation expires and inventory is restored
  • Background job sweeps expired reservations every minute

Deep Dive 3: Payment Splitting and Escrow

The problem: Marketplace payments are complex. Money flows: Buyer → Platform (escrow) → Seller minus commission. Must handle refunds, disputes, multi-seller orders, and regulatory compliance.

Architecture: Stripe Connect

Seller onboarding → Stripe Connected Account (KYC/identity verification)
Order placed:
  1. Create PaymentIntent for total amount
  2. Capture payment → funds held in platform Stripe account
  3. Create Transfer record in escrow ledger (status: held)

Delivery confirmed:
  4. Create Stripe Transfer: platform → seller connected account
     Amount: order_total - platform_commission (15%)
  5. Update escrow ledger (status: released)

Dispute/refund:
  6. If buyer disputes within 14 days:
     → Pause escrow release
     → Platform mediates
     → If refund: Stripe Refund to buyer, no payout to seller
     → If resolved in seller's favor: release escrow

Double-entry escrow ledger (append-only):

Table: ledger_entries
  entry_id         (PK) | uuid
  order_id              | uuid
  debit_account         | varchar(50)   -- 'buyer:user123'
  credit_account        | varchar(50)   -- 'escrow:platform'
  amount_cents          | bigint
  currency              | varchar(3)
  entry_type            | enum('capture', 'release', 'refund', 'commission')
  created_at            | timestamp

Invariant: SUM(debits) = SUM(credits) for every order

Multi-seller orders:

  • If a cart contains items from 3 sellers, we create 3 separate sub-orders
  • Each sub-order has its own escrow hold and independent delivery tracking
  • Commission is calculated per sub-order
  • This avoids the complexity of splitting a single payment across multiple sellers atomically

Fraud detection signals:

  • New seller listing high-value items at suspiciously low prices
  • Buyer creating multiple accounts to abuse promotions
  • Review patterns: same IP, similar text, burst of 5-star reviews
  • Shipping to reshipping addresses (common in stolen credit card fraud)
  • ML model scores each transaction; high-risk orders require manual review

7. Extensions (2 min)

  • Recommendation engine: Collaborative filtering (“buyers who bought X also bought Y”) + content-based recommendations from product attributes. Pre-compute and cache in Redis.
  • Seller analytics dashboard: Real-time sales, conversion rates, search impression share, inventory forecasting. Powered by ClickHouse for OLAP queries over event streams.
  • Multi-currency and international: Store prices in seller’s currency, convert at checkout using live exchange rates. Handle VAT/GST calculations per buyer’s jurisdiction.
  • Promoted listings (ads): Sellers bid for placement in search results. Second-price auction per search query. Revenue source for the platform beyond transaction commissions.
  • Return and refund automation: Buyer initiates return → generate shipping label → track return delivery → auto-refund on receipt confirmation. Debit seller’s escrow or platform insurance fund.