1. Requirements & Scope (5 min)

Functional Requirements

  1. Users select PC components (CPU, motherboard, RAM, GPU, storage, PSU, case) and the system validates compatibility across all parts in real time
  2. Compatibility rules engine checks socket matching (CPU ↔ motherboard), form factor (motherboard ↔ case), RAM type (DDR4 vs DDR5), power requirements (total wattage ↔ PSU), and physical clearances (GPU length, cooler height)
  3. Provide a power budget calculator that sums component TDPs and recommends minimum PSU wattage with appropriate headroom
  4. Detect and warn about performance bottlenecks (e.g., pairing a high-end GPU with a low-end CPU)
  5. Aggregate pricing from multiple retailers with real-time price tracking, price history, and price drop alerts

Non-Functional Requirements

  • Availability: 99.9% — the site is informational/commerce, not life-critical. Brief outages are tolerable but costly (lost affiliate revenue).
  • Latency: Compatibility checks must complete in < 200ms as users add each component. Price data should be < 1 hour stale.
  • Consistency: Compatibility rules must be strictly correct — a false “compatible” signal that leads to a bad purchase is unacceptable. False “incompatible” (overly conservative) is tolerable.
  • Scale: 500K product SKUs across 15 component categories. 10M monthly active users. 50K concurrent build sessions. 100M compatibility checks/day.
  • Durability: User builds must be saved reliably. Price history must be retained for 2+ years for trend analysis.

2. Estimation (3 min)

Product Catalog

  • 500K SKUs across 15 categories
  • Average product record: 2 KB (specs, descriptions, images, links)
  • Total catalog: 500K × 2 KB = 1 GB — easily fits in memory
  • Compatibility attributes per product: ~500 bytes of structured spec data
  • Compatibility data: 500K × 500 bytes = 250 MB

Compatibility Checks

  • 100M checks/day = 1,157 checks/sec average, ~5,000/sec peak
  • Each check: evaluate ~10-15 rules per component pair
  • Average build has 7 components → adding 1 component checks against 6 others → ~60-90 rule evaluations
  • At < 200ms budget and ~90 rules: < 2ms per rule

Price Data

  • 500K products × 10 retailers average = 5M price points
  • Price scrape frequency: every 1 hour per retailer
  • 5M scrapes/hour = 1,400 scrapes/sec
  • Price history: 5M price points (sku × retailer) × 365 days × 24 data points/day × 10 bytes = ~440 GB/year

User Builds

  • 10M monthly users, ~30% create builds = 3M builds/month
  • Average build: 500 bytes (7 component IDs + metadata)
  • 3M × 500 bytes = 1.5 GB/month — trivial storage
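These estimates are plain arithmetic and can be sanity-checked in a few lines of Python (all constants are taken from the numbers above):

```python
# Sanity check of the back-of-envelope numbers in this section.
SKUS = 500_000
CHECKS_PER_DAY = 100_000_000
PRICE_POINTS = SKUS * 10                      # sku x retailer pairs

catalog_bytes = SKUS * 2 * 1024               # ~1 GB catalog
avg_checks_per_sec = CHECKS_PER_DAY / 86_400  # ~1,157/sec
scrapes_per_sec = PRICE_POINTS / 3_600        # ~1,389/sec, rounded to ~1,400
history_bytes_per_year = PRICE_POINTS * 365 * 24 * 10  # ~438 GB/year

print(round(avg_checks_per_sec), round(scrapes_per_sec),
      history_bytes_per_year // 10**9)   # 1157 1389 438
```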

3. API Design (3 min)

REST Endpoints

// Create or update a build
POST /v1/builds
  Headers: Authorization: Bearer <token>  // optional for anonymous builds
  Body: {
    "name": "My Gaming PC 2026",
    "components": {
      "cpu": "sku_amd_9800x3d",
      "motherboard": "sku_asus_x870e",
      "ram": "sku_corsair_ddr5_6000_32gb",
      "gpu": "sku_nvidia_5080",
      "storage": "sku_samsung_990_pro_2tb",
      "psu": "sku_corsair_rm850x",
      "case": "sku_fractal_north"
    }
  }
  Response 200: {
    "build_id": "build_abc123",
    "compatibility": {
      "status": "compatible",           // compatible | warning | incompatible
      "issues": [],
      "warnings": [
        {
          "type": "bottleneck",
          "severity": "low",
          "message": "CPU may slightly bottleneck GPU at 1080p. Consider 1440p+ gaming."
        }
      ],
      "power_budget": {
        "estimated_tdp": 450,
        "psu_wattage": 850,
        "headroom_pct": 47,
        "rating": "excellent"
      }
    },
    "total_price": 189499,              // $1,894.99 in cents
    "price_by_component": { ... }
  }

// Check compatibility for adding a single component
POST /v1/builds/{build_id}/check
  Body: { "category": "gpu", "sku": "sku_nvidia_5080" }
  Response 200: {
    "compatible": true,
    "issues": [],
    "warnings": [...],
    "updated_power_budget": { ... }
  }

// Search products with filters
GET /v1/products?category=gpu&socket=AM5&min_vram=12&max_price=80000&sort=price_asc
  Response 200: {
    "products": [
      {
        "sku": "sku_nvidia_5080",
        "name": "NVIDIA GeForce RTX 5080",
        "specs": { "vram_gb": 16, "tdp_watts": 360, "length_mm": 304, ... },
        "compatible_with_build": true,  // pre-filtered based on current build
        "prices": [
          { "retailer": "Amazon", "price": 99999, "url": "...", "in_stock": true },
          { "retailer": "Newegg", "price": 98999, "url": "...", "in_stock": true }
        ]
      }
    ]
  }

// Get price history for a product
GET /v1/products/{sku}/price-history?period=90d
  Response 200: {
    "sku": "sku_nvidia_5080",
    "history": [
      { "date": "2026-02-22", "min_price": 98999, "avg_price": 100499 },
      ...
    ]
  }

4. Data Model (3 min)

Products (PostgreSQL)

Table: products
  sku              (PK) | varchar(50)
  category               | enum('cpu','motherboard','ram','gpu','storage','psu','case','cooler','monitor',...)
  name                   | varchar(200)
  manufacturer           | varchar(100)
  specs                  | jsonb         -- category-specific specifications
  compatibility_attrs    | jsonb         -- extracted compatibility attributes
  image_url              | varchar(500)
  release_date           | date
  is_active              | boolean
  updated_at             | timestamp

-- Example specs for a CPU:
-- { "socket": "AM5", "cores": 8, "threads": 16, "base_clock_ghz": 4.7,
--   "boost_clock_ghz": 5.2, "tdp_watts": 120, "ram_type": "DDR5",
--   "max_ram_speed_mhz": 5200, "pcie_version": "5.0", "pcie_lanes": 24,
--   "integrated_graphics": false, "cooler_included": false }

-- Example compatibility_attrs for a CPU:
-- { "socket": "AM5", "ram_type": "DDR5", "tdp_watts": 120, "pcie_version": "5.0" }
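A sketch of how compatibility_attrs might be derived from specs at catalog-ingest time. The per-category attribute whitelist here is illustrative, not the production list:

```python
# Per-category whitelist of spec keys the rules engine reads.
# These lists are illustrative; the real ones would mirror the rules table.
COMPAT_KEYS = {
    "cpu": ["socket", "ram_type", "tdp_watts", "pcie_version"],
    "gpu": ["interface", "length_mm", "tdp_watts", "power_connector_type"],
}

def extract_compatibility_attrs(category: str, specs: dict) -> dict:
    """Project the full specs blob down to the compatibility attributes."""
    return {k: specs[k] for k in COMPAT_KEYS.get(category, []) if k in specs}

specs = {"socket": "AM5", "cores": 8, "threads": 16, "tdp_watts": 120,
         "ram_type": "DDR5", "pcie_version": "5.0"}
print(extract_compatibility_attrs("cpu", specs))
# {'socket': 'AM5', 'ram_type': 'DDR5', 'tdp_watts': 120, 'pcie_version': '5.0'}
```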

Compatibility Rules (PostgreSQL + in-memory cache)

Table: compatibility_rules
  rule_id          (PK) | int
  name                   | varchar(100)
  category_a             | varchar(20)   -- e.g., "cpu"
  category_b             | varchar(20)   -- e.g., "motherboard"
  rule_type              | enum('must_match', 'must_fit', 'must_not_exceed', 'warning')
  attribute_a            | varchar(50)   -- e.g., "socket"
  attribute_b            | varchar(50)   -- e.g., "socket"
  operator               | enum('equals', 'gte', 'lte', 'contains', 'fits_in', 'custom')
  custom_logic           | text          -- for complex rules (e.g., RAM slot count + DIMM count)
  severity               | enum('error', 'warning', 'info')
  message_template       | text          -- "CPU socket {a} does not match motherboard socket {b}"
  is_active              | boolean

User Builds (PostgreSQL)

Table: builds
  build_id         (PK) | varchar(20)
  user_id          (FK) | varchar(20)  -- nullable for anonymous
  name                   | varchar(100)
  components             | jsonb        -- { "cpu": "sku_xxx", "gpu": "sku_yyy", ... }
  total_price            | int
  compatibility_status   | enum('compatible','warning','incompatible')
  is_public              | boolean
  created_at             | timestamp
  updated_at             | timestamp

Prices (TimescaleDB / PostgreSQL with partitioning)

Table: prices
  sku              (FK) | varchar(50)
  retailer               | varchar(50)
  price                  | int           -- in cents
  in_stock               | boolean
  url                    | varchar(500)
  scraped_at             | timestamp
  -- Partitioned by month on scraped_at
  -- Index on (sku, retailer, scraped_at DESC)

Why PostgreSQL?

  • Product catalog and compatibility rules need relational integrity
  • JSONB for flexible specs (each category has different attributes)
  • Compatibility rules are read-heavy, easily cached — PostgreSQL is fine as the source of truth
  • Price data benefits from TimescaleDB extension for time-series queries (price history charts)

5. High-Level Design (12 min)

Compatibility Check Flow

User adds a GPU to their build:
  → Frontend sends: POST /v1/builds/{id}/check { "category": "gpu", "sku": "sku_5080" }
    → Build Service:
      1. Load current build components from cache/DB
      2. Load new component's compatibility_attrs from product cache
      3. Call Compatibility Engine:
         For each existing component in the build:
           → Find applicable rules (gpu ↔ motherboard, gpu ↔ psu, gpu ↔ case, ...)
           → Evaluate each rule:
               Rule: gpu.pcie_version <= motherboard.pcie_version → OK (a newer GPU in an older slot downgrades to a warning, since PCIe is backward compatible)
              Rule: gpu.length_mm <= case.max_gpu_length_mm → OK
              Rule: gpu.tdp_watts + ... <= psu.wattage × 0.8 → OK
              Rule: gpu.power_connectors available on PSU → OK
           → Collect all issues and warnings
      4. Calculate power budget:
         Sum TDP of all components → estimated_draw
         Headroom = (psu_wattage - estimated_draw) / psu_wattage × 100
      5. Check bottleneck (heuristic):
         CPU tier vs GPU tier (based on benchmark scores)
         If mismatch > threshold → warning
      6. Return result
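Step 4 of the flow is straightforward to sketch in Python. The field names and the rating thresholds below are illustrative assumptions, not the real schema:

```python
def power_budget(components: dict, psu_wattage: int) -> dict:
    """Sum component TDPs and compute PSU headroom, as in step 4 above.

    components maps category -> spec dict; rating thresholds are illustrative.
    """
    estimated_tdp = sum(c.get("tdp_watts", 0) for c in components.values())
    headroom_pct = round((psu_wattage - estimated_tdp) / psu_wattage * 100)
    rating = ("excellent" if headroom_pct >= 40 else
              "good" if headroom_pct >= 20 else
              "marginal" if headroom_pct >= 0 else "insufficient")
    return {"estimated_tdp": estimated_tdp, "psu_wattage": psu_wattage,
            "headroom_pct": headroom_pct, "rating": rating}

# Matches the example API response: 450W estimated draw on an 850W PSU.
build = {"cpu": {"tdp_watts": 120}, "gpu": {"tdp_watts": 330}}
print(power_budget(build, 850))
# {'estimated_tdp': 450, 'psu_wattage': 850, 'headroom_pct': 47, 'rating': 'excellent'}
```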

Product Search with Compatibility Filtering

User is on GPU selection page (already has CPU, motherboard, case, PSU chosen):
  → GET /v1/products?category=gpu&compatible_with=build_abc123

  → Product Service:
    1. Load all GPUs from product cache (filtered by user's other criteria)
    2. For each GPU candidate:
       → Run compatibility check against existing build components
       → Mark as compatible/warning/incompatible
    3. Return only compatible + warning GPUs (hide incompatible by default)
    4. Sort by user preference (price, performance, popularity)

  Optimization: Pre-compute compatibility for common component pairs.
  For 1,000 CPUs × 500 motherboards = 500K pairs → precompute and cache.
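The precompute could run as a nightly batch job; a minimal sketch, assuming a `check_pair` callable that wraps the rules engine and a dict-like cache (a Redis hash in production):

```python
from itertools import product

def precompute_pair_compatibility(cpus, motherboards, check_pair, cache):
    """Nightly job: evaluate every CPU x motherboard pair once, cache the verdict.

    check_pair(a, b) returns "compatible" | "warning" | "incompatible";
    cache is any dict-like store. Both are assumptions for this sketch.
    """
    for cpu, mobo in product(cpus, motherboards):
        cache[f"pair:{cpu['sku']}:{mobo['sku']}"] = check_pair(cpu, mobo)
    return cache

# Toy run with socket equality as the only rule.
cpus = [{"sku": "cpu_a", "socket": "AM5"}, {"sku": "cpu_b", "socket": "LGA1700"}]
mobos = [{"sku": "mb_x", "socket": "AM5"}]
check = lambda a, b: "compatible" if a["socket"] == b["socket"] else "incompatible"
cache = precompute_pair_compatibility(cpus, mobos, check, {})
print(cache["pair:cpu_a:mb_x"])   # compatible
```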

Components

  1. Build Service: Manages user builds. Orchestrates compatibility checks. Stateless, horizontally scaled.
  2. Compatibility Engine: Core rule evaluation logic. Loaded in-memory with all rules and product compatibility attributes. < 200ms per full build check.
  3. Product Service: Manages the product catalog. Search, filter, sort. Elasticsearch for full-text search + faceted filtering.
  4. Price Aggregation Service: Scrapes retailer prices. Stores price history. Detects price drops and sends alerts.
  5. Product Cache (Redis): All 500K products + compatibility attributes cached. 250 MB — fits easily. Refreshed on catalog updates.
  6. Rules Cache (in-memory): All compatibility rules loaded into each service instance’s memory. Refreshed every 5 minutes from PostgreSQL.

Architecture Diagram

Users (browser)
  → CDN (static assets, product images)
  → API Gateway → Load Balancer
    → Build Service
      → Compatibility Engine (in-process)
        → Product Cache (Redis)
        → Rules Cache (in-memory)
    → Product Service
      → Elasticsearch (search + faceted filters)
      → PostgreSQL (catalog source of truth)
    → Price Service
      → Price Scrapers (distributed workers)
      → TimescaleDB (price history)
      → Redis (current prices)

Background:
  Price Scrapers → Retailer websites/APIs → TimescaleDB + Redis
  Catalog Updater → Manufacturer feeds → PostgreSQL → Redis cache invalidation

6. Deep Dives (15 min)

Deep Dive 1: The Compatibility Rules Engine

Rule categories and examples:

Category 1: Socket/Interface Matching (must_match)
  CPU.socket == Motherboard.socket
    AMD Ryzen 7 9800X3D (AM5) ↔ ASUS ROG X870E (AM5) → ✓
    Intel i9-14900K (LGA1700) ↔ ASUS ROG X870E (AM5) → ✗

  RAM.type == Motherboard.ram_type
    DDR5-6000 (DDR5) ↔ X870E (DDR5) → ✓
    DDR4-3600 (DDR4) ↔ X870E (DDR5) → ✗

  GPU.interface == Motherboard.pcie_slots[*].type
    RTX 5080 (PCIe 5.0 x16) ↔ X870E (has PCIe 5.0 x16 slot) → ✓
    (PCIe is backward compatible, so 5.0 GPU in 4.0 slot → warning, not error)

Category 2: Physical Fit (must_fit)
  GPU.length_mm <= Case.max_gpu_length_mm
    RTX 5080 (304mm) ↔ Fractal North (355mm max) → ✓
    RTX 4090 (336mm) ↔ NZXT H1 (267mm max) → ✗

  Cooler.height_mm <= Case.max_cooler_height_mm
  Motherboard.form_factor IN Case.supported_form_factors
    ATX motherboard ↔ Mid-Tower case (supports ATX) → ✓
    ATX motherboard ↔ Mini-ITX case (supports Mini-ITX only) → ✗

Category 3: Power (must_not_exceed)
  Sum(all_component_tdps) <= PSU.wattage × 0.8   // 80% rule for headroom
  GPU.power_connector_count <= PSU.available_pcie_power_connectors
  GPU.power_connector_type IN PSU.supported_connector_types
    RTX 5080 needs 1x 16-pin → PSU has 12VHPWR adapter → ✓

Category 4: RAM Specifics (complex rules)
  RAM.speed_mhz <= Motherboard.max_ram_speed_mhz (warning if exceeded, may run at lower speed)
  RAM.module_count <= Motherboard.ram_slots
  RAM.total_capacity <= Motherboard.max_ram_capacity
  RAM.voltage within CPU.supported_voltages range

Category 5: Storage (interface + physical)
  M.2 NVMe drive → motherboard has available M.2 slot of correct key type (M key vs B key)
  Number of M.2 drives <= available M.2 slots
  SATA drives: number <= available SATA ports (minus any disabled by M.2 usage)
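The storage rules are count-based rather than attribute-matching; a sketch, with attribute names (`m2_slots`, `sata_disabled_per_m2`) assumed for illustration:

```python
def check_storage_fit(drives: list, motherboard: dict) -> list:
    """Count-based storage checks from Category 5. Attribute names are illustrative."""
    issues = []
    m2 = [d for d in drives if d["interface"] == "m2_nvme"]
    sata = [d for d in drives if d["interface"] == "sata"]

    if len(m2) > motherboard["m2_slots"]:
        issues.append(f"Build has {len(m2)} M.2 drives but only "
                      f"{motherboard['m2_slots']} M.2 slots")

    # Some boards disable SATA ports when certain M.2 slots are populated.
    usable_sata = (motherboard["sata_ports"]
                   - len(m2) * motherboard.get("sata_disabled_per_m2", 0))
    if len(sata) > usable_sata:
        issues.append(f"Only {usable_sata} SATA ports remain with "
                      f"{len(m2)} M.2 drives installed")
    return issues

mobo = {"m2_slots": 2, "sata_ports": 6, "sata_disabled_per_m2": 1}
print(check_storage_fit([{"interface": "m2_nvme"}] * 3, mobo))
# ['Build has 3 M.2 drives but only 2 M.2 slots']
```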

Rule evaluation engine architecture:

class CompatibilityEngine:
  rules: List[Rule]            # loaded from DB, cached in memory
  products: Map[SKU, Product]  # cached from Redis

  def check(build: Build, new_component: Component) -> Result:
    issues = []
    warnings = []

    for existing in build.components:
      applicable_rules = [
        r for r in rules
        if (r.category_a == new_component.category and r.category_b == existing.category)
        or (r.category_b == new_component.category and r.category_a == existing.category)
      ]

      for rule in applicable_rules:
        result = evaluate_rule(rule, new_component, existing)
        if result.failed and rule.severity == "error":
          issues.append(result.message)
        elif result.failed and rule.severity == "warning":
          warnings.append(result.message)

    # Cross-component rules (power budget, etc.)
    power_result = check_power_budget(build, new_component)
    issues.extend(power_result.issues)

    return Result(issues, warnings)
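The `evaluate_rule` helper referenced above can dispatch on the operator column of compatibility_rules. A sketch assuming rules and components arrive as plain dicts (the shapes are illustrative):

```python
# Operator dispatch table mirroring the enum in the compatibility_rules schema.
OPERATORS = {
    "equals":   lambda a, b: a == b,
    "gte":      lambda a, b: a >= b,
    "lte":      lambda a, b: a <= b,
    "contains": lambda a, b: b in a,   # e.g. form factor in supported list
    "fits_in":  lambda a, b: a <= b,   # physical dimension within a maximum
}

def evaluate_rule(rule: dict, comp_a: dict, comp_b: dict) -> dict:
    """Evaluate one compatibility_rules row against two components' attrs."""
    a = comp_a.get(rule["attribute_a"])
    b = comp_b.get(rule["attribute_b"])
    if a is None or b is None:
        # Missing spec data: skipped here; production would flag it
        # conservatively, since a false "compatible" is unacceptable.
        return {"failed": False, "message": None}
    ok = OPERATORS[rule["operator"]](a, b)
    msg = None if ok else rule["message_template"].format(a=a, b=b)
    return {"failed": not ok, "message": msg}

rule = {"attribute_a": "socket", "attribute_b": "socket", "operator": "equals",
        "message_template": "CPU socket {a} does not match motherboard socket {b}"}
print(evaluate_rule(rule, {"socket": "LGA1700"}, {"socket": "AM5"})["message"])
# CPU socket LGA1700 does not match motherboard socket AM5
```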

Handling ambiguous compatibility:

Some compatibility is not binary:
  - PCIe backward compatibility: GPU 5.0 in motherboard 4.0 slot → works but at reduced bandwidth
    → Warning: "GPU will run at PCIe 4.0 speeds. ~5% performance reduction in bandwidth-heavy games."

  - RAM overclocking: DDR5-6000 on a CPU that officially supports 5200
    → Warning: "RAM speed exceeds official CPU support. May require XMP/EXPO profile. Not guaranteed to be stable."

  - Cooler TDP rating: CPU TDP 120W, cooler rated for 150W
    → Info: "Cooler has adequate headroom for stock operation. May be marginal for heavy overclocking."

These nuanced warnings differentiate a great tool from a simple pass/fail checker.

Deep Dive 2: Price Aggregation and Real-Time Tracking

Scraping architecture:

500K products × 10 retailers = 5M price points
Scrape frequency: every 1 hour → 1,400 scrapes/sec

Worker Pool (distributed):
  - 50 scraper workers
  - Each worker handles ~28 scrapes/sec
  - Workers pull jobs from a Redis queue (sorted by next_scrape_time)
  - Rate limiting per retailer:
    Amazon: max 5 req/sec (respect robots.txt and API limits)
    Newegg: max 3 req/sec
    B&H: max 2 req/sec
  - Use retailer APIs where available (Amazon Product Advertising API)
  - Fall back to HTML scraping with headless browser for others
  - Rotate user agents and proxy IPs to avoid blocks
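Per-retailer rate limiting is commonly implemented as a token bucket; a minimal sketch (the per-retailer rates mirror the examples above, and the class is an assumption, not a specific library):

```python
import time

class TokenBucket:
    """Simple per-retailer rate limiter for scrape workers."""
    def __init__(self, rate_per_sec: float):
        self.rate = rate_per_sec
        self.capacity = rate_per_sec      # allow up to 1 second of burst
        self.tokens = self.capacity
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Rates from the list above; a worker checks the bucket before each fetch.
limits = {"amazon": TokenBucket(5), "newegg": TokenBucket(3), "bh": TokenBucket(2)}

def can_scrape(retailer: str) -> bool:
    return limits[retailer].try_acquire()
```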

Scrape pipeline:
  1. Worker fetches product page / API response
  2. Extract: price, availability, shipping cost, seller (3P vs 1P)
  3. Normalize price (convert to cents, handle currency)
  4. Write to TimescaleDB: INSERT INTO prices (sku, retailer, price, in_stock, scraped_at)
  5. Update Redis current_price cache: HSET prices:{sku} {retailer} {price}
  6. Compare with previous price:
     If price_dropped > 5%:
       → Queue price alert notification for users watching this product

Price history API optimization:

A price history chart for 90 days at hourly granularity = 2,160 data points.
For the chart, we don't need every data point — daily min/avg/max suffices.

Materialized view (TimescaleDB continuous aggregate):
  CREATE MATERIALIZED VIEW price_daily
  WITH (timescaledb.continuous) AS
  SELECT
    time_bucket('1 day', scraped_at) AS day,
    sku,
    retailer,
    MIN(price) AS min_price,
    AVG(price)::int AS avg_price,
    MAX(price) AS max_price,
    LAST(in_stock, scraped_at) AS latest_in_stock
  FROM prices
  GROUP BY day, sku, retailer;

This reduces a 90-day query from scanning 2,160 rows to 90 rows per retailer.
Response time: < 50ms.

Price drop alerts:

User subscribes: "Notify me if RTX 5080 drops below $900"

Storage:
  Table: price_alerts
    alert_id    (PK) | varchar(20)
    user_id     (FK) | varchar(20)
    sku              | varchar(50)
    threshold_price  | int
    is_active        | boolean

On price scrape:
  If new_price < previous_price:
    SELECT * FROM price_alerts
    WHERE sku = {sku} AND threshold_price >= new_price AND is_active = true

    For each matching alert:
      → Send push notification / email
      → Mark alert as triggered (avoid repeated notifications)
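The matching step maps directly to code; a sketch with an in-memory list standing in for the price_alerts table:

```python
def match_alerts(alerts: list, sku: str, new_price: int, previous_price: int) -> list:
    """Return the alerts to fire after a scrape, mirroring the query above.

    alerts: price_alerts rows as dicts. Fires only on a price drop, and
    deactivates each fired alert to avoid repeated notifications.
    """
    if new_price >= previous_price:
        return []
    fired = []
    for a in alerts:
        if (a["sku"] == sku and a["is_active"]
                and a["threshold_price"] >= new_price):
            a["is_active"] = False       # mark as triggered
            fired.append(a)
    return fired

alerts = [{"alert_id": "al_1", "sku": "sku_nvidia_5080",
           "threshold_price": 90_000, "is_active": True}]
print(len(match_alerts(alerts, "sku_nvidia_5080", 89_999, 95_000)))   # 1
```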

Deep Dive 3: Bottleneck Detection

Problem: A user pairs an RTX 5080 ($1000 GPU) with an Intel i3-12100 ($100 CPU). The CPU will bottleneck the GPU severely — money wasted.

Approach: Tier-Based Heuristic

Assign a performance tier (1-10) to each component based on benchmarks:

CPU Tiers (from benchmark database):
  Tier 10: Ryzen 9 9950X, i9-14900KS
  Tier 8:  Ryzen 7 9800X3D, i7-14700K
  Tier 6:  Ryzen 5 9600X, i5-14600K
  Tier 4:  Ryzen 5 7600, i5-13400
  Tier 2:  Ryzen 3, i3-12100

GPU Tiers:
  Tier 10: RTX 5090, RX 9070 XT
  Tier 8:  RTX 5080, RX 9070
  Tier 6:  RTX 5070, RX 7800 XT
  Tier 4:  RTX 4060, RX 7600
  Tier 2:  GTX 1650, RX 6500 XT

Bottleneck detection:
  tier_diff = abs(cpu_tier - gpu_tier)

  if tier_diff <= 2: No bottleneck
  if 3 <= tier_diff <= 4: Minor bottleneck (info severity)
    "CPU may limit GPU performance at lower resolutions (1080p)"
  if tier_diff >= 5: Significant bottleneck (warning severity)
    "CPU will severely bottleneck this GPU. Consider upgrading the CPU
     or choosing a less powerful GPU to save money."

Resolution-aware advice:
  At 4K: CPU bottleneck is minimal (GPU-bound workload)
  At 1080p: CPU bottleneck is severe (CPU-bound at high FPS)
  → Adjust warning message based on the monitor's resolution if selected in build.

Advanced: Benchmark Data Integration

Maintain a benchmark database:
  Table: benchmarks
    sku          | varchar(50)
    benchmark    | varchar(50)  -- "cinebench_r23_multi", "3dmark_timespy"
    score        | int
    source       | varchar(100) -- "techpowerup", "anandtech"
    tested_at    | date

Use actual benchmark scores instead of manual tiers:
  cpu_score = benchmarks.where(sku=cpu, benchmark='cinebench_r23_multi').score
  gpu_score = benchmarks.where(sku=gpu, benchmark='3dmark_timespy').score

  # Normalize both scores to a common 0-100 scale
  cpu_normalized = cpu_score / max_cpu_score * 100
  gpu_normalized = gpu_score / max_gpu_score * 100

  bottleneck_ratio = cpu_normalized / gpu_normalized
  if bottleneck_ratio < 0.5: severe CPU bottleneck
  if bottleneck_ratio < 0.7: moderate CPU bottleneck
  if bottleneck_ratio > 1.5: GPU is the bottleneck (rare, e.g., a high-end CPU paired with an entry-level GPU)

7. Extensions (2 min)

  • Build sharing and community: Public build gallery where users share completed builds with photos, benchmarks, and reviews. Voting system for popular builds. “Clone this build” button that pre-fills all components. Social features drive organic traffic and affiliate revenue.
  • AI-powered build recommendations: “Build me a gaming PC for $1500” → ML model selects optimal components based on budget, use case, and current pricing. Trained on community build data and benchmark scores. Recommend upgrades: “For $50 more, you could get a significantly better GPU.”
  • Peripheral compatibility: Extend beyond the PC itself to monitors (resolution, refresh rate, G-Sync/FreeSync compatibility with GPU), keyboards, mice, and audio. Monitor + GPU: check if the GPU supports the monitor’s resolution at its refresh rate via the available port types (DisplayPort 2.1, HDMI 2.1).
  • Build cost optimizer: Given a target performance level, find the cheapest combination of compatible parts. This is a constraint satisfaction problem — use branch-and-bound or constraint programming to search the solution space efficiently. Prune incompatible combinations early.
  • Inventory and availability tracking: Show real-time stock status across retailers. “In stock at Amazon, out of stock at Newegg.” Notify users when out-of-stock components become available. Suggest alternatives for unavailable parts with similar specs and price.