1. Requirements & Scope (5 min)

Functional Requirements

  1. Users can upload, download, and delete files from any device
  2. Files automatically sync across all connected devices
  3. Users can share files/folders with other users (view/edit permissions)
  4. File versioning — users can view and restore previous versions
  5. Offline support — changes made offline sync when connectivity resumes

Non-Functional Requirements

  • Availability: 99.99% — users depend on this for critical documents
  • Durability: 99.999999999% (11 nines) — losing a user’s file is unacceptable
  • Latency: Small file sync < 5 seconds end-to-end between devices. Large file upload should show progress and be resumable.
  • Consistency: Strong consistency for file metadata (rename, move, delete must be immediately reflected). Eventual consistency acceptable for sync propagation to other devices (within seconds).
  • Scale: 500M users, 100M DAU, average user stores 10GB, peak uploads 10M files/hour
  • Bandwidth efficiency: Only transfer changed parts of files (delta sync)

2. Estimation (3 min)

Storage

  • 500M users × 10GB avg = 5 exabytes (5,000PB) total storage
  • This is the defining constraint — everything revolves around efficient storage

Traffic

  • 10M file uploads/hour ÷ 3600 = ~2,800 uploads/sec
  • Average file: 500KB → 1.4GB/sec upload bandwidth
  • Sync events (metadata changes): 10x file uploads = 28,000 events/sec

Metadata

  • Average user: 2,000 files → 500M × 2,000 = 1 trillion file metadata records
  • Each record: ~500 bytes → 500TB metadata
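These figures can be sanity-checked with a few lines of arithmetic (illustrative only; the inputs are the assumptions above):

```python
users = 500_000_000
avg_storage_gb = 10
total_pb = users * avg_storage_gb / 1_000_000       # GB -> PB
assert total_pb == 5_000                            # 5 EB

uploads_per_sec = 10_000_000 / 3600                 # ~2,778/sec
bandwidth_gbps = uploads_per_sec * 500 / 1_000_000  # 500KB avg file -> GB/sec

records = users * 2_000                             # 1 trillion metadata rows
metadata_tb = records * 500 / 1e12                  # 500 bytes each -> 500TB
print(round(uploads_per_sec), round(bandwidth_gbps, 2), metadata_tb)
```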

3. API Design (3 min)

// File operations
POST /api/v1/files/upload
  Headers: Content-Range: bytes 0-4194303/20971520  // chunked upload (4MB chunk)
  Body: <binary chunk>
  Response 200: { "upload_id": "up_123", "next_offset": 4194304 }

POST /api/v1/files/upload/complete
  Body: { "upload_id": "up_123", "filename": "doc.pdf", "parent_id": "folder_abc" }
  Response 201: { "file_id": "f_xyz", "version": 1 }

GET /api/v1/files/{file_id}/download
  Response 302: redirect to pre-signed S3 URL

GET /api/v1/files/{file_id}/versions
  Response 200: [{ "version": 3, "size": 524288, "modified_at": "...", "modified_by": "..." }]

POST /api/v1/files/{file_id}/restore?version=2

// Sync
GET /api/v1/sync/changes?cursor={cursor}
  Response 200: {
    "changes": [
      { "type": "create", "file_id": "f_xyz", "path": "/docs/notes.md", ... },
      { "type": "modify", "file_id": "f_abc", "version": 3, ... },
      { "type": "delete", "file_id": "f_def", ... }
    ],
    "cursor": "c_next_123",
    "has_more": false
  }

// Sharing
POST /api/v1/files/{file_id}/share
  Body: { "user_email": "[email protected]", "permission": "edit" }

Key decisions:

  • Chunked uploads (4MB chunks, matching the storage chunk size): enables resumable uploads, handles network interruptions, allows parallel chunk uploads
  • Cursor-based sync: client maintains a cursor, fetches incremental changes since last sync
  • Pre-signed download URLs: client downloads directly from S3/CDN, not through our servers
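A minimal sketch of the cursor-based sync loop from the client side; `fetch_changes` is a hypothetical in-memory stand-in for `GET /api/v1/sync/changes`, not a real client library:

```python
def fetch_changes(cursor):
    # Stand-in server: one page of changes, then a final page.
    pages = {
        None: {"changes": [{"type": "create", "file_id": "f_xyz"}],
               "cursor": "c_1", "has_more": True},
        "c_1": {"changes": [{"type": "delete", "file_id": "f_def"}],
                "cursor": "c_2", "has_more": False},
    }
    return pages[cursor]

def sync(cursor=None):
    applied = []
    while True:
        page = fetch_changes(cursor)
        applied.extend(page["changes"])  # apply each change locally
        cursor = page["cursor"]          # persist cursor only AFTER applying,
        if not page["has_more"]:         # so a crash replays rather than skips
            return cursor, applied

cursor, changes = sync()
assert cursor == "c_2" and len(changes) == 2
```

Persisting the cursor after applying changes makes sync at-least-once: a crash mid-page re-fetches the same page, which is safe because applying a change twice is idempotent.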

4. Data Model (3 min)

Metadata Store (PostgreSQL, sharded by user_id)

Table: files
  file_id          (PK) | uuid
  user_id                | bigint (shard key)
  parent_folder_id       | uuid (FK → files, nullable for root)
  name                   | varchar(255)
  is_folder              | boolean
  current_version        | int
  size                   | bigint
  checksum               | char(64)  -- SHA-256 of content
  created_at             | timestamptz
  modified_at            | timestamptz
  is_deleted             | boolean (soft delete)

Table: file_versions
  file_id                | uuid (FK)
  version                | int
  size                   | bigint
  checksum               | char(64)
  s3_key                 | varchar(200)
  chunks                 | jsonb  -- [{ chunk_hash, s3_key, offset, size }]
  created_at             | timestamptz
  PK: (file_id, version)

Table: sharing
  file_id                | uuid
  shared_with_user_id    | bigint
  permission             | enum('view', 'edit')
  PK: (file_id, shared_with_user_id)

Block/Chunk Store (S3)

  • Files are split into chunks (~4MB on average)
  • Each chunk stored by its content hash: chunks/{sha256_hash}
  • Content-addressable storage enables deduplication
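A sketch of how content-addressable keys give deduplication for free; the dict stands in for S3, and the key layout follows the chunks/{hash} scheme above:

```python
import hashlib

store = {}  # stand-in for S3

def chunk_key(data: bytes) -> str:
    # identical content always maps to the same key
    return "chunks/" + hashlib.sha256(data).hexdigest()

def put_chunk(data: bytes) -> str:
    key = chunk_key(data)
    if key not in store:   # dedup: skip the write if the chunk already exists
        store[key] = data
    return key

k1 = put_chunk(b"popular OS library bytes")  # first user uploads it
k2 = put_chunk(b"popular OS library bytes")  # any later user: stored once
assert k1 == k2 and len(store) == 1
```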

5. High-Level Design (12 min)

Upload Path

Desktop Client detects file change
  → Split file into 4MB chunks
  → Compute SHA-256 hash per chunk
  → Send chunk hashes to Sync Service: "which chunks do you already have?"
  → Sync Service checks Block Metadata DB → returns list of missing chunks
  → Client uploads only missing chunks to Block Storage Service
    → Block Storage Service writes to S3
  → Client sends "upload complete" with chunk list to Sync Service
  → Sync Service:
    → Creates new file_version in PostgreSQL
    → Updates current_version
    → Publishes change event to Message Queue
    → Notification Service pushes to other connected devices
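The "which chunks do you already have?" exchange above reduces to a set-membership check; here `known_chunks` is a stand-in for the Block Metadata DB / Redis existence cache:

```python
known_chunks = {"aaa111", "bbb222"}   # chunk hashes already in the store

def missing_chunks(client_hashes):
    """Return only the hashes the client must actually upload."""
    return [h for h in client_hashes if h not in known_chunks]

to_upload = missing_chunks(["aaa111", "ccc333", "ddd444"])
assert to_upload == ["ccc333", "ddd444"]   # 1 of 3 chunks deduplicated
```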

Download/Sync Path

Desktop Client → long-poll or WebSocket to Notification Service
  → Receives "file X changed" event
  → Client calls GET /sync/changes?cursor=...
  → Gets list of changed files with chunk manifests
  → For each changed file:
    → Identify which chunks client already has locally
    → Download only missing chunks from S3 (via CDN)
    → Reassemble file locally
  → Update local cursor
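Reassembly is then a matter of writing each manifest chunk at its offset. In this sketch both the local cache and the remote store are plain dicts standing in for disk and S3/CDN:

```python
local_cache = {"h1": b"Hello, "}   # chunks already on disk
remote = {"h2": b"world!"}         # stand-in for S3/CDN

manifest = [{"chunk_hash": "h1", "offset": 0, "size": 7},
            {"chunk_hash": "h2", "offset": 7, "size": 6}]

def reassemble(manifest):
    out = bytearray(sum(c["size"] for c in manifest))
    for c in manifest:
        # check the local cache first; download only what's missing
        data = local_cache.get(c["chunk_hash"]) or remote[c["chunk_hash"]]
        out[c["offset"]:c["offset"] + c["size"]] = data
    return bytes(out)

assert reassemble(manifest) == b"Hello, world!"
```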

Components

  1. Sync Service: Coordinates uploads/downloads, manages file metadata, computes diffs
  2. Block Storage Service: Handles chunk upload to S3, deduplication checks
  3. Notification Service: Real-time push to connected clients (WebSocket/long-poll)
  4. PostgreSQL (sharded): File metadata, versions, sharing
  5. S3: Chunk storage (content-addressable)
  6. Redis: Chunk existence cache, active session tracking
  7. Kafka: Change event stream for cross-device sync
  8. CDN: Chunk download acceleration

6. Deep Dives (15 min)

Deep Dive 1: Chunking and Delta Sync

The core insight: When a user edits a 100MB file, they typically change only a few KB. Uploading the entire 100MB again is wasteful. Delta sync uploads only the changed chunks.

How chunking works:

  1. File is split into variable-size chunks (~4MB on average) using content-defined chunking (CDC)
  2. CDC uses a rolling hash (Rabin fingerprint) to find chunk boundaries based on content — not fixed offsets
  3. Why CDC over fixed-size chunks? If you insert 1 byte at the beginning of a file, fixed-size chunking shifts every boundary → all chunks change. CDC boundaries are content-dependent, so only the affected chunk changes.
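A toy illustration of the resynchronization property: a windowed byte-sum stands in for the Rabin fingerprint, and the parameters target tiny chunks so the effect shows on short synthetic input (real systems use min/avg/max sizes around the 4MB target):

```python
W = 8    # rolling window size in bytes
D = 13   # boundary wherever (window sum) % D == 0 -> avg chunk ~13 bytes

def cdc_chunks(data: bytes, min_size: int = 4):
    chunks, start = [], 0
    for i in range(W - 1, len(data)):
        window_sum = sum(data[i + 1 - W:i + 1])  # hash of the last W bytes
        if i - start + 1 >= min_size and window_sum % D == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

data = bytes(range(256))
v1 = cdc_chunks(data)
v2 = cdc_chunks(b"!" + data)   # insert one byte at the front
shared = set(v1) & set(v2)
# Boundaries depend only on the last W bytes, so they re-synchronize
# right after the insertion: only the first chunk differs.
assert len(shared) == len(v1) - 1
```

With fixed-size chunking, the same one-byte insertion would shift every boundary and invalidate every chunk.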

Deduplication:

  • Chunks are stored by SHA-256 hash: chunks/{hash}
  • Before uploading, client sends chunk hashes to the server
  • Server responds with which chunks already exist (from any user’s files)
  • Common files (OS files, popular libraries) are stored once across all users
  • Deduplication ratio is typically 50-60% across the entire system

Bandwidth savings example:

  • User edits a 100MB file, changing 2 pages in the middle
  • File has 25 chunks (4MB each)
  • Only 1 chunk actually changed (different hash)
  • Upload: 4MB instead of 100MB → 96% bandwidth reduction

Deep Dive 2: Conflict Resolution

The problem: User edits file on laptop (offline), same file is edited on phone (also offline). Both come online — whose version wins?

Strategy: Last-writer-wins + conflict copies

  1. Each file has a version number. Each device tracks the last-known version.

  2. When device syncs, it sends its base version + new content.

  3. Server checks: is the base version == current version?

    • Yes: Clean update. Increment version, store new content.
    • No: Conflict. Someone else modified the file between your last sync and now.
  4. On conflict:

    • Server stores the new upload as a “conflict copy”: budget (Chirag's conflicted copy 2026-02-22).xlsx
    • Both versions are preserved — no data loss
    • User resolves manually (same as Dropbox’s actual behavior)
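The version check in step 3 is a compare-and-set. A sketch with an in-memory dict standing in for the metadata DB (in production this is a conditional UPDATE inside a transaction):

```python
files = {"f_budget": {"version": 3, "content": "v3"}}

def commit(file_id, base_version, new_content, device):
    f = files[file_id]
    if base_version == f["version"]:            # clean, linear update
        f["version"] += 1
        f["content"] = new_content
        return ("ok", f["version"])
    # Conflict: someone committed since this device's last sync.
    # Preserve the upload as a separate conflict copy: no data loss.
    copy_id = f"{file_id} ({device}'s conflicted copy)"
    files[copy_id] = {"version": 1, "content": new_content}
    return ("conflict", copy_id)

r1 = commit("f_budget", 3, "laptop edit", "laptop")  # base matches -> ok
r2 = commit("f_budget", 3, "phone edit", "phone")    # stale base -> conflict
assert r1 == ("ok", 4) and r2[0] == "conflict"
```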

Why not auto-merge? For text files, we could attempt a three-way merge (like git). But for binary files (DOCX, XLSX, images), auto-merge is impossible. The safest approach is always preserving both versions.

Vector clocks (advanced): For shared folders with many collaborators, we can use vector clocks instead of simple version numbers to track causality across devices more precisely. This helps distinguish “concurrent edits” (true conflict) from “sequential edits that arrived out of order” (not a conflict).

Deep Dive 3: Storage at Exabyte Scale

5 exabytes is the defining challenge. At S3 pricing ($0.023/GB/month), that’s $115M/month in storage alone.

Cost optimization layers:

  1. Deduplication (50-60% savings): Content-addressable chunks mean identical files across users are stored once. Popular files (OS updates, npm packages, Docker images) get massive deduplication.

  2. Compression (30-40% additional): Compress each chunk before storing. Text/code compresses 5:1, office docs 2:1, already-compressed files (JPEG, MP4) get no benefit. Apply compression selectively based on content type.

  3. Tiered storage:

    • Hot tier (S3 Standard): files accessed within the last 30 days
    • Warm tier (S3 IA): files last accessed 30-90 days ago (50% cheaper)
    • Cold tier (S3 Glacier): files not accessed for 90+ days (90% cheaper)
    • Auto-promotion on access: if a cold file is downloaded, move it back to the hot tier
  4. Version cleanup: Keep every version for 30 days; for versions aged 30-90 days, keep every other one; beyond 90 days, keep only monthly snapshots. This dramatically reduces version storage.

  5. Deleted file cleanup: Soft-deleted files are recoverable for 30 days, then permanently deleted. S3 lifecycle policy enforces this.

Result: Effective cost per GB stored drops from $0.023 to ~$0.005-0.008/month after all optimizations.
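A quick check that the quoted savings rates land in that range (the rates are the assumptions above, not measurements, and tiering plus version/delete cleanup push the effective rate lower still):

```python
base = 0.023                           # $/GB/month, S3 Standard
after_dedup = base * (1 - 0.55)        # ~55% deduplication savings
after_comp = after_dedup * (1 - 0.35)  # ~35% further from compression
assert 0.005 < after_comp < 0.008      # ~$0.0067/GB/month, inside the range
```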


7. Extensions (2 min)

  • Real-time collaboration: For docs/sheets, add OT/CRDT layer for simultaneous editing (Google Docs style). Completely separate infrastructure from file sync.
  • Search: Full-text search across file names and content. Extract text from PDFs, DOCX, images (OCR) → index in Elasticsearch.
  • Smart sync (selective sync): On devices with limited storage, show file listings but download content only on demand. Store metadata locally, fetch content from cloud.
  • Activity feed: Track all file operations (create, edit, share, delete) in an event log. Show as an activity stream in the UI.
  • Compliance: GDPR right to deletion, SOC2 audit logs, encryption at rest (AES-256) and in transit (TLS 1.3), region-specific data residency.