1. Requirements & Scope (5 min)

Functional Requirements

  1. Users can share files by creating a torrent (metadata file) and seeding (uploading) the actual content to peers
  2. Users can download files by obtaining a torrent file or magnet link, discovering peers who have the file, and downloading pieces from multiple peers simultaneously
  3. Support peer discovery through both centralized trackers and decentralized DHT (Distributed Hash Table)
  4. Ensure file integrity — every downloaded piece is verified against a cryptographic hash before being accepted
  5. Implement an incentive mechanism (tit-for-tat) — peers who upload more get faster downloads; free-riders who refuse to upload get choked

Non-Functional Requirements

  • Availability: The system must work even when the tracker is offline (DHT provides decentralized fallback). No single point of failure for content distribution.
  • Latency: Peer discovery < 5 seconds. First bytes of download begin within 10 seconds of starting. Sustained throughput scales with the number of available peers.
  • Consistency: Eventual consistency for peer lists (stale peers are tolerable — the client will simply fail to connect and try others). Strong consistency for file integrity (every piece must match its hash).
  • Scale: Support files from 1 MB to 100 GB. Swarms with 1 million simultaneous peers. Global DHT with 100 million nodes.
  • Durability: Files remain available as long as at least one seeder exists. The more popular a file, the more available it becomes (self-scaling).

2. Estimation (3 min)

File Distribution

  • A 4 GB file, split into 256 KB pieces = 16,384 pieces
  • Each piece has a 20-byte SHA-1 hash for verification
  • Torrent metadata (info dictionary): 16,384 × 20 bytes = 320 KB of hashes + metadata ≈ 350 KB total

Tracker Load

  • 1 million active torrents
  • Average swarm size: 200 peers
  • Each peer announces to the tracker every 30 minutes
  • 200M peers / 1800 seconds = 111K announce requests/sec to the tracker
  • Each announce: ~200 bytes request, ~2 KB response (list of 50 peers)
  • Bandwidth: 111K × 2 KB = 222 MB/sec tracker outbound

DHT Scale

  • 100 million DHT nodes globally
  • Routing table per node: 160 k-buckets × 8 entries = 1,280 peers ≈ 50 KB
  • DHT lookup: O(log N) hops. log2(100M) ≈ 27 hops, but k-bucket routing reduces this to ~4-6 hops in practice
  • Each hop: ~50ms (UDP round trip) → 200-300ms total DHT lookup time

Download Throughput

  • A well-seeded torrent with 100 peers, each contributing 1 MB/sec
  • Theoretical max: 100 MB/sec (limited by receiver’s bandwidth, not peer count)
  • Practical: 20-50 MB/sec (connection overhead, slow peers, churn)
  • A 4 GB file at 30 MB/sec ≈ 2.3 minutes to download
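The piece, tracker, and throughput numbers above can be reproduced with a few lines of arithmetic (a sketch using the same figures as the estimate: a 4 GiB file, 256 KiB pieces):

```python
FILE_SIZE = 4 * 1024**3          # 4 GiB
PIECE_SIZE = 256 * 1024          # 256 KiB

pieces = FILE_SIZE // PIECE_SIZE             # number of pieces
hash_bytes = pieces * 20                     # one 20-byte SHA-1 per piece
print(pieces, hash_bytes // 1024)            # 16384 320  (320 KB of hashes)

# Tracker load: 1M torrents x 200 peers, each announcing every 30 minutes
announces_per_sec = (1_000_000 * 200) / 1800
print(round(announces_per_sec / 1000))       # 111  (thousand announces/sec)

# Download time at a sustained 30 MB/sec
seconds = FILE_SIZE / (30 * 1024**2)
print(round(seconds / 60, 1))                # 2.3  (minutes)
```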

3. API Design (3 min)

BitTorrent is a peer-to-peer protocol, not a traditional client-server API, but it still exposes three distinct interfaces.

Tracker HTTP API (BEP 3)

// Announce: peer registers with tracker and gets peer list
GET /announce?info_hash=<20-byte hash>&peer_id=<20-byte id>&port=6881
    &uploaded=0&downloaded=0&left=4294967296&event=started&compact=1
  Response 200 (bencoded):
  {
    "interval": 1800,                   // re-announce in 30 min
    "complete": 150,                     // seeders
    "incomplete": 340,                   // leechers
    "peers": <compact binary: 6 bytes per peer (4 IP + 2 port)>
  }
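With compact=1, each peer in the response is packed into 6 bytes (4-byte big-endian IPv4 address followed by a 2-byte big-endian port, per BEP 23). A minimal parser for that field might look like this sketch:

```python
import socket
import struct

def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode the compact 'peers' value: 6 bytes per peer (IPv4 + port)."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list must be a multiple of 6 bytes")
    peers = []
    for off in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[off:off + 4])              # 4-byte IPv4
        (port,) = struct.unpack("!H", blob[off + 4:off + 6])  # 2-byte big-endian port
        peers.append((ip, port))
    return peers

# Example: one peer at 1.2.3.4:6881
blob = socket.inet_aton("1.2.3.4") + struct.pack("!H", 6881)
print(parse_compact_peers(blob))   # [('1.2.3.4', 6881)]
```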

// Scrape: get stats for a torrent without joining
GET /scrape?info_hash=<20-byte hash>
  Response 200:
  {
    "files": {
      "<info_hash>": {
        "complete": 150,
        "downloaded": 12345,
        "incomplete": 340
      }
    }
  }

Peer Wire Protocol (TCP, BEP 3)

// After connecting to a peer via TCP:

1. Handshake:
   <pstrlen=19><pstr="BitTorrent protocol"><reserved=8 bytes><info_hash=20><peer_id=20>

2. Messages (length-prefixed):
   choke          → "I will not send you pieces"
   unchoke        → "I will send you pieces now"
   interested     → "I want pieces you have"
   not_interested → "I don't need anything from you"
   have(index)    → "I now have piece #index"
   bitfield(bits) → "Here's what pieces I have" (sent on connect)
   request(index, begin, length) → "Send me this block"
   piece(index, begin, data)     → "Here's the data you requested"
   cancel(index, begin, length)  → "Never mind, cancel that request"

DHT Protocol (UDP, BEP 5 — Kademlia)

// KRPC messages (bencoded, over UDP):

ping(sender_id)
  → pong(responder_id)

find_node(sender_id, target_id)
  → nodes: [closest known nodes to target_id]

get_peers(sender_id, info_hash)
  → peers: [list of peers for this torrent]
  → OR nodes: [closer nodes to ask]  // if this node doesn't know peers

announce_peer(sender_id, info_hash, port, token)
  → "OK, I've recorded that you're serving this torrent"

4. Data Model (3 min)

Torrent File (.torrent — bencoded dictionary)

{
  "info": {
    "name": "movie.mkv",
    "piece length": 262144,            // 256 KB per piece
    "pieces": <concatenated SHA-1 hashes of all pieces>,
    "length": 4294967296,              // file size (single file mode)
    // OR for multi-file:
    "files": [
      { "length": 3000000000, "path": ["movie.mkv"] },
      { "length": 1000000, "path": ["subtitles.srt"] }
    ]
  },
  "announce": "http://tracker.example.com/announce",
  "announce-list": [                    // multiple tracker tiers
    ["udp://tracker1.example.com:6969"],
    ["http://tracker2.example.com/announce"]
  ],
  "creation date": 1708632000,
  "comment": "Uploaded by user123"
}

info_hash = SHA-1(bencode(info_dict))   // unique identifier for this torrent
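The info_hash derivation can be made concrete with a minimal bencoder (a sketch that covers only the types used here; a real client must hash the info dictionary byte-for-byte as it appeared in the .torrent file). The "pieces" value below is a placeholder, not a real hash:

```python
import hashlib

def bencode(value) -> bytes:
    """Minimal bencoder: ints, strings/bytes, lists, dicts (keys sorted, per spec)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, (bytes, str)):
        raw = value.encode() if isinstance(value, str) else value
        return b"%d:%s" % (len(raw), raw)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(type(value))

info = {
    "name": "movie.mkv",
    "piece length": 262144,
    "pieces": b"\x00" * 20,        # placeholder: one fake 20-byte piece hash
    "length": 262144,
}
info_hash = hashlib.sha1(bencode(info)).digest()   # 20-byte torrent identifier
print(info_hash.hex())
```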

Tracker State (in-memory, Redis-backed)

// Per torrent
Key: torrent:{info_hash}
Type: Hash
Fields:
  complete       | int (seeders)
  incomplete     | int (leechers)
  downloaded     | int (all-time completed downloads)

// Peer list per torrent
Key: torrent:{info_hash}:peers
Type: Hash
Field: peer_id
Value: { ip, port, uploaded, downloaded, left, last_announce }
Expiry per entry: 2 × announce_interval (per-field TTL needs Redis ≥ 7.4 HEXPIRE; on older versions, sweep stale entries by last_announce)

DHT Routing Table (per node, in-memory)

Node ID: 160-bit random identifier (generated on first run)
Routing table: 160 k-buckets
  Bucket[i]: stores up to 8 nodes whose IDs first differ from ours at bit position i (i.e., share exactly i leading bits)
  Each entry: { node_id, ip, port, last_seen }

Peer storage (this node's contribution to the DHT):
  Map<info_hash, List<{ip, port, timestamp}>>
  Stores peers announced via announce_peer for info_hashes "close" to this node's ID

Client State (local, per torrent)

{
  "info_hash": "...",
  "total_pieces": 16384,
  "piece_length": 262144,
  "bitfield": [1,1,1,0,0,1,...],       // which pieces we have
  "peers": [
    {
      "peer_id": "...",
      "ip": "1.2.3.4",
      "port": 6881,
      "bitfield": [1,1,0,1,...],        // what they have
      "am_choking": true,               // am I choking them?
      "am_interested": false,           // am I interested in them?
      "peer_choking": true,             // are they choking me?
      "peer_interested": false,         // are they interested in me?
      "upload_rate": 0,                 // bytes/sec uploading to them
      "download_rate": 0               // bytes/sec downloading from them
    }
  ]
}

5. High-Level Design (12 min)

Download Flow (End to End)

User obtains torrent file or magnet link
  → BitTorrent Client:
    1. Parse torrent metadata (info_hash, piece hashes, tracker URLs)
       OR resolve magnet link via DHT to get metadata

    2. Peer Discovery (parallel):
       a. HTTP/UDP announce to tracker(s)
          → Receive list of 50-200 peers
       b. DHT lookup: get_peers(info_hash)
          → 4-6 hops through DHT → receive peers
       c. Peer Exchange (PEX): connected peers share their peer lists

    3. Connect to peers (TCP, up to 50 simultaneous connections):
       → Handshake (verify info_hash match)
       → Exchange bitfields ("here's what I have")
       → Express interest ("interested" if they have pieces I need)

    4. Download pieces:
       → Wait to be unchoked by peers
       → Select pieces to request (rarest-first strategy)
       → Request 16 KB blocks within a piece (pipeline 5-10 requests)
       → Receive blocks, assemble into complete pieces
       → Verify SHA-1 hash of completed piece
         If valid → mark as "have", announce to all peers
         If invalid → discard, re-download from different peer, ban bad peer

    5. Upload (simultaneously):
       → Unchoke peers who are uploading to us (tit-for-tat)
       → Respond to their piece requests
       → Optimistically unchoke a random peer every 30 seconds (discovery)

    6. Endgame mode (last few pieces):
       → Request remaining pieces from ALL peers simultaneously
       → Cancel duplicate requests once a piece arrives
       → Prevents slow last-mile from a single slow peer
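Step 4's verify-then-announce logic is the integrity core of the protocol: a completed piece is checked against its 20-byte slice of the torrent's concatenated "pieces" string. A sketch (the 3-piece torrent below is fabricated for demonstration):

```python
import hashlib

def verify_piece(index: int, data: bytes, pieces_field: bytes) -> bool:
    """Check a completed piece against its 20-byte SHA-1 from the torrent metadata."""
    expected = pieces_field[index * 20:(index + 1) * 20]
    return hashlib.sha1(data).digest() == expected

# Build a fake 3-piece torrent to demonstrate
piece_data = [b"a" * 100, b"b" * 100, b"c" * 40]
pieces_field = b"".join(hashlib.sha1(p).digest() for p in piece_data)

print(verify_piece(1, b"b" * 100, pieces_field))   # True  -> mark "have", send have(1)
print(verify_piece(1, b"x" * 100, pieces_field))   # False -> discard, re-request elsewhere
```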

Tracker Architecture

Clients
  → Load Balancer
    → Tracker Servers (stateless, 10+ instances):
      → On announce:
        1. Update peer entry in Redis: HSET torrent:{hash}:peers {peer_id} {data}
        2. Refresh expiry: HEXPIRE on the peer field (Redis ≥ 7.4), or sweep
           stale entries by last_announce (plain EXPIRE applies only to whole keys)
        3. Get random peers: HRANDFIELD torrent:{hash}:peers 50
        4. Update stats: HINCRBY torrent:{hash} complete/incomplete ±1
        5. Return peer list to client

Redis Cluster:
  ├── Torrent stats (hash per torrent)
  ├── Peer lists (hash per torrent, entry per peer)
  └── Auto-expiry of stale peers via TTL
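The announce path above can be sketched as an in-memory handler. This is a self-contained stand-in, not the Redis implementation: plain dicts replace the Redis hashes, and stale peers are swept on read instead of expiring via TTL.

```python
import random
import time

ANNOUNCE_INTERVAL = 1800
torrents: dict[bytes, dict[bytes, dict]] = {}   # info_hash -> peer_id -> record

def announce(info_hash: bytes, peer_id: bytes, ip: str, port: int,
             left: int, num_want: int = 50) -> dict:
    peers = torrents.setdefault(info_hash, {})
    now = time.time()
    peers[peer_id] = {"ip": ip, "port": port, "left": left, "last_announce": now}
    # Sweep entries older than 2 x interval (stands in for Redis TTL)
    for pid in [p for p, rec in peers.items()
                if now - rec["last_announce"] > 2 * ANNOUNCE_INTERVAL]:
        del peers[pid]
    # Random sample of other peers (stands in for HRANDFIELD)
    others = [rec for pid, rec in peers.items() if pid != peer_id]
    sample = random.sample(others, min(num_want, len(others)))
    return {
        "interval": ANNOUNCE_INTERVAL,
        "complete": sum(1 for r in peers.values() if r["left"] == 0),    # seeders
        "incomplete": sum(1 for r in peers.values() if r["left"] > 0),   # leechers
        "peers": [(r["ip"], r["port"]) for r in sample],
    }

announce(b"H" * 20, b"P" * 20, "1.2.3.4", 6881, left=0)           # a seeder joins
resp = announce(b"H" * 20, b"Q" * 20, "5.6.7.8", 6881, left=4096) # a leecher joins
print(resp["complete"], resp["incomplete"], resp["peers"])
# 1 1 [('1.2.3.4', 6881)]
```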

DHT Network Architecture

Every BitTorrent client is also a DHT node:

Node A wants peers for info_hash H:
  1. Find closest nodes to H in local routing table
  2. Send get_peers(H) to those nodes
  3. Each responds with:
     - Peers (if it knows any for H) → done!
     - Closer nodes (if it doesn't know peers for H)
  4. Recursively query closer nodes
  5. After 4-6 hops, find nodes responsible for H
  6. Those nodes return the peer list
  7. Announce: tell those nodes "I'm also a peer for H"

Routing table maintenance:
  - On every interaction with another node, update its entry in the routing table
  - Periodically refresh buckets: find_node for a random ID in each bucket's range
  - Ping nodes that have been silent for 15 minutes; evict any that fail to respond

Components

  1. BitTorrent Client: Runs on the user’s machine. Handles peer discovery, piece selection, downloading, uploading, hash verification. Open-source implementations: libtorrent (C++), used by qBittorrent, Deluge.
  2. Tracker: Centralized peer directory. Stateless servers backed by Redis. Handles announce/scrape. Optional — DHT provides decentralized alternative.
  3. DHT Network: Distributed Kademlia hash table across all participating clients. No central infrastructure. Each node stores a small portion of the peer-torrent mapping.
  4. Web Seed Servers (optional): HTTP servers hosting the file. Clients can download pieces via HTTP GET with Range headers. Provides initial availability before enough peers exist.

6. Deep Dives (15 min)

Deep Dive 1: Piece Selection — Rarest-First Algorithm

Problem: Which pieces should a client download first? Random selection leads to a situation where common pieces are duplicated everywhere while rare pieces exist on only one or two peers. If those peers leave, the rare pieces are lost and nobody can complete the download.

Rarest-First Strategy:

For each piece I don't have:
  rarity[piece_i] = count of connected peers who have piece_i

Sort by rarity (ascending). Download the rarest pieces first.

Example:
  Piece #0: 45 peers have it → common
  Piece #7: 3 peers have it  → rare → download this first!
  Piece #12: 1 peer has it   → critical → highest priority!

Why this works:
  1. Rare pieces become less rare as downloaders get them and re-share
  2. Prevents the "missing piece" scenario where everyone is stuck at 99%
  3. Maximizes the robustness of the swarm — if any peer leaves,
     the pieces they had are available from multiple other peers
  4. Natural load balancing — rare pieces are requested from their few holders,
     common pieces are spread across many peers (less load per peer)

Exception: the first few pieces are random.

When a new peer joins with nothing:
  - Rarest-first would have them download the rarest piece
  - But they can't upload anything until they have at least one complete piece
  - And rare pieces might only be available from slow peers

  Solution: Download the first 1-4 pieces randomly.
  This quickly gives the new peer something to upload,
  allowing it to participate in tit-for-tat and get unchoked faster.

  After initial pieces: switch to rarest-first.

Strict piece priority within a piece:

Once a piece is partially downloaded (some blocks received, some pending):
  - Prioritize completing this piece over starting new ones
  - A partially downloaded piece is useless (can't verify hash, can't share)
  - Complete pieces can be verified and immediately offered to other peers
  - This is called "strict priority" and applies per-piece

Deep Dive 2: Tit-for-Tat Incentive Mechanism (Choking Algorithm)

Problem: Without incentives, rational peers would download without uploading (free-riding). This collapses the system — if everyone free-rides, nobody uploads, and nobody can download.

BitTorrent’s solution: Choking/Unchoking

Each side of a connection maintains two flags:
  - am_choking: if true, I refuse to upload to this peer
  - am_interested: if true, I want pieces from this peer
(It also tracks the mirror flags, peer_choking and peer_interested — the other side's view.)

Every 10 seconds, re-evaluate who to unchoke:
  1. Rank all interested peers by their upload rate TO ME
     (how fast they've been sending me data)
  2. Unchoke the top 4 peers (reward the best uploaders)
  3. Choke everyone else (stop uploading to free-riders)

This creates a reciprocal relationship:
  - If you upload fast to me → I unchoke you → you can download from me
  - If you don't upload to me → I choke you → you get nothing from me
  - Result: fast uploaders get fast downloads. Free-riders get nothing.

Optimistic unchoking (exploration):

Every 30 seconds, unchoke one RANDOM peer (regardless of their upload rate).

Why:
  1. Discovery: this peer might turn out to be a fast uploader
     (we won't know until we try)
  2. Bootstrap: new peers have nothing to upload yet. Without optimistic
     unchoking, they would never get unchoked and could never get started.
  3. Fairness: even slow peers eventually get some data

Mechanism:
  - 4 regular unchoke slots (top uploaders) + 1 optimistic slot = 5 total
  - Rotate the optimistic slot every 30 seconds
  - If the optimistic peer turns out to be fast, they'll earn a regular slot
    on the next 10-second re-evaluation
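The 4+1 slot logic can be sketched as a single re-evaluation pass. Peer records follow the client-state model from section 4; the optimistic slot is shown here as a random pick among the choked remainder:

```python
import random

def reevaluate_unchokes(peers: list[dict], regular_slots: int = 4) -> set[str]:
    """Return the peer_ids to unchoke: top uploaders-to-us plus one optimistic."""
    interested = [p for p in peers if p["peer_interested"]]
    # Tit-for-tat: reward the peers currently sending us data fastest
    by_rate = sorted(interested, key=lambda p: p["download_rate"], reverse=True)
    unchoked = {p["peer_id"] for p in by_rate[:regular_slots]}
    # One optimistic slot: a random interested peer not already unchoked
    remainder = [p for p in interested if p["peer_id"] not in unchoked]
    if remainder:
        unchoked.add(random.choice(remainder)["peer_id"])
    return unchoked

peers = [{"peer_id": f"p{i}", "peer_interested": True, "download_rate": rate}
         for i, rate in enumerate([900, 50, 700, 10, 400, 300])]
slots = reevaluate_unchokes(peers)
print(slots)   # top 4 by rate (p0, p2, p4, p5) plus one random optimistic pick
```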

Anti-snubbing:

If a peer has unchoked us but we haven't received any data from them
for 60 seconds → consider them "snubbing" us.

Response:
  - Don't count them in the unchoke algorithm
  - Try more optimistic unchokes to find better peers
  - Increase the number of simultaneous connections to compensate

Seed behavior (when we have all pieces):

Seeders don't need anything from other peers (they have the complete file).
So the regular tit-for-tat doesn't apply.

Seeder unchoke strategy:
  - Unchoke peers we can upload to fastest
    (maximize outgoing bandwidth utilization)
  - This naturally distributes to the peers with the best connections,
    who will then re-distribute to others
  - Rotate unchoke slots more frequently (every 10 sec instead of 30)
    to serve more peers

Deep Dive 3: DHT — Decentralized Peer Discovery (Kademlia)

Why DHT? Trackers are centralized — they can go offline, be taken down, or become bottlenecks. DHT distributes the peer directory across all participating nodes, making the system resilient and uncensorable.

Kademlia Distance Metric:

distance(node_A, node_B) = node_A.id XOR node_B.id
  - This is NOT geographic distance — it's mathematical distance in ID space
  - IDs are 160-bit numbers (same space as SHA-1 info_hashes)
  - XOR distance has useful properties:
    - d(A, A) = 0 (identity)
    - d(A, B) = d(B, A) (symmetry)
    - d(A, C) <= d(A, B) + d(B, C) (triangle inequality)

The "closest" nodes to an info_hash are responsible for storing the peer list
for that torrent. Closeness = XOR distance.
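The distance metric and the k-bucket index it implies fit in a few lines (a sketch, with IDs as Python ints over the 160-bit space; bucket 0 is the far half of the ID space, matching the numbering below):

```python
ID_BITS = 160

def xor_distance(a: int, b: int) -> int:
    return a ^ b

def bucket_index(my_id: int, other_id: int) -> int:
    """Which k-bucket other_id falls into: the position of the first
    differing bit, counted from the most significant bit."""
    d = xor_distance(my_id, other_id)
    if d == 0:
        raise ValueError("a node does not bucket itself")
    return ID_BITS - d.bit_length()   # 0 = far half of ID space, 159 = closest

me = 0b1010 << 156    # a node whose top four ID bits are 1010
far = 0b0010 << 156   # differs in the very first bit  -> bucket 0
near = me ^ 1         # differs only in the last bit   -> bucket 159
print(bucket_index(me, far), bucket_index(me, near))   # 0 159
```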

Routing Table (k-buckets):

Each node maintains 160 k-buckets (one per bit of the ID space):
  Bucket[0]: nodes that differ from us in bit 0 (very far — half the ID space)
  Bucket[1]: nodes that differ in bit 1 but not bit 0 (quarter of ID space)
  ...
  Bucket[159]: nodes that share 159 bits with us (very close — only 1 bit different)

Each bucket holds up to k=8 nodes (IP, port, node_id, last_seen).

Properties:
  - We know MANY nodes far from us (high-order buckets are full)
  - We know FEW nodes close to us (low-order buckets may be empty)
  - Total entries: ~160 × 8 = 1,280 (small, fits in memory)
  - Yet this is enough to route to ANY node in O(log N) hops!

Lookup algorithm (iterative):

find_peers(info_hash):
  1. From routing table, find alpha=3 closest nodes to info_hash
  2. Send get_peers(info_hash) to all 3 in parallel
  3. Each responds with either:
     a. Peers! → add to result set
     b. Closer nodes → add to candidate set
  4. From candidate set, pick the 3 closest nodes not yet queried
  5. Send get_peers to them
  6. Repeat until:
     - We have enough peers (e.g., 50), OR
     - No closer nodes are being returned (convergence)
  7. Typically converges in 4-6 rounds (each round ~50ms)

Total lookup time: 200-300ms to find peers for any torrent in a 100M-node DHT.

NAT Traversal:

Most peers are behind NAT (home routers). They can connect OUT but cannot
receive incoming connections.

Solutions:
  1. UPnP/NAT-PMP: Client requests port forwarding from the router
     Works with most consumer routers. Automatic, no user configuration.

  2. UDP hole punching: Both peers behind NAT send UDP packets to each other
     simultaneously via a rendezvous server. NAT entries are created,
     allowing bidirectional communication.

  3. Relay (last resort): Peer A connects to Peer B via a publicly reachable
     Peer C that relays traffic. Increases latency and loads Peer C,
     so used only when hole punching fails.

  4. WebRTC (for browser-based clients): Built-in ICE/STUN/TURN for
     NAT traversal. More complex but works in browsers.

7. Extensions (2 min)

  • Magnet links: Instead of distributing .torrent files, use a URI containing just the info_hash: magnet:?xt=urn:btih:<info_hash>. The client resolves the info_hash via DHT to find peers, then downloads the torrent metadata from peers before starting the actual file download. Eliminates the need for a torrent hosting site.
  • Selective download: For multi-file torrents, allow users to select which files to download. Mark unwanted files’ pieces as “not interested.” Still share pieces of the files being downloaded. Client UI shows file list with checkboxes.
  • Streaming (sequential download): Override rarest-first with sequential piece ordering for video streaming. Trade-off: reduces swarm health (pieces are less evenly distributed) but enables immediate playback. Hybrid: download the first 5% sequentially (for playback buffer) then switch to rarest-first.
  • Encryption (Protocol Obfuscation): Encrypt the peer wire protocol to defeat ISP traffic shaping and deep packet inspection. Message Stream Encryption (MSE/PE, a de facto standard rather than a numbered BEP) performs a Diffie-Hellman key exchange and then encrypts the stream with RC4. Does not provide anonymity (IPs are still visible) but prevents content-based blocking.
  • Private trackers and access control: Require user authentication to access the tracker. Each user has a unique passkey in the announce URL. Track upload/download ratios per user. Enforce minimum ratio requirements (e.g., must maintain 1:1 ratio). Improves swarm health because users are incentivized to seed.