WebRTC is the de facto standard for real-time video in a browser without plugins. Every major video calling app - Google Meet, Zoom’s web client, Discord - runs on WebRTC under the hood. The API is built into every modern browser, so users have nothing to install.

The problem is that WebRTC is easy to demo and hard to ship. The “hello world” takes 30 minutes. Making it work reliably across corporate firewalls, mobile networks, and symmetric NATs takes months of understanding that the tutorials skip.

How a WebRTC Connection Actually Happens

Two browsers cannot just connect to each other. They do not know each other’s IP addresses, and even if they did, NAT routers block unsolicited incoming connections. WebRTC solves this with a process called ICE (Interactive Connectivity Establishment).

Here is the sequence:

  1. Signaling: Both peers exchange connection metadata (SDP offers/answers) through your server. WebRTC does not define how signaling works - you pick the transport (WebSocket, Firebase, HTTP polling)
  2. ICE candidate gathering: Each peer discovers its own network addresses - local IP, public IP (via STUN), and relay address (via TURN)
  3. Connectivity checks: Peers try every combination of their candidates to find a path that works
  4. DTLS handshake: Once connected, peers establish an encrypted channel
  5. Media flow: Video and audio stream over SRTP
// Simplified WebRTC connection setup
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'user',
      credential: 'pass'
    }
  ]
});

// Add local media
const stream = await navigator.mediaDevices.getUserMedia({
  video: { width: 1280, height: 720 },
  audio: true
});
stream.getTracks().forEach(track => pc.addTrack(track, stream));

// Register the ICE candidate handler before creating the offer so
// candidates never fire without a listener
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signalingServer.send({
      type: 'candidate',
      candidate: event.candidate
    });
  }
};

// Create and send offer via your signaling server
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
signalingServer.send({ type: 'offer', sdp: offer.sdp });

// Handle remote answer
signalingServer.on('answer', async (answer) => {
  await pc.setRemoteDescription(new RTCSessionDescription(answer));
});

// Handle remote candidates
signalingServer.on('candidate', async (candidate) => {
  await pc.addIceCandidate(new RTCIceCandidate(candidate));
});
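One race the snippet above glosses over: remote candidates can arrive over signaling before setRemoteDescription has run, and addIceCandidate rejects until a remote description exists. A small buffer avoids this; CandidateQueue is an illustrative helper, not part of the WebRTC API:

```javascript
// Buffer remote ICE candidates until setRemoteDescription has completed.
// `CandidateQueue` is a sketch of the pattern, not a browser API.
class CandidateQueue {
  constructor() {
    this.pending = [];
    this.ready = false;
  }

  // Call once setRemoteDescription resolves; drains buffered candidates
  flush(addCandidate) {
    this.ready = true;
    this.pending.forEach(addCandidate);
    this.pending = [];
  }

  // Queue or forward a candidate depending on whether we are ready
  add(candidate, addCandidate) {
    if (this.ready) addCandidate(candidate);
    else this.pending.push(candidate);
  }
}
```

Wired into the handlers above, the candidate handler calls `queue.add(candidate, c => pc.addIceCandidate(c))`, and the answer handler calls `queue.flush(...)` right after setRemoteDescription resolves.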

STUN, TURN, and Why Most Calls Need a Relay

STUN (Session Traversal Utilities for NAT) is simple. Your browser sends a packet to a STUN server, and the server replies with your public IP address and port. This is your “server-reflexive candidate.” STUN is lightweight - it is just a single request/response.

TURN (Traversal Using Relays around NAT) is the fallback. When direct connection and STUN both fail, your media flows through the TURN server. The TURN server relays every packet between the two peers. This is expensive in bandwidth but it always works.

Here is what determines whether you need TURN:

NAT Type               | Direct connection | STUN works | TURN needed
-----------------------|-------------------|------------|------------
Full cone NAT          | Yes               | Yes        | No
Address-restricted NAT | Sometimes         | Yes        | Sometimes
Port-restricted NAT    | Rarely            | Sometimes  | Usually
Symmetric NAT          | No                | No         | Yes

The reality: In production, 15-30% of connections require TURN. Corporate networks, mobile carriers, and many home routers use symmetric NAT or strict firewalls. If you do not provide a TURN server, those users cannot connect. Period.

Google provides free STUN servers (stun:stun.l.google.com:19302), but no free TURN. TURN costs real money because it relays the actual media stream.
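You can see which of these paths a call actually uses by inspecting the `typ` field in each ICE candidate string: `host` (local address), `srflx` (server-reflexive, from STUN), or `relay` (from TURN). A small parser - an illustrative helper, not a browser API:

```javascript
// Extract the candidate type from an ICE candidate string: 'host',
// 'srflx' (via STUN), 'relay' (via TURN), or 'prflx' (peer-reflexive)
function candidateType(candidateStr) {
  const match = candidateStr.match(/ typ (host|srflx|relay|prflx)/);
  return match ? match[1] : null;
}

// Example candidate string in the shape browsers produce
const example =
  'candidate:842163049 1 udp 1677729535 203.0.113.7 51472 typ srflx raddr 192.168.1.4 rport 51472';
```

Inside the onicecandidate handler, `candidateType(event.candidate.candidate) === 'relay'` tells you that candidate would route media through your TURN server - useful for measuring your real-world relay percentage.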

Signaling: The Part WebRTC Does Not Handle

WebRTC is a peer-to-peer protocol, but peers need a way to find each other and exchange connection information. This is signaling, and WebRTC deliberately does not specify how it works.

Common signaling approaches:

WebSocket server - The most common choice. Build a simple WebSocket server that routes messages between peers. You need to handle room management, user identification, and message routing.

// Minimal signaling server with WebSocket (Node.js, using the `ws` package)
const { WebSocketServer, WebSocket } = require('ws');

const wss = new WebSocketServer({ port: 8080 });
const rooms = new Map(); // room name -> Set of connected sockets

wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    const msg = JSON.parse(data);

    if (msg.type === 'join') {
      if (!rooms.has(msg.room)) rooms.set(msg.room, new Set());
      rooms.get(msg.room).add(ws);
      ws.room = msg.room;
    }

    if (['offer', 'answer', 'candidate'].includes(msg.type)) {
      // Forward to other peers in the room
      rooms.get(ws.room)?.forEach(peer => {
        if (peer !== ws && peer.readyState === WebSocket.OPEN) {
          peer.send(JSON.stringify(msg));
        }
      });
    }
  });

  ws.on('close', () => {
    rooms.get(ws.room)?.delete(ws);
  });
});

Firebase Realtime Database - Good for prototyping. Each peer writes its signaling data to a shared document, and the other peer listens for changes. No server to build. Gets expensive at scale.

HTTP polling - Works but adds latency. Each peer polls your API for new signaling messages. Only use this if WebSocket is not an option.
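The client side of the WebSocket approach can be a thin wrapper exposing the `signalingServer.send` / `signalingServer.on` interface used in the connection snippet earlier. A sketch that accepts any WebSocket-like object, so the transport can be the browser’s `new WebSocket(url)` or a fake in tests; SignalingClient is an assumed name, not a standard API:

```javascript
// Thin client-side signaling wrapper: joins a room on connect and
// dispatches incoming messages by their `type` field
class SignalingClient {
  constructor(socket, room) {
    this.socket = socket;
    this.handlers = new Map(); // message type -> handler

    socket.onopen = () => this.send({ type: 'join', room });
    socket.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      const handler = this.handlers.get(msg.type);
      if (handler) handler(msg);
    };
  }

  send(msg) {
    this.socket.send(JSON.stringify(msg));
  }

  on(type, handler) {
    this.handlers.set(type, handler);
  }
}
```

In the browser this would be `new SignalingClient(new WebSocket('wss://example.com'), 'room-1')`, matching the server above, which forwards offer/answer/candidate messages to the other members of the room.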

Self-Hosting coturn

coturn is the standard open-source TURN server. Here is a production-ready setup:

# Install
apt install coturn

# /etc/turnserver.conf
listening-port=3478
tls-listening-port=5349
listening-ip=0.0.0.0
external-ip=YOUR_PUBLIC_IP
relay-ip=YOUR_PUBLIC_IP

# Authentication (coturn's REST API credential mechanism)
use-auth-secret
static-auth-secret=YOUR_LONG_RANDOM_SECRET
realm=turn.example.com

# WebRTC requires STUN message fingerprints
fingerprint

# TLS (required for turns: protocol)
cert=/etc/letsencrypt/live/turn.example.com/fullchain.pem
pkey=/etc/letsencrypt/live/turn.example.com/privkey.pem

# Limits
total-quota=100
user-quota=10
max-bps=1000000

# Security
no-multicast-peers
denied-peer-ip=10.0.0.0-10.255.255.255
denied-peer-ip=172.16.0.0-172.31.255.255
denied-peer-ip=192.168.0.0-192.168.255.255

# Logging
log-file=/var/log/turnserver.log

Critical notes on coturn deployment:

  • Bandwidth is the cost driver: Each relayed call uses 1-3 Mbps per direction. A server handling 50 simultaneous relayed calls needs 300+ Mbps of bandwidth. Cloud egress costs dominate
  • Use TURN over TLS (port 443): Many corporate firewalls block non-HTTPS traffic. Running TURN on port 443 with TLS gets through almost everything
  • Generate short-lived credentials: Do not use static username/password. Generate time-limited HMAC credentials on your backend:
import hashlib, hmac, base64, time

def get_turn_credentials(secret, user, ttl=86400):
    expiry = int(time.time()) + ttl
    username = f"{expiry}:{user}"
    password = base64.b64encode(
        hmac.new(secret.encode(), username.encode(), hashlib.sha1).digest()
    ).decode()
    return username, password
  • Deploy regionally: TURN adds latency because media bounces through the server. A TURN server in US-East relaying between two users in Mumbai adds 300ms+ of round-trip latency. Deploy TURN servers close to your users

SFU vs Mesh vs MCU

For group calls (3+ participants), you need to choose a topology:

Topology | How it works                                         | Upload per user | Download per user | Server CPU               | Best for
---------|------------------------------------------------------|-----------------|-------------------|--------------------------|----------------------
Mesh     | Every peer connects to every other peer              | (N-1) streams   | (N-1) streams     | None                     | 2-4 participants
SFU      | Peers send one stream to server, server forwards all | 1 stream        | (N-1) streams     | Low (just forwarding)    | 5-50 participants
MCU      | Peers send to server, server mixes into one stream   | 1 stream        | 1 stream          | Very high (transcoding)  | Rare, specialized use

Mesh works for 2-3 people. At 4 participants, each browser sends 3 video streams and receives 3. At 5 participants, it is 4 up and 4 down. Mobile devices and weak connections fall apart.
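The scaling math behind those numbers, assuming roughly 1.5 Mbps per 720p stream (an illustrative figure, not a spec value):

```javascript
// Streams and approximate bandwidth per participant for each topology,
// matching the table above. 1.5 Mbps per stream is an assumed figure.
const MBPS_PER_STREAM = 1.5;

function perUserLoad(topology, n) {
  const streams = {
    mesh: { up: n - 1, down: n - 1 },
    sfu:  { up: 1,     down: n - 1 },
    mcu:  { up: 1,     down: 1 }
  }[topology];
  return {
    ...streams,
    upMbps: streams.up * MBPS_PER_STREAM,
    downMbps: streams.down * MBPS_PER_STREAM
  };
}
```

A six-person mesh call asks each participant for 7.5 Mbps of sustained upload; the same call through an SFU needs 1.5 Mbps up, which is why mesh collapses first on the upload side of asymmetric home connections.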

SFU (Selective Forwarding Unit) is the industry standard. Every production video calling app uses an SFU. The user uploads one stream, and the SFU forwards it to everyone else. The SFU can also do smart things like:

  • Forward only the active speaker’s video at high resolution
  • Send lower resolution to participants with slow connections (simulcast)
  • Drop video entirely for audio-only participants
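Simulcast is set up on the sending client by tagging multiple encodings with `rid` values when the track is added; the SFU then picks a layer per receiver. A sketch assuming a 720p source track - the layer names and bitrates here are illustrative choices, not mandated by any spec:

```javascript
// Three simulcast layers for pc.addTransceiver's sendEncodings option.
// rid values and bitrate caps are illustrative defaults.
function simulcastEncodings() {
  return [
    { rid: 'q', scaleResolutionDownBy: 4, maxBitrate: 150000 },  // ~180p
    { rid: 'h', scaleResolutionDownBy: 2, maxBitrate: 500000 },  // ~360p
    { rid: 'f', scaleResolutionDownBy: 1, maxBitrate: 1500000 }  // full 720p
  ];
}

// In the browser:
// pc.addTransceiver(videoTrack, {
//   direction: 'sendonly',
//   sendEncodings: simulcastEncodings()
// });
```

The SFU can then forward the `f` layer for the active speaker, `q` for thumbnails, and step any receiver down a layer when its connection degrades.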

Open source SFUs worth evaluating in 2026:

  • mediasoup: Node.js/Rust based. Mature, well-documented, used in production by many companies
  • Pion: Written in Go. Lower level, more flexible, excellent if your backend is Go
  • LiveKit: Full platform including SFU, client SDKs, and cloud hosting. The most complete solution if you want managed infrastructure
  • Janus: C-based, plugin architecture. Mature but showing its age

MCU composites all streams into one. Rarely used because the server CPU cost is enormous - transcoding N video streams in real-time requires serious hardware. The only advantage is that each participant downloads a single stream, which helps participants with very limited bandwidth.

The Things That Will Bite You

After building multiple WebRTC applications, here are the issues that consume the most debugging time:

  1. Safari: Safari’s WebRTC implementation has quirks. Test on Safari from day one, not after launch
  2. Renegotiation: Adding or removing tracks mid-call requires SDP renegotiation. This is the most fragile part of WebRTC. Use “unified plan” SDP (the default in modern browsers) and test thoroughly
  3. Bandwidth estimation: WebRTC’s built-in bandwidth estimation (GCC) can be aggressive. On unstable connections, it oscillates between high and low quality. Implementing proper simulcast with manual layer switching gives you more control
  4. Firewall traversal in China: The Great Firewall blocks STUN/TURN traffic unpredictably. If you need to serve users in China, you need TURN servers inside China
  5. Mobile background behavior: iOS kills the WebRTC connection when the app goes to background. Android is more lenient but still unreliable. Your app needs to handle reconnection gracefully
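A reconnection sketch for point 5: watch connectionState and distinguish “disconnected” (often transient; ICE may recover on its own) from “failed” (needs a full ICE restart). The decision helper is an illustrative name; the ICE-restart flow in the comment is standard WebRTC:

```javascript
// Map RTCPeerConnection.connectionState to a recovery action.
// 'disconnected' frequently self-heals; 'failed' will not.
function reconnectAction(connectionState) {
  if (connectionState === 'failed') return 'ice-restart';
  if (connectionState === 'disconnected') return 'wait';
  return 'none';
}

// In the browser, wired to the connection from earlier:
// pc.onconnectionstatechange = async () => {
//   if (reconnectAction(pc.connectionState) === 'ice-restart') {
//     const offer = await pc.createOffer({ iceRestart: true });
//     await pc.setLocalDescription(offer);
//     signalingServer.send({ type: 'offer', sdp: offer.sdp });
//   }
// };
```

On mobile, run the same check when the app returns to the foreground, since the connection may have silently died while backgrounded.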

WebRTC is powerful and battle-tested. But “it works in my local demo” and “it works for 10,000 users across 40 countries” are separated by a significant amount of infrastructure and edge-case handling. Start with a solid TURN deployment and an SFU architecture, and you will avoid most of the pain.