What a 100K-Concurrent WebRTC Signaling Architecture Looks Like
15 min read
February 18, 2026


A control-plane deep dive into deterministic room placement, backpressure budgets, autoscaling signals, and failure containment at high concurrency.

WebRTC
LiveKit
Distributed Systems
Realtime

What Changes in WebRTC at 100K Concurrency

At this scale, signaling becomes a control-plane design problem. You need deterministic room placement, admission control, backpressure, and resilient reconnect semantics.
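
Of those four, admission control is the one to wire in first: when a shard is out of budget, reject the join explicitly with a retry hint instead of letting queues absorb it. A minimal sketch; the thresholds and stat shapes here are illustrative, not a prescribed API.

admission-control.ts
interface ShardStats {
  activeConnections: number;
  maxConnections: number;
  fanoutQueueDepth: number;
}

type Admission =
  | { admitted: true }
  | { admitted: false; retryAfterMs: number };

// Reject early instead of queueing joins on a saturated shard.
export function admitJoin(stats: ShardStats, maxQueueDepth = 500): Admission {
  const overConnections = stats.activeConnections >= stats.maxConnections;
  const overQueue = stats.fanoutQueueDepth >= maxQueueDepth;

  if (overConnections || overQueue) {
    // Jittered retry hint so rejected clients do not come back in lockstep.
    return { admitted: false, retryAfterMs: 1_000 + Math.floor(Math.random() * 2_000) };
  }
  return { admitted: true };
}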

Baseline topology

  • Edge gateway: TLS, auth, and websocket termination.
  • Signaling cluster: room membership, offer/answer routing, ICE lifecycle.
  • Message bus: cross-node fanout for participant updates.
  • Media plane (SFU): isolated from signaling failures.
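
To make the control-plane/media-plane split concrete, here is a rough sketch of the messages that cross the edge gateway and get routed by the signaling cluster; these shapes are illustrative, not a specific wire protocol.

signaling-messages.ts
// Illustrative message shapes; any real deployment has its own protocol.
export type SignalingMessage =
  | { kind: "join"; roomId: string; participantId: string; token: string }
  | { kind: "offer"; roomId: string; from: string; sdp: string }
  | { kind: "answer"; roomId: string; from: string; sdp: string }
  | { kind: "ice-candidate"; roomId: string; from: string; candidate: string }
  // Presence noise (speaking indicators, etc.): first to be shed under backpressure.
  | { kind: "participant-update"; roomId: string; participantId: string; speaking: boolean };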

Deterministic Room Placement

Avoid random node selection. Use rendezvous hashing so reconnects tend to land on the same shard when healthy, reducing cross-node chatter and cache misses.

room-placement.ts
interface SignalingNode {
  id: string;
  region: string;
  activeConnections: number;
  maxConnections: number;
}

// murmurHash: any stable 32-bit string hash works here (e.g., MurmurHash3);
// assumed to be imported from wherever your hashing utility lives.
declare function murmurHash(input: string): number;

// Rendezvous (highest-random-weight) score: hash the (node, room) pair so every
// caller computes the same winner without coordination.
function score(nodeId: string, roomId: string) {
  return murmurHash(nodeId + ":" + roomId);
}

export function selectNode(roomId: string, nodes: SignalingNode[]) {
  // Only nodes with connection headroom are eligible.
  const healthy = nodes.filter((n) => n.activeConnections < n.maxConnections);
  if (healthy.length === 0) throw new Error("No capacity available");

  // Highest hash wins; load ratio only breaks (rare) ties.
  return healthy
    .map((node) => ({
      node,
      hash: score(node.id, roomId),
      loadRatio: node.activeConnections / node.maxConnections,
    }))
    .sort((a, b) => b.hash - a.hash || a.loadRatio - b.loadRatio)[0].node;
}
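
The property that makes this worth the extra hashing: rendezvous placement only reassigns the rooms that lived on a node that dropped out, so a reconnect recomputes the same winner whenever the original shard is still healthy. A usage sketch for the reconnect path; the registry shape is an assumption.

reconnect-routing.ts
// Hypothetical reconnect path; healthyNodes() stands in for your node registry.
export function routeReconnect(
  roomId: string,
  registry: { healthyNodes(): SignalingNode[] },
) {
  // Same (roomId, node set) inputs yield the same winner, so the session
  // returns to the shard that already holds the room's membership state.
  return selectNode(roomId, registry.healthyNodes());
}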

Backpressure and Fanout Discipline

One noisy room can starve a node. Apply per-room and per-connection budgets. Drop non-critical updates first (e.g., speaking indicators) before critical protocol events.

fanout-budget.ts
// Per-room budget: capacity bounds bursts, refill rate bounds sustained fanout.
class TokenBucket {
  constructor(
    private capacity: number,
    private refillPerSecond: number,
    private tokens = capacity,
  ) {}

  // Call on a timer, or lazily with the elapsed time since last use.
  tick(elapsedSeconds: number) {
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
  }

  tryConsume(cost = 1) {
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Assumed module-level clients: a metrics sink and the cross-node message bus.
declare const metrics: { increment(name: string): void };
declare const bus: { publish(topic: string, payload: unknown): void };

export function publishParticipantUpdate(roomBucket: TokenBucket, payload: unknown) {
  // Over budget: drop the non-critical update and count it instead of queueing it.
  if (!roomBucket.tryConsume(1)) {
    metrics.increment("rtc.room_updates_dropped");
    return;
  }

  bus.publish("room.participant.updated", payload);
}
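
The drop-order rule from the paragraph above can be made explicit by classifying events before they touch the bucket: only non-critical presence noise is ever shed, while offer/answer, ICE, and membership changes always go out. A sketch on top of the bucket above; the topic names are illustrative.

fanout-priority.ts
type EventPriority = "critical" | "droppable";

// Illustrative classification; match it to your own topic names.
function classify(topic: string): EventPriority {
  const critical =
    topic.startsWith("room.sdp.") ||
    topic.startsWith("room.ice.") ||
    topic === "room.participant.joined" ||
    topic === "room.participant.left";
  return critical ? "critical" : "droppable";
}

export function publishRoomEvent(roomBucket: TokenBucket, topic: string, payload: unknown) {
  // Only droppable events are subject to the budget; critical protocol
  // events bypass it so signaling never stalls behind presence noise.
  if (classify(topic) === "droppable" && !roomBucket.tryConsume(1)) {
    metrics.increment("rtc.room_updates_dropped");
    return;
  }
  bus.publish(topic, payload);
}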

Autoscaling Signals That Matter

CPU alone is insufficient. You need domain metrics in autoscaling: active connections, fanout queue depth, join latency p95, and reconnect failure rate.

signaling-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: signaling-hpa
spec:
  # scaleTargetRef is required; the Deployment name here is assumed.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: signaling
  minReplicas: 8
  maxReplicas: 64
  metrics:
    - type: Pods
      pods:
        metric:
          name: active_websocket_connections
        target:
          type: AverageValue
          averageValue: "1700"
    - type: Pods
      pods:
        metric:
          name: fanout_queue_depth
        target:
          type: AverageValue
          averageValue: "250"
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
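
The two Pods metrics above come from the custom metrics API, which means the signaling process has to export them and an adapter (for example prometheus-adapter) has to surface them to the HPA. A minimal export sketch assuming prom-client; the gauge names mirror the HPA, and the join-latency and reconnect-failure series cover the remaining two signals.

signaling-metrics.ts
import { Counter, Gauge, Histogram } from "prom-client";

// Mirrors the HPA's active_websocket_connections Pods metric.
export const activeConnections = new Gauge({
  name: "active_websocket_connections",
  help: "Open signaling websockets on this pod",
});

// Mirrors the HPA's fanout_queue_depth Pods metric.
export const fanoutQueueDepth = new Gauge({
  name: "fanout_queue_depth",
  help: "Pending cross-node fanout messages",
});

export const joinLatencySeconds = new Histogram({
  name: "rtc_join_latency_seconds",
  help: "Time from join request to first room state delivered",
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2, 5],
});

export const reconnectFailures = new Counter({
  name: "rtc_reconnect_failures_total",
  help: "Reconnect attempts that exhausted their retry budget",
});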

Resilience principle

If a signaling node dies, the blast radius should stay shard-local, and clients must reconnect within an SLO window. Design for partial failure, not perfect uptime.
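
On the client side of that SLO, a bounded, jittered exponential backoff spreads the dead shard's reconnect storm across the surviving nodes while still converging well inside the window. A sketch; the 30-second budget and the connect function are placeholders.

reconnect-backoff.ts
// Sketch: reconnect within an SLO window using capped exponential backoff with full jitter.
// The 30 s budget and the connect() implementation are illustrative placeholders.
export async function reconnectWithinSlo<T>(
  connect: () => Promise<T>,
  sloMs = 30_000,
): Promise<T> {
  const deadline = Date.now() + sloMs;
  let delayMs = 250;

  while (Date.now() < deadline) {
    try {
      return await connect();
    } catch {
      // Full jitter keeps clients that lost the same node from retrying in lockstep.
      await new Promise((resolve) => setTimeout(resolve, Math.random() * delayMs));
      delayMs = Math.min(delayMs * 2, 5_000);
    }
  }
  throw new Error("Reconnect SLO exceeded");
}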