What a 100K-Concurrent WebRTC Signaling Architecture Looks Like
15 min read
Wed Feb 18 2026

What Changes in WebRTC at 100K Concurrency

At this scale, signaling becomes a control-plane design problem. You need deterministic room placement, admission control, backpressure, and resilient reconnect semantics.

Baseline topology

  • Edge gateway: TLS, auth, admission control, and websocket termination (see the sketch after this list).
  • Signaling cluster: room membership, offer/answer routing, ICE lifecycle.
  • Message bus: cross-node fanout for participant updates.
  • Media plane (SFU): isolated from signaling failures.
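
Admission control sits at the edge gateway, before a socket ever reaches the signaling cluster. Below is a minimal load-shedding sketch with retry hints; the soft/hard limits and retry-after values are illustrative assumptions, not tuned numbers.

admission-control.ts
interface AdmissionDecision {
  admit: boolean;
  retryAfterSeconds?: number;
}

// Illustrative ceilings; tune against measured join-latency SLOs.
const GLOBAL_SOFT_LIMIT = 100_000;
const GLOBAL_HARD_LIMIT = 110_000;

export function admit(globalActiveConnections: number): AdmissionDecision {
  // Below the soft limit: accept unconditionally.
  if (globalActiveConnections < GLOBAL_SOFT_LIMIT) return { admit: true };

  // Between the limits: shed a growing fraction of new joins so load
  // approaches the ceiling smoothly instead of hitting a wall.
  if (globalActiveConnections < GLOBAL_HARD_LIMIT) {
    const overload =
      (globalActiveConnections - GLOBAL_SOFT_LIMIT) /
      (GLOBAL_HARD_LIMIT - GLOBAL_SOFT_LIMIT);
    return Math.random() > overload
      ? { admit: true }
      : { admit: false, retryAfterSeconds: 5 };
  }

  // At or above the hard limit: reject with backoff guidance.
  return { admit: false, retryAfterSeconds: 30 };
}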

Deterministic Room Placement

Avoid random node selection. Use rendezvous hashing so reconnects tend to land on the same shard while that shard stays healthy, reducing cross-node chatter and cache misses.

room-placement.ts
interface SignalingNode {
  id: string;
  region: string;
  activeConnections: number;
  maxConnections: number;
}

// Any fast, stable hash works here (e.g., murmur3); FNV-1a keeps the
// sketch dependency-free.
function score(nodeId: string, roomId: string): number {
  const key = nodeId + ":" + roomId;
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // unsigned 32-bit
}

export function selectNode(roomId: string, nodes: SignalingNode[]): SignalingNode {
  const healthy = nodes.filter((n) => n.activeConnections < n.maxConnections);
  if (healthy.length === 0) throw new Error("No capacity available");

  // Rendezvous hashing: the highest-scoring healthy node wins, so a room
  // keeps landing on the same shard until that shard fills up or fails.
  return healthy
    .map((node) => ({
      node,
      hash: score(node.id, roomId),
      loadRatio: node.activeConnections / node.maxConnections,
    }))
    .sort((a, b) => b.hash - a.hash || a.loadRatio - b.loadRatio)[0].node;
}
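
Usage at join time is a straight lookup; the node list and IDs below are illustrative:

placement-usage.ts
const nodes: SignalingNode[] = [
  { id: "sig-eu-1", region: "eu-west", activeConnections: 1400, maxConnections: 2000 },
  { id: "sig-eu-2", region: "eu-west", activeConnections: 1900, maxConnections: 2000 },
];

// Same roomId + same healthy node set => same shard across reconnects.
const target = selectNode("room-7f3a", nodes);

Because placement depends only on the room ID and the healthy node set, mass reconnects land back on their previous shards instead of reshuffling the way random selection would.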

Backpressure and Fanout Discipline

One noisy room can starve a node. Apply per-room and per-connection budgets. Drop non-critical updates first (e.g., speaking indicators) before critical protocol events.

fanout-budget.ts
class TokenBucket {
  constructor(
    private capacity: number,
    private refillPerSecond: number,
    private tokens = capacity,
  ) {}

  // Refill in proportion to elapsed time; drive this from a periodic timer.
  tick(elapsedSeconds: number) {
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
  }

  // Spend budget if available; callers drop the message when this fails.
  tryConsume(cost = 1) {
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// `metrics` and `bus` stand in for your stats client and message bus.
declare const metrics: { increment(name: string): void };
declare const bus: { publish(topic: string, payload: unknown): void };

export function publishParticipantUpdate(roomBucket: TokenBucket, payload: unknown) {
  // Over budget: drop the non-critical update and count it, don't queue it.
  if (!roomBucket.tryConsume(1)) {
    metrics.increment("rtc.room_updates_dropped");
    return;
  }

  bus.publish("room.participant.updated", payload);
}
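
The drop-non-critical-first rule can be made explicit with a priority map in front of the bucket. The event names and tier assignments below are illustrative; this reuses the TokenBucket, metrics, and bus stand-ins from the previous sketch.

event-priority.ts
type EventPriority = "critical" | "droppable";

// Protocol events that break calls if lost bypass the budget entirely;
// cosmetic updates absorb the backpressure instead.
const PRIORITY: Record<string, EventPriority> = {
  "room.offer": "critical",
  "room.answer": "critical",
  "room.ice_candidate": "critical",
  "room.participant.updated": "droppable",
  "room.speaking_changed": "droppable",
};

export function publishRoomEvent(roomBucket: TokenBucket, event: string, payload: unknown) {
  const priority = PRIORITY[event] ?? "critical"; // fail safe: unknown events pass through
  if (priority === "droppable" && !roomBucket.tryConsume(1)) {
    metrics.increment("rtc.room_updates_dropped");
    return;
  }
  bus.publish(event, payload);
}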

Autoscaling Signals That Matter

CPU alone is insufficient. You need domain metrics in autoscaling: active connections, fanout queue depth, join latency p95, and reconnect failure rate.

signaling-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: signaling-hpa
spec:
  scaleTargetRef: # required: the workload this HPA scales (name illustrative)
    apiVersion: apps/v1
    kind: Deployment
    name: signaling
  minReplicas: 8
  maxReplicas: 64
  metrics:
    - type: Pods
      pods:
        metric:
          name: active_websocket_connections
        target:
          type: AverageValue
          averageValue: "1700"
    - type: Pods
      pods:
        metric:
          name: fanout_queue_depth
        target:
          type: AverageValue
          averageValue: "250"
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
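
Note that active_websocket_connections and fanout_queue_depth are custom pod metrics: the HPA can only resolve them if a metrics adapter (for example, the Prometheus Adapter) exposes them through the custom metrics API. Join latency p95 and reconnect failure rate usually work better as alerting signals than as scaling inputs, since they lag the load that caused them.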

Resilience principle

If a signaling node dies, blast radius should be shard-local and clients must reconnect within an SLO window. Design for partial failure, not perfect uptime.
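
On the client side, "reconnect within an SLO window" in practice means bounded exponential backoff with full jitter, so a shard failure does not turn into a synchronized reconnect storm. A minimal sketch; the base delay, cap, and attempt limit are illustrative:

reconnect-backoff.ts
// Full jitter keeps clients from retrying in lockstep after a shard dies.
function reconnectDelayMs(attempt: number): number {
  const BASE_MS = 500;
  const CAP_MS = 15_000;
  return Math.random() * Math.min(CAP_MS, BASE_MS * 2 ** attempt);
}

export async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts = 8,
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return; // rejoined; the server resumes room state for this session
    } catch {
      await new Promise((resolve) => setTimeout(resolve, reconnectDelayMs(attempt)));
    }
  }
  throw new Error("Reconnect attempts exhausted; surface failure to the app");
}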