Why Webhook-Driven State Machines Beat Polling for Telephony

Polling fails both economically and semantically at scale. This post breaks down the inbox/outbox pattern, legal-transition guards, and replay-safe call orchestration.

Tags: Twilio, System Design, State Machines, Reliability

Why Polling Becomes Incorrect (Not Just Expensive)

At telephony scale, polling introduces temporal ambiguity. You may observe transitions out of order, process stale state, and duplicate side effects when workers race each other.

Example: 4,000 active calls polled every 3 seconds generate 80,000 requests per minute before any business logic runs. More importantly, polling still cannot guarantee correct event ordering.
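
The arithmetic behind that figure can be checked with a small sketch. The helper name and its parameters are illustrative only and do not come from any provider SDK.

```typescript
// Hypothetical helper: requests per minute produced by polling every active call.
function pollRequestsPerMinute(activeCalls: number, intervalSeconds: number): number {
  if (intervalSeconds <= 0) {
    throw new RangeError("intervalSeconds must be positive");
  }
  const pollsPerCallPerMinute = 60 / intervalSeconds; // 20 polls per call at a 3 s interval
  return activeCalls * pollsPerCallPerMinute;
}

const perMinute = pollRequestsPerMinute(4000, 3); // 80000
```

Halving the interval to reduce staleness doubles the request volume, which is the economic half of the problem; the ordering problem remains at any interval.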

Webhook + State Machine Architecture

A safer model is an append-only event inbox plus deterministic state transition rules. Provider events are persisted first, then consumed through guarded transitions.

The inbox table stores every provider callback under an idempotency key.
The transition function validates that each state change follows a legal edge.
The outbox emits side effects (CRM, email, analytics) after the transaction commits.
A replay worker can rebuild state from the inbox after incidents.
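
The four pieces above can be sketched in memory to show how they fit together. This is a minimal illustration, not a production design: `CallPipeline`, `Callback`, and the trimmed `EDGES` table are hypothetical names, the maps stand in for database tables, and only the happy-path states are modeled.

```typescript
// In-memory sketch of inbox -> guarded transition -> outbox (all names hypothetical).
type State = "initiated" | "ringing" | "in_progress" | "completed";

interface Callback {
  providerEventId: string; // idempotency key, assumed unique per provider callback
  callId: string;
  next: State;
}

interface OutboxRow {
  eventType: "call_state_changed";
  callId: string;
  state: State;
}

type Outcome = "applied" | "duplicate" | "rejected";

// Trimmed transition table for the sketch; a real table would cover failure states too.
const EDGES: Record<State, ReadonlySet<State>> = {
  initiated: new Set<State>(["ringing"]),
  ringing: new Set<State>(["in_progress"]),
  in_progress: new Set<State>(["completed"]),
  completed: new Set<State>(),
};

class CallPipeline {
  readonly inbox = new Map<string, Callback>(); // keyed by idempotency key
  readonly states = new Map<string, State>();
  readonly outbox: OutboxRow[] = [];

  handle(cb: Callback): Outcome {
    // 1. Persist first; a repeated key means the provider retried the callback.
    if (this.inbox.has(cb.providerEventId)) return "duplicate";
    this.inbox.set(cb.providerEventId, cb);

    // 2. Guarded transition: only legal edges change state.
    const current = this.states.get(cb.callId) ?? "initiated";
    if (!EDGES[current].has(cb.next)) return "rejected";
    this.states.set(cb.callId, cb.next);

    // 3. Record the side effect; a separate worker would deliver it after commit.
    this.outbox.push({ eventType: "call_state_changed", callId: cb.callId, state: cb.next });
    return "applied";
  }
}

const pipeline = new CallPipeline();
const results: Outcome[] = [
  pipeline.handle({ providerEventId: "evt-1", callId: "call-1", next: "ringing" }),
  pipeline.handle({ providerEventId: "evt-1", callId: "call-1", next: "ringing" }), // provider retry
  pipeline.handle({ providerEventId: "evt-2", callId: "call-1", next: "in_progress" }),
  pipeline.handle({ providerEventId: "evt-3", callId: "call-1", next: "ringing" }), // stale, out of order
];
// results: ["applied", "duplicate", "applied", "rejected"]
```

The rejected event still lands in the inbox, so it remains available for audit even though it never changed state.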

call-transition.ts

type CallState =
  | "initiated"
  | "ringing"
  | "in_progress"
  | "completed"
  | "failed"
  | "busy"
  | "no_answer";

const LEGAL_TRANSITIONS: Record<CallState, Set<CallState>> = {
  initiated: new Set(["ringing", "failed"]),
  ringing: new Set(["in_progress", "busy", "no_answer", "failed"]),
  // An answered call cannot become "busy"; a busy signal only occurs before answer.
  in_progress: new Set(["completed", "failed"]),
  completed: new Set(),
  failed: new Set(),
  busy: new Set(),
  no_answer: new Set(),
};

export function assertTransition(current: CallState, next: CallState) {
  if (!LEGAL_TRANSITIONS[current].has(next)) {
    throw new Error("Illegal call transition: " + current + " -> " + next);
  }
}

Transactional apply with optimistic lock

apply-event.sql

BEGIN;

INSERT INTO call_event_inbox (provider_event_id, call_id, payload)
VALUES ($1, $2, $3)
ON CONFLICT (provider_event_id) DO NOTHING;

-- If no row was inserted, the event is a duplicate: ROLLBACK and stop here.

UPDATE calls
SET state = $4,
    version = version + 1,
    updated_at = now()
WHERE id = $2
  AND version = $5;

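-- If the update count is 0, another worker won the race: ROLLBACK (so the inbox
-- row is not committed without its state change), then re-read and retry.
-- Only continue to the outbox insert when exactly one row was updated.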

INSERT INTO outbox (event_type, aggregate_id, payload)
VALUES ('call_state_changed', $2, $6);

COMMIT;
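
The re-read-and-retry loop around that versioned UPDATE might look like the following sketch. It uses an in-memory stand-in rather than a real database driver, so `CallsTable`, `updateIfVersion`, and `applyWithRetry` are hypothetical names chosen for illustration.

```typescript
interface CallRow {
  id: string;
  state: string;
  version: number;
}

// Hypothetical in-memory stand-in for the `calls` table.
class CallsTable {
  private rows = new Map<string, CallRow>();

  insert(row: CallRow): void {
    this.rows.set(row.id, { ...row });
  }

  read(id: string): CallRow {
    const row = this.rows.get(id);
    if (!row) throw new Error(`Unknown call: ${id}`);
    return { ...row }; // return a snapshot, as a SELECT would
  }

  // Mirrors: UPDATE calls SET state = ..., version = version + 1
  //          WHERE id = ... AND version = expectedVersion
  // Returns the number of rows updated (0 or 1).
  updateIfVersion(id: string, expectedVersion: number, state: string): number {
    const row = this.rows.get(id);
    if (!row || row.version !== expectedVersion) return 0;
    row.state = state;
    row.version += 1;
    return 1;
  }
}

// Re-read and retry when another worker bumps the version first.
function applyWithRetry(
  table: CallsTable,
  id: string,
  decide: (current: CallRow) => string | null, // null means "nothing to apply"
  maxAttempts = 3,
): CallRow {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const snapshot = table.read(id);
    const next = decide(snapshot);
    if (next === null) return snapshot;
    if (table.updateIfVersion(id, snapshot.version, next) === 1) {
      return table.read(id);
    }
  }
  throw new Error(`Gave up after ${maxAttempts} version conflicts on call ${id}`);
}

const table = new CallsTable();
table.insert({ id: "call-1", state: "ringing", version: 1 });

let raced = false;
const finalRow = applyWithRetry(table, "call-1", (current) => {
  if (!raced) {
    raced = true;
    // Simulate a competing worker committing between our read and our write.
    table.updateIfVersion("call-1", current.version, "in_progress");
  }
  return current.state === "in_progress" ? "completed" : "in_progress";
});
// finalRow: { id: "call-1", state: "completed", version: 3 }
```

The first attempt loses the race and updates nothing; the second attempt re-reads the row, sees the competing worker's state, and decides from there. In a real handler the `decide` callback would be the transition guard applied to the fresh row.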

Idempotency and Replay

Exactly-once delivery is unrealistic across distributed systems. Build for at-least-once delivery and enforce idempotency in your write path.

Persist the provider event ID under a unique index.
Use a deterministic transition function with version checks.
Make side effects idempotent via outbox dedupe keys.
Keep replay jobs that can regenerate call state for audit and disaster recovery.
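
A replay job can be sketched as a pure fold over inbox rows. The assumptions here are loud ones: `InboxRow`, its `seq` column (insertion order), the `callId:state` dedupe key format, and the simplified linear guard in the usage example are all hypothetical. The key format only works because the state graph has no cycles, so a call enters each state at most once.

```typescript
interface InboxRow {
  seq: number; // insertion order in the inbox (assumed monotonically increasing)
  providerEventId: string;
  callId: string;
  next: string;
}

interface ReplayResult {
  states: Map<string, string>;
  outboxKeys: Set<string>; // dedupe keys a consumer can use to skip repeated side effects
}

// Pure fold: the same inbox rows always produce the same states and keys.
function replayInbox(
  rows: readonly InboxRow[],
  initial: string,
  isLegal: (from: string, to: string) => boolean,
): ReplayResult {
  const states = new Map<string, string>();
  const outboxKeys = new Set<string>();
  // Sort a copy so the input is not mutated and row order in storage does not matter.
  const ordered = [...rows].sort((a, b) => a.seq - b.seq);
  for (const row of ordered) {
    const current = states.get(row.callId) ?? initial;
    if (!isLegal(current, row.next)) continue; // same guard as the live path
    states.set(row.callId, row.next);
    outboxKeys.add(`${row.callId}:${row.next}`);
  }
  return { states, outboxKeys };
}

// Simplified guard for the example: states must advance one step along a fixed order.
const ORDER = ["initiated", "ringing", "in_progress", "completed"];
const nextInOrder = (from: string, to: string): boolean =>
  ORDER.indexOf(from) >= 0 && ORDER.indexOf(to) === ORDER.indexOf(from) + 1;

const inboxRows: InboxRow[] = [
  { seq: 2, providerEventId: "evt-2", callId: "call-1", next: "in_progress" },
  { seq: 1, providerEventId: "evt-1", callId: "call-1", next: "ringing" },
  { seq: 3, providerEventId: "evt-3", callId: "call-1", next: "completed" },
];
const rebuilt = replayInbox(inboxRows, "initiated", nextInOrder);
// rebuilt.states.get("call-1") is "completed"; rebuilt.outboxKeys has 3 entries
```

Because the dedupe keys are derived from the data rather than from when the job ran, replaying after an incident regenerates the same keys, and a consumer that has already processed them can skip the repeats.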

Operational metrics to watch

Track the duplicate-event rate and the illegal-transition rejection rate. A rising rejection rate is often an early signal that upstream provider behavior has changed.
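
One way to compute both rates is a pair of counters fed by the event handler's outcome. This is a sketch under assumptions: `TransitionMetrics` and the three outcome labels are hypothetical, and the choice of denominators (all events for duplicates, non-duplicate events for rejections) is one reasonable definition rather than a standard.

```typescript
type EventOutcome = "applied" | "duplicate" | "rejected";

class TransitionMetrics {
  private counts: Record<EventOutcome, number> = { applied: 0, duplicate: 0, rejected: 0 };

  record(outcome: EventOutcome): void {
    this.counts[outcome] += 1;
  }

  total(): number {
    return this.counts.applied + this.counts.duplicate + this.counts.rejected;
  }

  // Fraction of all received events that were duplicates.
  duplicateRate(): number {
    const all = this.total();
    return all === 0 ? 0 : this.counts.duplicate / all;
  }

  // Fraction of non-duplicate events rejected by the transition guard.
  rejectionRate(): number {
    const unique = this.counts.applied + this.counts.rejected;
    return unique === 0 ? 0 : this.counts.rejected / unique;
  }
}

const metrics = new TransitionMetrics();
const observed: EventOutcome[] = [
  "applied", "applied", "applied", "applied", "applied", "applied", "applied",
  "duplicate", "duplicate",
  "rejected",
];
for (const outcome of observed) metrics.record(outcome);
// metrics.duplicateRate() is 0.2 (2 of 10); metrics.rejectionRate() is 0.125 (1 of 8)
```

Excluding duplicates from the rejection denominator keeps a burst of provider retries from masking a real increase in illegal transitions.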