The vulnerability stems from two related issues in the voice call webhook processing logic. First, the system was susceptible to webhook replay attacks because it assigned a newly generated random UUID as the identifier of each incoming event. The deduplication mechanism, which checks for previously seen event IDs, was therefore ineffective: every replay of the same request received a new, unique ID and looked like a fresh event. The vulnerable functions TwilioProvider.normalizeEvent and PlivoProvider.normalizeEvent generated these random IDs, which left processEvent unable to detect the duplicates.
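The difference between a random per-delivery ID and a stable deduplication key can be illustrated with a minimal sketch. The functions below are simplified stand-ins that only borrow the names normalizeEvent and processEvent; the WebhookRequest and NormalizedEvent shapes, the deriveDedupeKey helper, the key format, and the body-hash fallback are all assumptions made for illustration, not the project's actual code.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Hypothetical request and event shapes, for illustration only.
interface WebhookRequest {
  headers: Record<string, string | undefined>;
  rawBody: string;
}

interface NormalizedEvent {
  id: string;        // fresh on every delivery, so useless for deduplication
  dedupeKey: string; // identical for every replay of the same request
}

// Prefer the provider's idempotency token; fall back to a hash of the raw
// body (the fallback is an assumption of this sketch).
function deriveDedupeKey(req: WebhookRequest): string {
  const token = req.headers["i-twilio-idempotency-token"];
  if (token) {
    return `twilio:token:${token}`;
  }
  const digest = createHash("sha256").update(req.rawBody).digest("hex");
  return `twilio:body:${digest}`;
}

function normalizeEvent(req: WebhookRequest): NormalizedEvent {
  return { id: randomUUID(), dedupeKey: deriveDedupeKey(req) };
}

const seenKeys = new Set<string>();

// Returns true when the event is new, false when it is a duplicate.
function processEvent(event: NormalizedEvent): boolean {
  if (seenKeys.has(event.dedupeKey)) {
    return false;
  }
  seenKeys.add(event.dedupeKey);
  return true;
}
```

Deduplicating on `id` would accept both deliveries of a replayed request, since each one gets its own UUID. Keying on `dedupeKey` rejects the second delivery.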
Second, the system was vulnerable to stale call-state transitions: a replayed speech event from a previous turn could be accepted for a new, active turn. The resolveTranscriptWaiter function, which waits for a user's speech, did not validate that the incoming speech event belonged to the current turn. The patch addresses this by generating a turnToken for each turn in continueCall and validating it in resolveTranscriptWaiter.
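A per-turn token check along these lines might look like the following sketch. It reuses the names continueCall and resolveTranscriptWaiter for readability, but the signatures, the TranscriptWaiter shape, the in-memory waiter map, and the choice to reject events that carry no token are assumptions, not the patched implementation.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical waiter record: one pending speech wait per call.
interface TranscriptWaiter {
  turnToken: string;
  resolve: (text: string) => void;
}

const waiters = new Map<string, TranscriptWaiter>(); // callId -> waiter

// Starts a new turn: issues a fresh token and registers a waiter for it.
function continueCall(callId: string): { turnToken: string; transcript: Promise<string> } {
  const turnToken = randomUUID();
  const transcript = new Promise<string>((resolve) => {
    // The executor runs synchronously, so the waiter is registered
    // before continueCall returns.
    waiters.set(callId, { turnToken, resolve });
  });
  return { turnToken, transcript };
}

// Accepts a speech event only if it carries the token of the active turn.
function resolveTranscriptWaiter(callId: string, text: string, turnToken?: string): boolean {
  const waiter = waiters.get(callId);
  if (!waiter) {
    return false; // no turn is waiting for speech
  }
  if (turnToken !== waiter.turnToken) {
    return false; // stale or missing token: event belongs to another turn
  }
  waiters.delete(callId);
  waiter.resolve(text);
  return true;
}
```

In this sketch the token would have to travel with the speech event, for example embedded in the callback URL handed to the provider for that turn. A replay of an earlier turn's event then carries an outdated token and is ignored.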
The overall fix involves multiple layers: introducing a stable dedupeKey based on provider-specific headers (like i-twilio-idempotency-token or Plivo's nonce) for reliable replay detection, adding a replay cache in webhook-security.ts to flag replayed requests early, and implementing the turnToken mechanism to prevent stale state transitions.
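The early replay check can be pictured as a small time-bounded cache keyed by the stable dedupe key. The sketch below is a hypothetical illustration and does not reproduce the contents of webhook-security.ts: the ReplayCache class, its checkAndRecord method, the TTL, and the capacity limit are all assumed.

```typescript
// Hypothetical replay cache: remembers keys for a fixed time window.
class ReplayCache {
  private readonly seen = new Map<string, number>(); // key -> expiry (ms)
  private readonly ttlMs: number;
  private readonly maxEntries: number;

  constructor(ttlMs: number, maxEntries: number = 10000) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
  }

  // Returns true if the key was already seen within the TTL (a replay);
  // otherwise records the key and returns false.
  checkAndRecord(key: string, now: number = Date.now()): boolean {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > now) {
      return true;
    }
    this.prune(now);
    this.seen.set(key, now + this.ttlMs);
    return false;
  }

  // Drops expired entries, then evicts the oldest ones if still at capacity.
  private prune(now: number): void {
    this.seen.forEach((exp, key) => {
      if (exp <= now) {
        this.seen.delete(key);
      }
    });
    while (this.seen.size >= this.maxEntries) {
      const oldest = this.seen.keys().next().value;
      if (oldest === undefined) {
        break;
      }
      this.seen.delete(oldest);
    }
  }
}
```

A caller would compute the dedupe key after signature verification and mark the request as a replay when `checkAndRecord` returns true. Bounding the cache by both TTL and entry count keeps memory use predictable under a flood of distinct keys.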