CVSS Score: 6.5 (CVSS 3.1)

Basic Information


CVE-2026-44223: vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters

vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

(GitHub Advisory)
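As a rough illustration of the trigger condition described above, the following sketch inspects a parsed request body and reports which penalty parameters are set to a non-neutral value. The helper name `active_penalties` and the sample payload are hypothetical. The neutral values (1.0 for `repetition_penalty`, 0.0 for the other two) are an assumption about when a penalty has no effect, not something the advisory states.

```python
import json

# Values at which each penalty has no effect. Assumption: a request that only
# carries these neutral values does not exercise the penalty code path.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def active_penalties(payload: dict) -> dict:
    """Return the penalty parameters in `payload` that differ from neutral."""
    found = {}
    for name, neutral in NEUTRAL_PENALTIES.items():
        value = payload.get(name)
        if value is not None and value != neutral:
            found[name] = value
    return found


# Hypothetical request body using the example value from the advisory text.
body = json.loads('{"prompt": "Hello", "max_tokens": 8, "repetition_penalty": 1.1}')
print(active_penalties(body))
```

A check like this could sit in a gateway or logging layer to flag requests that match the advisory's description on an unpatched deployment. Upgrading to 0.20.0 remains the fix named by the advisory.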


Technical Details

Package Name	Ecosystem	Vulnerable Versions	First Patched Version
vllm	pip	>= 0.18.0, < 0.20.0	0.20.0
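The range in the table can be checked mechanically. The sketch below is a minimal, standard-library-only comparison. The helper names `release_tuple` and `is_affected` are hypothetical, and the parser deliberately looks only at the leading numeric release segment, so pre-release or post-release suffixes are ignored.

```python
import re


def release_tuple(version: str) -> tuple:
    """Parse the leading numeric release segment, padded to three parts.

    Example: '0.19.1.post1' -> (0, 19, 1). Suffixes such as 'rc1' or '.post1'
    are ignored, which is a simplification of full version ordering.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group().split(".")]
    parts += [0] * (3 - len(parts))  # treat '0.18' like '0.18.0'
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if `version` falls in the advisory range >= 0.18.0, < 0.20.0."""
    return (0, 18, 0) <= release_tuple(version) < (0, 20, 0)


for candidate in ("0.17.1", "0.18.0", "0.19.1.post1", "0.20.0"):
    print(candidate, is_affected(candidate))
```

In a deployment, the version string could come from `importlib.metadata.version("vllm")`. Because suffixes are dropped, a pre-release of 0.20.0 would be reported as unaffected by this sketch, so such builds should be checked by hand.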

Vulnerability Intelligence
Miggo AI

Root Cause Analysis

The vulnerability lies in the propose method of the ExtractHiddenStatesProposer class in vLLM's speculative decoding implementation. A refactoring in version 0.18.0 removed a call to .unsqueeze(-1) on the returned tensor, sampled_token_ids. This change was incorrect because, after the first decode step, the rejection sampler can produce a tensor with a shape of (batch_size, 2). The downstream code, specifically the part that applies penalty parameters, expects a tensor of shape (batch_size, 1). This shape mismatch triggers a RuntimeError, crashing the server. The provided patch bd1c3a9c34ce623edca021623c720fff1b8cf588 confirms this analysis by explicitly slicing the sampled_token_ids tensor to [:, :1], ensuring it always has the correct shape before being returned. Therefore, the ExtractHiddenStatesProposer.propose function is the direct source of the vulnerability.
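The shape issue described above can be illustrated without vLLM. In this sketch, plain Python lists stand in for tensors so that it runs without PyTorch; on a real tensor the equivalent operation is the slice `[:, :1]` named in the patch. The helper names and the token values are made up for illustration and do not come from the vLLM source.

```python
def shape(rows):
    """Return (num_rows, num_cols) for a rectangular list of lists."""
    return (len(rows), len(rows[0]) if rows else 0)


def first_column(rows):
    """List analogue of the tensor slice [:, :1]: keep one token id per request."""
    return [row[:1] for row in rows]


# First decode step: one sampled token per request, shape (batch_size, 1).
first_step = [[101], [202], [303]]

# Later step: per the analysis, the rejection sampler can produce two columns,
# shape (batch_size, 2). Values here are invented.
later_step = [[101, 7], [202, 8], [303, 9]]

for sampled in (first_step, later_step):
    # Slicing is a no-op on the already-correct shape and trims the wider one.
    print(shape(sampled), "->", shape(first_column(sampled)))
```

The point of slicing rather than reshaping is that it is safe in both cases: the result has exactly one column whether the input had one or two.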

Vulnerable functions

ExtractHiddenStatesProposer.propose

vllm/v1/spec_decode/extract_hidden_states.py

The `propose` function in `ExtractHiddenStatesProposer` returned a tensor `sampled_token_ids` with an incorrect shape after the first decode step. Specifically, it returned a tensor of shape `(batch_size, 2)` instead of the expected `(batch_size, 1)`. This shape mismatch would cause a `RuntimeError` during the application of sampling penalty parameters (like `repetition_penalty`), leading to a crash of the vLLM engine process.
