| Package Name | Ecosystem | Vulnerable Versions | First Patched Version |
|---|---|---|---|
| vllm | pip | >= 0.5.5, < 0.11.1 | 0.11.1 |

The vulnerability is a Denial of Service (DoS) in vLLM's processing of multimodal and text embeddings. The root cause is insufficient validation of the shape of user-provided embedding tensors. An attacker can supply an embedding with the correct number of dimensions (ndim) but an incorrect shape (e.g., the wrong hidden dimension size), which crashes the vLLM engine when it attempts to process the tensor.
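
The gap between an ndim-only check and a full shape check can be sketched as follows. This is a minimal illustration, not vLLM's code: the helper names `has_expected_ndim` and `has_expected_shape`, the hidden size of 4096, and the example shape are all assumptions, and shapes are modeled as plain tuples rather than real tensors.

```python
from typing import Sequence


def has_expected_ndim(shape: Sequence[int], expected_ndim: int = 2) -> bool:
    """Weak check (hypothetical): only the number of dimensions is compared."""
    return len(shape) == expected_ndim


def has_expected_shape(
    shape: Sequence[int], hidden_size: int, expected_ndim: int = 2
) -> bool:
    """Stricter check (hypothetical): ndim, positive sizes, and hidden size."""
    return (
        len(shape) >= 1
        and len(shape) == expected_ndim
        and all(dim > 0 for dim in shape)
        and shape[-1] == hidden_size
    )


HIDDEN_SIZE = 4096  # assumed model hidden size, for illustration only
malicious_shape = (16, 123)  # two dimensions, but the wrong hidden size

print(has_expected_ndim(malicious_shape))
print(has_expected_shape(malicious_shape, HIDDEN_SIZE))
```

The ndim-only check accepts the malformed shape, while the stricter check rejects it before the tensor would reach any downstream computation.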
The vulnerability manifests in several functions that act as entrypoints for processing embeddings:
- `MultiModalProcessor._to_mm_items`: This function orchestrates the processing of `multi_modal_data`. It calls `MultiModalDataParser`, which fails to validate the full tensor shape.
- `ImageContentParser.parse_image_embeds` (and its async counterpart): These functions handle image embeddings within the chat API, passing them down to the vulnerable processing logic.
- `BaseRenderer.load_prompt_embeds`: This function handles text embeddings (`prompt_embeds`), which are susceptible to the same shape validation flaw.

The provided patch does not fix the underlying shape validation issue. Instead, it acts as a mitigation by introducing flags (`--enable-mm-embeds` and `--enable-prompt-embeds`) that are disabled by default. The vulnerable functions are modified to check for these flags before processing any embeddings. The identified functions are therefore the exact locations in the code where the vulnerable logic path begins. Before the patch, these functions would unconditionally process user-provided embeddings, making them the starting point of the exploit.
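
The flag-based mitigation can be sketched roughly as below. Only the two flag names come from the advisory; the `EmbedsConfig` class and the `gate_mm_embeds` and `gate_prompt_embeds` helpers are hypothetical stand-ins for the real entrypoints, and the error messages are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class EmbedsConfig:
    """Hypothetical config mirroring the two opt-in flags (off by default)."""

    enable_mm_embeds: bool = False  # corresponds to --enable-mm-embeds
    enable_prompt_embeds: bool = False  # corresponds to --enable-prompt-embeds


def gate_mm_embeds(config: EmbedsConfig, embeds: Any) -> Any:
    """Refuse multimodal embeddings unless the operator opted in."""
    if not config.enable_mm_embeds:
        raise ValueError("multimodal embeddings are disabled")
    return embeds  # shape is still NOT validated here; only gated


def gate_prompt_embeds(config: EmbedsConfig, embeds: Any) -> Any:
    """Refuse prompt embeddings unless the operator opted in."""
    if not config.enable_prompt_embeds:
        raise ValueError("prompt embeddings are disabled")
    return embeds  # shape is still NOT validated here; only gated


default_config = EmbedsConfig()
try:
    gate_mm_embeds(default_config, [[0.1, 0.2]])
except ValueError as exc:
    print(f"rejected: {exc}")
```

Under this sketch, a default deployment rejects embedding inputs outright, while a deployment that enables either flag accepts them without any shape check and so should only do so for trusted callers.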

| Function | File |
|---|---|
| MultiModalProcessor._to_mm_items | vllm/multimodal/processing.py |
| ImageContentParser.parse_image_embeds | vllm/entrypoints/chat_utils.py |
| AsyncImageContentParser.parse_image_embeds | vllm/entrypoints/chat_utils.py |
| BaseRenderer.load_prompt_embeds | vllm/entrypoints/renderer.py |