| Package Name | Ecosystem | Vulnerable Versions | First Patched Version |
|---|---|---|---|
| vllm | pip | >= 0.10.2, < 0.11.1 | 0.11.1 |
The vulnerability stems from the unsafe deserialization of user-provided tensor embeddings using torch.load. In PyTorch versions 2.8.0 and later, integrity checks for sparse tensors are disabled by default. This allows an attacker to craft a malicious sparse tensor that causes an out-of-bounds write when converted to a dense tensor via to_dense(), leading to a denial-of-service (crash) and potentially remote code execution.
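The invariant PyTorch stops enforcing by default is simple: every index in a sparse COO tensor must fall inside the tensor's declared shape. A minimal pure-Python sketch of that bounds check (illustrative only, not vLLM or PyTorch code; the function name is hypothetical):

```python
def check_coo_invariants(indices, shape):
    """Validate COO sparse indices against the declared dense shape.

    Mimics, in pure Python, the bounds check that
    torch.sparse.check_sparse_tensor_invariants enforces: every index
    along dimension d must satisfy 0 <= idx < shape[d]. When this check
    is skipped, to_dense() writes values at attacker-chosen offsets.
    """
    if len(indices) != len(shape):
        raise ValueError("indices must have one row per dense dimension")
    for dim, (row, size) in enumerate(zip(indices, shape)):
        for idx in row:
            if not 0 <= idx < size:
                raise ValueError(
                    f"index {idx} out of bounds for dim {dim} (size {size})"
                )
    return True

# Benign 3x3 sparse tensor: nonzero entries at (0, 1) and (2, 2).
check_coo_invariants([[0, 2], [1, 2]], (3, 3))

# Malicious tensor: index 7 exceeds size 3; if unchecked, densification
# would write outside the allocated buffer.
try:
    check_coo_invariants([[0, 7], [1, 2]], (3, 3))
except ValueError as exc:
    print("rejected:", exc)
```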
The primary vulnerable function identified is CompletionRenderer.load_prompt_embeds in vllm/entrypoints/renderer.py. This function is responsible for handling the prompt_embeds parameter in the Completions API. The torch.load call occurs within a nested function, _load_and_validate_embed.
Analysis of the patch commit 58fab50d82838d5014f4a14d991fdb9352c9c84b reveals that the fix was not to add the missing validation (torch.sparse.check_sparse_tensor_invariants), but to disable the feature by default. After the patch, the feature can only be enabled by explicitly passing the --enable-prompt-embeds flag.
The same commit also introduced a similar guard (--enable-mm-embeds) for functions handling image embeddings, specifically ChatParser.parse_image_embeds and AsyncChatParser.parse_image_embeds. This strongly suggests that the same deserialization vulnerability existed for multimodal embeddings, even though it was not explicitly mentioned in the initial vulnerability description. Therefore, these functions are also included as likely vulnerable.
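After the patch, both embedding paths are opt-in. A sketch of how an operator would re-enable them on the vLLM server CLI (the flag names come from the patch commit; the model names are placeholders):

```shell
# Prompt embeddings for the Completions API: off by default post-patch,
# re-enabled only with an explicit flag.
vllm serve meta-llama/Llama-3.1-8B-Instruct --enable-prompt-embeds

# Multimodal (image) embeddings, guarded by the same commit.
vllm serve Qwen/Qwen2-VL-7B-Instruct --enable-mm-embeds
```

Operators who do not need client-supplied tensor embeddings should leave both flags unset, since enabling them restores the unsafe torch.load deserialization path on unpatched PyTorch configurations.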
| Vulnerable Function | File |
|---|---|
| CompletionRenderer.load_prompt_embeds | vllm/entrypoints/renderer.py |
| ChatParser.parse_image_embeds | vllm/entrypoints/chat_utils.py |
| AsyncChatParser.parse_image_embeds | vllm/entrypoints/chat_utils.py |