| Package Name | Ecosystem | Vulnerable Versions | First Patched Version |
|---|---|---|---|
| vllm | pip | >= 0.5.1, < 0.11.0 | 0.11.0 |
The vulnerability is a resource-exhaustion (denial-of-service) issue in the vLLM OpenAI-Compatible Server, tracked as GHSA-6fvq-23cw-5628. The root cause is improper handling of user-supplied Jinja2 templates passed through the `chat_template` and `chat_template_kwargs` parameters of chat completion requests.
An attacker can craft a malicious Jinja2 template containing constructs such as nested loops that, when rendered, consume excessive CPU and memory, resulting in a denial of service. The issue is exacerbated because `chat_template` can be overridden by a value nested inside `chat_template_kwargs`, bypassing naive checks that only forbid the top-level `chat_template` parameter.
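As an illustration, a request exercising the bypass described above might look like the following. This is a hypothetical proof-of-concept payload: the field names follow the OpenAI-compatible chat completions schema, but the exact shape accepted by a given vLLM version may differ.

```python
import json

# A template whose rendering cost grows multiplicatively with the
# nested loop ranges can exhaust CPU during rendering.
malicious_template = (
    "{% for i in range(10000) %}"
    "{% for j in range(10000) %}x{% endfor %}"
    "{% endfor %}"
)

payload = {
    "model": "some-model",  # placeholder model name
    "messages": [{"role": "user", "content": "hi"}],
    # Even if a filter rejects a top-level chat_template parameter,
    # the same value smuggled inside chat_template_kwargs could
    # override the server's configured template.
    "chat_template_kwargs": {"chat_template": malicious_template},
}

body = json.dumps(payload)
```

The key point is the nesting: the override travels inside `chat_template_kwargs`, so a check that only inspects the top-level `chat_template` field never sees it.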
Analysis of the patch commit `7977e5027c2250a4abc1f474c5619c40b4e5682f` reveals two key functions involved in the vulnerability:
`vllm.entrypoints.openai.serving_chat.ServingChat.create_chat_completion`: This is the public-facing API endpoint that receives the user's request. Before the patch, it passed the user-provided template parameters down to the processing engine without sufficient validation. The patch introduces a crucial security control: a `trust_request_chat_template` flag. Unless this flag is enabled, the server rejects any request that attempts to supply a custom chat template, blocking the attack at the entry point.
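A minimal sketch of such a gate follows. The flag name `trust_request_chat_template` comes from the patch; everything else (the function name, error type, and request shape) is hypothetical and simplified from vLLM's actual serving code.

```python
class ChatTemplateNotTrustedError(ValueError):
    """Raised when a request supplies a template the server is not configured to trust."""


def validate_request_template(request: dict, trust_request_chat_template: bool) -> None:
    # Reject both the top-level parameter and an override smuggled
    # inside chat_template_kwargs, mirroring the bypass described above.
    supplied = (
        request.get("chat_template") is not None
        or "chat_template" in (request.get("chat_template_kwargs") or {})
    )
    if supplied and not trust_request_chat_template:
        raise ChatTemplateNotTrustedError(
            "custom chat templates are not trusted by this server"
        )
```

Checking both locations is what makes the gate effective: a check on the top-level field alone is exactly the naive filter the nested override defeats.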
`vllm.entrypoints.chat_utils.apply_hf_chat_template`: This function is responsible for applying the chat template to the conversation. The vulnerability was present here because it used `**kwargs` to pass arguments to the underlying `tokenizer.apply_chat_template` function. This allowed the `chat_template` from `chat_template_kwargs` to be passed through, enabling the exploit. The patch fixes this by introducing a new function, `resolve_chat_template_kwargs`, which sanitizes the keyword arguments, specifically filtering out `chat_template` and other unexpected variables before they are passed to the tokenizer. This ensures that even if a malicious template gets past the initial checks, it is not used during rendering.
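The sanitization step can be sketched like this. This is a simplified stand-in for the new `resolve_chat_template_kwargs`, not vLLM's actual implementation; the idea of deriving the allow-list from the target function's signature and explicitly excluding `chat_template` is an illustrative assumption.

```python
import inspect


def resolve_chat_template_kwargs(apply_fn, chat_template_kwargs: dict) -> dict:
    # Drop chat_template outright, and keep only keyword arguments the
    # target apply_chat_template function actually declares, so that
    # unexpected variables can never reach the Jinja2 renderer.
    allowed = set(inspect.signature(apply_fn).parameters) - {"chat_template"}
    return {k: v for k, v in chat_template_kwargs.items() if k in allowed}
```

Filtering at this layer gives defense in depth: even if the entry-point check is bypassed, the rendering path refuses to accept a caller-supplied template.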
In summary, an exploit would involve sending a malicious Jinja2 template to the `create_chat_completion` endpoint. The template would then be processed by `apply_hf_chat_template`, causing the server to hang or crash. The identified functions would appear in any runtime profile or stack trace captured during such an attack.