CVE-2025-61620: vLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server
CVSS Score: 6.5
Technical Details
| Package Name | Ecosystem | Vulnerable Versions | First Patched Version |
|---|---|---|---|
| vllm | pip | >= 0.5.1, < 0.11.0 | 0.11.0 |
Vulnerability Intelligence
Miggo AI
Root Cause Analysis
The vulnerability is a resource-exhaustion (DoS) issue in the vLLM OpenAI-Compatible Server, identified as GHSA-6fvq-23cw-5628. The root cause is improper handling of user-supplied Jinja2 templates passed through the `chat_template` and `chat_template_kwargs` parameters of chat completion requests.

An attacker can craft a malicious Jinja2 template with constructs such as nested loops that, when rendered, consume excessive CPU and memory, leading to a denial of service. The vulnerability is exacerbated because the `chat_template` can be overridden by a value inside `chat_template_kwargs`, bypassing simple checks that only forbid the top-level `chat_template` parameter.
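To make the attack shape concrete, the following is a hypothetical request body (field names follow the OpenAI-compatible chat API described above; the model name and the exact template are illustrative, not taken from a real exploit). Note that the template is smuggled inside `chat_template_kwargs` rather than the top-level `chat_template` field:

```python
import json

# A deliberately pathological template: nested loops whose rendering cost
# grows multiplicatively, exhausting CPU/memory on the server.
malicious_template = (
    "{% for i in range(1000000) %}"
    "{% for j in range(1000000) %}x{% endfor %}"
    "{% endfor %}"
)

request_body = {
    "model": "some-model",  # placeholder model name
    "messages": [{"role": "user", "content": "hi"}],
    # Supplying the template via chat_template_kwargs bypasses naive checks
    # that only reject a top-level "chat_template" parameter.
    "chat_template_kwargs": {"chat_template": malicious_template},
}

print(json.dumps(request_body, indent=2))
```

The payload itself is valid JSON for a chat completion request; the damage occurs only when the server renders the smuggled template.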
Analysis of the patch commit `7977e5027c2250a4abc1f474c5619c40b4e5682f` reveals two key functions involved in the vulnerability:
- `vllm.entrypoints.openai.serving_chat.ServingChat.create_chat_completion`: This is the public-facing API endpoint that receives the user's request. Before the patch, it passed the user-provided template parameters down to the processing engine without sufficient validation. The patch introduces a crucial security control: a `trust_request_chat_template` flag. If this flag is not enabled, the server rejects any request that attempts to provide a custom chat template, effectively blocking the attack at the entry point.
- `vllm.entrypoints.chat_utils.apply_hf_chat_template`: This function is responsible for applying the chat template to the conversation. The vulnerability was present here because it used `**kwargs` to pass arguments to the underlying `tokenizer.apply_chat_template` function. This allowed the `chat_template` from `chat_template_kwargs` to be passed through, enabling the exploit. The patch fixes this by introducing a new function, `resolve_chat_template_kwargs`, which sanitizes the keyword arguments, specifically filtering out `chat_template` and other unexpected variables before they are passed to the tokenizer. This ensures that even if a malicious template gets past the initial checks, it is not used during rendering.
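The sanitization step can be sketched as follows. This is a minimal illustration of the filtering idea, not the actual vLLM `resolve_chat_template_kwargs` implementation: it keeps only keyword arguments the target function accepts and drops any attempt to smuggle in a replacement template (the stand-in `fake_apply_chat_template` and the deny-list contents are assumptions for the sketch):

```python
import inspect

def resolve_chat_template_kwargs_sketch(apply_fn, kwargs):
    """Keep only kwargs accepted by apply_fn, and never let the caller
    override the already-resolved chat template (illustrative only)."""
    allowed = set(inspect.signature(apply_fn).parameters)
    forbidden = {"chat_template", "conversation", "tokenize"}
    return {
        k: v for k, v in kwargs.items()
        if k in allowed and k not in forbidden
    }

def fake_apply_chat_template(conversation, chat_template=None,
                             add_generation_prompt=False, tokenize=True):
    # Stand-in for tokenizer.apply_chat_template in this sketch.
    return conversation

user_kwargs = {
    "chat_template": "{% for i in range(10**9) %}x{% endfor %}",  # smuggled
    "add_generation_prompt": True,                                # legitimate
    "unexpected": 1,                                              # unknown
}
safe = resolve_chat_template_kwargs_sketch(fake_apply_chat_template, user_kwargs)
print(safe)  # {'add_generation_prompt': True}
```

Only the legitimate, recognized argument survives; both the smuggled template and the unknown key are dropped before the tokenizer is called.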
In summary, an exploit would involve sending a malicious Jinja2 template to the `create_chat_completion` endpoint; the template would then be processed by `apply_hf_chat_template`, causing the server to hang or crash. The identified functions would appear in any runtime profile or stack trace captured during such an attack.
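The entry-point control described above can also be sketched. This is a simplified stand-alone version of the gating logic, assuming the flag and field names from the advisory rather than copying the vLLM source:

```python
def reject_untrusted_template(request: dict,
                              trust_request_chat_template: bool = False) -> dict:
    """Refuse any request that supplies a custom chat template, whether
    top-level or smuggled via chat_template_kwargs, unless the operator
    has opted in (illustrative sketch, not vLLM's actual code)."""
    supplies_template = (
        request.get("chat_template") is not None
        or "chat_template" in (request.get("chat_template_kwargs") or {})
    )
    if supplies_template and not trust_request_chat_template:
        raise ValueError(
            "request-supplied chat templates are not trusted; "
            "enable trust_request_chat_template to allow them"
        )
    return request

# A plain request passes through untouched.
ok = reject_untrusted_template({"messages": []})

# A request smuggling a template via chat_template_kwargs is rejected.
try:
    reject_untrusted_template(
        {"chat_template_kwargs": {"chat_template": "{% for i in x %}{% endfor %}"}})
    blocked = False
except ValueError:
    blocked = True
```

Checking both the top-level field and `chat_template_kwargs` is the essential point: a gate that only inspects the top-level `chat_template` parameter is exactly what the bypass in this CVE defeats.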
Vulnerable functions
- `apply_hf_chat_template` (`vllm/entrypoints/chat_utils.py`)
- `ServingChat.create_chat_completion` (`vllm/entrypoints/openai/serving_chat.py`)