Miggo Logo

CVE-2025-61620: vLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server

6.5

CVSS Score
3.1

Basic Information

EPSS Score
-
Published
10/7/2025
Updated
10/7/2025
KEV Status
No
Technology
TechnologyPython

Technical Details

CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
Package NameEcosystemVulnerable VersionsFirst Patched Version
vllmpip>= 0.5.1, < 0.11.00.11.0

Vulnerability Intelligence
Miggo AIMiggo AI

Miggo AIRoot Cause Analysis

The vulnerability is a resource-exhaustion (DoS) issue in the vLLM OpenAI-Compatible Server, identified as GHSA-6fvq-23cw-5628. The root cause is the improper handling of user-supplied Jinja2 templates through the chat_template and chat_template_kwargs parameters in chat completion requests.

An attacker can craft a malicious Jinja2 template with constructs like nested loops, which, when rendered, consume excessive CPU and memory, leading to a denial-of-service. The vulnerability is exacerbated because the chat_template can be overwritten by a value within chat_template_kwargs, bypassing simple checks that might only forbid the top-level chat_template parameter.

The analysis of the patch commit 7977e5027c2250a4abc1f474c5619c40b4e5682f reveals two key functions involved in the vulnerability:

  1. vllm.entrypoints.openai.serving_chat.ServingChat.create_chat_completion: This is the public-facing API endpoint that receives the user's request. Before the patch, it would pass the user-provided template parameters down to the processing engine without sufficient validation. The patch introduces a crucial security control: a trust_request_chat_template flag. If this flag is not enabled, the server will reject any request that attempts to provide a custom chat template, effectively blocking the attack at the entry point.

  2. vllm.entrypoints.chat_utils.apply_hf_chat_template: This function is responsible for applying the chat template to the conversation. The vulnerability was present here because it used **kwargs to pass arguments to the underlying tokenizer.apply_chat_template function. This allowed the chat_template from chat_template_kwargs to be passed through, enabling the exploit. The patch fixes this by introducing a new function, resolve_chat_template_kwargs, which sanitizes the keyword arguments, specifically filtering out chat_template and other unexpected variables before they are passed to the tokenizer. This ensures that even if a malicious template gets past the initial checks, it won't be used during the rendering process.

In summary, an exploit would involve sending a malicious Jinja2 template to the create_chat_completion endpoint. This template would then be processed by apply_hf_chat_template, causing the server to hang or crash. The identified functions would be present in any runtime profile or stack trace during such an attack.

Vulnerable functions

apply_hf_chat_template
vllm/entrypoints/chat_utils.py
This function was vulnerable because it accepted arbitrary keyword arguments (`kwargs`) and passed them directly to `tokenizer.apply_chat_template` using `**kwargs`. An attacker could supply a malicious Jinja2 template via the `chat_template` key within the `chat_template_kwargs` of a request. This would be included in `kwargs` and would overwrite the intended chat template, leading to excessive resource consumption and a denial-of-service. The patch mitigates this by introducing `resolve_chat_template_kwargs` to filter the arguments and prevent this overwrite.
ServingChat.create_chat_completion
vllm/entrypoints/openai/serving_chat.py
This function serves as the API endpoint for chat completions and was vulnerable because it accepted `chat_template` and `chat_template_kwargs` from user requests without proper validation. An attacker could send a request containing a malicious Jinja2 template in either of these parameters. This input would then be processed by the backend, leading to a denial-of-service. The patch introduces a `trust_request_chat_template` check to ensure that user-provided templates are only processed if explicitly allowed by the server configuration, thus preventing the exploit.

WAF Protection Rules

WAF Rule

### Summ*ry * r*sour**-*x**ustion (**ni*l-o*-s*rvi**) vuln*r**ility *xists in multipl* *n*points o* t** Op*n*I-*omp*ti*l* S*rv*r *u* to t** **ility to sp**i*y Jinj* t*mpl*t*s vi* t** `***t_t*mpl*t*` *n* `***t_t*mpl*t*_kw*r*s` p*r*m*t*rs. I* *n *tt**

Reasoning

T** vuln*r**ility is * r*sour**-*x**ustion (*oS) issu* in t** vLLM Op*n*I-*omp*ti*l* S*rv*r, i**nti*i** *s **S*-**vq-***w-****. T** root **us* is t** improp*r **n*lin* o* us*r-suppli** Jinj** t*mpl*t*s t*rou** t** `***t_t*mpl*t*` *n* `***t_t*mpl*t*_k