| Package Name | Ecosystem | Vulnerable Versions | First Patched Version |
|---|---|---|---|
| vllm | pip | >= 0.5.5, < 0.11.1 | 0.11.1 |
The vulnerability is a Denial of Service (DoS) in the vLLM project, exploitable via the /v1/chat/completions and /tokenize endpoints. The root cause is the improper handling of the chat_template_kwargs request parameter.
The vulnerability originates in the OpenAIServing._preprocess_chat method within vllm/entrypoints/openai/serving_engine.py. This method accepts chat_template_kwargs from the user and unpacks them directly into a call to apply_hf_chat_template without any validation. This allows an attacker to inject arbitrary keyword arguments.
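The dangerous pattern can be illustrated with a minimal sketch (this is not vLLM's actual code; apply_hf_chat_template and preprocess_chat are simplified stand-ins): user-controlled kwargs are unpacked, unvalidated, into the template helper, so any keyword the helper accepts can be injected.

```python
def apply_hf_chat_template(tokenizer, messages, chat_template,
                           tokenize=False):
    # Stand-in for the real helper; 'tokenize' is the dangerous knob
    # the attacker can reach.
    prompt = chat_template.format(text=messages[0]["content"])
    return tokenizer(prompt) if tokenize else prompt

def preprocess_chat(messages, chat_template, tokenizer,
                    chat_template_kwargs=None):
    # Request-supplied kwargs flow straight into the call with no
    # allow-list: sending {"tokenize": true} flips the helper into
    # synchronous tokenization.
    return apply_hf_chat_template(
        tokenizer, messages, chat_template,
        **(chat_template_kwargs or {}),
    )
```

Because `**` unpacking imposes no restriction beyond the callee's signature, every parameter of the helper becomes attacker-reachable API surface.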
The core of the DoS vulnerability lies in the apply_hf_chat_template function in vllm/entrypoints/chat_utils.py. Before the patch, this function accepted a tokenize boolean parameter. By sending {"tokenize": true} in the chat_template_kwargs, an attacker could force this function to perform a synchronous, blocking tokenization operation on the input. With a sufficiently large input, this operation would block the server's event loop for an extended period, preventing it from handling any other requests and thus causing a DoS.
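A request exercising the flaw might look like the following sketch (the endpoint and the chat_template_kwargs field come from the advisory; the model name and payload size are illustrative):

```python
import json

# Body an attacker could POST to /v1/chat/completions. The oversized
# message, combined with the injected "tokenize": true, forces a long
# synchronous tokenization that blocks the server's event loop.
payload = {
    "model": "some-model",  # placeholder model name
    "messages": [{"role": "user", "content": "A" * 10_000_000}],
    "chat_template_kwargs": {"tokenize": True},  # the injected kwarg
}
body = json.dumps(payload)
```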
The fix involves two main changes in vllm/entrypoints/chat_utils.py:
1. The apply_hf_chat_template function was modified to remove the tokenize parameter and to hardcode tokenize=False in its internal call to the tokenizer. This directly remediates the vulnerability.
2. The resolve_chat_template_kwargs function was updated to explicitly disallow the tokenize and chat_template keys in chat_template_kwargs, providing an additional layer of defense.

Affected functions:
- OpenAIServing._preprocess_chat in vllm/entrypoints/openai/serving_engine.py
- apply_hf_chat_template in vllm/entrypoints/chat_utils.py
- resolve_chat_template_kwargs in vllm/entrypoints/chat_utils.py
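The defense-in-depth half of the fix can be sketched as a key filter (a minimal illustration, not vLLM's exact implementation; the disallowed key names follow the advisory):

```python
# Reserved keys that request-supplied chat_template_kwargs must never
# override, per the patched behavior described above.
DISALLOWED_KEYS = frozenset({"tokenize", "chat_template"})

def resolve_chat_template_kwargs(chat_template_kwargs: dict) -> dict:
    # Reject any attempt to smuggle reserved parameters through the
    # request body; benign template variables pass through unchanged.
    bad = DISALLOWED_KEYS & chat_template_kwargs.keys()
    if bad:
        raise ValueError(
            f"chat_template_kwargs may not set: {sorted(bad)}")
    return chat_template_kwargs
```

Rejecting reserved keys at the boundary means that even if a future refactor reintroduces a tokenize-like parameter on the helper, request input still cannot reach it.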