The vulnerability is a Server-Side Request Forgery (SSRF) in the vLLM library, specifically in the MediaConnector class, which handles multimodal inputs. The root cause is a mismatch between the URL parser used for validation and the one used to execute the request. The code validated the domain of a user-provided URL with urllib.parse.urlparse, but made the actual request with the requests library, which parses URLs internally with urllib3.util.parse_url. The two parsers treat certain characters, such as backslashes, differently. An attacker can therefore craft a URL that passes the urlparse-based validation but that requests/urllib3 resolves to an internal or restricted resource.
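The parser disagreement described above can be sketched with the standard library alone. This is a minimal illustration, not vLLM's code: the allowlist, the host names, and the function backslash_delimited_host are hypothetical. The latter is only a stand-in for a parser that ends the authority at a backslash; it does not reproduce urllib3's implementation.

```python
from urllib.parse import urlparse

ALLOWED = {"allowed.example"}  # hypothetical allowlist


def validation_host(url: str):
    """Host as read by urllib.parse.urlparse (the validation side)."""
    return urlparse(url).hostname


def backslash_delimited_host(url: str) -> str:
    """Hypothetical stand-in for a parser that ends the authority at a backslash.

    Not urllib3's implementation; it only models the class of disagreement.
    """
    authority = url.split("//", 1)[1]
    # Cut the authority at the first of '/', '\', '?' or '#'.
    for delimiter in ("/", "\\", "?", "#"):
        authority = authority.split(delimiter, 1)[0]
    # Drop any userinfo and port, keeping only the host.
    return authority.rpartition("@")[2].partition(":")[0].lower()


# urlparse does not treat the backslash as a delimiter, so everything before
# the '@' is read as userinfo and the host is the part after it.
crafted = "http://internal.test\\@allowed.example/image.png"
checked = validation_host(crafted)
requested = backslash_delimited_host(crafted)

print("validated host:", checked)
print("requested host:", requested)
```

The allowlist check is applied to one host while the request is routed to another, which is the gap an SSRF payload exploits.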
The provided patch f46d576c54fb8aeec5fc70560e850bed38ef17d7 fixes this by standardizing on urllib3.util.parse_url across the codebase, so that the URL is interpreted the same way during validation and execution. Analysis of the patch identifies MediaConnector.load_from_url and its asynchronous counterpart MediaConnector.load_from_url_async as the vulnerable entry points, since they are the functions that accept and process the malicious URL. The _assert_url_in_allowed_media_domains function is also included because it contains the flawed security check that is bypassed.
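The principle behind the fix, validating with the same parser that routes the request, might look roughly like the sketch below. It is not the patched vLLM code: assert_host_allowed, HostParser, and stdlib_host are hypothetical names, and the parser is passed in as a parameter so the example runs without third-party packages. In a real deployment the argument would be whatever parser the HTTP client itself uses.

```python
from typing import Callable, Optional, Set
from urllib.parse import urlparse

# Hypothetical type: any function that maps a URL to the host it targets.
HostParser = Callable[[str], Optional[str]]


def assert_host_allowed(url: str, allowed: Set[str], host_of: HostParser) -> str:
    """Return the lowercased host if it is allowlisted, else raise ValueError.

    `host_of` should be the same parser the HTTP client uses to route the
    request, so validation and execution cannot disagree about the host.
    """
    host = host_of(url)
    if host is None or host.lower() not in allowed:
        raise ValueError(f"host {host!r} is not in the allowed media domains")
    return host.lower()


def stdlib_host(url: str) -> Optional[str]:
    """Placeholder client parser so the sketch runs with the standard library."""
    return urlparse(url).hostname


ALLOWED = {"allowed.example"}  # hypothetical allowlist
ok_host = assert_host_allowed("https://allowed.example/cat.png", ALLOWED, stdlib_host)
```

Passing the parser explicitly makes the coupling visible: swapping the HTTP client means swapping the parser used for the check as well.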