The vulnerability in docling-core combines Server-Side Request Forgery (SSRF) and Path Traversal. Analysis of the v2.74.1 security patch shows that the root cause lies in two functions in docling_core/utils/file.py.
The first vulnerable function, resolve_source_to_stream, fetched content from a given URL but did not check whether the URL pointed to a legitimate public resource or to an internal, private one. An attacker could therefore force the application to make requests to internal services (SSRF). The patch mitigates this by introducing _is_safe_url, a function that checks whether the URL's host resolves to a public IP address, and applies this check to the initial URL and to every redirect target.
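The following is a minimal Python sketch of the kind of check described above. It is not the actual _is_safe_url from the patch. The names is_safe_url and fetch_following_safe_redirects, the allowed-scheme set, and the fetch_once callable (a stand-in for any HTTP client that performs one request without following redirects) are all assumptions made for illustration. The sketch rejects a URL if any address its hostname resolves to is not globally routable, and re-runs the check on every redirect target before requesting it.

```python
import ipaddress
import socket
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlparse

# Assumption: only plain web schemes are considered fetchable.
ALLOWED_SCHEMES = {"http", "https"}
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_safe_url(url: str) -> bool:
    """Return True only if every address the URL's host resolves to is public."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # e.g. a malformed IPv6 literal
    if parsed.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except (OSError, UnicodeError):
        return False  # unresolvable hosts are treated as unsafe
    if not infos:
        return False
    for *_, sockaddr in infos:
        # Drop any IPv6 zone suffix such as "%eth0" before parsing.
        ip = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        # Judge IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) by their IPv4 value.
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped:
            ip = ip.ipv4_mapped
        # is_global is False for non-public ranges such as private, loopback,
        # and link-local; multicast is rejected explicitly as well.
        if not ip.is_global or ip.is_multicast:
            return False
    return True


# Hypothetical single-request fetcher that does NOT follow redirects itself:
# it returns (status code, Location header or None, body).
FetchOnce = Callable[[str], Tuple[int, Optional[str], bytes]]


def fetch_following_safe_redirects(
    url: str, fetch_once: FetchOnce, max_redirects: int = 5
) -> bytes:
    """Follow redirects by hand, validating each hop before it is requested."""
    for _ in range(max_redirects + 1):
        if not is_safe_url(url):
            raise ValueError(f"refusing to fetch non-public URL: {url}")
        status, location, body = fetch_once(url)
        if status in REDIRECT_STATUSES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return body
    raise ValueError("too many redirects")
```

Checking every resolved address, rather than only the first, matters because a hostname can return several records. A validate-then-connect check like this still leaves a window in which DNS can change between validation and connection (DNS rebinding); closing it requires connecting to the already-validated address, which this sketch does not do.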
The second vulnerable function, resolve_remote_filename, determined the local filename for the downloaded content. It took the filename from the Content-Disposition header or the URL path without proper sanitization. This created a path traversal vulnerability: an attacker could supply a malicious filename (e.g., ../../etc/passwd) that causes files to be written outside the intended destination directory. The patch addresses this with a new _sanitize_filename function that strips directory components from the filename, so that only a safe basename is used.
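A minimal sketch of filename sanitization in the spirit of _sanitize_filename, assuming the goal is to keep only the last path component. The functions sanitize_filename and remote_filename and the fallback name "file" are hypothetical; the patched functions may behave differently. Content-Disposition is parsed here with the standard library's email.message.Message.get_filename.

```python
import posixpath
from email.message import Message
from typing import Optional
from urllib.parse import unquote, urlparse


def sanitize_filename(name: str, default: str = "file") -> str:
    """Reduce an untrusted filename to a bare basename with no directory parts."""
    # Treat Windows backslashes as separators too and drop NUL bytes,
    # then keep only the final path component.
    name = name.replace("\\", "/").replace("\x00", "")
    base = posixpath.basename(name).strip()
    if base in ("", ".", ".."):
        return default  # nothing safe survived; use a fixed fallback
    return base


def remote_filename(url: str, content_disposition: Optional[str] = None) -> str:
    """Prefer the Content-Disposition filename, else the URL path, then sanitize."""
    candidate = None
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        candidate = msg.get_filename()
    if not candidate:
        # Percent-decode first so encoded separators such as %2F are stripped too.
        candidate = unquote(urlparse(url).path)
    return sanitize_filename(candidate)
```

Backslashes are normalized because a name such as `..\..\win.ini` is a traversal on Windows even though posixpath does not treat `\` as a separator. As defense in depth, a caller could also resolve the final destination path and confirm that it still lies inside the target directory.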
Additionally, a related hardening fix was identified in the ImageRef.pil_image method of docling_core/types/doc/document.py. It restricts the use of file:// URIs for loading images by default, further reducing the attack surface for local file inclusion vulnerabilities.
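The default-deny behavior described for ImageRef.pil_image can be sketched as a URI gate that runs before any image is opened. The function resolve_image_uri and its allow_local_files flag are hypothetical names: the analysis only states that file:// loading is restricted by default, not how an opt-in is exposed. This sketch passes http and https URIs through unchanged and rejects every other scheme.

```python
from pathlib import Path
from typing import Union
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_image_uri(uri: str, allow_local_files: bool = False) -> Union[Path, str]:
    """Map a file:// URI to a local Path only when explicitly allowed."""
    parsed = urlparse(uri)
    if parsed.scheme == "file":
        if not allow_local_files:
            # Deny by default so document content cannot point the loader
            # at arbitrary files on the host.
            raise PermissionError(f"file:// image URIs are disabled: {uri}")
        return Path(url2pathname(parsed.path))
    if parsed.scheme in ("http", "https"):
        return uri  # remote URIs are left to the (separately validated) fetcher
    raise ValueError(f"unsupported image URI scheme: {parsed.scheme!r}")
```

Failing closed on unknown schemes keeps the gate conservative; any additional scheme has to be allowed deliberately rather than slipping through.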