The vulnerability is a decompression bomb in the pypdf library, classified as uncontrolled resource consumption (CWE-400). The root cause lies in the pypdf.filters.decompress function, which, prior to the patch, called Python's zlib.decompress without any limit on the size of the decompressed data. An attacker could craft a PDF containing a highly compressed data stream (e.g., one using the FlateDecode filter) that expands to an extremely large size upon decompression, exhausting system memory and causing a denial of service.
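To give a sense of scale, the following minimal sketch uses only Python's standard zlib module (not pypdf) to show how small a compressed stream can be relative to its expanded size. The 10 MiB payload of zero bytes is an illustrative assumption, not data taken from a real exploit.

```python
import zlib

# Illustrative payload: 10 MiB of zero bytes (an assumption for this demo).
payload = b"\x00" * (10 * 1024 * 1024)

# Highly repetitive data compresses extremely well with DEFLATE.
compressed = zlib.compress(payload, 9)
ratio = len(payload) / len(compressed)

print(f"compressed size: {len(compressed)} bytes")
print(f"expansion ratio: {ratio:.0f}x")

# An unbounded call like this allocates the full expanded size in memory,
# no matter how small the input was.
expanded = zlib.decompress(compressed)
```

A stream of a few kilobytes therefore costs the reader megabytes of memory, and the attacker can scale the payload up at little cost to themselves.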
The patch addresses this by introducing a new function, _decompress_with_limit, which wraps the zlib decompression logic and enforces a maximum output length (ZLIB_MAX_OUTPUT_LENGTH). The decompress function was modified to call this bounded function instead of decompressing without a limit.
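The general technique of bounding decompressed output can be sketched with the standard library alone. The code below is not the pypdf patch: the names decompress_with_limit, MAX_OUTPUT_LENGTH, and OutputLimitExceeded, as well as the 1,000,000-byte limit, are hypothetical and chosen only for illustration. It relies on the max_length argument of zlib's decompression object.

```python
import zlib

# Hypothetical limit for this sketch; not the value pypdf uses.
MAX_OUTPUT_LENGTH = 1_000_000


class OutputLimitExceeded(Exception):
    """Raised when decompressed data would exceed the configured limit."""


def decompress_with_limit(data: bytes, max_output: int = MAX_OUTPUT_LENGTH) -> bytes:
    """Decompress zlib data, refusing to produce more than max_output bytes."""
    decompressor = zlib.decompressobj()
    # Request one byte more than allowed: if we receive it, the stream
    # is larger than the limit and we stop without expanding the rest.
    output = decompressor.decompress(data, max_output + 1)
    if len(output) > max_output:
        raise OutputLimitExceeded(
            f"decompressed data exceeds {max_output} bytes"
        )
    return output


# Usage: a small stream round-trips, an oversized one is rejected.
small = zlib.compress(b"hello world")
print(decompress_with_limit(small))

bomb = zlib.compress(b"\x00" * (10 * 1024 * 1024))
try:
    decompress_with_limit(bomb)
except OutputLimitExceeded as exc:
    print("rejected:", exc)
```

The point of this design is that memory use is capped at roughly the limit itself, because decompression stops as soon as the output buffer reaches the requested length rather than after the whole stream has been expanded.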
The primary vulnerable function is pypdf.filters.decompress. The pypdf.filters.FlateDecode.decode function is also considered vulnerable because it is the direct caller of decompress for the affected filter type. A key trigger for this vulnerability is the pypdf._reader.PdfReader._read_pdf15_xref_stream function, which is called when a PDF file is opened and is responsible for parsing potentially compressed cross-reference streams. An exploit would involve a specially crafted PDF that, when opened by an application using the vulnerable pypdf library, causes one of these functions to process the malicious stream, leading to the DoS condition.