The vulnerability lies in the CPython zipfile module's handling of "quoted-overlap" zip bombs, in which entries in a ZIP archive are crafted so that the data of one entry overlaps the data of others. The provided patches (e.g., commit 66363b9a7b9fe7c99eba3a185b74c5fdbf842eba and its cherry-picks) modify primarily two methods of the ZipFile class:
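_RealGetContents: This method parses the ZIP file's central directory. The patch makes it compute an _end_offset for each file entry. Entries are visited in descending order of header offset, and each one is assigned the offset at which the next structure in the file begins: the following entry's local header or, for the last entry, the start of the central directory. This offset is the upper bound for the current entry's data. Before this change, no such bound was recorded, so overlaps could not be detected.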
open: This method opens a single file within the archive. The patch adds a check that runs once the entry's local file header has been read: if the current file position plus the entry's compress_size exceeds the entry's _end_offset, the compressed data would run into the next entry or the central directory. In that case a BadZipFile exception is raised. The check is skipped when _end_offset is None.
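The two changes above can be sketched outside of zipfile as follows. This is a minimal illustration, not the CPython code: the Entry class, its data_offset field, and the helper names are hypothetical stand-ins for ZipInfo and the patched logic, and start_dir is assumed to be the offset of the central directory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for zipfile.ZipInfo, reduced to the fields used here."""
    name: str
    header_offset: int                # offset of the entry's local file header
    data_offset: int                  # offset where the compressed data starts
    compress_size: int                # compressed size declared for the entry
    end_offset: Optional[int] = None  # plays the role of _end_offset


def assign_end_offsets(entries: List[Entry], start_dir: int) -> None:
    """Give each entry the offset of the next structure in the file as its bound."""
    end_offset = start_dir  # the last entry is bounded by the central directory
    for entry in sorted(entries, key=lambda e: e.header_offset, reverse=True):
        entry.end_offset = end_offset
        end_offset = entry.header_offset


def overlaps(entry: Entry) -> bool:
    """Return True if the declared compressed data runs past the entry's bound."""
    return (entry.end_offset is not None
            and entry.data_offset + entry.compress_size > entry.end_offset)


# Entry "a" claims 1000 bytes of data, which would swallow entry "b" entirely.
entries = [
    Entry("a", header_offset=0, data_offset=30, compress_size=1000),
    Entry("b", header_offset=100, data_offset=130, compress_size=870),
]
assign_end_offsets(entries, start_dir=1000)
flagged = [e.name for e in entries if overlaps(e)]
print(flagged)
```

In this sketch only "a" is flagged: its data would extend beyond the header of "b", while the data of "b" ends exactly at the assumed start of the central directory.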
The function zipfile.ZipFile.open is identified as directly vulnerable because, prior to the patch, it returned a readable file object for an overlapping entry without any check, so the zip bomb condition took effect when the entry's data was subsequently read.
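From a caller's point of view, the patched behavior surfaces as a BadZipFile exception. The sketch below shows one way a caller might handle it; read_members is a hypothetical helper, not part of zipfile, and the archive built here is an ordinary well-formed one, so it only exercises the normal path.

```python
import io
import zipfile


def read_members(data: bytes) -> dict:
    """Read every member; map the names of rejected members to None."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            try:
                with zf.open(info) as member:
                    results[info.filename] = member.read()
            except zipfile.BadZipFile:
                # On patched versions, an overlapping entry is expected to land here.
                results[info.filename] = None
    return results


# Build a small, well-formed archive in memory and read it back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")

members = read_members(buf.getvalue())
print(members)
```

On an unpatched interpreter the except branch would not trigger for overlapping entries, so a caller processing untrusted archives may also want its own limit on the total number of bytes read.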
The function zipfile.ZipFile._RealGetContents is identified because it processes the malicious ZIP structure (the central directory defining the overlaps) and, in its vulnerable state, failed to prepare the necessary data (_end_offset) that open would need to perform the overlap check. Thus, it was an essential part of the vulnerable processing chain.
The file paths Lib/zipfile/__init__.py (for newer versions/main branch) or Lib/zipfile.py (for older versions) are where these methods reside. The analysis uses Lib/zipfile/__init__.py based on the primary development commit.