CVE-2026-22690 describes a denial-of-service (DoS) condition in pypdf: processing a malicious PDF can lead to very long runtimes. The root cause is the lack of proper limits when handling partially broken or non-compliant PDF files in non-strict parsing mode.
Analysis of the patch 294165726b646bb7799be1cc787f593f2fdbcf45 identified three key functions as vulnerable to DoS attacks:
- PdfReader.root_object: This is the primary function related to the CVE description. If a PDF is missing the /Root object in its trailer, this function attempts to find it by iterating through a number of objects specified by the /Size key. A malicious actor can set an extremely large /Size value, causing this function to loop for an extended period, consuming significant CPU resources. The patch mitigates this by introducing root_object_recovery_limit.
- PdfReader._rebuild_xref_table: This function is called when the PDF's cross-reference table is broken. The original implementation used a regular expression to scan the file for object definitions. This regex was inefficient on files containing large amounts of whitespace, leading to very long processing times. The patch replaces this with a faster, non-regex-based scanning method (_find_pdf_objects).
- _flatten: This function is responsible for creating a flat list of all pages in the document. A malicious PDF could contain a cyclic page reference (e.g., a page being its own child). The original code did not detect this, leading to infinite recursion and a stack overflow. The patch adds a check to detect and raise an error for such cyclic references.
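Two of the mitigations described above, a cap on the root-object search and a cycle check while flattening the page tree, can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not pypdf's actual code: the dict-based object table, the names find_root_bounded and flatten_pages, the two exception classes, and the default limit of 10_000 are all hypothetical and chosen only for the example.

```python
from typing import Dict, List, Optional, Set


class RecoveryLimitError(Exception):
    """Raised when the (hypothetical) root search exceeds its budget."""


class CyclicPageTreeError(Exception):
    """Raised when a page-tree node is reached again on its own path."""


def find_root_bounded(
    objects: Dict[int, dict], declared_size: int, limit: int = 10_000
) -> Optional[int]:
    """Probe object numbers 0..declared_size-1 for a /Catalog entry.

    The loop never probes more than `limit` numbers, no matter how
    large the attacker-controlled `declared_size` is.
    """
    for number in range(declared_size):
        if number >= limit:
            raise RecoveryLimitError(
                f"gave up after probing {limit} object numbers"
            )
        obj = objects.get(number)
        if obj is not None and obj.get("/Type") == "/Catalog":
            return number
    return None


def flatten_pages(
    objects: Dict[int, dict], node_id: int, _ancestors: Optional[Set[int]] = None
) -> List[int]:
    """Return the leaf page ids below `node_id` in document order.

    `_ancestors` holds the ids of the nodes on the current path from the
    root; meeting one of them again means the tree contains a cycle.
    """
    ancestors = set() if _ancestors is None else _ancestors
    if node_id in ancestors:
        raise CyclicPageTreeError(f"object {node_id} is its own ancestor")
    node = objects[node_id]
    if node.get("/Type") == "/Page":
        return [node_id]
    ancestors.add(node_id)
    pages: List[int] = []
    for kid in node.get("/Kids", []):
        pages.extend(flatten_pages(objects, kid, ancestors))
    ancestors.remove(node_id)
    return pages
```

In this sketch both failure modes become ordinary exceptions that a caller can catch, rather than a loop or recursion whose length is controlled by the input file.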
All three functions can cause a denial of service by consuming excessive system resources (CPU or memory) when parsing a crafted PDF file. During exploitation, they would likely appear in a runtime profile or stack trace, because they enter long-running loops or deep recursion.
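One way to confirm which function dominates the runtime is to run the parse under Python's built-in cProfile module. The sketch below profiles a stand-in function; slow_recovery_scan and profile_call are hypothetical names introduced for this example. With a real reproduction, the stand-in would be replaced by the code that opens the suspect file.

```python
import cProfile
import io
import pstats


def slow_recovery_scan(size: int) -> int:
    """Hypothetical stand-in for a long-running recovery loop."""
    hits = 0
    for number in range(size):
        if number % 7 == 0:
            hits += 1
    return hits


def profile_call(func, *args, top: int = 5) -> str:
    """Run func(*args) under cProfile and return a text report
    of the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_call(slow_recovery_scan, 200_000)
print(report)
```

A function stuck in a long loop should show up near the top of such a report with a large cumulative time, which makes it straightforward to match against the three functions listed above.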