The vulnerability is an XML External Entity (XXE) injection in the tika-parser-pdf-module. Its root cause is the insecure configuration of XML parsers and transformers within Apache Tika. A parser that processes DTDs and resolves external entities lets a crafted document make Tika read local files or fetch remote URLs during parsing. A review of the commits between the vulnerable version 3.2.1 and the patched version 3.2.2 identified two commits that address the issue.
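To make the mechanism concrete, here is a minimal sketch that uses the JDK's built-in DOM parser rather than Tika itself. The class `XxeDemoSketch`, the temporary file, and the `SECRET-MARKER` string are illustrative assumptions. With default JDK settings, the parser typically resolves the declared external entity and copies the file's contents into the parsed text, which is the core of any XXE attack.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Illustrative sketch (hypothetical class): XXE against a default-configured JDK parser. */
final class XxeDemoSketch {

    private XxeDemoSketch() {}

    /** Parses a document whose DOCTYPE declares an external entity pointing at a local file. */
    static String parseWithDefaults(Path secretFile) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE r [<!ENTITY xxe SYSTEM \"" + secretFile.toUri() + "\">]>"
                + "<r>&xxe;</r>";
        try {
            // Factory left at JDK defaults: DTDs are processed and external entities are resolved.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // The entity reference has been replaced by the file's contents.
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return "REJECTED: " + e.getMessage();
        }
    }

    /** Writes a marker to a temp file and returns whatever the parser pulled into the document. */
    static String demo() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.write(secret, "SECRET-MARKER".getBytes(StandardCharsets.UTF_8));
                return parseWithDefaults(secret);
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (IOException e) {
            return "IO-ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```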
The first commit, 94acef2854eed07f0ded357c13a659409495ca49, directly patches the getXMLInputFactory method in org.apache.tika.utils.XMLReaderUtils. It explicitly disables DTD support and external entity processing, the two features an XXE payload depends on. This is the primary fix.
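A hardened StAX factory along the lines of this fix might look like the sketch below. It is not Tika's code: `HardenedStaxSketch`, `newHardenedInputFactory`, and `textOf` are hypothetical names, and the exact properties the patched getXMLInputFactory sets may differ. The sketch relies only on the standard `XMLInputFactory.SUPPORT_DTD` and `XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES` properties, plus a resolver that returns an empty stream instead of opening any URI.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Sketch of a hardened StAX factory (hypothetical names; not Tika's implementation). */
final class HardenedStaxSketch {

    private HardenedStaxSketch() {}

    static XMLInputFactory newHardenedInputFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Turn off DTD processing, so a DOCTYPE cannot declare entities.
        trySet(factory, XMLInputFactory.SUPPORT_DTD, false);
        // Defense in depth: never resolve external parsed entities.
        trySet(factory, XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        // If an entity is still requested, supply empty content instead of fetching the URI.
        factory.setXMLResolver((publicId, systemId, baseUri, namespace) ->
                new ByteArrayInputStream(new byte[0]));
        return factory;
    }

    // Some StAX implementations reject unknown properties; skip them instead of failing.
    private static void trySet(XMLInputFactory factory, String name, Object value) {
        try {
            factory.setProperty(name, value);
        } catch (IllegalArgumentException ignored) {
            // property not supported by this implementation
        }
    }

    /** Returns the document's character data, or "REJECTED: ..." if the parser refuses it. */
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        XMLStreamReader reader = null;
        try {
            reader = newHardenedInputFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                }
            }
            return text.toString();
        } catch (XMLStreamException e) {
            return "REJECTED: " + e.getMessage();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do on close failure in a sketch
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<r>hello</r>"));
        System.out.println(textOf("<!DOCTYPE r [<!ENTITY x \"EXPANDED\">]><r>&x;</r>"));
    }
}
```

With the JDK's built-in StAX implementation, the second document is expected to fail with an undeclared-entity error instead of expanding `&x;`, which `textOf` reports as a `REJECTED:` string.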
The second commit, bd9b05352ddccf0b89821c2d683e9b80b95fab35, is a hardening change that replaces all local instantiations of SAXTransformerFactory with calls to a centralized, secure factory method, XMLReaderUtils.getSAXTransformerFactory(). Because every transformer now comes from one securely configured factory, no individual call site can skip the hardening, which prevents XXE through any of those code paths.
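The centralizing idea can be sketched as a single helper that every call site uses in place of `SAXTransformerFactory.newInstance()`. The names `SecureTransformerSketch`, `newSecureSaxTransformerFactory`, `isSecureProcessingOn`, and `serialize` are hypothetical. The hardening shown (secure processing plus empty `ACCESS_EXTERNAL_DTD` and `ACCESS_EXTERNAL_STYLESHEET`) is standard JAXP and is an assumption about, not a copy of, what `XMLReaderUtils.getSAXTransformerFactory()` does.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Sketch of one central, hardened SAXTransformerFactory entry point (hypothetical names). */
final class SecureTransformerSketch {

    private SecureTransformerSketch() {}

    static SAXTransformerFactory newSecureSaxTransformerFactory() {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX");
        }
        SAXTransformerFactory factory = (SAXTransformerFactory) tf;
        try {
            // Enables the JAXP secure-processing limits for this factory.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("secure processing not supported", e);
        }
        // Empty string = no protocol may be used to load external DTDs/entities or stylesheets.
        trySetAttribute(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");
        trySetAttribute(factory, XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return factory;
    }

    // Third-party transformer implementations may not recognize these attributes.
    private static void trySetAttribute(TransformerFactory factory, String name, Object value) {
        try {
            factory.setAttribute(name, value);
        } catch (IllegalArgumentException ignored) {
            // attribute not supported by this implementation
        }
    }

    static boolean isSecureProcessingOn() {
        return newSecureSaxTransformerFactory().getFeature(XMLConstants.FEATURE_SECURE_PROCESSING);
    }

    /** Example call site: serializes one element from SAX events through the shared factory. */
    static String serialize(String element, String text) {
        StringWriter out = new StringWriter();
        try {
            TransformerHandler handler = newSecureSaxTransformerFactory().newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            handler.setResult(new StreamResult(out));
            handler.startDocument();
            handler.startElement("", element, element, new AttributesImpl());
            handler.characters(text.toCharArray(), 0, text.length());
            handler.endElement("", element, element);
            handler.endDocument();
        } catch (TransformerConfigurationException | SAXException e) {
            return "ERROR: " + e.getMessage();
        }
        return out.toString();
    }
}
```

Funneling creation through one method means a later change to the hardening is made in one place and automatically reaches every caller.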
The vulnerable functions are those that were either patched directly or changed to use the new secure factory methods. The core one is org.apache.tika.utils.XMLReaderUtils.getXMLInputFactory, because its insecure configuration was the primary enabler of the XXE vulnerability. Functions in the TikaCLI, TikaGUI, TikaGrpcServerImpl, and TikaResource classes were also affected, because they used insecurely configured XML transformers. Together, the patches remediate the vulnerability by enforcing secure XML processing defaults throughout the Tika library.
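One general way an insecurely configured transformer becomes an XXE vector: when it is given a `StreamSource`, it parses that input itself, so a factory left at JDK defaults resolves external entities just as a default parser does. The sketch below contrasts a call site that builds its own default `TransformerFactory` with one that goes through a hardened helper. `CallSiteSketch`, its methods, and the file-based payload are hypothetical illustrations of the risk, not a reconstruction of how the Tika call sites named above were reached.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch contrasting a locally created default factory with a hardened one (hypothetical). */
final class CallSiteSketch {

    private CallSiteSketch() {}

    /** "Before": the call site instantiates its own factory with JDK defaults. */
    static TransformerFactory localDefaultFactory() {
        return TransformerFactory.newInstance();
    }

    /** "After": the call site obtains a factory that may not load external DTDs or entities. */
    static TransformerFactory hardenedFactory() {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return factory;
    }

    /** Identity-transforms the XML; returns the serialized output or "REJECTED: ...". */
    static String identityTransform(TransformerFactory factory, String xml) {
        StringWriter out = new StringWriter();
        try {
            Transformer transformer = factory.newTransformer();
            // The transformer parses the StreamSource itself, honoring the factory's settings.
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            return "REJECTED: " + e.getMessage();
        }
    }

    /** Feeds an external-entity payload that points at a temp file through the chosen factory. */
    static String leakTest(boolean hardened) {
        try {
            Path secret = Files.createTempFile("xxe-transform", ".txt");
            try {
                Files.write(secret, "SECRET-MARKER".getBytes(StandardCharsets.UTF_8));
                String xml = "<!DOCTYPE r [<!ENTITY xxe SYSTEM \"" + secret.toUri() + "\">]>"
                        + "<r>&xxe;</r>";
                TransformerFactory factory = hardened ? hardenedFactory() : localDefaultFactory();
                return identityTransform(factory, xml);
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (IOException e) {
            return "IO-ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("default:  " + leakTest(false));
        System.out.println("hardened: " + leakTest(true));
    }
}
```

On a stock JDK the hardened path is expected to fail with an access-restriction error rather than copying the file into the output; the JDK's default error listener may also print that error to standard error.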