The vulnerability advisory describes two primary XXE attack vectors: one through the Simple Archive Format (SAF) import feature and another through parsing XML from external data sources. The analysis of the provided patches confirms this and reveals the widespread nature of insecure XML parser instantiation across the DSpace codebase.
The core of the fix is a new centralized class, org.dspace.app.util.XMLUtils. It provides factory methods (getDocumentBuilder, getSAXBuilder, etc.) that return XML parsers configured with secure defaults: DOCTYPE declarations are rejected (disallow-doctype-decl) and external entity resolution is turned off (e.g., external-general-entities), which removes the parser behavior that XXE attacks rely on.
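A minimal sketch of what such a hardened factory method could look like, using only the standard JAXP API. This is not the DSpace XMLUtils source: the class name SecureXmlSketch and the method name newHardenedDocumentBuilder are hypothetical, and the exact set of features the real patch sets may differ from the ones assumed here.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical illustration only; not the actual org.dspace.app.util.XMLUtils code.
final class SecureXmlSketch {
    private SecureXmlSketch() {
    }

    // Returns a DocumentBuilder that refuses DOCTYPE declarations and
    // does not resolve external entities.
    static DocumentBuilder newHardenedDocumentBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject any document that contains a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth: do not load external general or parameter entities.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // Do not process XInclude or expand entity references into the tree.
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }
}
```

Centralizing the configuration in one place means a call site cannot forget a feature flag, which is the design benefit the patch appears to be after.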
The commits touch numerous files that created DocumentBuilderFactory, DocumentBuilder, and SAXBuilder instances with default, insecure settings. The patch replaces each of these with a call to the corresponding secure method in XMLUtils.
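The practical difference between the two instantiation styles can be sketched as follows. The class ParserSwapSketch and both method names are hypothetical, and the sample document uses a harmless internal entity in place of a real external one so that the example touches neither the file system nor the network.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical before/after illustration; not taken from the DSpace patch.
final class ParserSwapSketch {
    // A document with a DOCTYPE that declares an (internal) entity.
    static final String SAMPLE =
        "<!DOCTYPE item [<!ENTITY title \"injected\">]><item>&title;</item>";

    private ParserSwapSketch() {
    }

    // "Before" pattern: a default factory accepts the DOCTYPE and expands entities.
    static String parseWithDefaults(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // "After" pattern: the same parse with DOCTYPE declarations disallowed,
    // so the document is rejected with a SAXException before any entity is resolved.
    static String parseHardened(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }
}
```

With a default parser the entity declared in the DOCTYPE is expanded into the element text. A real attack would declare an external entity pointing at a local file or internal URL instead, which is why rejecting the DOCTYPE outright is the more robust setting.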
The identified vulnerable functions are the points where the application parses XML that an attacker could control, either directly (an administrator uploads a malicious archive) or indirectly (a trusted external service is compromised and returns malicious XML).
org.dspace.app.itemimport.ItemImportServiceImpl.loadXML is the primary function for handling metadata XML in SAF archives, making it a direct target for the first attack vector.
org.dspace.importer.external.* services, such as those for ArXiv and CrossRef, and the CCLicenseConnectorServiceImpl, directly correspond to the second attack vector, where XML is fetched from an external API and parsed insecurely.
org.dspace.content.packager.METSManifest.create represents another archive-based vector, as METS is a common packaging format for digital objects.
By replacing the insecure parser initializations in these functions, the patch effectively mitigates the XXE risk.