The vulnerability is a mutation XSS in the justhtml library that occurs during the serialization of DOM trees. The issue arises when a custom sanitization policy permits raw-text elements like <style> or <script>. The core problem, as detailed in the advisory and confirmed by code analysis, is that the text content within these elements was serialized literally, without escaping potentially dangerous character sequences.
An attacker could exploit this by crafting input text that includes a closing tag sequence (e.g., </style>). When this text is processed and serialized, the closing tag would prematurely terminate the raw-text element, allowing the subsequent attacker-controlled content to be interpreted as arbitrary HTML by the browser, leading to XSS.
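The breakout described above can be reproduced without justhtml. The following is a minimal sketch using only Python's standard html.parser module; naive_serialize_style is a hypothetical stand-in for any serializer that writes raw-text content literally, not the library's actual code.

```python
from html.parser import HTMLParser


def naive_serialize_style(css_text):
    # Hypothetical serializer: emits <style> content verbatim, with no escaping.
    return "<style>" + css_text + "</style>"


class StartTagCollector(HTMLParser):
    """Records every start tag that the parser treats as real markup."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


# Attacker-controlled text placed inside a <style> element.
payload = "body{}</style><img src=x onerror=alert(1)>"
markup = naive_serialize_style(payload)

collector = StartTagCollector()
collector.feed(markup)
collector.close()

# If "img" appears here, the payload escaped the raw-text element.
print(collector.start_tags)
```

When the serialized markup is parsed again, the embedded </style> ends the element early and the <img> tag is recognized as markup rather than as style text.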
The security patch addresses this flaw by sanitizing the content of these raw-text elements before serialization. The key changes are in commit bd2ddd9ef92991d8b1d7a871f1c9d27e72cabd5b, which adds the _sanitize_rawtext_element_contents function. This function neutralizes closing tag sequences (for example, rewriting </style> so that it can no longer terminate the element) and removes any non-text child nodes from within <style> and <script> elements.
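One way such neutralization could work is sketched below. This is an illustration under stated assumptions, not the code from the commit: the function name neutralize_rawtext_end_tags and the choice of "&lt;/" as the replacement are both hypothetical, and the removal of non-text child nodes is not shown.

```python
import re

# Matches the "</" that begins an end tag for a raw-text element.
# The lookahead keeps the tag name itself out of the match, so its
# original casing is preserved in the output.
_RAWTEXT_END_TAG = re.compile(r"</(?=(?:style|script)\b)", re.IGNORECASE)


def neutralize_rawtext_end_tags(text):
    # Assumption: replacing "<" with "&lt;" is enough to stop the sequence
    # from being read as an end tag; the real patch may rewrite it differently.
    return _RAWTEXT_END_TAG.sub("&lt;/", text)


print(neutralize_rawtext_end_tags("a{}</STYLE><img src=x onerror=alert(1)>"))
```

Because the parser matches end tag names case-insensitively, the pattern does the same; ordinary CSS or script text that contains no end tag sequence passes through unchanged.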
Analysis of the patch identifies the following functions as vulnerable: the public API entry points that did not perform this sanitization before the fix, and the serializer routine that their output reaches:
sanitize_dom: This function in src/justhtml/sanitize.py is a primary method for sanitizing DOM fragments. The patch retrofits it with a call to the new _sanitize_rawtext_element_contents function.
JustHTML.__init__: The constructor of the main JustHTML class in src/justhtml/parser.py was also found to be vulnerable when initialized directly with a crafted DOM node. The patch ensures that such nodes are sanitized before further processing.
_serialize_text_for_parent: While not modified in the patch, this function from src/justhtml/serialize.py was explicitly named in the advisory as the root cause. It performs the unsafe, literal serialization, and the vulnerability exists because unsanitized data was allowed to reach it.
A secondary, but related, vulnerability was fixed in commit 23c188284afe261eadd5705fc5408420634ec00f. The _markdown_escape_text function was not escaping HTML-significant characters, creating a similar XSS risk in the to_markdown() output. This function has also been included in the analysis.
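Markdown renderers that follow CommonMark pass inline raw HTML through to the output, so text emitted into Markdown needs its HTML-significant characters escaped. The sketch below shows the general idea using the standard library's html.escape; escape_html_in_markdown_text is a hypothetical name, and the fixed _markdown_escape_text may handle more cases than this.

```python
import html


def escape_html_in_markdown_text(text):
    # quote=False escapes only &, < and >, which is what prevents the text
    # from being interpreted as an HTML tag or entity by a Markdown renderer.
    return html.escape(text, quote=False)


print(escape_html_in_markdown_text("<img src=x onerror=alert(1)>"))
```

A real Markdown escaper would also have to escape Markdown's own syntax characters; this sketch covers only the HTML side of the problem.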