CVSS Score: 6.5 (CVSS version 3.0)

Basic Information


CVE-2025-6211: LlamaIndex vulnerable to data loss through hash collisions in its DocugamiReader class

A vulnerability in the DocugamiReader class of the run-llama/llama_index repository, affecting llama-index versions before 0.12.41, stems from using an MD5 hash of a chunk's text to generate IDs for document chunks. When structurally distinct chunks contain identical text, their IDs collide and one chunk overwrites the other. This can cause loss of semantically or legally important document content, breakage of parent-child chunk hierarchies, and inaccurate or hallucinated responses in AI outputs. The issue is resolved in llama-index 0.12.41 and llama-index-readers-docugami 0.3.1.

(GitHub Advisory)

Technical Details

Package Name	Ecosystem	Vulnerable Versions	First Patched Version
llama-index	pip	< 0.12.41	0.12.41
llama-index-readers-docugami	pip	< 0.3.1	0.3.1
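One way to check an environment against the table above is to compare installed package versions with the first patched versions. The following is a minimal sketch, not an official tool: the helper names (parse_release, is_vulnerable, report) are hypothetical, and the version parsing is deliberately simplified (it reads only the leading numeric X.Y.Z part, so a pre-release such as 0.12.41rc1 is treated like 0.12.41).

```python
from importlib.metadata import PackageNotFoundError, version

# First patched versions, copied from the table above.
PATCHED = {
    "llama-index": (0, 12, 41),
    "llama-index-readers-docugami": (0, 3, 1),
}


def parse_release(v: str) -> tuple:
    """Parse the leading numeric components of a version string (simplified)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "post1"
        parts.append(int(digits))
    return tuple(parts)


def is_vulnerable(installed: str, first_patched: tuple) -> bool:
    """True if the installed version sorts before the first patched version."""
    return parse_release(installed) < first_patched


def report() -> dict:
    """Map each package in PATCHED to a short status string."""
    result = {}
    for name, first_patched in PATCHED.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
            continue
        status = "VULNERABLE" if is_vulnerable(installed, first_patched) else "patched"
        result[name] = f"{installed} ({status})"
    return result


if __name__ == "__main__":
    for package, status in report().items():
        print(f"{package}: {status}")
```

For production use, a full version parser (for example the third-party packaging library) handles pre-release and post-release ordering more accurately than this simplified comparison.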


Root Cause Analysis

The vulnerability exists in the DocugamiReader class of the llama-index library, in the way document chunk IDs are generated. The _build_framework_chunk function, nested within the load_data method, computed each chunk's identifier as an MD5 hash of the chunk's text content alone. Two structurally different chunks with identical text therefore received the same ID.

When such a collision occurred, one chunk overwrote the other in storage, resulting in data loss. The consequences can be significant: important semantic or legal information may disappear from a document, the hierarchical relationship between parent and child chunks may break, and AI models relying on the data may produce inaccurate or "hallucinated" responses.

The patch addresses this by hashing the chunk's XPath together with its text. Because chunks at different structural locations have different XPaths, chunks with identical text now produce distinct hashes, which prevents the collisions and the associated data loss.
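The collision and the fix described above can be illustrated with a short, self-contained sketch. This is not the library's actual code: the function names, the sample XPaths, and the way the XPath and text are combined (joined with a NUL separator) are assumptions made for illustration only.

```python
import hashlib


def chunk_id_text_only(text: str) -> str:
    """ID scheme as described for the vulnerable code: MD5 of the text alone."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def chunk_id_with_xpath(text: str, xpath: str) -> str:
    """ID scheme in the spirit of the patch: MD5 of XPath plus text.

    The NUL separator is an illustrative choice, not taken from the patch.
    """
    material = xpath + "\x00" + text
    return hashlib.md5(material.encode("utf-8")).hexdigest()


# Two structurally distinct chunks (hypothetical XPaths) with identical text.
chunks = [
    ("/doc/section[1]/clause[2]", "Not applicable."),
    ("/doc/section[4]/clause[1]", "Not applicable."),
]

# Text-only IDs: the second chunk silently overwrites the first.
store_text_only = {}
for xpath, text in chunks:
    store_text_only[chunk_id_text_only(text)] = (xpath, text)

# XPath-aware IDs: both chunks survive under distinct keys.
store_with_xpath = {}
for xpath, text in chunks:
    store_with_xpath[chunk_id_with_xpath(text, xpath)] = (xpath, text)

print(len(store_text_only), "chunk(s) stored with text-only IDs")
print(len(store_with_xpath), "chunk(s) stored with XPath-aware IDs")
```

Note that the collision here is not a cryptographic weakness of MD5: any hash function applied to the text alone would map identical text to the same ID. The fix works because it adds the structural location to the hashed input.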
