The vulnerability (CVE-2023-39410) is an improper input validation flaw in Apache Avro's Java SDK: when deserializing untrusted data, a reader can be made to allocate far more memory than intended, which can end in an OutOfMemoryError. The provided commit a12a7e44ddbe060c3dc731863cad5c15f9267828 addresses this by introducing a centralized SystemLimitException class whose methods (checkMaxBytesLength, checkMaxStringLength, checkMaxCollectionLength) enforce configurable, VM-aware limits on allocations.
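To make the idea of a centralized, configurable, VM-aware limit concrete, here is a minimal sketch of what such a checker might look like. It is not Avro's source: the class name LimitCheckSketch, the property names, the exception types, and the headroom of 8 below Integer.MAX_VALUE are all assumptions chosen for illustration. Only the two method names are borrowed from the commit.

```java
// Hypothetical stand-in for a centralized limit checker; this is NOT Avro's source.
final class LimitCheckSketch {
    // Assumption: keep some headroom below Integer.MAX_VALUE, because some JVMs
    // cannot allocate arrays of exactly that size.
    static final int MAX_ARRAY_VM_LIMIT = Integer.MAX_VALUE - 8;

    // Hypothetical property names, used only by this sketch.
    static final String MAX_BYTES_PROPERTY = "sketch.limits.bytes.maxLength";
    static final String MAX_ITEMS_PROPERTY = "sketch.limits.collectionItems.maxLength";

    // Reads a positive integer limit from a system property; falls back to the VM limit
    // when the property is missing, non-numeric, or not positive.
    static int configuredLimit(String property) {
        String value = System.getProperty(property);
        if (value == null) {
            return MAX_ARRAY_VM_LIMIT;
        }
        try {
            int parsed = Integer.parseInt(value.trim());
            return parsed > 0 ? Math.min(parsed, MAX_ARRAY_VM_LIMIT) : MAX_ARRAY_VM_LIMIT;
        } catch (NumberFormatException e) {
            return MAX_ARRAY_VM_LIMIT;
        }
    }

    // Validates a declared byte length BEFORE any array of that size is allocated.
    static int checkMaxBytesLength(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("Malformed data: negative length " + length);
        }
        if (length > MAX_ARRAY_VM_LIMIT) {
            throw new UnsupportedOperationException("Length " + length + " exceeds the VM array limit");
        }
        if (length > configuredLimit(MAX_BYTES_PROPERTY)) {
            throw new IllegalStateException("Length " + length + " exceeds the configured limit");
        }
        return (int) length;
    }

    // Validates the running total of collection items as each new block is announced.
    static int checkMaxCollectionLength(long existing, long items) {
        if (existing < 0 || items < 0) {
            throw new IllegalArgumentException("Malformed data: negative item count");
        }
        long total = existing + items;
        // total < existing detects overflow of the long addition.
        if (total < existing || total > MAX_ARRAY_VM_LIMIT) {
            throw new UnsupportedOperationException("Item count exceeds the VM array limit");
        }
        if (total > configuredLimit(MAX_ITEMS_PROPERTY)) {
            throw new IllegalStateException("Item count " + total + " exceeds the configured limit");
        }
        return (int) total;
    }
}
```

In this sketch a caller would write something like `byte[] buf = new byte[LimitCheckSketch.checkMaxBytesLength(declaredLength)];`, so that the allocation can never be reached with an unchecked size. Keeping the checks in one class means every decoder path applies the same policy.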
The analysis focused on identifying functions where these new checks were added, or where previous, less robust checks were replaced. These locations indicate points where the deserialization process could be manipulated to request excessively large memory allocations for strings, byte arrays, fixed-size data, arrays, or maps.
Functions in BinaryDecoder (and its subclass DirectBinaryDecoder) are directly responsible for reading data types and their lengths from the input stream. The changes to readString, readBytes, readArrayStart, arrayNext, readMapStart, and mapNext indicate that these methods previously did not adequately bound the sizes declared in the input.
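The pattern at issue can be illustrated with a small, self-contained sketch of a length-prefixed read. In Avro's binary encoding a bytes value is a zig-zag variable-length long followed by that many bytes, so the length is attacker-controlled whenever the input is untrusted. The class BoundedReadSketch, its fixed 1 MiB cap, and the use of ByteBuffer in place of Avro's decoder classes are assumptions made for the example. It does not reproduce BinaryDecoder.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of checking a declared length before allocating.
final class BoundedReadSketch {
    // Assumed cap for this sketch only; a real decoder would make this configurable.
    static final int MAX_BYTES = 1 << 20;

    // Decodes a zig-zag, variable-length long (at most 10 bytes).
    static long readLong(ByteBuffer in) {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.get() & 0xFF;            // throws BufferUnderflowException at end of input
            raw |= (long) (b & 0x7F) << shift;  // low 7 bits carry payload
            if ((b & 0x80) == 0) {              // high bit clear marks the last byte
                return (raw >>> 1) ^ -(raw & 1); // undo zig-zag encoding
            }
        }
        throw new IllegalArgumentException("Invalid variable-length long");
    }

    // Reads a length-prefixed byte sequence, rejecting oversized or negative lengths
    // before the array is allocated.
    static byte[] readBytes(ByteBuffer in) {
        long length = readLong(in);
        if (length < 0 || length > MAX_BYTES) {
            throw new IllegalArgumentException(
                "Declared length " + length + " is outside [0, " + MAX_BYTES + "]");
        }
        byte[] out = new byte[(int) length];
        in.get(out); // throws BufferUnderflowException if the input is shorter than declared
        return out;
    }
}
```

Without the range check, a five-byte input declaring a length near Integer.MAX_VALUE would be enough to trigger a multi-gigabyte allocation attempt. With it, the same input is rejected after reading only the prefix.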
The Schema.FixedSchema constructor was modified to use checkMaxBytesLength, indicating it could previously be instantiated with a size that was too large.
The Utf8 class methods (constructors, setByteLength, set) were also updated to use checkMaxStringLength, replacing an older, less comprehensive internal check. This implies that Utf8 objects could be created or manipulated to hold excessively large string data.
All identified functions are directly involved in processing size/length information from potentially untrusted input during deserialization or in constructing data structures based on this information. The patch's introduction of stricter, centralized checks in these functions confirms their role in the vulnerability.