Summary
form-data uses Math.random() to select a boundary value for multipart form-encoded data. This can lead to a security issue if an attacker:
- can observe other values produced by Math.random in the target application, and
- can control one field of a request made using form-data
Because the values of Math.random() are pseudo-random and predictable (see: https://blog.securityevaluators.com/hacking-the-javascript-lottery-80cc437e3b7f), an attacker who can observe a few sequential values can determine the state of the PRNG and predict future values, includes those used to generate form-data's boundary value. The allows the attacker to craft a value that contains a boundary value, allowing them to inject additional parameters into the request.
This is largely the same vulnerability as was recently found in undici by parrot409 -- I'm not affiliated with that researcher but want to give credit where credit is due! My PoC is largely based on their work.
Details
The culprit is this line here: https://github.com/form-data/form-data/blob/426ba9ac440f95d1998dac9a5cd8d738043b048f/lib/form_data.js#L347
An attacker who is able to predict the output of Math.random() can predict this boundary value, and craft a payload that contains the boundary value, followed by another, fully attacker-controlled field. This is roughly equivalent to any sort of improper escaping vulnerability, with the caveat that the attacker must find a way to observe other Math.random() values generated by the application to solve for the state of the PRNG. However, Math.random() is used in all sorts of places that might be visible to an attacker (including by form-data itself, if the attacker can arrange for the vulnerable application to make a request to an attacker-controlled server using form-data, such as a user-controlled webhook -- the attacker could observe the boundary values from those requests to observe the Math.random() outputs). A common example would be a x-request-id header added by the server. These sorts of headers are often used for distributed tracing, to correlate errors across the frontend and backend. is a fine place to get these sorts of IDs (in fact, )