Unzipping Dangers: OpenRefine Zip Slip Vulnerability

Stefan Schiller

Vulnerability Researcher

Key Information

  • SonarQube Cloud discovered a critical Zip Slip vulnerability in OpenRefine.
  • If a user running a vulnerable version is tricked into importing a malicious project, an attacker could execute arbitrary code on the user’s machine.
  • SonarQube Cloud not only discovered the vulnerability but also provides valuable guidance on how to mitigate this kind of vulnerability and prevent common pitfalls.
  • The vulnerability was fixed with version 3.7.4.

OpenRefine Zip Slip Vulnerability: Introduction

OpenRefine is a Java-based open-source data cleaning and transformation tool. It lets users load different types of data, clean it, convert it, and extend it. All of this can be done from the browser by accessing OpenRefine’s web interface. With almost 10k stars and ~1.8k forks, it is one of the more popular GitHub projects.


In our continuous effort to help secure open-source projects and improve our Clean Code solution, we regularly scan open-source projects via SonarQube Cloud and evaluate the findings. Anyone can do the same: SonarQube Cloud is a free code analysis product for open-source projects, regardless of their size or language.


One of the findings reported by SonarQube Cloud was a Zip Slip vulnerability in OpenRefine that made us curious. A Zip Slip vulnerability is caused by inadequate path validation when extracting archives, which may allow attackers to overwrite existing files or extract files to unintended locations.


In this article, we outline the impact of this vulnerability and explain how this and other code vulnerabilities can be detected with SonarQube Cloud. Furthermore, we explain how attackers could exploit the vulnerability and describe a typical pitfall developers may fall into when trying to fix it.

OpenRefine Zip Slip Vulnerability: Impact

OpenRefine versions 3.7.3 and below are prone to a Zip Slip vulnerability in the project import feature (CVE-2023-37476). Although OpenRefine is designed to only run locally on a user's machine, an attacker can trick a user into importing a malicious project file. Once this file is imported, the attacker can execute arbitrary code on the user’s machine:

Demonstration of OpenRefine vulnerability on a test instance

The vulnerability was fixed with OpenRefine version 3.7.4.

OpenRefine Zip Slip Vulnerability: Technical Details

In this section, we dive into the technical details of the vulnerability.

Vulnerability Discovery

SonarQube Cloud is our cloud-based code analysis service. It uses state-of-the-art techniques in static code analysis to find quality issues, bugs, and security vulnerabilities in your code. With the recently added deeper SAST technology, it can even uncover hidden security vulnerabilities introduced by the usage of third-party dependencies.

During our regular scan of public open-source projects, the engine reported the following issue in OpenRefine (see it yourself on SonarQube Cloud):

As the highlighted code flow shows, the untar method iterates over all files within an archive and uses the tarEntry.getName() method to create a new File object, which is then passed to FileOutputStream to extract this file. Because the entry name is never validated, this introduces a Zip Slip vulnerability: an attacker can write files outside the intended folder (destDir) by creating an archive that contains a file named, e.g., ../../../../tmp/pwned.
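To make the path resolution concrete, here is a minimal sketch of how such an entry name escapes the extraction folder. It is not OpenRefine code: the class name, the helper methods, and the example directory are assumptions made for illustration, and it uses java.nio.file.Path to collapse the ".." segments the same way the operating system would when the file is opened.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ZipSlipPathDemo {
    // Joins an archive entry name onto the extraction folder, as
    // new File(destDir, entryName) would, then collapses "." and "..".
    static Path resolveEntry(String destDir, String entryName) {
        return Paths.get(destDir).resolve(entryName).normalize();
    }

    // True if the resolved entry ends up outside of destDir.
    static boolean escapes(String destDir, String entryName) {
        Path base = Paths.get(destDir).normalize();
        return !resolveEntry(destDir, entryName).startsWith(base);
    }

    public static void main(String[] args) {
        // Hypothetical extraction folder, four levels below the root.
        String destDir = "/var/refine/workspace/project";

        // A benign entry stays below destDir.
        System.out.println(resolveEntry(destDir, "data/data.txt"));

        // Four ".." segments climb back to the root before descending into tmp.
        System.out.println(resolveEntry(destDir, "../../../../tmp/pwned"));
        System.out.println(escapes(destDir, "../../../../tmp/pwned"));
    }
}
```

The number of ".." segments an attacker needs depends on how deep the extraction folder is, which is why malicious archives usually contain more of them than strictly necessary.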


The vulnerable untar method is called from the FileProjectManager.importProject method, which handles the import of existing Refine project files:

OpenRefine/main/src/com/google/refine/io/FileProjectManager.java

public class FileProjectManager extends ProjectManager {
  // ...
  public void importProject(...) {
    // ...
    // Extracts the uploaded archive into destDir without validating entry names
    untar(destDir, inputStream);
    // ...
  }
}

Projects can either be imported by directly uploading an archive or by providing the URL of an archive. This is what the feature looks like on the web interface:

The corresponding endpoint is called /command/core/import-project. Although this and all other endpoints of OpenRefine do not require authentication, OpenRefine is supposed to run locally on a user’s machine. Additionally, the employed CSRF protection prevents malicious JavaScript code executed in the context of another website from performing unauthorized actions. However, an attacker could still exploit the vulnerability by tricking a user into importing a malicious project.

Exploitation via Auto-Reload

The vulnerability gives attackers a strong primitive: writing files with arbitrary content to an arbitrary location on the filesystem. For applications running with root privileges, there are dozens of ways to turn this into arbitrary code execution on the operating system: adding a new user to the passwd file, adding an SSH key, creating a cron job, and more. For applications running with the permissions of a low-privilege user, the opportunities are more limited but still exist. Earlier this year, we documented a way to achieve code execution by writing a site-specific configuration hook, a technique that only works against Python applications.


Besides these generic techniques, there might be features of the application itself that attackers can leverage. In the case of OpenRefine, the application implements an auto-reload feature, which regularly scans the WEB-INF folder for changes and restarts the WebAppContext when a file is changed:

OpenRefine/server/src/com/google/refine/Refine.java

class RefineServer extends Server {
  static private void scanForUpdates(...) {
    // ...
    scanList.add(new File(contextRoot, "WEB-INF/web.xml"));
    findFiles(".class", new File(contextRoot, "WEB-INF/classes"), scanList);
    findFiles(".jar", new File(contextRoot, "WEB-INF/lib"), scanList);
    // ...
    scanner.addListener(new Scanner.BulkListener() {
      public void filesChanged() {
        try {
          // Restarting the context reloads the classes under WEB-INF/classes
          context.stop();
          context.start();
          // ...

All classes within the WEB-INF/classes folder are reloaded during the restart of the WebAppContext. This means that attackers could overwrite an existing .class file within this folder, which triggers the reload and subsequently executes the attacker's .class file, resulting in the ability to execute arbitrary code.

Mitigation, Pitfall, and Patch

To mitigate this vulnerability, the extraction code must ensure that every file ends up under the intended base folder. One way you might think of doing this is to call the getCanonicalPath method to retrieve the absolute and unique path as a String and then use the startsWith method to verify that the destination path is part of the intended base folder:


Caution: This does not fully fix the vulnerability! Can you spot the problem here?

        while ((tarEntry = tin.getNextTarEntry()) != null) {
            File destEntry = new File(destDir, tarEntry.getName());
+            if (!destEntry.getCanonicalPath().startsWith(destDir.getCanonicalPath())) {
+                throw new IllegalArgumentException("Zip archives with files escaping their root directory are not allowed.");
+            }

The getCanonicalPath method removes terminating path separators, which makes this still vulnerable to a partial path traversal!


Assuming the base folder (destDir) is defined as the home directory of the user john ("/home/john/"), the trailing slash is removed, resulting in "/home/john". This means that attackers could still perform a partial path traversal into a sibling directory whose name begins with the same characters, e.g., the home directory "/home/johnny/" of another user, since this passes the check:

"/home/johnny/.ssh/id_rsa".startsWith("/home/john") == true
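The behavior can be reproduced with a small sketch. The class and method names below are hypothetical, and the base folder lives under the system's temporary directory instead of /home so that the example behaves the same on any machine; the directories do not need to exist for getCanonicalPath to work.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class PartialTraversalDemo {
    // Mirrors the incomplete check: a plain string prefix comparison
    // of the two canonical paths.
    static boolean passesPrefixCheck(File destDir, String entryName) {
        try {
            File destEntry = new File(destDir, entryName);
            return destEntry.getCanonicalPath().startsWith(destDir.getCanonicalPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical base folder named "john".
        File destDir = new File(System.getProperty("java.io.tmpdir"), "john");

        // A regular entry inside the base folder passes, as intended.
        System.out.println(passesPrefixCheck(destDir, "notes.txt"));

        // A sibling folder with an unrelated name is rejected, as intended.
        System.out.println(passesPrefixCheck(destDir, "../alice/.ssh/id_rsa"));

        // A sibling folder sharing the "john" prefix slips through the check.
        System.out.println(passesPrefixCheck(destDir, "../johnny/.ssh/id_rsa"));
    }
}
```

The check only fails for siblings that share the prefix, which is exactly why this bug tends to survive casual testing with names like ../../etc/passwd.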

A real-life example of such a partial path traversal vulnerability can be found here, which is covered in more detail in the related Black Hat talk by Jonathan Leitschuh.


We continuously keep track of freshly unveiled pitfalls like this and add them to our engine. To correctly fix a vulnerability, you can click on the "How can I fix it?" tab directly attached to the corresponding issue on SonarQube Cloud:

In order to prevent this partial path traversal, there are two different approaches:

  • Reinsert the path separator at the end of the base folder's path after calling getCanonicalPath
  • Retrieve the Path object related to the File and use its startsWith method. This does not compare the paths as strings, character by character, but element by element, so "/home/johnny" does not start with "/home/john".
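Both approaches can be sketched in a few lines. The class and method names are hypothetical and not taken from OpenRefine or SonarQube Cloud; the example base folder is placed under the system's temporary directory purely for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SafePathChecks {
    // Approach 1: compare canonical path strings, but re-append the
    // separator so that ".../john/" no longer matches ".../johnny/...".
    static boolean isInsideWithSeparator(File baseDir, File target) {
        try {
            String base = baseDir.getCanonicalPath();
            if (!base.endsWith(File.separator)) {
                base += File.separator;
            }
            return target.getCanonicalPath().startsWith(base);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Approach 2: compare Path objects element by element.
    static boolean isInsideWithPath(File baseDir, File target) {
        return target.toPath().normalize().startsWith(baseDir.toPath().normalize());
    }

    public static void main(String[] args) {
        File baseDir = new File(System.getProperty("java.io.tmpdir"), "john");
        File inside = new File(baseDir, "data/data.txt");
        File sibling = new File(baseDir, "../johnny/.ssh/id_rsa");

        System.out.println(isInsideWithSeparator(baseDir, inside));
        System.out.println(isInsideWithSeparator(baseDir, sibling));
        System.out.println(isInsideWithPath(baseDir, inside));
        System.out.println(isInsideWithPath(baseDir, sibling));
    }
}
```

Two details are worth keeping in mind. With the first approach, the base folder itself no longer passes the check, which is usually fine for archive entries. With the second approach, normalize works purely on the path's elements and does not resolve symbolic links, so both paths should be built from the same base, as they are when the target is created from destDir.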


For OpenRefine, the maintainers avoided falling into this trap. They correctly fixed the vulnerability by leveraging the toPath method:

        while ((tarEntry = tin.getNextTarEntry()) != null) {
            File destEntry = new File(destDir, tarEntry.getName());
+            if (!destEntry.toPath().normalize().startsWith(destDir.toPath().normalize())) {
+                throw new IllegalArgumentException("Zip archives with files escaping their root directory are not allowed.");
+            }

This effectively prevents files from being written outside the intended destDir folder.
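For readers who want to apply the same pattern elsewhere, the following is a self-contained sketch of a complete extraction loop with this check in place. It is an illustration under stated assumptions, not OpenRefine's implementation: it uses the JDK's java.util.zip classes instead of the tar API seen in the patch so that it runs without extra dependencies, and SafeUnzip, singleEntryZip, and extractsCleanly are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class SafeUnzip {
    // Extracts every entry below destDir and rejects entries that would escape it.
    static void unzip(Path destDir, InputStream in) throws IOException {
        Path base = destDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path dest = base.resolve(entry.getName()).normalize();
                // Element-wise comparison, so a sibling sharing a name prefix is rejected too.
                if (!dest.startsWith(base)) {
                    throw new IllegalArgumentException(
                            "Entry escapes the target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    // Reads up to the end of the current entry; does not close zin.
                    Files.copy(zin, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Demo helper: builds an in-memory archive containing a single small file.
    static byte[] singleEntryZip(String entryName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
                zout.putNextEntry(new ZipEntry(entryName));
                zout.write("demo".getBytes());
                zout.closeEntry();
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: extracts into a fresh temporary folder (left on disk) and
    // reports whether the archive was accepted (true) or rejected (false).
    static boolean extractsCleanly(String entryName) {
        try {
            Path dir = Files.createTempDirectory("unzip-demo");
            unzip(dir, new ByteArrayInputStream(singleEntryZip(entryName)));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractsCleanly("data/data.txt"));
        System.out.println(extractsCleanly("../../../../tmp/pwned"));
    }
}
```

Performing the check before any file or directory is created matters: the malicious entry is rejected without leaving a partially written file behind.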


Timeline

Date        Action
2023-07-07  We report the issue to the maintainers
2023-07-08  Maintainers confirm the issue and start working on a patch
2023-07-17  OpenRefine Version 3.7.4 is released, which fixes the issue
2023-07-17  CVE-2023-37476 is assigned

OpenRefine Zip Slip Vulnerability: Summary

In this article, we took a deep dive into a critical Zip Slip vulnerability in OpenRefine. We also outlined how attackers can leverage an application’s features to turn a file write into arbitrary code execution. Furthermore, we highlighted a common pitfall developers may face when trying to fix this path traversal vulnerability.


With the help of SonarQube Cloud, this vulnerability was not only detected in a matter of seconds, it could also be fixed properly by relying on the comprehensive information SonarQube Cloud provides for each raised issue. This applies not only to security issues but also to code quality problems, helping developers write Clean Code and increase security, maintainability, and reliability.


Finally, we would like to thank the OpenRefine maintainers for quickly responding to our notification, providing a comprehensive patch, and transparently informing all users.



© 2008-2025 SonarSource SA. All rights reserved. SONAR, SONARSOURCE, SONARQUBE, and CLEAN AS YOU CODE are trademarks of SonarSource SA.