Source code is the core of any software company’s intellectual property. Accessing it is like learning the formula for Coca-Cola, Kentucky Fried Chicken’s 11 herbs and spices, or the architectural drawings for Lockheed’s F-22 Raptor. Source code leaks can be catastrophic.
Proprietary source code exposure is a big deal for any software company. It can lead to loss of trust, loss of revenue, and huge fines should sensitive data be exposed. Source code leaks even present safety issues, like when leaked code gives access to a car’s onboard systems, allowing attackers to remotely control steering, braking, and other critical functions.
This blog post is an excerpt from our new whitepaper, The Top 25+ Source Code Leaks, 2020-2024, and covers several notable breaches from the past five years. To learn more, download the full report.
What Is a Source Code Leak?
A source code leak is any event that publicly exposes application or operating system code outside of the company that owns it. Source code leaks are dangerous because they can expose vulnerabilities in the design or coding of an application, give away trade secrets, and potentially wreak havoc on software and systems.
Source code often contains hardcoded secrets. These allow attackers to log in to critical systems using authentic credentials, which makes it harder for security teams to detect malicious behavior.
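To make "hardcoded secret" concrete, here is a minimal Python sketch of the alternative: reading a credential from the environment at runtime instead of embedding it in the source file. The function name `get_api_key` and the variable name `PAYMENTS_API_KEY` are hypothetical and used only for illustration.

```python
import os

# Anti-pattern: a credential embedded in source travels with every clone,
# fork, and leak of the repository. (Placeholder value, not a real key.)
# API_KEY = "example-not-a-real-key"


def get_api_key(env_var: str = "PAYMENTS_API_KEY") -> str:
    """Read a credential from the environment instead of the source file.

    The variable name is a hypothetical example.
    """
    value = os.environ.get(env_var)
    if not value:
        # Fail loudly rather than falling back to a default baked into code.
        raise RuntimeError(f"{env_var} is not set")
    return value
```

With this approach, a leaked repository exposes the name of the variable but not the credential itself.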
Unfortunately for security teams, source code leaks are becoming increasingly common.
From January to June 2023, GitHub received 1,086 takedown notices to remove proprietary code from its platform. This resulted in 14,159 unique projects being taken down from GitHub.
How Is Source Code Leaked?
Broadly speaking, there are three ways in which source code can be leaked:
- Attacks: To gain access to source code, hackers may exploit existing software vulnerabilities, leverage compromised credentials, carry out social engineering attacks, or use malware to compromise infrastructure.
- Insider threats: Whether it be for personal gain, revenge, or to benefit a competitor, insiders with access to source code may intentionally leak it.
- Human error: Tool misconfigurations, lost or stolen devices, and email mistakes can all lead to unauthorized access to source code and subsequent leaks. Even accidentally sharing sensitive information via ChatGPT can lead to a source code leak.
The Impact of a Source Code Leak
According to the Commission on the Theft of American Intellectual Property, the total value of stolen trade secrets from the US alone is anywhere from $180 billion to $540 billion per year.
The impact of a source code leak varies depending on the nature of the code, the context in which it’s used, and the organization or project involved. Nonetheless, affected organizations can experience a number of significant consequences:
- Loss of intellectual property
- Reputational and brand damage
- Revenue loss
- Regulatory compliance fines
- Exposed secrets
- Compromised customers
- Zero-day exploits
The bottom line: It’s essential that organizations keep their code secure.
Top Source Code Leaks
The following are several notable source code leaks from 2020-2024.
Mercedes-Benz – January 2024
A Mercedes-Benz employee’s authentication token was found in a public GitHub repository during an internet scan by a third party. The GitHub token gave unrestricted access to the company’s source code hosted on an internal GitHub Enterprise Server. It had been exposed since late September 2023. When the exposure was disclosed in January 2024, Mercedes-Benz revoked the API token in question and immediately removed the public repository. A Mercedes-Benz spokesperson said the cause of the leak was human error.
Samsung – May 2023
Eager to experiment, Samsung engineers in the semiconductor division used ChatGPT. In doing so, they shared source code for equipment testing programs and chip testing sequences, along with internal meeting notes containing valuable trade secrets. As a result, OpenAI, ChatGPT’s developer, has access to the sensitive data, which raises concerns about intellectual property theft, unfair advantage, and best practices for using AI tools, especially when sensitive information is involved.
Twitter – March 2023
Between January and March 2023, a GitHub user named “FreeSpeechEnthusiast” uploaded snippets of Twitter’s internal code, revealing parts of Twitter’s recommendation algorithms, moderation tools, and internal APIs. This information could potentially allow attackers to understand how Twitter operates and exploit vulnerabilities for spam, manipulation, or even identity theft.
While Twitter quickly removed the leaked code from GitHub, the leak could complicate Twitter’s future development plans and leave it more vulnerable to future hacking attempts. Worse still, it’s unclear how the code was obtained in the first place, although employee negligence or a phishing attack are suspected possibilities.
Uber – September 2022
LAPSUS$ gained access to Uber’s internal systems using stolen employee credentials that were purchased from the dark web. After successfully connecting to Uber’s intranet, the hacker gained full admin access to the company’s VPN and other sensitive services like DA, Duo, OneLogin, AWS, and GSuite.
The attackers got their hands on a vast amount of information, including the source code for Uber’s backend systems, critical algorithms like driver-rider matching, and internal communications. They also allegedly accessed Uber’s bug bounty reports, which usually contain details of security vulnerabilities yet to be remediated.
While user data remained secure, the leak exposed Uber’s technical infrastructure and potentially vulnerable areas, causing significant reputational damage and security concerns.
State of New York, Office of IT Services – June 2020
A misconfigured Git repository operated by New York State’s Office of Information Technology Services exposed all projects inside it to the internet. To make matters worse, several of the projects exposed in this leak included secrets and passwords for the servers and databases in use by the associated systems. The misconfiguration allowed anyone on the internet to create a user account and log in unimpeded with admin credentials.
How to Stop Source Code Leaks
From software supply chain attacks to human error, the source code leaks highlighted above serve as a stark reminder of the need for robust defense measures across the entire software development lifecycle. To keep your most sensitive information safe, you need to build a strong security culture inside and out, secure the code itself, and respond quickly and effectively in the event of a breach.
Implement the following tips to help prevent source code leaks:
- Secure your software supply chain
- Use secure coding practices
- Adopt employee training and incentivization programs
- Implement access controls to prevent unauthorized access
- Use multi-factor authentication (MFA)
- Continuously monitor and patch code in real time
- Develop an incident response plan before your code is leaked
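As a rough illustration of what monitoring code for exposed secrets can look like, the sketch below flags lines that resemble credentials before they reach a repository. It is a minimal example under stated assumptions, not how any particular product works: the `PATTERNS` rule set and the `scan_text` helper are hypothetical, and the three patterns (AWS access key ID prefix, GitHub personal access token prefix, and a generic quoted assignment) are a small, non-exhaustive sample.

```python
import re

# Illustrative rules only; production scanners use much larger rule sets
# plus entropy checks and validation against the issuing service.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "generic_assignment": re.compile(
        r"(?i)\b(?:password|passwd|secret|api_key|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a rule."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings
```

A check like this could run as a pre-commit hook or CI step and fail the build whenever `scan_text` returns any findings, so that a secret is caught before it is pushed.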
For more details on how to prevent source code leaks, download the full report, Top 25+ Source Code Leaks, 2020-2024.
How Cycode Can Help
When code is leaked, organizations suffer. Their code can be examined by malicious actors for potential defects or ways to exploit it or by competitors interested in learning trade secrets. Exposed secrets often give attackers an easy way into high value targets, where they can sit undetected for months, doing significant damage. The organization suffers loss of trust and damage to their brand reputation. They may also be slapped with a hefty fine should PII or other sensitive data be exposed. No one wants to be the latest headline.
The good news is that Cycode’s multifaceted approach to pipeline security, secrets security, leak prevention, and application security can help you stop a leak before it occurs.
Cycode is the leading Application Security Posture Management (ASPM) platform providing peace of mind to its customers. Our complete ASPM scales and standardizes developer security without slowing down the business. The ASPM platform provides unmatched visibility, risk-based prioritization, and remediation at the speed of DevOps across the entire SDLC. With Cycode, enterprises can protect their cloud-native applications ensuring the governance, compliance, and software supply chain integrity of every software release.
Want peace of mind that your source code is safe? Book a demo now.