July 14, 2020 | 9 min read
Maor Davidzon

Once upon a time, environments were segregated, so compromising one developer's machine would not impact the entire build or production environment. Today, many organizations have moved to a seamlessly integrated cloud ecosystem in which developers hold the keys to the entire kingdom. As cloud services have grown in popularity, the number of secrets (passwords, keys, tokens, etc.) used to connect to them has grown sharply.

Now more than ever, a leaked secret can severely harm a company's build or production environment.

Attacking the Source

Several incidents over the last four years show how easily and frequently source code, and the secrets within it, can be compromised:

  • In 2016, attackers gained access to Uber's private GitHub repositories. With unrestricted access, they were able to extract hard-coded secrets from the source code and use them to retrieve millions of records from Amazon S3 buckets.
  • In 2019, Scotiabank exposed hard-coded access keys and login credentials by storing them in a public GitHub repository. Once secrets are accidentally published for the world to see, attackers do not even need to devise an attack; everything they need is free for the taking.
  • In 2020, an Amazon AWS engineer published system credentials, including AWS key pairs and private keys, to a public GitHub repository by mistake.

Understanding Detection Methods

Before we can evaluate the various secret-protection tools available, we must first understand the detection methods they most commonly employ.

There are three popular methods for finding secrets: entropy checks, regular expressions, and machine learning.

Entropy checks detect unstructured secrets by measuring the entropy (randomness) of each candidate string. Strings with a high entropy score are flagged as suspected secrets. The downside to this approach is that it can yield a high volume of false positives, because many legitimate strings, such as hashes and UUIDs, also look random.
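To make the entropy idea concrete, here is a minimal Python sketch that computes Shannon entropy in bits per character and flags long, random-looking tokens. The 4.0 threshold and 20-character minimum are illustrative assumptions, not values taken from any particular tool.

```python
import math
from collections import Counter

# Illustrative values (assumptions): real scanners tune these per character set.
ENTROPY_THRESHOLD = 4.0
MIN_LENGTH = 20


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    length = len(s)
    counts = Counter(s)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def looks_like_secret(token: str) -> bool:
    """Flag long tokens whose character distribution looks random."""
    return len(token) >= MIN_LENGTH and shannon_entropy(token) > ENTROPY_THRESHOLD


print(shannon_entropy("abcd"))  # 2.0: four equally likely symbols
print(looks_like_secret("abcdefghijklmnopqrstuvwxyz012345"))  # True (5.0 bits/char)
print(looks_like_secret("password_password_password"))  # False (at most 3.0 bits/char)
```

A 32-character token with no repeated characters scores exactly 5.0 bits per character, while a repetitive word scores far lower. Note that a hex string can never exceed 4.0 bits per character (only 16 possible symbols), which is one reason real scanners use separate thresholds for hex and base64 strings.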

Regular expressions (regex) are more commonly used; they perform lexical searches for variable names or known API key patterns. You can also add your own custom regular expressions or keywords.
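A regex-based scanner essentially boils down to a table of named patterns. The sketch below is illustrative only: the AWS access key ID format (a four-letter prefix such as `AKIA` followed by 16 uppercase letters or digits) is publicly documented, while `generic_assignment` is a hypothetical example of the kind of custom keyword rule described above.

```python
import re

# Illustrative rules. The AWS access key ID format is publicly documented;
# "generic_assignment" is a hypothetical custom keyword rule.
RULES = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # Keyword, then = or :, then a quoted value of 8 or more characters.
    "generic_assignment": re.compile(
        r"""(?i)(?:password|passwd|secret|api_?key|token)\s*[:=]\s*['"]([^'"]{8,})['"]"""
    ),
}


def scan_line(line: str) -> list[str]:
    """Return the names of all rules that match the given line."""
    return [name for name, pattern in RULES.items() if pattern.search(line)]


print(scan_line('aws_key = "AKIAIOSFODNN7EXAMPLE"'))  # ['aws_access_key_id']
print(scan_line('db_password = "hunter2hunter2"'))  # ['generic_assignment']
```

Each rule only recognizes the shapes it was written for, which is exactly the limitation discussed under false negatives.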

Machine learning tools can learn from experience, improving the odds of accurately detecting secrets while minimizing false positives.

Choosing the Right Source Code Secret Protection Tool

There are many tools that can detect secrets in source code, though most lack one or more important capabilities.  The following factors should be considered when evaluating any secret scanning tool:

Commit History – Simply deleting secrets from the latest version of the code is not sufficient. Attackers can easily review your commit history and extract secrets from previous commits.
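Scanning history rather than only the latest snapshot can be sketched by walking the output of `git log -p --all`, which prints the diff of every commit. This minimal Python sketch assumes `git` is on the PATH and uses a single illustrative AWS key pattern; the function names are hypothetical. It checks only added lines, since any secret that ever existed in history had to be added by some commit, even if a later commit deleted it.

```python
import re
import subprocess

# Illustrative rule: AWS access key IDs.
AWS_KEY = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_secrets_in_log(log_text: str) -> list[tuple[str, str]]:
    """Scan `git log -p` output and return (commit_sha, match) pairs."""
    findings = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]  # header line: "commit <sha> ..."
        elif line.startswith("+") and not line.startswith("+++"):
            # Added lines are enough: every secret in history was added
            # by some commit, even if a later commit deleted it.
            for match in AWS_KEY.finditer(line):
                findings.append((commit, match.group(0)))
    return findings


def scan_repository_history(repo_path: str) -> list[tuple[str, str]]:
    """Run `git log -p --all` in repo_path and scan every historical diff."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all", "--no-color"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_secrets_in_log(result.stdout)
```

Keeping the parser separate from the `subprocess` call means it can be tested on captured `git log` output without a real repository.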

False Negatives – Many tools rely only on regular expressions and may miss many of the secrets in your source code. Given that thousands of API providers exist, it is impractical to write a regex for each one. Make sure that the tool you choose uses more techniques than regular expressions alone.
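One way to reduce false negatives is to layer the methods: match known provider formats with regex first, then fall back to an entropy check for random-looking tokens that no rule recognizes. In this sketch the candidate-token pattern, the 4.0 threshold, and the function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Known provider formats (illustrative); anything else falls through to entropy.
KNOWN_PATTERNS = [re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")]
# Candidate tokens: runs of 20+ base64/URL-safe characters (an assumption).
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits per character; callers only pass non-empty tokens."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def detect(line: str, entropy_threshold: float = 4.0) -> list[tuple[str, str]]:
    """Return (method, token) pairs: regex matches first, then entropy hits."""
    findings = [("regex", m.group(0)) for p in KNOWN_PATTERNS for m in p.finditer(line)]
    already_found = {token for _, token in findings}
    for m in CANDIDATE.finditer(line):
        token = m.group(0)
        if token not in already_found and shannon_entropy(token) > entropy_threshold:
            findings.append(("entropy", token))
    return findings
```

A token from a provider with no regex rule, such as `x9Qv2LmT8rZk4NpW7sYb3HcF6dJg`, is still caught by the entropy fallback, while a long but repetitive identifier like `compute_the_total_value_for_report` is not.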

False Positives – Many secret-detection tools produce a large volume of false positives. In fact, the volume may be so great that it is nearly impossible to filter through, especially if the tool relies on high-entropy detection. It is essential that the tool you choose can suppress false positives.
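Suppressing false positives usually comes down to explicit ignore rules plus a few heuristics. The sketch below uses a hypothetical configuration: an allowlist of known-safe values, path patterns for lock files (whose integrity hashes look random to an entropy check), and a check for obvious placeholders such as `changeme` or AWS's documented sample key, which ends in `EXAMPLE`.

```python
import fnmatch

# Hypothetical ignore configuration; a real tool would load this from a file.
IGNORED_PATHS = ["*.lock", "*package-lock.json", "tests/fixtures/*"]
PLACEHOLDER_WORDS = ("example", "placeholder", "changeme", "dummy", "your_")


def is_placeholder(value: str) -> bool:
    """Heuristic: obvious dummy values or a single repeated character."""
    lowered = value.lower()
    return len(set(value)) <= 1 or any(word in lowered for word in PLACEHOLDER_WORDS)


def is_false_positive(path: str, value: str, allowlist: set[str]) -> bool:
    """Return True if a finding should be suppressed rather than reported."""
    return (
        value in allowlist
        or any(fnmatch.fnmatch(path, pattern) for pattern in IGNORED_PATHS)
        or is_placeholder(value)
    )
```

Heuristics like these trade a small risk of missing a real secret for a much quieter report, so the rules are worth reviewing periodically.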

Monitoring – Code repositories (such as GitHub) can be scanned and monitored continuously for secrets. For example, you can use webhooks to trigger a scan and an alert whenever a push event occurs. This method will not prevent secrets from being committed to the repository, but it will generate an immediate alert when that happens, enabling you to act swiftly, for example by revoking the secret, before it can be exploited.
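A push-triggered monitor starts with a small webhook receiver. GitHub signs each delivery with an HMAC-SHA256 of the raw request body, sent in the `X-Hub-Signature-256` header as `sha256=<hex digest>`, and push payloads list the files each commit `added`, `modified`, and `removed`. The sketch below covers only signature verification and choosing which files to scan; the web framework, fetching file contents, and the scan itself are left out, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def verify_signature(body: bytes, signature_header: str, webhook_secret: bytes) -> bool:
    """Check the X-Hub-Signature-256 header against an HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(webhook_secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature_header)


def files_to_scan(body: bytes) -> set[str]:
    """Collect files added or modified by every commit in a push event payload."""
    payload = json.loads(body)
    paths = set()
    for commit in payload.get("commits", []):
        paths.update(commit.get("added", []))
        paths.update(commit.get("modified", []))
    return paths
```

Removed files are skipped because they no longer exist at the pushed revision; the secret they may have contained is still in history, which is why the history scan above remains necessary.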

This approach has several advantages. Alerts can be delivered through tools such as Slack, Microsoft Teams, or email, and integration with tools like a SIEM or JIRA allows for a relatively simple remediation workflow. Because you can also configure the alert mechanism to ignore specific secrets, keywords, files, or other items, you can reduce the rate of false positives.
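Routing a finding to chat can be as simple as posting JSON to a Slack incoming webhook, which accepts a message body with a `text` field. In this sketch the message wording and function names are assumptions. Note that the alert reports where the secret is but deliberately never includes the secret value itself, so the alert channel does not become a second leak.

```python
import json
import urllib.request


def build_slack_payload(repo: str, path: str, rule: str, commit: str) -> dict:
    """Describe a finding without echoing the secret value."""
    return {"text": f"Possible secret ({rule}) in {repo}/{path} at commit {commit[:7]}"}


def send_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to the webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

The webhook URL itself is a credential, so it belongs in the monitor's secret store rather than in source code.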

When configuring a monitoring solution, be aware of every location where your code, and any secrets within it, can appear. For example, developers may use public platforms such as online repositories, Gists, and Pastebin as part of their personal toolbox.

CI/CD Integration – Your CI/CD pipeline is critical to ensuring secure code and should never be overlooked when implementing a security solution. A tool should immediately notify developers of possible secrets and integrate into their existing workflow. For example, a pull request can be blocked automatically when a secret is discovered, or a developer can be notified as soon as a secret is detected anywhere along the pipeline. Full workflow integration further eases remediation.
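A CI gate can be sketched as a script that scans the changed files and exits with a non-zero status when it finds anything, which fails the job. The rule set, the script name `scan_secrets.py`, and the function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Illustrative rule set (assumption); swap in the patterns your team maintains.
RULES = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(label: str, text: str) -> list[str]:
    """Return findings as 'label:line: rule', never echoing the secret itself."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append(f"{label}:{lineno}: {name}")
    return findings


def scan_files(paths: list[str]) -> list[str]:
    findings = []
    for path in paths:
        try:
            text = Path(path).read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # deleted, unreadable, or binary file: skip it
        findings.extend(scan_text(path, text))
    return findings


def main(argv: list[str]) -> int:
    findings = scan_files(argv)
    for finding in findings:
        print(finding, file=sys.stderr)
    return 1 if findings else 0


if __name__ == "__main__":
    # Exit non-zero only on findings (failing the CI job); a clean run ends with status 0.
    exit_code = main(sys.argv[1:])
    if exit_code:
        sys.exit(exit_code)
```

In a pull request job, one could run something like `git diff --name-only origin/main...HEAD | xargs python scan_secrets.py`, assuming the base branch is `main`. Marking that job as a required status check in the repository's branch protection settings is what actually blocks the merge.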

User Experience – Any secret scan is likely to produce a large volume of results to trudge through. A dedicated graphical user interface (GUI) is key to lightening that burden. A good GUI enables bulk operations and provides simplified report views.

Cycode – Keeping Source Code Secrets Safe

The Cycode platform offers end-to-end cybersecurity in a single module that easily integrates with your existing source code management system and provides a full-repository secret scan as a service. Cycode automatically scans your repository history and generates a report without any effort from the user.