Introduction
Understanding Secrets Exposure
In today’s digital landscape, the exposure of secrets poses significant risks to organizations, potentially revealing internal data or customer information. Malicious actors can exploit these exposures to alter data and cause substantial harm. The phenomenon of secrets sprawl, where sensitive information is scattered across multiple repositories and environments, worsens this risk. Therefore, robust secrets detection is crucial for safeguarding an organization’s data integrity and security.
Application security (AppSec) teams bear immense responsibility in protecting their organizations. They manage numerous alerts and issues from various tools and systems daily. Consequently, security tools must enhance the team’s focus and efficiency, enabling them to concentrate on real issues. By optimizing resource allocation, security teams can boost productivity and better manage critical security tasks, thereby maintaining a strong security posture.
Moreover, security teams need a comprehensive understanding of all environmental risks to take appropriate actions. Effective secret detection tools must not only identify risks accurately but also present them in a way that helps teams prioritize and respond effectively.
Challenges in Secrets Detection
The biggest risk in secret detection is missing critical secrets. When security tools fail to detect these, organizations remain unaware and vulnerable to breaches. This lack of awareness prevents security teams from mitigating these risks, leading to severe incidents. Comprehensive detection of all secrets is essential for a secure environment.
Alert fatigue, on the other hand, happens when security teams are overwhelmed by numerous alerts, many being false positives, leading to:
Desensitization: Teams may ignore alerts, risking missed real threats.
Resource Drain: Investigating false positives wastes time and resources, diverting attention from real issues.
Reduced Trust: Constant false positives erode trust in security tools, causing frustration and inefficiency.
A secret detection engine must be both accurate and comprehensive. “Accurate” means that the detections reported to the security team are correct, so the team is not overwhelmed by incorrect ones; “comprehensive” means that every secret that exists is found. Incorrect detections are false positives (FP), and missed secrets are false negatives (FN).
Balancing the reduction of false positives and false negatives in hardcoded secrets detection is inherently complex. Excessive false positives lead to alert fatigue and wasted response efforts, while missing critical secrets leaves organizations vulnerable to data breaches and security threats.
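The two error types map onto the standard precision and recall metrics: precision drops as false positives grow, and recall drops as false negatives grow. The following is a minimal sketch of that relationship. The class name, function names, and counts are illustrative assumptions, not measurements from any real scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Hypothetical tallies from comparing detector output with labeled ground truth."""
    true_positives: int   # real secrets that were reported
    false_positives: int  # reported findings that are not secrets
    false_negatives: int  # real secrets that were missed


def precision(c: DetectionCounts) -> float:
    """Share of reported findings that are real secrets (the "accurate" axis)."""
    reported = c.true_positives + c.false_positives
    return c.true_positives / reported if reported else 0.0


def recall(c: DetectionCounts) -> float:
    """Share of existing secrets that were reported (the "comprehensive" axis)."""
    existing = c.true_positives + c.false_negatives
    return c.true_positives / existing if existing else 0.0


# Illustrative numbers only.
counts = DetectionCounts(true_positives=60, false_positives=40, false_negatives=20)
print(f"precision={precision(counts):.2f} recall={recall(counts):.2f}")
```

Tuning a detector to report less usually raises precision at the cost of recall, and the reverse, which is why the two have to be tracked together.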
Traditional methods, such as rule-based systems and regex patterns, often struggle to find the right balance. While regex patterns can be effective, they frequently generate numerous false positives, especially when dealing with generic secrets. This makes it challenging to detect all secrets accurately without overwhelming security teams with false alerts.
Generic secrets detection is particularly complex. Rule-based approaches often fall short due to the vast variability and lack of structure in generic secrets. These systems can miss critical secrets or flag benign strings as sensitive data, creating noise and inefficiencies. More advanced methods are therefore needed to address the generic secret detection problem.
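To make both failure modes concrete, here is a small sketch of a generic regex rule. The pattern, the key names it looks for, and the sample snippet (including the made-up credential) are all hypothetical and are not taken from any real tool.

```python
import re

# Hypothetical generic rule: a secret-looking key name assigned a quoted value
# of at least 8 characters. Illustration only, not a rule from any real scanner.
GENERIC_SECRET = re.compile(
    r"""(?i)\b(?:password|passwd|secret|token|api_key)\b\s*[:=]\s*["']([^"']{8,})["']"""
)


def find_candidates(text: str) -> list[str]:
    """Return every quoted value the generic rule flags as a possible secret."""
    return [m.group(1) for m in GENERIC_SECRET.finditer(text)]


# Made-up sample: two benign values and one embedded credential.
SNIPPET = '''
password = "changeme-placeholder"
token = "${{ env.DEPLOY_TOKEN }}"
db_conn = "postgres://svc:9fK2xQ7vLm4Z@db.internal/app"
'''

for value in find_candidates(SNIPPET):
    print("flagged:", value)
```

The rule flags the placeholder and the template reference, neither of which is a secret (false positives), and it misses the credential embedded in the connection string because the key name is not on its list (a false negative). Adding more key names or loosening the value pattern trades one error type for the other.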
Cycode Generic Secret Detection Engine
To address the generic secret detection problem, Cycode has designed and trained a machine learning (ML) model to detect unstructured secrets and sensitive information in an accurate and scalable way.
In addition to detecting the presence of a secret or sensitive information in a specific body of text, the model can pinpoint the exact location of this information.
Our advanced ML model analyzes each code or text snippet, determines whether it contains a real secret, and finds the value and location of any secrets within the code snippet, providing a confidence score for each secret detected. The score reflects the engine’s certainty about the detection based on various factors.
Building our ML Secret Detector
To develop our dataset, we scanned a large number of public repositories to gather code snippets that may contain sensitive information. We took a hybrid approach for annotation, combining manual efforts with machine learning support from large language models. This ensures comprehensive and accurate annotations across a vast dataset.
We used CodeBERT, a model pre-trained specifically for understanding programming languages, as the base model. We added sequence classification and token classification heads on top and fine-tuned the model on our annotated dataset. The fine-tuned model assesses each text snippet to determine whether it contains any secrets and pinpoints their locations. It outputs a list of detected secrets, their positions within the code, and the model’s confidence in each detection.
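The text does not describe how the outputs of the two heads are combined, so the following is only a sketch of one plausible post-processing step. It assumes the sequence classification head yields a single snippet-level probability and the token classification head yields a per-token probability with character offsets. All names, the 0.5 threshold, the mean-based confidence, and the sample probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenScore:
    start: int          # character offset where the token begins
    end: int            # character offset just past the token
    secret_prob: float  # assumed per-token probability of being part of a secret


@dataclass(frozen=True)
class Detection:
    value: str
    start: int
    end: int
    confidence: float


def decode_detections(text: str, snippet_prob: float,
                      tokens: list[TokenScore],
                      threshold: float = 0.5) -> list[Detection]:
    """Merge adjacent high-probability tokens into secret spans.

    snippet_prob stands in for the sequence-level output; when it is below the
    threshold, the snippet is treated as containing no secrets.
    """
    if snippet_prob < threshold:
        return []
    detections: list[Detection] = []
    run: list[TokenScore] = []
    for tok in list(tokens) + [None]:  # trailing None flushes the last run
        if tok is not None and tok.secret_prob >= threshold:
            run.append(tok)
            continue
        if run:
            start, end = run[0].start, run[-1].end
            # Assumption: span confidence is the mean of its token probabilities.
            mean_prob = sum(t.secret_prob for t in run) / len(run)
            detections.append(Detection(text[start:end], start, end, round(mean_prob, 3)))
            run = []
    return detections


# Made-up snippet with made-up probabilities.
TEXT = 'db_pass = "hunter2-xyz"'
TOKENS = [
    TokenScore(0, 7, 0.02),    # db_pass
    TokenScore(8, 9, 0.01),    # =
    TokenScore(10, 11, 0.10),  # opening quote
    TokenScore(11, 18, 0.97),  # hunter2
    TokenScore(18, 22, 0.93),  # -xyz
    TokenScore(22, 23, 0.08),  # closing quote
]

for det in decode_detections(TEXT, snippet_prob=0.98, tokens=TOKENS):
    print(det)
```

Keeping character offsets on every token is what lets the result carry an exact location, so a finding can point at the secret value itself and not only at the line or file that contains it.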
Results
With the described approach, we achieve an 80% reduction in existing false positives (FP) while also turning 70% of our false negatives (FN) into true positives (TP), setting a new industry standard for secret detection solutions.
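To see what those two percentages mean for a detector's output, the sketch below applies them to a made-up baseline. Only the 80% and 70% figures come from the text; the baseline counts are hypothetical, and the sketch assumes both percentages apply to the same evaluation set and that existing true positives are unchanged.

```python
def apply_reported_changes(tp: int, fp: int, fn: int,
                           fp_reduction: float = 0.80,
                           fn_recovered: float = 0.70) -> tuple[int, int, int]:
    """Return (tp, fp, fn) after removing a share of FPs and recovering a share of FNs."""
    recovered = round(fn * fn_recovered)          # missed secrets that are now found
    remaining_fp = round(fp * (1 - fp_reduction))  # incorrect detections that remain
    return tp + recovered, remaining_fp, fn - recovered


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical baseline: 60 correct detections, 40 incorrect ones, 20 missed secrets.
before = (60, 40, 20)
after = apply_reported_changes(*before)

for label, (tp, fp, fn) in (("before", before), ("after", after)):
    print(f"{label}: TP={tp} FP={fp} FN={fn} "
          f"precision={precision(tp, fp):.3f} recall={recall(tp, fn):.3f}")
```

On this made-up baseline, incorrect detections fall from 40 to 8 and missed secrets from 20 to 6, so precision and recall rise together. The absolute values depend entirely on the baseline chosen.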
Enabling this feature for beta customers increased the total number of unstructured secret detections while reducing the number of incorrect detections to a minimum. We observed that trust in the unstructured secrets sections of our platform increased, and as a result, the adoption and usage of these sections increased significantly.
Customers’ Data Privacy
Our customers are important to us, and we prioritize their data privacy. No personal information is collected, and customers’ data is not used to train or improve AI models. Customer data never leaves our VPC to external providers, ensuring complete confidentiality and security.
Conclusion
With these advancements, CycodeAI continues to lead in providing cutting-edge security solutions, empowering organizations to maintain robust security and efficiently manage their critical data. By leveraging our AI-powered secrets detection, you can trust in a system designed to minimize false positives, optimize resource use, and ensure comprehensive security coverage.
Cycode – Securing Your Code, Enhancing Your Focus
*In collaboration with Maor Davidzon, Omer Calfon, Ilan Lidovski, Roni Gurvich, Yehonathan Moshkovitz, Nofar Hachmon