AI Discovery with Cycode AI: Uncovering AI Usage & Risk Across Your Organization

AI is revolutionizing how organizations perform tasks, providing unprecedented speed and efficiency. However, it is common for employees to adopt AI tools internally without proper oversight or coordination with IT or data governance teams. This unsafe AI usage exposes the company to significant risks, such as potential data breaches, legal issues, and security vulnerabilities.

Adequate visibility is the key to managing the risks associated with ungoverned AI usage. Recognizing this necessity, the Cycode Labs team conducted extensive research to help you identify all the AI usage across your organization’s Software Development Life Cycle (SDLC). This blog post will share guidelines and detailed examples based on our insights.

AI Adoption – It’s More Than Just A Trend

72% of organizations have adopted AI in at least one business function, according to research done by McKinsey & Company in early 2024. This study underscores the increasing use of AI across various industries.

While this progression indicates that businesses are becoming more efficient and intelligent, unmonitored AI usage, also known as Shadow AI, exposes companies to various security and legal risks. For instance, attackers exploited a vulnerability in the Ray AI framework to compromise hundreds of clusters.

Companies that fail to detect and monitor AI usage within their organizations are at immediate risk. Conversely, those that adapt and effectively leverage AI’s power will benefit considerably.

Identify AI Across the SDLC

This section will demonstrate where AI can be used at each stage of the Software Development Life Cycle (SDLC) and provide specific examples to help you identify AI utilities.

Development Phase

The development phase is crucial for identifying AI usage and potential risks before they reach production. By identifying these issues early in the development process, you can reduce risks, ensure compliance, and maintain security standards. This proactive approach allows for better management and control over AI integrations, making the development phase the ideal place to identify and address potential concerns.

  • Code Libraries: Code libraries are collections of pre-written code that developers can use to perform common tasks, such as machine learning operations. They are an excellent place to identify AI usage because they directly indicate the inclusion of AI capabilities in the project. Look for machine learning libraries in dependency files such as requirements.txt, environment.yml, package.json, or pom.xml.
    • Example: Discovering AI Code Libraries in Google’s project

  • AI URLs: AI URLs are links within the code that point to AI services or APIs. They can be used to understand which external AI services the project relies on. You can usually find them in configuration files or the source code. To determine if a URL belongs to an AI service, look for keywords related to AI or specific endpoints of known AI providers.
    • Example: OpenAI’s API endpoint

      AI_API_URL = "https://api.openai.com/v1/engines/davinci-codex/completions"
  • AI Tokens: AI tokens are authentication keys used to access AI services or APIs. When identified in the code, they can indicate that AI services are being utilized. You can usually find them in environment variables, configuration files, or the source code. To determine if a token is related to an AI service, look for associated keywords or service-specific token structures.
    • For example, to identify a Hugging Face token, use this regex:
      (?:hf_|api_org_)[a-zA-Z0-9]{34}

  • IDE AI Assistants: IDE AI assistants are tools integrated within development environments to enhance productivity by providing code suggestions, completions, and other AI-powered features. Look for plugin or extension configurations for assistants like GitHub Copilot, Amazon Q, or Tabnine in IDEs such as Visual Studio Code.

  • Third-Party AI Applications: Check repositories for integrations with third-party AI applications or services. Platforms like GitHub, GitLab, and Bitbucket often have configurations indicating the use of these AI tools. For example, GitHub Copilot, an AI-powered code assistant tool, can be found configured at the user, organization, and repository levels.
    • Example: User configuration for GitHub Copilot:

  • AI Models: AI models are trained artifacts that perform tasks such as classification, generation, and other forms of inference. They can be found in storage buckets or source repositories as model files, and they can also be identified by how they are loaded and used in the code. Identifying these models helps you understand how AI is utilized in your project. It is also essential to validate each model’s license to ensure compliance with its usage terms.
    • For example, this code snippet demonstrates how to load a pre-trained BERT model from Hugging Face’s model repository using the transformers library.

      from transformers import AutoModel 
      
      model = AutoModel.from_pretrained("google-bert/bert-base-cased")
  • Jupyter Notebooks: Jupyter Notebooks are interactive documents that combine code, text, and visualizations. They are often used for data analysis and machine learning. While the presence of a Jupyter Notebook file does not by itself indicate AI usage, scanning .ipynb files for AI-related code and comments is essential to identify any AI usage within them.
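
Several of the development-phase checks above can be sketched in a few lines of Python. This is a minimal illustration rather than a complete scanner: the library and host lists are short, hypothetical watch lists, the helper names are made up for this example, and the token pattern is the Hugging Face regex quoted earlier. The notebook helper assumes the standard .ipynb JSON layout, where each cell has a "cell_type" and a "source".

```python
import json
import re
from urllib.parse import urlparse

# Illustrative, non-exhaustive watch lists (assumptions, not a definitive catalog).
AI_LIBRARIES = {"tensorflow", "torch", "transformers", "scikit-learn", "openai"}
AI_HOSTS = ("api.openai.com", "huggingface.co")

# Hugging Face token pattern quoted in the post.
HF_TOKEN_RE = re.compile(r"(?:hf_|api_org_)[a-zA-Z0-9]{34}")
# Package name at the start of a requirements.txt line.
REQ_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def find_ai_libraries(requirements_text):
    """Return the watched AI libraries named in requirements.txt-style text."""
    found = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = REQ_NAME_RE.match(line)
        if match and match.group(1).lower() in AI_LIBRARIES:
            found.add(match.group(1).lower())
    return sorted(found)


def find_ai_urls(source_text):
    """Return URLs whose host is (a subdomain of) a watched AI provider."""
    hits = []
    for url in re.findall(r"https?://[^\s\"']+", source_text):
        host = urlparse(url).hostname or ""
        if any(host == h or host.endswith("." + h) for h in AI_HOSTS):
            hits.append(url)
    return hits


def find_hf_tokens(source_text):
    """Return substrings that match the Hugging Face token pattern."""
    return HF_TOKEN_RE.findall(source_text)


def notebook_code(ipynb_text):
    """Concatenate the code cells of a notebook so they can be scanned as text."""
    notebook = json.loads(ipynb_text)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n".join(chunks)
```

The output of notebook_code can be passed to find_ai_urls and find_hf_tokens, so notebooks get the same checks as regular source files.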

Build and Testing Phase

  • CI/CD Pipelines with AI Dependencies: Some CI/CD pipelines include steps for AI-related tasks, such as AI code review, or pull in dependencies that contain AI logic. These pipelines are defined in configuration files such as Jenkinsfile, the workflow files under .github/workflows, or .gitlab-ci.yml. Look for dependencies on AI tools or libraries.
  • Environment Variables for AI Services: Identify environment variables related to AI services used in CI/CD pipelines. These variables can be found in the pipeline configuration files or secret management systems like AWS Secrets Manager or Azure Key Vault.
    • Example: A model being pushed to Hugging Face using an API token read from an environment variable.

  • AI Identifiers in Build Logs: Examine the build logs of CI/CD pipelines to find identifiers or print statements related to AI logic. These logs can provide insights into AI tasks being performed during the build and testing phase.
    • Example: The GitHub Actions log shows Ray, a computing framework commonly used for AI, being added to the Helm charts.
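
One rough way to apply the build-phase checks above is to search pipeline definitions and build logs for environment variable names that AI services commonly use. The sketch below assumes a small, hypothetical watch list (HF_TOKEN, HUGGINGFACE_TOKEN, OPENAI_API_KEY) and a made-up helper name; extend the list to match the services you care about.

```python
import re

# Hypothetical watch list of AI-related variable names; adapt it to your services.
AI_ENV_VARS = ("HF_TOKEN", "HUGGINGFACE_TOKEN", "OPENAI_API_KEY")


def find_ai_env_vars(pipeline_text):
    """Map each watched variable name to the 1-based lines that mention it."""
    hits = {}
    for lineno, line in enumerate(pipeline_text.splitlines(), start=1):
        for name in AI_ENV_VARS:
            # Word boundaries avoid matching longer names such as MY_HF_TOKEN.
            if re.search(rf"\b{re.escape(name)}\b", line):
                hits.setdefault(name, []).append(lineno)
    return hits
```

Because the function works on plain text, the same call can be used on a workflow file, a Jenkinsfile, or a downloaded build log.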

Deployment Phase

  • Dockerfile: Dockerfiles are scripts that define the environment and instructions for creating Docker images. Identify Docker commands that install AI libraries or configure AI services, which can indicate the use of AI within containerized applications. Look for commands such as RUN pip install tensorflow or RUN az ml.
  • Cloud IaC Files: Cloud Infrastructure as Code (IaC) files define and manage cloud resources using code files. Identify AI resources within IaC files like CloudFormation, Terraform, or ARM templates to understand AI usage in cloud environments. Look for resources such as AWS SageMaker, Azure ML, or Google AI Platform.
    • Example: Terraform configuration for deploying an AWS SageMaker app:

  • Kubernetes Resources: Kubernetes manifests define the deployment and management of applications in a Kubernetes cluster. Identify AI-related container images or AI Custom Resource Definitions (CRDs) within these manifests to understand AI usage in your Kubernetes environment.
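
The Dockerfile and IaC checks above can be approximated with simple pattern matching. The following is a minimal sketch under stated assumptions: the Terraform resource-type prefixes and the package list are illustrative choices, the helper names are hypothetical, and the regexes only handle single-line RUN pip install commands and plain resource "TYPE" "NAME" blocks.

```python
import re

# Resource-type prefixes treated as AI-related (an illustrative assumption).
AI_RESOURCE_PREFIXES = (
    "aws_sagemaker_",
    "azurerm_machine_learning_",
    "google_vertex_ai_",
)
# Matches Terraform blocks of the form: resource "TYPE" "NAME" {
RESOURCE_RE = re.compile(r'^\s*resource\s+"([^"]+)"\s+"([^"]+)"', re.MULTILINE)

# Packages treated as AI libraries (illustrative, not exhaustive).
AI_PACKAGES = {"tensorflow", "torch", "transformers"}
PIP_INSTALL_RE = re.compile(r"\s*RUN\s+pip3?\s+install\s+(.*)", re.IGNORECASE)


def find_ai_resources(terraform_text):
    """Return (type, name) pairs for resources whose type has a watched prefix."""
    return [
        (rtype, name)
        for rtype, name in RESOURCE_RE.findall(terraform_text)
        if rtype.startswith(AI_RESOURCE_PREFIXES)
    ]


def find_ai_installs(dockerfile_text):
    """Return watched packages installed by single-line `RUN pip install` commands."""
    found = []
    for line in dockerfile_text.splitlines():
        match = PIP_INSTALL_RE.match(line)
        if not match:
            continue
        for arg in match.group(1).split():
            # Strip version specifiers and extras, e.g. tensorflow==2.15 or torch[cpu].
            name = re.split(r"[=<>!~\[]", arg, maxsplit=1)[0].lower()
            if name in AI_PACKAGES:
                found.append(name)
    return found
```

A production scanner would parse HCL and Dockerfile syntax properly (multi-line commands, modules, variables); the regex approach is only meant to show what to look for.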

Operations and Monitoring Phase

  • Cloud Services: Cloud service configurations manage and deploy various cloud resources. Monitor these configurations for AI services to understand how AI is being utilized in your cloud environment. Look for AI services such as Amazon Bedrock, Azure OpenAI Service, and Google AutoML.
  • AI File Formats: Look for AI-specific file formats such as .pt (PyTorch), .safetensors (Safetensors), .pb (TensorFlow), and others in your cloud storage buckets or source code management (SCM) repositories.
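
As a simple illustration of the file-format check above, the sketch below filters a list of object keys or repository paths by extension. The first three extensions come from the list above; .onnx, .h5, and .gguf are added here as assumptions, and the helper name is hypothetical.

```python
from pathlib import PurePosixPath

# .pt, .safetensors, and .pb are named in the post; the rest are assumed additions.
MODEL_EXTENSIONS = {".pt", ".safetensors", ".pb", ".onnx", ".h5", ".gguf"}


def find_model_files(object_keys):
    """Return the keys whose file extension looks like a model file."""
    return [
        key
        for key in object_keys
        if PurePosixPath(key).suffix.lower() in MODEL_EXTENSIONS
    ]
```

The input can be any iterable of paths, for example the keys returned by a bucket listing or the file list of a repository checkout.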

Cycode’s Solution: Comprehensive AI Visibility With AI Discovery

Cycode’s platform provides comprehensive visibility into the usage of AI tools across your organization. By integrating with your code repositories, CI/CD pipelines, and cloud infrastructure, Cycode can identify and monitor AI-related activities, offering complete oversight and enabling effective risk mitigation.

For instance, you can use Cycode to:

  • Identify all AI code libraries within your organization

  • Monitor AI IaC resources that impact your production environment

  • Detect exposed AI tokens

Have It All Using Cycode’s AI Discovery

Closing Thoughts

Visibility is crucial for adequate security. By following the steps in this guide, your organization can achieve the visibility needed to mitigate the risks associated with AI usage. Implementing these practices will help you harness the benefits of advanced technology while maintaining robust security and resilience.

We’re absolutely dedicated to helping every organization’s security and development teams (A)chieve the (I)mpossible in this transformative age of AI. See firsthand how Cycode’s Complete ASPM can help yours do it too — book a demo today.