The integration of machine learning into software development is revolutionizing the field, automating tasks and generating complex code snippets at an unprecedented scale. However, this powerful paradigm shift also presents significant challenges, including the risk of introducing security flaws into the codebase. This issue is explored in depth in the paper *Do Users Write More Insecure Code with AI Assistants?* by Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh. In this post, we will explore how this paradigm shift in software development impacts software security, and how to manage the resulting risks accordingly.
AI Assistants and Accelerated Development
AI assistants, such as OpenAI’s codex-davinci-002, have demonstrated impressive capabilities in generating code snippets based on user prompts. This presents an exciting opportunity for engineering teams to accelerate their work and move through development tasks more quickly. It will likely lead to an explosion in the volume of code being written and deployed, as developers can now generate complex code snippets quickly and efficiently. Research from GitHub, Harvard Business School, and Keystone.ai claims that the “increase in developer productivity due to AI could boost global GDP by over $1.5 trillion.” While this would be an incredible achievement, this increased volume of code comes with its own set of challenges.
Security Concerns with AI-Generated Code
Perry et al.’s paper raises a critical question: is code written with generative AI assistance more or less secure than the unassisted equivalent? Their empirical study found that while AI assistants can generate code quickly, they often provide incorrect solutions and overlook security best practices. The authors “observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet also more likely to rate their insecure answers as secure compared to those in our control group.” In some cases, “responses from the AI assistant use libraries that explicitly flag that they are insecure in the documentation for the library.” This is unsurprising, as the models have been trained on a large corpus of human-written code, warts and all. At the time of writing this blog, GitHub themselves admit this:
AI generated code is only as secure as the underlying models enable them to be. As developers use open source software—code that is publicly accessible and deliberately modifiable—it may expose organizations to security vulnerabilities that exist in the code. There are AI tools that help developers find and fix coding errors during the software development lifecycle.
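The paper’s finding that assistant-generated answers sometimes lean on insecure primitives is easy to picture. As a hypothetical illustration in Python (not an example drawn from the study itself), consider password hashing, where the convenient suggestion and the safe one look almost identical:

```python
import hashlib
import os

password = b"correct horse battery staple"

# Weak: a fast, unsalted digest such as MD5 is trivially brute-forced
# and offers no protection if the database leaks.
weak_digest = hashlib.md5(password).hexdigest()

# Stronger: a salted, slow key-derivation function from the same standard
# library. The iteration count here follows common guidance for
# PBKDF2-HMAC-SHA256; tune it for your own environment.
salt = os.urandom(16)
strong_digest = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000)
```

Both snippets run without complaint, which mirrors the study’s observation that participants were more likely to rate their insecure answers as secure.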
AI Assistance vs. Traditional Code Copying
None of this is an entirely new issue; the authors reference the time-honoured tradition of copying code verbatim from online resources (e.g. Stack Overflow). What is different now is the speed and scale at which insecure coding practices can spread when enabled by automation. The predicted surge in code volume could be accompanied by a commensurate increase in security vulnerabilities, such as SQL injection or cross-site scripting attacks. Researchers observed that code generation assistants would frequently use less secure coding patterns, such as string concatenation instead of prepared statements, when authoring SQL queries. In this context, static code analysis, a method of debugging by examining source code before a program is run, becomes even more crucial. It can help identify vulnerabilities in the code, thus mitigating the risk of security flaws.
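To make that finding concrete, here is a minimal Python sketch (using the standard library’s sqlite3 module, chosen purely for the example) contrasting the concatenation pattern with a parameterized statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Vulnerable: string concatenation lets the input rewrite the query,
# so this returns every row in the table.
query = "SELECT email FROM users WHERE name = '" + user_input + "'"
print(conn.execute(query).fetchall())

# Safe: a parameterized (prepared) statement treats the input as data only,
# so the injection attempt matches nothing.
print(conn.execute("SELECT email FROM users WHERE name = ?", (user_input,)).fetchall())
```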
Developers’ Role in Software Security
Academic researchers have found that developers who were skeptical of their generative AI tools and stayed more engaged with the language and format of their prompts were able to generate code with fewer security vulnerabilities. This presents a valuable opportunity for security practitioners to support developers. If a tight feedback loop can be created in the development process that allows developers to flag insecure code and remediate issues quickly, organizations can leverage the advantages of generative AI code generation while mitigating its potential harms.
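What such a feedback loop looks like will vary by team, but the core idea is a check that runs close to the developer, for example in a pre-commit hook or CI job, and fails fast on risky patterns. The following is a deliberately tiny, hypothetical Python sketch of that idea using the standard library’s ast module; it is an illustration only, not how a production analyzer works:

```python
# Toy checker: flag .execute() calls whose SQL is built dynamically via
# concatenation, f-strings, or %-formatting instead of being a constant
# passed alongside bound parameters.
import ast
import sys


def find_unsafe_execute_calls(source: str, filename: str) -> list[str]:
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            and isinstance(node.args[0], (ast.BinOp, ast.JoinedStr))
        ):
            findings.append(f"{filename}:{node.lineno}: SQL appears to be built dynamically")
    return findings


if __name__ == "__main__":
    problems = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            problems += find_unsafe_execute_calls(handle.read(), path)
    print("\n".join(problems))
    sys.exit(1 if problems else 0)  # non-zero exit fails the hook or CI step
```

Wired into a pre-commit hook or CI job, a check like this surfaces the issue minutes after the code is written rather than weeks later in a security review.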
Overcoming Objections to Static Code Analysis
Despite its potential benefits, some developers may be hesitant to adopt static code analysis. Common objections include the occurrence of false positives, the complexity of setting up and using the tools in development workflows, and the time-consuming nature of the process (especially for large codebases). It’s important to note that not all static code analysis tools are created equal. Bearer, for instance, has been designed to overcome these common objections. Bearer focuses on high-precision reporting to minimize false positives, reduces setup complexity so that developers can act on findings within their workflows, and is optimized for performance, ensuring that it can analyze large codebases quickly and efficiently.
Balancing the Power of Generative AI with Security
As generative AI continues to revolutionize software development and deployment, leading to a surge in the volume of code, it’s crucial to understand and mitigate the associated risks. Generative AI assistants have demonstrated a capacity to introduce security flaws into codebases, emphasizing the importance of secure coding methodologies such as static code analysis. While there may be objections to using static code analysis in your CI/CD, tools like Bearer demonstrate that these challenges can be overcome. As we continue to harness the power of generative AI in software development, it’s equally important to equip ourselves with the tools and practices necessary to ensure the security and integrity of our code.
Andrew Becherer is a security practitioner with over 20 years of technology, computing and risk experience. He served as the Chief Security Officer at Datadog where he formed and developed the security program that secured their operations during the rapid growth that ultimately resulted in a successful 2019 IPO. Later, he served as the CISO of Iterable where he built a comprehensive program that could navigate the challenges of the marketing technology industry in an increasingly complicated global security and compliance environment. Andrew is currently an investor and advisor working across cybersecurity and generative artificial intelligence.
*Reference: Perry, N., Srivastava, M., Kumar, D., & Boneh, D. (2022). Do Users Write More Insecure Code with AI Assistants? Retrieved from https://arxiv.org/pdf/2211.03622.pdf*
*Reference: Dohmke, T., Iansiti, M., & Richards, G. (2023). Sea Change in Software Development: Economic and Productivity Analysis of the AI-Powered Developer Lifecycle.*