
Security Highlight: Is AI ready for finding security vulnerabilities?

Author: Marc Witteman

Fuzzing tools are invaluable for testing open source code, but they aren’t foolproof. The recent SQLite vulnerability, found by Google’s Big Sleep AI, underscores the challenges of fuzzing large search spaces.

Ten years ago, Google started Project Zero to find zero-day vulnerabilities. Over the years the team found many vulnerabilities in open source projects, and also studied and developed strategies for finding them. About a year ago, they started using LLMs (Large Language Models) for vulnerability research. This project branch was called Project Naptime (now Big Sleep), symbolizing the work being done while analysts are sleeping.

Recently, Big Sleep achieved its first big success: it found a zero-day in SQLite, a popular open source database. The flaw was discovered after an SQLite revision had been committed, but before its official release, and with an almost instant fix any potential exploitation was prevented. The vulnerability was a buffer underflow, which could have become a serious and exploitable problem.
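
To illustrate the bug class, here is a minimal, hypothetical C sketch of a stack buffer underflow (purely illustrative, not the actual SQLite code): a sentinel index of -1 slips past a check that only guards the upper bound, so the write lands one slot before the start of a stack array.

```c
#include <stdio.h>

#define NUM_COLUMNS 4

/* Hypothetical example of the bug class: the bounds check only looks
 * at the upper bound, so a sentinel index of -1 passes and the write
 * lands just before aIdx[0] on the stack. */
static void record_constraint(int iColumn) {
    int aIdx[NUM_COLUMNS] = {0};
    if (iColumn < NUM_COLUMNS) {   /* BUG: no lower-bound check, -1 passes */
        aIdx[iColumn] = 1;         /* iColumn == -1 writes before aIdx[0] */
    }
    printf("constraint recorded for column %d\n", iColumn);
}

int main(void) {
    record_constraint(2);    /* fine */
    record_constraint(-1);   /* stack buffer underflow */
    return 0;
}
```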

So how did Big Sleep find this vulnerability? Its designers followed a popular strategy called variant analysis. Whenever a new vulnerability is discovered, you can look for code that is similar, but not identical, to the flawed pattern. Such variants would not be flagged by regular pattern-based scanners, but for security researchers they are a good starting point for manual analysis, simplifying vulnerability finding and triaging. The Big Sleep team trained their LLM implementation to do the same; the sketch below illustrates what it is hunting for. With this result, they believe the window of opportunity that variant analysis offers attackers in the field may be closing, as machines can now do this work faster than humans.
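
To make the idea concrete, here is a hypothetical C sketch (again, not from SQLite): one function has received a bounds-check fix after a report, while a sibling function still contains the same flawed copy pattern. Variant analysis is the hunt for that second function.

```c
#include <stdio.h>
#include <string.h>

#define BUF_LEN 16

/* Patched after a reported vulnerability: the length is now checked
 * before copying. */
static void parse_header(char *dst, const char *src, size_t len) {
    if (len > BUF_LEN) return;   /* the fix */
    memcpy(dst, src, len);
}

/* An unpatched "variant": the same copy pattern with the same missing
 * check, in a different code path. A signature keyed to the original
 * bug misses it; variant analysis looks for exactly this. */
static void parse_trailer(char *dst, const char *src, size_t len) {
    memcpy(dst, src, len);       /* still no bounds check */
}

int main(void) {
    char buf[BUF_LEN];
    parse_header(buf, "safe", 5);
    parse_trailer(buf, "unchecked", 10);
    printf("done\n");
    return 0;
}
```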

Nowadays, open source code is often well-tested before it is committed, and fuzzing has become a strong and vital tool for finding issues. So why was this issue not found by a fuzzer? The answer lies in the size of the search space. Fuzzing tools work through huge parameter spaces and can only find defects with a well-configured test harness and corpus. Setting this up is not easy (the sketch below shows where that effort comes in), and this issue demonstrates that fuzzing is not always the right approach.
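
For context, a harness is the glue code that feeds the fuzzer's generated inputs into the code under test. Below is a minimal libFuzzer-style harness around a hypothetical parse_message function (the function and its behavior are illustrative, not taken from SQLite). Even with a working harness and corpus, the fuzzer must still stumble onto the narrow input conditions that trigger a deep bug.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function under test (stub for illustration). */
static int parse_message(const uint8_t *data, size_t size) {
    return (size > 0 && data[0] == 0x7f) ? 1 : 0;
}

/* Minimal libFuzzer-style harness: the fuzzing engine repeatedly calls
 * this entry point with mutated inputs drawn from a seed corpus,
 * steering mutations toward new code coverage. How deep the fuzzer
 * gets depends on how well the harness and corpus expose the relevant
 * code paths. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_message(data, size);
    return 0;
}
```

Built with clang -fsanitize=fuzzer, the engine mutates corpus inputs and calls this entry point millions of times, guided by coverage feedback; a poorly chosen harness or corpus simply never reaches the vulnerable code.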

While the Big Sleep achievement is impressive, it is clearly only a step toward better evaluation of new software. Nevertheless, it is exciting to see AI technologies showing their strength in this domain as well. We believe that this first step will be followed by more improvements, and that AI may become the best defense against software vulnerabilities.

Read more: Project Zero, From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
