At Riscure, we like to explore new technologies that can help us serve our customers better. Undoubtedly, the most talked-about new application right now is ChatGPT, a publicly accessible chatbot that you can interact with in natural language, and which has been trained on a large portion of the knowledge available on the internet.
Adversarially minded people soon published its many flaws: it fails at basic math, its safety mechanisms can be bypassed using prompt injection, and it often hallucinates falsehoods while sounding very convincing. While that is important research, Riscure experts focused instead on exploring how ChatGPT can be used in source code analysis.
In this investigation, we used ChatGPT to review a C file and identify vulnerabilities and other issues. The goal was to determine how the ChatGPT language model could help our Security Analysts during code reviews*.
We fed ChatGPT a C file containing many known vulnerabilities. Overall, ChatGPT was able to identify some of them, including a hardcoded password, a potential command injection vulnerability, and a bug where only the first 3 characters of a password were checked. It also produced some false positives, in some cases inventing source code that was not present in the original file. ChatGPT struggled to distinguish between command injection and buffer overflow vulnerabilities, but demonstrated good general knowledge of memory exploitation and mitigation techniques. An interesting feature of ChatGPT is that it can summarize and explain code blocks in human language, which helps a Security Analyst quickly get a high-level understanding of a code block.
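The test file itself is not reproduced here, but the following minimal sketch (with all names and values invented for illustration) shows the kinds of patterns mentioned above: a hardcoded credential, a comparison that only checks the first 3 characters, and user input passed unsanitized to a shell command.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ADMIN_PASSWORD "s3cret"   /* hardcoded credential */

/* Bug: strncmp with n=3 accepts any input whose first three
 * characters match the stored password. */
int check_password(const char *input)
{
    return strncmp(input, ADMIN_PASSWORD, 3) == 0;
}

/* Command injection: a user-controlled filename is pasted into a
 * shell command without sanitization, so an input such as
 * "x; rm -rf /" would execute arbitrary commands. */
void print_file(const char *filename)
{
    char cmd[256];
    snprintf(cmd, sizeof(cmd), "cat %s", filename);
    system(cmd);
}
```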
In addition to reviewing code, ChatGPT was able to take a crude description of a vulnerability and expand it into a mostly correct finding description. It provided risk ratings that matched those given by a human analyst in 4 out of 4 cases, and offered some decent suggestions for countermeasures. ChatGPT also demonstrated some background knowledge of fault injection and, after being given more specific information, generated example code implementing a flow-integrity countermeasure.
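We do not reproduce ChatGPT's exact snippet here, but a typical flow-integrity countermeasure against fault injection has the shape of the sketch below: a counter is advanced by fixed constants at each step and verified at the end, so a glitch that skips an instruction leaves the counter in an unexpected state. All constants and names in this sketch are illustrative.

```c
#include <stdlib.h>

/* Illustrative step constants; distinct, non-trivial values make it
 * unlikely that a single glitch produces a valid final counter. */
#define STEP_INIT   0x5A3Cu
#define STEP_CHECK  0xA1E7u
#define STEP_FINAL  (STEP_INIT + STEP_CHECK)

/* volatile prevents the compiler from optimizing the checks away. */
static volatile unsigned int flow_counter;

static void fault_detected(void)
{
    abort(); /* the reaction policy is application-specific */
}

int verify_and_boot(void)
{
    flow_counter = STEP_INIT;

    /* ... perform the security-critical check, e.g. signature
     * verification, then record that this step was executed ... */
    flow_counter += STEP_CHECK;

    /* Flow-integrity check: if fault injection skipped a step,
     * the accumulated counter will not match the expected value. */
    if (flow_counter != STEP_FINAL)
        fault_detected();

    return 0;
}
```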
Overall, while ChatGPT has some potential as a tool for security analysts, it is not yet accurate enough to rely on fully for code review. A Security Analyst can use it as a tool, as long as they know its limitations. The most important one today is that ChatGPT runs on an untrusted service that logs data, and it therefore cannot be used for any confidential code.
However, it may be useful for efficiently writing reports and explaining functions and modules in human language, which could potentially speed up reverse engineering efforts. It may also be interesting for developers to explore its code-generation capabilities. Code review using language models like ChatGPT is an active area of research, and it is possible that such tools could be improved over time to be more accurate and reliable.
If you have any questions, contact us at inforequest@riscure.com.
*No codebases under NDA were harmed during this research. ChatGPT is run as a service by OpenAI, which undoubtedly ingests, stores, and analyzes all conversations.