Coverage-based Fuzzing
The attack surface of embedded systems is large. The software must not only be secure under friendly operating conditions but also remain resilient in a hostile environment where data may be compromised.
Verifying that your software products are secure under these circumstances is hard. Although a variety of techniques have been developed to aid verification, teams struggle to integrate them into their software development process. This is partly because good, integrated, and supported tooling, which takes on some of the burden and lowers the cost of getting started, is missing.
At Riscure, we have been working to address the struggles we see time and again in embedded software development teams. In this blog post, we consider a basic workflow for improving the effectiveness of a fuzzing analysis that aims to find logical flaws in hostile execution environments.
Fuzzing in a Nutshell
Fuzzing is a dynamic analysis technique that roots out bad program behavior triggered by particular inputs. Of course, we are most concerned with bad behavior that an attacker can escalate into an exploit, such as a buffer overflow. To find such behavior, the fuzzing engine probes an instrumented version of the program under test with a large set of inputs. The instrumentation helps detect the behaviors we are interested in and extract useful information from the program execution when they occur.
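To make this concrete, here is a minimal sketch of a fuzz target in the style used by libFuzzer, whose entry point is `int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)`. The record format and the `parse_record` function are hypothetical stand-ins for the code under test. When the file is built with clang's `-fsanitize=fuzzer,address`, libFuzzer supplies `main` and the input-generation loop, and AddressSanitizer acts as the instrumentation that flags out-of-bounds memory accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical code under test: parses a record of the form
 * [len: 1 byte][payload: len bytes] into a 16-byte buffer.
 * Returns the number of payload bytes copied, or -1 on malformed input. */
static int parse_record(const uint8_t *data, size_t size, uint8_t out[16]) {
    if (size < 1) {
        return -1;
    }
    size_t len = data[0];
    /* Both bounds checks matter: dropping either one produces exactly the
     * kind of out-of-bounds access AddressSanitizer reports while fuzzing. */
    if (len > size - 1 || len > 16) {
        return -1;
    }
    memcpy(out, data + 1, len);
    return (int)len;
}

/* libFuzzer entry point: the engine calls this once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    uint8_t buf[16];
    (void)parse_record(data, size, buf);
    return 0; /* libFuzzer expects 0 here; other values have special meanings */
}
```

A typical build and run might look like `clang -g -O1 -fsanitize=fuzzer,address fuzz_record.c -o fuzz_record` followed by `./fuzz_record corpus/`, where the file and directory names are placeholders.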
A central question in fuzzing is how the engine should generate the probing inputs. The complete input space is generally very large, so exhaustive or purely random testing is impractical. Hence, we want to guide the fuzzer to explore the input space more effectively. At the same time, we do not want to constrain the engine too much, because we are trying to expect the unexpected and reveal surprising bad behaviors.
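A quick back-of-the-envelope calculation shows why exhaustive testing is out of reach. Assume, purely for illustration, a fixed 16-byte input and an engine that executes one million inputs per second:

```latex
\begin{aligned}
N &= 2^{8 \cdot 16} = 2^{128} \approx 3.4 \times 10^{38} \ \text{inputs} \\
t &= \frac{N}{10^{6}\,\text{s}^{-1}} \approx 3.4 \times 10^{32}\,\text{s} \approx 1.1 \times 10^{25}\ \text{years}
\end{aligned}
```

By the same count, a purely random draw hits one specific 16-byte input with probability $2^{-128}$.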
Coverage as a Metric
One widely applicable technique for guiding the fuzzer’s exploration of the input space is coverage guidance. The fuzzing engine starts with one or more seed inputs and makes pseudo-random mutations while measuring the coverage of the code under test. The mutation process is biased towards variations that increase code coverage. This drives the engine to explore the input space in a way that reveals more program behaviors, increasing the chances of finding unexpected program paths that lead to vulnerabilities.
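The feedback loop can be illustrated with a deliberately tiny, self-contained sketch. This is not how libFuzzer or AFL are implemented (they rely on compiler-inserted edge counters and many mutation strategies). The target, its five hand-recorded coverage sites, and the single-byte mutation are all illustrative assumptions. The target reaches its deepest branch only for the input `FUZ!`, which blind random guessing would hit with probability 2^-32 per try. The coverage-guided loop climbs there one byte at a time, because every partial match is rewarded with new coverage and kept in the corpus.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSITES 5       /* coverage sites in the toy target */
#define LEN 4          /* fixed input length, for simplicity */
#define MAX_CORPUS 64

/* Toy target: records which sites execute. Each correct byte of "FUZ!"
 * unlocks one more site, i.e. one more program path. */
static void target(const uint8_t in[LEN], uint8_t cov[NSITES]) {
    cov[0] = 1;
    if (in[0] != 'F') return;
    cov[1] = 1;
    if (in[1] != 'U') return;
    cov[2] = 1;
    if (in[2] != 'Z') return;
    cov[3] = 1;
    if (in[3] != '!') return;
    cov[4] = 1; /* deepest path: stands in for the bug we want to reach */
}

/* xorshift64* PRNG with a fixed seed, so runs are reproducible. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ull;
static uint32_t rng(void) {
    rng_state ^= rng_state >> 12;
    rng_state ^= rng_state << 25;
    rng_state ^= rng_state >> 27;
    return (uint32_t)((rng_state * 0x2545F4914F6CDD1Dull) >> 32);
}

static uint8_t corpus[MAX_CORPUS][LEN];
static size_t corpus_size;
static uint8_t global_cov[NSITES];

/* Coverage-guided loop: mutate a corpus entry, run it, and keep the mutant
 * only if it reaches a site no earlier input reached.
 * Returns the number of covered sites. */
static size_t fuzz(const uint8_t seed[LEN], size_t max_iters) {
    size_t covered = 0;
    memset(global_cov, 0, sizeof global_cov);
    memcpy(corpus[0], seed, LEN);
    corpus_size = 1;
    target(seed, global_cov);
    for (size_t s = 0; s < NSITES; s++) covered += global_cov[s];

    for (size_t it = 0; it < max_iters && covered < NSITES; it++) {
        uint8_t cand[LEN];
        memcpy(cand, corpus[rng() % corpus_size], LEN);
        size_t pos = rng() % LEN;       /* pick one byte ...            */
        cand[pos] = (uint8_t)rng();     /* ... and set it to a random value */

        uint8_t cov[NSITES] = {0};
        target(cand, cov);

        int is_new = 0;
        for (size_t s = 0; s < NSITES; s++) {
            if (cov[s] && !global_cov[s]) {
                global_cov[s] = 1;
                covered++;
                is_new = 1;
            }
        }
        if (is_new && corpus_size < MAX_CORPUS)
            memcpy(corpus[corpus_size++], cand, LEN);
    }
    return covered;
}
```

Starting from the seed `AAAA`, the loop is expected to need on the order of ten thousand iterations to reach all five sites, and the corpus ends up holding one input per newly reached depth, the last one being `FUZ!`.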
This is a compelling story. When one hears it for the first time, it sounds like fuzzing requires minimal effort from a developer: the engine does all the hard work, right?
As always, there is no free lunch. The effectiveness of a fuzzing campaign still depends on the chance of the engine stumbling on an input that reveals new program behavior. Off-the-shelf fuzzing engines such as libFuzzer and AFL treat inputs as mere binary blobs and have no preconceived notion of which input variations are meaningful to the program. When we are interested in program behaviors that occur only for a small subset of inputs (for example, inputs with a particular structure), we have to help the fuzzer explore that subset.
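One common way to help the fuzzer with structured inputs is to let the harness repair fields that random mutation almost never gets right, such as lengths and checksums. In the sketch below, the frame layout, `parse_frame`, and `build_frame` are hypothetical. The fuzzer's bytes supply only the message type and payload, and the harness fills in a consistent length and XOR checksum, so every generated frame gets past the integrity checks and reaches the logic behind them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PAYLOAD 32

/* Hypothetical frame layout: [type][len][payload: len bytes][checksum],
 * where checksum is the XOR of all preceding bytes. */
static uint8_t xor_checksum(const uint8_t *p, size_t n) {
    uint8_t c = 0;
    for (size_t i = 0; i < n; i++) c ^= p[i];
    return c;
}

/* Hypothetical parser under test: returns the frame type on success, or -1
 * if the frame is rejected before reaching the type-specific logic. */
static int parse_frame(const uint8_t *f, size_t n) {
    if (n < 3) return -1;
    size_t len = f[1];
    if (len > MAX_PAYLOAD || n != len + 3) return -1;
    if (f[n - 1] != xor_checksum(f, n - 1)) return -1;
    /* ... type-specific handling: the code we actually want to fuzz ... */
    return f[0];
}

/* Builds a well-formed frame from raw fuzzer bytes: data[0] is the type and
 * the rest (truncated to MAX_PAYLOAD) is the payload. Requires size >= 1.
 * Returns the frame length. */
static size_t build_frame(const uint8_t *data, size_t size,
                          uint8_t frame[MAX_PAYLOAD + 3]) {
    size_t len = size - 1 > MAX_PAYLOAD ? MAX_PAYLOAD : size - 1;
    frame[0] = data[0];
    frame[1] = (uint8_t)len;
    memcpy(frame + 2, data + 1, len);
    frame[len + 2] = xor_checksum(frame, len + 2);
    return len + 3;
}

/* Structure-aware libFuzzer harness built on top of build_frame. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1) return 0; /* nothing to build a frame from */
    uint8_t frame[MAX_PAYLOAD + 3];
    size_t n = build_frame(data, size, frame);
    (void)parse_frame(frame, n);
    return 0;
}
```

The trade-off is that this harness can never produce a frame with a wrong length or checksum, so the rejection paths need a separate test or a harness variant that passes the raw bytes through unchanged.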
Visualizing Coverage in Real-Time
Writing a good fuzz test that sufficiently exercises many program behaviors is challenging. It is a cooperative game between the fuzzing tool and the developer who deploys it. The developer needs to bring knowledge about interesting program inputs to the table and help translate the fuzzing engine's low-level mutations into interesting variations of the program inputs, that is, variations that are likely to trigger new coverage.
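A related way to translate low-level mutations into meaningful variations is to interpret the raw bytes as a sequence of API calls rather than as a single message. The sketch below assumes a hypothetical ring buffer as the code under test and hand-rolls a tiny byte reader; for C++ targets, LLVM ships a helper header, `FuzzedDataProvider.h`, for the same purpose. Each opcode byte selects push or pop, so flipping a single byte changes which operation runs. The `abort()` turns a broken invariant into a crash that the engine will report.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical code under test: a fixed-capacity ring buffer. */
#define CAP 8
typedef struct { uint8_t buf[CAP]; size_t head, count; } ring_t;

static int ring_push(ring_t *r, uint8_t v) {
    if (r->count == CAP) return -1;             /* full */
    r->buf[(r->head + r->count) % CAP] = v;
    r->count++;
    return 0;
}

static int ring_pop(ring_t *r, uint8_t *v) {
    if (r->count == 0) return -1;               /* empty */
    *v = r->buf[r->head];
    r->head = (r->head + 1) % CAP;
    r->count--;
    return 0;
}

/* Minimal byte reader: hands out the fuzzer's bytes one at a time and
 * returns 0 once the input is exhausted. */
typedef struct { const uint8_t *p; size_t left; } reader_t;
static uint8_t next_byte(reader_t *rd) {
    if (rd->left == 0) return 0;
    rd->left--;
    return *rd->p++;
}

/* Interprets the input as a sequence of operations: an even opcode pushes
 * the following byte, an odd opcode pops. Returns the number of operations. */
static size_t run_ops(const uint8_t *data, size_t size, ring_t *r) {
    reader_t rd = { data, size };
    size_t ops = 0;
    while (rd.left > 0) {
        uint8_t op = next_byte(&rd);
        if (op % 2 == 0) {
            (void)ring_push(r, next_byte(&rd));
        } else {
            uint8_t v;
            (void)ring_pop(r, &v);
        }
        if (r->count > CAP) abort(); /* invariant the fuzzer tries to break */
        ops++;
    }
    return ops;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    ring_t r = {0};
    (void)run_ops(data, size, &r);
    return 0;
}
```

With this encoding, the input `{0, 10, 0, 20, 1}` means "push 10, push 20, pop", leaving 20 in the buffer.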
To be able to do that, we need to be able to see which behaviors we manage to exercise and which still elude us. Code coverage is our proxy measure, so let’s visualize that.
Executing a fuzz test with Riscure True Code spawns multiple processes that run the fuzzing engine cooperatively. Together, the engines maintain a corpus of inputs that is representative of all the inputs they have tried, in the sense that it suffices to cover all the control-flow paths the engines have seen so far.
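Keeping such a corpus small while preserving its coverage is essentially a set-cover problem. The following sketch applies the classic greedy approximation to per-input coverage bitmasks, assuming at most 32 hypothetical edges so that a `uint32_t` suffices. It is in the spirit of corpus minimization tools such as libFuzzer's `-merge=1` mode or `afl-cmin`. It is not a description of their exact algorithms, nor of how True Code builds its corpus.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the set bits, i.e. the number of edges in a coverage mask. */
static int popcount32(uint32_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Greedy set cover: repeatedly keep the input that adds the most
 * not-yet-covered edges until the kept inputs cover everything the full
 * corpus covers. Writes chosen indices to keep[] and returns their count. */
static size_t minimize_corpus(const uint32_t *cov, size_t n, size_t *keep) {
    uint32_t all = 0, covered = 0;
    for (size_t i = 0; i < n; i++) all |= cov[i];

    size_t k = 0;
    while (covered != all) {
        size_t best = 0;
        int best_gain = 0;
        for (size_t i = 0; i < n; i++) {
            int gain = popcount32(cov[i] & ~covered);
            if (gain > best_gain) { best_gain = gain; best = i; }
        }
        /* best_gain > 0 is guaranteed here, because 'all' is the union
         * of every input's coverage. */
        keep[k++] = best;
        covered |= cov[best];
    }
    return k;
}
```

For the coverage masks `{0x3, 0x1, 0x6, 0x8, 0x2}`, the greedy pass keeps inputs 0, 2, and 3, which together cover the same four edges as all five inputs.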
True Code uses this corpus to compute source code coverage for the analysis and presents it in a live view during test execution. As fuzzing progresses, more code paths turn from red (not covered) to green (sufficiently covered).
This interactive coverage visualization makes it easy to see when the engine struggles to exercise certain program behaviors. In that case, we have to tweak the setup of the fuzz test, but that is a story for another time.
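At its core, a red/green coverage view comes down to replaying the corpus through an instrumented build and checking which source locations executed at least once. The sketch below models this with hand-written hit counters and made-up line numbers for a hypothetical `classify` function, with each corpus entry reduced to a single byte for brevity. Real toolchains collect the same data automatically, for example with clang's `-fprofile-instr-generate -fcoverage-mapping` together with `llvm-profdata merge` and `llvm-cov show`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical instrumented target: hits[] counts how often each
 * source-level site executes; site_line[] maps sites to (made-up) lines. */
#define NSITES 4
static const int site_line[NSITES] = { 10, 12, 14, 16 };
static unsigned hits[NSITES];

static void classify(uint8_t b) {
    hits[0]++;                   /* line 10: function entry    */
    if (b < 0x80) hits[1]++;     /* line 12: ASCII branch      */
    else          hits[2]++;     /* line 14: high-bit branch   */
    if (b == 0)   hits[3]++;     /* line 16: NUL special case  */
}

/* Replays every corpus input, prints a per-line covered / NOT covered
 * report, and returns the percentage of sites covered. */
static double coverage_report(const uint8_t *corpus, size_t n) {
    for (size_t s = 0; s < NSITES; s++) hits[s] = 0;
    for (size_t i = 0; i < n; i++) classify(corpus[i]);

    size_t covered = 0;
    for (size_t s = 0; s < NSITES; s++) {
        printf("line %d: %s (%u hits)\n", site_line[s],
               hits[s] ? "covered" : "NOT covered", hits[s]);
        if (hits[s]) covered++;
    }
    return 100.0 * (double)covered / NSITES;
}
```

Replaying the corpus `{'a', 0xFF}` reports line 16 as not covered (75% overall); adding the input `0` brings coverage to 100%. Spotting a gap like this is exactly what a live coverage view is for.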
Conclusion
We have looked at a basic workflow for setting up fuzz tests as a cooperative exercise between developers and their machines, with code coverage as the measure of success. Setting up fuzz tests and the code coverage machinery can be time-consuming. Riscure True Code eases the burden with out-of-the-box, real-time coverage visualization.