Technology Today

Vulnerability research has historically been a disorganized process: a collection of custom approaches used by different researchers, with inconsistent results. Indeed, consistency is one of the most difficult aspects of vulnerability research. It is a never-ending hunt for the proverbial needle in the haystack, except that a particular needle might not even exist. Despite the difficulty of the challenge, Raytheon SI Government Solutions has a track record of proactively identifying vulnerabilities in a variety of customer applications using an advanced tool set that goes beyond the public state of the art.

Reverse engineering, in the context of vulnerability research, means taking apart an application to understand how it operates so that flaws in its operation can be discovered and either corrected or exploited. Whether the end goal is to support an information operations mission or to improve information assurance, the process of reverse engineering to discover vulnerabilities is similar.

Current reverse engineering tools that support vulnerability research are fragmented, as are the approaches researchers use. Debuggers and disassemblers help focus on specific, narrow functionality, but they are impeded by the binary obfuscation and armoring mechanisms employed to protect intellectual property within software. Those mechanisms hinder binary analysis by modifying normal instruction sequences: inserting useless instructions, encrypting portions of code, and so on. Additionally, current reverse engineering tools are not designed to build the larger picture of a program's functionality.
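To illustrate the junk-instruction idea with a toy example (not any particular protection product), an obfuscation pass might pad an instruction stream with semantic no-ops, inflating the code an analyst must wade through without changing the program's behavior:

```python
import random

# Semantic no-ops: instruction sequences that leave program state unchanged.
JUNK = [
    "nop",
    "xchg eax, eax",
    "push eax\npop eax",
    "lea ebx, [ebx+0]",
]

def obfuscate(instructions, junk_ratio=0.5):
    """Insert junk between real instructions (toy illustration only)."""
    out = []
    for insn in instructions:
        out.append(insn)
        if random.random() < junk_ratio:
            out.append(random.choice(JUNK))
    return out

original = ["mov eax, [ebp+8]", "add eax, 4", "ret"]
print("\n".join(obfuscate(original)))
```

Real armoring schemes go much further, of course, interleaving such padding with encrypted regions and anti-debugging checks, but even this trivial pass is enough to break naive pattern matching on instruction sequences.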

While decompilers that attempt to re-create source code help by abstracting analysis to a higher layer, they are even more susceptible to problems from binary obfuscation. Moreover, these approaches don't necessarily identify vulnerabilities; they just help a reverser understand how the program functions. Other approaches, either automated or manual, must be used to actually identify potential vulnerabilities.

Industry's Cutting Edge
Current public state-of-the-art reverse engineering tools are just now beginning to make strides in the areas of automation, completeness and scale.

Automation is used for multiple purposes. Some tools attempt to automatically strip away binary protections; others attempt to identify common vulnerability sequences. While fully automatic analysis has its limits, tools that offer extensible application program interfaces, scripting interfaces, or other mechanisms for automating common tasks are much more powerful than stand-alone tools that operate only with a human typing and clicking.

One of the problems with automated source-code analysis solutions is the signal-to-noise ratio. Within an application comprising millions of lines of code, there may be thousands of errors (an error being code with the potential for unintended behavior), most of which cannot be exploited and pose no security risk. When attempting to identify the most critical problems, knowing which errors are exploitable (i.e., which constitute vulnerabilities) and understanding what it takes to exploit one vulnerability versus another allows resources to be allocated most effectively in securing the software.
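As a sketch of that triage step, with scoring heuristics invented purely for illustration, a filter might rank raw findings by exploitability signals so that analysts see the riskiest items first:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    kind: str        # e.g. "buffer-overflow", "null-deref"
    tainted: bool    # reachable from attacker-controlled input?
    in_parser: bool  # located in input-parsing code?

def risk_score(f: Finding) -> int:
    """Invented weights: errors fed by untrusted input rank highest."""
    score = {"buffer-overflow": 3, "format-string": 3, "null-deref": 1}.get(f.kind, 2)
    if f.tainted:
        score += 3
    if f.in_parser:
        score += 1
    return score

findings = [
    Finding("null-deref", tainted=False, in_parser=False),
    Finding("buffer-overflow", tainted=True, in_parser=True),
]
for f in sorted(findings, key=risk_score, reverse=True):
    print(risk_score(f), f.kind)
```

Even a crude ranking like this shifts analyst attention from the thousands of benign errors toward the handful that plausibly constitute vulnerabilities.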

Reverse engineering efforts to discover vulnerabilities are only as effective as the code they can touch. In fuzzing, for example, corrupted input is sent to an application to discover whether it handles the input properly. Effective fuzzing must account for how much of the target application has actually been exercised. If a file format is compressed and the fuzzer corrupts only the compressed file itself, the fuzzer is unlikely to impact many of the important logic decisions the application makes based on the contents of the compressed format. Modern reverse engineering techniques therefore place an important emphasis on the completeness of the execution flow through an application.
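The sketch below, assuming a gzip-wrapped input format purely for the sake of example, shows the difference: mutating the decompressed payload and recompressing it lets corrupted data reach the application's real parsing logic instead of dying in the decompressor.

```python
import gzip
import random

def mutate(data: bytes, flips: int = 8) -> bytes:
    """Flip a few random bytes; a crude stand-in for a real mutation engine."""
    buf = bytearray(data)
    for _ in range(flips):
        buf[random.randrange(len(buf))] ^= random.randrange(1, 256)
    return bytes(buf)

def fuzz_case(compressed: bytes) -> bytes:
    # Naive approach: corrupt the compressed bytes directly. Most such
    # mutations just break the gzip container and never reach the target's
    # parser. Better: unwrap, mutate the inner payload, then rewrap.
    payload = gzip.decompress(compressed)
    return gzip.compress(mutate(payload))

seed = gzip.compress(b"HEADER|field1=1|field2=hello|END")
testcase = fuzz_case(seed)  # feed this to the target application
```

The same unwrap-mutate-rewrap pattern applies to checksums, encodings and encryption layers, which is why coverage feedback matters: it reveals whether mutations are actually reaching the interesting code.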

Completeness metrics alone are not enough, however. While they provide a map of yet-to-be-explored territory, the search space can be huge and the variety of corrupted inputs wide. Technologies must therefore often scale to large numbers of nodes before they can produce useful results in any reasonable time frame.
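As a minimal sketch of that scaling idea (the harness details below are placeholders, not a real distributed fuzzer), independent fuzz iterations can be farmed out to worker processes, and by extension to machines, because the workload is embarrassingly parallel:

```python
import multiprocessing as mp

def run_case(case_id: int) -> tuple[int, bool]:
    """Execute one fuzz iteration; return (id, crashed). Launching and
    monitoring the actual target application is stubbed out here."""
    crashed = False  # placeholder for real result collection
    return case_id, crashed

if __name__ == "__main__":
    with mp.Pool(processes=8) as pool:  # scale processes/nodes as needed
        for case_id, crashed in pool.imap_unordered(run_case, range(10_000)):
            if crashed:
                print(f"case {case_id} crashed; saving for analysis")
```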

Raytheon's Cutting Edge
Automation, completeness and scale are all important components of an effective reverse engineering process, but they come with their own drawbacks and implementation problems. Fortunately, Raytheon is ahead of the curve. The company has been walking this path for the past five years and has made great strides, not only implementing solutions that take these approaches into account but also resolving their practical implications.

Automating reverse engineering tools is in some ways straightforward: it is a simple programming exercise to expose a reasonable automation interface. What is much more difficult is automating the learning process, the interpretation of results to focus effort on the most fruitful segments of code. Most approaches described in the public literature for advanced automation are fragile, unworkable or merely theoretical. Raytheon SI's reverse engineering tool set, based on the Kernel Mode I^2 full-state-tracking virtualization platform, offers an extensive API for integration into a variety of applications along with advanced features such as dataflow tracking, rewinding and unlimited differential snapshotting.
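The I^2 platform itself is internal to Raytheon SI, so the toy below is only a conceptual sketch, with invented names, of why full-state snapshotting and rewinding matter: expensive setup executes once, and every subsequent trial restarts from identical state rather than re-running the program from scratch.

```python
import copy

class ToyMachine:
    """Toy stand-in for a full-state-tracking VM: it can snapshot its
    entire state and rewind to any snapshot on demand."""

    def __init__(self):
        self.state = {"pc": 0, "mem": bytearray(64)}
        self._snapshots = []

    def snapshot(self) -> int:
        """Save the complete machine state; return a snapshot handle."""
        self._snapshots.append(copy.deepcopy(self.state))
        return len(self._snapshots) - 1

    def rewind(self, snap_id: int) -> None:
        """Restore the machine to a previously saved state."""
        self.state = copy.deepcopy(self._snapshots[snap_id])

    def execute(self, steps: int) -> None:
        self.state["pc"] += steps  # placeholder for real execution

vm = ToyMachine()
vm.execute(1000)           # expensive one-time setup: boot, load, reach parser
snap = vm.snapshot()
for attempt in range(3):   # explore many variations from the same start point
    vm.rewind(snap)
    vm.execute(attempt)    # each trial begins from identical state
```

A real differential snapshot records only the state that changed since the previous snapshot, which is what makes taking unlimited snapshots affordable; the deep copy here is purely for clarity.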

To address issues of completeness, a reverse engineering process must be able to instrument the application being executed. While application instrumentation is often accomplished with a debugger, that technology simply isn't powerful enough for detailed code-coverage analysis of modern applications. Existing public instrumentation tools capable of analyzing program execution down to the instruction level are much slower than Raytheon SI technology based on the internal Kernel Mode I^2 tool.
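To make instruction-level instrumentation concrete, here is a toy, AFL-style edge-coverage bitmap (an illustration of the general technique, not of Raytheon's tool): instrumentation fires at every basic-block entry, hashes the (previous block, current block) transition into a map, and new bits in the map reveal execution paths not yet explored.

```python
MAP_SIZE = 1 << 16            # 64 KiB edge-coverage bitmap
coverage = bytearray(MAP_SIZE)
prev_loc = 0

def on_block(block_addr: int) -> None:
    """Instrumentation callback fired at every basic-block entry: hash the
    (previous block, current block) transition into the coverage map."""
    global prev_loc
    cur = (block_addr * 2654435761) & (MAP_SIZE - 1)   # cheap address hash
    idx = cur ^ prev_loc
    coverage[idx] = min(coverage[idx] + 1, 255)        # saturating hit count
    prev_loc = cur >> 1   # shift so A->B and B->A land in different buckets

# Simulated trace of basic-block addresses from one run of the target:
for addr in (0x401000, 0x401020, 0x401048, 0x401020):
    on_block(addr)

print(sum(1 for b in coverage if b), "distinct edges touched this run")
```

A fuzzer that watches this map can favor inputs that light up new edges, steering itself toward unexplored code; the cost of firing a callback on every basic block is exactly why the speed of the underlying instrumentation engine matters so much.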

The most basic and efficient way to improve scale is to add more machines along with some basic command-and-control functionality for parallel processing problems like fuzzing a binary, but such a solution brings complications of its own. One consequence is the sheer volume of data generated: simply increasing the amount of data an automated process produces does not necessarily help humans perform their tasks better. A corresponding suite of advanced analysis tools must be built to handle the increased results, whether they're more crashes from fuzzing or more information about program code coverage.

Figure 1 illustrates one important capability of our automated analysis. The graph, taken during a fuzzing test, plots the rate of unique exceptions discovered over time. A steady decline would be a sign that the test has exhausted the range of errant behaviors, but the upturn in this example indicates that it may be worth continuing. Note in the top center that we have automated the initial assessment of the risk associated with each exception.
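As a sketch of how such a unique-exceptions-over-time curve can be computed (the bucketing details are invented for illustration), crashes are first deduplicated by hashing the faulting address and the top stack frames, and the first-seen time of each bucket then feeds the rate plot:

```python
import hashlib
from collections import defaultdict

def bucket_id(fault_addr: int, stack: list[int], frames: int = 3) -> str:
    """Collapse duplicate crashes: hash faulting address + top stack frames."""
    h = hashlib.sha1()
    h.update(fault_addr.to_bytes(8, "little"))
    for ret in stack[:frames]:
        h.update(ret.to_bytes(8, "little"))
    return h.hexdigest()[:12]

first_seen: dict[str, float] = {}

def record_crash(t: float, fault_addr: int, stack: list[int]) -> None:
    first_seen.setdefault(bucket_id(fault_addr, stack), t)

# Simulated crashes: the first two are duplicates of one underlying bug.
record_crash(10.0, 0x41414141, [0x401000, 0x402000, 0x403000])
record_crash(55.0, 0x41414141, [0x401000, 0x402000, 0x403000])
record_crash(90.0, 0xdeadbeef, [0x405500, 0x402000, 0x403000])

# Unique exceptions per one-minute window: the kind of curve Figure 1 plots.
per_window = defaultdict(int)
for t in first_seen.values():
    per_window[int(t // 60)] += 1
print(dict(per_window))  # {0: 1, 1: 1}
```

Bucketing by only the top few frames is a deliberate trade-off: deeper stacks distinguish more bugs, but they can also split a single bug across many buckets when stack depths vary between runs.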

While the state of the art has advanced in recent years, there remains enormous room for growth, and Raytheon SI is proud to be leading the way in advancing reverse engineering solutions that help identify and remediate vulnerabilities.

Jordan Wiens