The following is an excerpt from the book The Art of Software Security Testing: Identifying Software Security Flaws. In these sections of Chapter 11: Local Fault Injection (.pdf), authors Chris Wysopal, Lucas Nelson, Dino Dai Zovi and Elfriede Dustin explain which methods should be used to investigate file formats.
Fuzzing File Formats
Applications such as Web browsers, image viewers, and media players regularly process files provided by untrusted remote users. The formats and encoding of these files, especially those used for compressed images, video, and audio, are quite complex and thus are difficult to parse securely. It is therefore essential that the applications' processing of these files be properly scrutinized and tested.
As an example of a common file format vulnerability, consider the following code fragment. It is an example of a style of code commonly seen parsing binary file formats. The file format may consist of a file header and a number of sections, each with section headers. Each section header contains a section size field that describes how many bytes of data are contained within that section. If the file format parsing code uses these values unchecked in a memory allocation request size or as an offset into the file, a denial-of-service or memory trespass vulnerability may be likely. The following code does not check the section size field read from the file section header. It reads file data into a heap-allocated data buffer without validating the size or checking the return value of HeapAlloc. This presents several problems (see Listing 11-4).
A Common Binary File Format Parsing Vulnerability
ReadFile(hFile, &fh, sizeof(FILE_HEADER));
sh = HeapAlloc(fh.dwSectionSize + SIZEOF(SECTION_HEADER));
ReadFile(hFile, sectionData, fh.dwSectionSize);
Consider the case in which the value of the section size field read in the file header is very large. If the allocation fails and a buffer cannot be allocated, HeapAlloc returns NULL. When the application calls ReadFile with a nonzero size and a NULL buffer pointer, the application crashes with an access violation. This causes an exception to be generated that the application might catch and handle. If the application doesn't handle it, an application crash occurs, indicating a possible denial-of-service vulnerability. However, if the section size field is set to be equal to 0 minus the size of the section header, the HeapAlloc call allocates a 0-byte length buffer due to the integer arithmetic overflowing and wrapping around 0. The subsequent call to ReadFile below it attempts to write a large amount of data to the 0-length heap block, causing a heap overflow. An attacker may exploit this vulnerability to achieve arbitrary code execution.
An application's file format handling should be tested against improper and malformed files. The test methodology should generate a series of malformed files by mutating properly formatted files, generating random garbage files, and creating files likely to trigger errors handling boundary conditions. The application should be tested against each file to ensure that it properly handles each one without crashing or causing unexpected behavior. The next section describes automated file corruption testing and some freely available file format testing tools.
File Corruption Testing
File corruption testing is a form of input fuzzing targeted at applications and interfaces operating on binary input files. Common applications of file corruption testing include testing image, font, and archive file format parsing.
Testing an application's handling of a binary file format may be performed at several different levels. A straightforward yet labor-intensive approach is to manually create a series of files that have been corrupted in different ways and proceed to attempt to use the file in the application being tested. This approach requires little or no programming; the file corruption can be performed manually with a hex editor or by a small Python script. With this approach, however, several issues arise. For example, there may be a large number of test cases, and manually corrupting the files and testing them may take too long—not to mention being mindnumbingly boring.
Some level of automation can speed up this process of file creation and testing to free the application penetration tester to do other, more interesting things.
Automated File Corruption
Binary file formats can be complicated, involving a large number of structures with type, option, size, and offset fields that may have intricate interdependencies. They may also contain file sections possibly involving compression or encryption. Manually creating test cases requires in-depth knowledge of the file format. It also may require "borrowing" a good deal of code from the application to be tested to properly compress or pack data into the file format. Although this sort of code reuse may save time, it may make some bugs difficult to find because the same incorrect assumptions made in file parsing would be assumed in the file creation. Luckily, by deliberately avoiding intimate knowledge of the file format, we can sidestep this pitfall and create a generic binary file corruption test harness that can uncover a good number of vulnerabilities quickly.
A simple tool to perform quick file format testing is Ilja van Sprundel's Mangle.4 Mangle overwrites random bytes in a binary file format's header with random values, slightly biased toward large and negative numbers. Mangle requires a template file to mangle and the size in bytes of the file header. As an example, let's mangle a JPG file.
Want to learn how to mangle a JPG file? Or how to automate the testing of local applications? Download the rest of Chapter 11: Local Fault Injection (.pdf)
Note: Printed with permission from Addison-Wesley. "The Art of Software Security Testing: Identifying Software Security Flaws" by Chris Wysopal, Lucas Nelson, Dino Dai Zovi and Elfriede Dustin. Copyright 2007. For more information about this title and other similar books, please visit www.awprofessional.com.