Adventures with /analyze for Visual Studio–the Good

The /analyze feature of the Visual C++ compiler (requires expensive Super Man edition or available for free with the Platform SDK) is a great way of improving code quality. It is basically a 21st century ‘lint’ that identifies many coding errors. Many of these errors are very hard for programmers to see but, with proper annotations of your code, the unblinking eye of /analyze will reliably find them.

I’ve recently been using /analyze on a large code base and it found about 500 bugs that were worth fixing. These bugs included out of range reads and writes, printf parameter errors, logic errors, and more. These bugs were at risk of causing memory corruption, crashes, and unexpected behavior of all types. Several of the crash bugs bit my coworkers while I was working on the code cleanup, thus proving that these bugs were not just theoretical problems.

A list of all of the bugs that /analyze found would be tedious to create and boring to read. Instead I’ve created sample code that demonstrates the types of bugs where /analyze did the best job. The places where /analyze did a horrible job are saved for a later post.

Buffer Overruns

When calling functions that require that you pass a pointer and a buffer size it is very easy to pass in the wrong size. Some of the mistakes that I have seen include passing an incorrect hard-coded constant, passing the wrong named constant, or passing the size of the wrong buffer. In this example sizeof() is used when _countof() is required, which means that the count is twice as large as it should be, and there is a possible buffer overrun:

void TestBufferWarnings(const wchar_t* pSource)
{
wchar_t buffer[10];
wcscpy_s(buffer, sizeof(buffer), pSource);
}

The warnings emitted by /analyze are:

warning C6057: Buffer overrun due to number of characters/number of bytes mismatch in call to ‘wcscpy_s’
warning C6386: Buffer overrun: accessing ‘argument 1′, the writable size is ’20’ bytes, but ’40’ bytes might be written

Buffer overruns can cause hard to diagnose crashes and can leave your software vulnerable to exploitation by hackers. The /analyze feature can accurately diagnose this potential overrun because the wcscpy_s function, like many other C runtime functions, is annotated with special macros that indicate the contract. A simplified version of the wcscpy_s annotation is shown below:

wcscpy_s(_Out_z_cap_(cc) wchar_t* pD, size_t cc, const wchar_t* pS);

The relevant portion is “_Out_z_cap_(cc) wchar_t* pD”. This says that pD points to an output buffer that will be filled with a null-terminated string and whose size in characters is specified by the ‘cc’ parameter. In addition to letting /analyze diagnose this size mismatch the annotation also helps with other types of analysis, by letting /analyze assume that after calling wcscpy_s the buffer will be filled with a null-terminated string. The annotations also help when implementing wcscpy_s because /analyze knows that it should verify that wcscpy_s always null-terminates the output buffer.

Another surprisingly common buffer overrun is when code writes to more elements of an array then there are, with a hard-coded index. It seems ridiculous, but in any large code base these errors do creep in:

int ReturnFromArray()
{
int array[2] = {};
return array[2];
}

These are the warnings generated by /analyze for the code above:

warning C6201: Index ‘2’ is out of valid index range ‘0’ to ‘1’ for possibly stack allocated buffer ‘array’
warning C6385: Invalid data: accessing ‘array’, the readable size is ‘8’ bytes, but ’12’ bytes might be read: Lines: 33, 34

Format String Mismatches

It is very easy to write code with mismatches between the printf format string and the parameters passed. Some of these mismatches represent portability problems (reliance on byte ordering or the size of ‘int’ versus pointers) and others represent undefined behavior or guaranteed crashes. Here is a function that demonstrates many of the mismatches that I found:

void TestPrintfWarnings()
{
    int x = 0;
    float f = 1.0f;
    void* p = 0;
    __int64 i64 = 0;
    std::string s = “Hello”;

    printf(“%f, %d, %08x, %d, %p\n”, x, f, p, i64, s);
}

The warning this generates with /analyze are:

warning C6272: Non-float passed as argument ‘2’ when float is required in call to ‘printf’
warning C6273: Non-integer passed as parameter ‘3’ when integer is required in call to ‘printf’: if a pointer value is being passed, %p should be used
warning C6273: Non-integer passed as parameter ‘4’ when integer is required in call to ‘printf’: if a pointer value is being passed, %p should be used
warning C6328: ‘__int64’ passed as parameter ‘5’ when ‘int’ is required in call to ‘printf’
warning C6066: Non-pointer passed as parameter ‘6’ when pointer is required in call to ‘printf’

The first two warnings (6272 and 6273) are clear – don’t mismatch float and int parameters.

The third warning (6273) – for using %08x to print a pointer – is less clear. %08x works well for printing pointers until you port your code to 64 bits, at which point your pointers will print incorrectly, and following arguments may also be offset and therefore wrong.

The fourth warning is fairly obvious – don’t use %d to print a 64-bit integer – but is a surprisingly easy mistake to make. Due to the byte ordering on x86 processors this method of printing a 64-bit integer will actually work, as long as the 64-bit integer is small enough, and as long as no other printf parameters follow. If other parameters follow then, depending on the calling convention, they may be offset by four bytes, leading to incorrect output or a crash. The best portable way to print 64-bit integers is using %lld. %I64d will not work with gcc.

The fifth warning could be far clearer, but it does accurately describe the problem. When printing a string with ‘%s’ you are supposed to pass a char*. Passing a string object is a really bad idea – although very common in the code base I was cleaning up. The string class that we use is 16 bytes, but it worked when passed to printf style functions because the first member of the string object was the char* that pointed to the string. However if any other parameters followed they would be offset by 12 bytes, which could easily cause problems.

The reason that /analyze can find these bugs is because it knows that printf uses a standard printf style format string. As usual it knows this because the printf function is annotated:

printf(_In_z_ _Printf_format_string_ const char * _Format, …);

Other warnings that /analyze found a lot of include missing arguments and extra arguments.

Logic Errors

There are many other miscellaneous errors that /analyze can find. As one example consider this simple function:

bool TestConditionalWarnings(int val)
{
return val & kFlag1 || val & kFlag3 || kFlag4;
}

The intent was to see if any of the flag bits were set but an error in the check for kFlag4 means that this expression will always be true, which /analyze indicates:

warning C6236: (<expression> || <non-zero constant>) is always a non-zero constant

The Importance of Annotations

The majority of /analyze warnings – and certainly the majority of the accurate warnings about serious problems – require annotations on function declarations. If you are serious about finding bugs in your code then you need to annotate all of your functions that take printf style arguments, and all of your functions that take buffer and size parameters. These annotations are easy to do, they force you to think about what the contracts for these functions actually are, and they can significantly increase the security and reliability of your code.