/analyze for Visual Studio–the Ugly (part 1)

Introduction

/analyze is a powerful Visual C++ feature that can help to find bugs. However it uses some heuristics which are rather peculiar and which make it difficult to decide how seriously to take its warnings. Today we discuss one of those peculiar heuristics, and show a case where /analyze is flat out wrong.

Update: these peculiar warnings are mostly fixed in VS 2012. Most of the bad warnings in the repro project that I shared with Microsoft were addressed, so the false positive rate is lower.

These tests are all done with Visual Studio 2010 SP1, C/C++ Optimizing Compiler Version 16.00.40219.01 for 80×86.

The gory details

Take a look at this piece of code:

int g_values[10];

int GetValue1(int index)
{
    return g_values[index];
}

This piece of code is neither good nor bad, correct nor broken. It all depends on how it is used. It should probably have an assert to add a run-time check in debug builds for ‘index’ being in range, and index should probably be ‘unsigned’ or ‘size_t’, but it is not intrinsically broken. One could argue that since ‘index’ could potentially range from INT_MIN to INT_MAX that /analyze should warn about potential out-of-range accesses to g_values, but if /analyze did that then it would issue so many low-value warnings that it would be useless. So /analyze, having nothing useful to say about this function, wisely remains silent.

This piece of code is a different story. It is dangerously broken. It reads beyond the end of the array every single time which gives undefined behavior and should generally be avoided.

int GetValue2()
{
    return g_values[_countof(g_values)];
}

/analyze is smart enough to handle this. It gets a bit carried away and gives two warnings, but better safe than sorry, right?

warning C6200: Index ’10’ is out of valid index range ‘0’ to ‘9’ for non-stack buffer ‘int * g_values’
warning C6385: Invalid data: accessing ‘int * g_values’, the readable size is ’40’ bytes, but ’44’ bytes might be read

So far so good. Now things get weird. This piece of code is, from a correctness point of view, identical to GetValue1. But /analyze begs to differ:

int GetValue3(int index)
{
    if (index == 8 )
        return 0;

    return g_values[index];
}

To this benign piece of code /analyze says:

warning C6385: Invalid data: accessing ‘int * g_values’, the readable size is ’40’ bytes, but ’44’ bytes might be read

A bit of experimentation with the value that index is compared against reveals what /analyze appears to be thinking. If you compare ‘index’ to an integer then /analyze assumes that this means that you will be indexing into g_values with an index 2 greater, in this case an index of 10. An index of 10 means you read the 11th element which extends out to the 44th byte.

If you compare index against 100 as shown below:

int GetValue4(int index)
{
    if (index == 100)
        return 0;

        return g_values[index];
}

then naturally /analyze assumes that you will be indexing into g_values with an index of 102 so it gives this warning:

warning C6385: Invalid data: accessing ‘int * g_values’, the readable size is ’40’ bytes, but ‘412’ bytes might be read

Quite strange to say the least. When encountered in your code base with no explanation these warnings are practically impossible to understand. Only if you are the odd type who starts writing test functions for a /analyze blog post do you have any hope of figuring out the internal logic.

An equally annoying problem with /analyze’s ability to track indices is shown with the code below.

int GetValue6()
{
    int sum = 0;
    for (int index = 0; index < _countof(g_values) * 2; ++index)
        if (index < _countof(g_values))
            sum += g_values[index];
        else
            sum += g_values[index – _countof(g_values)];

    return sum;
}

The index variable loops over twice the extent of the array, but there are carefully written checks to ensure that index is always used correctly. Despite this code being provably correct and safe, /analyze spits out the following pair of warnings for the impossible scenarios that keep it up late at night.

warning C6200: Index ’19’ is out of valid index range ‘0’ to ‘9’ for non-stack buffer ‘int * g_values’
warning C6200: Index ‘-10’ is out of valid index range ‘0’ to ‘9’ for non-stack buffer ‘int * g_values’

Summary

Unfortunately these inappropriate or wrong warnings make /analyze much harder to use. Each warning must be carefully analyzed to decide whether it is a brilliant insight into a subtle bug, a bizarre revelation about /analyze’s internal workings, or just plain wrong. This requires expert knowledge of a type that makes the arcane rules of C++ seem simple by comparison.

These warnings then either have to be suppressed or somehow marked as uninteresting. Either way these bugs prevent /analyze from being used by all but the most dedicated developer.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Code analysis, Code Reliability, Visual Studio. Bookmark the permalink.

6 Responses to /analyze for Visual Studio–the Ugly (part 1)

  1. Pingback: /Analyze for Visual Studio—the Ugly (part 3), False Positives | Random ASCII

  2. Dave says:

    I must say that PREfast’s “readable/writeable size is X but Y may be read/written” is by far the most confusing warning that it produces. After eliminating the small number of obvious issues, I’m left with a slew of warnings that simply make no sense. I grumbled to one of the PREfast developers some time ago that the analyzer should have an option to explain its reasoning for issuing a warning (which you’ve patiently reverse-engineered in one case above, but that doesn’t work when you’ve got a whole string of them and the issue isn’t as easy as comparison-with-a-constant).

    • brucedawson says:

      Be sure to read my other posts — I reverse engineered more than just the one case.

      I also recommend trying VS 11 beta. I reported all of the glitches that I discovered to Microsoft and, by and large, they fixed them all. This makes /analyze less confusing.

      • Dave says:

        >Be sure to read my other posts — I reverse engineered more
        >than just the one case.

        You have a lot more patience than I do🙂.

        >I also recommend trying VS 11 beta.

        I’ll probably wait for the final release, since I’m a bit limited in how much stuff I can deploy and test. Thanks for the heads-up on that though.

        Have you tried other tools like (commercial) Coverity, Klocwork, and Fortify, or (free) clang and cppcheck? It seems you need all of those (and more, a recent version of gcc told me I was freeing (pointer-to-pointer-to-data) rather than (pointer-to-data) which no other analyzer had spotted) in order to get good coverage. In particular I’m continually surprised at how powerful clang’s analyzer is when it spots some problem condition 22 levels down inside a set of nested statements. PREfast and cppcheck are more along the lines of “are you sure you meant to do this?” while clang (and the commercial products) go much deeper, “if you do this and then this and then this and […] and then this, bad things will happen”.

  3. Dave says:

    cppcheck is interesting because, unlike checkers like clang and PREfast which only detect problems in the code profile for the host environment (e.g. PREfast will only check the bits that are compiled under Windows but not the Unix code sitting alongside it), cppcheck runs through all possible profiles, #ifdef __WIN32__, #ifdef __UNIX__, #ifdef __iOS__, and so on. This makes it awfully slow if you’ve got multiplatform code, but does give you at least some analysis of configurations other than the one that the tool is hosted on.
    Another nice thing about cppcheck is that the developers very actively deal with updates and feedback, so if you report a problem and sometimes when you ask for a feature (it helps if you have a good test case or cases for it) it’ll appear in the next release (they’re on a two-monthly release cycle). That said, the checks are done on a somewhat ad-hoc basis rather than using the deep analysis performed by clang/Coverity/etc, so you don’t get quite the level of analysis that the other tools give you. I’m not sure how long that approach is sustainable before the multiple layers of handcoded checks start causing problems.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s