/analyze for Visual Studio–the Ugly (part 1)

Introduction

/analyze is a powerful Visual C++ feature that can help to find bugs. However it uses some heuristics which are rather peculiar and which make it difficult to decide how seriously to take its warnings. Today we discuss one of those peculiar heuristics, and show a case where /analyze is flat out wrong.

Update: these peculiar warnings are mostly fixed in VS 2012. Most of the bad warnings in the repro project that I shared with Microsoft were addressed, so the false positive rate is lower.

These tests are all done with Visual Studio 2010 SP1, C/C++ Optimizing Compiler Version 16.00.40219.01 for 80×86.

The gory details

Take a look at this piece of code:

int g_values[10];

int GetValue1(int index)
{
    return g_values[index];
}

This piece of code is neither good nor bad, correct nor broken. It all depends on how it is used. It should probably have an assert to add a run-time check in debug builds for ‘index’ being in range, and index should probably be ‘unsigned’ or ‘size_t’, but it is not intrinsically broken. One could argue that since ‘index’ could potentially range from INT_MIN to INT_MAX that /analyze should warn about potential out-of-range accesses to g_values, but if /analyze did that then it would issue so many low-value warnings that it would be useless. So /analyze, having nothing useful to say about this function, wisely remains silent.

This piece of code is a different story. It is dangerously broken. It reads beyond the end of the array every single time which gives undefined behavior and should generally be avoided.

int GetValue2()
{
    return g_values[_countof(g_values)];
}

/analyze is smart enough to handle this. It gets a bit carried away and gives two warnings, but better safe than sorry, right?

warning C6200: Index ’10’ is out of valid index range ‘0’ to ‘9’ for non-stack buffer ‘int * g_values’
warning C6385: Invalid data: accessing ‘int * g_values’, the readable size is ’40’ bytes, but ’44’ bytes might be read

So far so good. Now things get weird. This piece of code is, from a correctness point of view, identical to GetValue1. But /analyze begs to differ:

int GetValue3(int index)
{
    if (index == 8 )
        return 0;

    return g_values[index];
}

To this benign piece of code /analyze says:

warning C6385: Invalid data: accessing ‘int * g_values’, the readable size is ’40’ bytes, but ’44’ bytes might be read

A bit of experimentation with the value that index is compared against reveals what /analyze appears to be thinking. If you compare ‘index’ to an integer then /analyze assumes that this means that you will be indexing into g_values with an index 2 greater, in this case an index of 10. An index of 10 means you read the 11th element which extends out to the 44th byte.

If you compare index against 100 as shown below:

int GetValue4(int index)
{
    if (index == 100)
        return 0;

        return g_values[index];
}

then naturally /analyze assumes that you will be indexing into g_values with an index of 102 so it gives this warning:

warning C6385: Invalid data: accessing ‘int * g_values’, the readable size is ’40’ bytes, but ‘412’ bytes might be read

Quite strange to say the least. When encountered in your code base with no explanation these warnings are practically impossible to understand. Only if you are the odd type who starts writing test functions for a /analyze blog post do you have any hope of figuring out the internal logic.

An equally annoying problem with /analyze’s ability to track indices is shown with the code below.

int GetValue6()
{
    int sum = 0;
    for (int index = 0; index < _countof(g_values) * 2; ++index)
        if (index < _countof(g_values))
            sum += g_values[index];
        else
            sum += g_values[index – _countof(g_values)];

    return sum;
}

The index variable loops over twice the extent of the array, but there are carefully written checks to ensure that index is always used correctly. Despite this code being provably correct and safe, /analyze spits out the following pair of warnings for the impossible scenarios that keep it up late at night.

warning C6200: Index ’19’ is out of valid index range ‘0’ to ‘9’ for non-stack buffer ‘int * g_values’
warning C6200: Index ‘-10’ is out of valid index range ‘0’ to ‘9’ for non-stack buffer ‘int * g_values’

Summary

Unfortunately these inappropriate or wrong warnings make /analyze much harder to use. Each warning must be carefully analyzed to decide whether it is a brilliant insight into a subtle bug, a bizarre revelation about /analyze’s internal workings, or just plain wrong. This requires expert knowledge of a type that makes the arcane rules of C++ seem simple by comparison.

These warnings then either have to be suppressed or somehow marked as uninteresting. Either way these bugs prevent /analyze from being used by all but the most dedicated developer.

Advertisements

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Code analysis, Code Reliability, Visual Studio. Bookmark the permalink.

13 Responses to /analyze for Visual Studio–the Ugly (part 1)

  1. Pingback: /Analyze for Visual Studio—the Ugly (part 3), False Positives | Random ASCII

  2. Dave says:

    I must say that PREfast’s “readable/writeable size is X but Y may be read/written” is by far the most confusing warning that it produces. After eliminating the small number of obvious issues, I’m left with a slew of warnings that simply make no sense. I grumbled to one of the PREfast developers some time ago that the analyzer should have an option to explain its reasoning for issuing a warning (which you’ve patiently reverse-engineered in one case above, but that doesn’t work when you’ve got a whole string of them and the issue isn’t as easy as comparison-with-a-constant).

    • brucedawson says:

      Be sure to read my other posts — I reverse engineered more than just the one case.

      I also recommend trying VS 11 beta. I reported all of the glitches that I discovered to Microsoft and, by and large, they fixed them all. This makes /analyze less confusing.

      • Dave says:

        >Be sure to read my other posts — I reverse engineered more
        >than just the one case.

        You have a lot more patience than I do :-).

        >I also recommend trying VS 11 beta.

        I’ll probably wait for the final release, since I’m a bit limited in how much stuff I can deploy and test. Thanks for the heads-up on that though.

        Have you tried other tools like (commercial) Coverity, Klocwork, and Fortify, or (free) clang and cppcheck? It seems you need all of those (and more, a recent version of gcc told me I was freeing (pointer-to-pointer-to-data) rather than (pointer-to-data) which no other analyzer had spotted) in order to get good coverage. In particular I’m continually surprised at how powerful clang’s analyzer is when it spots some problem condition 22 levels down inside a set of nested statements. PREfast and cppcheck are more along the lines of “are you sure you meant to do this?” while clang (and the commercial products) go much deeper, “if you do this and then this and then this and […] and then this, bad things will happen”.

        • brucedawson says:

          Clang has some great warnings and I plan to make better use of them in the future. I haven’t used anything else yet.

          • Dave says:

            gcc 7 with every warning and then some turned on can also turn up some useful things. It looks like the competition from clang has made them improve their diagnostics quite a bit.

            Diversity in compiling really helps here. For example a few months ago the IAR compiler, which normally drives me insane with its pedantic warnings which seem more designed to show off the devs ability to write a clever C parser than to help identify bugs, actually turned up a coding error that no other compiler had identified.

  3. Dave says:

    cppcheck is interesting because, unlike checkers like clang and PREfast which only detect problems in the code profile for the host environment (e.g. PREfast will only check the bits that are compiled under Windows but not the Unix code sitting alongside it), cppcheck runs through all possible profiles, #ifdef __WIN32__, #ifdef __UNIX__, #ifdef __iOS__, and so on. This makes it awfully slow if you’ve got multiplatform code, but does give you at least some analysis of configurations other than the one that the tool is hosted on.
    Another nice thing about cppcheck is that the developers very actively deal with updates and feedback, so if you report a problem and sometimes when you ask for a feature (it helps if you have a good test case or cases for it) it’ll appear in the next release (they’re on a two-monthly release cycle). That said, the checks are done on a somewhat ad-hoc basis rather than using the deep analysis performed by clang/Coverity/etc, so you don’t get quite the level of analysis that the other tools give you. I’m not sure how long that approach is sustainable before the multiple layers of handcoded checks start causing problems.

  4. Mike says:

    > It should probably have an assert to add a run-time check in debug builds for ‘index’ being in range

    In my experience, that would fix your problem – what you currently have is a function where you give Analyze absolutely no indication of the legal range of “index”, so it tries to guess.

    Asserts tell it exactly what range it needs to check against.

    • brucedawson says:

      > Asserts tell it exactly what range it needs to check against.

      The compiler can already see what the legal range of index is because it can see the array definition. What it can’t see, necessarily, is what values the callers are passing in. So, it often doesn’t know whether to assert or not.

      One common way of using assert macros with /analyze is to use them not to tell /analyze that a number *must* be in range, but to tell /analyze that a number *is* in range – to suppress warnings. This is an important tool for minimizing false positives.

      If you want to tell /analyze that a number *must* be in a range then the standard way to do that is using annotations on function declarations. Putting the annotations on the declaration ensures that they can be seen at the call sites, which is where bad values are created.

      Unfortunately /analyze has so many false positives on out-of-bounds array accesses that I have been forced to ignore those warnings entirely.

      • Dave says:

        You can also use _In_range_ to guide the analyzer. If you annotate your functions with that, you get some remarkably good analysis, with very few false positives. So the tools are definitely there, and relatively easy to use, it just takes a bit of work to annotate things.

        The one annotation form that I’ve never managed to get working is pointer-by-reference functions, so foo( thingptr **pointer ), no matter how I massage the annotations all I get is FPs. I’ve actually got a sample project which I use to test all the different combinations of code and annotations to see which one works, a bit like the reverse-engineering you did, but so far returning pointers in by-reference args hasn’t worked.

        • brucedawson says:

          I got tired of putting a lot of effort into /analyze because it seemed like they weren’t working to fix the avoidable errors – false positives/negatives where the compiler had full information but did the wrong thing.

          You should share your test project – it could be helpful.

          In the past one of the problems that I had was that even when the compiler knew exactly what range was *legal* it would still mess up because it had weird heuristics about what range was *possible* – like the weird (now fixed) behavior where if you compared an index to n it would assume that it could be equal to n+2.

          • Dave says:

            >You should share your test project – it could be helpful.

            It’s a hacked-up mess of commented-out guesses at annotations, ifdef’d code alternatives, and code comments in which question marks feature prominently. It would actually have negative utility if I published it, people would be more confused after reading it than they were before.

            Having said that, if anyone has any idea how to get by-reference pointer returns working in a manner that /analyze can deal with them, I’d love to hear from them.

      • Mike says:

        It was the use in your second paragraph that I was attempting to describe.

        Annotations can be a pain where the same code has to be used with multiple compilers, because they seem to be different on different compilers. Compilers don’t even agree on where some of them go.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s