Developers Rejoice–Windows 7 Stack Corruption Fixed!

64-bit Windows 7 SP1 has a stack corruption bug that affects developers. Any developer with an AVX-capable processor who is writing 32-bit code on 64-bit Windows 7 SP1 is vulnerable. That sounds like a lot of conditions but I could summarize it by saying that most developers are vulnerable to this bug.

This bug corrupts the stack when you are debugging a 32-bit program and it crashes, leaving you with a garbage call stack that doesn’t even show where the crash happened. I have written about it here twice before.

A hot fix is available now. If you are reading this then you probably need it.

Update: it appears that the fix was simultaneously rolled into a security update, thus making the hot-fix unnecessary. If you have KB2859537 (part of the August 2013 set of security patches) then you should have the fix and don’t need the hot-fix. Odd, but great!

The bug is easy to reproduce. Create a Win32 console project, 32-bit, and paste in this code:

int main(int argc, char* argv[])
{
    char* p = 0;
    // Write through a null pointer to force an access violation.
    p[0] = 0;
}

Debug or release, it doesn’t matter. Run it under the VS debugger. What you want to see when it crashes is this call stack:

[Screenshot: the expected call stack, showing the function that crashed]

What you actually get is this:

[Screenshot: the corrupted call stack, with the function that crashed missing]

That’s a pretty broken call stack. The function that crashed is missing. Think about that. The location of the crash has been lost! That means that this trivial bug (can you see the mistake in my code?) is now challenging to find.

I initially wrote about this bug (along with another crash-related peculiarity in 64-bit Windows) in July 2012. I wrote about this bug a second time in March 2013, asking whether Microsoft should fix it.

This bug has been known for well over a year, but a fix is now available. There are several steps to the process of installing the hot fix, but I think the result makes it well worth it.

http://support.microsoft.com/kb/2864432/en-us

And there was much rejoicing, and some people just finding out the cause of their woes.

Should I install the fix?

Are you a C++ programmer, working on 64-bit Windows 7, debugging 32-bit programs, with a recent processor that has AVX support? If you answered yes to this question then hell-yeah you should install this patch. I installed it the morning it came out. I’ve had no problems with it and I’ve advised all of my coworkers to install it.

If you are running Windows 8, or Windows Vista, or 32-bit Windows, or you don’t have an AVX-capable processor, then don’t bother.
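The checklist above can be sketched as a small predicate. This is only an illustration: the DevMachine struct and the function names are hypothetical, and the caller is assumed to fill in the fields (for example, by reading ECX from CPUID leaf 1, where bit 28 reports AVX support).

```cpp
#include <cstdint>

// CPUID leaf 1 reports AVX support in bit 28 of ECX.
const std::uint32_t kAvxBit = 1u << 28;

// Hypothetical description of a developer machine; the caller fills it in.
struct DevMachine {
    bool isWindows7Sp1;       // Windows 7 SP1, not Vista or Windows 8
    bool is64BitOs;           // 64-bit Windows
    bool debugs32BitCode;     // 32-bit programs are run under a debugger
    std::uint32_t cpuid1Ecx;  // ECX value returned by CPUID leaf 1
};

// True if the CPUID leaf 1 ECX value has the AVX feature bit set.
inline bool CpuHasAvx(std::uint32_t cpuid1Ecx) {
    return (cpuid1Ecx & kAvxBit) != 0;
}

// True only when every condition from the checklist holds.
inline bool ShouldInstallFix(const DevMachine& m) {
    return m.isWindows7Sp1 && m.is64BitOs && m.debugs32BitCode &&
           CpuHasAvx(m.cpuid1Ecx);
}
```

If any one of the conditions is false (Windows 8, 32-bit Windows, no AVX, or no 32-bit debugging), ShouldInstallFix returns false, which matches the "don’t bother" cases.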

Prior to the availability of this fix the best workaround was to disable AVX support in Windows. This was done by running “bcdedit /set xsavedisable 1” from an administrator command prompt. This workaround is still recommended if you can’t install the hot fix for some reason. If you previously applied this workaround and you want to remove it just run this command from an administrator command prompt:

bcdedit /deletevalue xsavedisable

You can see the state of the xsavedisable flag by running bcdedit with no parameters from an administrator command prompt. If xsavedisable is not listed or is set to zero then the workaround is not in place.
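If you want to automate that check, here is a rough sketch that scans captured bcdedit output for the flag. It assumes the output lists each setting as a name followed by a numeric value on one line (something like "xsavedisable 1"), which I have not verified for every Windows version, and WorkaroundInPlace is a hypothetical helper name, not part of any tool.

```cpp
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>

// Returns true if any line of the given bcdedit output lists xsavedisable
// with a nonzero numeric value. A missing entry, a zero value, or a value
// that does not parse as a number all count as "workaround not in place".
inline bool WorkaroundInPlace(const std::string& bcdeditOutput) {
    std::istringstream lines(bcdeditOutput);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string name, value;
        if (!(fields >> name >> value)) {
            continue;  // Blank line, separator, or a name with no value.
        }
        // Compare the setting name case-insensitively.
        for (char& c : name) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        if (name != "xsavedisable") {
            continue;
        }
        // Base 0 accepts decimal as well as 0x-prefixed hex values.
        char* end = nullptr;
        unsigned long parsed = std::strtoul(value.c_str(), &end, 0);
        if (end != value.c_str() && parsed != 0) {
            return true;
        }
    }
    return false;
}
```

You would feed it the text captured from running bcdedit in an administrator command prompt. It keeps scanning after a zero entry because bcdedit output can contain more than one section.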

Why so long?

I think there are a few reasons why it took a while for this bug to get fixed. Developers who hit this bug had no idea what the problem was and therefore no idea to whom they should report it. Many developers actually got used to this broken behavior quite quickly and forgot that call stacks on crashes used to work. Unlike application crashes there is no instrumentation that automatically counts up incidences of this bug so Microsoft had no visibility into the severity of the problem. And, most people at Microsoft upgraded to Windows 8 where this bug is fixed, so they never saw it. Relying on measurements to decide what bugs to fix is smart, but you also have to consider what to do for bugs that your measurements don’t detect.

In this case the appropriate calculation would be to estimate the number of developers affected by the bug (I’m gonna ballpark that at ‘millions’) and then multiply by the severity of the bug (‘bloody annoying’) and that gives you an overall impact of ‘millions of bloody annoyed developers, developers, developers’. See? Math is easy. Then you just compare that to the risk and make a decision. But if you wait for complaints or other measurements then you may underestimate the seriousness of bugs like this.

A hot fix isn’t ideal because many developers who are affected by this bug won’t know to install it, but it’s a good start. Maybe it will get rolled into a service pack or a Windows Update at some point. If you think that’s a good idea then be sure to mention that to your Microsoft contacts.

The reddit discussion can be found here.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048
This entry was posted in Code Reliability, Programming, Visual Studio.

17 Responses to Developers Rejoice–Windows 7 Stack Corruption Fixed!

  1. gergap says:

    Thanks for that info. I had already given up expecting a fix for this bug. My coworkers and I were all affected by it. Luckily we are writing portable code and could resolve most issues using Linux and GDB.

  2. Aaron Avery says:

    Do you know if this affects walking the stack with dbghelp.dll or generating minidumps? If this will fix our self-generated crash reports, we should start installing this hotfix on our systems that still run 32-bit applications.

    • brucedawson says:

      Our experience has been that this does not affect the generation of crash dumps. It’s not that dbghelp.dll is immune to the bug in any way (once the stack is corrupted, information is lost and cannot be retrieved); it’s that crash dumps are usually generated before the stack corruption occurs.

      The order of events, as I understand it, is: a crash happens, the first chance exception handler is called, in many products this triggers code that saves a minidump, then some Wow64 debugging code runs and corrupts the stack, and then the debugger gains control. I’m not clear on the details but our observations suggest something like this.

      So, the hot-fix should only be needed on developer machines.

  3. Riley says:

    It almost feels like it’s too late. While we are still on Windows 7, the amount of 32-bit debugging that we do now is extremely minimal.

    • brucedawson says:

      We have 64-bit versions of much of our code, but we still have enough customers running 32-bit Windows that 32-bit code is what we ship, so we mostly debug 32-bit code.

      But yeah, it is pretty late.

  4. Interesting, I got used to crashes due to null pointer references requiring careful stepping and printf-like statements to fix. This may be the end of that!

  5. Thank you so much for this! I’ve spent far too many cycles trying to figure out how _unlock() could call Lua’s garbage collector…

  6. Mārtiņš says:

    Excellent!

  7. Pingback: Should This Windows 7 Bug be Fixed? | Random ASCII

  8. Alen Ladavac says:

    I’m wondering… does this happen only if you attach a debugger after the app has already crashed? I’ve seen this quite a lot, but not always. I thought that I don’t see it when running directly from the debugger. But I might be wrong. I didn’t know what actually causes it, so I adapted. Guess that, with years, I learned that when there’s a “noncritical” bug like that in MS code there’s not much hope that it will ever be fixed, so I’d better just learn to live with it. 🙂

    • brucedawson says:

      It also happens if you start the application under the debugger. I think that if the first-chance exception handler runs then the stack is corrupted.

      Microsoft *will* fix bugs, but sometimes you have to insist. A few people worked hard to convince Microsoft to fix this one.

  9. yuhong says:

    They checked the fix into the GDR branch so that KB2859537 aka MS13-063 includes this fix already.

  10. Pingback: Developers Rejoice Again | Random ASCII

  11. Pingback: Bugs I Got Other Companies to Fix in 2013 | Random ASCII
