Should This Windows 7 Bug be Fixed?

Last year I reported on a bug in 64-bit Windows 7 SP1’s support for AVX-capable processors. This bug causes stack corruption when a 32-bit program crashes while being debugged in Visual Studio, even if AVX is not used.

Microsoft has a fix, but they will only ship it for Windows 7 if there is enough demand.

Update: the fix has been shipped. See Developers Rejoice–Windows 7 Stack Corruption Fixed! for details.

So this is your chance. Comment honestly on whether this bug affects you. Note that for this bug to be triggered it is sufficient to have an AVX-capable processor; you don’t have to be doing AVX programming.

The reddit discussion thread is here if you prefer to comment there.

Bug details

The bug is in the AVX support added to Windows 7 SP1. Saving the state of the AVX registers requires additional space, and apparently the WoW64 (32-bit Windows on 64-bit Windows) debug support fails to reserve enough space, so the stack gets corrupted. Oops.

In my sample test program I have a Crash() function which can be invoked by selecting “Crash normally” from the file menu. It seems reasonable, especially in a debug build, that crashing in this function should give a nice helpful call stack like this:

[Screenshot: Visual Studio call stack showing the crash in Crash()]

That used to be what would happen. But no longer. On 64-bit Windows 7 SP1 on AVX-capable processors, when debugging 32-bit C++ code with any version of Visual Studio, you will probably see something like this:

[Screenshot: Visual Studio call stack after the stack corruption]

The most common signature of this bug is seeing ntdll.dll!_ZwRaiseException on the call stack, typically twice.

The first call stack makes the bug trivial to diagnose. The second call stack… it doesn’t even show the location of the crash, and it lists three functions that aren’t really on the crash call stack. At least it lists the parent function this time – but don’t count on that.

Clearly the corrupted stack can make crash analysis a lot trickier. Depending on the stack layout the corruption may hit multiple stack frames, including the local variables contained within them.

Luckily this bug does not seem to affect minidump files saved by exception handlers, so post-mortem debugging seems to be unaffected.

Take action now

The bug is well understood, and Microsoft really just wants to know whether it’s worth the cost and risk of fixing it. So let them know. Remember that this bug requires 64-bit Windows 7 SP1, an AVX-capable processor, and 32-bit development. If you’re running a 32-bit OS (really?), or don’t have an AVX-capable processor, or you’re doing 64-bit development then you are immune. You’re also immune if you are running Windows 8 (it’s fixed there), Windows Vista (no AVX support), Linux, or MacOS.

  1. If you have noticed this bug then say so in a comment below.
  2. If you have not noticed this bug then maybe download the test program and see if you can repro it. Share your experiences either way.
  3. If you think this is a complete waste of time, perhaps because you have already moved on to Windows 8, Linux, or MacOS, then let us know.

I prefer comments here, but commenting on reddit works also. Whatever is easiest.

Workarounds

While waiting for Microsoft to respond there are two workarounds available, each with its own downsides:

Change Visual Studio solution settings

The stack corruption happens in the first-chance exception handler. You can tell Visual Studio to halt in the debugger before running this, thus giving you a chance to see the crash details before they are corrupted. To do this go to the Visual Studio ‘Debug’ menu and select ‘Exceptions’. In the dialog that comes up check Win32 Exceptions.

[Screenshot: Visual Studio Exceptions dialog with Win32 Exceptions checked]

One problem with this workaround is that this must be done for every Visual Studio solution. Also, this workaround doesn’t help if a process crashes and then the just-in-time debugger attaches. The stack will already be corrupted before you attach.

Disable AVX

The other workaround is to disable AVX support. You can do that by running this command from an elevated command prompt and then rebooting:

bcdedit /set xsavedisable 1

The obvious disadvantage is that you no longer have AVX support – if you implement AVX detection properly then it will be detected as no longer available. I don’t like this solution, but given the number of different projects that I work on, and the importance of just-in-time debugging, I had no choice but to do this. If Microsoft ever fixes this bug then you can remove the workaround by running this command and then rebooting:

bcdedit /set xsavedisable 0
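This post does not spell out what "proper" AVX detection looks like, so here is a minimal sketch, assuming the commonly documented sequence: check the OSXSAVE and AVX bits from CPUID leaf 1, then use XGETBV to confirm that the OS saves the XMM and YMM state. The names AvxUsable and IsAvxAvailable are made up for illustration. My understanding is that with xsavedisable set the OS never enables XSAVE, so the OSXSAVE bit reads as zero and this check reports AVX as unavailable, but treat that as an assumption rather than documented behavior.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // __cpuid
#include <immintrin.h>  // _xgetbv
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>      // __get_cpuid
#endif

// Pure decision logic (hypothetical helper).
// cpuid1Ecx: ECX returned by CPUID leaf 1. xcr0: value of XCR0 from XGETBV.
inline bool AvxUsable(std::uint32_t cpuid1Ecx, std::uint64_t xcr0) {
    const std::uint32_t kOsxsave = 1u << 27;  // OS has enabled XSAVE/XGETBV
    const std::uint32_t kAvx     = 1u << 28;  // CPU implements AVX
    const std::uint64_t kXmmYmm  = 0x6;       // XCR0 bits 1 (XMM) and 2 (YMM)
    if ((cpuid1Ecx & kOsxsave) == 0 || (cpuid1Ecx & kAvx) == 0)
        return false;
    // Both register sets must be saved and restored by the OS.
    return (xcr0 & kXmmYmm) == kXmmYmm;
}

// Queries the current machine (hypothetical helper).
// Returns false on compilers or CPUs this sketch does not handle.
inline bool IsAvxAvailable() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];
    __cpuid(regs, 1);
    const std::uint32_t ecx = static_cast<std::uint32_t>(regs[2]);
    // XGETBV faults unless OSXSAVE is set, so only execute it in that case.
    const std::uint64_t xcr0 = (ecx & (1u << 27)) ? _xgetbv(0) : 0;
    return AvxUsable(ecx, xcr0);
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    std::uint64_t xcr0 = 0;
    if (ecx & (1u << 27)) {
        // XGETBV faults unless OSXSAVE is set, so only execute it in that case.
        unsigned lo = 0, hi = 0;
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
        xcr0 = (static_cast<std::uint64_t>(hi) << 32) | lo;
    }
    return AvxUsable(ecx, xcr0);
#else
    return false;
#endif
}
```

Code that calls IsAvxAvailable() once at startup and picks its AVX or non-AVX path from the result should keep working with the workaround in place; it simply takes the non-AVX path. Code that only checks the CPUID AVX bit and skips the OS check is the kind of detection that can go wrong here.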

You can see your current bcdedit settings by running bcdedit with no parameters from an elevated command prompt. If xsavedisable is present in the output and has a non-zero value then the buggy code in Windows is disabled.

C:\>bcdedit

Windows Boot Loader

xsavedisable            1
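If you want to check many machines, the rule above (xsavedisable present with a non-zero value) is easy to automate. The following is only a sketch: XsaveDisabled is a hypothetical helper, it expects you to have captured the bcdedit output as text yourself, and it assumes the value is printed as a number, as in the sample output shown here.

```cpp
#include <sstream>
#include <string>

// Returns true if the captured bcdedit output contains an "xsavedisable"
// entry whose value is a non-zero number. Hypothetical helper for illustration.
inline bool XsaveDisabled(const std::string& bcdeditOutput) {
    std::istringstream lines(bcdeditOutput);
    std::string line;
    while (std::getline(lines, line)) {
        // Each setting line is "name <whitespace> value"; other lines fail to parse.
        std::istringstream fields(line);
        std::string name;
        long value = 0;
        if ((fields >> name >> value) && name == "xsavedisable" && value != 0)
            return true;
    }
    return false;
}
```

A missing entry and an entry with value 0 both return false, which matches the state where the workaround is not active.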

Documentation on the bcdedit options can be found here.

I recommend getting your IT department to push the bcdedit command to all developer machines, or to all machines. It’s the only way to solve the problem until Microsoft fixes it.

Why a blog vote?

I tried creating an issue at connect.microsoft.com but that site doesn’t seem to support Windows bugs. A suggestion I made for Visual Studio that would have mitigated this bug was marked private, thus shutting down voting. So I’m posting here. And I promise that Microsoft will at least take a look.

Credit where credit is due

The root cause of this bug was first reported last February, here.


About brucedawson

I'm a programmer, working for Valve (http://www.valvesoftware.com/), focusing on optimization and reliability. Nothing's more fun than making code run 5x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Code Reliability, Programming, Visual Studio.

84 Responses to Should This Windows 7 Bug be Fixed?

  1. billco says:

    Reproducible on my i5-3570K. I can understand Microsoft’s reluctance in patching everyone’s WoW64, so maybe skip Windows Update, but they should release this as a hotfix at the very least, so those of us with the need and knowledge can apply the fix.

    • brucedawson says:

      I’ll actually be quite disappointed if they go this route. My educated guesses suggest that this bug affects hundreds of thousands of developers — maybe millions. However my blog stats suggest only thousands (low tens of thousands at most) realize the cause. A hotfix would, at most, reach thousands, leaving most of those affected no better off.

      I guess a hotfix would let our IT department deploy the fix, but it would still leave me feeling sad.

  2. Jon says:

    I’ve tripped over this at work, but had no idea at the time what it was. Just faked up that same crash with your workaround… yep. I’d like to see this fixed.

  3. Kevin Gadd says:

    This definitely needs to be fixed. Wow64 debugging is crazy enough without this complicating things.

  4. Mārtiņš says:

    I also want to see this fixed. How many times I’ve attached just-in-time debugger on crash only to find out that there’s nothing much to see… Annoying. Disabling AVX on boot is not an option.

  5. Promit says:

    Has anyone checked if Windows 8 is affected?

  6. Pravetz says:

    I’ve tested this bug on my laptop which is a core I7 running Win7 64bit. It does indeed crash with an unreadable stack just as Bruce describes.

    I would very much like to see 32-bit software development continue to be supported. There are many instances where the traditional programming models are useful; it would be a shame to be forced to cut back on 32-bit support.

  7. fgiesen says:

    Definitely needs to get fixed. I’ve been bit by this many times.

  8. I’d very much like to see this fixed, obviously. As long as there are people running 32-bit Windows, it makes sense to do 32-bit development, but please don’t force me to use a 32-bit OS for that.

  9. ismail says:

    I hit this bug again and again and was not sure why it was crashing until I saw your blog post. Since most developers have fast machines with AVX enabled, this will hit them.

    Microsoft should release a fix for this.

  10. Sean Barrett says:

    This definitely needs fixing. It has interfered with my ability to debug crashes on my machine and on my customers’ machines.

  11. I’ve experienced this bug, along with colleagues. As I use SEH in development as well as AVX the workarounds are not that useful. Microsoft need to fix this.

    I’ve not tried a crash dump handler like Breakpad yet on Win 7 to see if this issue prevents us from getting proper call stack information from customers – if it does this is an even more serious issue.

  12. Roy Eltham says:

    Yes, this needs fixing. I’ve run into it many times.

  13. hcpizzi says:

    I spent weeks after getting a new computer at work trying to chase crashes with useless stacks. It took debugging back to the stone age, and I didn’t understand why.

    When I finally had enough and decided to dig into the problem, I noticed the wrong stack pointer after returning from the exception handling code. With that in hand, a bit of googling took me to your previous post on this matter and the workaround to disable AVX, and life was good again.

    But obviously that is just a workaround, and it needs to be fixed. Please, Microsoft, remember: it’s all about developers, developers, developers.

  14. Been bit by this too many times (and sometimes you can’t get another repro of the crash for a while) and it’s a definite must fix IMO (I’ve seen this on platforms other than PC and it’s really cumbersome)

  15. Please fix this. We still ship 32-bit executables, and unfortunately they still crash. Disabling AVX is far from ideal.

  16. Chris says:

    So that’s why.. Would like to see this fixed.

  17. Ben says:

    Yep, this one is really annoying. It’s ridiculous that they would even consider not fixing this. It smacks of forced obsolescence, trying to force people into Win8.

  18. I would love to see this fixed as well. Disabling AVX is rather an annoying workaround, especially when we need to test that path.

  19. As a middleware developer, I need to support a wide variety of systems, and that definitely includes shipping 32-bit executables on Windows 7 SP1.
    That bug needs to get fixed. It’s hindering development.

    Disabling AVX is not an option if we also want to have optimized AVX code paths!

  20. Sid says:

    I don’t do Windows dev any more, but when I did I was plagued with mysterious and useless crash stacks because of this bug. I think this is absolutely worth fixing.

  21. Phileosophos says:

    Yup. Needs fixin’.

  22. Jose Carlos says:

    I can confirm this same thing happens on my i7. Never knew why until now (always thought I was doing something wrong somewhere!)

  23. Chris says:

    I did experience the crash too and had trouble for days before finding the workaround you described. And I blamed myself thinking “what a moron I was to deactivate that”. I never suspected that it was a bug, nor due to a Windows update. With time I stumbled on the problem again and I have to re-enable exception Break like you describe every once in a while. Would be nice to have a fix.

  24. cube says:

    So, uh, what version of visual studio is this with? 2008? 2010? 2012? All of the above?

    • brucedawson says:

      All of the above. It is an OS bug, not a debugger bug. The bug can actually be triggered with windbg (or any other debugger) as well. The first-chance exception handling in any debugger will corrupt the stack.

  25. Siles says:

    I’d like to see this fixed.

  26. entheh says:

    This affects us at work – please fix it!

  27. Dave Moore says:

    This one affects me too, I’d like to see it fixed. I ship 32-bit code to people using older versions of Max/Maya.

  28. Rene Zwanenburg says:

    So that’s what’s going on… This has been driving me completely crazy, a patch would be greatly appreciated.

  29. xilefian says:

    Wow, I have actually had this error before, I always found the cause however it would have been helpful if I was given the line! I always blamed my code when the debugger failed to pick up the location of errors.

  30. trnsfrmcobalt says:

    Yep, I’ve seen that corruption before. I didn’t realize it was a Windows bug and thought our code was corrupting the stack somehow. Would be a great help to see this fixed.

  31. Cort says:

    Definitely fix. I like billco’s idea of a hotfix quietly deployed to the developers who need it, rather than a full-on mandatory Windows Update patch.

  32. So that’s what happened!
    I develop in 32 bit because it’s the only way to get edit and continue to work and the startup times of the applications I’m working on are frustratingly long, so indeed it’d be nice to have a fix for this.

  33. Gab says:

    It would be great if this bug got fixed.

  34. Plagued me for a long time until I happened to read a Gamasutra article that just happened to mention it by Bruce Dawson.

  35. I only recently stumbled on this one, since most of my work is prototyped in 64-bit mode and then built in 32-bit when it’s pretty much bug free. A current personal project was 32-bit-only code and then I spent a few hours wondering why Visual Studio was dying. Yes, this must be fixed. I’m rather shocked this wasn’t considered a class #1 bug due to its nature. :(

  36. Alan A says:

    As someone who’s doing mixed 32, and 64-bit Win32 development on Win7, I would love to see this bug fixed.

  37. Damn, that explains why the stack is thrashed all the time! Thank you. Yes, please fix this, I’m sure not going to Windows 8 any time soon.

  38. Marcin Krystianc says:

    No doubts,
    correction is needed definitely.
    This issue makes debugging much longer. If you don’t know which process will crash, you don’t know which process to attach to in advance and you need to rely only on post-mortem debugging. But on W7 you can’t :/

  39. Goran Zauhar says:

    I would also like this bug to be fixed. 32-bit programming is far from dead.

  40. Jetro Lauha says:

    My main development machine is a Win7 64-bit SP1 one. But it seems the Core i7 cpu is old enough to not have AVX support, so I couldn’t reproduce this with the test program. My normal development is still 32-bit so if I would have a newer CPU I would be certainly annoyed about this, and would hope for a fix. (However, next time I upgrade hardware, it’s most likely I’ll also switch to newer version of Windows, so I’ll be probably skipping over this issue.)

  41. Hey Bruce.

    This bug is ridiculously annoying _and_ has caused me and/or people I work with many wasted hours over the last year or two.

    Please fix it Microsoft!

    Alex

  42. Vectortrex says:

    I hit this all the time and it drove me mad until I figured out it was AVX related and implemented a workaround. For the hours of productivity I’ve wasted on this, it amazes me that MS hasn’t shipped a fix for it.

  43. CodeRanger says:

    Been doing my head in for ages, just thought it was my fault. Please fix!

  44. For sake of consistency and cleanliness alone this should already be fixed. Let alone running into it while doing professional programming work.

  45. Florian George says:

    At the company I work at, 8 out of 11 development machines are running Windows 7 SP1 x64 on an AVX supporting i7, developing 32bit Software.

    Now that we know about the issue, we can disable AVX at work. At home, I am not willing to disable AVX as I use my computer for a lot of other things than just coding, most of them benefitting from AVX.

    The worst thing about this issue is the huge amount of people not knowing about it. Hard to imagine how much productivity might be crushed by this bug just at this very moment.

    Please fix it.

  46. NB says:

    If my code ever crashed I would most certainly want this to be fixed too.

  47. Rob Allen says:

    I’ve just run into it at work! It’d be great if Microsoft really fixed this.

  48. Zbyl says:

    Yes, this needs fixing. I’ve run into it just today.

  49. asherkin says:

    This has burnt me hundreds of times while debugging Source Engine crashes, definitely want to see it fixed.

  50. Ahmed Charles says:

    I think this should be fixed.

  51. Rich Skorski says:

    We don’t hit this often during development, and I think it’s because we have our own unhandled exception filter. I’m surprised it’s working though, because I would expect the same broken pieces are being used. In particular, we use LPEXCEPTION_POINTERS::ContextRecord and I would expect that to fall victim to this bug. There’s something that’s going on there that hides this bug most of the time.

    After our filter has gathered the info and created a crash dump, the application will usually keep on throwing STATUS_WAIT_0 exceptions. Maybe that’s because of this stack corruption? That is super annoying because the app needs to be force killed through task manager. This is a pain for some of the devs, because they won’t realize they need to do that. They wind up frustrated and confused with several ghost processes running on their machine taking up time and stopping them from rebuilding the exe.

    I have run into the bug many times when the solution wasn’t set to trap on the exception.

    I would be a happier dev if this bug were fixed!

  52. Patrick Reddeck says:

    Informative post Bruce thanks for the warning.

    Microsoft, I or my fellows here could be writing the NEXT big app, so tell me what is the risk assessment on losing that to a non-Windows platform vs. patching this bug? With a platform that isn’t the be-all it used to be, the one area where Microsoft has a generally stronger position is developer tools… or does it?

    I’ll take that fix with a side of hash-browns please.

  53. Jason Stern says:

    I’ve encountered this several times at work, but never actually knew the underlying cause. A fix would obviously be helpful.

  54. Kornél Lehőcz says:

    - Yes.

  55. djmips says:

    This bug cost a lot of my time developing a Windows game. Please fix this for all those poor souls out there who don’t realize (like I didn’t for a very long time) that this isn’t a problem in their codebase but a problem in the OS.

  56. BPT says:

    bloody hell Microsoft fix this!

  57. Maciej Pawlowski says:

    Fix this, definitely!

  58. Craig Mesdag says:

    Looks like a hotfix has been released to resolve this – http://support.microsoft.com/kb/976038

  59. Mārtiņš says:

    It seems this hotfix is for non-SP1 Windows 7. Googling number 976038 shows that this fix is from year 2011, so it definitely doesn’t resolve this.
    Anyway – I tried, but I got “The update is not applicable to your computer.” for my Windows 7 SP1 x64.

  60. Tim says:

    Please fix this – it has cost me many hours of debugging time

  61. Danny says:

    I hit this all the time, in multiple projects. It must have lost me so many hours (that have been repaid in grey hairs), and I’m sure I’m not alone. Great to have a workaround – a proper fix would help everyone who hits this but isn’t fortunate enough to know about the workarounds.

  62. It never even occurred to me that there might be a bug, I just thought Visual Studio was mostly useless when it came to debugging crashes. Talking to people around the office, that’s what they thought too. Please fix!

  63. salertom says:

    jeez.. and I thought VS just sucked.
    Any news on this?

  64. lum zhaveli says:

    I just tried this on my machine with x9000 running on windows 7 professional 64 bit and vs 2012.

    I believe Microsoft has enough cash and it’s a small risk to fix something like this. I can’t say that I’m as experienced as some of the programmers who commented, but I believe we as developers gave something to Microsoft and they should respond in a better way.
    Since they have a solution, not shipping it makes no sense (in the end we pay for good support and a good product with features and no bugs).

    From Developers to Developers believe.

  65. sixxgate says:

    They released a hotfix for Visual Studio 2010 back in 2010 for this issue.
    http://archive.msdn.microsoft.com/KB2116602

    • brucedawson says:

      Unfortunately, no. That hot-fix adds AVX support to VS 2010, which is important if you are trying to debug AVX code. The issue discussed in this post is a Windows 7 SP1 bug that corrupts the stack when your code crashes if you have an AVX processor, regardless of whether you are using AVX. There is no way that a patch in VS 2010 can undo the stack corruption which the OS causes. Only an OS fix can correct this bug.

  66. Pingback: Developers Rejoice–Windows 7 Stack Corruption Fixed! | Random ASCII

  67. Pingback: Developers Rejoice Again | Random ASCII

  68. Pingback: Bugs I Got Other Companies to Fix in 2013 | Random ASCII
