Last year I reported on a bug in 64-bit Windows 7 SP1’s support for AVX-capable processors. This bug causes stack corruption when a 32-bit program crashes while being debugged in Visual Studio, even if AVX is not used.
Microsoft has a fix, but they will only ship it for Windows 7 if there is enough demand.
Update: the fix has been shipped. See Developers Rejoice–Windows 7 Stack Corruption Fixed! for details.
So this is your chance. Comment honestly on whether this bug affects you. Note that for this bug to be triggered it is sufficient to have an AVX capable processor – you don’t have to be doing AVX programming.
The reddit discussion thread is here if you prefer to comment there.
Bug details
The bug is in the AVX support added to Windows 7 SP1. Saving the state of the AVX registers requires additional space, and apparently the WoW64 (32-bit Windows on 64-bit Windows) debug support fails to reserve enough space, so the stack gets corrupted. Oops.
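To put a rough number on "additional space": in the standard (non-compacted) XSAVE layout described in Intel's documentation, x87/SSE state fills a 512-byte legacy region (the same as the older FXSAVE format). That region is followed by a 64-byte XSAVE header and then 256 bytes for the upper halves of the YMM registers. The sketch below only does that arithmetic. It illustrates why a buffer sized for the pre-AVX format could be overrun; it is not a description of the actual WoW64 data structures involved, and the constant names are made up for this example.

```cpp
#include <cstddef>

// Standard-format XSAVE layout (illustrative constants, not Windows types).
constexpr std::size_t kLegacyRegionBytes = 512;  // x87 + SSE state, same as FXSAVE
constexpr std::size_t kXsaveHeaderBytes  = 64;   // XSAVE header, right after it
constexpr std::size_t kYmmUpperBytes     = 256;  // upper 128 bits of YMM0-YMM15

// Bytes needed to save state before AVX (FXSAVE format only).
constexpr std::size_t kPreAvxStateBytes = kLegacyRegionBytes;

// Bytes needed once AVX state is saved with XSAVE.
constexpr std::size_t kAvxStateBytes =
    kLegacyRegionBytes + kXsaveHeaderBytes + kYmmUpperBytes;

// How far a buffer sized for the old format would be overrun.
constexpr std::size_t kOverrunBytes = kAvxStateBytes - kPreAvxStateBytes;

static_assert(kAvxStateBytes == 832, "x87/SSE/AVX XSAVE area");
static_assert(kOverrunBytes == 320, "extra space AVX requires");
```

Suppose the space set aside on the stack matched the smaller figure while the saved context needed the larger one. The excess would then presumably spill into neighboring stack memory, such as the caller's frames and their locals.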
In my sample test program I have a Crash() function which can be invoked by selecting “Crash normally” from the file menu. It seems reasonable, especially in a debug build, that crashing in this function should give a nice helpful call stack like this:
That used to be what would happen. But no longer. On 64-bit Windows 7 SP1 on AVX processors, when debugging 32-bit C++ code with any version of Visual Studio, you will probably see something like this:
The most common signature of this bug is seeing ntdll.dll!_ZwRaiseException on the call stack, typically twice.
The first call stack makes the bug trivial to diagnose. The second call stack… it doesn’t even show the location of the crash, and it lists three functions that aren’t really on the crash call stack. At least it lists the parent function this time – but don’t count on that.
Clearly the corrupted stack can make crash analysis a lot trickier. Depending on the stack layout the corruption may hit multiple stack frames, including the local variables contained within them.
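If you triage many crashes, a crude text check for the ntdll.dll!_ZwRaiseException signature may help flag suspect call stacks. This is a hypothetical helper, not a tool from this post. It assumes you already have the call stack as plain text (for example, copied from the Call Stack window). It simply counts occurrences of that frame name and treats two or more as suspicious, because two is the typical count.

```cpp
#include <cstddef>
#include <string_view>

// True if `word` appears in `text` starting at index `pos`.
constexpr bool MatchesAt(std::string_view text, std::size_t pos,
                         std::string_view word) {
    if (pos > text.size() || text.size() - pos < word.size()) return false;
    for (std::size_t k = 0; k < word.size(); ++k) {
        if (text[pos + k] != word[k]) return false;
    }
    return true;
}

// Counts occurrences of `word` in `text`.
constexpr std::size_t CountOccurrences(std::string_view text,
                                       std::string_view word) {
    std::size_t count = 0;
    for (std::size_t i = 0; !word.empty() && i < text.size(); ++i) {
        if (MatchesAt(text, i, word)) ++count;
    }
    return count;
}

// Hypothetical heuristic: two or more _ZwRaiseException frames is the
// typical signature of the WoW64/AVX stack corruption.
constexpr bool LooksLikeAvxStackCorruption(std::string_view callStack) {
    return CountOccurrences(callStack, "ntdll.dll!_ZwRaiseException") >= 2;
}
```

A match is only a hint. A stack with no such frames can still be corrupted, and the final call stack should be confirmed with one of the workarounds below.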
Luckily this bug does not seem to affect minidump files saved by exception handlers, so post-mortem debugging seems to be unaffected.
Take action now
The bug is well understood, and Microsoft really just wants to know whether it’s worth the cost and risk of fixing it. So let them know. Remember that this bug requires 64-bit Windows 7 SP1, an AVX capable processor, and 32-bit development. If you’re running a 32-bit OS (really?), or don’t have an AVX capable processor, or you’re doing 64-bit development then you are immune. You’re also immune if you are running Windows 8 (it’s fixed there), Windows Vista (no AVX support), Linux, or MacOS.
- If you have noticed this bug then say so in a comment below.
- If you have not noticed this bug then maybe download the test program and see if you can repro it. Share your experiences either way.
- If you think this is a complete waste of time, perhaps because you have already moved on to Windows 8, Linux, or MacOS, then let us know.
I prefer comments here, but commenting on reddit works also. Whatever is easiest.
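The conditions listed earlier in this section boil down to a single predicate. The struct and function below are hypothetical and are written only to make the logic explicit. xsaveDisabled stands for the boot-option workaround described under Workarounds, and fixInstalled stands for the fix mentioned in the update at the top of the post.

```cpp
// Hypothetical description of a developer machine and workload.
struct DevSetup {
    bool windows7Sp1;     // Windows 7 with Service Pack 1 installed
    bool os64Bit;         // 64-bit edition of Windows
    bool cpuHasAvx;       // AVX-capable processor (AVX code not required)
    bool debugging32Bit;  // debugging or JIT-attaching to 32-bit processes
    bool xsaveDisabled;   // "bcdedit /set xsavedisable 1" applied
    bool fixInstalled;    // Microsoft's later fix installed
};

// Every condition must hold; if any one fails, the bug is not triggered.
constexpr bool IsAffected(DevSetup s) {
    return s.windows7Sp1 && s.os64Bit && s.cpuHasAvx && s.debugging32Bit &&
           !s.xsaveDisabled && !s.fixInstalled;
}
```

For example, a 64-bit Windows 7 SP1 machine with an AVX CPU is only at risk while it is debugging 32-bit code, and Windows 8 or Vista would simply have windows7Sp1 set to false.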
Workarounds
While waiting for Microsoft to respond there are two workarounds available, each with its own downsides:
Change Visual Studio solution settings
The stack corruption happens in the first-chance exception handler. You can tell Visual Studio to halt in the debugger before running this, thus giving you a chance to see the crash details before they are corrupted. To do this go to the Visual Studio ‘Debug’ menu and select ‘Exceptions’. In the dialog that comes up check Win32 Exceptions.
One problem with this workaround is that this must be done for every Visual Studio solution. Also, this workaround doesn’t help if a process crashes and then the just-in-time debugger attaches. The stack will already be corrupted before you attach.
Disable AVX
The other workaround is to disable AVX support. You can do that by running this command from an elevated command prompt and then rebooting:
bcdedit /set xsavedisable 1
The obvious disadvantage is that you no longer have AVX support – if you implement AVX detection properly then it will be detected as no longer available. I don’t like this solution, but given the number of different projects that I work on, and the importance of just-in-time debugging, I had no choice but to do this. If Microsoft ever fixes this bug then you can remove the workaround by running this command and then rebooting:
bcdedit /set xsavedisable 0
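"Proper" AVX detection means checking that the operating system has enabled AVX state saving, not just that the CPU has the feature bit. Below is a minimal sketch of the usual sequence, assuming MSVC's __cpuid and _xgetbv intrinsics for the hardware query. CPUID leaf 1 must report both AVX (ECX bit 28) and OSXSAVE (ECX bit 27), and XCR0, read with XGETBV, must have the SSE and AVX state bits (bits 1 and 2) set. With xsavedisable set, Windows presumably leaves OSXSAVE clear, so a check like this reports AVX as unavailable. The function names are hypothetical. The pure AvxUsable function is kept separate so the logic can be checked without AVX hardware.

```cpp
#include <cstdint>

// CPUID leaf 1, ECX feature bits.
constexpr std::uint32_t kOsxsaveBit = 1u << 27;  // OS has enabled XSAVE/XGETBV
constexpr std::uint32_t kAvxBit     = 1u << 28;  // CPU implements AVX
// XCR0 bits: 1 = SSE (XMM) state, 2 = AVX (upper YMM) state.
constexpr std::uint64_t kXcr0SseAvx = 0x6;

// Pure decision logic, testable without AVX hardware.
constexpr bool AvxUsable(std::uint32_t cpuid1Ecx, std::uint64_t xcr0) {
    return (cpuid1Ecx & kAvxBit) != 0 &&
           (cpuid1Ecx & kOsxsaveBit) != 0 &&
           (xcr0 & kXcr0SseAvx) == kXcr0SseAvx;
}

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // __cpuid
#include <immintrin.h>  // _xgetbv

// Queries this machine using MSVC intrinsics.
inline bool AvxUsableOnThisMachine() {
    int regs[4] = {};
    __cpuid(regs, 1);
    const auto ecx = static_cast<std::uint32_t>(regs[2]);
    // XGETBV faults unless the OS has set OSXSAVE, so check that first.
    if ((ecx & kOsxsaveBit) == 0) return false;
    return AvxUsable(ecx, _xgetbv(0));
}
#endif
```

Code that checks only the AVX feature bit would still take its AVX path with the workaround in place, and would then fault on the first AVX instruction.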
You can see your current bcdedit settings by running bcdedit with no parameters from an elevated command prompt. If xsavedisable is present in the output and has a non-zero value then the buggy code in Windows is disabled.
C:\>bcdedit
…
Windows Boot Loader
…
xsavedisable 1
Documentation on the bcdedit options can be found here.
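If you want to verify the workaround from a script or deployment tool, the check described above is easy to automate: xsavedisable must be present and have a non-zero value. Below is a sketch. It assumes English-language bcdedit output in the format shown above, with the option name at the start of a line followed by whitespace and its value. Capturing the output of bcdedit is left out, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical check: does bcdedit output show xsavedisable with a
// non-zero value? Assumes English output, one "name   value" pair per line.
constexpr bool XsaveDisabledInBcdeditOutput(std::string_view out) {
    constexpr std::string_view kKey = "xsavedisable";
    std::size_t i = 0;
    while (i < out.size()) {
        // Skip leading blanks on the current line.
        while (i < out.size() && (out[i] == ' ' || out[i] == '\t')) ++i;

        // Does this line start with the option name?
        bool match = out.size() - i >= kKey.size();
        for (std::size_t k = 0; match && k < kKey.size(); ++k) {
            match = out[i + k] == kKey[k];
        }
        if (match) {
            // Any non-zero digit in the value means the workaround is active.
            for (std::size_t j = i + kKey.size();
                 j < out.size() && out[j] != '\n'; ++j) {
                if (out[j] >= '1' && out[j] <= '9') return true;
            }
            return false;
        }

        // Move to the start of the next line.
        while (i < out.size() && out[i] != '\n') ++i;
        ++i;
    }
    return false;
}
```

Because bcdedit's output is localized, a production check would more safely query the boot configuration directly rather than parse text. This sketch only mirrors the manual check.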
I recommend getting your IT department to push the bcdedit command to all developer machines, or to all machines. It’s the only way to solve the problem until Microsoft fixes it.
Why a blog vote?
I tried creating an issue at connect.microsoft.com but that site doesn’t seem to support Windows bugs. A suggestion I made for Visual Studio that would have mitigated this bug was marked private, thus shutting down voting. So I’m posting here. And I promise that Microsoft will at least take a look.
Credit where credit is due
The root cause of this bug was first reported last February, here.
Reproducible on my i5-3570K. I can understand Microsoft’s reluctance in patching everyone’s WoW64, so maybe skip Windows Update, but they should release this as a hotfix at the very least, so those of us with the need and knowledge can apply the fix.
I’ll actually be quite disappointed if they go this route. My educated guesses suggest that this bug affects hundreds of thousands of developers — maybe millions. However my blog stats suggest only thousands (low tens of thousands at most) realize the cause. A hotfix would, at most, reach thousands, leaving most of those affected no better off.
I guess a hotfix would let our IT department deploy the fix, but it would still leave me feeling sad.
I’ve tripped over this at work, but had no idea at the time what it was. Just faked up that same crash with your workaround… yep. I’d like to see this fixed.
This definitely needs to be fixed. Wow64 debugging is crazy enough without this complicating things.
I also want to see this fixed. How many times I’ve attached just-in-time debugger on crash only to find out that there’s nothing much to see… Annoying. Disabling AVX on boot is not an option.
Has anyone checked if Windows 8 is affected?
Nevermind, just saw that Win8 is fine.
I’ve tested this bug on my laptop which is a core I7 running Win7 64bit. It does indeed crash with an unreadable stack just as Bruce describes.
I would very much like to see 32-bit software development continue to be supported. There are many instances where the traditional programming models are useful; it would be a shame to be forced to cut back on 32-bit support.
Definitely needs to get fixed. I’ve been bit by this many times.
I’d very much like to see this fixed, obviously. As long as there are people running 32-bit Windows, it makes sense to do 32-bit development, but please don’t force me to use a 32-bit OS for that.
I hit this bug again and again and was not sure why it was crashing until I saw your blog post. Since most developers have fast machines with AVX enabled, this will hit them.
Microsoft should release a fix for this.
This definitely needs fixing. It has interfered with my ability to debug crashes on my machine and on my customers’ machines.
I’ve experienced this bug, along with colleagues. As I use SEH in development as well as AVX the workarounds are not that useful. Microsoft need to fix this.
I’ve not tried a crash dump handler like Breakpad yet on Win 7 to see if this issue prevents us from getting proper call stack information from customers – if it does this is an even more serious issue.
Yes, this needs fixing. I’ve run into it many times.
I spent weeks after getting a new computer at work trying to chase crashes with useless stacks. It took debugging back to the stone age, and I didn’t understand why.
When I finally had enough and decided to dig into the problem, I noticed the wrong stack pointer after returning from the exception handling code. With that in hand, a bit of googling took me to your previous post on this matter and the workaround to disable AVX, and life was good again.
But obviously that is just a workaround, and it needs to be fixed. Please, Microsoft, remember: it’s all about developers, developers, developers.
Thanks for all the comments. This one in particular shows the challenges in dealing with the mysterious behavior. I’m glad the original post was helpful.
Been bit by this too many times (and sometimes you can’t get another repro of the crash for a while), and it’s a definite must-fix IMO. (I’ve seen this on platforms other than PC, and it’s really cumbersome.)
Please fix this. We still ship 32-bit executables, and unfortunately they still crash. Disabling AVX is far from ideal.
So that’s why. Would like to see this fixed.
Yep, this one is really annoying. It’s ridiculous that they would even consider not fixing this. It smacks of forced obsolescence, trying to force people into Win8.
I would love to see this fixed as well. Disabling AVX is rather an annoying workaround, especially when we need to test that path.
As a middleware developer, I need to support a wide variety of systems, and that definitely includes shipping 32-bit executables on Windows 7 SP1.
That bug needs to get fixed. It’s hindering development.
Disabling AVX is not an option if we also want to have optimized AVX code paths!
I don’t do Windows dev any more, but when I did I was plagued with mysterious and useless crash stacks because of this bug. I think this is absolutely worth fixing.
Fix this! We do 32b dev on Win7.
Yup. Needs fixin’.
I can confirm this same thing happens on my i7, Never knew why until now (always thought I was doing something wrong somewhere!)
I experienced the crash too and had trouble for days before finding the workaround you described. I blamed myself, thinking “what a moron I was to deactivate that”. I never suspected that it was a bug, let alone one caused by a Windows update. Over time I stumbled on the problem again, and every once in a while I have to re-enable breaking on exceptions as you describe. It would be nice to have a fix.
So, uh, what version of Visual Studio is this with? 2008? 2010? 2012? All of the above?
All of the above. It is an OS bug, not a debugger bug. The bug can actually be triggered with windbg (or any other debugger) as well. The first-chance exception handling in any debugger will corrupt the stack.
I’d like to see this fixed.
This affects us at work – please fix it!
This one affects me too, I’d like to see it fixed. I ship 32-bit code to people using older versions of Max/Maya.
So that’s what’s going on… This has been driving me completely crazy, a patch would be greatly appreciated.
Wow, I have actually had this error before. I always found the cause eventually, but it would have been helpful to be given the line! I always blamed my code when the debugger failed to pick up the location of errors.
Yep, I’ve seen that corruption before. I didn’t realize it was a Windows bug and thought our code was corrupting the stack somehow. Would be a great help to see this fixed.
Definitely fix. I like billco’s idea of a hotfix quietly deployed to the developers who need it, rather than a full-on mandatory Windows Update patch.
So that’s what happened!
I develop in 32-bit because it’s the only way to get Edit and Continue to work, and the startup times of the applications I’m working on are frustratingly long, so it would indeed be nice to have a fix for this.
It would be great if this bug got fixed.
It plagued me for a long time until I happened to read a Gamasutra article by Bruce Dawson that mentioned it.
I only recently stumbled on this one, since most of my work is prototyped in 64-bit mode and then built as 32-bit once it’s pretty much bug free. A current personal project was 32-bit-only code, and I spent a few hours wondering why Visual Studio was dying. Yes, this must be fixed. I’m rather shocked this wasn’t considered a class #1 bug due to its nature. 😦
As someone who’s doing mixed 32, and 64-bit Win32 development on Win7, I would love to see this bug fixed.
Damn, that explains why the stack is trashed all the time! Thank you. Yes, please fix this, I’m sure not going to Windows 8 any time soon.
No doubt, a fix is definitely needed.
This issue makes debugging take much longer. If you don’t know which process will crash, you don’t know which process to attach to in advance, so you have to rely on post-mortem debugging alone. But on W7 you can’t.
I would also like this bug to be fixed. 32-bit programming is far from dead.
My main development machine runs Win7 64-bit SP1, but it seems its Core i7 CPU is old enough not to have AVX support, so I couldn’t reproduce this with the test program. My normal development is still 32-bit, so if I had a newer CPU I would certainly be annoyed by this and would hope for a fix. (However, next time I upgrade hardware I’ll most likely also switch to a newer version of Windows, so I’ll probably skip over this issue.)
Hey Bruce.
This bug is ridiculously annoying _and_ has caused me and/or people I work with many wasted hours over the last year or two.
Please fix it Microsoft!
Alex
I hit this all the time and it drove me mad until I figured out it was AVX related and implemented a workaround. Given the hours of productivity I’ve wasted on this, it amazes me that MS hasn’t shipped a fix for it.
Been doing my head in for ages, just thought it was my fault. Please fix!
For the sake of consistency and cleanliness alone this should already have been fixed, never mind the cost of running into it while doing professional programming work.
At the company I work at, 8 out of 11 development machines are running Windows 7 SP1 x64 on an AVX supporting i7, developing 32bit Software.
Now that we know about the issue, we can disable AVX at work. At home, I am not willing to disable AVX as I use my computer for a lot of other things than just coding, most of them benefitting from AVX.
The worst thing about this issue is the huge number of people who don’t know about it. It’s hard to imagine how much productivity this bug is costing at this very moment.
Please fix it.
If my code ever crashed I would most certainly want this to be fixed too.
I’ve just run into it at work! It’d be great if Microsoft really fixed this.
Yes, this needs fixing. I’ve run into it just today.
This has burnt me hundreds of times while debugging Source Engine crashes, definitely want to see it fixed.
I think this should be fixed.
We don’t hit this often during development, and I think it’s because we have our own unhandled exception filter. I’m surprised it’s working though, because I would expect the same broken pieces are being used. In particular, we use LPEXCEPTION_POINTERS::ContextRecord and I would expect that to fall victim to this bug. There’s something that’s going on there that hides this bug most of the time.
After our filter has gathered the info and created a crash dump, the application will usually keep on throwing STATUS_WAIT_0 exceptions. Maybe that’s because of this stack corruption? That is super annoying because the app needs to be force killed through task manager. This is a pain for some of the devs, because they won’t realize they need to do that. They wind up frustrated and confused with several ghost processes running on their machine taking up time and stopping them from rebuilding the exe.
I have run into the bug many times when the solution wasn’t set to trap on the exception.
I would be a happier dev if this bug were fixed!
Informative post, Bruce. Thanks for the warning.
Microsoft, I or my fellows here could be writing the NEXT big app, so tell me: what is the risk assessment on losing that to a non-Windows platform vs. patching this bug? With a platform that isn’t the be-all it used to be, the one area where Microsoft has a generally stronger position is developer tools… or does it?
I’ll take that fix with a side of hash-browns please.
I’ve encountered this several times at work, but never actually knew the underlying cause. A fix would obviously be helpful.
– Yes.
Yes
This bug cost a lot of my time developing a Windows game. Please fix this for all those poor souls out there who don’t realize (like I didn’t for a very long time) that this isn’t a problem in their codebase but a problem in the OS.
Bloody hell, Microsoft, fix this!
Fix this, definitely!
Looks like a hotfix has been released to resolve this – http://support.microsoft.com/kb/976038
That hot fix is for a different issue. The bug that I am hoping that they will fix is stack corruption when a crash occurs (on 64-bit Windows 7 SP1 on an AVX capable system while debugging 32-bit code).
I originally blogged about two issues in one article, which may have caused some confusion. The original article is here:
https://randomascii.wordpress.com/2012/07/05/when-even-crashing-doesnt-work/
The issue that the hot fix addresses is the ignoring of crashes in callbacks. I’m not sure what it actually does to change this behavior.
Did you read the install instructions of that KB? It requires enabling, as I’ve pointed out below: https://randomascii.wordpress.com/2013/03/11/should-this-windows-7-bug-be-fixed/#comment-8153
Enabling or not, that hot-fix is for a separate issue.
I am aware it’s not for the stack corruption issue. I just wanted to point out that, for the issue it DOES fix, it also needs to be enabled via the registry setting, not just installed. (Installing it isn’t possible if you already have an OS version that it was rolled into, so many people assume the KB article isn’t applicable at all.) And since you said you weren’t sure how the hotfix changes the behavior it addresses, the registry key name itself gives us at least a hint. 🙂 (As does this Stack Overflow entry: http://stackoverflow.com/questions/11376795/why-cant-64-bit-windows-unwind-user-kernel-user-exceptions )
Ah — that explains the confusion. This post was written to talk entirely about the stack corruption issue. You should post your comments here instead:
https://randomascii.wordpress.com/2012/07/05/when-even-crashing-doesnt-work/
That article covers both the stack corruption and the unwinding of exceptions across the kernel boundary.
It seems this hotfix is for non-SP1 Windows 7. Googling the number 976038 shows that this fix is from 2011, so it definitely doesn’t resolve this.
Anyway – I tried, but I got “The update is not applicable to your computer.” for my Windows 7 SP1 x64.
The hotfix does not just require installation but also enabling via a registry setting – I assume this hotfix was rolled into SP1, but you still need to enable it via the registry setting specified in the KB!
And I forgot one thing: For this to work as you’d expect it to, you probably also want to have the DisablePagingExecutive registry setting enabled. (Which you should already have enabled anyway as a developer.)
That hot fix is for a separate issue. The stack corruption bug which this post discusses was introduced in Windows 7 SP1 so it could hardly have been fixed in a hot-fix that was released before then. However a hot-fix for the stack corruption has finally been released — see this post:
https://randomascii.wordpress.com/2013/08/19/developers-rejoicewindows-7-stack-corruption-fixed/
Please fix this – it has cost me many hours of debugging time.
I hit this all the time, in multiple projects. It must have lost me so many hours (that have been repaid in grey hairs), and I’m sure I’m not alone. Great to have a workaround – a proper fix would help everyone who hits this but isn’t fortunate enough to know about the workarounds.
It never even occurred to me that there might be a bug, I just thought Visual Studio was mostly useless when it came to debugging crashes. Talking to people around the office, that’s what they thought too. Please fix!
Jeez... and I thought VS just sucked.
Any news on this?
No news. Microsoft is aware of this post and has seen the comments. We will see whether that sways their decision making at all.
I just tried this on my machine with an X9000 running Windows 7 Professional 64-bit and VS 2012.
I believe Microsoft has enough cash, and fixing something like this is a small risk. I can’t say that I’m as experienced as some of the programmers who commented, but I believe we as developers have given something to Microsoft, and they should respond in a better way.
Since they have a solution, not shipping it makes no sense (in the end we pay for good support and a good product, with features and without bugs).
From Developers to Developers believe.
They released a hotfix for Visual Studio 2010 back in 2010 for this issue.
http://archive.msdn.microsoft.com/KB2116602
Unfortunately, no. That hot-fix adds AVX support to VS 2010, which is important if you are trying to debug AVX code. The issue discussed in this post is a Windows 7 SP1 bug that corrupts the stack when your code crashes if you have an AVX processor, regardless of whether you are using AVX. There is no way that a patch in VS 2010 can undo the stack corruption which the OS causes. Only an OS fix can correct this bug.