Last year I reported on a bug in 64-bit Windows 7 SP1’s support for AVX-capable processors. This bug causes stack corruption when a 32-bit program crashes while being debugged in Visual Studio, even if AVX is not used.
Microsoft has a fix, but they will only ship it for Windows 7 if there is enough demand.
Update: the fix has been shipped. See Developers Rejoice–Windows 7 Stack Corruption Fixed! for details.
So this is your chance. Comment honestly on whether this bug affects you. Note that for this bug to be triggered it is sufficient to have an AVX-capable processor – you don’t have to be doing AVX programming.
The reddit discussion thread is here if you prefer to comment there.
The bug is in the AVX support added to Windows 7 SP1. Saving the state of the AVX registers requires additional space, and apparently the WoW64 (32-bit Windows on 64-bit Windows) debug support fails to reserve enough space, so the stack gets corrupted. Oops.
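To put a rough number on that "additional space", here is the arithmetic for the processor's standard (non-compacted) XSAVE layout as a small C++ sketch. The sizes are the documented ones: a 512-byte legacy FXSAVE region, a 64-byte XSAVE header, and 256 bytes for the upper halves of the sixteen YMM registers. The constant names are invented for this example, and which buffer WoW64 actually undersizes is not something this sketch claims to show. It only illustrates how much the saved register state grew once AVX arrived.

```cpp
#include <cstddef>

// Sizes from the standard (non-compacted) XSAVE layout on x86 processors.
// The constant names are invented for this sketch.
constexpr std::size_t kLegacyFxsaveBytes = 512;      // legacy region: x87 and XMM state
constexpr std::size_t kXsaveHeaderBytes  = 64;       // XSAVE header
constexpr std::size_t kYmmUpperBytes     = 16 * 16;  // upper 128 bits of YMM0-YMM15

// Total save area once AVX state is included.
constexpr std::size_t kXsaveWithAvxBytes =
    kLegacyFxsaveBytes + kXsaveHeaderBytes + kYmmUpperBytes;

// Extra bytes compared with the pre-AVX FXSAVE area.
constexpr std::size_t kExtraBytesForAvx =
    kXsaveWithAvxBytes - kLegacyFxsaveBytes;

static_assert(kXsaveWithAvxBytes == 832, "512 + 64 + 256");
static_assert(kExtraBytesForAvx == 320, "header plus YMM upper halves");
```

Any code that sized a buffer for the old 512-byte area would come up a few hundred bytes short, which is at least consistent with neighboring stack frames being overwritten.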
In my sample test program I have a Crash() function which can be invoked by selecting “Crash normally” from the file menu. It seems reasonable, especially in a debug build, that crashing in this function should give a nice helpful call stack like this:
That used to be what would happen. But no longer. On 64-bit Windows 7 SP1 with an AVX-capable processor, when debugging 32-bit C++ code with any version of Visual Studio, you will probably see something like this:
The most common signature of this bug is seeing ntdll.dll!_ZwRaiseException on the call stack, typically twice.
The first call stack makes the bug trivial to diagnose. The second call stack… it doesn’t even show the location of the crash, and it lists three functions that aren’t really on the crash call stack. At least it lists the parent function this time – but don’t count on that.
Clearly the corrupted stack can make crash analysis a lot trickier. Depending on the stack layout the corruption may hit multiple stack frames, including the local variables contained within them.
Luckily this bug does not seem to affect minidump files saved by exception handlers, so post-mortem debugging seems to be unaffected.
Take action now
The bug is well understood, and Microsoft really just wants to know whether it’s worth the cost and risk of fixing it. So let them know. Remember that this bug requires 64-bit Windows 7 SP1, an AVX-capable processor, and 32-bit development. If you’re running a 32-bit OS (really?), or don’t have an AVX-capable processor, or you’re doing 64-bit development, then you are immune. You’re also immune if you are running Windows 8 (it’s fixed there), Windows Vista (no AVX support), Linux, or MacOS.
- If you have noticed this bug then say so in a comment below.
- If you have not noticed this bug then maybe download the test program and see if you can repro it. Share your experiences either way.
- If you think this is a complete waste of time, perhaps because you have already moved on to Windows 8, Linux, or MacOS, then let us know.
I prefer comments here, but commenting on reddit also works. Whatever is easiest.
While waiting for Microsoft to respond there are two workarounds available, each with its own downsides:
Change Visual Studio solution settings
The stack corruption happens in the first-chance exception handler. You can tell Visual Studio to halt in the debugger before running this, thus giving you a chance to see the crash details before they are corrupted. To do this go to the Visual Studio ‘Debug’ menu and select ‘Exceptions’. In the dialog that comes up check Win32 Exceptions.
One problem with this workaround is that this must be done for every Visual Studio solution. Also, this workaround doesn’t help if a process crashes and then the just-in-time debugger attaches. The stack will already be corrupted before you attach.
Disable AVX support
The other workaround is to disable AVX support. You can do that by running this command from an elevated command prompt and then rebooting:
bcdedit /set xsavedisable 1
The obvious disadvantage is that you no longer have AVX support – if you implement AVX detection properly then it will be detected as no longer available. I don’t like this solution, but given the number of different projects that I work on, and the importance of just-in-time debugging, I had no choice but to do this. If Microsoft ever fixes this bug then you can remove the workaround by running this command and then rebooting:
bcdedit /set xsavedisable 0
You can see your current bcdedit settings by running bcdedit with no parameters from an elevated command prompt. Look under the Windows Boot Loader section of the output: if xsavedisable is present there and has a non-zero value then the buggy code in Windows is disabled.
Documentation on the bcdedit options can be found here.
I recommend getting your IT department to push the bcdedit command to all developer machines, or to all machines. It’s the only way to solve the problem until Microsoft fixes it.
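Since the bcdedit workaround relies on programs detecting AVX properly, here is a minimal sketch of what "properly" usually means: check that the CPU reports AVX, check that the OS has enabled XSAVE (the OSXSAVE bit), and then ask XGETBV whether the OS is actually saving the YMM registers. The function names AvxUsable and DetectAvx and the SKETCH_X86 macro are made up for this example. I would expect xsavedisable to show up as a cleared OSXSAVE bit, but treat that as an assumption rather than something this code proves.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // __cpuid
#include <immintrin.h>  // _xgetbv
#define SKETCH_X86 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>      // __get_cpuid
#define SKETCH_X86 1
#endif

// Pure decision logic, given CPUID leaf 1 ECX and the XCR0 register.
// (Hypothetical helper name, split out so it can be tested without hardware.)
bool AvxUsable(std::uint32_t cpuid1Ecx, std::uint64_t xcr0) {
    const bool osxsave  = (cpuid1Ecx & (1u << 27)) != 0;  // OS enabled XSAVE/XGETBV
    const bool avx      = (cpuid1Ecx & (1u << 28)) != 0;  // CPU implements AVX
    const bool ymmSaved = (xcr0 & 0x6) == 0x6;            // OS saves XMM and YMM state
    return osxsave && avx && ymmSaved;
}

// Queries the current machine. Returns false on non-x86 builds.
bool DetectAvx() {
#if defined(SKETCH_X86)
    std::uint32_t ecx = 0;
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);
    ecx = static_cast<std::uint32_t>(regs[2]);
#else
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (!__get_cpuid(1, &a, &b, &c, &d)) return false;
    ecx = c;
#endif
    // XGETBV faults unless OSXSAVE is set, so check before executing it.
    if ((ecx & (1u << 27)) == 0) return false;
#if defined(_MSC_VER)
    const std::uint64_t xcr0 = _xgetbv(0);
#else
    std::uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    const std::uint64_t xcr0 = (static_cast<std::uint64_t>(hi) << 32) | lo;
#endif
    return AvxUsable(ecx, xcr0);
#else
    return false;
#endif
}
```

Call DetectAvx() once at startup and fall back to an SSE code path when it returns false. Code that checks only the AVX bit in CPUID and skips the OS checks can end up executing AVX instructions on a machine where the OS is not saving YMM state.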
Why a blog vote?
I tried creating an issue at connect.microsoft.com but that site doesn’t seem to support Windows bugs. A suggestion I made for Visual Studio that would have mitigated this bug was marked private, thus shutting down voting. So I’m posting here. And I promise that Microsoft will at least take a look.
Credit where credit is due
The root cause of this bug was first reported last February, here.