VC++ /arch:AVX option – unsafe at any speed

Microsoft’s VC++ compiler has an option to generate instructions for new instruction sets such as AVX and AVX2, which can lead to more efficient code when running on compatible CPUs. So, an obvious tactic is to compile critical math-heavy functions twice, once with and once without /arch:AVX (or whatever instruction set you want to optionally support).

It seems like a good idea, and it’s been used in various forms for years, but it’s devilishly difficult to do safely. It usually works, but guaranteeing that is trickier than I had realized.
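The compile-it-twice tactic can be sketched roughly as follows. This is an illustration under stated assumptions, not a safe recipe: the names `SumBaseline`, `SumAvxBuild`, `CpuSupportsAvx`, and `Sum` are hypothetical, and the "AVX build" is only a stand-in that forwards to the baseline so the sketch fits in one file. In a real project that function would live in its own source file compiled with /arch:AVX, which is exactly where the hard-to-guarantee part begins.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // __cpuid
#include <immintrin.h>  // _xgetbv
#endif

// Hypothetical baseline kernel, built without /arch:AVX.
float SumBaseline(const float* data, std::size_t count) {
  float total = 0.0f;
  for (std::size_t i = 0; i < count; ++i) total += data[i];
  return total;
}

// Stand-in for the same kernel built with /arch:AVX. Assumption: in a real
// build this would be a separate translation unit compiled with that option;
// here it simply forwards so the sketch stays in a single file.
float SumAvxBuild(const float* data, std::size_t count) {
  return SumBaseline(data, count);
}

// Returns true only if both the CPU and the OS support AVX.
bool CpuSupportsAvx() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  int regs[4] = {};
  __cpuid(regs, 1);
  const bool osxsave = (regs[2] & (1 << 27)) != 0;  // OS uses XSAVE
  const bool avx = (regs[2] & (1 << 28)) != 0;      // CPU has AVX
  if (!osxsave || !avx) return false;
  // XCR0 bits 1 and 2 set means the OS saves XMM and YMM state.
  return (_xgetbv(0) & 0x6) == 0x6;
#elif (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx") != 0;
#else
  return false;  // Unknown platform: stay on the baseline path.
#endif
}

using SumFn = float (*)(const float*, std::size_t);

// Picks one build on first use and calls it through a function pointer.
float Sum(const float* data, std::size_t count) {
  assert(data != nullptr || count == 0);
  static const SumFn chosen = CpuSupportsAvx() ? &SumAvxBuild : &SumBaseline;
  return chosen(data, count);
}
```

Callers only ever see `Sum`; the choice between the two builds is made once, at run time, on the machine the code is actually running on.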

Continue reading

Posted in Programming, Visual Studio | 26 Comments

CPU Performance Counters on Windows

TL;DR – I can finally record CPU performance counters for processes on Windows.

I’m mostly a Windows developer but I’ll occasionally fire up my Linux box to use the perf tool to examine CPU performance counters. Sometimes you really need to see how many cache misses or branch mispredicts your code is causing, and Windows has been curiously hostile to this endeavor.

Some time ago Windows gained the ability to record CPU performance counters from within ETW events, but (so the story goes) there was no way to enable it. Then the ability to enable this feature was added, but there was virtually no documentation.

Continue reading

Posted in xperf | 2 Comments

WPA Symbol Loading is Much Faster, but Broken for Chrome

There’s good news, and there’s bad news.

The good news is that the latest Windows Performance Analyzer (WPA), the visualization tool for ETW (Event Tracing for Windows) traces, can now load symbols faster than ever before – it’s multi-threaded, and it scans huge PDBs about eight times faster.

The bad news is that it utterly fails to download Chrome’s symbols.

Oops. Luckily I was able to diagnose and work around the problem.

Continue reading

Posted in uiforetw | 4 Comments

Vestibular Dysfunction, or, How I Went Half Deaf

Two weeks ago I had a severe inner-ear episode, presumed to be an infection. One moment I was 100% healthy and then, ten minutes later, I was deaf in one ear, with severe vertigo. The word ‘vertigo’ doesn’t quite capture the horrific nausea and vomiting that ensued as my lunch guest drove me to immediate care, nor does it capture the six uncomfortable days that followed.

I’m doing much better now and I thought I’d share the recovery timeline:

Continue reading

Posted in Fun, Unicycling | 15 Comments

Everything Old is New Again, and a Compiler Bug

“What’s an EXCEPTION_FLT_STACK_CHECK exception?” one of my coworkers said. I said “It’s a weird and rare crash. Why do you ask?”

It turns out that one of these weird and rare crashes had started showing up in Chrome (M54 branch, not the stable branch that consumers are running). We began looking at it together until I decided it made more sense to assign the bug to myself. Partly I did this because the crash analysis required some specialized knowledge that I happened to have, but mostly because I was curious and I thought that this bug was going to be interesting. I was not disappointed.

  • The crash was in an FPU that Chrome barely uses
  • The instruction that crashed Chrome was thousands of instructions away from the one that triggered the exception
  • The instruction that triggered the exception was not at fault
  • The crash only happened because of third-party code running inside of Chrome
  • The crash was ultimately found to be caused by a code-gen bug in Visual Studio 2015

Continue reading

Posted in Debugging, Floating Point | 27 Comments

ETW Flame Graphs Made Easy

A bit over three years ago I wrote about how to use flame graphs to visualize CPU Usage (Sampled) data from ETW, and a year ago I added flame graph support to UIforETW. However, these techniques are clumsy and slow and what I really wanted – what I asked for – was flame graph support in Windows Performance Analyzer (WPA), Microsoft’s ETW trace viewer.

And with the 10.0.14393 (Windows 10 Anniversary Edition) version of Windows Performance Toolkit (WPT) I finally got my wish! WPA can now natively display data as flame graphs, and it is good.

Disclaimer: WPT 10.0.14393 requires Windows 8 or above. If you install it on Windows 7 or below it will crash. UIforETW v1.42 will install the latest WPT on Windows 8 or above, but will install the previous version on Windows 7. If you want flame graphs on ‘ancient’ operating systems you’ll need to stick to the old-fashioned methods.

Continue reading

Posted in Performance, uiforetw, xperf | 21 Comments

Zeroing Memory is Hard (VC++ 2015 arrays)

Quick, what’s the difference between these two C/C++ definitions of initialized local variables?

char buffer[32] = { 0 };
char buffer[32] = {};

One difference is that the first is legal in C and C++, whereas the second is only legal in C++.

Okay, so let’s focus our attention on C++. What do these two definitions mean?
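As a starting point, here is a small sketch of what the C++ language rules promise for the two forms: either way, every element of the array ends up zero. The function names `BothFormsAreAllZero` and `Demo` are hypothetical, and the check says nothing about the code a particular compiler generates to do the zeroing.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Returns true if both initializer forms leave all 32 bytes zeroed.
bool BothFormsAreAllZero() {
  char first[32] = { 0 };  // First element set to 0, the rest zero-initialized.
  char second[32] = {};    // Empty braces: every element zero-initialized.
  auto is_zero = [](char c) { return c == 0; };
  return std::all_of(std::begin(first), std::end(first), is_zero) &&
         std::all_of(std::begin(second), std::end(second), is_zero);
}

// Hypothetical usage: the assertion holds for a conforming C++ compiler.
void Demo() {
  assert(BothFormsAreAllZero());
}
```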

Continue reading

Posted in Performance, Programming, Visual Studio | 22 Comments