I just completed a series of changes that shrank the Chrome browser’s on-disk size on Windows by over a megabyte, moved about 500 KB of data from its read/write data segments to its read-only data segments, and reduced its private working set by about 200 KB per process. The amusing thing about this series of changes is that they consisted entirely of removing const from some places, and adding const to others. Compilers be weird.
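As a rough illustration of why const can matter for data placement, here is a minimal sketch. The table names and the helper function are hypothetical, invented for this example and not taken from Chrome. The general idea is that a const object with a constant initializer is a candidate for a read-only data segment, while a non-const object has to live in a read/write segment. Exactly where each object ends up depends on the compiler and linker.

```cpp
#include <cstddef>

// Hypothetical lookup tables, invented for illustration only.

// Not const: the program is allowed to modify this array, so it has to be
// placed in a read/write data segment.
int g_mutable_table[] = {1, 2, 3, 4};

// const with a constant initializer: a compiler may place this in a
// read-only data segment, where its pages can be shared between processes.
// Actual placement is implementation-specific.
const int kReadOnlyTable[] = {1, 2, 3, 4};

// Sums the first `count` entries of either table.
int SumTable(const int* table, std::size_t count) {
  int sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    sum += table[i];
  }
  return sum;
}
```

Both tables behave identically from the caller's point of view; the difference only shows up in the section layout of the binary, which tools such as a linker map file or a section dump can reveal.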
This was originally written in the summer of 2005. I’m reposting it here to get all my writing in one place. For more information on long-distance and high-speed unicycling, see the unicycle section of my blog.
In the spring of 2004 I purchased a Coker unicycle. This unicycle has a 36″ diameter tire and a reinforced air seat with hand grips, and was designed for riding long distances. I started riding my new unicycle to work (an eight-mile commute each way), and on some twenty- to thirty-mile rides with other Seattle-area unicyclists. Around the same time I met Lars Clausen and read his book “One Wheel – Many Spokes”, about his unicycle ride across the United States. These events conspired to get me thinking about doing the STP (Seattle to Portland bike classic) on a unicycle.
The STP is a two-day, 204-mile ride that draws 8,000 cyclists every year. A few people – notably Jack Hughes – have done it before on a unicycle. Other unicyclists have done STP-level distances in a single day, so clearly doing it in two days was quite possible, but not easy.
Microsoft’s VC++ compiler has an option to generate instructions for new instruction sets such as AVX and AVX2, which can lead to more efficient code when running on compatible CPUs. So, an obvious tactic is to compile critical math-heavy functions twice, once with and once without /arch:AVX (or whatever instruction set you want to optionally support).
It seems like a good idea, and it’s been used in various forms for years, but it’s devilishly difficult to do safely. It usually works, but guaranteeing that is trickier than I had realized.
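The "compile it twice and pick one at run time" tactic can be sketched roughly as follows. All function names here are hypothetical. In a real build the two implementations would live in separate source files, with only one of them compiled with /arch:AVX; in this sketch they are ordinary functions so that the example stays self-contained. It shows only the dispatch step and does nothing about the safety problems mentioned above.

```cpp
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#include <immintrin.h>
#endif

// Hypothetical baseline build of the math-heavy function.
float SumBaseline(const float* v, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += v[i];
  return sum;
}

// Hypothetical stand-in for the same source compiled with /arch:AVX in a
// separate translation unit. Here it is just a second copy.
float SumAvxBuild(const float* v, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += v[i];
  return sum;
}

// Returns true if both the CPU and the OS support AVX.
bool CpuSupportsAvx() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  int regs[4];
  __cpuid(regs, 1);
  const bool osxsave = (regs[2] & (1 << 27)) != 0;  // ECX bit 27
  const bool avx = (regs[2] & (1 << 28)) != 0;      // ECX bit 28
  if (!osxsave || !avx) return false;
  // The OS must also save and restore XMM and YMM state (XCR0 bits 1, 2).
  return (_xgetbv(0) & 6) == 6;
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx") != 0;
#else
  return false;  // Unknown platform: assume no AVX.
#endif
}

using SumFn = float (*)(const float*, std::size_t);

// Picks the implementation that matches the CPU we are running on.
SumFn ChooseSum() {
  return CpuSupportsAvx() ? SumAvxBuild : SumBaseline;
}

// Public entry point: the choice is made once, on first use.
float Sum(const float* v, std::size_t n) {
  static const SumFn fn = ChooseSum();
  return fn(v, n);
}
```

Callers only ever see `Sum`, so the rest of the program does not need to know which build of the function was selected.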
TL;DR – I can finally record CPU performance counters for processes on Windows.
I’m mostly a Windows developer but I’ll occasionally fire up my Linux box to use the perf tool to examine CPU performance counters. Sometimes you really need to see how many cache misses or branch mispredicts your code is causing, and Windows has been curiously hostile to this endeavor.
Some time ago Windows gained the ability to record CPU performance counters from within ETW events, but (so the story goes) there was no way to enable it. Then the ability to enable this feature was added, but there was virtually no documentation.
There’s good news, and there’s bad news.
The good news is that the latest Windows Performance Analyzer (WPA), the visualization tool for ETW (Event Tracing for Windows) traces, can now load symbols faster than ever before – it’s multi-threaded, and it scans huge PDBs about eight times faster.
The bad news is that it utterly fails to download Chrome’s symbols.
Update: the Creators Update version of WPA fixes this bug and the latest UIforETW automatically installs it.
Oops. Luckily I was able to diagnose and work around the problem.
Two weeks ago I had a severe inner-ear episode, presumed to be an infection. One moment I was 100% healthy and then, ten minutes later, I was deaf in one ear, with severe vertigo. The word ‘vertigo’ doesn’t quite capture the horrific nausea and vomiting that ensued as my lunch guest drove me to immediate care, nor does it capture the six uncomfortable days that followed.
I’m doing much better now and I thought I’d share the recovery timeline:
“What’s an EXCEPTION_FLT_STACK_CHECK exception?” one of my coworkers said. I said “It’s a weird and rare crash. Why do you ask?”
It turns out that one of these weird and rare crashes had started showing up in Chrome (M54 branch, not the stable branch that consumers are running). We began looking at it together until I decided it made more sense to assign the bug to myself. Partly I did this because the crash analysis required some specialized knowledge that I happened to have, but mostly because I was curious and I thought that this bug was going to be interesting. I was not disappointed.
- The crash was in an FPU that Chrome barely uses
- The instruction that crashed Chrome was thousands of instructions away from the one that triggered the exception
- The instruction that triggered the exception was not at fault
- The crash only happened because of third-party code running inside of Chrome
- The crash was ultimately found to be caused by a code-gen bug in Visual Studio 2015