I tend to launch most programs on my Windows 10 laptop by pressing the <Win> key, typing a few letters of the program name, and then hitting enter. On my powerful laptop (SSD and 32 GB of RAM) this usually takes no longer than typing those characters: just a fraction of a second.
Usually.
Last month I wrote about an odd crash that was hitting a few Chrome users. Something was corrupting the XMM7 register and that was causing Chrome to crash. We fixed a couple of bugs in Chrome and we were able to contact the third-party company whose software was causing the problems. They released a fixed version, and I assumed that my work was done.
However, instead of a gradual decline in the rate of this crash I saw a gradual increase. Apparently enterprise software updates roll out extremely slowly, and users were installing the old buggy version faster than they were updating to the fixed version. This situation will resolve itself eventually, but a lot of crashes were going to happen in the meantime. With the proper fix moving slowly through the pipelines, I decided to try to hack in a decidedly improper fix.
“Hey, you. Yes you, that function over there. When you’re cleaning up please remember to restore all of my registers. Yes, that one too – what do you think this is, Linux?”
That’s the problem I was dealing with in a nutshell. Functions are required by a platform’s ABI (Application Binary Interface) to preserve certain registers – restoring them if they were used – but the set of registers that must be restored varies between platforms, and the rules on Linux are different from those on Windows. That may be why I encountered register corruption in Chrome on Windows. But let’s take a step back.
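The difference between the two sets of rules can be written down as data. The following Python sketch encodes the callee-saved ("nonvolatile") registers of the Windows x64 calling convention and of the System V AMD64 ABI used on Linux. It shows that XMM7 must be preserved under Windows rules but not under Linux rules. It ignores flags, MXCSR, and x87 control state, and the helper names are made up for illustration.

```python
# Callee-saved ("nonvolatile") general-purpose and XMM registers on x86-64.
# Flags, MXCSR, and x87 control state are deliberately left out of this sketch.
WINDOWS_X64_NONVOLATILE = frozenset(
    ["rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"]
    + [f"xmm{i}" for i in range(6, 16)]  # XMM6-XMM15 must survive a call on Windows
)

SYSV_AMD64_CALLEE_SAVED = frozenset(
    ["rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"]  # no XMM registers at all
)

ABIS = {"windows": WINDOWS_X64_NONVOLATILE, "sysv": SYSV_AMD64_CALLEE_SAVED}


def must_preserve(abi: str, reg: str) -> bool:
    """Return True if a function following `abi` must restore `reg` before returning."""
    return reg.lower() in ABIS[abi]


def clobber_risk() -> frozenset:
    """Registers a Windows caller expects to survive a call, but that code
    written with System V (Linux) rules in mind would feel free to overwrite."""
    return WINDOWS_X64_NONVOLATILE - SYSV_AMD64_CALLEE_SAVED


print(must_preserve("windows", "xmm7"))  # True
print(must_preserve("sysv", "xmm7"))     # False
```

The set difference returned by `clobber_risk()` is RDI, RSI, and XMM6 through XMM15: the registers where a function that follows the Linux rules can silently break a Windows caller.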
I apologize for this title because there are many things that can make modern software slow. Blindly applying one explanation without a bit of investigation is the software equivalent of a cargo cult. That said, this post describes one example of why modern software can be painfully slow.
All I wanted was to record a forty-second voiceover for a throw-away video, so I fired up the Windows Voice Recorder app and hit the record button. Nothing seemed to happen.
While investigating some performance mysteries in Chrome, I discovered that Microsoft had parallelized how Windows zeroes memory, and that in some cases this was making it a lot slower. This slowdown may be mitigated in Windows 11, but in the latest Windows Server editions – where it matters most – this bug is alive and well.
The good news is that this issue seems to apply only to machines with a lot of processors. By “a lot” I mean probably 96 or more. So, your laptop is fine. And even the 96-processor machines may not hit this problem all the time. But I’ve now found four different ways to trigger this inefficiency, and when it is hit – oh my. The CPU power wasted is impressive: I estimate that memory zeroing is using about 24x the CPU time it should.
Okay – time for the details.
In 2004 I was working for Microsoft in the Xbox group, and a new console was being created. I got a copy of the detailed descriptions of the Xbox 360 CPU, read it through multiple times, and suddenly knew enough to become the official CPU expert. That meant regular meetings with the hardware engineers who were working with IBM on the CPU, which gave me even more expertise. That expertise was critical in helping me discover a design flaw in one of its instructions, and in helping game developers master this finicky beast.
It was literally the day after I cracked the __FILE__ determinism bug that I hit a completely different build determinism issue. I was asked to investigate why the Windows build number reported for Chrome crashes on Windows 11 was lagging behind what was reported by winver. For example, Chrome crashes on 10.0.22000.376 were being reported as happening on 10.0.22000.318. After some code spelunking I found that crashpad retrieves the Windows version number from kernel32.dll, so I focused on that.
Aside: crashpad grabs the Windows version number from kernel32.dll instead of using GetVersionExW (which is deprecated, BTW) because the GetVersion* functions will frequently lie about the Windows version for compatibility reasons. For crash reporting we really want the actual-no-lies-we-can-handle-the-truth version number, and kernel32.dll used to be the best way to get this.
That’s when things got weird.
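A DLL's version resource stores its file version in the dwFileVersionMS and dwFileVersionLS fields of a VS_FIXEDFILEINFO structure, each a 32-bit value holding two 16-bit parts. The Python sketch below assumes the version is read that way (it is not a copy of crashpad's code). It decodes those two fields and compares four-part versions numerically. The sample field values are constructed for illustration, not read from a real kernel32.dll.

```python
def decode_file_version(ms: int, ls: int) -> str:
    """Turn VS_FIXEDFILEINFO's dwFileVersionMS/dwFileVersionLS into a dotted string."""
    major, minor = ms >> 16, ms & 0xFFFF      # HIWORD / LOWORD of dwFileVersionMS
    build, revision = ls >> 16, ls & 0xFFFF   # HIWORD / LOWORD of dwFileVersionLS
    return f"{major}.{minor}.{build}.{revision}"


def parse_version(text: str) -> tuple:
    """Split "10.0.22000.376" into (10, 0, 22000, 376) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def lags_behind(reported: str, actual: str) -> bool:
    """True if the reported version is older than the actual one."""
    return parse_version(reported) < parse_version(actual)


# Hypothetical field values that decode to the version from the example above.
reported = decode_file_version(0x000A0000, 0x55F0013E)
print(reported)                                  # 10.0.22000.318
print(lags_behind(reported, "10.0.22000.376"))   # True
```

Comparing tuples of integers rather than strings matters here: as strings, "10.0.22000.99" would sort after "10.0.22000.318".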
’Twas the week before Christmas and I ran across a deterministic-build bug. And then another one. One was in Chromium, and the other was in Microsoft Windows. It seemed like a weird coincidence, so I thought I’d write about both of them (the second one is covered in a separate post).
A deterministic build is one where you get the same results (bit-identical intermediate and final result files) whenever you build at the same commit. There are varying levels of determinism (are different directories allowed? different machines?) that can increase the level of difficulty, as described in a separate blog post. Deterministic builds can be quite helpful because they allow caching and sharing of build results and test results, which reduces testing costs, among other advantages.
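A simple way to check the bit-identical property is to build the same commit twice, into two output directories, and compare every file. The Python sketch below does that by hashing both directory trees. The function names and directory layout are hypothetical. The sketch only detects which files differ; it does not explain why.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict:
    """Map each file's path (relative to root, with / separators) to its SHA-256."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_digests(a: dict, b: dict) -> list:
    """Files missing from one build, or whose contents differ between builds."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


def nondeterministic_files(build_a: Path, build_b: Path) -> list:
    """Compare two output directories built from the same commit."""
    return diff_digests(hash_tree(build_a), hash_tree(build_b))
```

An empty list means the two builds were bit-identical. Building the second copy in a different directory, or on a different machine, tests the stricter levels of determinism.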
ETW is the best way to analyze performance on Windows, and Windows Performance Analyzer (WPA) has been the preferred tool for analyzing ETW traces for ten years now. WPA is generally obtained either by running UIforETW or by installing it from the Windows 10 SDK. However, the SDK version was not updated for a long time.
Starting in 2018, WPA has also been available from the Microsoft Store, and the hope was that this version would be updated more frequently. However, this version was also not updated for a long time.
Three years ago I found a 32 GB memory leak caused by CcmExec.exe failing to close process handles. That bug is fixed, but ever since then I have had the handles column in Windows Task Manager enabled, just in case I hit another handle leak.
Because of this routine checking I noticed, in February of 2021, that one of Chrome’s processes had more than 20,000 handles open!
This Chrome bug is fixed now, but I wanted to share how to investigate handle leaks because there are other leaky programs out there. I also wanted to share my process of learning.
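Noticing a leak by watching Task Manager's handles column comes down to spotting a handle count that is both unusually high and still growing. The Python sketch below turns that habit into a heuristic over periodic samples for one process. How the samples are collected is left out. The thresholds are hypothetical values chosen for illustration, not limits defined by Windows or Chrome. The heuristic only flags candidates; finding the code that leaks still takes the kind of investigation described here.

```python
from typing import Sequence


def looks_like_handle_leak(samples: Sequence[int], threshold: int = 10_000,
                           min_growth: int = 1_000) -> bool:
    """Heuristic: flag a process whose handle count is high and still climbing.

    `samples` are handle counts for one process taken at regular intervals.
    `threshold` and `min_growth` are arbitrary illustrative values.
    """
    if len(samples) < 2:
        return False
    # A leak never gives handles back, so the count should never drop.
    never_drops = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    grew_enough = samples[-1] - samples[0] >= min_growth
    return samples[-1] >= threshold and never_drops and grew_enough


print(looks_like_handle_leak([15_000, 17_500, 20_500]))  # True
print(looks_like_handle_leak([900, 950, 1_000]))         # False
```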