While investigating some performance mysteries in Chrome I discovered that Microsoft had parallelized how they zero memory, and in some cases this was making it a lot slower. This slowdown may be mitigated in Windows 11 but in the latest Windows Server editions – where it matters most – this bug is alive and well.
The good news is that this issue seems to only apply to machines with a lot of processors. By “a lot” of processors I mean probably 96 or more. So, your laptop is fine. And, even the 96-processor machines may not hit this problem all the time. But I’ve now found four different ways to trigger this inefficiency and when it is hit – oh my. The CPU power wasted is impressive.
Okay – time for the details.
In 2004 I was working for Microsoft in the Xbox group, and a new console was being created. I got a copy of the detailed descriptions of the Xbox 360 CPU and I read it through multiple times and suddenly I’d learned enough to become the official CPU expert. That meant I started having regular meetings with the hardware engineers who were working with IBM on the CPU which gave me even more expertise on this CPU, which was critical in helping me discover a design flaw in one of its instructions, and in helping game developers master this finicky beast.
It was literally the day after I cracked the __FILE__ determinism bug that I hit a completely different build determinism issue. I was asked to investigate why the Chrome build number reported for Chrome crashes on Windows 11 was lagging behind what was reported by winver. For example, Chrome crashes on 10.0.22000.376 were being reported as happening on 10.0.22000.318. After some code spelunking I found that crashpad retrieves the Windows version number from kernel32.dll, so I focused on that.
Aside: crashpad grabs the Windows version number from kernel32.dll instead of using GetVersionExW (which is deprecated, BTW) because the GetVersion* functions will frequently lie about the Windows version for compatibility reasons. For crash reporting we really want the actual-no-lies-we-can-handle-the-truth version number, and kernel32.dll used to be the best way to get this.
That’s when things got weird.
‘Twas the week before Christmas and I ran across a deterministic-build bug. And then another one. One was in Chromium, and the other was in Microsoft Windows. It seemed like a weird coincidence so I thought I’d write about both of them (the second one can be found here).
A deterministic build is one where you get the same results (bit identical intermediate and final result files) whenever you build at the same commit. There are varying levels of determinism (are different directories allowed? different machines?) that can increase the level of difficulty, as described in this blog post. Deterministic builds can be quite helpful because they allow caching and sharing of build results and test results, thus reducing test costs and giving various other advantages.
ETW is the best way to analyze performance on Windows, and Windows Performance Analyzer (WPA) has been the preferred tool for analyzing ETW traces for ten years now, generally obtained either by running UIforETW or by getting it from the Windows 10 SDK. However the SDK version was not updated for a long time.
Starting in 2018 WPA has been available from the Microsoft store and the hope was that this version would be updated more frequently, however this version was also not updated for a long time.
Three years ago I found a 32 GB memory leak caused by CcmExec.exe failing to close process handles. That bug is fixed, but ever since then I have had the handles column in Windows Task Manager enabled, just in case I hit another handle leak.
Because of this routine checking I noticed, in February of 2021, that one of Chrome’s processes had more than 20,000 handles open!
This Chrome bug is fixed now but I wanted to share how to investigate handle leaks because there are other leaky programs out there. I also wanted to share my process of learning.
Near the end of January I was pointed to a twitter thread where a Windows user with a powerful machine was hitting random hangs in explorer. Lots of unscientific theories were being proposed. I don’t generally do random analysis of strangers’ performance problems but the case sounded interesting so I thought I’d take a look.
The behavior of the Windows scheduler changed significantly in Windows 10 2004 (aka, the April 2020 version of Windows), in a way that will break a few applications, and there appears to have been no announcement, and the documentation has not been updated. This isn’t the first time this has happened, but this change seems bigger than last time. So far I have found three programs that hit problems because of this silent change.
This post was an accidental duplicate. The content has been deleted from this copy. You can find the (updated) original here.
WhatsApp has served me well as a communications medium for my family, but I was never thrilled with its ownership by Facebook, and the recently announced privacy changes made it necessary for me to move on.
I was inspired by the release of Apple’s M1 ARM processor to tweet about the perils of lock-free programming which led to some robust discussion. The discussion went pretty well given the inanity of trying to discuss something as complicated as CPU memory models in the constraints of tweets, but it still left me wanting to expand slightly on the topic in blog form.
This is intended to be a casual introduction to the perils of lock-free programming (which I last wrote about some fifteen years ago), but also some explanation of why ARM’s weak memory model breaks some code, and why that code was probably broken already. I also want to explain why C++11 made the lock-free situation strictly better (objections to the contrary notwithstanding).