I’ve recently started using heap snapshots on Windows to track heap allocations. I was able to use heap snapshots to record call stacks for all outstanding allocations in Chrome’s browser process over a full two weeks, letting me account for pretty much every byte of memory consumed.
Since then I have used heap snapshots to find wasteful memory usage in the Windows heap, a memory leak in a security tool injected into Chrome, and many details of Chrome’s memory usage that I was not previously aware of.
I first read about heap snapshots here. This page gives the mechanics of how to record a heap snapshot but it spends very little time explaining what heap snapshots are or how to use them effectively.
I seem to have a habit of writing about super powerful machines whose many cores are laid low by misuse of locks. So. Yeah. It’s that again.
But this one seems particularly impressive. I mean, how often do you have one thread spinning for several seconds in a seven-instruction loop while holding a lock that stops sixty-three other processors from running? That’s just awesome, in a horrible sort of way.
Contrary to popular belief I don’t actually have a machine with 64 logical processors, and I’ve never seen this particular problem. But a friend hit this problem, ~~nerd-sniped me~~ asked for help, and I decided it was interesting enough to look at. They sent me an ETW trace that contained enough information for me to craft a tweet-for-help which resolved the issue swiftly.
If we tax fossil fuels – making them more expensive – then the awesome power and creativity of the free market will create diverse alternatives and efficiencies with minimal additional government intervention. We will ultimately save money, be healthier, and slow the irreversible transformation of our climate.
Every year we extract billions of tons of hydrocarbons from the ground and from forests and burn them. Not surprisingly this has added hundreds of billions of tons of CO2 to the atmosphere and the oceans. CO2 in the atmosphere traps heat, and CO2 in the oceans makes them more acidic. Because of all this the glaciers and icecaps are melting, temperatures and ocean levels are rising, and corals are dying. Exxon’s scientists warned about this in 1982, but Exxon, like other oil companies, has continued funding climate-change denial. When the situation is bad enough to lead Bill Nye to drop the f-bomb, maybe we should pay attention.
So many possible introductions to this one:
- Windows 7: Sheesh, I sure am slow at creating processes
- Windows 10: Hold my beer…
Or how about:
- A) How long does CreateProcess take on Windows?
- B) How long would you like it to take?
- A) You mean you can make it as fast as I want?
- B) No, I can make it as *slow* as you want
O(n^2) algorithms that should be linear are the best.
Note that, despite breathless and click-baity claims to the contrary, the performance of Chrome and Chromium was never affected by this bug. Only Chromium’s tests were affected, and that slowdown has been 100% mitigated.
CFG ended up being a big part of this issue, and eight months earlier I had hit a completely unrelated CFG problem, written up here.
I often find odd performance issues all on my own, but sometimes they are given to me. So it was when I returned from vacation to find that I’d been CCed on an interesting looking bug. Vivaldi had reported “Unit test performance much worse on Win10 than Win7”. unit_tests were taking 618 seconds on Win10, but just 125 seconds on Win7.
Update, April 23, 2019: Microsoft received the initial “anomaly” report on the 15th, the repro steps on the 21st, and announced a fix today. Quick work!
By the time I looked at the bug it was suspected that CreateProcess running slowly was the problem. My first guess was that the problem was UserCrit lock contention caused by creating and destroying default GDI objects. Windows 10 made these operations far more expensive, I’d already written four blog posts about the issues that this causes, and it fit the symptoms adequately well.
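A quick way to sanity-check a suspicion like "process creation is slow" is to time it directly on both machines. The sketch below is only an illustration, not the method used in the investigation: the function name `time_process_creation` and the do-nothing child command are assumptions. It measures the whole spawn-run-exit cycle of a child Python interpreter, so the result includes interpreter startup and teardown, not just the cost of CreateProcess itself.

```python
import subprocess
import sys
import time


def time_process_creation(count=5):
    """Return the average wall-clock seconds to spawn one child and wait for it to exit."""
    # Hypothetical workload: a child Python interpreter that exits immediately.
    command = [sys.executable, "-c", "pass"]
    start = time.perf_counter()
    for _ in range(count):
        # On Windows, subprocess launches children through CreateProcess.
        subprocess.run(command, check=True)
    elapsed = time.perf_counter() - start
    return elapsed / count


if __name__ == "__main__":
    average = time_process_creation()
    print(f"average spawn-and-exit time: {average * 1000:.1f} ms")
```

Running the same script on a Win7 and a Win10 machine gives a rough comparison. An ETW trace is still needed to see where the time actually goes.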
Years ago I worked in the Xbox 360 group at Microsoft. We were thinking about releasing a new console, and we thought it would be nice if that console could run the games of the previous console.
Emulation is always hard, but it is made more challenging when your corporate masters keep changing CPU types. The Xbox one – sorry, the original Xbox – used an x86 CPU. The Xbox two – sorry, the Xbox 360 – used a PowerPC CPU. The Xbox three – sorry, the Xbox One – used an x86/x64 CPU. These ISA flip-flops did not make life easy.
I made some contributions to the team that taught the Xbox 360 how to emulate a lot of the original Xbox games – emulating x86 on PowerPC – and was given the job title Emulation Ninja for that work*. Then I was asked to help investigate what it would take to emulate the Xbox 360’s PowerPC CPU with an x64 CPU. To set expectations, I’ll mention up front that I didn’t find a satisfactory solution.
Last week I wrote about the performance consequences of inadvertently loading gdi32.dll into processes that are created and destroyed at very high rates. This week I want to share some techniques for digging deeper into this behavior, and the odd things that I found when trying to use them.
When I first wrote UIforETW I noticed that an inordinate amount of the size of the traces it recorded was coming from the Microsoft-Windows-Win32k provider. This provider records useful information about UI hangs and which window is active, but some less useful events were filling the trace buffers and crowding out the interesting ones. The most verbose events were the ExclusiveUserCrit, ReleaseUserCrit, and SharedUserCrit events and they were routinely generating 75% of the Microsoft-Windows-Win32k event traffic. So I stopped recording those events and forgot about them until quite recently. And that’s funny because those events record exactly the information that is needed for investigating all of these UI hangs – theoretically.