Earlier this month I wrote about how Windows 10 holds a lock during too much of process destruction, which both serializes this task and causes mouse-cursor hitches and UI hangs (because the same lock is used for these UI tasks).
I thought I’d use this as an excuse to dig slightly deeper into what is going on using a clunky-but-effective ETW profiling technique. This technique shows that a 48-instruction loop is consuming a huge chunk of the CPU time while the lock is held – the 80/20 rule is alive and well. And, thanks to some discussion on Hacker News, I now have an idea of what that function does and why it got so much more expensive in Windows 10 Anniversary Edition.
This story begins, as they so often do, when I noticed that my machine was behaving poorly. My Windows 10 work machine has 24 cores (48 hyper-threads) and they were 50% idle. It has 64 GB of RAM and that was less than half used. It has a fast SSD that was mostly idle. And yet, as I moved the mouse around it kept hitching – sometimes locking up for seconds at a time.
So I did what I always do – I grabbed an ETW trace and analyzed it. The result was the discovery of a serious process-destruction performance bug in Windows 10.
I just got a new laptop. My old machine was more than six years old so it was probably overdue. I wanted to record some of the reasons for the upgrade, and the process, if only for myself, so here we go.
One would think that the main reason to upgrade a six-year-old laptop would be the hardware (bigger, faster, etc.), but it turns out that software was as big a factor.
I’ve written in the past about how to compare floating-point numbers for the common scenario where two results should be similar but may not be identical. In that scenario it is reasonable to use an AlmostEqual function for comparisons. But there are actually cases where floating-point math is guaranteed to give perfect results, or at least perfectly consistent results. When programmers treat floating-point math as a dark art that can return randomly wrong results then they do themselves (and the IEEE-754 committee) a disservice.
A common example given is that in IEEE floating-point math 0.1 + 0.2 does not equal 0.3. This is true. However, this “odd” behavior is then extrapolated in some ill-defined way to suggest that all floating-point math is wrong, in unpredictable ways. The linked discussion then used one of my blog posts to justify their incorrect analysis – hence this article.
In fact, IEEE floating-point math gives specific guarantees, and when you can use those guarantees you can sometimes make strong conclusions about your results. Failing to do so leads to a cascade of uncertainty in which any outcome is possible, and analysis is impossible.
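A rough sketch of the distinction, in Python (whose floats are IEEE doubles on mainstream platforms). The `almost_equal` helper is a hypothetical stand-in built on `math.isclose`, not the AlmostEqual function from the earlier post, and its tolerance is an arbitrary assumption. The point is only that 0.1 + 0.2 is inexact but repeatable, while other sums are exact by construction:

```python
import math


def almost_equal(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Hypothetical tolerance-based comparison (assumed tolerance)."""
    return math.isclose(a, b, rel_tol=rel_tol)


# 0.1, 0.2, and 0.3 have no exact binary representation, so the rounded
# sum is not the same double as the rounded literal 0.3.
sum_is_exact = (0.1 + 0.2 == 0.3)
sum_is_close = almost_equal(0.1 + 0.2, 0.3)

# The result is still deterministic: the same operation on the same
# inputs gives the same bits every time.
sum_is_repeatable = (0.1 + 0.2 == 0.1 + 0.2)

# When the inputs and the true result are all representable, IEEE
# arithmetic returns the exact answer.
dyadic_is_exact = (0.5 + 0.25 == 0.75)

# Integers are exact in a double up to 2**53; beyond that, adding 1.0
# can round back to the same value.
limit = float(2**53)
integers_exact = (float(2**53 - 1) + 1.0 == limit)
beyond_limit_rounds = (limit + 1.0 == limit)

print(sum_is_exact, sum_is_close, sum_is_repeatable)
print(dyadic_is_exact, integers_exact, beyond_limit_rounds)
```

Only the first comparison comes out false; the rest are true, and for reasons that can be worked out in advance rather than guessed at.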
I’m lucky enough to live just 2 km (1.25 miles) away from the place where I work. Because of this – and because I dislike driving – I tend to commute in a variety of non-car ways. A few months into my new job I noticed that I tended to use about six different commute methods on a regular basis: walking, running, cycling, unicycling, inline skating, and taking a bus. Having that many commute methods got me thinking: how many commute methods could I come up with? Could I commute to work using a different method every work day for a month?
And so was born the commute challenge. After much procrastination I tried this challenge in April 2017. One month, twenty work days, twenty different commute methods.
In 1992, near the end of a very slow trip around the world, my fiancée (now wife) and I visited Iran for two weeks. I thought I’d share a few pictures and stories from that visit.
We entered Iran through the Taftan land crossing from Pakistan, having taken an all-night bus trip from Quetta. I don’t mean to criticize Pakistan by the description that follows – I am merely setting up our first impressions of Iran.
The bus trip was horrible. Imagine, if you will, a twelve-hour trip in a school bus – famed for their comfortable seats and smooth suspension – on a road that was almost entirely washboard and potholes. We quite enjoyed Pakistan (it was safer then) but that bus trip was not one of our favorite bits.
In the previous episode of “Simple Changes to Shrink Chrome” I discussed how deleting ‘const’ from a few key locations could lead to dramatic size savings, due to a VC++ compiler quirk. In this episode I’ll show how deleting an inline function definition can lead to similar savings.
The savings this time are less important as they are mostly in the .BSS segment, but there are also some modest code-size savings, and some interesting lessons. It’s worth confessing up front that this time the problem being solved was not caused by a compiler quirk – it was entirely self-inflicted.
Doing this investigation has reminded me that the behavior of linkers is best described by chaos theory – the details of their behavior defy prediction by simple heuristics, and the results can be changed dramatically by tiny code changes.