Zombie Processes are Eating your Memory

Zombies probably won’t consume 32 GB of your memory like they did to me, but zombie processes do exist, and I can help you find them and make sure that developers fix them. Tool source link is at the bottom.

Is it just me, or do Windows machines that have been up for a while seem to lose memory? After a few weeks of use (or a long weekend of building Chrome over 300 times) I kept noticing that Task Manager showed me running low on memory, but it didn’t show the memory being used by anything. In the example below task manager shows 49.8 GB of RAM in use, plus 4.4 GB of compressed memory, and yet only 5.8 GB of page/non-paged pool, few processes running, and no process using anywhere near enough to explain where the memory had gone:

image

My machine has 96 GB of RAM – lucky me – and when I don’t have any programs running I think it’s reasonable to hope that I’d have at least half of it available.

Continue reading

Advertisements
Posted in Debugging, Investigative Reporting, Performance, Programming, Rants | Tagged , , | 27 Comments

What We Talk About When We Talk About Performance

Describing performance improvements exists at the intersection of mathematics and linguistics. It is quite common to use incorrect math to describe performance improvements, and it is possible to use incorrect, misleading, or just sub-optimal rhetoric to describe your math.

Consider this hypothetical press release:

AirTrain Inc. is proud to announce the new AirTrain-8000. This revolutionary new plane can fly from London to Seattle at an average speed of 7,700 km/h – a huge improvement over the 770 km/h of other jets. This drops the travel time from ten hours to just one, making the AirTrain-8000 90% faster than our competitors.

AirTrain-8000 takes 1 hour, other jets take 10 hoursA press release like this would never be released. The new plane is ten times faster than previous planes (7,700 km/h divided by 770 km/h), and no marketing team would allow this improvement to be summarized as “90% faster”, which means “almost twice as fast.” And yet, when talking about computers – where ten-times speedups happen fairly often – this mistake is made quite frequently.

This abuse of percentages has made them meaningless for describing optimizations – we need to stop using them. The AirTrain-8000 is ten times as fast, full stop.

Continue reading

Posted in Math, Performance, Rants | Tagged , , | 19 Comments

What Would Jesus’s Resume Say?

Some years ago I saw a set of resumes created by not-yet-graduated university students. They were all three to four pages long. This is too long, and I said to my daughter “Even Jesus Christ’s resume fits on one page”. His resume was then revealed to us during an epic road trip and I was inspired to share it with the world:

Continue reading

Posted in Fun | Tagged , , , | 13 Comments

Finding a CPU Design Bug in the Xbox 360

The recent reveal of Meltdown and Spectre reminded me of the time I found a related design bug in the Xbox 360 CPU – a newly added instruction whose mere existence was dangerous.

Back in 2005 I was the Xbox 360 CPU guy. I lived and breathed that chip. I still have a 30-cm CPU wafer on my wall, and a four-foot poster of the CPU’s layout. I spent so much time understanding how that CPU’s pipelines worked that when I was asked to investigate some impossible crashes I was able to intuit how a design bug must be their cause. But first, some background…

Continue reading

Posted in Debugging | Tagged , , , , | 65 Comments

Analyzing a Confusing Crash–Stack Walks Gone Bad

Part of my job always seems to include crash analysis. A program crashes on a customer’s machine, a minidump is uploaded to the cloud, and it might be my desk that it appears on when Monday morning rolls around. The expectation is that I can make sense of it so that we can ship more reliable software.

The Windows ecosystem makes this as easy as possible because the debuggers will automatically download the binaries and the debug symbols, and if source indexing has been applied then they will also automatically download the correct source files as needed.

Chrome has a public symbol server and uses source indexing so debugging Chrome is accessible for any developer.

Continue reading

Posted in Debugging | Tagged , | 8 Comments

Hey Synaptics, Can You Please Stop Polling!

TL;DR – be wary of buying a Lenovo laptop or any other laptop that uses a Synaptics touch pad until Synaptics ships a fixed driver. Their driver has a memory leak and they have a battery-life bug that causes Windows to repeat the same system-scan once a second. So far Synaptics has failed to respond so the wait for a fix must be assumed to be infinite.

Update: October 15, 2017: an updated driver for my laptop was released September 11, 2017, six days after I published this post. However I have yet to see the driver on Windows Update, Lenovo’s System Update, or in any communications from Lenovo or Synaptics. So, I don’t know if or how ‘normal’ customers are supposed to get this fix. I only found out through the kindness of a stranger.

Now for more details to back up my claims…

My new laptop has great battery life and I want to keep it that way so if I notice something consuming CPU time for no good reason I investigate. If it’s a web page with battery-wasting ads I close it, and if it’s a process that keeps doing background work I’ll kill it.

So, when I noticed that Task Manager said that WmiPrvSE.exe was consuming ~0.7% of my CPU time (~5.6% of a core) continuously I investigated.

Continue reading

Posted in Investigative Reporting, Uncategorized | 30 Comments

What is Windows *doing* while hogging that lock

Earlier this month I wrote about how Windows 10 holds a lock during too much of process destruction, which both serializes this task and causes mouse-cursor hitches and UI hangs (because the same lock is used for these UI tasks).

I thought I’d use this as an excuse to dig slightly deeper into what is going on using a clunky-but-effective ETW profiling technique. This technique shows that a 48 instruction loop is consuming a huge chunk of the CPU time while the lock is held – the 80/20 rule is alive and well. And, thanks to some discussion on hacker news I know have an idea of what that function does and why it got so much more expensive in Windows 10 Anniversary Edition.

Continue reading

Posted in Investigative Reporting, uiforetw, xperf | Tagged , , | 15 Comments