Making Windows Slower Part 2: Process Creation

Windows has long had a reputation for slow file operations and slow process creation. Have you ever wanted to make these operations even slower? This week’s blog post covers a technique you can use to make process creation on Windows grow slower over time (with no limit), in a way that will be untraceable for most users!

And, of course, this post will also cover how to detect and avoid this problem.

This issue is a real one that I encountered earlier this year, and this post explains how I uncovered the problem and found a workaround. Previous posts on making Windows slower include:

Making Windows Slower Part 0: Making VirtualAlloc arbitrarily slower
Making Windows Slower Part 1: Making file access slower

Noticing that something is wrong

I don’t go looking for trouble, but I sure seem to find it. Maybe it’s because I build Chrome from source hundreds of times over the weekend, or maybe I’m just born with it. I guess we’ll never know. For whatever reason, this post documents the fifth major problem that I have encountered on Windows while building Chrome.

Unplanned serialization that led to full-system UI hangs: 24-core CPU and I can’t move my mouse
Process handle leak in one of Microsoft’s add-ons to Windows: Zombie Processes are Eating your Memory
A long-standing correctness bug in the Windows file cache: Compiler bug? Linker bug? Windows Kernel bug
A performance glitch if you misuse file notifications: Making Windows Slower Part 1: File Access
And this one – an odd design decision that makes process creation slower over time

Tracking a rare crash

Computers should be reliable and predictable and I get annoyed when they aren’t. If I build Chrome a few hundred times in a row then I would like every build to succeed. So, when our distributed compiler process (gomacc.exe) would crash occasionally I wanted to investigate. I have automatic recording of crash dumps configured so I could see that the crashes happened when heap corruption was detected. A simple way of investigating that is to turn on pageheap so that the Windows heap puts each allocation on a separate page. This means that use-after-free and buffer overruns become instant crashes instead of hard-to-diagnose corruption. I’ve written about enabling pageheap using App Verifier before.

App Verifier causes your program to run more slowly, both because allocations are now more expensive and because the page-aligned allocations mean that your CPU’s cache is mostly neutered. So, I expected my builds to run a bit slower, but not too much, and indeed the build seemed to be running fine.

But when I checked in later the build seemed to have stopped. After about 7,000 build steps there was no apparent sign of progress.

O(n^2) is usually not okay

It turns out that Application Verifier likes to create log files. Never mind that nobody ever looks at these log files, it creates them just in case. And these log files need to have unique names. And I’m sure it seemed like a good idea to just give these log files numerically ascending names like gomacc.exe.0.dat, gomacc.exe.1.dat, and so on.

To get numerically ascending names you need to find what number you should use next, and the simplest way to do that is to just try the possible names/numbers until you find something that hasn’t been used. That is, try to create a new file called gomacc.exe.0.dat and if that already exists then try gomacc.exe.1.dat, and so on.

What’s the worst that could happen?

Actually, the worst is pretty bad

It turns out that if you do a linear search for an unused file name whenever you create a process then launching N processes takes O(N^2) operations. A good rule of thumb is that O(N^2) algorithms are too slow unless you can guarantee that N always stays quite small.
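To make the quadratic growth concrete, here is a minimal Python sketch of the naming scheme described above. It is not Application Verifier's actual code: the function names, the temporary directory, and the use of exclusive-create to test for existence are all assumptions made for illustration. It probes `<exe>.0.dat`, `<exe>.1.dat`, and so on until a name is free, and counts how many names it had to try.

```python
import os
import tempfile


def create_next_log(log_dir, exe_name):
    """Create the first unused <exe_name>.<n>.dat file in log_dir.

    Returns (path, probes), where probes is the number of names tried.
    Hypothetical helper: it mimics the naming scheme, nothing more.
    """
    n = 0
    while True:
        path = os.path.join(log_dir, f"{exe_name}.{n}.dat")
        try:
            # Mode "x" creates the file and fails if it already exists.
            with open(path, "x"):
                pass
            return path, n + 1
        except FileExistsError:
            n += 1  # Name taken; try the next number.


def total_probes(launches, exe_name="empty.exe"):
    """Simulate `launches` process starts and count every name tried."""
    with tempfile.TemporaryDirectory() as log_dir:
        return sum(create_next_log(log_dir, exe_name)[1] for _ in range(launches))


if __name__ == "__main__":
    for launches in (10, 100, 200):
        print(launches, total_probes(launches))
```

For 10, 100, and 200 launches this reports 55, 5050, and 20100 probes: launch number k has to try k names, so the total is N(N+1)/2, which is roughly (N^2)/2.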

Exactly how bad this will be depends on how long it takes to see if a file name already exists. I’ve since done measurements that show that in this context Windows seems to take about 80 microseconds (80 µs or 0.08 ms) to check for the existence of a file. Launching the first process is fast, but launching the 1,000th process requires scanning through the 1,000 log files that have already been created, and that takes 80 ms, and it keeps getting worse.

A typical build of Chrome requires running the compiler about 30,000 times. Each launch of the compiler requires scanning over the previously created N log files, at 0.08 ms for each existence check. The linear search for the next available log file name means that launching N processes takes (N^2)/2 file existence checks, so 30,000 * 30,000 / 2 which is 450 million. Since each file existence check takes 0.08 ms that’s 36 million ms, or 36,000 seconds. That means that my Chrome build, which normally takes five to ten minutes, was going to take an additional ten hours.
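The arithmetic is easy to check with a few lines of Python. The only inputs are the numbers quoted above (80 µs per existence check, 30,000 compiler launches); the function names are made up for this sketch, and it works in integer microseconds to avoid floating-point noise.

```python
CHECK_US = 80  # measured cost of one file-existence check, in microseconds


def launch_cost_ms(existing_logs, check_us=CHECK_US):
    """Time spent scanning existing log names for a single process launch."""
    return existing_logs * check_us / 1000


def build_overhead_seconds(launches, check_us=CHECK_US):
    """Total scanning time for `launches` launches, using the (N^2)/2 estimate."""
    checks = launches * launches // 2
    return checks * check_us / 1_000_000


if __name__ == "__main__":
    print(launch_cost_ms(1_000))                   # 80.0 ms for the 1,000th launch
    print(build_overhead_seconds(30_000))          # 36000.0 seconds
    print(build_overhead_seconds(30_000) / 3600)   # 10.0 hours
```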

Darn.

When writing this blog post I reproduced the bug by launching an empty executable about 7,000 times and I saw a nice O(n^2) curve like this:

Oddly enough, if you grab an ETW trace and just look at the average time to call CreateFile on these many different file names then the result – from beginning to end – suggests that it takes less than five microseconds per file (an average of 4.386 microseconds in the example below):

It looks like this just reveals a limitation of ETW’s file I/O tracing. The file I/O events only track the very lowest level of the file system, and there are many layers above Ntfs.sys, including FLTMGR.SYS and ntoskrnl.exe. However, the cost can’t hide entirely – the CPU time all shows up in the CPU Usage (Sampled) graph. The screenshot below shows a 548 ms time period, representing the creation of one process, mostly just scanning over about 6,850 possible log file names:

Would a faster disk help?

No.

The amount of data being dealt with is tiny, and the amount being written to disk is even tinier. During my tests to repro this behavior my disk was almost completely idle. This is a CPU bound problem because all of the relevant disk data is cached. And, even if the overhead were reduced by an order of magnitude it would still be too slow. You can’t make an O(N^2) algorithm be good.

Detection

You can detect this specific problem by looking in %userprofile%\appverifierlogs for .dat files. You can detect process creation slowdowns more generally by grabbing an ETW trace, and now you know one more thing to look for.
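If you would rather not eyeball the directory, a short script can do the counting. This is a sketch only: it assumes the log files follow the `<program>.<number>.dat` pattern described earlier, and the function name is hypothetical. Any program with thousands of entries is a likely victim.

```python
import os
from collections import Counter
from pathlib import Path


def count_logs_by_program(log_dir):
    """Count <program>.<number>.dat files in log_dir, grouped by program."""
    counts = Counter()
    for entry in Path(log_dir).glob("*.dat"):
        # "gomacc.exe.12.dat" -> program "gomacc.exe", index "12"
        program, _, index = entry.stem.rpartition(".")
        if program and index.isdigit():
            counts[program] += 1
    return counts


if __name__ == "__main__":
    # %userprofile%\appverifierlogs; falls back to the home directory
    # when USERPROFILE is not set (for example, when not on Windows).
    profile = os.environ.get("USERPROFILE", str(Path.home()))
    log_dir = Path(profile) / "appverifierlogs"
    if log_dir.is_dir():
        for program, count in count_logs_by_program(log_dir).most_common():
            print(f"{program}: {count} log files")
    else:
        print(f"No log directory at {log_dir}")
```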

The solution

The simplest solution is to disable the generation of the log files. This also stops your disk from filling up with GB of log files. You can do that with this command:

appverif.exe -logtofile disable

With log file creation disabled I found that my tracked processes started about three times faster (!) than at the beginning of my test, and the slowdown is completely avoided. This allows 7,000 Application Verifier monitored processes to be spawned in 1.5 minutes, instead of 40 minutes. With my simple test batch file and simple process I see these process-creation rates:

200 per second normally (5 ms per process)
75 per second with Application Verifier enabled but logging disabled (13 ms per process)
40 per second with Application Verifier enabled and logging enabled, initially… (25 ms per process, increasing to arbitrarily high limits)
0.4 per second after building Chrome once

Microsoft could fix this problem by using something other than a monotonically increasing log-file number. If they used the current date and time (to millisecond or higher resolution) as part of the file name then they would get log file names that were more semantically meaningful, and could be created extremely quickly with virtually no unique-file-search logic.
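A timestamp-based name might look something like the following Python sketch. The exact format is my own invention, and appending the process ID is an extra assumption (it guards against two processes starting within the same microsecond); the point is only that the name is computed directly, with no search over existing files.

```python
import os
from datetime import datetime


def timestamped_log_name(exe_name, now=None, pid=None):
    """Build a log file name from the current time instead of a counter.

    Hypothetical format: <exe>.<YYYYMMDD-HHMMSS-microseconds>.<pid>.dat
    """
    now = datetime.now() if now is None else now
    pid = os.getpid() if pid is None else pid
    stamp = now.strftime("%Y%m%d-%H%M%S-%f")  # %f is zero-padded microseconds
    return f"{exe_name}.{stamp}.{pid}.dat"


if __name__ == "__main__":
    print(timestamped_log_name("gomacc.exe"))
```

However many logs already exist, producing the name costs the same, and the names sort chronologically as a bonus.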

But, Application Verifier is not being maintained anymore, and the log files are worthless anyway, so just disable them.

Supporting information

The batch files and script to recreate this after enabling Application Verifier for empty.exe can be found here.

An ETW trace from around the end of the experiment can be found here.

The raw timing data used to generate the graph can be found here.

Reddit discussion is here.

Hacker news discussion is here.

Russian translation is here.

For other examples of O(n^2) algorithms going bad see Accidentally Quadratic.

For lighter fun see a video summary of my nineteen different September commute methods – I was too busy to blog that month.

7 Responses to Making Windows Slower Part 2: Process Creation

nzo says:

October 16, 2018 at 6:35 am

Nice story, thank you !

Alexander Riccio says:

October 16, 2018 at 4:18 pm

I assume you have NTFS 8.3 short file name creation turned off? From my days working on my fork of WinDirStat, I remember that it makes everything a million times slower and it’s a totally unreasonable feature to have these days anyways.

- brucedawson says:
  
  October 16, 2018 at 4:34 pm
  
  I don’t have short file name creation turned off, and the CPU usage data does suggest that it might be making things worse, maybe.
  
  But, I try to reduce the number of setup tweaks that I do, and given that turning off log files solves the problem perfectly I’ll stick with that, and not bother disabling short file name creation.
  
Ralph Corderoy says:

October 17, 2018 at 5:34 am

Hi Bruce,
> I saw a nice O(n^2) curve like this:
An empty img entity follows.
Cheers, Ralph.

- brucedawson says:
  
  October 17, 2018 at 9:11 am
  
  Woah – that’s weird. Thanks for the report. Fixed.
  
WindowsSucks says:

January 24, 2019 at 3:46 pm

Do you have a linkable source for “Application Verifier is not being maintained anymore”?

- brucedawson says:
  
  January 25, 2019 at 2:41 pm
  
  I can’t find one, but the impression I got (from discussions on twitter?) is that there was no work being done on it and the O(n^2) way of creating n logs was not going to be fixed.