24-core CPU and I can’t move my mouse

This story begins, as they so often do, when I noticed that my machine was behaving poorly. My Windows 10 work machine has 24 cores (48 hyper-threads) and they were 50% idle. It has 64 GB of RAM and that was less than half used. It has a fast SSD that was mostly idle. And yet, as I moved the mouse around it kept hitching – sometimes locking up for seconds at a time.

Update July 27, 2017: a follow-up post dissects the problem and finds the root cause.

Update Oct 29, 2017: a video showing how to see if the bug is fixed can be found here, and the bug is fixed in the 17025 insider preview builds.

Update Nov 20, 2017: the fix has made it to Creators Update (RS2) which means I can now build Chrome without encountering micro-hangs!

Update, March 22, 2018: the fix has made it to Fall Creators Update (RS3), finally, so the fix is now everywhere.

Update, August 17, 2018: analysis of an unrelated bug that also caused UI hangs due to lock contention is available here.

Update, September, 2020: the original issue was UserCrit contention while many processes were destroyed, whether they had gdi32.dll loaded or not. That issue was fixed but if many processes are destroyed that do have gdi32.dll loaded then the issue still happens. That is, the mouse hitches and the horrible performance are still a risk. Maybe someday Microsoft will fix that, but probably not. For details see this blog post.

So I did what I always do – I grabbed an ETW trace and analyzed it. The result was the discovery of a serious process-destruction performance bug in Windows 10.

The ETW trace showed UI hangs in multiple programs. I decided to investigate a 1.125 s hang in Task Manager:

UI Delays graph in WPA

In the image below you can see CPU usage for the system during the hang, grouped by process name – notice that total CPU usage rarely goes above 50%:

CPU Usage grouped by process name

The CPU Usage (Precise) table showed that Task Manager’s UI thread was repeatedly blocked on calls to functions like SendMessageW, apparently waiting on a kernel critical region (the kernel-mode version of a critical section), deep in the call stack in win32kbase.sys!EnterCrit (not shown):

CPU Usage (Precise) showing where TaskMgr.exe was blocked

I manually followed the wait chain through a half-dozen processes to see who was hogging the lock. My notes from the analysis look something like this:

Taskmgr.exe (72392) hung for 1.125 s (MsgCheckDelay) on thread 69,196. Longest delay was 115.6 ms on win32kbase.sys!EnterCrit, readied by conhost.exe (16264), thread 2560 at 3.273101862. conhost.exe (16264), 2560 was readied at 3.273077782 after waiting 115,640.966 µs, by mstsc.exe (79392), 71272. mstsc.exe was readied (same time, same delay) by TabTip.exe (8284), 8348, which was readied by UIforETW.exe (78120), 79584, which was readied by conhost.exe (16264), 58696, which was readied by gomacc.exe (93668), 59948, which was readied by gomacc.exe (95164), 76844.

I had to keep going because most of the processes were releasing the lock after holding it for just a few microseconds. But eventually I found several processes (the gomacc.exe processes) that looked like they were holding the lock for a few hundred microseconds. Or, at least, they were readied by somebody holding the lock and then a few hundred microseconds later they readied somebody else by releasing the lock. These processes were all releasing the lock from within NtGdiCloseProcess.

I was tired of manually following these wait chains so I decided to see if the same readying call stack was showing up a lot of times. I did that by dragging the Ready Thread Stack column to the left and searching the column for NtGdiCloseProcess. I then used WPA’s View Callers-> By Function option to show me all of the Ready Thread Stacks that went through that function – in this view the stack roots are at the bottom:

CPU Usage (Precise) showing all readying by NtGdiCloseProcess

There were 5,768 context switches where NtGdiCloseProcess was on the Ready Thread Stack, each one representing a time when the critical region was released. The threads readied on these call stacks had been waiting a combined total of 63.3 seconds – pretty impressive for a 1.125 second period! And, if each of these readying events happened after the thread had held the lock for just 200 microseconds then the 5,768 readying events would be enough to account for the 1.125 second hang.
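The arithmetic in that paragraph can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 200-microsecond hold time is the assumption stated above, not a measured value, and the function names are made up for illustration.

```python
# Figures quoted in the text above.
READY_EVENTS = 5768            # context switches with NtGdiCloseProcess on the Ready Thread Stack
HANG_SECONDS = 1.125           # length of the Task Manager hang
TOTAL_WAIT_SECONDS = 63.3      # combined waiting time of the readied threads
ASSUMED_HOLD_SECONDS = 200e-6  # ASSUMPTION from the text: 200 microseconds per lock hold


def lock_held_seconds(events, hold_seconds):
    """Total time the lock is held if every readying event follows one hold."""
    return events * hold_seconds


def average_waiters(total_wait_seconds, window_seconds):
    """Average number of threads blocked on the lock during the window."""
    return total_wait_seconds / window_seconds


held = lock_held_seconds(READY_EVENTS, ASSUMED_HOLD_SECONDS)
waiters = average_waiters(TOTAL_WAIT_SECONDS, HANG_SECONDS)
print(f"lock held for about {held:.2f} s during a {HANG_SECONDS} s hang")
print(f"about {waiters:.0f} threads waiting on average")
```

Under that assumption the lock is held for roughly 1.15 seconds, which covers the whole hang, and the 63.3 seconds of combined waiting works out to roughly 56 threads queued on the lock at any moment.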

I’m not familiar with this part of Windows but the combination of PspExitThread and NtGdiCloseProcess made it clear that this behavior was happening during process exit.

This was happening during a build of Chrome, and a build of Chrome creates a lot of processes. I was using our distributed build system which means that these processes were being created – and destroyed – quite quickly.

The next step was to find out how much time was being spent inside of NtGdiCloseProcess. So I moved to the CPU Usage (Sampled) table in WPA and got a butterfly graph, this time of callees of NtGdiCloseProcess. You can see from the screen shot below that over a 1.125 s period there was, across the entire system, about 1085 ms of time spent inside of NtGdiCloseProcess, representing 96% of the wall time:

CPU Usage (Sampled) data showing how much time was spent inside of NtGdiCloseProcess

Anytime you have a lock that is held more than 95% of the time by one function you are in a very bad place – especially if that same lock must be acquired in order to call GetMessage or update the mouse position. In order to experiment better I wrote a test program that creates 1,000 processes as quickly as possible, waits half a second, and then tells all of the processes to exit simultaneously. The CPU usage of this test program on my four-core eight-thread home laptop, grouped by process name, can be seen below:

Left block is process creation, devil horns to the right are process destruction

Well, what do you know. Process creation is CPU bound, as it should be. Process shutdown, however, is CPU bound at the beginning and the end, but there is a long period in the middle (about a second) where it is serialized – using just one of the eight hyperthreads on the system, as 1,000 processes fight over a single lock inside of NtGdiCloseProcess. This is a serious problem. This period represents a time when programs will hang and mouse movements will hitch – and sometimes this serialized period is several seconds longer.
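For readers who want to try something similar, the shape of such a test can be sketched in Python. This is not the ProcessCreateTests code. It is a hypothetical stand-in that uses Python interpreters as the child processes and a closed stdin pipe as the "exit now" signal, and `run_process_test` is an invented name. It only reports wall-clock times, so an ETW trace is still needed to see whether destruction is serialized.

```python
import subprocess
import sys
import time

# Child program: block until stdin reaches EOF, then exit.
# ASSUMPTION: this stands in for the real test's "wait until told to exit" step.
CHILD_CODE = "import sys; sys.stdin.read()"


def run_process_test(count, settle_seconds=0.5):
    """Create `count` children, pause, then tell them all to exit.

    Returns (creation_seconds, destruction_seconds) as wall-clock times.
    """
    start = time.perf_counter()
    children = [
        subprocess.Popen([sys.executable, "-c", CHILD_CODE], stdin=subprocess.PIPE)
        for _ in range(count)
    ]
    created = time.perf_counter()

    time.sleep(settle_seconds)

    exit_start = time.perf_counter()
    for child in children:
        child.stdin.close()  # EOF on stdin tells the child to exit
    for child in children:
        child.wait()         # wait for every child to be fully gone
    destroyed = time.perf_counter()

    return created - start, destroyed - exit_start


if __name__ == "__main__":
    # A small count keeps the demo cheap; the post used 1,000 processes.
    create_s, destroy_s = run_process_test(8)
    print(f"creation: {create_s:.3f} s, destruction: {destroy_s:.3f} s")
```

Python child processes are much heavier than the minimal children in the original test, so the absolute numbers will not be comparable, and a large count may run into per-process handle or file-descriptor limits.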

I’d noticed that this problem seems to be worse when my computer has been running for a while so I rebooted and ran the test as soon as my laptop had settled down. The process-shutdown serialization is indeed less severe, but the issue is still clearly present on the freshly rebooted machine:

Devil horns are narrower after rebooting, but process destruction is still serialized for a while

I then ran the same test on an old Windows 7 machine (Intel Core 2 Q8200, circa 2008) – you can see the results here:

Windows 7 CPU usage shows no serialization on process destruction

Process creation is slower, as you would expect from a much slower CPU, but process destruction is as fast as my new laptop at its best, and is fully parallelized.

This tells us that this serialization on process shutdown is a new issue, introduced sometime between Windows 7 and Windows 10.

48 hyper-threads, 47 of them idle

Amdahl’s law says that if you throw enough cores at your problem then the parts that cannot be parallelized will eventually dominate execution. When my work machine has been heavily used for a few days this serialization issue gets bad enough that process-shutdown becomes a significant part of my distributed build times – and more cores can’t help with that. In order to get maximum build speeds (and if I want to move my mouse while doing builds) I need to reboot my machine every few days. Even then my build speeds are not as fast as they should be, and Windows 7 starts to look tempting.

In fact, adding more cores to my workstation makes the UI less responsive. That is because Chrome’s build system is smart enough to spawn more processes if you have more cores, which means that there are more terminating processes fighting over the global lock. So it’s not just “24-core CPU and I can’t move my mouse” it’s “24-core CPU and therefore I can’t move my mouse.”
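Amdahl’s law is easy to play with numerically. The sketch below uses the standard formula; the 95% parallel fraction is a made-up illustration, not a measurement from my builds, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup of a job when only part of it can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def idle_fraction_when_serialized(hardware_threads):
    """Share of the machine sitting idle while one thread at a time holds the lock."""
    return (hardware_threads - 1) / hardware_threads


if __name__ == "__main__":
    # ASSUMPTION: 95% of the work parallelizes. Purely illustrative.
    for workers in (4, 8, 24, 48):
        print(f"{workers:2d} workers -> speedup {amdahl_speedup(0.95, workers):.1f}x")
    print(f"idle while serialized on 48 threads: {idle_fraction_when_serialized(48):.1%}")
```

With any serial portion at all, the speedup flattens out well below the worker count, and while the serialized part runs, 47 of 48 hardware threads have nothing to do.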

This problem has been reported to Microsoft and they are investigating.

Just one more thing…

This is what my process create test program looks like when run on my 24-core work machine:

Process destruction serialization is worse on my 24-core workstation

See that tiny horizontal red line on the bottom right? That’s Amdahl’s law visualized, as 98% of my machine’s CPU resources sit idle for almost two seconds, while process destruction hogs the lock that I need in order to move the mouse.

These are before/after traces from March 22, 2018, the date the fix made it to Fall Creators Update. The images show the process-destruction portion of ProcessCreateTests.exe. You can clearly see the serialization (one out of four cores allowed to run at a time) in the before image, and the perfect parallelization and much better performance in the after image. The horizontal (time) scales are the same in both images.

GDI Serialization fixed

Resources

The ProcessCreateTests code is available here. Deeper investigation of the functions that hog the lock was done in a follow-up post here, including an understanding of the likely root cause of this new problem. A video showing how to investigate this bug can be found here.

Discussions of this post can be found at:

  1. https://news.ycombinator.com/item?id=14733829
  2. https://www.reddit.com/r/programming/comments/6mcruo
  3. https://m.habrahabr.ru/post/332816/ (popular Russian translation)
  4. https://www.meneame.net/story/tengo-cpu-24-nucleos-no-puedo-mover-raton-eng
  5. https://tech.slashdot.org/story/17/07/11/2055251
  6. https://twitter.com/brucedawson0xb/status/884280598348480512?lang=en

If you liked this post you might like these other investigative reporting posts:

You Got Your Web Browser in my Compiler!

Windows Slowdown, Investigated and Identified (and the follow-up)

PowerPoint Poor Performance Problem

Self Inflicted Denial of Service in Visual Studio Search

24-core CPU and I can’t type an email (part one)

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048
This entry was posted in Investigative Reporting, uiforetw, xperf.

214 Responses to 24-core CPU and I can’t move my mouse

  1. jaimemmoreno says:

    I’ve also seen this on my hexacore PC running Win 10. It’s really annoying when it happens. I never saw the hitching with the mouse or keyboard input when running Windows 7 or Windows 8 on the same machine, so I think the problem was introduced between Windows 8 and Win 10.

    • I’ve got no issues with an 8-core Ryzen 1700 on Windows 10, is your chip Intel?

      • brucedawson says:

        If it’s the issue that I found then the type of CPU does not matter. What does matter is the workload you are running. It’s possible to hit the issue on a four-core CPU, or to not hit it on a 24-core machine, depending on what you are doing.

        • TheGreatCabbage says:

          Ah, that’s interesting. Thanks for the info 🙂

        • suck3rs says:

          Actually, it can and does matter. There is in fact a bug with newer Intel CPUs that was announced by Intel. If my memory serves it sounds like what you are experiencing.

          • suck3rs says:

            So you link a German .org? Go read up on the actual issue. The bug is caused by a very specific workload hitting specific registers. It can cause corruption and a number of other issues. Although both AMD and Intel are IBM compatible, a bug can be present in one brand but not in the other.

          • brucedawson says:

            Yes, there is a bug in newer Intel CPUs. However the issue I found is a software bug, first discovered on older Intel CPUs. The Intel hardware bug tends to cause crashes, not lock contention.

          • tachyon1 says:

            You might try actually reading _this_ article here first before posting irrelevant comments. Just a thought.

            • OldHack says:

              Exactly. Read the article twice. This is about a Windows spin lock contention bug, on a 12 core, that affects all Windows 10 machines that were running prior versions of Windows 10.

              The Intel bug is different in scope and effect.

              Try reading the article; it is well written.

              I used both Process Explorer and Process Monitor to hunt down this same problem on a Win10/4core/8thread box.

              2.1 Version 1507 [BUG]
              2.2 Version 1511 (November Update) [BUG]
              2.3 Version 1607 (Anniversary Update) [BUG]
              2.4 Version 1703 (Creators Update) [BUG]
              **Build 1705, in which the bug was fixed
              2.5 Version 1709 (Fall Creators Update) [?]
              2.6 Version 1803 (April 2018 Update) [?]
              2.7 Version 1809 (October 2018 Update) [?]
              2.8 Version 1903 [?][?][?][?]

              Thanks. Great work, and great article. I wish there was a place to talk about massive builds like this, and using distributed make/compiles…

  2. yuhong says:

    I can’t imagine that moving the Win32k stuff back to CSRSS would help much in this case, right? Though it is still a good thing, especially for terminal servers, where hopefully one CSRSS process crashing would just terminate the session.

  3. James says:

    I’m almost crying in tears. This is exactly what happens with my recent 8 core 6900k on Windows 10 and I couldn’t find what could be the reason. Especially when that behaviour is not happening on Linux!

  4. MBH says:

    If you have a Skylake or Kaby Lake CPU, there’s a bug in their hyperthreading code, so disable hyperthreading and see how it goes.
    That bug caused a lot of segfaults on Linux and data corruption in registers.

    On Linux, you can circumvent this by using the intel-microcode package which modifies the CPU’s microcode on every boot and fixes this issue.

    • brucedawson says:

      I have heard of that bug but it is not related to this issue. The lock contention issue that I discovered is a pure Windows 10 performance bug, not a CPU flaw.

  5. rezna says:

    When I was playing with IncrediBuild (a distributed build system) a few years ago, I found that having more than 4 or 8 threads building the code was useless (Lenovo T420, 8GB RAM, i5 Core, mSata SSD drive) because the overhead of spawning and killing the processes was too high. Maybe the Chrome build system could also try throttling the threads at some reasonable level.

    • brucedawson says:

      The ninja build system (used by Chrome) carefully manages the number of processes created in order to avoid overwhelming the system. It scales to the number of cores. We are not overloading the system and if we throttle process creation we will build Chrome more slowly.

      I am not aware of any mitigation which we can do – we need a fix from Microsoft.

      • Allan Jensen says:

        Ninja doesn’t manage anything. It just measures how many virtual cores you have and launches that number of parallel processes by default. You can control it a bit with the command line arguments -j (number of jobs) and -l (maximum load average)

        • gim (@gim) says:

          what I think bruce meant is that ninja waits for old processes to finish first. Moreover, even if you specify `-j 20` that doesn’t mean ninja will start 20 processes, it will throttle it depending on the load (see `RealCommandRunner`)

      • mohammed imran says:

        Has Msoft replied back, what are they doing with this bug, it cripples my system sometimes. what are they doing about it??

  6. R Questioner says:

    Moderator: This is off topic so I filed a bug for this issue and I’ll delete future comments on this topic. Please comment on the bug at crbug.com/740760 if you have further information which can help us investigate this issue. We do take memory consumption issues seriously and we are working on them but we need more specific complaints or there is nothing we can do to help.

  7. Anon says:

    The moment I got to these two words, “Windows 10”, in the first sentence…

    Moderator: Non-constructive. Deleted.

  8. Ohm says:

    Did you try another mouse?

  9. Steve says:

    They should of called windows 10, Vista 1.0 real garbage…

    Moderator: off-topic and non-constructive

  10. Roy Adams says:

    Excellent post Bruce!

    Did Microsoft happen to give you a bug/issue tracking number when you reported the problem?

    Thanks,
    Roy

  11. Satay Nutella must go says:

    Windows 10 is absolute garbage.

    Moderator: Non-constructive and off-topic. Deleted (well, mostly).

  12. I had a similar problem recently (there was a one second pause on every process exit) and after plenty of debugging and hair pulling I discovered that it was caused by the AMD drivers. I never understood exactly how they caused it, but updating to a newer version resolved it.

  13. Criação says:

    I have a similar problem with my mouse. Thanks for giving me a clue to what might be happening under the hood. This is most annoying, really.

  14. Thomas Maher says:

    Had this issue and fixed it somehow. Try disabling windows defender fully.

  15. Rich Talbot-Watkins says:

    I had exactly this problem too, and fixed it to a large extent by updating a USB driver. But I’ve still seen occasional hitches since then.

  16. Franklin says:

    I think the universe will thank Bruce when MS fixes this. I have similar symptoms during big builds.

    • Aaron says:

      Absolutely agree. Per usual Bruce has done a thorough job of root causing and pinpointing a serious flaw in Windows 10.

      Thanks Bruce!

  17. Jay B says:

    Disable hyperthreading if you have a newer Intel CPU. Horrible bug in them that nobody is releasing micro code for.

    • Jonathon Reinhart says:

      See Bruce’s comment above. That bug causes corruption in CPU registers, and has nothing to do with this Windows 10 bug.

      • jaimemmoreno says:

        Yeah I’m running an older 6 core 4930K Intel CPU that doesn’t have that new hyperthread bug and still seeing the problem on Win10.

  18. Rick James says:

    Hi Bruce,

    Long time! 🙂
    For MS folks: 12699333.

    Cheerz,
    Rick.

  19. JT Turner says:

    Isn’t there a flag on the compiler to force it to only use so many cores? Like -j 4 (use only 4 processes)?

    • brucedawson says:

      We invoke a separate compiler instance for each translation unit so we have full control of the amount of parallelism. The problem is that this Windows bug can manifest even if you only have one process per core. And, for distributed builds we aim to have ~20 processes per core and this works fine – except for this bug.

  20. John Walluck says:

    What a great job of analysis and reporting. If MSFT doesn’t fix this issue quickly after this they’re not trying.

  21. Metro Melvin says:

    Nice work..

  22. John Doe says:

    Specs on all your hardware?

  23. Gwilbor says:

    I have a very similar problem on my laptop. It’s an HP Pavilion with Windows 8.1 (CPU is Intel i5-3230m 2.6 GHz). The only difference is that the trackpad is affected, but the external USB mouse is not. Anyway, since the first day, very often the mouse pointer freezes, while the rest of the computer seems to be working fine. Much of the time I am forced to use the keyboard to scroll webpages. Do you think it is the same issue?

  24. DavidW says:

    You have a 24 core laptop? With 64G of RAM? What brand/model is this? Some sort of Xeon beast?

  25. Andrew says:

    So why is CPU Usage at 50% on idle. There’s your first problem.

    • brucedawson says:

      CPU Usage was not at 50% on idle. CPU usage was at 50% during a build of Chrome. Which is appropriate. In fact, CPU usage probably should have been higher, and after this bug in Windows is fixed it will be higher.

  26. jdrch says:

    Ummm I’ve seen PCs lock up when trying to exit too many things at once before. Nothing new here.

  27. adam c says:

    i’ve always been sensitive to mouse lag in windows 10, i put it down to the pc hardware ecosystem, things like cpu/gpu stepping 100hz~60hz monitors and the such, i thought it got better recently but im not using a 24 core system.

    p.s. total witch hunt here, but im noticing so many direct sound issues maybe related

  28. I think it’s ironic that there’s GDI-related lock contention even when Console-mode processes that never even touch GDI (such as the compiler) are closed.

  29. Jamesits says:

    I have noticed this since Windows 10 rs1. Strange input lags or frame drops happen when playing osu! on my 16 thread workstation with <25% CPU usage. Sometimes the mouse keeps freezing for ~3s or more.

    • Jamesits says:

      Plus, have you noticed taskmgr processes tab loads significantly slower when computer is on for 1 day than just after boot?

      • brucedawson says:

        I have not noticed that – it appears very quickly for me. And, your issues sound different because they aren’t around process destruction. Sorry, you may have to investigate them. Sigh… computers.

  30. Lennie says:

    This problem has always existed in Windows, Windows 7 was better at it than previous versions. Seems Windows 10 is worse again.

    Just run a Linux VM with your Chrome build. 🙂

    • brucedawson says:

      No. This problem is new. My tests show it did not exist in Windows 7. Process creation/destruction may have always been slow on Windows, but that is actually separate from this issue.

      And I’m a Windows developer, using Windows tools to develop Windows software. A Linux VM isn’t going to help.

  31. George McCabe says:

    I’m running Windows 7 ultimate, 12 core i7cpu. My PC ran sluggish and choppy after I installed Chrome browser. After I would close Chrome it left a lot of processes still running in the background. I had to completely uninstall Chrome to get back to normal

    • brucedawson says:

      Chrome shouldn’t make your machine sluggish – that seems odd.

      The background processes are probably from background apps. See chrome://settings, advanced, “Continue running background apps when Google Chrome is closed” – set that to off.

  32. alvarolucas says:

    Sorry, I didn’t read it… because as soon as I read “I’ve a 24 CPU and I can’t move my mouse” I automatically thought… what Windows version do you have?… and I wasn’t wrong.
    Put a Linux distro in your life! :-)) … and forget about an endless life of senseless problems on Windows…

    • barton96 says:

      You Linux evangelists are boring as a sack of rocks. It was cool maybe for a while in 1996, but not anymore.

    • MOW says:

      Try copying a 100GB file to a USB harddisk … Linux has its own scheduler problems. Hopefully 4.12 will fix this.

  33. plexus says:

    Thank you. Have the same problem, couldn’t identify it though. I have the i7-5820 with 6 cores (12 threads). I always thought some hardware must be broken or having issues.

  34. Diego says:

    Hello. Something similar happened to me, and I solved it by adding a heatsink with a fan to the GPU chip on my motherboard, even though I had two cards in SLI. I’m sorry about my English.

  35. Sarreq Teryx says:

    hrm… my mouse starts locking up like that after a while, as well. Since it’s wireless, I assumed it was something interfering with the radio, but it’s so intermittent, I couldn’t figure out what might be the source. This certainly could explain it.

  36. Anonymous says:

    What about Windows 8.1? Can you test on it please?

  37. akraus1 says:

    Could be a desktop heap leak. The GDI resources are put onto a special user session bound heap which can leak if you e.g. call RegisterWindowMessage many times with different parameters. This type of leak is hard to find. Which sort of GDI resources are your build processes creating? You could try if this changes if you omit some processes of your build to check if one specific process is responsible for the degradation over time.

    • brucedawson says:

      That sounds plausible except that the serialization happens on a freshly booted machine running the ProcessCreateTests project (source code link in the post, and see the post-reboot image). So, if there is a leak then it is in the OS and is triggered even by programs that never touch user32.dll or gdi32.dll.

      • akraus1 says:

        In your test application you are calling GetDesktopWindow and PeekMessage, which are located in user32.dll. Even such innocent methods can use the desktop heap as an implementation detail. There is no list available of the methods which cause desktop heap allocations. If you can get away without a message loop and PeekMessage then you should not see this degradation. Win10 is known for having much slower VirtualAlloc performance, for which a hotfix is available, and that could also be somehow related. As far as I know this fix is not public yet so you should check with MS support whether this changes things.

        • brucedawson says:

          I am very careful to only call those functions in the master process, not in the 1,000 descendant processes. I did this to avoid the concerns that you raise and because the DETACHED_PROCESS mitigation doesn’t work if user32.dll is loaded.

  38. Doniel says:

    How’s the process IPC? Did you try different system clocks? Windows seems to synchronize badly with some of them.
    Try HPET, or the Windows default. You have to turn it on/off in the BIOS and on/off in the system for it to completely work.

  39. Ian Brodowski says:

    Out of curiosity, are you running the creators update (1703) or the anniversary update (1607), or the 1511 release?

    • brucedawson says:

      I am running Anniversary Edition at work and Creators Update at home. It is possible that the bug first showed up in Anniversary Edition because I didn’t notice it before I upgraded, but that may also be because of work-flow differences.

  40. People here in the comments are focusing too much on “mouse” and “chrome”. This problem goes well beyond that. It can affect the entire machine and different workloads within W10.
    I, for instance, couldn’t handle the hiccups and had to downgrade to W7.

    I used to have some Gradle projects and couldn’t even use my machine while compiling them. I suffered from it even in day-to-day computer usage. Now on W7 it’s a totally different story.
    I know this might not be the same bug, but it might also totally be.

  41. Lars Berntrop-Bos says:

    Curious if this bug is also in Terminal Server settings, i.e. present in Windows Server versions. Those frequently have lots of cores and lots of processes, making occurrence of the bug and hindering performance a serious risk.

    • brucedawson says:

      This bug is almost certainly present in the server editions. The mouse-hitching aspect of the bug is less likely to matter on a server, but the process-destruction bottleneck will affect some workloads.

  42. trawg says:

    This is really interesting. About 3-4 weeks ago I noticed the same kind of mouse skipping problem on my Windows 8.1 PC. I can’t remember how many cores I have (overseas at the moment) but I want to say 16, with 16GB of RAM. I ended up buying a new mouse to see if that was the problem but haven’t had a chance to use it long enough before I had to go overseas to see if it fixed it. I did wonder if it might be a recent weird low-level Windows patch that might’ve changed something but I thought it was equally likely to be my 4 year old mouse

    I took my “broken” mouse overseas with me to use on my laptop and it seems to be working fine. So I will be very interested to see if there is any progression on this issue.

  43. Rob K says:

    Very cool article. Explained so well for those of us trying to climb the steep curve. Thank you!

  44. Nicolas Ramz says:

    Windows is great, but there are lots of bottlenecks, like this one. One other area where it can (and *needs* to) improve a lot is the filesystem: Windows is really, really slow at creating/deleting lots of files. Uncompressing an archive containing Firefox sources can take about 21 minutes (!!) while on the same machine, in a VM running macOS, it would take around 2 minutes. That’s 10x slower than macOS, which is itself slowed down by the VM…

    • Zaru says:

      Turn off Windows Defender real-time protection while extracting large archives (from known safe sources). Every new file created triggers a full scan in the background, that’s your main source of Windows 10 file system sluggishness.

      • Nicolas Ramz says:

        That’s better without real-time protection: 5m13s to uncompress the same directory. But it’s still two times slower than macOS (that’s running inside a VM). Also, you cannot tell people to stop virus protection before doing heavy file operations…

    • Zaru says:

      And just to add: temporarily turning off real-time protection and using a RAM-drive (ImDisk VDD etc.) for large builds or any operation on large numbers of files, greatly speeds things up in Windows 10 as well.

  45. Marty G says:

    I have noticed in my experience that all computers and OSs I use seem to be getting more and more bogged down during process closing operations. Not to the point of UI freezes mind you, just in system load. In my mind it has seemed correlated with attempts in all OSs to deal with security issues involving clearing memory on dealloc and doing proper memory management when returning the freed mem to the available pool. This is a pure blackbox/shotgun line of thinking on my part. But can you think of any factor that would affect linux, android, windows process close loading that would be more a result of an overall approach; like an industry-wide way of doing things?

    • brucedawson says:

      I can’t think of anything. Zeroing of memory is usually done at allocation time or (on Windows) done asynchronously by the system process, and zeroing memory isn’t even a process-destruction specific issue. See this article for details:

      Hidden Costs of Memory Allocation

      • Marty G says:

        Thanks for responding. I should not have gone down the rabbit hole of memory dealloc as I don’t have any reason to suspect that in particular and I think I ended up focusing your answer at a level I didn’t intend. What I was getting at was that closing processes in general seem to take a lot more CPU than it used to across the OSs I use, which is all the majors except Apple stuff and was wondering if an industry-wide OS or userland programming strategy might be behind it. I know apps save a lot more state data when ending these days for example, but again, I don’t really know if that is what I’m seeing. I was just curious if some general response to security issues or similar may be resulting in this effect, but it is likely just bloat in size, statefulness, and features in general causing more cleanup to be required. I’m just throwing it out there. But back to the topic, excellent work in finding this specific W10 bug. MS should send you a check for your time!

  46. Any chance you could upload a binary of the 1000 process test app? I’d be interested in playing with this bug on a few versions of Windows, but I don’t have things set up to compile stuff.

  47. Marcos Sebastian Alsina says:

    Great investigation Bruce. Thanks a lot for your time, you saved mine :-). The question is who was the genius that put such a Critical Section in such a critical part of the code.

  48. Just asking why process instead of thread?

    • brucedawson says:

      The ninja build system is designed around creating processes. And, because it invokes many different build tools (compiler, linker, python, etc.) it mostly has to use processes.

  49. Yeah, I think this is a flaw in Chrome’s build system and not Windows. First I don’t think chrome should be spawning many processes. They should be threads. Second, even so, there shouldn’t be an effort to have n threads/processes where n is the number of CPUs, because they contend with each other for time on any thread to execute. It is actually improbable that any given thread is actually executing at the same time as the others. Threads should be used more abstractly. If there is that much logically parallel work then make threads for it, so that if and whenever possible they can be handled independently. But Windows will handle them in the end as it sees fit. Either way you shouldn’t be using 48 process to compile a web browser, but go ahead, try to get Windows to change the operating system to fit Google Chrome. -_-

    • Richard M. says:

      “I don’t think chrome should be spawning many processes. They should be threads.”

      It’s not Chrome, it’s a Chrome build!

      “you shouldn’t be using 48 process to compile a web browser”

      What??????

      do you even know what you are talking about??? I hope you don’t work for Microsoft…

    • Richard M. says:

      –were not the best way, methinks, albeit it is not to be denied that authorities differ as concerning this point, some contending that the onion is but an unwholesome berry when stricken early from the tree whileas others do yet maintain, with much show of reason, that this is not of necessity the case, instancing that plums and other like cereals do be always dug in the unripe state yet are they clearly wholesome, the more especially when one doth assuage the asperities of their nature by admixture of the tranquilizing juice of the wayward cabbage and further instancing the known truth that in the case of animals, the young, which may be called the green fruit of the creature, is the better, all confessing that when a goat is ripe, his fur doth heat and sore engame his flesh, the which defect, taken in connection with his several rancid habits, and fulsome appetites, and godless attitudes of mind, and bilious quality of morals–

      King Arthur could have produced such comment on the topic…

    • brucedawson says:

      I’m not sure how ninja (Chrome’s build system) is going to spawn compilers/linkers/python/etc. as threads instead of processes.

      Ninja is very good at spawning the number of processes that I want. That is n-processes for local builds (perfectly saturating the local CPUs) and I use n*20 processes (-j 960) for distributed builds (because they use fewer local resources). This works very well, and will work even better once Microsoft fixes their regression – which I am confident they will do.

      I am quite familiar with the idea of threads/processes contending for the CPU and I am careful not to pointlessly over-saturate the CPUs.

      Chromium is open source. Give it a try. I think you will find that it has a very well implemented build system, across multiple systems. Contributions welcome.

      • Well yeah if you have each thread just work on a single thing, they will end up smothering themselves past a certain point as they compete for time on the cpu, but if they choose to take up work where others leave off they no longer follow that saturation rule. As more threads are used they increase the time on the cpu.

  50. Would this qualify for a MS bug bounty reward? I think you deserve it.

  51. Lee says:

    I mean… Windows 10 … in 2017 … -> roflmao
    Not because of your very specific issue, but rather because of the many different issues that W10 caused on my workstation (during a 2 year “grace period”), I switched back to Windows 7 and I don’t regret it at all.

    Everything just runs SMOOTHER.

  52. Set says:

    I am experiencing the exact same thing. Usually after Chrome was opened but even after I close Chrome it happens for a minute or two more.

    • brucedawson says:

      It’s not clear if that is the same issue or not, especially since I only encountered this issue when *building* Chrome, not when using it. If you can record an ETW trace of this poor behavior after you close Chrome (start tracing, close Chrome, repro the poor behavior, then save the trace) that would be helpful. You can then file a bug at crbug.com. UIforETW makes recording traces easy – go to https://tinyurl.com/etwcentral

  53. tester says:

    Version of Windows 10 is????? So if you have to bite Windows 10, at least give us a full specification of your machine.

    • brucedawson says:

      I saw this problematic behavior with process exits on Windows 10 Anniversary Edition and Windows 10 Creators Update. It may have happened on more versions as well – I don’t know – but I suspect it first started happening in Anniversary Edition.

  54. Ark-kun says:

    Sorry for a not very relevant question, but as you work on Chrome and performance in Windows, you’re the closest expert for me.

    When I have many tabs open, the root Chrome process uses quite a lot of CPU and the performance degrades (e.g. the file download animation – the button expanding at the bottom of the window – can take a couple of seconds). This happens even when there are no CPU-hogging content processes (say, I’ve killed them…).
    Is this considered normal behavior (given that I have many tabs open) or should I collect some kind of trace and submit a bug?

    • brucedawson says:

      That is not normal. Even with lots of tabs open Chrome can be almost completely idle. It’s hard to speculate about what might be going on. It could be a bad web page, an ill-behaved extension, or a Chrome bug. Filing a bug at crbug.com and attaching a trace (chrome://tracing or ETW trace) would be helpful.

  55. santagada says:

    Did you file a bug report with Microsoft? I would love to hear their answer/patch.

    • brucedawson says:

      I reported the bug through informal channels. Earlier in the comments Rick James helpfully shared a Microsoft bug number: 12699333. Unfortunately I doubt the underlying details will be shared.

      • Christopher Katko says:

        Googling that number shows “crazy bad Windows Defender [remote execution] bug”. Is it possible that (per someone else’s mention) it was some sort of Windows Defender bug (linearly scanning all processes)? Or is the number wrong?

        Thanks.

        • brucedawson says:

          It’s not a Windows Defender bug – that would have showed up in the trace. It is a performance bug caused by a security change that made looking up GDI objects more expensive. I don’t know why searching on 12699333 finds articles about that other bug – I couldn’t find that number anywhere in the source of the result pages.

          Expect a fix “soon”.

  56. goawaey says:

    what is the cpu you have? is it haswell? broadwell?

    • brucedawson says:

      I first saw it on a Haswell Xeon processor. I then reproed it on a much older processor (circa 2011) and on a brand-new Kaby Lake. Which is to say, this bug has nothing to do with the processor you are running (other than that you need at least two cores to see it). It is a software performance bug.

  57. James says:

    Question: does the issue manifest with USB and PS/2 devices? You’re running a hefty setup, and high-end desktop Dell towers still come with PS/2 ports (for obvious government contract reasons), and I wonder if it mitigated the issue.

    • brucedawson says:

      The type of mouse does not matter. Access to the Windows message queues is blocked by lock contention. The issue is not even specific to mouse movement – it affects anything that needs the lock, which *includes* general responsiveness of all UI programs, and more.

  58. Jeremiah Penery says:

    I ran the ProcessCreateTest.exe a few times on my machine (3930k, 6 cores/12 threads, Windows 10). Cutting out all the process creation parts:

    Process destruction took 0.686 s (0.686 ms per process).
    Lock blocked for 0.085 s.
    Average block time was 0.012 s.

    Process destruction took 0.657 s (0.657 ms per process).
    Lock blocked for 0.005 s.
    Average block time was 0.001 s.

    Process destruction took 0.656 s (0.656 ms per process).
    Lock blocked for 0.029 s.
    Average block time was 0.004 s.

    Process destruction took 0.644 s (0.644 ms per process).
    Lock blocked for 0.009 s.
    Average block time was 0.001 s.

    Process destruction took 0.635 s (0.635 ms per process).
    Lock blocked for 0.000 s.
    Average block time was 0.000 s.

    Strange that I’m not seeing any issues here.

    • brucedawson says:

      It is a bit strange, but not totally. The effect only really becomes noticeable (without looking at an ETW trace) on machines that have been up for a while and heavily used – whatever that means. Mine is also behaving well at the moment – go figure.

      • Lars Berntrop-Bos says:

        I would love to know the actual build of Windows 10 you’re on. For my development environment I have seen several bugs squashed only since a specific build, 15063.447. An overview of builds and corresponding KB numbers is here: https://technet.microsoft.com/en-us/windows/release-info.aspx
        The build number is listed at Settings:System:About:OS Build.
        One of the bugs squashed was in WinForms, where normal userland code could cause a 0x7f aka Unexpected_kernel_mode_trap bluescreen….

  59. Konstantin says:

    The new half transparent windows calculator moves slower than other windows when dragged with the mouse.
    Machine: AMD FX-8320 (technically 4 cores, 8 threads), AMD RX580 GPU, 4 GB of RAM (I know that the RAM is the bottleneck in many cases).

    But – could this be a similar problem? I recently swapped GPU to RX580 (for gaming, not for mining cryptocurrencies) – and the only thing it could have problems with is with moving the calculator window…
    If I think about it it can only be Windows…

  60. I’ve run your compiled “ProcessCreatetests.exe” program on my laptop and all processes terminated rather quickly (around 1.488 s – creation, 1.388 s – destruction).

    Testing with 1000 descendant processes.
    Process creation took 1.488 s (1.488 ms per process).
    Lock blocked for 0.000 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.388 s (1.388 ms per process).
    Lock blocked for 0.284 s.
    Average block time was 0.024 s.

    All done on Windows 10 Pro Insider Preview, Build 16232.
    Processor Intel Core i7-2630QM @ 2.00GHz, 8GB RAM.

    • brucedawson says:

      And? There is some randomness in how long the processes take to exit, with how long your system has been up being one factor. But, it’s quite clear that your system was suffering from the problem that I found. The lock was blocked for at least 284 ms which is far longer than it should be. If process destruction wasn’t serialized then the processes would have terminated even faster, and without the risk of micro-hangs.

  61. Viet says:

    I really want to know when MS will fix this.
    Did they really accept this as a bug?

    • brucedawson says:

      Microsoft is aware of the issue (I have talked to them about it informally) and there is a bug filed (apparently 12699333). I suspect that it will be fixed for the Fall Creators Update, but I don’t know for sure.

      • mohammed imran says:

        Why don’t you file a bug report using the feedback app? Then we can all vote it up and push them to deliver a fix.

        • brucedawson says:

          I don’t have a lot of faith in the feedback app, and it probably isn’t necessary in this case. But, not a bad idea. Feel free to do that and post it here and/or tweet a link.

          • mohammed imran says:

            Hi Bruce,
            OK, I shall file a bug report in the feedback app, but I would also like you to comment on the report with further data, as we need to squash this bug for all Windows users. For the greater good.

          • mohammed imran says:

            Here is the report filed in the feedback app:
            https://aka.ms/Av19n4

          • J_s8 says:

            Yep.. I’ve got the impression that there is some sort of black hole in between submitted feedback and cognitive processing. I’ve submitted over 100 feedback items and none has been replied to or acknowledged – oh boy I must be bad at this…

  62. jeffstokes says:

    Thanks Bruce for the write-up here. I expected this to be hard-core bad driver/DPC issues. 😛

  63. Mohammed Imran says:

    Mr. Jimmy A at the link said this
    quote/
    Is it locking up, or is it a redraw issue? I’ve run into this as well in the past, what looked like locking up was actually the video driver not redrawing the screen fast enough. I could tell by moving the cursor across the screen quickly at a length I knew it should make it across, which it did, but skipped across. If it was not a redraw issue, the cursor would not make it all the way across the screen, as the x/y commands from the mouse would be dropped and never received from the computer, which would indicate a problem with actual processing of the information (which is what you are indicating). If you are working remotely, there are a few more pieces to the puzzle and the introduction of an additional video card and the network connection./unquote

    any reply Mr.Bruce?

    • brucedawson says:

      Why do you think that x/y movement commands from the mouse would be dropped? And what link are you referring to where this comment came from?

      Regardless, the issue is well understood – a crucial system lock is held for too long during process destruction. The heavy contention for this lock causes long delays in accessing message queues and other resources, which leads to repeated hangs. There is no need to invoke other explanations.

      And, anyone who wants to explore this on their own can do so using the supplied test program.
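
      For readers without the test program to hand, the shape of the experiment can be sketched in a few lines of Python. This is a rough stand-in, not the ProcessCreateTests source: the function name and the sleeping child command are assumptions, and it only times wall-clock creation and teardown. It does not measure how long the lock was blocked, which still needs an ETW trace.

```python
import subprocess
import sys
import time


def time_process_teardown(count):
    """Spawn `count` idle child processes, then kill them all and time both phases.

    Hypothetical helper for illustration; returns (create_seconds, destroy_seconds).
    """
    # Each child just sleeps so that it is still alive when teardown begins.
    child_cmd = [sys.executable, "-c", "import time; time.sleep(600)"]

    start = time.perf_counter()
    children = [subprocess.Popen(child_cmd) for _ in range(count)]
    create_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for child in children:
        child.terminate()  # TerminateProcess on Windows, SIGTERM elsewhere
    for child in children:
        child.wait()       # block until the OS has finished tearing it down
    destroy_seconds = time.perf_counter() - start

    return create_seconds, destroy_seconds
```

      Calling `time_process_teardown(1000)` while moving the mouse gives a crude feel for the effect, but the Python interpreter is a much heavier child than the one in the real test, so the numbers are not comparable with the results posted in these comments.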

  64. TG2 says:

    @brucedawson – I read English and I understand the words, but following as deep as you did is just not in my world (light programming .. think “hello world” compared to your expertise)…

    My question: do you think what you’ve found could have an effect outside of running compiles and heavy loads?

    Search the web for mouse lag, and you get tons of hits and complaints: freshly booted PCs, PCs up for hours or days, SSDs for drives, regular HDs too, USB wired, USB dongle’d, Bluetooth and non …

    You think it’s frustrating and you’re a heavy user .. think of what it’s like for the rest of us, not anywhere near the workload you’re putting on a system, and we can’t click on icons because our mouse’s cursor won’t “get there” or goes beyond because it finally catches up with where the mouse was moved to .. etc etc etc etc …

    I’m not the basic user.. I have anywhere from 8 to 25 windowed apps open, 3 different browsers (FF, Chrome, Vivaldi, sometimes IE as the 4th one), Outlook, SecureCRT (PuTTY), various FTP clients, and something like Winamp, or Spotify, or even iTunes .. etc.. so my use is more than just your basic user .. but not advanced like you … and the frustration knows no bounds when trying to perform simplistic work, and the mouse just won’t do what it needs to. 😦

    • brucedawson says:

      The problem that I found was specific to processes being destroyed at a very high rate – more than would ever be encountered on a ‘normal’ system. It may be that you are encountering something else hogging the same lock, or it may be something completely different. Unfortunately there is no easy way to determine from afar – there are far too many possible causes. I understand your frustration, and I’m just glad that I am able to investigate the issues that bother me.

  65. J M says:

    Technical details you provided are beyond what I know about Windows, but I thought I’d just add on an experience that I suspect is related to the issue you described with process destruction.

    I run a Matlab script that uses ActiveX control of MS Word to copy and paste graphics (~100 images, one at a time) into a Word document. The graphics are not rendered to the screen.

    In Windows 7 this could run in the background without much disrupting my usage (some webpages would occasionally flicker).

    Running the same thing in Windows 10, two changes occurred:

    1. Mouse lags on every copy/paste.

    2. I had to put a pause into the script between each copy/paste pair, because the previous pasting would lock up the Word application and cause the script to crash.

  66. H says:

    A few thoughts….

    1. Why would updating the mouse/cursor need to share a lock with the code that “terminates” a process?
    2. As far as I know, the cursor/mouse is handled by “hardware”, and traditionally the mouse has been known to continue to “work” and be updated on the screen when just about all other activity has ceased (crashed/hung/frozen), even during severe fatal errors.
    3. Perhaps this has nothing to do with closing processes at all, perhaps it is related to closing threads?
    4. When I provoke this hitchy behaviour, I only see problems with input from the mouse; the keyboard seems not to be affected at all by these mini freezes…strange…

    • brucedawson says:

      1. It appears that the same lock protects GDI objects and message queues. I don’t know why. Ask Microsoft? I’m just reporting what the trace tells me.
      2. Yes, the mouse cursor is typically implemented as a hardware sprite. But in a multi-process environment there can be multiple inputs/programs moving the mouse, so a lock is not surprising. But, I think the lock is not protecting the mouse per-se, but message queues, some of which control mouse movement.
      3. Maybe, but Occam’s razor says that if you’re in a function called NtGdiCloseProcess then maybe it has something to do with closing a process.
      4. Mouse input can easily come in at ~125 Hz, and delays of just 10-20 ms are noticeable. Keyboard input delays have to be slightly longer before they are noticeable. That’s probably the difference.

      I sense some skepticism in your comment. That’s fine, but understand that I’m not guessing about what is going on. The ETW traces and my stand-alone repro make most aspects of the behavior completely clear. Guesswork was, generally, not required. And, Microsoft is working on fixing the problem.
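
      The arithmetic behind point 4 can be sketched as follows. The ~125 Hz rate and the 20 ms stall come from the answer above; the 1000 Hz figure is an assumed value for a high-polling-rate mouse, and the function names are made up for illustration.

```python
def report_interval_ms(polling_hz):
    """Time between consecutive mouse reports, in milliseconds."""
    return 1000.0 / polling_hz


def reports_during_stall(stall_ms, polling_hz):
    """Number of mouse reports that arrive while input is stalled for stall_ms."""
    return stall_ms / report_interval_ms(polling_hz)


for hz in (125, 1000):
    print(f"{hz} Hz: one report every {report_interval_ms(hz):g} ms, "
          f"{reports_during_stall(20, hz):g} reports arrive during a 20 ms stall")
```

      At 125 Hz a 20 ms stall already spans more than two mouse reports, which is consistent with short lock delays showing up as visible cursor hitches.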

      • Juhani Suhonen says:

        hmm.. I wonder if new advanced mice actually make this issue worse; my logitech is using 1000Hz polling rate.

      • H says:

        The scepticism is more related to the fact that I have a similar problem, but without starting/terminating any processes at all, hence my question whether it is perhaps related to threads and not processes… does NtGdiCloseProcess close threads as well? Also my cursor freezes can last anything from a fraction of a second to, perhaps, 5 seconds… during which time the keyboard and other stuff work just fine.

        • brucedawson says:

          NtGdiCloseProcess is, as far as I know, only called when processes are going away. That said, this shared lock may well be acquired when threads are closing as well. But holding on to the lock during thread destruction would be a separate bug. The only way to figure out the problem would be by recording and analyzing an ETW trace.

          The fact that keyboard and other stuff works fine suggests that the root cause may be unrelated because the lock that I was seeing contention on is needed to read any input messages, not just mouse messages.

  67. Would be awesome to get some feedback on our latest RS3 insider build 16281 to see if it’s improved the situation with this issue: https://blogs.windows.com/windowsexperience/2017/09/01/announcing-windows-10-insider-preview-build-16281-pc/#6vZ0PrVezurp10is.97

    • jeffstokes says:

      If we didn’t have to beta test all of Windows just to get this fix I’d test for you and confirm.

      Pity Microsoft can’t cut an LDR/QFE to fix a bug anymore and instead forces a 4-6GB install to fix one issue (I care about anyway).

      Kudos to you guys though, James Clark, for paying attention here, that’s nice at least. I guess I can expect to get this fix like, next year, in CB.

    • brucedawson says:

      I have an Insider Build machine but it’s not on fast-ring. I will test the fix when it ships to regular insider builds – feel free to ping me here or on twitter when that happens.

    • brucedawson says:

      I just looked at a trace of ProcessCreateTests.exe running on 16281 and the graph of CPU usage during process destruction looks unchanged. For the central portion I see that process destruction is serialized, apparently still blocked in win32kbase.sys!NtGdiCloseProcess. This continues to confuse me since these processes have zero GDI objects.

      In short, I don’t see any signs of a fix to the bug. Am I missing something? Grab an ETW trace and graph CPU Usage by process name to see what I’m seeing.

      • Juhani Suhonen says:

        I will take my debugging glasses and dive into this later today. Based on your analysis it seems that there are two related issues: A) win32kbase.sys!NtGdiCloseProcess, which is serialized for some reason (bad design?) and B) some other issue, which causes a significant delay in executing NtGdiCloseProcess after a certain uptime.

        I concluded existence of B based on your observation that machine becomes slow only after some time of usage. Therefore, my previous posting (and the change of ProcessCreatetests.exe results) may not be due to improved code, but fresh boot.

  68. Juhani Suhonen says:

    I can confirm that build 16281 seem to (at least partially) resolve the issue. I don’t have time (and knowledge :D) enough to debug what changed but the results of running ProcessCreatetests.exe shows the following:

    Windows 10 build 15063.540 (mainline)
    Testing with 1000 descendant processes.
    Process creation took 3.158 s (3.158 ms per process).
    Lock blocked for 0.012 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.783 s (1.783 ms per process).
    Lock blocked for 0.854 s.
    Average block time was 0.078 s.

    Windows 10 build 16281 (insider fast)
    Testing with 1000 descendant processes.
    Process creation took 0.862 s (0.862 ms per process).
    Lock blocked for 0.000 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 0.946 s (0.946 ms per process).
    Lock blocked for 0.084 s.
    Average block time was 0.009 s.

    • brucedawson says:

      The ProcessCreateTests behavior also depends on how long a system has been running. So, first-run after install is always better, can lead to false confidence in a fix.

      The only way to tell for sure is to disable Defender’s real-time checking and then record an ETW trace of ProcessCreateTests, and graph the ProcessCreateTests CPU Usage (Precise) – by process name. Share a trace and I can check.

      Honestly, lock blocked for 0.854 s does not look like it was fixed.

      • Bruce, it went from 0.854 s to 0.084 s. Looks like pretty good improvement to me

        • brucedawson says:

          Ah – I read too quickly and didn’t see that it was before/after. You still have to be careful however because if the “before” run was after the system had been up for a while and the “after” run was after a reboot then it’s not a valid comparison. I should print up-time as part of the test. And, the lock contention time should really be *zero*.

          Anyway, I’ll get an ETW trace at some point and report back.
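
          A before/after comparison like this is easier to read when the lock-blocked time is put next to the total teardown time. The sketch below pulls both numbers out of the console output; the line formats are taken from the results posted above, while the parsing helper itself is hypothetical and not part of the test program.

```python
import re


def destruction_stats(report):
    """Return (destruction_seconds, lock_blocked_seconds) for the teardown phase."""
    # Everything after this marker line describes process destruction.
    _, _, teardown = report.partition("Process termination starts now.")
    took = re.search(r"Process destruction took ([\d.]+) s", teardown)
    blocked = re.search(r"Lock blocked for ([\d.]+) s", teardown)
    return float(took.group(1)), float(blocked.group(1))


# Console output as posted in the comment above (creation phase omitted).
BEFORE = """Process termination starts now.
Process destruction took 1.783 s (1.783 ms per process).
Lock blocked for 0.854 s.
Average block time was 0.078 s.
"""

AFTER = """Process termination starts now.
Process destruction took 0.946 s (0.946 ms per process).
Lock blocked for 0.084 s.
Average block time was 0.009 s.
"""

for label, report in (("15063.540", BEFORE), ("16281", AFTER)):
    took, blocked = destruction_stats(report)
    print(f"build {label}: lock blocked for {blocked:.3f} s "
          f"({blocked / took:.0%} of the {took:.3f} s teardown)")
```

          The same caveat applies to the parsed numbers as to the raw ones: they are only comparable if both runs were made at similar uptime.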

          • Juhani Suhonen says:

            I have ETW trace now with build 16281, do you still need it for further investigation? I’d rather not post it to public forum 😉

            below is the console output from ProcessCreatetests.exe with freshly booted OS.

            Testing with 1000 descendant processes.
            Process creation took 0.679 s (0.679 ms per process).
            Lock blocked for 0.000 s.
            Average block time was 0.000 s.

            Process termination starts now.
            Process destruction took 0.606 s (0.606 ms per process).
            Lock blocked for 0.000 s.
            Average block time was 0.000 s.

            Elapsed uptime is 0.01 days.
            Awake uptime is 0.01 days.

  69. mohammed imran says:

    @brucedawson how long should a system have been online (uptime)?

    • brucedawson says:

      I don’t understand the question. The process destruction lock contention gets worse with a system that has been used for a while, but how much worse depends on how heavily it is used and on ??? The only way to do an even comparison is to compare two freshly rebooted systems.

  70. Christopher Katko says:

    I just ran the binary and my mouse freezes briefly in Windows 7 64-bit with an AMD FX-8370 (8 cores). Are you sure the bug doesn’t affect Windows 7?

    — SNIP

    Testing with 1000 descendant processes.
    Process creation took 0.575 s (0.575 ms per process).
    Lock blocked for 0.000 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.139 s (1.139 ms per process).
    Lock blocked for 0.615 s.
    Average block time was 0.088 s.

    Elapsed uptime is 12.04 days.
    Awake uptime is 12.04 days.

    • brucedawson says:

      The change which increased the cost of deleting GDI objects was added in Windows 10 Anniversary Edition. However your results do seem to show significant lock contention during process destruction. My Windows 7 testing was restricted to a four-core desktop and more cores makes the problem easier to expose, but I think that the problem you are hitting is a different flavor of the same issue. Share a trace and I can take a quick look. Maybe some third-party software is hooking in to process destruction, or ???

      • Christopher Katko says:

        I’ve never run an ETW trace before. But I followed your guide best I could and hit trace, ran the program, and stopped the trace. I also added the program exe name into the settings.

        Here’s a (temporary) public link to the 7zip of the trace:

        https://drive.google.com/open?id=0B8Cyek_k55TiVlBwOC0wWkcyQTA

        Let me know if I need a different trace or something. Thank you.

        • brucedawson says:

          First off, your system is incredibly busy. It’s amazing how many processes are all simultaneously trying to consume lots of CPU time. This complicates the analysis because ProcessCreateTests.exe is fighting for CPU time with Steam, Chrome, and lots of other things. Maybe one of these processes is somehow making GDI object destruction more expensive on your Windows 7 machine, but I don’t know. For some reason HmgNextOwned is very expensive on your machine, while holding the lock, whereas on my Windows 7 machine it is not.

          Unfortunately this mystery will have to be left for Microsoft to investigate, but they won’t.

          • Christopher Katko says:

            Yeah… I thought about that. (I actually shut off VMWare which was running a full Windows 10 platform with a SQL server. =D) But I didn’t have a chance to save all my work and shut everything else off at the time. It’s strange that Steam is an issue since I wasn’t playing any games or updating them at the time…

            I appreciate you looking at it! I can nuke everything except raw windows (maybe even safe-mode…), run the test again and post a cleaner trace if you want.

            This is one thing that really frustrates me with Windows. The closed-source nature means you’re “on the outside looking in”. Whereas I can–and have on many occasions–found a strange error message in Linux and then ended up tracking down the source code for the answer. And it wouldn’t be so bad with Windows, but as we all know, getting Microsoft involved in fixing their own software (even with a full core dump/trace/etc proving it) is an exercise in patience and learning to speak Hindi. I have multiple clients that have run into issues that are “Microsoft problems” and after paying for support it still goes nowhere, one frustrating conference call after another frustrating screen share.

            • brucedawson says:

              It was odd how much stuff was running and how busily. I have Chrome setup to restore the previous set of pages so I can easily shut it down when recording traces. As for Steam… I no longer have symbols so I can’t guess what they were doing. Updating some game perhaps?

              I’m torn between curiosity (what triggers this odd Windows 7 behavior!) and apathy (not my machine, not my OS, nothing I can do) and I’m afraid apathy wins. I do appreciate your sharing your trace.

  71. Mohammed Imran says:

    Dear Mr. Bruce, I received a reply from MS asking for validation of this bug report, so can all the concerned parties please reply and add feedback if the bug still exists?

    https://aka.ms/av19n4

    I think they applied a bandage to the issue rather than fixing it.

    • brucedawson says:

      I agree that they have not fixed the issue. I have looked at two “post-fix” ETW traces of ProcessCreateTests.exe running and I see no sign of improvement. I believe that a real fix would be a dramatic improvement.

  72. I have the same issue, installed 16281 build and the problem persists. The only way that kept my system running and my sanity was to disable the graphics driver. Obviously my second display is a no show but the problem so far has disappeared.

  73. mohammedimran says:

    now that the final build is out, can you please test it again and confirm the issue is resolved or not? Mr. Bruce :).

  74. Mohammed Imran says:

    Tested on latest patch
    Microsoft Windows [Version 10.0.16299.98]
    (c) 2017 Microsoft Corporation. All rights reserved.

    Testing with 1000 descendant processes.
    Process creation took 3.292 s (3.292 ms per process).
    Lock blocked for 0.002 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.384 s (1.384 ms per process).
    Lock blocked for 0.001 s.
    Average block time was 0.000 s.

    Elapsed uptime is 0.03 days.
    Awake uptime is 0.03 days.

  75. p_lider says:

    Maybe you are facing the same issue with CPU Scheduler like it is seen on ThreadRippers? Recently it was confirmed that this is a bug in CPU Scheduler in Windows. You can try to use CorePrio software with NUMA disassociation enabled to help in the performance. Look here for more details: https://www.youtube.com/watch?v=M2LOMTpCtLA

  76. Juhani Suhonen says:

    @p_lider: umm.. no. As Bruce explained, the bug was present in certain Windows 10 versions, and although a software hack (in theory) could circumvent the bug, CorePrio does nothing that would help to resolve this particular bug.

  77. Andrei says:

    I have RS5 installed (1809 build 17763.253) and it seems that the bug returned.

  78. Andrei says:

    Procmon does indeed show gdi32.dll being loaded many times by cl.exe, git.exe and even msbuild.exe, but why, that remains a mystery. However, the count is not that big, 2000 times in 20 minutes.
    I will probably need to investigate this issue myself in WPA. It’s annoying to not be able to do anything while running the build scripts.

    • brucedawson says:

      That suggests that some sort of extension DLL is installed on your system that has a dependency on gdi32.dll – a hook of some sort, perhaps. cl.exe doesn’t normally pull in gdi32.dll. Look at the other DLLs loaded in to cl.exe and see if any of them look suspicious and/or have a dependency on gdi32.dll directly or indirectly (shell32.dll, etc.) – good luck.

  79. 265 993 303 says:

    Could the mouse cursor sizes and the hardware effects associated with them affect the mouse motion performance?
    Windows up to Windows 2000/ME: 32×32 cursor at all DPI
    Windows XP/Vista: 32×32 cursor up to 149dpi, 64×64 cursor for 150dpi and up
    Windows 7/8/8.1: 32×32 cursor up to 143dpi, 48×48 cursor for 144dpi–191dpi, 64×64 cursor for 192dpi and up
    Even the 64×64 cursor may have mouse pointer shadow, and the hardware effect associated with 64×64 mouse pointer shadow in Windows XP and up might be problematic for graphical performance.

    • brucedawson says:

      The CPU/GPU load associated with a larger cursor should be irrelevant, I think. A 64×64 cursor could be about 64×64×4 bytes = 16 KiB of memory. Reading and writing that a half-dozen times 1,000 times per second would be about 94 MiB of memory bandwidth per second, which barely registers.

      The problem in this case was some other (very expensive) operations that required the same lock as updating the mouse pointer, so the mouse-pointer updates were blocked for user-visible periods of time waiting for the lock to be available.
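The lock contention described above can be illustrated with a minimal sketch. This is not the Windows code path or the author's test program; it is a hypothetical Python stand-in in which one thread holds a lock while doing "expensive" work (a sleep) and a second thread, playing the role of the mouse-pointer update, times how long each acquisition of the same lock is blocked. The function name and parameters are assumptions made for illustration.

```python
import threading
import time


def measure_lock_waits(hold_seconds, poll_interval, polls):
    """Return how long a 'cursor' thread waited for a shared lock, per attempt."""
    lock = threading.Lock()
    holder_has_lock = threading.Event()
    waits = []

    def expensive_holder():
        # Stand-in for the expensive operation that holds the shared lock.
        with lock:
            holder_has_lock.set()
            time.sleep(hold_seconds)

    def cursor_updater():
        # Start competing only once the expensive work already owns the lock.
        holder_has_lock.wait()
        for _ in range(polls):
            start = time.perf_counter()
            with lock:
                # The update itself is trivial; only the wait is interesting.
                waits.append(time.perf_counter() - start)
            time.sleep(poll_interval)

    threads = [
        threading.Thread(target=expensive_holder),
        threading.Thread(target=cursor_updater),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waits


if __name__ == "__main__":
    waits = measure_lock_waits(hold_seconds=0.2, poll_interval=0.001, polls=5)
    print(f"Lock blocked for {sum(waits):.3f} s in total.")
    print(f"Longest single block was {max(waits):.3f} s.")
```

The first acquisition is blocked for roughly as long as the holder keeps the lock, while later ones return almost immediately. The cost of the cursor update itself never enters the picture, which is the point being made in the reply above.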

      • 265 993 303 says:

        The sizes I mentioned before are the SM_CXCURSOR and SM_CYCURSOR sizes, but with SetSystemCursor it is possible to set much bigger cursors. In Windows 7 I was able to set a 16384×16384 monochrome cursor and still have the mouse pointer shadow. I was also able to set a 32768×32768 cursor, although it lost the mouse pointer shadow. How would that affect the mouse motion performance?

        • Jeff Stokes says:

          Apologies for hopping in here,

          I don’t know the impact of a change such as manipulating the cursor size, but going back to your general concern about shadow and rendering: as long as Nested Page Table entries/Shadow Page Table entries are available (which all modern CPUs have, I believe) this shouldn’t be an issue, except maybe in a virtual GPU scenario.

          I do know when we made the Win7 guidance on creating a VDI image we disabled all the shadows/etc that Aero brought to bear by default.

          Hope this helps and is relevant.

          • brucedawson says:

            A 16,384×16,384 cursor would be 1 GiB (assuming 4 bytes per pixel). Updating that at 1,000 fps would not be possible. Updating that at a reasonable rate would be possible on most machines but extremely taxing.

            I’m not sure it’s really relevant, however. Yep, a huge cursor would stress the system. But this blog post was reporting on a situation where any arbitrarily small cursor would stutter.
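The cursor-size arithmetic in this thread can be checked with a short sketch. It assumes 4 bytes per pixel, as the replies do, and takes the "half-dozen" reads and writes per update as exactly six. The helper names are hypothetical and exist only for this illustration.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def cursor_bytes(width, height, bytes_per_pixel=4):
    """Memory for one cursor image, assuming a flat pixel array."""
    return width * height * bytes_per_pixel


def bandwidth_bytes_per_second(size_bytes, touches_per_update=6,
                               updates_per_second=1000):
    """Bytes moved per second if the image is touched several times per update."""
    return size_bytes * touches_per_update * updates_per_second


if __name__ == "__main__":
    small = cursor_bytes(64, 64)
    print(f"64x64 cursor: {small / KIB:.0f} KiB")  # 16 KiB
    print(f"64x64 at 1,000 updates/s: "
          f"{bandwidth_bytes_per_second(small) / MIB:.2f} MiB/s")  # 93.75 MiB/s

    huge = cursor_bytes(16384, 16384)
    print(f"16384x16384 cursor: {huge / GIB:.0f} GiB")  # 1 GiB
    # Even a single write per update is far out of reach at 1,000 updates/s.
    one_pass = bandwidth_bytes_per_second(huge, touches_per_update=1)
    print(f"16384x16384 at 1,000 updates/s: {one_pass / GIB:.0f} GiB/s")
```

Computed exactly, the 64×64 case comes to about 94 MiB/s, close to the figure quoted earlier in the thread, while the 16,384×16,384 case needs on the order of 1,000 GiB/s for a single pass per update.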
