24-core CPU and I can’t move my mouse

Posted on July 9, 2017 by brucedawson

This story begins, as they so often do, when I noticed that my machine was behaving poorly. My Windows 10 work machine has 24 cores (48 hyper-threads) and they were 50% idle. It has 64 GB of RAM and that was less than half used. It has a fast SSD that was mostly idle. And yet, as I moved the mouse around it kept hitching – sometimes locking up for seconds at a time.

Update July 27, 2017: a follow-up post dissects the problem and finds the root cause.

Update Oct 29, 2017: a video showing how to see if the bug is fixed can be found here, and the bug is fixed in the 17025 insider preview builds.

Update Nov 20, 2017: the fix has made it to Creators Update (RS2) which means I can now build Chrome without encountering micro-hangs!

Update, March 22, 2018: the fix has made it to Fall Creators Update (RS3), finally, so the fix is now everywhere.

Update, August 17, 2018: analysis of an unrelated bug that also caused UI hangs due to lock contention is available here.

Update, September, 2020: the original issue was UserCrit contention while many processes were destroyed, whether they had gdi32.dll loaded or not. That issue was fixed but if many processes are destroyed that do have gdi32.dll loaded then the issue still happens. That is, the mouse hitches and the horrible performance are still a risk. Maybe someday Microsoft will fix that, but probably not. For details see this blog post.

So I did what I always do – I grabbed an ETW trace and analyzed it. The result was the discovery of a serious process-destruction performance bug in Windows 10.

The ETW trace showed UI hangs in multiple programs. I decided to investigate a 1.125 s hang in Task Manager:

In the image below you can see CPU usage for the system during the hang, grouped by process name – notice that total CPU usage rarely goes above 50%:

The CPU Usage (Precise) table showed that Task Manager’s UI thread was repeatedly blocked on calls to functions like SendMessageW, apparently waiting on a kernel critical region (the kernel-mode version of a critical section), deep in the call stack in win32kbase.sys!EnterCrit (not shown):

I manually followed the wait chain through a half-dozen processes to see who was hogging the lock. My notes from the analysis look something like this:

Taskmgr.exe (72392) hung for 1.125 s (MsgCheckDelay) on thread 69,196. Longest delay was 115.6 ms on win32kbase.sys!EnterCrit, readied by conhost.exe (16264), thread 2560 at 3.273101862. conhost.exe (16264), 2560 was readied at 3.273077782 after waiting 115,640.966 µs, by mstsc.exe (79392), 71272. mstsc.exe was readied (same time, same delay) by TabTip.exe (8284), 8348, which was readied by UIforETW.exe (78120), 79584, which was readied by conhost.exe (16264), 58696, which was readied by gomacc.exe (93668), 59948, which was readied by gomacc.exe (95164), 76844.

I had to keep going because most of the processes were releasing the lock after holding it for just a few microseconds. But eventually I found several processes (the gomacc.exe processes) that looked like they were holding the lock for a few hundred microseconds. Or, at least, they were readied by somebody holding the lock and then a few hundred microseconds later they readied somebody else by releasing the lock. These processes were all releasing the lock from within NtGdiCloseProcess.
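The manual chain-following described above can be sketched in a few lines of Python. This is only an illustration: the `READIED_BY` table is hand-transcribed from my notes, and the `follow_wait_chain` helper is a hypothetical name, not something WPA provides. Real trace data has many readying events per thread, so a real tool would also need timestamps to pick the right one.

```python
# Hypothetical "readied by" records, transcribed by hand from the notes above.
# Keys and values are (process name, process id, thread id).
READIED_BY = {
    ("Taskmgr.exe", 72392, 69196): ("conhost.exe", 16264, 2560),
    ("conhost.exe", 16264, 2560): ("mstsc.exe", 79392, 71272),
    ("mstsc.exe", 79392, 71272): ("TabTip.exe", 8284, 8348),
    ("TabTip.exe", 8284, 8348): ("UIforETW.exe", 78120, 79584),
    ("UIforETW.exe", 78120, 79584): ("conhost.exe", 16264, 58696),
    ("conhost.exe", 16264, 58696): ("gomacc.exe", 93668, 59948),
    ("gomacc.exe", 93668, 59948): ("gomacc.exe", 95164, 76844),
}


def follow_wait_chain(start, readied_by):
    """Return the threads from `start` back to the last known readying thread."""
    chain = [start]
    seen = {start}
    while chain[-1] in readied_by:
        readier = readied_by[chain[-1]]
        if readier in seen:  # guard against cycles in malformed data
            break
        chain.append(readier)
        seen.add(readier)
    return chain


chain = follow_wait_chain(("Taskmgr.exe", 72392, 69196), READIED_BY)
for name, pid, tid in chain:
    print(f"{name} ({pid}), thread {tid}")
```

The last entry printed is the thread at the far end of the chain, which is where to start looking for whoever actually held the lock for a long time.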

I was tired of manually following these wait chains so I decided to see if the same readying call stack was showing up a lot of times. I did that by dragging the Ready Thread Stack column to the left and searching the column for NtGdiCloseProcess. I then used WPA’s View Callers-> By Function option to show me all of the Ready Thread Stacks that went through that function – in this view the stack roots are at the bottom:

There were 5,768 context switches where NtGdiCloseProcess was on the Ready Thread Stack, each one representing a time when the critical region was released. The threads readied on these call stacks had been waiting a combined total of 63.3 seconds – pretty impressive for a 1.125 second period! And, if each of these readying events happened after the thread had held the lock for just 200 microseconds then the 5,768 readying events would be enough to account for the 1.125 second hang.
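The arithmetic behind that estimate is easy to check. The sketch below uses only the numbers quoted above; the 200 microsecond hold time is the assumption from the text, not a measured value.

```python
READY_EVENTS = 5768            # context switches with NtGdiCloseProcess on the Ready Thread Stack
HANG_SECONDS = 1.125           # length of the Task Manager hang
COMBINED_WAIT_SECONDS = 63.3   # total time the readied threads had been waiting
ASSUMED_HOLD_SECONDS = 200e-6  # assumption: lock held for 200 microseconds per event

# Total time the lock would be held if every event followed a 200 us hold.
serialized = READY_EVENTS * ASSUMED_HOLD_SECONDS
# Average number of threads waiting on the lock at any instant during the hang.
avg_waiters = COMBINED_WAIT_SECONDS / HANG_SECONDS

print(f"serialized time: {serialized:.3f} s")          # 1.154 s
print(f"average waiting threads: {avg_waiters:.1f}")   # 56.3
```

So under that assumption the lock is busy for slightly longer than the hang itself, with roughly 56 threads queued up behind it on average.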

I’m not familiar with this part of Windows but the combination of PspExitThread and NtGdiCloseProcess made it clear that this behavior was happening during process exit.

This was happening during a build of Chrome, and a build of Chrome creates a lot of processes. I was using our distributed build system which means that these processes were being created – and destroyed – quite quickly.

The next step was to find out how much time was being spent inside of NtGdiCloseProcess. So I moved to the CPU Usage (Sampled) table in WPA and got a butterfly graph, this time of callees of NtGdiCloseProcess. You can see from the screen shot below that over a 1.125 s period there was, across the entire system, about 1085 ms of time spent inside of NtGdiCloseProcess, representing 96% of the wall time:

Anytime you have a lock that is held more than 95% of the time by one function you are in a very bad place – especially if that same lock must be acquired in order to call GetMessage or update the mouse position. In order to experiment better I wrote a test program that creates 1,000 processes as quickly as possible, waits half a second, and then tells all of the processes to exit simultaneously. The CPU usage of this test program on my four-core eight-thread home laptop, grouped by process name, can be seen below:

Well, what do you know. Process creation is CPU bound, as it should be. Process shutdown, however, is CPU bound at the beginning and the end, but there is a long period in the middle (about a second) where it is serialized – using just one of the eight hyperthreads on the system, as 1,000 processes fight over a single lock inside of NtGdiCloseProcess. This is a serious problem. This period represents a time when programs will hang and mouse movements will hitch – and sometimes this serialized period is several seconds longer.
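For readers who want to try something similar, here is a minimal Python sketch of the same idea: start a batch of idle processes, pause, then make them all exit at once and time both phases. It is not the ProcessCreateTests code. The names `create_and_destroy`, `CHILD_CMD` and `PROCESS_COUNT` are made up for illustration, the children are Python interpreters (so creation is much slower than with a tiny native executable), and it only reports wall-clock time rather than the per-process CPU usage that an ETW trace shows.

```python
import subprocess
import sys
import time

# Child command: a Python interpreter that blocks until its stdin is closed.
CHILD_CMD = [sys.executable, "-c", "import sys; sys.stdin.read()"]

# The test described in the post used 1,000 processes; this default is kept small.
PROCESS_COUNT = 20


def create_and_destroy(count, settle_seconds=0.5):
    """Start `count` idle children, pause, then tell them all to exit.

    Returns (creation_seconds, destruction_seconds) as wall-clock times.
    """
    start = time.perf_counter()
    children = [
        subprocess.Popen(CHILD_CMD, stdin=subprocess.PIPE)
        for _ in range(count)
    ]
    created = time.perf_counter()

    time.sleep(settle_seconds)

    exit_start = time.perf_counter()
    for child in children:
        child.stdin.close()  # EOF on stdin makes the child exit
    for child in children:
        child.wait()
    destroyed = time.perf_counter()
    return created - start, destroyed - exit_start


if __name__ == "__main__":
    create_s, destroy_s = create_and_destroy(PROCESS_COUNT)
    print(f"creation: {create_s:.3f} s, destruction: {destroy_s:.3f} s")
```

Wall-clock numbers alone will not show the serialization; to see that you still need to record a trace while the processes exit and look at CPU usage during the destruction phase.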

I’d noticed that this problem seems to be worse when my computer has been running for a while so I rebooted and ran the test as soon as my laptop had settled down. The process-shutdown serialization is indeed less severe, but the issue is still clearly present on the freshly rebooted machine:

I then ran the same test on an old Windows 7 machine (Intel Core 2 Q8200, circa 2008) – you can see the results here:

Process creation is slower, as you would expect from a much slower CPU, but process destruction is as fast as my new laptop at its best, and is fully parallelized.

This tells us that this serialization on process shutdown is a new issue, introduced sometime between Windows 7 and Windows 10.

48 hyper-threads, 47 of them idle

Amdahl’s law says that if you throw enough cores at your problem then the parts that cannot be parallelized will eventually dominate execution. When my work machine has been heavily used for a few days this serialization issue gets bad enough that process-shutdown becomes a significant part of my distributed build times – and more cores can’t help with that. In order to get maximum build speeds (and if I want to move my mouse while doing builds) I need to reboot my machine every few days. Even then my build speeds are not as fast as they should be, and Windows 7 starts to look tempting.
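Amdahl’s law is simple enough to write down as a one-line function. The sketch below is purely illustrative: the 10% serialized fraction is an assumed number chosen for the example, not something measured from my builds.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for the given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative only: assume 10% of the work is serialized (not a measured value).
for workers in (4, 8, 24, 48):
    print(f"{workers:2d} workers -> {amdahl_speedup(0.9, workers):.2f}x")
```

With 10% of the work serialized, no number of cores can ever deliver more than a 10x speedup, and going from 24 to 48 hardware threads buys very little.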

In fact, adding more cores to my workstation makes the UI less responsive. That is because Chrome’s build system is smart enough to spawn more processes if you have more cores, which means that there are more terminating processes fighting over the global lock. So it’s not just “24-core CPU and I can’t move my mouse” it’s “24-core CPU and therefore I can’t move my mouse.”

This problem has been reported to Microsoft and they are investigating.

Just one more thing…

This is what my process create test program looks like when run on my 24-core work machine:

See that tiny horizontal red line on the bottom right? That’s Amdahl’s law visualized, as 98% of my machine’s CPU resources sit idle for almost two seconds, while process destruction hogs the lock that I need in order to move the mouse.

These are before/after traces from March 22, 2018, the date the fix made it to Fall Creators Update. The images show the process-destruction portion of ProcessCreateTests.exe. You can clearly see the serialization (one out of four cores allowed to run at a time) in the before image, and the perfect parallelization and much better performance in the after image. The horizontal (time) scales are the same in both images.

GDI Serialization fixed

Resources

The ProcessCreateTests code is available here. Deeper investigation of the functions that hog the lock was done in a follow-up post here, including an understanding of the likely root cause of this new problem. A video showing how to investigate this bug can be found here.

Discussions of this post can be found at:

If you liked this post you might like these other investigative reporting posts:

You Got Your Web Browser in my Compiler!

Windows Slowdown, Investigated and Identified (and the follow-up)

PowerPoint Poor Performance Problem

Self Inflicted Denial of Service in Visual Studio Search

24-core CPU and I can’t type an email (part one)

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048

This entry was posted in Investigative Reporting, uiforetw, xperf and tagged UI hangs, wait analysis, Windows 10.

214 Responses to 24-core CPU and I can’t move my mouse

jaimemmoreno says:

July 9, 2017 at 11:18 pm

I’ve also seen this on my hexacore PC running Win 10. It’s really annoying when it happens. I never saw the hitching with the mouse or keyboard input when running Windows 7 or Windows 8 on the same machine, so I think the problem was introduced between Windows 8 and Win 10.

- sampmccormack says:
  
  July 10, 2017 at 7:11 am
  
  I’ve got no issues with an 8-core Ryzen 1700 on Windows 10, is your chip Intel?
  
  - brucedawson says:
    
    July 10, 2017 at 8:46 am
    
    If it’s the issue that I found then the type of CPU does not matter. What does matter is the workload you are running. It’s possible to hit the issue on a four-core CPU, or to not hit it on a 24-core machine, depending on what you are doing.
    
    - TheGreatCabbage says:
      
      July 10, 2017 at 9:44 am
      
      Ah, that’s interesting. Thanks for the info 🙂
      
    - suck3rs says:
      
      July 10, 2017 at 2:27 pm
      
      Actually, it can and does matter. There is in fact a bug with newer Intel CPUs that was announced by Intel. If my memory serves it sounds like what you are experiencing.
      
      - Chad Layton says:
        
        July 10, 2017 at 4:47 pm
        
        I think your memory does not serve: https://forum.manjaro.org/t/bug-in-intel-cpus-skylake-a-kaby-lake/26587.
        
      - suck3rs says:
        
        July 10, 2017 at 5:09 pm
        
        So you link a German .org? Go read up on the actual issue. The bug is caused by a very specific workload hitting specific registers. It can cause corruption and a number of other issues. Although both AMD and Intel are IBM compatible, a bug in one brand can be present but not in the other.
        
      - brucedawson says:
        
        July 10, 2017 at 9:30 pm
        
        Yes, there is a bug in newer Intel CPUs. However the issue I found is a software bug, first discovered on older Intel CPUs. The Intel hardware bug tends to cause crashes, not lock contention.
        
      - tachyon1 says:
        
        July 13, 2017 at 2:02 pm
        
        You might try actually reading _this_ article here first before posting irrelevant comments. Just a thought.
        
        - OldHack says:
          
          August 12, 2018 at 4:17 pm
          
          Exactly. Read the article twice. This is about a Windows spin lock contention bug, on a 12 core, that affects all Windows 10 machines that were running prior versions of Windows 10.
          
          The Intel bug is different in scope and effect.
          
          Try reading the article; it is well written.
          
          I used both Process Explorer and Process Monitor to hunt down this same problem on a Win10 4-core/8-thread box.
          
          2.1 Version 1507 [BUG]
          2.2 Version 1511 (November Update) [BUG]
          2.3 Version 1607 (Anniversary Update) [BUG]
          2.4 Version 1703 (Creators Update) [BUG]
          **Build 1705, in which the bug was fixed
          2.5 Version 1709 (Fall Creators Update) [?]
          2.6 Version 1803 (April 2018 Update) [?]
          2.7 Version 1809 (October 2018 Update) [?]
          2.8 Version 1903 [?][?][?][?]
          
          Thanks. Great work, and great article. I wish there was a place to talk about massive builds like this, and using distributed make/compiles…
        
yuhong says:

July 10, 2017 at 12:13 am

I can’t imagine that moving the Win32k stuff back to CSRSS would help much in this case, right? Though it is still a good thing, especially for terminal servers, where hopefully one CSRSS process crashing just terminates the session.

James says:

July 10, 2017 at 12:24 am

I’m almost crying. This is exactly what happens with my recent 8-core 6900K on Windows 10, and I couldn’t find what the reason could be. Especially as that behaviour is not happening on Linux!

MBH says:

July 10, 2017 at 12:34 am

If you have a Skylake or Kaby Lake CPU, there’s a bug in their hyperthreading code, so disable hyperthreading and see how it goes.
That bug caused a lot of segfaults on Linux and data corruption in registers.

On Linux, you can circumvent this by using the intel-microcode package which modifies the CPU’s microcode on every boot and fixes this issue.

- brucedawson says:
  
  July 10, 2017 at 8:48 am
  
  I have heard of that bug but it is not related to this issue. The lock contention issue that I discovered is a pure Windows 10 performance bug, not a CPU flaw.
  
rezna says:

July 10, 2017 at 1:16 am

When I was playing with IncrediBuild (a distributed build system) a few years ago, I found that having more than 4 or 8 threads building the code was useless (Lenovo T420, 8GB RAM, i5 Core, mSata SSD drive) because the overhead of spawning and killing the processes was too high. Maybe the Chrome build system could also try throttling the threads at some reasonable level.

- brucedawson says:
  
  July 10, 2017 at 8:49 am
  
  The ninja build system (used by Chrome) carefully manages the number of processes created in order to avoid overwhelming the system. It scales to the number of cores. We are not overloading the system and if we throttle process creation we will build Chrome more slowly.
  
  I am not aware of any mitigation which we can do – we need a fix from Microsoft.
  
  - Allan Jensen says:
    
    July 10, 2017 at 12:47 pm
    
    Ninja doesn’t manage anything. It just measures how many virtual cores you have and launches that number of parallel processes by default. You can control it a bit with the command line arguments -j (number of jobs) and -l (maximum load average).
    
    - gim (@gim) says:
      
      July 11, 2017 at 3:14 am
      
      what I think bruce meant is that ninja waits for old processes to finish first. Moreover, even if you specify `-j 20` that doesn’t mean ninja will start 20 processes, it will throttle it depending on the load (see `RealCommandRunner`)
      
  - mohammed imran says:
    
    July 14, 2017 at 7:34 am
    
    Has Microsoft replied? What are they doing about this bug? It cripples my system sometimes.
    
    - brucedawson says:
      
      July 14, 2017 at 8:38 am
      
      If you reboot occasionally then this bug should not be too serious. I do 960-way parallel distributed builds and, while the mouse hitches a bit, it’s not debilitating. It is possible that there is something else wrong with your system.
      
      I don’t know Microsoft’s schedule for this bug. They’ve got top people working on it. Top. People.
      
      - imbushuo says:
        
        July 20, 2017 at 2:14 am
        
        A Microsoft employee told us that he had successfully reproduced the issue. Currently the fix is in progress. (Chinese site: https://www.zhihu.com/question/62543187/answer/199861472)
        
R Questioner says:

July 10, 2017 at 2:02 am

Moderator: This is off topic so I filed a bug for this issue and I’ll delete future comments on this topic. Please comment on the bug at crbug.com/740760 if you have further information which can help us investigate this issue. We do take memory consumption issues seriously and we are working on them but we need more specific complaints or there is nothing we can do to help.

- Manuel says:
  
  July 10, 2017 at 2:25 am
  
  They have so much RAM because building Chrome from source requires at least 32 GB for release builds and 64 GB for debug builds.
  
  - Ruud van Gaal says:
    
    July 10, 2017 at 3:38 am
    
    You would say a little divide and conquer is needed here, with these requirements. 😉 Is the symbol tree that big or something?
    
- brucedawson says:
  
  July 10, 2017 at 8:52 am
  
  We are working on fixing memory leaks in Chrome and otherwise reducing its memory footprint. Many fixes have been made in the last couple of years. If you have specific examples of pages that are using too much memory in Chrome then please file bugs at crbug.com.
  
  - Ruud van Gaal says:
    
    July 10, 2017 at 12:34 pm
    
    Only 40 MB here indeed. Things like Facebook and GMail take up around 700 MB each here, which is a bit rich. 😉 A bit the nature of the companies though.
    
  - dgw says:
    
    July 10, 2017 at 3:09 pm
    
    Moderator: moved to crbug.com/740760
    
Anon says:

July 10, 2017 at 3:37 am

The moment I got to these two words, “Windows 10”, in the first sentence…

Moderator: Non-constructive. Deleted.

Ohm says:

July 10, 2017 at 4:49 am

Did you try another mouse?

- Severin Pappadeux says:
  
  July 10, 2017 at 7:06 am
  
  Yeah, time to get 48-cores mouse
  
Steve says:

July 10, 2017 at 4:50 am

They should have called Windows 10 Vista 1.0, real garbage…

Moderator: off-topic and non-constructive

Roy Adams says:

July 10, 2017 at 5:08 am

Excellent post Bruce!

Did Microsoft happen to give you a bug/issue tracking number when you reported the problem?

Thanks,
Roy

- brucedawson says:
  
  July 10, 2017 at 8:53 am
  
  I do not have a bug/issue tracking number – sorry.
  
  - Roy Adams says:
    
    July 10, 2017 at 11:24 am
    
    No problem, thanks for taking the time to reply
    
Satay Nutella must go says:

July 10, 2017 at 5:13 am

Windows 10 is absolute garbage.

Moderator: Non-constructive and off-topic. Deleted (well, mostly).

- Anon says:
  
  July 10, 2017 at 6:32 am
  
  Yeah, this was a minor problem in Linux for a while, and it took a while to get it fixed properly:
  https://www.linux.com/learn/whats-new-linux-2639-ding-dong-big-kernel-lock-dead
  
- It Was Not Me (@yearandom) says:
  
  July 10, 2017 at 9:41 am
  
  What are you smoking? They haven’t off-shored any development.
  
Joakim Hårsman says:

July 10, 2017 at 5:47 am

I had a similar problem recently (there was a one-second pause on every process exit), and after plenty of debugging and hair pulling I discovered that it was caused by the AMD drivers. I never understood how they caused it exactly, but updating to a newer version resolved it.

- Aleksandr Ivanov says:
  
  July 10, 2017 at 9:17 am
  
  Simple – interrupt locks.
  
Criação says:

July 10, 2017 at 5:49 am

I have a similar problem with my mouse. Thanks for giving me a clue to what might be happening under the hood. This is most annoying, really.

Thomas Maher says:

July 10, 2017 at 6:10 am

Had this issue and fixed it somehow. Try disabling windows defender fully.

Rich Talbot-Watkins says:

July 10, 2017 at 6:38 am

I had exactly this problem too, and fixed it to a large extent by updating a USB driver. But I’ve still seen occasional hitches since then.

Franklin says:

July 10, 2017 at 7:49 am

I think the universe will thank Bruce when MS fixes this. I have similar symptoms during big builds.

- Aaron says:
  
  July 10, 2017 at 9:31 am
  
  Absolutely agree. Per usual Bruce has done a thorough job of root causing and pinpointing a serious flaw in Windows 10.
  
  Thanks Bruce!
  
Jay B says:

July 10, 2017 at 8:57 am

Disable hyperthreading if you have a newer Intel CPU. Horrible bug in them that nobody is releasing micro code for.

- Jonathon Reinhart says:
  
  July 10, 2017 at 9:53 am
  
  See Bruce’s comment above. That bug causes corruption in CPU registers, and has nothing to do with this Windows 10 bug.
  
  - jaimemmoreno says:
    
    July 10, 2017 at 3:06 pm
    
    Yeah I’m running an older 6 core 4930K Intel CPU that doesn’t have that new hyperthread bug and still seeing the problem on Win10.
    
Rick James says:

July 10, 2017 at 9:33 am

Hi Bruce,

Long time! 🙂
For MS folks: 12699333.

Cheerz,
Rick.

JT Turner says:

July 10, 2017 at 10:21 am

Isn’t there a flag on the compiler to force it to only use so many cores? Like -j 4 (use only 4 processes)?

- brucedawson says:
  
  July 10, 2017 at 11:14 am
  
  We invoke a separate compiler instance for each translation unit so we have full control of the amount of parallelism. The problem is that this Windows bug can manifest even if you only have one process per core. And, for distributed builds we aim to have ~20 processes per core and this works fine – except for this bug.
  
John Walluck says:

July 10, 2017 at 11:06 am

What a great job of analysis and reporting. If MSFT doesn’t fix this issue quickly after this they’re not trying.

Metro Melvin says:

July 10, 2017 at 11:15 am

Nice work..

John Doe says:

July 10, 2017 at 11:38 am

Specs on all your hardware?

Gwilbor says:

July 10, 2017 at 12:15 pm

I have a very similar problem on my laptop, an HP Pavilion with Windows 8.1 (CPU is Intel i5-3230M 2.6 GHz). The only difference is that the trackpad is affected, but the external USB mouse is not. Anyway, since the first day, the mouse pointer has frozen very often while the rest of the computer seems to be working fine. Much of the time I am forced to use the keyboard to scroll webpages. Do you think it is the same issue?

- Gwilbor says:
  
  July 10, 2017 at 12:17 pm
  
  Forgot to subscribe to new comments
  
- brucedawson says:
  
  July 10, 2017 at 1:28 pm
  
  If the external mouse is not affected then it is probably a different issue.
  
DavidW says:

July 10, 2017 at 1:05 pm

You have a 24 core laptop? With 64G of RAM? What brand/model is this? Some sort of Xeon beast?

- brucedawson says:
  
  July 10, 2017 at 1:28 pm
  
  Sorry for the ambiguity, but that’s a 24-core workstation. I then moved the investigation to my home machine which is a four-core laptop, the one described here:
  
  The Spoils of Law (Moore’s Law)–a New Laptop
  
Andrew says:

July 10, 2017 at 5:33 pm

So why is CPU Usage at 50% on idle? There’s your first problem.

- brucedawson says:
  
  July 10, 2017 at 9:28 pm
  
  CPU Usage was not at 50% on idle. CPU usage was at 50% during a build of Chrome. Which is appropriate. In fact, CPU usage probably should have been higher, and after this bug in Windows is fixed it will be higher.
  
jdrch says:

July 10, 2017 at 5:46 pm

Ummm I’ve seen PCs lock up when trying to exit too many things at once before. Nothing new here.

- David says:
  
  July 10, 2017 at 6:26 pm
  
  The new part is the in-depth analysis.
  
adam c says:

July 10, 2017 at 6:56 pm

I’ve always been sensitive to mouse lag in Windows 10. I put it down to the PC hardware ecosystem, things like CPU/GPU stepping, 100 Hz~60 Hz monitors and such. I thought it got better recently, but I’m not using a 24-core system.

P.S. Total witch hunt here, but I’m noticing so many direct sound issues, maybe related.

- adam c says:
  
  July 10, 2017 at 6:56 pm
  
  n/a forgot to enable notify
  
Martin Fiedler says:

July 11, 2017 at 12:22 am

I think it’s ironic that there’s GDI-related lock contention even when Console-mode processes that never even touch GDI (such as the compiler) are closed.

Jamesits says:

July 11, 2017 at 12:42 am

I have noticed this since Windows 10 RS1. Strange input lag or frame drops happen when playing osu! on my 16-thread workstation with <25% CPU usage. Sometimes the mouse keeps freezing for ~3s or more.

- Jamesits says:
  
  July 11, 2017 at 12:44 am
  
  Plus, have you noticed that the taskmgr Processes tab loads significantly slower when the computer has been on for a day than just after boot?
  
  - brucedawson says:
    
    July 11, 2017 at 8:26 am
    
    I have not noticed that – it appears very quickly for me. And, your issues sound different because they aren’t around process destruction. Sorry, you may have to investigate them. Sigh… computers.
    
Lennie says:

July 11, 2017 at 3:04 am

This problem has always existed in Windows, Windows 7 was better at it than previous versions. Seems Windows 10 is worse again.

Just run a Linux VM with your Chrome build. 🙂

- brucedawson says:
  
  July 11, 2017 at 8:29 am
  
  No. This problem is new. My tests show it did not exist in Windows 7. Process creation/destruction may have always been slow on Windows, but that is actually separate from this issue.
  
  And I’m a Windows developer, using Windows tools to develop Windows software. A Linux VM isn’t going to help.
  
George McCabe says:

July 11, 2017 at 3:56 am

I’m running Windows 7 Ultimate on a 12-core i7 CPU. My PC ran sluggish and choppy after I installed the Chrome browser. After I closed Chrome it left a lot of processes still running in the background. I had to completely uninstall Chrome to get back to normal.

- brucedawson says:
  
  July 11, 2017 at 8:32 am
  
  Chrome shouldn’t make your machine sluggish – that seems odd.
  
  The background processes are probably from background apps. See chrome://settings, advanced, “Continue running background apps when Google Chrome is closed” – set that to off.
  
alvarolucas says:

July 11, 2017 at 3:57 am

Sorry, I didn’t read it… because as soon as I read “I’ve a 24 CPU and I can’t move my mouse” I automatically thought… what Windows version do you have?… and I wasn’t wrong.
Put a Linux distro in your life! :-)) … and forget about an endless life of senseless problems on Windows…

- barton96 says:
  
  July 11, 2017 at 4:41 am
  
  You Linux evangelists are boring as a sack of rocks. It was cool maybe for a while in 1996, but not anymore.
  
- MOW says:
  
  July 11, 2017 at 3:00 pm
  
  Try copying a 100GB file to a USB harddisk … Linux has its own scheduler problems. Hopefully 4.12 will fix this.
  
plexus says:

July 11, 2017 at 4:32 am

Thank you. Have the same problem, couldn’t identify it though. I have the i7-5820 with 6 cores (12 threads). I always thought some hardware must be broken or having issues.

Diego says:

July 11, 2017 at 5:41 am

Hello. I had a similar problem some time ago. I solved it by adding a heatsink with a fan to the GPU chip on my motherboard, even though I had two cards in SLI. I’m sorry about my English.

Sarreq Teryx says:

July 11, 2017 at 6:01 am

hrm… my mouse starts locking up like that after a while, as well. Since it’s wireless, I assumed it was something interfering with the radio, but it’s so intermittent that I couldn’t figure out what might be the source. This certainly could explain it.

Anonymous says:

July 11, 2017 at 6:39 am

What about Windows 8.1? Can you test on it please?

- brucedawson says:
  
  July 11, 2017 at 8:30 am
  
  I don’t have a Windows 8/8.1/Windows 10 RTM system available. The code is available, so give it a try.
  
akraus1 says:

July 11, 2017 at 11:13 am

Could be a desktop heap leak. The GDI resources are put onto a special user session bound heap which can leak if you e.g. call RegisterWindowMessage many times with different parameters. This type of leak is hard to find. Which sort of GDI resources are your build processes creating? You could omit some processes from your build to check whether one specific process is responsible for the degradation over time.

- brucedawson says:
  
  July 11, 2017 at 1:21 pm
  
  That sounds plausible except that the serialization happens on a freshly booted machine running the ProcessCreateTests project (source code link in the post, and see the post-reboot image). So, if there is a leak then it is in the OS and is triggered even by programs that never touch user32.dll or gdi32.dll.
  
  - akraus1 says:
    
    July 11, 2017 at 1:59 pm
    
    In your test application you are calling GetDesktopWindow and PeekMessage, which are located in user32.dll. Even such innocent methods can use the desktop heap as an implementation detail. There is no list available of the methods that cause desktop heap allocations. If you can get away without a message loop and PeekMessage then you should not see this degradation. Win10 is known for having much slower VirtualAlloc performance, for which a hotfix is available, and that could also be somehow related. As far as I know this fix is not public yet so you should check with MS support if this changes things.
    
    - brucedawson says:
      
      July 11, 2017 at 6:34 pm
      
      I am very careful to only call those functions in the master process, not in the 1,000 descendant processes. I did this to avoid the concerns that you raise and because the DETACHED_PROCESS mitigation doesn’t work if user32.dll is loaded.
      
Doniel says:

July 11, 2017 at 2:21 pm

How’s the process IPC? Did you try different system clocks? Windows seems to synchronize badly with some of them.
Try HPET, or the Windows default. You have to turn it on/off in the BIOS and on/off in the system for it to work completely.

Ian Brodowski says:

July 11, 2017 at 5:39 pm

Out of curiosity, are you running the creators update (1703) or the anniversary update (1607), or the 1511 release?

- brucedawson says:
  
  July 11, 2017 at 6:32 pm
  
  I am running Anniversary Edition at work and Creators Update at home. It is possible that the bug first showed up in Anniversary Edition because I didn’t notice it before I upgraded, but that may also be because of work-flow differences.
  
Renan Decarlo says:

July 11, 2017 at 9:31 pm

People here in the comments are emphasizing too much on “mouse” and “chrome”. This problem is further beyond that. It can affect the entire machine and on different workloads within W10.
I, for instance, couldn’t handle the hiccups and had to downgrade to W7.

I used to have some Gradle projects and couldn’t even use my machine while compiling them. I suffered from it even in day-to-day computer usage. Now on W7 it’s a totally different story.
I know this might not be the same bug, but it might also totally be.

Reply
Lars Berntrop-Bos says:

July 12, 2017 at 12:28 am

Curious if this bug is also in Terminal Server settings, i.e. present in Windows Server versions. Those frequently have lots of cores and lots of processes, making occurrence of the bug and hindering performance a serious risk.

Reply
- brucedawson says:
  
  July 12, 2017 at 9:16 am
  
  This bug is almost certainly present in the server editions. The mouse-hitching aspect of the bug is less likely to matter on a server, but the process-destruction bottleneck will affect some workloads.
  
  Reply
trawg says:

July 12, 2017 at 12:49 am

This is really interesting. About 3-4 weeks ago I noticed the same kind of mouse skipping problem on my Windows 8.1 PC. I can’t remember how many cores I have (overseas at the moment) but I want to say 16, with 16GB of RAM. I ended up buying a new mouse to see if that was the problem but haven’t had a chance to use it long enough before I had to go overseas to see if it fixed it. I did wonder if it might be a recent weird low-level Windows patch that might’ve changed something but I thought it was equally likely to be my 4 year old mouse

I took my “broken” mouse overseas with me to use on my laptop and it seems to be working fine. So I will be very interested to see if there is any progression on this issue.

Reply
Rob K says:

July 12, 2017 at 4:58 am

Very cool article. Explained so well for those of us trying to climb the steep curve. Thank you!

Reply
Nicolas Ramz says:

July 12, 2017 at 5:28 am

Windows is great, but there are lots of bottlenecks, like this one. One other area where it can (and *needs* to) improve a lot is the filesystem: Windows is really really slow at creating/deleting lots of files. Uncompressing an archive containing Firefox sources can take about 21 minutes (!!) while on the same machine, in a VM running macOS, it would take around 2 minutes. That’s 10 times slower than macOS that is slowed down by the VM…

Reply
- Zaru says:
  
  July 12, 2017 at 5:39 am
  
  Turn off Windows Defender real-time protection while extracting large archives (from known safe sources). Every new file created triggers a full scan in the background, that’s your main source of Windows 10 file system sluggishness.
  
  Reply
  - Nicolas Ramz says:
    
    July 12, 2017 at 6:04 am
    
    That’s better without real-time protection: 5m13s to uncompress the same directory. But it’s still two times slower than macOS (that’s running inside a VM). Also, you cannot tell people to stop virus protection before doing heavy file operations…
    
    Reply
- Zaru says:
  
  July 12, 2017 at 5:41 am
  
  And just to add: temporarily turning off real-time protection and using a RAM-drive (ImDisk VDD etc.) for large builds or any operation on large numbers of files, greatly speeds things up in Windows 10 as well.
  
  Reply
Marty G says:

July 12, 2017 at 6:14 am

I have noticed in my experience that all computers and OSs I use seem to be getting more and more bogged down during process closing operations. Not to the point of UI freezes mind you, just in system load. In my mind it has seemed correlated with attempts in all OSs to deal with security issues involving clearing memory on dealloc and doing proper memory management when returning the freed mem to the available pool. This is a pure blackbox/shotgun line of thinking on my part. But can you think of any factor that would affect linux, android, windows process close loading that would be more a result of an overall approach; like an industry-wide way of doing things?

Reply
- brucedawson says:
  
  July 12, 2017 at 9:14 am
  
  I can’t think of anything. Zeroing of memory is usually done at allocation time or (on Windows) done asynchronously by the system process, and zeroing memory isn’t even a process-destruction specific issue. See this article for details:
  
  Hidden Costs of Memory Allocation
  
  Reply
  - Marty G says:
    
    July 13, 2017 at 5:51 am
    
    Thanks for responding. I should not have gone down the rabbit hole of memory dealloc as I don’t have any reason to suspect that in particular and I think I ended up focusing your answer at a level I didn’t intend. What I was getting at was that closing processes in general seem to take a lot more CPU than it used to across the OSs I use, which is all the majors except Apple stuff and was wondering if an industry-wide OS or userland programming strategy might be behind it. I know apps save a lot more state data when ending these days for example, but again, I don’t really know if that is what I’m seeing. I was just curious if some general response to security issues or similar may be resulting in this effect, but it is likely just bloat in size, statefulness, and features in general causing more cleanup to be required. I’m just throwing it out there. But back to the topic, excellent work in finding this specific W10 bug. MS should send you a check for your time!
    
    Reply
Kate (@Redback) says:

July 12, 2017 at 6:16 am

Any chance you could upload a binary of the 1000 process test app? I’d be interested in playing with this bug on a few versions of Windows, but I don’t have things set up to compile stuff.

Reply
- brucedawson says:
  
  July 12, 2017 at 9:06 am
  
  Sure, why not. Done. It’s in the same directory as the source code:
  https://github.com/randomascii/blogstuff/tree/master/ProcessCreateTests
  
  Reply
  - Kate (@Redback) says:
    
    July 12, 2017 at 3:16 pm
    
    Thanks!
    
    Reply
Marcos Sebastian Alsina says:

July 12, 2017 at 6:46 am

Great investigation Bruce. Thanks a lot for your time, you saved mine :-). The question is who was the genius that put such a Critical Section in such a critical part of the code.

Reply
- brucedawson says:
  
  July 12, 2017 at 9:07 am
  
  I have a theory that Windows 10 RTM had corruption of the GDI object table, so a lock was added to avoid the corruption, and …
  
  Reply
marshal craft says:

July 12, 2017 at 7:29 am

Just asking why process instead of thread?

Reply
- brucedawson says:
  
  July 12, 2017 at 9:08 am
  
  The ninja build system is designed around creating processes. And, because it invokes many different build tools (compiler, linker, python, etc.) it mostly has to use processes.
  
  Reply
marshal craft says:

July 12, 2017 at 7:40 am

Yeah, I think this is a flaw in Chrome’s build system and not Windows. First I don’t think chrome should be spawning many processes. They should be threads. Second, even so, there shouldn’t be an effort to have n threads/processes where n is the number of CPUs, because they contend with each other for time on any thread to execute. It is actually improbable that any threads are actually executing at the same time. Threads should be used more abstractly. If there is so much logical parallel work then make threads for it, so that if and whenever possible they can be handled independently. But Windows will handle them in the end as it sees fit. Either way you shouldn’t be using 48 process to compile a web browser, but go ahead, try to get Windows to change the operating system to fit Google Chrome. -_-

Reply
- Richard M. says:
  
  July 12, 2017 at 8:07 am
  
  “I don’t think chrome should be spawning many processes. They should be threads.”
  
  It’s not Chrome, it’s a Chrome build!
  
  “you shouldn’t be using 48 process to compile a web browser”
  
  What??????
  
  do you even know what you are talking about??? I hope you don’t work for Microsoft…
  
  Reply
- Richard M. says:
  
  July 12, 2017 at 8:15 am
  
  –were not the best way, methinks, albeit it is not to be denied that authorities differ as concerning this point, some contending that the onion is but an unwholesome berry when stricken early from the tree whileas others do yet maintain, with much show of reason, that this is not of necessity the case, instancing that plums and other like cereals do be always dug in the unripe state yet are they clearly wholesome, the more especially when one doth assuage the asperities of their nature by admixture of the tranquilizing juice of the wayward cabbage and further instancing the known truth that in the case of animals, the young, which may be called the green fruit of the creature, is the better, all confessing that when a goat is ripe, his fur doth heat and sore engame his flesh, the which defect, taken in connection with his several rancid habits, and fulsome appetites, and godless attitudes of mind, and bilious quality of morals–
  
  King Arthur could have produced such comment on the topic…
  
  Reply
- brucedawson says:
  
  July 12, 2017 at 9:12 am
  
  I’m not sure how ninja (Chrome’s build system) is going to spawn compilers/linkers/python/etc. as threads instead of processes.
  
  Ninja is very good at spawning the number of processes that I want. That is n-processes for local builds (perfectly saturating the local CPUs) and I use n*20 processes (-j 960) for distributed builds (because they use fewer local resources). This works very well, and will work even better once Microsoft fixes their regression – which I am confident they will do.
  
  I am quite familiar with the idea of threads/processes contending for the CPU and I am careful not to pointlessly over-saturate the CPUs.
  
  Chromium is open source. Give it a try. I think you will find that it has a very well implemented build system, across multiple systems. Contributions welcome.
  
  Reply
  - marshal craft says:
    
    July 13, 2017 at 12:59 pm
    
    Well yeah if you have each thread just work on a single thing, they will end up smothering themselves past a certain point as they compete for time on the cpu, but if they choose to take up work where others leave off they no longer follow that saturation rule. As more threads are used they increase the time on the cpu.
    
    Reply
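The job counts mentioned in the thread above follow simple arithmetic: n processes for a local build, and n*20 for a distributed build, which on a machine with 48 logical processors gives the quoted `-j 960`. A minimal sketch of that calculation follows. The helper name `ninja_jobs` and its `multiplier` parameter are hypothetical and are not part of ninja; only the `-j` flag itself comes from the discussion.

```python
import os


def ninja_jobs(logical_cpus: int, distributed: bool = False, multiplier: int = 20) -> int:
    """Return a value for ninja's -j flag.

    Local builds use one job per logical processor. Distributed builds use
    logical_cpus * multiplier, because each job needs fewer local resources.
    The multiplier of 20 is the figure quoted in the thread, not a ninja default.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return logical_cpus * multiplier if distributed else logical_cpus


if __name__ == "__main__":
    n = os.cpu_count() or 1  # logical processors on this machine
    print(f"local:       ninja -j {ninja_jobs(n)}")
    print(f"distributed: ninja -j {ninja_jobs(n, distributed=True)}")
```

The right multiplier for a distributed build depends on the build farm, so treat 20 as one person's setting and not as a recommendation.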
Alexander Perez says:

July 12, 2017 at 9:35 am

Would this qualify for a MS bug bounty reward? I think you deserve it.

Reply
- Nicolas Ramz says:
  
  July 12, 2017 at 9:40 am
  
  👍
  
  Reply
Lee says:

July 12, 2017 at 10:46 am

I mean… Windows 10 … in 2017 … -> roflmao
Not because of your very specific issue, but rather because of the many different issues that W10 caused on my workstation (during a 2 year “grace period”), I switched back to Windows 7 and I don’t regret it at all.

Everything just runs SMOOTHER.

Reply
Set says:

July 12, 2017 at 11:40 am

I am experiencing the exact same thing. Usually after Chrome was opened but even after I close Chrome it happens for a minute or two more.

Reply
- brucedawson says:
  
  July 12, 2017 at 2:33 pm
  
  It’s not clear if that is the same issue or not, especially since I only encountered this issue when *building* Chrome, not when using it. If you can record an ETW trace of this poor behavior after you close Chrome (start tracing, close Chrome, repro the poor behavior, then save the trace) that would be helpful. You can then file a bug at crbug.com. UIforETW makes recording traces easy – go to https://tinyurl.com/etwcentral
  
  Reply
tester says:

July 12, 2017 at 12:54 pm

The version of Windows 10 is????? If you have to bite Windows 10, at least give us a full specification of your machine.

Reply
- brucedawson says:
  
  July 12, 2017 at 2:34 pm
  
  I saw this problematic behavior with process exits on Windows 10 Anniversary Edition and Windows 10 Creators Update. It may have happened on more versions as well – I don’t know – but I suspect it first started happening in Anniversary Edition.
  
  Reply
Ark-kun says:

July 13, 2017 at 4:24 am

Sorry for a not very relevant question, but as you work on Chrome and performance in Windows, you’re the closest expert for me.

When I have many tabs open, the root Chrome process uses quite a lot of CPU and the performance degrades (e.g. the file download animation – the button expanding at the bottom of the window can take couple of seconds). This happens even when there are no CPU-hogging content processes (say, I’ve killed them…).
Is this considered normal behavior (given that I have many tabs open) or should I collect some kind of trace and submit a bug?

Reply
- brucedawson says:
  
  July 13, 2017 at 7:56 am
  
  That is not normal. Even with lots of tabs open Chrome can be almost completely idle. It’s hard to speculate about what might be going on. It could be a bad web page, an ill-behaved extension, or a Chrome bug. Filing a bug at crbug.com and attaching a trace (chrome://tracing or ETW trace) would be helpful.
  
  Reply
santagada says:

July 13, 2017 at 7:05 am

Did you file a bug report with Microsoft? I would love to hear their answer/patch.

Reply
- brucedawson says:
  
  July 13, 2017 at 7:54 am
  
  I reported the bug through informal channels. Earlier in the comments Rick James helpfully shared a Microsoft bug number: 12699333. Unfortunately I doubt the underlying details will be shared.
  
  Reply
  - Christopher Katko says:
    
    November 7, 2017 at 8:00 am
    
    Googling that number shows “crazy bad Windows Defender [remote execution] bug”. Is it possible that (per someone else’s mention) it was some sort of Windows Defender bug (linearly scanning all processes)? Or is the number wrong?
    
    Thanks.
    
    Reply
    - brucedawson says:
      
      November 7, 2017 at 9:27 am
      
      It’s not a Windows Defender bug – that would have shown up in the trace. It is a performance bug caused by a security change that made looking up GDI objects more expensive. I don’t know why searching on 12699333 finds articles about that other bug – I couldn’t find that number anywhere in the source of the result pages.
      
      Expect a fix “soon”.
      
      Reply
goawaey says:

July 13, 2017 at 12:02 pm

what is the cpu you have? is it haswell? broadwell?

Reply
- brucedawson says:
  
  July 13, 2017 at 2:17 pm
  
  I first saw it on a Haswell Xeon processor. I then reproed it on a much older processor (circa 2011) and on a brand-new Kaby Lake. Which is to say, this bug has nothing to do with the processor you are running (other than that you need at least two cores to see it). It is a software performance bug.
  
  Reply
James says:

July 13, 2017 at 1:51 pm

Question: does the issue manifest with both USB and PS/2 devices? You’re running a hefty setup, and high-end desktop processor Dell towers still come with PS/2 ports (for obvious government contract reasons), so I wonder if that mitigates the issue.

Reply
- brucedawson says:
  
  July 13, 2017 at 2:13 pm
  
  The type of mouse does not matter. Access to the Windows message queues is blocked by lock contention. The issue is not even specific to mouse movement – it affects anything that needs the lock, which *includes* general responsiveness of all UI programs, and more.
  
  Reply
Jeremiah Penery says:

July 13, 2017 at 7:04 pm

I ran the ProcessCreateTest.exe a few times on my machine (3930k, 6 cores/12 threads, Windows 10). Cutting out all the process creation parts:

Process destruction took 0.686 s (0.686 ms per process).
Lock blocked for 0.085 s.
Average block time was 0.012 s.

Process destruction took 0.657 s (0.657 ms per process).
Lock blocked for 0.005 s.
Average block time was 0.001 s.

Process destruction took 0.656 s (0.656 ms per process).
Lock blocked for 0.029 s.
Average block time was 0.004 s.

Process destruction took 0.644 s (0.644 ms per process).
Lock blocked for 0.009 s.
Average block time was 0.001 s.

Process destruction took 0.635 s (0.635 ms per process).
Lock blocked for 0.000 s.
Average block time was 0.000 s.

Strange that I’m not seeing any issues here.

Reply
- brucedawson says:
  
  July 13, 2017 at 7:55 pm
  
  It is a bit strange, but not totally. The effect only really becomes noticeable (without looking at an ETW trace) on machines that have been up for a while and heavily used – whatever that means. Mine is also behaving well at the moment – go figure.
  
  Reply
  - Lars Berntrop-Bos says:
    
    July 14, 2017 at 12:46 am
    
    I would love to know the actual build of Windows 10 you’re on. For my development environment I have seen several bugs squashed only since a specific build, 15063.447. An overview of builds and corresponding KB numbers is here: https://technet.microsoft.com/en-us/windows/release-info.aspx
    The build number is listed at Settings:System:About:OS Build.
    One of the bugs squashed was in WinForms, where normal userland code could cause a 0x7f aka Unexpected_kernel_mode_trap bluescreen….
    
    Reply
    - brucedawson says:
      
      July 14, 2017 at 8:29 am
      
      I’ve seen the lock contention/process-destruction serialization/mouse-hitches on Windows 10 Anniversary Edition and Windows 10 Creators Update.
      
      Reply
Konstantin says:

July 14, 2017 at 2:33 am

The new half transparent windows calculator moves slower than other windows when dragged with the mouse.
Machine: AMD FX-8320 (technically 4 cores, 8 threads), AMD RX580 GPU, 4 GB of RAM (I know that the RAM is the bottleneck in many cases).

But – could this be a similar problem? I recently swapped GPU to RX580 (for gaming, not for mining cryptocurrencies) – and the only thing it could have problems with is with moving the calculator window…
If I think about it it can only be Windows…

Reply
- dgw says:
  
  July 14, 2017 at 2:52 am
  
  I’ve noted performance issues with moving windows around with transparency enabled as far back as Win7 (I never had Vista). Probably something to do with how much extra work the desktop compositor has to do when windows don’t simply occlude each other.
  
  Reply
  - brucedawson says:
    
    July 14, 2017 at 8:31 am
    
    The only way to be sure about what causes a performance issue is to record a profile and see what is going on. The next best thing is to find some simple change that makes the problem appear/disappear. The calculator moves fine for me, although it also doesn’t seem to be partially transparent, so I guess I have some setting different.
    
    Reply
    - mohammed imran says:
      
      July 15, 2017 at 11:43 am
      
      any update on this issue from MS?
      
      Reply
      - James Clarke [MSFT] says:
        
        August 25, 2017 at 7:24 pm
        
        We believe we have root caused the issue and are testing a fix. Can’t say what the timeframe will be for pushing something out but progress at least.
        
        Reply
        
        brucedawson says:
        
        August 26, 2017 at 1:16 pm
        
        Excellent! Thank you for the update.
        
        Reply
      - Mohammed Imran says:
        
        September 2, 2017 at 11:15 am
        
        We’ve made some adjustments to address an issue resulting in sudden and brief CPU spikes where you couldn’t move your mouse. If you’ve encountered this, please try it in this new build and let us know if your experience has improved.
        
        https://blogs.windows.com/windowsexperience/2017/09/01/announcing-windows-10-insider-preview-build-16281-pc/#bsplZfKA4lEYWaaU.97
        
        Can this be validated again?
        
        Reply
Wiktor Wandachowicz says:

July 19, 2017 at 10:38 am

I’ve run your compiled “ProcessCreatetests.exe” program on my laptop and all processes terminated rather quickly (around 1.488 s – creation, 1.388 s – destruction).

Testing with 1000 descendant processes.
Process creation took 1.488 s (1.488 ms per process).
Lock blocked for 0.000 s.
Average block time was 0.000 s.

Process termination starts now.
Process destruction took 1.388 s (1.388 ms per process).
Lock blocked for 0.284 s.
Average block time was 0.024 s.

All done on Windows 10 Pro Insider Preview, Build 16232.
Processor Intel Core i7-2630QM @ 2.00GHz, 8GB RAM.

Reply
- brucedawson says:
  
  July 19, 2017 at 7:31 pm
  
  And? There is some randomness in how long the processes take to exit, with how long your system has been up being one factor. But, it’s quite clear that your system was suffering from the problem that I found. The lock was blocked for at least 284 ms which is far longer than it should be. If process destruction wasn’t serialized then the processes would have terminated even faster, and without the risk of micro-hangs.
  
  Reply
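To compare runs like the ones posted in these comments, it can help to express the blocked time as a share of the destruction time. The sketch below parses output in the format shown above and computes that ratio. The function name `blocked_fraction` is hypothetical, and the parsing assumes the three-line layout seen in the pasted results; it is not part of ProcessCreateTests.

```python
import re

# Sample text copied from the run reported above (Build 16232).
SAMPLE = """\
Process termination starts now.
Process destruction took 1.388 s (1.388 ms per process).
Lock blocked for 0.284 s.
Average block time was 0.024 s.
"""


def blocked_fraction(output: str) -> float:
    """Return lock-blocked time divided by process-destruction time.

    Only the "Lock blocked" line that follows the "Process destruction"
    line is used, so the creation-phase numbers are ignored.
    """
    destruction = None
    for line in output.splitlines():
        took = re.match(r"Process destruction took ([\d.]+) s", line)
        if took:
            destruction = float(took.group(1))
            continue
        blocked = re.match(r"Lock blocked for ([\d.]+) s", line)
        if blocked and destruction is not None:
            return float(blocked.group(1)) / destruction
    raise ValueError("no destruction-phase timing found in the output")


if __name__ == "__main__":
    print(f"lock blocked for {blocked_fraction(SAMPLE):.1%} of destruction time")
```

For the sample run the lock was blocked for roughly a fifth of the destruction phase. Because results vary with system uptime, a single ratio is only a rough indicator.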
Viet says:

July 26, 2017 at 10:31 pm

I really want to know when MS will fix this.
Did they really accept this as a bug?

Reply
- brucedawson says:
  
  July 27, 2017 at 11:05 am
  
  Microsoft is aware of the issue (I have talked to them about it informally) and there is a bug filed (apparently 12699333). I suspect that it will be fixed for the Fall Creators Update, but I don’t know for sure.
  
  Reply
  - mohammed imran says:
    
    July 28, 2017 at 11:46 am
    
    Why don’t you file a bug report using the feedback app? Then we can all vote it up and push them to deliver a fix.
    
    Reply
    - brucedawson says:
      
      July 28, 2017 at 3:10 pm
      
      I don’t have a lot of faith in the feedback app, and it probably isn’t necessary in this case. But, not a bad idea. Feel free to do that and post it here and/or tweet a link.
      
      Reply
      - mohammed imran says:
        
        July 29, 2017 at 3:37 am
        
        Hi Bruce,
        OK, I shall file a bug report in the feedback app, but I would also like you to comment on the report with further data, as we need to squash this bug for all Windows users. For the greater good.
        
        Reply
      - mohammed imran says:
        
        July 29, 2017 at 4:03 am
        
        Here is the report filed in the feedback app:
        https://aka.ms/Av19n4
        
        Reply
        
        brucedawson says:
        
        July 29, 2017 at 9:40 am
        
        It says “Your account doesn’t have access to this feedback.” Anybody else have better luck?
        
        Oh wait, once I logged in to the Feedback App (being careful not to log into my Microsoft account for all of Windows) and tried the link again I could see it. What a UI mess – not letting users even *view* feedback without logging in?
        
        I upvoted it, and tweeted it: https://twitter.com/BruceDawson0xB/status/891338715456454657
        
        Reply
      - J_s8 says:
        
        August 16, 2017 at 8:30 am
        
        Yep.. I’ve got an impression that there is some sort of black hole in between submitted feedback and cognitive processing. I’ve submitted over 100 pieces of feedback and none has been replied to or acknowledged – oh boy I must be bad at this…
        
        Reply
jeffstokes says:

July 30, 2017 at 6:33 am

Thanks Bruce for the write-up here. I expected this to be hard-core bad driver/DPC issues. 😛

Reply
Mohammed Imran says:

August 6, 2017 at 10:34 am

Mr. Jimmy A at the link said this
quote/
Is it locking up, or is it a redraw issue? I’ve run into this as well in the past, what looked like locking up was actually the video driver not redrawing the screen fast enough. I could tell by moving the cursor across the screen quickly at a length I knew it should make it across, which it did, but skipped across. If it was not a redraw issue, the cursor would not make it all the way across the screen, as the x/y commands from the mouse would be dropped and never received from the computer, which would indicate a problem with actual processing of the information (which is what you are indicating). If you are working remotely, there are a few more pieces to the puzzle and the introduction of an additional video card and the network connection./unquote

any reply Mr.Bruce?

Reply
- brucedawson says:
  
  August 6, 2017 at 11:11 am
  
  Why do you think that x/y movement commands from the mouse would be dropped? And what link are you referring to where this comment came from?
  
  Regardless, the issue is well understood – a crucial system lock is held for too long during process destruction. The heavy contention for this lock causes long delays in accessing message queues and other resources, which leads to repeated hangs. There is no need to invoke other explanations.
  
  And, anyone who wants to explore this on their own can do so using the supplied test program.
  
  Reply
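The mechanism described above, one party holding a shared lock for too long while another needs it to make progress, can be illustrated in miniature. The sketch below is only an analogy in user-mode Python threads: the names `measure_blocked_time` and `holder` are hypothetical, and nothing here touches the actual Windows lock or message queues.

```python
import threading
import time


def measure_blocked_time(hold_seconds: float) -> float:
    """Return how long a second thread waits for a lock held elsewhere."""
    lock = threading.Lock()
    holder_has_lock = threading.Event()

    def holder() -> None:
        # Stand-in for process teardown: hold the shared lock for a while.
        with lock:
            holder_has_lock.set()
            time.sleep(hold_seconds)

    worker = threading.Thread(target=holder)
    worker.start()
    holder_has_lock.wait()  # make sure the lock is really held first

    # Stand-in for a UI thread that needs the same lock to read its messages.
    start = time.perf_counter()
    with lock:
        blocked = time.perf_counter() - start
    worker.join()
    return blocked


if __name__ == "__main__":
    blocked = measure_blocked_time(0.05)
    print(f"waiting thread was blocked for {blocked * 1000:.1f} ms")
```

The waiting thread does no work of its own, yet it stalls for about as long as the holder keeps the lock. Repeat that for many holders in a row and the waits become the repeated hangs described in the reply.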
TG2 says:

August 9, 2017 at 9:47 pm

@brucedawson – I read English and I understand the words but following as deep as you did is just not in my world (light programming .. think “hello world” compared to your expertise)…

My question: do you think what you’ve found could have an effect outside of running compiles and heavy loads?

Search the web for mouse lag, and you get tons of hits and complaints: freshly booted PCs, PCs up for hours or days, SSDs for drives, regular HDs too, USB wired, USB dongled, Bluetooth and non …

You think it’s frustrating and you’re a heavy user .. think of what it’s like for the rest of us, not anywhere near the workload you’re putting on a system, and we can’t click on icons because our mouse’s cursor won’t “get there” or goes beyond because it finally catches up with where your mouse was moved to .. etc etc etc etc …

I’m not the basic user.. have anywhere from 8 to 25 windowed apps open, 3 different browsers (FF, Chrome, Vivaldi, sometimes IE as the 4th one), outlook, SecureCRT (putty), various FTP clients, and something like Winamp, or Spotify, or even Itunes .. etc.. so use is more than just your basic user .. but not advanced like you … and the frustration knows no bounds when trying to perform simplistic work, and the mouse just won’t do what it needs to. 😦

Reply
- brucedawson says:
  
  August 10, 2017 at 12:40 am
  
  The problem that I found was specific to processes being destroyed at a very high rate – more than would ever be encountered on a ‘normal’ system. It may be that you are encountering something else hogging the same lock, or it may be something completely different. Unfortunately there is no easy way to determine from afar – there are far too many possible causes. I understand your frustration, and I’m just glad that I am able to investigate the issues that bother me.
  
  Reply
J M says:

August 17, 2017 at 1:51 pm

Technical details you provided are beyond what I know about Windows, but I thought I’d just add on an experience that I suspect is related to the issue you described with process destruction.

I run a Matlab script that uses ActiveX control of MS Word to copy and paste graphics (~100 images, one at a time) into a Word document. The graphics are not rendered to the screen.

In Windows 7 this could run in the background without much disrupting my usage (some webpages would occasionally flicker).

Running the same thing in Windows 10, two changes occurred:

1. Mouse lags on every copy/paste.

2. I had to put a pause into the script between each copy/paste pair, because the previous pasting would lock up the Word application and cause the script to crash.

Reply
- brucedawson says:
  
  August 17, 2017 at 11:28 pm
  
  It might be related, if it started in Windows 10 Anniversary Edition. But, the only way to know for sure is to record an ETW trace and find someone who can analyze it. Or wait and see if the problem goes away when Microsoft fixes my bug.
  
  Reply
  - J M says:
    
    August 18, 2017 at 2:02 pm
    
    Thanks, I will look into it.
    
    Reply
    - Mohammed Imran says:
      
      August 22, 2017 at 12:20 pm
      
      https://aka.ms/av19n4
      
      Log in and upvote please, and also add your findings.
      
      regards.
      
      Reply
H says:

August 20, 2017 at 11:15 pm

A few thoughts….

1. Why would the updating of the mouse/cursor need to share a lock with code that “terminate” a process?
2. As far as I know, the cursor/mouse is handled by “hardware”, and traditionally the mouse has been known to continue to “work” and be updated on the screen when just about all other activity has ceased (crashed/hung/frozen), even during severe fatal errors.
3. Perhaps this has nothing to do with closing processes at all, perhaps it is related to closing threads?
4. When I provoke this hitchy behaviour, I see only problem with input from the mouse, the keyboard seem to not be affected at all with these mini freezes…strange…

Reply
- brucedawson says:
  
  August 20, 2017 at 11:31 pm
  
  1. It appears that the same lock protects GDI objects and message queues. I don’t know why. Ask Microsoft? I’m just reporting what the trace tells me.
  2. Yes, the mouse cursor is typically implemented as a hardware sprite. But in a multi-process environment there can be multiple inputs/programs moving the mouse, so a lock is not surprising. But, I think the lock is not protecting the mouse per-se, but message queues, some of which control mouse movement.
  3. Maybe, but Occam’s razor says that if you’re in a function called NtGdiCloseProcess then maybe it has something to do with closing a process.
  4. Mouse input can easily come in at ~125 Hz, and delays of just 10-20 ms are noticeable. Keyboard input delays have to be slightly longer before they are noticeable. That’s probably the difference.
  
  I sense some skepticism in your comment. That’s fine, but understand that I’m not guessing about what is going on. The ETW traces and my stand-alone repro make most aspects of the behavior completely clear. Guesswork was, generally, not required. And, Microsoft is working on fixing the problem.
  
  Reply
  - Juhani Suhonen says:
    
    August 29, 2017 at 9:31 am
    
    hmm.. I wonder if new advanced mice actually make this issue worse; my logitech is using 1000Hz polling rate.
    
    Reply
  - H says:
    
    September 5, 2017 at 11:18 am
    
    The scepticism is more related to the fact that I have a similar problem, but without starting/terminating any processes at all, therefore my question whether perhaps it is related to threads and not processes….and does NtGdiCloseProcess close threads as well? Also my cursor freezes can be anything from parts of a second to, perhaps, 5 seconds….during which time keyboard and other stuff works just fine.
    
    Reply
    - brucedawson says:
      
      September 5, 2017 at 8:18 pm
      
      NtGdiCloseProcess is, as far as I know, only called when processes are going away. That said, this shared lock may well be acquired when threads are closing as well. But holding on to the lock during thread destruction would be a separate bug. The only way to figure out the problem would be by recording and analyzing an ETW trace.
      
      The fact that keyboard and other stuff works fine suggests that the root cause may be unrelated because the lock that I was seeing contention on is needed to read any input messages, not just mouse messages.
      
      Reply
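The polling rates mentioned in this thread (~125 Hz for a typical mouse, 1000 Hz for the Logitech one) translate into report intervals and into the number of reports that arrive while the lock is unavailable. The sketch below does that arithmetic only. The function names are hypothetical, and it makes no claim about whether a higher polling rate makes the contention itself better or worse.

```python
def report_interval_ms(rate_hz: float) -> float:
    """Milliseconds between consecutive mouse reports at the given rate."""
    return 1000.0 / rate_hz


def reports_during_stall(rate_hz: float, stall_ms: float) -> int:
    """Whole number of mouse reports generated while input is stalled."""
    return int(stall_ms * rate_hz / 1000.0)


if __name__ == "__main__":
    for rate in (125, 1000):
        print(
            f"{rate} Hz: one report every {report_interval_ms(rate):g} ms, "
            f"{reports_during_stall(rate, 20)} reports during a 20 ms stall"
        )
```

At 125 Hz a report arrives every 8 ms, so a stall of 10-20 ms already spans one or two reports, which is consistent with such short delays being visible as hitches.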
James Clarke [msft] says:

September 1, 2017 at 3:33 pm

Would be awesome to get some feedback on our latest RS3 insider build 16281 to see if it’s improved the situation with this issue: https://blogs.windows.com/windowsexperience/2017/09/01/announcing-windows-10-insider-preview-build-16281-pc/#6vZ0PrVezurp10is.97

Reply
- jeffstokes says:
  
  September 1, 2017 at 4:34 pm
  
  If we didn’t have to beta test all of Windows just to get this fix I’d test for you and confirm.
  
  Pity Microsoft can’t cut an LDR/QFE to fix a bug anymore and instead forces a 4-6GB install to fix one issue (I care about anyway).
  
  Kudos to you guys though, James Clarke, for paying attention here, that’s nice at least. I guess I can expect to get this fix like, next year, in CB.
  
  Reply
- brucedawson says:
  
  September 1, 2017 at 8:42 pm
  
  I have an Insider Build machine but it’s not on fast-ring. I will test the fix when it ships to regular insider builds – feel free to ping me here or on twitter when that happens.
  
  Reply
- brucedawson says:
  
  September 4, 2017 at 10:35 pm
  
  I just looked at a trace of ProcessCreateTests.exe running on 16281 and the graph of CPU usage during process destruction looks unchanged. For the central portion I see that process destruction is serialized, apparently still blocked in win32kbase.sys!NtGdiCloseProcess. This continues to confuse me since these processes have zero GDI objects.
  
  In short, I don’t see any signs of a fix to the bug. Am I missing something? Grab an ETW trace and graph CPU Usage by process name to see what I’m seeing.
  
  Reply
  - Juhani Suhonen says:
    
    September 5, 2017 at 2:44 am
    
    I will put on my debugging glasses and dig deep into this later today. Based on your analysis it seems that there are two related issues: A) win32kbase.sys!NtGdiCloseProcess, which is serialized for some reason (bad design?), and B) some other issue, which causes a significant delay in executing NtGdiCloseProcess after a certain uptime.
    
    I inferred the existence of B from your observation that the machine becomes slow only after some time in use. Therefore, my previous posting (and the change in ProcessCreatetests.exe results) may be due not to improved code but to a fresh boot.
    
    Reply
Juhani Suhonen says:

September 2, 2017 at 6:06 am

I can confirm that build 16281 seems to (at least partially) resolve the issue. I don’t have enough time (or knowledge :D) to debug what changed, but the results of running ProcessCreatetests.exe show the following:
—
Windows 10 build 15063.540 (mainline)
Testing with 1000 descendant processes.
Process creation took 3.158 s (3.158 ms per process).
Lock blocked for 0.012 s.
Average block time was 0.000 s.

Process termination starts now.
Process destruction took 1.783 s (1.783 ms per process).
Lock blocked for 0.854 s.
Average block time was 0.078 s.
—
Windows 10 build 16281 (insider fast)
Testing with 1000 descendant processes.
Process creation took 0.862 s (0.862 ms per process).
Lock blocked for 0.000 s.
Average block time was 0.000 s.

Process termination starts now.
Process destruction took 0.946 s (0.946 ms per process).
Lock blocked for 0.084 s.
Average block time was 0.009 s.
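
A before/after comparison like the one above can be scripted so that runs are compared the same way each time. The following is only a minimal sketch: the function names are hypothetical, and the parsing assumes the exact output format pasted in this comment. It reports what fraction of the destruction phase was spent blocked on the lock.

```python
import re

# Patterns for the two destruction-phase lines printed by the test program.
DESTROY_RE = re.compile(r"Process destruction took ([0-9.]+) s")
LOCK_RE = re.compile(r"Lock blocked for ([0-9.]+) s\.")

# Destruction-phase output copied from the two runs above.
BEFORE = """Process termination starts now.
Process destruction took 1.783 s (1.783 ms per process).
Lock blocked for 0.854 s.
Average block time was 0.078 s."""

AFTER = """Process termination starts now.
Process destruction took 0.946 s (0.946 ms per process).
Lock blocked for 0.084 s.
Average block time was 0.009 s."""


def parse_run(text):
    """Return the destruction time and lock-blocked time from one run."""
    # Only look at text after the marker, so creation-phase lines are ignored.
    _, marker, destruction = text.partition("Process termination starts now.")
    took = DESTROY_RE.search(destruction)
    blocked = LOCK_RE.search(destruction)
    if not marker or took is None or blocked is None:
        raise ValueError("no destruction-phase results found")
    return {
        "destruction_s": float(took.group(1)),
        "blocked_s": float(blocked.group(1)),
    }


def blocked_fraction(run):
    """Fraction of the destruction phase spent blocked on the lock."""
    return run["blocked_s"] / run["destruction_s"]


for label, text in (("15063.540", BEFORE), ("16281", AFTER)):
    run = parse_run(text)
    print(f"build {label}: blocked {run['blocked_s']:.3f} s "
          f"({blocked_fraction(run):.0%} of destruction time)")
```

The numbers alone do not say why the two runs differ, so the script is a bookkeeping aid rather than a verdict on whether anything was fixed.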

Reply
- brucedawson says:
  
  September 2, 2017 at 10:28 am
  
  The ProcessCreateTests behavior also depends on how long a system has been running. So the first run after an install is always better, which can lead to false confidence in a fix.
  
  The only way to tell for sure is to disable Defender’s real-time checking and then record an ETW trace of ProcessCreateTests, and graph the ProcessCreateTests CPU Usage (Precise) – by process name. Share a trace and I can check.
  
  Honestly, lock blocked for 0.854 s does not look like it was fixed.
  
  Reply
  - Severin Pappadeux says:
    
    September 2, 2017 at 3:11 pm
    
    Bruce, it went from 0.854 s to 0.084 s. Looks like pretty good improvement to me
    
    Reply
    - brucedawson says:
      
      September 2, 2017 at 3:38 pm
      
      Ah – I read too quickly and didn’t see that it was before/after. You still have to be careful however because if the “before” run was after the system had been up for a while and the “after” run was after a reboot then it’s not a valid comparison. I should print up-time as part of the test. And, the lock contention time should really be *zero*.
      
      Anyway, I’ll get an ETW trace at some point and report back.
      
      Reply
      - brucedawson says:
        
        September 2, 2017 at 4:09 pm
        
        Too late for you, but new version now prints up time:
        https://github.com/randomascii/blogstuff/tree/master/ProcessCreateTests
        
        Reply
      - Juhani Suhonen says:
        
        September 4, 2017 at 12:06 pm
        
        I have ETW trace now with build 16281, do you still need it for further investigation? I’d rather not post it to public forum 😉
        
        below is the console output from ProcessCreatetests.exe with freshly booted OS.
        —
        Testing with 1000 descendant processes.
        Process creation took 0.679 s (0.679 ms per process).
        Lock blocked for 0.000 s.
        Average block time was 0.000 s.
        
        Process termination starts now.
        Process destruction took 0.606 s (0.606 ms per process).
        Lock blocked for 0.000 s.
        Average block time was 0.000 s.
        
        Elapsed uptime is 0.01 days.
        Awake uptime is 0.01 days.
        
        Reply
mohammed imran says:

September 3, 2017 at 9:22 am

@brucedawson how long should a system have been online (its uptime)?

Reply
- brucedawson says:
  
  September 3, 2017 at 5:00 pm
  
  I don’t understand the question. The process destruction lock contention gets worse with a system that has been used for a while, but how much worse depends on how heavily it is used and on ??? The only way to do an even comparison is to compare two freshly rebooted systems.
  
  Reply
  - mohammed imran says:
    
    September 4, 2017 at 1:01 pm
    
    I wanted to know just that exactly.
    
    Reply
Christopher Katko says:

September 5, 2017 at 2:13 pm

I just ran the binary and my mouse freezes briefly in Windows 7 64-bit with an AMD FX-8370 (8 cores). Are you sure the bug doesn’t affect Windows 7?

— SNIP

Testing with 1000 descendant processes.
Process creation took 0.575 s (0.575 ms per process).
Lock blocked for 0.000 s.
Average block time was 0.000 s.

Process termination starts now.
Process destruction took 1.139 s (1.139 ms per process).
Lock blocked for 0.615 s.
Average block time was 0.088 s.

Elapsed uptime is 12.04 days.
Awake uptime is 12.04 days.
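
For readers wondering what a "Lock blocked" figure represents, here is a minimal, self-contained sketch of that kind of measurement. It is not how ProcessCreateTests.exe is implemented: a Python `threading.Lock` stands in for the contended kernel lock, and the names `hog`, `probe`, and `measure` as well as the 10 ms threshold are assumptions made for illustration. One thread holds the lock for long stretches while a probe thread tries to take it briefly and records how long each attempt waited.

```python
import threading
import time

lock = threading.Lock()  # stand-in for the contended lock (assumption)


def hog(hold_s, rounds):
    """Hold the lock for long stretches, like an expensive operation would."""
    for _ in range(rounds):
        with lock:
            time.sleep(hold_s)
        time.sleep(0.001)  # brief gap so a waiting thread can get in


def probe(stop, waits):
    """Take the lock briefly in a loop, recording how long each acquire took."""
    while not stop.is_set():
        start = time.perf_counter()
        with lock:
            pass
        waits.append(time.perf_counter() - start)
        time.sleep(0.001)


def measure(hold_s=0.05, rounds=5, threshold_s=0.01):
    """Return (total blocked time, average block time) seen by the probe."""
    stop = threading.Event()
    waits = []
    prober = threading.Thread(target=probe, args=(stop, waits))
    hogger = threading.Thread(target=hog, args=(hold_s, rounds))
    prober.start()
    hogger.start()
    hogger.join()
    stop.set()
    prober.join()
    # Only waits above the threshold count as "blocked".
    blocked = [w for w in waits if w >= threshold_s]
    total = sum(blocked)
    average = total / len(blocked) if blocked else 0.0
    return total, average


total, average = measure()
print(f"Lock blocked for {total:.3f} s.")
print(f"Average block time was {average:.3f} s.")
```

The probe does almost no work of its own, so any time it reports is time spent waiting for another thread to release the lock, which is the same shape of problem as a UI thread waiting while other work holds a shared lock.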

Reply
- brucedawson says:
  
  September 5, 2017 at 2:29 pm
  
  The change which increased the cost of deleting GDI objects was added in Windows 10 Anniversary Edition. However, your results do seem to show significant lock contention during process destruction. My Windows 7 testing was restricted to a four-core desktop, and more cores make the problem easier to expose, but I think that the problem you are hitting is a different flavor of the same issue. Share a trace and I can take a quick look. Maybe some third-party software is hooking in to process destruction, or ???
  
  Reply
  - Christopher Katko says:
    
    September 6, 2017 at 12:14 pm
    
    I’ve never run an ETW trace before. But I followed your guide best I could and hit trace, ran the program, and stopped the trace. I also added the program exe name into the settings.
    
    Here’s a (temporary) public link to the 7zip of the trace:
    
    https://drive.google.com/open?id=0B8Cyek_k55TiVlBwOC0wWkcyQTA
    
    Let me know if I need a different trace or something. Thank you.
    
    Reply
    - brucedawson says:
      
      September 6, 2017 at 2:13 pm
      
      First off, your system is incredibly busy. It’s amazing how many processes are all simultaneously trying to consume lots of CPU time. This complicates the analysis because ProcessCreateTests.exe is fighting for CPU time with Steam, Chrome, and lots of other things. Maybe one of these processes is somehow making GDI object destruction more expensive on your Windows 7 machine, but I don’t know. For some reason HmgNextOwned is very expensive on your machine, while holding the lock, whereas on my Windows 7 machine it is not.
      
      Unfortunately this mystery will have to be left for Microsoft to investigate, but they won’t.
      
      Reply
      - Christopher Katko says:
        
        September 6, 2017 at 5:29 pm
        
        Yeah… I thought about that. (I actually shut off VMWare which was running a full Windows 10 platform with a SQL server. =D) But I didn’t have a chance to save all my work and shut everything else off at the time. It’s strange that Steam is an issue since I wasn’t playing any games or updating them at the time…
        
        I appreciate you looking at it! I can nuke everything except raw windows (maybe even safe-mode…), run the test again and post a cleaner trace if you want.
        
        This is one thing that really frustrates me with Windows. The closed-source nature means you’re “on the outside looking in”. Whereas I can–and have on many occasions–found a strange error message in Linux and then ended up tracking down the source code for the answer. And it wouldn’t be so bad with Windows, but as we all know, getting Microsoft involved in fixing their own software (even with a full core dump/trace/etc proving it) is an exercise in patience and learning to speak Hindi. I have multiple clients that have run into issues that are “Microsoft problems” and after paying for support it still goes nowhere, one frustrating conference call after another frustrating screen share.
        
        Reply
        
        brucedawson says:
        
        September 6, 2017 at 7:52 pm
        
        It was odd how much stuff was running and how busily. I have Chrome setup to restore the previous set of pages so I can easily shut it down when recording traces. As for Steam… I no longer have symbols so I can’t guess what they were doing. Updating some game perhaps?
        
        I’m torn between curiosity (what triggers this odd Windows 7 behavior!) and apathy (not my machine, not my OS, nothing I can do) and I’m afraid apathy wins. I do appreciate your sharing your trace.
        
        Reply
Mohammed Imran says:

September 6, 2017 at 10:49 am

Dear Mr. Bruce, I received a reply from MS asking for validation of this bug report, so can all concerned parties please reply and add feedback if the bug still exists?

https://aka.ms/av19n4

I think they applied a bandage to the issue rather than fixing it.

Reply
- brucedawson says:
  
  September 6, 2017 at 1:53 pm
  
  I agree that they have not fixed the issue. I have looked at two “post-fix” ETW traces of ProcessCreateTests.exe running and I see no sign of improvement. I believe that a real fix would be a dramatic improvement.
  
  Reply
  - jeffstokes says:
    
    September 6, 2017 at 8:11 pm
    
    THIS right here. I’m at the point where I hit these frustrations and I feel like I should just stand up ubuntu or something instead.
    
    Reply
Carlos Osorio says:

September 8, 2017 at 5:03 am

I have the same issue, installed 16281 build and the problem persists. The only way that kept my system running and my sanity was to disable the graphics driver. Obviously my second display is a no show but the problem so far has disappeared.

Reply
- jeffstokes says:
  
  September 8, 2017 at 5:40 am
  
  I’ve found issues with GPU drivers in Win10. https://illuminati.services/2017/08/20/dude-wheres-my-ram-aka-shellexperiencehost-steals-my-stuff/. I think the rendering engine for 10 has some flaws personally that haven’t really been worked out, but that’s just like, my opinion, man.
  
  Reply
mohammedimran says:

October 21, 2017 at 9:16 am

now that the final build is out, can you please test it again and confirm the issue is resolved or not? Mr. Bruce :).

Reply
- brucedawson says:
  
  October 23, 2017 at 10:55 am
  
  The bug is not fixed. The “fix” that they announced for an insider build in September was ineffective and that is all that made it to the Fall Creators Update. I am continuing to work with them to make sure that an actual fix makes it. This Fall Creators Update screenshot shows CPU usage dropping to one core due to serialization:
  
  "Is the process destruction bug fixed in Fall Creators Update?"
  Nope. CPU usage during destruction drops to 1 core: https://t.co/BRA3MdtsH5 pic.twitter.com/vZMP0oHUbj
  
  — Bruce Dawson (@BruceDawson0xB) October 23, 2017
  
  Reply
  - Mohammed Imran says:
    
    October 27, 2017 at 6:56 am
    
    Can you please amend my report with your data? Please 🙂
    https://aka.ms/Qp6nkw
    I am holding off on multi-core processors till this issue is fixed.
    Please let me know when you have amended it, yeah.
    Thanks, as your continued assistance is of much importance to us all.
    
    Reply
    - brucedawson says:
      
      October 28, 2017 at 11:09 am
      
      Done.
      
      Reply
    - brucedawson says:
      
      October 30, 2017 at 11:34 am
      
      I just made a video explaining how to see whether your install of Windows 10 has this bug.
      
      I just made a video showing how to tell if your version of Windows has the process-destruction-lock-contention bug: https://t.co/olH6urD6Mn
      
      — Bruce Dawson (@BruceDawson0xB) October 29, 2017
      
      That should help with telling when this bug is fixed.
      
      Reply
      - mohammed imran says:
        
        November 23, 2017 at 11:55 pm
        
        What is the latest update? Has the fix been released for Fall Creators Update as well?
        
        Reply
        
        brucedawson says:
        
        November 24, 2017 at 9:22 am
        
        Fall Creators Update fix should be coming soon, I assume
        
        Reply
Mohammed Imran says:

December 1, 2017 at 3:24 am

Tested on latest patch
Microsoft Windows [Version 10.0.16299.98]
(c) 2017 Microsoft Corporation. All rights reserved.

Testing with 1000 descendant processes.
Process creation took 3.292 s (3.292 ms per process).
Lock blocked for 0.002 s.
Average block time was 0.000 s.

Process termination starts now.
Process destruction took 1.384 s (1.384 ms per process).
Lock blocked for 0.001 s.
Average block time was 0.000 s.

Elapsed uptime is 0.03 days.
Awake uptime is 0.03 days.

Reply
- brucedawson says:
  
  December 1, 2017 at 9:01 am
  
  Unfortunately while ProcessCreateTests.exe is helpful for investigating this bug its output alone is not enough to determine whether the bug is fixed. One thing to watch for is whether the mouse can move smoothly and swiftly while the test program runs. But the best thing to do is to record and analyze a trace, as shown here: https://www.youtube.com/watch?v=cbg5O2Kbb9A
  
  From eyeballing your results all I can surmise is that you have a slow computer – my laptop runs the process creation phase more than eight times faster. It may be that you don’t have enough cores for this bug to be particularly relevant.
  
  Reply
  - mohammed imran says:
    
    December 2, 2017 at 7:48 am
    
    Question: where do I get the UI for ETW?
    Indeed, compared to yours mine is a turtle. See the specs:
    Intel® Core™ i5-4200M CPU @ 2.50GHz × 4 🙂
    
    I am waiting for the fix to land in RS3 branch.
    
    Regards and thanks
    
    Reply
    - brucedawson says:
      
      December 2, 2017 at 8:46 am
      
      http://lmgtfy.com/?q=uiforetw
      
      I just tested with the .98 update – not fixed. https://twitter.com/BruceDawson0xB/status/936999680990310400
      
      Reply
      - Mohammed Imran says:
        
        December 15, 2017 at 3:25 am
        
        thanks for the UI 🙂
        
        Did you check if the fix has landed on build 16299.125 ??
        
        Reply
        
        brucedawson says:
        
        December 15, 2017 at 8:35 pm
        
        It is not yet fixed, but I think they have finally decided to merge the fix to RS3. I guess it was made in the RS4 branch and only got merged to RS1 and RS2? So, January maybe?
        
        Reply
      - Mohammed Imran says:
        
        December 22, 2017 at 8:36 am
        
        off-topic question? how can i totally disable those annoying notifications? pop outs? totally , i don’t want to see them, they pop out and won’t leave, at least google chrome should have a dismiss all button.
        
        Reply
p_lider says:

January 6, 2019 at 4:02 am

Maybe you are facing the same CPU scheduler issue that is seen on Threadrippers? It was recently confirmed that this is a bug in the Windows CPU scheduler. You can try the CorePrio software with NUMA disassociation enabled to help with performance. Look here for more details: https://www.youtube.com/watch?v=M2LOMTpCtLA

Reply
- brucedawson says:
  
  January 6, 2019 at 10:19 am
  
  Nope. This was a lock contention bug. An avoidable one. That has been fixed. See this article for a follow-up that discusses how the issue can still be triggered:
  
  A Not-Called Function Can Cause a 5X Slowdown
  
  Reply
Juhani Suhonen says:

January 6, 2019 at 7:40 am

@p_lider: umm.. no. As Bruce explained, the bug was present in certain Windows 10 versions, and although a software hack (in theory) could circumvent the bug, CorePrio does nothing that would help to resolve this particular bug.

Reply
Andrei says:

February 12, 2019 at 7:19 am

I have RS5 installed (1809 build 17763.253) and it seems that the bug returned.

Reply
- brucedawson says:
  
  February 12, 2019 at 7:54 am
  
  My home laptop is running 1809 and I haven’t seen any signs of this – no hangs when running ProcessCreateTests. My multi-socket workstations aren’t on 1809 yet so maybe I’ll see it when I upgrade them, but I don’t think so. Maybe you’re hitting this variant:
  
  A Not-Called Function Can Cause a 5X Slowdown
  
  Reply
Andrei says:

February 13, 2019 at 2:53 am

Procmon does indeed show gdi32.dll being loaded many times by cl.exe, git.exe and even msbuild.exe, but why remains a mystery. However, the count is not that big: 2000 times in 20 minutes.
I will probably need to investigate this issue myself in WPA. It’s annoying to not be able to do anything while running the build scripts.

Reply
- brucedawson says:
  
  February 15, 2019 at 7:10 am
  
  That suggests that some sort of extension DLL is installed on your system that has a dependency on gdi32.dll – a hook of some sort, perhaps. cl.exe doesn’t normally pull in gdi32.dll. Look at the other DLLs loaded in to cl.exe and see if any of them look suspicious and/or have a dependency on gdi32.dll directly or indirectly (shell32.dll, etc.) – good luck.
  
  Reply
265 993 303 says:

November 1, 2023 at 12:51 pm

Could the mouse cursor sizes and the hardware effects associated with them affect the mouse motion performance?
Windows up to Windows 2000/ME: 32×32 cursor at all DPI
Windows XP/Vista: 32×32 cursor up to 149dpi, 64×64 cursor for 150dpi and up
Windows 7/8/8.1: 32×32 cursor up to 143dpi, 48×48 cursor for 144dpi—191dpi, 64×64 cursor for 192dpi and up
Even the 64×64 cursor may have mouse pointer shadow, and the hardware effect associated with 64×64 mouse pointer shadow in Windows XP and up might be problematic for graphical performance.
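
The Windows 7/8/8.1 thresholds listed above amount to a small lookup, sketched below. It takes the numbers in this comment at face value (they are not checked against Microsoft documentation), the function names are hypothetical, and the 4 bytes per pixel used for the buffer size is an assumption (a 32-bit color cursor image).

```python
def cursor_side_win7(dpi):
    """Cursor edge length in pixels for Windows 7/8/8.1, per the list above."""
    if dpi < 144:
        return 32
    if dpi < 192:
        return 48
    return 64


def cursor_bytes(side, bytes_per_pixel=4):
    """Size of one square cursor image, assuming 4 bytes per pixel."""
    return side * side * bytes_per_pixel


for dpi in (96, 144, 192):
    side = cursor_side_win7(dpi)
    print(f"{dpi} dpi: {side}x{side} cursor, {cursor_bytes(side)} bytes")
# 96 dpi: 32x32 cursor, 4096 bytes
# 144 dpi: 48x48 cursor, 9216 bytes
# 192 dpi: 64x64 cursor, 16384 bytes
```

Even the largest of these standard sizes is a 16 KiB image under that assumption.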

Reply
- brucedawson says:
  
  November 2, 2023 at 10:31 am
  
  The CPU/GPU load associated with a larger cursor should be irrelevant, I think. A 64×64 cursor would be about 64×64×4 bytes = 16 KiB of memory. Reading and writing that a half-dozen times, 1,000 times per second, would be roughly 94 MiB of memory bandwidth per second, which barely registers.
  
  The problem in this case was some other (very expensive) operations that required the same lock as updating the mouse pointer, so the mouse-pointer updates were blocked for user-visible periods of time waiting for the lock to be available.
  
  Reply
  - 265 993 303 says:
    
    November 3, 2023 at 5:58 am
    
    The sizes I mentioned before are the SM_CXCURSOR and SM_CYCURSOR sizes but with SetSystemCursor it is possible to set much bigger cursors, in Windows 7 I was able to set 16384×16384 monochrome cursor and still have mouse pointer shadow, I was also able to set 32768×32768 cursor although it lost mouse pointer shadow. How would that affect the mouse motion performance?
    
    Reply
    - Jeff Stokes says:
      
      November 3, 2023 at 6:09 am
      
      Apologies for hopping in here,
      
      I don’t know the impact of a change such as manipulating the cursor size, but going back to your general concern about shadow and rendering: as long as Nested Page Table entries/Shadow Page Table entries are available (which all modern CPUs have, I believe) this shouldn’t be an issue, except maybe in a virtual GPU scenario.
      
      I do know when we made the Win7 guidance on creating a VDI image we disabled all the shadows/etc that Aero brought to bear by default.
      
      hth and is relevant.
      
      Reply
      - brucedawson says:
        
        November 3, 2023 at 9:29 am
        
        A 16,384×16,384 cursor would be 1 GiB (assuming 4 bytes per pixel). Updating that at 1,000 fps would not be possible. Updating that at a reasonable rate would be possible on most machines but extremely taxing.
        
        I’m not sure it’s really relevant, however. Yep, a huge cursor would stress the system. But this blog post was reporting on a situation where any arbitrarily small cursor would stutter.
        
        Reply