24-core CPU and I can’t move my mouse

This story begins, as they so often do, when I noticed that my machine was behaving poorly. My Windows 10 work machine has 24 cores (48 hyper-threads) and they were 50% idle. It has 64 GB of RAM and that was less than half used. It has a fast SSD that was mostly idle. And yet, as I moved the mouse around it kept hitching – sometimes locking up for seconds at a time.

So I did what I always do – I grabbed an ETW trace and analyzed it. The result was the discovery of a serious process-destruction performance bug in Windows 10.

The ETW trace showed UI hangs in multiple programs. I decided to investigate a 1.125 s hang in Task Manager:

UI Delays graph in WPA

In the image below you can see CPU usage for the system during the hang, grouped by process name – notice that total CPU usage rarely goes above 50%:

CPU Usage grouped by process name

The CPU Usage (Precise) table showed that Task Manager’s UI thread was repeatedly blocked on calls to functions like SendMessageW, apparently waiting on a kernel critical region (the kernel-mode version of a critical section), deep in the call stack in win32kbase.sys!EnterCrit (not shown):

CPU Usage (Precise) showing where TaskMgr.exe was blocked

I manually followed the wait chain through a half-dozen processes to see who was hogging the lock. My notes from the analysis look something like this:

Taskmgr.exe (72392) hung for 1.125 s (MsgCheckDelay) on thread 69,196. Longest delay was 115.6 ms on win32kbase.sys!EnterCrit, readied by conhost.exe (16264), thread 2560 at 3.273101862. conhost.exe (16264), 2560 was readied at 3.273077782 after waiting 115,640.966 μs, by mstsc.exe (79392), 71272. mstsc.exe was readied (same time, same delay) by TabTip.exe (8284), 8348, which was readied by UIforETW.exe (78120), 79584, which was readied by conhost.exe (16264), 58696, which was readied by gomacc.exe (93668), 59948, which was readied by gomacc.exe (95164), 76844.

I had to keep going because most of the processes were releasing the lock after holding it for just a few microseconds. But eventually I found several processes (the gomacc.exe processes) that looked like they were holding the lock for a few hundred microseconds. Or, at least, they were readied by somebody holding the lock and then a few hundred microseconds later they readied somebody else by releasing the lock. These processes were all releasing the lock from within NtGdiCloseProcess.
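The manual wait-chain walk described above can be sketched in a few lines of Python. This is only an illustration: the `READIED_BY` table is a hand transcription of the notes above (process name and thread ID pairs), and `follow_wait_chain` is a hypothetical helper, not something WPA provides.

```python
# Hand-transcribed "X was readied by Y" edges from the analysis notes.
# Keys and values are (process name, thread ID) pairs.
READIED_BY = {
    ("Taskmgr.exe", 69196): ("conhost.exe", 2560),
    ("conhost.exe", 2560): ("mstsc.exe", 71272),
    ("mstsc.exe", 71272): ("TabTip.exe", 8348),
    ("TabTip.exe", 8348): ("UIforETW.exe", 79584),
    ("UIforETW.exe", 79584): ("conhost.exe", 58696),
    ("conhost.exe", 58696): ("gomacc.exe", 59948),
    ("gomacc.exe", 59948): ("gomacc.exe", 76844),
}


def follow_wait_chain(start, readied_by):
    """Follow readied-by edges from `start` until no further edge is known."""
    chain = [start]
    seen = {start}
    while chain[-1] in readied_by:
        readier = readied_by[chain[-1]]
        if readier in seen:  # guard against cycles in hand-entered data
            break
        chain.append(readier)
        seen.add(readier)
    return chain


for process, thread in follow_wait_chain(("Taskmgr.exe", 69196), READIED_BY):
    print(f"{process} thread {thread}")
```

The last entry printed is the thread at the end of the recorded chain, which is where the gomacc.exe processes show up.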

I was tired of manually following these wait chains so I decided to see if the same readying call stack was showing up repeatedly. I did that by dragging the Ready Thread Stack column to the left and searching the column for NtGdiCloseProcess. I then used WPA’s View Callers -> By Function option to show me all of the Ready Thread Stacks that went through that function – in this view the stack roots are at the bottom:

CPU Usage (Precise) showing all readying by NtGdiCloseProcess

There were 5,768 context switches where NtGdiCloseProcess was on the Ready Thread Stack, each one representing a time when the critical region was released. The threads readied on these call stacks had been waiting a combined total of 63.3 seconds – not bad for a 1.125 second period! And, if each of these readying events happened after the thread had held the lock for just 200 microseconds then the 5,768 readying events would be enough to account for the 1.125 second hang.
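The back-of-the-envelope arithmetic here is easy to check. A minimal sketch in Python, where the 200 microsecond hold time is the assumption from the paragraph above rather than a measured value:

```python
READY_EVENTS = 5768           # context switches readied via NtGdiCloseProcess
ASSUMED_HOLD_US = 200         # assumed lock hold time per event, in microseconds
HANG_SECONDS = 1.125          # length of the Task Manager hang
COMBINED_WAIT_SECONDS = 63.3  # total time the readied threads had spent waiting

# Total time the lock would be held if every event held it for 200 us.
held_seconds = READY_EVENTS * ASSUMED_HOLD_US / 1_000_000

# Combined waiting time per second of wall time: roughly how many threads
# were queued on the lock at once (some waits may have started earlier).
avg_waiters = COMBINED_WAIT_SECONDS / HANG_SECONDS

print(f"lock held for about {held_seconds:.3f} s of a {HANG_SECONDS} s hang")
print(f"roughly {avg_waiters:.0f} threads waiting at any moment")
```

Under that assumption the hold time slightly exceeds the length of the hang, which is consistent with the lock being held almost continuously.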

I’m not familiar with this part of Windows but the combination of PspExitThread and NtGdiCloseProcess made it clear that this behavior was happening during process exit.

This was happening during a build of Chrome, and a build of Chrome creates a lot of processes. I was using our distributed build system which means that these processes were being created – and destroyed – quite quickly.

The next step was to find out how much time was being spent inside of NtGdiCloseProcess. So I moved to the CPU Usage (Sampled) table in WPA and got a butterfly graph, this time of callees of NtGdiCloseProcess. You can see from the screen shot below that over a 1.125 s period there was, across the entire system, about 1085 ms of time spent inside of NtGdiCloseProcess, representing 96% of the wall time:

CPU Usage (Sampled) data showing how much time was spent inside of NtGdiCloseProcess

Anytime you have a lock that is held more than 95% of the time by one function you are in a very bad place – especially if that same lock must be acquired in order to call GetMessage or update the mouse position. In order to experiment better I wrote a test program that creates 1,000 processes as quickly as possible, waits half a second, and then tells all of the processes to exit simultaneously. The CPU usage of this test program on my four-core eight-thread home laptop, grouped by process name, can be seen below:
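The shape of such a test (create many processes, pause, then release them all at once) can be sketched as follows. This is not the ProcessCreateTests code: it is a hypothetical Python stand-in in which each child is a Python interpreter blocked on stdin, and closing the pipe is the "exit now" signal. Python children are far heavier than a purpose-built test process, so the timings are not comparable to the graphs in this post and the sketch may not reproduce the lock contention at all.

```python
import subprocess
import sys
import time

# Child program: block until the parent closes our stdin, then exit.
CHILD_CODE = "import sys; sys.stdin.read()"


def run_create_destroy_test(num_processes, settle_seconds=0.5):
    """Spawn children, pause, then tell them all to exit at once."""
    create_start = time.perf_counter()
    children = [
        subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                         stdin=subprocess.PIPE)
        for _ in range(num_processes)
    ]
    create_seconds = time.perf_counter() - create_start

    # Let the system settle so creation and destruction do not overlap.
    time.sleep(settle_seconds)

    destroy_start = time.perf_counter()
    for child in children:
        child.stdin.close()  # EOF on stdin is the exit signal
    exit_codes = [child.wait() for child in children]
    destroy_seconds = time.perf_counter() - destroy_start

    return {
        "create_seconds": create_seconds,
        "destroy_seconds": destroy_seconds,
        "exit_codes": exit_codes,
    }
```

Calling `run_create_destroy_test(1000)` mirrors the process count used in the post. On some systems the per-process limit on open file handles may need raising before that many pipes can be held open at once.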

Left block is process creation, devil horns to the right are process destruction

Well, what do you know. Process creation is CPU bound, as it should be. Process shutdown, however, is CPU bound at the beginning and the end, but there is a long period in the middle (about a second) where it is serialized – using just one of the eight hyperthreads on the system, as 1,000 processes fight over a single lock inside of NtGdiCloseProcess. This is a serious problem. This period represents a time when programs will hang and mouse movements will hitch – and sometimes this serialized period is several seconds longer.

I’d noticed that this problem seems to be worse when my computer has been running for a while so I rebooted and ran the test as soon as my laptop had settled down. The process-shutdown serialization is indeed less severe, but the issue is still clearly present on the freshly rebooted machine:

Devil horns are narrower after rebooting, but process destruction is still serialized for a while

I then ran the same test on an old Windows 7 machine (Intel Core 2 Q8200, circa 2008) – you can see the results here:

Windows 7 CPU usage shows no serialization on process destruction

Process creation is slower, as you would expect from a much slower CPU, but process destruction is as fast as my new laptop at its best, and is fully parallelized.

This tells us that this serialization on process shutdown is a new issue, introduced sometime between Windows 7 and Windows 10.

48 hyper-threads, 47 of them idle

Amdahl’s law says that if you throw enough cores at your problem then the parts that cannot be parallelized will eventually dominate execution. When my work machine has been heavily used for a few days this serialization issue gets bad enough that process-shutdown becomes a significant part of my distributed build times – and more cores can’t help with that. In order to get maximum build speeds (and if I want to move my mouse while doing builds) I need to reboot my machine every few days. Even then my build speeds are not as fast as they should be, and Windows 7 starts to look tempting.
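Amdahl’s law can be written down directly. In the sketch below, `amdahl_speedup` is a hypothetical helper and the 5% serial fraction is an invented example value, not a measurement from my builds:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Best-case speedup when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed example: 5% of the work is serialized (e.g. behind a global lock).
for workers in (4, 8, 24, 48):
    print(workers, round(amdahl_speedup(0.95, workers), 1))
```

With any nonzero serial fraction the speedup is capped at one over that fraction no matter how many cores are added, so under this assumed 5% it can never reach 20x.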

In fact, adding more cores to my workstation makes the UI less responsive. That is because Chrome’s build system is smart enough to spawn more processes if you have more cores, which means that there are more terminating processes fighting over the global lock. So it’s not just “24-core CPU and I can’t move my mouse” it’s “24-core CPU and therefore I can’t move my mouse.”

This problem has been reported to Microsoft and they are investigating.

Just one more thing…

This is what my process create test program looks like when run on my 24-core work machine:

Process destruction serialization is worse on my 24-core workstation

See that tiny horizontal red line on the bottom right? That’s Amdahl’s law visualized, as 98% of my machine’s CPU resources sit idle for almost two seconds, while process destruction hogs the lock that I need in order to move the mouse.

Resources

The ProcessCreateTests code is available here.

Discussions of this post can be found at:

  1. https://news.ycombinator.com/item?id=14733829
  2. https://www.reddit.com/r/programming/comments/6mcruo
  3. https://m.habrahabr.ru/post/332816/
  4. https://www.meneame.net/story/tengo-cpu-24-nucleos-no-puedo-mover-raton-eng
  5. https://tech.slashdot.org/story/17/07/11/2055251

If you liked this post you might like these other investigative reporting posts:

You Got Your Web Browser in my Compiler!

Windows Slowdown, Investigated and Identified (and the follow-up)

PowerPoint Poor Performance Problem

Self Inflicted Denial of Service in Visual Studio Search

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.

155 Responses to 24-core CPU and I can’t move my mouse

  1. jaimemmoreno says:

    I’ve also seen this on my hexacore PC running Win 10. It’s really annoying when it happens. Never saw the hitching with the mouse or keyboard input when running Windows 7 or Windows 8 on same machine so think the problem was introduced between Windows 8 and Win 10.

  2. Pingback: 4 – 24-core CPU and I can’t move my mouse

  3. yuhong says:

    I can’t imagine that moving the Win32k stuff back to CSRSS would help much in this case, right? Though it is still a good thing especially for terminal servers where hopefully one CSRSS process crashing just terminate the session.

  4. James says:

    I’m almost crying in tears. This is exactly what happens with my recent 8 core 6900k on Windows 10 and couldn’t find what could be the reason. Specifically when that behaviour is not happening on Linux!

  5. MBH says:

    If you have a Skylake or Kaby Lake CPU, there’s a bug in their hyperthreading code, so disable hyperthreading and see how it goes.
    That bug caused a lot of segfaults on Linux and data corruption in registers.

    On Linux, you can circumvent this by using the intel-microcode package which modifies the CPU’s microcode on every boot and fixes this issue.

    • brucedawson says:

      I have heard of that bug but it is not related to this issue. The lock contention issue that I discovered is a pure Windows 10 performance bug, not a CPU flaw.

  6. rezna says:

    When I was playing with IncrediBuild (a distributed build system) a few years ago, I found that having more than 4 or 8 threads building the code was useless (Lenovo T420, 8GB RAM, i5 Core, mSata SSD drive) because the overhead of spawning and killing the processes was too high. Maybe the Chrome build system could also try throttling the threads at some reasonable level.

    • brucedawson says:

      The ninja build system (used by Chrome) carefully manages the number of processes created in order to avoid overwhelming the system. It scales to the number of cores. We are not overloading the system and if we throttle process creation we will build Chrome more slowly.

      I am not aware of any mitigation which we can do – we need a fix from Microsoft.

      • Allan Jensen says:

        Ninja doesn’t manage anything. It just measures how many virtual cores you have and launches that number of parallel processes by default. You can control it a bit with the command line arguments -j (number of jobs) and -l (maximum load average)

        • gim (@gim) says:

          what I think bruce meant is that ninja waits for old processes to finish first. Moreover, even if you specify `-j 20` that doesn’t mean ninja will start 20 processes, it will throttle it depending on the load (see `RealCommandRunner`)

      • mohammed imran says:

        Has Msoft replied back, what are they doing with this bug, it cripples my system sometimes. what are they doing about it??

  7. Pingback: 24-core CPU and I can’t move my mouse | ExtendTree

  8. R Questioner says:

    Moderator: This is off topic so I filed a bug for this issue and I’ll delete future comments on this topic. Please comment on the bug at crbug.com/740760 if you have further information which can help us investigate this issue. We do take memory consumption issues seriously and we are working on them but we need more specific complaints or there is nothing we can do to help.

    • Manuel says:

      They have so much RAM because building Chrome from source requires at least 32 GB for release builds and 64 GB for debug builds.

    • brucedawson says:

      We are working on fixing memory leaks in Chrome and otherwise reducing its memory footprint. Many fixes have been made in the last couple of years. If you have specific examples of pages that are using too much memory in Chrome then please file bugs at crbug.com.

  9. Pingback: 24-core CPU and I can’t move my mouse – PipisCrew Official Homepage

  10. Anon says:

    The moment I got to these two words, “Windows 10”, in the first sentence…

    Moderator: Non-constructive. Deleted.

  11. Ohm says:

    Did you try another mouse?

  12. Steve says:

    They should of called windows 10, Vista 1.0 real garbage…

    Moderator: off-topic and non-constructive

  13. Roy Adams says:

    Excellent post Bruce!

    Did Microsoft happen to give you a bug/issue tracking number when you reported the problem?

    Thanks,
    Roy

  14. Satay Nutella must go says:

    Windows 10 is absolute garbage.

    Moderator: Non-constructive and off-topic. Deleted (well, mostly).

  15. I had a similar problem recently (there was a one second pause on every process exit) and after plenty of debugging and hair pulling I discovered that it was caused by the AMD drivers. Never understood how they caused it exactly, but updating to a newer version resolved it.

  16. Criação says:

    I have a similar problem with my mouse. Thanks for giving me a clue to what might be happening under the hood. This is most annoying, really.

  17. Thomas Maher says:

    Had this issue and fixed it somehow. Try disabling windows defender fully.

  18. Rich Talbot-Watkins says:

    I had exactly this problem too, and fixed it to a large extent by updating a USB driver. But I’ve still seen occasional hitches since then.

  19. Franklin says:

    I think the universe will thank Bruce when MS fixes this. I have similar symptoms during big builds.

    • Aaron says:

      Absolutely agree. Per usual Bruce has done a thorough job of root causing and pinpointing a serious flaw in Windows 10.

      Thanks Bruce!

  20. Jay B says:

    Disable hyperthreading if you have a newer Intel CPU. Horrible bug in them that nobody is releasing micro code for.

    • Jonathon Reinhart says:

      See Bruce’s comment above. That bug causes corruption in CPU registers, and has nothing to do with this Windows 10 bug.

      • jaimemmoreno says:

        Yeah I’m running an older 6 core 4930K Intel CPU that doesn’t have that new hyperthread bug and still seeing the problem on Win10.

  21. Rick James says:

    Hi Bruce,

    Long time! 🙂
    For MS folks: 12699333.

    Cheerz,
    Rick.

  22. JT Turner says:

    Isn’t there a flag on the compiler to force it to only use so many cores? Like -j 4 (use only 4 processes)?

    • brucedawson says:

      We invoke a separate compiler instance for each translation unit so we have full control of the amount of parallelism. The problem is that this Windows bug can manifest even if you only have one process per core. And, for distributed builds we aim to have ~20 processes per core and this works fine – except for this bug.

  23. John Walluck says:

    What a great job of analysis and reporting. If MSFT doesn’t fix this issue quickly after this they’re not trying.

  24. Metro Melvin says:

    Nice work..

  25. John Doe says:

    Specs on all your hardware?

  26. Gwilbor says:

    I have a very similar problem on my laptop, it’s a HP Pavilion with Windows 8.1 (CPU is Intel i5-3230m 2.6 GHz): the only difference is that the trackpad is affected, but the external usb mouse is not. Anyway, since the first day, very often the mouse pointer freezes, while the rest of the computer seems to be working fine. Much of the time I am forced to use the keyboard to scroll webpages. Do you think it is the same issue?

  27. DavidW says:

    You have a 24 core laptop? With 64G of RAM? What brand/model is this? Some sort of Xeon beast?

  28. Pingback: Procek z 24 corami, a mysz się tnie. Czyli pod Windowsem wszystko po staremu :P https://randomascii.wordpress.com/2017/07/09/24-core-cpu-and-i-cant-move-my-mouse/… | admin1

  29. Andrew says:

    So why is CPU Usage at 50% on idle. There’s your first problem.

    • brucedawson says:

      CPU Usage was not at 50% on idle. CPU usage was at 50% during a build of Chrome. Which is appropriate. In fact, CPU usage probably should have been higher, and after this bug in Windows is fixed it will be higher.

  30. jdrch says:

    Ummm I’ve seen PCs lock up when trying to exit too many things at once before. Nothing new here.

  31. adam c says:

    i’ve always been sensitive to mouse lag in windows 10, i put it down to the pc hardware ecosystem, things like cpu/gpu stepping 100hz~60hz monitors and the such, i thought it got better recently but im not using a 24 core system.

    p.s. total witch hunt here, but im noticing so many direct sound issues maybe related

  32. I think it’s ironic that there’s GDI-related lock contention even when Console-mode processes that never even touch GDI (such as the compiler) are closed.

  33. Jamesits says:

    This have been noticed by me since Windows 10 rs1. Strange input lags or frame drops happen when playing osu! on my 16 thread workstation with <25% CPU usage. Sometimes the mouse keeps freezing for ~3s or more.

    • Jamesits says:

      Plus, have you noticed taskmgr processes tab loads significantly slower when computer is on for 1 day than just after boot?

      • brucedawson says:

        I have not noticed that – it appears very quickly for me. And, your issues sound different because they aren’t around process destruction. Sorry, you may have to investigate them. Sigh… computers.

  34. Lennie says:

    This problem has always existed in Windows, Windows 7 was better at it than previous versions. Seems Windows 10 is worse again.

    Just run a Linux VM with your Chrome build. 🙂

    • brucedawson says:

      No. This problem is new. My tests show it did not exist in Windows 7. Process creation/destruction may have always been slow on Windows, but that is actually separate from this issue.

      And I’m a Windows developer, using Windows tools to develop Windows software. A Linux VM isn’t going to help.

  35. George McCabe says:

    I’m running Windows 7 ultimate, 12 core i7cpu. My PC ran sluggish and choppy after I installed Chrome browser. After I would close Chrome it left a lot of processes still running in the background. I had to completely uninstall Chrome to get back to normal

    • brucedawson says:

      Chrome shouldn’t make your machine sluggish – that seems odd.

      The background processes are probably from background apps. See chrome://settings, advanced, “Continue running background apps when Google Chrome is closed” – set that to off.

  36. alvarolucas says:

    Sorry, I didn’t read it… because as soon as I read “I’ve a 24 CPU and I can’t move my mouse” I automatically thought… what Windows version do you have?… and I wasn’t wrong.
    Put a Linux distro in your life! :-)) … and forget about an endless life of senseless problems on Windows…

    • barton96 says:

      You Linux evangelists are boring as a sack of rocks. It was cool maybe for a while in 1996, but not anymore.

    • MOW says:

      Try copying a 100GB file to a USB harddisk … Linux has its own scheduler problems. Hopefully 4.12 will fix this.

  37. plexus says:

    Thank you. Have the same problem, couldn’t identify it though. I have the i7-5820 with 6 cores (12 threads). I always thought some hardware must be broken or having issues.

  38. Diego says:

    Hello. Something similar happened to me, and I solved it by adding a heatsink with a fan to the GPU chip on my motherboard, even though I had two cards in SLI. I’m sorry about my English.

  39. Sarreq Teryx says:

    hrm… my mouse starts locking up like that after a while, as well. Since it’s wireless, I assumed it was something interfering with the radio, but it’s so intermittent, I couldn’t figure out what might be the source. This certainly could explain it.

  40. Anonymous says:

    What about Windows 8.1? Can you test on it please?

  41. akraus1 says:

    Could be a desktop heap leak. The GDI resources are put onto a special user session bound heap which can leak if you e.g. call RegisterWindowMessage many times with different parameters. This type of leak is hard to find. Which sort of GDI resources are your build processes creating? You could try if this changes if you omit some processes of your build to check if one specific process is responsible for the degradation over time.

    • brucedawson says:

      That sounds plausible except that the serialization happens on a freshly booted machine running the ProcessCreateTests project (source code link in the post, and see the post-reboot image). So, if there is a leak then it is in the OS and is triggered even by programs that never touch user32.dll or gdi32.dll.

      • akraus1 says:

        In your test application you are calling GetDesktopWindow and PeekMessage, which are located in user32.dll. Even such innocent methods can use the desktop heap as an implementation detail. There is no list of methods available which cause desktop heap allocations. If you can get away without a message loop and PeekMessage then you should not see this degradation. Win10 is known for having a much slower VirtualAlloc performance for which a hotfix is available which could also be somehow related. As far as I know this fix is not public yet so you should check with MS support if this changes things.

        • brucedawson says:

          I am very careful to only call those functions in the master process, not in the 1,000 descendant processes. I did this to avoid the concerns that you raise and because the DETACHED_PROCESS mitigation doesn’t work if user32.dll is loaded.

  42. Pingback: Avec Windows 10, plus de cœurs = moins de fluidité - EXTEIN

  43. Doniel says:

    How’s the processes’ IPC? Did you try different system clocks? Windows seems to synchronize badly with some of them.
    Try HPET, or windows default. Have to turn on/off in the bios and on off in the system to completely work.

  44. Ian Brodowski says:

    Out of curiosity, are you running the creators update (1703) or the anniversary update (1607), or the 1511 release?

    • brucedawson says:

      I am running Anniversary Edition at work and Creators Update at home. It is possible that the bug first showed up in Anniversary Edition because I didn’t notice it before I upgraded, but that may also be because of work-flow differences.

  45. People here in the comments are emphasizing too much on “mouse” and “chrome”. This problem is further beyond that. It can affect the entire machine and on different workloads within W10.
    I, for instance, couldn’t handle the hiccups and had to downgrade to W7.

    I used to have some Gradle projects and couldn’t even use my machine while compiling them. Suffered from it even on daily-base computer usage. Now on W7 it’s a total different story.
    I know this might not be the same bug, but it might also totally be.

  46. Lars Berntrop-Bos says:

    Curious if this bug is also in Terminal Server settings, i.e. present in Windows Server versions. Those frequently have lots of cores and lots of processes, making occurrence of the bug and hindering performance a serious risk.

    • brucedawson says:

      This bug is almost certainly present in the server editions. The mouse-hitching aspect of the bug is less likely to matter on a server, but the process-destruction bottleneck will affect some workloads.

  47. trawg says:

    This is really interesting. About 3-4 weeks ago I noticed the same kind of mouse skipping problem on my Windows 8.1 PC. I can’t remember how many cores I have (overseas at the moment) but I want to say 16, with 16GB of RAM. I ended up buying a new mouse to see if that was the problem but haven’t had a chance to use it long enough before I had to go overseas to see if it fixed it. I did wonder if it might be a recent weird low-level Windows patch that might’ve changed something but I thought it was equally likely to be my 4 year old mouse

    I took my “broken” mouse overseas with me to use on my laptop and it seems to be working fine. So I will be very interested to see if there is any progression on this issue.

  48. Rob K says:

    Very cool article. Explained so well for those of us trying to climb the steep curve. Thank you!

  49. Nicolas Ramz says:

    Windows is great, but there are lots of bottlenecks, like this one. One other area where it can (and *needs* to) improve a lot is the filesystem: Windows is really really slow at creating/deleting lots of files. Uncompressing an archive containing Firefox sources can take about 21 minutes (!!) while on the same machine, in a VM, macOS would take around 2 minutes. That’s 10x slower than macOS, which is itself slowed down by the VM…

    • Zaru says:

      Turn off Windows Defender real-time protection while extracting large archives (from known safe sources). Every new file created triggers a full scan in the background, that’s your main source of Windows 10 file system sluggishness.

      • Nicolas Ramz says:

        That’s better without Real-time protection: 5m13s to uncompress the same directory. But it’s still two times slower than macOS (that’s running inside a VM). Also, you cannot tell people to stop virus protection before doing heavy file operations…

    • Zaru says:

      And just to add: temporarily turning off real-time protection and using a RAM-drive (ImDisk VDD etc.) for large builds or any operation on large numbers of files, greatly speeds things up in Windows 10 as well.

  50. Marty G says:

    I have noticed in my experience that all computers and OSs I use seem to be getting more and more bogged down during process closing operations. Not to the point of UI freezes mind you, just in system load. In my mind it has seemed correlated with attempts in all OSs to deal with security issues involving clearing memory on dealloc and doing proper memory management when returning the freed mem to the available pool. This is a pure blackbox/shotgun line of thinking on my part. But can you think of any factor that would affect linux, android, windows process close loading that would be more a result of an overall approach; like an industry-wide way of doing things?

    • brucedawson says:

      I can’t think of anything. Zeroing of memory is usually done at allocation time or (on Windows) done asynchronously by the system process, and zeroing memory isn’t even a process-destruction specific issue. See this article for details:
      https://randomascii.wordpress.com/2014/12/10/hidden-costs-of-memory-allocation/

      • Marty G says:

        Thanks for responding. I should not have gone down the rabbit hole of memory dealloc as I don’t have any reason to suspect that in particular and I think I ended up focusing your answer at a level I didn’t intend. What I was getting at was that closing processes in general seem to take a lot more CPU than it used to across the OSs I use, which is all the majors except Apple stuff and was wondering if an industry-wide OS or userland programming strategy might be behind it. I know apps save a lot more state data when ending these days for example, but again, I don’t really know if that is what I’m seeing. I was just curious if some general response to security issues or similar may be resulting in this effect, but it is likely just bloat in size, statefulness, and features in general causing more cleanup to be required. I’m just throwing it out there. But back to the topic, excellent work in finding this specific W10 bug. MS should send you a check for your time!

  51. Any chance you could upload a binary of the 1000 process test app? I’d be interested in playing with this bug on a few versions of Windows, but I don’t have things set up to compile stuff.

  52. Marcos Sebastian Alsina says:

    Great investigation Bruce. Thanks a lot for your time, you saved mine :-). The question is who was the genius that put such a Critical Section in such a critical part of the code.

  53. Just asking why process instead of thread?

    • brucedawson says:

      The ninja build system is designed around creating processes. And, because it invokes many different build tools (compiler, linker, python, etc.) it mostly has to use processes.

  54. Yeah think this is a flaw in chromes build system and not windows. First I don’t think chrome should be spawning many processes. They should be threads. Second even still there shouldn’t be an effort to have n threads/process where n is numb of cpus. Because they contend with each other for time on any thread to execute. It is actually improbable that any thread actually executing at same time. Threads should be used more abstractly. If there is so much logical parallel work than make threads for them, so that if and when ever possible they can be handled independently. But windows will handle them in the end as it sees fit. Either way you shouldn’t be using 48 process to compile a web browser, but go ahead try to get windows to change the operating system to fit google chrome. -_-

    • Richard M. says:

      “I don’t think chrome should be spawning many processes. They should be threads.”

      It’s not Chrome, it’s a Chrome build!

      “you shouldn’t be using 48 process to compile a web browser”

      What??????

      do you even know what you are talking about??? I hope you don’t work for Microsoft…

    • Richard M. says:

      –were not the best way, methinks, albeit it is not to be denied that authorities differ as concerning this point, some contending that the onion is but an unwholesome berry when stricken early from the tree whileas others do yet maintain, with much show of reason, that this is not of necessity the case, instancing that plums and other like cereals do be always dug in the unripe state yet are they clearly wholesome, the more especially when one doth assuage the asperities of their nature by admixture of the tranquilizing juice of the wayward cabbage and further instancing the known truth that in the case of animals, the young, which may be called the green fruit of the creature, is the better, all confessing that when a goat is ripe, his fur doth heat and sore engame his flesh, the which defect, taken in connection with his several rancid habits, and fulsome appetites, and godless attitudes of mind, and bilious quality of morals–

      King Arthur could have produced such comment on the topic…

    • brucedawson says:

      I’m not sure how ninja (Chrome’s build system) is going to spawn compilers/linkers/python/etc. as threads instead of processes.

      Ninja is very good at spawning the number of processes that I want. That is n-processes for local builds (perfectly saturating the local CPUs) and I use n*20 processes (-j 960) for distributed builds (because they use fewer local resources). This works very well, and will work even better once Microsoft fixes their regression – which I am confident they will do.

      I am quite familiar with the idea of threads/processes contending for the CPU and I am careful not to pointlessly over-saturate the CPUs.

      Chromium is open source. Give it a try. I think you will find that it has a very well implemented build system, across multiple systems. Contributions welcome.

      • Well yeah if you have each thread just work on a single thing, they will end up smothering themselves past a certain point as they compete for time on the cpu, but if they choose to take up work where others leave off they no longer follow that saturation rule. As more threads are used they increase the time on the cpu.

  55. Would this qualify for a MS bug bounty reward? I think you deserve it.

  56. Lee says:

    I mean… Windows 10 … in 2017 … -> roflmao
    Not because of your very specific issue, but rather because of the many different issues that W10 caused on my workstation (during a 2-year “grace period”), I switched back to Windows 7, and I don’t regret it at all.

    Everything just runs SMOOTHER.

  57. Set says:

    I am experiencing the exact same thing. Usually after Chrome was opened but even after I close Chrome it happens for a minute or two more.

    • brucedawson says:

      It’s not clear if that is the same issue or not, especially since I only encountered this issue when *building* Chrome, not when using it. If you can record an ETW trace of this poor behavior after you close Chrome (start tracing, close Chrome, repro the poor behavior, then save the trace) that would be helpful. You can then file a bug at crbug.com. UIforETW makes recording traces easy – go to https://tinyurl.com/etwcentral

  58. tester says:

    Version of Windows 10 is????? So if you have to bite Windows 10, at least give us a full specification of your machine.

    • brucedawson says:

      I saw this problematic behavior with process exits on Windows 10 Anniversary Edition and Windows 10 Creators Update. It may have happened on more versions as well – I don’t know – but I suspect it first started happening in Anniversary Edition.

  59. Pingback: Windows 10 does not like a lot of cores | NUTesla | The Informant

  60. Ark-kun says:

    Sorry for a not very relevant question, but as you work on Chrome and performance in Windows, you’re the closest expert for me.

    When I have many tabs open, the root Chrome process uses quite a lot of CPU and the performance degrades (e.g. the file download animation, where the button expands at the bottom of the window, can take a couple of seconds). This happens even when there are no CPU-hogging content processes (say, I’ve killed them…).
    Is this considered normal behavior (given that I have many tabs open) or should I collect some kind of trace and submit a bug?

    • brucedawson says:

      That is not normal. Even with lots of tabs open Chrome can be almost completely idle. It’s hard to speculate about what might be going on. It could be a bad web page, an ill-behaved extension, or a Chrome bug. Filing a bug at crbug.com and attaching a trace (chrome://tracing or ETW trace) would be helpful.

  61. santagada says:

    Did you file a bug report with Microsoft? I would love to hear their answer/patch.

    • brucedawson says:

      I reported the bug through informal channels. Earlier in the comments Rick James helpfully shared a Microsoft bug number: 12699333. Unfortunately I doubt the underlying details will be shared.

  62. Pingback: un CPU avec trop de cœurs rend le PC moins performant | 5H40

  63. goawaey says:

    what is the cpu you have? is it haswell? broadwell?

    • brucedawson says:

      I first saw it on a Haswell Xeon processor. I then reproed it on a much older processor (circa 2011) and on a brand-new Kaby Lake. Which is to say, this bug has nothing to do with the processor you are running (other than that you need at least two cores to see it). It is a software performance bug.

  64. James says:

    Question: does the issue manifest with both USB and PS/2 devices? You’re running a hefty setup, and high-end desktop Dell towers still come with PS/2 ports (for obvious government-contract reasons), so I wonder if that mitigates the issue.

    • brucedawson says:

      The type of mouse does not matter. Access to the Windows message queues is blocked by lock contention. The issue is not even specific to mouse movement – it affects anything that needs the lock, which *includes* general responsiveness of all UI programs, and more.

  65. Pingback: Windows 10 : trop de cœurs nuisent à la performance ? - EXTEIN

  66. Jeremiah Penery says:

    I ran the ProcessCreateTest.exe a few times on my machine (3930k, 6 cores/12 threads, Windows 10). Cutting out all the process creation parts:

    Process destruction took 0.686 s (0.686 ms per process).
    Lock blocked for 0.085 s.
    Average block time was 0.012 s.

    Process destruction took 0.657 s (0.657 ms per process).
    Lock blocked for 0.005 s.
    Average block time was 0.001 s.

    Process destruction took 0.656 s (0.656 ms per process).
    Lock blocked for 0.029 s.
    Average block time was 0.004 s.

    Process destruction took 0.644 s (0.644 ms per process).
    Lock blocked for 0.009 s.
    Average block time was 0.001 s.

    Process destruction took 0.635 s (0.635 ms per process).
    Lock blocked for 0.000 s.
    Average block time was 0.000 s.

    Strange that I’m not seeing any issues here.

    • brucedawson says:

      It is a bit strange, but not totally. The effect only really becomes noticeable (without looking at an ETW trace) on machines that have been up for a while and heavily used – whatever that means. Mine is also behaving well at the moment – go figure.

      • Lars Berntrop-Bos says:

        I would love to know the actual build of Windows 10 you’re on. For my development environment I have seen several bugs squashed only since a specific build, 15063.447. An overview of builds and corresponding KB numbers is here: https://technet.microsoft.com/en-us/windows/release-info.aspx
        The build number is listed at Settings:System:About:OS Build.
        One of the bugs squashed was in WinForms, where normal userland code could cause a 0x7f aka Unexpected_kernel_mode_trap bluescreen….

  67. Konstantin says:

    The new half-transparent Windows calculator moves slower than other windows when dragged with the mouse.
    Machine: AMD FX-8320 (technically 4 cores, 8 threads), AMD RX580 GPU, 4 GB of RAM (I know that the RAM is the bottleneck in many cases).

    But – could this be a similar problem? I recently swapped GPU to RX580 (for gaming, not for mining cryptocurrencies) – and the only thing it could have problems with is with moving the calculator window…
    If I think about it it can only be Windows…

    • dgw says:

      I’ve noted performance issues with moving windows around with transparency enabled as far back as Win7 (I never had Vista). Probably something to do with how much extra work the desktop compositor has to do when windows don’t simply occlude each other.

      • brucedawson says:

        The only way to be sure about what causes a performance issue is to record a profile and see what is going on. The next best thing is to find some simple change that makes the problem appear/disappear. The calculator moves fine for me, although it also doesn’t seem to be partially transparent, so I guess I have some setting different.

  68. Pingback: 24-core CPU and the mouse is stuck on Windows 10 | Random Thoughts

  69. Pingback: Windows 10 Process-Termination Bug Slows Down Mighty 24-Core System to a Crawl | MonctonLife

  70. Pingback: Windows 10 Process-Termination Bug macht 24-Core-System zur Schnecke - Hardwareinside

  71. Pingback: Windows 10惊现尴尬Bug 24核心竟然卡成蜗牛 - 玩懂手机网

  72. Pingback: Windows 10: Too many cores hinder performance? | Creative Collaboration

  73. Pingback: Windows 10 又有新 Bug,掛上 24 核心處理器執行度反變慢 | TechNews 科技新報

  74. Pingback: Windows 10 are o problemă la închiderea în masă a programelor

  75. Pingback: Les liens de la semaine – Édition #241 | French Coding

  76. Pingback: Windows 10 惊现尴尬 Bug!24 核竟然卡成蜗牛 | News Pod

  77. Pingback: A 24-core CPU and 64GB RAM — and this Windows 10 user still experiences mouse-cursor hiccups | Doctissimus @ Port Urla

  78. I’ve run your compiled “ProcessCreatetests.exe” program on my laptop and all processes terminated rather quickly (around 1.488 s – creation, 1.388 s – destruction).

    Testing with 1000 descendant processes.
    Process creation took 1.488 s (1.488 ms per process).
    Lock blocked for 0.000 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.388 s (1.388 ms per process).
    Lock blocked for 0.284 s.
    Average block time was 0.024 s.

    All done on Windows 10 Pro Insider Preview, Build 16232.
    Processor Intel Core i7-2630QM @ 2.00GHz, 8GB RAM.

    • brucedawson says:

      And? There is some randomness in how long the processes take to exit, with how long your system has been up being one factor. But, it’s quite clear that your system was suffering from the problem that I found. The lock was blocked for at least 284 ms which is far longer than it should be. If process destruction wasn’t serialized then the processes would have terminated even faster, and without the risk of micro-hangs.

  79. Pingback: Windows 10 驚傳在 24 核處理器下有 卡頓(Lag) 的問題,實際上一般人並是比較不會遇到 | 晴喵の窩

  80. Pingback: 24-core CPU and I can’t move my mouse | Jkab Tekk
