Taskbar Latency and Kernel Calls

I work quickly on my computer and I get frustrated when I am forced to wait on an operation that should be fast. A persistent nuisance on my over-powered home laptop is that closing windows on the taskbar is slow. I right-click on an entry, wait for the menu to appear, and then select “Close window”. The mouse movement should be the slow part of this but instead I find that the delay before the menu appears is the longest component.

This has been bothering me for a long time but I had been showing uncharacteristic self control and had resisted being distracted. Until today, when I finally broke down and grabbed an ETW trace.

This post was written as a test of speed-blogging. Total time from finding the issue and sarcastically tweeting about it to publishing the initial post was about 90 minutes.

The ETW trace records me right-clicking on the task bar to close two Explorer windows. I used UIforETW’s Tracing to file with the default options, giving me a 20.9 MB trace.

Sometimes the challenge in trace analysis is to find where the issue is, but for this issue that part of the analysis was trivial. There were three clear signals that all pointed to the right place, and a painfully obvious culprit.

The first signal is the input events. UIforETW contains an integrated input logger (anonymized enough so that I don’t accidentally steal passwords or personal information) so I could just drill down to the MouseUp events with a Button Type of 2, which represents the right mouse button. This draws highlights on the WPA timeline for when those events happened – see the vertical lines below:

image

That tells me when I released the right mouse button, and there is very suggestive window focus change that happens about 600 ms later which I’m guessing is when the menu appeared. Finally there is a clear block of CPU activity in RuntimeBroker.exe between the mouse up and window focus events.

The window focus change and CPU activity are not proven to show the end time, but measurements with a screen recorder app showed the menu taking about 660 ms to appear so I’m going to trust them.

The next step was to see what RuntimeBroker.exe was doing. While CPU Usage (Precise) is great for seeing how much CPU time a process is using, and why it is sitting idle, the CPU Usage (Sampled) table is the right tool for figuring out where CPU time is being spent. I drilled down and quickly found 264 samples hitting in KernelBase.dll!ReadFile:

image

Looking around a bit I found other call stacks that also ended up in that function so I right-clicked on it and went View Callers-> By Function. The view this gave me (with stacks now inverted) shows that a total of 628 samples, along various call stacks, ended up going through that function, accounting for 70% of the 899 total samples in that process:

image

271 samples in that thread didn’t go through that function, and the remaining samples (not shown) were in other threads.

Note that the 899 samples in thread 10,252 represent two mouse clicks, so about 450 samples or about 450 ms (using the default 1 kHz sampling rate) per mouse-click.

Sometimes file I/O is CPU time

CPU Usage (Sampled) shows CPU time, so disk I/O usually doesn’t show up there because then the thread goes to sleep and waits on the disk. The fact that this shows up as CPU time suggests that the reads were all hitting in the system cache and the CPU time was the kernel overhead (note ntoskrnl.exe on the first sampled call stack) of grabbing data from the cache.

Now that we suspect file I/O it’s necessary to go to Graph Explorer-> Storage-> File I/O. With a bit of column rearranging we get this impressive result:

image

What this says is that, over the course of two right-mouse clicks, RuntimeBroker.exe, thread 10,252, issued 229,604 ReadFile calls, reading a total of 15,686,586 bytes. That is an average read of 68 bytes each time.

Think about that for a moment.

Remember that these are calls to the operating system – kernel calls. This means that there is no caching between RuntimeBroker.exe and this file. In fact, the file itself is only 4,027,904 bytes long, so each time I right-click an explorer icon on the task bar this file is read 1.9 times in the slowest possible way.

After some encouragement from readers I decided to delete the file in question and now my explorer jumplist menus are almost instantaneous. I don’t recommend it (you’ll lose your quick access customizations and explorer history) but deleting this file might help you also:

%appdata%\Microsoft\Windows\Recent\AutomaticDestinations\f01b4d95cf55d32a.automaticDestinations-ms

The gains from deleting that file can be seen in this WPA screenshot. The thin vertical lines are right mouse clicks and the bottom row of rectangles represents when the menu started appearing. I deleted the file after the first two clicks and the delay is hugely reduced on the last two clicks:

Image

Alternately “you can use the official doohickey in Settings to both clear (toggle 2x) or disable (toggle 1x) jumplists on Start/Taskbar. Be sure to also disable animations for an extra speed boost.” Thanks Rafael!

This seems like a fixable problem. Once explorer’s taskbar menu latency is reduced it would be great to see the other delays addressed so that all taskbar menus appear instantly instead of with hundreds of ms of delays.

Summary

  • Don’t call ReadFile to get 68 bytes. Or, at least, don’t do it a hundred thousand times. I get cranky when databases do 4-KiB reads but this is amazing.
  • I hope somebody fixes this but until then I’ve got a workaround by deleting the problematic file
  • I hope somebody investigates to find out why context menus are slow in general. Hey, I shared an ETW trace and everything!
  • Feedback shared through official channels at https://aka.ms/AA60dfg
  • If you want to learn how to do this type of analysis then get my tool UIforETW and read some of my tutorials, linked from here
  • Note that you can get a context menu (simpler than the default Jumplist menu) much faster by shift+right-clicking on the icons. I’m gonna use this trick when closing lots of windows.
  • Hacker news discussion is here
  • Reddit discussion is here
  • Russian translation is here
  • Twitter threads are here and here

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8.
This entry was posted in Investigative Reporting, Performance, uiforetw, xperf and tagged , , , . Bookmark the permalink.

28 Responses to Taskbar Latency and Kernel Calls

  1. Jeff Stokes says:

    Looks to me like you should change power plan off balanced/whatnot, or update firmware, or both. Did you see the C3 assignment to core 0?

    • Jeff Stokes says:

      Guess I can’t edit my response. BIOS looks up to date more or less, I noticed in the System Config the power area is very verbose. Found this.

      7, Deep Sleep Enabled., True.

      That cpu 0 never comes off C3 seems strange to me.

      The deep sleep state thing reminds me of the old nehalem/westmere CPU errata. (https://support.citrix.com/article/CTX127395)

      The disk activity is odd, agreed. I remember reading that Start Menu and taskbar (I thought) were getting their own process? You on that build?

      • Jeff Stokes says:

        https://imgur.com/E19mkkM

        thought that was kinda interesting too, over 2 seconds dpcs in acpi.sys

        • brucedawson says:

          CPU 0 staying on C3 is probably power management working correctly – the system was idle enough to not need all CPUs busy. But, I didn’t look at that in detail.
          The ACPI.sys activity looks a bit odd – why is it waking up every second? – but I only see 30 ms of activity in the trace.

          Note that technical it is File I/O activity that is the problem, not disk activity. If those reads all hit the disk then the pain would be orders of magnitude worse.

  2. Anonymous says:

    Oh yeah, UI elements like that are laggy all the time in Windows. You can shift-right click to get an older, more efficient right click menu, getting around this problem. Bruce, if you dug around more you could probably find enough similar oddities to write dozens of posts like this, but it’s probably not good for your sanity to dig too deep.

    Why do I know about shift-right clicking? It’s a workaround to a UI bug that’s been around for years where the taskbar preview becomes sticky. https://news.ycombinator.com/item?id=12246834 Just accessing the old right-click UI once works around the issue without killing explorer.exe

    I also get laggy right-click menus when closing explorer if I don’t remember to hold down shift. Interacting with various UI elements has the possibility of spinning up my external hard drive for no reason and causing explorer to freeze until it’s awake. And I thought Windows was supposed to be good for async…

    Sometimes I’m tempted to just kill explorer.exe and leave it off permanently. Windows is fairly usable without it if you know what you’re doing.

    • brucedawson says:

      Thanks for the Shift+right-click trick. You’ve made a lot of people happy. Which is sad

      • Anonymous says:

        For advanced users there’s also the great 7+ Taskbar Tweaker tool. Does that without having to press Shift and a lot more.

    • Anonymous says:

      I found that the automatic destinations file that Bruce pointed out was in fact large, and deleting it mitigated the problem.

      Re: Rafael’s trick, I already had the group policy “Do not keep history of recently opened documents” enabled, so the settings menu was greyed out and set to “no.” Reading the group policy note I found:

      >If you enable this setting, but then later disable it or set it to Not Configured, the document shortcuts saved before the setting was enabled reappear in the Recent Items menu and program File menus, and Jump Lists.

      So it sounds like if you use group policy for this on a machine that was already in use, you have to delete the existing jump list to see the performance benefit. I definitely didn’t expect using the group policy would have a notable disadvantage compared to configuring with the UI. Probably still useful as long as you have the policy set from day one.

  3. Paul Jackson says:

    Microsoft should hire you to optimize their crappy code. It’s frustrating that Windows 10 is much slower than 7 or XP in doing basic operations.

  4. ericlaw says:

    @Paul: Bruce used to get paid by Microsoft to optimize crappy code, but he preferred to do.

    Bruce– AutomaticDestinations are the jumplist cache; my version of that cache is 744k. You could probably clear the file without a big negative UX impact. https://superuser.com/a/231105/16886

    • Dom says:

      AutomaticDestinations also work very poorly with roaming profiles and terminal servers. Multiple conflicting versions of the files are created, until they self corrupt. Then they don’t repair themselves, and using the GUI clear cache doesn’t fix it either. You need to manually delete the files.

      AutomaticDestinations.ms is just one load of baaaad programming.

    • brucedawson says:

      I deleted (moved, actually) that file and indeed the Explorer jumplists are much faster now. WinRot at its finest?

  5. SubmarineWipers says:

    This all boils down to the “many people working on the same thing and nobody has a detailed model of how the whole thing works in their head”, plus Microsoft’s culture of “adding features is the only valued contribution” (this has been made public by multiple bloggers from inside MS). They develop more and more stuff but management doesnt care about speed improvements, or even monitoring to not cause performance regressions. Because they think, they can only sell features to their customers.
    Meanwhile, I would gladly pay twice the price for Visual studio, if it wasnt so damn laggy and unstable.

  6. Jonathan says:

    This annoyance is something I’ve also been meaning to investigate, so thanks for this. However, I’m also interested in how it compares to Win 7/8. I have a small personal project that involves storage structure/jumplists and as far as I know the format hasn’t changed significantly.

    But if you take a blank 8.1 VM and a blank 10 VM side by side and just click through the default jumplists in the taskbar, the lists appear immediately after MouseUp on 8, but on 10, there’s always that delay. Turning off the animation makes the delay far more obvious. If you actually start using the machine and filling up RAM, the differential gets even worse. These same perf issues persist with bringing up Start too.

    Presumably we’re all just waiting for the next big framework/platform post-UWP to fix everything 😛

    • WeTibo says:

      It’s most certainly true that the Windows 10 UWP jumplist behaves terribly under RAM pressure. I once got an ETW trace of taskbar context menu taking 5369ms to show up. The cause, however, is unrelated to the small disk I/O described in the blog post. The symptom is different, too: once the memory is cleared up and reoccupied by the UWP bloat, the latency drops to normal. But sooner or later, the browser reclaims the RAM and the latency goes back up to ~ 5 seconds, which is horrendous.

      • brucedawson says:

        Note that in my case it is File I/O, not Disk I/O, which is the problem. It can be an important distinction.
        If the system gets low enough on memory that things gets paged out, or even aren’t in the disk cache, then things can definitely get much worse, because then Disk I/O becomes a factor. It’s pretty tough to make a computer uniformly fast in that case. A big block of RAM and an SSD sure helps.

        • WeTibo says:

          Right, I was mistaken, it was File I/O otherwise it would have been much worse.
          I agree it’s difficult to keep the system fast and responsive under memory pressure, but in the case of taskbar context menu, it was the whole Cortana (SearchUI, etc) stuff gets paged out and evicted from cache, which in turn relies on Edge, Chakra, UWP DLLs. It’s like starting the Edge browser anew on a RAM starved system just for showing up the jumplist, despite I’m on LTSC.
          The Shift+Right click traditional context menu it reliable and always fast.

  7. PhistucK says:

    Perhaps the more effective question could be – when will middle clicking on a taskbar button just close the damn thing instead of opening a new instance of the same damn thing(?!)?

    • Anonymous says:

      Again, the 7+ Taskbar Tweaker tool can do that for you.

    • brucedawson says:

      I like that middle-click opens a new instance and I hope they don’t take away that functionality.

      Turning off jump lists (see the end of the post) has made the context menus much faster on my machine, so the problem is solved for me. Also, a coworker pointed out that closing windows from Alt+Tab often works better.

      • PhistucK says:

        Really? What do you use it for?

        I found out about closing through Alt + Tab some time ago, it is indeed helpful.
        One really annoying related thing is when you hover over taskbar buttons and the preview shows up for the open windows of, say, Chrome. You middle click on those (hey, it works! It closes the window!), but it is not like Chrome does when you middle click on tabs (makes sure the next tab shifts to the location of the cursor), I have to wait and sometimes even move the cursor because it shifts to the other side of the screen.

        Disclaimer – I have many, many open windows (and open tabs) and I do not group them.

  8. Pingback: Taskbar Latency and Kernel Calls https://randomascii.wordpress.com/… | Dr. Roy Schestowitz (罗伊)

  9. Alex says:

    If I get it right, it seems to me that you just need to decrease delay on task bar previews in registry editor.
    1. Open the Registry Editor and go to HKEY_CURRENT_USER > Software > Microsoft > Windows > CurrentVersion > Explorer > Advanced.
    2. Right-click on the right pane and click New > DWORD (32-bit) Value. Name the new DWORD “ExtendedUIHoverTime.”
    3. Double-click on ExtendedUIHoverTime to open it. Under “Base,” click Decimal and then enter in the delay time (in milliseconds) in the “Value data” field. Click OK to set the time (default is 400 milliseconds).
    4. Log off and then log back in for the change to take effect.
    In my case it works perfectly well on 10y.o. laptop. The menu appears instantly

  10. Pingback: Links 11/9/2019: Django 3.0 Alpha, Sunsetting Python 2 | Techrights

  11. Pingback: Taskbar Latency and Kernel Calls – CodingNova

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.