I work quickly on my computer and I get frustrated when I am forced to wait on an operation that should be fast. A persistent nuisance on my over-powered home laptop is that closing windows on the taskbar is slow. I right-click on an entry, wait for the menu to appear, and then select “Close window”. The mouse movement should be the slow part of this but instead I find that the delay before the menu appears is the longest component.
Sources on twitter say that a fix is in fast-ring builds as of October 2019, build 18999, but unfortunately this won’t ship to a stable release until early 2020. And in late October Microsoft said “We’ve done some work to improve the launch speed of the taskbar jump lists”, with a thank you from one of the developers.
September 2019: ~700 ms to focus change
April 2020: 250-300 ms (optimizations? or different data?) to focus change
June 2020: 100 ms to focus change!!! The fix has landed in Windows 10 2004.
Unfortunately a deeper investigation shows that it takes at least another 100 ms from when the menu gets focus to when it is painted. So, the latency is a still-visible 200 ms from mouse-release to pixels on the screen. This is the absolute best case on a high-performance laptop, and apparently that’s as good as it’s going to get. Pity.
This has been bothering me for a long time but I had been showing uncharacteristic self control and had resisted being distracted. Until today, when I finally broke down and grabbed an ETW trace.
This post was written as a test of speed-blogging. Total time from finding the issue and sarcastically tweeting about it to publishing the initial post was about 90 minutes.
The ETW trace records me right-clicking on the task bar to close two Explorer windows. I used UIforETW’s Tracing to file with the default options, giving me a 20.9 MB trace of the issue. You can see a video of the analysis below on youtube.
Sometimes the challenge in trace analysis is to find where the issue is, but for this issue that part of the analysis was trivial. There were three clear signals that all pointed to the right place, and a painfully obvious culprit.
The first signal is the input events. UIforETW contains an integrated input logger (anonymized enough so that I don’t accidentally steal passwords or personal information) so I could just drill down to the MouseUp events with a Button Type of 2, which represents the right mouse button. This draws highlights on the WPA timeline for when those events happened – see the vertical lines below:
That tells me when I released the right mouse button, and there is very suggestive window focus change that happens about 600 ms later which I’m guessing is when the menu appeared. Finally there is a clear block of CPU activity in RuntimeBroker.exe between the mouse up and window focus events.
The window focus change and CPU activity are not proven to show the end time, but measurements with a screen recorder app showed the menu taking about 660 ms to appear so I’m going to trust them.
The next step was to see what RuntimeBroker.exe was doing. While CPU Usage (Precise) is great for seeing how much CPU time a process is using, and why it is sitting idle, the CPU Usage (Sampled) table is the right tool for figuring out where CPU time is being spent. I drilled down and quickly found 264 samples hitting in KernelBase.dll!ReadFile:
Looking around a bit I found other call stacks that also ended up in that function so I right-clicked on it and went View Callers-> By Function. The view this gave me (with stacks now inverted) shows that a total of 628 samples, along various call stacks, ended up going through that function, accounting for 70% of the 899 total samples in that process:
271 samples in that thread didn’t go through that function, and the remaining samples (not shown) were in other threads.
Note that the 899 samples in thread 10,252 represent two mouse clicks, so about 450 samples or about 450 ms (using the default 1 kHz sampling rate) per mouse-click.
Sometimes file I/O is CPU time
CPU Usage (Sampled) shows CPU time, so disk I/O usually doesn’t show up there because then the thread goes to sleep and waits on the disk. The fact that this shows up as CPU time suggests that the reads were all hitting in the system cache and the CPU time was the kernel overhead (note ntoskrnl.exe on the first sampled call stack) of grabbing data from the cache.
Now that we suspect file I/O it’s necessary to go to Graph Explorer-> Storage-> File I/O. With a bit of column rearranging we get this impressive result:
What this says is that, over the course of two right-mouse clicks, RuntimeBroker.exe, thread 10,252, issued 229,604 ReadFile calls, reading a total of 15,686,586 bytes. That is an average read of 68 bytes each time.
Think about that for a moment.
Remember that these are calls to the operating system – kernel calls. This means that there is no caching between RuntimeBroker.exe and this file. In fact, the file itself is only 4,027,904 bytes long, so each time I right-click an explorer icon on the task bar this file is read 1.9 times in the slowest possible way.
After some encouragement from readers I decided to delete the file in question and now my explorer jumplist menus are almost instantaneous. I don’t recommend it (you’ll lose your quick access customizations and explorer history) but deleting this file might help you also:
The gains from deleting that file can be seen in this WPA screenshot. The thin vertical lines are right mouse clicks and the bottom row of rectangles represents when the menu started appearing. I deleted the file after the first two clicks and the delay is hugely reduced on the last two clicks:
Alternately “you can use the official doohickey in Settings to both clear (toggle 2x) or disable (toggle 1x) jumplists on Start/Taskbar. Be sure to also disable animations for an extra speed boost.” Thanks Rafael!
This seems like a fixable problem. Once explorer’s taskbar menu latency is reduced it would be great to see the other delays addressed so that all taskbar menus appear instantly instead of with hundreds of ms of delays.
- Don’t call ReadFile to get 68 bytes. Or, at least, don’t do it a hundred thousand times. I get cranky when databases do 4-KiB reads but this is amazing.
- I hope somebody fixes this but until then I’ve got a workaround by deleting the problematic file
- I hope somebody investigates to find out why context menus are slow in general. Hey, I shared an ETW trace and everything!
- Feedback shared through official channels at https://aka.ms/AA60dfg
- If you want to learn how to do this type of analysis then get my tool UIforETW and read some of my tutorials, linked from here
- Note that you can get a context menu (simpler than the default Jumplist menu) much faster by shift+right-clicking on the icons. I’m gonna use this trick when closing lots of windows.
- Hacker news discussion is here
- Reddit discussion is here
- Russian translation is here
- Twitter threads are here and here