Xperf Wait Analysis–Finding Idle Time

Posted on May 5, 2012 by brucedawson

The Windows Performance Toolkit, also known as xperf, is a powerful (and free!) system-wide Windows profiler. In the past I’ve talked about using xperf to identify slowdowns in PowerPoint (tutorial version is here) by using xperf’s built-in sampling profiler, but that actually understates the true value of Xperf. While I think xperf is a better sampling profiler than most of the alternatives (higher frequency, lower overhead, kernel and user mode), xperf is really at its best when it reveals information that other profilers cannot measure at all.

This article was updated September 2015 to reflect WPA 10 and UIforETW

In short, lots of profilers can tell you what your program is doing, but few profilers are excellent at telling you why your program is doing nothing.

More details about this particular investigation can be found in the second ETW Training Video.

Our story so far

Xperf has a high learning curve. Therefore I highly recommend that you start by reading some previous articles from this series. The best place to start is at ETW Central which has a well organized list of resources.

The rest of this post assumes that you have installed UIforETW and familiarized yourself with some of the basics of WPA trace analysis.

Wait Analysis Victories

I’ve had good luck using Wait Analysis to find many performance problems. Some of these delays were short enough to be difficult to notice, yet long enough to matter. Others were debilitating. All were difficult or impossible to analyze through CPU sampling or other ‘normal’ CPU profilers. Some examples include:

Finding the cause of brief startup hangs in Internet Explorer and various games
Profiling Luke Stackwalker to find out why it caused frame rate glitches in the game it was profiling
Finding the cause of a 10x perf-reduction when upgrading to a newer version of Windows
Finding the cause of frame rate hitches during fraps recording
Finding the chain of lock contention that caused frame rate hitches on a heavily loaded system
Finding the cause of (and a workaround for) repeated 2-6 second hangs in Visual Studio’s output window

The last investigation is the one I want to cover today. It is sufficiently simple and self-contained that I can cover it end-to-end in a single (long) post.

Finding the hang

When profiling a transient problem such as a frame-rate glitch or a temporary hang the first challenge is to locate the hang in the trace. A trace might cover 60 seconds, and a hang might last for 2 seconds or less, so knowing where to look is crucial. There are a number of ways to do this:

Find the key stroke that triggered the hang, through logging of input events
Use instrumentation in the functions of interest
Look for patterns in the CPU usage or other data
Look at the Window in Focus graph for clues
Use OS hang-detection events
Remember when the hang happened relative to when you stopped and started tracing

I’ve used all of these techniques. To help with the first option UIforETW has an input event logger which puts all keyboard and mouse input into the trace. If a hang is triggered by a particular key press or mouse click then finding its start point in the trace is trivial.

Custom instrumentation (emitting ETW events at key points in your game, see the ETWEventDemo source) is also a common technique. Emitting an event every frame, or using CETWScope around slow operations, makes hitches plainly visible. However this doesn’t work when investigating performance problems in other people’s code, such as in Visual Studio.

In some cases a hang will be plainly visible in the CPU consumption. One recent hang showed a significant hole in the otherwise consistent CPU usage, plain as day.

The Window in Focus graph tracks the programs you switch between and can given vital temporal clues.

A specific event that indicates the time and duration of a hang would be ideal, and Windows 7 and above actually have such an event. The Microsoft-WindowsWin32k ETW user provider will emit an event whenever a thread resumes pumping messages after a significant delay. Windows Vista and earlier users are out of luck, but on Windows 7+ this is often exactly what is needed, and this provider is enabled by UIforETW.

It’s hands on time

I’ve uploaded a .zip file of a sample trace (2015-09-25_20-56-25 VS F8 short hang.zip) to my github bigfiles repo. If you download this trace, unzip it to UIforETW’s trace directory (default is documents\etwtraces) then you can load the trace into WPA 10 by double-clicking it in UIforETW. Following along is by far the best way to learn wait analysis. You should also download the .txt file with the matching name and copy it to the etwtraces directory. UIforETW will display it when the trace is selected, and you can type your findings into the Trace information area.

Once the trace is loaded be sure to Load Symbols from the Trace menu, unless you have Load symbols automatically after trace load configured.

This trace records a hang where I was looking through VC++ build warnings by repeatedly pressing F8. The hang was about six seconds long and happened after the third time that I pressed F8. These sorts of details can be very helpful when investigating performance problems so I always try to record them in the Trace information area.

Finding the hang start

With the trace loaded into WPA 10 you should see “Generic Events Randomascii Chrome and Multi Events” as the top graph and table. If not then you should use UIforETW’s Settings dialog to Copy startup profiles and then reload the trace. Having the right startup profile will make this process far easier.

In the Generic Events table you can drill down into UIforETW, Multi-Input, Keyboard, Keydown, and you will see 20 keyboard events. The F7 key press built the project and the three F8 presses navigated through the warnings. Since we know that the hang started on the third F8 press (selected) we can tell that the hang started at 6.5 s into the trace, shown by the blue vertical line:

Since I mentioned that the hang was about six seconds long we already have a rough estimate of when it ends, but we can do better than that.

Finding the hang length

By opening the UI Delays graph from System Activity in the Graph Explorer we can get even better information. Whenever a UI thread stops pumping messages for more than 200 ms the Microsoft-Windows-Win32k provider records events and the UI Delays graph displays them. Here we can see that devenv.exe went for 5,469 ms without checking for messages, and for 3,172 ms of that time there was input in the queue that was being ignored.

The table also shows us the process ID (9104) and the UI thread’s thread ID (10364). When investigating an idle-time hang it is crucial to know which process and thread to investigate so this information is vital.

Zooming in to the cause

When analyzing an idle hang you should select the entire region of the hang, and it is particularly important to select the end of the hang. It is better to select a few extra tens or hundreds of milliseconds at the end rather than risk missing the crucial events that end the hang. This selection can be done by dragging with the mouse or by selecting the MsgCheck Delay bar in the UI Delays graph. Then just right-click on the selected region and select Zoom from the context menu.

Who woke whom?

If a thread is not running, and it then starts running, then there was a context switch that started it (the new thread) running. That context switch is recorded in our ETW trace and contains all sorts of useful information. Include in this information is (for the traces recorded with UIforETW) the new process name and thread ID, the call stack which the thread woke up on (which is the same one it went to sleep on), the length of time it was not running and, for threads that were waiting on some synchronization primitive, the thread that woke it up.

Ponder that, because it’s crucial. An ETW trace tells you, for each context switch, how long the thread was not running, and who woke it up. That’s why it is important to have the end of the hang selected, because that is (presumably) the time of the context switch that gets the thread running again.

The CPU Usage (Precise) table contains the context switch information. With UIforETW’s startup profiles this data defaults to being displayed as a graph only. To do wait analysis we need to display the table, by clicking on Display Graph and Table:

We know need to drill down into New Process devenv.exe (9104), New Thread Id 10,364. The CPU Usage column shows us that this thread only consumed 111.7 ms of CPU time over the 5.469 s time range shown, so we immediately know that this is an idle hang. Knowing when to do wait analysis (to find idle delays) and CPU Usage analysis (for busy delays) is crucial. Doing CPU Usage analysis when a thread is idle will lead to nonsense results.

It is important to remember that there is nothing wrong with a thread being idle. Most thread’s should be idle. We are interested in this particular idleness because it is the UI thread and because it failed to pump messages for 5.469 s. This idleness is problematic because it caused a hang.

The devenv UI thread did 47 context switches during this time period and we want to find the one that ended the longest period of idle time, so we need to sort the table by Time Since Last which, as I explained here, represents how long a thread had not been running when it was context-switched in again. The UIforETW startup profile configures two Time Since Last columns, one aggregated by Sum and the other by Max. Aggregating by sum is appropriate when you want to find a call stack whose context switches collectively account for the most time. Aggregating by max is appropriate when you want to find a single context switch that represents the longest idle time. In this case it doesn’t matter which one we use, but I recommend experimenting.

If we drill down (select the first entry in the thread’s call stack and repeatedly press right-arrow) then we’ll find that there is a single context switch (see the Count column) that restarted the UI thread after it was idle for 5.341 s. That’s our hang, clear as day.

The NewThreadStack for this 5.202 s call stack starts at _RtlUserThreadStart and winds through a lot of Visual Studio code. Microsoft is kind enough to publish symbols for much of VS, as well as for Windows and about fifty rows down we get to the interesting details:

It’s a single context switch (‘count’ is 1) that put the Visual Studio UI thread to sleep for 5.202 s. It doesn’t get much clearer than that.

Farther down in the call stack we can see that ntoskrnl.exe calls FLTMGR.SYS which calls mup.sys!MupiCallUncProvider (and the eventual call to WaitForSingleObject). Ponder that for a bit.

If we go down to the bottom of the stack and expand the next three columns then we can see the Readying Process/Thread Id/Stack, which is the thread that woke us, which can also be described as “who we were waiting for”:

The Readying Process/Thread Id/Stack is the thread that “readied” the thread that was waiting. That is, it satisfies the condition that the thread was waiting on. That might be a lock, an event, a thread, a process, or something else. However if a thread is readied by the expiration of a timer then the Readying Process/Thread Id is irrelevant. Timer expiration is implemented by having a interrupt fire, the interrupt service routine schedules a deferred procedure call (DPC) and that DPC executes in the context of an existing thread. The DPC is like a body snatcher that assumes the identity of its host process/thread, readies a bunch of other threads, and then goes away. When doing wait analysis it is a good idea to have the Ready Thread in DPC enabled, and if it says True then ignore the Readying Process/Thread completely, as discussed in this random twitter thread.

In this case it was the System process (thread 5880) in an IopfCompleteRequest call that goes through MUP.SYS. If we know what MUP.SYS is then that gives us another clue as to the root cause, but even without that we know that Visual Studio called CreateFileW and it took a long time to return.

What about the other threads?

In our selected region their are context switch events for many threads in devenv.exe that spent most of the time range idle. So why aren’t we looking at them?

It’s important to understand that Sum:Time Since Last just measures how long a thread was idle, and there is nothing wrong with a thread being idle. A thread being idle is only a problem if it is supposed to be doing something and isn’t. In fact, if devenv.exe has lots of threads then they had better be idle most of the time or else my four-core machine is going to be constantly busy.

The reason we are looking at thread 10,364 is because (according to the MsgCheckDelay event) it is the UI thread and it went for 5.469 s without pumping messages. It kept me waiting, and that makes me angry. You wouldn’t like me when I’m angry.

File I/O summary table

Since we know that the hang is related to file I/O we should look at what file I/O is happening during this time period. Luckily this information is also recorded by UIforETW.

In the WPA Graph Explorer go to the Storage section and open up the File I/O graph and bring up a summary table. If you change the View Preset to Count by Process, Thread, Type and drill down into Process devenv.exe (9104), Thread 10,364 and look for an event with a long duration. If we sort by duration then we can just keep expanding with right-arrow. It doesn’t take long to find the event in question.

The IRP and File Object columns aren’t much use, but you can hide those or scroll over and find the File Path column. I’m sure this screen shot won’t show up very well, but I can’t shorten it any more. It contains too much glorious information. Click on the image for deeper details:

We can see here that a Create file event, from devenv.exe, thread 10,364, took 5.342 s, trying to open \\Perforce\main, and that ultimately the network path was not found.

Wow.

The hang is because Visual Studio tried to open a non-existent network file, and sometimes that causes a 5.342 s network timeout. Hence the hang.

The remainder of the hang is from a few other context switches and CPU time that account for the rest of the 5.469 s, but the one long bit of idle time is all that matters in this case.

What’s the cause?

The file name associated with this hang is quite peculiar. The full name is:

\\Perforce\main\src\Lib\Public\sharedstuff.lib#127 – opened for edit

That doesn’t look like a file name. That looks more like the output from Perforce, only with back slashes instead of the forward slashes that Perforce uses. And that’s exactly what it is. At Valve (where I was working when I hit this problem) build results are stored in Perforce so we have pre-build steps to check these files out. The checkout commands print their results to the Visual Studio output window.

Visual Studio ‘helpfully’ reverses the slashes and decides that this represents a file name on \\perforce\main. Since this whole thing started with me pressing F8 this means that Visual Studio was trying desperately to treat this potential file name as a source-file name associated with an error or warning message.

Oops.

That explains the CResultList::AttemptToNavigate entry on the hang call stack – everything makes more sense once you understand the problem.

Conclusion

Once the cause of the hang was understood I modified our pre-build step to pipe the output through sed.exe and had it rewrite the output so that Visual Studio would no longer find it interesting. This avoids the hang, and also made it so that F8 would take the selection to interesting errors and warnings, instead of to these mundane progress messages. A little sed magic replaces “//” with the empty string, and “…” with “—“ :

sed -e s!//!! -e s!\.\.\.!—!

This changes the hang-prone results before:

to the hang-proof benign text after:

I also reported the bug to the Visual Studio team. Having a trace is very powerful for this because it meant that I could tell them definitively what the problem was, and I could share the trace in order to let them confirm my findings. Just like minidump files are a powerful way to report crash bugs, xperf traces are a powerful way to report performance bugs. The bug was fixed in Visual Studio 2012 – UNC paths are now ignored by the output window’s parser.

Mup.sys is the driver used for network file I/O. Therefore its presence on the New Thread Stack and the Ready Thread stack was a clue that a network delay was the problem. Doing file I/O on the UI thread is always a bit dodgy if you want to avoid hangs, and doing network file I/O is particularly problematic, so watching for mup.sys is a good idea.

Wait chains

Some wait analysis investigations are more complicated than this one. In several investigations I have found that the main thread of a game was idle for a few hundred milliseconds waiting on a semaphore, critical section, or other synchronization object. In that case the readying thread is critical because that is the thread that released the synchronization object. Once I find out who was holding up the main thread I can move the analysis to that thread and apply either busy-thread analysis (CPU sampling) or idle thread analysis (finding what that thread was waiting on). Usually just one or two levels of hunting is needed to find the culprit, but I did recently trace back across six context switches in four different processes in order to track down an OS scheduling problem.

When following wait chains it is important to understand the order of events. If thread 1234 is readied by thread 5678 at time 10.5 s, then any context switches or CPU activity that happen to thread 5678 after that point are not relevant to the wait chain. Since they happened after thread 1234 was woken they can’t be part of its wait chain.

For CPU activity this is dealt with by selecting the region of interest. For context switches this is dealt with by drilling down all the way and then looking at the SwitchInTime (s) column. This column records the time of the context switch.

It’s worth pointing out that if you busy wait (spinning on some global variable flag) or use your own custom synchronization primitives (CSuperFastCriticalSection) then these techniques will not work. The OS synchronization primitives are instrumented with ETW events that allow, in almost all cases, perfect following of wait chains. Even if your custom synchronization code is faster (and it probably isn’t) it isn’t enough faster to make up for the loss of wait analysis. The ability to profile your code trumps any small performance improvement.

Can’t any profiler do this?

Sampling profilers and instrumented profilers might be able to tell you that your program is idle, and they might even be able to tell you where your program is idle, but they generally can’t tell you why your program is idle. Only by following the chain of readying threads and looking at other information can you be sure to find the cause of your idle stalls.

It’s also convenient that you can leave xperf running in continuous-capture mode, where it is constantly recording all system activity to a circular buffer. When you notice a problem you can just record the buffer to disk, and do some post-mortem profiling.

Wait analysis is not baking

Baking is all about precisely following a recipe – improvisation tends to lead to failure. Wait analysis, on the other hand, is all about creativity, thinking outside the box, and understanding the entire system. You have to understand context switches, you have to think about what idle time is good and what is bad, you have to think about when to look at CPU usage and when to look at idle time, and you often have to invent some new type of analysis or summary table ordering in order to identify the root cause. It’s not easy, but if you master this skill then you can solve problems that most developers cannot.

Second opinion

For another discussion of analyzing hangs with xperf see this wait analysis article by Trey Nash. For instructions on doing the same type of analysis on Linux see Brendan Gregg’s post on Off-CPU Analysis.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048

View all posts by brucedawson →

This entry was posted in AltDevBlogADay, Investigative Reporting, Performance, Programming, xperf and tagged glitches, hangs, idle time, inputprocessdelay, summary tables, visual studio, wait analysis, xperf. Bookmark the permalink.

48 Responses to Xperf Wait Analysis–Finding Idle Time

Pingback: The New Xperf is Here! | Random ASCII
Tom Kirby-Green says:

May 6, 2012 at 1:18 am

Very interesting article, thank you so much for taking the time to write this up! Lots to learn and that can only be good 🙂

Reply
Pingback: The Lost Xperf Documentation–CPU Scheduling | Random ASCII
Pingback: Windows Live Photo Gallery—Poor Performance Peculiarities | Random ASCII
Pingback: In Praise of Idleness | Random ASCII
Aaron says:

June 18, 2012 at 11:31 am

Do you have any idea where Wait Analysis went in the new WPA 6.2 GUI?

Reply
- brucedawson says:
  
  June 18, 2012 at 1:47 pm
  
  Yes, I do know where it went. Under Computation find CPU Usage (Precise) and that drag over to the analysis window. Then set the table to “Display graph and table” (right-hand corner of the table’s title bar) to make the summary table show up. Then use the View Editor to set up the columns correctly. You’re all set.
  
  Oh — one final step. Profiles->Save Startup Profile, so you only have to do this once.
  
  I’ll blog about this one of these days.
  
  WPA is better in most ways than xperfview. There are a couple of annoying bugs, but those should be fixed in the final release, and then I assume I will be able to recommend it with no misgivings.
  
  Reply
  - Aaron says:
    
    June 18, 2012 at 2:00 pm
    
    OK, it looks like this works. It looks like there’s a predefined column corresponding to the old sum column, but its label is indistinguishable from the max Waits column.
    
    Slightly unrelated, but on the topic of finding missing things: Do you know where the Disk I/O Detail graph? Specifically, xperfview graph has disk offset on the vertical access, along with some general descriptive information about the disks and partitions.
    
    Reply
    - brucedawson says:
      
      June 19, 2012 at 8:37 pm
      
      The Disk I/O Detail graph is well hidden, but it is still around. Crack open the Storage section, then right-click on the Disk Usage section, and select Disk Offset Graph.
      
      I guess it’s traditional for that graph to be well hidden.
      
      Reply
Pingback: WPA–Xperf Trace Analysis Reimagined | Random ASCII
Eric says:

July 1, 2012 at 7:08 pm

> “However this doesn’t work when investigating performance problems in other people’s code, such as in Visual Studio.”

It sounds like you have an equally good method but there is actually an event that does this in VS 2010. WPF has a frame present event documented in ‘\Windows\Microsoft.NET\Framework\v4.0.30319\WPF\wpf-etw.man’ event id 11061 – WClientUceNotifyPresent signifies when WPF presents a frame to the screen.

Reply
- brucedawson says:
  
  July 1, 2012 at 10:42 pm
  
  Interesting. I’ll have to try that. Unfortunately they don’t make it easy to use because that provider is not registered by default. I had to do this to register it:
  
  wevtutil im C:\Windows\Microsoft.NET\Framework\v4.0.30319\WPF\wpf-etw.man
  
  Then it showed up in the list of providers given by “xperf -providers” as “Microsoft-Windows-WPF” so now I should be able to enable it, next time I need that.
  
  Reply
Pingback: Windows Slowdown, Investigated and Identified | Random ASCII
Pingback: The Lost Xperf Documentation–Disk Usage | Random ASCII
Pingback: Xperf for Excess CPU Consumption | Random ASCII
Pingback: Xperf Basics: Recording a Trace (the easy way) | Random ASCII
Pingback: Xperf for Excess CPU Consumption: WPA edition | Random ASCII
Pingback: Defective Heat Sinks Causing Garbage Gaming | Random ASCII
Pingback: New Version of Xperf–Upgrade Now | Random ASCII
Willi Schinmeyer says:

December 8, 2013 at 3:15 pm

Hi Bruce,

this turned out to be just what I needed. Steam recently occasionally hangs for a good minute or so. This is especially annoying since it also pauses any game trying to communicate with Steam.

So being a software engineer I gave profiling a try. I had only briefly looked into xperf after reading about it here so the most time was spent trying to understand the program and experimenting. But in the end it was mostly a matter of looking at the maximum Time Since Last of various threads.

The problem turned out to be a blocking call to WinHttpGetProxyForUrl in the steamclient.dll in one thread that takes 115 seconds. I don’t know why it takes so long*, I just know that about 30 seconds later a couple of other Steam threads start waiting for the blocked one, at which point the games stop reacting as well.

I should probably experiment with WinHttpGetProxyForUrl and see if I can reproduce the problem…

Do you think that’s something that should be fixed in Steam? I suppose so, it may be a rare thing to happen (and I’m still curious why it does) but such a blocking call should probably not be done in a time critical thread anyway, right? In case you want to fix it, do you need the trace? It’s pretty heavy at 1.7GB and I suppose the function name should narrow it down by a lot so probably not.

Thanks for the articles and have a nice day.

Willi Schinmeyer

* The MSDN has this to say: “The auto-detection process can be slow, possibly as long as several seconds.” That sounds more like “up to 10s or so” to me.

Reply
- brucedawson says:
  
  December 8, 2013 at 8:14 pm
  
  It seems terribly appropriate that you would turn xperf against Steam. Isn’t it wonderful that it can be used to profile *any* program, whether you wrote it or not.
  
  Proxy resolution is a tricky problem, I remember that much from my days working on Internet Explorer, but I don’t remember much more than that. However I think that if proxy resolution takes that long it is hard to avoid hangs. Asynchronous programming is all well and good, but at some point somebody *needs* the answer.
  
  I’ll e-mail you and get some traces — we have a system for getting traces when needed, with Valve’s instrumentation and my preferred providers.
  
  Reply
  - brucedawson says:
    
    December 11, 2013 at 10:41 pm
    
    Thanks for the trace. I confirmed your analysis. It’s annoying that WinHttpGetProxyForUrl sometimes takes so long to return, but we’ll fix the code to ensure it doesn’t cause hangs in games launched from Steam.
    
    Good job.
    
    Reply
tama says:

December 17, 2013 at 1:01 pm

Number 6 is ‘ReadyThreadStack’ not ‘ReadyingThreadStack’, right?

Unfortunately the link to the other wait analysis article at windowsitpro doesn’t work for me.

Google brought up this one: http://devproconnections.com/development/use-xperfs-wait-analysis-application-performance-troubleshooting

Reply
- brucedawson says:
  
  December 17, 2013 at 5:10 pm
  
  You are correct. “ReadyingThreadStack” would be more consistent with the other column names, but “ReadyThreadStack” is what it is actually called.
  
  I’ll update the article link — I think you found the right one, although I can’t be certain.
  
  Reply
tama says:

December 20, 2013 at 8:19 am

You can’t use Microsoft-WindowsWin32k with xbootmgr, can you?

Reply
- brucedawson says:
  
  December 20, 2013 at 9:53 am
  
  I don’t know — I haven’t actually tried xbootmgr.
  
  Reply
Pingback: Bugs I Got Other Companies to Fix in 2013 | Random ASCII
Pingback: You Got Your Web Browser in my Compiler! | Random ASCII
Pingback: Self Inflicted Denial of Service in Visual Studio Search | Random ASCII
Alexander Riccio says:

April 23, 2014 at 10:53 pm

Hey there Bruce, somebody did look closely at the data – smart choice; and thanks but I neither know someone living in Kirkland, nor do I drink. So buy one for yourself! You deserve it for keeping up the “dying art” of easter eggs.

Reply
ralphcorderoy says:

August 10, 2014 at 1:53 am

Educational article, thanks. Minor inconsistency: “My system has eight hardware threads (four hyperthreaded cores)” v. “or else my six-core machine is going to be constantly busy”.

Reply
- brucedawson says:
  
  August 11, 2014 at 9:28 am
  
  I have a four-core machine at home and a six-core machine at work and I lost track of which one I was talking about. Thanks. Fixed.
  
  Reply
Pingback: UIforETW – Windows Performance Made Easier | Random ASCII
Pingback: The Lost Xperf Documentation–CPU Usage (Precise) | Random ASCII
Trass3r says:

September 23, 2015 at 2:37 am

Nice article. Could you also post one about finding the cause of deadlocks? In that case there is no readying thread, the app just hangs forever.

Reply
- brucedawson says:
  
  September 23, 2015 at 8:09 am
  
  Deadlocks are a pain. Your best bet there is to capture a crashdump, using procdump.exe from sysinternals, or task manager.
  
  Reply
Pingback: ETW Central | Random ASCII
Andrei says:

March 6, 2018 at 4:41 am

It all looks nice when you explain this, but it looks like you need lots of inside knowledge. For example, how did you know that Microsoft-Windows-Win32k provider records “UI Delays” events and what they actually mean. I had a look through some providers and for me it looks like a full time job digging through all those events generated by for example, Generic Events section of System Activity tab.

Reply
- brucedawson says:
  
  March 6, 2018 at 10:54 pm
  
  You are correct that learning this is extremely difficult due to the lack of comprehensive documentation. I have documented a lot of this but far from all. The lack of documentation of what different providers are for, and what stackwalk events are associated with providers, is a particular pet peeve. You can see the documentation that I have created here:
  
  https://randomascii.wordpress.com/category/documentation-2/
  
  Maybe someday Microsoft will explain the rest. The good news is that UIforETW provides good defaults and my blog explains most of the key concepts, and those concepts then prepare you for some explorations beyond the norm.
  
  Reply
  - Andrei says:
    
    March 7, 2018 at 1:17 am
    
    Thanks for the fast response, Bruce. I also think WPA/WPR are awesome tools, and I started to use them more and more, especially since Microsoft introduced mandatory performance tests. However, it happens many times that I get a little progress and then I get stuck and don’t know how to proceed. Is there a forum or site or discussion list where I can upload my Traces and I could get some pointers on what to do next?
    
    Reply
    - brucedawson says:
      
      March 7, 2018 at 8:22 am
      
      I don’t know of such a forum. A support group would be interesting. If you find or create one, let me know, and maybe mention it on the tinyurl.com/etwcentral post
      
      Reply
Jānis Veinbergs says:

April 5, 2020 at 11:57 pm

> Here we can see that devenv.exe went for 5,469 ms without checking for messages, and for 3,172 ms of that time there was input in the queue that was being ignored.

Isn’t that meant to be seconds?

Reply
- brucedawson says:
  
  April 6, 2020 at 9:01 am
  
  I suspect that this is internationalization confusion. I’m using the North American comma separator to indicate thousands of ms. 3,172 ms equals 3.172 s (a bit more than three seconds). But in most of Europe this would be written as 3.172 ms equals 3,172 s (a bit more than three seconds).
  
  I’m not sure how to avoid this confusion when writing. If there is a way then let me know. I guess I could reduce confusion by saying 3.17 s (only two digits after the period to reduce ambiguity) but I prefer to use the units that WPA displays, which is often milliseconds or microseconds.
  
  Reply
  - Jānis Veinbergs says:
    
    April 6, 2020 at 9:28 am
    
    Alright, it’s silly me. Yea, dot in the screenshot and comma in the text probably confused me 🙂 No worries, I should have known better.
    
    But an idea: not having any separator would be unambiguous for sure. Works for small numbers like thousands, but of course fails when talking about millions.
    
    Reply
  - Willi Schinmeyer says:
    
    April 6, 2020 at 9:36 am
    
    Going by https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping
    
    3’172ms
    3 172ms
    3 172ms
    3_172ms
    
    Reply
    - brucedawson says:
      
      April 8, 2020 at 11:24 am
      
      Thank you for the suggestions and the reference. Using a space (or half-space?) separator seems like the best option. I fear it will slightly reduce readability for those who are used to commas, but it is probably the correct tradeoff.
      
      Reply
Dai Rees says:

September 28, 2023 at 12:56 am

In your post you wrote:

> We know need to drill down into New Process devenv.exe (9104), New Thread Id 10,364. The CPU Usage column shows us that this thread only consumed 111.7 ms of CPU time over the 5.469 s time range shown, so we immediately know that this is an idle hang.

I have the 2015-09-25 .etl file open in WPA 11 and I see the CPUI Usage column for thread 10364 in process 9104, but it says 763.79ms, not 111.7ms; in fact, I don’t see any numbers similar to 111ms anywhere in WPA.

Also, when loading your WPA presets from UIForETW I get two errors in the Diagnostic Console window:

“[Microsoft.Performance.SDK.Runtime.Discovery.AssemblyExtensionDiscovery]: Information – Loaded some in 2301ms
Error Loading C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\CustomDataSources:
Code: DISCOVERY_FAILED
Message: Extension discovery could not complete due to one or more errors.
DirectoryPaths:
C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\CustomDataSources
IncludeSubDirectories: True
SearchPatterns:
*.dll
*.exe
ExclusionFileNames:
ExclusionsAreCaseSensitive: False
Details:
* Code: ASSEMBLY_LOAD_FAILED
Message: The assembly failed to load.
Target: C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\CustomDataSources
Details:
* Code: SDK_VERSION_INCOMPATIBLE
Message: The assembly references a version of the SDK that is not compatible with the host’s version of the SDK.
Target: C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\CustomDataSources\XPerf\XPerfDataExtensions.dll
ReferencedSdkVersion: 0.108.0
HostedSdkVersion: 1.0.16
Associating `.ETL` with `XPerf – 4e06112c-31c0-4631-b686-a5fefeee27e3 (C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\CustomDataSources\XPerf\XPerfCustomDataSource.dll)`
Registering tables for `XPerf – 4e06112c-31c0-4631-b686-a5fefeee27e3 (C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\CustomDataSources\XPerf\XPerfCustomDataSource.dll)`”

and

“Error: Could not GetFileStream for C:\Users\David\AppData\Local\Windows Performance Analyzer\MyPresets.wpaPresets. Exception: System.IO.FileNotFoundException: Could not find file ‘C:\Users\David\AppData\Local\Windows Performance Analyzer\MyPresets.wpaPresets’.
File name: ‘C:\Users\David\AppData\Local\Windows Performance Analyzer\MyPresets.wpaPresets’
at System.IO.FileStream.ValidateFileHandle(SafeFileHandle fileHandle)
at System.IO.FileStream.CreateFileOpenHandle(FileMode mode, FileShare share, FileOptions options)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
at Microsoft.Performance.Shell.WpaProfileSerializer.GetFileStream(String profilePath, FileMode fileMode, FileAccess fileAccess, FileShare fileShare, Boolean showErrorDialog, Int32 attempts, Int32 msTimeout)”

I’ve uploaded a screenshot to Imgur – I’ve circled the area where I thought the 111.7ms value would be, but I only see 763ms: https://imgur.com/a/WUwTjft

Reply
- brucedawson says:
  
  September 29, 2023 at 1:52 pm
  
  It’s simple enough. You are looking in the right spot, but you haven’t zoomed in to the right time range. You need to zoom in to the 5.469 s duration MsgCheck Delay, discussed earlier. You haven’t zoomed in at all.
  
  When you zoom in all of the numbers being displayed are updated to just cover the visible range.
  
  Reply