For many years xperfview.exe has been the main tool for analyzing xperf/ETW traces. However starting in fall 2011 the Windows Performance Toolkit started including wpa.exe as an alternative. While the early versions had some significant rough edges, the latest version (10.0.10240.16384, released in tandem with Windows 10) is now superior to xperfview in basically all ways.
In this post I’ll briefly explain how to switch from using xperfview to WPA, and why this is worthwhile.
Updated October 2013 and September 2015.
Before proceeding you should be sure to get the latest version of WPA. The simplest way to do that is by getting UIforETW. There are details at ETW Central, and this should make recording a sample trace easy.
It may be useful to understand xperf analysis basics, xperf for excess CPU consumption (WPA edition), and xperf wait analysis, but much of the UI flow is quite different with WPA, so some of the details will be different.
The two articles in the xperf documentation series, CPU sampling and CPU scheduling, are still relevant – understanding the column names is as important as ever. Those articles are timeless classics. The UI for selecting columns has changed, but that is easy enough to adjust to.
When you first launch WPA it may be dauntingly austere and blank. All of the graphs are collapsed and hidden:
If you see this then you’re not using UIforETW and you’ll have to drag graphs on by yourself. UIforETW copies over a startup profile which includes all of the graphs below. Whenever you install a new version of UIforETW you should probably use the Settings dialog to copy over a new startup profile, in case it has been improved.
App-specific generic events can be crucial for navigating a trace. I use them to identify frame boundaries, user input, and other key events. With them I can see when the frame-rate drops, and I can see what events the drop correlates to. UIforETW inserts lots of custom events for input, timer frequency, and more, and Generic Events is where they will be found.
The display of generic events in xperfview, the old trace viewing tool, left much to be desired:
If I looked in the ProviderIds drop-down then I could see that these blue diamonds corresponded to my Valve-FrameRate provider, but that provider emits events for simulation ticks as well as render ticks, and the distinction is invisible in xperfview.
The display of generic events in WPA is much improved. The events are displayed hierarchically so that I can drill down and find all my events. The layout is configurable (just show the table and rearrange columns) so you can group in different ways:
By default the UIforETW startup profile filters out all unknown generic events. If you want to see all of the events (there are many from Windows) then you can use the View Editor to change the filter or else use a different view preset.
The Generic Events graph can be found in Graph Explorer under System Activity.
Window in Focus
This graph shows what process’ window is active. This can be helpful to choose where to analyze. Most games lower their frame rate when they are not active and investigating a poor frame rate when you’re intentionally going idle is pointless.
In the graph shown below the user switched from explorer to WLXPhotoGallery, and then DWM owned the active window for a while.
Aside: DWM becomes active whenever Windows detects that a program has hung – DWM takes over window management so that the user can still move the hung window around.
The Windows in Focus graph can be found in the System Activity section.
CPU Usage (Precise)
CPU Usage (Precise) is the graph formerly known as CPU Usage (there were three variants for grouping by thread/process/CPU). This graph is constructed from context switch data (and interrupt and DPC data) which means it is a sub-microsecond accurate measurement of what thread is running when and why, as well as exactly how much CPU time each thread and process is consuming. That seems pretty important so you probably want this one available. In Graph Explorer under Computation you can find CPU Usage (Precise). UIforETW defaults to putting this graph in a form that is optimized for doing wait analysis – display the table if you want to do this.
The “Timeline by Process, Thread” view, available from the View Presets dropdown, also looks interesting, and makes some patterns more obvious. In particular, if two threads or processes are ping-ponging (taking turns executing) this behavior is much easier to follow on the timeline graph. However the timeline graph devolves into solid bars of color when zoomed out – it really only works when examining fine details.
The screen shot above shows the utilization and timeline graphs over a 20 ms time period.
CPU Usage (Sampled)
CPU Usage (Sampled) is the graph formerly known as CPU Sampling (there were three variants for grouping by thread/process/CPU). This data is constructed using data from the profile provider, which periodically interrupts all CPUs to see what they are doing, with a default rate of 1 KHz.
In UIforETW if you click Fast sampling then the sampling frequency is raised to 8 KHz
Because this data is sampled it cannot tell you what is going on between samples. However with enough samples and with well behaved code the sampling data can be extremely useful. In particular, if you have call stack collection enabled for the profile provider then each sample includes a call stack.
In general I would say that the main purpose of the sampled CPU data is looking at the call stacks, because this tells you (subject to sampling artifacts) what your code is actually doing. Therefore it seems passing strange that WPA defaults to not showing you call stacks.
Since most programs are CPU bound at least part of the time I think this data should always be enabled, which is why UIforETW’s startup profile defaults to showing this data, in table form (the CPU Usage (Precise) data is better for graphing). The UIforETW startup profile also enables stacks. If you want to see the data grouped by module and function then change the View Preset using the drop-down – look for presets that start with randomascii.
If you look in Graph Explorer in the Computation section you’ll find CPU Usage (Sampled section).
If you drag this graph over you now have two graphs showing CPU usage – one precise and one sampled. That’s a waste of space. The precise CPU usage is, well, more precise so we want its graph, and all we really want from the sampled data is call stacks, and those are on the summary table. That means that we now make our first use of the buttons on the right side of the graph’s title bar. The three left buttons let you control whether to display “graph and table”, “graph only”, or “table only”. The default is graph only, but for CPU Usage (Sampled) I think we want table only, as shown above. This gives us a table of precise CPU usage, and a graph of sample CPU usage. Perfect.
If you want to customize the columns for any table click on the View Editor (Ctrl+E) button that is circled in red in the screen shot below:
When you drill in to the sampled data call stacks the graphs are highlighted to show where the samples came from, which can be invaluable data once you learn how to interpret it.
At this point you should have something beautiful like this:
What about wait analysis?
Wait analysis is crucial for solving some types of performance problems and the table necessary for doing this type of analysis is attached to CPU Usage (Precise), since wait analysis uses context switch data. The first step is to enable the graph by clicking “Display graph and table”. If you are using the UIforETW startup profile then the columns will be arranged appropriately for wait analysis, as described here.
I normally have the CPU usage (precise) summary table turned off and I make it visible when I need it. However I always have my favorite column orderings set up, ready and waiting for me.
Profiles-> Save Startup Profile
Once you get everything perfectly arranged, don’t forget to save your startup profile. If you create a profile that is well suited to the type of analysis that your company does then you can share it with your coworkers. WPA 8.1 stores its startup profile in an XML file in “Documents\WPA Files\Startup.wpaProfile”. WPA 10 stores its startup profile in %”localappdata%\Windows Performance Analyzer\Startup.wpaProfile”.
As with xperfview you can zoom in/out with ctrl+mouse-wheel. You can also select a region and then zoom to that region, either in the main window (right-click, Zoom), in a new tab (right-click, Zoom all in new view), or just the current graph in a new tab (right-click, Zoom graph in new view). The workflow is a bit different from xperfview but ultimately the ability to do deep analysis in a single window makes it worthwhile.
In addition to mouse and mouse-wheel based zooming you can use ctrl+”+” and ctrl+”-“ for keyboard based zooming. They don’t zoom around the mouse, but at least I can finally zoom in/out a bit when I don’t have a mouse-wheel available.
With WPA 10 the navigation (zooming and panning) comes with an undo feature, which is quite wonderful.
WPA continues the fine xperfview tradition of hiding some of the disk analysis graphs. In particular the cool chart showing disk-head movements is now found by opening up the Storage section, right-clicking on Disk Usage, and selecting Disk Offset Graph. There’s something amazing and terrifying about seeing how far the physical read/write head on your poor disk sometimes ends up moving.
WPA has improved the trace analysis experience in several ways. Some of these include:
- Asynchronous symbol loading – keep working during the (sometimes slow) symbol loading process
- Easy saving and sharing of startup profiles
- Deep analysis in one window
- Better display of Generic events
- Ctrl+”+” and ctrl+”-“ for keyboard based zooming
- Highlighting of relevant graph sections – makes viewing profile sample locations easier (more on this later)
I keep finding that ETW/xperf/WPA give me access to information that most other developers don’t have. This lets me find and fix performance problems that are otherwise invisible or intractable. I continue to enjoy having x-ray glasses that actually work.