A new version of Windows means a new version of the Windows Performance Toolkit (WPT), the ship vehicle for xperf, WPA and other Event Tracing for Windows (ETW) tools.
I’m a huge fan of xperf/ETW (just look at some of the performance investigations I’ve done with it) and the new version offers enough improvements that I’m switching to WPT 10 immediately, but keeping WPT 8.1 installed, with UIforETW automatically selecting between them. You should too.
Here are some thoughts on why to upgrade and how to do a successful upgrade.
It Gets Better
The main improvements in WPT 10 are better Windows 10 support and some new features in Windows Performance Analyzer (WPA). If you’re recording traces on Windows 10 you should definitely be using WPT 10, but even if you’re not, the new WPA features are worthwhile. Here are some of the WPA improvements.
Automatic symbol loading
Sometimes it’s the little things that matter and in this case I’m very pleased to see a check box for automatic symbol loading (open the Trace menu, then Configure Symbol Paths). Trace loading and symbol loading can both take a while and I appreciate that I no longer have to manually initiate the second of these steps. WPA may be somewhat less responsive while symbols are being loaded but it continues to be usable.
Zoom undo
The WPA feature that I am most excited about is zoom undo. I’ve been requesting this feature for years and the wait has been worth it. Zoom undo lets you zoom in to a potentially interesting area without having to open a new view – you always have the option to undo back to where you were. This makes quick, lightweight excursions trivial, as they should be. Adding zoom undo is a recognition that when exploring a trace your position in the timeline is a critical piece of state, and therefore undo is needed to prevent loss of state.
The undo feature is implemented well. It covers both zooms and pans (although the menu always says Undo Zoom) and it animates between the old and new positions, which gives vital visual feedback as to what is happening. The only imperfection I could find is that when you duplicate a view the undo history is not copied. WPA should take a cue from web browsers which copy the undo/redo (back/forward) stack when tabs are duplicated.
Graphing custom data
WPT 10 adds the ability to graph custom data that is emitted into ETW traces. This can be used with the extra data that UIforETW emits, or can be used to graph your own custom data. This feature is interesting enough that I blogged about it separately, and there is also a video.
Windows 10 support
Windows 10 GPU-utilization data (recorded if you check GPU tracing in UIforETW) can only be displayed by the WPT 10 version of WPA, and high-frequency CPU sampling (enabled if you check Fast sampling in UIforETW) can only be enabled by the WPT 10 version of xperf.
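To give a feel for what "high-frequency" means here, the sketch below converts a sampling interval into samples per second. It rests on assumptions that are not from this post: that the interval is set with xperf's -setprofint option, that the optional cached argument is accepted, and that the interval is expressed in 100 ns units. The two intervals in the loop are illustrative values only, and the script builds a command line without running it.

```python
# Sketch: relate a CPU sampling interval to a sampling frequency.
# Assumption (not from the article): "xperf -setprofint <n> cached" sets the
# interval, and <n> is measured in 100 ns units.

TICKS_PER_SECOND = 10_000_000  # number of 100 ns units in one second


def interval_to_hz(interval_100ns: int) -> float:
    """Convert a sampling interval in 100 ns units to samples per second."""
    if interval_100ns <= 0:
        raise ValueError("interval must be positive")
    return TICKS_PER_SECOND / interval_100ns


def setprofint_command(interval_100ns: int) -> list:
    """Build (but do not run) an xperf command line for the given interval."""
    return ["xperf", "-setprofint", str(interval_100ns), "cached"]


if __name__ == "__main__":
    # Illustrative intervals: a 1 ms interval and a much shorter one.
    for interval in (10000, 1221):
        hz = interval_to_hz(interval)
        print(" ".join(setprofint_command(interval)), "->", round(hz), "samples/s")
```

If the assumptions hold, the shorter interval yields roughly eight times as many samples per second as the 1 ms interval, which is the kind of difference that makes short-lived functions visible in a profile.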
Using WPT 10
You can install WPT 10 as part of the Windows 10 SDK. Or, just download the latest release of UIforETW. If you download and unzip etwpackage.zip then when you run UIforETW the first time it will install WPT 10. If you are running Windows 7 then it will also install WPT 8.1 – see below for why.
You can use lots of different tools to record ETW traces but I recommend using UIforETW (an open source tool I released in April, 2015). It offers helpful features such as trace management, trace compression, easily configurable recording options, and it can integrate extra data into traces (input events, battery, working sets, CPU temperature, power draw, and more) which can then be graphed with WPT 10 (see above). UIforETW also works around bugs in WPT and with WPT 10 there are a couple more bugs to work around.
Note: if you installed an earlier version of WPT 10 then UIforETW will not detect this and will not upgrade you. In that case just navigate to the third_party\wpt10 directory and run the installer.
WPT 10 bugs
- WPT 10 is broken for recording traces on Windows 7. The dbgid information does not get recorded which means that symbols cannot be loaded. UIforETW works around this by automatically using WPT 8.1 to record traces on Windows 7 – it requires that WPT 8.1 be installed. These traces can then be analyzed with WPA 10.
- The option to restrict symbol loading by process or image is broken – attempting to add restrictions will crash WPA immediately. Oops. UIforETW can’t work around this one.
- The Window in Focus graph has always been flaky (it frequently doesn’t show up in short traces) but with WPT 10 it seems to be even flakier – it is missing from most traces. I’ve heard rumors of off-by-one errors in parsing the event stream but I really don’t know. This should be easy for the WPT team to fix because there is no shortage of traces that show this information in WPA 8.1 but not in WPA 10. I think I’ve also seen traces that show this graph in WPA 10 but not in WPA 8.1.
- WPA defaults to using Microsoft’s symbol server over http, which makes any bugs in symbol parsing vulnerable to man-in-the-middle exploitation. If you haven’t set _NT_SYMBOL_PATH yourself then UIforETW will request Microsoft and Chrome symbols from https symbol servers. Okay, it may be hyperbole to call http symbol servers a bug, and this isn’t new to WPT 10, but I still strongly recommend https.
- If you have multiple WPA windows open and you close them all simultaneously (by selecting Close all windows from the task bar) then you may get an error message because they all try to save their settings simultaneously. Visual Studio 2013 has a variation on the same bug.
- WPA 10 has problems displaying ETWMark data – in some or all cases the text shows up as the number zero. Oops. If you really need to see the data you can load the trace into WPA 8.1, installing it from etwpackage.zip’s third_party directory if necessary.
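If you set _NT_SYMBOL_PATH yourself, the https recommendation from the list above is easy to check. The following is a minimal sketch, not UIforETW's actual code: it builds a symbol path in the usual SRV*cache*server form and flags any entries that still use plain http. The cache directory c:\symbols is a hypothetical choice, and the two server URLs are my assumptions about the public Microsoft and Chrome symbol servers rather than values taken from this post.

```python
import os

# Assumed public symbol servers (not taken from the article).
MICROSOFT_SYMBOLS = "https://msdl.microsoft.com/download/symbols"
CHROME_SYMBOLS = "https://chromium-browser-symsrv.commondatastorage.googleapis.com"


def build_symbol_path(cache_dir, server_urls):
    """Join SRV*cache*server entries with ';', refusing non-https servers."""
    entries = []
    for url in server_urls:
        if not url.lower().startswith("https://"):
            raise ValueError("refusing non-https symbol server: " + url)
        entries.append("SRV*{}*{}".format(cache_dir, url))
    return ";".join(entries)


def insecure_entries(symbol_path):
    """Return the entries of a symbol path that mention a plain http server."""
    return [e for e in symbol_path.split(";") if "http://" in e.lower()]


if __name__ == "__main__":
    # c:\symbols is a hypothetical local cache directory.
    print(build_symbol_path(r"c:\symbols", [MICROSOFT_SYMBOLS, CHROME_SYMBOLS]))
    current = os.environ.get("_NT_SYMBOL_PATH", "")
    for entry in insecure_entries(current):
        print("uses plain http:", entry)
```

Running it prints a suggested path and then lists any http entries found in your current _NT_SYMBOL_PATH, if that variable is set.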
WPT 10 performance
WPA 10 feels like it is slower to load traces – hanging for several seconds after the trace is loaded but before displaying anything – so I recorded a profile of the two. On my one test trace (yes, bad science, but oh well) I found that WPA 8.1 took 12.4 seconds to load and display the trace, while WPA 10 took 14.3. WPA 8.1 used 18.9 seconds of CPU time while WPA 10 used 26.9. So, WPA 10 is somewhat slower.
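To put "somewhat slower" in relative terms, here is the arithmetic on the four numbers above as a small sketch. The function name is mine, and the single-trace caveat still applies to whatever it prints.

```python
def percent_increase(old: float, new: float) -> float:
    """Return how much larger new is than old, as a percentage of old."""
    return (new - old) / old * 100.0


# Measurements quoted in the text, in seconds.
WALL_WPA_81, WALL_WPA_10 = 12.4, 14.3  # time to load and display the trace
CPU_WPA_81, CPU_WPA_10 = 18.9, 26.9    # CPU time consumed while doing so

if __name__ == "__main__":
    wall = percent_increase(WALL_WPA_81, WALL_WPA_10)
    cpu = percent_increase(CPU_WPA_81, CPU_WPA_10)
    print("elapsed time increase: {:.1f}%".format(wall))
    print("CPU time increase: {:.1f}%".format(cpu))
```

The elapsed-time increase works out to about 15% and the CPU-time increase to about 42%, so the extra CPU work is larger than the extra waiting would suggest.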
Both versions have an anomalous pause of more than a second during startup where they do nothing. This appears to be caused by some bad input handling – the main thread waits for 2.1/1.5 seconds in msctf.dll!CThreadInputMgr::GetMessageW.
Analysis of WPA would be easier if it had slightly fewer threads. In my test the two copies of WPA both created about 450 threads, with most of these threads living for less than a millisecond. Very strange.
I think that WPA 10 is an improvement over WPA 8.1 so the recommendation is clear: everybody should install WPT 10, Windows 7 users should also install WPT 8.1, and when opening traces from UIforETW type Ctrl+E to invoke WPA 10.
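The recommendation boils down to a small decision rule, sketched below. This is an illustration of the rule as described in this post, not UIforETW's actual implementation; the function name and return values are hypothetical, and the only outside fact it relies on is that Windows 7 reports itself as version 6.1.

```python
def choose_tools(os_major: int, os_minor: int, wpt81_installed: bool):
    """Return (recorder, analyzer) for a given Windows version.

    Hypothetical helper: record with WPT 8.1 on Windows 7, because WPT 10
    recordings there lack the data needed to load symbols; record with
    WPT 10 elsewhere. Always analyze with WPA 10.
    """
    is_windows_7 = (os_major, os_minor) == (6, 1)
    if is_windows_7:
        if not wpt81_installed:
            raise RuntimeError("Windows 7 needs WPT 8.1 installed to record traces")
        recorder = "WPT 8.1"
    else:
        recorder = "WPT 10"
    return recorder, "WPA 10"


if __name__ == "__main__":
    print(choose_tools(6, 1, wpt81_installed=True))    # Windows 7
    print(choose_tools(10, 0, wpt81_installed=False))  # Windows 10
```

Raising an error when WPT 8.1 is missing on Windows 7 is a design choice for the sketch: silently falling back to WPT 10 would produce traces whose symbols cannot be loaded.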
For instructions on how to do ETW trace analysis take a look at the series of training videos I created.