Every Windows performance expert should be using xperf traces. My preferred viewer for xperf traces is WPA – Windows Performance Analyzer. However the Windows 8 version of this tool has a few bugs in its display of custom ETW events. The new Windows 8.1 Preview version fixes the most serious of these bugs but introduces some new ones.
These bugs have been fixed. To get the latest version of WPA go to ETW Central.
I’ve reported the bugs and I have high hopes for the 8.1 Final version – then maybe I’ll finally be able to completely stop using xperfview.
WPA has many advantages over xperfview. It has asynchronous symbol loading and improved symbol loading diagnostics. It can display multiple graphs and summary tables in a single window, and it uses tiling and tabbing to display multiple time ranges in that same window. It also highlights in the graphs where the selected data is from, which opens up new views onto the data.
However when it comes to displaying generic events the Windows 8 version of this tool (see the identifying about box to the right) has four defects that xperfview (the old and deprecated trace viewer) does not have, two of which are particularly annoying.
I’m going to refer to the win8_rtm version of WPA as WPA 8.0 and the win8.1 (winmain_bluemp) version of WPA as WPA 8.1 Preview.
Off by one byte
The first bug happens if you have a payload with an AnsiString type followed by any other data. The additional data is read incorrectly by WPA 8.0, with an offset of one byte. This causes integers to be multiplied by 256. In my sample MultiProvider project this can be seen by calling ETWBegin a couple of times and noting that the Depth field of the nested calls is displayed in WPA as 256, instead of 1 – see the screen shot to the right. The payload for that description in the manifest file is shown below – note the AnsiString followed by Int32:
<data inType="win:AnsiString" name="Description" />
<data inType="win:Int32" name="Depth" />
A one-byte shift is confusing enough when dealing with integers, but with floats and AnsiString payloads it is devastating – they are effectively destroyed. If there is not an AnsiString payload first then the data is preserved, but that’s an annoying limitation.
This is a regression from xperfview. I sometimes have to fire up xperfview just to see the correct data in some of these custom payloads.
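To see why a one-byte shift turns 1 into 256, here is a minimal Python sketch. It assumes the payload is laid out as the string bytes, a NUL terminator, and then a little-endian Int32, and that the bad read starts at the terminator instead of just past it. That mechanism is a guess that matches the symptom, not something confirmed about WPA's internals, and `build_payload` and `read_depth` are hypothetical helper names.

```python
import struct


def build_payload(description, depth):
    """Pack an AnsiString (NUL-terminated) followed by a little-endian Int32."""
    return description.encode("ascii") + b"\x00" + struct.pack("<i", depth)


def read_depth(payload, skip_terminator):
    """Read the Int32 that follows the string.

    skip_terminator=True starts reading just past the NUL (correct).
    skip_terminator=False starts at the NUL itself, one byte early
    (an assumed model of the misread described above).
    """
    end_of_string = payload.index(b"\x00")
    offset = end_of_string + 1 if skip_terminator else end_of_string
    return struct.unpack_from("<i", payload, offset)[0]


payload = build_payload("Frame", 1)
print(read_depth(payload, skip_terminator=True))   # 1
print(read_depth(payload, skip_terminator=False))  # 256
```

In this model the NUL terminator becomes the low byte of the integer, so every small value comes out shifted left by eight bits.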
The second bug is that WPA 8.0 sorts numeric data in generic events alphabetically, instead of numerically. You can see that to the right – the data is sorted by the Duration column. If the number of digits in your numbers varies then alphabetic sorting is a wee bit problematic.
I hit this most frequently when displaying frame times. It is pretty common to want to select a range of time and then sort the frame lengths in order to find the slowest frames, find how much the frame times vary, etc.
This is a regression from xperfview. I sometimes use xperfview to sort the data correctly, but more frequently I copy the column of data to the clipboard and then sort it in Excel. Either way is suboptimal because you need to use an extra tool and when you find the slowest frame it is no longer in context.
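The difference between the two sort orders is easy to reproduce outside of WPA. This is a small Python sketch with made-up frame times (the values are illustrative only, not from a real trace). It shows the alphabetic order and the numeric order you would get after copying the column out and sorting it elsewhere.

```python
# Hypothetical frame durations in milliseconds, as the text copied from a column.
durations = ["9.5", "16.7", "102.3", "33.4"]

# Alphabetic sort compares character by character, so "102.3" lands first.
alphabetic = sorted(durations)

# Numeric sort converts each string to a float before comparing.
numeric = sorted(durations, key=float)

print(alphabetic)  # ['102.3', '16.7', '33.4', '9.5']
print(numeric)     # ['9.5', '16.7', '33.4', '102.3']
```

With an alphabetic sort the slowest frame (102.3) appears at the wrong end of the list, which is exactly what makes hunting for slow frames awkward.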
The third bug concerns long strings in generic events. This is a regression from xperfview, which truncates strings to 4094 characters but always displays them. At least this one is easy to work around.
The fourth bug is that WPA 8.0 does not display control characters in a very helpful way. In some games I like to emit useful status information every few seconds so that when looking at a trace I can see, for instance, the shadow settings, what map is loaded, allocated bytes, etc. This ends up being a lot of data that I sometimes want to view in the graph tooltips, sometimes in the table, and sometimes I want to copy to the clipboard and view elsewhere. Using line-feeds to separate chunks of data helps with readability in the tooltips and when copied to the clipboard, but it breaks the display of data in the table – the displayed data is truncated when 0xA, 0xB, 0xC or 0xD is reached, as shown above.
This is a regression from xperfview.
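Here is a rough Python model of the truncation described above, plus one possible workaround. Both are sketches: `table_view` only imitates the observed behavior (cut at the first 0xA, 0xB, 0xC or 0xD), and `flatten_for_table` is a hypothetical helper that swaps those characters for a visible separator before the string is emitted. The status string is invented for illustration, and flattening gives up the tooltip and clipboard readability that line-feeds provide.

```python
import re

# The four control characters at which the table display stops: 0xA-0xD.
CONTROL_CHARS = "\n\x0b\x0c\r"


def table_view(text):
    """Imitate the table: keep only the text before the first control character."""
    for index, char in enumerate(text):
        if char in CONTROL_CHARS:
            return text[:index]
    return text


def flatten_for_table(text, separator=" | "):
    """Replace each run of the control characters with a visible separator."""
    return re.sub(r"[\n\x0b\x0c\r]+", separator, text)


# Made-up status payload; the field names and values are placeholders.
status = "shadows=high\nmap=example_map\nallocated=123456"

print(table_view(status))                     # shadows=high
print(table_view(flatten_for_table(status)))  # shadows=high | map=example_map | allocated=123456
```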
New bugs replace the old
I reported the first two bugs (off-by-one-byte and incorrect sorting) a year ago and they are fixed in the recent Windows 8.1 Preview version of the Windows Performance Toolkit. Those two are the most serious of the bugs and I would love to upgrade to 8.1 Preview in order to get the fixes, but (alas) the new version has some new bugs.
WPA starts up with no graphs or tables, which is intimidating for new users. Having a good startup profile makes trace analysis easier and my coworkers and I depend on our company-standard startup profile. When I first started using WPA 8.1 Preview I thought that startup profiles were completely broken because the screen was initially blank. It turns out that what actually happens is you have to click on the analysis tab header before the startup view is displayed. It’s a tiny little bug, but quite confusing. If you save a startup profile from 8.1 Preview then it will load correctly. I have reported this bug.
Graph tooltips mostly don’t work
When exploring a trace I am often looking for slow frames and I like to hover over the diamonds in the generic events graph to see the payload data such as how long a frame took. We also have events for user input, map transitions, and other important events and hovering over those events is the easiest way to orient yourself prior to drilling down to the details. This works nicely in WPA 8.0 but in 8.1 Preview the tooltips only show the payload data if you hover over the rightmost diamond of a series.
This is a regression from previous versions of WPA. I have reported this bug.
GPU usage doesn’t display
ETW can record a lot of graphics related data that tracks how GPU work packets move through the system and are processed by the GPU. I have not found most of this information to be particularly useful, but it is extremely helpful to be able to display when the GPU is busy. Seeing when the GPU is busy lets me see whether a game is GPU bound, it lets me see frame boundaries, and it lets me identify ill-behaved games that alternate between GPU idle and CPU idle. I was able to convince WPA 8.0 to display a GPU usage graph but this has regressed in WPA 8.1 Preview. It still works for traces recorded on Windows 8 and above, but the majority of our customers are on Windows 7 or below so this doesn’t help much.
This is a regression from previous versions of WPA. I have reported this bug.
Getting the 8.1 Preview version
If you don’t depend on displaying GPU usage and hovering over tooltips then the 8.1 Preview version might be a good choice for you. It fixes some serious bugs and it adds some compelling new features. And, if you use the 8.1 Preview version now and find other bugs there is still a chance that you can report them in time to get them fixed for the 8.1 Final version – I think that my testing will be repaid by having the bugs that I care about fixed.
You can get the 8.1 Preview version here:
Here’s a list of what’s new:
I’ll talk more about the new features when 8.1 Final comes out.
ETW/xperf/WPT/WPA continues to be an amazing tool for performance investigations. It lets me see things that are invisible to other developers and lets me solve problems instead of guessing at causes. Microsoft is continuing to improve this tool, and it is free. If you need to investigate performance problems on Windows then you need to use this tool.
- To get started with xperf read Xperf Basics: Recording a Trace (the ultimate easy way)
- To avoid problems with symbols read Xperf Symbol Loading Pitfalls
- For more details, read the neatly categorized series
- It would be nice if bug fixes came out more frequently