Xperf Basics: Recording a Trace (the easy way)

Some time ago I wrote a long and detailed post about how to record traces using xperf. The steps needed to record a trace were daunting. However more recent versions of the Windows Performance Toolkit (WPT, the proper name for the xperf suite of tools) have made it a lot easier.

The new method for recording traces should handle most scenarios, and it is enough simpler that this post is far shorter than the previous one. In this post I describe the steps needed to record xperf/ETW traces so you can start using this impressive (and free) whole system profiling tool.

Step 1 – get the latest version of xperf

A web search on “windows performance toolkit installer” finds a lot of discussions on how to install various old versions of WPT/xperf. Even the Microsoft download pages for old versions of WPT have not been updated to acknowledge the existence of newer versions. It’s easy to accidentally install an obsolete version.

As of today (March, 2015) what you need to do is install the Windows Software Development Kit for Windows 8.1 RTM. The installer, available here, will let you install whatever components you want. In addition to Windows Performance Toolkit I also recommend installing Application Verifier and Debugging Tools:

image

The redistributable installers for these three components also get installed – you can find them (on 64-bit machines) in C:\Program Files (x86)\Windows Kits\8.1. Apparently the confusingly named “WPTx64-x86_en-us.msi” is a 32-bit installer that installs the 64-bit version of WPT. Wacky. The redistributables can be handy for sharing WPT with your coworkers, or even customers.

imageI’m sure that new versions will continue to be released, probably in the Windows SDK, so check for newer SDKs. You can check what version you have installed by running WPA and looking at the about box. Here is the about box for the 8.0 version of WPA, which is no longer the latest.

Step 2 – get the sample providers and configuration files

Update: stop reading now and go to the UIforETW announcement. UIforETW is a UI for recording ETW traces and it handles steps 2, 3, 4, 5, and 6, with much less hassle than using wprui. Plus I think it’s easier to learn and configure, it’s open source, and it has some handy features that wprui lacks. So please stop reading here.

You can skip this step if you want – it’s only needed for recording traces with custom ETW providers. But if you use xperf significantly you’ll definitely want to do this – getting custom events into ETW traces makes them far easier to analyze. However you don’t need this for your first trace.

You can download my sample user-mode ETW providers from ftp://ftp.cygnus-software.com/pub/MultiProvider.zip. This .zip file has been updated since the original post. Configuration files for wprui and wpa were added, and a –inputlogger option was added that turns the executable into a very handy key logger – all mouse and keyboard events are emitted as ETW events which can be recorded using the steps in this article. This can be very helpful in trace investigations, but don’t use this illicitly.

When you unzip the file you’ll find a Visual Studio 2010 solution file. Build either configuration. You might want to poke around and look at the ReadMe.txt file and the provider manifest file (etwprovider.man). If you want to use these providers then after you build the project you’ll have to run etwregister.bat from an elevated command prompt. For more details (some of them obsolete) look here.

Step 3 – enable stack collection

Apparently wprui can set the DisablePagingExecutive registry key for you so you don’t need to do this step. Just click OK if it says you need to.

If you are running 64-bit Windows (you are aren’t you?) then there is a registry key that you need to set. And then you need to reboot. The registry key tells Windows to keep information needed for stack walking in non-pageable memory. If you run the command below from an elevated command prompt (yes, it is all one line that is excessively word-wrapped) and then reboot then your call stacks will thank you. Setting this registry key wastes a little bit of memory, but I don’t think it’s enough to matter so I always leave it set.

REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f

Step 4 – configure wprui

Recording traces is now much easier than it used to be because of wprui. This tool – ask for it by name in the start menu search box – is an actual UI for recording traces. The default settings for wprui are quite good, but for best results you should enable some custom user providers. This lets you see events like frame start times, keyboard input, or whatever custom events you want to put in your traces.

To add custom user providers you should run wprui and then click on the Add Profiles… button. This lets you configure arbitrary data (from system, user/event, or heap providers) to be recorded in the trace. The profile definition format is rather lightly documented, with only a couple of samples (look in the wprui install directory for example .wprp files), but with a bit of hacking and experimentation and some help from some friends at Microsoft I created the XML configuration file that I needed. It is in the .zip file referenced above and is called MultiProviderProfile.wprp. When you add this profile to wprui (click Add Profiles… and select the file) and enable it then the four custom providers used by the MultiProvider project are enabled in wprui. For details on viewing the recorded events in wpa see step 7. With MultiProviderProfile.wprp loaded and enabled wprui should look something like this:

image

Step 5 – start tracing

When you click Start in wprui then tracing begins. By default the trace data is recorded to circular buffers in memory, which is usually ideal. In this mode xperf/wprui is constantly monitoring your system and it can be left running like this 24/7. The length of time recorded will depend on how busy your system is and how much memory is devoted to buffering.

Recording to a file is potentially useful – it lets you record arbitrarily long sessions – but this is also its downside. It’s easy to accidentally leave tracing enabled, and then your drive fills up with an ETL file so large you can’t ever load it. You can also do Light profiling instead of the default Verbose profiling. Generally this means that stacks are not recorded, so the overhead is lower, but it’s also harder to do analysis.

Step 6 – record a trace

For initial experimentation you might want to run the MultiProvider sample from the .zip file listed above while recording the trace. This will emit custom events for wprui to record, if you have configured it to do that, and this will give you more interesting data to look at in the trace.

If you’re recording to memory then you can save those circular buffers to disk whenever you want. If you hit a performance problem – in any program – just click on Save (the Start button changes to Save while recording) to save the buffers to disk. Retroactive profiling is truly awesome.

You can also use the system-global Ctrl+Win+C shortcut to trigger saving of the buffers. I recommend using this, especially when recording traces of full-screen games, since it lets you trigger the trace without the delay of alt+tabbing to wprui. Either way, after you ask to have the trace saved you’ll be taken to the window shown below. Take your time and write up a nice verbose description of what was happening when you recorded the trace. The trace is saved to disk in the background while you’re typing, and your comment will be added as a Mark (visible in the Marks table in the System Activity section) once you hit Save. This is your chance to annotate the trace with a description of what was happening.

image

Step 7 – detailed analysis

Once you’ve saved the trace you can navigate to where it was saved and open it in wpa, which is now the recommended trace viewer. Details of how to analyze it are beyond the scope of this post, but there are a few points worth mentioning. The default view in wpa is quite austere (completely blank!), but you can configure that. If you copy the Startup.wpaProfile included in MultiProvider.zip to the WPA Files folder in your documents directory then WPA will start up with (in my opinion) more useful defaults:

  • The top graph will show the Multi Provider Generic Events which lets you see frame rate events or whatever else your program is emitting. I configured it to filter out any providers that don’t contain “Multi-“ in order to reduce unwanted clutter. You can display all events by using the View Preset dropdown to change views, and you can display the data as a table to see event details more easily
  • The next graph shows Window in Focus information, which is another useful way of orienting yourself in a trace – note that this graph will sometimes be blank due to bugs in this provider
  • The next graph shows CPU Usage (Precise) data which is an extremely accurate measure of CPU consumption, derived from context switch records. I configured the columns so that if you display the table associated with this graph you can do idle-time analysis
  • The final graph shows CPU Usage (Sampled) data. I configured it to display call stacks that are grouped by Process and Thread ID. This can be used for CPU busy analysis. You can use the View Preset dropdown to display data grouped by the Module and Function based preset that I created

You don’t have to use this startup file, but I strongly recommend configuring some sort of defaults to make trace analysis initially easier. You can save your customizations with Profiles-> Save Startup Profile.

For more information on using WPA, including a list of known bugs, see the article I wrote last year titled WPA-Xperf Trace Analysis Reimagined.

Improvements I’d like to see

I have to confess that I don’t use wprui. I wrote my own trace recording UI which is tuned to my own needs and wprui is not good enough to tempt me to switch. Some of the features that wprui needs to acquire before I’ll consider switching include:

  • an (optional) input monitor to insert input events into traces – key loggers are very helpful. This could always be done as an external program
  • a way to manage (rename, annotate, view) the traces it has recorded
  • transparent compression and decompression of traces
  • one-click changing of the CPU sampling frequency
  • imagea way to adjust the buffer size, since on large-memory machines wprui’s buffers are excessively huge (2 GB? Really?)
  • automatic installation of the symbol loading DLLs – preferably the versions that don’t take hours to load symbols
  • Trace file names with the year before the date so that they sort sensibly

Adding these features, and perhaps a few others, would get rid of the existing clunkiness which makes wprui slightly frustrating to use.

wprui does have some great advantages however. It supports symbols for managed code and JavaScript, and it’s a huge improvement over using xperf from the command-line to record traces. Recommended.

Miscellany

If you search for documentation on wprui you won’t find much. You need to search for wpr instead as this covers documentation for wprui as well as the command line wpr tool. I’m not a fan of the documentation – too much clicking around for too little content – but you can find it here. It looks like WPT was supposed to ship with a documentation file – wpt.chm – but it is missing in action. Oops.

Michael Milirud’s Build 2011 talks, available here, show some of the features of wprui and wpa.

Microsoft recommends not checking too many of the wprui check boxes, in order to avoid increasing the data rate too much. However they also don’t give details about what each checkbox controls and I don’t want to experiment a lot so I typically check the top three boxes, plus the Multi Provider one. Your mileage may vary.

You can use xperf -loggers for debugging configuration files. It dumps detailed information about all enabled providers and you can search through it to see if the providers you requested got enabled. If it says no loggers enabled then you’re probably running it from a non-administrator command prompt.

You can use “wpr -profiles” and “wpr -profiledetails” to see what the built-in profiles contain.

You can find some documentation about profile definitions (.wprp files) here.

Discoverability

I ranted last week about the importance of discoverability – there’s not much point having keyboard shortcuts if your users can’t find them. The problem then was that the options for zooming in wpa were needlessly hidden. I think wprui takes a lack of discoverability to new heights. In order to find the global keyboard shortcut for wprui all I had to do was look in this file:

C:\Program Files (x86)\Windows Kits\8.0\Windows Performance Toolkit\wpr.config.xml

and in there I found this line:

<SaveRecordingKeyboardShortcut WinKey=”true” CtrlKey=”true” ShiftKey=”false” AltKey=”false” CharKey=”c” VirtualKeyCode=”0″ />

Of course! Ctrl+Win+C is the global keyboard shortcut for recording a trace. Displaying it in the UI would have been too obvious.

That configuration file contains some other settings. I don’t know what they do.

Summary

Use wprui. Out of the box it is the easiest way of recording xperf traces. With a bit of configuration you can get it to record custom user events which make trace navigation much easier. Use the Ctrl+Win+C shortcut to trigger the recording of traces

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Performance, xperf and tagged , , , . Bookmark the permalink.

44 Responses to Xperf Basics: Recording a Trace (the easy way)

  1. Kyle Sabo says:

    WPR should be able to set the 64-bit stack walking registry key for you if you try to take a trace on Win7: http://i.imgur.com/4BXhHIl.png. On Win8 it’s not needed as long as all the drivers are compiled with the latest compiler (ha!), so I think it just doesn’t bother.

    I believe you can also turn on file compression on Win8 if you stare at the schema long enough (), but if you do you’ll also only be able to open the traces on Win8 (though I haven’t ever tested this).

    • brucedawson says:

      That’s good to see that wprui can set that flag. I’ll adjust the post to reflect that. Thanks for letting me know.

      It’s unfortunate if they came up with a trace compression scheme that is OS specific. That seems like an odd way to implement this rather important feature.

    • Kyle Sabo says:

      I think WordPress ate the XML in my post, so try 2: xs:complexType name=”FileCompressionType”

      • brucedawson says:

        Where do you put the FileCompressionType tag? If you upload a file that demonstrates how to use it then I’m sure that would be quite helpful. But, if it’s Windows 8 specific then I won’t be interested myself for a while since I’m still exclusively on Windows 7.

        My tool just compresses traces into .zip files (I give them a .etz extension) and then decompresses them just-in-time before invoking wpa. Pretty simple, and it works on all versions of Windows.

        • Kyle Sabo says:

          Sorry, I probably should have tried the file compression feature before commenting about it. As far as I can tell, it’s in the schema but it’s simply just ignored. You were saying something about discoverability and documentation…

          You can compress the traces with “xperf -merge -compress trace.etl compressed_trace.etl”, but it won’t work or open on Win7, so you’re probably better off just continuing to zip them.

          • brucedawson says:

            Well, at least I can look forward to compressed traces in the future. Compressed traces that transparently decompress (and compressed .symcache files) would be nice, but I will wait.

          • brucedawson says:

            Minor correction to the xperf syntax: -compress has to come after the trace names:

            xperf -merge trace.etl compressed_trace.etl -compress

            Also, if you’re recording traces using xperf.exe, like I often do,then, assuming you are recording a kernel session and another session, you can merge the two traces with compression like this:

            @rem Start tracing:
            xperf -on %KernelProviders% %KernelStackWalk% %KBuffers% -f %kernelfile% -start %SessionName% -on %UserProviders%+%CustomProviders% -f %userfile%

            @rem Stop tracing
            xperf -stop %SessionName% -stop
            @rem Merge and compress the two trace files
            xperf -merge %kernelfile% %userfile% %FileName% -compress

  2. Pingback: Xperf Basics: Recording a Trace | Random ASCII

  3. Cort says:

    I followed steps 1-4 successfully (including the reboot after step 3), but when I click “Start” in wprui, two unexpected things happen. First, it tells me that an existing session is already running. Second, if I click OK to stop the running session and start a new one, the “Status” message changes to “Starting Recording…” but never gets any further. Do either of those problems sound familiar?

    • brucedawson says:

      You should try to figure out who else is recording kernel ETW data. Some tools, such as powercfg, also use the kernel ETW providers and therefore cannot run at the same time that wprui is recording. I think that procmon and Resource Monitor may also use the kernel ETW providers. If you have just rebooted then nobody else should be recording. If they are then rather than forcibly terminating them try using “xperf -loggers” to see if you can get any clues as to who is doing the tracing. Unfortunately it can be hard to figure out what is normal in the undocumented output of xperf -loggers — maybe compare the results to a machine where wprui works?

      Sorry, I haven’t seen this particular problem.

    • Ben Hymers says:

      I had the “existing session” warning too; I found that the other session was Process Explorer. Closing it and running wprui again made the warning go away, and forcibly stopping the other providers from wprui didn’t seem to have any effect on Process Explorer’s logging. Perhaps it’s just using it for some of its more esoteric stats?

  4. James Whitworth says:

    How do you handle other non Windows platforms when writing an application that contains custom ETW provider code. Do you simply wrap any ETW related code with a define so it is compiled out or do you have a more complex macros that produce ETW code on Windows and alternate profiling code on other platforms?

    • brucedawson says:

      In the sample code in MultiProvider.zip I use the preprocessor to select between alternate implementations of the profiling functions. The alternate implementations are just empty inline functions, but they could contain whatever you want. The profilers I use on other platforms are sufficiently different that mapping the custom ETW events to them doesn’t seem worthwhile at this point, but that could change.

  5. Pingback: Xperf for Excess CPU Consumption: WPA edition | Random ASCII

  6. Alois Kraus says:

    Very informative Bruce. I have tried to create a custom xml to configure my traces as I would like them but failed. I wanted to enable them in the UI but no matter what the default providers were always turned on.
    After checking with MS Support it turned out that WPRUI does ignore what you select and does record everything. The console application does really use your custom profile….

    If you are into .NET you can find a sample how to use ETW events with .NET 4.0 and a small script to to the managed stack walk on the fly here:

    http://geekswithblogs.net/akraus1/archive/2013/06/02/153041.aspx

  7. Martin Weißhaupt says:

    I just thought about creating my own Gui-Tool to select which events should be recorded. I didn’t know that the Windows 8 SDK already provides a tool for that.

    Thx again for the information about ETW-Traces, this is very usefull😉

  8. Pingback: Visual Studio Single Step Performance Fixes | Random ASCII

  9. Pingback: The New WPA Xperf Trace Viewer–New Bugs and Old | Random ASCII

  10. Pingback: 50 Bytes of Code That Took 4 GB to Compile | Random ASCII

  11. Pingback: Why does Explorer hang when I right-click on a folder? | Needle in a Thread Stack

  12. almahallawy says:

    I needed step 3 in Win 8 64

    • brucedawson says:

      You used wprui and it didn’t ask if you wanted that registry key set? It’s possible that that means that the registry key was already set — that would be nice.

      • almahallawy says:

        Yes I used WPR UI and it didn’t ask me for any registry key set.
        It is a corporate build machine, may be that is the reason.

  13. Pingback: New Version of Xperf–Upgrade Now | Random ASCII

  14. Pingback: WPA–Xperf Trace Analysis Reimagined | Random ASCII

  15. billythefisherman says:

    Hi Bruce the link ftp://ftp.cygnus-software.com/pub/MultiProvider.zip seems to be down any chance of posting another? Thanks

    • brucedawson says:

      It works for me (famous last words). Probably you are behind some firewall which does not like it. I can’t see it from work myself but I can see it from home.

      I could post a mirror somewhere else but I’d rather not if I can avoid it. Let me know if you can get it from home.

  16. Pingback: Writing a Very Simple WPR Profile | Needle in a Thread Stack

  17. Pingback: Make VC++ Compiles Fast Through Parallel Compilation | Random ASCII

  18. Pingback: You Got Your Web Browser in my Compiler! | Random ASCII

  19. Pingback: Self Inflicted Denial of Service in Visual Studio Search | Random ASCII

  20. Pingback: 50 Bytes of Code That Took 4 GB to Compile - HackZip

  21. Pingback: Probleme mit Package-C-States (Windows / Linux) - Seite 9

  22. Pingback: ETW Trace Compression (and xperf syntax refresher) | Random ASCII

  23. Pingback: Profiling the profiler: working around a six minute xperf hang | Random ASCII

  24. Pingback: How to use Xperf and Xbootmgr ? | Jacques DALBERA's IT world

  25. Madhu says:

    Hey bruce, I could not enable the Microsoft-Windows-Kernel-PnP Provider using the Multiprovider.wprp, i have added this in my custom providers list. Can you help in enabling it? i tried enabling the non paging memory thing, But did not work for me.
    Thanks,
    Madhu

  26. brucedawson says:

    You’ll have to give more details. What OS are you running? What error do you get? Also, you should probably consider giving up on wprui (I have) and switch to UIforETW (I wrote it). See https://randomascii.wordpress.com/2015/04/14/uiforetw-windows-performance-made-easier/ for details.

  27. Peter Liepmann says:

    I *think* this program does something similar-
    https://xperf123.codeplex.com/
    “This tool is used to automate the process of collecting xperf traces easy without the user worrying about the various settings and configuration options.”
    Perhaps you should collaborate?

    I already installed Windows Software Development Kit for Windows 8.1 RTM-didn’t want to install again, perhaps having the UI available by itself? Anyway, having an .exe file separate makes install clearer for us primitive types.

  28. Pingback: Xperf Basics: Recording a Trace (the ultimate easy way) | Random ASCII

  29. Pingback: ETW Central | Random ASCII

  30. Pingback: How to get the UI Delay graph to appear in ETL traces. Particularly for Black Box ETW flight recorders. | Libro Magica

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s