Xperf Basics: Recording a Trace

For information on newer/easier/better ways of recording xperf traces see Xperf Basics: Recording a Trace (the ultimate easy way). This article is left here for historical reasons only.

The Windows Performance Toolkit, also known as xperf, is a powerful (and free!) set of tools from Microsoft that allow profiling of all aspects of a Windows computer by using ETW (Event Tracing for Windows). Whether your performance issues are caused by excess CPU usage, waiting on file I/O, or interactions with drivers and other software, xperf usually provides the information needed to diagnose what is going on.

In addition to using xperf to diagnose dozens of tricky performance problems in the software that I work on I have used it to find (and both workaround and report) performance problems in:

  • VirtualAlloc
  • PowerPoint
  • Visual Studio (breakpoint related hangs, SQL access hangs, network access hangs, etc.)
  • Windows Live Photo Gallery

But, there is a problem. Xperf is difficult to learn, and the documentation is, well, imperfect. I hope to share some of what I have learned so that this valuable tool can be used by more developers, to make their software more awesome.

This post gives some resources on that most basic of problems, “how do I record a useful xperf trace that contains the information I’m likely to need.”

Recording a trace

Xperf is a command line tool with a bewildering array of options. Some of the things that you might need to specify when recording a trace are:

  • What kernel providers (context switches, virtual allocs, sampling profiler, disk I/O) do you want to record?
  • What events do you want call stacks for?
  • How many memory buffers do you want, and what size do you want them to be?

For best results you should also record product-specific events from a user-mode provider. This requires that you learn:

  • How to write a provider manifest
  • How to ensure portability to non-Windows platforms (compile-time checks) and pre-Vista platforms (run-time checks)

And so on. It’s a lot to learn and while the basics of recording a kernel trace with a couple of kernel providers are covered pretty well it can be difficult to even find out what other options might be useful.

I have no intention of writing full documentation for xperf. Instead I am going to provide and explain the user-mode providers and the batch files that I use to record traces. That at least should get those interested in xperf off to a better start.

Step 1 – get xperf

Xperf is distributed as part of the Windows Software Development Kit. There are many valuable tools in there, but that’s the topic of another post. For now, run the installer for the latest Platform SDK (currently version 8.0, found here as of May 2012). You will find Windows Performance Toolkit which is the official name for xperf. Install it. You can do what you want with the rest of the Platform SDK. The appropriate version for your operating system (32-bit or 64-bit) will be installed – I found it hidden in “C:\Program Files\Windows Kits\8.0\Windows Performance Toolkit”. The redistributable packages for all Windows flavors, to make installing on other machines easier, should be found at “C:\Program Files (x86)\Windows Kits\8.0\Windows Performance Toolkit\Redistributables”. Make sure the install directory is in your path, then move on.

Step 2 – get the sample providers and batch files

You can download my sample user-mode ETW providers from ftp://ftp.cygnus-software.com/pub/MultiProvider.zip. I’ll wait.

When you unzip the file you’ll find a Visual Studio 2010 solution file. Build the debug or release configuration. You might want to poke around and look at the ReadMe.txt file, the provider manifest file (etwprovider.man) and the batch files.

Step 3 – record a trace

Now things start getting messy. I’m going to try to document all of the necessary steps and gotchas but it’s hard to make it really fool proof.

If you are running 64-bit Windows then there is a registry key that you need to set. And then you need to reboot. The registry key tells Windows to keep information needed for stack walking in non-pageable memory. If you run the command below from an elevated command prompt (yes, it is all one line that is excessively wordwrapped) and then reboot then your call stacks will thank you.

REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f

In order to use user-mode providers you have to register them. The etwregister.bat file is for that purpose. Like most things xperf you need to open up an administrator command prompt. Navigate to the directory containing etwregister.bat and run it. It should be able to find the MultiProvider.exe that you built (which contains the providers) and etwprovider.man (which defines them) and register them with wevtutil.exe.

Now run etwrecord.bat. Normally I give it the name of a trace file to record to like “etwrecord.bat c:\temp\testtrace.etl”. If you don’t give it a name then it will manufacture one. Either way be careful not to have spaces in your path. At this point tracing has been started. Detailed information about the operation of your computer is being recorded. If you get error messages at this point then read them carefully. Make sure the etwregister.bat step worked properly. Make sure you are running from an administrator command prompt. Make sure you are running Windows 7 (ideal) or Windows Vista in a pinch. ETW tracing is very limited in XP and I have done zero testing on that platform.

Now run the MultiProvider.exe that you built way back in step 2. It will emit some ETW events that your trace is set up to record.

Once MultiProvider.exe has exited you should return to the command prompt and press the ‘any key’ to continue. At this point your trace will be saved to disk. Make sure your trace file name doesn’t contain any spaces because it’s very difficult to get batch files to handle that correctly – and mine don’t. Again, if anything goes wrong then you’ll have to examine the error messages and suggestions very carefully.

In order to be helpful the batch file will launch xperfview to view the trace. Since the batch file is running as administrator, and since xperfview doesn’t process traces while elevated you will get this dialog. Read the link above for details, or just click Yes to view the trace or No to cancel.

image

Step 4 – looking at the user events

To find the user-provider events – which are great for offering context when looking at the kernel trace data – go to the Generic Events graph in xperfview. It should be at the bottom. You can hover over the diamonds to get a summary of each event. Some of the events are random Windows events with minimal value so I usually use the ProviderIDs configuration dropdown to turn off all except for mine, which are creatively named Multi-Main, etc.

image

The blue diamonds are designed for worker thread events, the two purple diamonds (your colours may vary) in the screen shot above are faked up input events. The green diamonds are faked up events that indicate the beginning of a frame, as in a video game. The brown diamonds are generic events. The whole purpose of having multiple providers is so that they will show up on different rows in this view. By carefully selecting when to use each provider you can make patterns (such as high and low frame rates, or high and low packet frequencies) visually obvious.

To get more details you need to select a range of time on the graph, right click, and request a summary table. Summary tables are a complex topic, but for now suffice it to say that by enabling and disabling columns to show the ones that you care about, by reordering columns (paying close attention to the gold bar – columns to the left of it are used for hierarchical grouping), and by careful sorting you can answer many questions. Effective use of pivot tables is an art form and is crucial to getting full value from xperf.

On the summary table below we can see, for instance, that the obviously important task “Busy work…” started 5.462635 seconds into the trace, and ended 5.477991493 into the trace. That information then helps us make sense of the other graphs on the main window because they all share a common timeline. Or at least it would if we were profiling something real. Use your imagination and pretend that those Begin/End markers are helping you identify when your AI code is running, or when map loading occurred, or whatever significant event that is taking too long that you want to investigate.

image

Step 5 – customize it for your purposes

In order to integrate this into your projects and start squashing performance bugs you need to change some things.

  1. Copy etwprof.* and ETWProvider.man to one of your projects – the DLL or EXE that will contain the providers. You’ll need to copy over the custom build steps for ETWProvider.man and include ETWProviderGenerated.rc (created by those custom build steps) into your existing resource file.
  2. Modify ETWProvider.man and ETWRegister.bat to change the name of the DLL or EXE that will contain the providers
  3. Change all of the GUIDs and provider names in ETWProvider.man
  4. Make the same provider name changes in ETWCommonSettings.bat (so that tracing enables the correct providers) and etwprof.cpp (to adjust the run-time registering of the providers and the emitting of events)
  5. Run the updated ETWRegister.bat to register your providers – make sure this succeeds.

At that point you should be able to build your code, register your providers, add some calls to your event emitting functions, and record traces that contain your user events.

Step 6 – bonus!

As an extra bonus, when etwrecord.bat finishes it actually leaves tracing running. Tracing continues to a circular buffer and at any time you can save that buffer (retroactive profiling!) to disk with “etwcirc.bat c:\temp\retrotrace.etl”. If you don’t like wasting memory on those circular buffers, “etwcirc.bat stop” will fix things for you.

Step 7 – detailed analysis

Analyzing traces is a huge topic, full of undocumented summary tables, and it will have to be saved for another post.

Step 8 – understanding how it works

This isn’t really a step, but a section to explain some of the details of how this works, to aid in customizing the batch files and providers. There are also lots of comments in the batch files and source code which should be used as a resource.

If you type “etwrecord test.etl” then xperf.exe will be invoked a few times. Once immediately:

xperf -on Latency+POWER+DISPATCHER+FILE_IO+FILE_IO_INIT -stackwalk PROFILE+CSWITCH+READYTHREAD -buffersize 1024 -minbuffers 300 -start gamesession -on Microsoft-Windows-Win32k+Multi-MAIN+Multi-FrameRate+Multi-Input+Multi-Worker

The first command starts tracing. “-on” indicates that the kernel provider should be started, and the plus-sign separated words that follow are individual kernel providers (POWER+DISPATCHER+FILE_IO+FILE_IO_INIT) and kernel provider groups (Latency). See “xperf -providers k” for a list of kernel providers and “xperf -help start” for information on the very complex “-on” syntax.

Then, “-stackwalk” indicates which events should have call stacks recorded for them. Call stacks are very useful, but are also expensive. Note that if you turn on stack walks for an event that is not enabled then nothing happens. See Pigs Can Fly for details of this or look at “xperf -help stackwalk” to see the full list of flags.

Then the size and minimum number of buffers is specified. Setting this too high will waste memory, and too low risks losing events. Note that the buffer size is in KB, so the settings above request 300 MB of buffers for the kernel events. See “xperf -help start” for details.

Then we get to “-start gamesession” which requests that we start a user-mode logging session called gamesession. Any name will do, but only one session of that name can be running at a time. Following that is “-on” and a list of user-mode providers. The first one is a built-in Windows 7 provider. See “xperf -providers” for a list of user-mode providers. The others are the providers that are defined in MultiProvider.exe and registered with ETWregister.bat. Customizing these, in ETWCommonSettings.bat can be important.

Xperf is invoked again after you hit a key to stop tracing:

xperf -stop gamesession -stop -d test.etl

The “-stop gamesession” tells xperf to end our user-mode logging session. The “-stop” by itself then tells xperf to halt the kernel session. The “-d test.etl” tells xperf to take those just-stopped sessions and merge them into a single trace file, for integrated analysis.

There are also a couple of “xperf –loggers” commands. Those are there for diagnostic purposes. When things go awry this undocumented command gives a list of all active ETW sessions that can help understand what is going on.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Performance, xperf and tagged . Bookmark the permalink.

57 Responses to Xperf Basics: Recording a Trace

  1. Andre Vergison says:

    My ADD REG came back with “ERROR: Invalid key name.” Replacing the citation quotes by ordinary quotes fixed it. Also, -d 0x1 had to read -d 1.

  2. brucedawson says:

    Thanks Andre. The command was fine in Windows Live Writer, but was wrong on the blog. That’s twice that a post has been mangled during upload.

    I wrapped the command in tags and now it works properly, including the -d 0x1.

    Also, I should have mentioned that the batch files in the download listed in step 2 will set that registry key for you.

  3. Pingback: PowerPoint Poor Performance Problem | Random ASCII

  4. Pingback: Xperf for Excess CPU Consumption | Random ASCII

  5. Pingback: Xperf Wait Analysis–Finding Idle Time | Random ASCII

  6. Pingback: The New Xperf is Here! | Random ASCII

  7. Pingback: The Lost Xperf Documentation–CPU sampling | Random ASCII

  8. Pingback: The Lost Xperf Documentation–CPU Scheduling | Random ASCII

  9. Pingback: Lightweight In-Memory Logging

  10. One thing I would like to know (especially when using custom events) is if there’s a way within the xperf GUI to view a “combined” summary table. Maybe I’m just not doing it right, but one of the things I’m trying to do at the moment is to instrument a bit of code with Begin/End events (where the End event shows how long it took). In the majority of cases a particular section of code will execute sufficiently quickly that I’m happy with it, but sometimes it takes “too long”, and I want to find out what it was doing in that time (and whether it was a lock contention, context switch, or particularly complex input data).

    I can find the End events that I think are “too long” easily enough, but I’ve yet to discover a way to then tell xperf to then use that event as a focus and show me a complete timeline of everything else that happened before that, in order, in table form (summary table without the gold bar). I can do it for single categories (eg. Generic Events) by moving the gold bar all the way to the left and then sorting by Time (s), but this of course only shows those events, not context switches or anything else.

    At the moment I’m doing it very kludgily, by exporting the entire ETL trace to CSV, then running a program to extract only those records within a specific range of my desired timestamp, and finally loading into Excel and staring at it for a while. But this seems overly complicated — and it takes a really long time to write out the CSV file and then throw most of it away to get something that can be loaded by Excel.

    Is there some option within xperfview that I’m missing?

    • brucedawson says:

      Some of my other posts cover trace analysis in a bit more detail, but the basic workflow is to select the region of interest in the main window and then right-click on the selection to bring up a summary table for that region. Once you have a selection you can clone it to all of the other graphs. You can select a region with the mouse (the selection snaps to diamonds on the graphs) or by right-clicking and using Select Interval… to type in specific time values.

      You should also consider using WPA to analyze traces. Starting with the recently released version of WPT it is probably the best choice. You can have multiple summary tables in one window and when you zoom to a selection they are all updated. I’ll blog about it soon.

      • Viewing the info in separate tables isn’t really good enough, though — you have to manually match up the events between the windows and it’s easy to miss something if you didn’t happen to have its window open. I haven’t tried using WPA yet though; I might give that a go.

  11. Pingback: WPA–Xperf Trace Analysis Reimagined | Random ASCII

  12. Do you happen to know if -stackwalk VirtualAlloc works on Windows 7 SP1? I can get stacks for -stackwalk syscall and -stackwalk profile, but I only get “?!?” stacks when using the VirtualAlloc flag.

    • brucedawson says:

      It works for me. Note that the -stackwalk flag is VirtualAlloc but the provider name is virt_alloc, and you need to both enable the provider and request call stacks. Make sure you are using the latest version of the xperf toolset, from last month. I think there may have been display bugs prior to then, although I’m not sure.

      • I’m using -on PROC_THREAD+LOADER+VIRT_ALLOC -stackwalk VirtualAlloc with xperf and WPA 6.2.8400… strange. Using syscall and filtering for VirtualAlloc seems to work, but is much slower, and I don’t think it’s entirely equivalent.

        • brucedawson says:

          How are you trying to view the call stacks? I seem to remember some bugs with display virtualalloc information using command line processing, but with xperfview I could view them. Try both the Stack Counts by Type graph and the virtual alloc graph — whatever it is called. That is, enabled those graphs and then go to their respective summary tables.

      • Looks like my problem was looking at the wrong graph after all. I expected there to be stacks on the Virtual Total Commit Count graph, but they’re only on the Count by Event Name graph.

        Thanks for all this material on xperf! This one hiccup aside, you made it much easier to use this rather complicated tool.

        • brucedawson says:

          When I record a trace with virtualalloc stack walking I get stacks showing up in the Virtual Allocations Summary Table. The column is off by default, but once I enable it I get perfectly good stacks, for all processes. I got to this summary table from Virtual Outstanding Commit Size.

          I used xperfview 6.2.8229, released last month.

  13. Stephen says:

    Hi, I want to create my own boot trace using xperf.exe instead of xbootmgr.exe. Looking at the command-line options for xperf.exe, I believe it is possible.
    I have been able to create a boot trace using xperf.exe BUT when I attempt to create a summary file using the “boot” action, it fails. It looks like there are no explorer done events in the etl.

    Are you able to create a boot trace without using xbootmgr.exe, and if so, how? I can’t seem to find info out there that describes what I want to do.

    Thanks,
    Steve

    • brucedawson says:

      I’d recommend using xbootmgr.exe. I believe that is what Microsoft uses for investigating boot performance, so if you are doing otherwise you are heading into uncharted territory. Unfortunately I know almost nothing about xbootmgr. Let us know if you find anything?

  14. Pingback: xperf – Show me the performance! | dickyjim

  15. Nicholas Frechette says:

    FYI, to remove the warnings from the manifest file (related to the schema), you must add the schema files manually to visual studio. To do this, open the manifest, in the top menu, open XML -> Schemas… etc. The schema files are in the windows SDK.

  16. Martin Weißhaupt says:

    Hi Bruce,
    I have problems with my custom provider. I always see crimson events in the xperfview and wpa tools but not the data I tried to log. The Crimson events have the matching GUIDs but it seems that it doesn’t transmit my data correctly…

    I must be doing something wrong. Could you please take a look at my test program here?
    http://www.mw88.org/uploads/ETWTest.zip

    Thanks,
    Martin

    • brucedawson says:

      I found the problem with your sample app. Everything was perfect except that your app.rc file didn’t include etwproviderGenerated.rc. This include (see the sample) is easy to miss but vital. It apparently adds some extra metadata which the ETW recording process needs. Definitely something to watch for — it’s an easy mistake to make.

      • Martin Weißhaupt says:

        that was exactly the problem I had. Now it works perfectly!

        It has been some time since I implemented ETW-Traces in our Software and we have been able to identify some performance bottlenecks which otherwise would have been invisible for us.

        The best thing is that with the new WPA-Viewer you can drag and drop views together. For example your own ETWMarks and CPU-Usage.

        This is very usefull to see if there is a higher or lower CPU-Usage when certain functions are executed or if something blocks the system and the program has to wait. This was the case in our software which could have caused problems on slower computers.

        Thank you for your help😉

  17. Pingback: The Lost Xperf Documentation–Disk Usage | Random ASCII

  18. Tejas says:

    Hi Bruce
    Is there a specific reason as to why etwregister.bat copies over the provider’s assembly to a temporary directory?
    Thanks

    • brucedawson says:

      I’m a bit vague on the details, but the basic idea is that the manifest file specifies the path where the provider DLL will be found, and it had better be there. During development the provider DLL could be in any random location so the register process has to copy it somewhere.

      The temp directory is hardly ideal, but it is simple. A ‘real’ installer could place the DLL in the installation directory and then the manifest file could say something like:

      messageFileName=”%ProgramFiles%\Company Name\ProductName\ETWTest.exe”

      With the manifest file in the temp directory you can hit problems if the temp directory gets cleaned up between calling etwregister.bat and recording a trace, but I do the registration process frequently enough that that isn’t a problem.

      I hope that makes sense.

  19. I actually intend to save this blog post, “Xperf Basics: Recording a Trace | Random ASCII”
    on my personal page. Do you mind if I actuallydo? Many thanks ,Billy

  20. Harish says:

    Can I open the etl file from VC++ code??? can anyone help me on this…
    Thanks.

    • brucedawson says:

      It’s possible to write code to parse an ETL file (see Vance Morrison’s blog) but that is not a ‘normal’ work flow — that would only be needed for bulk processing when you have hundreds of traces that you wan to data mine. For initial investigations you should be using WPA.

      • Harish says:

        Hey Thanks for the reply… I will check the same….
        I have used WPA, media perf view and xray tool for analyzing etl files…
        .. But for one of the scenario I need to check dpc values and it should not exceed some limit..
        So if I can parse from code, it will be of great relief🙂

        • brucedawson says:

          If you get parsing of .etl files from C++ or C# working then please post some sample code. There are some examples at http://blogs.msdn.com/b/vancem/ but I’m not sure how well they work.

        • If you’re just checking that something doesn’t exceed a limit, you can do that easily enough with the GUI. Just open a summary table and reverse-sort by the appropriate column, and see if the top entries are over your limit or not.

          • Harish says:

            That’s correct.
            But in my case I have around different types of video and audio codec’s to be checked where the performance of these should not exceed. I use etl file for profiling and I have around 2000 varieties of clips. Currently I pick some random clips and see the performance… But later I know this has to be done for most of the clips.

            If I can parse this in some language I can automate the whole process and just wait for the result….

  21. Harish says:

    Also one more point.. In whck test there is one test called glitch free test. This test does the same thing..
    It plays some clip and use some Nike.exe(from microsoft) file to analyze the etl and gives result of video and audio glitches….
    The same I need to perform…🙂

    • brucedawson says:

      Have you looked at “xperf -help processing”? This lets you dump traces to text in various ways which can sometimes be used to do the analysis you want. I’ve used that to summarize heap information, and you might be able to use it for dumping DPC details. Maybe the dpcisr action?

      • Harish says:

        Oops I haven’t seen this.. Ok I will try this .. I think this should help me….
        That was great thought…. Thanks Bruce!!!

        • Harish says:

          I have tried the processing option and it works fine.
          Thanks Bruce… I need to start R & D in this action to get my data🙂 ….

  22. Pingback: Visual Studio Single Step Performance Fixes | Random ASCII

  23. Pingback: Xperf Basics: Recording a Trace (the easy way) | Random ASCII

  24. It looks like your multiprovider link is down? I’m not sure yet, it may be me.

    • I should add, this is a really cool idea! I’m excited to throw some events in our product to see if I can better orient myself in xperf traces.

      • brucedawson says:

        Having user events is, in my opinion, crucial. In most trace analysis you are interested in a specific spot in the trace and custom events help you find that crucial spot. Note that the new versions of the MultiProvider sample include a “-inputlogger” option which puts keyboard and mouse events into the ETW event stream.

        The MultiProvider link works fine for me. You may be blocked by a firewall.

        You should read the updated version of this article. This article is deprecated. Look near the top for a link to the new and improved article.

        • mrmarmot says:

          Yeah, probably an issue on my end. The link seems to be fine today. Weird. I’m not actually sure why I commented on this post since I was also looking at the new one. Anyway, thanks for your help.

  25. Sarkie says:

    I forgot that I got the xperf idea from here, I ran an xperf against Steam because if I leave Steam for a few weeks, it’ll take ages to load up first time (before the log in) and looking in ProcMon it looks to be checking all the files for consistency. I am assuming because I have a lot of games and a non-ssd drive.

    I just really wanted Steam to load like Chrome, fast as hell to draw the UI and then do its other stuff once it’d loaded. “Checking Game for Consistency” or whatever.

    Problem was I’d left it running too long and xperf couldn’t load the file as the file was GBs big.

    • brucedawson says:

      You generally shouldn’t need to record long traces. If Steam is taking a long time to startup then just record a trace for 20-30 seconds or so. It doesn’t matter whether this captures the entire Steam startup time — 30 seconds is *tons* of data to analyze for slowdown reasons.

      If you feel that this is a serious and systemic Steam problem then you might want to contact Valve support. They can share out a tool that is customized for recording ETW traces of Valve software. If they think this is a useful case then we’ll get you set up, get traces, and figure out what’s going on.

  26. Pingback: ETW Trace Compression (and xperf syntax refresher) | Random ASCII

  27. Pingback: Xperf Basics: Recording a Trace (the ultimate easy way) | Random ASCII

  28. Pingback: ETW Central | Random ASCII

  29. stej says:

    I wanted to start from basic before using UiForEtw. That’s why I’m here and not at the other page🙂 Is the MultiProvider.zip from ftp://ftp.cygnus-software.com/pub/MultiProvider.zip still available somewhere?

    • brucedawson says:

      The latest version of the MultiProvider DLLs and header files is in the latest UIforETW release .zip file, and their source is on github. You can use these without installing or running UIforETW – although I strongly recommend UIforETW for most purposes.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s