Event Tracing for Windows (ETW) aka xperf is an amazing tool for investigating the performance of Windows machines – I’ve blogged about it many times and it’s helped me find some amazing issues. But recording ETW traces has always been tricky. Microsoft’s wprui.exe showed some potential, but is ultimately missing some features and often gets tripped up by ETW performance bugs.
The fallback plan is to use carefully crafted batch files to record ETW traces. These offer flexibility but… uggh. Batch files? Really? Are we savages?
I finally got annoyed enough by this situation to create UIforETW. This is a tool that records ETW traces, works around ETW performance bugs, allows configuration of trace recording options, works as a trace management UI, and more.
“Sounds great! How much is it?”
I’m pleased to announce that UIforETW has been released on github. Download it, build it, hack on it, fork it, or contribute.
This post has been updated several times to document the many great new features that have been added since the original release. For more information on UIforETW and ETW/xperf tracing see ETW Central.
What does it look like?
Using UIforETW is pretty straightforward. It has a few buttons to start/stop tracing and record traces. A few check boxes to control tracing options. A couple of combo boxes, a list of traces, and a text box for entering notes about traces. That’s pretty much it.
How do you use it?
The simplest way to get started is to download etwpackage.zip from the UIforETW releases page on github. When you run etwpackage\bin\UIforETW.exe it will install the necessary versions of the Windows Performance Toolkit (and unlike VS 2015 it knows which versions work on Windows 7). The process of installing UIforETW and recording a trace is documented in Xperf Basics: Recording a Trace (the ultimate easy way). It’s designed to be simple enough for non-experts to use, so send your customers to that page.
You can also get the UIforETW code from github and build it if you want. Pull requests are welcome – just read the CONTRIBUTING file in the main directory.
Once you’ve run UIforETW click Start Tracing. By default this will trace to circular buffers in memory, so you can leave this running 24×7, with minimal impact on your computer’s performance. Go off and use your computer and the next time something happens that you want to investigate just type Ctrl+Win+C to save the buffers to disk. Tracing to circular buffers will continue.
You can rename the just-saved trace (F2) to remind you why you recorded it, then double-click (or type Ctrl+E) it to launch Windows Performance Analyzer (WPA). See the trace list context menu for more options. I also like to use the trace information area to record notes about my analysis.
Note that UIforETW has not been tested on 32-bit Windows. It probably works, but who knows? Caveat luddite, and really, caveat anyone using UIforETW – no warranties and all that.
Where is UIforETW?
- Code is at https://github.com/google/UIforETW
- Binary releases are at https://github.com/google/UIforETW/releases
- Google group for discussion is at http://groups.google.com/d/forum/uiforetw
- The announcement on the Google open source blog is at http://google-opensource.blogspot.com/2015/04/uiforetw-windows-profiling-made-easier.html
I record a lot of ETW traces. The investigative reporting posts that I do almost always require one or more ETW traces, and there are many more that I record while optimizing Chrome, or whatever I happen to be doing for a living. If I’m not careful then my notes on what I found in a particular trace will end up scattered through dozens of e-mails, google docs, and hangouts discussions. Therefore I think that the trace management features of UIforETW are crucial. There are a few components to this:
- ETW trace compression is on by default in UIforETW and it makes traces 5-10x smaller. This lets me retain more ETW traces. This only works on Windows 8 and above, but Windows 10 is a free upgrade. This feature is effectively missing from wprui, which is bizarre.
- Trace information is an edit field that lets me record notes about what I have found in a trace. It’s a tiny detail, but I find it invaluable. It means that I can easily refresh my memory as to why I recorded a trace and what I found in it. Trace information is stored in a text file with the same base-name as the trace, and is automatically loaded and stored as you navigate through traces. Trace information can also be prepopulated with important details – for Chrome developers I prepopulate it with a list of Chrome process IDs by type. I recommend recording important conclusions in the trace information with call stacks, process and thread IDs, and time ranges. You’ll thank me.
- Trace naming is important in order to know when I recorded a trace and why. UIforETW defaults to naming traces based on the date, time, and username (and unlike wprui it names them so that they sort sensibly!) The username is handy when receiving traces from customers. UIforETW supports renaming traces (use the context menu or select a trace and press F2) and it renames the trace information text file at the same time.
- Finding the interesting part of an ETW trace is crucial to a successful investigation, and user input is often a key indicator – many bugs start with “it hangs when I click”, or “the frame rate drops when I type ‘w'”. Therefore UIforETW includes an input logger which inserts ETW events into the trace for every mouse-click, mouse movement, or key press. The default mode anonymizes alphanumeric key presses for privacy, but you can switch to full-Monty mode when necessary.
- WPA defaults to showing you a blank screen. If you try to display generic events it overwhelms you with irrelevant rubbish, and if you show the CPU Usage (Sampled) table it defaults to hiding the call stacks. UIforETW solves all of these problems by automatically setting up a WPA startup profile. Even the graphs that don’t show initially have new configurations – look for Randomascii in the name. If you don’t like the startup profile you can always change it, but I think it’s a better startup experience than the default. If the startup profile doesn’t copy over automatically, go to the Settings dialog and click the appropriate button. Because WPT 10 changed the startup format UIforETW ships with separate profiles for WPT 8.1 and WPT 10. These are updated occasionally so don’t forget to copy new ones occasionally.
- UIforETW also ships with source and binary for a tool that retrieves PDBs and PE files from symbol servers, the latest version of my ETW providers DLL to emit custom events, a sample ETWEventDemo project that shows how to emit these custom events from your software, and various batch files, Python scripts, and other ETW related extras.
- UIforETW automatically registers the Multi* provider set found in ETWProviders*.dll, configures the DisablePagingExecutive registry setting for 64-bit stack walking, and generally makes sure everything is going to work.
- UIforETW will automatically register the Chrome ETW providers if any chrome tracing categories are selected in the Settings dialog – this requires Chrome 46 or higher and enabling exporting of tracing events to ETW from chrome://flags.
- UIforETW can optionally monitor the working sets of specified processes. Just enter a semi-colon separated list of process names and details about the working sets (including PSS and private working set if Expensive working set monitoring is checked) will be recorded to the trace once a second and can be graphed in WPT 10 – on Windows 8 and above a built-in OS provider gives similar information for less CPU cost.
- UIforETW will monitor battery status (percent full and drain/charge rate) when applicable, and if Intel Power Gadget is installed will also monitor process/GPU/package power draw, frequency, and temperature – handy for investigating thermal throttling or power efficiency. This can all be graphed in WPT 10.
- UIforETW will monitor the actual CPU frequency (in addition to what Windows and Intel Power Gadget say). This allows detection of thermal throttling.
- UIforETW supports heap tracing – see this separate blog post for more details.
At this point you should be able to use UIforETW. It’s not really complicated, and there are tooltips for all of the buttons. But, if you want to know all of the hidden features and raison d’être then by all means read on. Here is an explanation of most of the buttons in UIforETW:
- Compress Trace: If you are on Windows 8 or higher then this check box will be available. If it is checked then the -compress flag will used during the trace merge step. This makes traces 5-10x smaller, but the compressed traces cannot be loaded on Windows 7 and below. If you record traces without compression then you can use the trace list context menu to compress traces later, either individual traces or all traces.
- Context switch call stacks: When this is checked (which it normally should be) then two call stacks will be recorded on every context switch – one for the thread being switched in, and one for the thread that ‘readied’ the new thread. This normally has a minimal overhead but can significantly alter performance when profiling code that does hundreds of thousands of context switches per second. These context switch call stacks are vital when doing idle-thread-analysis – see the CPU Usage (Precise) documentation for more information, so only uncheck this if necessary.
- CPU sampling call stacks: When this is checked (which it normally should be) then every sampling interrupt will record a call stack on every CPU. This has a fixed and relatively minimal overhead. These sampling call stacks are vital when investigating excess CPU consumption – see the CPU Usage (Sampled) documentation for more information.
- Fast sampling: When this is checked the CPU sampling speed will be set to 8 KHz instead of the default value of 1 KHz. This can be important when you need to investigate a single slow frame (giving you ~133 samples per 60 Hz frame instead of ~17) but is not needed for investigating longer periods of time. The higher sampling rate makes traces larger or makes circular-buffer traces cover a shorter period of time.
- GPU tracing: When this is checked GPU data will be recorded. GPU usage can be shown in WPA (under the Video category) and deeper analysis can be done in GPUView, which can be launched from UIforETW with Ctrl+G or through the trace-list context menu.
- Input tracing: If this is set to Private or Full then all mouse move and mouse-click events will be recorded in the trace and will show up in the Generic Events graph. If set to Full then all key presses will be fully recorded, whereas if set to Private all letters will be recorded as ‘A’ and all numbers will be recorded as ‘0’.
- Tracing types:
- Circular buffer tracing: In this mode tracing goes to in-memory circular buffers and can run continuously. When Save Trace Buffers is clicked or Ctrl+Win+C is typed then the buffers are saved to disk. The last ~5-500 seconds (depending on activity levels) will be recorded and traces will never be excessively large. This mode is ideal for “retroactive profiling” where you decide to save the trace after inadvertently triggering a performance problem.
- Tracing to file: In this mode tracing goes to files on disk. This allows tracing of arbitrarily long periods of time, but it can create arbitrarily huge and unwieldy traces. This mode is ideal when you want to capture all of a particular scenario, but you have to remember to start and stop tracing at the right times.
- Heap tracing to file: In this mode all heap allocations for the specified process names are recorded to the trace. Heap tracing only applies to processes that are started after tracing begins. This mode tends to have an enormous data rate and is only suitable for recording short periods of time.
- Show commands: UIforETW works by running xperf.exe to stop and start tracing. You can see the exact commands run by checking this box. If you encounter problems with UIforETW then this can be a helpful diagnostic tool. When this is checked the time to run various stages of saving a trace is also displayed.
- Settings…: This brings up the settings dialog which allows additional configuration. Currently the options are:
- Heap profiled processes: Specify what processes to monitor when doing heap profiling, separated by semi-colons
- WS monitored processes: Semi-colon separated list of processes whose working set should be recorded once a second.
- Expensive working set monitoring: This enables monitoring of PSS and private working set of the WS monitored processes. Note that this can consume 100-200 ms of CPU time, so use with care. This feature requires running the 64-bit version if running 64-bit Windows.
- Copy startup profile: Copy the official UIforETW startup profiles (for WPT 8.1 and WPT 10), with known-good settings for showing input, window in focus, GPU usage, CPU Usage (Precise) graph and CPU Usage (Sampled) table data, and custom. The WPT 10 profile is configured for easy graphing of Generic Events.
- Copy symbol DLLs: Copy known-good symsrv.dll and dbghelp.dll files to the xperf directory to try to work around xperf symbol loading pitfalls.
- Chrome developer: When this is checked the default symbol path will include Chrome’s symbols, and some Chrome specific symbol processing and trace analysis will be done.
- Auto view traces: When this is checked the trace viewer will be launched as soon as a trace is recorded.
- Stacks on heap tracing: Check this to include stacks in heap tracing. This is normally desirable, but is expensive.
- VirtualAlloc stacks always: Recording VirtualAlloc call stacks allows lightweight memory profiling that can often be left on all the time. Recommended.
- Chrome tracing categories: These check boxes correspond to some of the commonly used tracing categories in Google’s Chrome web browser. This requires Chrome 46 or higher and –trace-export-events-to-etw on the command-line.
For a quick reminder of what a particular button does just hover the mouse over it and wait for a tool-tip to appear.
Here is an explanation of most of the items in the Traces context menu:
- Open trace in 8.1/10 WPA: This launches the selected trace in that version of WPA
- Open trace in GPUView: This launches the selected trace in GPUView
- Delete trace: This sends the selected trace and the associated trace notes text file to the recycle bin.
- Rename trace: This lets you rename the selected trace and the associated trace notes text file from within UIforETW. By default the date/time portion of the trace is excluded from renaming – use Shift+F2 to allow renaming all of the trace name.
- ETW compress trace (8.0+): This will run the ETW “-merge -compress” step on the selected trace which will dramatically shrink traces that have not previously been compressed. This requires Windows 8 and above.
- Zip compress trace: This will, some day, compress traces into .zip files that UIforETW will transparently view. But this is currently unimplemented.
- Compress all traces: This will run the ETW compress trace step on all traces, saving the results if it makes the traces smaller.
- Zip compress all traces: This will run the (unimplemented) Zip compress trace step on all traces, saving the results if it makes the traces smaller.
- Browse folder: This launches Windows explorer to view the traces folder.
- Strip chrome symbols: This launches a Python script that, if _NT_SYMBOL_PATH contains the Chrome symbol server, will download, strip, and process Chrome’s symbol files to avoid ~25 minute hangs when loading Chrome’s symbols in WPA.
- Trace path to clipboard: This copies the full path to a trace to the clipboard. Type Ctrl+Shift+C to copy just the file part.
ETW bug avoidance
It is ironic that I seem to keep finding performance bugs in the ETW trace recording and analysis. As much as possible UIforETW works around these ironic bugs.
- In 2012 I found that symbol loading failed for some WPA users, and was excruciatingly slow (150x slower than normal!) for others. When symbol loading failed it was because either dbghelp.dll or symsrv.dll were not in the path. Profiling showed that when it was slow it was because a slow version of dbghelp.dll was in the path. The fix is to copy known-good versions of these DLLs to the xperf directory. The UIforETW settings dialog has a button to do this. I don’t currently have either of these problems so I can’t fully test the fix, so if you hit these problems then try Copy symbol DLLs and let me know if it helps.
- In 2014 I found that some symbols (notably the extremely large symbol files for Chrome) took ~25 minutes to load into WPA. Profiling of WPA showed that this time the problem was in msdia120.dll. The fix was to download the Chrome symbols, strip the private data, convert them to WPA’s .symcache files, and only then launch WPA. UIforETW does all this automatically, which saves about 24.5 minutes each time I profile a new version of Chrome. If you don’t have the Chrome symbol server enabled then the StripChromeSymbols.py script does nothing. This problem could occur on any large symbol files. The stripping step can be manually run from the traces-list context menu. This script now also fixes a bug in the 10.0.14393 (Windows 10 Anniversary Edition) that stops it from downloading symbols from Chrome’s symbol server.
- In 2015 I hit a problem when recording ETW traces. During the merge step I found that xperf.exe was spending all of its time doing uncached reads from Amcache.hve. This added a six minute delay to recording each trace! To avoid this UIforETW tries to rename this file before running the merge step, and then rename it back at the end.
- I’ve also recently hit a bug where when recording a trace from circular memory buffers the xperf.exe process does millions of one MB reads from a relatively small file. These reads are fully cached, but the huge volume of reads still causes multi-minute hangs. I have no workaround for this other than to trace to a file. Let me know if you hit this bug so that we can figure out what triggers it.
- WPT 10 fails to record dbgid information on Windows 7 so UIforETW automatically invokes the WPT 8.1 version of xperf on Windows 7.
- WPT 8.1 can’t change the CPU sampling frequency on Windows 10 so UIforETW automatically uses the WPT 10 version of xperf for this on Windows 10.
None of these bugs affect everybody, but many people are affected by some of them. If you hit one of these and UIforETW doesn’t work around it then UIforETW may print some helpful suggestions, or you can record a trace of xperf recording a trace using metatrace.bat.
- Currently the only supported form of trace compression is what is built in to Windows 8 and above. If you are recording or viewing traces on Windows 7 you are out of luck. It would be great if somebody implemented the context-menu options to compress traces into .zip files. The idea would be to compress each trace into a .zip file with the same name and then delete the .etl file. The .zip file will still show up in the trace list, and UIforETW would need to automatically extract the trace on demand
UIforETW is built with MFC and, in order to keep the binary small, it uses the DLL version of MFC. This means that it won’t run on any machines that don’t have these DLLs installed. A DLL installer is needed and I haven’t gotten around to writing it.This was solved by using github releases
- Unit tests, especially of the path and split functions, would be nice
- More options to control buffer sizes and tracing options would be nice
- A configurable option to upload compressed traces to a web server is proven to be useful
- Other issues listed on the github issues page
Contributions will need code-reviews and approval, and contributors must sign the Google Individual Contributor License Agreement – see the CONTRIBUTING file for details. Apologies for the red tape, and thanks to those who have contributed already (see the AUTHORS file and the history).
I’m very pleased with this project. It came together quickly and it solves the problems that I care about. I hope that it is useful for others.
As always, if you want to learn more about how to use ETW I recommend my series of training videos.