UIforETW – Windows Performance Made Easier

Event Tracing for Windows (ETW) aka xperf is an amazing tool for investigating the performance of Windows machines – I’ve blogged about it many times and it’s helped me find some amazing issues. But recording ETW traces has always been tricky. Microsoft’s wprui.exe showed some potential, but is ultimately missing some features and often gets tripped up by ETW performance bugs.

The fallback plan is to use carefully crafted batch files to record ETW traces. These offer flexibility but… uggh. Batch files? Really? Are we savages?

I finally got annoyed enough by this situation to create UIforETW. This is a tool that records ETW traces, works around ETW performance bugs, allows configuration of trace recording options, works as a trace management UI, and more.

“Sounds great! How much is it?”

I’m pleased to announce that UIforETW has been released on github. Download it, build it, hack on it, fork it, or contribute.

If you just want to record ETW traces then go to this page for some relatively simple instructions.

This post has been updated several times to document the many great new features that have been added since the original release. For more information on UIforETW and ETW/xperf tracing see ETW Central.

What does it look like?

Using UIforETW is pretty straightforward. It has a few buttons to start/stop tracing and record traces. A few check boxes to control tracing options. A couple of combo boxes, a list of traces, and a text box for entering notes about traces. That’s pretty much it.

image

How do you use it?

The simplest way to get started is to download etwpackage.zip from the UIforETW releases page on github. When you run etwpackage\bin\UIforETW.exe it will install the necessary versions of the Windows Performance Toolkit (and unlike VS 2015 it knows which versions work on Windows 7). The process of installing UIforETW and recording a trace is documented in Xperf Basics: Recording a Trace (the ultimate easy way). It’s designed to be simple enough for non-experts to use, so send your customers to that page.

You can also get the UIforETW code from github and build it if you want. Pull requests are welcome – just read the CONTRIBUTING file in the main directory.

Once you’ve run UIforETW click Start Tracing. By default this will trace to circular buffers in memory, so you can leave this running 24×7, with minimal impact on your computer’s performance. Go off and use your computer and the next time something happens that you want to investigate just type Ctrl+Win+C to save the buffers to disk. Tracing to circular buffers will continue.

You can rename the just-saved trace (F2) to remind you why you recorded it, then double-click (or type Ctrl+E) it to launch Windows Performance Analyzer (WPA). See the trace list context menu for more options. I also like to use the trace information area to record notes about my analysis.

Note that UIforETW has not been tested on 32-bit Windows. It probably works, but who knows? Caveat luddite, and really, caveat anyone using UIforETW – no warranties and all that.

Where is UIforETW?

Trace management

I record a lot of ETW traces. The investigative reporting posts that I do almost always require one or more ETW traces, and there are many more that I record while optimizing Chrome, or whatever I happen to be doing for a living. If I’m not careful then my notes on what I found in a particular trace will end up scattered through dozens of e-mails, google docs, and hangouts discussions. Therefore I think that the trace management features of UIforETW are crucial. There are a few components to this:

  1. ETW trace compression is on by default in UIforETW and it makes traces 5-10x smaller. This lets me retain more ETW traces. This only works on Windows 8 and above, but Windows 10 is a free upgrade. This feature is effectively missing from wprui, which is bizarre.
  2. Trace information is an edit field that lets me record notes about what I have found in a trace. It’s a tiny detail, but I find it invaluable. It means that I can easily refresh my memory as to why I recorded a trace and what I found in it. Trace information is stored in a text file with the same base-name as the trace, and is automatically loaded and stored as you navigate through traces. Trace information can also be prepopulated with important details – for Chrome developers I prepopulate it with a list of Chrome process IDs by type. I recommend recording important conclusions in the trace information with call stacks, process and thread IDs, and time ranges. You’ll thank me.
  3. Trace naming is important in order to know when I recorded a trace and why. UIforETW defaults to naming traces based on the date, time, and username (and unlike wprui it names them so that they sort sensibly!) The username is handy when receiving traces from customers. UIforETW supports renaming traces (use the context menu or select a trace and press F2) and it renames the trace information text file at the same time.

Bonus features

  1. Finding the interesting part of an ETW trace is crucial to a successful investigation, and user input is often a key indicator – many bugs start with “it hangs when I click”, or “the frame rate drops when I type ‘w'”. Therefore UIforETW includes an input logger which inserts ETW events into the trace for every mouse-click, mouse movement, or key press. The default mode anonymizes alphanumeric key presses for privacy, but you can switch to full-Monty mode when necessary.
  2. WPA defaults to showing you a blank screen. If you try to display generic events it overwhelms you with irrelevant rubbish, and if you show the CPU Usage (Sampled) table it defaults to hiding the call stacks. UIforETW solves all of these problems by automatically setting up a WPA startup profile. Even the graphs that don’t show initially have new configurations – look for Randomascii in the name. If you don’t like the startup profile you can always change it, but I think it’s a better startup experience than the default. If the startup profile doesn’t copy over automatically, go to the Settings dialog and click the appropriate button. Because WPT 10 changed the startup format UIforETW ships with separate profiles for WPT 8.1 and WPT 10. These are updated occasionally so don’t forget to copy new ones occasionally.
  3. UIforETW also ships with source and binary for a tool that retrieves PDBs and PE files from symbol servers, the latest version of my ETW providers DLL to emit custom events, a sample ETWEventDemo project that shows how to emit these custom events from your software, and various batch files, Python scripts, and other ETW related extras.
  4. UIforETW automatically registers the Multi* provider set found in ETWProviders*.dll, configures the DisablePagingExecutive registry setting for 64-bit stack walking, and generally makes sure everything is going to work.
  5. UIforETW will automatically register the Chrome ETW providers if any chrome tracing categories are selected in the Settings dialog – this requires Chrome 46 or higher and enabling exporting of tracing events to ETW from chrome://flags.
  6. UIforETW can optionally monitor the working sets of specified processes. Just enter a semi-colon separated list of process names and details about the working sets (including PSS and private working set if Expensive working set monitoring is checked) will be recorded to the trace once a second and can be graphed in WPT 10 – on Windows 8 and above a built-in OS provider gives similar information for less CPU cost.
    image
  7. UIforETW will monitor battery status (percent full and drain/charge rate) when applicable, and if Intel Power Gadget is installed will also monitor process/GPU/package power draw, frequency, and temperature – handy for investigating thermal throttling or power efficiency. This can all be graphed in WPT 10.
    image
  8. UIforETW will monitor the actual CPU frequency (in addition to what Windows and Intel Power Gadget say). This allows detection of thermal throttling.
  9. UIforETW supports heap tracing – see this separate blog post for more details.

More details

At this point you should be able to use UIforETW. It’s not really complicated, and there are tooltips for all of the buttons. But, if you want to know all of the hidden features and raison d’être then by all means read on. Here is an explanation of most of the buttons in UIforETW:

  • Compress Trace: If you are on Windows 8 or higher then this check box will be available. If it is checked then the -compress flag will used during the trace merge step. This makes traces 5-10x smaller, but the compressed traces cannot be loaded on Windows 7 and below. If you record traces without compression then you can use the trace list context menu to compress traces later, either individual traces or all traces.
  • Context switch call stacks: When this is checked (which it normally should be) then two call stacks will be recorded on every context switch – one for the thread being switched in, and one for the thread that ‘readied’ the new thread. This normally has a minimal overhead but can significantly alter performance when profiling code that does hundreds of thousands of context switches per second. These context switch call stacks are vital when doing idle-thread-analysis – see the CPU Usage (Precise) documentation for more information, so only uncheck this if necessary.
  • CPU sampling call stacks: When this is checked (which it normally should be) then every sampling interrupt will record a call stack on every CPU. This has a fixed and relatively minimal overhead. These sampling call stacks are vital when investigating excess CPU consumption – see the CPU Usage (Sampled) documentation for more information.
  • Fast sampling: When this is checked the CPU sampling speed will be set to 8 KHz instead of the default value of 1 KHz. This can be important when you need to investigate a single slow frame (giving you ~133 samples per 60 Hz frame instead of ~17) but is not needed for investigating longer periods of time. The higher sampling rate makes traces larger or makes circular-buffer traces cover a shorter period of time.
  • GPU tracing: When this is checked GPU data will be recorded. GPU usage can be shown in WPA (under the Video category) and deeper analysis can be done in GPUView, which can be launched from UIforETW with Ctrl+G or through the trace-list context menu.
  • Input tracing: If this is set to Private or Full then all mouse move and mouse-click events will be recorded in the trace and will show up in the Generic Events graph. If set to Full then all key presses will be fully recorded, whereas if set to Private all letters will be recorded as ‘A’ and all numbers will be recorded as ‘0’.
  • Tracing types:
    • Circular buffer tracing: In this mode tracing goes to in-memory circular buffers and can run continuously. When Save Trace Buffers is clicked or Ctrl+Win+C is typed then the buffers are saved to disk. The last ~5-500 seconds (depending on activity levels) will be recorded and traces will never be excessively large. This mode is ideal for “retroactive profiling” where you decide to save the trace after inadvertently triggering a performance problem.
    • Tracing to file: In this mode tracing goes to files on disk. This allows tracing of arbitrarily long periods of time, but it can create arbitrarily huge and unwieldy traces. This mode is ideal when you want to capture all of a particular scenario, but you have to remember to start and stop tracing at the right times.
    • Heap tracing to file: In this mode all heap allocations for the specified process names are recorded to the trace. Heap tracing only applies to processes that are started after tracing begins. This mode tends to have an enormous data rate and is only suitable for recording short periods of time.
  • Show commands: UIforETW works by running xperf.exe to stop and start tracing. You can see the exact commands run by checking this box. If you encounter problems with UIforETW then this can be a helpful diagnostic tool. When this is checked the time to run various stages of saving a trace is also displayed.
  • imageSettings…: This brings up the settings dialog which allows additional configuration. Currently the options are:
    • Heap profiled processes: Specify what processes to monitor when doing heap profiling, separated by semi-colons
    • WS monitored processes: Semi-colon separated list of processes whose working set should be recorded once a second.
    • Expensive working set monitoring: This enables monitoring of PSS and private working set of the WS monitored processes. Note that this can consume 100-200 ms of CPU time, so use with care. This feature requires running the 64-bit version if running 64-bit Windows.
    • Copy startup profile: Copy the official UIforETW startup profiles (for WPT 8.1 and WPT 10), with known-good settings for showing input, window in focus, GPU usage, CPU Usage (Precise) graph and CPU Usage (Sampled) table data, and custom. The WPT 10 profile is configured for easy graphing of Generic Events.
    • Copy symbol DLLs: Copy known-good symsrv.dll and dbghelp.dll files to the xperf directory to try to work around xperf symbol loading pitfalls.
    • Chrome developer: When this is checked the default symbol path will include Chrome’s symbols, and some Chrome specific symbol processing and trace analysis will be done.
    • Auto view traces: When this is checked the trace viewer will be launched as soon as a trace is recorded.
    • Stacks on heap tracing: Check this to include stacks in heap tracing. This is normally desirable, but is expensive.
    • VirtualAlloc stacks always: Recording VirtualAlloc call stacks allows lightweight memory profiling that can often be left on all the time. Recommended.
    • Chrome tracing categories: These check boxes correspond to some of the commonly used tracing categories in Google’s Chrome web browser. This requires Chrome 46 or higher and –trace-export-events-to-etw on the command-line.

For a quick reminder of what a particular button does just hover the mouse over it and wait for a tool-tip to appear.

Here is an explanation of most of the items in the Traces context menu:

  • Open trace in 8.1/10 WPA: This launches the selected trace in that version of WPA
  • Open trace in GPUView: This launches the selected trace in GPUView
  • Delete trace: This sends the selected trace and the associated trace notes text file to the recycle bin.
  • Rename trace: This lets you rename the selected trace and the associated trace notes text file from within UIforETW. By default the date/time portion of the trace is excluded from renaming – use Shift+F2 to allow renaming all of the trace name.
  • ETW compress trace (8.0+): This will run the ETW “-merge -compress” step on the selected trace which will dramatically shrink traces that have not previously been compressed. This requires Windows 8 and above.
  • Zip compress trace: This will, some day, compress traces into .zip files that UIforETW will transparently view. But this is currently unimplemented.
  • Compress all traces: This will run the ETW compress trace step on all traces, saving the results if it makes the traces smaller.
  • Zip compress all traces: This will run the (unimplemented) Zip compress trace step on all traces, saving the results if it makes the traces smaller.
  • Browse folder: This launches Windows explorer to view the traces folder.
  • Strip chrome symbols: This launches a Python script that, if _NT_SYMBOL_PATH contains the Chrome symbol server, will download, strip, and process Chrome’s symbol files to avoid ~25 minute hangs when loading Chrome’s symbols in WPA.
  • Trace path to clipboard: This copies the full path to a trace to the clipboard. Type Ctrl+Shift+C to copy just the file part.

ETW bug avoidance

It is ironic that I seem to keep finding performance bugs in the ETW trace recording and analysis. As much as possible UIforETW works around these ironic bugs.

  1. In 2012 I found that symbol loading failed for some WPA users, and was excruciatingly slow (150x slower than normal!) for others. When symbol loading failed it was because either dbghelp.dll or symsrv.dll were not in the path. Profiling showed that when it was slow it was because a slow version of dbghelp.dll was in the path. The fix is to copy known-good versions of these DLLs to the xperf directory. The UIforETW settings dialog has a button to do this. I don’t currently have either of these problems so I can’t fully test the fix, so if you hit these problems then try Copy symbol DLLs and let me know if it helps.
  2. In 2014 I found that some symbols (notably the extremely large symbol files for Chrome) took ~25 minutes to load into WPA. Profiling of WPA showed that this time the problem was in msdia120.dll. The fix was to download the Chrome symbols, strip the private data, convert them to WPA’s .symcache files, and only then launch WPA. UIforETW does all this automatically, which saves about 24.5 minutes each time I profile a new version of Chrome. If you don’t have the Chrome symbol server enabled then the StripChromeSymbols.py script does nothing. This problem could occur on any large symbol files. The stripping step can be manually run from the traces-list context menu. This script now also fixes a bug in the 10.0.14393 (Windows 10 Anniversary Edition) that stops it from downloading symbols from Chrome’s symbol server.
  3. In 2015 I hit a problem when recording ETW traces. During the merge step I found that xperf.exe was spending all of its time doing uncached reads from Amcache.hve. This added a six minute delay to recording each trace! To avoid this UIforETW tries to rename this file before running the merge step, and then rename it back at the end.
  4. I’ve also recently hit a bug where when recording a trace from circular memory buffers the xperf.exe process does millions of one MB reads from a relatively small file. These reads are fully cached, but the huge volume of reads still causes multi-minute hangs. I have no workaround for this other than to trace to a file. Let me know if you hit this bug so that we can figure out what triggers it.
  5. WPT 10 fails to record dbgid information on Windows 7 so UIforETW automatically invokes the WPT 8.1 version of xperf on Windows 7.
  6. WPT 8.1 can’t change the CPU sampling frequency on Windows 10 so UIforETW automatically uses the WPT 10 version of xperf for this on Windows 10.

None of these bugs affect everybody, but many people are affected by some of them. If you hit one of these and UIforETW doesn’t work around it then UIforETW may print some helpful suggestions, or you can record a trace of xperf recording a trace using metatrace.bat.

Needed features

  1. Currently the only supported form of trace compression is what is built in to Windows 8 and above. If you are recording or viewing traces on Windows 7 you are out of luck. It would be great if somebody implemented the context-menu options to compress traces into .zip files. The idea would be to compress each trace into a .zip file with the same name and then delete the .etl file. The .zip file will still show up in the trace list, and UIforETW would need to automatically extract the trace on demand
  2. UIforETW is built with MFC and, in order to keep the binary small, it uses the DLL version of MFC. This means that it won’t run on any machines that don’t have these DLLs installed. A DLL installer is needed and I haven’t gotten around to writing it. This was solved by using github releases
  3. Unit tests, especially of the path and split functions, would be nice
  4. More options to control buffer sizes and tracing options would be nice
  5. A configurable option to upload compressed traces to a web server is proven to be useful
  6. Other issues listed on the github issues page

Contributions will need code-reviews and approval, and contributors must sign the Google Individual Contributor License Agreement – see the CONTRIBUTING file for details. Apologies for the red tape, and thanks to those who have contributed already (see the AUTHORS file and the history).

In closing

I’m very pleased with this project. It came together quickly and it solves the problems that I care about. I hope that it is useful for others.

As always, if you want to learn more about how to use ETW I recommend my series of training videos.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048
This entry was posted in xperf and tagged , , . Bookmark the permalink.

64 Responses to UIforETW – Windows Performance Made Easier

  1. David Poole says:

    Brilliant stuff, is this reworking of your vtrace work at Valve or just a start again and do it properly?

    Every time I use WPR I always check your blog for more details, wondering why my computer and Chrome tabs were taking ages to load recently, turned out to be 2 Mcafee AV .sys files and ReadyBoost on a usb drive, all thanks to you Bruce!

  2. Nico says:

    Nice work! Can UIforETW be minimized (either to tray or regular minimize)? I can’t seem to figure out how with the latest binary from GitHub.

    • brucedawson says:

      That should just require a change to the window flags to enable the minimize button and menu. The code the resizes the window might not handle that case — I’m not sure. Open an issue or propose a fix?

  3. Trass3r says:

    How do I see or change the providers to be recorded?

    • brucedawson says:

      To see the providers make sure that Show commands is checked. As for changing the UI for that has not been written. I would imagine a grid of check-boxes for kernel providers (with some buttons to set various defaults) and a way to enter arbitrary user-mode providers. It really depends on what flexibility is needed, because there’s flexibility also means potential complexity.

  4. Nico says:

    In the “always-on” mode, I see quite a bit of perf impact. For example, opening a regular command windows and performing a directory listing (dir) command, the output is visibly laggy.

    I’m suspecting this is due to many OS kernel calls (a quick WPA trace inspection indicates calls into ntoskrnl.exe through kernel32.dll). I’m assuming this comes with a context switch (?) and so I found that turning “Context switch call stacks” removed the lag.

    I do a lot of command-line work, so this pretty much prevents me from turning on context switch call stacks in the “always-on” circular buffer mode. Is there an inbetween setting that I could use to do some “light” idle-thread-analysis when necessary?

    • brucedawson says:

      Interesting. While some overhead is expected I have never seen it to be perceptible, and I have left tracing on for thousands of hours.

      Conveniently enough, the sampling profiler records *everything* means that it will show you where the overhead is coming from. Typically it will show time spent in call-stack collection in SetEvent or WaitForSingleObject.

      Also, since the overhead is usually just proportional to the number of context switches you can use the CPU Usage (Precise) data (sorted by count) to see where the context switches are. Maybe some code is doing something foolish.

      When “Context switch call stacks” is checked then “CSWITCH+READYTHREAD” call stacks are recorded. CSWITCH records a stack on the thread being switched in, and READYTHREAD records a stack on the thread that readied that thread. READYTHREAD is needed less frequently, so you could see if removing it helps.

      But, unless you have a slow computer there really shouldn’t be lag. Maybe share a pair of traces of “dir /s”, one with and one without context switch callbacks.

      • Nico says:

        Looks like this could be related to ConEmu using the client/server subsystem (csrss.exe) to communicate to cmd.exe? I get a lot of context switch counts there. I couldn’t reproduce the issue anymore in a vanilla cmd shell.

        In any case, I’ve put with/without context switch call stack traces here: https://dl.dropboxusercontent.com/u/477936/etwtraces_conemu_dirs_stacks.zip

        • brucedawson says:

          I took a look at the traces. Here are a few things I noticed:

          The trace without stacks did not capture the entire scenario. The CPU usage only covered about 19 s. You should shorten the scenario so that it fits — do a “dir /s” on a smaller directory. That way the full scenario can be compared in both traces.

          Ideally use a batch file and “xperf -m Starting dir /s” to mark the start of the scenario, and ditto for the end. It would also be good to know the number of files being listed, for better context, and how long it takes without tracing, with light tracing, and with full tracing.

          On the no_stacks trace I see about 10,000,000 context switches over the 19 s of activity recorded, so 250,000 per second (to cmd.exe, conhost.exe, and csrss.exe). That is enormously excessive. That is your problem. That is bad behavior. That is also why the trace covered such a short time period — a high data rate.

          The context switch rate is orders of magnitude lower in the trace with context switch call stacks. The sampling data didn’t clearly attribute it to the context switch call stacks, but if you try doing 250,000 context switches per second then you have to expect performance issues, and recording two call stacks on each context switch will dramatically harm perf.

          Equally worrisome is that your system never goes idle. On the no_stacks trace I see ~30,000 context switches per second to the three processes listed above). I just tested on my machine (a trace recorded while typing this and while WPA was loading symbols) and it shows an order of magnitude fewer context switches. A truly idle process should have zero context switches, or perhaps a few hundred for background activity.

          I didn’t do a detailed analysis but my off-the-cuff analysis is that ConEmu64 is doing crazy things and it should stop. If it can’t go completely idle when idle and handle “dir /s” more sanely then I would point this out to the developers.

  5. Pingback: Profiling the profiler: working around a six minute xperf hang | Random ASCII

  6. Pingback: Xperf Basics: Recording a Trace (the easy way) | Random ASCII

  7. Pingback: ETW Trace Compression (and xperf syntax refresher) | Random ASCII

  8. Pingback: ETW Heap Tracing–Every Allocation Recorded | Random ASCII

  9. I realize you are already heavily invested in WPR/WPA and your own UIforETW, but it may still be useful for you to check out the ‘PerfView’ tool (web search will find it), as well as my blog entries on it). Like UIForETW it tries to simplify the collection of ETW (but it is a viewer as well), but its real goal diverges a bit from yours because it cares a lot about profiling .NET code well. Still, it will collect ETW traces that WPA or your tools will consume (it event Zips them as you suggested), and knows how to turn on the heap allocation events as well. Since it deploys as a single XCOPY EXE, it is MORE convenient than WPR if you need to collect on a ‘foreign’ machine (even if you decide to view the data with WPA). It also has some nice features for monitoring the trace and stopping when something ‘interesting happens (see /StopOn* options). You can stop on Exceptions, requests taking too long, ETW durations being too long or simply ETW events. Also it has some rather unique analysis features (grouping and entry point grouping), which can make analysis (especially diffing), quicker and more fruitful. WPA certainly makes prettier graphs, but the grouping features of PerfView are why I keep using it first (it also has a more sane symbol loading policy (lazy unless local)). Finally This is all built upon a library (TraceEvent (.NET)) that is available via a Nuget package so you an build your own high performance analyzers with relative ease.

    I may be worth your while to take it for a whirl. There are a bunch of tutorial videos, and you can be experimenting with it in literally a minute from now.

  10. Pingback: Windows Timer Resolution: Megawatts Wasted | Random ASCII

  11. Roy says:

    Bruce,
    This is an excellent tool.
    Question: the scenario we are currently facing is a bit more complicated. We are trying to pinpoint the cause behind sudden CPU spikes that pretty much bring the machine to a halt. Something gets stuck in a loop and starts consuming all the CPU cycles, which causes black screens that can only be “fixed” by a reboot. We were thinking of running a trace to record the call stacks (we have a few processes in mind one of which is likely the culprit), but the problem is the trace data would get corrupted / remain unflushed if the server is forced to shut down abnormally during the reboot. So the only way to achieve this, per my knowledge, is by running the trace remotely. Is it something that is possible to do with WPR?

    • brucedawson says:

      That sounds nasty. A ‘normal’ CPU spike should never render the machine unusable, so either there is some horrible lock-contention with priority-inversion on system locks, or else some sort of weird driver bug.

      A couple of options come to mind:
      1) Customize UIforETW so that it can detect this odd behavior and can flush the trace to disk when it detects it, hopefully before you reboot the machine.
      2) Customize UIforETW so that it is always tracing to disk, with a new trace being started every 60 seconds or so. You could always get it to only retain the last 20 traces. That way when the problem happens you should just have to wait five minutes and then be sure that a few traces have been saved.

      But, if the machine is in a bad enough state then none of this will work because the trace won’t get saved.

      Hooking up a kernel debugger from another machine may be your best bet. Not fun, but effective.

      The only time I hit CPU consumption that rendered my machine unusable was in this case:

      You Got Your Web Browser in my Compiler!


      It wasn’t actually the CPU consumption that was the problem, but the priority inversion in the Windows locks, and the heavy hammering of those locks.

  12. Roy says:

    Thanks! Like the idea of creating a bunch of 60 second long traces. Will try that as the next attempt at fixing it.
    Thanks for the link as well. Very useful.

  13. brucedawson says:

    Another option for analyzing ETW traces is the Media eXperience Analyzer, or MXA. This tool apparently allows you to have multiple streams of data (perhaps video, audio, or power measurements) synchronized with the ETW data stream. I have little experience with this tool – and it is very different from WPA – but it could be important in some scenarios.

    https://www.microsoft.com/en-us/download/details.aspx?id=43105

  14. Pingback: Graph All the Things (Using WPT 10) | Random ASCII

  15. Pingback: New Xperf and new WPA in the new WPT | Random ASCII

  16. Pingback: Xperf Basics: Recording a Trace (the ultimate easy way) | Random ASCII

  17. Jon Grah says:

    If I had a Win2008R2 sp1 64 bit vps server that was used for derivatives trading, could I use this UIforETW to help see where potential performance issues are coming from? How would i enable compression? I have WinRar 4.x or 5.x installed

  18. Pingback: ETW Central | Random ASCII

  19. Pingback: Process Tree from an Xperf Trace | Random ASCII

  20. Pingback: Xperf for Excess CPU Consumption: WPA edition | Random ASCII

  21. Jack says:

    How can I use this to do Shutdown or Boot traces?

    • brucedawson says:

      ETW can record shutdown and boot traces, but UIforETW has no support for them.

      • Jack says:

        Guess I’ll have to stick with WPRUI then. I know you mention WPR has issues that UIforETW trys to fix. Is there anything you can still use in UIforETW if only doing shutdown or boot traces?

        • brucedawson says:

          I find WPRUI’s UI and workflow annoying, and it didn’t let me configure the things that I want. I’m not sure if any of those bugs apply particularly to shutdown or boot traces.

          Since I use UIforETW as my trace management UI I would copy any traces I recorded with WPRUI to the UIforETW trace directory, since otherwise I won’t even see them. But that’s just me.

          • Jack says:

            Do you know what the main differences are between WPR, Xperf, Logman, etc? They all seem to be “Controllers” for ETW. Is any one more powerful than the others? Is WPR just a wrapper for Xperf and Xbootmgr (for boot traces)?

            • brucedawson says:

              I don’t know much about logman, but I think you are correct that these (and UIforETW) are all just controllers for ETW. ETW is baked into the OS, and these tools turn on the features of interest and direct the data to a file for later examination.

              xperf is probably the most powerful, but it is possible that some capabilities are only available to other tools.

  22. Tomer Ben Arye says:

    Hi Bruce, Please try to help me here with an analysis I’ve made.
    http://stackoverflow.com/questions/39557879/etw-drivers-match-d3d-present?noredirect=1#comment66431291_39557879

    Thanks for your time

    • brucedawson says:

      I don’t know where the “count of stutters” data is coming from. I see that you’ve uploaded two ETW files but without knowing more about them – i.e.; how they were recorded, what was happening when you recorded, what driver versions you were testing, etc., it is difficult to do any analysis.

      One thing to look for would be CPU usage inside the NVIDIA drivers, in the process being profiled. There will usually be a couple of graphics driver threads, and some driver overhead in other threads.

  23. Saurabh says:

    It is really a nice tool. But would like to know if we have this tool for ARM64 ?

    • brucedawson says:

      UIforETW would need a few tweaks to get it to build and run on ARM64, but it shouldn’t be anything too difficult. The assembly language files need to not be compiled, and the code that calls them needs appropriate #ifdefs. And I have probably made some x86 specific assumptions about paths. But, it should be pretty straightforward.

      Pull requests welcome. See the CONTRIBUTING file for details.

  24. Jeffrey Glenn says:

    Every time I try to save a trace I get “xperf: error: C:\Users\user\Documents\etwtraces\2017-01-28_14-27-18_user.etl: The system cannot find the file specified. (0x2).
    Process exit code was 80070002 (2147942402)”

    • brucedawson says:

      What type of tracing are you using? Circular buffer tracing or tracing to file? If you turn on ‘show commands’ and include the full output then that will be helpful because right now I don’t know what command is failing.

      A setting to “Use other kernel logger” is available if you download and build UIforETW – it hasn’t yet made it into one of the releases. This option might help. But, I don’t know. ETW failures can be quite cryptic.

      • Jeffrey Glenn says:

        Circular – here’s the full output “xperf.exe -flush “NT Kernel Logger” -f “C:\Users\user\AppData\Local\Temp\kernel.etl” -flush UIforETWSession -f “C:\Users\user\AppData\Local\Temp\user.etl”
        The trace you have just captured “C:\Users\user\AppData\Local\Temp\kernel.etl” may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
        The trace you have just captured “C:\Users\user\AppData\Local\Temp\user.etl” may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
        Trace save took 2.2 s
        Merging trace…
        xperf.exe -merge “C:\Users\user\AppData\Local\Temp\kernel.etl” “C:\Users\user\AppData\Local\Temp\user.etl” “C:\Users\user\Documents\etwtraces\2017-01-28_21-26-12_user.etl” -compress
        xperf: error: C:\Users\user\Documents\etwtraces\2017-01-28_21-26-12_user.etl: The system cannot find the file specified. (0x2).
        Process exit code was 80070002 (2147942402)
        Trace merge took 0.2 s
        Finished recording trace.”

        • brucedawson says:

          The crucial information from the verbose output is that the error occurred during the merge process. The “xperf.exe -flush” command saves the contents of the kernel and user-mode buffers to disk, to user.etl and kernel.etl. Then “xperf.exe -merge” merges those two trace files into the final result. It also grabs a bunch of machine information as well – it is a very important step. But, it should also be a very reliable step – it’s just combining to ETW files that were previously saved.

          The error message suggests that the *output* file cannot be found. That’s odd. I don’t know what that means.

          I’d try running procmon while recording the trace in order to see what is happening. Also make sure that the ‘etwtraces’ directory exists.

          You should also upgrade to the latest version of UIforETW. The ‘kernel.etl’ and ‘user.etl’ file names are dangerously generic and can conflict with other ETW tracing loggers. The latest versions (since Nov 20, 2016) use UIForETWuser.etl and UIForETWkernel.etl.

          • Jeffrey Glenn says:

            etwtraces is there, but it is empty. I’m running the latest version as far as I know, listed here: https://github.com/google/UIforETW/releases. I have no experience with procmon but I’ll look into it

            • brucedawson says:

              I’m afraid I can’t tell what is going on. However, I have a few suggestions:

              One idea would be to switch from Circular buffer tracing to Tracing to file. Maybe that will work better.

              Another suggestion would be to run the command invocations that are displayed when “Show commands” is checked in an elevated command prompt. Make sure that UIforETW is not running and that xperf is in the path, and manually run the commands, one at a time. This should make it easier for you to diagnose the problem.

              In particular, you mentioned that this command was failing:

              xperf.exe -merge “C:\Users\user\AppData\Local\Temp\kernel.etl” “C:\Users\user\AppData\Local\Temp\user.etl” “C:\Users\user\Documents\etwtraces\2017-01-28_21-26-12_user.etl” -compress

              So, check if the kernel.etl and user.etl files exist. They are supposed to be created by the previous -flush command, and consumed by the -merge command.

              My best guess would be that anti-virus is messing with the files, but I really don’t know.

          • Jeffrey Glenn says:

            Here’s the output associated with attempting to save trace buffers: http://pastebin.com/raw/hW806PCL

  25. Mike Conley says:

    Does that procmon output help give you any idea of what Jeffrey Glenn is running into, brucedawson?

  26. Taliesin says:

    I’ve watching your Wintellect videos and see you have code for an ETWProvider demo. Where can I download the code? We’re beginners trying to implement ETW for our code and would like to create custom events. Since it’s been a couple of years since you’ve created the videos, is this still the best starting point or do you have updated instructions/demos for creating custom ETW providers? Thanks

  27. Andrew says:

    Looking into an issue that takes variable time (hours!) to reproduce (BSOD crash). I started trace with circular buffer and retrieved it via WinDbg !wmitrace (or mem dump). But when opening the trace, I got the error about time-inversion, unless I do “-tti” on cmdline for WPA. Now the question is when I open the trace with “-tti”, how is the timeline of the trace shown in WPA? If I want to look at the last moment of the trace (before BSOD), will that be the at the end of the trace timeline in WPA, or somewhere in the middle of the timeline? If the later, is there a sure way to identify when is the real last moment?

    • Andrew says:

      Another question is I’m getting “” for all the symbols load, so no useful symbols at all, from ETL retrieved from !wmitrace.logsave. Is there anyway to fix this?

      • brucedawson says:

        When I’ve used -tti I’ve usually got good results, but you are in uncharted territory. A trace that is retrieved from a memory dump sounds like it could be arbitrarily truncated or broken – I don’t have any help to offer, sorry.

  28. shang says:

    how can i monitor a specific processes for a long time?

    • brucedawson says:

      You can’t. ETW always monitors the entire system. You can ratchet down what it records (see https://randomascii.wordpress.com/2016/03/08/power-wastage-on-an-idle-laptop/) to allow for longer recordings, but UIforETW offers no way to focus the recording on just a few processes. ETW’s power is that it monitors the entire system, including the important interactions between processes.

      If you use the ETW APIs then you could filter out events that you aren’t interested in but this would be extremely complicated – ETW’s APIs are famously complicated.

  29. Paul says:

    If I don’t have symbols, I see “Missing ImageId event” or “?”. How can I see the raw address of the functions in the stack trace? Nothing I tried worked. Thanks.

    • brucedawson says:

      Which table are you looking at? For the CPU Usage (Sampled) you can add the address column to see where specific samples hit, but it looks like WPA doesn’t show you addresses for the call stack entries. I think it used to, but no more.

      You could try exporting the data with xperf – see “xperf -help processing” for help on this. It’s a bit cryptic but it should be possible. You could just export all of the data but expect to send some time writing code to parse it.

      Even better would be to write a C# trace parser. I wrote about this last year: https://randomascii.wordpress.com/2020/01/05/bulk-etw-trace-analysis-in-c/
      That should work and will be much more efficient than exporting the results to a .csv file.

      But really you should make sure you have symbols. There’s really no reason to suffer without them. Chrome and Microsoft publish their symbols, as do some other projects. Python makes it difficult, but not impossible. Make sure you archive your own symbols in a symbol server.

  30. Glen Stevens says:

    Hi, I am trying to capture a trace to disk for a long duration workload, approximately 19 minutes long. However, the tool states ‘Tracing to disk ran excessively long, auto-saving and stopping’.

    Is there a way to bypass this notice and allowing for a longer trace collection? Apologies if this is listed somewhere, I could not find it.

    Thanks!

    • brucedawson says:

      Good question. There is no way built-in to the tool to avoid the five-minute limit. You could build UIforETW from source and change (or remove) the limit or else use one of the batch files that come with UIforETW, such as:
      https://github.com/google/UIforETW/blob/main/bin/etwrecord.bat
      Also, UIforETW will show you the commands executed if you check the Show commands button which can help with creating your own batch files.

      • Glen Stevens says:

        Hi, another question. 🙂 I was given a .wprp file to add to WPRUI and collect a trace. This is easy enough with WPRUI but as I mentioned previously I need to collect a trace for a long duration. How can I add the .wprp file to run with UIforETW since this tool will compress the trace?

        • brucedawson says:

          UIforETW doesn’t support .wprp files at all. I’ve seen .wprp files that enable trace compression in WPRUI but I don’t remember where.

          Here’s a trick though. Record the trace with WPRUI. Then copy it to UIforETW’s directory, select it, right-click, select the Trace compression menu, then click ETW compress trace. Boom. Done. If you check Show commands then you can even see the xperf commands I use to compress the trace.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.