ETW traces record a wealth of information about how a Windows system is behaving. When analyzing a new and unknown problem there is no replacement for loading the trace into WPA and following the clues to a solution. The thrill of the hunt and the creative challenge of finding a visualization that will reveal the root cause never gets old (too nerdy? Sorry – I do enjoy this).
But sometimes you want to extract some commonly found piece of information from multiple traces, and doing this manually is tedious and error prone.
Initially the only solution was to run xperf actions (xperf -help processing) and parse the output. This works well enough, but only if there is an action which maps well to the information that you want.
The next option was wpaexporter, which I wrote about here. With wpaexporter you can set up a summary table the way that you want it and export the data to a .csv file. This is infinitely flexible, but setting up the WPA profiles is a bit fussy, setting up .json files (needed if you want to efficiently export multiple profiles) is even worse, and .csv processing is something I’d rather avoid.
Finally, we have something better.
In May of 2019 Microsoft announced Trace Processor Preview 0.1.0 (latest is currently TraceProcessor 0.3.0). This is a NuGet package (Microsoft.Windows.EventTracing.Processing.All) that makes it easy to write robust and efficient ETW trace processing code in C#. I’ve used this for experimental projects and real work and found it to be an easy way to do programmatic analysis of ETW traces.
UIforETW ships with some trace analysis scripts that use the old-school xperf actions so one of my first experiments was to rewrite one of these using the Trace Processor Preview. I chose to rewrite IdentifyChromeProcesses.py in C#.
I had to relearn C# (let me know if I’m doing anything wrong) and learn the trace processing API (quite easy due to Intellisense, despite minimal documentation) but overall the project went smoothly.
The basic pattern for analyzing a trace is that you start by calling TraceProcessor.Create:
using (ITraceProcessor trace = TraceProcessor.Create(traceName))
Then you call trace.UseXXX() once for each type of data you want to use, which returns an IPendingResult<XXX> type. There are dozens of types of data available, all listed on the announcement page and available through Intellisense. Then you call trace.Process() to scan the trace and read in all of the requested data, and then you grab the results from all the IPendingResult objects – like this:
var pendingSchedulingData = trace.UseCpuSchedulingData();
var pendingSamplingData = trace.UseCpuSamplingData();
// Convert from pending to actual data.
var schedulingData = pendingSchedulingData.Result;
var samplingData = pendingSamplingData.Result;
Then you iterate through the data in whatever way you want, something like this:
var execTimes = new Dictionary<IProcess, CPUUsageDetails>();
// Scan through the context-switch data to build up how much time
// was spent by each process.
foreach (var slice in schedulingData.CpuTimeSlices)
// Yep, this happens.
if (slice.Process == null)
execTimes.TryGetValue(slice.Process, out CPUUsageDetails last);
last.ns += slice.Duration.Nanoseconds;
last.contextSwitches += 1;
execTimes[slice.Process] = last;
Many ETW events can contain call stacks. This includes the context switch events in the CPU Usage (Precise) data, the samples in the CPU Usage (Sampled) data, the allocations in heap snapshots, and potentially many more events. These stacks can be crucial for understanding the data, and Trace Processor supports handling these call stacks. Following the pattern of other data types supported by Trace Processor you invoke trace.UseSymbols() to indicate that you want to load the symbols. After calling trace.Process() and retrieving the pending symbols results you then need to load the symbols with LoadSymbolsAsync followed immediately by waiting for the results (so it’s not really async), like this:
var pendingSnapshotData = trace.UseHeapSnapshots();
var pendingSymbols = trace.UseSymbols();
var snapshotData = pendingSnapshotData.Result;
var symbols = pendingSymbols.Result;
After you have done this then you can traverse the stacks of events and call GetAnalyzerString() or similar on the stack or on individual entries. I’ve published sample code for comparing heap snapshot traces. A pair of heap snapshots have also been made available in order to test the code.
When doing the simplest type of report – just listing a Chrome process tree – the Python script was consistently about 20% faster. However for the more complex reports – listing the CPU usage for each process – the C# tool was typically three to eight times faster! The simple reports are very quick to generate either way, so the minor Trace Processor performance regression doesn’t much matter, but the significant Trace Processor speedup on the more expensive reports is quite welcome.
In the CPU-usage case the Python script is forced to process the trace twice – once for command-line information and once for CPU usage data – whereas the C# code only has to process it once. However that isn’t enough to explain all of the speedup. It may be that the time to generate and then parse large quantities of text is the difference, but it ultimately doesn’t really matter.
I tested the new tool on all 600 ETW traces that I have squirreled away to make sure I was handling all of the edge cases. The process-tree results that I get are identical (except in some truly weird traces that somehow ended up with impossible process hierarchies) but in the CPU Usage case there were small discrepancies. After a bit of discussion with the team it was determined that this was due to threads that were running at the very beginning and end of the trace. This difference was probably avoidable but it was so tiny I didn’t bother.
The ultimate benefit to using the trace processing API is that you can explore the data anyway that you want, without having to parse multiple .csv files. You can even process traces with lost events or time inversions (but be wary of weird results) which wpaexporter won’t let you do – see the sample code link below for examples.
Things to watch for
Be aware that the first time you use Trace Processor there will be a message printed out. In my case this message was ending up in my output, which I don’t like, but the Trace Processor team has added a property to TraceProcessorSettings called SuppressFirstTimeSetupMessage that can be used to stop it from printing.
Some of my traces are huge and analyzing these traces – whether with WPA or with the trace processing system – can consume a lot of memory, especially during loading. Making sure you have lots of memory is crucial – my laptop has 32 GB – but you also need to make sure that your C# code is running as a 64-bit process. Unfortunately the default with Visual Studio’s .Net Framework is to run C# code as a 32-bit process, with a maximum address space of 4 GiB. Therefore when you create a new trace processing project you should go to the project properties, Build tab, and uncheck Prefer 32-bit.
The CPU usage summaries of IdentifyChromeProcesses.cs are enough faster that I’m tempted to switch to the C# tool just to get that benefit. Switching to C# would also avoid having a dependency on Python being installed, which is far from guaranteed on Windows. And, the trace processing API feels like a more powerful way to do bulk trace processing.
Now when I want to do some repetitive analysis of ETW traces I reach for the Trace Processor package and I can usually create a custom analysis tool extremely quickly.
Some day I want to add a feature to UIforETW that will run a tool or script on multiple traces, to look for newly discovered patterns in old traces, and the performance and flexibility of the trace processing API make that more tempting.
The best way to learn the Trace Processor API is to look at a few samples for the basic patterns, look at the available types of data, and then start writing code.