ETW Trace Compression (and xperf syntax refresher)

Despite extolling the virtues of wprui for recording ETW traces (here, and here) I’ve actually returned to using xperf.exe in batch files to do most of my trace recording. It gives me more precise control over what is recorded, and where, and with Windows 8+ it has another advantage: trace compression!

As usual the trace compression feature is lightly documented so I’m going to explain it here, and while I’m at it I’ll explain a bit more about recording traces with xperf.

Update: I eventually gave in and wrote a UI for recording ETW traces. You can read more about it at the UIforETW announcement. It includes a check-box for enabling trace compression, which is on by default where supported.

xperf syntax

More than three years ago I blogged about how to record ETW traces using xperf.exe.

Xperf Basics: Recording a Trace

I don’t want to repeat the material about choosing providers, stackwalk, or buffer-size settings, so I will just present a simplified version of the syntax shown there and explain how to improve it. It takes five lines to set kargs and uargs (kernel and user arguments) and then two lines to start and stop tracing:

set kargs=Latency+POWER+DISPATCHER+FILE_IO+FILE_IO_INIT
set kargs=%kargs% -stackwalk PROFILE+CSWITCH+READYTHREAD
set kargs=%kargs% -buffersize 1024 -minbuffers 300
set uargs=Microsoft-Windows-Win32k+Multi-MAIN
set uargs=%uargs%+Multi-FrameRate+Multi-Input+Multi-Worker

xperf -on %kargs% -start gamesession -on %uargs%
@rem Run your scenario here
xperf -stop gamesession -stop -d test.etl

This syntax is confusing because it has two “-on” commands in one line, and two “-stop” commands in another. These separate commands start/stop the kernel providers, and the user providers, and the final stop-step also merges (-d) the kernel and user traces. While the compactness of these one-liners for starting and stopping tracing is compelling, it actually prevents using trace compression. So, let’s start by expanding this into five commands (setting environment variables omitted):

xperf -on %kargs% -f kernel.etl
xperf -start gamesession -on %uargs% -f user.etl
@rem Run your scenario here
xperf -stop gamesession
xperf -stop
xperf -merge kernel.etl user.etl test.etl

As described in Step 8 of my original recording-a-trace article this starts kernel tracing, then starts a new session with the specified user providers. Then it stops the sessions in reverse order, and then merges the user and kernel traces together.

Because the merge step is now explicit instead of being implied by the -d option of the stop command we now have to specify what files to use for our kernel and user session. We specify these temporary files when we start recording, and use those names in the merge step.

We could use more sessions and more files but I see no advantage to going beyond two.

But don’t think of specifying the temporary files as being a burden – think of it as an opportunity. Now we can more explicitly place the files in the best location – on an SSD, in a directory that is excluded from virus scanning. This can be very important for making traces record efficiently.

And now, anticlimactically, we can finally compress the traces. We do this by specifying the -compress option to the -merge command, like this:

xperf –on %kargs% -f kernel.etl
xperf -start gamesession –on %uargs% -f user.etl
@rem Run your scenario here
xperf -stop gamesession
xperf -stop
xperf -merge kernel.etl user.etl test.etl -compress

On Windows 8 and above the compression will reduce the size of trace files significantly – typically by a factor of five to ten.

It is important to understand that the merge step must be done on the machine where the trace is recorded. The merge step (which is needed even if you only have one trace file, either explicitly or through “-d” on the stop command) grabs information about the loaded code (DLLs and EXEs) that is required for symbol loading. The merge step is mandatory.

You can even compress trace files after you have recorded them by going “xperf -merge oldtrace.etl newtrace.etl -compress”. I don’t know how to tell if the input trace has already been compressed other than trying and then seeing whether the result is smaller. Other than minor expansion I haven’t noticed any side-effects from recompressing compressed traces.

On Windows 7 and below the -compress option will have no effect. So, it is safe to always specify it.

In fact, the only time you shouldn’t specify -compress is if you are recording a trace on Windows 8+ and you want to analyze it on Windows 7-, because Windows 7- can’t load compressed traces. And here’s some typical results:

image

Missing wprui features

As long as I’m pointing out the advantages to not using wprui I might as well make a list of features that I think wprui needs in order to tempt me back:

  • Trace compression option
  • Control over where temporary traces and final traces are stored, to allow using fast drives and virus-scan excluded directories
  • Built-in optional ETW input logger for easier trace analysis
  • List of recorded traces for easy launching of the trace viewer on old traces
  • Better documentation of what the different options include (especially since you can’t run “wpr -profiledetails generalprofile” when wprui is running!)
  • Filenames should be chosen so that they sort correctly by date (year-month-day, not month-day-year)
  • The shortcut used to trigger saving a trace should be displayed
  • Keyboard mnemonics (underlined characters) on buttons
  • A way of specifying a maximum total buffer size, since the default of 5% is insane on machines with 64 GB of memory!

My own wprui GUI replacement had most of these features and also supported annotating traces with descriptions of what I discovered about them and had various other features that I found convenient to have.

Potential gotchas

Whether recording traces with xperf or wprui, there is a problem that you may hit that can cause the merge step to take many minutes. That is the subject of my next post.

Advertisements

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8.
This entry was posted in Performance, Programming, xperf and tagged , . Bookmark the permalink.

17 Responses to ETW Trace Compression (and xperf syntax refresher)

  1. Pingback: Profiling the profiler: working around a six minute xperf hang | Random ASCII

  2. Although not quite as effective, NTFS compression is worth mentioning.

    One trace file on hand, is 2.84GB uncompressed, and 868MB “Size on disk” compressed.

    Assuming the fragmentation isn’t an issue, you can get ~3x compression without xperf/application support.

    I recall (but can’t find) NTFS compression being absurdly effective on certain Hashcat restore files, like in the range of 1000x! If I remember correctly, Hashcat was just filling a file with whitespace, and NTFS compression saw this. The “Size on disk” was somewhere in the range of a few kilobytes.

    Ntdebugging has a great article on this: http://blogs.msdn.com/b/ntdebugging/archive/2008/05/20/understanding-ntfs-compression.aspx

    Which is also mirrored in my repository of all things NTFS: https://github.com/ariccio/altWinDirStat/tree/master/filesystem-docs-n-stuff

    • brucedawson says:

      Yep, NTFS compression works pretty well for storing the traces efficiently. It doesn’t help when you want to share a trace — uploading, downloading, copying to a memory stick, etc.

      The ETW trace compression is good enough that it makes me happy that I installed Windows 8 on my work machine. Well, that plus multi-mon remote desktop.

  3. IDontWantAName says:

    does wpr -stop also have a compress option?

  4. Kyle Sabo says:

    In your .wprp file, under the node, add the following lines:

    This may or may not work with the Win8 version of WPR, but it should definitely work with the Win8.1 release.

    • Kyle Sabo says:

      Well, I guess WordPress doesn’t like XML… let’s try an escaped version:

      <TraceMergeProperties>
      <TraceMergeProperty Id="BaseVerboseTraceMergeProperties" Name="BaseTraceMergeProperties" Base="">
      <FileCompression Value="true" />
      </TraceMergeProperty>
      </TraceMergeProperties>

    • Hmmmm…..they just disappeared, instead of (your comment) failing to post with an error?

      I wonder if the server intentionally drops the XML, or tries to parse it and logs a server side error? Smells like a bug in WordPress’s comment-text-validation routines.

  5. Kyle Sabo says:

    Also, for machines with lots of memory, try using the provided “sample” profile for servers. It limits the buffer size for traces to something reasonable. It most likely installed at C:\Program Files (x86)\Windows Kits\8.1\WindowsPerformanceToolkit\SampleGeneralProfileForLargeServers.wprp

    I think the tl;dr is most everything XPerf can do, WPR can do, iff you have a degree in WPR Profile XML. The UI doesn’t expose any of this for reasons unknown to me.

    You can also adjust the directory where temporary trace files are recorded to with
    -recordtempto passed to wpr.exe (again, no luck with WPRUI as far as I know).

    • brucedawson says:

      Thanks Kyle. That’s all very helpful information.

      It is, of course, not as convenient as a checkbox, slider, and file-dialog, but it’s nice to know that these configurations are possible. I can only assume that the UI designers wanted to keep it sleek and simple. Sigh…

      Aside: when I switched to Windows 8.1 and wanted to figure out how to do trace compression I did a few web searches. The first hits were my blog, which was weird because I knew I hadn’t written about it. But the information was there, in a comment from you. So, your comments are provably valuable.

  6. JohnW says:

    Bruce, you allude to ‘your own WPRUI replacement’ in the past tense. That sounded interesting; if you’re comfortable sharing what happened, we’d be interested.

    • brucedawson says:

      My WPRUI replacement is a tool called VTrace that I wrote while at Valve. It was used internally (as a more convenient way of recording ETW traces in the way that we wanted) and we also shared it with customers (as needed), as a Steam application. It included an “Upload Trace to Valve” button which would compress, encrypt, and upload traces for later analysis.

      It was a pretty ugly MFC app, but worked nicely. It just shelled out to xperf.exe in order to control tracing. I do plan to rewrite it someday. If anybody wants to help…

      Here’s an article about it:

      https://support.steampowered.com/kb_article.php?ref=3059-RTXB-2672

  7. Pingback: UIforETW – Windows Performance Made Easier | Random ASCII

  8. Pingback: ETW Central | Random ASCII

  9. “Lightly documented”, yes, and that’s just for the practical use from a high level of Microsoft’s tools. Get down to the programming and “lightly documented” is at best a euphemism. ETW is a wonderful thing to have, but its advanced features are very much Microsoft’s private plaything.

    Of course, many things that are not just lighly documented but undocumented about software are so because they truly do have very limited use to anyone but the software’s manufacturer. Yet from your article, I infer that a) you see merit in someone (you, even) writing your own ETW tools and b) that “xperf -merge” has proved useful for real-world practice. Great, so how might competent, well-informed programmers write the merge feature into their own tools?

    It shouldn’t be hard. After all, the merge feature has been around since Windows XP (but in Microsoft’s tracerpt tool rather than xperf). Would it surprise you to know that building it into your own tool would require that you know not just one but two undocumented things?

    First, you should know that what happens when you merge ETL files is that you run a tracing session whose inputs are events from other tracing sessions. You are just relogging them. I get the term relogging because your tracing session must be started with the EVENT_TRACE_RELOG_MODE. What does Microsoft document of this mode? A cryptic bit about “logs the event without including the EVENT_TRACE_HEADER” and a note that it is “reserved for internal use.”

    But ignore the warning and persevere with trying to use relog mode so that your own tool can do a merge. Then your second problem is how to get the raw data for the input events and trace them into your output session. You’ll retrieve the events through a callback function that presents each event to you as an EVENT_TRACE or EVENT_RECORD. What you need to know is that you can feed the EVENT_TRACE to the TraceEvent function, but you have to set WNODE_FLAG_NO_HEADER so that what gets traced into the output session is not the EVENT_TRACE_HEADER at the start of the EVENT_TRACE but the raw data for the event as described by MofData and MofLength. Good luck with finding anything about WNODE_FLAG_NO_HEADER except its numerical value – not just from Microsoft but from anyone. It’s declared in WMISTR.H – and has been all along – but that seems to be all that anyone knows.

    Still, at least it was defined and public, even if unexplained. Contrast with EVENT_TRACE_COMPRESSED_MODE. This is at the heart of your particular use for the merge, and your subject for this article. You want your ETL files compressed. This has been available to user-mode loggers since Windows 8. But what does Microsoft say about it? Nothing at all, today, at Microsoft’s page on Logging Mode Constants (https://docs.microsoft.com/en-us/windows/desktop/etw/logging-mode-constants). The flag wasn’t even defined, e.g., in EVNTRACE.H, for non-Microsoft programmers to know about until the SDK for the 1607 edition of Windows 10.

    Don’t get me wrong. ETW is wonderful and practical functionality to be thankful for, even just for what’s known at the surface. But we are just getting crumbs that Microsoft throws our way while keeping huge amounts of demonstrably useful functionality as things we get only from Microsoft’s tools and huge amounts of potentially useful functionality very much under wraps.

  10. Perhaps I should add some practical notes – well, at least some thoughts in the general direction of being practical…

    First, “to tell if the input trace has already been compressed” must be possible. I don’t know how you’d do it at such a high-level as xperf but the logger mode, and much else, is recorded in every trace file’s first event. If inspecting in a hex editor, you’d look past the WMI_BUFFER_HEADER and the SYSTEM_TRACE_HEADER to get the TRACE_LOGFILE_HEADER, and the rest is documented. You can lift it out programmatically, i.e., without knowing the file internals, and so any decent tool must have some query option that lifts this out for you.

    Second, I doubt you need let the possibility of analysis on Windows 7 stop you from using compressed mode on Windows 8. Just as you presently relog events from one or more uncompressed ETL files to a new one but with compression, I’d be astonished if it doesn’t work the other way. So if you want to analyse on Windows 7, perhaps feed your compressed ETL files into xperf -merge without the -compress switch.

    Third, I suspect you’ll have played with how good a compression ratio you can get, perhaps in contrast to compressing the file manually after collection. For this, it may help to know a little of how compressed mode works. As you know, a tracing session has fixed-size trace buffers that events accumulate in and then as each trace buffer fills (or ages, or whatever) it gets flushed to the ETL file. Compressed mode inserts an extra step. As successive trace buffer get flushed, they don’t go directly to the file but are instead compressed into a holding buffer. When this buffer accumulates a trace buffer’s worth of compressed data, i.e., compressed from potentially many uncompressed trace buffers, this much compressed data is flushed to the ETL file. What this tells you is that the compression algorithm is never fed more than one trace buffer at a time. If your tracing session is run with the default 4KB buffer size, then the compression algorithm likely will find useful savings from repetitions in each 4KB, but it isn’t really let loose. You can expect a better compression ratio with a bigger size of trace buffer, though that comes with its own trade-offs, which I think you’ve written about before.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.