ETW Central

Posted on September 24, 2015 by brucedawson

Over the last few years I’ve written over forty blog posts that discuss ETW/xperf profiling. I’ve done this because it’s one of the best profilers I’ve ever used, and it’s been woefully undersold and under documented by Microsoft. My goal has been to let people know about this tool, make it easier for developers and users to record ETW traces, and to make it as easy as possible for developers to analyze ETW traces.

Some of those posts have aged poorly, and the rest are hidden amongst the 210+ posts (really? wow) I’ve written. The purpose of this page is to be a central hub that links to the ETW/xperf posts that are still relevant. Also, I’ve updated many of the older posts to reflect changes in the ETW toolset (technically known as the Windows Performance Toolkit). For convenience this page is accessible as https://tinyurl.com/etwcentral.

The most important post describes how to record ETW traces. This is important because ETW traces can be recorded on one machine, and analyzed on another. This means that your customers or relatives can record ETW traces and then you can analyze them. Remote diagnosis of issues is a wonderful superpower. The article describing how to record an xperf/ETW trace can be found here – share it with those who have performance problems:

Xperf Basics: Recording a Trace (the ultimate easy way)

Once you’ve got a trace you need to analyze it (or share it with someone who can). The most comprehensive resource I’ve created for learning how to analyze ETW traces is the series of three videos I created in 2014. More information and links to the videos can be found here:

ETW Training Videos Available Now

For more details, or if you don’t want to watch videos, there are many tutorial blog posts available, listed here in rough order of importance:

UIforETW – Windows Performance Made Easier
WPA–Xperf Trace Analysis Reimagined – basics of WPA trace analysis
Xperf for Excess CPU Consumption: WPA edition – basics of investigating CPU consumption
Xperf Wait Analysis–Finding Idle Time – wait analysis (idle CPU investigations) with UIforETW and WPA 10
New Xperf and new WPA in the new WPT – new features in WPT 10
Graph All the Things (Using WPT 10) – graphing custom ETW data
New Version of Xperf–Upgrade Now – new features in WPT 8.1
ETW Heap Tracing–Every Allocation Recorded – tracing memory allocations
ETW Heap Snapshots – tracing all outstanding allocations (much less data than ETW Heap Tracing)
ETW Flame Graphs Made Easy – using WPA’s flame graphs to visualize CPU usage, and more
Exporting Arbitrary Data from xperf ETL files
CPU Performance Counters on Windows – recording per-process CPU performance counters using ETW
Process Tree from an Xperf Trace – an example of exporting ETL data
ETW Trace Compression (and xperf syntax refresher) – low-level details on how recording traces works, handy if you want to modify UIforETW
Bulk ETW Trace Analysis in C#

Doing precise analysis of an ETW trace requires knowing exactly what the many columns in the tables mean. Some of those table columns are documented in these blog posts, updated in 2016 for the latest version of WPA:

ETW investigation write-ups:

Some of my favorite blog posts are those that tell a tale of noticing some software that I use being slow, recording a trace, and figuring out the problem. In most cases this let me come up with a workaround, and in many cases the (ridiculously!) detailed bug reports or the attention the posts drew led to the problems being fixed.

Many of the articles linked to below have not yet been updated. It’s an ongoing process but I think it’s worth publishing this now without waiting for all of the updates to finish.

I’ve categorized the investigations by what product was investigated. And, as a reminder, with the exception of the fractal software investigation all of these are looking at problems in software that I don’t work on.

Visual Studio and VC++ code-gen:

Windows Performance Toolkit (profiling the profiler!):

Windows:

Delays in login to Windows due to thread rank – what outranks thread priority
O(n^2), again, now in WMI
Making VirtualAlloc Arbitrarily Slower – excessive CPU usage
Hidden Costs of Memory Allocation – excessive CPU usage
Taskbar Latency and Kernel Calls – excessive CPU usage (Windows bug)
- Youtube video of this analysis is available here
O(n^2) in CreateProcess – CFG causing problems again, excessive CPU usage (Windows bug, now fixed)
A Not-Called Function Can Cause a 5X Slowdown – lock contention during process shutdown, caused by loading the wrong DLL (Windows quirk, triggered by llvm tests)
Making Windows Slower Part 2: Process Creation – O(n^2) in CreateProcess due to Application Verifier
24-core CPU and I can’t type an email (part one) – lock contention during process shutdown (CFG, service workers, v8, WMI – it’s got everything, but it’s mostly Windows bugs)
Making Windows Slower Part 1: File Access – slowing down Chrome builds with large notification buffers (Windows bug, now fixed)
24-core CPU and I can’t move my mouse – lock contention during process shutdown (Windows bug, now fixed)
63 Cores Blocked by Seven Instructions – NTFS lock contention bringing a build machine to its knees (Windows bug)
Arranging Invisible Icons in Quadratic Time – this demonstrates how to use UI Delays to find where a hang is happening, CPU Usage (Precise) to prove that it’s a busy hand, and then CPU Usage (Sampled) to see where time is being spent. And then some surprisingly simple work to prove that it’s a quadratic algorithm.
Why Modern Software is Slow–Windows Voice Recorder – this demonstrates how I investigated a twenty-second delay in the Windows 10 Voice Recorder
Slower Memory Zeroing Through Parallelism – poorly implemented parallelism can actually make things slower, while wasting colossal amounts of CPU time. Case in point: the memory-zeroing in Windows uses about 24x as much CPU time as it should (on massively parallel machines), spending all the extra time fighting over locks
No Start Menu for You – my Windows 10 start menu refused to come up and I eventually tracked this down to a crash in RuntimeBroker.exe plus poor internet connectivity plus a design that favored crash reporting over user responsiveness, all visible in an ETW trace
32 MiB Working Sets on a 64 GiB machine – this investigation shows the high cost of PROCESS_MODE_BACKGROUND_BEGIN – it’s undocumented behavior of having a hard-cap on working-set size of 32 MiB can make code run 250x to 800x slower. Ouch!

Windows Live Photo Gallery:

Windows Live Photo Gallery—Poor Performance Peculiarities (xperfview used for analysis, but concepts are still solid)
Fixing another Photo Gallery performance bug

Western Digital driver (initially thought to be Windows):

Miscellaneous:

Other people’s ETW investigation write-ups:

Obsolete posts

Some of the blog posts are now completely obsolete and are listed here only for historical interest:

Xperf Analysis Basics – obsolete or covered elsewhere
Xperf Basics: Recording a Trace, replaced by Xperf Basics: Recording a Trace (the easy way) which was then replaced by Xperf Basics: Recording a Trace (the ultimate easy way) – third time’s the charm
Xperf for Excess CPU Consumption – replaced by Xperf for Excess CPU Consumption: WPA edition
The New Xperf is Here! – replaced by Xperf Basics: Recording a Trace (the ultimate easy way)
The New WPA Xperf Trace Viewer–New Bugs and Old – the reported bugs are generally fixed
Summarizing Xperf CPU Usage with Flame Graphs – WPA 10.0.14393 has built-in flame graphs which are covered here

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048

View all posts by brucedawson →

This entry was posted in uiforetw, xperf and tagged ETW, performance, summary, UIforETW, xperf. Bookmark the permalink.

41 Responses to ETW Central

Pingback: Summarizing Xperf CPU Usage with Flame Graphs | Random ASCII
Pingback: UIforETW – Windows Performance Made Easier | Random ASCII
Pingback: Xperf Analysis Basics | Random ASCII
Pingback: Exporting Arbitrary Data from xperf ETL files | Random ASCII
Pingback: The New WPA Xperf Trace Viewer–New Bugs and Old | Random ASCII
Pingback: WPA–Xperf Trace Analysis Reimagined | Random ASCII
Pingback: Process Tree from an Xperf Trace | Random ASCII
Pingback: Xperf Wait Analysis–Finding Idle Time | Random ASCII
Pingback: Xperf and Visual Studio: The Case of the Breakpoint Hangs | Random ASCII
Pingback: Thread Naming in Windows: Time for Something Better | Random ASCII
Severin Pappadeux says:

February 19, 2016 at 5:33 pm

Did you use logman? Is it better/worse than xperf? https://technet.microsoft.com/en-gb/library/cc753820.aspx?f=255&MSPPError=-2147217396

Reply
- brucedawson says:
  
  February 19, 2016 at 5:52 pm
  
  I don’t think logman is necessarily better or worse than xperf but I don’t think it can do the things that I do with xperf, so it is of little interest to me. Disclaimer: I’m not a logman expert, but I think this is true.
  
  TL;DR – UIforETW/xperf/WPA can work miracles and I explain how to do this in this blog. logman? Dunno.
  
  Reply
  - Severin Pappadeux says:
    
    February 19, 2016 at 8:02 pm
    
    Not that it can do something different, but on my W10 installation it came with the system, C:\Windows\System32\logman.exe. Basically, if you install something on user system, looks like it is much better to run logman as an events watchman for your app
    
    Reply
    - brucedawson says:
      
      February 19, 2016 at 8:57 pm
      
      Unfortunately I don’t *think* that logman has the same capabilities. Pity.
      
      Reply
Marc Chiarini says:

March 23, 2016 at 11:06 am

Hi Bruce, awesome work! Is there a place for RFEs? In particular, it would be great to be able to change the “trace to memory” window as well as specify the length of time before a trace is automatically saved to disk and/or stopped. Thanks!

Reply
- brucedawson says:
  
  March 23, 2016 at 11:13 am
  
  Feel free to create issue suggestions at https://github.com/google/UIforETW/issues
  
  Configuring the memory used and the auto-trace elapsed time has been suggested, but such changes must be made carefully to avoid giving users unnecessary foot-guns. “It just works” must continue to be as true as possible. That is, configuration is reasonable as long as it adds value without undue complexity or user risk.
  
  Reply
  - Marc Chiarini says:
    
    March 23, 2016 at 11:40 am
    
    Thanks. The complexity issue I understand but your point of view on risk seems overly cautious. As someone who has programmed in C/C++ for over 25 years, I have no lower extremities left to shoot! In my experience, a high amount of configurability is suitable for all but the most skittish, particularly when the *defaults* just work. In debugging performance and other issues, exploration is part of learning the ropes and I wouldn’t limit configurability unnecessarily in order to keep safe a small percentage of users that would try to do silly things on production systems. Yes I understand that having more configuration options multiplies the bug surface area and maintenance costs but I think that most users willing to experiment with advanced functionality would also be willing to help debug and improve those features, directly or indirectly. EOP (End of Pitch).
    
    Reply
    - brucedawson says:
      
      March 23, 2016 at 8:31 pm
      
      I don’t disagree. Having it work out-of-the-box is most important. Having the configuration options as uncluttered as possible is also important, since every option that is added makes the remaining options harder to find. But I don’t think we’re really disagreeing in any meaningful way.
      
      > most users willing to experiment with advanced functionality would
      > also be willing to help debug and improve those features
      
      Well, the great thing about UIforETW being open source is that they can just grab a copy, change the constants, and party-on-Garth!
      
      Reply
Tomer says:

September 7, 2017 at 4:39 am

https://github.com/google/UIforETW/releases/download/v1.48/etwpackage1.48.zip
shows a malware in your file.
https://www.virustotal.com/en/file/896554acbe3983cd69e100f1e5867c8cfcc30510567e297c843c96ed6e11d986/analysis/

Reply
- brucedawson says:
  
  September 7, 2017 at 8:42 am
  
  Sigh… I’m sure that file doesn’t contain malware, but it isn’t signed which can make AV software suspicious. I’ll add it to my list of files to sign which should resolve this for the next release.
  
  Reply
  - brucedawson says:
    
    December 20, 2017 at 7:41 pm
    
    Everything should be signed now, which should avoid malware warnings.
    
    Reply
Rick C. Hodgin says:

December 20, 2017 at 1:05 pm

I think another slowdown is on very large source files. If the function(s) being debugged can be migrated to smaller source files, its debugs faster. This was observed on a single source file that was 20K+ lines long.

Reply
Dmitry Akhmedzhanov says:

May 22, 2018 at 11:17 am

Hi Bruce, adore your blog!
Do you know, is it possible to collect PMInterrupt stacks somehow?
I’ve found the event in https://github.com/google/UIforETW/blob/master/UIforETW/StackWalkFlags.txt,
but didn’t manage to collect stacks for it:(

Reply
- brucedawson says:
  
  May 22, 2018 at 2:50 pm
  
  I think that ETW handles PMC counters by recording their values during context switches. If so then recording PMC interrupt stacks is meaningless. It’s an idea that’s only applicable if the interrupts fire after a certain number of events. But, I may be misunderstanding. I’d recommend sharing what you’ve tried so others can learn from what did or did not work.
  
  Reply
Max says:

July 28, 2020 at 2:09 am

Hello,

Thanks for all your training material for WPA/WPR. I’m trying to learn more about Windows and Services in particular. I’m wondering how you would go about using WPA to find out what triggered a particular service to start? I know how to isolate a service into it’s own svchost.exe and I can zoom into where the service starts and I see that services.exe is the process that starts it, but I don’t know how to go about finding out who/what called upon the services.exe to actually start it up.

Thanks again for all your material!

Reply
Murat.a says:

December 1, 2020 at 2:43 pm

Hi Bruce,
Please, Is it suitable to launch multiple apps on Windows then record them with WPR, also after a while, launch other apps and so on (every single time e.g. every 8min), then analyze them with WPA. will I get low-performance issues like low JIT compilation, memory consumption, etc? which apps are suitable for that?. Thank you.

Reply
- brucedawson says:
  
  December 1, 2020 at 8:36 pm
  
  I’m not sure exactly what you are trying to do but I have found WPR/ETW to be suitable for investigating lots of performance issues. I prefer UIforETW for recording the traces, but I think WPR does have a few advantages.
  
  You say “every 8 min” and I would only caution you against recording long traces. ETW traces can be enormous and this then makes them cumbersome. I try to make most of my ETW traces be 20-60 seconds long. There are few issues that become easier to investigate through having a longer time period recorded.
  
  Reply
  - Murat.a says:
    
    December 2, 2020 at 6:28 am
    
    I’m trying to identify Memory leak, looking for an increasing trend over a long period of time, Which means the application committing memory (referred to a private bytes) and releasing only small portions, and over a period of time it continue to accumulate Memory.
    I’m asking to detect this pattern of Memory consumption Practically.
    
    Thanks
    
    Reply
    - brucedawson says:
      
      December 2, 2020 at 9:42 am
      
      Slow memory leaks can be difficult. The default trace settings record memory allocated with VirtualAlloc which can be enough. Go to Settings and enable VirtualAlloc stacks always to get more information (and then view the VirtualAlloc data, with stacks, focusing on those with Type==AIFO)
      
      Or, you can use heap snapshots. I’ve used that to record every outstanding allocation from Chrome’s browser process over a two week period: https://randomascii.wordpress.com/2019/10/27/heap-snapshots-tracing-all-heap-allocations/
      Heap snapshots are great but only work if you are leaking heap memory. vmmap can be used to determine that.
      
      Reply
      - Murat.a says:
        
        December 6, 2020 at 1:12 pm
        
        Is it suitable to do it just in WPR/WPA toolkit? or I have to deal with your mentioned tools in Github? because I don’t have much experience to deal with what you explain in Github, maybe just through the tutorial!
        Also, please, is there a way to detect that Memory leak while launching multiple apps, not specific ones? Thanks!
        
        Reply
        
        brucedawson says:
        
        December 6, 2020 at 4:56 pm
        
        I don’t know what allocation data WPR records. UIforETW records VirtualAlloc data across all processes with optional call stacks which allows for investigating memory leaks in multiple processes. Some of the concepts in the Heap Tracing and Heap Snapshots articles are relevant, in particularly the Type column and the AIFO type. However you will have to use the Virtual Alloc Commit LifeTimes table, found under Graph Explorer-> Memory.
        
        Reply
Hamlet says:

December 5, 2020 at 4:41 am

Hello Bruce,
How can I design and Implement an ETW-based Windows system Performance testing tool and give it a name, that system can implement a process collecting, analyzing, and visualizing various events that occur in the Windows system kernel by means of event tracking, and is used to detect the performance that occurs during system operations, and to make it quickly locate the bottleneck and improve the efficiency of the system operations.? I hope if there is a tutorial for this work. Thanks.

Reply
- brucedawson says:
  
  December 5, 2020 at 10:24 am
  
  Collecting? wpr or UIforETW
  Visualizing? WPA
  Quickly locate the bottleneck and improve the efficiency of system operations? Read my blog posts and other sources, practice, learn, refine, and repeat.
  The information is there. If you are hoping for an automatic system to detect and fix performance bugs then you are out of luck, but manual detection and fixing is possible and I do it all the time.
  
  Reply
  - Hamlet says:
    
    December 5, 2020 at 9:01 pm
    
    Thank you for the reply,
    sorry, I mean to design that ETW diagnostic tool in .NET
    
    Reply
    - brucedawson says:
      
      December 6, 2020 at 11:27 am
      
      For recording and visualizing traces you shouldn’t be worrying about the language or runtime used to create the tool. For analysis it turns out that .NET is the preferred choice. I blogged about how to do trace analysis in C# earlier this year:
      
      Bulk ETW Trace Analysis in C#
      
      However, programmatic trace analysis is not a panacea. It can look for known patterns, and summarize known types of data, but the most interesting performance bugs are novel and require human ingenuity to understand.
      
      I recommend reading my many blog posts on these topics and then asking more specific questions.
      
      Reply
Pingback: Windows 10 20H2 boot trace – dropped events | Jeff Stokes
Peter says:

October 3, 2022 at 7:19 am

hi Bruce

when i set the tracing to file the tool stops tracing after 5 minute. i need the whole story from the start, the circular buffer tracing is not enough. i get the message: “Tracing to disk ran excessively long. Auto-saving and stopping.”

The size of the file is between 300-500MB. How can I configure the max file size, or the time period of the trace?

thanks,
np

Reply
- brucedawson says:
  
  October 4, 2022 at 12:42 pm
  
  A 300-500 MB trace is already quite long and will be unwieldy to analyze. If you want a longer trace you can either use the batch files that ship with UIforETW to control tracing (modify them to suit your needs, etwrecord.bat is probably a good starting point) or you can download the UIforETW source and modify or remove the limit. That said, traces longer than five minutes are often more trouble than they are worth.
  Here’s the relevant bit of code (found by searching for the message in this file):
  https://github.com/google/UIforETW/blob/main/UIforETW/UIforETWDlg.cpp#L1488
  
  Reply
1771911131@qq.com says:

July 24, 2023 at 11:40 pm

hi Bruce
I want to know how match related CSwitch and ReadyThread events?
the ReadyThread events also konwn as DispatcherReadyThreadTraceData,
it has no new thread id，only AwakenedThreadId,
how to get thr NewThredId?
Thank you so much！

Reply
- brucedawson says:
  
  July 25, 2023 at 2:44 pm
  
  Do you mean in WPA, or when writing TraceProcessing code?
  
  I assume you mean when writing TraceProcessing code because in WPA they are associated with each other in fairly clear ways.
  
  For TraceProcessing, I’m not sure. I don’t have any code handy that deals with ReadyThread events. If you post partial code somewhere then somebody might offer some ideas. If you post a comment on the relevant blog post instead of this generic one then you might get a better response:
  
  Bulk ETW Trace Analysis in C#
  
  Reply
  - coding says:
    
    July 26, 2023 at 12:23 am
    
    Thank you very much for your reply and suggestion，I will keep trying your suggestion.
    Now the problem is focused on getting this piece of ETW event, keyWords：0x800: dispatcher: Dispatcher operation. in Microsoft open source code PerfView，This event source is called DispatcherReadyThreadTraceData，the code show as below:
    
    public sealed class DispatcherReadyThreadTraceData : TraceEvent
    {
    public int AwakenedThreadID { get; }
    public int AwakenedProcessID { get; }
    public AdjustReasonEnum AdjustReason { get; }
    public int AdjustIncrement { get; }
    public ReadyThreadFlags Flags { get; }
    public override int ProcessID { get; }
    public override string[] PayloadNames { get; }
    protected internal override Delegate Target { get; set; }
    public event Action Action;
    public override object PayloadValue(int index);
    public override StringBuilder ToXml(StringBuilder sb);
    protected internal override void Dispatch();
    protected internal override void SetState(object newState);
    public enum AdjustReasonEnum
    {
    None = 0,
    Unwait = 1,
    Boost = 2
    }
    [Flags]
    public enum ReadyThreadFlags : byte
    {
    ReadiedFromDPC = 1,
    KernelSwappedOut = 2,
    ProcessSwappedOut = 4
    }
    }
    
    I can’t get the current thread id (New ThreadId), and only can get the AwakenedThreadID, no good solution.
    
    Reply

ETW Central

For more details, or if you don’t want to watch videos, there are many tutorial blog posts available, listed here in rough order of importance:

Doing precise analysis of an ETW trace requires knowing exactly what the many columns in the tables mean. Some of those table columns are documented in these blog posts, updated in 2016 for the latest version of WPA: