Over the last few years I’ve written over forty blog posts that discuss ETW/xperf profiling. I’ve done this because it’s one of the best profilers I’ve ever used, and it’s been woefully undersold and under-documented by Microsoft. My goal has been to let people know about this tool, to make it easier for developers and users to record ETW traces, and to make it as easy as possible for developers to analyze ETW traces.
Some of those posts have aged poorly, and the rest are hidden amongst the 210+ posts (really? wow) I’ve written. The purpose of this page is to be a central hub that links to the ETW/xperf posts that are still relevant. Also, I’ve updated many of the older posts to reflect changes in the ETW toolset (technically known as the Windows Performance Toolkit). For convenience this page is accessible as https://tinyurl.com/etwcentral.
The most important post describes how to record ETW traces. This is important because ETW traces can be recorded on one machine, and analyzed on another. This means that your customers or relatives can record ETW traces and then you can analyze them. Remote diagnosis of issues is a wonderful superpower. The article describing how to record an xperf/ETW trace can be found here – share it with those who have performance problems:
Once you’ve got a trace you need to analyze it (or share it with someone who can). The most comprehensive resource I’ve created for learning how to analyze ETW traces is the series of three videos I created in 2014. More information and links to the videos can be found here:
For more details, or if you don’t want to watch videos, there are many tutorial blog posts available, listed here in rough order of importance:
- UIforETW – Windows Performance Made Easier
- WPA–Xperf Trace Analysis Reimagined – basics of WPA trace analysis
- Xperf for Excess CPU Consumption: WPA edition – basics of investigating CPU consumption
- Xperf Wait Analysis–Finding Idle Time – wait analysis (idle CPU investigations) with UIforETW and WPA 10
- New Xperf and new WPA in the new WPT – new features in WPT 10
- Graph All the Things (Using WPT 10) – graphing custom ETW data
- New Version of Xperf–Upgrade Now – new features in WPT 8.1
- ETW Heap Tracing–Every Allocation Recorded – tracing memory allocations
- ETW Heap Snapshots – tracing all outstanding allocations (much less data than ETW Heap Tracing)
- ETW Flame Graphs Made Easy – using WPA’s flame graphs to visualize CPU usage, and more
- Exporting Arbitrary Data from xperf ETL files
- CPU Performance Counters on Windows – recording per-process CPU performance counters using ETW
- Process Tree from an Xperf Trace – an example of exporting ETL data
- ETW Trace Compression (and xperf syntax refresher) – low-level details on how recording traces works, handy if you want to modify UIforETW
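As a rough illustration of the record-on-one-machine, analyze-on-another workflow, here is a minimal Python sketch that only builds and prints the two xperf commands involved. UIforETW remains the recommended way to record traces; this sketch assumes xperf from the Windows Performance Toolkit is on the path, and the `Latency` kernel group, the `Profile` stack walk, the helper names, and the file name `slow_scenario.etl` are illustrative choices rather than anything prescribed by the posts above.

```python
import subprocess
from typing import List, Optional


def start_command(kernel_group: str = "Latency",
                  stackwalk: Optional[str] = "Profile") -> List[str]:
    """Build the xperf command that starts kernel tracing."""
    cmd = ["xperf", "-on", kernel_group]
    if stackwalk:
        # Ask for call stacks on the named event(s), e.g. CPU samples.
        cmd += ["-stackwalk", stackwalk]
    return cmd


def stop_command(output_etl: str) -> List[str]:
    """Build the xperf command that stops tracing and merges to one file."""
    return ["xperf", "-d", output_etl]


def as_text(cmd: List[str]) -> str:
    """Render a command the way it would be typed at a Windows prompt."""
    return subprocess.list2cmdline(cmd)


# Print the two steps rather than running them: xperf needs an elevated
# prompt, and the slow scenario must be reproduced between the two commands.
print(as_text(start_command()))
print(as_text(stop_command("slow_scenario.etl")))
```

The merged .etl file written by the second command is what gets copied to the analysis machine and opened in WPA.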
Doing precise analysis of an ETW trace requires knowing exactly what the many columns in the tables mean. Some of those table columns are documented in these blog posts, updated in 2016 for the latest version of WPA:
- The Lost Xperf Documentation–CPU sampling
- The Lost Xperf Documentation—CPU Scheduling
- The Lost Xperf Documentation—Disk Usage
ETW investigation write-ups:
Some of my favorite blog posts are those that tell a tale of noticing some software that I use being slow, recording a trace, and figuring out the problem. In most cases this let me come up with a workaround, and in many cases the (ridiculously!) detailed bug reports or the attention the posts drew led to the problems being fixed.
Many of the articles linked to below have not yet been updated. It’s an ongoing process but I think it’s worth publishing this now without waiting for all of the updates to finish.
I’ve categorized the investigations by what product was investigated. And, as a reminder, with the exception of the fractal software investigation all of these are looking at problems in software that I don’t work on.
Visual Studio and VC++ code-gen:
- Xperf and Visual Studio: The Case of the Breakpoint Hangs
- Visual C++ Debug Builds–“Fast Checks” Cause 5x Slowdowns
- Visual Studio Single Step Performance Fixes
- Make VC++ Compiles Fast Through Parallel Compilation
- You Got Your Web Browser in my Compiler!
- Self Inflicted Denial of Service in Visual Studio Search
- 50 Bytes of Code That Took 4 GB to Compile
Windows Performance Toolkit (profiling the profiler!):
- Xperf Symbol Loading Pitfalls
- Slow Symbol Loading in Microsoft’s Profiler, Take Two
- Profiling the profiler: working around a six minute xperf hang
Windows:
- Making VirtualAlloc Arbitrarily Slower – excessive CPU usage
- Hidden Costs of Memory Allocation – excessive CPU usage
- Taskbar Latency and Kernel Calls – excessive CPU usage (Windows bug)
- O(n^2) in CreateProcess – CFG causing problems again, excessive CPU usage (Windows bug, now fixed)
- A Not-Called Function Can Cause a 5X Slowdown – lock contention during process shutdown, caused by loading the wrong DLL (Windows quirk, triggered by llvm tests)
- Making Windows Slower Part 2: Process Creation – O(n^2) in CreateProcess due to Application Verifier
- 24-core CPU and I can’t type an email (part one) – lock contention during process shutdown (CFG, service workers, v8, WMI – it’s got everything, but it’s mostly Windows bugs)
- Making Windows Slower Part 1: File Access – slowing down Chrome builds with large notification buffers (Windows bug, now fixed)
- 24-core CPU and I can’t move my mouse – lock contention during process shutdown (Windows bug, now fixed)
- 63 Cores Blocked by Seven Instructions – NTFS lock contention bringing a build machine to its knees (Windows bug)
Windows Live Photo Gallery:
- Windows Live Photo Gallery—Poor Performance Peculiarities (xperfview used for analysis, but concepts are still solid)
- Fixing another Photo Gallery performance bug
Western Digital driver (initially thought to be Windows):
- Windows Slowdown, Investigated and Identified
- Windows Slowdown, Investigated, Identified, and Now Fixed
Miscellaneous:
- Faster Fractals Through Better Scheduling
- PowerPoint Poor Performance Problem
- Defective Heat Sinks Causing Garbage Gaming
Other people’s ETW investigation write-ups:
Obsolete posts:
Some of the blog posts are now completely obsolete and are listed here only for historical interest:
- Xperf Analysis Basics – obsolete or covered elsewhere
- Xperf Basics: Recording a Trace, replaced by Xperf Basics: Recording a Trace (the easy way) which was then replaced by Xperf Basics: Recording a Trace (the ultimate easy way) – third time’s the charm
- Xperf for Excess CPU Consumption – replaced by Xperf for Excess CPU Consumption: WPA edition
- The New Xperf is Here! – replaced by Xperf Basics: Recording a Trace (the ultimate easy way)
- The New WPA Xperf Trace Viewer–New Bugs and Old – the reported bugs are generally fixed
- Summarizing Xperf CPU Usage with Flame Graphs – WPA 10.0.14393 has built-in flame graphs which are covered here
Did you use logman? Is it better/worse than xperf? https://technet.microsoft.com/en-gb/library/cc753820.aspx?f=255&MSPPError=-2147217396
I don’t think logman is necessarily better or worse than xperf but I don’t think it can do the things that I do with xperf, so it is of little interest to me. Disclaimer: I’m not a logman expert, but I think this is true.
TL;DR – UIforETW/xperf/WPA can work miracles and I explain how to do this in this blog. logman? Dunno.
Not that it can do anything different, but on my Windows 10 installation it came with the system, at C:\Windows\System32\logman.exe. Basically, if you install something on a user’s system, it looks like it is much better to run logman as an event watchman for your app.
Unfortunately I don’t *think* that logman has the same capabilities. Pity.
Hi Bruce, awesome work! Is there a place for RFEs? In particular, it would be great to be able to change the “trace to memory” window as well as specify the length of time before a trace is automatically saved to disk and/or stopped. Thanks!
Feel free to create issue suggestions at https://github.com/google/UIforETW/issues
Configuring the memory used and the auto-trace elapsed time has been suggested, but such changes must be made carefully to avoid giving users unnecessary foot-guns. “It just works” must continue to be as true as possible. That is, configuration is reasonable as long as it adds value without undue complexity or user risk.
Thanks. The complexity issue I understand but your point of view on risk seems overly cautious. As someone who has programmed in C/C++ for over 25 years, I have no lower extremities left to shoot! In my experience, a high amount of configurability is suitable for all but the most skittish, particularly when the *defaults* just work. In debugging performance and other issues, exploration is part of learning the ropes and I wouldn’t limit configurability unnecessarily in order to keep safe a small percentage of users that would try to do silly things on production systems. Yes I understand that having more configuration options multiplies the bug surface area and maintenance costs but I think that most users willing to experiment with advanced functionality would also be willing to help debug and improve those features, directly or indirectly. EOP (End of Pitch).
I don’t disagree. Having it work out-of-the-box is most important. Having the configuration options as uncluttered as possible is also important, since every option that is added makes the remaining options harder to find. But I don’t think we’re really disagreeing in any meaningful way.
> most users willing to experiment with advanced functionality would
> also be willing to help debug and improve those features
Well, the great thing about UIforETW being open source is that they can just grab a copy, change the constants, and party-on-Garth!
https://github.com/google/UIforETW/releases/download/v1.48/etwpackage1.48.zip
is reported as containing malware:
https://www.virustotal.com/en/file/896554acbe3983cd69e100f1e5867c8cfcc30510567e297c843c96ed6e11d986/analysis/
Sigh… I’m sure that file doesn’t contain malware, but it isn’t signed, which can make AV software suspicious. I’ll add it to my list of files to sign, which should resolve this for the next release.
Everything should be signed now, which should avoid malware warnings.
I think another slowdown happens with very large source files. If the function(s) being debugged can be migrated to smaller source files, debugging is faster. This was observed on a single source file that was 20K+ lines long.
Hi Bruce, adore your blog!
Do you know whether it is possible to collect PMInterrupt stacks somehow?
I found the event in https://github.com/google/UIforETW/blob/master/UIforETW/StackWalkFlags.txt,
but I didn’t manage to collect stacks for it :(
I think that ETW handles PMC counters by recording their values during context switches. If so then recording PMC interrupt stacks is meaningless. It’s an idea that’s only applicable if the interrupts fire after a certain number of events. But, I may be misunderstanding. I’d recommend sharing what you’ve tried so others can learn from what did or did not work.