ETW Training Videos Available Now

I created a series of training videos that cover Event Tracing for Windows, also known as xperf or the Windows Performance Toolkit. This set of videos, available on WintellectNow, should be enough to teach any experienced programmer how to use this amazing set of tools to investigate tricky performance problems on Microsoft Windows. You can get seven days of access to all of the videos on WintellectNow by using promo code BDAWSON-7 – no credit card required.

Or, better yet, check out Visual Studio Dev Essentials, a free Microsoft program that includes Wintellect. https://www.visualstudio.com/products/visual-studio-dev-essentials-vs and https://twitter.com/BruceDawson0xB/status/668318302309998592

I’m currently watching John Robbins’ excellent WinDBG training video (slightly condensed from Tolstoy’s original version).

My ETW talks are based, to some extent, on articles I have posted to this blog, and those posts are still available for free. But I think that video works well for demonstrating effective use of Windows Performance Analyzer – a quick demonstration is often more effective than a paragraph of explanation, and makes it easier to convey the joy of exploration. Plus, the videos all demonstrate using the latest trace analysis tools and techniques, including what I’ve learned in the last couple of years.

I tried to make use of the flexibility afforded by video editing and multiple takes to get the demos to flow as smoothly as possible. I think the end result should inform (and entertain) without wasting the viewer’s time. And, as always, I learned a few new things about ETW from the process. Take a look at the videos and let me know if you have any feedback. I’m particularly proud of the first five minutes of the second video.

Here are descriptions of the videos, extended from what can be found on the WintellectNow site:

Video 1: Introduction to Profiling with ETW

Event Tracing for Windows (ETW) allows investigation of performance problems on Windows to a greater depth than any other system. ETW can be intimidating to use at first but this talk explains how to get started with recording and analyzing ETW traces. The talk covers essential trace analysis techniques and concepts, with an emphasis on investigation of CPU bound performance problems. After viewing this talk you will be able to confidently use the free Windows Performance Toolkit to find CPU slowdowns, and you will be prepared to learn additional ETW investigation techniques.

Note that this video was created before UIforETW was created. You can skip the steps on how to get the Windows Performance Toolkit and record traces, from 4:09 to 7:10 in the video, and just grab UIforETW instead. If you want a video description of how to install and use UIforETW, take a look at this free video. The description of recording a trace that starts around 7:10 is sufficiently similar to UIforETW to still be applicable, and the rest of the video is as relevant as ever.

This talk is designed to take the viewer from zero to analyzing ETW traces in less than an hour. It covers installing the Windows Performance Toolkit and recording and analyzing a trace. The demo focuses on poor performance in PowerPoint, which requires diving into many graphs and tables, including CPU Usage (Sampled) data. After watching this video you should understand the Zen of WPA tables – how to fearlessly configure them to mine for the information that you need. The talk also explains how to work around this common PowerPoint problem.

Spoiler alert: a cache for decoded and scaled background images might be a good idea.

Video 2: ETW Custom Events and Idle-Thread Analysis

This video explains how to use Event Tracing for Windows (ETW) to easily find why a thread is not running – to find what it is waiting on, and who wakes it up. Additionally this talk explains how to easily use custom ETW events to annotate traces and make them easier to investigate, and how to customize WPRUI to keep trace sizes manageable. The WPA tables and graphs for viewing Generic Events, CPU Usage (Precise), and File I/O are explained and demonstrated.

Note that this video was created before UIforETW was created. File I/O activity, keyboard events, and other custom events are automatically recorded in UIforETW, and UIforETW defaults to using a reasonable amount of memory. Therefore, the section of video from 7:00 to 14:00 can be skipped. And, since UIforETW ships with WPA startup profiles, the first use of WPA will be less intimidating than the video shows.

This talk is built around the analysis of a hang in Visual Studio. The additional resources include an ETW key logger which is both useful by itself (now built in to UIforETW) and as an example of how to enrich your ETW traces with custom events.

The main focus of this talk is the subtle but critical skill of wait analysis – finding why a thread is not running. This ability (missing from most profilers) is one of the most important aspects of ETW and is used in many performance investigations, including the third video in this series.

Spoiler alert: treating Perforce paths as UNC paths can lead to UI hangs.

Video 3: ETW Disk I/O and Machine Information

This video, the third in the ETW training series, covers a wide range of topics built around the analysis of a hang in Windows Live Photo Gallery. Starting with the simplest way to locate a hang, moving through some multi-threaded wait analysis to find the badly behaved thread, it then moves to a deep dive on the differences between file I/O and disk I/O. Then, the talk explains how to work around the Windows Live Photo Gallery hang that was being investigated, and finishes with a few tricks on how to find information about the machine the trace was recorded on, and other bonus tips.

Note that this video was created before UIforETW was created. Therefore you can ignore the discussion of adding the Microsoft-Windows-Win32k provider because UIforETW always records data from this provider.

Graphs and tables that are used include Process Lifetimes, Generic Events, UI Delays, CPU Usage (Precise), DPC/ISR, File I/O, Disk Usage, the totally cool Disk Usage Offset graph, System Configuration, Images, and Marks. The usually awesome Window in Focus and CPU Usage (Sampled) graphs are cruelly ignored. The View Editor’s ability to have duplicated columns and save custom views is demonstrated. Some important details of the architecture of Windows (Deferred Procedure Calls and the system cache) are also explained.

This video ties up the loose ends and should make the three-video series a complete explanation of how to do excellent ETW trace analysis.

Spoiler alert: doing thousands of random 4-KB reads from your disk is not efficient.

Finding the videos

In addition to using the links above to find the videos you can go to http://www.wintellectnow.com and then search on the author Bruce Dawson to find them.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.

49 Responses to ETW Training Videos Available Now

  1. Looking forward to watching these. Thanks!

  2. Pingback: Debugging Windows with ETW – Great Video Series! | Philip Buuck

  3. Philip says:

    Awesome videos (just watched all three) and I linked to this post from my blog. I’ve had an issue with Windows minimizing full screen games randomly for awhile – now I know how to figure out what’s going on!

  4. davidbakin says:

    First video was excellent, thanks! The second video is displaying “HTML5: Video File Not Found”. Guess I’ll watch the third out of order.

    Thanks for pointing out the 14-day trial at Wintellect!

    • brucedawson says:

      I just checked and I was able to view all three videos from the links in the post. Try again?

      • davidbakin says:

        Well, I saw the first in IE 11 but the other two gave me that error. So I switched to Firefox and saw them all. They’re excellent! I will be using your ETWProvider shortly. And … I’m looking for more from you; thanks in advance!

  5. john says:

    Excellent videos Bruce. I have a couple of questions.

    1. Can the Window in Focus chart also be used for non-UI applications? I have a Windows service with a performance issue and am wondering what the best way of investigating it would be.
    2. Can the same tool also be used for .NET applications? I have watched a few videos about PerfView. Do you have any comments about which tool is better in which scenarios for .NET apps?

    • brucedawson says:

      The Window in Focus chart is only applicable for UI applications, because only they have that concept. For services I would recommend instrumenting them using the techniques shown in the ETWProviderDemo project — emit events at key times, whatever that means.

      These techniques can be used for .NET applications, and in fact wprui creates a .NGENPDB directory containing PDBs for every loaded assembly on the system in order to facilitate .NET profiling. I don’t have significant experience with using ETW to profile .NET applications but my understanding is that it should work just fine.

  6. john says:

    Given your thorough understanding of ETW itself, I really hope to see a course from you on .NET application profiling, PerfView, etc. soon.

  7. tony says:

    Chrome won’t open the website because its certificate has been revoked:

    “The certificate that Chrome received during this connection attempt has been revoked.”

  8. 2knowindeed says:

    The last thing I read on ETW was this article: http://mollyrocket.com/casey/stream_0029.html Have you encountered the issues listed in that article or do the tools you use help you sidestep them?

    • brucedawson says:

      I read that article and it is quite good, but not relevant to my usage. I sidestep those problems by using tools like xperf.exe and wprui.exe to record traces, rather than writing code to do so. For my purposes — recording whole-system traces so that I can find performance problems — using the tools to record traces works extremely well.

  9. Porglezomp says:

    On the topic of profiling, I’ve seen a lot of people on StackOverflow advocating doing manual random sampling by pausing the debugger in your code and seeing where you end up. They claim that this is somehow superior to sampling profilers. I don’t understand what they’re saying, since it seems to me that the profiler is doing the exact same thing, just a lot more, but since you know more about profiling, perhaps you have some idea what they’re going on about? Just about any search on StackOverflow about profiling seems to turn up answers by these people.

    • brucedawson says:

      I have used manual random sampling, and I have solved some performance problems that way, so it definitely can work. I suppose its main advantages are that it requires no special training or tools and, if you are in the debugger already, it is fast and easy. But a sample every few seconds can’t compete with 1,000+ samples per second. For any non-trivial problem there is just no comparison. I’ll routinely look at traces from customer machines where I have 100,000 samples of data, explorable in many different ways, plus a rich set of other information for the many cases where CPU consumption is not actually the problem.

      So it is legitimate and useful, but not superior to a good sampling profiler except in very narrow cases.
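To put rough numbers on that gap, here is a minimal Python sketch using the textbook standard error of a sampled proportion. The 10% function and the count of 20 manual pauses are hypothetical values chosen for illustration; only the 100,000-sample figure comes from the reply above, and real samples are not perfectly independent.

```python
import math

def standard_error(fraction, samples):
    """Standard error of a proportion estimated from independent samples."""
    return math.sqrt(fraction * (1.0 - fraction) / samples)

# Hypothetical function that truly accounts for 10% of CPU time.
true_fraction = 0.10
manual = standard_error(true_fraction, 20)         # 20 manual debugger pauses (assumed)
profiler = standard_error(true_fraction, 100_000)  # 100,000 profiler samples

print(f"20 samples:      10% +/- {manual * 100:.1f} percentage points")
print(f"100,000 samples: 10% +/- {profiler * 100:.2f} percentage points")
```

Under these assumptions the uncertainty from a handful of pauses is comparable in size to the thing being measured, which is why manual sampling tends to find only the largest hot spots.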

      BTW, run this command to set ETW (globally, across the whole machine) to sample at ~8.19 kHz:

      xperf -setprofint 1221 cached
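As a rough sketch of where that rate comes from, the conversion can be written in Python. It assumes the -setprofint value is a sampling interval in units of 100 ns, which is consistent with 1221 giving the rate quoted above. The function name is made up for illustration and is not part of any tool.

```python
def sampling_rate_hz(profint):
    """Convert a -setprofint value (assumed to be in 100 ns units) to samples per second."""
    interval_seconds = profint * 100e-9
    return 1.0 / interval_seconds

# 1221 is the value from the command above; 9500 is another value used in this thread.
for profint in (1221, 9500):
    print(f"-setprofint {profint}: {sampling_rate_hz(profint):,.0f} Hz")
```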

  10. Thank you for the awesome lectures. I have two questions for you:

    1. Is it ok to use your ETWProvider project and its outputs in a commercial product?
    2. I’m trying to profile the loading process of a game. But if I run the game with WPR on, I get zero CPU usage for most of the early period in the recording. Even if it’s about loading data, I don’t think the CPU usage can be zero in that time period. Why does WPT show no data in that period? What am I doing wrong? I tried several runs, but WPA always showed meaningful data only after the point when the game loading is mostly finished.

    Thanks, again!

    • brucedawson says:

      Yes, the ETWProvider project can be used in a commercial project. You should consider renaming the project and providers and changing GUIDs to avoid possible conflicts with other people who do the same thing. The resource file for the third video adds a BSD style license to make this usage clearly fine.

      It is normal for a trace to only show CPU usage for part of the trace. The circular buffers wrap around, especially during high activity. The ETWProvider data stream goes to a different circular buffer that wraps around quite slowly due to the low data rate, but the sampling/context-switch/disk/file data wraps around quite quickly. During high activity (especially with lots of CPUs) this can happen in as little as ten seconds. If it happens too frequently then you can increase the buffer sizes using the techniques described in the second video (which were used to decrease the buffer sizes).
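The wrap-around effect can be modeled with a small Python sketch. This is a toy model rather than how ETW buffers are actually implemented: the event rates, the 60-second trace length, and the capacity measured in events are all hypothetical numbers chosen to show why a high-rate stream keeps only its most recent data while a low-rate stream keeps everything.

```python
from collections import deque

def surviving_time_range(event_times, buffer_capacity):
    """Return (oldest, newest) timestamps left in a fixed-capacity circular buffer."""
    buffer = deque(maxlen=buffer_capacity)  # the oldest entry is dropped once full
    for t in event_times:
        buffer.append(t)
    return buffer[0], buffer[-1]

trace_seconds = 60
# Hypothetical rates: 1,000 events/s for sampling data, 10 events/s for custom events.
high_rate = [i / 1000 for i in range(trace_seconds * 1000)]
low_rate = [i / 10 for i in range(trace_seconds * 10)]

capacity = 10_000  # hypothetical buffer capacity, counted in events
print("high-rate data covers", surviving_time_range(high_rate, capacity))
print("low-rate data covers ", surviving_time_range(low_rate, capacity))
```

In this model the high-rate stream retains only the final ten seconds, while the low-rate stream still covers the whole minute, which matches the pattern of custom events being visible long before CPU data appears.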

      I always start analysis by zooming in to the time range that has full data from user-mode and kernel-mode (CPU/disk/file/etc.) providers.

      I hope that helps.

  11. John says:

    Thanks Bruce for these great videos helping me get started with WPA.

    Two questions:

    1. How come malloc doesn’t show in my WPA “CPU Usage (Sampled)” table stacks even though random pausing of the same execution in Visual Studio shows malloc frequently in the call stack?
    2. Is there a way to sort by aggregate function call counts? For example, if function `foo` is called from 10 different places, it would be hard to identify `foo` as a potential bottleneck since each of the 10 code paths to `foo` will show as 10% or less of the total count.

    • brucedawson says:

      There are a few possible explanations for this discrepancy:

      1) The cost of HeapAlloc, which typically backs malloc, is increased when running under the debugger. So, it is possible that this represents a real variation in cost.
      2) It is possible that the debugger somehow preferentially breaks in to malloc, perhaps something related to locks and idle time or magic fairies.
      3) It is possible that the timing of the malloc calls somehow consistently eludes the ETW timer interrupts. You could try changing the timer interrupt frequency with “xperf -setprofint 9500 cached” to avoid potentially synchronizing the timer interrupts with the scheduler’s interrupts.
      4) Maybe you don’t have enough samples.

      Anyway, I would trust ETW sampling (especially if you have enough samples and have altered the sampling rate) far more than random breaking.

      Regarding the second question, once you find a function you can right-click on it and ask to view all callers of that function. Or, you can hide the stack column and show the modules and functions columns in order to look for functions that are hot (inclusively). It’s not perfect, but that’s what comes to mind.
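The idea of looking for functions that are hot inclusively can be sketched in Python. The stack data below is made up to mirror the question (a function `foo` reached from ten different callers), and `inclusive_counts` is a hypothetical helper, not something WPA or xperf provides.

```python
from collections import Counter

def inclusive_counts(stacks):
    """Count, per function, the samples whose call stack contains it anywhere."""
    counts = Counter()
    for stack, samples in stacks:
        for function in set(stack):  # set() avoids double-counting recursive frames
            counts[function] += samples
    return counts

# Hypothetical data: (call stack from root to leaf, number of samples).
stacks = [(("main", f"caller{i}", "foo"), 8) for i in range(10)]
stacks.append((("main", "other"), 20))

counts = inclusive_counts(stacks)
total = sum(samples for _, samples in stacks)
for function, count in counts.most_common(3):
    print(f"{function}: {100 * count / total:.0f}% of samples")
```

Each individual path to `foo` holds only 8% of the samples here, but summing across all callers shows it on 80% of them, which is the kind of pattern that per-stack views can hide.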

    • brucedawson says:

      Another option for aggregating function call counts is to take a look at flame graphs. They sometimes make patterns in the data more visible, and the script that I provide can be repurposed for whatever type of custom analysis you want. Here’s the relevant blog post:

      https://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/

  12. Carl Clawson says:

    Amazingly useful tools and great videos about them. Thanks! After watching the first video I solved two thorny slowdown problems in five minutes each. Now on to some memory use issues. If you had a video on that I’d watch it first, but at least I know enough to start trying.

    • brucedawson says:

      I probably should do a video about memory profiling. It is a bit tricky, but I think there are some online resources about how to do it. I do memory tracing by setting this registry key before launching the application to be profiled:

      reg add "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\AppName.exe" /v TracingFlags /t REG_DWORD /d 1 /f

      but there’s more secret sauce than I can summarize here. Good luck.

  13. Mark says:

    Hi Bruce

    Firstly, many thanks for taking the time and effort to produce these videos and your blog. It really is appreciated.

    Secondly, I have come unstuck using WPA and wondered if you could point me in the right direction.

    I have been trying to use WPR and WPA to find the source of a slow logon. I have been watching the TechEd video “2014 Edition How Many Coffees Can You Drink While Your PC Starts” but am having trouble following it as I don’t have the same WPA graphs available to me.

    My standard WPA view of my capture has Generic Events graphs for “Activity by Provider, Task, Opcode” and “VSync-Dwmframe”. The presenter has a lot more, including one called SequentialBootLogonEvents, which he uses to solve the case. There is a profile to download from the presentation, which when added into WPA, provides numerous other graphs in Generic Events. But unfortunately, not the one I am looking for.

    After watching your second video I am confused (Maybe because I am an ITPro who suffers from Visual Studio phobia!) In order to try and get this graph, do I need to be investigating creating custom ETW events in order to capture the information and display it in WPA, or is it a matter of using the View Editor to get the graph I am looking for?

    Thank you.
    Mark

    • brucedawson says:

      I’m not sure why the presenter would have a graph view that you don’t have. It may be that they created it and saved it under that name. Generally speaking, a different graph view doesn’t expose more information; it just groups it differently, and perhaps adds some filtering. So, you should be able to reproduce the graph view shown in the video, using the View Editor. Good luck!

  14. Pingback: Hidden Costs of Memory Allocation | Random ASCII

  15. Pingback: How to: Beyond Stack Sampling: C++ Profilers | SevenNet

  16. Pingback: ETW Trace Compression (and xperf syntax refresher) | Random ASCII

  17. Pingback: UIforETW – Windows Performance Made Easier | Random ASCII

  18. Pingback: New Xperf and new WPA in the new WPT | Random ASCII

  19. Pingback: Graph All the Things (Using WPT 10) | Random ASCII

  20. Pingback: Xperf Basics: Recording a Trace (the ultimate easy way) | Random ASCII

  21. Pingback: ETW Central | Random ASCII

  22. Pingback: Xperf for Excess CPU Consumption: WPA edition | Random ASCII

  23. Pingback: Xperf Wait Analysis–Finding Idle Time | Random ASCII

  24. Well, first of all, thank you, Bruce, for this blog, it’s very useful and enlightening.

    I’d like to watch these videos you posted but I think the promo code has expired. Could you, please, reload it? I would be very grateful!

    Thanks again and keep on with the good work.

  25. john says:

    I have really enjoyed learning about ETW. However, I have had a nagging problem. I haven’t been able to get any Network IO providers to give me any data in WPA. It seems so straightforward. I select Networking IO Activity from wprui or I try using the NETWORKTRACE provider from xperf.

    I thought the first video was excellent and I came away with more useful features I didn’t know about.

    I have been running WPR/WPA from ADK in a VM and I thought maybe it wasn’t recording network traffic for that reason(?) Last night I did a clean install of Windows 7 x64 on a new laptop. However, I was still unable to see any network IO that I was trying to record.

    Any ideas or things for me to consider?

    • brucedawson says:

      It’s been years since I’ve done any network IO profiling so I’m not sure. One common error is not distinguishing between data that is recorded and data that is visible in WPA. Where in WPA are you looking for this data?

      Try recording two traces, one with and one without Networking IO Activity in wprui. Then load both traces into WPA and go to Trace-> System Configuration-> Trace Statistics to see what was actually recorded.

      Note also that UIforETW lets you specify extra kernel flags, such as NETWORKTRACE, in the Extra kernel flags field in the Settings dialog.

      Good luck!

      • john says:

        Thank you for the quick response. I followed your suggestions and it looks like networking data is recorded and maybe just not available to view in WPA. When I select the Networking IO tab, I see additional providers, notably Microsoft-Windows-TCPIP and Microsoft-Windows-Networking-Correlation. But there are no new graphs in WPA. Also, I ran xperf with +NETWORKTRACE. In that .etl, I see a TCPIP and a UDPIP provider listed under Trace Statistics.

        Any suggestions or links on how to view this data if it is not available in WPA? It would be nice to have it in a graph that would work with WPA, but I’m really just looking to see if I can do it at this point. Would an older etl reader (maybe xperfview) at least show the data?

        • brucedawson says:

          It’s worth trying xperfview. Failing that you may need to dump the data with a processing action (xperf -help processing) or dump the whole trace as text (xperf -i trace.etl -o output.csv I think?) – sorry, I’ve got nothing else.
