This is the blog of Bruce Dawson. I’m a programmer working at Google (great company!) on Chrome for Windows (awesome browser) and hacking around at home. Much of this blog was written while I was working for Valve and represents a cross-section of the work and research I did during that time. Prior to that I worked for Microsoft where I received excellent training on performance, debugging, security, and reliability. Prior to that… various other companies. This blog tends to include a random assortment of programming tidbits that I find interesting, information about unicycling, rants about Windows Live Photo Gallery, and occasional drink recipes.

The opinions stated here are my own, not necessarily those of my employer.


48 Responses to About

  1. Aaron Roberts says:

    Bruce (I believe that’s your name), I wanted to get hold of you and see if you have any insight into effective software standards. While there are tons of books and articles on things people should do, I haven’t seen case studies or post-mortems where a team’s coding standards were examined to determine how useful they were as part of the development process. For example, a formal coding standard of 137 pages, detailing naming conventions, bracketing, etc., may be theoretically great, but if developers can’t digest the whole thing, it’s probably going to end up unused. In contrast, a one-page synopsis and 10-page set of examples may be too thin for teams. I’d love to contact you directly and hear your thoughts.

    • brucedawson says:

      Huh — I thought I’d replied to this, but I guess not.

      It’s good to have some basic standardization for how code should be laid out — variable naming conventions, spacing, parentheses, etc. 137 pages is too much, but 5-10 is well worth it.

      Beyond that I suspect that code reviews are the best way to ensure both quality and consistency.

  2. Malini Kothapalli says:

    Or you could let your development environment help you format your code. If you use an IDE for your day-to-day coding, you may find that it can do a lot of that stuff for you. In my case, I have set up Eclipse to do most of it for me.

    • Malini Kothapalli says:

      I couldn’t edit my earlier post, so I am replying to my own post. I wanted to make it clear that an IDE can not only auto-format the code, but can also help you follow a naming convention for your constants, variables, class names, class files, class header files, etc.

  3. Zeke Odins-Lucas says:

    hey, bruce! nice blog. a coworker pointed it to me, and as I was reading it, I thought, he sounds familiar… – Zeke

  4. Sarkie says:

    Random Question: Why do you have the O2 Arena on your blog image?

    • brucedawson says:

      Random Answer: I lived in London for a year, took the picture on a flight in to London, I liked the picture, and I was able to edit it to the necessary aspect ratio. Also, I’m too artistically lazy to bother reconsidering this choice.

      • Sarkie says:

        Random Reply: I didn’t really notice it till it didn’t load, assumed it was a default picture, so thought I’d ask. Brilliant picture, taking into account that you were on a flight.

  5. GTHK says:

    RSS in Thunderbird plus this blog results in the same posts reappearing multiple times. I have five copies of everything now. 😦

    • brucedawson says:

      Five copies? Wow. I’ve got two copies of my blog in my RSS feed in Outlook, and two copies of a couple of other blogs. I don’t know what causes it. WordPress bug?

  6. John says:

    Hi Bruce, I’ve enjoyed your ETW blog entries and training videos and shared them with my co-workers — thanks!

    Question: do you know of a way to aggregate WPR collected function weights from different stacks, e.g. to identify critical low level functions that are called by different call paths (like malloc and free)? For example KCacheGrind allows you to sort by function weight and call count to easily identify the aggregated weight of low level functions called from different call paths and also shows a nice call graph which can highlight this (e.g. see http://kcachegrind.sourceforge.net/html/Screenshots.html). Is there any way to export the WPR collected data into Excel or some other format that could maybe then be translated so that KCacheGrind or gprof2dot/graphviz could highlight aggregated hot spots. Supposedly this used to be possible (see http://stackoverflow.com/questions/4394606/beyond-stack-sampling-c-profilers/4453999#4453999) but I can’t figure out how to do the necessary CSV export from WPR. Thanks for your time.

    • brucedawson says:

      I covered exporting of CPU sampled data to text format in this blog post:


      It’s definitely tricky, and it was only after I wrote the post that somebody gave me the hints needed to perfect it.

      I usually find that the table view (grouping by stack, or by module, function and address) is sufficient (together with grouping by process, thread ID, or whatever else seems appropriate). The butterfly view (show all stacks leading to a particular function) is also helpful. Therefore I rarely export the data. I find dynamically exploring it in WPA suits most of my needs.
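
      For anyone who does export the data, the cross-stack aggregation asked about above can be sketched in a few lines. This is only an illustration, not a WPA feature: it assumes the exported samples have already been parsed into root-first lists of function names with a sample count each. That input format, the `aggregate_weights` name, and the sample data are all hypothetical.

```python
from collections import defaultdict


def aggregate_weights(samples):
    """Sum per-function weights across all call paths.

    samples: iterable of (stack, count) pairs, where stack is a root-first
    list of function names (a hypothetical format, not WPA's export layout).
    Returns (inclusive, exclusive) dicts keyed by function name.
    """
    inclusive = defaultdict(int)
    exclusive = defaultdict(int)
    for stack, count in samples:
        if not stack:
            continue
        # Count each function once per sample, even if it recurses.
        for function in set(stack):
            inclusive[function] += count
        # The leaf frame is where the sample was actually taken.
        exclusive[stack[-1]] += count
    return dict(inclusive), dict(exclusive)


# Made-up samples: malloc is reached through two different call paths.
samples = [
    (["main", "LoadLevel", "malloc"], 30),
    (["main", "Render", "malloc"], 20),
    (["main", "Render", "Draw"], 50),
]
inclusive, exclusive = aggregate_weights(samples)
for name, weight in sorted(exclusive.items(), key=lambda kv: -kv[1]):
    print(f"{name:10} exclusive={weight:3} inclusive={inclusive[name]:3}")
```

      Sorting by exclusive weight surfaces low-level functions such as malloc regardless of which call path reached them, which is roughly what the butterfly view does for one function at a time.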

  7. Matthew says:

    Hi Bruce! Thanks for the blog and also for your videos on Wintellect. I watched them all and learnt a great deal about WPR and WPA. I can now use them to track memory leaks, hotspots, and long waits, as well as measure slow frames.

    From your blog I get the impression you enjoy investigating strange performance problems, and are knowledgeable about general system performance, so you may be able to work this one out.


    Testing if files exist gets slower after the first run of a program, and remains slow until the folder the file is in is renamed. I’ve tried using my ETW skills on this one but am drawing a blank. I suspect it’s something to do with NTFS, but I can’t be sure. Enjoy!

    • brucedawson says:

      I do like puzzles. I just posted a comment on the question which I will reproduce here:

      Consider uploading an ETW trace so that people can investigate without having to run the repro code. This also serves as an archive of how it was behaving, and will include many pertinent details such as amount of memory, type of disk, OS version, etc.

  8. steve says:

    how do you say the word “ghoti”?

  9. Your ETW series is really amazing! Thanks very much. The tool is very methodical and can be used to debug almost anything. I guess it’s a matter of time to get more familiar with it. I have only used GPUView so far, but it’s probably inspired by ETW\Xperf.

  10. Jim P says:

    What are your thoughts on Rust?

  11. Milian Wolff says:

    Hey Bruce,

    I could not find your mail address, so I hope putting this down as a comment here is OK.

    First up, thanks a lot for your blog posts on xperf and WPA – much appreciated. I have some questions on the latter, which you may help me with:

    The tools I’ve used so far, most notably perf and VTune, give you different “visualizations” for call stack data associated to e.g. CPU samples. WPA, as I see it, only offers me the top-down view. Can I somehow view the data in a bottom-up manner? Is there maybe also a caller/callee view, i.e. some way to get a flat list of symbols in a process with the self and inclusive cost?

    Alternatively, is there some trick to handle deep call stacks? In Visual Studio e.g. I can aggregate call stacks if they don’t introduce branches and do not differ in their sample cost from the parent symbol. Right now, I’m always getting mad at WPA for forcing me to click dozens of times to expand a call stack until I find the actual interesting point in my application…

    Also, do you have contacts to people actually working on WPA? I think it would be a good idea for them to add a flame graph visualization as well. It is my current favorite way to visualize the output of perf e.g.

    Then, I wonder whether there are some tricks for application developers. I see the value in analyzing the full picture of the system, as many times it shows the odd interactions between processes that one never would have seen otherwise. That said, sometimes I only want to look at my application and nothing else. Is there an easy way to filter all the visualizations in WPA on a certain executable e.g.? I have found ways to filter individual views, but it’s cumbersome to repeat that step for every view.

    Finally, I wonder about the custom xperf events. Is this the recommended way on Windows to add static trace points, and should frameworks (like Qt e.g.) ship with xperf events? If you have knowledge about Linux or Solaris/Mac systems, do these events compare to Systemtap or DTrace static trace points? Is there maybe some good documentation on custom xperf events that also tells me more about the overhead of these events, and whether they should be shipped in e.g. release builds or only compiled in on demand?

    Thanks a lot again, hope to learn some more tricks from you!

    • brucedawson says:

      I hear that the next version of WPA may include flame graphs. I have not seen them but I am hopeful.

      Yes, it is possible to view caller/callee data on any call stack. Right click on a stack entry and select View Callers or View Callees from the context menu. This is covered in my WPA training videos. You can also change from viewing the call stack to viewing samples by process/module/function – the Randomascii Exclusive (module and function) view preset gives you that. Different views expose different information. You should also get used to fearlessly rearranging (and adding/removing) WPA columns. All questions can be answered by rearranging columns and changing the sort key.

      You also don’t have to click to expand stacks. Just choose the appropriate sort key and keep pressing the right arrow key – much faster.

      Some people like to filter graphs to a particular process. I usually don’t bother. The noise doesn’t really bother me – I just look at the areas of interest.

      I recommend shipping with custom ETW events built in. The ETWProviders*.dll exist for this purpose and the events are very low overhead. A few thousand per second is totally reasonable in release code. Microsoft ships Windows/IE/Edge with *tons* of these events.

      • Milian Wolff says:

        Thanks a lot for your replies!

        I just tested the latest update on W10, and it now sports a basic flame graph view – awesome! Much easier that way.

        Regarding top-down/bottom-up call stacks: Doing it via the context menu means I first have to drill down to select a function, then select to see its callers. What I’m missing is a configuration on the Stack column to set the direction. I.e. right now it’s top-down. I want it to be bottom-up. VTune makes the difference (and value!) of both versions quite apparent. This does not seem to be possible with WPA, or am I simply misusing the context menu?

        Also, is there a way to get file + line numbers for symbols in the stack view?

        Thanks again.

        • brucedawson says:

          I had forgotten that a flame graph view had been added – CPU Usage (Sampled) flame by Process, Stack, or configure as appropriate. Thanks for pointing that out!

          The way that stacks are collapsed in WPA makes reversed stacks impractical (illogical even) without first selecting a point to reverse them from (as with the filtering by callers to a particular function). I’m not sure how VTune does it so I can’t really compare.

          I have asked for file+line numbers (and source server support) but no dice. You’ll notice there are no file or line number columns available, and the .symcache files omit that information. Maybe the Windows 11 version, but I doubt it.

  12. Hi Bruce, wonderful insights all over this blog! I had a quick question I hope you can answer. Is it possible to use WPA to analyze L1/L2/L3 CPU cache statistics? If not, have you needed to do it with some other tool? I have a suspicion that my multi-threaded program might be suffering from false sharing…an increase in threads (and consequently, number of cores used) results in non-linear performance degradation. I’m moderately certain that it’s not caused by locks. Anyway, any insight you could offer would be much appreciated.

    • brucedawson says:

      I keep hearing rumors that CPU performance counters will be made available in ETW, but so far, no dice.

      What I usually do is profile my code on Linux and use perf to monitor CPU performance counters. In most cases the results should be applicable on Windows.

      You should at least be able to use ETW to see if locks (spinning in them or waiting on them) are the problem.

  13. Tomer Ben Arye says:

    Hi Bruce.
    I tried to tweet you with a small question.
    Our calls to DX9 Present() are being delayed randomly (1 sec video stutter).
    I used your UIforETW, but some software is blocking user keystrokes.
    Even when we wrote a program that sends those keys, the computer blocks them.
    What is our alternative to this key combo?

    ( going to study your second video course – thanks for that! )

    • brucedawson says:

      Because UIforETW is elevated you probably can’t send keystrokes or other messages to it except from other elevated processes. You should probably hack the source to add the functionality you want.

      I implemented the type of code you are talking about – programmatically detecting a slowdown and recording the trace – at a previous job, but I don’t have the code anymore. Let me know if you come up with a reusable solution and maybe it can be rolled into the main release.

  14. Peter N Gregory says:

    Hi Bruce Dawson,

    I love the stuff about unicycles. Tried to get hold of Greg Harper about his small sun & planet geared hub. However, Greg appears to have retired now and they don’t seem to have a forwarding address for him at Washington University.
    Do you know if Greg’s 1:1 or 1.5:1 ratio hubs made it into production, please? I’m emailing from Olde England and we’re not all that clued up about such things yet.
    Thanking you in anticipation of your reply, Peter N Gregory

  15. Peter N Gregory says:

    Thanks for your message, Bruce. In the event, my message reached Greg via his retirement email address at Washington University.

    For a single speed, fixed-wheel version, Greg used gears from QTC (Quality Transmission Components) at

    I will try browsing their catalogue during the week.

    Kind regards, Peter

  16. Andrew says:

    Enjoyed and learned a lot from your posts for ETW. Thanks a lot!
    I have a question: I want to write a program to run xperf to capture certain OS & driver events through a Fast Boot cycle, a.k.a. Fast Startup in WPR/ADK (not a full cold boot), including both shutdown and resume phases. However, in Fast Boot all the user processes are terminated, so how would I be able to do it? Clearly WPR and ADK have that capability, but I can’t directly use them for other reasons (plus I want to know how!). Much appreciated if you can give me any suggestions 🙂

    • brucedawson says:

      Look at bin\etwrecord.bat. You’ll have to split it into two parts, one to run before boot and one to run after. That might work, but probably not.

      Or, you might need to use xbootmgr. Unfortunately I have no recent experience with it, but there are many examples on the web.

      Why can’t you use wpr? It’s not my favorite tool but I do use it sometimes.

  17. Hi Bruce! Are you the same Bruce Dawson of “The Duel”? 😀 I started a very small “Dos Memories” blog and of course… If it’s you, would you please answer a couple of questions I’d like to post? 😀 Thanks! 😀
    Oh yeah! I’ve been shameless! 😀

  18. ChenA says:

    Hi, Bruce. I have a question for you: I failed to start and open a user-mode real-time heap trace. EnableTraceEx2 fails when run on Win7 64-bit, returning 1168, but succeeds on Win10.
    The detailed code is at https://github.com/chena1982/MemCheck/blob/master/ETWTraceSession.cpp line 112.
    I searched the internet but didn’t find any documentation about this. Do you have any suggestions?

  19. Selman Genc says:

    Hi Bruce, I’m working with ETW and I want to display logged events in a different graph than the standard Generic Events graph. I guess I need to create a custom wpaProfile but I haven’t found any documentation about it, about the entries in the xml file. Do you know any documentation or something that can help me with this? Thanks.

    • brucedawson says:

      I keep meaning to blog about this…

      If you use UIforETW and you use the supplied startup profile (go to Settings and click Copy Startup Profile) then you will get multiple custom views for the generic events graph. These use different filters (only showing different providers) and different graphing types. Open up the View Editor and explore, and then use Manage Presets to save your custom settings as a new preset.

      • Selman Genc says:

        Yes, I have read the blog posts, they were really helpful. I have one question: when I log values increasing randomly like 54, 128, 251, one value per second, and view them in WPA, I set the value column’s aggregation mode to Sum but there is a zigzag in the graph, like this: http://imgur.com/GyXPj3H. Is that normal, and if not, how can I get rid of it? I think it’s because I’m logging one event per second but the timeline’s sensitivity is in microseconds, so there is a gap where no value is logged and that causes the zigzag. But I wanted to make sure 🙂

        • brucedawson says:

          It looks like aliasing between the display resolution and the data resolution. Try zooming in and out. Or, try changing the graph type – the selector is to the right of “Provider, Task, Opcode”
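
          A minimal sketch of how this kind of aliasing can arise, under stated assumptions: events are summed into fixed-width time columns (one per horizontal pixel), and the column width is not a whole multiple of the event interval. This is not a description of how WPA actually renders graphs. The `sum_per_column` helper and the constant event value of 100 are hypothetical, chosen so the effect is easy to see.

```python
def sum_per_column(events, column_ms):
    """Sum event values into fixed-width time columns (one per 'pixel')."""
    columns = {}
    for time_ms, value in events:
        index = time_ms // column_ms
        columns[index] = columns.get(index, 0) + value
    # Return a dense list so that empty columns show up as zero.
    return [columns.get(i, 0) for i in range(max(columns) + 1)]


# One event per second, all with the same value, for 12 seconds.
events = [(second * 1000, 100) for second in range(12)]

flat = sum_per_column(events, 1000)    # column width matches the event rate
zigzag = sum_per_column(events, 1500)  # column width does not match
print(flat)
print(zigzag)
```

          With 1.5-second columns, alternate columns contain two events and then one, so the summed graph zigzags even though the underlying data is steady. Zooming changes the column width, which is why the pattern comes and goes.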

  20. Marek says:

    Hello, I liked your posts about ETW. I’ve just started to get short (a second or two) hiccups in Chrome after upgrading to AMD driver version 17.7.2, and I immediately thought about using ETW to look for the culprit, but I wasn’t successful. I don’t know how to reach the AMD guys, so… here is an ETW trace: https://mega.nz/#!RtpjSSZT!S4vXTI5b-f7Ss9v8zcOCEO1Ti33IKooElTMyQN-L-30. If you have the time and interest to look at it, I’ll be happy 🙂 For now I’ll just roll back to the older drivers. Thanks

    • Marek says:

      Oh and BTW: you can see the hiccup in UI Delays

      • brucedawson says:

        Since it’s a Chrome hang I took a look. The analysis was fairly straightforward although it is not clear *why* closesocket took five seconds to return. I filed crbug.com/749946. Please make additional comments there. In particular:
        1) Wired network?
        2) Did rolling back the drivers help?
        3) Anything else we should know, given that it is a networking hang caused by a function that should always return quickly failing to?
