About

This is the blog of Bruce Dawson. I’m a programmer working at Google (great company!) on Chrome for Windows (awesome browser) and hacking around at home. Much of this blog was written while I was working for Valve and represents a cross-section of the work and research I did during that time. Prior to that I worked for Microsoft where I received excellent training on performance, debugging, security, and reliability. Prior to that… various other companies. This blog tends to include a random assortment of programming tidbits that I find interesting, information about unicycling, rants about Windows Live Photo Gallery, and occasional drink recipes.

The opinions stated here are my own, not necessarily those of my employer.

29 Responses to About

  1. Aaron Roberts says:

    Bruce (I believe that’s your name), I’d wanted to get ahold of you and see if you have any insight into effective software standards. While there are tons of books and articles on things people should do, I haven’t seen case studies or post-mortems where a team’s coding standards were examined to determine how useful they were as part of the development process. For example, a formal coding standard of 137 pages, detailing naming conventions, bracketing, etc. may be theoretically great, but if developers can’t digest the whole thing, it’s probably going to end up unused. In contrast, a one-page synopsis and a 10-page set of examples may be too thin for teams. I’d love to contact you directly and hear your thoughts.

    • brucedawson says:

      Huh — I thought I’d replied to this, but I guess not.

      It’s good to have some basic standardization for how code should be laid out — variable naming conventions, spacing, parentheses, etc. 137 pages is too much, but 5-10 is well worth it.

      Beyond that I suspect that code reviews are the best way to ensure both quality and consistency.

  2. Malini Kothapalli says:

    Or you could let your development environment help you format your code. If you use an IDE for your day-to-day coding, you may find that it can do a lot of that stuff for you. In my case, I have set up Eclipse to do most of that stuff for me.

    • Malini Kothapalli says:

      I couldn’t edit my earlier post, so I am replying to my own post. I wanted to make it clear that an IDE can not only auto-format the code, but can also help you follow a naming convention for your constants, variables, class names, class files, class header files, etc.

  3. Zeke Odins-Lucas says:

    hey, bruce! nice blog. a coworker pointed it to me, and as I was reading it, I thought, he sounds familiar… – Zeke

  4. Sarkie says:

    Random Question: Why do you have the O2 Arena on your blog image?

    • brucedawson says:

      Random Answer: I lived in London for a year, took the picture on a flight in to London, I liked the picture, and I was able to edit it to the necessary aspect ratio. Also, I’m too artistically lazy to bother reconsidering this choice.

      • Sarkie says:

        Random Reply: I didn’t really notice it till it didn’t load, and I assumed it was a default picture, so I thought I’d ask. Brilliant picture, taking into account that you were on a flight.

  5. GTHK says:

    RSS in Thunderbird plus this blog results in the same posts reappearing multiple times. I have five copies of everything now. 😦

    • brucedawson says:

      Five copies? Wow. I’ve got two copies of my blog in my RSS feed in Outlook, and two copies of a couple of other blogs. I don’t know what causes it. WordPress bug?

  6. John says:

    Hi Bruce, I’ve enjoyed your ETW blog entries and training videos and shared them with my co-workers — thanks!

    Question: do you know of a way to aggregate WPR collected function weights from different stacks, e.g. to identify critical low level functions that are called by different call paths (like malloc and free)? For example KCacheGrind allows you to sort by function weight and call count to easily identify the aggregated weight of low level functions called from different call paths and also shows a nice call graph which can highlight this (e.g. see http://kcachegrind.sourceforge.net/html/Screenshots.html). Is there any way to export the WPR collected data into Excel or some other format that could maybe then be translated so that KCacheGrind or gprof2dot/graphviz could highlight aggregated hot spots. Supposedly this used to be possible (see http://stackoverflow.com/questions/4394606/beyond-stack-sampling-c-profilers/4453999#4453999) but I can’t figure out how to do the necessary CSV export from WPR. Thanks for your time.

    • brucedawson says:

      I covered exporting of CPU sampled data to text format in this blog post:

      https://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/

      It’s definitely tricky, and it was only after I wrote the post that somebody gave me the hints needed to perfect it.

      I usually find that the table view (grouping by stack, or by module, function and address) is sufficient (together with grouping by process, thread ID, or whatever else seems appropriate). The butterfly view (show all stacks leading to a particular function) is also helpful. Therefore I rarely export the data. I find dynamically exploring it in WPA suits most of my needs.
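A minimal sketch of the aggregation asked about above, assuming the sampled stacks have already been exported to the collapsed text format that flame graph tools consume (one line per stack, frames separated by semicolons from root to leaf, followed by a sample count). The function name `aggregate_weights` and the sample data are hypothetical; none of this is part of WPA or xperf.

```python
from collections import Counter


def aggregate_weights(collapsed_lines):
    """Return (inclusive, exclusive) sample counts per function.

    Each input line is assumed to look like "main;render;malloc 12":
    frames from root to leaf separated by ';', then a sample count.
    """
    inclusive = Counter()
    exclusive = Counter()
    for line in collapsed_lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space so frame names may contain spaces.
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        frames = stack.split(";")
        # Count each function once per stack so recursion is not double counted.
        for function in set(frames):
            inclusive[function] += count
        # The leaf frame is where the samples were actually taken.
        exclusive[frames[-1]] += count
    return inclusive, exclusive


# Made-up sample data: malloc is reached through two different call paths.
sample = [
    "main;load;malloc 5",
    "main;render;draw;malloc 7",
    "main;render;draw 3",
    "main;load;free 2",
]
inclusive, exclusive = aggregate_weights(sample)
for function, weight in exclusive.most_common():
    print(function, "self:", weight, "inclusive:", inclusive[function])
```

Sorting by the exclusive count surfaces low-level functions such as malloc regardless of which path called them, which is roughly what a flat function list in a profiler provides.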

  7. Matthew says:

    Hi Bruce! Thanks for the blog and also for your videos on Wintellect. I watched them all and learnt a great deal about WPR and WPA. I can now use them to track memory leaks, hotspots, and long waits, as well as measure slow frames.

    From your blog I get the impression you enjoy investigating strange performance problems, and are knowledgeable about general system performance, so you may be able to work this one out.

    http://stackoverflow.com/questions/28579750/files-loading-slower-on-second-run-of-application-with-repro-code

    Testing if files exist gets slower after the first run of a program, and remains slow until the folder the file is in is renamed. I’ve tried using my ETW skills on this one but am drawing a blank. I suspect it’s something to do with NTFS, but I can’t be sure. Enjoy!

    • brucedawson says:

      I do like puzzles. I just posted a comment on the question which I will reproduce here:

      Consider uploading an ETW trace so that people can investigate without having to run the repro code. This also serves as an archive of how it was behaving, and will include many pertinent details such as amount of memory, type of disk, OS version, etc.

  8. steve says:

    how do you say the word “ghoti”?

  9. Your ETW series is really amazing! Thanks very much. The tool is very methodical and can be used to debug almost anything. I guess it’s a matter of time to get more familiar with it. I only use GPUView so far, but it’s probably inspired by ETW\Xperf.

  10. Jim P says:

    What are your thoughts on Rust?

  11. Milian Wolff says:

    Hey Bruce,

    I could not find your mail address, so I hope putting this down as a comment here is OK.

    First up, thanks a lot for your blog posts on xperf and WPA – much appreciated. I have some questions on the latter, which you may help me with:

    The tools I’ve used so far, most notably perf and VTune, give you different “visualizations” for call stack data associated to e.g. CPU samples. WPA, as I see it, only offers me the top-down view. Can I somehow view the data in a bottom-up manner? Is there maybe also a caller/callee view, i.e. some way to get a flat list of symbols in a process with the self and inclusive cost?

    Alternatively, is there some trick to handle deep call stacks? In Visual Studio e.g. I can aggregate call stacks if they don’t introduce branches and do not differ in their sample cost from the parent symbol. Right now, I’m always getting mad at WPA for forcing me to click dozens of times to expand a call stack until I find the actual interesting point in my application…

    Also, do you have contacts to people actually working on WPA? I think it would be a good idea for them to add a flame graph visualization as well. It is my current favorite way to visualize the output of perf e.g.

    Then, I wonder whether there are some tricks for application developers. I see the value in analyzing the full picture of the system, as many times it shows the odd interactions between processes that one never would have seen otherwise. That said, sometimes I only want to look at my application and nothing else. Is there an easy way to filter all the visualizations in WPA on a certain executable e.g.? I have found ways to filter individual views, but it’s cumbersome to repeat that step for every view.

    Finally, I wonder about the custom xperf events. Is this the recommended way on Windows to add static trace points, and should frameworks (like Qt e.g.) ship with xperf events? If you have knowledge about Linux or Solaris/Mac systems, do these events compare to Systemtap or DTrace static trace points? Is there maybe some good documentation on custom xperf events that also tells me more about the overhead of these events, and whether they should be shipped in e.g. release builds or only compiled in on demand?

    Thanks a lot again, hope to learn some more tricks from you!

    • brucedawson says:

      I hear that the next version of WPA may include flame graphs. I have not seen them but I am hopeful.

      Yes, it is possible to view caller/callee data on any call stack. Right click on a stack entry and select View Callers or View Callees from the context menu. This is covered in my WPA training videos. You can also change from viewing the call stack to viewing samples by process/module/function – the Randomascii Exclusive (module and function) view preset gives you that. Different views expose different information. You should also get used to fearlessly rearranging (and adding/removing) WPA columns. All questions can be answered by rearranging columns and changing the sort key.

      You also don’t have to click to expand stacks. Just choose the appropriate sort key and keep pressing the right arrow key – much faster.

      Some people like to filter graphs to a particular process. I usually don’t bother. The noise doesn’t really bother me – I just look at the areas of interest.

      I recommend shipping with custom ETW events built in. The ETWProviders*.dll exist for this purpose and the events are very low overhead. A few thousand per second is totally reasonable in release code. Microsoft ships Windows/IE/Edge with *tons* of these events.

      • Milian Wolff says:

        Thanks a lot for your replies!

        I just tested the latest update on W10, and it now sports a basic flame graph view – awesome! Much easier that way.

        Regarding top-down/bottom-up call stacks: Doing it via the context menu means I first have to drill down to select a function, then select to see its callers. What I’m missing is a configuration on the Stack column to set the direction. I.e. right now it’s top-down. I want it to be bottom-up. VTune makes the difference (and value!) of both versions quite apparent. This does not seem to be possible with WPA, or am I simply misusing the context menu?

        Also, is there a way to get file + line numbers for symbols in the stack view?

        Thanks again.

        • brucedawson says:

          I had forgotten that a flame graph view had been added – CPU Usage (Sampled) flame by Process, Stack, or configure as appropriate. Thanks for pointing that out!

          The way that stacks are collapsed in WPA makes reversed stacks impractical (illogical even) without first selecting a point to reverse them from (as with the filtering by callers to a particular function). I’m not sure how VTune does it so I can’t really compare.

          I have asked for file+line numbers (and source server support) but no dice. You’ll notice there are no file or line number columns available, and the .symcache files omit that information. Maybe the Windows 11 version, but I doubt it.

  12. Hi Bruce, wonderful insights all over this blog! I had a quick question I hope you can answer. Is it possible to use WPA to analyze L1/L2/L3 CPU cache statistics? If not, have you needed to do it with some other tool? I have a suspicion that my multi-threaded program might be suffering from false sharing…an increase in threads (and consequently, number of cores used) results in non-linear performance degradation. I’m moderately certain that it’s not caused by locks. Anyway, any insight you could offer would be much appreciated.

    • brucedawson says:

      I keep hearing rumors that CPU performance counters will be made available in ETW, but so far, no dice.

      What I usually do is profile my code on Linux and use perf to monitor CPU performance counters. In most cases the results should be applicable on Windows.

      You should at least be able to use ETW to see if locks (spinning in them or waiting on them) are the problem.

  13. Tomer Ben Arye says:

    Hi Bruce.
    I tried to tweet you with a small question.
    Our DX9 present() call is randomly delayed (1 sec video stutter).
    I used your UIforETW, but some software is blocking user keystrokes.
    Even when we wrote a program that sends those keys, the computer blocks it.
    What is our alternative to this key combo?

    (Going to study your second video course. Thanks for that!)

    • brucedawson says:

      Because UIforETW is elevated you probably can’t send keystrokes or other messages to it except from other elevated processes. You should probably hack the source to add the functionality you want.

      I implemented the type of code you are talking about – programmatically detecting a slowdown and recording the trace – at a previous job, but I don’t have the code anymore. Let me know if you come up with a reusable solution and maybe it can be rolled into the main release.
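As a rough illustration of the "programmatically detect a slowdown" half of that idea, here is a minimal sketch, assuming the application can call a function once per frame. The class name `SlowFrameDetector`, the threshold, and the callback are all hypothetical; the callback is only a placeholder for whatever mechanism a real program would use to ask an elevated process to save the trace, and it is not a UIforETW API.

```python
import time


class SlowFrameDetector:
    """Calls on_slow_frame(elapsed) when the gap between frames is too long."""

    def __init__(self, threshold_seconds, on_slow_frame, clock=time.perf_counter):
        self.threshold_seconds = threshold_seconds
        self.on_slow_frame = on_slow_frame  # hypothetical hook, e.g. "save trace"
        self.clock = clock                  # injectable so the logic can be tested
        self.last_time = None

    def frame_done(self):
        """Call once after every presented frame."""
        now = self.clock()
        if self.last_time is not None:
            elapsed = now - self.last_time
            if elapsed > self.threshold_seconds:
                self.on_slow_frame(elapsed)
        self.last_time = now


# Usage with a fake clock: the fourth frame arrives about a second late.
fake_times = iter([0.0, 0.016, 0.033, 1.1, 1.116])
slow_frames = []
detector = SlowFrameDetector(0.5, slow_frames.append, clock=lambda: next(fake_times))
for _ in range(5):
    detector.frame_done()
print(slow_frames)
```

Keeping the detection separate from the trace-saving action means the same check works whether the trace is saved by a modified UIforETW, a helper process, or something else.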
