The Lost Xperf Documentation–Disk Usage

As I’ve lamented previously, the documentation for xperf (Windows Performance Toolkit) is a bit light. The names of the columns in the summary tables can be exquisitely subtle, and I have never found any documentation for them. But, I’ve talked to the xperf authors, and I’ve used xperf a lot, and I’ve done some experiments, and here I share some more results, this time for the Disk Usage summary table.

Previous articles in this documentation series include:

It is assumed that you have already read the posts explaining how to install xperf and record traces using some variation of my recommended settings and that you have read how to use summary tables, including grouping and sorting.

You gotta know when to use it

When profiling I rarely start with Disk Usage because it is usually not directly useful. Having your disk 100% busy is not by itself a bad thing. It could mean that your disk is thrashing and causing an application hang, or it could mean that the OS is lazily writing modified pages back to the disk because the disk is otherwise idle.

The usual way that I end up looking at disk I/O is if I notice a hang. Usually it is an idle CPU hang where the critical thread is clearly waiting on a disk I/O operation such as ReadFile or CreateFile. Occasionally it is a busy CPU hang where the CPU is doing a busy wait until some I/O completes.

Getting there

The Disk Usage Summary table is available in xperfview, but the preferred trace viewing UI is now WPA, so I’m going to focus on that. In the Graph Explorer on the left of WPA you can double-click on the Storage section to open up the default disk usage graph. imageYou can then show the associated summary table by clicking on either the first of the five buttons shown here (summary table and graph) or the third button (summary table only). You can adjust the set of displayed columns with the imageView Editor, or by right-clicking on any column to enable/disable columns, and by dragging columns around to reorder them. Rearranging the columns – especially the columns to the left of the orange bar which are used for grouping – is crucial for finding answers to life’s persistent questions.

If you open up the Storage section in the Graph Explorer you will see a Disk Usage section, and if you open that up you will see nine different graphs available. These graphs all offer different views of disk usage, but they all offer the same 19 columns in their associated summary tables. The default order for these columns varies, and sometimes the same column is configured to show up twice (once as a Sum and once as an Average), but the summary tables associated with these Disk Usage graphs are all fundamentally the same. There are five more disk I/O columns that aren’t available in any of these summary tables, available from a super-secret-summary table, but that will have to wait.

Not all file I/O is disk I/O

It is important to understand the difference between disk I/O and file I/O. In xperf’s nomenclature ‘File I/O’ occurs every time somebody calls one of the file I/O functions such as ReadFile, whereas ‘Disk I/O’ is when the disk actually has to do some work. A lot of file I/O doesn’t result in disk I/O because the Windows disk cache does a great job of keeping things in memory. This cache is the main reason why you can never have too much memory – my 32 GB work machine typically has 25 GB of data cached, which keeps it humming along quite nicely.

It’s also important to understand that, on Windows 7 at least, I/O that goes to the optical disk is not recorded. That’s a shame.

As usual the columns can be configured to be a count, average, max, min, sum, etc., depending on what you need.

With all that said, here are the meanings of the columns. I’ve grouped them into categories to make the list easier to navigate:

Columns for who is doing the I/O:

  • Process – the name and process ID of the process doing the I/O. For many writes this will be listed as ‘System’ because the WriteFile call returns immediately and the system process then writes the data in the background. Reads may be attributed to the ‘System’ process if it is doing read-ahead.
  • Process Name – the name of the process, without the process ID. This allows grouping identically named processes together.
  • Thread ID – the ID of the thread doing the I/O.
  • IO Init Stack – if you have asked xperf for call stacks using the DiskReadInit, DiskWriteInit, or DiskFlushInit flags to “-stackwalk” then the call stacks will show up here. This can obviously be handy for finding out exactly where disk I/O is coming from. My default xperf scripts don’t request call stacks for disk I/O so normally this column is just “?!?”. When I need disk I/O call stacks, especially on reads, I can usually find them by doing idle thread analysis.

Columns for what I/O is being done, and where:

  • Path Tree – this gives you a tree view of the paths being accessed. This should be placed to the left of the orange grouping bar and you can then expand it like any normal tree control, getting various statistics summarized by partial paths.
  • Path Name – this displays the full path of the file being accessed. When placed to the left of the orange grouping bar this lets you group by filename in a non-hierarchical way.
  • Path Ext – this contains the file extension of the file being accessed, in case you want to group by file extension. This could be useful if, for instance, you suspect that DLL accesses or .jpg accesses are causing problems.
  • Disk – the disk number, handy for having in the leftmost column for grouping by disk. Since most machines just have a single disk this will usually contain zero.
  • Min Offset – the offset is just a number, in hexadecimal, from zero to the size of the disk, representing the location of the disk access. I think of this as representing the head position (for spinning rust disks) and it is useful for seeing how much seeking is going on. This column shows the minimum offset for all of the operations summarized on the current row.
  • Max Offset – this column shows the maximum offset for all of the operations summarized on the current row. For a single operation this will be Min Offset plus Size minus one. By comparing Min Offset and Max Offset for a series of I/O operations you can get a rough sense of how well localized they are, which is important with spinning disks.
  • IO Type – this column records whether the I/O is a read, write or flush.

Columns for the performance characteristics of the I/O:

  • Init Time (s) – this is when the I/O request was initiated.
  • Complete Time (s) – this is when the I/O request was completed.
  • Disk Service Time (us) – this is the elapsed time to perform the read or write as seen by the disk. The assumption is that a disk can only work on one I/O at a time, or at least a given period of time will only be ‘billed’ to one I/O, so the sum of these can never significantly exceed the length of time being analyzed. The only reason the sum can be slightly longer than the analysis time is because there will often be an I/O that starts before the analyzed region and then completes inside it.
  • IO Time (us) – this is the elapsed time to perform the read or write as seen by the thread initiating the access. It is, roughly speaking, the time that it takes the call to return, and it simply the difference between Init Time (s) and Complete Time (s). IO Time can never be less than Disk Service Time, but because there can be many threads performing disk I/O simultaneously there is no limit to how much greater it can be. For writes – where the I/O function returns before the operation is complete – IO Time is set to Disk Service Time, which means it may overstate the time the thread spends on the disk I/O.
  • Size – the number of bytes read or written by the operations summarized in the row. This is one measure of how much work is being done.
  • Count – the number of operations summarized on the row. In many cases this is a better measure of how much work is being done than Size. Disks are at their most efficient when doing a small number of large reads and writes – anything less than about 1 MB can be significantly inefficient – and a large number of operations may be inefficient.
  • Pri – this is the I/O priority. Most I/O runs at normal priority, but some processes such as SearchProtocolHost.exe run some of their I/O at Very Low priority in order to avoid interfering with foreground tasks. Other priorities are possible. I/O priorities are separate from thread priorities.

Columns for miscellaneous:

  • Source – I have only seen this column be ‘Original’ or ‘Unknown’ and I don’t know what it indicates. Sorry.

Sample Summary Tables

Effective use of summary tables requires rearranging the columns to answer whatever questions you may have. Below I have screenshots of some sample rearrangements. If you have particular favorites then you can add these to WPA for easier reuse.

This summary table groups the number and size of I/Os by disk number and by read/write, to get a high-level view of what is going on.

image

Here we can see a bunch of disk I/O during Windows Live Photo Gallery startup – grouped by process and Path Tree, and sorted by Init Time. This lets us see that WLPG does a lot of small reads to random locations in this one file, which is why it can take a minute or longer to start.

image

Many other useful arrangements are possible – experiment to see what works.

Learning how to fish

The lack of comprehensive official documentation can be frustrating, but in some cases this problem can be easily worked around. When writing this post I found a trace with lots of disk I/O and just played around. By dragging a particular column to the very left I automatically got a list of all the values it has in my trace, which is usually enough to let me write about it. I encourage you to fearlessly rearrange columns, both to investigate performance problems, and to determine the meanings of unknown columns.

Sometimes there is an apparent mathematical relationship between columns, such as the one between Init Time, Complete Time, and IO Time. When you suspect a relationship and want to test your hypothesis it can be very useful to a few hundred rows of data, then right-click and use Copy Selection. You can then paste it into Excel and do a few quick calculations.

About these ads

About brucedawson

I'm a programmer, working for Valve (http://www.valvesoftware.com/), focusing on optimization and reliability. Nothing's more fun than making code run 5x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Documentation, Performance, xperf and tagged , . Bookmark the permalink.

5 Responses to The Lost Xperf Documentation–Disk Usage

  1. Pingback: The Lost Xperf Documentation–CPU sampling | Random ASCII

  2. Pingback: The Lost Xperf Documentation–CPU Scheduling | Random ASCII

  3. Pingback: New Version of Xperf–Upgrade Now | Random ASCII

  4. Pingback: Bugs I Got Other Companies to Fix in 2013 | Random ASCII

  5. Pingback: How to analyse SQL Server performance

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s