The Lost Xperf Documentation–CPU Usage (Precise)

Posted on May 11, 2012 by brucedawson

As I’ve mentioned previously, the documentation for xperf (Windows Performance Toolkit, also known as ETW) is pretty weak. In this post I’m going to attempt to explain the meaning of the extremely subtle and non-obvious columns in the CPU Usage (Precise) Tables, which display every context switch recorded in the trace. Understanding these columns is important when analyzing idle time, which is one of xperf’s most valuable features.

This post was updated September 2015 to include information about UIforETW, WPA, and new columns.

The CPU Usage (Precise) table is one of the worst offenders for having confusingly named columns. Quick – what’s the difference between Switch-In Time and Last Switch-Out Time? And is New Prev Out Pri a real column name, or is it a made-up compound word? And is ProcessOutOfMemory really a column in the context switch table? Stick around and find out. WPA 10 adds some spaces and hyphens to the names, and tooltips, but documentation is still needed.

Here are the other articles in this series:

This post assumes that you are recording traces with UIforETW and that the WPA (Windows Performance Analyzer) startup profiles which ship with UIforETW have been installed. You can do this from the Settings dialog by clicking on Copy startup profiles. This post also assumes that you are using WPA 10 to view the traces – this is the default when you double-click on a trace in UIforETW‘s Traces list.

CPU Usage (Precise)

This table is labeled CPU Usage (Precise) to distinguish it from the CPU Usage (Sampled) data. The sampled data comes from interrupting the CPU frequently – sampling – and seeing what it is doing. The precise data that is discussed here comes from instrumenting the context switch code in the kernel. Every time a thread starts or stops running the precise time of this activity is recorded, with optional call stacks. This means that the CPU Usage (Precise) data can tell exactly how much CPU time each thread consumes, and what the call stack looked like when a thread lost/gained the CPU. However the CPU Usage (Precise) data can’t tell you what a thread is doing between context switches – that is the job of the CPU Usage (Sampled) data. Understanding this difference is crucial.

UIforETW’s default startup profile shows the CPU Usage (Precise) data in graph form, because this is the most accurate graph for showing how much CPU time is used by each process. If you want to see details of the context switches you can click on the Display graph and table button and the table will appear below the graph.

Default columns for CPU Usage (Precise)

The available View Presets for the CPU Usage (Precise) data are good for getting different graphs (Timeline by Process, Thread and Timeline by CPU can be interesting)…

…but I don’t find any of the built-in presets particularly useful for a detailed investigation of why my program is idle. So, the default View Preset with the UIforETW startup profile is Randomascii Wait Analysis and it is optimized for figuring out when, where, and why your threads go idle. The columns used in this preset are:

New Process
New Thread Id
New Thread Stack
Readying Process
Readying Thread Id
Ready Thread Stack
Orange bar (columns to the left are used for grouping)
Count
Sum: Time Since Last (μs)
Max: Time Since Last (μs)
Count: Waits
Switch-In Time (s)

With these columns available you can easily drill down to where your threads go idle, see how often they went idle and for how long, see what call stack they went idle on, see what thread from what process woke them (and that call stack), and more. It’s pretty amazing.

CPU scheduling shows switches

All of the columns in the CPU Scheduling summary table show data that is related to a particular context switch (or multiple context switches grouped together), and the grouping (by the columns to the left of the orange bar) affects how many context switches are displayed on each row. With the recommended default view there will initially be one row for each process, and as you drill in more rows will be revealed. The grouping by stack is particularly important since it lets you see the big picture more easily.

Any threads or processes that did not have context switches during the visible time period will not show up.

Before diving into the columns (there are over 60 of them!) some concepts and prefixes need to be covered.

Context switch concepts

The CPU Scheduling summary tables are summarizing context switches. They record the act of one logical or physical CPU changing from executing one thread to executing another thread. In order to understand context switches it is important to understand the different states in which a thread can be – if you don’t understand Windows thread scheduling and its nomenclature then you can’t fully understand the CPU scheduling summary table. A Windows thread can be in several states:

Waiting – not ready to run because the thread is waiting on something through a call to WaitForMultipleObjects, EnterCriticalSection, ReadFile, etc.
Ready – the thread is ready to run but has not yet been scheduled to run. Threads move from waiting to ready when their wait is satisfied or times out. If a thread is in the Ready state for any non-trivial amount of time then it indicates some level of CPU starvation
Running – the thread is running on one of the CPUs, executing instructions and getting work done. Threads move from ready to running when a CPU is available. They can move back to ready (if they lose the CPU) or back to waiting (if they wait on something).
There are other intermediate states used by the scheduler to improve scalability such as DeferredReady and Standby, and states such as Initialized and Terminated for new and dead threads, but those shouldn’t normally be relevant for performance purposes.

There are up to three processes involved in a context switch:

New Process/Thread: this is the process/thread that we are switching to – the one that is starting execution at this point. This may be the idle thread – when a CPU is idle it is considered to be ‘executing’ the idle thread. The New Thread is moving from Ready to Running.
Old Process/Thread: this is the process/thread that we are switching away from – the one that has lost (or relinquished) the CPU. This may be the idle thread. The Old Thread is moving from Running to Waiting or to Ready.
Readying Process/Thread: this is the process/thread that ‘readied’ the new process – that satisfied whatever it was waiting on. This could be a thread that released a critical section, or set an event, or something else. This readying could represent keyboard input, completion of file I/O, receipt of a network packet, or many other possibilities. Sometimes there is no readying thread – this happens when a thread wasn’t waiting for anything except more CPU time. Sometimes the readying thread is a DPC, which means it comes from a random thread that was hijacked to run the DPC code – watch for it. The Readying Thread’s state doesn’t change during this context switch, and at the time of the context switch it can be in any one of the thread states.

In short, a context switch, as recorded by ETW, is the act of moving the New Thread from Ready to Running, and moving the Old Thread from running to some other state, on a particular CPU.

Some documents suggest that Windows always ensures that the highest priority threads are running, but this is a half truth at best, and usually irrelevant. This priority guarantee is a half truth because current versions of Windows actually only guarantee that a particular CPU is running the highest priority thread in its per-CPU queue. It is quite possible for a high-priority thread to stay in the Ready state for a few quanta while a low-priority thread continues to run on another CPU. In addition, the priority guarantee is usually not relevant because threads are constantly having their priorities adjusted, so you can’t usefully know what priority a thread will have at the time (although you can see it in the ETW trace).

Compound words

Many of the column names are compound words and if you understand the meaning of the constituent words then you can more easily remember the meaning of the columns. Memorizing these words is far more valuable than memorizing individual column names. Some of the most important words used are:

New – this refers to the thread/process that is being switched to – the thread that will be running after the context switch
Old – this refers to the thread/process that is being switched away from – the thread that was running before the context switch
Ready/Readying – this refers to the thread/process, if any, that readied the new thread
New Prev – this appears to refer to the previous context switch involving the new thread – the time that it stopped running, which may have been arbitrarily many context switches earlier
In – this apparently refers to a thread’s state at the moment when it starts running, when it switched “in” to the Running state – its state at the beginning of a period of running
Out – this apparently refers to a thread’s state when it stops running, when it is switched “out” of the Running state
Max – when rows are collapsed together a Max column is the maximum of some value across all of the collapsed rows, which is useful when searching for a single context switch with a very high value
Sum – when rows are collapsed together a Sum column is the sum of some value across all of the collapsed rows, which is useful when finding what collection of context switches are responsible for a high total. There are many columns with no Sum or Max prefix which behave as Sum columns, such as Count, and the time duration columns
Time has two different meanings in this summary table. Time (s) refers to a point in time in the trace (with units of seconds). Time (us) refers to a duration – a length of time (with units of microseconds). Thus, the units of time are crucial clues to the meaning of a column.

That is how we can end up with column names such as New Prev Out Pri which is the Priority of the New thread at the end (Out) of the Previous time that it ran. Phew.

Documentation of important columns

New/Readying/Old Process Name – this is the name of the process (i.e.; devenv.exe) without the process ID. This column is useful when you want to group all processes from one executable together.
New/Readying/Old Process – this is the name of the process plus its process ID. This column is useful when you want each process to be grouped separately
New/Readying/Old ThreadId – the ID of the thread, or –1 if there is no thread
New Thread Stack – the stack of the new thread, which is both where it resumes execution after the context switch, and where it was when its execution was suspended on an earlier context switch. This represents where the thread was waiting
Ready Thread in DPC – when the value in this column is True it means that the thread was not readied by a process. It was readied by a deferred procedure call (doing work on behalf of an interrupt) that has hijacked a process temporarily. When this happens it means that the Readying Process and Readying Thread Id are meaningless – they are just random victims. The Ready Thread Stack is also mostly meaningless but it will usually have ntoskrnl.exe!KiProcessExpiredTimerList on it because Ready Thread in DPC almost means that a thread was readied due to a timer expiring, such as Sleep(1) or a timeout being reached on WaitForSingleObject.
Ready Thread Stack – the stack of the thread, if any, which readied (made eligible to run) the new thread
Old Thread Stack – there is no column that records the call stack of the old thread (the thread being switched out). The New Thread Stack for the next context switch involving the Old Thread contains that data
Ready Time (s) – this is the time at which a thread was readied – the time at which it became eligible to run
Count – this is the number of context switches summarized by the current row. When the first column is NewProcess and all of the data is collapsed this represents the number of context switches per process over the selected time. As you drill down into the data you will eventually get down to individual context switches and the count will be one.
Sum/Max:Time Since Last (μs) – this is the time since the thread last ran – i.e.; the time the thread was in the Waiting and Ready states
Sum/Max:Waits (μs) – this is the length of time that the thread was in the Waiting state before it was moved to Ready
Sum/Max:Ready (μs) – this is the length of time that the thread was in the Ready state before it was moved to Running. This is often just a few dozen microseconds but can be arbitrarily long on a CPU starved system. The sum of Sum:Waits and Sum:Ready should be Sum:Time Since Last, and a little Excel spot checking shows that it is
Count: Waits – this counts how many of the context switches have a non-zero Wait time – i.e.; how many of the context switches were after the new thread waited. If a thread is preempted then it can go from the Running state to the Ready state and back to the Running state with no waiting. If you drill down to an individual context switch then Count:Waits count will be zero or one. If it is zero then Sum/Max:Waits will also be zero, and ReadyingProcess, ReadyingThreadId, and ReadyThreadStack will all be empty
New/Old InSwitchTime (μs) – these represent how long the new and old threads will/did run – these are the elapsed time between the current context switch and the next/prev context switches – see the Learning through exploring section below for more details – these columns no longer exist in WPA 10. CPU Usage (ms) is the closest substitute for NewInSwitchTime (μs).
Switch-In Time (s) – this is the time of the context switch.
Annotation – this, column, which is available on most or all WPA tables, can be filled in during trace analysis. See below for details.

My editor tells me that I need more diagrams, so here is a lovely Visio diagram showing some of the times and durations in a context switch – pairs of parentheses indicate time durations (us), and arrows point to particular times (s). The data is all for the Context Switch pointed to by the text with the purple background:

For those of you who prefer math with your diagrams:

Switch-In Time (s) + CPU Usage (ms) * 1e-3 = Next Switch-Out Time (s)
Ready Time (s) + Ready (μs) * 1e-6 = Switch-In Time (s)
Waiting (μs) + Ready (μs) = Time Since Last (μs)

Annotation

The annotation column can be useful during analysis as a way to add an arbitrary tag to a set of data. When the Annotation column is enabled it should typically be one of the left-most columns. Then, when a set of data is understood the trace analyst should select that data, right-click and select Annotate Selection…, and enter an appropriate description. Then the trace analysis can continue on the data that is under the <Not Annotated> node in the Annotation column, undistracted by the data that is already understood. In the screen show below the Annotation column was used to tag context switches as “Self-readied” or “Readied by other process” in order to better visualize flame graph data:

Unimportant columns

Some of these might be more important if Microsoft explained them enough to be useful, but most of them are really only of interest to kernel developers (note that some have been renamed slightly with spaces added):

Last Switch-Out Time (s) – this is the time when the old thread last stopped running, which could be arbitrarily many context switches ago. However this value appears to be buggy in regards to the idle thread where it is always the same as SwitchInTime (s)
Next Switch-Out Time (s) – this is the time of the next context switch. Or, as the name wants you to think about it, this is when the new thread will be switched out.
New Prev Out Pri – I believe that this is the priority of the new thread at the end of the last time that it ran
New In Pri – this is the priority of the new thread when it moves into the Running state because of this context switch. This is often elevated by one or two levels because the Windows scheduler boosts the priority of threads that have just been readied
New Out Pri – I believe that this is what the priority of the new thread will be when it ends its current running state – at the next context switch. This often decays back to the same level of New Prev Out Pri
Old Out Pri – I believe this is the priority of the old thread at the time that it stopped running. Note that the idle thread runs at priority zero
Cpu – this is the CPU on which the context switch is happening. A thread can move across CPUs (although Windows tries to minimize that) but for any given context switch, a particular CPU goes from running one thread to running another thread
New/Readying/Old ThreadStartModule – the module where the thread originally started
New/Readying/Old ThreadStartFunction – the function where the thread originally started
Ideal Cpu – every thread has an ideal CPU that Windows tries to schedule it on. This field presumably represents this data for the New Thread. Grouping by Cpu and then Ideal Cpu shows that some CPUs have far more context switches than others, and the majority of context switches have the new thread running on its ideal CPU. But I can’t find any time where this matters
New/Old Qnt – these are always -1, even on Windows 10. This appears to be an old and unfixed bug.
New Prev Wait Mode, New/Old Wait Mode – the available wait modes seem to be Unknown, NonSwap, and Swapable. None of these seem relevant to non-kernel developers
New Pri Decr – this number varies from –112 to +112, with zero and two being the most common values. But I don’t know what it means
New Prev State, New/Old State – these refer to the Waiting, Ready, Running, etc. states that a thread can be in, but I can’t make any sense of the data
New Prev Wait Reason, New/Old Wait Reason – these presumably refer to why the thread was waiting or otherwise not running, but the number of reasons and the diversity of call stacks makes understanding them seem hopeless
Prev CState – Cstates represent how active your CPU is, with zero being fully active and larger numbers representing various degrees of reduced power. The largest number I’ve seen on a trace is ‘2’
AdjustIncrement – unknown
AdjustReason – unknown
DirectSwitchAttempt – unknown
KernelStackNotResident – unknown
NewThreadPrevRemainingQuantum – unknown
OldThreadRank – unknown
OldThreadRemainingQuantum – presumably this records how much of the old (switched out) threads’ quantum was remaining
ProcessOutOfMemory – unknown, and doesn’t this seem out-of-place in context-switch data?

Learning through exploring

I have no secret source of knowledge for what most of these columns are – I didn’t call up someone at Microsoft and ask, and my web searches for documentation were generally fruitless. Instead, I figured out the meanings of the columns by rearranging the summary table to figure out patterns. For instance, in order to figure out OldInSwitchTime (μs) (now deleted) and NewInSwitchTime (μs) (now deleted, use CPU Usage (ms) instead) I rearranged the columns in the order:

Cpu – the CPU the context switch occurs on
Orange bar – columns to the left are used for grouping
New Process – the process we are switching to
NewThreadId – the thread we are switching to
SwitchInTime (s) – the time of the context switch
OldInSwitchTime (μs)
NewInSwitchTime (μs)

Once I did this and sorted by Switch In Time (s) the patterns became obvious:

InSwitchTime (μs) is the length of time that a thread is running, from the context switch where it starts running to the context switch where another thread starts running. This can be verified by taking SwitchInTime (s) on the selected row, adding NewInSwitchTime (μs) (remembering to divide by 1,000,000 to convert to seconds), and noting that the result is the SwitchInTime (s) for the next row – the next context switch. By selecting a few thousand rows, copying to a spreadsheet and doing some Excel-fu you can quickly verify this.

The OldInSwitchTime (μs) is how long the old thread ran, and the NewInSwitchTime (μs) is how long the new thread will run. The NewInSwitchTime (μs) is not known at the time that the context switch happens, but xperf helpfully calculates it at analysis time. You can see that the OldInSwitchTime (μs) for a given row (when sorting by SwitchInTime (s) and when there is one context switch per row) is just the NewInSwitchTime (μs) from the previous row (the previous context switch).

Aside: I don’t know what Outlook is doing in the screenshot above but it appears to be suffering from Attention Deficit Hyperactivity Disorder. In total it flips back and forth between it and the idle thread seven times (not all shown) and it never runs for more than 0.02 ms. That can’t be good.

Grouping variations

It can be interesting to see all of the context switches on a single CPU – or depressing if you think about the lost efficiency that comes from thousands of context switches per second per CPU. To do this just arrange the columns like this:

Cpu
Switch-In Time (s)
Orange bar (columns to the left are used for grouping)
New Process
New Thread Id
Sum: Time Since Last (μs)
Sum: Ready (μs)
Sum: Waits (μs)
Whatever else you want summarized…

Timer expiration

When tracing through chains of readying threads (thread A waiting on thread B which was waiting on thread C…) the identify of the readying thread is crucial, but sometimes it is irrelevant. If a thread was not actually waiting (if it was in the Standby state due to CPU starvation) then there will be no readying thread. If it was waiting on a timer (calling Sleep or a call to WaitForMultipleObjects that timed out) then the readying thread appears to be whatever thread happened to be running on that CPU – watch for KiProcessExpiredTimerList in the ReadyThreadStack for random threads. That’s because DPCs effectively ‘hijack’ other threads and run in their context, and WPA doesn’t expose that at a high level.

Go forth and rearrange summary tables

If you understand how summary tables work and you understand the data that they are presenting you with then there are many rich explorations of CPU Scheduling that you can do. If you figure out a cool technique, let me know. If you find any errors, or new or better column explanations, please share.

P.S.

I didn’t figure out the significance of Time (s) versus Time (μs) (point in time versus duration) until I was researching for this post. In fact, I didn’t figure it out until about half an hour before publishing this post. Knowing that distinction is going to make this summary table much easier to use, and justifies the time I spent writing this post.

P.P.S.

Microsoft’s documentation has improved a lot. See this page for a different perspective.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048

View all posts by brucedawson →

This entry was posted in Documentation, Performance, Programming, xperf and tagged CPU scheduling, documentation, summary tables, xperf. Bookmark the permalink.

32 Responses to The Lost Xperf Documentation–CPU Usage (Precise)

Pingback: The Lost Xperf Documentation–CPU sampling | Random ASCII
Pingback: Xperf Wait Analysis–Finding Idle Time | Random ASCII
Pingback: WPA–Xperf Trace Analysis Reimagined | Random ASCII
Neeraj Singh (windows kernel dev) says:

July 7, 2012 at 12:27 am

Great post on my most oft-used summary table. Two minor obscurities:
1) IdealCpu factors into where you’ll see cases of high priority threads ready while lower pri guys are running on alternate cpus… the scheduler chooses the ideal cpu if there is no better idle cpu to run the thread and then uses priority to determine whether to preempt or queue. The ideal cpu column actually refers to the ‘old’ ideal cpu, which even trips up scheduler experts from time to time.

2) previous c-state refers to the last sleep state of the cpu when the oldthread was the idle thread. It has no useful meaning in other cases. It may be instructive to look at if you have patterns where you distribute lots of tiny pieces of work to threads and join on the results. It’s possible to get in a regime where more time is spent waking up from deep c-states than doing work. This would show up in 100us – 800us ready times where the target core was actually idle.

Reply
- brucedawson says:
  
  July 7, 2012 at 7:50 am
  
  Thanks Neeraj. I’ve recently been noticing a lot of traces where threads spend a lot of time in the Ready state even though there are CPUs available. The causes that I am aware of are:
  
  1) Scheduler doesn’t want to move the thread to another CPU
  2) Waking up CPU from deep c-states (thanks, I didn’t know about that one)
  3) Thread is blocked by an interrupt or DPC
  4) Scheduler is unwilling to unlock more logical CPUs (typically second hyperthreads but could be another module)
  
  Am I missing anything? I’m not sure that accounts for all that I am seeing, or else perhaps I just need to look more.
  
  I could never get c-state to display what looked like meaningful data, but I’ll look more closely.
  
  BTW, you should ask the xperf guys to document this. It’s tragic that it’s so hidden.
  
  Reply
  - vinc says:
    
    November 24, 2021 at 9:35 am
    
    Hey Bruce,
    When I sort my traces by Ready(us) Max, I get threads spending ridiculous ammounts of times in the ready states, I’m talking 90 seconds and more.(There is even a thread with priority 31 that averages a ready time of half a second)
    
    Do you have any idea what is happening there?
    Looking at your list above I’d assume:
    1) Even if scheduler doesn’t move the thread to another cpu, the cpu it is already on will be idle at some point during the 90 seconds window
    2) Waking from c-states can’t produce this big of a delay (also I have c-states fully disabled as well)
    3) Intereupt/DPC can’t block for this long of a time
    4) Same as with 1)
    
    Since this article is a few years old now, maybe you figured out other mechanisms that could explain what I’m seeing.
    
    Reply
    - brucedawson says:
      
      November 25, 2021 at 10:35 am
      
      That’s very odd. I just looked at a randomly selected trace and the highest value I saw in Ready (µs) Max was 11,439.6, which is 11.4 milliseconds. That was in MiZeroLargePageThread which is a low priority thread and the system was temporarily overloaded (100% CPU usage) so that delay makes sense.
      
      You don’t say how busy your system is. High CPU load is the main thing that will lead to long times spent in the Ready state. High priority threads should be immune to this, but note that Rank can occasionally be a factor, since it is essentially the high-order bits of thread priority. This can only cause your problem if your system is under heavy load, but maybe it is? See this article for details.
      
      What Outranks Thread Priority?
      
      If that doesn’t explain it then I don’t know what does.
      
      Reply
      - vinc says:
        
        November 26, 2021 at 6:22 am
        
        Here’s a screenshot
        
        First line you can see that idle is running 97% of the time. On the bottom is the System thread RTKVHD64.sys+0x3629c something to do with audio driver, it has 31 prio, and rank 0. You can also see a lot of other threads with Ready Max of 10 seconds and more.
        
        My system is miraculously nevertheless running very well, I’m gaming without any lags. I did a trace while ingame and the game main thread had average time spent in ready state of 77 us.
        
        So I guess there is no easy way to track down why the threads don’t get run? At least it’s good to know that I’m not tripping or missing something very obvious.
        
        Thanks for your help, have a nice weekend!
        
        Reply
        
        brucedawson says:
        
        November 26, 2021 at 9:49 am
        
        Maybe those threads have been suspended somehow? I don’t know – it seems very weird.
        It’s also possible that ETW is somehow recording the data incorrectly. I usually don’t suspect that but I don’t know what else to guess.
        
        Reply
        
        vinc says:
        
        November 27, 2021 at 9:21 am
        
        Someone suggested a scheduler bug as well.
        Thank you in any case, appreciate your help!
Martin Weißhaupt says:

October 22, 2012 at 1:37 am

Thank you for your posts about xperf, they have been a great help for me.

There is one thing I didn’t get to work yet. Do you know how to get the name of the thread to be shown in xperfview or wpa? Or is this not possible?

Reply
- brucedawson says:
  
  October 22, 2012 at 8:07 am
  
  It is not possible to get xperfview or WPA to show the names of threads. Pity. Setting a thread name (see http://msdn.microsoft.com/en-us/library/xcb2z8hs(v=vs.110).aspx) is supported only by debuggers, and perhaps a couple of other tools, probably because it is a tacked on hack rather than a first-class feature.
  
  Reply
  - Martin Weißhaupt says:
    
    October 23, 2012 at 1:51 am
    
    Ok, thank you for your answer.
    
    Then I’ll have to print the thread names to a file. This was plan B 😀
    
    Reply
    - brucedawson says:
      
      October 23, 2012 at 8:20 am
      
      If the thread names are that important then why not emit custom events with the names. Each thread could emit its own name, or the main thread could emit events with name/threadID pairs.
      
      The trick is to make sure that this is recorded, so either your program needs to know when it is being recorded, or else it needs to emit this information every ten seconds or so.
      
      Emitting custom events isn’t as convenient as seeing thread names in the UI, but at least it guarantees that the information is consolidated in one place. See https://randomascii.wordpress.com/2011/08/18/xperf-basics-recording-a-trace/ for an example custom provider.
      
      Reply
- brucedawson says:
  
  November 4, 2015 at 2:02 pm
  
  The problem of profilers not being able to display thread names was discussed in a blog post I wrote in October 2015:
  
  Thread Naming in Windows: Time for Something Better
  
  Reply
  - brucedawson says:
    
    November 25, 2021 at 10:36 am
    
    Note that WPA now _can_ display thread names. This is just as wonderful as I had hoped it would be.
    
    Reply
Pingback: The Lost Xperf Documentation–Disk Usage | Random ASCII
Bobi says:

January 17, 2013 at 10:34 pm

Hello,

It still not clear to me what the “Count” represent.
o.k. so it saves the number of context switch,is it the number of context switch of the new process?
and if it is, in what duration?
or maybe the context switch of other processes while the “new process” is running.
It is not very clear what this field represent.

Reply
- brucedawson says:
  
  January 18, 2013 at 1:14 am
  
  Count is the number of context switches over the time being analyzed (in WPA this is the currently visible time range) for that row. If you drill down all the way then Count will always go to 1, because at that point each row represents a single context switch.
  
  If your leftmost column is Process and you haven’t drilled in at all then Count is how many context switches for each process over the time being analyzed. If you have drilled down a bit then the meaning will change, depending on what rows you have to the left of the orange bar and how far you have drilled in.
  
  Note that in xperfview/WPA a context switch is counted for the New Process/Thread, not the one that is being switched out. I hope that makes sense. If not, let me know.
  
  Reply
  - Bobi says:
    
    January 18, 2013 at 9:57 am
    
    Hi Brucedownson,
    
    Thanks for the quick answer,
    O.k. so from what I understand now, the count field is the sum of all the context switch of new process in the duration time specified in the NewInSwitchTime, is that correct?
    lets say the NewThreadStack is a nieghbor to Count field, and I drill into the call stack (NewThreadStack field).
    from looking at the call stack, what the Count field represent?
    Is it the number of context switch while this function/thread was starting to execute until it finish its duty(what I am actually saying is that while this function/thread was executing it was interrupted by #sum of context switches)?
    
    Thanks,
    Bobi
    
    Reply
    - brucedawson says:
      
      January 18, 2013 at 12:18 pm
      
      Are you using xperfview or wpa to view the traces?
      
      In xperfview you select a time range and then view a summary table. Within that summary table the count columns tells you how many context switches there are per-row, and your grouping tells you what that row actually contains. NewInSwitchTime is not relevant.
      
      In WPA you are always viewing a summary table and the relevant time range is the entire visible time range — zooming in and out affects what is being displayed, and hence the count field.
      
      If you are looking at a particular thread and drilling down in the stack then at a particular point in the call stack the count field tells you how many times the OS switched to that thread on that sub-callstack.
      
      Ultimately the count field tells you how many times a thread or set of threads were given a chance to run. Your grouping affects what thread or threads are being summarized, and what call stacks are being summarized, and you are in charge of what time period is covered (see above).
      
      Play around with it. Zoom in to a small time range in WPA and experiment. Try some experiments. That’s how I figured out most of the columns. You’ll figure it out.
      
      Reply
Pingback: Sleep Variation Investigated | Random ASCII
Finn Wong says:

May 18, 2013 at 8:51 pm

Hi Bruce,
I begin to use WPA to analyze power issue of applications recently and your great posts help me a lot. Thanks you!
Here is a good document from Microsoft about WPA:
http://msdn.microsoft.com/en-us/library/windows/desktop/jj679884.aspx
I think it may help in identifying some of the unknowns you described.

BR,
Finn

Reply
Pingback: Defective Heat Sinks Causing Garbage Gaming | Random ASCII
Pingback: New Version of Xperf–Upgrade Now | Random ASCII
Pingback: Exporting Arbitrary Data from xperf ETL files | Random ASCII
Pingback: Bugs I Got Other Companies to Fix in 2013 | Random ASCII
Pingback: You Got Your Web Browser in my Compiler! | Random ASCII
Pingback: Self Inflicted Denial of Service in Visual Studio Search | Random ASCII
Pingback: UIforETW – Windows Performance Made Easier | Random ASCII
Pingback: ETW Central | Random ASCII
Pingback: Xperf Wait Analysis–Finding Idle Time | Random ASCII