Power Wastage On An Idle Laptop

Posted on March 8, 2016 by brucedawson

Ever since I upgraded to Windows 10 it’s felt like my battery life is worse. My suspicion was various scanning tasks that were springing to life more frequently, but it was just a hunch.

So, I did what I do – I profiled. I recorded long-running ETW traces to see how much CPU time was going to the processes that I have chosen to run, and how much was going to processes that Microsoft has chosen to run.

I should have done this years ago…

TL;DR – I found a few applications that were consuming the vast majority of the CPU time. These applications were useless to me so I disabled them. Sqlservr, Plex, and iTunes – you are dead to me now. I also disabled some lesser programs, and others are on my watch list.

I didn’t find much that was Windows 10 specific, so maybe my battery is just getting old and tired, but the fixes I’ve made were still worthwhile. My battery life should be better now.

See the end of this post for how to do a simple version of this analysis without needing to learn ETW tracing.

First steps

I knew I wanted an ETW trace that would contain all of the context switches, interrupts (ISRs) and deferred procedure calls (DPCs) over a period of about an hour. This would tell me exactly what processes and devices were consuming CPU time, and when.

I also knew that I needed to leave the sampling profiler off, because it would waste power and make the traces too big, and that I needed to skip recording call stacks for the same reasons. This means that the resulting traces would be useless for figuring out what these processes were doing, but such are the tradeoffs we must make when recording all CPU usage for an hour at a time, instead of the usual length of about a minute.

I hacked up etwrecord.bat to create etw_cpuusage_longterm.bat, and let it run for an hour while my laptop was on battery power. This is approximately the command line I used:

xperf -start %logger% -on proc_thread+loader+dpc+interrupt+cswitch …

The result is this CPU usage graph covering a one hour time period. The y-axis is CPU usage by process as a percent of total system CPU power and the x-axis is time in seconds:

The “500 m” on the vertical axis indicates 0.5% of CPU power, and the “1” indicates 1%. Because this is a four-core/eight-thread laptop these markings indicate 4% and 8% of a single core. The visible time covers exactly an hour, from 300 to 3900 seconds in the trace.
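The conversion between the graph’s units and “percent of a core” is just a multiplication by the number of hardware threads. Here is a minimal sketch of that arithmetic; the function name is made up for illustration, and the eight comes from the four-core/eight-thread laptop described above.

```python
def share_of_one_core(percent_of_total: float, logical_processors: int) -> float:
    """Convert a percentage of total system CPU power into a percentage of one core.

    "Core" here follows the post's usage: one of the eight hardware threads.
    """
    return percent_of_total * logical_processors


LOGICAL_PROCESSORS = 8  # four cores, eight hardware threads

# The two axis markings discussed above, as percentages of total CPU power.
for label, percent_of_total in (("500 m", 0.5), ("1", 1.0)):
    per_core = share_of_one_core(percent_of_total, LOGICAL_PROCESSORS)
    print(f"{label}: {percent_of_total}% of the system is {per_core}% of one core")
```

The same multiplication applies to any other reading taken from the graph.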

Sqlservr!!!

The biggest consumer of CPU time is pretty obvious in the graph – those repeating reddish spikes had better be something really important. But they’re not. Those represent sqlservr (MSSQL10.SQLEXPRESS to be precise). This process is installed by default with Visual Studio (VS 2010 I think) and although I’ve seen it on traces before I’ve just ignored it. And I’ve never used it – I’m not hosting any databases. And yet, over the one hour period being analyzed:

sqlservr woke up every minute, consumed 1.5 s of CPU time (spread out over a few seconds), and then went back to sleep
sqlservr did 13,157,222 context switches
sqlservr consumed 91.116 s of CPU time
sqlservr did all of this while my laptop was on battery power
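As a sanity check, the numbers in that list are consistent with each other. This is a quick sketch using only the figures quoted above; the variable names are mine, and it assumes exactly one wakeup per minute over the hour.

```python
TRACE_SECONDS = 3600           # the one-hour window being analyzed
CPU_SECONDS = 91.116           # total sqlservr CPU time in that window
WAKEUPS = TRACE_SECONDS // 60  # assumption: exactly one wakeup per minute

# CPU time burned per wakeup; should be close to the 1.5 s quoted above.
cpu_per_wakeup = CPU_SECONDS / WAKEUPS

# The same CPU time expressed as a share of one hardware thread over the hour.
percent_of_one_core = 100 * CPU_SECONDS / TRACE_SECONDS

print(f"CPU time per wakeup: {cpu_per_wakeup:.2f} s")
print(f"Average load: {percent_of_one_core:.2f}% of one core")
```

An average of a couple of percent of one core sounds small, but it is paid every hour the laptop is on, for a service that is doing nothing useful.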

That’s pretty impressive. And, to be perfectly clear, this is a program that I’m not using. There are no databases loaded.

Once I knew that sqlservr was waking up once a minute it was easy to get a more detailed trace to see what it was doing – I just used UIforETW to record a normal trace for a bit more than a minute. Sqlservr had better be doing something really important to justify burning all of that CPU time. Yeah. About that… Here’s what the CPU sampling data says:

So sqlservr, which is using a paltry 15 MB of data because it is hosting zero databases, is waking up once a minute to notify memory consumers that it’s “Eight Bells and All Is Well”. Why does it take so much CPU time when there is nothing to notify memory consumers about? Aarrrggghh!

I’m kind of pissed about this, both because of the colossal waste, and because I’ve done nothing about this for years. I feel like some Microsoft SQL developers were chortling to themselves and saying “I wonder how long Dawson will take to realize how much we’re hurting his battery life.”

How much electricity does sqlservr waste? The same trace that showed how it was spending its time also contained CPU power usage data (one of the bonus bits of data that UIforETW records, as long as Intel Power Gadget is installed). A bit of Excel math shows that the extra energy consumed by the CPU when sqlservr is notifying memory consumers is 5.6 mWh, which is enough to run the CPU when my PC is idle for at least five seconds. The graph below shows energy usage on the top and CPU usage on the bottom – note the two energy usage spikes corresponding to the two CPU usage blips that correspond to sqlservr springing to life.

Five seconds’ worth of idle CPU energy spent every minute is roughly 8% extra. This doesn’t mean that disabling sqlservr will give 8% better battery life, because other components also consume energy, but it will help.
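The energy arithmetic can be reproduced in a few lines. This is a sketch under two assumptions that the text implies but does not spell out: that the 5.6 mWh is the cost of one once-a-minute wakeup, and that “at least five seconds” of idle running is the right comparison. The variable names are hypothetical.

```python
EXTRA_ENERGY_MWH = 5.6       # extra CPU energy per sqlservr wakeup (assumed per wakeup)
IDLE_SECONDS_EQUIVALENT = 5  # "at least five seconds" of idle CPU running
WAKEUP_PERIOD_SECONDS = 60   # sqlservr wakes up once a minute

# 1 Wh = 3600 J, so convert milliwatt-hours to joules.
extra_energy_joules = EXTRA_ENERGY_MWH / 1000 * 3600

# If that energy runs the idle CPU for at least five seconds, the idle CPU
# power can be at most this many watts.
implied_max_idle_watts = extra_energy_joules / IDLE_SECONDS_EQUIVALENT

# Extra CPU energy relative to a minute of pure idle.
extra_fraction = IDLE_SECONDS_EQUIVALENT / WAKEUP_PERIOD_SECONDS

print(f"Energy per wakeup: {extra_energy_joules:.2f} J")
print(f"Implied idle CPU power: at most {implied_max_idle_watts:.2f} W")
print(f"Extra CPU energy versus idle: about {extra_fraction:.1%}")
```

This only covers the CPU package. The screen, memory, and everything else draw power too, which is why the battery life gain is smaller than the CPU figure suggests.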

If you’re actually using sqlservr then you should leave it alone, but if you’re not using it then disable it. Run services.msc, find SQL Server (SQLEXPRESS), stop the service, and change the Startup Type from Automatic to Manual. Or uninstall it if you feel particularly aggressive. Your battery will thank you.

And Microsoft, can you please fix this?

Grrrr.

The lock screen

With sqlservr understood I wanted to remove it from the data so that the next busiest process can more easily be seen. This can be done by recording a new trace after disabling sqlservr, or it can be done by just hiding the sqlservr data in WPA. I took the latter route so that I could do most of my investigations from the same trace. You can disable any graphed item just by clicking its associated color (circled in the diagram above) to clear it. Right-clicking gives more options, including the option to change the color. With sqlservr hidden the next interesting target becomes obvious – the horizontal blue line at around the 200 m (0.2% of total CPU power) mark:

In fact, if you look really closely you can see that there are two lines that almost perfectly overlap.

Okay, this one was weird. And this seems to happen only on my laptop, because I’m lucky that way. When my screen turns off and locks because of inactivity, LockApp and LogonUI both start running XAML animations at 30 fps – I verified this with a normal UIforETW trace. Yes, these two programs start running animations as soon as the screen is off and the animations are guaranteed to be invisible. They each use 0.2% of CPU time, for a total of 0.4%, or about 3.2% of a core. But, at least this doesn’t happen when I’m using my laptop – hurray for small blessings.

I can’t repro this on any other Windows 10 machines so I assume this is something to do with upgrading from a five-year-old install of Windows 7. If you upgraded to Windows 10 then maybe see if this happens to you as well.

Plex

With LockApp and LogonUI graphs both disabled the next problem can be seen – the green horizontal line pointed to by the arrow. That represents the PlexDlnaServer process:

I installed Plex a few months ago to see if it would be a good solution for viewing photos on my 4K TV. It downsamples the pictures (to 720p I think) so the answer was: no. But, I left it installed and running because how bad could that be?

The answer is “not as bad as sqlservr, but still annoying”. Plex actually left three processes running and they all made it into the top fifteen by CPU usage. That’s not something to be proud of. When sorted by CPU Usage the three Plex* processes came in 5th, 11th, and 15th.

The three Plex* processes used a combined total of 49,551 ms per hour which means roughly 13.8 ms of CPU time per second, which is way too much for an application that should just be waiting for work. Pro-tip: don’t poll.

PlexDlnaServer also has the distinction of having the second most context switches, which can harm power usage by never letting the CPU go to sleep.

Other programs

While the graph view is great for seeing execution patterns, the table view can be better for seeing exactly how much CPU time each process consumed, and how many times they were context switched in. Here’s the data sorted by CPU Usage:

In the interests of science and transparency I’ve shared the raw data for the trace that I’ve been analyzing: it is in a Google sheet, sorted by CPU usage and by context switches and with more rows available, and the trace itself is available as a release .zip file on GitHub.

Outlook (the green line just below PlexDlnaServer) had 297,110 context switches and used 31.7 seconds of CPU time. I’d like to see that lower, and I should probably close Outlook when I really need to maximize battery life. At least Outlook is a program that I am running intentionally, but I do wish it wouldn’t spin in pointless animation loops when it’s not even active.

MsMpEng (anti-virus software from Microsoft) consumed 19.6 s of CPU time. I’m not sure why it has such trouble staying asleep even when the rest of the system is idle.

The System process consumed 15.2 s of CPU time. Looking back at the graph I can see that it wakes up once every three minutes to do some bookkeeping, but I’m not yet sure what to do about that. Ditto with svchost.

I had five web pages open in Chrome, running who knows what JavaScript, so 14.5 s of CPU time doesn’t seem terrible, but part of my day job is to make that number even smaller.

LMS is Intel Management and Security Application Local Management Service and it should slow down.

NisSrv is Microsoft Network Realtime Inspection Service. I don’t know if its CPU usage is justified.

Here’s the same data sorted by context switch count:

Context switches can cause significant power waste, but it depends on the pattern. If explorer’s 99,774 context switches are evenly distributed then that means that the CPU is being woken up every 36 ms and that is really bad. If they are clumped together then the power implications are much less severe.

csrss may be related to the LockApp and LogonUI animations – I don’t know.

iPodService, AppleMobileDeviceService, and iTunesHelper are doing a combined average of 25.5 context switches per second. I’ve never owned an iPod and I rarely run iTunes so this is completely excessive. I used services.msc and autoruns to stop these – it took a few tries to make that work.

SynTPEnh is the driver for my touchpad I believe – it’s not clear why it wakes up so frequently when my touchpad is untouched.

EDICT (Microsoft Encarta Dictionary Tools) wakes up every 400 ms, and hangs around after you close it (so it can display its notification area icon). Unchecking “Always show icon in taskbar” makes it go away when closed, thus avoiding 9,079 context switches per hour.

Dropbox wasn’t particularly high (4,986 context switches), but does it really need to wake up more than once a second when nothing is happening and use that much CPU time? Ditto for PhotoshopElementsFileAgent (3,580 context switches) – they should learn a few things from FlashPlayerUpdateService (92 context switches).
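To put the context switch counts in this section on a common footing, here is a small sketch that turns each hourly count into an average interval. It assumes the switches are evenly spread over the hour, which (as noted above) may not be true, so treat the results as averages rather than actual wakeup patterns. The function name is invented for illustration.

```python
TRACE_SECONDS = 3600  # the one-hour trace

# Context switch counts quoted in this section.
context_switches = {
    "explorer": 99_774,
    "EDICT": 9_079,
    "Dropbox": 4_986,
    "FlashPlayerUpdateService": 92,
}


def average_interval_ms(count: int, seconds: int = TRACE_SECONDS) -> float:
    """Average time between context switches, assuming an even spread."""
    return seconds * 1000 / count


for process, count in context_switches.items():
    interval = average_interval_ms(count)
    per_second = count / TRACE_SECONDS
    print(f"{process:26s} every {interval:10.1f} ms  ({per_second:.2f} per second)")
```

The spread between explorer and FlashPlayerUpdateService is about three orders of magnitude, which is the gap between a program that is never really idle and one that mostly is.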

Once you start using autoruns you can easily go crazy on turning off software, but I decided not to do this. My focus is on improving battery life so I only need to disable programs that are waking up frequently.

Rules of the game for writing long-running software

If a software developer needs to run a program in the background, that’s fine. But there are rules, because you could be running for hours at a time on battery power. The rules are:

Don’t poll. Polling wastes CPU time and stops the CPU from dropping deep into power-saving states. WaitForMultipleObjects is your friend.
Seriously, don’t poll. I know you think that your program is a special snowflake but polling is just wasteful.
If you really can’t figure out how to avoid polling then be smart about it. Waking up every second may seem very conservative, but if every long-running program does it then your CPU may get woken up dozens of times per second. Consider using exponential back-off, and when you do wake up and find that there is nothing to do then go back to sleep promptly.
SetCoalescableTimer can help Windows coalesce your polling with other timers to reduce the power cost; better yet, use SetThreadpoolTimer with a large ‘window’ argument.
Avoid doing long-running animations, especially when your program is inactive or invisible. Incessant animation is just another form of polling, except that it is visible, so it makes it easier for users to realize that you’re wasting power.
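The first and third rules can be sketched in a few lines. This is an illustration only: it uses Python’s threading.Event as a stand-in for a Win32 wait such as WaitForMultipleObjects, and the helper names (wait_for_work, backoff_delays, poll_with_backoff) and the 1 s to 64 s schedule are assumptions of mine, not anything from a real product.

```python
import itertools
import threading
import time


def wait_for_work(work_ready, timeout=None):
    """Preferred: block on an event. The thread uses no CPU until it is signalled."""
    return work_ready.wait(timeout)


def backoff_delays(initial=1.0, factor=2.0, maximum=64.0):
    """Yield ever-longer sleep intervals: 1, 2, 4, ... capped at `maximum` seconds."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def poll_with_backoff(check, sleep=time.sleep, max_checks=10):
    """Last resort: poll `check()`, sleeping longer after every empty check.

    Returns the number of empty checks before work was found, or None if
    nothing turned up within `max_checks` checks.
    """
    for attempt, delay in zip(range(max_checks), backoff_delays()):
        if check():
            return attempt
        sleep(delay)
    return None


# Event-driven waiting: another thread signals when there is work to do.
work_ready = threading.Event()
threading.Timer(0.05, work_ready.set).start()
print("work signalled:", wait_for_work(work_ready, timeout=5.0))

# The back-off schedule an idle poller would follow.
print("back-off schedule (s):", list(itertools.islice(backoff_delays(), 8)))
```

With this schedule an idle poller wakes up about once a minute instead of once a second. The event-based version is still better, because it does not wake up at all until there is something to do.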

What I do have are a very particular set of skills, skills I have acquired over a very long career, and if you violate these rules then I will look for your process, I will find it, and I will kill it. If you respect your customers’ CPU time then they are more likely to leave your software installed.

Web browsers

I work on Chrome so I was pleased to see that it looked pretty good. I left the five tabs I was browsing open and the net result was very low CPU usage. Chrome lowers the timer frequency on background tabs, and throttles Flash. However the results are highly dependent on the page. A JavaScript- or animation-heavy page could easily consume huge amounts of CPU time and it is difficult for a browser to control that, so be aware of what pages you keep open if you want to run on battery for long periods of time.

Interrupts

I carefully recorded the time used by ISRs and DPCs, but it didn’t look very interesting so I didn’t spend a lot of time on it. ndis.sys and ntoskrnl.exe had by far the most context switches and the highest CPU usage, but I don’t know what to do about that.

Alternate methods

The great thing is that this sort of research is pretty easy. Just download the latest UIforETW release. You need to run UIforETW once in order to install the Windows Performance Toolkit, but then close UIforETW, and maybe reboot to finish the setup. Get your computer so that it is running the programs that you care about, and nothing else. Then, from an administrator command prompt run UIforETW\bin\etw_cpuusage_longterm.bat. This starts up low-overhead tracing. Hit enter when you’re done. The UIforETW\bin\CPUUsageByProcess.wpaProfile profile is particularly good for summarizing this type of data.

I use ETW tracing because I’m familiar with it, because I could configure it to have very low impact, and I could do detailed post-mortem analysis of the results. But, there are other options. One great option is Process Explorer. Leave this running with the CPU, CPU Time, CSwitch Delta, and Context Switches columns visible, and watch for bad behavior.

CPU (or Cycles Delta) shows how much time each process is currently using. The Cycles Delta column should be more accurate for processes that are using very little CPU time.
CPU Time (or Cycles) shows how much time each process has used – this can catch processes like sqlservr that are usually idle, but accumulate a lot of wasted CPU time in the long run.
Also look at CSwitch Delta and Context Switches to find programs that are currently or historically doing a lot of context switches.

In Process Explorer you should see Interrupts, System Idle Process, and procexp itself at the top for CPU and CSwitch Delta, and everything else should (on an idle machine) be pretty low. Investigate the anomalies, and turn them off if you don’t want them. Take control of the CPU usage on your computer, and save energy.

Discussions on reddit, hacker news, and twitter.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048


This entry was posted in Investigative Reporting, Performance, Programming, uiforetw, xperf and tagged procexp, UIforETW.

37 Responses to Power Wastage On An Idle Laptop

Adrian says:

March 8, 2016 at 8:16 am

So how much better is your battery life?

- brucedawson says:
  
  March 8, 2016 at 8:26 am
  
  Good question, but I don’t know. The change is likely to be small and the noise (from occasional indexing, update scans, anti-virus scans, etc.) is large so I’m not sure I could measure it without extreme effort. Send me some matching laptops to test on and I’ll happily do science!
  
  I may try another one-hour idle scan with all of the changes to compare before/after context switches and CPU usage.
  
billco says:

March 8, 2016 at 8:18 am

Solid investigative chops, as always! I really like your tip about exponential back-off, I’d never thought of that.

Michael Marcin says:

March 8, 2016 at 9:11 am

I think there’s a minor mistake
“..Excel math shows that the extra energy consumed by the CPU when *Excel* is notifying memory consumers..”

- brucedawson says:
  
  March 8, 2016 at 9:16 am
  
  D’oh! Fixed.
  
Neeraj Singh says:

March 8, 2016 at 10:16 am

Hi Bruce, thanks for the great article!

On your SetCoalescableTimer tip, I’d instead recommend SetThreadpoolTimer with a large ‘window’ argument. As of Windows 10, it has a better implementation of coalescing than SetCoalescableTimer.
-Neeraj, former Windows kernel dev.

Amadeus Wieczorek (@HiAmadeus) says:

March 8, 2016 at 1:06 pm

Good article! I’m happy to have learned about new ways to measure performance.

I collected a trace from my computer idling and noticed that for periods when Task Manager reports ~2% CPU usage, Windows Performance Analyzer shows ~15% CPU Usage. Does Task Manager skip some of the data? Does WPA ignore throttling information and report 15% usage of a CPU slowed down to 1 GHz?

- brucedawson says:
  
  March 8, 2016 at 1:53 pm
  
  Windows Task Manager has multiple problems with its reporting of CPU usage.
  
  1) It has insufficient precision. On a four-core/eight-thread machine a 100% busy thread corresponds to just 12.5% CPU time, which task manager shows as 12. Lesser amounts may easily truncate to zero.
  2) Task Manager, IIRC, just looks to see who is running when the timer interrupt fires. That is, it is *sampling* CPU usage and can entirely miss or overcount CPU usage. I believe its “Cycles” column uses context switch data and is more accurate.
  
  WPA does ignore throttling information. That is, it doesn’t report how many clock cycles a thread is running, it reports how long (wall clock time) a thread is running. All CPU utilization data is done that way. To do it any other way would lead to bizarre and ill-defined results I believe.
  
J says:

March 8, 2016 at 1:11 pm

For a quicker / less involved investigation you can also use the “powercfg -energy” command. It spits out a little HTML report and might even have info about drivers/hardware that’s playing up.

Jon says:

March 8, 2016 at 9:55 pm

Does your “don’t poll” advice apply equally to server applications, where battery life is not an issue?
A couple examples of polling: Process explorer and perfmon.

- brucedawson says:
  
  March 8, 2016 at 9:58 pm
  
  There are some exceptions – the ETW sampling profiler is polling. But I think in general the advice applies to servers as well – power consumption is a huge issue in data centers.
  
  If you are polling it should be because there is a high likelihood that you will immediately find work to do, and polling is more efficient (throughput and power) than waiting. Don’t do it because waiting for work is hard.
  
  Some server expert could probably give a specific example where polling is needed, but I can’t think of any.
  
  - Oleksandr Nikitin says:
    
    March 8, 2016 at 11:43 pm
    
    It’s pretty common to poll in low-latency scenarios (e.g. HFT). Typically you can’t even afford to acquire a lock because that may mean you’ll need to wait for your thread to be rescheduled again, so you spin-wait for another core burning electricity in the process.
    
    - brucedawson says:
      
      March 9, 2016 at 9:01 am
      
      In that case you are trading CPU power for latency – wasting some CPU power to reduce latency slightly. This is an *extremely* delicate tradeoff that only makes sense when your latency needs are sub-millisecond and really only makes sense on a dedicated machine.
      
      So true, but not applicable to consumer software.
      
      Well, mostly not. Very brief spinning in job-queue systems can make sense, but I doubt it should ever exceed a fraction of a millisecond. I’ve seen job queues that mess this up and busy-wait for over a second. Uggh.
      
      - rpavlik says:
        
        March 9, 2016 at 11:35 am
        
        So in my particular case, we do actually have to do this, as far as I can tell – virtual reality (specifically HMD-based VR) software, where any little bit of latency perceived gets you plenty of (valid and invalid) criticism – the main loop of our software ends up getting some 10ms of “mystery” latency with any type of yield, so while we try to be polite when nobody is connected, if there’s an app running and using our services, we effectively “burn a core” to get rid of the 10ms penalty, since that’s very perceptible.
        
        Not saying that anything is incorrect in your article, just pointing out one unusual corner case, and yet, how it can still incorporate your advice (not burning a core when nobody is listening and needing low-latency services).
        
        
        brucedawson says:
        
        March 9, 2016 at 12:57 pm
        
        VR is a special case – a rare instance where a ms difference can be perceived.
        
        That said: if you are losing 10 ms then that sounds like you are leaving the timer frequency at its default setting. While I have also railed against unnecessarily raising the global Windows timer frequency (https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/) it is better to raise the frequency than to busy wait. Doing so may give you the 1 ms precision you need without spinning.
        
        So, yeah, for VR you almost need to violate one or more of my recommendations 🙂 – but with the timer frequency recommendation in particular I always said that it should be used as-needed, not never. It just bugs me when long-running idle programs raise the timer frequency for no good reason.
        
      - Jon says:
        
        March 9, 2016 at 4:15 pm
        
        How about timeouts? For example, disconnect clients if they don’t respond within 10 seconds.
        Can’t use WaitForSingleObject/Sleep as that’d block a thread per client. And timer objects are limited resources.
        Could use polling for that?
        
        
        brucedawson says:
        
        March 9, 2016 at 7:10 pm
        
        > Can’t use WaitForSingleObject/Sleep as that’d block a thread per client.
        
        That’s what WaitForMultipleObjects is for – awesome function.
        
        For client disconnect timeouts and what-not you probably need to pass a timeout value to WaitForMultipleObjects. Passing in a timeout to WaitForMultipleObjects is arguably a form of polling, but at least in this hypothetical case you are using a timeout because you are servicing clients! All of the badly behaved software in my example was serving zero clients – that’s what makes the inefficiency so annoying.
        
      - Jon says:
        
        March 9, 2016 at 8:40 pm
        
        Do you mean WaitForMultipleObjects using Waitable Timer Objects? (in which case you don’t need a timeout as well). That still blocks 1 thread, but I guess that’s better than polling with 0 clients.
        
Perepechko Grigory says:

March 8, 2016 at 10:39 pm

TaskManager shows CPU time. Just add it via Select Columns.
My biggest enemy is utorrent. It spawns process called utorrentie.exe and mines bitcoins in it.

rpavlik says:

March 9, 2016 at 6:03 am

Have you noticed any issues with Visual Studio since disabling SQL server? I was under the impression that newer versions used it for intellisense. (And, as a non database dev, I’m not totally clear on the different editions: sounded like they had a “sqlite-imitator” – in process, etc – version, but then they go and install this thing called express on my machine anyway.)

- brucedawson says:
  
  March 9, 2016 at 8:59 am
  
  I have not seen any issues. That concern is why I ignored sqlservr for years, but in hindsight that was foolish. VS (and Windows Live Photo Gallery) run an in-process database server so they are unaffected by whether sqlservr is running.
  
  So yeah, unless you are doing database development you can kill it with fire.
  
  - rpavlik says:
    
    March 9, 2016 at 11:28 am
    
    Ah, so they do actually have it in-process for those. Good to know! Thanks!
    
IL says:

March 15, 2016 at 1:41 am

bass.dll sets timeBeginPeriod(1) on init. It would be better to set it only when it is actually needed and set it back right after. Hopefully that could be changed.

- brucedawson says:
  
  March 15, 2016 at 1:52 am
  
  Do you mean base.dll in Chrome? If so then it does set it as needed. I have six Chrome tabs open right now and my timer interval is 15.626 ms. There may be some situations where the timer frequency gets permanently raised but it does not always happen. Repros are always welcome.
  
  - IL says:
    
    March 15, 2016 at 3:52 am
    
    No, not base.dll which appears to be missing on my computer.
    This one – BASS.DLL http://www.un4seen.com/ – it’s very popular audio library.
    
    - brucedawson says:
      
      March 15, 2016 at 9:05 am
      
      Ah – got it. Yeah, you should definitely complain to them about that. Audio shouldn’t need a 1 ms timer. Audio programs just need to use large enough buffers so that they can avoid skipping without waking up 1,000 times per second.
      
      - IL says:
        
        March 15, 2016 at 9:18 am
        
        Unfortunately software using that library does not set the timer resolution back after BASS Init. BASS.DLL itself sets it indefinitely on Init if it is not compiled with the undocumented (sigh!) BASS_CONFIG_NOTIMERES config option. Raising the timer resolution is used to keep something in sync as closely as possible. Unfortunately, this is all Greek to me.
        http://www.un4seen.com/forum/?topic=17107.0
        
JohnB says:

March 17, 2016 at 7:39 pm

Nice post 🙂 Cracked this out to optimize performance on a tablet/netbook I just got (Acer Aspire Switch 10) – very good excuse for getting familiar with xperf! (and I usually spend a lot of time optimizing with Autoruns/Process-Explorer too)

Interestingly, I found that this tablet’s minimum CPU usage is limited by its internal SD memory storage, due to running a special WIMBoot install of Windows 8.1 that massively reduces the installed OS size, using a compressed partition (which serves a dual use as a recovery partition), but which is constantly decompressing needed OS files.

This nicely shows up in the resultant xperf data, on interrupts to sdbus.sys, correlated with high System process CPU 🙂 (which, when examining threads in Process Explorer, reliably shows up high CPU calls to a ‘DecompressBuffer’ function)

Pretty nice at nailing that performance bottleneck. Am a game dev myself, working on some performance critical network code at the moment, so think this tool may come in pretty handy!

- brucedawson says:
  
  March 17, 2016 at 8:04 pm
  
  Interesting… compressed OS files would make paging them in more expensive, but… shouldn’t those files all be cached? Maybe the real problem is that you don’t have enough RAM.
  
  More RAM does draw a bit more power, so, tradeoffs, but it seems worth trying.
  
  Then again those CPU spikes are suspiciously regular – I wonder what is triggering them?
  
iAPX says:

March 1, 2018 at 3:02 pm

lmfao!
“What I do have are a very particular set of skills, skills I have acquired over a very long career, and if you violate these rules then I will look for your process, I will find it, and I will kill it.”

Thank you sir!

Ona says:

October 27, 2022 at 10:56 am

Hi Bruce, great writeup! I stumbled across this post while looking for more closer-to-the-metal ways to determine what’s causing high DPC latency on my Win 10 system. Any tips on how you’d go about determining high DPC latency culprits using a similar approach doing a trace or any other techniques you think might work? Thanks!

- brucedawson says:
  
  October 27, 2022 at 11:51 am
  
  I’ve been lucky enough to not have to investigate high DPC times. I know that it is possible for ISRs or DPCs to run too long, which can cause lots of problems, although I’m not sure if that is what you refer to by DPC “latency”.
  The ETW traces that I use (those recorded by UIforETW, possibly others) include information on DPCs so you should be able to open up the appropriate graph (Computation-> DPC/ISR) and find out information about the longest-running DPCs. I think that if they run long enough they will also be caught by the sampling profiler, which should give you stacks.
  You then need to either get rid of the offending device, upgrade the driver, or report the issue to the vendor.
  Reporting issues to the vendor is an uncertain path, but when I found a driver that was doing an ephemeral 4 GiB physical memory allocation I got lucky and they saw my blog post and fixed the issue:
  
  Windows Slowdown, Investigated, Identified, and Now Fixed
  
  - Ona says:
    
    October 27, 2022 at 2:03 pm
    
    Thanks for responding! A “high DPC latency” problem for me is when LatencyMon (https://www.resplendence.com/latencymon) tells me I have a problem, haha (as in this screenshot https://i.imgur.com/qFrG3M5.png). I think it does this by measuring kernel timer latencies and DPC and ISR times. It does isolate the offenders which often are graphics, USB, and network drivers, but the solution seems to rarely ever involve them directly other than “try a different one.” There are a lot of suggestions on how to fix these issues, and they all seem like guesswork and trial-and-error. I was wondering if these ETW traces could be a more systematic approach to finding causes.
    
    - brucedawson says:
      
      October 30, 2022 at 8:25 am
      
      I’m not familiar with that tool, but it sounds like it is measuring how long DPCs and ISRs take. ETW traces may give you additional information about where time is spent, but the advice LatencyMon gives is sound. You generally can’t “fix” DPCs and ISRs other than by updating or removing drivers. Talk to your vendors.
      
      - Ona says:
        
        October 31, 2022 at 9:35 pm
        
        Hey Bruce, I took a look again this evening, and 9 out of my top 10 drivers with the worst latencies are all from Microsoft, nVidia being the sole exception. My top two are Wdf01000.sys and dxgkrnl.sys. Aren’t those essentially Windows and DirectX kernels? Win networking drivers and then the NT kernel are also up there. So the issue with talking to vendors in this instance is like asking Microsoft to fix Windows. It’s more straightforward to find/fix the problem when it’s not a Microsoft product, and when the problem is the OS itself? It seems like in this scenario, the mitigation routine is users trying combinations of obscure registry tweaks, Windows settings, and bios settings until satisfactory performance is measured, after ruling out drivers, attached devices, and 3rd party software conflicts. Pretty tedious and frustrating. How would one go about mitigating this? Maybe automated testing? I suppose the issue here is really more about identifying settings that work towards less system latency. Thank you.
        
        
        brucedawson says:
        
        October 31, 2022 at 10:28 pm
        
        That does sound frustrating. I don’t know what would be causing high latency in Microsoft drivers. Maybe it’s something about your hardware, or maybe your/LatencyMon’s expectations are too high. I don’t know. Sorry. Good luck.
        
        
        Ona says:
        
        November 1, 2022 at 8:45 am
        
        Thanks for your thoughts. Great blog!