Power Wastage On An Idle Laptop

Ever since I upgraded to Windows 10 it’s felt like my battery life is worse. My suspicion was various scanning tasks that were springing to life more frequently, but it was just a hunch.

So, I did what I do – I profiled. I recorded long-running ETW traces to see how much CPU time was going to the processes that I have chosen to run, and how much was going to processes that Microsoft has chosen to run.

I should have done this years ago…

TL;DR – I found a few applications that were consuming the vast majority of the CPU time. These applications were useless to me so I disabled them. Sqlservr, Plex, and iTunes – you are dead to me now. I also disabled some lesser programs, and others are on my watch list.

I didn’t find much that was Windows 10 specific, so maybe my battery is just getting old and tired, but the fixes I’ve made were still worthwhile. My battery life should be better now.

See the end of this post for how to do a simple version of this analysis without needing to learn ETW tracing.

First steps

I knew I wanted an ETW trace that would contain all of the context switches, interrupts (ISRs) and deferred procedure calls (DPCs) over a period of about an hour. This would tell me exactly what processes and devices were consuming CPU time, and when.

I also knew that I needed to not enable the sampling profiler, because it would waste power and make the traces too big. And, I knew that I needed to not record call stacks, for the same reasons. This means that the resultant traces would be useless for figuring out what these processes were doing, but such are the tradeoffs we must make when recording all CPU usage for an hour at a time, instead of the usual length of about a minute.

I hacked up etwrecord.bat to create etw_cpuusage_longterm.bat, and let it run for an hour while my laptop was on battery power. This is approximately the command line I used:

xperf -start %logger% -on proc_thread+loader+dpc+interrupt+cswitch …

The result is this CPU usage graph covering a one hour time period. The y-axis is CPU usage by process as a percent of total system CPU power and the x-axis is time in seconds:

Source: 2016-01-23_17-00-41_per_process_cpu_usage.etl

The “500 m” on the vertical axis indicates 0.5% of CPU power, and the “1” indicates 1%. Because this is a four-core/eight-thread laptop these markings indicate 4% and 8% of a single core. The visible time covers exactly an hour, from 300 to 3900 seconds in the trace.
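The conversion between the graph's units and "percent of a single core" is simple enough to sketch. This is just the arithmetic from the paragraph above, assuming eight logical processors; the function name is made up for illustration.

```python
# Assumption: 8 logical processors (four cores, two hardware threads each).
LOGICAL_PROCESSORS = 8


def percent_of_one_core(percent_of_total: float,
                        logical_processors: int = LOGICAL_PROCESSORS) -> float:
    """Convert a share of total system CPU power into a share of one core."""
    return percent_of_total * logical_processors


# The two axis markings mentioned above: "500 m" is 0.5%, "1" is 1%.
for label, total in (("500 m", 0.5), ("1", 1.0)):
    print(f"{label}: {total}% of total = {percent_of_one_core(total)}% of one core")
```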

Sqlservr!!!

The biggest consumer of CPU time is pretty obvious in the graph – those repeating reddish spikes had better be something really important. But they’re not. Those represent sqlservr (MSSQL10.SQLEXPRESS to be precise). This process is installed by default with Visual Studio (VS 2010 I think) and although I’ve seen it on traces before I’ve just ignored it. And I’ve never used it – I’m not hosting any databases. And yet, over the one hour period being analyzed:

  • sqlservr woke up every minute, consumed 1.5 s of CPU time (spread out over a few seconds), and then went back to sleep
  • sqlservr did 13,157,222 context switches
  • sqlservr consumed 91.116 s of CPU time
  • sqlservr did all of this while my laptop was on battery power

That’s pretty impressive. And, to be perfectly clear, this is a program that I’m not using. There are no databases loaded.
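As a sanity check, the numbers in the list above hang together. The sketch below only re-derives averages from the totals quoted in this post; the variable names are mine, and the once-a-minute wake period is taken from the first bullet.

```python
TRACE_SECONDS = 3600            # one hour of trace
CPU_SECONDS = 91.116            # sqlservr CPU time over that hour
CONTEXT_SWITCHES = 13_157_222   # sqlservr context switches over that hour
WAKE_PERIOD_S = 60              # sqlservr wakes up once a minute
LOGICAL_PROCESSORS = 8          # four cores / eight threads

wakeups = TRACE_SECONDS / WAKE_PERIOD_S
cpu_per_wakeup = CPU_SECONDS / wakeups                 # average burst length
switches_per_second = CONTEXT_SWITCHES / TRACE_SECONDS
percent_of_core = 100 * CPU_SECONDS / TRACE_SECONDS
percent_of_total = percent_of_core / LOGICAL_PROCESSORS

print(f"CPU time per wakeup: {cpu_per_wakeup:.2f} s")
print(f"context switches per second (average): {switches_per_second:.0f}")
print(f"average load: {percent_of_core:.2f}% of one core, "
      f"{percent_of_total:.2f}% of total CPU power")
```

The average burst works out to about 1.5 s, which matches what the graph shows for each spike.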

Once I knew that sqlservr was waking up once a minute it was easy to get a more detailed trace to see what it was doing – I just used UIforETW to record a normal trace for a bit more than a minute. Sqlservr had better be doing something really important to justify burning all of that CPU time. Yeah. About that… Here’s what the CPU sampling data says:

Source: 2016-01-26_22-17-07 sqlserver sampling.etl

So sqlservr, which is using a paltry 15 MB of data because it is hosting zero databases, is waking up once a minute to notify memory consumers that it’s “Eight Bells and All Is Well”. Why does it take so much CPU time when there is nothing to notify memory consumers about? Aarrrggghh!

I’m kind of pissed about this, both because of the colossal waste, and because I’ve done nothing about this for years. I feel like some Microsoft SQL developers were chortling to themselves and saying “I wonder how long Dawson will take to realize how much we’re hurting his battery life.”

How much electricity does sqlservr waste? The same trace that showed how it was spending its time also contained CPU power usage data (one of the bonus bits of data that UIforETW records, as long as Intel Power Gadget is installed). A bit of Excel math shows that the extra energy consumed by the CPU when sqlservr is notifying memory consumers is 5.6 mWh, which is enough to run the CPU when my PC is idle for at least five seconds. The graph below shows energy usage on the top and CPU usage on the bottom – note the two energy usage spikes corresponding to the two CPU usage blips that correspond to sqlservr springing to life.

image

Five seconds’ worth of idle CPU energy burned every sixty seconds is roughly 8% extra. This doesn’t mean that disabling sqlservr will give 8% better battery life, because other components also consume energy, but it will help.
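Here is the Excel math redone as a rough sketch. It assumes the 5.6 mWh figure is the extra energy for a single wakeup, and that "at least five seconds" can be treated as exactly five seconds; both are my reading of the numbers above rather than something the trace states directly.

```python
EXTRA_ENERGY_MWH = 5.6     # extra CPU energy, assumed here to be per wakeup
IDLE_EQUIVALENT_S = 5.0    # "at least five seconds" of idle CPU energy
WAKE_PERIOD_S = 60.0       # sqlservr wakes up once a minute


def mwh_to_joules(mwh: float) -> float:
    """1 mWh = 0.001 Wh = 3.6 J."""
    return mwh * 3.6


extra_joules = mwh_to_joules(EXTRA_ENERGY_MWH)
# If 5.6 mWh runs the idle CPU for at least 5 s, idle power is at most this.
implied_idle_watts = extra_joules / IDLE_EQUIVALENT_S
# Extra CPU energy relative to what the idle CPU uses in one wake period.
overhead_percent = 100.0 * IDLE_EQUIVALENT_S / WAKE_PERIOD_S

print(f"extra energy per wakeup: {extra_joules:.2f} J")
print(f"implied idle CPU power: at most {implied_idle_watts:.2f} W")
print(f"extra CPU energy versus idle: about {overhead_percent:.1f}%")
```

This only covers CPU package energy, which is why the battery life improvement will be smaller than the percentage suggests.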

If you’re actually using sqlservr then you should leave it alone, but if you’re not using it then disable it. Run services.msc, find SQL Server (SQLEXPRESS), stop the service, and change the Startup Type from Automatic to Manual. Or uninstall it if you feel particularly aggressive. Your battery will thank you.

And Microsoft, can you please fix this?

Grrrr.

The lock screen

image

With sqlservr understood I wanted to remove it from the data so that the next busiest process can more easily be seen. This can be done by recording a new trace after disabling sqlservr, or it can be done by just hiding the sqlservr data in WPA. I took the latter route so that I could do most of my investigations from the same trace. You can disable any graphed item just by clicking its associated color (circled in the diagram above) to clear it. Right-clicking gives more options, including the option to change the color. With sqlservr hidden the next interesting target becomes obvious – the horizontal blue line at around the 200 m (0.2% of total CPU power) mark:

Source: 2016-01-23_17-00-41_per_process_cpu_usage.etl

In fact, if you look really closely you can see that there are two lines that almost perfectly overlap.

Okay, this one was weird. And this seems to happen only on my laptop, because I’m lucky that way. When my screen turns off and locks because of inactivity, LockApp and LogonUI both start running XAML animations at 30 fps – I verified this with a normal UIforETW trace. Yes, these two programs start running animations as soon as the screen is off and the animations are guaranteed to be invisible. They each use 0.2% of CPU time, for a total of 0.4%, or about 3.2% of a core. But, at least this doesn’t happen when I’m using my laptop – hurray for small blessings.

I can’t repro this on any other Windows 10 machines so I assume this is something to do with upgrading from a five-year-old install of Windows 7. If you upgraded to Windows 10 then maybe see if this happens to you as well.

Plex

With LockApp and LogonUI graphs both disabled the next problem can be seen – the green horizontal line pointed to by the arrow. That represents the PlexDlnaServer process:

Source: 2016-01-23_17-00-41_per_process_cpu_usage.etl

I installed Plex a few months ago to see if it would be a good solution for viewing photos on my 4K TV. It downsamples the pictures (to 720p I think) so the answer was: no. But, I left it installed and running because how bad could that be.

The answer is “not as bad as sqlservr, but still annoying”. Plex actually left three processes running and they all made it into the top fifteen by CPU usage. That’s not something to be proud of. When sorted by CPU Usage the three Plex* processes came in 5th, 11th, and 15th.

The three Plex* processes used a combined total of 49,551 ms per hour, which works out to about 13.8 ms of CPU time per second. That is way too much for an application that should just be waiting for work. Pro-tip: don’t poll.

PlexDlnaServer also has the distinction of having the second most context switches, which can harm power usage by never letting the CPU go to sleep.

Other programs

While the graph view is great for seeing execution patterns, the table view can be better for seeing exactly how much CPU time each process consumed, and how many times they were context switched in. Here’s the data sorted by CPU Usage:

Source: 2016-01-23_17-00-41_per_process_cpu_usage.etl

In the interests of science and transparency I’ve shared the raw data for the trace that I’ve been analyzing. It is sorted by CPU usage and by context switches in a Google Sheet, with more rows available, and the trace itself is available as a release .zip file on GitHub.

Outlook (the green line just below PlexDlnaServer) had 297,110 context switches and used 31.7 seconds of CPU time. I’d like to see that lower, and I should probably close Outlook when I really need to maximize battery life. At least Outlook is a program that I am running intentionally, but I do wish it wouldn’t spin in pointless animation loops when it’s not even active.

MsMpEng (anti-virus software from Microsoft) consumed 19.6 s of CPU time. I’m not sure why it has such trouble staying asleep even when the rest of the system is idle.

The system process consumes 15.2 s of time. Looking back at the graph I can see that it wakes up once every three minutes to do some bookkeeping, but I’m not yet sure what to do about that. Ditto with svchost.

I had five web pages open in Chrome, running who knows what JavaScript, so 14.5 s of CPU time doesn’t seem terrible, but part of my day job is to make that number even smaller.

LMS is Intel Management and Security Application Local Management Service and it should slow down.

NisSrv is Microsoft Network Realtime Inspection Service. I don’t know if its CPU usage is justified.

Here’s the same data sorted by context switch count:

Source: 2016-01-23_17-00-41_per_process_cpu_usage.etl

Context switches can cause significant power waste, but it depends on the pattern. If explorer’s 99,774 context switches are evenly distributed then that means that the CPU is being woken up every 36 ms and that is really bad. If they are clumped together then the power implications are much less severe.
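The 36 ms figure is just the trace length divided by the context switch count. A small sketch (the function name is mine) makes the evenly-distributed assumption explicit:

```python
def mean_wake_interval_ms(context_switches: int,
                          trace_seconds: float = 3600.0) -> float:
    """Average gap between context switches, assuming they are evenly spread.

    If the switches are actually clumped together, the CPU gets much longer
    stretches of sleep than this average suggests.
    """
    return trace_seconds * 1000.0 / context_switches


# explorer's count from the one-hour trace
print(f"explorer: one wakeup every {mean_wake_interval_ms(99_774):.1f} ms")
```

The same function works for any row in the table. The result is only a worst-case estimate of how often the CPU is disturbed, so the graph view is still needed to see the real pattern.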

csrss may be related to the LockApp and LogonUI animations – I don’t know.

iPodService, AppleMobileDeviceService, and iTunesHelper are doing a combined average of 25.5 context switches per second. I’ve never owned an iPod and I rarely run iTunes so this is completely excessive. I used services.msc and autoruns to stop these – it took a few tries to make that work.

SynTPEnh is the driver for my touchpad I believe – it’s not clear why it wakes up so frequently when my touchpad is untouched.

EDICT (Microsoft Encarta Dictionary Tools) wakes up every 400 ms, and hangs around after you close it (so it can display its notification area icon). Unchecking “Always show icon in taskbar” makes it go away when closed, thus avoiding 9,079 context switches per hour.

Dropbox wasn’t particularly high (4,986 context switches), but does it really need to wake up more than once a second when nothing is happening and use that much CPU time? Ditto for PhotoshopElementsFileAgent (3,580 context switches) – they should learn a few things from FlashPlayerUpdateService (92 context switches).

Once you start using autoruns you can easily go crazy on turning off software, but I decided not to do this. My focus is on improving battery life so I only need to disable programs that are waking up frequently.

Rules of the game for writing long-running software

If a software developer needs to run a program in the background, that’s fine. But there are rules. Because you could be running for hours at a time on battery power. The rules are:

  1. Don’t poll. Polling wastes CPU time and stops the CPU from dropping deep into power-saving states. WaitForMultipleObjects is your friend.
  2. Seriously, don’t poll. I know you think that your program is a special snowflake but polling is just wasteful.
  3. If you really can’t figure out how to avoid polling then be smart about it. Waking up every second may seem very conservative, but if every long running program does it then your CPU may get woken up dozens of times per second. Consider using exponential back off, and when you do wake up and find that there is nothing to do then go back to sleep promptly.
  4. Using SetCoalescableTimer can help Windows coalesce your polling with others to reduce the power cost, or better yet, SetThreadpoolTimer with a large ‘window’ argument.
  5. Avoid doing long-running animations, especially when your program is inactive or invisible. Incessant animation is just another form of polling, except that it is visible, so it makes it easier for users to realize that you’re wasting power.
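To make rules 1 and 3 concrete, here is a minimal sketch in Python rather than Win32, so it is an analogy and not the Windows API calls named above. The blocking queue stands in for WaitForMultipleObjects (the thread sleeps until work arrives), and the generator shows one possible exponential back-off schedule for code that really must poll. The names, the doubling factor, and the five-minute cap are all arbitrary choices for illustration.

```python
import queue
import threading
from itertools import islice


def worker(jobs: queue.Queue, results: list) -> None:
    """Rule 1: block until there is work. No timer, no wakeups while idle."""
    while True:
        job = jobs.get()          # sleeps until something is queued
        if job is None:           # sentinel value: shut down
            return
        results.append(job * 2)   # stand-in for real work


def backoff_delays(initial_s: float = 1.0, factor: float = 2.0,
                   max_s: float = 300.0):
    """Rule 3: if you must poll, wait longer each time you find nothing."""
    delay = initial_s
    while True:
        yield delay
        delay = min(delay * factor, max_s)


jobs: queue.Queue = queue.Queue()
results: list = []
thread = threading.Thread(target=worker, args=(jobs, results))
thread.start()
jobs.put(21)
jobs.put(None)
thread.join()
print(results)                              # prints [42]
print(list(islice(backoff_delays(), 5)))    # prints [1.0, 2.0, 4.0, 8.0, 16.0]
```

A poller using this schedule would reset to the initial delay whenever it actually finds work, so it stays responsive when busy and nearly silent when idle.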

What I do have are a very particular set of skills, skills I have acquired over a very long career, and if you violate these rules then I will look for your process, I will find it, and I will kill it. If you respect your customers’ CPU time then they are more likely to leave your software installed.

Web browsers

I work on Chrome so I was pleased to see that it looked pretty good. I left the five tabs I was browsing open and the net result was very low CPU usage. Chrome lowers the timer frequency on background tabs, and throttles Flash. However, the results are highly dependent on the page. A JavaScript-heavy or animation-heavy page could easily consume huge amounts of CPU time and it is difficult for a browser to control that, so be aware of what pages you keep open if you want to run on battery for long periods of time.

Interrupts

I carefully recorded the time used by ISRs and DPCs, but it didn’t look very interesting so I didn’t spend a lot of time on it. ndis.sys and ntoskrnl.exe had by far the most context switches and the highest CPU usage, but I don’t know what to do about that.

Alternate methods

The great thing is that this sort of research is pretty easy. Just download the latest UIforETW release. You need to run UIforETW once in order to install the Windows Performance Toolkit, but then close UIforETW, and maybe reboot to finish the setup. Get your computer so that it is running the programs that you care about, and nothing else. Then, from an administrator command prompt run UIforETW\bin\etw_cpuusage_longterm.bat. This starts up low-overhead tracing. Hit enter when you’re done. The UIforETW\bin\CPUUsageByProcess.wpaProfile profile is particularly good for summarizing this type of data.

I use ETW tracing because I’m familiar with it, because I could configure it to have very low impact, and I could do detailed post-mortem analysis of the results. But, there are other options. One great option is Process Explorer. Leave this running with the CPU, CPU Time, CSwitch Delta, and Context Switches columns visible, and watch for bad behavior.

image

  • CPU (or Cycles Delta) shows how much time each process is currently using. The Cycles Delta column should be more accurate for processes that are using very little CPU time
  • CPU Time (or Cycles) shows how much time each process has used – this can catch processes like sqlservr that are usually idle, but accumulate a lot of wasted CPU time in the long run
  • Also look at CSwitch Delta and Context Switches to find programs that are currently or historically doing a lot of context switches.

In Process Explorer (procexp.exe) you should see Interrupts, System Idle Process, and procexp itself at the top for CPU and CSwitch Delta, and on an idle machine everything else should be pretty low. Investigate the anomalies, and turn them off if you don’t want them. Take control of the CPU usage on your computer, and save energy.

Discussions on reddit, hacker news, and twitter.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.

32 Responses to Power Wastage On An Idle Laptop

  1. Pingback: 2 – Power Wastage on an Idle Laptop

  2. Adrian says:

    So how much better is your battery life?

    • brucedawson says:

      Good question, but I don’t know. The change is likely to be small and the noise (from occasional indexing, update scans, anti-virus scans, etc.) is large so I’m not sure I could measure it without extreme effort. Send me some matching laptops to test on and I’ll happily do science!

      I may try another one-hour idle scan with all of the changes to compare before/after context switches and CPU usage.

  3. billco says:

    Solid investigative chops, as always! I really like your tip about exponential back-off, I’d never thought of that.

  4. I think there’s a minor mistake
    “..Excel math shows that the extra energy consumed by the CPU when *Excel* is notifying memory consumers..”

  5. Neeraj Singh says:

    Hi Bruce, thanks for the great article!

    On your SetCoalescableTimer tip, I’d instead recommend SetThreadpoolTimer with a large ‘window’ argument. As of Windows 10, it has a better implementation of coalescing than SetCoalescableTimer.
    -Neeraj, former Windows kernel dev.

  6. Good article! I’m happy to have learned about new ways to measure performance.

    I collected a trace from my computer idling and noticed that for periods when Task Manager reports ~2% CPU usage, Windows Performance Analyzer shows ~15% CPU usage. Does Task Manager skip some of the data? Does WPA ignore throttling information and report 15% usage of a CPU slowed down to 1 GHz?

    • brucedawson says:

      Windows Task Manager has multiple problems with its reporting of CPU usage.

      1) It has insufficient precision. On a four-core/eight-thread machine a 100% busy thread corresponds to just 12.5% CPU time, which task manager shows as 12. Lesser amounts may easily truncate to zero.
      2) Task Manager, IIRC, just looks to see who is running when the timer interrupt fires. That is, it is *sampling* CPU usage and can entirely miss or overcount CPU usage. I believe its “Cycles” column uses context switch data and is more accurate.

      WPA does ignore throttling information. That is, it doesn’t report how many clock cycles a thread is running, it reports how long (wall clock time) a thread is running. All CPU utilization data is done that way. To do it any other way would lead to bizarre and ill-defined results I believe.

  7. J says:

    For a quicker / less involved investigation you can also use the “powercfg -energy” command. It spits out a little HTML report and might even have info about drivers/hardware that’s playing up.

  8. Jon says:

    Does your “don’t poll” advice apply equally to server applications, where battery life is not an issue?
    A couple examples of polling: Process explorer and perfmon.

    • brucedawson says:

      There are some exceptions – the ETW sampling profiler is polling. But I think in general the advice applies to servers as well – power consumption is a huge issue in data centers.

      If you are polling it should be because there is a high likelihood that you will immediately find work to do, and polling is more efficient (throughput and power) than waiting. Don’t do it because waiting for work is hard.

      Some server expert could probably give a specific example where polling is needed, but I can’t think of any.

      • It’s pretty common to poll in low-latency scenarios (e.g. HFT). Typically you can’t even afford to acquire a lock because that may mean you’ll need to wait for your thread to be rescheduled again, so you spin-wait for another core burning electricity in the process.

        • brucedawson says:

          In that case you are trading CPU power for latency – wasting some CPU power to reduce latency slightly. This is an *extremely* delicate tradeoff that only makes sense when your latency needs are sub-millisecond and really only makes sense on a dedicated machine.

          So true, but not applicable to consumer software.

          Well, mostly not. Very brief spinning in job-queue systems can make sense, but I doubt it should ever exceed a fraction of a millisecond. I’ve seen job queues that mess this up and busy-wait for over a second. Uggh.

          • rpavlik says:

            So in my particular case, we do actually have to do this, as far as I can tell – virtual reality (specifically HMD-based VR) software, where any little bit of latency perceived gets you plenty of (valid and invalid) criticism – the main loop of our software ends up getting some 10ms of “mystery” latency with any type of yield, so while we try to be polite when nobody is connected, if there’s an app running and using our services, we effectively “burn a core” to get rid of the 10ms penalty, since that’s very perceptible.

            Not saying that anything is incorrect in your article, just pointing out one unusual corner case, and yet, how it can still incorporate your advice (not burning a core when nobody is listening and needing low-latency services).

          • brucedawson says:

            VR is a special case – a rare instance where a ms difference can be perceived.

            That said: if you are losing 10 ms then that sounds like you are leaving the timer frequency at its default setting. While I have also railed against unnecessarily raising the global Windows timer frequency (https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/) it is better to raise the frequency than to busy wait. Doing so may give you the 1 ms precision you need without spinning.

            So, yeah, for VR you almost need to violate one or more of my recommendations🙂 – but with the timer frequency recommendation in particular I always said that it should be used as-needed, not never. It just bugs me when long-running idle programs raise the timer frequency for no good reason.

          • Jon says:

            How about timeouts? For example, disconnect clients if they don’t respond within 10 seconds.
            Can’t use WaitForSingleObject/Sleep as that’d block a thread per client. And timer objects are limited resources.
            Could use polling for that?

          • brucedawson says:

            > Can’t use WaitForSingleObject/Sleep as that’d block a thread per client.

            That’s what WaitForMultipleObjects is for – awesome function.

            For client disconnect timeouts and what-not you probably need to pass a timeout value to WaitForMultipleObjects. Passing in a timeout to WaitForMultipleObjects is arguably a form of polling, but at least in this hypothetical case you are using a timeout because you are servicing clients! All of the badly behaved software in my example was serving zero clients – that’s what makes the inefficiency so annoying.

          • Jon says:

            Do you mean WaitForMultipleObjects using Waitable Timer Objects? (in which case you don’t need a timeout as well). That still blocks 1 thread, but I guess that’s better than polling with 0 clients.

  9. TaskManager shows CPU time. Just add it via Select Columns.
    My biggest enemy is utorrent. It spawns process called utorrentie.exe and mines bitcoins in it.

  10. rpavlik says:

    Have you noticed any issues with Visual Studio since disabling SQL server? I was under the impression that newer versions used it for intellisense. (And, as a non database dev, I’m not totally clear on the different editions: sounded like they had a “sqlite-imitator” – in process, etc – version, but then they go and install this thing called express on my machine anyway.)

    • brucedawson says:

      I have not seen any issues. That concern is why I ignored sqlservr for years, but in hindsight that was foolish. VS (and Windows Live Photo Gallery) run an in-process database server so they are unaffected by whether sqlservr is running.

      So yeah, unless you are doing database development you can kill it with fire.

  11. Pingback: Les liens de la semaine – Édition #175 | French Coding

  12. IL says:

    bass.dll sets timeBeginPeriod(1) on init. It would be better to set it only when it is actually needed and set it back right after. Hopefully that could be changed.

    • brucedawson says:

      Do you mean base.dll in Chrome? If so then it does set it as needed. I have six Chrome tabs open right now and my timer interval is 15.626 ms. There may be some situations where the timer frequency gets permanently raised but it does not happen always. Reproes are always welcome.

      • IL says:

        No, not base.dll which appears to be missing on my computer.
        This one – BASS.DLL http://www.un4seen.com/ – it’s very popular audio library.

        • brucedawson says:

          Ah – got it. Yeah, you should definitely complain to them about that. Audio shouldn’t need a 1 ms timer. Audio programs just need to use large enough buffers so that they can avoid skipping without waking up 1,000 times per second.

          • IL says:

            Unfortunately software using that library does not set the timer resolution back after BASS Init. BASS.DLL itself sets it indefinitely on Init if it is not compiled with the undocumented (sigh!) BASS_CONFIG_NOTIMERES config option. Raising the timer resolution is used to keep something in sync as closely as possible. Unfortunately, this is all Greek to me.
            http://www.un4seen.com/forum/?topic=17107.0

  13. JohnB says:

    Nice post🙂 Cracked this out to optimize performance on a tablet/netbook I just got (Acer Aspire Switch 10) – very good excuse for getting familiar with xperf! (and I usually spend a lot of time optimizing with Autoruns/Process-Explorer too)

    Interestingly, I found that this tablet’s minimum CPU usage is limited by its internal SD memory storage, due to running a special WIMBoot install of Windows 8.1 that massively reduces the installed OS size, using a compressed partition (which serves a dual use as a recovery partition), but which is constantly decompressing needed OS files.

    This nicely shows up in the resultant xperf data, on interrupts to sdbus.sys, correlated with high System process CPU🙂 (which, when examining threads in Process Explorer, reliably shows up high CPU calls to a ‘DecompressBuffer’ function)

    Pretty nice at nailing that performance bottleneck. Am a game dev myself, working on some performance critical network code at the moment, so think this tool may come in pretty handy!

    • brucedawson says:

      Interesting… compressed OS files would make paging them in more expensive, but… shouldn’t those files all be cached? Maybe the real problem is that you don’t have enough RAM.

      More RAM does draw a bit more power, so, tradeoffs, but it seems worth trying.

      Then again those CPU spikes are suspiciously regular – I wonder what is triggering them?

  14. Pingback: UIforETW is No Longer a CPU Hog | Random ASCII
