Ever since I upgraded to Windows 10 it’s felt like my battery life is worse. My suspicion was various scanning tasks that were springing to life more frequently, but it was just a hunch.
So, I did what I do – I profiled. I recorded long-running ETW traces to see how much CPU time was going to the processes that I have chosen to run, and how much was going to processes that Microsoft has chosen to run.
I should have done this years ago…
TL;DR – I found a few applications that were consuming the vast majority of the CPU time. These applications were useless to me so I disabled them. Sqlservr, Plex, and iTunes – you are dead to me now. I also disabled some lesser programs, and others are on my watch list.
I didn’t find much that was Windows 10 specific, so maybe my battery is just getting old and tired, but the fixes I’ve made were still worthwhile. My battery life should be better now.
See the end of this post for how to do a simple version of this analysis without needing to learn ETW tracing.
I knew I wanted an ETW trace that would contain all of the context switches, interrupts (ISRs) and deferred procedure calls (DPCs) over a period of about an hour. This would tell me exactly what processes and devices were consuming CPU time, and when.
I also knew that I needed to not enable the sampling profiler, because it would waste power and make the traces too big. And, I knew that I needed to not record call stacks, for the same reasons. This means that the resultant traces would be useless for figuring out what these processes were doing, but such are the tradeoffs we must make when recording all CPU usage for an hour at a time, instead of the usual length of about a minute.
xperf -start %logger% -on proc_thread+loader+dpc+interrupt+cswitch …
The result is this CPU usage graph covering a one hour time period. The y-axis is CPU usage by process as a percent of total system CPU power and the x-axis is time in seconds:
The “500 m” on the vertical axis indicates 0.5% of CPU power, and the “1” indicates 1%. Because this is a four-core/eight-thread laptop these markings indicate 4% and 8% of a single core. The visible time covers exactly an hour, from 300 to 3900 seconds in the trace.
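The conversion between "percent of total system CPU power" and "percent of a single core" can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name is made up for this example, and it assumes that the graph normalizes against all eight logical processors.

```python
# Convert a share of total system CPU capacity into a share of one logical core.
# Assumption: the graph's percentages are relative to all logical processors.
LOGICAL_PROCESSORS = 8  # four cores / eight threads, as described in the text


def percent_of_one_core(percent_of_system: float,
                        logical_processors: int = LOGICAL_PROCESSORS) -> float:
    """Scale a whole-system percentage up to a single-core percentage."""
    return percent_of_system * logical_processors


# The two axis markings discussed above.
for label, system_percent in [("500 m", 0.5), ("1", 1.0)]:
    core_percent = percent_of_one_core(system_percent)
    print(f"{label}: {system_percent}% of system = {core_percent}% of one core")
```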
The biggest consumer of CPU time is pretty obvious in the graph – those repeating reddish spikes had better be something really important. But they’re not. Those represent sqlservr (MSSQL10.SQLEXPRESS to be precise). This process is installed by default with Visual Studio (VS 2010 I think) and although I’ve seen it on traces before I’ve just ignored it. And I’ve never used it – I’m not hosting any databases. And yet, over the one hour period being analyzed:
- sqlservr woke up every minute, consumed 1.5 s of CPU time (spread out over a few seconds), and then went back to sleep
- sqlservr did 13,157,222 context switches
- sqlservr consumed 91.116 s of CPU time
- sqlservr did all of this while my laptop was on battery power
That’s pretty impressive. And, to be perfectly clear, this is a program that I’m not using. There are no databases loaded.
Once I knew that sqlservr was waking up once a minute it was easy to get a more detailed trace to see what it was doing – I just used UIforETW to record a normal trace for a bit more than a minute. Sqlservr had better be doing something really important to justify burning all of that CPU time. Yeah. About that… Here’s what the CPU sampling data says:
So sqlservr, which is using a paltry 15 MB of data because it is hosting zero databases, is waking up once a minute to notify memory consumers that it’s “Eight Bells and All Is Well”. Why does it take so much CPU time when there is nothing to notify memory consumers about? Aarrrggghh!
I’m kind of pissed about this, both because of the colossal waste, and because I’ve done nothing about this for years. I feel like some Microsoft SQL developers were chortling to themselves and saying “I wonder how long Dawson will take to realize how much we’re hurting his battery life.”
How much electricity does sqlservr waste? The same trace that showed how it was spending its time also contained CPU power usage data (one of the bonus bits of data that UIforETW records, as long as Intel Power Gadget is installed). A bit of Excel math shows that the extra energy consumed by the CPU when sqlservr is notifying memory consumers is 5.6 mWh, which is enough to run the CPU when my PC is idle for at least five seconds. The graph below shows energy usage on the top and CPU usage on the bottom – note the two energy usage spikes corresponding to the two CPU usage blips that correspond to sqlservr springing to life.
This doesn’t mean that disabling sqlservr will give 8% better battery life, because other components also consume energy, but it will help.
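The 8% figure comes from comparing five seconds of idle-equivalent energy against the one-minute wake-up period. A rough Python sketch of that math, using only the numbers quoted above (the variable names are illustrative, and "at least five seconds" is treated as exactly five, so the implied idle power is an upper bound):

```python
# Rough energy arithmetic for one sqlservr wake-up, using figures from the text.
EXTRA_ENERGY_MWH = 5.6    # extra CPU energy per wake-up, from the trace
IDLE_EQUIVALENT_S = 5.0   # "at least five seconds" of idle CPU operation
WAKE_PERIOD_S = 60.0      # sqlservr wakes up once a minute

# 1 Wh = 3600 J, so convert mWh to joules.
extra_joules = EXTRA_ENERGY_MWH / 1000 * 3600

# If 5.6 mWh runs the idle CPU for at least 5 s, idle CPU power is at most this.
implied_idle_watts = extra_joules / IDLE_EQUIVALENT_S

# Extra CPU energy as a fraction of what the idle CPU uses between wake-ups.
cpu_overhead = IDLE_EQUIVALENT_S / WAKE_PERIOD_S

print(f"Extra energy per wake-up: {extra_joules:.2f} J")
print(f"Implied idle CPU power (upper bound): {implied_idle_watts:.2f} W")
print(f"CPU energy overhead: {cpu_overhead:.1%}")
```

This only covers CPU energy, which is why the whole-system battery gain will be smaller.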
If you’re actually using sqlservr then you should leave it alone, but if you’re not using it then disable it. Run services.msc, find SQL Server (SQLEXPRESS), stop the service, and change the Startup Type from Automatic to Manual. Or uninstall it if you feel particularly aggressive. Your battery will thank you.
And Microsoft, can you please fix this?
The lock screen
With sqlservr understood I wanted to remove it from the data so that the next busiest process can more easily be seen. This can be done by recording a new trace after disabling sqlservr, or it can be done by just hiding the sqlservr data in WPA. I took the latter route so that I could do most of my investigations from the same trace. You can disable any graphed item just by clicking its associated color (circled in the diagram above) to clear it. Right-clicking gives more options, including the option to change the color. With sqlservr hidden the next interesting target becomes obvious – the horizontal blue line at around the 200 m (0.2% of total CPU power) mark:
In fact, if you look really closely you can see that there are two lines that almost perfectly overlap.
Okay, this one was weird. And this seems to happen only on my laptop, because I’m lucky that way. When my screen turns off and locks because of inactivity, LockApp and LogonUI both start running XAML animations at 30 fps – I verified this with a normal UIforETW trace. Yes, these two programs start running animations as soon as the screen is off, when the animations are guaranteed to be invisible. They each use 0.2% of CPU time, for a total of 0.4%, or about 3.2% of a core. But, at least this doesn’t happen when I’m using my laptop – hurray for small blessings.
I can’t repro this on any other Windows 10 machines so I assume this is something to do with upgrading from a five-year-old install of Windows 7. If you upgraded to Windows 10 then maybe see if this happens to you as well.
With LockApp and LogonUI graphs both disabled the next problem can be seen – the green horizontal line pointed to by the arrow. That represents the PlexDlnaServer process:
I installed Plex a few months ago to see if it would be a good solution for viewing photos on my 4K TV. It downsamples the pictures (to 720p I think) so the answer was: no. But, I left it installed and running because how bad could that be.
The answer is “not as bad as sqlservr, but still annoying”. Plex actually left three processes running and they all made it into the top fifteen by CPU usage. That’s not something to be proud of. When sorted by CPU Usage the three Plex* processes came in 5th, 11th, and 15th.
The three Plex* processes used a combined total of 49,551 ms per hour which means 13.7 ms of CPU time per second, which is way too much for an application that should just be waiting for work. Pro-tip: don’t poll.
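For anyone who wants to check the per-second figure, here is a minimal Python sketch of the calculation. The only input is the 49,551 ms total quoted above; the single-core percentage is a derived value, not something read from the trace.

```python
# Average CPU time used by the three Plex* processes, from the one-hour total.
PLEX_CPU_MS_PER_HOUR = 49_551
SECONDS_PER_HOUR = 3600

# Milliseconds of CPU time consumed per second of wall-clock time.
ms_per_second = PLEX_CPU_MS_PER_HOUR / SECONDS_PER_HOUR

# The same figure expressed as a percentage of one logical core
# (1000 ms of CPU time per second would be 100% of a core).
percent_of_one_core = ms_per_second / 1000 * 100

print(f"{ms_per_second:.2f} ms of CPU time per second")
print(f"{percent_of_one_core:.2f}% of one core")
```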
PlexDlnaServer also has the distinction of having the second most context switches, which can harm power usage by never letting the CPU go to sleep.
While the graph view is great for seeing execution patterns, the table view can be better for seeing exactly how much CPU time each process consumed, and how many times they were context switched in. Here’s the data sorted by CPU Usage:
In the interests of science and transparency I’ve shared the raw data for the trace that I’ve been analyzing. The tables, sorted by CPU usage and by context switches and with more rows available, are in a Google Sheet, and the trace is available as a release .zip file on GitHub.
Outlook (the green line just below PlexDlnaServer) had 297,110 context switches and used 31.7 seconds of CPU time. I’d like to see that lower, and I should probably close Outlook when I really need to maximize battery life. At least Outlook is a program that I am running intentionally, but I do wish it wouldn’t spin in pointless animation loops when it’s not even active.
MsMpEng (anti-virus software from Microsoft) consumed 19.6 s of CPU time. I’m not sure why it has such trouble staying asleep even when the rest of the system is idle.
The system process consumed 15.2 s of CPU time. Looking back at the graph I can see that it wakes up once every three minutes to do some bookkeeping, but I’m not yet sure what to do about that. Ditto with svchost.
LMS is Intel Management and Security Application Local Management Service and it should slow down.
NisSrv is Microsoft Network Realtime Inspection Service. I don’t know if its CPU usage is justified.
Here’s the same data sorted by context switch count:
Context switches can cause significant power waste, but it depends on the pattern. If explorer’s 99,774 context switches are evenly distributed then that means that the CPU is being woken up every 36 ms and that is really bad. If they are clumped together then the power implications are much less severe.
csrss may be related to the LockApp and LogonUI animations – I don’t know.
iPodService, AppleMobileDeviceService, and iTunesHelper are doing a combined average of 25.5 context switches per second. I’ve never owned an iPod and I rarely run iTunes so this is completely excessive. I used services.msc and autoruns to stop these – it took a few tries to make that work.
SynTPEnh is the driver for my touchpad I believe – it’s not clear why it wakes up so frequently when my touchpad is untouched.
EDICT (Microsoft Encarta Dictionary Tools) wakes up every 400 ms, and hangs around after you close it (so it can display its notification area icon). Unchecking “Always show icon in taskbar” makes it go away when closed, thus avoiding 9,079 context switches per hour.
Dropbox wasn’t particularly high (4,986 context switches), but does it really need to wake up more than once a second when nothing is happening and use that much CPU time? Ditto for PhotoshopElementsFileAgent (3,580 context switches) – they should learn a few things from FlashPlayerUpdateService (92 context switches).
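The context switch counts quoted above are easier to compare when converted to an average wake-up interval. The Python sketch below does that for the one-hour trace. It assumes the switches are evenly spread, which, as noted earlier, may not be true if they are clumped together; the helper name is made up for this example.

```python
# Average time between context switches over the one-hour trace.
# Assumption: switches are evenly distributed (clumped switches cost less power).
TRACE_DURATION_S = 3600

# Per-process context switch counts quoted in the text.
context_switches = {
    "explorer": 99_774,
    "EDICT": 9_079,
    "Dropbox": 4_986,
    "PhotoshopElementsFileAgent": 3_580,
    "FlashPlayerUpdateService": 92,
}


def mean_wake_interval_ms(switch_count: int,
                          duration_s: float = TRACE_DURATION_S) -> float:
    """Average interval between context switches, in milliseconds."""
    return duration_s / switch_count * 1000


for name, count in context_switches.items():
    interval = mean_wake_interval_ms(count)
    print(f"{name:28s} {count:>7,d} switches, one every {interval:,.0f} ms")
```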
Once you start using autoruns you can easily go crazy on turning off software, but I decided not to do this. My focus is on improving battery life so I only need to disable programs that are waking up frequently.
Rules of the game for writing long-running software
If a software developer needs to run a program in the background, that’s fine. But there are rules, because your program could be running for hours at a time on battery power. The rules are:
- Don’t poll. Polling wastes CPU time and stops the CPU from dropping deep into power-saving states. WaitForMultipleObjects is your friend.
- Seriously, don’t poll. I know you think that your program is a special snowflake but polling is just wasteful.
- If you really can’t figure out how to avoid polling then be smart about it. Waking up every second may seem very conservative, but if every long running program does it then your CPU may get woken up dozens of times per second. Consider using exponential back off, and when you do wake up and find that there is nothing to do then go back to sleep promptly.
- Using SetCoalescableTimer can help Windows coalesce your polling with others to reduce the power cost, or better yet, SetThreadpoolTimer with a large ‘window’ argument.
- Avoid doing long-running animations, especially when your program is inactive or invisible. Incessant animation is just another form of polling, except that it is visible, so it makes it easier for users to realize that you’re wasting power.
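The first three rules can be sketched in Python. This is not the Win32 API: `threading.Event` stands in here for a waitable object such as those passed to WaitForMultipleObjects, and all of the function names and default intervals are invented for illustration. The first function blocks until signalled, so it causes no wake-ups while idle; the other two show polling with exponential backoff for cases where polling truly can't be avoided.

```python
import itertools
import threading
from typing import Callable, Iterator, Optional


def wait_for_work(work_ready: threading.Event,
                  timeout_s: Optional[float] = None) -> bool:
    """Preferred: block until another thread signals that work is available."""
    return work_ready.wait(timeout_s)


def backoff_delays(initial_s: float = 1.0, factor: float = 2.0,
                   max_s: float = 60.0) -> Iterator[float]:
    """Yield sleep intervals that grow exponentially up to a cap."""
    delay = initial_s
    while True:
        yield delay
        delay = min(delay * factor, max_s)


def poll_with_backoff(has_work: Callable[[], bool], stop: threading.Event,
                      initial_s: float = 1.0, max_s: float = 60.0) -> bool:
    """Fallback: poll, but wait longer after every check that finds nothing.

    Returns True when work is found, False when asked to stop.
    """
    for delay in backoff_delays(initial_s, max_s=max_s):
        if has_work():
            return True
        # Sleep without spinning; returns early (True) if stop is set.
        if stop.wait(delay):
            return False


# The first few polling intervals with the default settings.
print(list(itertools.islice(backoff_delays(), 8)))
```

A real implementation would probably reset the delay once work is found, so that a busy period is handled promptly and only idle periods back off.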
What I do have are a very particular set of skills, skills I have acquired over a very long career, and if you violate these rules then I will look for your process, I will find it, and I will kill it. If you respect your customers’ CPU time then they are more likely to leave your software installed.
I carefully recorded the time used by ISRs and DPCs, but it didn’t look very interesting so I didn’t spend a lot of time on it. ndis.sys and ntoskrnl.exe had by far the most context switches and the highest CPU usage, but I don’t know what to do about that.
The great thing is that this sort of research is pretty easy. Just download the latest UIforETW release. You need to run UIforETW once in order to install the Windows Performance Toolkit, but then close UIforETW, and maybe reboot to finish the setup. Get your computer so that it is running the programs that you care about, and nothing else. Then, from an administrator command prompt run UIforETW\bin\etw_cpuusage_longterm.bat. This starts up low-overhead tracing. Hit enter when you’re done. The UIforETW\bin\CPUUsageByProcess.wpaProfile profile is particularly good for summarizing this type of data.
I use ETW tracing because I’m familiar with it, because I could configure it to have very low impact, and I could do detailed post-mortem analysis of the results. But, there are other options. One great option is Process Explorer. Leave this running with the CPU, CPU Time, CSwitch Delta, and Context Switches columns visible, and watch for bad behavior.
- CPU (or Cycles Delta) shows how much time each process is currently using. The Cycles Delta column should be more accurate for processes that are using very little CPU time
- CPU Time (or Cycles) shows how much time each process has used – this can catch processes like sqlservr that are usually idle, but accumulate a lot of wasted CPU time in the long run
- Also look at CSwitch Delta and Context Switches to find programs that are currently or historically doing a lot of context switches.
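The difference between the cumulative columns and the "Delta" columns can be illustrated with a small Python sketch: take two snapshots of cumulative context switch counts and rank processes by the rate of change between them. This is not how Process Explorer is implemented, and the process names and counter values below are invented purely for the example.

```python
from typing import Dict, List, Tuple


def cswitch_rates(before: Dict[str, int], after: Dict[str, int],
                  interval_s: float) -> List[Tuple[str, float]]:
    """Rank processes by context switches per second between two snapshots."""
    rates = []
    for name, total in after.items():
        # A process missing from the first snapshot started in between,
        # so all of its switches count as new.
        delta = total - before.get(name, 0)
        rates.append((name, delta / interval_s))
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Invented cumulative counters, sampled ten seconds apart.
snapshot_before = {"busy.exe": 10_000, "quiet.exe": 500}
snapshot_after = {"busy.exe": 10_250, "quiet.exe": 502}

for name, rate in cswitch_rates(snapshot_before, snapshot_after, interval_s=10.0):
    print(f"{name:10s} {rate:6.1f} context switches/s")
```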
When looking at procexp.exe you should see Interrupts, System Idle Process, and procexp itself at the top for CPU and CSwitch Delta, and everything else should (on an idle machine) be pretty low. Investigate the anomalies, and turn them off if you don’t want them. Take control of the CPU usage on your computer, and save energy.