Hey Synaptics, Can You Please Stop Polling!

Posted on September 5, 2017 by brucedawson

TL;DR – be wary of buying a Lenovo laptop or any other laptop that uses a Synaptics touch pad until Synaptics ships a fixed driver. Their driver has a memory leak and they have a battery-life bug that causes Windows to repeat the same system-scan once a second. So far Synaptics has failed to respond so the wait for a fix must be assumed to be infinite.

Update: October 15, 2017: an updated driver for my laptop was released September 11, 2017, six days after I published this post. However I have yet to see the driver on Windows Update, Lenovo’s System Update, or in any communications from Lenovo or Synaptics. So, I don’t know if or how ‘normal’ customers are supposed to get this fix. I only found out through the kindness of a stranger.

Now for more details to back up my claims…

My new laptop has great battery life and I want to keep it that way so if I notice something consuming CPU time for no good reason I investigate. If it’s a web page with battery-wasting ads I close it, and if it’s a process that keeps doing background work I’ll kill it.

So, when I noticed that Task Manager said that WmiPrvSE.exe was consuming ~0.7% of my CPU time (~5.6% of a core) continuously I investigated.

I shut down all visible processes and a few invisible ones but the CPU usage continued, so I grabbed an ETW trace. It showed that WmiPrvSE.exe was doing about 200 ms of work, then sleeping for one second, and then doing the work again. This means that WmiPrvSE.exe was actually using more CPU time than Task Manager had indicated – almost 17% of a core (200 ms every 1,200 ms). The image below shows five seconds of activity on my system which is blissfully idle except for WmiPrvSE.exe:

The next thing I did was look at the CPU Usage (Sampled) data to see where the time was going. A flame graph gave me the best visualization. I know practically nothing about WMI but it seems to me that if Process::LoadCheapPropertiesNT is consuming most of your CPU time then maybe ‘cheap’ is the wrong adjective to use. And if SetDateTime is a major cost then it might be that you are repeatedly loading libraries and strings and, I dunno, maybe this code should be ten times less costly:

Poking around at the trace a bit more I also noticed that tzres.dll is opened 676 times whenever WmiPrvSE.exe springs to life and tzres.dll.mui is opened 338 times, which would explain the LoadLibrary and LoadString costs which make SetDateTime so expensive. It looks like there are some easy optimization opportunities here for Microsoft. But, why is this code springing to life every 1.2 seconds?
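Before chasing that question, the CPU numbers so far can be sanity-checked with a little arithmetic. This is only a sketch: the count of eight logical processors is an assumption inferred from the Task Manager figures (0.7% of the machine being about 5.6% of a core), and the function names are made up for illustration.

```python
# Assumption: 8 logical processors, inferred from 0.7% of the whole machine
# being reported as roughly 5.6% of one core.
LOGICAL_PROCESSORS = 8


def share_of_one_core(total_cpu_percent, logical_processors=LOGICAL_PROCESSORS):
    """Convert a whole-machine CPU percentage into a single-core percentage."""
    return total_cpu_percent * logical_processors


def duty_cycle_percent(busy_ms, sleep_ms):
    """Percentage of one core used by a loop that works, then sleeps, repeatedly."""
    return 100.0 * busy_ms / (busy_ms + sleep_ms)


task_manager_view = share_of_one_core(0.7)    # what Task Manager suggested
trace_view = duty_cycle_percent(200, 1000)    # what the ETW trace showed

print(f"Task Manager: ~{task_manager_view:.1f}% of a core")
print(f"ETW trace:    ~{trace_view:.1f}% of a core")
```

The gap between the two results is the reason the trace was worth grabbing: the averaged Task Manager number understated the cost by roughly a factor of three.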

As somebody with no WMI expertise I started sharing information on twitter and it worked brilliantly. Koby Kahane gave me several crucial hints including his interpretation of the flame graph above:

This is just the WMI service enumerating Win32_Process instances. You need to find the client process doing the queries to find the culprit.

Who is doing this query?

Normally I use context switch data (CPU Usage (Precise)) to track interactions between processes, as one process readies the next, but this time that just told me that WmiPrvSE.exe was being readied (woken up) by svchost.exe (3228) running Windows Management Instrumentation. That merely lets me restate the question as “who is asking the Windows Management Instrumentation svchost to request this query?” The next suggestion from twitter was this:

I suggest recording Microsoft-Windows-WMI-Activity, it should have events for the client issuing the queries.

Unfortunately that didn’t help. The type of query that had been set up was a fire-and-forget query where the WMI service automatically repeats it indefinitely. This seems like a dangerous option to give developers, especially since there are better alternatives:

A polling query is particularly tragic in the case of Win32_Process, since Win32_ProcessStartTrace does the same thing without polling.
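To make the contrast concrete, here is a minimal sketch of the two styles of query. The first string is the kind of notification query that polls (it matches the one that turns up in Event Viewer later in this post); the second relies on Win32_ProcessStartTrace instead. The helper name `polling_interval_seconds` is hypothetical, and treating a WITHIN clause as the sign of polling is a simplifying heuristic, not a full WQL parser.

```python
import re

# Polling style: WMI re-enumerates Win32_Process every WITHIN seconds.
POLLING_QUERY = (
    "SELECT * FROM __InstanceCreationEvent WITHIN 1 "
    "WHERE TargetInstance ISA 'Win32_Process'"
)

# Event-driven style: no WITHIN clause, so no periodic enumeration.
EVENT_QUERY = "SELECT * FROM Win32_ProcessStartTrace"


def polling_interval_seconds(wql):
    """Return the WITHIN interval of a WQL notification query, or None.

    Heuristic only: it does not distinguish a polling WITHIN from the
    'GROUP WITHIN' form used for event aggregation.
    """
    match = re.search(r"\bWITHIN\s+(\d+(?:\.\d+)?)", wql, flags=re.IGNORECASE)
    return float(match.group(1)) if match else None


for name, query in [("polling", POLLING_QUERY), ("event-driven", EVENT_QUERY)]:
    print(name, "->", polling_interval_seconds(query))
```

A one-second interval is what produces the once-a-second wakeups seen in the trace; the event-driven form has no interval to find.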

Then I got two crucial hints – one is that there was probably yet another proxy process (unsecapp.exe) and the other was that results were probably only being reported when new processes were created:

But I think unsecapp.exe would be readying the client, if you rapidly create processes to make the notification query have results.

That explained why I could see no signs of results being sent to the client process, because I was recording my traces on an idle machine with no processes being created.

So, I recorded another trace, this time creating new processes during the trace. The details were messy but they showed that:

svchost.exe (3228), deep inside CCoreQueue::Execute, sends a command to WmiPrvSE.exe which (presumably) triggers its work
When WmiPrvSE.exe is finished its work it lets svchost.exe (3228) know
svchost.exe (3228), deep inside CQueueingEventSink::DeliverAll, then wakes up unsecapp.exe
unsecapp.exe readies four different processes: itself, svchost.exe (3228), System, and SynTPEnh.exe. One of these must be the process that created the polling WMI query, and the only one it can be is SynTPEnh.exe, signed by Synaptics Incorporated.
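The last step is a process of elimination, which can be written down as a tiny sketch. The process names come from the trace described above; classifying three of them as WMI plumbing or the kernel is the assumption that makes the elimination work, and `candidate_clients` is a made-up helper name.

```python
# Processes that unsecapp.exe readied in the trace.
readied_by_unsecapp = [
    "unsecapp.exe",
    "svchost.exe (3228)",
    "System",
    "SynTPEnh.exe",
]

# Assumption for this sketch: these are part of the WMI delivery chain
# (or the kernel), so none of them is the original client of the query.
wmi_plumbing = {"unsecapp.exe", "svchost.exe (3228)", "System"}


def candidate_clients(readied, plumbing):
    """Return the readied processes that are not known WMI infrastructure."""
    return [process for process in readied if process not in plumbing]


print(candidate_clients(readied_by_unsecapp, wmi_plumbing))
```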

I killed the SynTPEnh.exe process but it came back to life like a digital zombie. So I killed its friends (SynTPEnhService.exe and SynTPHelper.exe) and then killed it and this time it stayed dead, and WmiPrvSE.exe finally stopped its work!

SynTPEnh.exe is the Synaptics TouchPad 64-bit Enhancements, and this is not the first time I’d noticed it misbehaving. Just one month earlier I had complained that it was using 662 MiB of private working set, and when investigating its new bad behavior it was consuming 942 MiB:

The search for an easier way

The whole investigation process for this CPU consumption seemed unreasonably messy. It bothers me that a process can create a polling query like this without having to leave a clearer calling card. It was pointed out that you can open up Event Viewer, go to Applications and Services Logs, Microsoft, Windows, WMI-Activity, Operational, then look for likely queries around the time that you rebooted. Here’s what I found:

Namespace = ROOT\CIMV2; NotificationQuery = SELECT * FROM __InstanceCreationEvent WITHIN 1 WHERE TargetInstance ISA 'Win32_Process'; UserName = LAPTOP-EAODEGEE\Bruced; ClientProcessID = 2764, ClientMachine = LAPTOP-EAODEGEE; PossibleCause = Temporary

The ClientProcessID identifies which process made the query. I’m not sure I could have found the right query if I didn’t already know what process ID I was looking for. If this is the easy way of tracking down who made these battery-depleting queries then maybe it needs to be made easier.
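If you have copied a message like this out of Event Viewer, pulling the fields apart is straightforward. The sketch below assumes the message keeps the layout shown above ('key = value' pairs separated by semicolons, with a comma between ClientProcessID and ClientMachine); the function name is hypothetical, and a query that itself contains a semicolon would confuse it.

```python
SAMPLE_MESSAGE = (
    r"Namespace = ROOT\CIMV2; "
    r"NotificationQuery = SELECT * FROM __InstanceCreationEvent WITHIN 1 "
    r"WHERE TargetInstance ISA 'Win32_Process'; "
    r"UserName = LAPTOP-EAODEGEE\Bruced; "
    r"ClientProcessID = 2764, ClientMachine = LAPTOP-EAODEGEE; "
    r"PossibleCause = Temporary"
)


def parse_wmi_activity_message(message):
    """Split a 'key = value; key = value' message into a dict of fields."""
    fields = {}
    # Normalize the one comma-separated pair so every field splits on ';'.
    normalized = message.replace(", ClientMachine", "; ClientMachine")
    for part in normalized.split(";"):
        key, separator, value = part.partition("=")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


fields = parse_wmi_activity_message(SAMPLE_MESSAGE)
client_pid = int(fields["ClientProcessID"])
print(client_pid, "issued:", fields["NotificationQuery"])
```

The resulting process ID can then be matched against the PID column in Task Manager or in an ETW trace.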

How long, has this been going on

I’d had my laptop for almost three months when I noticed this so naturally I wondered why I hadn’t noticed it before. I’m enough of a pack rat that I had a trace from when I first got the laptop (from investigating resume-from-standby hangs caused by a buggy Intel driver, sigh…) and I could see that this bad behavior was not showing up in that trace. I then looked at the version number of SynTPEnh.exe in that trace (ETW traces contain a ton of useful information) and it was 19.3.4.75. The badly behaved version was 19.3.4.83, so clearly a driver upgrade had happened and that caused this problem. I then found the driver install entry that showed exactly when the bad driver arrived – apparently it took me more than six weeks to notice:

What about that memory leak?

When I’d first noticed that SynTPEnh.exe was leaking memory I couldn’t reproduce it, but this investigation gave me a new idea: maybe the new-process notifications were related to the memory leak. I happened to have a process creation stress test lying around from investigating a Windows 10 performance problem so I ran it. To nobody’s surprise running it caused the SynTPEnh.exe private working set to increase by about 20 MiB – about 20 KiB per process (technically ~20.5 KiB per process because base-2 is so awesome).
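The per-process figure is simple unit conversion. In this sketch the count of 1,000 processes is an assumption (it is the number that makes 20 MiB come out at about 20.5 KiB each, but it is not stated above), and the function name is illustrative.

```python
KIB_PER_MIB = 1024


def leak_per_process_kib(total_leak_mib, processes_created):
    """Average leak per created process, in KiB."""
    return total_leak_mib * KIB_PER_MIB / processes_created


# Assumption: the stress test created 1,000 processes.
print(leak_per_process_kib(20, 1000))
```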

Since the memory leak is proportional to how many processes are created I can safely assume that most users will not see as much memory leaked as I have. You really have to build Chrome or run ProcessCreateTests.exe if you want to leak a huge amount of memory. On the other hand, most consumers don’t have 32 GiB laptops and can’t afford to leak large amounts of memory. Here are some more leak statistics:

Restarting Chrome (26 tabs opened): ~1 MiB leaked
Building Chrome ‘base’ target (a tiny portion of Chrome): ~12 MiB leaked
Running ProcessCreateTests.exe: ~20 MiB leaked
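Working backwards from these numbers gives a rough idea of how many processes each scenario creates. This is an estimate, not a measurement: it assumes the leak is a constant ~20.48 KiB per process (itself derived from an assumed 1,000 processes in the stress test), and the helper name is made up.

```python
KIB_PER_MIB = 1024

# Assumption: ~20 MiB leaked over 1,000 created processes.
LEAK_PER_PROCESS_KIB = 20.48


def implied_process_count(leak_mib, leak_per_process_kib=LEAK_PER_PROCESS_KIB):
    """Estimate how many process creations would explain a given leak."""
    return round(leak_mib * KIB_PER_MIB / leak_per_process_kib)


scenarios = [
    ("Restarting Chrome (26 tabs)", 1),
    ("Building Chrome 'base' target", 12),
    ("Running ProcessCreateTests.exe", 20),
]

for name, leak_mib in scenarios:
    print(f"{name}: roughly {implied_process_count(leak_mib)} processes")
```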

In other words, the memory leak was exacerbated by being a Chrome developer and by investigating the Windows 10 performance problem. It’s probably safe to say that I was hit worse than most. But it still needs fixing.

Next steps

I used Synaptics’ “Contact Us” form to report these bugs and I tweeted information to them but I have received no response.

I also contacted Lenovo since they are the maker of my laptop. They responded but I’m not sure how seriously they are taking the issue and I don’t feel like experimenting with driver versions that “might” fix the issue.

Microsoft suggested that they could use shimming to fix the issue. And they encouraged me to file a feedback report, which I did. I’m not thrilled with shimming as a solution, but unresponsive developers are the perfect use-case for it.

I’m not the first person to notice this. The bug was reported to Lenovo on May 30th and May 31st. Various suggestions were made but the issue can’t be considered fixed until an automatic update goes out to all customers.

For my own sanity I used the Roll Back Driver functionality to go back to a better version:

So, my laptop battery life is now back to normal, until the next update comes along. But a few million unwary customers are still wasting power and memory until this issue is fixed.

Apparently Lenovo has fixed this for at least some Yoga machines.

One reader suggested that uninstalling the Synaptics driver and using the Microsoft Precision Touch Pad drivers might work better.

An alternative WMI investigative technique was suggested here.

Hacker news discussion is here.

Reddit discussion is here.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048


This entry was posted in Investigative Reporting, Uncategorized.

30 Responses to Hey Synaptics, Can You Please Stop Polling!

mohammed imran says:

September 5, 2017 at 9:46 am

Are all users using the synaptic pad with this driver affected?

- brucedawson says:
  
  September 5, 2017 at 12:23 pm
  
  I would assume that all users with that Synaptic driver version are affected, but I don’t claim to have an accounting of what machines have that driver version. Lenovo shipped fixed drivers for their Yoga machines a few weeks ago, for instance:
  https://pcsupport.lenovo.com/us/en/products/LAPTOPS-AND-NETBOOKS/THINKPAD-X-SERIES-LAPTOPS/THINKPAD-X1-YOGA-TYPE-20JD-20JE-20JF-20JG/20JF/downloads/DS120311
  I don’t know why the P51 hasn’t received updated drivers yet.
  
ericlaw says:

September 5, 2017 at 10:28 am

Curious: Does ‘powercfg /energy’ give any useful output here, or is the wasted CPU hidden from its analysis?

- brucedawson says:
  
  September 5, 2017 at 8:20 pm
  
  I think powercfg /energy just shows that WmiPrvSE.exe is using too much CPU time without indicating why. Pity.
  
Mohammed Imran says:

September 5, 2017 at 10:45 am

19.2.17.55/ 2/7/2017/Synaptics

Severin Pappadeux says:

September 5, 2017 at 10:47 am

Is it still behaving that way if mouse is plugged in and touchpad option is set to disable when external pointing device is used?

Koby Kahane says:

September 5, 2017 at 11:21 am

I think you mean tzres.dll when you say trez.dll.

What would you suggest, event wise, be added to make tracking down the initiators of WMI notification queries easier? I guess adding the PID of the notification query initiator to the event for the “polling query” done every interval would go a long way.

- brucedawson says:
  
  September 5, 2017 at 12:22 pm
  
  D’oh – fixed. Writing is hard.
  
  I’d like to see a WMI event issued every time that the query happens. That would make finding the culprit in an ETW trace or in event viewer much simpler. Only emitting these events when the query is initiated seems perverse, at least in this context.
  
Zine Eddine Bendjaballah says:

September 5, 2017 at 12:21 pm

Very good analysis and use of Windows Performance Toolkit to troubleshoot performance issues.

Rollie says:

September 5, 2017 at 12:27 pm

I know what that comma means. https://www.youtube.com/watch?v=uFBHNV935Zw&t=00m20s

- Terence says:
  
  September 5, 2017 at 12:50 pm
  
  I got it instantly as well.
  
- brucedawson says:
  
  September 5, 2017 at 1:10 pm
  
  I can never tell whether my many irrelevant allusions are noticed so that made me smile
  
poiru says:

September 5, 2017 at 2:21 pm

Have you tried searching e.g. “synaptics engineer touchpad” on LinkedIn? This is what I usually end up doing when I’m unable to reach the relevant people through normal support channels. I messaged someone (Perry) on your behalf, I’ll let you know if they respond!

- poiru says:
  
  September 5, 2017 at 5:47 pm
  
  Reply from Perry: “I got a report back from the driver team that the issue was fixed a while ago in version 19.3.4.99 of the driver. I’m not sure if it is currently available on Windows Update or not, but users should be able to get an updated driver by contacting Lenovo.”
  
  - brucedawson says:
    
    September 5, 2017 at 7:11 pm
    
    Thanks for the update. But…
    
    The issue isn’t resolved until the fixed driver is on Windows Update for everyone. Contacting Lenovo/Dell/whomever is a non-starter because the vast majority of users have no way to realize that they have a problem that requires contacting Lenovo.
    
Carlos says:

September 5, 2017 at 2:29 pm

Thanks for this blog post- I knew I wasn’t going crazy when my laptop suddenly started having awful battery life. If I had a dollar for every time I ran powercfg /batteryreport to try and figure out what the heck was going on, I’d be rich.

Zac Spitzer (@zackster) says:

September 5, 2017 at 5:12 pm

Very interesting, I had a look and found that Dell’s quickset is hammering my laptop with WMI queries too

- brucedawson says:
  
  September 5, 2017 at 7:13 pm
  
  What model laptop do you have? Dell wants to know:
  
  Hi Bruce, thank you reaching out to us. Please let us know the model of the system you are having issue with . ^KR
  
  — Dell Cares PRO (@DellCaresPRO) September 6, 2017
  
  - brucedawson says:
    
    September 5, 2017 at 9:15 pm
    
    Dell also says: “Hi Bruce, Can you please ask the user to reach out to us and DM the service tag of the system. ^IR”
    
    So, yeah, can you do that? That will help Dell help you, and perhaps help them help lots of other people.
    
Haofei Wu says:

September 5, 2017 at 9:45 pm

Thank you for the information you posted! I have forwarded this information to our BIOS team to check.

- Haofei Wu says:
  
  September 5, 2017 at 9:50 pm
  
  Sorry, it should be touchpad team, not bios team….sorry to make a mistake in writing.
  
Thomas says:

September 6, 2017 at 2:56 am

nice writeup. but this has been fixed in august: https://download.lenovo.com/pccbbs/mobiles/n1cgr14w.txt (https://download.lenovo.com/pccbbs/mobiles/n1cgr14w.exe)

- brucedawson says:
  
  September 6, 2017 at 8:55 am
  
  It’s still not fixed for my Lenovo P51, but the fixes for other systems are an encouraging sign (as long as they go out through an auto-update system). It is also frustrating that Lenovo couldn’t tell me the state of this bug when I asked, and Synaptics never even responded.
  
Mohammed Imran says:

September 6, 2017 at 10:31 am

Lenovo brand and its products should be avoided at all cost.

jdrch says:

September 7, 2017 at 6:14 am

The badly behaved version was 19.3.4.83, so clearly a driver upgrade had happened and that caused this problem.

This is exactly what I was afraid of when Microsoft announced their plans to have Windows 10 update drivers automatically. Microsoft Update has a long history of delivering (obviously) bad drivers; I once had Windows 8 try to update my Bluetooth driver with one that didn’t even have “Atheros” spelled correctly.

I also had a similar problem earlier this year with Sony Vaio laptop running Windows 10 Insider Slow Ring. It would run out of memory when left idle. The problem turned out to be a bad Realtek driver: https://jdrch.wordpress.com/2016/12/28/how-to-fix-the-your-computer-is-low-on-memory-windows-10-error-message/

Ian Yates (@IanYates82) says:

September 7, 2017 at 2:05 pm

My HP Envy 17 laptop (3 years old now…) has a Synaptics trackpad and the same rubbish 64-bit extension thing. I too found over the years that occasionally it’d go bad for a while and then not. I just got used to killing it in Task Manager, and if it was particularly bad, finding the scheduled task that started it on logon and disabling that for a while.
Rubbish 😦

My new Thinkpad has a Synaptics trackpad but it’s a precision one so it doesn’t seem to need the same “helpers”.

From what I could tell, the “helper” thing was enabling two finger scroll, pinch-zoom, etc. That seems like something pretty essential to the driver. I can only imagine it wanted to know about all processes so it could hook sending them windows messages, etc. Surely there’s a more efficient way though 😦

Rizzi Frank says:

December 11, 2017 at 7:45 am

Is this in any way related to the keylogger? http://www.bbc.com/news/technology-42309371

- brucedawson says:
  
  December 11, 2017 at 10:28 pm
  
  No
  
Greg says:

May 24, 2018 at 2:27 am

Bah. Lenovo have no idea about mouse or keyboard drivers. My Yoga, according to them, needs no f-keys (their function keys are repeatedly enabled, even if you select f-keys), no escape key at all (no way to have f-keys and escape key at the same time, without a scancode hack), more bloatware than a Samsung phone, and more annoying popups from their monitoring systray aops than a pub’s poker machine giving away free spins…

akraus1 says:

November 9, 2019 at 12:44 pm

Hi Bruce, I had different issues with WMI. Now there is a faster way to diagnose WMI Queries and WMI polling Queries. I will set up a new blog post but you can take already a look here: https://github.com/Alois-xx/WMIWatcher
