While creating slides for a recent conference I noticed that PowerPoint was having trouble keeping up with my typing. Since the topic of my talk was ETW the obvious thing to do was to profile PowerPoint to figure out what was going on
The short answer is that PowerPoint decompresses the background image frequently – almost every time it repaints the screen. If the background image is big enough – and the slide template for this conference had a 2666×1500 background image – then you get repeated mini hangs.
Investigating Interesting Issues
My initial investigation involved recording an ETW trace while typing quickly into a PowerPoint slide. In order to better understand what was happening I typed several words as fast as I could, then paused and typed a single letter, then paused again and typed a single letter. This let me see different patterns of CPU usage, and my hope was the the CPU spikes would make a pattern that would help me orient myself when exploring the trace.
I recorded a trace and looked at the CPU sampling data for some of the spikes. Despite a lack of symbols for PowerPoint itself the results were pretty clear. The call stack meandered through MSO.DLL, then ppcore.dll, then OART.DLL and then GFX.DLL, all with no symbols. Eventually it got down to GdiPlus.dll, for which symbols are available, and we find that more than 99% of the samples are in there, inside of GdipDrawImageRectRect and its descendants. More than 66% of the time is spent inside of GpWicDecoder::Decode and its descendants. And now we see the first problem. PowerPoint is using GdiPlus in such a way that the background image is decompressed every time it is drawn! The rest of the time is mostly spent in pOutputSpanStretch<1>::StretchScanline, which is resizing the image each time it is drawn.
Here’s what the sampling profiler summary table looks like:
The count column is how many samples were found along the call stack (whose root is off the top of the screen), and the Weight column is an estimate of how many milliseconds were spent along each call stack. The sampling frequency defaults to 1 KHz, and the imperfect matchup between Count and Weight is because the actual interval varies slightly.
Without access to the PowerPoint source code I can only speculate about why so much time is spent on image decode and scaling, but I suspect that this is a result of the developers not understanding the peculiar semantics of GdiPlus. With most image APIs when you load an image you are given a handle to a decoded bitmap. With GdiPlus you are given a handle which is basically a promissory note. The bits aren’t actually decoded until you draw the image. Deferring work can be quite efficient, but if a caching mechanism isn’t implemented then GdiPlus ends up doing the work again and again and again and again.
Doing the caching intelligently is important. If PowerPoint cached the image at its full resolution then, in these pathological cases, they would be using a lot of memory and would still have the cost of scaling the image. Probably they should be caching the image at the current size needed by the editing window, which has the advantage of being a consistently reasonable size, and having best-case performance. Doing this means that you only have to decode and scale the image when the editing window is resized, and the cost of rendering the background would be negligible.
While poking around at the trace I noticed a few other peculiarities. When I typed a single isolated keystroke then the screen would be redrawn instantly – there was clearly not a one second delay. This was true regardless of how much word-wrapping and text-shifting happened. And yet, if I typed a lot of characters I could clearly see the mini hangs happening repeatedly. A closer look at the CPU usage graph showed this behavior. Here is a zoomed in view of me typing a single character:
There is an initial burst of activity, typically 30-60 ms long, then about 100 ms of inactivity, and then the huge spikes that cause the hangs. It seemed like PowerPoint was updating the screen, then waiting a while, and then redrawing the screen in a more expensive way.
In order to investigate this behavior I hacked up a program that would insert an ETW event whenever I typed ‘a’. I modified MultiProvider to use RegisterHotKey to globally trap typing of ‘a’. Whenever ‘a’ was typed the code would insert an ETW event, and then inject a fake keypress (I chose ‘v’) into the input stream.
This concept is now productized in UIforETW, an open-source tool for recording ETW traces that, among other things, has an input logger (defaulting to anonymized) to help with input-related trace analysis.
With that done the patterns became clearer. Each green diamond represents me typing ‘a’, and PowerPoint’s CPU usage is shown above. You can easily imagine that the keystrokes made between 2.6 and 3.4 seconds would incur a delay before PowerPoint could respond to them. The value of having the green ‘input’ diamonds on the same timeline as the CPU Usage is quite obvious in this graph:
It turns out that there is a way to quantify the length of these hangs. The Microsoft-Windows-Win32k provider (requires Windows 7+) can put this information into your xperf traces, and this is one reason I always have it enabled when I record traces. If I bring up a Generic Events summary table for the region above and then rearrange the columns just so then I get this data:
This shows that PowerPoint was hung (was not processing input while there was input available) for 936 ms and then for 982 ms, ending at 3.658 seconds and 5.278 seconds. No hang was reported between 5.4 and 6.6 seconds because there was no input then, and similarly for 7.6 to 8.8 seconds.
With the Windows 10 version of WPA the various hang-detection events are neatly graphed – see the Wait Analysis post for details.
It looks like PowerPoint immediately reformats and repaints the screen when you type a character, but then schedules some sort of additional work for 100 ms later. Without access to source or symbols I can’t even speculate about why PowerPoint decodes the image when it has clearly already been able to draw it. If I hear back from the Office team I’ll be sure to update this post.
If you want to see this problem for yourself then you can download a dramatic recreation of the offending PowerPoint presentation. Slide two has a huge background image. The download also includes four different versions of the background, in varying resolutions, for easy experimentation.
You can record a trace by following the steps and using UIforETW, or you can just download “slow ppt trace.zip” or “slow ppt trace new.zip” (has input events) from here (both traces have a 2011 prefix).
A background that is 2666×1500 is excessive. Slide template creators should stick to more modest resolutions, especially when the projector resolution is 1280×720 (77% fewer pixels) in order to improve performance and avoid ridiculous bloat. Using dramatically lower resolutions for the background image is an effective work around while waiting for Microsoft to fix this bug.
PowerPoint should fix this performance bug. From talking to other presenters it is clear that poor slide editing performance was a problem for many people. In addition to the mini hangs, the repeated decoding also significantly harms battery life while editing slides. It is clearly impractical to expect that all templates will be well designed to work around this performance bug.
The irony in all of this is that the conference I was preparing for was a Microsoft conference. Which means that they supplied the template. Which worked badly with their presentation software. But it all worked out well, since recreating the discovery of this problem made for a great xperf demo during my talk.