PowerPoint Poor Performance Problem

While creating slides for a recent conference I noticed that PowerPoint was having trouble keeping up with my typing. Since the topic of my talk was ETW the obvious thing to do was to profile PowerPoint to figure out what was going on

The short answer is that PowerPoint decompresses the background image frequently – almost every time it repaints the screen. If the background image is big enough – and the slide template for this conference had a 2666×1500 background image – then you get repeated mini hangs.


Investigating Interesting Issues

My initial investigation involved recording an ETW trace while typing quickly into a PowerPoint slide. In order to better understand what was happening I typed several words as fast as I could, then paused and typed a single letter, then paused again and typed a single letter. This let me see different patterns of CPU usage, and my hope was the the CPU spikes would make a pattern that would help me orient myself when exploring the trace.

I recorded a trace and looked at the CPU sampling data for some of the spikes. Despite a lack of symbols for PowerPoint itself the results were pretty clear. The call stack meandered through MSO.DLL, then ppcore.dll, then OART.DLL and then GFX.DLL, all with no symbols. Eventually it got down to GdiPlus.dll, for which symbols are available, and we find that more than 99% of the samples are in there, inside of GdipDrawImageRectRect and its descendants. More than 66% of the time is spent inside of GpWicDecoder::Decode and its descendants. And now we see the first problem. PowerPoint is using GdiPlus in such a way that the background image is decompressed every time it is drawn! The rest of the time is mostly spent in pOutputSpanStretch<1>::StretchScanline, which is resizing the image each time it is drawn.

Here’s what the sampling profiler summary table looks like:

image

The count column is how many samples were found along the call stack (whose root is off the top of the screen), and the Weight column is an estimate of how many milliseconds were spent along each call stack. The sampling frequency defaults to 1 KHz, and the imperfect matchup between Count and Weight is because the actual interval varies slightly.

Without access to the PowerPoint source code I can only speculate about why so much time is spent on image decode and scaling, but I suspect that this is a result of the developers not understanding the peculiar semantics of GdiPlus. With most image APIs when you load an image you are given a handle to a decoded bitmap. With GdiPlus you are given a handle which is basically a promissory note. The bits aren’t actually decoded until you draw the image. Deferring work can be quite efficient, but if a caching mechanism isn’t implemented then GdiPlus ends up doing the work again and again and again and again.

Doing the caching intelligently is important. If PowerPoint cached the image at its full resolution then, in these pathological cases, they would be using a lot of memory and would still have the cost of scaling the image. Probably they should be caching the image at the current size needed by the editing window, which has the advantage of being a consistently reasonable size, and having best-case performance. Doing this means that you only have to decode and scale the image when the editing window is resized, and the cost of rendering the background would be negligible.

More Mysteries

While poking around at the trace I noticed a few other peculiarities. When I typed a single isolated keystroke then the screen would be redrawn instantly – there was clearly not a one second delay. This was true regardless of how much word-wrapping and text-shifting happened. And yet, if I typed a lot of characters I could clearly see the mini hangs happening repeatedly. A closer look at the CPU usage graph showed this behavior. Here is a zoomed in view of me typing a single character:

image

There is an initial burst of activity, typically 30-60 ms long, then about 100 ms of inactivity, and then the huge spikes that cause the hangs. It seemed like PowerPoint was updating the screen, then waiting a while, and then redrawing the screen in a more expensive way.

In order to investigate this behavior I hacked up a program that would insert an ETW event whenever I typed ‘a’. I modified MultiProvider to use RegisterHotKey to globally trap typing of ‘a’. Whenever ‘a’ was typed the code would insert an ETW event, and then inject a fake keypress (I chose ‘v’) into the input stream.

This concept is now productized in UIforETW, an open-source tool for recording ETW traces that, among other things, has an input logger (defaulting to anonymized) to help with input-related trace analysis.

With that done the patterns became clearer. Each green diamond represents me typing ‘a’, and PowerPoint’s CPU usage is shown above. You can easily imagine that the keystrokes made between 2.6 and 3.4 seconds would incur a delay before PowerPoint could respond to them. The value of having the green ‘input’ diamonds on the same timeline as the CPU Usage is quite obvious in this graph:

image

It turns out that there is a way to quantify the length of these hangs. The Microsoft-Windows-Win32k provider (requires Windows 7+) can put this information into your xperf traces, and this is one reason I always have it enabled when I record traces. If I bring up a Generic Events summary table for the region above and then rearrange the columns just so then I get this data:

image

This shows that PowerPoint was hung (was not processing input while there was input available) for 936 ms and then for 982 ms, ending at 3.658 seconds and 5.278 seconds. No hang was reported between 5.4 and 6.6 seconds because there was no input then, and similarly for 7.6 to 8.8 seconds.

With the Windows 10 version of WPA the various hang-detection events are neatly graphed – see the Wait Analysis post for details.

It looks like PowerPoint immediately reformats and repaints the screen when you type a character, but then schedules some sort of additional work for 100 ms later. Without access to source or symbols I can’t even speculate about why PowerPoint decodes the image when it has clearly already been able to draw it. If I hear back from the Office team I’ll be sure to update this post.

Downloading Documents

If you want to see this problem for yourself then you can download a dramatic recreation of the offending PowerPoint presentation. Slide two has a huge background image. The download also includes four different versions of the background, in varying resolutions, for easy experimentation.

You can record a trace by following the steps and using UIforETW, or you can just download “slow ppt trace.zip” or “slow ppt trace new.zip” (has input events) from here (both traces have a 2011 prefix).

Concluding Comments

A background that is 2666×1500 is excessive. Slide template creators should stick to more modest resolutions, especially when the projector resolution is 1280×720 (77% fewer pixels) in order to improve performance and avoid ridiculous bloat. Using dramatically lower resolutions for the background image is an effective work around while waiting for Microsoft to fix this bug.

PowerPoint should fix this performance bug. From talking to other presenters it is clear that poor slide editing performance was a problem for many people. In addition to the mini hangs, the repeated decoding also significantly harms battery life while editing slides. It is clearly impractical to expect that all templates will be well designed to work around this performance bug.

The irony in all of this is that the conference I was preparing for was a Microsoft conference. Which means that they supplied the template. Which worked badly with their presentation software. But it all worked out well, since recreating the discovery of this problem made for a great xperf demo during my talk.

For more information on performance analysis look at Xperf for Excess CPU Consumption and Xperf Wait Analysis – Finding Idle Time.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Investigative Reporting, Performance, Programming, xperf and tagged . Bookmark the permalink.

17 Responses to PowerPoint Poor Performance Problem

  1. Pingback: Xperf for Excess CPU Consumption | Random ASCII

  2. Pedro Da Ros says:

    Fantastic work. Microsoft should pay you for this🙂 .

    And yes, PowerPoint 2010 performance while typing text is horrible (thats why I arrived here, was looking for a solution). And I’m using a template that comes with PowerPoint… oh well.

    Hope it gets fixed soon!

  3. brucedawson says:

    Well, you know what to do. Record an xperf trace, see if you can identify the problem, and see if you can work around.

    Or just check the background image size and see if it is excessive.

    Good luck.

  4. Tom Parke says:

    Fantastic post. Thank you. I’ve a new high spec laptop, a fairly mundane typing speed, and no massive background image just a company logo at the bottom of the slide yet still see this problem.
    Your post has enabled me to find work around which seems to work: create the text using a blank template then copy into a slidedeck with the desired template one the bulk of the typing is done. My respect for MS has sunk to new lows.

    • brucedawson says:

      The company logo probably is part of a massive background image. It is quite common to have a background image which is mostly blank or at least very mundane, with just some minor details at the bottom. Right-click on a poorly performing slide, select “Save Background” from the context menu, and save the image somewhere. Then get the properties of the saved file in Explorer and see what size it is. My guess is that it is huge. If you replace the background things will probably get better.

      If you can complain to the creators of the template, and to Microsoft, then maybe eventually this problem will be corrected.

  5. Sriram Srinivasan says:

    Hey Bruce,
    Really nice post! Thanks for the information. Any chance you can get a post on “where is my code spending all its idle time waiting?” analysis?

    • brucedawson says:

      I’ve been meaning to get around to that. I’ve got a Visual Studio hang that I diagnosed a few months ago that I was going to use as a case study. I’ll try to get it up within the next few weeks.

  6. Pingback: Xperf Wait Analysis–Finding Idle Time | Random ASCII

  7. Pingback: The Lost Xperf Documentation–CPU sampling | Random ASCII

  8. Steve Ferro says:

    This is one of the best diagnostic analyses I’ve seen of a software problem. It is so annoying reading through posts on forums where the “expert” asks if your anti-virus software is up to date, and have you deleted your temporary internet files? Thanks for an intelligent, analytical, and accurate post!

    • brucedawson says:

      Thanks — I’m glad you liked it.

      My new mantra whenever somebody reports a performance problem is “send me the trace”. With an xperf trace I can usually (80% of the time?) identify the problem, even when it is in some other company’s code. Try to figure out the problem without a trace is just guessing.

  9. Pingback: Windows Slowdown, Investigated and Identified | Random ASCII

  10. Pingback: Xperf for Excess CPU Consumption: WPA edition | Random ASCII

  11. Pingback: Self Inflicted Denial of Service in Visual Studio Search | Random ASCII

  12. Pingback: ETW Central | Random ASCII

  13. Pingback: Xperf Wait Analysis–Finding Idle Time | Random ASCII

  14. Pingback: ETW Flame Graphs Made Easy | Random ASCII

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s