Fixing another Photo Gallery performance bug

I really like Windows Live Photo Gallery (WLPG) and I depend on its photo-tagging (5-stars, people, and descriptive tags) to organize my photos. I love the fact that I can effortlessly find all 5-star photos of my daughters unicycling together with just a few clicks.

image

However, WLPG has a frustrating number of performance problems. It likes to read and write to the disk, and it often does it very poorly. One problem WLPG has is that it uses a SQL Server Compact database to store a copy of all of the metadata, and it accesses this database inefficiently. Some queries require doing thousands of reads from the database file (not enough indexes perhaps?) and these random 4-KB reads work poorly, especially on a slow laptop drive. When you first launch WLPG it is not unusual for it to take a minute or so to become responsive, and after that it may still hang when you start browsing. Tracing with xperf makes the cause of the hangs quite obvious, as this disk I/O summary table shows:

image

During this particular process startup WLPG spent 28.8 s reading from pictures.pd6, 6.7 s reading from FaceExemplars.ed1, and 0.5 s reading from FaceThumbs.fd1. And yet it only read about 23 MB from these three files, which should not take 36 s. The reason is hinted at by the expanded details for FaceThumbs.fd1, which show lots of 4 KB reads. In fact, all of the reads for all three files are 4 KB reads, and they are not sequential. That’s 5,013 separate reads to just pictures.pd6! The poor disk head is bouncing all over the disk and is spending very little time reading data. If the files were read sequentially then it would just take a few seconds. Because they are read randomly it takes an excruciatingly long time, and there will be more delays later on because some of the data has still not been read!
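A quick back-of-the-envelope check makes the point concrete. This is only a sketch that plugs in the figures quoted above; the variable names are hypothetical and the per-read cost is a plain average, not something measured per read.

```python
# Figures quoted in the text for this one trace.
pictures_read_seconds = 28.8   # time spent reading pictures.pd6
pictures_read_count = 5013     # separate 4 KB reads to pictures.pd6
total_mb = 23.0                # approximate data read from all three files
total_seconds = 28.8 + 6.7 + 0.5

# Average cost of one 4 KB read, and the overall effective throughput.
ms_per_read = pictures_read_seconds / pictures_read_count * 1000.0
effective_mb_per_s = total_mb / total_seconds

print("Average cost per read: %.1f ms" % ms_per_read)
print("Effective throughput: %.2f MB/s" % effective_mb_per_s)
```

Several milliseconds per 4 KB read is what you would expect when almost every read pays for a seek, and the resulting throughput is a small fraction of what even a slow laptop drive can sustain when reading sequentially.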

The disk I/O graph shows only I/O that went to the disk – it filters out I/O that was handled by the system cache. The file I/O graph shows every read/write from every process, and it’s interesting to analyze the reads from Pictures.pd6. I copied the file offset of each file read, in time-order, to Excel. Then I calculated the seek distance (just subtract the previous offset). I filtered out the approximately 5,000 redundant reads (two in a row from the same offset) and found 30,044 separate 4,096-byte reads, with an average seek distance between them of 17 MB. In other words, the average seek distance was about a quarter of the length of the file.

image
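The seek-distance calculation described above was done in Excel, but the same steps can be sketched in a few lines of Python. This is an illustration only: the function name and the sample offsets are made up, and taking the absolute value of each difference is an assumption (the text just says to subtract the previous offset).

```python
def seek_stats(offsets):
    """Return (read_count, average_seek_distance) for offsets in time order.

    Back-to-back reads from the same offset are treated as redundant and
    dropped, mirroring the filtering described in the text.
    """
    deduped = []
    for offset in offsets:
        if not deduped or offset != deduped[-1]:
            deduped.append(offset)
    # Seek distance is the absolute difference from the previous read's offset.
    distances = [abs(b - a) for a, b in zip(deduped, deduped[1:])]
    average = sum(distances) / float(len(distances)) if distances else 0.0
    return len(deduped), average


# Made-up offsets (in bytes) purely for illustration; not taken from the trace.
sample_offsets = [0, 0, 40960000, 4096, 4096, 20480000]
count, average = seek_stats(sample_offsets)
print("%d reads, average seek distance %.1f MB" % (count, average / 1e6))
```

Fed the real list of offsets exported from the file I/O table, a function like this should give the same kind of summary as the spreadsheet.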

This is particularly frustrating because the database file is not that large. I’ve got 47,000 photos and over 37,000 people tags but my WLPG database is just 70 MB. Even my lousy laptop drive can read that entire file in just a couple of seconds. All you have to do is ask it to read the entire file sequentially, instead of messing around reading 4 KB here and 4 KB there.

I finally got around to fixing this problem. I wrote a short Python program that reads in the “Pictures.pd6” file, and the three other WLPG data files for good measure. If I run this before launching WLPG, or anytime WLPG is being slow, then it will load these files into the Windows file cache, and suddenly WLPG will stop hanging.

Here’s the code:

"""This pre-caches valuable files so that WLPG will run better."""

import glob
import os

# WLPG keeps its data files under the per-user local application data folder.
appdata = os.environ["localappdata"]
files = glob.glob(os.path.join(appdata, r"Microsoft\Windows Live Photo Gallery\*"))

print("Found %d files" % len(files))

for path in files:
    try:
        # Reading the whole file sequentially pulls it into the Windows file cache.
        with open(path, "rb") as f:
            data = f.read()
        print("Read %s, %d bytes" % (path, len(data)))
    except (IOError, OSError):
        # Ignore errors from, for instance, directories
        print("Ignoring %s" % path)

If you prefer, you can use a two-line version that just reads the crucial file:

import os
open(os.path.join(os.environ["localappdata"], r"Microsoft\Windows Live Photo Gallery\Pictures.pd6"), "rb").read()

Ideally this script should be run every minute or so while WLPG is running in order to keep the database file in the cache. The cost of this is very low, especially if the file is actually still in the cache.
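One way to get that periodic re-read is a small loop around the same idea. The sketch below is an assumption about how it could be done, not a tested tool: the function names, the one-minute default, and the chunked read (which avoids holding the whole file in memory) are all choices made for illustration.

```python
import os
import time


def precache(path, chunk_size=1024 * 1024):
    """Read a file sequentially in chunks and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def keep_cached(path, interval_seconds=60, iterations=None):
    """Re-read path every interval_seconds; loop forever if iterations is None."""
    done = 0
    while iterations is None or done < iterations:
        precache(path)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_seconds)
    return done


def wlpg_database_path():
    """Path to Pictures.pd6, built the same way as in the scripts above."""
    return os.path.join(os.environ["localappdata"],
                        r"Microsoft\Windows Live Photo Gallery\Pictures.pd6")
```

Calling keep_cached(wlpg_database_path()) on the machine that runs WLPG would re-read the database once a minute until interrupted. It does not check whether WLPG is actually running, so it would need to be started and stopped by hand.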

This doesn’t fix all of the WLPG performance problems. WLPG still does tens of MB of I/O when you zoom in on a photo. WLPG still takes too long to update its database, and busy waits. WLPG doesn’t make use of my eight CPU cores to decode a few photos ahead, so it’s common to have to wait for it when stepping through photos. But, this does neatly solve one annoying problem.

All of the WLPG performance problems could easily be fixed. The database files could be precached, the excessive disk I/O when zooming on images could be avoided, and multi-core image decoding could easily let WLPG stay ahead of the user. None of these would be difficult. All of these solutions consume additional memory, but memory is cheap these days and I would happily spend 500 MB in order to make WLPG ten times more responsive.
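To show what decoding a few photos ahead could look like, here is a minimal Python 3 sketch. It is not how WLPG works internally: the DecodeAhead class, its parameters, and the stand-in decode function are all hypothetical. It also assumes that a real decoder releases Python's global interpreter lock, so that worker threads can actually use multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor


def decode(path):
    """Stand-in for an expensive image decode; here it only reads the bytes."""
    with open(path, "rb") as f:
        return f.read()


class DecodeAhead(object):
    """Keeps decodes for the next few photos in flight on worker threads."""

    def __init__(self, paths, decode_fn=decode, lookahead=3, workers=4):
        self.paths = list(paths)
        self.decode_fn = decode_fn
        self.lookahead = lookahead
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = {}  # index -> Future

    def _schedule(self, index):
        # Start a decode for this photo unless it is out of range or already queued.
        if 0 <= index < len(self.paths) and index not in self.pending:
            self.pending[index] = self.pool.submit(self.decode_fn, self.paths[index])

    def get(self, index):
        """Return the decoded photo at index and start decoding the next few."""
        for i in range(index, index + self.lookahead + 1):
            self._schedule(i)
        # Forget results behind the current photo so memory use stays bounded.
        for old in [i for i in self.pending if i < index]:
            del self.pending[old]
        return self.pending[index].result()
```

With this shape, stepping from one photo to the next usually finds the result already decoded, at the cost of holding a few decoded images in memory.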

My daughter refuses to use WLPG because it is too slow. I don’t blame her. But I wish she could use it.

Once again xperf has shown its value in allowing the cause of a performance problem to be diagnosed, and a workaround to be created.

This article was updated August 24 and August 28 with more xperf details.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.

12 Responses to Fixing another Photo Gallery performance bug

  1. Z.T. says:

    If it was open source, you could have just fixed it. If it was free software, you could have redistributed the fixed version.

  2. christophe says:

    Wouldn’t using a ramdisk work?

    A tool like http://www.ltr-data.se/opencode.html/#ImDisk ?

    • brucedawson says:

      I successfully used a RAM disk to work around the problems with WLPG writing temporary files to the disk for some operations. However this would be a poor solution for the SQL database, which needs to persist.

      The RAM disk solution for the temporary files is far from ideal, but it did completely solve the problem I targeted it with. I just had to point %tmp% and %temp% at the RAM disk. I might write a launcher for WLPG that precaches the database and then remaps those environment variables so that only WLPG gets the custom location.

      • christophe says:

        Or write a launcher that copies the sql database to the ramdisk and persists it back when the program is stopped (or as a pseudo cron job). The kind of way portable apps work.
        This is quite ugly, but it might be worth it if you like using the program.

        • brucedawson says:

          Copying the SQL database to the RAM disk makes me nervous, but it could work. The launcher would have to wait until WLPG terminated to copy it back, because WLPG maintains a lock on the file. You’d also have to figure out some way to tell WLPG to use the version on the RAM disk.

          The WLPG team is aware of these problems so I’m hopeful that they will fix them.

  3. zeuxcg says:

    Yup, this problem is quite common. I had to solve the same problem in the same way for analyzing PDB files – otherwise cold reads take minutes instead of seconds for 100-200 MB files.

