Sandybridge and Scaling (of Performance)

One of the first tests I like to run with a new computer is to see how quickly it can run Fractal eXtreme. Fractal eXtreme is heavily optimized and multithreaded so it can make use of all of the CPU power available, and when doing deep zooms (possibly with hundreds or thousands of digits of precision) it needs all that computing power.

My old laptop was a Core 2 Duo running at 2.40 GHz – a T7700 for those who understand Intel’s model numbers. This CPU has two cores on one piece of silicon.

My new laptop is a Core i7-2720QM CPU at 2.20 GHz. That’s a slightly lower clock rate, but there are several reasons that this machine isn’t actually slower:

  • It’s a new microarchitecture (code name Sandybridge), and new microarchitectures generally squeeze out a bit more performance per cycle
  • It has a Turbo Boost feature that lets the CPU raise its own clock speed, to about 3.0 GHz, when there is power and thermal headroom
  • It has four cores on one piece of silicon, and each core can run two independent threads of execution (hyperthreading as Intel likes to call it) for a total of eight hardware threads

So, what’s the result? To cut to the chase, the new laptop can do deep-zoom fractal calculations about three times faster than the old one. If I force Fractal eXtreme to use just two threads (to make it a fair fight, if that has any meaning) then the new laptop is about 46% faster than the old laptop (22.6 s versus 33.1 s). Going from two threads to four gives me a 63% speedup, and going from four threads to eight (using hyperthreading) gives a further 23% speedup.

Sandybridge:   1 Thread   2 Threads   4 Threads   8 Threads
Calc time:     43.9 s     22.6 s      13.8 s      11.3 s

I only measured on the Core 2 Duo with two threads (its maximum), and its time on the test image (which uses 320 bits of precision) was 33.1 s.
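
The percentages quoted in this post can be re-derived from these timings. The sketch below is only a convenience for checking the arithmetic: the names in it (SANDYBRIDGE_TIMES, speedup_percent, scaling_steps) are made up for illustration, and because it starts from the rounded times shown here, a result can differ by a point from a figure quoted in the text.

```python
# Measured calculation times in seconds, copied from this post.
# All names below are illustrative; they are not part of Fractal eXtreme.
SANDYBRIDGE_TIMES = {1: 43.9, 2: 22.6, 4: 13.8, 8: 11.3}
CORE2_DUO_TWO_THREADS = 33.1


def speedup_percent(old_seconds, new_seconds):
    """Return how much faster the new time is than the old one, in percent."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def scaling_steps(times):
    """Percent speedup gained at each step up in thread count."""
    counts = sorted(times)
    return {
        (fewer, more): speedup_percent(times[fewer], times[more])
        for fewer, more in zip(counts, counts[1:])
    }


if __name__ == "__main__":
    for (fewer, more), pct in scaling_steps(SANDYBRIDGE_TIMES).items():
        print(f"{fewer} -> {more} threads: {pct:.0f}% faster")

    same_threads = speedup_percent(CORE2_DUO_TWO_THREADS, SANDYBRIDGE_TIMES[2])
    print(f"Two threads, new vs. old laptop: {same_threads:.0f}% faster")

    overall = CORE2_DUO_TWO_THREADS / SANDYBRIDGE_TIMES[8]
    print(f"Eight threads vs. old laptop: {overall:.1f}x")
```

From the rounded times this gives 94% for one to two threads, 64% for two to four, 22% for four to eight, 46% for the two-thread comparison between laptops, and 2.9x overall.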

Conclusions:

  • Scaling from one thread to two threads works almost perfectly – performance roughly doubles.
  • Scaling from two threads to four threads doesn’t work as well. We ‘should’ see a ~95% speedup (the same gain as going from one thread to two), but we actually see a 63% speedup. Profiling shows no unexpected synchronization overhead, so the most likely cause is that Intel’s Turbo Boost is less aggressive when all four cores are in use. That makes sense. Turbo Boost works by having one core ‘borrow’ some of the power/heat budget of another core, and that doesn’t work when they are all maxed out.
  • Sandybridge hyperthreading works. Even though Fractal eXtreme is micro-optimized to get maximum performance out of a core, hyperthreading still exposes 23% more performance. It doesn’t double performance, but I never thought that it would.
  • As expected, the 64-bit version of Fractal eXtreme continues to do deep-zoom calculations about four times faster than the 32-bit version. Go 64-bit!
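
One plausible reason for the roughly four-times gap between the 64-bit and 32-bit versions is the size of the machine word. The sketch below assumes, purely for illustration, that a high-precision number is stored as an array of machine words and multiplied with the schoolbook method, so that an n-word multiply costs n × n word multiplies. The post does not say how Fractal eXtreme actually implements its arithmetic, and the function names are invented here.

```python
def words_needed(precision_bits, word_bits):
    """Number of machine words needed to hold precision_bits (rounded up)."""
    return -(-precision_bits // word_bits)  # ceiling division


def schoolbook_multiplies(precision_bits, word_bits):
    """Word-by-word multiplies in one schoolbook multiplication.

    Assumption: every word of one operand is multiplied by every word of
    the other, with no squaring or Karatsuba-style shortcuts.
    """
    n = words_needed(precision_bits, word_bits)
    return n * n


if __name__ == "__main__":
    bits = 320  # precision of the test image used in this post
    for word_bits in (32, 64):
        print(
            f"{word_bits}-bit words: {words_needed(bits, word_bits)} words, "
            f"{schoolbook_multiplies(bits, word_bits)} word multiplies"
        )
```

For the 320-bit test image this gives 100 word multiplies with 32-bit words against 25 with 64-bit words. That 4:1 ratio is at least consistent with the measured difference, although real code has other costs (additions, carries, memory traffic) that this simple count ignores.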

Overall I’m pleased with my new laptop. A three times speedup over 3.5 years is what makes the computer industry an exciting place to work.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.