Sandybridge and Scaling (of Performance)

One of the first tests I like to run with a new computer is to see how quickly it can run Fractal eXtreme. Fractal eXtreme is heavily optimized and multithreaded so it can make use of all of the CPU power available, and when doing deep zooms (possibly with hundreds or thousands of digits of precision) it needs all that computing power.

My old laptop was a Core 2 Duo running at 2.40 GHz – a T7700 for those who understand Intel’s model numbers. This CPU has two cores on one piece of silicon.

My new laptop is a Core i7-2720QM CPU at 2.20 GHz. That’s a slightly lower clock rate, but there are several reasons that this machine isn’t actually slower:

  • It’s a new microarchitecture (code name Sandybridge) and a new microarchitecture generally squeezes out a bit more performance per cycle
  • It has a Turbo Boost feature that lets the CPU overclock itself, to about 3.0 GHz
  • It has four cores on one piece of silicon, and each core can run two independent threads of execution (hyperthreading as Intel likes to call it) for a total of eight hardware threads
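
As a rough sanity check on that list, here is a back-of-envelope sketch that multiplies core count by clock speed for each machine. It is only a crude proxy: it assumes perfect scaling across cores, assumes all four cores could hold the approximate Turbo Boost clock at once, and ignores the per-cycle gains and hyperthreading entirely. The names are illustrative.

```python
# Figures from the text: core counts and clock speeds in GHz.
old_cores, old_ghz = 2, 2.40             # Core 2 Duo T7700
new_cores = 4                            # Core i7-2720QM
new_base_ghz, new_turbo_ghz = 2.20, 3.0  # base clock and approximate Turbo Boost clock


def core_ghz(cores, ghz):
    """Aggregate 'core-GHz', a crude throughput proxy (ASSUMPTION: perfect scaling)."""
    return cores * ghz


old_total = core_ghz(old_cores, old_ghz)
ratio_at_base = core_ghz(new_cores, new_base_ghz) / old_total
# ASSUMPTION: every core sustains the turbo clock simultaneously (an upper bound).
ratio_at_turbo = core_ghz(new_cores, new_turbo_ghz) / old_total

print(f"{ratio_at_base:.2f}x at base clock, {ratio_at_turbo:.2f}x at full turbo")
```

On these assumptions the raw core-GHz advantage lands somewhere between roughly 1.8x and 2.5x, before any microarchitecture or hyperthreading gains are counted.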

So, what’s the result? To cut to the chase, the new laptop does deep-zoom fractal calculations about three times as fast as the old one. If I force Fractal eXtreme to use just two threads (to make it a fair fight, if that has any meaning) then the new laptop is about 50% faster than the old laptop. Using four threads gives me a 63% speedup over two threads, and using all eight hardware threads (that is, with hyperthreading) gives a further 23% speedup.

Sandybridge:   1 Thread   2 Threads   4 Threads   8 Threads
Calc time:     43.9 s     22.6 s      13.8 s      11.3 s

On the Core 2 Duo I only measured with two threads (its maximum); its time on the test image (which uses 320 bits of precision) was 33.1 s.
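
The speedup figures quoted above can be recomputed from the measured times. The short sketch below does that; the variable and function names are illustrative only. Small differences from the quoted percentages are presumably down to rounding in the displayed times.

```python
# Measured calc times in seconds, copied from the table above.
sandybridge_times = {1: 43.9, 2: 22.6, 4: 13.8, 8: 11.3}
core2duo_two_threads = 33.1  # Core 2 Duo T7700, two threads


def speedup_percent(slow_s, fast_s):
    """Percentage speedup going from slow_s seconds to fast_s seconds."""
    return (slow_s / fast_s - 1.0) * 100.0


# Each entry pairs a label with (slower time, faster time).
comparisons = {
    "new vs. old, both at 2 threads": (core2duo_two_threads, sandybridge_times[2]),
    "new at 8 threads vs. old at 2": (core2duo_two_threads, sandybridge_times[8]),
    "1 -> 2 threads": (sandybridge_times[1], sandybridge_times[2]),
    "2 -> 4 threads": (sandybridge_times[2], sandybridge_times[4]),
    "4 -> 8 threads": (sandybridge_times[4], sandybridge_times[8]),
}

for label, (slow, fast) in comparisons.items():
    print(f"{label}: {speedup_percent(slow, fast):.0f}% faster ({slow / fast:.2f}x)")
```

From the table values this works out to roughly 46% for the two-thread comparison, about 2.9x overall, and about 94%, 64% and 22% for the three thread-count doublings.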


  • Scaling from one thread to two threads works almost perfectly – performance roughly doubles.
  • Scaling from two threads to four threads doesn’t work as well. Going by the one-to-two-thread result we ‘should’ see a ~95% speedup, but we actually see a 63% speedup. Profiling shows no unexpected synchronization overhead, so the most likely cause is that Intel’s Turbo Boost is less aggressive when all four cores are in use. That makes sense: Turbo Boost works by having one core ‘borrow’ some of the power/heat budget of another core, and that doesn’t work when they are all maxed out.
  • Sandybridge hyperthreading works. Even though Fractal eXtreme is micro-optimized to get maximum performance out of a core, hyperthreading still exposes 23% more performance. It doesn’t double performance, but I never thought that it would.
  • As expected, the 64-bit version of Fractal eXtreme continues to do deep-zoom calculations about four times faster than the 32-bit version. Go 64-bit!
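
To put a rough number on the Turbo Boost explanation, the sketch below asks how much slower the cores would have to run with four threads to account for the whole shortfall. This is an inference under two loudly stated assumptions, not a measurement: that two-to-four-thread scaling would otherwise match one-to-two-thread scaling, and that throughput is proportional to clock speed. The names are illustrative.

```python
# Measured Sandybridge calc times in seconds (from the table earlier in the post).
times = {1: 43.9, 2: 22.6, 4: 13.8, 8: 11.3}

# Observed gain from each doubling of the thread count.
gain_1_to_2 = times[1] / times[2]
gain_2_to_4 = times[2] / times[4]
gain_4_to_8 = times[4] / times[8]  # hyperthreading: same four cores, twice the threads

# ASSUMPTION: with no clock change, 2 -> 4 threads would scale as well as 1 -> 2 did.
# ASSUMPTION: throughput is proportional to clock speed, so the entire
# shortfall is attributed to a lower clock when all four cores are busy.
implied_clock_ratio = gain_2_to_4 / gain_1_to_2

print(f"1 -> 2 threads: {gain_1_to_2:.2f}x")
print(f"2 -> 4 threads: {gain_2_to_4:.2f}x")
print(f"4 -> 8 threads: {gain_4_to_8:.2f}x")
print(f"implied four-core clock vs. two-core clock: {implied_clock_ratio:.2f}")
```

With the table values the implied ratio comes out at about 0.84, meaning the four busy cores would be running at roughly 84% of the two-core clock. Logging the actual clock frequency during a run would be the way to confirm or refute that.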

Overall I’m pleased with my new laptop. A threefold speedup in 3.5 years is what makes the computer industry an exciting place to work.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8.