Faster Fractals Through Algebra

I’ve been working on Fractal eXtreme on-and-off for years, and an important (and fun) part of working on it is optimizing the calculation of the Mandelbrot set, especially the high-precision math routines that allow deep-zooming.

After working on this over the course of a decade I assumed that all the easiest optimizations were done, but recently I found a couple of trivial optimizations that I had missed. Here’s one of them.

Mandelbrot Set Math Primer

The Mandelbrot set is calculated by setting the complex number ‘c’ to the location of the point you want to calculate and then executing the following loop:

z = 0;
while (abs(z) <= 2.0)
    z = z * z +c;

If the loop never exits then the point is in the Mandelbrot set. Simple enough. Practical considerations necessitate adding a limit to how many times the loop runs before giving up the search, but I’m going to ignore that for the sake of elegance and simplicity.

The Unzoomed Mandelbrot set:

image

Since z and c are complex numbers we have to store each one as two components and expand out operations like multiplication to multiple operations on those components. I’m not going to review complex math, but I will point out that abs(complex) is equal to the square root of (real * real + imaginary * imaginary) and if you square both sides you can avoid the square root. With that in mind a more detailed version would be:

z.r = 0;
z.i = 0;
while (z.r * z.r + z.i * z.i < 4.0)
{
    temp = z.r * z.r – z.i * z.i + c.r;
    z.i = z.r * z.i * 2 + c.i;
    z.r = temp;
}

That’s correct but it’s not very efficient. The code does six multiplications and at high-precisions those are very expensive. It squares z.r and z.i twice each, and it multiplies by two. All of those can be avoided:

z.r = 0;
z.i = 0;
zrsqr = z.r * z.r;
zisqr = z.i * z.i;
while (zrsqr + zisqr < 4.0)
{
    z.i = z.r * z.i;
    z.i += z.i; // Multiply by two
    z.i += c.i;
    z.r = zrsqr – zisqr + c.r;
    zrsqr = square(z.r);
    zisqr = square(z.i);
}

Now the code is doing three multiplies per iteration, which is the minimum. So it’s perfect, and all we have to worry about is making sure that all of our basic operations – especially squaring and multiplication – are as fast as possible.

When using floating-point math the speed of addition, subtraction, and multiplication are generally identical. However floating-point math has only about 52 bits of precision which is woefully inadequate for serious fractal exploration. All of my fractal programs have featured high-precision math routines to allow virtually unlimited zooming. Fractal eXtreme supports up to 7,680 bits of precision. While this isn’t truly unlimited it is close enough because fractal calculation time generally increases as the cube of the zoom level, and even on the fastest PCs available it exceeds most people’s patience between 500 and 2,000 bits of precision.

The speed of squaring and multiplication is critical because they are O(n^2) operations and everything else is O(n). As ‘n’ (the number of 32-bit or 64-bit words) increases, the time to do the multiplications becomes the only thing that matters. Okay, at high precisions multiplication doesn’t have to be O(n^2), but the alternatives are a lot of work for uncertain gain, so I’m ignoring them.

While multiplication and squaring are both O(n^2) it turns out that squaring can be done roughly twice as fast as multiplication. Half of the partial results when squaring are needed twice, but only need to be calculated once.

All the information necessary to find the algebraic optimization is now available.

The observation (which came from somebody I know only through e-mail) was that the Mandelbrot calculations can be done using three squaring operations rather than two squares and a multiply.

The one multiply is used to calculate zr * zi. If we instead calculate (zr + zi)^2 then we get zr^2 + 2*zr*zi + zi^2. Since we’ve already calculated zr^2 and zi^2 we can just subtract them off and, voila, multiplication through squaring:

z.r = 0;
z.i = 0;
zrsqr = z.r * z.r;
zisqr = z.i * z.i;
while (zrsqr + zisqr < 4.0)
{
    z.i = square(z.r + z.i) – zrsqr – zisqr;
    z.i += c.i;
    z.r = zrsqr – zisqr + c.r;
    zrsqr = square(z.r);
    zisqr = square(z.i);
}

At low precisions this is inefficient because we are doing an extra add and an extra subtract. However as ‘n’ grows larger this change should give us a 33% speedup. At Fractal eXtreme’s peak precision of 7,680 bits I’ve measured a speedup of about 25.6% so the theoretical predictions appear to be reasonably accurate.

Charts are fun – let’s paste one in (higher is slower):

image

Fitting the performance of the 32-bit version on the chart makes it hard to see the difference between the two 64-bit versions, so let’s rescale:

image

This post was supposed to show the benefits of optimization-through-algebra, but the charts really make the case for the importance of 64-bit software when doing high-precision math. At the highest precisions the fastest 64-bit version is running 7.75x faster than the 32-bit version. Yikes.

The Mandelbrot set magnified 6.3 x 10408 times, calculated with 1472 bit precision

image

A side benefit of this optimization is that the multiplication routine isn’t needed at all, which is a nice code-size saving. The Mandelbrot DLL drops from 2,572,288 bytes to 1,814,528 bytes.

Summary

In practice this optimization isn’t really a lot of use. In the 64-bit version of Fractal eXtreme (which all smart consumers choose because it typically runs at least 4x faster when doing deep zooms) this optimization doesn’t become worthwhile until 512 bits of precision and it doesn’t give even a 10% speedup until 1,024 bits of precision. It’s cool, but virtually unnoticeable for even the hard-core deep-zoom fractal explorer. But, given the minimal effort require to implement the optimization it has an excellent cost/benefit ratio, and an excellent cost/geeky-fun ratio.

Aside: Fractal eXtreme uses Lattice Multiplication to avoid ever having to store intermediate results to memory. I independently invented it years ago and only just found out what it is called.

About these ads

About brucedawson

I'm a programmer, working for Valve (http://www.valvesoftware.com/), focusing on optimization and reliability. Nothing's more fun than making code run 5x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Fractals, Math, Programming. Bookmark the permalink.

5 Responses to Faster Fractals Through Algebra

  1. Pingback: Faster Fractals Through Better Scheduling | Random ASCII

  2. Pingback: Then and Now–Performance Improvements | Random ASCII

  3. Z.T. says:

    By saying your multiplication routine takes up 740K of code, I guess it is completely unrolled and maybe that code for each size of argument is separate. So you avoid some loop overhead by paying in instruction cache misses. Have you benchmarked this on recent hardware to make sure this is worthwhile?

    I am reminded of performance optimizations in x264 which shrunk code size of functions to make sure the next function was still in cache to avoid the latency of loading code every iteration.

  4. brucedawson says:

    The 740K saving is for every precision from 128-bit to 7680-bit. At 64-bit precision that is 119 different routines, so the actual code size (1.8 MB now for all of the code) only averages out to about 15 KB per routine, so in general they fit in the i-cache. Even when they don’t the execution patterns are so straightforward (straight through, fully unrolled, virtually no branches) that I suspect the prefetcher can keep up.

    i-cache optimizations definitely can be important, but I think they are at their most important when you are branching around a lot. Branching makes prefetching harder, and it means that only part of some i-cache lines will be used. Measurements show that i-cache misses aren’t a huge cost for the Fractal eXtreme calculation code, and they might be zero cost, but I’m not sure.

  5. Pingback: Fractal eXtreme, now cheaper | Random ASCII

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s