Users sometimes assume that 64-bit computers (typically running x64 CPUs) will naturally run faster than 32-bit computers (typically running x86 CPUs). They often guess that they will run twice as fast. After all, 64 is twice as big as 32. But in reality there is usually little difference. Sometimes x64 processes will run a bit faster than x86 processes, due to having twice as many registers (sixteen instead of eight), but more often x64 processes will run slightly slower, due to having larger instructions and larger data structures (because of larger pointers) that lead to increased cache pressure.
But in some cases x64 processes can run dramatically faster than x86 processes. If you need access to more than 4 GB of RAM then x64 processes are the way to go, and if you need to do high-precision math – math to hundreds of digits of accuracy – then x64 processes can deliver roughly a four-times performance increase. I was discussing this with one of my Fractal eXtreme customers when he suggested that I post some of the details. This is part one of a multi-part series on the optimizations that make multi-precision math in Fractal eXtreme (and in some cryptography code, I'm sure) as fast as possible.
The first part is already written, as part of the Fractal eXtreme documentation. It explains why 64-bit high-precision math is four times faster when coded for a 64-bit processor, and you can read it here.
Parts two and three, when I get to them, will cover the advantages of diagonal math (officially known as Lattice Multiplication) over rectangular math, and stretching the limits of loop unrolling.