Floating-Point Determinism

Is IEEE floating-point math deterministic? Will you always get the same results from the same inputs? The answer is an unequivocal “yes”. Unfortunately the answer is also an unequivocal “no”. I’m afraid you will need to clarify your question.

My hobby: injecting code into other processes and changing the floating-point rounding mode on some threads

The answer to this ambiguous question is of some interest to game developers. If we can guarantee determinism then we can create replays and multi-player network protocols that are extremely efficient. And indeed many games have done this. So what does it take to make your floating-point math deterministic within one build, across multiple builds, and across multiple platforms?

Before proceeding I need to clarify that floating-point determinism is not about getting the ‘right’ answer, or even the best answer. Floating-point determinism is about getting the same answer on some range of machines and builds, so that every player agrees on the answer. If you want to get the best answer then you need to choose stable algorithms, but that will not guarantee perfect determinism. If you want to learn how to get the best answer you’ll need to look elsewhere, perhaps including the rest of my floating-point series.

Determinism versus free will: cage match

The IEEE standard does guarantee some things. It guarantees more than the floating-point-math-is-mystical crowd realizes, but less than some programmers might think. In particular, the addendum to David Goldberg’s important paper points out that “the IEEE standard does not guarantee that the same program will deliver identical results on all conforming systems.” And, the C/C++ standards don’t actually mandate IEEE floating-point math.

On the other hand, the new IEEE 754-2008 standard does say “Together with language controls it should be possible to write programs that produce identical results on all conforming systems”, so maybe there is hope. They even devote all of chapter 11 to the topic, even if it’s just a one and a half page chapter. They don’t promise it will be easy, and they warn that the language controls are not yet defined, but at least it is potentially possible.

What is guaranteed

Some of the things that are guaranteed are the results of addition, subtraction, multiplication, division, and square root. The results of these operations are guaranteed to be the exact result correctly rounded (more on that later) so if you supply the same input value(s), with the same global settings, and the same destination precision you are guaranteed the same result.

Therefore, if you are careful and if your environment supports your care, it is possible to compose a program out of these guaranteed operations, and floating-point math is deterministic.

What is not guaranteed

Unfortunately there are a significant number of things that are not guaranteed. Many of these things can be controlled so they should not be problems, but others can be tricky or impossible. Which is why the answer is both “yes” and “no”.

Floating-point settings (runtime)

There are a number of settings that control how floating-point math will be done. The IEEE standard mandates several different rounding modes and these are usually expressed as per-thread settings. If you’re not writing an interval arithmetic library then you’ll probably keep the rounding mode set to round-to-nearest-even. But if you – or some rogue code in your process – change the rounding mode then all of your results will be subtly wrong. It rarely happens, but because the rounding mode is typically a per-thread setting it can be amusing to change it and see what breaks. Troll your coworkers! Amuse your friends!

If you are using the x87 floating-point unit (and on 32-bit x86 code you can’t completely avoid it because the calling conventions specify that an x87 register is used to return floating-point results) then if somebody changes the per-thread precision settings your results may be rounded to a different precision than expected.

Exception settings can also be altered, but the most likely different result that these changes will cause is a program crash, which at least has the advantage of being easier to debug.

Denormals/subnormals are part of the IEEE standard and if a system doesn’t support them then it is not IEEE compliant. But… denormals sometimes slow down calculations. So some processors have options to disable denormals, and this is a setting that many game developers like to enable. When this setting is enabled tiny numbers are flushed to zero. Oops. Yet again your results may vary depending on a per-thread mode setting.

(see That’s Not Normal–the Performance of Odd Floats for more details on denormals)

Rounding, precision, exceptions, and denormal support – that’s a lot of flags. If you do need to change any of these flags then be sure to restore them promptly. Luckily these settings should rarely be altered so just asserting once per frame that they are as expected (on all threads!) should be enough. There have been sordid situations where floating-point settings can be altered based on what printer you have installed and whether you have used it. If you find somebody who is altering your floating-point settings then it is very important to expose and shame them.

You can use _controlfp on VC++ to query and change the floating-point settings for both the x87 and SSE floating-point units. For gcc/clang look in fenv.h.

Composing larger expressions

Once we have controlled the floating-point settings then our next step is to take the primitive operations (+, -, *, /, sqrt) with their guaranteed results and use them to compose more complicated expressions. Let’s start small and see what we can do with the float variables a, b, and c. How about this:

a + b + c

The most well known problem with the line of code above is order of operations. A compiler could add a and b first, or b and c first (or a and c first, if it was in a particularly cruel mood). The IEEE standard leaves the order of evaluation up to the language, and most languages give compilers some latitude. If b and c happen to be in registers from a previous calculation and if you are compiling with /fp:fast or some equivalent setting then it is quite likely that the compiler will optimize the code by adding b and c first, and this will often give different results compared to adding a and b first. Adding parentheses may help. Or it may not. You need to read your compiler documentation or do some experiments to find out. I know that I have managed to fix several floating-point precision problems with VC++ by forcing a different order of evaluation using parentheses. Your mileage may vary.

Let’s assume that parentheses help, so now we have this:

(a + b) + c

Let’s assume that all of our compilers are now adding a and b and then adding c, and addition has a result guaranteed by the IEEE standard, so do we have a deterministic result now across all compilers and machines?

Intermediate precision
The result of a + b is stored in a temporary destination of unspecified precision. Neither the C++ nor the IEEE standard mandates what precision intermediate calculations are done at, and this intermediate precision will affect your results. The temporary result could equally easily be stored in a float or a double, and there are significant advantages to both options. It is an area where reasonable people can disagree. When using Visual C++ the intermediate precision depends on your compiler version, 32-bit versus 64-bit, /fp compile settings, /arch compile settings, and x87 precision settings. I discussed this issue in excessive detail in Intermediate Floating-Point Precision, or you can just look at the associated chart. Note that for gcc you can improve consistency with -ffloat-store and -fexcess-precision. For C99 look at FLT_EVAL_METHOD. As always, the x87 FPU makes this trickier by mixing run-time and compile-time controls.

The simplest example of the variability caused by different intermediate precisions comes from this expression:

printf("%1.16e", 0.1f * 0.1f);

Assuming that the calculation is done at run time the result can vary, depending on whether it is done at float or double precision – and both options are entirely IEEE compliant. A double will always be passed to printf, but the conversion to double can happen before or after the multiplication. In some configurations VC++ will insert extra SSE instructions in order to do the multiplication at double precision.

The term destination in the IEEE standard explicitly gives compilers some flexibility in this area which is why all of the VC++ variants are conformant to the standard. The IEEE 2008 standard encourages compilers to offer programmers control over this with a preferredWidth attribute, but I am not aware of any C++ compilers that support this attribute. Adding an explicit cast to float may help – again, it depends on your compiler and your compilation settings.

Intermediate precision is a particularly thorny problem with the x87 FPU because it has a per-thread precision setting, as opposed to the per-instruction precision setting of every other FPU in common use. To further complicate things, if you set the x87 FPU to round to float or double then you get rounding that is almost like float/double, but not quite. This means that the only way to get predictable rounding on x87 is to store to memory, which costs performance and may lead to double-rounding errors. The net result is that it may be impossible or impractical to get the x87 FPU to give identical results to other floating-point units.

If you store to a variable with a declared format then well-behaved compilers should (so sayeth IEEE-754-2008) round to that precision, so for increased portability you may have to forego some unnamed temporaries.

fmadd
A variant of the intermediate precision problem shows up because of fmadd instructions. These instructions do a multiply followed by an add, and the full precision of the multiply is retained. Thus, these effectively have infinite intermediate precision. This can greatly increase accuracy, but this is another way of saying that it gives different results than machines that don’t have an fmadd instruction. And, in some cases the presence of fmadd can lead to worse results. Imagine this calculation:

result = a * b + c * d

If a is equal to c and b is equal to –d then the result should (mathematically) be equal to zero. And, on a machine without fmadd you typically will get a result of zero. However on a machine with fmadd you usually won’t. On a machine that uses fmadd the generated code will look something like this:

compilerTemp = c * d
result = fmadd(a, b, compilerTemp)

The multiplication of c and d will be rounded but the multiplication of a and b will not be, so the result will usually not be zero.

Trivia: the result of the fmadd calculation above should be an exact representation of the rounding error in c times d. That’s kind of cool.

While the implementation of fmadd is now part of the IEEE standard there are many machines that lack this instruction, and an accurate and efficient emulation of it on those machines may be impossible. Therefore if you need determinism across architectures you will have to avoid fmadd. On gcc this is controlled with -ffp-contract.

Square-root estimate

Graphics developers love instructions like reciprocal square root estimate. However the results of these instructions are not defined by the IEEE standard. If you only ever use these results to drive graphics then you are probably fine, but if you ever let these results propagate into other calculations then all bets are off. This was discovered by Allan Murphy at Microsoft who used this knowledge to create the world’s most perverse CPUID function:

// CPUID is for wimps:
__m128 input = { -997.0f };
input = _mm_rcp_ps(input);
int platform = (input.m128_u32[0] >> 8) & 0xf;
switch (platform) {
    case 0x0: printf("Intel.\n"); break;
    case 0x7: printf("AMD Bulldozer.\n"); break;
    case 0x8: printf("AMD K8, Bobcat, Jaguar.\n"); break;
    default:  printf("Dunno\n"); break;
}

The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are, as the name suggests, not expected to give a fully accurate result. They are supposed to provide estimates with bounded errors and we should not be surprised that different manufacturers’ implementations (some mixture of tables and interpolation) give different results in the low bits.

This issue doesn’t just affect games, it is also a concern for live migration of virtual machines.

Transcendentals
The precise results of functions like sin, cos, tan, etc. are not defined by the IEEE standard. That’s because the only reasonable result to standardize would be the exact result correctly rounded, and calculating that is still an area of active research due to the Table Maker’s Dilemma. I believe that it is now practical to get correctly rounded results for most of these functions at float precision, but double is trickier. Part of the work in solving this for float was to do an exhaustive search for hard cases (easy for floats because there are only four billion of them) – those where it takes over a hundred bits of precision before you find out which way to round. In practice I believe that these instructions give identical results between current AMD and Intel processors, but on PowerPC where they are calculated in software they are highly unlikely to be identical. You can always write your own routines, but then you have to make sure that they are consistent, as well as accurate.

Update: according to this article the results of these instructions changed when the Pentium came out, and between the AMD-K6 and subsequent AMD processors. The AMD changes were to maintain compatibility with Intel’s imperfections.

Update 2: the fsin instruction is quite inaccurate around pi and multiples of pi. Because of this many C runtimes implement sin() without using fsin, and give much more accurate (but very different) results. g++ will sometimes calculate sin() at compile time, which it does extremely accurately. In 32-bit Ubuntu 12.04 with glibc 2.15 the run-time sin() would use fsin, making for significant differences depending on whether sin() was calculated at run time or compile time. On Ubuntu 12.04 this code does not print 1.0:

const double pi_d = 3.14159265358979323846;
const int zero = argc / 99;
printf("%f\n", sin(pi_d + zero) / sin(pi_d));


Per-processor code

Some libraries very helpfully supply different code for CPUs with different capabilities. These libraries test for the presence of features like SSE and then use those features if available. This can be great for performance but it adds a new option for getting different results on different CPUs. Watch for these techniques and either test to make sure they give identical results, or avoid them like the plague.

Conversions
Conversion between bases – such as printf("%1.8e"); – is not guaranteed to be identical across all implementations. Doing perfect conversions efficiently was an unsolved problem when the original IEEE standard came out and while it has since been solved this doesn’t mean that everybody does correctly rounded printing. For a comparison between gcc and Visual C++ see Float Precision Revisited: Nine Digit Float Portability. VC++ Dev 14 improves the situation considerably (see Formatting and Parsing Correctness).

While conversion to text is not guaranteed to be correctly rounded, the values are guaranteed to round-trip as long as you print them with enough digits of precision, and this is true even between gcc and Visual C++, except where there are implementation bugs. Rick Regan at Exploring Binary has looked at this issue in great depth and has reported on double values that don’t round-trip when read with iostreams (scanf is fine, and so is the conversion to text), and troublesome values that have caused both Java and PHP to hang when converting from text to double. Great stuff.

The Universal C RunTime on Windows recently (2020) fixed some tie-break rounding bugs and in doing so they introduced some new tie-break rounding bugs, so we may never hit perfection.

So don’t use iostreams, Java, or PHP?

Uninitialized data

It seems odd to list uninitialized data as a cause of floating-point indeterminism because there is usually nothing floating-point specific about this. But sometimes there is. Imagine a function like this:

void NumWrapper::Set( T newVal ) {
    if ( m_val != newVal ) {
        m_val = newVal;
        Notify();
    }
}

If m_val is not initialized then the first call to Set may or may not call Notify, depending on what value m_val started with. However if T is float, if you are compiling with /fp:fast, and if m_val happens to be a NaN (always possible with uninitialized data), then the comparison may say that m_val and newVal are equal, so m_val will never be set and Notify will never be called. Yes, a NaN is supposed to compare not-equal to everything, but the always-popular /fp:fast takes away this guarantee.

I’ve run into this bug twice in the last two years. It’s nasty. Maybe don’t use /fp:fast? But definitely avoid uninitialized data – that’s undefined behavior.

Another way that float code could be affected by uninitialized data when integer code is not is a calculation like this:

result = a + b – b;

On every machine I’ve ever used this will set result to a regardless of the value of b, for integer calculations. For floating-point calculations there are many values of b that would cause you to end up with a different result, typically infinity or NaN. I’ve never hit this bug, but it is certainly possible. Again, undefined behavior for int or float, but actual failures are more likely with floats.

Compiler differences

There are many compiler flags and differences that might affect intermediate precision or order of operations. Some of the things that could affect results include:

  • Debug versus release versus levels of optimization
  • x86 versus x64 versus PowerPC
  • SSE versus SSE2 versus x87
  • gcc versus Visual C++ versus clang
  • /fp:fast versus /fp:precise
  • -ffp-contract, -ffloat-store, and -fexcess-precision
  • Compile-time versus run-time calculations (such as sin())

With jitted languages such as C# the results may also vary depending on whether you launch your program under a debugger, and your code could potentially be optimized while it’s running so that the results subtly change.

Other sources of non-determinism

Floating-point math is a possible source of non-determinism, but it is certainly not the only one. If your simulation frames have variable length, if your random number generators don’t replay correctly, if you have uninitialized variables, if you use undefined C++ behavior, if you allow timing variations or thread scheduling differences to affect results, then you may find that determinism fails for you, and it’s not always the fault of floating-point math. For more details on these issues see the resources section at the end.

Conclusions
A lot of things can cause floating-point indeterminism. How difficult determinism is to achieve depends on whether you need identical behavior within a single build, across rebuilds as you maintain your code, or across completely different platforms. The stronger your needs, the more difficult and costly the engineering effort will be.

Some people assume that if you use stable algorithms then determinism doesn’t matter. They are wrong. If your network protocol or save game format stores only user inputs then you must have absolute determinism. An error of one ULP (Unit in the Last Place) won’t always matter, but sometimes it will make the difference between a character surviving and dying, and things will diverge from there. You can’t solve this problem just by using epsilons in your comparisons.

If you are running the same binary on the same processor then the only determinism issues you should have to worry about are:

  • Altered FPU settings for precision (x87 only), rounding, or denormal control
  • Uninitialized variables (especially with /fp:fast)
  • Non floating-point specific sources of indeterminism

The details of how your compiler converts your source code to machine code are irrelevant in this case because every player is running the same machine code on the same processor. This is the easiest type of floating-point determinism and, absent other problems with determinism, it just works.

That’s easy enough, but unless you are running on a console you probably have to deal with some CPU variation. If you are running the same binary but on multiple processor types – either different manufacturers or different generations – then you also have to worry about:

  • Different execution paths due to CPU feature detection
  • Different results from sin, cos, estimate instructions, etc.

That’s still not too bad. You have to accept some limitations, and avoid some useful features, but the core of your arithmetic can be written without thinking about this too much. Again, the secret is that every user is executing exactly the same instructions, and you have restricted yourself to instructions with defined behavior, so it just works™.

If you are running a different binary then things start getting sticky. How sticky they get depends on how big a range of compilers, compiler settings, and CPUs you want to support. Do you want debug and release builds to behave identically? PowerPC and x64? x87 and SSE (god help you)? Gold master and patched versions? Maintaining determinism as you change the source code can be particularly tricky, and increasing discipline will be required. Some of the additional things that you may need to worry about include:

  • Compiler rules for generating floating-point code
  • Intermediate precision rules
  • Compiler optimization settings and rules
  • Compiler floating-point settings
  • Differences in all of the above across compilers
  • Different floating-point architectures (x87 versus SSE versus VMX and NEON)
  • Different floating-point instructions such as fmadd
  • Different float-to-decimal routines (printf) that may lead to different printed values
  • Buggy decimal-to-float routines (iostreams) that may lead to incorrect values

If you can control these factors – some are easy to control and some may involve a lot of work – then floating-point math can be deterministic, and indeed many games have been shipped based on this. Then you just need to make sure that everything else is deterministic and you are well on your way to an extremely efficient replay and networking mechanism.

It turns out that there are some parts of your floating-point code that can be non-deterministic and can make use of, for instance, square-root estimate instructions. If you have code that is just driving the GPU, and if the GPU results never affect gameplay, then variations in this code will not lead to divergence.

When debugging the problems associated with a game engine that must be deterministic, remember that despite all of its mysteries there is some logic to floating-point, and in many cases a loss of determinism is actually caused by something completely different. If your code diverges when running the same binary on identical processors then, unless you’ve got a suspicious printer driver, you might want to look for bugs elsewhere in your code instead of always blaming IEEE-754.


Pop quiz
Explain clearly why printf("%1.16e", 0.1f * 0.1f); can legally print different values, and how that behavior applies to 0.1f * 0.3f * 0.7f.

For extra credit, explain why a float precision version of fmadd cannot easily be implemented on processors that lack it by using double math (or alternately, prove the opposite by implementing fmadd).

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048
This entry was posted in Floating Point, Visual Studio. Bookmark the permalink.

65 Responses to Floating-Point Determinism

  1. K Gadd says:

    I once spent a day or two helping another engineer investigate a weird problem they were running into with an application we had written in PHP. For some reason, occasionally test cases would fail, but only after a certain other set of test cases had run. The failure was very strange – the PHP unserialize() function was returning corrupted results in some cases. He couldn’t find any reason for it to break. For example, unserialize() on this input would normally return 4 but was returning 12. (wtf?)

    The fact that a previous test case was causing it to break made me suspicious. Eventually, we dug around and figured out that the previous test case was loading a native module. You may be able to see where this is going. Further investigation finally got us into a state where we had a broken worker thread (where the test case always failed), and we could compare it with a good thread (where the test case passed). Guess what was different?

    The floating point rounding mode. Oh god. The floating point rounding mode affected our ability to unpack integers from strings! 😦

    • brucedawson says:

      Uggh. Yeah. Horrible. I guess the unserialize must have been converting to double and then integer, which is a whole other set of problems.

      The native module that changed the rounding mode should be burned with fire.

  2. Daniel Camus says:

    Hello Bruce, very informative! As far as you know, how is the behavior on ARM architectures? The vendor base it’s way too fragmented like the old days, Qualcomm, Nvidia, Texas Instruments, etc. In the company I work for, we face many many nasty bugs (on mobile games), and as you very well said, the flag /fp:fast it’s almost a game requirement. Not to mention the buggy video drivers and most of the shader bugs are related to precision.

    • brucedawson says:

      /fp:fast seems to be less necessary with recent versions of VC++ (2012 and beyond), so try skipping it there.

      I would expect ARM to be similar to SSE from a hardware point of view, but I don’t know how the compilers handle it. It should be easier than trying to get consistency from x87.

  3. ohmantics says:

    And how can I forget debugging a loss-of-sync issue caused by installing the Motorola libmotovec “drop-in replacement math library” on a PowerPC Mac. It caused differences in the static initialization of an 8-bit sincos table our game used.

  4. Smok says:

    I agree – disagree:) That’s what I love in science – there’s not easy Answers:D

  5. saijanai says:

    Squeak Smalltalk sidesteps this issue by using a software-based floating point package guaranteed to yield consistent answers, not just from one calculation to the next, but across platforms as well.
    This allows Croquet/Cobalt to distribute 3D and physics data in the form of Smalltalk messages that evoke calculations locally, rather than as calculations performed on a central server, so that 3D game clients can stay in sync indefinitely while performing all relevant calculations locally.

    • brucedawson says:

      What is the performance impact of this? The real trick is to get deterministic, stable math, that is fast enough to be useful.

      • saijanai says:

        It’s pretty slow compared to a direct CPU call, obviously. OTOH, it makes distributed virtual worlds trivially simple, at least on a small scale. The original strategy for Croquet distributed processing was called “Teatime” and was implemented by David P Reed as a first direct test of his scaleable internet work. Currently, he’s trying to recreate it in JavaScript in the webbrowser.

  6. minusinf says:

    I heard that a company* found a cpu bug within the integer processing unit of a server processor when the processor was running hot. So there’s your intra-cpu issue 🙂

    I think the internal CPU instruction reordering can also have an impact: Especially if the CPU is running a high number of threads and the results depend on the higher intermediate accuracy of the FPU. Say collision detection and everything has been stored as floats.

    Maybe you need to abduct an Intel engineer for an extended interview 😉

    • brucedawson says:

      Out-of-order execution will never change the results, unless the CPU is horribly buggy. Multiple threads won’t change the results unless you have race conditions, in which case it’s not a floating-point problem, it’s a multi-threading problem.

      I hope to never have to deal with a processor that misbehaves when hot. Although, I do remember years ago spending a long time tracking down a mysterious crash that turned out to be a bad memory chip.

  7. Kibeom Kim says:

    Have you looked at http://nicolas.brodu.net/en/programmation/streflop/ ? It’s pretty comprehensive. I haven’t benchmarked it, but seeing how it is implemented, I think the performance will be still practical.

  8. Joshua Grass says:

    So does that mean you guys are using a deterministic sim approach for DOTA 2? I thought you were using a client server model? I ask because I wrote the deterministic simulation engine for Guardians of Middle Earth and I was always curious which approach the other MOBAs used.

    • brucedawson says:

      I’m just fascinated by these issues, and I’ve had to fix a fair number of floating-point bugs over the years, but I’ve actually never worked on an engine that relied on floating-point determinism, I don’t think.

      • Joshua Grass says:

        We certainly relied on it for Guardians, but luckily we were a fixed platform so we never ran into situations where the floating point math was an issue. Years ago I wrote another deterministic sim for the Mac back when there was 68k and PPC chips. Man, that was a learning experience! Luckily I was in my early twenties so staying up for a week and converting my floating point math to integer math was something I was willing to do…Thanks for the article, I find this stuff fascinating as well.

  9. mortoray says:

    Frighteningly enlightening. Given all these variables and conditions does it even make sense to write something that requires floating point determinism? It seems saner to have the engine assume each client has a different result — though I admit I’m at a loss as to how it would handle that.

    • brucedawson says:

      If you restrict yourself to a single build giving the same results on multiple machines then the challenges are manageable and have indeed been handled. Supporting multiple builds, possibly from different compilers, is indeed terrifying and may be practically impossible, especially if you need to support x87 and SSE, or x87 and PowerPC.

  10. Pascal Cuoq says:

    Hello, Bruce.

    > For extra credit, explain why a float precision version of fmadd cannot easily be implemented on processors that lack it by using double math (or alternately, prove the opposite by implementing fmadd).

    Here is an easy implementation of fmadd for round-to-nearest, a≥0, b≥0, c≥0, all finite. I am no good at generating difficult inputs but I think the idea should be sound even if one or two small bugs remain:


    • brucedawson says:

      I can’t tell if that is correct or not.

      Storing the result of the multiply in a double is the right idea (as far as I can tell) but since you don’t identify the reason why the obvious implementation (do the add and then store the result) is wrong I really can’t tell whether you have corrected for it or not.

      • Pascal Cuoq says:

        The double-precision multiplication of a and b is a good start because it is exact. If we were to stop now and round to float, we would get the correctly rounded single-precision result of a * b. There would be no double rounding because there was no rounding the first time.

        > you don’t identify the reason why the obvious implementation (do the add and then store the result) is wrong

        This double-precision addition would not be exact. The result would be rounded to double and then rounded to float. IEEE 754’s formatOf would be a great help here, but it is not available in hardware that I know of.

        Instead, the rest of the code follows the same initial idea of using exact computations possibly followed by an ultimate approximate one.

        After s1 and r1 have been computed, the mathematical equality a*b+c = s1+r1 holds between the value of these variables (and r1 is small compared to s1).

        A good candidate for the result of the function is f1 = (float)s1. The correct answer is either f1 or the float on the other side of s1 from f1. This float is computed as f2.

        The last addition in t – 0.5 * ulp + r1 is not necessarily exact, but it is precise around zero, so that the conditions r < 0 and r > 0 have the same truth values as if r had been computed with infinite precision.

        • Pascal Cuoq says:

          And there probably are bits that are sub-optimal in this function. In particular if (p1 < p2) is perhaps unnecessary, but I don't understand the nuances here very well yet, so I wrote the safe version.

        • brucedawson says:

          In short, the issue is the double rounding at the end. The double precision multiply is exact, the double precision add is inexact (but more accurate than a float fmadd would have been) but when you store the result to a float you may hit double rounding, leading to a slightly different result from a real float fmadd.

          I have no idea whether your solution is correct, but it does seem clever and plausible. I think you’d need to run it for a few hours on random inputs on a machine with an fmadd instruction and compare. I’d recommend an AVX machine for that, but that comes with its own problems: https://randomascii.wordpress.com/2013/03/11/should-this-windows-7-bug-be-fixed/

  11. Maxime Coste says:

    Hey, very interesting article. I learned most of these things the hard way, making our RTS game handle cross-platform determinism (32/64 bits, MSVC/Windows, Clang/MacOS, GCC/Linux).

    One tip I wanted to add is the ‘%a’ printf format, which is not standard C++98 (it’s C99, and therefore C++11 as well) but is supported on all these platforms, and which prints a floating-point number in its exact representation.

    This saved me many times when tracing execution on different platforms, as %f does not always round the same on different platforms (giving false positives, or false negatives).

    By the way, thanks for the Intermediate floating-point precision article, it was very useful.


  13. Yossarian King says:

    You mention C# once, but just in passing. I don’t suppose you have additional info on floating point determinism in the .NET environment? The C# compiler does not provide the command line options or #pragmas that are available in C++ — should I just assume results are non-deterministic and go roll myself a nice integer-based fixed-point library?

    • brucedawson says:

      I’m afraid I don’t know what the expectations are for C#. One would hope that .Net would have a well defined floating-point model, but a few quick web searches found nothing useful. I would expect that the results in 64-bit processes will be sane and sensible. In 32-bit processes I wouldn’t be so sure, given that platform’s unfortunate x87 FPU legacy. But, I’m just guessing. [edited to fix 64-bit/32-bit reference]

      • Yossarian King says:

        Thanks for the quick response. I think you mean “In *32-bit* processes I wouldn’t be so sure …”? Unfortunately “just guessing” seems to be about as confident as anyone is willing to get with floating point determinism (all your good info above notwithstanding), so I just guess I’ll carry on with fixed point. 😉

        BTW, the .NET folks are in the process of incorporating SIMD, at which point I’m sure yet more bets come off the table …

      • Yossarian King says:

        (Thanks for the edit. Seems I can’t edit my comment to remove now-spurious correction.)

        Another issue with C# is JIT compilation means that even if you distribute the same build / patch of the executable to all users they aren’t actually running the same machine code. (NGEN doesn’t help, it’s an install-time thing. The new .NET Native initiative *might* help, but it’s still in preview mode, so I’m not sure.)

    • brucedawson says:

      I got curious and tracked down the relevant specification. You can find a link to the C# language specification here:


      In section 4.16 it explains the evaluation rules, which basically say that intermediate precision might be higher, might not:

      * Then, if either of the operands is of type double, the other operand is converted to double, the operation is performed using at least double range and precision, and the type of the result is double (or bool for the relational operators).
      * Otherwise, the operation is performed using at least float range and precision, and the type of the result is float (or bool for the relational operators).

      • Yossarian King says:

        Ah, good find. The specification also states “Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an “extended” or “long double” floating-point type [snip]. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects.”

        In other words, /phony French accent/ “Floating point determinism? Ha! I float in your general direction, you and all your … !”

  14. Name Required says:

    I am a bit behind on all this cr.p, but didn’t you forget about the “hidden bit” and “context switches” (that on some platforms cause the FPU to unload its state into memory and subsequently lose hidden state, e.g. the hidden bit)?

  15. Name Required says:

    “hidden bit” is described in “What Every Computer Scientist Should Know About Floating-Point Arithmetic”.
    Another piece of “hidden state” is extra bits x87 could be using (long double registers).
    Anyways, I might be talking complete garbage, but I think in old days (Win95? old hardware?) context switch (when scheduler decides to stop your thread and run another one on given CPU) resulted in FPU state dumped into memory (losing hidden state). Naturally, since it was happening at random times, it had random effect on certain calcs.
    I think there were problems with MMX state under similar conditions, too…
    Apologies, if all this never happened.

    • Name Required says:

      Hmm… hidden bit is not what I thought it is. Scratch that, pls. Somehow I mixed “hidden bit” with “guard digit” :-\

      • brucedawson says:

        As you realize, the hidden bit and guard bit are not part of the FPU state. They are used during some calculations but are never visible.

        One problem with the (excellent!) and classic Goldberg paper that you cite is that it is a product of its time, written before the widespread adoption of IEEE floating-point math. Thus it spends time on implementation details of floating-point math that users have no need of knowing. This occasionally leads to confusion.

  16. Paul Crawford says:

    I think using an uninitialised variable, of any type, is basically a bug (or VERY bad programming if it was intentional)! Over the years I have seen all sorts of strange behaviour that can be resistant to debugging that turned out to be uninitialised values, so my advice is:
    (1) Keep your functions to a manageable size; that way it is easier to spot mistakes.
    (2) Turn on all compiler warnings, and actually listen to them!
    (3) Use other static analysis tools (or a 2nd compiler) as well to get another opinion on what might go wrong.
    (4) If in doubt, initialise to zero (or something predictable) as at least any bugs due to unintended use are then reproducible!

    • brucedawson says:

      One of my favorite C++ features is the humble but crucial ability to declare variables anywhere. This makes it easy to avoid declaring variables until you are ready to initialize them. In most cases this means that a variable’s lifetime needn’t begin until it is initialized, so the window of opportunity where the variable exists but is not initialized never opens.

      And, in those rare cases where you can’t initialize your variables with useful data when they are first defined, initialize them with zero, as you suggest.

      Uninitialized member variables are trickier, but even those get easier with C++ 11.

      • Paul Crawford says:

        You can also restrict C variables’ scope by declaring them inside {} regions, but you don’t get the nicety of using for(int ii=0;… of C++ without superfluous brackets. However, one thing that often is missed by compiler warnings, and often by static code tools as well, is duplicate names of differing scope created this way. For example where you have a local variable of the same name as a global or similar, consider:
        int foo(void) {
            int ii, y = 0;
            for (ii = 0; ii < 10; ii++) {
                int y = 5 * ii; /* shadows the outer y */
            }
            return y;
        }
        Here 'y' is declared twice, and so the function returns 0 and not 45 (5*9) as one might expect from the for() loop. It might be legal in the language, but it is a potential bug source!

        • brucedawson says:

          Yeah, adding extra {} is a nice option to have but I wouldn’t want to depend on it just so I can declare variables later.

          Variable shadowing is a real problem. gcc/clang have warnings to detect it but these warnings are usually off. VC++ has warnings to detect it but these are only available with the (slow) /analyze option enabled. I asked Microsoft to fix that for Dev 14 and it sounds like they will:

          The problem, however, is that most code has huge amounts of variable shadowing and fixing it is tedious and dangerous, so most developers can’t enable the warnings, so more variable shadowing accumulates. In my tests I’ve found that ~98% of variable shadowing is harmless, but that other 2% can be nasty.

  17. Allan Murphy says:

    I’d like to supply a correction to the perverse SSE RCPPS example above – the shift right should be 8, not 16. Apologies to anyone using it for business critical software 🙂
    The 3rd line should be:

    int platform = (input.m128_u32[0] >> 8) & 0xf;

  18. Allan Murphy says:

    The general usability and utility of that perverse code is so great that I find it hard to imagine there is a code base in the world that does not contain it now.

  19. brucedawson says:

    This post explains why Company of Heroes 2 has per-platform multiplayer – getting identical floating-point results across the different compilers used on different platforms was too great a challenge:


    • Thanks for the overall discussion on floating point. It’s fascinating. I am a long-time game developer, pondering a fully deterministic code design from the ground up for replay, networking purposes, and simplifying the designer’s life.

      However, it is often expedient to leverage, for example, a package like Bullet Physics. Assuming I’m maintaining and building the source myself, is this even a realistic hope, in your experience?

      On the continuum between setting appropriate compile/link flags coupled with preprocessor defines on the low end and rewriting significant portions of the package on the high end, I’m guessing I would quickly trend toward the high end.

      I would greatly appreciate your perspective.

      • brucedawson says:

        1) Is the physics being used for animations, or for calculations which affect gameplay? If the physics doesn’t affect the course of gameplay then it doesn’t affect determinism.
        2) Is the code all going to run on one CPU architecture (x64 SSE/SSE2 for example) or do you need to support ARM or x86? If you need to ship multiple binaries then it’s far more challenging.
        3) Do you need compatibility between different versions of your game? That implies recompiling, which can change behavior.

        These are some of the things to consider. And, test early/test often.

        • Yeah. It’s pretty much ‘worst’ case — i.e. 1) the physics affects gameplay, 2) ideally it’s architecture agnostic, and 3) hopefully floating-point determinism wouldn’t be affected by recompiling.

          Because initially I can get by with simple rigid-body physics, I’m leaning toward implementing the physics myself and at the very least ensuring determinism across similar CPU architectures.

          Thanks for your response.

  20. brucedawson says:

    This article on thewinnower.com discusses why the *ability* to have bitwise reproducibility is important in science. The ability to reproduce experiments is a crucial part of science and seemingly trivial differences can complicate or prevent this.


  21. Ian Ameline says:

    Great article — learned some new things from it — and I thought I was pretty knowledgeable on the subject :-). Once found a bug where threading produced different results — completely ruled out numerical instability and reduced it to a test case where we just added two identical numbers, but got different results on different threads! I reminded my colleague that every FP operation has a hidden, implied input — the FP control word. In this case, Intel’s TBB library was setting the control word differently on worker threads vs the main thread.

    I’ve also seen a compiler inline a FP heavy function and at the different use sites, the order of ops was different. So a unit test would produce one answer, and real use in real code would produce another.

    These days, implementing a cloud based JITTed simulation framework, we have to group jobs by machine type to keep things sane. (And have a job return a HW finger-print so that subsequent runs can request the same HW runtime setup – including node count and CPU count & type per node — same job inputs + same HW fingerprint = at least a fighting chance of the same outputs.)

  22. Hi-Angel says:

    That’s funny. You didn’t mention, though: how do I change the per-thread floating-point rounding mode? I looked through http://man7.org/linux/man-pages/man7/signal.7.html for the keywords “round” and “float” but didn’t see anything relevant.

  23. George Geczy says:

    An interesting item that caused me to end up here in 2021… my game engine had resolved the floating point determinism to perfection – in x86. We switched to x64, and suddenly AMD/Intel were out of sync. After trying many of the usual things (compiler flags, precision, etc) the end fix was to remove setting _controlfp(_RC_CHOP, _MCW_RC ) – we did this to help fix issues in x86, but now apparently this *causes* issues in x64. So the secret in x64 is to set /fp:strict and don’t touch anything else (leave rounding, precision and sse at defaults) – most docs say that precision and sse settings are not even used in x64 although the docs are not consistent on this point.

    • brucedawson says:

      I find it surprising that _RC_CHOP (round towards zero) fixed determinism issues on x86. Curious. I’m also surprised that it caused issues on x64.

      Your math will be slightly more predictable with the default of round-to-nearest-even, so that’s good.

      The precision settings are not used for x64 because SSE/AVX don’t have that concept. I’m not sure what you mean by SSE settings – x64 code defaults to SSE.

  24. frankson says:

    Hi Bruce,
    thanks for this really detailed article. We develop scientific software for different platforms, and we have been looking for stable, identical results on all platforms for years! For us, the most important thing is not 1-million-digits precision, but inter-platform reproducibility.

    I found mpfr, with boost.multiprecision as a very handy interface – but did not test this so far. The specification seems to match exactly that need, because all algorithms are implemented the same way on all platforms, if I get it right.

    Of course, you get this at the price of lower performance, since you are not using the hardware arithmetic. Did you consider libraries such as mpfr for your needs?


    • brucedawson says:

      I have primarily worked on games and other software where we need maximum performance as well as consistent results so I have not explored mpfr but it is an interesting idea.

  25. Pingback: Random Number Generator Recommendations for Applications | Tech Programing

  26. Pingback: Building for Windows without Running Windows – Quentin Santos
