Floating-Point Determinism

Is IEEE floating-point math deterministic? Will you always get the same results from the same inputs? The answer is an unequivocal “yes”. Unfortunately the answer is also an unequivocal “no”. I’m afraid you will need to clarify your question.

My hobby: injecting code into other processes and changing the floating-point rounding mode on some threads

The answer to this ambiguous question is of some interest to game developers. If we can guarantee determinism then we can create replays and multi-player network protocols that are extremely efficient. And indeed many games have done this. So what does it take to make your floating-point math deterministic within one build, across multiple builds, and across multiple platforms?

Before proceeding I need to clarify that floating-point determinism is not about getting the ‘right’ answer, or even the best answer. Floating-point determinism is about getting the same answer on some range of machines and builds, so that every player agrees on the answer. If you want to get the best answer then you need to choose stable algorithms, but that will not guarantee perfect determinism. If you want to learn how to get the best answer you’ll need to look elsewhere, perhaps including the rest of my floating-point series.

Determinism versus free will: cage match

The IEEE standard does guarantee some things. It guarantees more than the floating-point-math-is-mystical crowd realizes, but less than some programmers might think. In particular, the addendum to David Goldberg’s important paper points out that “the IEEE standard does not guarantee that the same program will deliver identical results on all conforming systems.” And, the C/C++ standards don’t actually mandate IEEE floating-point math.

On the other hand, the new IEEE 754-2008 standard does say “Together with language controls it should be possible to write programs that produce identical results on all conforming systems”, so maybe there is hope. They even devote all of chapter 11 to the topic, even if it’s just a one and a half page chapter. They don’t promise that it will be easy, and they warn that the language controls are not yet defined, but at least it is potentially possible.

What is guaranteed

Some of the things that are guaranteed are the results of addition, subtraction, multiplication, division, and square root. The results of these operations are guaranteed to be the exact result correctly rounded (more on that later) so if you supply the same input value(s), with the same global settings, and the same destination precision you are guaranteed the same result.

Therefore, if you are careful and if your environment supports your care, it is possible to compose a program out of these guaranteed operations, and floating-point math is deterministic.

What is not guaranteed

Unfortunately there are a significant number of things that are not guaranteed. Many of these things can be controlled so they should not be problems, but others can be tricky or impossible. Which is why the answer is both “yes” and “no”.

Floating-point settings (runtime)

There are a number of settings that control how floating-point math will be done. The IEEE standard mandates several different rounding modes and these are usually expressed as per-thread settings. If you’re not writing an interval arithmetic library then you’ll probably keep the rounding mode set to round-to-nearest-even. But if you – or some rogue code in your process – change the rounding mode then all of your results will be subtly wrong. It rarely happens, but because the rounding mode is typically a per-thread setting it can be amusing to change it and see what breaks. Troll your coworkers! Amuse your friends!

If you are using the x87 floating-point unit (and on 32-bit x86 code you can’t completely avoid it because the calling conventions specify that an x87 register is used to return floating-point results) then if somebody changes the per-thread precision settings your results may be rounded to a different precision than expected.

Exception settings can also be altered, but the most likely different result that these changes will cause is a program crash, which at least has the advantage of being easier to debug.

Denormals/subnormals are part of the IEEE standard and if a system doesn’t support them then it is not IEEE compliant. But… denormals sometimes slow down calculations. So some processors have options to disable denormals, and this is a setting that many game developers like to enable. When this setting is enabled tiny numbers are flushed to zero. Oops. Yet again your results may vary depending on a per-thread mode setting.

(see That’s Not Normal–the Performance of Odd Floats for more details on denormals)

Rounding, precision, exceptions, and denormal support – that’s a lot of flags. If you do need to change any of these flags then be sure to restore them promptly. Luckily these settings should rarely be altered so just asserting once per frame that they are as expected (on all threads!) should be enough. There have been sordid situations where floating-point settings can be altered based on what printer you have installed and whether you have used it. If you find somebody who is altering your floating-point settings then it is very important to expose and shame them.

You can use _controlfp on VC++ to query and change the floating-point settings for both the x87 and SSE floating-point units. For gcc/clang look in fenv.h.
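
Here is a minimal sketch of the sort of once-per-frame assertion described above. It assumes MSVC’s _controlfp_s from float.h (or C99’s fenv.h elsewhere); the function name is mine, and the expected values should be adjusted to whatever your engine actually requires:

#include <cassert>
#ifdef _MSC_VER
#include <float.h>
void AssertFloatSettingsUnchanged()
{
    unsigned int control = 0;
    _controlfp_s(&control, 0, 0);                // query only, change nothing
    assert((control & _MCW_RC) == _RC_NEAR);     // round-to-nearest
    assert((control & _MCW_DN) == _DN_SAVE);     // denormals preserved, not flushed to zero
}
#else
#include <cfenv>
void AssertFloatSettingsUnchanged()
{
    assert(std::fegetround() == FE_TONEAREST);   // fenv.h portably exposes only the rounding mode
}
#endif

Call something like this at the top of every frame, on every thread that does simulation work.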

Composing larger expressions

Once we have controlled the floating-point settings then our next step is to take the primitive operations (+, -, *, /, sqrt) with their guaranteed results and use them to compose more complicated expressions. Let’s start small and see what we can do with the float variables a, b, and c. How about this:

a + b + c

The most well-known problem with the line of code above is order of operations. A compiler could add a and b first, or b and c first (or a and c first, if it was in a particularly cruel mood). The IEEE standard leaves the order of evaluation up to the language, and most languages give compilers some latitude. If b and c happen to be in registers from a previous calculation and if you are compiling with /fp:fast or some equivalent setting then it is quite likely that the compiler will optimize the code by adding b and c first, and this will often give different results compared to adding a and b first. Adding parentheses may help. Or it may not. You need to read your compiler documentation or do some experiments to find out. I know that I have managed to fix several floating-point precision problems with VC++ by forcing a different order of evaluation using parentheses. Your mileage may vary.

Let’s assume that parentheses help, so now we have this:

(a + b) + c

Let’s assume that all of our compilers are now adding a and b and then adding c, and addition has a result guaranteed by the IEEE standard, so do we have a deterministic result now across all compilers and machines?

No.

The result of a + b is stored in a temporary destination of unspecified precision. Neither the C++ nor the IEEE standard mandates what precision intermediate calculations are done to, and this intermediate precision will affect your results. The temporary result could equally easily be stored in a float or a double and there are significant advantages to both options. It is an area where reasonable people can disagree. When using Visual C++ the intermediate precision depends on your compiler version, 32-bit versus 64-bit, /fp compile settings, /arch compile settings, and x87 precision settings. I discussed this issue in excessive detail in Intermediate Floating-Point Precision, or you can just look at the associated chart. Note that for gcc you can improve consistency with -ffloat-store and -fexcess-precision. For C99 look at FLT_EVAL_METHOD. As always, the x87 FPU makes this trickier by mixing run-time and compile-time controls.

The simplest example of the variability caused by different intermediate precisions comes from this expression:

printf("%1.16e", 0.1f * 0.1f);

Assuming that the calculation is done at run time the result can vary, depending on whether it is done at float or double precision – and both options are entirely IEEE compliant. A double will always be passed to printf, but the conversion to double can happen before or after the multiplication. In some configurations VC++ will insert extra SSE instructions in order to do the multiplication at double precision.

The term destination in the IEEE standard explicitly gives compilers some flexibility in this area which is why all of the VC++ variants are conformant to the standard. The IEEE 2008 standard encourages compilers to offer programmers control over this with a preferredWidth attribute, but I am not aware of any C++ compilers that support this attribute. Adding an explicit cast to float may help – again, it depends on your compiler and your compilation settings.

Intermediate precision is a particularly thorny problem with the x87 FPU because it has a per-thread precision setting, as opposed to the per-instruction precision setting of every other FPU in common use. To further complicate things, if you set the x87 FPU to round to float or double then you get rounding that is almost like float/double, but not quite. This means that the only way to get predictable rounding on x87 is to store to memory, which costs performance and may lead to double-rounding errors. The net result is that it may be impossible or impractical to get the x87 FPU to give identical results to other floating-point units.

If you store to a variable with a declared format then well-behaved compilers should (so sayeth IEEE-754-2008) round to that precision, so for increased portability you may have to forego some unnamed temporaries.
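
As a sketch of what that looks like in practice (my example; not every compiler honors it, especially on x87), each intermediate result gets its own named float so that it must be rounded to float precision:

float DotProduct2(float ax, float ay, float bx, float by)
{
    // Each named float forces (on well-behaved compilers) a rounding to
    // float precision, instead of leaving the intermediate result at
    // whatever higher precision the compiler felt like using.
    float xProduct = ax * bx;
    float yProduct = ay * by;
    float sum = xProduct + yProduct;
    return sum;
}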

fmadd

A variant of the intermediate precision problem shows up because of fmadd instructions. These instructions do a multiply followed by an add, and the full precision of the multiply is retained. Thus, these effectively have infinite intermediate precision. This can greatly increase accuracy, but this is another way of saying that it gives different results than machines that don’t have an fmadd instruction. And, in some cases the presence of fmadd can lead to worse results. Imagine this calculation:

result = a * b + c * d

If a is equal to c and b is equal to -d then the result should (mathematically) be equal to zero. And, on a machine without fmadd you typically will get a result of zero. However on a machine with fmadd you usually won’t. On a machine that uses fmadd the generated code will look something like this:

compilerTemp = c * d
result = fmadd(a, b, compilerTemp)

The multiplication of c and d will be rounded but the multiplication of a and b will not be, so the result will usually not be zero.

Trivia: the result of the fmadd calculation above should be an exact representation of the rounding error in c times d. That’s kind of cool.
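
If you want to see that trivia in action, here is a small sketch using C++11’s std::fma, which must compute the multiply-add with a single rounding (in software if necessary, which can be slow). Assuming no overflow or underflow, the fma call below returns exactly the rounding error of the product:

#include <cmath>
#include <cstdio>

int main()
{
    double c = 0.1, d = 0.3;
    double product = c * d;                     // correctly rounded product
    double error = std::fma(c, d, -product);    // exact rounding error of c * d
    std::printf("c*d == %a + %a exactly\n", product, error);
}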

While the implementation of fmadd is now part of the IEEE standard there are many machines that lack this instruction, and an accurate and efficient emulation of it on those machines may be impossible. Therefore if you need determinism across architectures you will have to avoid fmadd. On gcc this is controlled with -ffp-contract.

Square-root estimate

Graphics developers love instructions like reciprocal square root estimate. However the results of these instructions are not defined by the IEEE standard. If you only ever use these results to drive graphics then you are probably fine, but if you ever let these results propagate into other calculations then all bets are off. This was discovered by Allan Murphy at Microsoft who used this knowledge to create the world’s most perverse CPUID function:

// CPUID is for wimps:
__m128 input = { -997.0f };
input = _mm_rcp_ps(input);
int platform = (input.m128_u32[0] >> 16) & 0xf;
switch (platform)
{
   case 0x0: printf("Intel.\n"); break;
   case 0x7: printf("AMD Bulldozer.\n"); break;
   case 0x8: printf("AMD K8, Bobcat, Jaguar.\n"); break;
   default: printf("Dunno\n"); break;
}

The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are, as the name suggests, not expected to give a fully accurate result. They are supposed to provide estimates with bounded errors and we should not be surprised that different manufacturers’ implementations (some mixture of tables and interpolation) give different results in the low bits.

This issue doesn’t just affect games, it is also a concern for live migration of virtual machines.
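
If gameplay code genuinely needs a reciprocal square root then one option (a sketch, with an obvious performance cost) is to build it from the IEEE-defined divide and sqrt operations, each of which is correctly rounded and therefore identical on every conforming CPU:

#include <cstdio>
#include <xmmintrin.h>

// Slower than _mm_rsqrt_ps, but both the divide and the sqrt are correctly
// rounded IEEE operations, so the result does not depend on which vendor's
// estimate hardware you happen to be running on.
static __m128 DeterministicRsqrt(__m128 v)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(v));
}

int main()
{
    float result;
    _mm_store_ss(&result, DeterministicRsqrt(_mm_set1_ps(2.0f)));
    std::printf("%1.8e\n", result);   // 1/sqrt(2), the same bits everywhere
}

The estimate instructions can still be used freely in code whose results never feed back into the simulation.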

Transcendentals

The precise results of functions like sin, cos, tan, etc. are not defined by the IEEE standard. That’s because the only reasonable result to standardize would be the exact result correctly rounded, and calculating that is still an area of active research due to the Table Maker’s Dilemma. I believe that it is now practical to get correctly rounded results for most of these functions at float precision, but double is trickier. Part of the work in solving this for float was to do an exhaustive search for hard cases (easy for floats because there are only four billion of them) – those where it takes over a hundred bits of precision before you find out which way to round. In practice I believe that these instructions give identical results between current AMD and Intel processors, but on PowerPC where they are calculated in software they are highly unlikely to be identical. You can always write your own routines, but then you have to make sure that they are consistent, as well as accurate.

Update: according to this article the results of these instructions changed when the Pentium came out, and between the AMD-K6 and subsequent AMD processors. The AMD changes were to maintain compatibility with Intel’s imperfections.

Update 2: the fsin instruction is quite inaccurate around pi and multiples of pi. Because of this many C runtimes implement sin() without using fsin, and give much more accurate (but very different) results. g++ will sometimes calculate sin() at compile-time, which it does extremely accurately. In 32-bit Ubuntu 12.04 with glibc 2.15 the run-time sin() would use fsin, making for significant differences depending on whether sin() was calculated at run-time or compile-time. On Ubuntu 12.04 this code does not print 1.0:

const double pi_d = 3.14159265358979323846;
const int zero = argc / 99;
printf("%f\n", sin(pi_d + zero) / sin(pi_d));

0.999967
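
One practical way to catch these differences (a sketch of my own, not something the determinism literature prescribes) is to fingerprint the math library at startup and compare the result against a value recorded when the build was made, so that a machine whose sin() or estimate results differ is detected before it can desync anyone:

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

static uint64_t HashDouble(uint64_t hash, double d)
{
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof(bits));      // hash the exact bit pattern
    return (hash ^ bits) * 1099511628211ull;   // FNV-1a style mixing
}

int main()
{
    uint64_t hash = 14695981039346656037ull;
    for (int i = 1; i <= 1000; ++i)
        hash = HashDouble(hash, std::sin(i * 0.01));
    std::printf("math fingerprint: %016llx\n", (unsigned long long)hash);
    // Refuse to join a deterministic session if this doesn't match the
    // fingerprint recorded for this build.
}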

Per-processor code

Some libraries very helpfully supply different code for CPUs with different capabilities. These libraries test for the presence of features like SSE and then use those features if available. This can be great for performance but it adds a new option for getting different results on different CPUs. Watch for these techniques and either test to make sure they give identical results, or avoid them like the plague.

Conversions

Conversion between bases – such as printf("%1.8e"); – is not guaranteed to be identical across all implementations. Doing perfect conversions efficiently was an unsolved problem when the original IEEE standard came out and while it has since been solved this doesn’t mean that everybody does correctly rounded printing. For a comparison between gcc and Visual C++ see Float Precision Revisited: Nine Digit Float Portability. VC++ Dev 14 improves the situation considerably (see Formatting and Parsing Correctness).

While conversion to text is not guaranteed to be correctly rounded, the values are guaranteed to round-trip as long as you print them with enough digits of precision, and this is true even between gcc and Visual C++, except where there are implementation bugs. Rick Regan at Exploring Binary has looked at this issue in great depth and has reported on double values that don’t round-trip when read with iostreams (scanf is fine, and so is the conversion to text), and troublesome values that have caused both Java and PHP to hang when converting from text to double. Great stuff.

So don’t use iostreams, Java, or PHP?
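
For reference, round-tripping through text just requires enough digits – nine significant digits for a float and seventeen for a double – as in this sketch, which assumes the C runtime’s strtof is correctly rounded:

#include <cstdio>
#include <cstdlib>

int main()
{
    float f = 16777217.0f * 0.1f;   // an arbitrary test value
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "%1.8e", f);   // nine significant digits
    float roundTripped = std::strtof(buffer, nullptr);
    std::printf("%s round-trips: %s\n", buffer, f == roundTripped ? "yes" : "no");
}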

Uninitialized data

It seems odd to list uninitialized data as a cause of floating-point indeterminism because there is usually nothing floating-point specific about this. But sometimes there is. Imagine a function like this:

void NumWrapper::Set( T newVal )
{
    if ( m_val != newVal )
    {
        m_val = newVal;
        Notify();
    }
}

If m_val is not initialized then the first call to Set may or may not call Notify, depending on what value m_val started with. However if T is equal to float, and if you are compiling with /fp:fast, and if m_val happens to be a NaN (always possible with uninitialized data) then the comparison may say that m_val and newVal are equal, so m_val will never get updated and Notify will never be called. Yes, a NaN is supposed to compare not-equal to everything, but the always-popular /fp:fast takes away this guarantee.

I’ve run into this bug twice in the last two years. It’s nasty. Maybe don’t use /fp:fast?

Another way that float code could be affected by uninitialized data when integer code is not is a calculation like this:

result = a + b - b;

On every machine I’ve ever used this will set result to a regardless of the value of b, for integer calculations. For floating-point calculations there are many values of b that would cause you to end up with a different result, typically infinity or NaN. I’ve never hit this bug, but it is certainly possible.
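
Any sufficiently large b will do it, as in this little sketch (my values, chosen for illustration, and compiled without fast-math reassociation):

#include <cstdio>

int main()
{
    float a = 1.0f;
    float b = 1e30f;              // the sort of garbage uninitialized memory can hold
    float result = a + b - b;     // a + b rounds to 1e30f, so the contribution of a is lost
    std::printf("%f\n", result);  // prints 0.000000, not 1.000000
}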

Compiler differences

There are many compiler flags and differences that might affect intermediate precision or order of operations. Some of the things that could affect results include:

  • Debug versus release versus levels of optimization
  • x86 versus x64 versus PowerPC
  • SSE versus SSE2 versus x87
  • gcc versus Visual C++ versus clang
  • /fp:fast versus /fp:precise
  • -ffp-contract, -ffloat-store, and -fexcess-precision
  • FLT_EVAL_METHOD
  • Compile-time versus run-time calculations (such as sin())

With jitted languages such as C# the results may also vary depending on whether you launch your program under a debugger, and your code could potentially be optimized while it’s running so that the results subtly change.

Other sources of non-determinism

Floating-point math is a possible source of non-determinism, but it is certainly not the only one. If your simulation frames have variable length, if your random number generators don’t replay correctly, if you have uninitialized variables, if you use undefined C++ behavior, if you allow timing variations or thread scheduling differences to affect results, then you may find that determinism fails for you, and it’s not always the fault of floating-point math. For more details on these issues see the resources section at the end.

Summary

A lot of things can cause floating-point indeterminism. How difficult determinism is to achieve depends on whether you need identical behavior within a single build, across rebuilds as you maintain your code, or across completely different platforms. The stronger your needs, the more difficult and costly the engineering effort will be.

Some people assume that if you use stable algorithms then determinism doesn’t matter. They are wrong. If your network protocol or save game format stores only user inputs then you must have absolute determinism. An error of one ULP (Unit in the Last Place) won’t always matter, but sometimes it will make the difference between a character surviving and dying, and things will diverge from there. You can’t solve this problem just by using epsilons in your comparisons.

If you are running the same binary on the same processor then the only determinism issues you should have to worry about are:

  • Altered FPU settings for precision (x87 only), rounding, or denormal control
  • Uninitialized variables (especially with /fp:fast)
  • Non floating-point specific sources of indeterminism

The details of how your compiler converts your source code to machine code are irrelevant in this case because every player is running the same machine code on the same processor. This is the easiest type of floating-point determinism and, absent other problems with determinism, it just works.

That’s easy enough, but unless you are running on a console you probably have to deal with some CPU variation. If you are running the same binary but on multiple processor types – either different manufacturers or different generations – then you also have to worry about:

  • Different execution paths due to CPU feature detection
  • Different results from sin, cos, estimate instructions, etc.

That’s still not too bad. You have to accept some limitations, and avoid some useful features, but the core of your arithmetic can be written without thinking about this too much. Again, the secret is that every user is executing exactly the same instructions, and you have restricted yourself to instructions with defined behavior, so it just works™.

If you are running a different binary then things start getting sticky. How sticky they get depends on how big a range of compilers, compiler settings, and CPUs you want to support. Do you want debug and release builds to behave identically? PowerPC and x64? x87 and SSE (god help you)? Gold master and patched versions? Maintaining determinism as you change the source code can be particularly tricky, and increasing discipline will be required. Some of the additional things that you may need to worry about include:

  • Compiler rules for generating floating-point code
  • Intermediate precision rules
  • Compiler optimization settings and rules
  • Compiler floating-point settings
  • Differences in all of the above across compilers
  • Different floating-point architectures (x87 versus SSE versus VMX and NEON)
  • Different floating-point instructions such as fmadd
  • Different float-to-decimal routines (printf) that may lead to different printed values
  • Buggy decimal-to-float routines (iostreams) that may lead to incorrect values

If you can control these factors – some are easy to control and some may involve a lot of work – then floating-point math can be deterministic, and indeed many games have been shipped based on this. Then you just need to make sure that everything else is deterministic and you are well on your way to an extremely efficient replay and networking mechanism.

It turns out that there are some parts of your floating-point code that can be non-deterministic and can make use of, for instance, square-root estimate instructions. If you have code that is just driving the GPU, and if the GPU results never affect gameplay, then variations in this code will not lead to divergence.

When debugging the problems associated with a game engine that must be deterministic, remember that despite all of its mysteries there is some logic to floating-point, and in many cases a loss of determinism is actually caused by something completely different. If your code diverges when running the same binary on identical processors then, unless you’ve got a suspicious printer driver, you might want to look for bugs elsewhere in your code instead of always blaming IEEE-754.

Resources

Homework

Explain clearly why printf("%1.16e", 0.1f * 0.1f); can legally print different values, and how that behavior applies to 0.1f * 0.3f * 0.7f.

For extra credit, explain why a float precision version of fmadd cannot easily be implemented on processors that lack it by using double math (or alternately, prove the opposite by implementing fmadd).

47 Responses to Floating-Point Determinism

  1. K Gadd says:

    I once spent a day or two helping another engineer investigate a weird problem they were running into with an application we had written in PHP. For some reason, occasionally test cases would fail, but only after a certain other set of test cases had run. The failure was very strange – the PHP unserialize() function was returning corrupted results in some cases. He couldn’t find any reason for it to break. For example, unserialize() on this input would normally return 4 but was returning 12. (wtf?)

    The fact that a previous test case was causing it to break made me suspicious. Eventually, we dug around and figured out that the previous test case was loading a native module. You may be able to see where this is going. Further investigation finally got us into a state where we had a broken worker thread (where the test case always failed), and we could compare it with a good thread (where the test case passed). Guess what was different?

    The floating point rounding mode. Oh god. The floating point rounding mode affected our ability to unpack integers from strings! :(

    • brucedawson says:

      Uggh. Yeah. Horrible. I guess the unserialize must have been converting to double and then integer, which is a whole other set of problems.

      The native module that changed the rounding mode should be burned with fire.

  2. Daniel Camus says:

    Hello Bruce, very informative! As far as you know, how is the behavior on ARM architectures? The vendor base it’s way too fragmented like the old days, Qualcomm, Nvidia, Texas Instruments, etc. In the company I work for, we face many many nasty bugs (on mobile games), and as you very well said, the flag /fp:fast it’s almost a game requirement. Not to mention the buggy video drivers and most of the shader bugs are related to precision.

    • brucedawson says:

      /fp:fast seems to be less necessary with recent versions of VC++ (2012 and beyond), so try skipping it there.

      I would expect ARM to be similar to SSE from a hardware point of view, but I don’t know how the compilers handle it. It should be easier than trying to get consistency from x87.

  3. ohmantics says:

    And how can I forget debugging a loss-of-sync issue caused by installing the Motorola libmotovec “drop-in replacement math library” on a PowerPC Mac. It caused differences in the static initialization of an 8-bit sincos table our game used.

  4. Smok says:

    I agree – disagree:) That’s what I love in science – there’s not easy Answers:D

  5. saijanai says:

    Squeak Smalltalk sidesteps this issue by using a software-based floating point package guaranteed to yield consistent answers, not just from one calculation to the next, but across platforms as well.
    This allows Croquet/Cobalt to distribute 3D and physics data in the form of Smalltalk messages that invoke calculations locally, rather than as calculations performed on a central server, so that 3D game clients can stay in sync indefinitely while performing all relevant calculations locally.

    • brucedawson says:

      What is the performance impact of this? The real trick is to get deterministic, stable math, that is fast enough to be useful.

      • saijanai says:

        It’s pretty slow compared to a direct CPU call, obviously. OTOH, it makes distributed virtual worlds trivially simple, at least on a small scale. The original strategy for Croquet distributed processing was called “Teatime” and was implemented by David P Reed as a first direct test of his scaleable internet work. Currently, he’s trying to recreate it in JavaScript in the webbrowser.

  6. minusinf says:

    I heard that a company* found a cpu bug within the integer processing unit of a server processor when the processor was running hot. So there’s your intra-cpu issue :)

    I think the internal CPU instruction reordering can also have an impact: Especially if the CPU is running a high number of threads and the results depend on the higher intermediate accuracy of the FPU. Say collision detection and everything has been stored as floats.

    Maybe you need to abduct an Intel engineer for an extended interview ;)

    • brucedawson says:

      Out-of-order execution will never change the results, unless the CPU is horribly buggy. Multiple threads won’t change the results unless you have race conditions, in which case it’s not a floating-point problem, it’s a multi-threading problem.

      I hope to never have to deal with a processor that misbehaves when hot. Although, I do remember years ago spending a long time tracking down a mysterious crash that turned out to be a bad memory chip.

  7. Kibeom Kim says:

    Have you looked at http://nicolas.brodu.net/en/programmation/streflop/ ? It’s pretty comprehensive. I haven’t benchmarked it, but seeing how it is implemented, I think the performance will be still practical.

  9. Joshua Grass says:

    So does that mean you guys are using a deterministic sim approach for DOTA 2? I thought you were using a client server model? I ask because I wrote the deterministic simulation engine for Guardians of Middle Earth and I was always curious which approach the other MOBAs used.

    • brucedawson says:

      I’m just fascinated by these issues, and I’ve had to fix a fair number of floating-point bugs over the years, but I’ve actually never worked on an engine that relied on floating-point determinism, I don’t think.

      • Joshua Grass says:

        We certainly relied on it for Guardians, but luckily we were a fixed platform so we never ran into situations where the floating point math was an issue. Years ago I wrote another deterministic sim for the Mac back when there was 68k and PPC chips. Man, that was a learning experience! Luckily I was in my early twenties so staying up for a week and converting my floating point math to integer math was something I was willing to do…Thanks for the article, I find this stuff fascinating as well.

  10. mortoray says:

    Frighteningly enlightening. Given all these variables and conditions does it even make sense to write something that requires floating point determinism? It seems saner to have the engine assume each client has a different result — though I admit I’m at a loss as to how it would handle that.

    • brucedawson says:

      If you restrict yourself to a single build giving the same results on multiple machines then the challenges are manageable and have indeed been handled. Supporting multiple builds, possibly from different compilers, is indeed terrifying and may be practically impossible, especially if you need to support x87 and SSE, or x87 and PowerPC.

  11. Pascal Cuoq says:

    Hello, Bruce.

    > For extra credit, explain why a float precision version of fmadd cannot easily be implemented on processors that lack it by using double math (or alternately, prove the opposite by implementing fmadd).

    Here is an easy implementation of fmadd for round-to-nearest, a≥0, b≥0, c≥0, all finite. I am no good at generating difficult inputs but I think the idea should be sound even if one or two small bugs remain:

    http://ideone.com/kx7MXE

    • brucedawson says:

      I can’t tell if that is correct or not.

      Storing the result of the multiply in a double is the right idea (as far as I can tell) but since you don’t identify the reason why the obvious implementation (do the add and then store the result) is wrong I really can’t tell whether you have corrected for it or not.

      • Pascal Cuoq says:

        The double-precision multiplication of a and b is a good start because it is exact. If we were to stop now and round to float, we would get the correctly rounded single-precision result of a * b. There would be no double rounding because there was no rounding the first time.

        > you don’t identify the reason why the obvious implementation (do the add and then store the result) is wrong

        This double-precision addition would not be exact. The result would be rounded to double and then rounded to float. IEEE 754’s formatOf would be a great help here, but it is not available in hardware that I know of.

        Instead, the rest of the code follows the same initial idea of using exact computations possibly followed by an ultimate approximate one.

        After s1 and r1 have been computed, the mathematical equality a*b+c = s1+r1 holds between the value of these variables (and r1 is small compared to s1).

        A good candidate for the result of the function is f1 = (float)s1. The correct answer is either f1 or the float on the other side of s1 from f1. This float is computed as f2.

        The last addition in t - 0.5 * ulp + r1 is not necessarily exact, but it is precise around zero, so that the comparisons of r against zero have the same truth values as if r had been computed with infinite precision.

        • Pascal Cuoq says:

          And there probably are bits that are sub-optimal in this function. In particular if (p1 < p2) is perhaps unnecessary, but I don't understand the nuances here very well yet, so I wrote the safe version.

        • brucedawson says:

          In short, the issue is the double rounding at the end. The double precision multiply is exact, the double precision add is inexact (but more accurate than a float fmadd would have been) but when you store the result to a float you may hit double rounding, leading to a slightly different result from a real float fmadd.

          I have no idea whether your solution is correct, but it does seem clever and plausible. I think you’d need to run it for a few hours on random inputs on a number with an fmadd instruction and compare. I’d recommend an AVX machine for that but that comes with its own problems: http://randomascii.wordpress.com/2013/03/11/should-this-windows-7-bug-be-fixed/

  14. Maxime Coste says:

    Hey, very interesting article. I learned most of these things the hard way, making our RTS game handle cross-platform determinism (32/64 bits, MSVC/Windows, Clang/MacOS, GCC/Linux).

    One tip I wanted to add is the ‘%a’ printf format, which is not std C++98 (C99 actually, so C++11 as well) but is supported on all these platforms, and which prints a floating point number in exact representation.

    This saved me many times when tracing execution on different platforms, as %f does not always round the same on different platforms (giving false positives, or false negatives).

    By the way, thanks for the Intermediate floating-point precision article, it was very useful.

  17. Yossarian King says:

    You mention C# once, but just in passing. I don’t suppose you have additional info on floating point determinism in the .NET environment? The C# compiler does not provide the command line options or #pragmas that are available in C++ — should I just assume results are non-deterministic and go roll myself a nice integer-based fixed-point library?

    • brucedawson says:

      I’m afraid I don’t know what the expectations are for C#. One would hope that .Net would have a well defined floating-point model, but a few quick web searches found nothing useful. I would expect that the results in 64-bit processes will be sane and sensible. In 32-bit processes I wouldn’t be so sure, given that platform’s unfortunate x87 FPU legacy. But, I’m just guessing. [edited to fix 64-bit/32-bit reference]

      • Yossarian King says:

        Thanks for the quick response. I think you mean “In *32-bit* processes I wouldn’t be so sure …”? Unfortunately “just guessing” seems to be about as confident as anyone is willing to get with floating point determinism (all your good info above notwithstanding), so I just guess I’ll carry on with fixed point. ;-)

        BTW, the .NET folks are in the process of incorporating SIMD, at which point I’m sure yet more bets come off the table …

      • Yossarian King says:

        (Thanks for the edit. Seems I can’t edit my comment to remove now-spurious correction.)

        Another issue with C# is JIT compilation means that even if you distribute the same build / patch of the executable to all users they aren’t actually running the same machine code. (NGEN doesn’t help, it’s an install-time thing. The new .NET Native initiative *might* help, but it’s still in preview mode, so I’m not sure.)

    • brucedawson says:

      I got curious and tracked down the relevant specification. You can find a link to the C# language specification here:

      http://msdn.microsoft.com/en-us/library/ms228593.aspx

      In section 4.16 it explains the evaluation rules, which basically say that intermediate precision might be higher, might not:


      * Then, if either of the operands is of type double, the other operand is converted to double, the operation is performed using at least double range and precision, and the type of the result is double (or bool for the relational operators).
      * Otherwise, the operation is performed using at least float range and precision, and the type of the result is float (or bool for the relational operators).

      • Yossarian King says:

        Ah, good find. The specification also states “Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an “extended” or “long double” floating-point type [snip]. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects.”

        In other words, /phony French accent/ “Floating point determinism? Ha! I float in your general direction, you and all your … !”

  18. Name Required says:

    I am a bit behind on all this cr.p, but didn’t you forget about “hidden bit” and “context switches” (that on some platforms cause FPU to unload it’s state into memory and subsequently lose hidden state, e.g. hidden bit)?

  19. Name Required says:

    “hidden bit” is described in “What Every Computer Scientist Should Know About Floating-Point Arithmetic”.
    Another piece of “hidden state” is extra bits x87 could be using (long double registers).
    Anyways, I might be talking complete garbage, but I think in old days (Win95? old hardware?) context switch (when scheduler decides to stop your thread and run another one on given CPU) resulted in FPU state dumped into memory (losing hidden state). Naturally, since it was happening at random times, it had random effect on certain calcs.
    I think there were problems with MMX state under similar conditions, too…
    Apologies, if all this never happened.

    • Name Required says:

      Hmm… hidden bit is not what I thought it is. Scratch that, pls. Somehow I mixed “hidden bit” with “guard digit” :-\

      • brucedawson says:

        As you realize, the hidden bit and guard bit are not part of the FPU state. They are used during some calculations but are never visible.

        One problem with the (excellent!) and classic Goldberg paper that you cite is that it is a product of its time which is prior to the widespread adoption of IEEE floating-point math. Thus it spends time on implementation details of floating-point math that users have no need of knowing. This occasionally leads to confusion.

  20. Paul Crawford says:

    I think using an uninitialised variable, of any type, is basically a bug (or VERY bad programming if it was intentional)! Over the years I have seen all sorts of strange behaviour that can be resistant to debugging that turned out to be uninitialised values, so my advice is:
    (1) Keep you functions to manageable size, that way it is easier to spot mistakes.
    (2) Turn on all compiler warnings, and actually listen to them!
    (3) Use other static analysis tools (or a 2nd compiler) as well to get another opinion on what might go wrong.
    (4) If in doubt, initialise to zero (or something predictable) as at least any bugs due to unintended use are then reproducible!

    • brucedawson says:

      One of my favorite C++ features is the humble but crucial ability to declare variables anywhere. This then makes it easy to avoid declaring variables until you are ready to initialize them. In most cases this means that a variable’s lifetime needn’t begin until it is initialized, meaning that the window of opportunity where the variable exists but is not initialized does not exist.

      And, in those rare cases where you can’t initialize your variables with useful data when they are first defined, initialize them with zero, as you suggest.

      Uninitialized member variables are trickier, but even those get easier with C++ 11.

      • Paul Crawford says:

        You can also restrict C variables’ scope by declaring them inside {} regions, but you don’t get the nicety of using for(int ii=0;… of C++ without superfluous brackets. However, one thing that often is missed by compiler warnings, and often by static code tools as well, is duplicate names of differing scope created this way. For example where you have a local variable of the same name as a global or similar, consider:
        int foo(void) {
            int ii, y = 0;
            for (ii = 0; ii < 10; ii++) {
                int y = 5 * ii;
            }
            return y;
        }
        Here 'y' is declared twice, and so the function returns 0 and not 45 (5*9) as one might expect from the for() loop. It might be legal in the language, but it is a potential bug source!

        • brucedawson says:

          Yeah, adding extra {} is a nice option to have but I wouldn’t want to depend on it just so I can declare variables later.

          Variable shadowing is a real problem. gcc/clang have warnings to detect it but these warnings are usually off. VC++ has warnings to detect it but these are only available with the (slow) /analyze option enabled. I asked Microsoft to fix that for Dev 14 and it sounds like they will:

          http://randomascii.wordpress.com/2013/09/09/vote-for-the-vc-improvements-that-matter/

          The problem, however, is that most code has huge amounts of variable shadowing and fixing it is tedious and dangerous, so most developers can’t enable the warnings, so more variable shadowing accumulates. In my tests I’ve found that ~98% of variable shadowing is harmless, but that other 2% can be nasty.
