A few months ago I saw a blog post touting fancy new SSE3 functions for implementing vector *floor*, *ceil*, and *round* functions. There was the inevitable proud proclaiming of impressive performance and correctness. However the *ceil* function gave the wrong answer for many numbers it was supposed to handle, including odd-ball numbers like ‘one’.

The *floor* and *round* functions were similarly flawed. The reddit discussion of these problems then discussed two other sets of vector math functions. Both of them were similarly buggy.

Fixed versions of some of these functions were produced, and they are greatly improved, but some of them still have bugs.

Floating-point math is hard, but testing these functions is trivial, and fast. Just do it.

The functions *ceil,* *floor*, and* round* are particularly easy to test because there are presumed-good CRT functions that you can check them against. And, you can test every float bit-pattern (all four billion!) in about ninety seconds. It’s actually very easy. Just iterate through all four-billion (technically 2^32) bit patterns, call your test function, call your reference function, and make sure the results match. Properly comparing NaN and zero results takes a bit of care but it’s still not too bad.

Aside: floating-point math has a reputation for producing results that are unpredictably wrong. This reputation is then used to justify sloppiness, which then justifies the reputation. In fact IEEE floating-point math is designed to, whenever practical, give the best possible answer (correctly rounded), and functions that extend floating-point math should follow this pattern, and only deviate from it when it is clear that correctness is too expensive.

Later on I’ll show the implementation for my *ExhaustiveTest* function but for now here is the function declaration:

typedef float(*Transform)(float); // Pass in a range of float representations to compare against. // start and stop are inclusive. Pass in 0, 0xFFFFFFFF to scan all // floats. The floats are iterated through by incrementing // their integer representation. void ExhaustiveTest(uint32_t start, uint32_t stop, Transform TestFunc, Transform RefFunc, const char* desc)

Typical test code that uses ExhaustiveTest is shown below. In this case I am testing the original SSE 2 *_mm_ceil_ps2* function that started the discussion, with a wrapper to translate between float and __m128. The function didn’t claim to handle floats outside of the range of 32-bit integers so I restricted the test range to just those numbers:

float old_mm_ceil_ps2(float f) { __m128 input = { f, 0, 0, 0 }; __m128 result = old_mm_ceil_ps2(input); return result.m128_f32[0]; } int main() { // This is the biggest number that can be represented in // both float and int32_t. It’s 2^31-128. Float_t maxfloatasint(2147483520.0f); const uint32_t signBit = 0×80000000; ExhaustiveTest(0, (uint32_t)maxfloatasint.i, old_mm_ceil_ps2, ceil, "old _mm_ceil_ps2"); ExhaustiveTest(signBit, signBit | maxfloatasint.i, old_mm_ceil_ps2, ceil, "old _mm_ceil_ps2"); }

Note that this code uses the Float_t type to get the integer representation of a particular float. I described Float_t years ago in Tricks With the Floating-Point Format .

## How did the original functions do?

*_mm_ceil_ps2* claimed to handle all numbers in the range of 32-bit integers, which is already ignoring about 38% of floating-point numbers. Even in that limited range it had 872,415,233 errors – that’s a 33% failure rate over the 2,650,800,128 floats it tried to handle. *_mm_ceil_ps2* got the wrong answer for all numbers between 0.0 and *FLT_EPSILON* * 0.25, all odd numbers below 8,388,608, and a few other numbers. A fixed version was quickly produced after the errors were pointed out.

Another set of vector math functions that was discussed was DirectXMath. The 3.03 version of DirectXMath’s *XMVectorCeiling* claimed to handle all floats. However it failed on lots of tiny numbers, and on most odd numbers. In total there were 880,803,839 errors out of the 4,294,967,296 numbers (all floats) that it tried to handle. The one redeeming point for *XMVectorCeiling* is that these bugs have been known and fixed for a while, but you need the latest Windows SDK (comes with VS 2013) in order to get the fixed 3.06 version. And even the 3.06 version doesn’t entirely fix *XMVectorRound*.

The LiraNuna / glsl-sse2 family of functions were the final set of math functions that were mentioned. The LiraNuna *ceil* function claimed to handle all floats but it gave the wrong answer on 864,026,625 numbers. That’s better than the others, but not by much.

I didn’t exhaustively test the *floor* and *round* functions because it would complicate this article and wouldn’t add significant value. Suffice it to say that they have similar errors.

## Sources of error

Several of the *ceil* functions were implemented by adding 0.5 to the input value and rounding to nearest. This does not work. This technique fails in several ways:

- Round to nearest even is the default IEEE rounding mode. This means that 5.5 rounds to 6, and 6.5 also rounds to 6. That’s why many of the
*ceil*functions fail on odd integers. This technique also fails on the largest float smaller than 1.0 because this plus 0.5 gives 1.5 which rounds to 2.0. - For very small numbers (less than about
*FLT_EPSILON** 0.25) adding 0.5 gives 0.5 exactly, and this then rounds to zero. Since about 40% of the positive floating-point numbers are smaller than FLT_EPSILON*0.25 this results in a lot of errors – over 850 million of them!

The 3.03 version of DirectXMath’s *XMVectorCeiling* used a variant of this technique. Instead of adding 0.5 they added *g_XMOneHalfMinusEpsilon*. Perversely enough the value of this constant doesn’t match its name – it’s actually one half minus 0.75 times *FLT_EPSILON*. Curious. Using this constant avoids errors on 1.0f but it still fails on small numbers and on odd numbers greater than one.

## NaN handling

The fixed version of *_mm_ceil_ps2* comes with a handy template function that can be used to extend it to support the full range of floats. Unfortunately, due to an implementation error, it fails to handle NaNs. This means that if you call *_mm_safeInt_ps<new_mm_ceil_ps2>()* with a NaN then you get a normal number back. Whenever possible NaNs should be ‘sticky’ in order to aid in tracking down the errors that produce them.

The problem is that the wrapper function uses *cmpgt* to create a mask that it can use to retain the value of large floats – this mask is all ones for large floats. However since all comparisons with NaNs are false this mask is zero for NaNs, so a garbage value is returned for them. If the comparison is switched to *cmple* and the two mask operations (*and* and *andnot*) are switched then NaN handling is obtained for free. Sometimes correctness doesn’t cost anything. Here’s a fixed version:

template< __m128 (FuncT)(const __m128&) > inline __m128 _mm_fixed_safeInt_ps(const __m128& a){ __m128 v8388608 = *(__m128*)&_mm_set1_epi32(0x4b000000); __m128 aAbs = _mm_and_ps(a, *(__m128*)&_mm_set1_epi32(0x7fffffff)); // In order to handle NaNs correctly we need to use le instead of gt. // Using le ensures that the bitmask is clear for large numbers *and* // NaNs, whereas gt ensures that the bitmask is set for large numbers // but not for NaNs. __m128 aMask = _mm_cmple_ps(aAbs, v8388608); // select a if greater then 8388608.0f, otherwise select the result of // FuncT. Note that 'and' and 'andnot' were reversed because the // meaning of the bitmask has been reversed. __m128 r = _mm_xor_ps(_mm_andnot_ps(aMask, a), _mm_and_ps(aMask, FuncT(a))); return r; }

With this fix and the latest version of *_mm_ceil_ps2* it becomes possible to handle all 4 billion floats correctly.

## Conventional wisdom Nazis

Conventional wisdom says that you should never compare two floats for equality – you should always use an epsilon. Conventional wisdom is wrong.

I’ve written in great detail about how to compare floating-point values using an epsilon, but there are times when it is just not appropriate. Sometimes there really is an answer that is correct, and in those cases anything less than perfection is just sloppy.

So yes, I’m proudly comparing floats to see if they are equal.

## How did the fixed versions do?

After the flaws in these functions were pointed out fixed versions of *_mm_ceil_ps2* and its sister functions were quickly produced and these new versions work better.

I didn’t test every function, but here are the results from the final versions of functions that I did test:

- XMVectorCeiling 3.06: zero failures
- XMVectorFloor 3.06: zero failures
- XMVectorRound 3.06: 33,554,432 errors on incorrectly handled boundary conditions
- _mm_ceil_ps2 with _mm_safeInt_ps: 16777214 failures on NaNs
- _mm_ceil_ps2 with _mm_fixed_safeInt_ps: zero failures
- LiraNuna ceil: this function was not updated so it still has

864,026,625 failures.

Exhaustive testing works brilliantly for functions that take a single float as input. I used this to great effect when rewriting all of the CRT math functions for a game console some years ago. On the other hand, if you have a function that takes multiple floats or a double as input then the search space is too big. In that case a mixture of test cases for suspected problem areas and random testing should work. A trillion tests can complete in a reasonable amount of time, and it should catch most problems.

## Test code

Here’s a simple function that can be used to test a function across all floats. The sample code linked below contains a more robust version that tracks how many errors are found.

// Pass in a uint32_t range of float representations to test. // start and stop are inclusive. Pass in 0, 0xFFFFFFFF to scan all // floats. The floats are iterated through by incrementing // their integer representation. void ExhaustiveTest(uint32_t start, uint32_t stop, Transform TestFunc, Transform RefFunc, const char* desc) { printf("Testing %s from %u to %u (inclusive).\n", desc, start, stop); // Use long long to let us loop over all positive integers. long long i = start; while (i <= stop) { Float_t input; input.i = (int32_t)i; Float_t testValue = TestFunc(input.f); Float_t refValue = RefFunc(input.f); // If the results don’t match then report an error. if (testValue.f != refValue.f && // If both results are NaNs then we treat that as a match. (testValue.f == testValue.f || refValue.f == refValue.f)) { printf("Input %.9g, expected %.9g, got %1.9g \n", input.f, refValue.f, testValue.f); } ++i; } }

## Subtle errors

My test code misses one subtle difference – it fails to detect one type of error. Did you spot it?

The correct result for ceil(-0.5f) is -0.0f. The sign bit should be preserved. The vector math functions all fail to do this. In most cases this doesn’t matter, at least for game math, but I think it is at least important to acknowledge this (minor) imperfection. If the compare function was put into ‘fussy’ mode (just compare the representation of the floats instead of the floats) then each of the *ceil* functions would have an additional billion or so failures, from all of the floats between -0.0 and -1.0.

## References

The original post that announced *_mm_ceil_ps2* can be found here – with corrected code:

http://dss.stephanierct.com/DevBlog/?p=8

This post discusses the bugs in the 3.03 version of DirectXMath and how to get fixed versions:

http://blogs.msdn.com/b/chuckw/archive/2013/03/06/known-issues-directxmath-3-03.aspx

This post links to the LiraNuna glsl-sse2 math library:

https://github.com/LiraNuna/glsl-sse2

The original reddit discussion of these functions can be found here:

http://w3.reddit.com/r/programming/comments/1p2yys/sse3_optimized_vector_floor_ceil_round_and_mod/

Sample code for VC++ 2013 to run these tests. Just uncomment the test that you want to run from the body of *main*.

ftp://ftp.cygnus-software.com/pub/Test4Billion.zip

The reddit discussion of this post can be found here.

The hacker news discussion of this post can be found here.

I’ve written before about running tests on all of the floats. The last time I was exhaustively testing round-tripping of printed floats, which took long enough that I showed how to easily parallelize it and then I verified that they round-tripped between VC++ and gcc. This time the tests ran so quickly that it wasn’t even worth spinning up extra threads.

Great post! I’d like to see more exhaustive testing of ALL functions with a fairly limited input space

might I suggest using a monospaced sans font inside a pre tag instead of an italic serif… much more readable. You can easily create and embed code examples using https://gist.github.com/

My preferred toolset for blog writing handles this badly, but I agree it’s important. I went in and fixed up the HTML to display the code sanely. Thanks for the critique.

Sadly I lost most of my intrest in glsl-sse2, thankfully, it’s open source so people can fork it and contribute.

It’s not the best code in the world, but it’s mine and I’m proud of it.

That’s too bad. But, I’m glad you open sourced it so that others can use it and learn from it.

Ah, the tried and true exhaustive search. Had lunch with Doug Hoffman just this weekend, who was discussing a similar situation he found himself in many years ago. He’s written about it at http://www.testingeducation.org/BBST/foundations/Hoffman_Exhaust_Options.pdf

Funny how the times may change, but the same problems keep getting solved over and over.

Excellent!

I was at first surprised they didn’t try writing the sqrt() results to a disk and then comparing them against known-good results from another computer, but I guess this was before multi-GB disks were available. When I did an exhaustive comparison of printf/scanf on gcc and VC++ (https://randomascii.wordpress.com/2013/02/07/float-precision-revisited-nine-digit-float-portability/) I used a shared folder and simple compression to let me store all of the results and transfer them between Windows and Linux without wasting too much of my 750 GB drive.

The way he talked about about the problem, you sometimes have to get creative with your oracle (the component of a highly automated test that is used to verify the correctness of system under test) when solving problems of this mechanical scale.

He had a known good implementation of sqrt() (the article glosses over the details of this part of the problem solving), and simply had the program provide the same input to both implementations. You only have to log an error if the two implementations disagree. Repeat x4billion on a 4096 core “super”computer and you’ve got yourself a “high volume functional equivalence test”.

Finding an oracle is most of the hard thought when solving problems like that. I’ve seen groups compare a system to itself on a different platform (testing a port?), a previous version of itself (a literal regression test!), a competitor (want to test to see if the math lib in OpenOffice Calc is as good as the latest Excel? Wire up the same RNG to both pieces of software and compare results), and even a toy/model implementation (As part of an undergrad research project, I wanted to test how well Windows Media Player handles large playlists. My oracle was a playlist manager I wrote in Ruby in a weekend. Write a script to send the same noise to the interface parts that should work the same- add track, move track, delete track in my case- then run diff on the output m3u files. If the diff output is more than just the generator ID lines, you’ve got a long-sequence bug in either your playlist writer implementation or one of those actions when executed under stress. WMP had just such a problem).

While we’re throwing around links of interest, I’ll point you towards some other resources I’m familiar with on this sphere of problems:

Part of my undergraduate research has been on High Volume Test Automation (we call it HiVAT). As a culmination of that research, I was invited to attend the 12th Workshop On Teaching Software Testing. You can see all the public artifacts of that workshop at http://wtst.org/?page_id=12. There’s an even mix of teaching resources (including what I brought to the table) and research or just distilled rumination on the topic of HiVAT from a practitioner’s point of view.

There is also a lovely database of articles and dissertations on http://kaner.com/?p=163 that I spent most of my freshman year working on, all of which are related to software testing in some way, and many of which are HiVAT-related.

If I think of any others, I’ll drop them here after class.

Glad to see the “TEST ALL THE FLOATS” is actually a good idea. We do this for a bunch of our own math functions in our SIMD related project Now to see how good/bad we fair on this very specific functions.

Thanks for this insightful post

I’m amazed that people don’t try this exhaustive test strategy more often. There are many cases where one can test all possible values (not just floating point). I use an exhaustive test for a date conversion system (used in financial calculator). I use 32 bit values to store the Julian day number,. I go through every value converting day number to day/month/year and back again. I also check that each date succeeded the previous one correctly and that each day of the week is correct. I also check that all last working day, beginning of month and end of month calculations are correct for every possible day number. The whole test suite runs in about 2 minutes (includes some perpetual holiday tests) and is run as part of a general financial calculation library regression test.

I’m not a programmer any more, but 25 years ago I was helping port a large program, written in FORTRAN. Our first step was to upgrade to the latest compiler, but there were differences in the trigonometric functions (SIN, COS, TAN) at very small angles. Sounds trivial but in our program it made a difference, like when running a simulation for 5 (simulated) minutes at time steps of 0.002 seconds. Lots of fun to track down. Later we moved it from a Univac with 36-bit words to something else, and had to make sure we were getting similar results.

Them curly quotes.

Aw dang. Okay, fixed.

Maybe a newb comment, but where would you get a RefFunc from? Your “errors” would only be as good as your RefFunc . With every failed test the question is “Is this a problem in the code (SUT) or a problem in the test? “

In this case I had existing reference functions that were presumed to be correct. If the reference functions had errors then (the hope is) they would follow different patterns so that discrepancies would still be found and investigated. It is surprisingly common to have an existing reference function, either on another machine, or with a slower implementation. If the tests find bugs in both functions — better yet.

If not then see the link in Casey Doran’s comment — write one, using a different algorithm. It’s *so* worth it.

When I was rewriting the CRT math functions for a game console I found that the existing ones had several bugs. That complicated the testing but didn’t stop it (and I chose correctness over compatibility).

Pingback: How to verify that a 64-bit FPU works?CopyQuery CopyQuery | Question & Answer Tool for your Technical Queries,CopyQuery, ejjuit, query, copyquery, copyquery.com, android doubt, ios question, sql query, sqlite query, nodejsquery, dns query, update quer

Reblogged this on The Scoop.

Pingback: Michael Tsai - Blog - There’s Only Four Billion Floats—So Test Them All!

There

areonly four billion. Four billion is plural.Damn! You’re right. And the error is embedded in the URL so I really can’t change it. My grade six English teacher must be so ashamed of me right now.

unit test next post😉

You can correct the test to catch the -0.5 case by comparing the integer values:

if (testValue.i != refValue.i &&

Then it will also catch the case where floor(-0.0f) returns 0.0f.

Also, I corrected the functions in glsl-sse2, maybe this weekend I will do some more througout range testing, at least for the 1-param methods.

That change (comparing the integer representations) is exactly what the full version of my test does. It’s in the sample code (linked at the end).

That’s awesome that you’re fixing up glsl-sse2 — very cool.

Pingback: Январская лента: лучшее за месяц

I guess four billion is not a small amount to test Thanks Javin

It’s plenty small enough to make it practical, which is what really matters.

Pingback: Floating-Point Determinism | Random ASCII

Fix for XMVectorRound is on my blog at and will be in a future SDK update.

Pingback: Known Issues: DirectXMath 3.06 - Games for Windows and the DirectX SDK - Site Home - MSDN Blogs

Pingback: Comparing Floating Point Numbers, 2012 Edition | Random ASCII

Pingback: Tests vs. Types - IT大道-IT大道

Pingback: floats, bits, and constant expressions – hbr

Pingback: Floats, Bits, And Constant Expressions | 神刀安全网

I’ve used a different, hopefully more portable method to iterate over all floats, using only floats. See my first comment in http://math.stackexchange.com/questions/907327/accurate-floating-point-linear-interpolation/

I’m confused. That question seems to be talking about linear interpolating between two floats. Iterating over *all* the floats is not linear interpolation – not in the values of the floats anyway. It’s more like exponential iteration.

If you think you’ve got floating-point only code that iterates over all of the floats then it’s easy enough to test, using the techniques in this post. It sounds like a tricky problem, and for a test harness I’d rather have the certainty of just incrementing the underlying representation – it’s trivially obvious that that works to generate all floats.

Sorry for the confusion. The method is used in this program: http://www.formauri.es/personal/pgimeno/pastes/monotonictest.c and referenced (together with an explanation of how it’s used) in the first of my comments to the OP’s question. I didn’t mean the question itself, but the comments section.

The method basically consists of incrementing the float in 1 ulp intervals, starting with the lowest subnormal, until the incremented value equals the previous value, then it duplicates the ulp and goes on. In this instance it only iterates over all floats from 0 to 1, which is half the floats.

Okay, that link makes more sense. It would be more robust if the “inc *= 2.0;” was in a loop so that you could start anywhere – and a comment to explain the magic 1e-45 would kinda help

It’s good to see that you can increment through all the floats without using the integer representation, but I prefer my method. It feels like it uses less magic.

I understand, it’s just that I cringe when I rely on things that the C standard doesn’t promise (like existence of uint32_t, or that the endianess of a float matches that of an integer, even if both are safe assumptions in most of today’s computers). Relying solely on the IEEE single-precision semantics seems a little bit more robust to me for that reason.

I agree that turning that method into a function of the kind you present here would be useful. The focus of the program was pretty narrow and I didn’t need anything more elaborate at the time.