Please Calculate This Circle’s Circumference

“Please write a C++ function that takes a circle’s diameter as a float and returns the circumference as a float.”

It sounds like the sort of question you might get in the first week of a C++ programming class. And yet. This question is filled with subtlety if you dig into it. Let’s try some solutions.

Updated June 27 to add a link to a sample program, and instructions on how to bloat the executable that it creates by 20 KiB. Updated July 2 to add ‘Take seven’.

Student: How about this?

#include <math.h>
float CalcCircumference1(float d)
{
    return d * M_PI;
}

Teacher: This code may compile. But it may not, because M_PI is not part of the C or C++ standards. If you compile with VC++ 2005 then this will work, but with later versions you need to #define _USE_MATH_DEFINES before including math.h in order to request these non-standard constants. And, having done that, you will have written code that may well not compile on other compilers.
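For completeness, here is one common way people keep using M_PI anyway: opt in on VC++ and supply a fallback where the headers do not provide it. This is a sketch that assumes you accept the non-standard name. The fallback is itself a #define, which the teacher objects to later.

```cpp
// VC++ only exposes M_PI when _USE_MATH_DEFINES is defined before <math.h>
// or <cmath> is first included; other compilers simply ignore the macro.
#define _USE_MATH_DEFINES
#include <cmath>

// Fallback for platforms whose headers do not define M_PI at all.
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

float CalcCircumference1(float d)
{
    // M_PI is a double; cast it so the multiply stays in float (see Take two).
    return d * static_cast<float>(M_PI);
}
```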

Take two

Student: Thank you for your wisdom teacher. I have removed the dependency on the non-standard M_PI constant. How about this?

float CalcCircumference2(float d)
{
    return d * 3.14159265358979323846;
}

Teacher: That’s better. This code will compile and will work as intended. But it is inefficient. You are multiplying a float by a double-precision constant. The compiler will have to convert the float input to a double, and then convert the result from a double to a float. If you compile for SSE2 then this adds two instructions to the dependency chain and may triple the latency! In many contexts this extra cost will not matter, but in an inner loop it can be quite significant.

If you compile for x87 then the conversion to double is free but the conversion to float is expensive, so expensive that some optimizers may omit the conversion to float. That can lead to surprising results, such as CalcCircumference2(d) == CalcCircumference2(d) returning false, because one result may be rounded to float when it is spilled to memory while the other stays at higher precision in a register!
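To make the teacher's point concrete, the function below spells out the conversions that CalcCircumference2 implicitly requests. CalcCircumference2Explicit is a hypothetical name used only for illustration; since the semantics are identical, a compiler is free to generate the same code for both.

```cpp
// The same computation as CalcCircumference2, with the implicit conversions
// written out explicitly.
float CalcCircumference2Explicit(float d)
{
    double wide = static_cast<double>(d);            // float -> double
    double product = wide * 3.14159265358979323846;  // double-precision multiply
    return static_cast<float>(product);              // double -> float on return
}
```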

Take three

Student: Thank you for your wisdom, teacher. I don’t know what SSE2 or x87 are, but I see the elegance and poetry of keeping my types consistent. I will use a float constant. How about this?

float CalcCircumference3(float d)
{
    return d * 3.14159265358979323846f;
}

Teacher: Ah, well done. That ‘f’ at the end of the constant makes all the difference. If you were to look at the generated machine code you would see that it is leaner and cleaner. However, I have stylistic objections. Does it not seem sloppy to have this cryptic constant embedded in the function? Even though ‘pi’ is unlikely to change, it would be cleaner and less error-prone to give it a name and put it in a header file.
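The effect of the ‘f’ suffix can be checked at compile time. This sketch uses std::is_same from <type_traits> to confirm the type of each product; it says nothing about the generated machine code, only about which precision the language requests.

```cpp
#include <type_traits>

// Without a suffix the literal is a double, so float * double -> double.
static_assert(std::is_same<decltype(1.0f * 3.14159265358979323846), double>::value,
              "unsuffixed literal promotes the product to double");

// With the 'f' suffix both operands are float, so the product stays float.
static_assert(std::is_same<decltype(1.0f * 3.14159265358979323846f), float>::value,
              "suffixed literal keeps the product in float");
```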

Take four

Student: Thank you, teacher. This wisdom is much easier to understand. I will put the line of code below in a shared header file and use it in my function. How about this?

const float pi = 3.14159265358979323846f;
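Put together, Take four might look like the following sketch. The header name mymath.h is borrowed from Take seven and CalcCircumference4 is a hypothetical name; in a real project the two parts would live in separate files.

```cpp
// ---- mymath.h (shared header) ----
const float pi = 3.14159265358979323846f;

// ---- any source file that includes mymath.h ----
float CalcCircumference4(float d)
{
    return d * pi;  // float * float: no double conversions
}
```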

Teacher: Well done. By using the ‘const’ keyword you have both indicated that the variable cannot and should not be modified, and allowed it to be placed in a header file. But I’m afraid we must now delve into some subtleties of the C++ linkage rules.

By marking ‘pi’ as const you are also implicitly giving it internal linkage, exactly as if you had marked it static. This is fine for integral types, but for non-integral types (float, double, array, class, struct) there may be storage allocated for this variable, potentially in every translation unit that includes your header file. In some cases you may end up with dozens or even hundreds of copies of the float, thus bloating your executable.

Take five

Student: Are you frickin’ kidding me? Now what?

Teacher: Yeah, it’s a bit of a mess. You could tag your variable with __declspec(selectany) or __attribute__((weak)) in order to tell VC++ and gcc, respectively, that it is okay to retain just one of the many copies of the constant. But since we are in the idealistic world of academia right now, I’m going to insist that you stick to standard C++ constructs.
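A hedged sketch of what Take five describes, assuming MSVC or a GCC-compatible compiler. Both attributes require the variable to have external linkage, hence the explicit extern, which a plain namespace-scope const would otherwise lack. Neither form is standard C++, and the fallback branch simply reverts to Take four.

```cpp
// ---- mymath.h: compiler-specific, NOT standard C++ ----
#if defined(_MSC_VER)
// 'extern' gives the const external linkage, which selectany requires;
// the linker keeps one of the identical definitions.
extern const __declspec(selectany) float pi = 3.14159265358979323846f;
#elif defined(__GNUC__)
// Each translation unit emits a weak definition; the linker keeps one.
extern const float pi __attribute__((weak)) = 3.14159265358979323846f;
#else
// Fallback: one internal-linkage copy per translation unit, as in Take four.
const float pi = 3.14159265358979323846f;
#endif
```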

Take six

Student: You mean like this? Using the C++11 constexpr?

constexpr float pi = 3.14159265358979323846f;

Teacher: Yes. Your code is now perfect. Of course it won’t compile with VS 2013 because that compiler doesn’t support constexpr. But you can always use the Visual C++ Compiler Nov 2013 CTP toolset, or wait for Dev 14. Or use recent versions of gcc or clang.
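A minimal sketch of Take six, assuming a C++11 compiler. Making the function constexpr as well (CalcCircumference6 is a hypothetical name) lets calls with constant arguments be evaluated at compile time, which the static_assert demonstrates. Note that a namespace-scope constexpr variable is implicitly const and so still has internal linkage; what constexpr adds is the guarantee that its value is a compile-time constant the compiler can fold into every use.

```cpp
// C++11: the value is guaranteed to be known at compile time.
constexpr float pi = 3.14159265358979323846f;

// A C++11 constexpr function: a single return statement.
constexpr float CalcCircumference6(float d)
{
    return d * pi;
}

// Evaluated entirely by the compiler.
static_assert(CalcCircumference6(1.0f) == pi, "circumference of unit diameter is pi");
```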

Student: Can I use a #define?

Teacher: No!

Student: Screw this. I’m quitting school to become a barista.

Take seven (newly added July 2, 2014)

Student: Wait, I just remembered. This is easy. I just have to do this:

mymath.h:
extern const float pi;

mymath.cpp:
extern const float pi = 3.14159265358979323846f;

Teacher: Indeed that is the correct solution in many cases. But what if you are building a DLL, and mymath.h is included by functions outside of that DLL? Now you have to deal with the complexity and cost of exporting and importing this symbol.
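The DLL case usually ends up with an export/import macro along these lines. MYMATH_API and MYMATH_BUILDING_DLL are hypothetical names; the pattern assumes a Windows toolchain that understands __declspec(dllexport) and __declspec(dllimport), and on other platforms the macro expands to nothing. Header and definition appear in one block only so that it compiles on its own.

```cpp
// This single-file sketch plays the role of the DLL's own build, so it
// defines the hypothetical MYMATH_BUILDING_DLL itself. In a real project the
// DLL's build settings would define it and client builds would not.
#define MYMATH_BUILDING_DLL

// ---- mymath.h ----
#if defined(_WIN32)
#  if defined(MYMATH_BUILDING_DLL)
#    define MYMATH_API __declspec(dllexport)  // the DLL exports pi
#  else
#    define MYMATH_API __declspec(dllimport)  // clients import pi
#  endif
#else
#  define MYMATH_API  // no decoration with default ELF symbol visibility
#endif

extern MYMATH_API const float pi;

// ---- mymath.cpp (built into the DLL): the one definition ----
extern MYMATH_API const float pi = 3.14159265358979323846f;
```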

Ultimately the problem is the confusion caused by the rules being totally different for integral types. It is appropriate and recommended to put this in a C++ header file:

const int pi_i = 3;

It’s not a very accurate version of pi, but the point is that integral constants in header files don’t allocate storage, whereas non-integral constants do. This distinction is poorly understood, and occasionally important.
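One visible consequence of the different rules: a const int initialized with a constant is itself a constant expression, so the compiler can fold every use and rarely needs an object for it, whereas a const float only becomes usable at compile time if it is declared constexpr. The sketch below, with hypothetical names pi_f and sizedByPi, shows the difference; the commented-out line is expected to be rejected by a conforming compiler.

```cpp
const int pi_i = 3;                          // integral constant expression
const float pi_f = 3.14159265358979323846f;  // const, but not a constant expression

// pi_i can size an array and appear in static_assert, so its uses can be
// folded away without ever loading it from memory.
int sizedByPi[pi_i];
static_assert(pi_i == 3, "pi_i is usable at compile time");

// pi_f is not usable in constant expressions (only constexpr would make it
// so), so this line would not compile if uncommented:
// static_assert(pi_f > 3.0f, "error: pi_f is not a constant expression");
```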

I learned about the implied ‘static’ in ‘const’ a few years ago when I was asked to investigate why one of our key DLLs had suddenly gotten 2 MB larger. It turned out there was a const array in a header file and we had thirty copies of it in the DLL. So sometimes it does matter.

And yes, I still think that using a #define is a terrible solution. It may be the least-worst solution, but that makes me unhappy. I once dealt with compile errors caused by a #define of ‘pi’ and they did not make me happy. Namespace pollution is the main reason why #define should be avoided as much as possible.

Conclusion

I’m not sure what the lesson is here. The problems of putting const float (or const double, or const structures or arrays) in header files are not well understood. Most large programs have duplicate static const variables because of this, and sometimes they are of non-trivial size. I think that constexpr solves this but I haven’t used it enough to be certain.

I have seen programs waste hundreds of KB because of a const array defined in a header file. I have also seen a program that ended up with 50 copies of a class object (plus 50 constructors and destructors) because it was defined as const in a header file. Something to be aware of.

You can see that this happens with gcc by downloading a test program here. Build it with make and then run “objdump -d constfloat | grep flds” to see the four loads from four adjacent data segment addresses (flds is an x87 instruction; an x86-64 build would typically use movss instead). FWIW. If you want to waste more space then add this to header.h:

const float sinTable[1024] = { 0.0, 0.1, };

With gcc this will waste 4 KiB per translation unit (source file) for a total of 20 KiB of bloat in the final executable, even though the table is never referenced.
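The fix for the table is the same as in Take seven: declare it in the header and define it once. A sketch, assuming a hypothetical sintable.cpp holds the definition; both parts appear in one block only so it compiles standalone.

```cpp
// ---- header.h: declaration only, so no storage in each translation unit ----
extern const float sinTable[1024];

// ---- sintable.cpp (hypothetical file): the single definition ----
// Elements without an explicit initializer are zero-initialized.
extern const float sinTable[1024] = { 0.0f, 0.1f, };
```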

As usual, floating-point math is full of complexities, although in this case I think the C++ language’s slow evolution is more to blame.

Some more reading:

http://stackoverflow.com/questions/3709207/c-semantics-of-static-const-vs-const


About brucedawson

I'm a programmer, working for Valve (http://www.valvesoftware.com/), focusing on optimization and reliability. Nothing's more fun than making code run 5x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Floating Point, Programming.

46 Responses to Please Calculate This Circle’s Circumference

  1. alandovos says:

    Am I the only one that was reminded of https://www.youtube.com/watch?v=r_pqnsKWlpc?

  2. If you declare your variable as extern const, and add the definition (and value) in a cpp file, I guess you avoid the redundant storage allocation? I’m not sure if this is right.
    It also reduces compilation time in case you have to modify its value.

    • brucedawson says:

      Yep, that works. Maybe I should have listed that solution, but honestly I don’t like it. Or, perhaps more accurately, I don’t like that it might be the best solution.

      It has two problems. One is that it is inconsistent with the best-practices for a const int, where putting them in the header file is recommended. The second is it creates a need for an appropriate .cpp file to put it in. Not necessarily a big deal, but additional complexity when you just want a float constant.

      I guess ultimately it is the inconsistency with the advice for ‘const int’ that makes it complicated.

      • Godji says:

        This is the solution we tend to use ourselves, but your exchange made me realize that what I am missing is the other side of the issue: besides the first part of your post about the implicit double/float conversions, when it comes to declaring your constant the difference between int and float types eludes me.

  3. bittermanandy says:

    They keep improving C++, but it’s ultimately still as much of a mess as it ever was.

  4. peter says:

    What’s wrong with a #define?

    • brucedawson says:

      The biggest problem with a define is namespace pollution. I worked on a project that had a #define pi once and it would occasionally cause horrible build breaks — a function parameter named pi conflicted with it, for instance.

      All uppercase and verbose names are at less risk, but #defines are ugly, ugly, ugly.

  5. If you care about executable size, you must also take into account code size. If you use an inline float literal, what will that actually compile to? For me, on x86_64, compiling with clang++ (-O3), it sticks the constant into the .rodata section and refers to it (using RIP-relative addressing) from the mulss instruction (the function itself of course is only two instructions: mulss, ret).

    In my simple test, clang did not coalesce identical constants within the translation unit. In fact, it did the opposite: the const float twopi value at the top of the translation unit was pulled into the functions that used it, so it behaved exactly the same as the inline float literal version (i.e., each function got its own copy of the constant).

    Using @CarniBlood’s suggestion of declaring an extern const float (I didn’t bother providing a definition since I was only compiling, not linking) was better: each instance of the function using that extern float produced a mulss that referred to the same data.

    Anyway, the point is: Yes, there may be storage allocated for static const float values in translation units, but that is *also* true (in practice) for inline float literals. Those four bytes have to be put *somewhere*, whatever you do. If space is a concern, your best bet, as usual, is to examine your actual compiler output (and consider using extern).

    • jpabartholomew says:

      Tried with g++ -O3: GCC was smart enough to coalesce the identical constants (both the named constant and inline float literal) into a single value in the .rodata section.

      • brucedawson says:

        Are you accusing GCC of being nonconformant? I believe that the compiler is *required* to keep the constants separate.

        Anyway, I did test with gcc. I had four copies of CalcCirc in four different source files, all including the same header file that defined pi_f. I then ran “objdump -d constfloat | grep flds” and observed that the four copies loaded the const float from four different data segment addresses. The four copies of the const float were grouped together, far from the machine code. I’m not sure why the float constants were spaced eight bytes apart — that’s odd.

        • jpabartholomew says:

          This may have been ambiguity on my part: I mean it coalesced (bit-identical) float constants within a single translation unit. I hadn’t tested multiple TUs.

          Where does the standard say it must keep constants separate?

          • brucedawson says:

            I am moderately certain that the standard says that the constants must be kept separate, but I was unable to find any language in the standard that makes that clear. There are many interacting concepts and I gave up. So, I might be wrong, but if VC++ and gcc both create duplicates and require special options to override this then the standard probably requires them to be separate.

      • Jason Schulz says:

        Without the optimization options, and just ‘-flto’ g++ also rolls the constant into .rodata. I tried clang++ as well, and it even rolled pre-computed values into the .rodata section (whole program). I don’t have a copy of VS, so I’m not sure whether it would do the same, but if I had to hazard a guess, I’d guess it would.

        • brucedawson says:

          I added a link to a test project for gcc. Give it a try?

          • Jason Schulz says:

            Odd, I used circumference and area, but the same diameter :).

            With the compilers on my Linux box (GCC 4.8.1, Clang 3.4.1), g++ rolls the pi_f constant into the .rodata section, and does an instruction relative movss…

            objdump -d

            0000000000400663 :
            400663: 55 push %rbp
            400664: 48 89 e5 mov %rsp,%rbp
            400667: f3 0f 11 45 fc movss %xmm0,-0x4(%rbp)
            40066c: f3 0f 10 4d fc movss -0x4(%rbp),%xmm1
            400671: f3 0f 10 05 53 01 00 movss 0x153(%rip),%xmm0 # 4007cc
            400678: 00
            400679: f3 0f 59 c1 mulss %xmm1,%xmm0
            40067d: f3 0f 11 45 f8 movss %xmm0,-0x8(%rbp)
            400682: 8b 45 f8 mov -0x8(%rbp),%eax
            400685: 89 45 f8 mov %eax,-0x8(%rbp)
            400688: f3 0f 10 45 f8 movss -0x8(%rbp),%xmm0
            40068d: 5d pop %rbp
            40068e: c3 retq

            clang++ stores a pre-computed value in the .rodata section and does an instruction relative movsd before the call to printf…

            objdump -d

            00000000004005b0 :
            4005b0: 55 push %rbp
            4005b1: 48 89 e5 mov %rsp,%rbp
            4005b4: f2 0f 10 05 dc 00 00 movsd 0xdc(%rip),%xmm0 # 400698
            4005bb: 00
            4005bc: bf a0 06 40 00 mov $0x4006a0,%edi
            4005c1: b0 01 mov $0x1,%al
            4005c3: e8 e8 fe ff ff callq 4004b0
            4005c8: bf a0 06 40 00 mov $0x4006a0,%edi
            4005cd: b0 01 mov $0x1,%al
            4005cf: f2 0f 10 05 c1 00 00 movsd 0xc1(%rip),%xmm0 # 400698
            4005d6: 00
            4005d7: e8 d4 fe ff ff callq 4004b0

            So, at least at first glance it looks like LTO might be an option. I’m not sure how it scales relative to the size of the codebase though.

            (sorry for the copious amounts of text)

        • Jason Schulz says:

          Just to expand a bit, neither compiler will coalesce the constants without explicitly specifying LTO. If no optimization level is specified at link time, GCC will take the higher TU optimization level, and generally stricter code generation option, but it doesn’t enable any additional optimization options with ‘-flto’ (https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Optimize-Options.html#Optimize-Options -flto). So, it looks like coalescing is part of its intermediate merge phase.

          Clang in particular also won’t pre-compute the value without LTO being specified. So it looks like ‘-flto’ may imply an additional optimization option, or Clang does pre-computing as part of its intermediate merge phase. I haven’t tested it yet, but it might be interesting to see if it will also store pre-computed library function calls.

      • jpabartholomew says:

        Ok, more information. I tried Bruce’s provided test project and fiddled with different optimisation options. Rather than try to describe it I’ll just point to the documentation: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

        Specifically, look up -fmerge-constants (which is enabled automatically with -O1 and higher), and -fmerge-all-constants (which is not enabled automatically and is documented as being non-conforming).

        • jpabartholomew says:

          Of course, this relies on the linker to merge identical values from different object files. The ELF object format includes a section flag (‘M’) to indicate that the section is mergeable ( https://sourceware.org/binutils/docs/as/Section.html ). Clang also puts global static float constants into a mergeable rodata section when optimisations are turned on, so it produces the same final output, although (unlike gcc), clang doesn’t appear to merge constants itself within the TU, it just leaves multiple copies for the linker to clean up, which is why I thought it wasn’t coalescing them.

      • jpabartholomew says:

        And even more:

        If your code takes the address of the constant, then the compiler is no longer allowed to merge it with others, so you’ll get a separate copy.

        Really all that’s happening (I think) is that when you use the value, gcc will inline it and (because it still has to put the number somewhere) put one copy in a mergeable rodata section; the linker will eliminate duplicate values across translation units. If you *only* use the value and never use the address of the value, then the original constant is dropped entirely, leaving you with just the one copy in the executable. The original constant itself is not mergeable.

        • brucedawson says:

          Just for a test I added const float sinTable[1024] = { 0.0, 0.1, }; to header.h in my sample code. This declares a static const array of size 4 KiB. It is never used but it still gets put in every translation unit and bloats the executable by 20 KiB. Fascinating.

          • jpabartholomew says:

            For gcc you can pass -fno-keep-static-consts to get rid of it, which I guess makes sense if not all of the files that include the header actually use the table. I don’t know if that’s turned on by any of the normal optimisation levels, or how it relates to what I found (or thought I found) earlier w.r.t gcc dropping unused static const float vars. You’ll still end up with one copy of the table for every file that references the thing though. I didn’t find any way to convince gcc to merge multiple copies of the table, even with -flto and -fmerge-all-constants.

            Though for big tables like that perhaps making them extern and isolating the data in a single translation unit is less onerous.

          • brucedawson says:

            I would absolutely agree that putting the table in a .cpp file is the correct thing to do. And, that is the correct thing to do with any C or C++ compiler.

    • brucedawson says:

      Excellent analysis about executable size. Yes, having the constant adjacent to the code may end up being more efficient overall.

      The times that I’ve been really bitten by the duplicated instances is with other types — structures and arrays that were defined in header files. Then the duplicates become expensive. In one case I found a class object, with constructor and destructor, defined as const in a header file. I found it because I noticed 50 copies of the constructor, one for each copy of the object! That was a waste.

  6. Hoon says:

    Bad teacher. Teach early optimization first. :)

    Anyway, do compilers really do the double-to-float conversion at run time? I thought most compilers would take care of it implicitly.

    • brucedawson says:

      If the programmer requests double precision math (with later rounding to float) then the compiler has to do that — otherwise the compiler has dangerously changed the semantics of the program. That is, if you use a double-precision constant then you are requesting double-precision math and the compiler must honor that.

  7. Kat Marsen says:

    It’s a mess. In C++, this will do the trick, at the expense of “looking” expensive (or more to the point, not looking like a constant):

    inline float FloatPi() { return 3.14159265358979323846f; }

    I have the same problems with enumerations… that they’re not any particular type, and are frequently signed, means mixing an enum with a size_t will garner all sorts of compiler warnings. You can cast the warning away, if you don’t mind typing…

    enum { HeaderSize = 20, };
    if (m.size() < static_cast<size_t>(HeaderSize)) …

    But const size_t HeaderSize = 20; would be so much better.

    • brucedawson says:

      I dislike the inline function because then debug builds are likely to be slower. I’ve actually done some work to make our debug builds as fast as possible while still being non-optimized for easy debugging. Having games playable in debug builds is wonderful.

  8. Ahmed Fasih says:

    Discussion with a colleague identifies another problem with Take One: VS will complain about loss of precision when downcasting M_PI (a double) to float.

  9. FergoTheGreat says:

    What about the C++14 way?

    // Note: this name clashes with the M_PI macro if <math.h> defines it.
    template <typename T>
    constexpr T M_PI = T(3.1415926535897932);

    constexpr float CalcCircumference3(float d)
    {
        return d * M_PI<float>;
    }

  10. With a compiler with no constexpr one can define large (for a suitable value of “large”) constants in header files either like Meyers’ singletons in inline functions, or via the templated constant trick. Still, for π I would use an #ifdef and then M_PI. It’s a shame that apparently nobody on the committee is interested in standardizing existing practice.
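A sketch of the two pre-constexpr techniques this comment mentions, under the assumption that “the templated constant trick” means a static data member of a class template (MathConstantsT, MathConstants and PiRef are hypothetical names). In both cases the language guarantees a single object across all translation units, and in practice the linker discards the duplicate definitions.

```cpp
// Technique 1: a static data member of a class template may be defined in a
// header; it has external linkage, so the program sees exactly one object.
template <typename Dummy>
struct MathConstantsT
{
    static const float pi;
};

template <typename Dummy>
const float MathConstantsT<Dummy>::pi = 3.14159265358979323846f;

typedef MathConstantsT<void> MathConstants;  // use as MathConstants::pi

// Technique 2: a function-local static inside an inline function, in the
// style of a Meyers singleton. All translation units share the one object.
inline const float& PiRef()
{
    static const float value = 3.14159265358979323846f;
    return value;
}
```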

  11. Why not just use “static const float pi” as your take six?

    • brucedawson says:

      The ‘static’ is redundant. At global scope ‘const’ already implies static, so adding static doesn’t change anything. It still leaves the problem of getting multiple copies of the float. This isn’t so terrible for a float, but for larger data types it can lead to a lot of wasted space in the executable.

  12. bilbothegravatar says:

    You know, that’s interesting (and annoying) and all, but …

    “Most large programs have duplicate static const variables because of this, and sometimes they are of non-trivial size. …” / “… I have seen programs waste hundreds of KB because of a const array defined in a header file. …”

    I’d claim that to waste any non-trivial amount of space, your program must already be of very non-trivial size. So with the program itself already being “large”, the wasted size should then be relatively small again. (OK, maybe there are some large array edge cases, and the thing with the 50 objects is certainly worth knowing.)

    If constexpr isn’t the solution, then what is? Shouldn’t the Linker-Optimizer already be good enough to remove all that fluff without any additional declspec magic?

    As always, more questions’n answers.

    • brucedawson says:

      The case where I saw lots of waste (a couple of MB if I recall correctly) was because somebody put a const array definition in a header file, and many copies of it got linked in. This could happen on a project of any size, and it was a noticeable percentage of the DLL size.

      The solution in general is to put an extern declaration in the header file and put the definition in a .cpp file, just like with normal variables.

      I should have listed that as one of the solutions, but then I wouldn’t have had as much fun with the wrapup. Tradeoffs.

  13. I was wondering if nesting the constant in a namespace would yield better results, as the constant would not be at “global scope” strictly speaking. Unsurprisingly, it doesn’t.

  14. Wouldn’t guarding a collection of constants in a header prevent the space being reallocated for each variable?

    e.g.
    #ifndef MASSIVELY_LONG_AND_UNIQUE_DEFINE_NAME
    #define MASSIVELY_LONG_AND_UNIQUE_DEFINE_NAME
    //…
    const float pi = 3.14159265358979323846f;
    const int meaningOfLife = 42;
    //…
    #endif

    • brucedawson says:

      The whole purpose of a header file is to be included, and parsed, from multiple translation units. Each translation unit that sees “const float pi = …” is likely to allocate a separate copy of storage for it. That is the problem. If you have “const float sinTable[1024] = …” then it is a much bigger problem.

      So, include guards don’t change things at all.

      No storage is allocated for the “const int meaningOfLife = …”, ’cause that’s what the standard says.

  15. Jeremy Laumon says:

    I just discovered a related sad story. In our engine we have a few other global constants in addition to PI, with this kind of declaration:
    const float MTH_PIBY2 = MTH_PI / 2.f;
    const float MTH_DEGTORAD = MTH_PI / 180.f;
    When compiled with no optimization (with VC++ 2012), each of these declarations actually generates a dynamic initialization function. And these dynamic initializers apparently get called in every .cpp where the header is included, even if the constant is never used.
    A simple breakpoint with a trace shows that every one of these initializers was called about 300 times at launch.

    For PI-related constants, it’s not a big problem; those were silly constants anyway. But we also have many gameplay constants in one of our games where this kind of dependency is particularly useful. And if we move those values to a cpp, they will be separated from the int constants and it could also potentially break some constant propagation in release builds, which is not great.

    Let’s hope constexpr becomes widely supported soon!
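With constexpr, the constants from this comment would be constant-initialized, so no dynamic initializer function is generated for them, even in unoptimized builds. A sketch, assuming a float value for MTH_PI since the engine’s own definition isn’t shown.

```cpp
// Assumed definition; the engine's real MTH_PI is not shown in the comment.
constexpr float MTH_PI = 3.14159265358979323846f;

// constexpr variables are constant-initialized: the values are computed by
// the compiler, so there is nothing to run at startup.
constexpr float MTH_PIBY2 = MTH_PI / 2.f;
constexpr float MTH_DEGTORAD = MTH_PI / 180.f;

// Halving and then doubling a normal float is exact.
static_assert(MTH_PIBY2 * 2.f == MTH_PI, "derived constant computed at compile time");
```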

    • brucedawson says:

      The fun thing is that you are probably getting many of those redundant dynamic initializations in your release builds as well. It’s not the end of the world, but it is ugly if nothing else. I agree that it is disappointing to have to separate the float constants from the integer constants.

  16. Ahmed Saleh says:

    Well, if high-level language compilers are producing problems, we could use the processor’s FPU and assembly and just write the function at the lowest level.

    • Ahmed Saleh says:

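      Something like this would work (32-bit MSVC inline assembly):
      float f_pi = 3.14159265358979323846f;
      float f_circum;
      float f_diameter = 2.0f; // example input
      _asm {
          FLD f_pi        // push pi onto the x87 stack
          FMUL f_diameter // st(0) = pi * diameter
          FSTP f_circum   // store the result and pop the x87 stack
      } // end asm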

  17. Ahmed Saleh says:

    All the cases that you have mentioned would really make big problems on embedded systems :/, especially low-end microcontrollers…

  18. Anonymous says:

    Go back to binary. Create a sequence of shifts and adds that are equivalent to xPi. Encode this as a binary string, and write a wee feisty engine loop to process it.

  19. ayidi says:

    PhysX header files have this problem (Take One). It can be a pain to compile the library.
