Please Calculate This Circle’s Circumference

“Please write a C++ function that takes a circle’s diameter as a float and returns the circumference as a float.”

It sounds like the sort of question you might get in the first week of a C++ programming class. And yet. This question is filled with subtlety if you dig into it. Let’s try some solutions.

Updated June 27 to add a link to a sample program, and instructions on how to bloat the executable that it creates by 20 KiB. Updated July 2 to add ‘Take seven’. Updated Jan 5, 2015 to add VC++ section at the end.

Student: How about this?

#include <math.h>
float CalcCircumference1(float d)
{
    return d * M_PI;
}

Teacher: This code may compile. But it may not. For M_PI is not part of the C or C++ standards. If you compile with VC++ 2005 then this will work, but with later versions you need to #define _USE_MATH_DEFINES prior to including math.h in order to request these non-standard constants. And, having done that you will have written code which may well not compile on other compilers.
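One hedged way to make Take one build on more compilers is to request the constants before the include and supply a fallback if M_PI is still missing. This is only a sketch: the name CalcCircumference1Portable is made up for illustration, and the fallback #define is exactly the kind of macro the teacher objects to later on.

```cpp
// Ask VC++ for the non-standard constants; this must come before the first
// include of <cmath> or <math.h> in the translation unit.
#define _USE_MATH_DEFINES
#include <cmath>

// Fallback for compilers that still do not provide M_PI (it is not standard).
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Hypothetical name, not from the original post.
float CalcCircumference1Portable(float d)
{
    // M_PI is a double, so the multiply happens in double precision and the
    // result is narrowed back to float explicitly.
    return static_cast<float>(d * M_PI);
}
```

The explicit static_cast also avoids the loss-of-precision warning that some compilers emit when a double is implicitly narrowed to float.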

Take two

Student: Thank you for your wisdom teacher. I have removed the dependency on the non-standard M_PI constant. How about this?

float CalcCircumference2(float d)
{
    return d * 3.14159265358979323846;
}

Teacher: That’s better. This code will compile and will work as intended. But it is inefficient. You are multiplying a float by a double-precision constant. The compiler will have to convert the float input to a double, and then convert the result from a double to a float. If you compile for SSE2 then this adds two instructions to the dependency chain and may triple the latency! In many contexts this extra cost will not matter, but in an inner loop it can be quite significant.

If you compile for x87 then the conversion to double is free but the conversion to float is expensive – so expensive that some optimizers may omit the conversion to double which can lead to surprising results, such as CalcCircumference(r) == CalcCircumference(r) returning false!
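To make the hidden conversions visible, here is a small sketch that spells out what the compiler is being asked to do in Take two. The function name CalcCircumference2Explicit is hypothetical; the static_assert simply confirms that a float times a double constant has type double.

```cpp
#include <type_traits>

// Multiplying a float by a double constant produces a double.
static_assert(std::is_same<decltype(1.0f * 3.14159265358979323846), double>::value,
              "float * double is evaluated as double");

// Take two with the implicit conversions written out (hypothetical name).
float CalcCircumference2Explicit(float d)
{
    const double widened = static_cast<double>(d);            // float -> double
    const double product = widened * 3.14159265358979323846;  // double multiply
    return static_cast<float>(product);                       // double -> float
}
```

The two casts correspond to the two extra conversion steps the teacher describes; the source code of Take two hides them, but the compiler still has to honor them.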

Take three

Student: Thank you for your wisdom teacher. I don’t know what SSE2 or x87 are but I see the elegance and poetry of keeping my types consistent. I will use a float constant. How about this, says the student?

float CalcCircumference3(float d)
{
    return d * 3.14159265358979323846f;
}

Teacher: Ah, well done. That ‘f’ at the end of the constant makes all the difference. If you were to look at the generated machine code you would see that it is leaner and cleaner. However I have stylistic objections. Does it not seem sloppy to have this cryptic constant embedded in the function? Even though ‘pi’ is unlikely to change it would be cleaner and less error prone to give it a name and put it in a header file.

Take four

Student: Thank you teacher. This wisdom is much easier to understand. I will put the line of code below in a shared header file, and use it in my function. How about this, says the student?

const float pi = 3.14159265358979323846f;

Teacher: Well done. By using the ‘const’ keyword you have both indicated that the variable cannot and should not be modified, and you have allowed it to be placed in a header file. But, I’m afraid we must now delve into some subtleties of the C++ scope rules.

By marking ‘pi’ as const you are also implicitly marking it as static. This is fine for integral types, but for non-integral types (float, double, array, class, struct) there may be storage allocated for this variable, potentially in every translation unit that includes your header file. In some cases you may end up with dozens or even hundreds of copies of the float, thus bloating your executable.

Take five

Student: Are you frickin’ kidding me? Now what?

Teacher: Yeah, it’s a bit of a mess. You could tag your variable with __declspec(selectany) or __attribute__((weak)) in order to tell VC++ and gcc respectively that it is okay to just retain one of the many copies of the constant. But since we are in the idealistic world of academia right now I’m going to insist that you stick to standard C++ constructs.
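For the curious, a rough sketch of what that non-standard tagging might look like is below. PI_SELECTANY, pi_shared and CalcCircumference5 are hypothetical names invented for this example, and the exact behavior depends on the compiler and linker, so treat it as an illustration of the idea, not a recommendation.

```cpp
// PI_SELECTANY is a hypothetical macro name used only for this sketch.
#if defined(_MSC_VER)
#define PI_SELECTANY __declspec(selectany)
#else
#define PI_SELECTANY __attribute__((weak))
#endif

// 'extern' gives the constant external linkage so that the linker can see the
// duplicate definitions from each translation unit and keep just one of them.
extern PI_SELECTANY const float pi_shared = 3.14159265358979323846f;

// Hypothetical function name, not from the original post.
float CalcCircumference5(float d)
{
    return d * pi_shared;
}
```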

Take six

Student: You mean like this? Using the C++11 constexpr?

constexpr float pi = 3.14159265358979323846f;

Teacher: Yes. Your code is now perfect. Of course it won’t compile with VS 2013 because that compiler doesn’t support constexpr. But you can always use the Visual C++ Compiler Nov 2013 CTP toolset, or wait for Dev 14. Or use recent versions of gcc or clang.

Student: Can I use a #define?

Teacher: No!

Student: Screw this. I’m quitting school to become a barista.

Take seven (newly added July 2, 2014)

Student: Wait, I just remembered. This is easy. I just have to do this:

mymath.h:
extern const float pi;

mymath.cpp:
extern const float pi = 3.14159265358979323846f;

Teacher: Indeed that is the correct solution in many cases. But what if you are building a DLL, and mymath.h is included by functions outside of that DLL? Now you have to deal with the complexity and cost of exporting and importing this symbol.

Ultimately the problem is the confusion caused by the rules being totally different for integral types. It is appropriate and recommended to put this in a C++ header file:

const int pi_i = 3;

It’s not a very accurate version of pi, but the point is that integral constants in header files don’t allocate storage, whereas non-integral constants do. This distinction is poorly understood, and occasionally important.
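A small sketch of the asymmetry, under the assumption of a C++11 compiler: a const int initialized with a constant can be used wherever a compile-time constant is required, which is part of why the compiler need not give it storage, while a plain const float cannot. The names digits, pi_f and pi_cx are made up for this example.

```cpp
// An integral const initialized by a constant is a compile-time constant.
const int pi_i = 3;
static_assert(pi_i == 3, "const int is usable at compile time");
int digits[pi_i] = {3, 1, 4};  // fine: pi_i can be an array bound

// A const float is an ordinary read-only object, not a constant expression.
const float pi_f = 3.14159265358979323846f;
// static_assert(pi_f > 3.0f, "");  // would not compile: pi_f is not a constant expression

// constexpr (C++11) makes the float usable at compile time as well.
constexpr float pi_cx = 3.14159265358979323846f;
static_assert(pi_cx > 3.0f, "constexpr float is usable at compile time");
```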

I learned about the implied ‘static’ in ‘const’ a few years ago when I was asked to investigate why one of our key DLLs had suddenly gotten 2 MB larger. It turns out there was a const array in a header file and we had thirty copies of it in the DLL. So sometimes it does matter.

And yes, I still think that using a #define is a terrible solution. It may be the least-worst solution, but that makes me unhappy. I once dealt with compile errors caused by a #define of ‘pi’ and they did not make me happy. Namespace pollution is the main reason why #define should be avoided as much as possible.

Conclusion

I’m not sure what the lesson is here. The problems of putting const float (or const double, or const structures or arrays) in header files are not well understood. Most large programs have duplicate static const variables because of this, and sometimes they are of non-trivial size. I think that constexpr solves this but I haven’t used it enough to be certain.

I have seen programs waste hundreds of KB because of a const array defined in a header file. I have also seen a program that ended up with 50 copies of a class object (plus 50 constructors and destructors) because it was defined as const in a header file. Something to be aware of.
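On the constexpr question: a constexpr variable at namespace scope is still implicitly const, so it still has internal linkage and each translation unit gets its own entity, even if compilers often fold the value and emit no storage when the address is never taken. C++17 added inline variables, which let a header define a single shared constant. A minimal sketch, assuming a C++17 compiler for the first branch; the function name CalcCircumference is reused here purely for illustration:

```cpp
// constants.h (hypothetical header name)
#if __cplusplus >= 201703L
// C++17 inline variable: one entity shared by every translation unit.
inline constexpr float pi_f = 3.14159265358979323846f;
#else
// Before C++17: constexpr implies const, so this has internal linkage and
// every translation unit that includes the header gets its own pi_f.
constexpr float pi_f = 3.14159265358979323846f;
#endif

// A constexpr function can use the constant at compile time.
constexpr float CalcCircumference(float d)
{
    return d * pi_f;
}

static_assert(CalcCircumference(0.0f) == 0.0f, "evaluated at compile time");
```

The same inline constexpr form should also work for arrays and structs, which is where the duplicated copies actually hurt.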

You can see that this happens with gcc by downloading a test program here. Build it with make and then run “objdump -d constfloat | grep flds” to see the four loads from four adjacent data segment addresses. FWIW. If you want to waste more space then add this to header.h:

const float sinTable[1024] = { 0.0, 0.1, };

With gcc this will waste 4 KiB per translation unit (source file) for a total of 20 KiB of bloat in the final executable, even though the table is never referenced.
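The arithmetic behind those numbers can be checked at compile time. This sketch assumes a 4-byte float and assumes the sample program has five translation units; the five is inferred from 20 KiB divided by 4 KiB, not stated in the text. The names kBytesPerCopy, kAssumedTranslationUnits and kTotalBloatBytes are made up for the example.

```cpp
#include <cstddef>

// The table from the text (with float literals): 1024 floats, mostly zero.
const float sinTable[1024] = { 0.0f, 0.1f };

// 1024 elements * 4 bytes each = 4096 bytes = 4 KiB per copy,
// assuming sizeof(float) == 4 as on common platforms.
constexpr std::size_t kBytesPerCopy = sizeof(sinTable);
static_assert(kBytesPerCopy == 4096, "expected a 4 KiB table");

// Assumption: five translation units include header.h (20 KiB / 4 KiB).
constexpr std::size_t kAssumedTranslationUnits = 5;
constexpr std::size_t kTotalBloatBytes = kBytesPerCopy * kAssumedTranslationUnits;
```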

As usual, floating-point math is full of complexities, although in this case I think the C++ language’s slow evolution is more to blame.

Some more reading:

http://stackoverflow.com/questions/3709207/c-semantics-of-static-const-vs-const

VC++ – avoiding duplicates, and unavoidable duplicates

The /Gw compiler option that was introduced in VC++ 2013 Update 2 puts each global variable in its own ‘COMDAT’, which lets the linker discard duplicates. In some cases this avoids the costs of having const/static globals declared in header files. This switch saved about 600 KB in Chrome – see this change for details. Some of the savings was (surprise!) from removing thousands of copies of twoPiDouble and piDouble (and twoPiFloat and piFloat).

However the VC++ 2013 STL has several static or const inline objects that /Gw is unable to discard. These are all one-byte objects but in Chrome they add up to over 45 KB of waste! I filed a bug for this behavior and have been told that the issue is fixed in VC++ 2015.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048

75 Responses to Please Calculate This Circle’s Circumference

  1. alandovos says:

    Am I the only one that was reminded of https://www.youtube.com/watch?v=r_pqnsKWlpc?

  2. If you declare your variable as extern const, and add the definition (and value) in a cpp file, I guess you avoid the redundant storage allocation? I’m not sure if this is right.
    It also reduces compilation time in case you have to modify its value.

    • brucedawson says:

      Yep, that works. Maybe I should have listed that solution, but honestly I don’t like it. Or, perhaps more accurately, I don’t like that it might be the best solution.

      It has two problems. One is that it is inconsistent with the best-practices for a const int, where putting them in the header file is recommended. The second is it creates a need for an appropriate .cpp file to put it in. Not necessarily a big deal, but additional complexity when you just want a float constant.

      I guess ultimately it is the inconsistency with the advice for ‘const int’ that makes it complicated.

      • Godji says:

        This is the solution we tend to use ourselves, but your exchange made me realize that what I am missing is the other side of the issue: besides the first part of your post about the implicit double/float conversions, when it comes to declaring your constant the difference between int and float types eludes me.

  3. bittermanandy says:

    They keep improving C++, but it’s ultimately still as much of a mess as it ever was.

  4. peter says:

    What’s wrong with a #define?

    • brucedawson says:

      The biggest problem with a define is namespace pollution. I worked on a project that had a #define pi once and it would occasionally cause horrible build breaks — a function parameter named pi conflicted with it, for instance.

      All uppercase and verbose names are at less risk, but #defines are ugly, ugly, ugly.

      • ghastly310 says:

        “a function parameter named pi conflicted with it, for instance.”

        This is one of the reasons why some people have adopted certain naming conventions. For example, I write my constants in all-caps.

  5. If you care about executable size, you must also take into account code size. If you use an inline float literal, what will that actually compile to? For me, on x86_64, compiling with clang++ (-O3), it sticks the constant into the .rodata section and refers to it (using RIP-relative addressing) from the mulss instruction (the function itself of course is only two instructions: mulss, ret).

    In my simple test, clang did not coalesce identical constants within the translation unit. In fact, it did the opposite: the const float twopi value at the top of the translation unit was pulled into the functions that used it, so it behaved exactly the same as the inline float literal version (ie, each function got its own copy of the constant).

    Using @CarniBlood’s suggestion of declaring an extern const float (I didn’t bother providing a definition since I was only compiling, not linking) was better: each instance of the function using that extern float produced a mulss that referred to the same data.

    Anyway, the point is: Yes, there may be storage allocated for static const float values in translation units, but that is *also* true (in practice) for inline float literals. Those four bytes have to be put *somewhere*, whatever you do. If space is a concern, your best bet, as usual, is to examine your actual compiler output (and consider using extern).

    • jpabartholomew says:

      Tried with g++ -O3: GCC was smart enough to coalesce the identical constants (both the named constant and inline float literal) into a single value in the .rodata section.

      • brucedawson says:

        Are you accusing GCC of being nonconformant? I believe that the compiler is *required* to keep the constants separate.

        Anyway, I did test with gcc. I had four copies of CalcCirc in four different source files, all including the same header file that defined pi_f. I then ran “objdump -d constfloat | grep flds” and observed that the four copies loaded the const float from four different data segment addresses. The four copies of the const float were grouped together, far from the machine code. I’m not sure why the float constants were spaced eight bytes apart — that’s odd.

        • jpabartholomew says:

          This may have been ambiguity on my part: I mean it coalesced (bit-identical) float constants within a single translation unit. I hadn’t tested multiple TUs.

          Where does the standard say it must keep constants separate?

          • brucedawson says:

            I am moderately certain that the standard says that the constants must be kept separate, but I was unable to find any language in the standard that makes that clear. There are many interacting concepts and I gave up. So, I might be wrong, but if VC++ and gcc both create duplicates and require special options to override this then the standard probably requires them to be separate.

      • Jason Schulz says:

        Without the optimization options, and just ‘-flto’ g++ also rolls the constant into .rodata. I tried clang++ as well, and it even rolled pre-computed values into the .rodata section (whole program). I don’t have a copy of VS, so I’m not sure whether it would do the same, but if I had to hazard a guess, I’d guess it would.

        • brucedawson says:

          I added a link to a test project for gcc. Give it a try?

          • Jason Schulz says:

            Odd, I used circumference and area, but the same diameter :).

            With the compilers on my Linux box (GCC 4.8.1, Clang 3.4.1), g++ rolls the pi_f constant into the .rodata section, and does an instruction relative movss…

            objdump -d

            0000000000400663 :
            400663: 55 push %rbp
            400664: 48 89 e5 mov %rsp,%rbp
            400667: f3 0f 11 45 fc movss %xmm0,-0x4(%rbp)
            40066c: f3 0f 10 4d fc movss -0x4(%rbp),%xmm1
            400671: f3 0f 10 05 53 01 00 movss 0x153(%rip),%xmm0 # 4007cc
            400678: 00
            400679: f3 0f 59 c1 mulss %xmm1,%xmm0
            40067d: f3 0f 11 45 f8 movss %xmm0,-0x8(%rbp)
            400682: 8b 45 f8 mov -0x8(%rbp),%eax
            400685: 89 45 f8 mov %eax,-0x8(%rbp)
            400688: f3 0f 10 45 f8 movss -0x8(%rbp),%xmm0
            40068d: 5d pop %rbp
            40068e: c3 retq

            clang++ stores a pre-computed value in the .rodata section and does an instruction relative movsd before the call to printf…

            objdump -d

            00000000004005b0 :
            4005b0: 55 push %rbp
            4005b1: 48 89 e5 mov %rsp,%rbp
            4005b4: f2 0f 10 05 dc 00 00 movsd 0xdc(%rip),%xmm0 # 400698
            4005bb: 00
            4005bc: bf a0 06 40 00 mov $0x4006a0,%edi
            4005c1: b0 01 mov $0x1,%al
            4005c3: e8 e8 fe ff ff callq 4004b0
            4005c8: bf a0 06 40 00 mov $0x4006a0,%edi
            4005cd: b0 01 mov $0x1,%al
            4005cf: f2 0f 10 05 c1 00 00 movsd 0xc1(%rip),%xmm0 # 400698
            4005d6: 00
            4005d7: e8 d4 fe ff ff callq 4004b0

            So, at least at first glance it looks like LTO might be an option. I’m not sure how it scales relative to the size of the codebase though.

            (sorry for the copious amounts of text)

        • Jason Schulz says:

          Just to expand a bit, neither compiler will coalesce the constants without explicitly specifying LTO. If no optimization level is specified at link time, GCC will take the higher TU optimization level, and generally stricter code generation option, but it doesn’t enable any additional optimization options with ‘-flto’ (https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Optimize-Options.html#Optimize-Options -flto). So, it looks like coalescing is part of its intermediate merge phase.

          Clang in particular also won’t pre-compute the value without LTO being specified. So it looks like ‘-flto’ may imply an additional optimization option, or Clang does pre-computing as part of its intermediate merge phase. I haven’t tested it yet, but it might be interesting to see if it will also store pre-computed library function calls.

      • jpabartholomew says:

        Ok, more information. I tried Bruce’s provided test project and fiddled with different optimisation options. Rather than try to describe it I’ll just point to the documentation: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

        Specifically, look up -fmerge-constants (which is enabled automatically with -O1 and higher), and -fmerge-all-constants (which is not enabled automatically and is documented as being non-conforming).

        • jpabartholomew says:

          Of course, this relies on the linker to merge identical values from different object files. The ELF object format includes a section flag (‘M’) to indicate that the section is mergeable ( https://sourceware.org/binutils/docs/as/Section.html ). Clang also puts global static float constants into a mergeable rodata section when optimisations are turned on, so it produces the same final output, although (unlike gcc), clang doesn’t appear to merge constants itself within the TU, it just leaves multiple copies for the linker to clean up, which is why I thought it wasn’t coalescing them.

      • jpabartholomew says:

        And even more:

        If your code takes the address of the constant, then the compiler is no longer allowed to merge it with others, so you’ll get a separate copy.

        Really all that’s happening (I think) is that when you use the value, gcc will inline it and (because it still has to put the number somewhere) put one copy in a mergeable rodata section; the linker will eliminate duplicate values across translation units. If you *only* use the value and never use the address of the value, then the original constant is dropped entirely, leaving you with just the one copy in the executable. The original constant itself is not mergeable.

        • brucedawson says:

          Just for a test I added const float sinTable[1024] = { 0.0, 0.1, }; to header.h in my sample code. This declares a static const array of size 4 KiB. It is never used but it still gets put in every translation unit and bloats the executable by 20 KiB. Fascinating.

          • jpabartholomew says:

            For gcc you can pass -fno-keep-static-consts to get rid of it, which I guess makes sense if not all of the files that include the header actually use the table. I don’t know if that’s turned on by any of the normal optimisation levels, or how it relates to what I found (or thought I found) earlier w.r.t gcc dropping unused static const float vars. You’ll still end up with one copy of the table for every file that references the thing though. I didn’t find any way to convince gcc to merge multiple copies of the table, even with -flto and -fmerge-all-constants.

            Though for big tables like that perhaps making them extern and isolating the data in a single translation unit is less onerous.

            • brucedawson says:

              I would absolutely agree that putting the table in a .cpp file is the correct thing to do. And, that is the correct thing to do with any C or C++ compiler.

    • brucedawson says:

      Excellent analysis about executable size. Yes, having the constant adjacent to the code may end up being more efficient overall.

      The times that I’ve been really bitten by the duplicated instances is with other types — structures and arrays that were defined in header files. Then the duplicates become expensive. In one case I found a class object, with constructor and destructor, defined as const in a header file. I found it because I noticed 50 copies of the constructor, one for each copy of the object! That was a waste.

  6. Hoon says:

    Bad teacher. Teach early optimization first. 🙂

    Anyway, do compilers really do the double-to-float conversion at run time? I thought most compilers would take care of that implicitly.

    • brucedawson says:

      If the programmer requests double precision math (with later rounding to float) then the compiler has to do that — otherwise the compiler has dangerously changed the semantics of the program. That is, if you use a double-precision constant then you are requesting double-precision math and the compiler must honor that.

  7. Kat Marsen says:

    It’s a mess. In C++, this will do the trick, at the expense of “looking” expensive (or more to the point, not looking like a constant):

    inline float FloatPi() { return 3.14159265358979323846f; }

    I have the same problems with enumerations… that they’re not any particular type, and are frequently signed, means mixing an enum with a size_t will garner all sorts of compiler warnings. You can cast the warning away, if you don’t mind typing…

    enum { HeaderSize = 20, };
    if (m.size() < static_cast<size_t>(HeaderSize)) …

    But const size_t HeaderSize = 20; would be so much better.

    • brucedawson says:

      I dislike the inline function because then debug builds are likely to be slower. I’ve actually done some work to make our debug builds as fast as possible while still being non-optimized for easy debugging. Having games playable in debug builds is wonderful.

  8. Ahmed Fasih says:

    Discussion with a colleague identifies another problem with Take One: VS will complain about loss of precision when downcasting M_PI (a double) to float.

  9. FergoTheGreat says:

    What about the C++14 way?

    template<typename T>
    constexpr T M_PI = T(3.1415926535897932);

    constexpr float CalcCircumference3(float d)
    {
        return d * M_PI<float>;
    }

  10. With a compiler with no constexpr one can define large (for suitable value of “large”) constants in header files either like Meyers’ singletons in inline functions, or via the templated constant trick. Still for π I would use an #ifdef and then M_PI. It’s a shame that apparently nobody on the committee is interested in standardizing existing practice.

  11. Why not just use “static const float pi” as your take six?

    • brucedawson says:

      The ‘static’ is redundant. At global scope ‘const’ already implies static, so adding static doesn’t change anything. It still leaves the problem of getting multiple copies of the float. This isn’t so terrible for a float, but for larger data types it can lead to a lot of wasted space in the executable.

  12. bilbothegravatar says:

    You know, that’s interesting (and annoying) and all, but …

    “Most large programs have duplicate static const variables because of this, and sometimes they are of non-trivial size. …” / “… I have seen programs waste hundreds of KB because of a const array defined in a header file. …”

    I’d claim that to waste any non-trivial amount of space, your program must already be of very non-trivial size. So with the program itself already being “large”, the wasted size should then be relatively small again. (OK, maybe there are some large array edge cases, and the thing with the 50 objects is certainly worth knowing.)

    If constexpr isn’t the solution, then what is? Shouldn’t the Linker-Optimizer already be good enough to remove all that fluff without any additional declspec magic?

    As always, more questions’n answers.

    • brucedawson says:

      The case where I saw lots of waste (a couple of MB if I recall correctly) was because somebody put a const array definition in a header file, and many copies of it got linked in. This could happen on a project of any size, and it was a noticeable percentage of the DLL size.

      The solution in general is to put an extern declaration in the header file and put the definition in a .cpp file, just like with normal variables.

      I should have listed that as one of the solutions, but then I wouldn’t have had as much fun with the wrapup. Tradeoffs.

    • brucedawson says:

      I found these in MathExtras.h in webkit. In chrome_child.dll.pdb in Chrome these waste 41,148 bytes of data segment space. That amount of space is not going to change the world, and it is (as you observe) a small fraction of the total size. But still. Worth fixing I think:

      const double piDouble = M_PI;
      const float piFloat = static_cast<float>(M_PI);

      const double twoPiDouble = piDouble * 2.0;
      const float twoPiFloat = piFloat * 2.0f;

  13. I was wondering if nesting the constant in a namespace would yield better results, as the constant would not be at “global scope” strictly speaking. Unsurprisingly, it doesn’t.

    • Michael Grier says:

      Putting them in an anonymous namespace would have the benefit of ensuring that if someone were so evil as to use the address-of operator (&) on the constant, at least the non-mergeable section is limited to that one TU and also since the name is constrained to the TU, it’s possible that the compiler could prove that the address-of was not converted to non-const and so it could be mergeable again.

  14. Wouldn’t guarding a collection of constants in a header prevent the space being reallocated for each variable?

    e.g.
    #ifndef MASSIVELY_LONG_AND_UNIQUE_DEFINE_NAME
    #define MASSIVELY_LONG_AND_UNIQUE_DEFINE_NAME
    //…
    const float pi = 3.14159265358979323846f;
    const int meaningOfLife = 42;
    //…
    #endif

    • brucedawson says:

      The whole purpose of a header file is to be included, and parsed, from multiple translation units. Each translation unit that sees “const float pi = …” is likely to allocate a separate copy of storage for it. That is the problem. If you have “const float sinTable[1024] = …” then it is a much bigger problem.

      So, include guards don’t change things at all.

      No storage is allocated for the “const int meaningOfLife = …”, ’cause that’s what the standard says.

  15. Jeremy Laumon says:

    I just discovered a related sad story. In our engine we have a few other global constants in addition to PI, with this kind of declaration:
    const float MTH_PIBY2 = MTH_PI / 2.f;
    const float MTH_DEGTORAD = MTH_PI / 180.f;
    When compiled with no optimization (with VC++ 2012), each of these declarations actually generates a dynamic initialization function. And these dynamic initializers apparently get called in every .cpp where the header is included, even if the constant is never used.
    A simple breakpoint with a trace shows that every one of these initializers was called about 300 times at launch.

    For PI related constants, it’s not a big problem, those were silly constants anyway. But we also have many gameplay constants in one of our games where this kind of dependency is particularly useful. And if we move those values to a cpp, they will be separated from the int constants and it could also potentially break some constant propagation in release builds, which is not great.

    Let’s hope constexpr becomes widely supported soon!
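A sketch of what those derived constants might look like once constexpr is available, assuming a C++11 compiler. The names are taken from the comment above, but the value given to MTH_PI is an assumption. A constexpr variable has to be initialized by a constant expression, so the compiler cannot fall back to a dynamic initializer for these.

```cpp
// Assumed definition of MTH_PI; the comment does not show the original.
constexpr float MTH_PI = 3.14159265358979323846f;

// Derived constants keep their dependency on MTH_PI but are computed at
// compile time, so no run-time initializer function is generated for them.
constexpr float MTH_PIBY2 = MTH_PI / 2.f;
constexpr float MTH_DEGTORAD = MTH_PI / 180.f;

// These checks run during compilation.
static_assert(MTH_PIBY2 > 1.57f && MTH_PIBY2 < 1.58f, "pi/2 computed at compile time");
static_assert(MTH_DEGTORAD > 0.0174f && MTH_DEGTORAD < 0.0175f, "pi/180 computed at compile time");
```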

    • brucedawson says:

      The fun thing is that you are probably getting many of those redundant dynamic initializations in your release builds as well. It’s not the end of the world, but it is ugly if nothing else. I agree that it is disappointing to have to separate the float constants from the integer constants.

  16. Ahmed Saleh says:

    Well, if high-level language compilers are producing problems, we could use the FPU of the processor and assembly and just write the function at the lowest level.

    • Ahmed Saleh says:

      Something like that would work,
      float f_pi = 3.14159265358979323846f;
      float f_circum;
      float f_radius;
      _asm {
      FLD f_pi
      FMUL f_radius
      FSTP f_circum
      // the stack
      } // end asm

  17. Ahmed Saleh says:

    All the cases that you have mentioned would really make big problems on Embedded Systems :/, especially low end microcontroller…

    • … Where you’re not supposed to use anything but fixed precision maths.

      At least that’s how I do it. Of course that requires a lot more thought about the range of possible input values and worrying about execution order to prevent overflow, than using float/double would.

  18. Anonymous says:

    Go back to binary. Create a sequence of shifts and adds that are equivalent to xPi. Encode this as a binary string, and write a wee feisty engine loop to process it.

  19. ayidi says:

    PhysX header files have this problem (Take One). It can be a pain to compile the library.

  20. I have an application with a const double PI in a header (I actually chose the same number of significant digits even though 17 would be sufficient).

    I’m compiling with -std=c++11 and I cannot reproduce the issue there. The constant is used in two .cpp files, the header is included virtually everywhere.

    I moved the const into the .cpp files where it’s used and removed it from the header and compiled it using:
    – clang++ -O2 -std=c++11
    – clang++ -O1 -std=c++11
    – clang++ -O0 -std=c++11
    – clang++ -Os -std=c++11
    – g++49 -O2 -std=c++11

    In all cases identical binaries were produced (according to sha256). I’m developing on FreeBSD, so there’s no chance I’m going to check this with VC++.

    I haven’t read anything like that in Stroustrup’s C++11 FAQ, so I don’t know whether it’s compiler specific behaviour or was addressed in the C++11 standard.

    I also changed the type to float (I have a typedef scalar that I use for everything dealing with vertices and transformation, so I can switch between float and double on anything but the public interfaces). With floats I get the same behaviour, i.e. putting the const float into the header produces exactly the same code.

    • brucedawson says:

      When you say “I cannot reproduce the issue there” what do you mean? Are all references to the PI constant coming from the same address? How have you verified that? Without more details I can’t tell if you can’t detect the problem or if it is not there.

      Some linkers may optimize away the duplicate references (although doing so may be nonconformant), but the compiler will necessarily produce them.

      • When I check an .o file using the Header, but not the PI constant, it has exactly the same checksum, no matter whether I have PI in the header or not.

        • brucedawson says:

          Putting PI in the header does not necessarily add cost. Here’s the test: have const PI in a header file. Reference PI from two source files — they will necessarily have references to the constant. Link the source files together (take care that the referencing functions are not discarded). See if the two PI references are to the same address (efficient, but possibly not conformant) or to different addresses (inefficient).

          The same issue happens on a larger scale if you have a const array, struct, or class in a header file — wastage from the duplication when multiple translation units reference the const object.

          • The PI constant is inlined like an integer, not referenced.

            I haven’t tried that with a struct or array. I don’t think structs or arrays would be inlined because they may be larger than a pointer.

            • brucedawson says:

              The x86/x64 instruction sets do not allow a float or double constant to be inlined like an integer. You’re going to have to be more precise because I can’t tell what you’re talking about. Useful information would include the relevant source code from your two source files that reference the PI constant, and the machine code for those functions from the final executable. Then we can see the references to PI and whether they are the same or different. If there are no references to PI then you are testing something different.

      • I cannot find any reference to the PI symbol using objdump or readelf.

        I can find lots of inline occurrences of 18 2d 44 54 fb 21 09 40 (little endian) in the .o file that uses the constant. None in the one that doesn’t.
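One way to confirm that the quoted byte sequence really is pi stored as a double is to compare it against the object representation of the constant. The sketch below assumes a little-endian machine with 64-bit IEEE 754 doubles; `MatchesPattern` and `kPiPattern` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstring>

// The byte sequence quoted above, in the order it appears in the .o file.
const unsigned char kPiPattern[8] = {0x18, 0x2d, 0x44, 0x54,
                                     0xfb, 0x21, 0x09, 0x40};

// True if value is stored in memory as exactly the given eight bytes.
// Assumption: little-endian byte order and 64-bit IEEE 754 doubles.
bool MatchesPattern(double value, const unsigned char (&pattern)[8]) {
    static_assert(sizeof(double) == 8, "sketch assumes 64-bit doubles");
    unsigned char bytes[8];
    std::memcpy(bytes, &value, sizeof(value));  // copy the raw representation
    return std::memcmp(bytes, pattern, sizeof(bytes)) == 0;
}
```

A match only shows that the bytes are the double-precision value of pi. It does not show whether each occurrence is a separate copy of the constant or where in the object file the copies live, which is what the machine code for the referencing functions would reveal.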

  21. Name Required says:

    my_header.h:
    inline double pi_d() { return 3.1415….; }
    inline float pi_f() { return 3.1415….f; }

    • brucedawson says:

      That does work. I’m dissatisfied by the overhead this causes on non-optimized builds — if they grow too slow then people will not be able to use them — but otherwise it’s fine.

  22. I’m surprised the discussion is still ongoing since I posted here, at the end of June, how to implement this constant with no overhead.

    Maybe I should have posted code, but this is well known.

    In order to create an external linkage constant that is known to the compiler and OK with the One Definition Rule, place this in your header:

    template< class Dummy >
    struct Math_
    {
        static float const pi_f;
    };
    
    template< class Dummy >
    float const Math_<Dummy>::pi_f = 3.14;
    
    using Math = Math_<void>;    // Or for C++03, `typedef Math_<void> Math`
    

    Now you can write Math::pi_f.

    Disclaimer: code not touched by compiler’s hands.

    • brucedawson says:

      Student: wait really? I have to use eight lines of template magic to efficiently define a float constant? My barista comment stands. If this technique is well known then that suggests that the problem is well known which suggests (to me) that a better solution would sure be nice, like having ‘const float pi_f = …;’ being efficient.

      • Maybe we/you could write up a proposal for inline data. It could be defined by simple transformation to the template magic shown. Three problems: (1) every proposal needs someone to champion it in the committee; (2) every proposal needs an existing implementation, and preferably usage experience, and while implementing such a feature is probably trivial with clang compared to old times’ K&R C gcc source code (yes, it was really old 1970s K&R C!), it’s still a lot of work; and (3) regarding the rationale, I don’t think memory consumption for internal linkage float type constants is going to cut it. Still, there is the consistency thing: having inline for functions and no such thing for data, while at the same time having an ODR exemption for templates that requires every compiler to implement the necessary support machinery, makes little sense. Maybe air the idea around Google folks?

        Cheers,

        – Alf
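For readers on newer compilers: C++17 added inline variables, which is close to the "inline data" idea floated here. The sketch below puts the template workaround next to the C++17 spelling so the two can be compared. It assumes a C++17 compiler; `CircumferenceInline` is a hypothetical helper name, and the fuller digits of pi are substituted for the placeholder 3.14.

```cpp
#include <cassert>

// The template workaround from the comment above: the static member of a
// class template may be defined in a header without violating the ODR.
template <class Dummy>
struct Math_ {
    static float const pi_f;
};

template <class Dummy>
float const Math_<Dummy>::pi_f = 3.14159265358979323846f;

using Math = Math_<void>;

// C++17 spelling: an inline variable may be defined in a header and still
// denotes a single object shared by every translation unit.
inline constexpr float pi_f = 3.14159265358979323846f;

// Float-only multiply, so no conversion to double is involved.
float CircumferenceInline(float d) { return d * pi_f; }
```

Both `Math::pi_f` and `pi_f` are visible to the compiler as constants, so either can be folded into the multiply. The difference is mostly in how many lines of header it takes to say so.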

  23. larsschuetze says:

    Reblogged this on @larsschuetze and commented:
    A student’s lesson in C++ 🙂 Nice story!

  24. Denis Frolov says:

    Dear @brucedawson,
    Do you think we could translate this article to Russian and publish on our corporate blog (http://habrahabr.ru/company/abbyy/) — of course, with the name of the author clearly indicated and a link to the original text?
    We are a language software development company and our developers will certainly appreciate this great article.

    Thank you!

  25. claude says:

    I don’t get it, why “so expensive that some optimizers may omit the conversion to double which can lead to surprising results, such as CalcCircumference(r) == CalcCircumference(r) returning false!”

    Whatever the optimizer does, it will do on both calls, with the same result, since they are the same function. I’ve always thought that you cannot use == to compare the results of different functions, because they could differ depending on rounding or other effects, but the result of the SAME formula/function should be the SAME, with the same rounding, error and everything, and optimized the same way, no? It’s the same assembly code anyway?

    • brucedawson says:

      The result will come back the same from the function, yes. However the compiler has to generate two calls to the function. It calls the function a first time, and then a second time. After making the first call it has to store the result somewhere – probably in memory. And it will store the results as a double. Then it makes the second call, loads the double from memory, and compares it to the full precision result. The result is that f(x) != f(x).

      The two calls produce the same result, but one is truncated to double precision, and the other is left at extended precision.

      This is not theoretical. This was a recurring problem with the x87 FPU design. I’m glad that we have pretty much moved beyond that.
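The mechanism can be imitated on a modern SSE build, where the x87 behavior itself does not occur. In this sketch `double` stands in for the extended-precision register value and `float` stands in for the narrower value that was spilled to memory. The function names are hypothetical, and this is a simulation of the effect rather than a reproduction of what any particular x87 compiler generated.

```cpp
#include <cassert>

// Stand-in for the result still held at full width after the second call.
double WideResult(float d) { return d * 3.14159265358979323846; }

// Stand-in for the first call's result after being stored at lower precision.
float SpilledResult(float d) { return static_cast<float>(WideResult(d)); }

// Mimics comparing the reloaded (rounded) result against the wide one.
bool LooksEqual(float d) {
    double wide = WideResult(d);
    double reloaded = SpilledResult(d);  // widening does not restore lost bits
    return reloaded == wide;
}
```

Here `LooksEqual(3.0f)` is false even though both sides come from the same function and the same input, because one side lost its low bits on the way through memory. Inputs whose product happens to fit in the narrower type, such as `0.0f`, still compare equal, which is part of what made the problem hard to spot.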

      • claude says:

        Then it was a bug in the compiler/optimizer. If both sides of a comparison are cast differently from the same base, it’s a bug, not an inherent problem with FP.

        • brucedawson says:

          The IEEE spec allows significant flexibility in how expressions are interpreted, and the C/C++ standards are mute on the topic, so nailing it down as a bug is actually quite difficult.

          But never mind, the reality is (well, *was*, since the x87 era is now behind us) that the cost of doing this correctly was so high that many programs could not afford to use the /fp:precise option (once it existed) that would do the right thing. Therefore this frustrating behavior occurred – in multiple compilers I believe. Bug or not, it was a Huge Pain ™.

          My point is really just to show how messy things can be. Extended precision seems, on the face of it, to be a brilliant idea. But the reality was often not so clearly good, so the idea has now been discarded on the dustbin of archaic architectures, and SSE requires each *instruction* to specify its precision. The x87 architecture forced compilers to make Sophie’s choice at every branch, and this didn’t work as well as was hoped.

          It was dark times. I am glad that they are over. Our maximum precision is now less, but our results are more predictable.

  26. scruss2 says:

    I feel *slightly* bad that I came up with this:

    #include <math.h>

    float CalcCircumference(float d) {
      // For finite nonzero d: d / d is 1 and atanf(1) is pi / 4, so this is d * pi.
      return (d == (d - d)) ? d : (d + d + d + d) * atanf(d / d);
    }

    … but at least no constants were harmed in the making of it.
