Add a const here, delete a const there…

I just completed a series of changes that shrunk the Chrome browser’s on-disk size on Windows by over a megabyte, moved about 500 KB of data from its read/write data segments to its read-only data segments, and reduced its private working set by about 200 KB per-process. The amusing thing about this series of changes is that they consisted entirely of removing const from some places, and adding const to others. Compilers be weird.

This task started when I was writing documentation for some tools I’d been using to investigate binary-size regressions on Windows. I pasted in some sample output from one of the tools and was writing an explanation of it when I thought “that’s funny…” I’d noticed that several of the large globals, which looked like they should be constant data, were in the read/write data segment. An abbreviated version of the tool’s output is shown below:

image

Most executable formats have at least two data segments – one for read/write globals and one for read-only globals. If you have constant data, such as kBrotliDictionary, then it is best to have it in the read-only data segment, which is segment ‘2’ in Chrome’s Windows binaries. However some logically constant data, such as unigram_table, device::UsbIds::vendors_, and blink::serializedCharacterData were in section ‘3’, the read/write data segment.

imagePutting data in the read-only data segment has a few advantages. It makes it impossible to corrupt the data through stray writes, and it also ensures greater efficiency. The read-only pages are guaranteed to be shared between all processes that load that DLL, which in the case of Chrome can be a lot of processes. And, in some cases (although probably not these) the compiler can use those constant values directly.

Pages in the read/write data segment might end up staying shared, but they might not. These pages are all created as copy-on-write which means that they are shared until written to, at which point they are copied into per-process private pages. So, if a global variable is initialized at run-time it becomes private data. Or, if a global variable is on the same page as another global that is written to then it becomes private data – everything happens at the page granularity (4 KiB).

Private data uses more memory because there is a separate copy for each process, but private pages are also more expensive because they are backed by the pagefile instead of by the image file. This means that when memory gets low they have to be written to the pagefile instead of just being discarded, and paging them back in is more expensive because they tend to get written randomly and therefore get read randomly. For all of these reasons shared pages are strictly better than potentially-private pages.

Adding const is good

So, when my ShowGlobals tool showed that blink::serializedCharacterData was in the read/write data segment, and when investigation showed that it was never modified, I landed a change to add a const modifier, and it dutifully moved to the read-only data segment. Simple enough. A change like this is never a bad idea, but it’s hard to know how much it helps. Because this array is never written to it might end up staying as shared data forever. Or, more likely, if the ends of the array end up sharing pages with global variables that are written to then part of the array will end up turning into per-process private data, wasting 7,748 or 3,652 bytes (the size of the array minus one or two pages in the middle which are guaranteed to stay shared) in each process. Changes like this should help on all platforms, with all toolchains.

Marking your const arrays as const is a good idea, you should do it, and I’m sure some developers don’t know this, but I’m not sure this information alone would have been enough to inspire this blog post. Here is where we veer into peculiar new territory…

Sometimes removing const is even better

The next array that I investigated was unigram_table. This one was peculiar because it was initialized entirely with constant data using struct/array initializer syntax and it was marked as const but it was still being placed in the read/write data segment. This looked like an interesting VC++ compiler quirk so I followed my own instructions for how to reduce and report a compiler bug. I copied the types and the array definition to a stand-alone project and kept reducing it, all the time checking to see if the array was still going in the read/write data segment. I kept reducing until I had this repro, which is small enough to easily fit in a tweet, or one line of a blog post:

const struct T {const int b[999]; } a[] = {{{}}}; int main() {return(size_t)a;}

If you compile this and run ShowGlobals on the PDB it will show that ‘a’ is in section ‘3’, despite being tagged as const. Here are the steps for building and testing the code:

> “%VS140COMNTOOLS%..\..\VC\vcvarsall.bat”
> cl /Zi constbug.cpp
/out:constbug.exe
> ShowGlobals.exe constbug.pdb
        Size   Section Symbol name
        3996   3       a

Having reduced my example to less than 140 characters it was easy to find the trigger. With VC++ (2010, 2015, 2017 RC) if you have a class/struct with a const member variable then any global objects of that type end up in the read/write data segment. Jonathan Caves explains in a comment on my bug report that this happens because the type has a “compiler generated deleted default constructor” (makes sense) which confuses VC++ which incorrectly sees this as a class that needs dynamic initialization.

So, the problem in this case is the const modifier on the ‘b’ member variable. Once that is deleted the array, ironically enough, gets put in read-only memory. Since the object itself is const the removal of const from the member variable doesn’t reduce safety at all, and actually improves it for VC++ builds.

I suspect that the VC++ team will fix this for VS 2017 – it seems like an easy win to me – but I didn’t want to wait that long. So, I started landing changes to remove const on member variables in Chrome where it was causing problems. This was simply a matter of continuing to look through the list of large global variables in the read/write data segment and categorizing them:

  • Actually written to – leave alone
  • Not written and missing const on the global variable – add it
  • Not written and problematic const on a member variable – remove it

Okay, this looks really funny…

So, I worked my way through Chrome’s source code, adding and removing const in a few useful places. For most of my changes the effect was just to move some data from the read/write data segment to the read-only data segment, as expected, but two of the changes did much more. Two of the changes shrunk the .text and .reloc sections. This was great, but it seemed too good to be true. This suggested that VC++ was generating code to initialize some of these arrays – a lot of code.

The most interesting change was removing three const keywords from the declaration of the UnigramEntry struct. Doing this moved 53,064 bytes to the read-only data segment, but it also saved over 364,500 bytes of code, in both chrome.dll and chrome_child.dll. That suggests that VC++ had been quietly creating an initializer function that used an average of almost seven bytes of code to initialized each byte of unigram_table. That couldn’t be. It was too much to hope for that my change would help that much!

So, I launched Chrome under the VS debugger and set a breakpoint at the end of the unigram_table array and VS dutifully stopped there at the beginning of the initializer. I’ve put a cleaned up and simplified excerpt of the assembly language below (I replaced ‘unigram_table’ with ‘u’ to make it fit better):

55                    push ebp 
8B EC                 mov  ebp,esp 
83 25 78 91 43 12 00  and  dword [u],0 
83 25 7C 91 43 12 00  and  dword [u+4],0 
83 25 80 91 43 12 00  and  dword [u+8],0 
83 25 84 91 43 12 00  and  dword [u+0Ch],0 
C6 05 88 91 43 12 4D  mov  byte  [u+10h],4Dh 
C6 05 89 91 43 12 CF  mov  byte  [u+11h],0CFh 
C6 05 8A 91 43 12 1D  mov  byte  [u+12h],1Dh 
C6 05 8B 91 43 12 1B  mov  byte  [u+13h],1Bh 
C7 05 8C 91 43 12 FF 00 00 00 mov  dword [u+14h],0FFh 
C6 05 90 91 43 12 00  mov  byte  [u+18h],0 
C6 05 91 91 43 12 00  mov  byte  [u+19h],0 
C6 05 92 91 43 12 00  mov  byte  [u+1Ah],0 
C6 05 93 91 43 12 00  mov  byte  [u+1Bh],0 
… 52,040 lines deleted…
c6 05 02 6e 0b 12 6c  mov  byte  [u+cf42h],6Ch
c6 05 03 6e 0b 12 6e  mov  byte  [u+cf43h],6Eh
c6 05 04 6e 0b 12 a2  mov  byte  [u+cf44h],0A2h
c6 05 05 6e 0b 12 c2  mov  byte  [u+cf45h],0C2h
c6 05 06 6e 0b 12 80  mov  byte  [u+cf46h],80h
c6 05 07 6e 0b 12 c4  mov  byte  [u+cf47h],0C4h
5d                    pop  ebp
c3                    ret

The hex numbers along the left are the machine-code bytes and the text to the right of that is the assembly language. After some prologue code the code gets into a routine of writing to the array… one byte at a time… using seven-byte instructions. Well, that explains the size of the initializer.

It is well known that optimizing compilers can write code that is as good or better than human beings in most cases. And yet. Sometimes they don’t. There are so many things that could be better about this function:

  • It could not exist. The array is initialized using C array syntax and if it wasn’t for this VC++ code-quality bug with const struct members then there would be no initializer – on other platforms there isn’t
  • The writes of zero could be omitted. The array is a global variable being initialized at startup and the contents of memory are already guaranteed to be zero, so every write of zero can be removed
  • Data could be written four bytes at a time (ten-byte instruction) instead of one byte at a time (seven-byte instruction)
  • The address of the array could be loaded into a register and then dereferenced instead of specifying its address in every. single. instruction. This would make the instructions smaller and would also save two bytes per instruction of relocation data, found in the .reloc segment

You get the idea. The function could be a quarter the size, or it could be omitted entirely. Anyway, with the three const keywords removed the initializer disappeared entirely (it is gone from canary builds already) and the ~364,500 bytes of code and ~105,000 bytes of relocation information have disappeared, from both chrome.dll and chrome_child.dll. The array used to be in the .BSS (zero initialized portion of the read/write data segment) where it occupied no space on disk and it now occupies 53,064 bytes in the read-only data segment, which is why the total disk-size savings is about 416,000 bytes per DLL. This change gave similar but slightly smaller improvements.

Most importantly, the various globals involved in these two changes go from being mostly or completely per-process private data (due to being initialized at run-time) to being shared data, saving an estimated 200 KB of data per process.

Example changes

I started with the largest and most common objects and types in order to get the biggest wins first and I quickly reduced the size of the read/write data segment by about 250 KB, moving over 1,500 global variables to the read-only data segment. It’s easy to get caught up in the game (who me? OCD? I don’t know what you’re talking about) but I have managed to stop even though I know that there are still hundreds more small global variables that I could fix. After a while I hit diminishing returns and it was time to move onto something else. To put it another way, if you’ve always wanted to land a Chromium change I left a few for you. Here are some of the changes landed as part of this project:

Some changes that deleted const:

Some changes that added const:

I was really hoping for conservation-of-const, and there are enough globals missing const that I could yet attain it.

See for yourself

If you want to debug Chrome and see the unigram_table initializer before it goes away then it’s fairly straightforward – you don’t need to be a Chrome developer to do it. Start by executing these two commands:

> “%VS140COMNTOOLS%..\..\VC\vcvarsall.bat”
> devenv /debugexe chrome.exe

Make sure you have Chrome’s symbol server set up in VS per these instructions and then set a breakpoint on this symbol:

`dynamic initializer for ‘unigram_table”

In case wordpress mangles it that’s a back-tick, some magic text, a single quote, the symbol name, and then two single quotes. Make sure you’ve shutdown Chrome completely, and then launch Chrome from VS. VS will download Chrome’s symbols (the magic of symbol servers!) and set a breakpoint on the initializer, if it exists. Handy. If necessary you can switch to assembly language mode (Ctrl+F11) and marvel. If you want to see the source code then just enable source server by checking the box in the Visual Studio debug options.

And that’s how I spent my holidays – for less technical holiday shenanigans read about warm weather snowmen, and the importance of winning Christmas.

Reddit discussion is here, hacker news here, and twitter comments start here.

Advertisements

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Performance, Programming, Visual Studio and tagged , , . Bookmark the permalink.

27 Responses to Add a const here, delete a const there…

  1. iaanus says:

    You should prefer constexpr instead of const whenever you want to ensure static initializiation. It does the right thing in your “fit-in-a-tweet” example even if you keep the inner const. Bugs aside, it’s safer in general: since constexpr makes the code ill-formed when static initializiation is not feasible, you can spot the potential inefficiency without using post-build tools.

    • brucedawson says:

      I love constexpr and it bothers me that the C++ committee designed in such a way that I often cannot use it. Look at the change in https://codereview.chromium.org/2608823002 for example – in that example a const array is defined in one translation unit and use in another. constexpr does not support that. You cannot go “extern constexpr”.

      The reason why is because constexpr means “available at compile time” even though I often wish that it meant “generated at compile time”. There is no way in C++ to require that something be generated at compile time without requiring that it be available at compile time, so enforcing my wishes for these arrays is, in general, impossible.

      That said, there are some places where I could have used constexpr. I think I tried that and hit internal compiler errors 😦

      • aurelienrb says:

        What about an extern const ref / pointer that is initialized in the .cpp with a constexpr data?

        • brucedawson says:

          If you have a constexpr array and you have a const ref/pointer to it then you do get external visibility to the constexpr data, but at the expense of an added level of indirection. This wastes time and space and I hate compromises.

          Long-term the correct solution (suggested on reddit) is to do this:

          // a.h
          extern const data[1500];

          // a.cpp
          #include “a.h”
          constexpr data[1500] = { … };

          The extern const serves as a trick to make the constexpr variable externally visible. This is supported in gcc and clang and VC++ has said they intend to support this in VC++ 2017.

          • aurelienrb says:

            Wa that’s a cool feature, thanks for sharing! That’s what I had in mind but in a cleaner and more direct way. Note that in case of an array you don’t get an extra indirection by declaring another pointer to the same data (but you do with a ref). But I know you master this details better than me 🙂
            Your blog is cool and you are doing a great job (in my previous job I was also in charge of spotting the VC++ bugs and finding a workaround for them and it made me measure how much patience and effort it requires when you are dealing with millions of LOC).

          • brucedawson says:

            > Note that in case of an array you don’t get an extra
            > indirection by declaring another pointer to the same data

            Yes you do. You seem to be making the mistake of assuming that an array *is* a pointer. It is not. It is a blob of data. It can be converted to a pointer automatically (it “decays” to a pointer) but it is not a pointer. So, there is indeed an extra level of indirection. If we have an array of char ‘a’, a pointer to it ‘p’ zeroth element is either:

            Direct array access:
            mov eax, a

            Access through a pointer:
            mov eax, p
            mov eax, [eax]

            Sometimes there won’t be an extra instruction, but there will always be an extra memory access.

          • aurelienrb says:

            This is what I meant:
            https://godbolt.org/g/M9XOvI

            VC++ 2015 does the same optimization too.

            In case the link dies, here’s the code:
            const char str1[] = “hello1”;
            // double const required
            const char * const str2 = str1;

            const int array1[] = { 0, 1 };
            // static is a game changer here
            static const int *array2 = array1;

          • brucedawson says:

            Unfortunately that doesn’t prove anything except that sometimes the compiler can see through the indirection and get rid of it.

            The whole goal is to get efficient code in translation units other than the one that defines the pointers. In those cases you can’t count on the optimizer knowing that ‘array2’ points to ‘array1’, and the compiler will need to load the ‘array2’ and then dereference it, whereas with ‘array1’ it just loads the address, which is easier.

            Plus, having an extra pointer wastes space. Not a lot, but it’s there.

        • aurelienrb says:

          I understand. I realized afterwards that it works because of the static symbol, while you need an extern one to work across TU. I missed that point. But that was cool to learn a few things. Thanks for taking the time to explain 🙂

  2. Anonymous says:

    I believe the site has some malware injected in it. At least the mobile version of it. After a few second it redirected me to some site with a fake Facebook skin.

  3. sasq says:

    I recently started Unit testing for code size; I include a disassembler in my test project and check both number of opcodes and number of jumps in code I expect to be fully unrolled.
    Adding tests to make sure data expected to be const ends up in RO memory should work equally well I guess…

  4. todor says:

    instead of removing the consts you should have replaced them with some macro that evaluates to empty string on the problematic compilers. this way the locations are easily greppable and can be reverted when the compilers are fixed.

    • brucedawson says:

      That was suggested by one reviewer. I pushed back because I think that adding more preprocessor macros was unjustified because the final objects were always const. In general that is a reasonable technique, but it didn’t seem appropriate here.

  5. Venki says:

    Interesting, it matters saving even a byte or single instruction and that too complex code base like layout engines. An age old article by Dan Saks, may still be helpful in the era of modern C++(11).

    http://www.dansaks.com/articles/1998-06%20Placing%20const%20in%20Declarations.pdf
    http://www.dansaks.com/articles/1998-08%20What%20const%20Really%20Means.pdf
    http://www.dansaks.com/articles.htm

  6. Shantanu says:

    RE: The removal of const because of VS bug

    At what point is it okay to go against the language semantics to optimize in short term? and how do you plan to track the bug and reverse this change?

    • brucedawson says:

      I believe that the ‘const’ modifiers that I removed were not important enough to matter. I think their placement is more ad hoc than magically perfect. In particular, they were all (I believe) applied to objects that themselves were const, so the member variable const modifiers don’t actually add any security.

      In fact, I would say that this set of changes increases robustness and safety on all platforms, due to the places where I added const. I think that adding const to the remaining globals that should have it is more interesting and important than adding it back to places where it gives no additional safety.

      If we fast-forward to later this year when Chrome is building with VC++ 2017 which has a fix to this const bug, where is the most effective place to add const?
      a) To the places where I deleted it – const member variables of objects that are themselves const
      b) Global variables that are logically const but are not marked as such
      c) Variables that are logically const members of non-const objects

      I would say that the first priority would be b), because getting globals into the read-only data segment on all platforms is awesome. The second priority would be c), because marking mutable data as const may help to avoid future bugs. And, in a distant third, is a), because it seems to have little to no value, I think.

  7. Ilya Verbin says:

    Once I have discovered that const qualifier can increase a size of a simple binary from 17KB to 4MB: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02120.html
    It appears to be a bug in the linker: https://sourceware.org/bugzilla/show_bug.cgi?id=19162

  8. ericlaw says:

    This was awesome, thanks for sharing it!

    One potential typo? “conversation-of-const” -> “conservation-of-const”

  9. Jon says:

    Is it feasible to use gcc or Clang to build chrome in Windows?

    • brucedawson says:

      Yes, Chrome for Windows currently builds with clang, using the VC++ linker. There is a team working on improving the build performance, run-time performance, binary size, and debuggability, but it does work.

  10. Curios, what do you use to look at segments/size/… for PE+ files?

  11. Trass3r says:

    btw, http://gcc.godbolt.org/ finally added the MS compiler! Just switch to ‘x86-64 CL’
    Nice for bug reports, quickly checking codegen etc.

  12. Pingback: Delete an inline function, save 794 kB | Random ASCII

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s