50 Bytes of Code That Took 4 GB to Compile

While doing evil things with macros and the inline assembler (trying to run a weird test whose purpose is not really relevant) I managed to write a program that caused Visual Studio’s C++ compiler to allocate 4 GB of memory and then die.

Not bad for a program that can easily fit into a single 50 column line.

I might not have noticed except that my machine didn’t have 4 GB free at the time, and the frantic paging-out of data needed to find 4 GB of memory made my laptop completely unresponsive for a couple of minutes. If you have a machine with more than 4 GB free then this can be a good test case for doing memory analysis with ETW, to see if you can duplicate my results.

I’ve simplified the code down to its minimal essence because it amuses me:

void test()
{
    __asm { add eax
    __asm { add eax
}

Here is the compiler output:

error C2414: illegal number of operands
error C2414: illegal number of operands
error C2400: inline assembler syntax error in ‘opcode’; found ‘end of file’
fatal error C1060: compiler is out of heap space

I’m running 64-bit Windows and the compiler is a 32-bit large-address-aware process, so running out of heap space means allocating about 4 GB of address space. I did a few compiles in a row and you can see the 4 GB spikes in memory usage from each one.

Then I got curious and wondered which part of the compiler was allocating all of this memory. I used etwheap.bat to record all heap and VirtualAlloc allocations from cl.exe and compiled the source file again. If I were doing it now I would use UIforETW – just make sure that “VirtualAlloc stacks always” is checked in the Settings dialog, or record a heap trace. There were just a couple of MB allocated from the heap, but lots allocated using VirtualAlloc, as shown here:

[image: allocation summary showing ~4 GB allocated via VirtualAlloc]

(etwheap.bat ships with UIforETW – run it from an elevated command prompt and follow the instructions – or just use UIforETW with VirtualAlloc call stacks, or UIforETW’s heap tracing for greater details)

And then, to finish off the investigation, I looked at the call stacks. We can see that the inline_assembler’s lexer is allocating a lot of Asm Tokens using its own VirtualHeap. VirtualHeap::Create reserves the address space, and VirtualHeap::HeapExtend commits the memory. Drilling down a bit deeper (not shown) shows that address space is reserved in 512 KB chunks, and is committed in 32 KB chunks.

[image: call stacks showing VirtualHeap allocations from the inline assembler’s lexer]

There are a few details that aren’t quite clear, such as why VirtualHeap::HeapExtend calls VirtualHeap::Create, but without source code that is unknowable.

And so we stop. I’ll pass this along to the VC++ team as usual, and I wouldn’t be surprised if they fix it, but it’s not exactly a critical problem. I only noticed it because my machine didn’t have 4 GB free when I first hit this problem.

It’s a good thing the compiler is a 32-bit process or else it would have continued consuming memory beyond 4 GB. Three cheers for artificial limits!

These tests were done using VC++ 2010 on a debug build. I didn’t try any other variations.

Linux variation

A thematically similar problem (the linker consuming vast quantities of memory on a simple program) is described on stackoverflow.

Windows Suxs?

I anticipate that some people will say that Windows is a horrible operating system and that is why my laptop locked up for a few minutes when this initially happened. Well, maybe, but if you allocate (and write to) 4 GB of RAM on your Linux or OS X machine and fail to cause a serious system slowdown, that doesn’t actually prove anything. My laptop has 8 GB of RAM and most of it was in use, so the only possible way to free up 4 GB of memory was to write a lot of data to disk. Laptop drives are notoriously slow. If I do the same test on my work machine (32 GB of RAM, 20 GB available) or on my laptop when I have fewer programs running (5 GB free) then the 4 GB is allocated and freed in less than five seconds.

The reddit discussion can be found here.

Updates

It’s weird how some blog posts are more popular than others…

Some people had difficulty reproing this problem. The bug is only confirmed to happen in VS 2010 SP1, and it is only guaranteed to happen if the test function is the last thing in the source file.

This is clearly not a critical bug – the code is malformed, the compiler gives some warnings that point to the problem area, and nothing truly bad happens. However it is still a lexer failure. In particular, the out-of-memory failure prevents VC++ from reporting on the brace mismatch – if you add a function after test then the lexing completes and additional warnings are displayed.

Giving great error messages when compiling incorrect code is important enough that it is one of clang’s explicit design goals.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048
This entry was posted in Investigative Reporting, Programming, Visual Studio, xperf. Bookmark the permalink.

37 Responses to 50 Bytes of Code That Took 4 GB to Compile

  1. Anon says:

    It’s not fair – Windows doesn’t need folks like you to improve things, it’s got plenty of folks to find its issues already. It’s those of us in Linux land who need your expertise to make gcc/clang/whatever better…

    Better for you to find these types of issues in Linux rather than Windows! 😛 (and yes fragmentation can affect any system and I suspect on a typical Linux system you would have brought out the OOM killer).

  2. Z.T. says:

    RE: “Laptop drives are notoriously slow.” You’re supposed to use an SSD these days and use spinning disks only for large files.

    • brucedawson says:

      Getting an SSD is on my to-do list.

      On the other hand, if developers use SSDs then they will, inevitably, write code that performs badly on spinning disks, leading to bad customer experiences. The VS .sdf files and Windows Live Photo Gallery .pd6 files are both guilty of this sin. Using a spinning disk ensures that I see problems that other developers miss.

      • Chris says:

        Agreed. Development on SSD’s is great. Compile times for C++ are heavily I/O. Testing on SSD’s would be a mistake for general applications.

        • brucedawson says:

          > Compile times for C++ are heavily I/O.

          Citation needed. I have done tests and in almost all cases this is simply not true. If your compilations are I/O bound then you don’t have enough memory. Or your operating system has terrible caching. All of your source files should be in the disk cache. The object files should also live there. I have profiled my builds and they are mostly CPU bound. Optimizing our builds is mostly about improving parallelism. But that’s a subject for another blog post.

          SSDs are great for some things, but improving build times is rarely one of them.

  3. You would be wise not to call me a liar.

  4. TempleOS uses physical memory addresses, more or less, since it is identity-mapped. Physical RAM might look like 0x0000,0000 – 0xBFFF,FFFF and 0x1,0000,0000 – 0x3,FFFF,FFFF. For large allocations, it chooses chunks a little over a power of two in size.

  5. Wyatt says:

    Grudgingly, I have to admit I wish the sort of instrumentation tools you show off in these posts were more readily available in Linux. I’ll grant I may just be missing the boat on things that are just as good, but that’s a problem in its own right: an unknown tool gets used just about as much as a nonexistent one.

    • Google perftools for heap profiling & sysprof for runtime sample profiling would be my first choices. Then if that isn’t enough, break out the big guns: valgrind / helgrind. Although the 10x performance penalty for running under valgrind can be a bit painful sometimes.

      • Pedestrian Traffic says:

        Oh, how about that, I had completely missed both of the first two. Thanks! But this brings up another problem that’s worth being concerned about: presentation. That is, how data is presented (up to and including GUI wrangling (eww)). From what I’m seeing these are all pretty raw-output, here-have-some-numbers affairs. Not 100% sure about perftools and sysprof, but I’ve spent enough time with Valgrind to know how it tends to do things. I do know that kcachegrind offers some niceties, but I’m reasonably certain there are things xperf does beyond that scope.

        On a personal note, I admit I’m a little peeved the only good datavis tools I’ve seen are GUI applications– I spend most of my time in a terminal, working on headless servers. We need more, and more places.

        • brucedawson says:

          Presentation is crucial. Recording great data is a nice start, but if you can’t explore it well then it’s not worth much. I really like flame graphs:

          Summarizing Xperf CPU Usage with Flame Graphs

          but I’d be way happier if they were built in to the tools on both platforms.

          The xperf visualizers have a steep learning curve, but they are insanely powerful once you grok them. A huge number of questions can be trivially answered, for any time period within the recorded trace, and that is amazingly useful. I have no doubt that Linux can record equivalent information, but I have not seen equivalently powerful visualizations.

    • cac04 says:

      OProfile is pretty good for profiling anything to do with your CPU: usage, cache misses etc.

      LTT is a great tracing tool. The kernel tracing is extremely helpful for profiling/debugging multithreaded code and it also supports arbitrary userspace traces.

      Valgrind (cachegrind/callgrind/helgrind) is also nice but generally too slow to be useful, at least for the sort of things I work on.

      AddressSanitizer (part of Clang and GCC) is great for checking memory use: not quite as comprehensive as Valgrind but only a 2x slowdown.

      Gperftools includes good userspace CPU profiling as well as heap profiling.

  6. Pie says:

    I wish Linux had instrumentation tools like this. BSD has them, and tracing that makes Windows look like child’s play.

    The __asm macro isn’t valid in GCC, but __asm__(“”); is. As a result, the given code won’t compile. If you mangle this and attempt to make gcc take it, it won’t compile, as it isn’t valid assembly for the target, and it bails saying “Fix your assembly”.

    I would call this a fault in MSVC++’s compiler’s lexer. I’ve heard tales of working on yylex and none of them are pretty. It makes grandiose assumptions about what the code is going to need to compile, since it can’t tell where the closing chunk is going to be (before macro macro macro expansion).

    So, not a Windows fail, more a failure of a non-lookahead lexer.

  7. The bug doesn’t seem to exist in Visual Studio 2013 (or I am doing something wrong)

  8. Kixunil says:

    This reminds me of 10 bytes program in Brainfu*k which caused Windows XP to freeze.
    History: Long time ago, I was in school and after I did assigned task and had some free time, I downloaded BF compiler and played for a while. I tried printing some random characters and one of them made beep. When I found out, which character it was (ascii 7), I decided to make program which would beep endlessly. So I did: +++++++[.]
    I run it and of course after couple of beeps other people in room told me to turn it off. But I couldn’t – mouse wouldn’t move any longer. I had to cut off power from computer.
    I’m not sure why it froze. I suppose, Windows did some kind of caching, so program could run while beeping (good thing to have), but it was growing faster than beeping, without limit (bad thing).

    Anyway, I still think Windows sucks regardless of article or this comment. 😀

  9. sakodak says:

    Windows sucks independently of the content of this article. So does Linux. Every operating system sucks in its own special way. (That being said, Windows sucks more violently and more frequently.)

  10. Doug says:

    == Can’t Reproduce ==
    Tried on Win7 x64 / VC++ 2010 Pro
    No memory spike.
    Compilation fails on the spot

    1>—— Build started: Project: Test1, Configuration: Debug Win32 ——
    1> Test1.cpp
    1>e:\…\test1.cpp(7): error C2414: illegal number of operands
    1>e:\…\test1.cpp(8): error C2414: illegal number of operands
    1>e:\…\test1.cpp(11): error C2415: improper operand type
    1>e:\…\test1.cpp(11): error C2400: inline assembler syntax error in ‘second operand’; found ‘(‘
    1>e:\…\test1.cpp(12): error C2400: inline assembler syntax error in ‘opcode’; found ‘(‘
    1>e:\…\test1.cpp(13): error C2400: inline assembler syntax error in ‘opcode’; found ‘constant’
    1>e:\…\test1.cpp(16): fatal error C1075: end of file found before the left brace ‘{‘ at ‘e:\…\test1.cpp(7)’ was matched

    • brucedawson says:

      How much memory do you have? I doubt that matters, but just in case.

      It could be slightly different compiler versions. Do you have SP1? If you paste this in at the top of the source file what does it say:

      #define QUOTE0(a) #a
      #define QUOTE1(a) QUOTE0(a)
      #define MESSAGE(msg) message(__FILE__ "(" QUOTE1(__LINE__) "): ver " QUOTE1(_MSC_FULL_VER) " " msg)

      #pragma MESSAGE("Built with this compiler version")

      For me it says:

      ver 160040219 Built with this compiler version

      And the VS 2010 about box says that I have Visual Studio 2010 Version 10.0.40219.1 SP1Rel installed.

      • Doug says:

        I have VS2010 SP1 (KB983509)
        “Version 10.040219.1 SP1Rel”
        “.NET Framework Version 4.0.30319Rel”

        I have 12G of RAM but I honestly don’t think it’s related, as I compiled several times with no mem spike.

        • brucedawson says:

          It is important to have nothing in the source file after the test() function, or else the bug may not repro. With a source file containing nothing but the five-line function I have reproed this bug on four different machines. I’ve switched to building on the command line in order to make the repro simpler.

          C:\>”%VS100COMNTOOLS%vsvars32.bat”

          C:\>cl testprogram.cpp
          Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
          Copyright (C) Microsoft Corporation. All rights reserved.

          testprogram.cpp
          testprogram.cpp(3) : error C2414: illegal number of operands
          testprogram.cpp(4) : error C2414: illegal number of operands
          testprogram.cpp(6) : error C2400: inline assembler syntax error in ‘opcode’; found ‘end of file’
          testprogram.cpp(6) : fatal error C1060: compiler is out of heap space

    • brucedawson says:

      In order to repro the bug you need to get rid of the extra code after the test function. From your error messages it looks like test() goes from line 5 to 9, then you have some other code (probably main) from line 11 to 15. That other code is changing the lexing enough to avoid the problem. Delete the code after test and the bug will appear.

      To be clear, this isn’t an important bug. It amuses me that it received this much attention.

  11. eu says:

    Also can’t reproduce – compile error instantly (after ~1s), no increased memory.
    VS 2008 and windows 7, >4GB RAM and not ssd disk.

  12. This is a Visual Studio 2010 SP1 bug. What does it have to do with Windows? Nothing.

    Of course people like you blame everything on Windows, even your own malfunctioning brain.

    • brucedawson says:

      I’m not sure who you’re talking to…

      The relevance of Windows is in how it handles the storm of paging. It was frustrating that my Windows machine locked up for several minutes while data was paged out in order to allow the 4 GB of allocations. However I don’t know how Linux would have handled this, and setting up an equivalent test (ensuring that there is less than 4 GB RAM available) is more trouble than it’s worth.

      • cac04 says:

        Just in case you’re genuinely interested in how Linux would have handled this: arguably, worse than Windows. Here’s a short article on Linux’s dubious use of overcommit.

  13. Goran Mitrovic says:

    “There’s a few details that aren’t quite clear, such as why VirtualHeap::HeapExtend calls VirtualHeap::Create, but without source code that is an unknowable.”

    Something similar happens with the Win32 heap – if you are using the low fragmentation heap, it will internally propagate the function call to the non-LF heap that handles your block size.

  14. entheh says:

    “A thematically similar problem (the linker consuming vast quantities of memory on a simple program is described on stackoverflow.”

    My brain ran out of heap space 🙂

  15. yuhong says:

    “If I do the same test on my work machine (32 GB of RAM, 20 GB available) or my laptop when I have fewer programs running (5 GB free) then the 4 GB is allocated and freed in less than five seconds.”
    I wonder how fast it would be if it was paging to a SSD.

    • brucedawson says:

      If you have sufficient memory then no paging occurs, so an SSD is irrelevant. My work machine has an SSD, but it also has enough memory that that SSD is never used for paging. It is still handy for faster booting, faster app-launching, faster writes (when recording traces) and faster random access (.sdf files, for instance).

  16. Pingback: ETW Central | Random ASCII

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.