64-Bit Made Easy

The scariest aspect of porting your ancient 32-bit code to 64-bit is pointer truncation bugs. Any place where you store a pointer in an ‘int’ or a ‘long’ (both are still 32 bits on 64-bit Windows) can come back to bite you when you move to 64-bit.

The problem is, these bugs can take a while to show up. Memory allocations on Windows default to starting at low addresses, so it takes a while for allocations to work their way up high enough for there to be anything in the top 32 bits to truncate.
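
To make the failure mode concrete, here is a minimal sketch that models a pointer value passing through a 32-bit integer. The helper name and the sample addresses are made up for illustration; real code would hit this through a cast to ‘int’, ‘long’, or ‘DWORD’.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models storing a 64-bit pointer value in a 32-bit
// integer and then widening it back to 64 bits.
constexpr uint64_t RoundTripThrough32Bits(uint64_t address)
{
    return static_cast<uint32_t>(address); // the top 32 bits are discarded here
}

void TruncationDemo()
{
    // A low address survives the round trip, so the bug stays hidden.
    const uint64_t lowAddress = 0x0000000012345678ULL;
    assert(RoundTripThrough32Bits(lowAddress) == lowAddress);

    // An address above the 4 GB line comes back as a different address.
    const uint64_t highAddress = 0x0000000212345678ULL;
    assert(RoundTripThrough32Bits(highAddress) == 0x12345678ULL);
    assert(RoundTripThrough32Bits(highAddress) != highAddress);
}
```

As long as every allocation looks like the first case, the truncating code appears to work.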

Welcome Ubisoft wiki readers!

Just as with time-math, it is really tedious to deal with bugs that may take hours to show up.

At Valve we use a simple technique to solve this problem. We make sure that our allocations start above the 4 GB line. If every allocation has some bits in the high 32 bits then pointer truncation bugs tend to cause warm fuzzy crashes immediately, and 64-bit cleanliness is easy.

Here’s some code:

void ReserveBottomMemory()
{
#ifdef _WIN64
    static bool s_initialized = false;
    if ( s_initialized )
        return;
    s_initialized = true;

    // Start by reserving large blocks of address space, and then
    // gradually reduce the size in order to capture all of the
    // fragments. Technically we should continue down to 64 KB but
    // stopping at 1 MB is sufficient to keep most allocators out.

    const size_t LOW_MEM_LINE = 0x100000000LL;
    size_t totalReservation = 0;
    size_t numVAllocs = 0;
    size_t numHeapAllocs = 0;
    size_t oneMB = 1024 * 1024;
    for (size_t size = 256 * oneMB; size >= oneMB; size /= 2)
    {
        for (;;)
        {
            void* p = VirtualAlloc(0, size, MEM_RESERVE, PAGE_NOACCESS);
            if (!p)
                break;

            if ((size_t)p >= LOW_MEM_LINE)
            {
                // We don't need this memory, so release it completely.
                VirtualFree(p, 0, MEM_RELEASE);
                break;
            }

            totalReservation += size;
            ++numVAllocs;
        }
    }

    // Now repeat the same process but making heap allocations, to use up
    // the already reserved heap blocks that are below the 4 GB line.
    HANDLE heap = GetProcessHeap();
    for (size_t blockSize = 64 * 1024; blockSize >= 16; blockSize /= 2)
    {
        for (;;)
        {
            void* p = HeapAlloc(heap, 0, blockSize);
            if (!p)
                break;

            if ((size_t)p >= LOW_MEM_LINE)
            {
                // We don't need this memory, so release it completely.
                HeapFree(heap, 0, p);
                break;
            }

            totalReservation += blockSize;
            ++numHeapAllocs;
        }
    }

    // Perversely enough the CRT doesn't use the process heap. Suck up
    // the memory the CRT heap has already reserved.
    for (size_t blockSize = 64 * 1024; blockSize >= 16; blockSize /= 2)
    {
        for (;;)
        {
            void* p = malloc(blockSize);
            if (!p)
                break;

            if ((size_t)p >= LOW_MEM_LINE)
            {
                // We don't need this memory, so release it completely.
                free(p);
                break;
            }

            totalReservation += blockSize;
            ++numHeapAllocs;
        }
    }

    // Print diagnostics showing how many allocations we had to make in
    // order to reserve all of low memory, typically fewer than 200.
    char buffer[1000];
    sprintf_s(buffer, "Reserved %1.3f MB (%d vallocs, "
                      "%d heap allocs) of low-memory.\n",
            totalReservation / (1024 * 1024.0),
            (int)numVAllocs, (int)numHeapAllocs);
    OutputDebugStringA(buffer);
#endif
}

Cool, eh?

The code is a bit messy but actually fairly simple. Call this as soon as possible when your 64-bit process starts up and you will be able to find and fix your pointer truncation bugs in no time at all.

The code is a bit verbose because it first reserves all of the low-memory address space and then tries to soak up address space that was previously reserved by the CRT and process heaps. Other heaps in your process may still be holding on to low memory, but in practice it shouldn’t be enough to matter.

It’s cheap

The VirtualAlloc calls reserve only address space, which means that the cost of this is very low. The code doesn’t reserve 4 GB of RAM; it just reserves address space and then never uses it. Cheap like borscht.
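
If you want to convince yourself that a reservation is not backed by RAM, one option is to ask Windows how it classifies the region. The sketch below is my own illustration, not part of ReserveBottomMemory(): the function name is hypothetical, and the non-Windows branch is only a stub so the snippet compiles elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: reserves 'size' bytes of address space without
// committing it, checks how Windows classifies the region, then releases it.
// Returns true if the region is reported as reserved-only (MEM_RESERVE).
bool ReservationIsUncommitted(size_t size)
{
    assert(size > 0);
#ifdef _WIN32
    void* p = VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS);
    if (!p)
        return false;

    MEMORY_BASIC_INFORMATION info = {};
    const bool queried = VirtualQuery(p, &info, sizeof(info)) != 0;
    const bool reservedOnly = queried && info.State == MEM_RESERVE;

    VirtualFree(p, 0, MEM_RELEASE);
    return reservedOnly;
#else
    // Stub for non-Windows builds so the sketch still compiles; nothing is
    // reserved or checked here.
    return true;
#endif
}
```

A region in the MEM_RESERVE state has had no pages committed, which is why the reservation itself is cheap.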

(App) Verify this

This is such a simple and obvious technique that I’m quite surprised that Application Verifier doesn’t offer it as an option*. In fact, Application Verifier has a bug that renders it almost incompatible with this technique: if you are using this technique at the same time that you use Application Verifier then it somehow ends up committing 4 GB of RAM! The first time we hit this was when a colleague was running a dozen copies of our asset conversion tool while Application Verifier was enabled for it. The 48 GB of extra RAM consumption did bad things to his computer’s performance.

I hacked around this problem by detecting Application Verifier (just check to see if one of its DLLs is loaded) and disabling the reservation in that case. Another alternative is to make the address space reservation optional, but this won’t find as many bugs.
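
A rough sketch of that check might look like the following. The DLL name "verifier.dll" is an assumption on my part about which module to look for (this post doesn’t name one), and the helper names are made up; verify which modules Application Verifier actually loads into your process before relying on it.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Returns true if the named module is already loaded in this process.
bool IsModuleLoaded(const char* moduleName)
{
    assert(moduleName != nullptr);
#ifdef _WIN32
    return GetModuleHandleA(moduleName) != nullptr;
#else
    // Stub for non-Windows builds: there is no Windows loader to query.
    return false;
#endif
}

// ASSUMPTION: "verifier.dll" is one of the DLLs that Application Verifier
// injects. Check the module list of a verified process to confirm.
bool IsAppVerifierLoaded()
{
    return IsModuleLoaded("verifier.dll");
}

// Hypothetical wrapper: skips the reservation when Application Verifier is
// present. 'reserve' would normally be ReserveBottomMemory.
void ReserveUnlessVerifying(void (*reserve)())
{
    if (!IsAppVerifierLoaded())
        reserve();
}
```

Calling ReserveUnlessVerifying(ReserveBottomMemory) at startup keeps the reservation on for normal runs and off for Application Verifier runs.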

It’s a darned good start

Pointer truncation bugs aren’t the only problem in porting to 64-bit. Indices and offsets can also truncate or wrap, so looking at compiler warnings and auditing likely problem areas is a good idea. It turns out that most integer loop variables should probably be ‘size_t’ or ‘ptrdiff_t’ rather than ‘int’.
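
Here is a small sketch of the kind of wrap I mean, using a byte-offset calculation. The function names and values are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset computed in 32-bit arithmetic, as legacy code using
// 'unsigned int' indices would do. The product wraps modulo 2^32.
constexpr uint32_t ByteOffset32(uint32_t index, uint32_t stride)
{
    return index * stride;
}

// The same calculation in 64-bit arithmetic, which is what you get on
// 64-bit Windows when the index and stride are 'size_t'.
constexpr uint64_t ByteOffset64(uint64_t index, uint64_t stride)
{
    return index * stride;
}

void OffsetDemo()
{
    // 65,536 elements of 65,536 bytes each is exactly 4 GB.
    assert(ByteOffset32(0x10000, 0x10000) == 0);                 // wrapped to zero
    assert(ByteOffset64(0x10000, 0x10000) == 0x100000000ULL);    // correct
}
```

The 32-bit version silently points back at the start of the buffer, which is the same class of bug as pointer truncation but won’t be flushed out by reserving low memory.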

I’m here all week, try the steak

Porting to 64-bit needn’t be scary, and this technique helps make the process more reliable and predictable. By using this technique at Valve I was able to flush out all the critical pointer truncation bugs in a large code base in very little time. This then made it easier to use Application Verifier to check for other memory bugs, and also lets our processes address vast amounts of memory.

* If Microsoft had added pointer truncation detection to Application Verifier then they might have caught this bug in their audio APIs. If you use the MIXER_OBJECTF_HWAVEOUT flag to pass a (64-bit) HWAVEOUT to mixerOpen then you find that the uMxId parameter is a (32-bit) UINT. Oops. Try it yourself, before and after calling ReserveBottomMemory(). Bug reported.

void TestAudio()
{
    WAVEFORMATEX w = {};
    w.wFormatTag = WAVE_FORMAT_PCM;
    w.nChannels = 1;
    w.nSamplesPerSec = 44100;
    w.wBitsPerSample = 16;
    w.nBlockAlign = w.nChannels * (w.wBitsPerSample/8);
    w.nAvgBytesPerSec = w.nSamplesPerSec * w.nBlockAlign;

    HWAVEOUT hWave;
    MMRESULT mmr = waveOutOpen(&hWave, WAVE_MAPPER, &w,
                NULL, 0, CALLBACK_NULL);

    if (mmr == MMSYSERR_NOERROR)
    {
        HMIXER hMixer = NULL;
        // Map the device onto an HMIXER. The flags parameter tells the API
        // to interpret the second parameter as an HWAVEOUT. The
        // mandatory cast truncates the pointer.
        mmr = mixerOpen(&hMixer, UINT(hWave), 0, 0,
                    MIXER_OBJECTF_HWAVEOUT);
    }
}

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.

21 Responses to 64-Bit Made Easy

  1. Jaewon Jung says:

    A cool idea!

    BTW, I couldn’t understand the details of your function. Why does it repeatedly reserve gradually smaller blocks rather than just reserving the bottom 4 GB of address space once? And I guess it uses HeapAlloc below 64 KB because the page size is 64 KB and address space can only be reserved in multiples of the page size… Is that correct?

    Clear up my ignorance, please. ;)

    • brucedawson says:

      Parts of the bottom 4 GB of address space are already reserved or allocated before my process even finishes loading. Kernel32.dll, ntdll.dll, and user32.dll for instance, load right in the middle of the bottom 4 GB. The stack for my first thread also ends up down there. In order to fill in all of the cracks I have to reserve small blocks of address space. I start with large blocks because that is more efficient.

  2. Richard says:

    “It turns out that most integer loop variables should probably be ‘size_t’ or ‘ptrdiff_t’ rather than ‘int’.”

    Amen, but I’ve never been able to get much traction on this: old habits die hard, like the Fortran habit of using i, j, k. Perhaps this post will help.

  3. Riley L says:

    Do you see any practical benefits to shipping x64 game builds yet?

    • brucedawson says:

      Shipping a 64-bit only game would cut out a lot of the market, so that’s not generally practical yet. Shipping both 32-bit and 64-bit versions adds some testing cost, so most studios probably won’t do it, especially since tagging your 32-bit executable as large-address-aware gets you double the address space on a 64-bit OS already.

      The most likely reason, right now, to ship a 64-bit version of a game would be if it runs faster. Sometimes having twice as many registers and other 64-bit code-gen differences can give you a performance boost.

      The Steam Hardware Survey says that over 60% of Steam customers are on a 64-bit OS, so if somebody did make a game too big for 32-bit then there is already a significant market for it.

      The main reason for doing 64-bit now is for internal development. It allows bigger levels, better tools, and the use of tools like Application Verifier (pageheap) without worrying about out-of-memory.

  4. This is awesome… however, my FileDialog DoModal calls are now crashing with nonsense call stacks. I get the feeling it’s a shell extension or something, since I see the dialog for a brief moment and then boom :(
    > 000000018000163b()
    0000000100000024()
    0000000000000001()

  5. brucedawson says:

    I can’t repro this crash. I tried with VS 2010 SP1, FWIW. The bug may be sensitive to VS version, configuration, placement of the call to ReserveBottomMemory(), and other factors. Try posting a complete repro project (all source and project files, no .obj or other output files) somewhere, along with VS version and configuration details.

  6. Yeah, it’s really weird. It crashes in a TppWorkerThread… one of the 20 that are spawned by calling GetOpenFileName. The weirdest thing is that when I tried to find out which module called into no man’s land by stepping through the disassembly on the different threads, it eventually ended up not crashing. Doesn’t smell good at all, especially when it’s in someone else’s code. May be related to some extension/plugin/etc. that I have installed locally (I tried disabling most of them using Sysinternals Autoruns, but it still usually crashes).

  7. Relevant resources:

    All about 64-bit programming in one place – http://software.intel.com/en-us/blogs/2011/07/07/all-about-64-bit-programming-in-one-place/

    Lessons on development of 64-bit C/C++ applications – http://www.viva64.com/en/l/

  8. Thomas says:

    There’s also VirtualAlloc()’s MEM_TOP_DOWN flag. It can apparently be turned on system-wide in the registry.

    http://msdn.microsoft.com/en-us/windows/hardware/gg487503.aspx

  9. Pingback: When Even Crashing Doesn’t Work | Random ASCII

  10. I dropped your function into a codebase and it’s squeezing 64-bit issues into the foreground like an electric worm harvester. Thanks!

  11. jcopenha says:

    I took another approach as described here, http://blog.accusoft.com/posts/2012/august/how-and-why-you-make-mem_top_down-a-per-process-flag-part-1.aspx Well, that is Part 1 of 2. The final approach, writing a device driver to turn on MEM_TOP_DOWN on a single process is described in part 2.

  12. Hi Bruce!! I never thanked you properly for sharing this awesome piece of code.

    I’m writing a few slides for the Ogre 3D engine (mostly tackling design issues, current performance problems, future improvements, etc) and I included a very small 64-bit section at the end, and inside it there is a link to this blog post.

    I also want to include the code snippet you provide here in case your website is unavailable, and given the importance/usefulness of your snippet, I don’t want that to happen.
    Of course, you will be properly credited and the code snippet will be untouched, can I have your authorization to do it? I would really appreciate it.
    I will let you know when the slides are ready, in case you’re interested.

    Thanks again!
    Matias Goldberg

  13. As I promised, the slides are ready.
    If you’re interested, you can see them here:

    http://www.ogre3d.org/forums/viewtopic.php?f=4&t=75459&p=477602

    You probably already know most of the suggestions, but still you may find it’s a pretty complete summary of current next gen techniques.

    Best Regards,
    Matias

  14. Charles Goodwin says:

    My biggest problem with porting to 64 bits is the “side by side configuration errors” I keep getting. All the pointers stuff and so on is just a question of solving bugs, but this is just… a huge pointless waste of time and effort.

    Why did M/S make this so freaking difficult?
