Symbols the Microsoft Way

Symbol servers allow developer tools on Windows to automatically find symbols. They do this so well that most developers never have to worry about the internal mechanisms. However, when things go wrong it helps to understand how they work, and it turns out that it is all very simple.

This article should serve as a useful comparison with the process of getting symbols on Linux, especially for crashes on customer machines. I documented that process in a four-part series.

My discussion of Windows symbol servers makes use of the symbol server that I have on my laptop, for my own personal projects. Whenever I release a new version of Fractal eXtreme (64-bit optimized, multi-core, fast and fluid exploration of fractals, demo version here) I put the symbols and binaries on my symbol server so that I can trivially investigate any crash reports that I receive. This may seem like overkill for a home project, but in fact a local symbol server is just a copy of the files, arranged in a specific way for easy retrieval, and it is trivial to set up.

Finding PE files

Symbol servers store not just symbols but also PE files (DLLs and EXEs). If these aren’t already available (they will be absent when looking at a minidump or an xperf profile) then they must be retrieved first, before the symbols.

There are three pieces of information that are needed in order to retrieve a PE file from a symbol server: the file name, link time stamp, and the image size. In order to see if the latest version of FractalX.exe made it into my symbol server I would extract the link time stamp and the image size from the executable like this:

dumpbin FractalX.exe /headers | find "date stamp"
        4FFD0109 time date stamp Tue Jul 10 21:28:57 2012
dumpbin FractalX.exe /headers | find "size of image"
          147000 size of image
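
The same two values can be read straight from the file without dumpbin. The following is a minimal sketch, assuming the standard PE header layout; the function name read_pe_key is my own and is not part of any Microsoft tool.

```python
import struct


def read_pe_key(data):
    """Return (TimeDateStamp, SizeOfImage) from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew, at offset 0x3C of the DOS header, is the file offset
    # of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The COFF header follows the 4-byte signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    (time_stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    # The optional header starts after the 20-byte COFF header.
    # SizeOfImage is at offset 56 in both the PE32 and PE32+ layouts.
    (image_size,) = struct.unpack_from("<I", data, pe_offset + 24 + 56)
    return time_stamp, image_size
```

Passing the contents of the FractalX.exe examined above (for example open(path, "rb").read()) should give the same values that dumpbin printed, as integers rather than hex text.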

The format for the path to a PE file in a symbol server share is:

"%s\%s\%s%s\%s" % (serverName, peName, timeStamp, imageSize, peName)

My symbol server is in c:\MySyms (normally it would be on a shared server, but this is my personal laptop) so the full path for the file examined above is:

c:\MySyms\fractalx.exe\4FFD0109147000\FractalX.exe

Simple enough. In my case I use symstore.exe’s /compress option (it saves a lot of space) when I add the files. Compressed files are indicated by replacing the last character with an underscore, so the actual path is this:

c:\MySyms\fractalx.exe\4FFD0109147000\FractalX.ex_

This is a good test to make sure that your PE files have been correctly added to your symbol server, but it’s not a very realistic use case since we used the PE file to obtain the values needed to retrieve the PE file. The more common scenario is that you have a minidump or an xperf ETL file. That file contains a series of (module name, link time stamp, image size) triplets, and these are used at analysis time to retrieve the PE files. In minidump files the relevant data is stored in an array of MINIDUMP_MODULE structures.

Note that the layout of symbol server shares can be much more complex. You should use the APIs (discussed later) to retrieve PE files – the technique above is purely for troubleshooting.
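
For the troubleshooting check described in this section, the path can be computed with a few lines of Python. This is only a sketch: the function name pe_store_path is hypothetical, and the formatting of the key (eight upper-case hex digits for the time stamp, unpadded hex for the image size) is my assumption based on the example path above.

```python
def pe_store_path(server, pe_name, time_stamp, image_size, compressed=False):
    """Build the expected path of a PE file in a simple symbol server share."""
    # Assumption: time stamp as eight upper-case hex digits followed by
    # the image size as unpadded hex, as in the example in this article.
    key = "%08X%X" % (time_stamp, image_size)
    # Compressed files have their last character replaced by an underscore.
    file_name = pe_name[:-1] + "_" if compressed else pe_name
    return "\\".join([server, pe_name, key, file_name])


print(pe_store_path(r"c:\MySyms", "FractalX.exe", 0x4FFD0109, 0x147000,
                    compressed=True))
```

The directory name keeps whatever case is passed in. Windows file systems are normally case-insensitive, so this finds the same file as the lower-case fractalx.exe directory shown above.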

Finding PDB files

Finding PE files is handy when analyzing customer crash dumps in order to have the assembly instructions but it’s actually more important than that. Minidump files and most profile files do not actually record enough information to retrieve PDB files. Instead the tools retrieve the PE files and then look in the PE files to get the information needed to retrieve the PDB files. Once again we can extract this information from a PE file using dumpbin:

dumpbin FractalX.exe /headers | find "Format:"
    4FFD0109 cv           56 000B9308    B7B08    Format: RSDS, {6143E0D1-9975-4456-AC8E-F24C8777336D}, 1, FractalX.pdb

The long hexadecimal number after RSDS is a GUID, and the number after that (a 32-bit hexadecimal number, but in this case just ‘1’) is called the ‘age’. The PDB file name is also listed here. Together these uniquely identify a particular version of a PDB file. The format for the path to a PDB file in a symbol server share is:

"%s\%s\%s%s\%s" % (serverPath, pdbName, guid, age, pdbName)

As with the PE files a final underscore indicates when a file is compressed by symstore.exe. The path on my symbol server for the PDB listed above looks like this:

c:\MySyms\FractalX.pdb\6143E0D199754456AC8EF24C8777336D1\FractalX.pd_

Simple enough.

The algorithm for generating the GUID and age is that whenever you do a rebuild – whenever a fresh PDB is generated – a new GUID is created and the age is set to one. Whenever you do an incremental build the PDB is updated with new debug information and the age is incremented.

That’s it – use the PE name, link time stamp, and image size to find the PE (if it isn’t already loaded) and then use the GUID, age, and PDB file name to find the PDB file.

Note that the layout of symbol server shares can be much more complex. You should use the APIs (discussed later) to retrieve PDB files – the technique above is purely for troubleshooting.
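
The PDB path used for the troubleshooting check in this section can be sketched the same way. The function name pdb_store_path is hypothetical, and writing the age as upper-case hex is my assumption; the example above has an age of 1, so it cannot confirm the casing.

```python
def pdb_store_path(server, pdb_name, guid, age, compressed=False):
    """Build the expected path of a PDB file in a simple symbol server share."""
    # Drop the braces and dashes that dumpbin prints around the GUID.
    guid_digits = "".join(ch for ch in guid if ch not in "{}-").upper()
    if len(guid_digits) != 32:
        raise ValueError("expected a 128-bit GUID, got %r" % guid)
    int(guid_digits, 16)  # raises ValueError if any digit is not hex
    # Assumption: the age is appended as unpadded upper-case hex.
    key = "%s%X" % (guid_digits, age)
    # Compressed files have their last character replaced by an underscore.
    file_name = pdb_name[:-1] + "_" if compressed else pdb_name
    return "\\".join([server, pdb_name, key, file_name])


print(pdb_store_path(r"c:\MySyms", "FractalX.pdb",
                     "{6143E0D1-9975-4456-AC8E-F24C8777336D}", 1,
                     compressed=True))
```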

Adding to a symbol server

If you ship software on Windows then you should have a symbol server. That symbol server should contain the PE files and PDB files for every product you ship. If you don’t do this then you are doing either yourself or your customers a disservice.

You should also have a symbol server for all internal builds that anybody at the company might end up running. If a program might crash, and if you want to be able to investigate the crash then put the symbols on the symbol server. If you’re worried about internal builds using up too much space then put them on a separate symbol server and purge the old files occasionally.

You should also make sure that your build machines are running source indexing so that when you’re debugging a crash in an old version of your software you will automatically get the right source files. Luckily I wrote about that already.

Adding files to a symbol server is the height of simplicity. Set sourcedir to point at a directory containing files to add and set dest to your symbol server directory, which should be accessible to all who need the symbols. Then run these commands:

symstore add /f %sourcedir%\*.dll /s %dest% /t prodname /compress
symstore add /f %sourcedir%\*.exe /s %dest% /t prodname /compress
symstore add /f %sourcedir%\*.pdb /s %dest% /t prodname /compress

That’s it. Use /r if you want the files recursively added, and see the help for more information.
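
If the upload is driven from a build script, the three command lines can be generated rather than typed. This is a rough sketch that uses only the symstore options shown above; the function name symstore_commands and the directory names in the example are placeholders of mine.

```python
def symstore_commands(source_dir, dest, product, recursive=False):
    """Return one symstore argument list per file type (DLL, EXE, PDB)."""
    commands = []
    for pattern in ("*.dll", "*.exe", "*.pdb"):
        cmd = ["symstore", "add"]
        if recursive:
            cmd.append("/r")  # add files from subdirectories as well
        cmd += ["/f", source_dir + "\\" + pattern,
                "/s", dest, "/t", product, "/compress"]
        commands.append(cmd)
    return commands


for cmd in symstore_commands(r"c:\build\out", r"c:\MySyms", "prodname"):
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) on a machine where symstore.exe is on the path, so that a failed upload stops the build.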

You can download the compress.exe program if you want to compress an existing symbol server – we did this at work and saved many TB of space.

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=17657

Update: a reader followed my recommendation of using compress.exe and found that it is dangerously unreliable. He shared his tests with me and I confirmed that in some cases the files created by compress.exe are corrupted. They also don’t compress as well as using the /compress option of symstore.exe. If you do use compress.exe use -ZX as the compression option as this produces the smallest files and appears to avoid the corruption problem. But be careful. Another alternative is to extract .pdb files from your existing symbol server and then resubmit them with symstore.exe /compress.
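
According to the same reader (see the comments), files written by compress.exe with its real default settings start with the signature SZDD, while files written with -Z or -ZX, or by symstore /compress, start with MSCF. I have not verified this beyond the one file I checked, so treat the sketch below as a way to find suspects rather than as a definitive test. The function names are my own.

```python
import os


def compression_kind(header):
    """Classify a compressed symbol-store file by its first four bytes."""
    if header[:4] == b"MSCF":
        return "cabinet"  # reported for -Z, -ZX and symstore /compress
    if header[:4] == b"SZDD":
        return "szdd"     # reported for the problematic compress.exe default
    return "unknown"


def find_szdd_files(store_root):
    """Yield paths of compressed files under store_root that start with SZDD."""
    for folder, _dirs, files in os.walk(store_root):
        for name in files:
            if not name.endswith("_"):
                continue  # not a compressed file
            path = os.path.join(folder, name)
            with open(path, "rb") as f:
                if compression_kind(f.read(4)) == "szdd":
                    yield path
```

Running list(find_szdd_files(r"c:\MySyms")) on a store gives the files that would be worth re-adding with symstore /compress.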

You can also make your symbol server available through http if you want, but I know nothing about how to set this up.

Using a symbol server

The precise details of how to get your development tools to use your symbol server vary, but one almost universal method is to set the _NT_SYMBOL_PATH environment variable (advanced usage here and here) to something like this:

_NT_SYMBOL_PATH=SRV*c:\symbols*c:\MySyms;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

This tells tools to first look in the local cache (c:\symbols) and then look in the symbol server c:\MySyms. If symbols are found in c:\MySyms then they are copied (and decompressed) to c:\symbols. If none of that works then the same process (including the same cache directory) is followed for Microsoft’s web-based symbol server.

Note that a local symbol cache is required when dealing with compressed symbols.
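
To make the structure of that value concrete, here is a small sketch that splits it into its parts. It handles only the SRV*cache*store form used above, plus plain directories; the other syntaxes covered in the advanced usage links are not handled. The function name parse_symbol_path and the dictionary keys are my own.

```python
def parse_symbol_path(value):
    """Split an _NT_SYMBOL_PATH value into a list of entries."""
    entries = []
    for element in value.split(";"):
        if not element:
            continue
        parts = element.split("*")
        if parts[0].lower() == "srv":
            # SRV*cache*store: the last part is the symbol store and
            # anything between SRV and the store is a cache directory.
            entries.append({"kind": "srv",
                            "caches": parts[1:-1],
                            "store": parts[-1]})
        else:
            # A plain directory that is searched directly.
            entries.append({"kind": "dir", "caches": [], "store": element})
    return entries


path = (r"SRV*c:\symbols*c:\MySyms;"
        r"SRV*c:\symbols*http://msdl.microsoft.com/download/symbols")
for entry in parse_symbol_path(path):
    print(entry)
```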

Programmatically retrieving symbols

Usually the debuggers and profilers that you use will know how to use symbol servers, but occasionally you may need to write code to download symbols – perhaps you are writing a debugger or profiler. In my case I had a web page that listed GUIDs, ages, and PDB names for dozens of Microsoft DLLs from dozens of versions of Windows for which we needed symbols. Writing code to download all of these symbols was trivial – several orders of magnitude easier than getting symbols for other versions of Linux.

The short explanation of what I needed to do was “call SymFindFileInPath”.

In order to demonstrate how easy it was I decided to give a slightly longer explanation. The code below takes a GUID, age, and pdb name and downloads the symbols from Microsoft’s symbol server. The biggest chunk of code is for parsing the GUID – the actual PDB downloading is trivial.

// Symbol downloading demonstration code.
// No warranty, retain this notice, no other restrictions.
// For more information see:
// http://randomascii.wordpress.com/2013/03/09/symbols-the-microsoft-way/

#include <Windows.h>
#include <DbgHelp.h>
#include <stdio.h>
#include <string>

// Link with the dbghelp import library
#pragma comment(lib, "dbghelp.lib")

#define TESTING

int main(int argc, char* argv[])
{
    // Tell dbghelp to print diagnostics to the debugger output.
    SymSetOptions(SYMOPT_DEBUG);

    // Initialize dbghelp
    const HANDLE fakeProcess = (HANDLE)1;
    if (!SymInitialize(fakeProcess, NULL, FALSE))
    {
        printf("SymInitialize failed - error %lu.\n", GetLastError());
        return 1;
    }

    // Set a search path and cache directory. If this isn't set
    // then _NT_SYMBOL_PATH will be used instead.
    SymSetSearchPath(fakeProcess, "SRV*c:\\symbolstest*http://msdl.microsoft.com/download/symbols");

#ifdef TESTING
    // Valid PDB data to test the code.
    std::string gText = "072FF0EB54D24DFAAE9D13885486EE09";
    const char* ageText = "2";
    const char* pdbName = "kernel32.pdb";
#else
    #error Fill in code to get filename, GUID, and age
#endif

    // The GUID text must be exactly 32 hex digits or substr() will throw.
    if (gText.size() != 32)
    {
        printf("The GUID must be 32 hex digits. Sorry.\n");
        return 10;
    }

    // Parse the GUID and age from the text. Each field is read into an
    // unsigned int (the type that %x expects) and then narrowed.
    GUID g;
    unsigned int temp = 0;
    int count = sscanf(gText.substr(0, 8).c_str(), "%x", &temp);
    g.Data1 = temp;
    count += sscanf(gText.substr(8, 4).c_str(), "%x", &temp);
    g.Data2 = (unsigned short)temp;
    count += sscanf(gText.substr(12, 4).c_str(), "%x", &temp);
    g.Data3 = (unsigned short)temp;
    for (size_t i = 0; i < ARRAYSIZE(g.Data4); ++i)
    {
        count += sscanf(gText.substr(16 + i * 2, 2).c_str(), "%x", &temp);
        g.Data4[i] = (unsigned char)temp;
    }
    count += sscanf(ageText, "%x", &temp);
    DWORD age = temp;

    // Three GUID fields, eight Data4 bytes, and the age: twelve values.
    if (count != 12)
    {
        printf("Couldn't parse the GUID/age string. Sorry.\n");
        return 10;
    }

    // With SSRVOPT_GUIDPTR, id points to the GUID and two holds the age.
    char filePath[MAX_PATH] = {};
    void* id = &g;
    DWORD two = age;
    DWORD three = 0;
    DWORD flags = SSRVOPT_GUIDPTR;
    if (SymFindFileInPath(fakeProcess, NULL, pdbName, id, two, three,
                flags, filePath, NULL, NULL))
    {
        printf("Found symbol file - placed it in %s.\n", filePath);
    }
    else
    {
        printf("Symbols not found - error %lu. Are dbghelp.dll and "
                "symsrv.dll in the same directory as this executable?\n",
                GetLastError());
    }

    SymCleanup(fakeProcess);

    return 0;
}

The one gotcha is that dbghelp.dll and symsrv.dll have to be in the same directory as your tool – having them in your path does not work reliably.
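
The GUID parsing that takes up most of the C++ code is easier to follow when written out in a few lines of Python. The sketch below splits the 32 hex digits into the same four GUID fields and then rebuilds the directory name used in the symbol server layout described earlier. The function names are my own, and the upper-case formatting of the age is an assumption.

```python
def parse_guid(text):
    """Split 32 hex digits into the four fields of a Windows GUID."""
    if len(text) != 32:
        raise ValueError("expected 32 hex digits")
    data1 = int(text[0:8], 16)            # 32-bit Data1
    data2 = int(text[8:12], 16)           # 16-bit Data2
    data3 = int(text[12:16], 16)          # 16-bit Data3
    data4 = bytes.fromhex(text[16:32])    # eight Data4 bytes
    return data1, data2, data3, data4


def symbol_index(guid_text, age_text):
    """Rebuild the GUID-plus-age directory name from the parsed fields."""
    data1, data2, data3, data4 = parse_guid(guid_text)
    age = int(age_text, 16)
    return "%08X%04X%04X%s%X" % (data1, data2, data3,
                                 data4.hex().upper(), age)


print(symbol_index("072FF0EB54D24DFAAE9D13885486EE09", "2"))
```

For the kernel32.pdb test data this prints the original 32 digits followed by the age, which is the directory name that the PDB would sit under in a symbol server share.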

Diagnosing symbol problems with windbg

If you have a minidump and its symbols are not loading then I recommend loading the minidump into windbg and using its diagnostics:

  • !sym noisy – print verbose information about attempts to get symbols
  • lmv m MyModule – print a record from the crash dump’s module list including its name, time stamp, image size, and where the PDB is located if it was found
  • !lmi MyModule – print a module’s header information – this only works if the PE file has been loaded, which is a prerequisite for having symbols load

Dumpbin summary

  • "%VS100COMNTOOLS%..\..\VC\vcvarsall.bat" – this adds dumpbin’s directory to the path
  • dumpbin FractalX.exe /headers | find "date stamp" – find the link time stamp of a PE file
  • dumpbin FractalX.exe /headers | find "size of image" – find the image size of a PE file
  • dumpbin FractalX.exe /headers | find "Format:" – find the GUID, age, and file name of a PE file’s PDB file
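
If you want to script these lookups, the three lines of interest can be picked out of captured dumpbin /headers output with regular expressions. This is a sketch based only on the sample output shown in this article; the function name parse_dumpbin_headers and the dictionary keys are my own, and the age is kept as text exactly as dumpbin printed it.

```python
import re


def parse_dumpbin_headers(output):
    """Extract symbol-server keys from the text of 'dumpbin /headers'."""
    info = {}
    for line in output.splitlines():
        m = re.match(r"\s*([0-9A-Fa-f]+) time date stamp", line)
        if m and "time_stamp" not in info:
            # Keep the first match, which is the link time stamp in the
            # sample output.
            info["time_stamp"] = int(m.group(1), 16)
        m = re.match(r"\s*([0-9A-Fa-f]+) size of image", line)
        if m:
            info["image_size"] = int(m.group(1), 16)
        m = re.search(r"Format: RSDS, \{([0-9A-Fa-f-]+)\}, (\w+), (.+)$", line)
        if m:
            info["guid"] = m.group(1)
            info["age"] = m.group(2)
            info["pdb_name"] = m.group(3).strip()
    return info
```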

About brucedawson

I'm a programmer, working for Valve (http://www.valvesoftware.com/), focusing on optimization and reliability. Nothing's more fun than making code run 5x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.

27 Responses to Symbols the Microsoft Way

  1. Kdansky says:

    I’m using the windows symbol servers daily, but I have one weird issue: One of our customers has a machine on which our software crashes, and the generated minidumps give me a stack-trace into MFC-dlls of which I cannot get symbols, which is highly irregular. The exact version-numbers elude me right now, but I do not understand how this could happen in the first place. Does MS have holes in their PDB-coverage? Is the minidump faulty? Is it a problem with the client’s windows installation? Magic?

    • brucedawson says:

      That sounds peculiar. You should try loading the crashes into windbg and using "lmv m mfc100" to get more information, and !lmi also. You can at least find out whether the problem is with loading the PE file or the PDB file.

      Are you shipping the MFC DLLs with your application? That’s the recommended thing to do and that would normally mean that they would be running a known version — your version.

      • Kdansky says:

        Thank you for the suggestions. I could load the dumps into Windbg, and they seem to be fine, it’s just that I don’t have the correct versions of the dll’s themselves, (or possibly VS/windbg can’t find them due to 32bit app on 64bit dev machine). It seems shipping the mfc100u.dll ourselves would be the better option to begin with.

        • brucedawson says:

          > VS/windbg can’t find them due to 32bit app on 64bit dev machine

          No. That is never a problem. The symbol lookup algorithm doesn’t give a damn about CPU architecture. It’s all about extracting fields from the PE file and using them as search keys. x86/x64/ARM/PPC does not enter in to it.

          Assuming you have your symbol path configured correctly the customer must have a version of mfc100u.dll that is not listed in Microsoft’s symbol server. This is possible, albeit very rare. You should be shipping mfc100u.dll anyway which will probably resolve both the crash and the symbol problems.

  2. Alexander says:

    Been using "srv*shared server*msdl" configuration for ages, and did all sorts of tricks to have a local cache. I did have a backup script to copy shared server to local cache, I had microsoft client-side caching, I even thought to hack its driver to force caching a directory! (The intent was to always write to shared server, but read from local cache). I also have microsoft and our own symbols all messed up in a single directory at shared server. Oh my, I feel so ashamed now that I learned I only had to configure "srv*local cache*shared server*microsoft" to get it working out of the box. Also, didn’t know I can actually put two symbol servers in a row to avoid having a mess of everything in a single directory.

    • brucedawson says:

      The syntax for _NT_SYMBOL_PATH is excessively messy but pretty configurable. Read the three links in the post for various other ways of having different caching policies for different symbol servers.

      I generally cache everything to the same directory and then delete it occasionally. I trust that it will get repopulated as needed.

      • Alexander says:

        I already read them. Your post merely served as a starting point. I actually wanted to learn more about compressing an existing server (the only thing I didn’t know from your post, and I seem to have skipped the part where you config local cache through intermediate), but reading stuff ended in learning all that. Thanks :)

  3. Alexander says:

    “Minidump files and most profile files do not actually record enough information to retrieve PDB files”

    Not quite so. I made a debugging tool for our crash handling purposes and I actually load PDB’s by MINIDUMP_MODULE.CvRecord, which is a CV_INFO_PDB70, containing everything you need. I didn’t even save PE files for ages, and it worked just fine.

    • Alexander says:

      Although I vaguely remember I had a case when I was helping some friend with his minidump and he didn’t have CvRecord in it. Probably a very outdated minidump-making tool or something like that.

    • brucedawson says:

      A problem we hit was that internally we would record full minidumps (with heap) and they had enough information to allow loading the PDBs, without finding the PEs. However when Microsoft sent us mini-minidumps (no heap) we couldn’t load the symbols. This is what forced me to learn more about how the process works so that I could configure our symbol publishing so that we could load symbols for *all* minidumps.

      So, there are some cases where the PE files are unnecessary, but I prefer not to risk it :-)

  4. Alexander says:

    Also, are you aware of SymSrvGetFileIndexes() / SymSrvGetFileIndexInfo()? This is a programmatic way of doing what you’re doing with dumpbin.

  5. Pingback: Symbols on Linux Part Three: Linux versus Windows | Random ASCII

  6. Pingback: Symbols on Linux Part One: g++ Library Symbols | Random ASCII

  7. Pingback: Symbols on Linux Part Two: Symbols for Other Versions | Random ASCII

  8. Alexander says:

    Finally I got time to re-configure our symbols servers and compress it.
    I downloaded the compress.exe from your link, with md5 a911550b51f759a723f40db3157572f7.
    I compressed the symsrv using some batch script.
    And now it’s all ruined! Many files just can’t be extracted, others can, but with warnings. 7-zip will show absolutely invalid original file sizes for every single file. Having googled the internal format of compress.exe I can confirm that the header contains exactly that incorrect size. To be specific, it will always have byte 0x63 where it shouldn’t be.

    I’m pretty much terrified. Even though I do have a backup, just not too handy.

    Now, an experiment. Let’s make file of exactly 8465408 bytes, compress it and try to expand.
    HANDLE file = CreateFile(_T("Zeroes.pdb"), GENERIC_WRITE, 0, 0, CREATE_ALWAYS, 0, 0);
    SetFilePointer(file, 8465408, 0, FILE_BEGIN);
    SetEndOfFile(file);
    CloseHandle(file);

    Compressing goes fine: compress.exe -R Zeroes.pdb
    Expanding results in a 0 bytes file: expand -R Zeroes.pd_

    • brucedawson says:

      Damn. I don’t know what would have happened. How are you trying to extract the files? The only way we try extracting them is with symbol server and that works. I don’t know what the format is — I don’t know that 7-zip is supposed to be able to decompress them.

      Sorry…

      • Alexander says:

        It all started with debugger acting WEIRD on one pdb. Then it turned out that this PDB can’t be extracted at all. Give my experiment a try.

      • Alexander says:

        By the way, how did you compress your symbol server? There’re two compression types available in compress.exe and I simply used default one (turns out its compression isn’t as good as -Z compression).

        • Alexander says:

          Now it turns out even that is a lie. Compress.exe says -ZX is default, but in fact if neither -ZX nor -Z is specified then it uses some third type of compression (which has caused me damages). It seems that -ZX compresses better than -Z, which compresses better than the real default.

        • Alexander says:

          symstore /compress will compress even better than default / Z / ZX. In my case:
          original = 51mb
          ntfs = 24.2mb
          default = 18.5mb
          Z = 12.9mb
          ZX = 11.1mb
          symsrv = 9.5mb

        • brucedawson says:

          I don’t remember what option we used — it was over a year ago. Now we just use the /compress option to symstore. We probably used the default options, and symsrv.dll is able to decompress those.

          • Alexander says:

            If you’re able to find any PDB compressed back then, what is its signature? SZDD is the bad default compression; MSCF is -Z, -ZX and symsrv compression. If it is SZDD, do you have byte at offset 0x0A == byte 0x09 + 1? That’s what seems to be the bug. 4 bytes from 0xA should form a 4-byte original size.

          • brucedawson says:

            First four bytes are MSCF. Then 0x00000000. Then 0x73 E4 0F 00 00 00 00 00.
            Then 0x2C 00 00 00 00 00 00 00 03 01 01 00 01 00 00 00

          • Alexander says:

            MSCF means you’re lucky. I have a theory that it’s not compress.exe but some of windows DLLs are at fault, going to test it. So far Win7 x64 and Win8 x64 both have the problem.

          • Alexander says:

            Also, for quite a while now it looks like we’re both working quite intensively on pretty much the same technologies, and by that I mean general debugging / crash handling / debugging crash dumps / working on arcane faults. I feel that it would be great to make a closer acquaintance. If interested, please send some instant messenger contact to my email.

          • Alexander says:

            Theory about Windows didn’t work out. WinXP SP3 has the same bug. Probably no need to go further on that. What I really wonder is how the bug still exists, it’s been over 10 years now and it’s no good when the file can be compressed, but not expanded.

            You pointed out that it’s symsrv that should be able to expand, so I’d like to clarify that point once again: it all started with symsrv. On one of the PDBs after compression it would create a 0-byte-long PDB in my cache and fail to load any symbols. Investigating, I found that the PDB can’t be decompressed by any means, and expand.exe produces the same 0-byte PDB. The next thing I found is that all of the PDBs were compressed wrong, but most of them can still be decompressed, even though expand.exe will yield warnings. I think it’s best to incorporate that in your post. Also, compress.exe doesn’t compress as well as symstore with any flags. So it’s probably best to convert an existing store by renaming the symsrv and starting a recursive symstore on it. Transaction history will be lost as a downside, though it seems it can be restored by hand, replacing 000Admin and all .ptr files from the original symbol store.
