Source Indexing is Underused Awesomeness

If you’ve ever had to debug code that was not built on your machine – whether looking at crash dumps or debugging live code – then you need source indexing.

If you’ve ever wasted time trying to find the source file (or the right version of the source file) used to build a DLL or EXE you are debugging then you need source indexing. If you ever look at crash dumps, especially if they are from builds that are more than a day or two old, then you need source indexing.

Source indexing (also known as source server) is free, fast, available since Visual Studio 2005, easy to use, and it ensures that the correct source file will always appear in your debugger, whether you’re debugging a crash from yesterday or yesteryear.

The problem is that most people don’t know that source indexing exists. In this post I’m going to explain why you need it, what it is, and how to use it for C++ development.

This article was originally posted on #AltDevBlogADay.

I recently modified the build script for UIforETW to add source indexing support, pulling the source files as needed from github.com. It was eight lines of script, with a single line that does the actual work. That’s it.

TL;DR

The length of this article makes source indexing look more complicated than it really is. Here’s the short version.

On your build machine you need to:

  • Modify srcsrv\srcsrv.ini (from the debugging tools) so that MYSERVER points at your Perforce server
  • Run p4index.cmd to embed source indexing information in your PDBs

On your development machines you need to:

  • Click the check box to enable source server in Visual Studio (Tools-> Options-> Debugging-> General-> Enable source server support)
  • Create a srcsrv.ini file (in your Visual Studio install directory, in “common7\ide”) to tell VS to stop popping up security warnings

That’s it. Read on for the details.

The need for source indexing

When you step through code in your debugger – whether it’s Visual Studio or windbg – you probably want to have the debugger show you your source code. Even if you are stepping through assembly language instructions it is foolishly inefficient to not have the source code right there.

If you built all of the binaries on your machine then it is straightforward for the debugger to retrieve the correct source files. When VC++ builds your code it embeds the full paths to all of the source files in the PDB file. It also embeds records that associate ranges of instructions with a particular line in a source file. In optimized code this mapping is imperfect, but it gets you to the right area.

If you are debugging locally built binaries and you have modified one of the source files since doing the build then Visual Studio will detect this (by comparing file signatures) and warn you of the potential problem. So far so good.

If you are debugging binaries from your build machine (you do have a build machine, don’t you?), especially if those binaries are from a few days or weeks ago, then this system doesn’t work. The first problem is that the source file paths may not match, since the build machine may have a different directory structure for its enlistment. This can be addressed by manually locating the files on your machine, but that extra step is inconvenient.

The larger problem is that the version of the source files used to build the binaries may not match what is on your machine. Visual Studio will warn you that the files don’t match, but then you need to play a tedious game of find-the-version as you try to sync to the correct version of the source file in the correct branch. This problem is particularly likely to happen if you are looking at crash dumps from an older build, or from a different branch.

Wouldn’t it be nice if as you stepped through code the debugger would just automatically retrieve the correct version of the correct source files?

What is source indexing?

What we need is a way to embed some extra information in the PDB file. Assuming that we are using Perforce for our version control (I’m a huge fan and it’s free for limited use, so try it) then we can uniquely identify each checked in file by its Perforce path and version number. As an example, on my local machine I have this source file:

c:\homedepot\Source\libs\cygnuslib\simpletimer.cpp

This file exists on all clients of my Perforce database, but the path may be different on each client. However using the “p4 have” command I can get the depot path, which is:

//depot/Source/libs/cygnuslib/simpletimer.cpp#5

The depot path is not only a universal identifier of which file we are talking about; it also contains the version number of the file. Thus, if we embed the depot path in a PDB file and associate it with the existing file path, which is already associated with blocks of instructions, then we have all of the information needed to retrieve the correct version of the correct file when debugging.

That is exactly what source indexing does. It is a simple and efficient process that embeds version-control path and version-number information into the PDB. Both Visual Studio and windbg support this data and can use this information to automatically retrieve the necessary source files. Source indexing lets you retrieve the correct source file, even if it is from a branch in which you are not enlisted!
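To make the depot-path idea concrete, here is a minimal Python sketch (not part of the shipped Perl tooling) that splits one line of “p4 have” output into the depot path, the revision, and the local path. It assumes the usual “depot-path#revision - local-path” line shape; the function name parse_have_line is made up for this illustration.

```python
import re

# Assumed "p4 have" line shape: <depot path>#<revision> - <local path>
HAVE_LINE = re.compile(r"^(//.+)#(\d+) - (.+)$")


def parse_have_line(line):
    """Split one line of "p4 have" output into (depot_path, revision, local_path)."""
    match = HAVE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a 'p4 have' line: {line!r}")
    depot_path, revision, local_path = match.groups()
    return depot_path, int(revision), local_path


# The file from the example above, as "p4 have" would be expected to report it.
sample = (r"//depot/Source/libs/cygnuslib/simpletimer.cpp#5 - "
          r"c:\homedepot\Source\libs\cygnuslib\simpletimer.cpp")
print(parse_have_line(sample))
```

The depot path and revision together are what end up in the PDB; the local path is only the key used to find them.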

Source indexing is also supported for other version control systems (Source Safe, TFS, and CVS) and can be extended to support arbitrary version control systems. However, since my experience is exclusively with Perforce that is all I will discuss.

Running source indexing on your build machine

The source indexing tools and documentation are installed with the Windows debuggers, which come with the Windows SDK. The default path, for a 64-bit install, is:

c:\Program Files\Debugging Tools for Windows (x64)\srcsrv

You should probably copy this directory and check it in so that any modifications that you make are preserved. You might also want to look at srcsrv.doc since it gives lots of extra information about using source indexing.

The logic of source indexing is written in Perl so you’ll need to install that in order to use source indexing.

You then need to modify srcsrv\srcsrv.ini so that the MYSERVER variable contains the name of your Perforce server. You can just copy the server address from the output of “p4 info”. In my case the variable in srcsrv.ini looks like this:

MYSERVER=FHD:1666

To do source indexing with Perforce you will need to run p4index.cmd. Running it with -? gives help on the command line options. The only ones I use are source, symbols, and debug. The source option is used to point at the enlistment containing your source code, the symbols option is used to point at a directory which contains (recursively) all of your PDB files, and the debug option just tells it to list all of the PDBs as it indexes them.

Because source indexing records the current version numbers of source files it will only record useful results if you build from checked-in source files. Source indexing isn’t storing your source files – it’s just storing their paths and versions – so any modifications in checked out files will not be recorded. That’s why source indexing only makes sense on a build machine (or build enlistment) which by its very nature is building from the checked in files. The build machine doesn’t need to be synced to latest – the “p4 have” command returns the currently synced version not the latest version – but each source file has to be synced to some version, and not contain modifications that aren’t checked in.

My build batch file contains the following three lines to implement source indexing:

pushd %builddepot%\srcsrv
call p4index -source=%builddepot% -symbols=%builddepot%\output -debug
popd

That’s it. At this point the source indexing information is embedded in my PDB files (using a tiny amount of space) and I archive them just like normal, in my case using symbol server. Yes, even on my one-person home projects I use symbol server and source indexing. It’s that easy.
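If your build is driven from Python rather than a batch file, the same three lines could be sketched roughly as follows. This is an illustration only: the function names and the “output” default are assumptions taken from my batch file above, and it presumes p4index.cmd lives in the srcsrv directory of the build enlistment.

```python
import subprocess
from pathlib import PureWindowsPath


def p4index_command(build_depot, symbols_subdir="output"):
    """Build the argument list equivalent to the 'call p4index ...' line."""
    root = PureWindowsPath(build_depot)
    return [
        "cmd", "/c", "p4index.cmd",
        f"-source={root}",
        f"-symbols={root / symbols_subdir}",
        "-debug",
    ]


def run_source_indexing(build_depot):
    """Run p4index from the srcsrv directory, like pushd/popd in the batch file."""
    srcsrv_dir = PureWindowsPath(build_depot) / "srcsrv"
    # check=True makes a failed indexing step fail the build script.
    subprocess.run(p4index_command(build_depot), cwd=str(srcsrv_dir), check=True)


print(p4index_command(r"c:\builddepot"))
```

Only run_source_indexing actually launches anything; p4index_command just shows what would be executed, which is handy for logging.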

Using source indexing in your debugger

While source indexing is the process of putting version control information into your symbol files, source server is the set of tools that extract that information and get the correct source file for use in the debugger.

Because source indexing is sometimes known as source server it makes people think that it is related to symbol servers. It’s not. Symbol servers (perhaps the subject of another post) are for helping the debugger automatically find the right symbol files. Source indexing is for finding the right source files. Since the source indexing information is stored in the symbol (PDB) files the first step is to make sure that you have symbols loaded. Retrieving those symbol files from a symbol server is certainly a good idea, but is orthogonal to the topic of retrieving source files once you have the symbols.

The next step depends on what debugger you are using. If you are using windbg then simply type “.srcfix” into the command window. Source server is now enabled and source files will be automatically retrieved as you step through code or navigate the call stack. You will get this security alert because in order for source indexing to retrieve the file it must run the command specified in the PDB file. This is a security risk if somebody gives you a malicious PDB file so you should examine the command to make sure it isn’t running an executable that could be exploited. Then you should probably tell it to not ask you every time.

image

In Visual Studio there is a checkbox to enable source server. Go to Tools-> Options-> Debugging-> General and check “Enable source server support”. You should probably also check “Print server diagnostics to the Output window” to help diagnose any problems that might occur.

image

As with windbg there is a security warning that will come up:

image

Unfortunately it lacks a “stop asking me” button so, as John Robbins said, you will soon want to fly to Redmond to punish someone. This is an annoyance and a security flaw because this dialog quickly trains you to click “Run” without reading the message. The non-obvious solution to this is to create a srcsrv.ini file in “common7\ide” in your Visual Studio install directory and mark p4.exe as being a trusted command, like this:

[trusted commands]
p4.exe

For greater security you can specify the path to the trusted command, like this:

[trusted commands]
p4.exe=C:\Program Files\Perforce\p4.EXE

After clicking the check box and after creating the srcsrv.ini file you need to stop debugging and then resume debugging before Visual Studio will notice – proof positive that windbg has the superior debugging user interface.

Visual Studio will save the retrieved source files in your personal AppData folder, whereas windbg saves them in the shared ProgramData folder. The important factor is that they are extracted to a location far away from your enlistment – debugging an old executable doesn’t require syncing your enlistment to old files.

Troubleshooting

There are two basic stages to source indexing. First the Perl script does a “p4 have …” in the specified source directory, to get a list of all of the source files. This tends to be pretty fast. As long as you have specified the correct directory this stage should work correctly. Look at the “Source root” output to make sure it is where you think it is, or hack the Perl script to print some of the data retrieved.

In the next stage the Perl script recursively looks for all PDB files in the specified symbols directory. It runs “srctool -r” on each PDB to get a list of source files, and then looks up each file in the information returned by the p4 have command. If it finds a match then it stores the Perforce path and version information in a special block of data. Only the PDBs for EXEs and DLLs are indexed; any vc100.pdb files will not be indexed.
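The matching step can be sketched in a few lines of Python. This is not the actual Perl logic, just an illustration under stated assumptions: paths are matched case-insensitively, the output line format is copied from the “source files” section that pdbstr dumps (shown later in this post), and build_source_lines and the sample toolchain path are hypothetical names.

```python
def build_source_lines(pdb_sources, have_entries, server_var="MYSERVER"):
    """Return (indexed_lines, unindexed_files) for one PDB.

    pdb_sources:  local paths as listed by "srctool -r".
    have_entries: (depot_path, revision, local_path) tuples from "p4 have".
    """
    # Index the "p4 have" results by lower-cased local path. Windows paths are
    # case-insensitive; matching this way is an assumption of this sketch.
    have_map = {local.lower(): (depot, rev) for depot, rev, local in have_entries}
    indexed, unindexed = [], []
    for local in pdb_sources:
        hit = have_map.get(local.lower())
        if hit is None:
            unindexed.append(local)  # e.g. compiler or OS headers
            continue
        depot, rev = hit
        # Line format: local path * server variable * depot path * revision
        indexed.append(f"{local}*{server_var}*{depot.lstrip('/')}*{rev}")
    return indexed, unindexed


have_entries = [
    ("//depot/mandelbrot/StdAfx.h", 3, r"c:\homedepot\mandelbrot\StdAfx.h"),
]
pdb_sources = [
    r"c:\homedepot\mandelbrot\stdafx.h",
    r"c:\toolchain\include\vector",  # hypothetical compiler header, not in Perforce
]
indexed, unindexed = build_source_lines(pdb_sources, have_entries)
print(indexed)
print(unindexed)
```

Files that land in the second list are simply left alone, which is why compiler and operating system headers never get indexed.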

Typical output for source indexing with the -debug option looks like this:

-------------------------------------------------------------------------
ssindex.cmd [STATUS] : Server ini file: c:\builddepot\srcsrv\srcsrv.ini
ssindex.cmd [STATUS] : Source root    : c:\builddepot\Source
ssindex.cmd [STATUS] : Symbols root   : c:\builddepot\Source
ssindex.cmd [STATUS] : Control system : P4
ssindex.cmd [STATUS] : P4 program name: p4.exe
ssindex.cmd [STATUS] : P4 Label       : <N/A>
ssindex.cmd [STATUS] : Old path root  : <N/A>
ssindex.cmd [STATUS] : New path root  : <N/A>
ssindex.cmd [STATUS] : Partial match  : Not enabled
-------------------------------------------------------------------------
ssindex.cmd [STATUS] : Running… this will take some time…
indexing c:\builddepot\Source\plugins\AutoQuad.pdb
wrote C:\…\index453A.stream to c:\builddepot\plugins\AutoQuad.pdb …
indexing c:\builddepot\plugins\Mandelbrot.pdb
wrote C:\…\indexD346.stream to c:\builddepot\plugins\Mandelbrot.pdb …
indexing c:\builddepot\pluginsource\autoquad\Release\vc100.pdb
zero source files found …
indexing c:\builddepot\pluginsource\distortmand1\Release\vc100.pdb
zero source files found …

The cost to having too inclusive a source directory is that “p4 have …” will have to retrieve more file names, but this is very cheap. The cost to having too inclusive a symbols directory is that the “dir *.pdb /s” phase will take longer and the process will try to source index more PDBs. In both cases it is usually best to err on the side of more, and only specify a more restrictive directory set if (typically for symbols) performance requires it. In many cases source indexing completes in less than ten seconds and performance is not an issue.

Note that the vc100.pdb files, which are intermediate PDB files, don’t have source information so the “zero source files found …” message is normal and expected.

You can troubleshoot particular PDB files using the “pdbstr” command to view the raw source indexing stream. An example command would be:

pdbstr -r -s:srcsrv -p:outputfile.pdb

Typical output looks like this:

SRCSRV: ini ------------------------------------------------
VERSION=1
INDEXVERSION=2
VERCTRL=Perforce
DATETIME=Fri Oct 28 17:48:53 2011
SRCSRV: variables ------------------------------------------
P4_EXTRACT_CMD=p4.exe -p %fnvar%(%var2%) print -o %srcsrvtrg% -q "//%var3%#%var4%"
P4_EXTRACT_TARGET=%targ%\%var2%\%fnbksl%(%var3%)\%var4%\%fnfile%(%var1%)
MYSERVER=FHD:1666
SRCSRVTRG=%P4_extract_target%
SRCSRVCMD=%P4_extract_cmd%
SRCSRV: source files ---------------------------------------
c:\homedepot\mandelbrot\stdafx.h*MYSERVER*depot/mandelbrot/StdAfx.h*3
c:\homedepot\mandelbrot\mandelbrot.cpp*MYSERVER*depot/mandelbrot/Mandelbrot.cpp*1
c:\homedepot\mandelbrot\mandelbrot.h*MYSERVER*depot/mandelbrot/Mandelbrot.h*1
c:\homedepot\mandelbrot\resource.h*MYSERVER*depot/mandelbrot/resource.h*1
c:\homedepot\libs\cygnuslib\xpdebug.h*MYSERVER*depot/libs/cygnuslib/xpdebug.h*7

Some variables are set up to define what command line will be executed (P4_EXTRACT_CMD) in order to retrieve files from version control (the debugger actually executes p4.exe) and then the “source files” section is a mapping of file system paths to perforce paths and version numbers.
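To show how those pieces fit together, here is a simplified Python sketch that parses a dumped stream and works out the command and cache path for one entry. It is not the real srcsrv evaluator: it hard-codes the two Perforce templates shown above, and it assumes that %fnvar% looks up the variable named by its argument, %fnbksl% turns forward slashes into backslashes, %fnfile% keeps only the file name, and %targ% is the debugger's source cache directory. The function names and the C:\srcsrv_cache path are made up for the example.

```python
def parse_srcsrv_stream(text):
    """Split a dumped srcsrv stream into (variables, source-file entries)."""
    variables, entries, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("SRCSRV:"):
            # Section headers look like "SRCSRV: variables ------"
            section = line[len("SRCSRV:"):].strip(" -")
            continue
        if section in ("ini", "variables"):
            name, _, value = line.partition("=")
            variables[name.upper()] = value
        elif section == "source files":
            entries.append(line.split("*"))
    return variables, entries


def resolve_entry(entry, variables, cache_root):
    """Mimic P4_EXTRACT_TARGET and P4_EXTRACT_CMD for one source-file entry."""
    var1, var2, var3, var4 = entry
    server = variables[var2.upper()]                         # %fnvar%(%var2%)
    file_name = var1.replace("/", "\\").rsplit("\\", 1)[-1]  # %fnfile%(%var1%)
    depot_dirs = var3.replace("/", "\\")                     # %fnbksl%(%var3%)
    target = "\\".join([cache_root, var2, depot_dirs, var4, file_name])
    command = f'p4.exe -p {server} print -o {target} -q "//{var3}#{var4}"'
    return command, target


sample = r"""SRCSRV: ini ------------------------------------------------
VERSION=1
VERCTRL=Perforce
SRCSRV: variables ------------------------------------------
MYSERVER=FHD:1666
SRCSRV: source files ---------------------------------------
c:\homedepot\mandelbrot\stdafx.h*MYSERVER*depot/mandelbrot/StdAfx.h*3
"""
variables, entries = parse_srcsrv_stream(sample)
command, target = resolve_entry(entries[0], variables, r"C:\srcsrv_cache")
print(command)
print(target)
```

Running the printed command by hand is a quick way to check that the server name, depot path, and revision stored in a PDB are actually retrievable from your machine.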

You can also use srctool.exe to examine how source indexing is working. Some useful options are:

  • srctool -c: summarizes how many files were indexed and how many weren’t. Note that a lot of the files used to build your code are actually operating system and compiler source files which is why many are not indexed
  • srctool -u: lists all of the source files that were not indexed – look here to see if any of your files were missed
  • srctool -r: lists all of the source files listed in the PDB.

You can also troubleshoot source indexing by modifying the Perl script to print out additional information, such as how many source file names were retrieved from Perforce.

Note that if you generate source files as part of your build process and don’t check them in then they will not be indexed, because they won’t be part of the “p4 have…” command. Source files that are copied to a new location before they are compiled also won’t be indexed.

What about static libraries?

Source indexing embeds version control information in PDB files, and .lib files don’t really have PDB files – so what do you do? If the .lib files are built by your build machine and then linked into DLLs and EXEs then you don’t need to do anything. When you run source indexing on the DLLs and EXEs then the source files used to build the contained libraries will also be indexed. As long as those libraries were built from the source files that your build machine is currently synced to (which should be the case, unless you are doing odd procedures) then all will be well.

Summary

Source indexing is easy to set up, costs little to run, and can have persistent benefits for years to come. There is really no excuse not to use it. Some days I make no use of source indexing, and other days it saves me from wasting time trying to track down dozens of old source files in various obscure branches.

Recommended.

Mozilla has been using source indexing since 2008 and there is lots more information available, such as this article on Source Server and Symbol Server Support in TFS 2010.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.

15 Responses to Source Indexing is Underused Awesomeness

  1. Hi Bruce, this is a great article, really useful.

    Would you be interested in writing a guest blog post for Perforce? We regularly have articles from Perforce users (for example: http://blog.perforce.com/blog/?p=7618) and this topic would make a good posting or you could write something else if you have any suggestions.

    I look forward to hearing from you. My email address is mwarren (at) perforce.com.

    All the best,
    Mark
    EMEA Product Marketing Manager, Perforce

  2. Yeah, we had a college student reverse engineer how source indexing worked as a class project, and we shipped Firefox 3.0 with it. Then we had another student fix it to work with Mercurial when we switched from CVS to Hg. We use our own home-grown Python script to do the indexing, since we already had some post-processing going on for our symbols (we use Google Breakpad for cross-platform crash reporting):
    http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/tools/symbolstore.py

    It is pretty slick to be able to debug a release Firefox build and get symbols+source without much fuss (although debugging PGO-optimized code is not for the faint of heart). The source gets served via HTTP from Mercurial, so you don’t even have to deal with those unsightly security dialogs.

    • brucedawson says:

      Why did you need to reverse engineer how source indexing works? It’s pretty well documented in srcsrv.doc, and the Perl scripts which implement it are there for the examining.

      I haven’t tried using source indexing with any other version control system, but it looked like it should be fairly straightforward to do. I’m glad to hear that it is. Given my dislike of Perl I’m tempted to rewrite the indexing script for Perforce in Python just so that it is more maintainable, but I doubt I’ll bother.

      Source indexing definitely is slick. It always amazes me how little use it gets given how magical it is.

  3. gergap says:

    Hi Bruce, I’ve seen your talk “Getting Started Debugging on Linux” and was curious about this source indexing feature. I’m developing on Linux and Windows, but I didn’t know this feature by the way😉
    I was thinking about how this could be realized on Linux. It was easy for me to get indexing part working. I created a new .note.gnu.source-id section in the ELF binary using GNU as similar to the build-id, but this contains information about the Git URL and Git SHA1 sum identifying the source in the repo. (I’m using git, but other VCS could work the same way). I could automate this process completely using CMake so that works great.
    The other part is more tricky. We must enable gdb to load the correct sources when debugging an executable or analyzing a core dump. So we need to modify the gdb sources to extract the source-id from the ELF .note section. That shouldn’t be hard. Then my idea is to modify ‘find_and_open_source’ function in gdb to execute an external script with url and sha1 info as parameters. This script can fetch the correct sources from a “source server”. This source server can simply be a cgit web frontend, so the script only need to create the correct URL from filename and SHA1 and fetch the source using wget (I think that’s easier than to do a complete new clone of the repo). By using this external script the user can integrate all kinds of VCS, not just git and we can keep the GDB changes to a minimum. Of course there will be more places to change for supporting a source-id continuously (e.g. objcopy, eu-strip, etc.) but an incomplete first shot that works should be easy to achieve.
    But before starting to work on that I wanted to hear from you if somebody is already working on that topic. If not here you have already a basic concept. I’m willing to help and prototype the code, but I don’t have the time to do a complete implementation which would be accepted by upstream. So please let me know what the status on this topic is.

    regards,
    Gerhard

  4. jcztery says:

    Hi, That is awesome, thanks for the article. BTW: how did you solve the problem with indexing script increasing pdb age? The pdb no longer matches the dll, at least that is the behaviour of pdbstr in WDK 8.1.

    Thanks.

    • brucedawson says:

      I’ve never had a problem with pdbstr increasing the pdb age. That sounds very peculiar as it would presumably make source indexing not work, and it has always worked for me.

      • jcztery says:

        I am using Chkmatch(http://www.debuginfo.com/tools/chkmatch.html) tool to verify this, and the age is increased by pdbstr, which makes sense because the file is modified. Contrary to what some sources say(http://www.debuginfo.com/articles/debuginfomatch.html), the age mismatches seem to be ignored by both windbg and Visual Studio.

        I am ambivalent about whether it is a good thing or not:
        1. It is a good thing because it allows for single dll and :
        a) private pdbs with source information there.
        b) public pdbs with source information removed.
        2. It is a bad thing because incremental builds will have the same signature but different ages, so it is quite easy to end up with pdbs not matching dlls in terms of symbols…

        Also this thread discusses age discrepancy in pdbs served by Microsoft(http://www.masmforum.com/board/index.php?PHPSESSID=786dd40408172108b65a5a36b09c88c0&topic=18383.0).

        Or am I going crazy?

        • brucedawson says:

          My tests show that you are going crazy.

          I just manually added a source-indexing stream and checked the before/after age with https://github.com/randomascii/main/tree/master/tools/pdbinfo. It showed no change. I suspect that chkmatch is looking at a different location.

          1) If the debuggers allowed PDBs with different ages to match then the debugging infrastructure would fail, because symbols from build N would match binaries from build N+1, and that would lead to chaos. This does not happen, and since the age is the only thing that changes on an incremental build, the age must be used. See also the ultimate reference on this subject (https://randomascii.wordpress.com/2013/03/09/symbols-the-microsoft-way/) which points out that the age is actually used in symbol server directory structures, and symbol server caches.
          2) Since (see point 1) the age is critical it would *not* make sense to alter it when adding source-indexing information, since that would break debugging. Source indexing information is an *addition* to the debug information and must not invalidate the matching.

          • brucedawson says:

            It’s pretty clear from that masmforum post that they are grabbing the wrong age, or grabbing something that is not an age. They download a file for an executable whose PDB age is 2, the output from the download says that the age is 2. Their stream dumping says that the age is 5. Therefore their stream dumping is buggy. That is all.

  5. jcztery says:

    You are right. Chkmatch does not read age properly. It reads something else, though. I am not sure what. Something that increases every time pdb file is modified by pdbstr, even if the same stream is written. Anyway, you restored my confidence in pdbs😉 Thanks!

  6. Ofek Shilon says:

    Hi Bruce. TFS-online now includes a default step for source indexing, and I still can’t get it to work… The logs show many lines of the form –
    2016-08-02T20:27:57.6664401Z ##[section]Starting: Publish symbols path: \\LAB\Builds\Symbols\$(Build.DefinitionName)\$(Build.BuildNumber)
    2016-08-02T20:27:59.4916050Z Found 19 files.
    2016-08-02T20:28:00.0063951Z Unable to index one or more source files for symbols file ‘D:\Agents\agent_01\_work\1\s\Application\Bin\Release\CathWorksApp.pdb’.
    2016-08-02T20:28:01.6911627Z ##[command]”D:\Agents\agent_01\externals\pdbstr\pdbstr.exe” -w -p:”D:\Agents\agent_01\_work\1\s\Application\Bin\Release\CathWorksApp.pdb” -i:”C:\Users\ofek.s\AppData\Local\Temp\tmp970E.tmp” -s:srcsrv
    2016-08-02T20:28:01.7223621Z Unable to index one or more source files for symbols file ‘D:\Agents\agent_01\_work\1\s\Application\Bin\Release\CathworksRenderer.pdb’.

    Now the documentation (https://www.visualstudio.com/en-us/docs/build/steps/build/index-sources-publish-symbols) does say explicitly:
    A common cause of sources to not be indexed are when your solution depends on binaries that it doesn’t build.
    But that doesn’t sound like a reasonable limitation – practically all solutions out there depend on external binaries. Do you have an idea what’s going on? Any advice on how I can gain visibility into it?

    Thanks.
    http://stand.org/sites/default/files/styles/blog_post/public/Arizona/YourMyOnlyHope-vblog.png?itok=NJnR4Lue

    • brucedawson says:

      Are you sure it’s not working? It is normal for one or more files to not be indexed. The real question is how many files *are* indexed, not how many aren’t.

      Ignore the doom-and-gloom about prebuilt binaries. I’m pretty sure that just means that any pre-built .lib/.dll/.exe files will not be indexed.

      The best way to gain visibility into what is going on is to modify the indexing script to improve its diagnostics or customize it, and use pdbstr (as mentioned in my post) to dump the indexing information that goes into your PDB.

      • Ofek Shilon says:

        Indeed, I haven’t tested yet – was just alarmed by the messages. TFS online doesn’t expose the indexing script, so modifying it means replacing it altogether, which seems a bit much. Will test soon. Thanks a lot!

  7. I finally got around to indexing our source when we build,
    but I ran into a problem getting the correct source with windbg. I’ve seen it work with Source Safe,
    but we are using Perforce. So when I open a dump made with sysinternals procdump, I do not get the right source code, in either windbg or VS 2015. I must be missing something.

    I have the srcsrv.ini modified to have:
    [variables]

    MYSERVER=10.10.1.20:1666

    [trusted commands]

    p4.exe="C:\Program Files (x86)\Perforce\P4.EXE"

    And I have defined the P4USER and P4CLIENT as environment variables..

    any ideas?
    basically what I am asking, is how do I set windbg up, to get source code from Perforce?
    Thanks for your time, a good blog.
    Kenneth

    • brucedawson says:

      Your mention of using a dump made with sysinternals is only interesting because it is so irrelevant. The source indexing information is in the PDB. Make sure that the correct PDB is loaded and the source of the debugging session is not relevant.

      There are many things that can go wrong, but windbg is pretty well set up for diagnosing the problems. The trick is to type “!sym noisy” and then start clicking around the call stack. Any failed attempts to load symbols or grab source code will print verbose messages. You can also try running the commands manually to see if they work or diagnose failures. You can also run a command like this to extract the source indexing information from a PDB, to make sure that it is there and correct.

      pdbstr -r -p:etwsymbols\UIforETWStatic_devrel.pdb -s:srcsrv

      I used all of these techniques to diagnose bugs in my source indexing script when I started using it for UIforETW:
      https://github.com/google/UIforETW/commit/ea1129d25ba58efd03ef649829348ca553f82383
