A Tale of Two Call Stacks

My kingdom for some symbols

I spend a large portion of my time at work trying to make things faster and less crashy. Usually the problems I investigate are in our own code, so I have full information – source code and symbols. However, sometimes the problems are at least partially in some other company’s code, and the task gets trickier.

This article was originally posted on #AltDevBlogADay.

Mandatory disclaimer: this post represents my opinion, not that of my employer.

A call stack we can believe in

For example, last week I investigated a Visual Studio hang. This intermittent hang had been bothering me for months and I finally decided to record an xperf trace of the hang and investigate. The details will be the subject of another post, but one vital clue was this call stack from the xperf CPU scheduling summary table:

image

The call stack is entirely in Microsoft code. It starts in Visual Studio and ends up in Windows, and it shows that Visual Studio hung for 2.585 seconds while trying to CreateFileW so that it could GotoEntry in the CResultList. Even though I know nothing about the Visual Studio architecture, that was enough information to let me understand the problem, and I then changed our project files to completely avoid this hang in the future. Shazam!

I was able to diagnose this problem because Microsoft publishes symbols for most of its code on a public symbol server. Symbols are published for Windows, Visual Studio, and much more, and this often lets me fix performance problems and crashes even when they are entirely separate from our code. Yay Microsoft!
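For anyone who has not used the symbol server, here is a minimal sketch of how a script might point Microsoft's debugging and profiling tools at it. The `_NT_SYMBOL_PATH` environment variable, the `srv*<cache>*<server>` syntax, and the server URL are the real ones; the helper names and the `C:\symbols` cache directory are assumptions made up for this illustration.

```python
import os

# Microsoft's public symbol server.
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Return a symbol path of the form srv*<local cache>*<server URL>.

    cache_dir is a hypothetical local directory where downloaded
    symbol files are cached so they are only fetched once.
    """
    return "srv*{}*{}".format(cache_dir, server)


def set_symbol_path(cache_dir=r"C:\symbols"):
    """Set _NT_SYMBOL_PATH for this process and any children it launches."""
    path = build_symbol_path(cache_dir)
    os.environ["_NT_SYMBOL_PATH"] = path
    return path


if __name__ == "__main__":
    print(set_symbol_path())
```

Tools launched from a process with this variable set, such as WinDbg or xperf, should then download matching symbols on demand. Setting the variable system-wide is the more common approach; the script form just makes the format explicit.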

A call stack that knows how to keep its secrets

Another example, not quite so happy, is demonstrated by this call stack. This is sampling profiler data from a thread that is in our game:

image

Huh. This thread sure is using a lot of CPU time. In our process. I wonder what it’s doing? Except for the two out of 1,036 samples that hit in Windows functions, I can only tell that it is NVIDIA code that is executing – there are no indications as to what it is doing.

I don’t mean to pick on NVIDIA here. Well, to be more accurate, I don’t mean to just pick on NVIDIA. This is a problem with all three major graphics vendors – NVIDIA, AMD, and Intel. None of them share symbols with the public and this leaves game developers with a significant problem. When a crash occurs deep in graphics driver code (a not uncommon occurrence) we are helpless. When a frame glitch occurs deep in graphics driver code (also quite common) we are helpless. And when game startup includes excessive memory allocations or CPU time deep in graphics driver code… we are helpless.

You can’t handle the truth

I’ve been told by some graphics vendors that having symbols would not be valuable to game developers, and might even be confusing. Game developers couldn’t possibly understand their cryptic function names, and might misinterpret them.

Poppycock.

I’ve solved dozens of performance problems in other people’s code, with just symbols to guide me. Having symbols has never been confusing, and has almost always saved me time.

If I had symbols for the graphics drivers then I could solve some problems on my own. I could recognize patterns in the crashes and performance problems that I see. I could give more precise suggestions and bug reports to the graphics vendors. I could more easily figure out what is happening in my code that is causing problems in their code.

As it is, I can do almost nothing. Significant CPU time and memory are being consumed in my game’s process and I don’t have symbols to help understand why.

Call to action

If you’re a game developer, ask the graphics vendors that you work with for symbols. They’ll say no, but it’s still important to ask, in order to remind them of the importance of this issue. After they say no, be sure to send them all of the crash dumps and xperf traces where they are a factor and insist that they help you, since they won’t let us help ourselves. And don’t forget to share your stories and needs for symbols in the comments.

If you’re a graphics vendor – please release symbols. I’m confident that if you do it will let us make better games on your hardware, while saving time for your support team, and for game developers. I know that deciding to share symbols is hard, because symbols reveal a lot. I get that. But not sharing symbols with anyone is counterproductive.

Final disclaimer

I own stock in Microsoft, Intel, and AMD, but not NVIDIA. I hope it has not affected my impartiality. You decide.

About brucedawson

I'm a programmer, working for Valve (http://www.valvesoftware.com/), focusing on optimization and reliability. Nothing's more fun than making code run 5x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in AltDevBlogADay, Programming, Symbols.

15 Responses to A Tale of Two Call Stacks

  1. frankie says:

    Is Open Symbols the new Open Source?

    Or, should we demand all software we interact with is Open Source so that we never run into these debugging issues?

  2. Christopher F. Chiesa says:

    Not to be old and crotchety, but sometimes I miss the days when software was primitive and you could just write in assembler and debug anybody’s machine code ’cause that was what you worked with all the time anyway. ;-)

    • brucedawson says:

      I don’t miss those assembly language days. I’m a pretty low-level programmer but the only assembly language I’m using these days is generated by some C# code that I wrote, and I only did that because I knew that for that one rare case it would make a big difference.

      • Christopher F. Chiesa says:

        It’s possible I have a more positive view of assembly programming because I did it on a very nice architecture in which all registers were 32 bits and totally symmetrical/generic. I could use any register as either a pointer or a datum of any size (power of two) from 8 through 64 bits (the latter requiring two consecutive registers) of either integer or floating-point type. Memory appeared as a flat 4 GB virtual address space. Low-level system services (what the PC world calls “BIOS functions”) were called using the exact same instructions as library functions or your own. The function-call and return instructions themselves constructed and destroyed the call/argument stack, and automatically saved and restored whatever registers you specified around the call. The argument to one of the call instructions was the number of arguments, so you could easily retrieve that handy value from within a called routine. There were at least ten addressing modes, including some very handy multiply-indexed ones. I don’t even remember what else, but it was a beautiful, beautiful processor: the DEC VAX.

        As such, my first sight of Intel assembly was a terrible shock, what with all the hoops one had to jump through: segmented memory (I never have understood how to work with that), tiny address space (and a parade of several generations of mutually-incompatible techniques for accessing ever-larger spaces), explicit construction-and-destruction of call stacks “by hand,” and the invocation of BIOS functions via “INT .” (I never saw documentation of which INT operand values invoked which BIOS functions, either, until literally LAST WEEK!)

        I think if I’d begun my career in THAT environment I, too, would have a much dimmer view of assembler than I do!

        • brucedawson says:

          I first learned assembly on 68000 on the Amiga (okay, I’m willfully ignoring 6502 on the Apple ][), which is reasonably elegant. But assembly language is still too cryptic. I’d rather code in C++0x or C# or Python for the vast majority of my work. It’s just way more productive.

          Although, I must say that x64 is at least an improvement over x86. Go AMD. 16 registers on an out-of-order processor actually feels like ‘enough’.

  3. Mike says:

    “yay microsoft” ? You praise them for the privilege of being able to fix one of their bugs in a product they charged you $600 for ?

    • brucedawson says:

      Releasing symbols is hard, and I appreciate that Microsoft does the right thing and releases symbols for most things.

      Sometimes the issues that I find using Microsoft’s symbols are definitely Microsoft bugs. However, other times they are arguably or definitely problems with our usage of Microsoft’s products or APIs. Either way, I’m glad to have information that helps me solve problems.

      • Wade Mealing says:

        Why is releasing symbols hard? Software vendors in the Linux world have made stripped executables and separate debuginfo packages for some time. Even without the source code, debuginfo packages are still useful.

        Are you saying “technically” it’s not hard, or to convince marketing is not hard?

      • Wade Mealing says:

        Let me rephrase that last sentence. Or to convince marketing/management “is” hard.

        • brucedawson says:

          I knew what you meant. Yes, from a technical point of view releasing symbols is easy. Symbol servers make it trivial.

          However releasing the symbols to a commercial product is, in many ways, a brave act, since it exposes a lot (of good and bad). I routinely use Microsoft’s symbols to criticize them, and that sort of risk makes it easy for companies to convince themselves that it’s not worth it.

          But it is worth it, especially for the graphics vendors, because they would get better games if they released symbols.

  4. ethoxyethaan says:

    Surely they can’t just release the debugging symbols; people might reverse engineer it and start competing with them!

  5. Riley L says:

    What are your thoughts about releasing stripped symbols for Source, for Server plug-in creators?
