Symbols on Linux update: Fedora Fixes

I wrote last month about the challenges of obtaining Linux symbols based on a build ID. I’m now pleased to say that getting Linux symbols based on a build ID is trivial on both Ubuntu and Fedora – things are looking up.

Here’s a recap of the posts on this topic to date:

Ubuntu

In my last post I described the solution that I created for Ubuntu – an enhanced Packages file that lists all the build IDs found in each package on Ubuntu’s site. This makes finding the URL for the package that you need trivial (grep). You can then download and unpack the symbols (no installation required) from any version of Linux, as described in the “Installation is optional” section of my second post. I’ve updated my enhanced packages file and I’ll try to continue doing that every week or two. See the last post for details.

Here’s an example, for the build ID 04d4f3f1d378a3ba45195f8102e2145a0115714b. Beware that these steps lack any error checking and can easily go wrong, so make sure you understand what each step does before running it.

$ export buildid=04d4f3f1d378a3ba45195f8102e2145a0115714b
$ grep $buildid PackagesEnhanced

BuildID: 04d4f3f1d378a3ba45195f8102e2145a0115714b /usr/lib/debug/usr/bin/zenity http://ddebs.ubuntu.com/pool/main/z/zenity/zenity-dbgsym_3.4.0-0ubuntu4_i386.ddeb

The download URL is the fourth field in the enhanced packages file so we can get all neck beardy and do a one-liner for the download:

$ wget $(grep $buildid PackagesEnhanced | cut -d " " -f 4)

Saving to: `zenity-dbgsym_3.4.0-0ubuntu4_i386.ddeb’
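Since the one-liner has no error checking, here is a minimal sketch of a slightly more defensive lookup. The function name lookup_ddeb_url is made up for illustration, and the sketch assumes every relevant line of PackagesEnhanced has the four-field layout shown above (tag, build ID, debug file path, URL).

```shell
# Hypothetical helper (not part of any Ubuntu tool): print the package
# URL(s) recorded for a build ID in an enhanced Packages file.
# Usage: lookup_ddeb_url BUILDID [PACKAGES_FILE]
lookup_ddeb_url() {
    _id=$1
    _file=${2:-PackagesEnhanced}

    if [ ! -r "$_file" ]; then
        echo "cannot read $_file" >&2
        return 2
    fi

    # Match the build ID as a whole field rather than as a substring.
    # Appending "" forces a string comparison, so an all-digit ID is
    # never compared numerically.
    _urls=$(awk -v id="$_id" \
        '$1 == "BuildID:" && ($2 "") == (id "") { print $4 }' "$_file" | sort -u)

    if [ -z "$_urls" ]; then
        echo "build ID $_id not found in $_file" >&2
        return 1
    fi

    # One distinct URL per line.
    printf '%s\n' "$_urls"
}
```

With that defined, something like `url=$(lookup_ddeb_url "$buildid") && wget "$url"` only attempts the download when the lookup actually found something.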

Now it’s the familiar unpack dance. Adjust to your tastes.

$ ar -x zenity-dbgsym_3.4.0-0ubuntu4_i386.ddeb
$ rm -rf contents && mkdir contents && cd contents
$ tar -xf ../data.tar.*z

That’s it. We now have the debug information we were looking for:

$ readelf -n usr/lib/debug/usr/bin/zenity

…
Build ID: 04d4f3f1d378a3ba45195f8102e2145a0115714b
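Eyeballing the readelf output works, but the comparison is easy to script. This is a rough sketch, assuming readelf -n prints the ID on a line of the form "Build ID: <hex>" as in the output above; the names extract_build_id and check_build_id are invented for this example.

```shell
# Hypothetical helper: read `readelf -n` output on stdin and print the
# first build ID found (the hex digits only).
extract_build_id() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*\([0-9a-fA-F]\{1,\}\).*$/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if FILE carries the EXPECTED build ID.
# Usage: check_build_id FILE EXPECTED
check_build_id() {
    _actual=$(readelf -n "$1" | extract_build_id)
    [ "$_actual" = "$2" ]
}
```

For example, `check_build_id usr/lib/debug/usr/bin/zenity "$buildid" && echo match` would confirm that the unpacked file is the one you were after.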

Fedora

I reported last time that Fedora had a web server that would convert build IDs into package details, but that it was too far out of date to be useful. Well, it’s up to date now and they plan to keep it that way!

You can find everything you need to know about how to use it at https://darkserver.fedoraproject.org, but I’ll run through a sample anyway. Let’s start by setting buildid to the problematic build ID of the libc-2.16.so file I tried finding last time.

$ export buildid=cf7bdd994de74c7d4a0cff6a0293d96b64681e06

This command will grab a blob of data that describes the package containing that build ID:

$ wget https://darkserver.fedoraproject.org/buildids/$buildid

Reading through the blob shows the package name as being glibc-debuginfo-2.16-28.fc18.i686 and another wget retrieves information about that package:

$ wget https://darkserver.fedoraproject.org/package/glibc-debuginfo-2.16-28.fc18.i686
$ cat glibc-debuginfo-2.16-28.fc18.i686

{"url": "http://koji.fedoraproject.org/packages/glibc/2.16/28.fc18/i686/glibc-debuginfo-2.16-28.fc18.i686.rpm"}
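Pulling the URL out of that response by hand invites copy-and-paste mistakes, so here is a small sketch that does it with sed. It assumes the response is a single JSON object with a "url" key, as shown above; a real JSON parser such as jq would be more robust if you have one installed. The function names are hypothetical.

```shell
# Hypothetical helper: read the package JSON on stdin and print the
# value of its "url" key. Assumes the URL contains no escaped quotes.
extract_package_url() {
    sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: print the file name part of a URL, so the name
# used for unpacking always matches what was downloaded.
url_filename() {
    printf '%s\n' "${1##*/}"
}
```

A possible use, assuming the JSON was saved to the file named in the cat command above: `url=$(extract_package_url < glibc-debuginfo-2.16-28.fc18.i686) && wget "$url"`, then pass `$(url_filename "$url")` to rpm2cpio.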

The package URL is easy to find in this data so we retrieve the package and, as described previously, unpack it:

$ wget http://koji.fedoraproject.org/packages/glibc/2.16/28.fc18/i686/glibc-debuginfo-2.16-28.fc18.i686.rpm
$ rm -rf contents && mkdir contents && cd contents
$ rpm2cpio ../glibc-debuginfo-2.16-28.fc18.i686.rpm | cpio -idmv
$ readelf -n usr/lib/debug/lib/libc.so.6.debug

Build ID: cf7bdd994de74c7d4a0cff6a0293d96b64681e06

Obviously a bit of scripting and parsing could automate this quite nicely, but I leave that as an exercise for the reader. Or take a look at darkclient, for all your darkserver automation needs.

I haven’t figured out how to use darkserver with a breakpad build ID (32 characters instead of 40) but I’ve discussed it with the darkserver developers and they sound interested in handling that case as well.
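Because the two kinds of ID differ in length, a script can at least tell them apart before deciding where to look. This is only a sketch based on the lengths mentioned above (40 hex characters for a build ID as used in this post, 32 for a breakpad ID); the function name and the labels it prints are my own invention.

```shell
# Hypothetical helper: classify an ID string by its length.
# Prints "gnu" for 40 hex characters, "breakpad" for 32,
# and fails for anything else.
classify_build_id() {
    case $1 in
        ''|*[!0-9a-fA-F]*)
            echo invalid
            return 1
            ;;
    esac
    case ${#1} in
        40) echo gnu ;;
        32) echo breakpad ;;
        *)  echo unknown
            return 1
            ;;
    esac
}
```

A lookup script could call this first and skip the darkserver query when the answer is anything other than gnu.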

The darkserver developers are also planning to support Ubuntu at some point. So some day darkserver could provide one-stop shopping for finding symbols from many Linux distributions, thus fulfilling the original dream of build IDs!

Questions? Problems? Suggestions? Join the mailing list here:

https://lists.fedorahosted.org/mailman/listinfo/darkserver

Share your experiences here, or on the darkserver mailing list.

In closing

Thanks to Kushal for making darkserver work.

Two distributions down. There can’t be very many more, can there…

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048


12 Responses to Symbols on Linux update: Fedora Fixes

Ted Mielczarek (@TedMielczarek) says:

March 7, 2013 at 11:24 am

Interesting! We sort of punted on Linux symbols for the most part because I didn’t have a great idea of how to solve it (although I did add support to Breakpad for using Build IDs). We are sort of cheating though now, in that Ubuntu is uploading symbols for some system libraries to our symbol server when they upload symbols for their Firefox builds.

Now if only there was a way to accomplish this for Android libraries…

Dave Pacheco says:

March 27, 2013 at 10:12 am

Thanks for the detailed posts! On illumos- (and Solaris-)based systems, we’ve taken a rather different approach to debug information, using CTF (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/ctf.h#38). Binaries built with CTF have negligible space overhead (no runtime overhead), and our SOP is to build everything with CTF where possible. CTF has its limitations: it’s a little cumbersome to build outside of the OS, it doesn’t support C++ well (yet), and it’s not as rich as DWARF (no local variables or source mapping), but in cases where these features aren’t as necessary, it’s quite convenient because it completely eliminates the problem of matching debug symbols with built binaries, in production, testing, and development. It would be interesting to see it extended to support some of these other use cases (especially C++).

- brucedawson says:
  
  March 27, 2013 at 1:43 pm
  
  Good point. For many purposes (profiling in particular) function names are all that is needed and those can be far more compact than full debug information. Can’t symbol stripping of ELF files be done in such a way that all function names are retained while types/source-mappings/locals are discarded?
  
  - Dave Pacheco says:
    
    March 27, 2013 at 3:22 pm
    
    Yes, I neglected to mention an important point that we take for granted: we don’t run “strip” on anything, ever. The space savings aren’t significant, and the cost in debuggability is huge. CTF not only preserves these symbols, but also information about the types of arguments for each function and the layout for all C types so you can also print out any object in the debugger. More importantly, you can write high-level debugger commands in terms of structures’ field names and have the offsets effectively resolved at debug-time from the information in the binary.
    
    - brucedawson says:
      
      March 27, 2013 at 3:46 pm
      
      If you don’t have local variable information or source mappings aren’t you paying a large price in terms of debuggability? Having partial debug information available everywhere is great, but not having the option for full debug information seems like a step backwards.
      
      - Dave Pacheco says:
        
        March 27, 2013 at 4:29 pm
        
        That’s a matter of opinion. 🙂 We have a strong production emphasis in our tooling: while it’s fine to have tools that only work in development, the reality for us is that the development and testing environments are much less constrained, so there are many options for debugging there. Our most challenging issues are first (and often only) seen in production, where facilities like local variable inspection aren’t available anyway. Personally, I rarely miss either source mappings or local variable inspection: it’s just not that hard to model the execution of a C function from its arguments, and most of my time spent debugging is not figuring out how a function did the wrong thing once I know *that* it did and what its arguments were; it’s getting to that point in the first place.
        
        I’m not saying these features aren’t a nice convenience, but that I’d rather build up tools and techniques that work in the more constrained production setting. Such tools include high-level debugger commands that summarize complex system state (which can take a long time to piece together by hand, even with source mapping and local variables). Such techniques include writing smaller, more well-defined functions that *are* easy to model in your head.
        
        
        brucedawson says:
        
        March 27, 2013 at 4:53 pm
        
        Interesting perspective. We make heavy use of crash dumps (minidumps on Windows, breakpad on Linux) and on Windows in particular I find it quite delightful to be able to load up a customer crash and instantly be taken to the correct source file (courtesy of source indexing, https://randomascii.wordpress.com/2011/11/11/source-indexing-is-underused-awesomeness/) and be able to see the local variables and types. We also make some use of custom debugger commands — which also make use of the symbol and type information.
        
        Our production bug analysis is still constrained because we generally don’t record the full contents of memory — just stacks and module details — but I would hate to add any additional restrictions. That said, it would be very handy if debug information from other people’s modules was more accessible, either by being embedded, or easily findable.
        
        Clearly there are many different ways to solve these problems. However I do think that embedding partial information *instead* of having full information available separately is a false choice.