When Debug Symbols Get Large

TL;DR – upgrade your tools, including Visual Studio, windbg, and Windows Performance Toolkit, if you want to handle Chromium’s symbol files.

Details:

Death, taxes, and browser engines relentlessly growing – those are the three things that you can really be certain of. And so it was in early 2020 when I realized that Chromium’s inexorable growth meant that we were eventually going to produce PDB (Windows debug symbol) files that exceeded the PDB format’s 4 GiB limit.

I filed a Visual Studio bug in February 2020 requesting that the limit be raised, and three years and three days later we flipped the switch so that Chromium can produce larger PDBs. At that point the PDB for Chrome was at 95% of 4 GiB, and several test binaries had already crossed over the threshold, so it was just in time.

My understanding of the PDB format is that it is page based. The format allows two to the 20th (2^20, or 1,048,576) pages, and the default page size is 4 KiB. These two numbers multiplied together give the maximum size of 4 GiB. The maximum number of pages can’t be increased, but the page size can be. In fact, the PDB format always recognized the need for different page sizes, but essentially no tools had ever supported a page size other than 4 KiB.
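The arithmetic above is simple enough to sanity-check in a few lines. This sketch just multiplies the fixed page count by a few candidate page sizes (the numbers come straight from the paragraph above, not from reading any actual PDB):

```python
# A PDB (MSF) file's capacity is (number of pages) * (page size).
# The page count is capped at 2**20 by the format; the page size is not.
MAX_PAGES = 1 << 20  # 1,048,576 pages

for page_size_kib in (4, 8, 16):
    page_size = page_size_kib * 1024
    limit_gib = MAX_PAGES * page_size // 2**30
    print(f"{page_size_kib:>2} KiB pages -> {limit_gib:>2} GiB max PDB")
```

With 4 KiB pages this gives the familiar 4 GiB ceiling; doubling the page size doubles the ceiling.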

“All” that had to be done to support larger PDBs was to update a few tools to support larger pages. The tools that needed updating included:

  • Visual Studio debugger
  • link.exe – Microsoft’s linker
  • lld-link – the linker used by Chromium
  • windbg and other debuggers (kd.exe, ntsd.exe, etc.)
  • dbghelp.dll (used for loading symbols)
  • pdbstr.exe (used for source indexing)
  • symstore.exe (used for uploading to symbol servers)
  • msdia140.dll (COM API for loading symbols)
  • Windows Performance Analyzer (WPA, the ETW trace viewer)
  • Probably other tools

Easy!

According to comments on the bug, an internal fix was released at Microsoft in August 2021. By December this fix had shipped in updates to Visual Studio 2019 and in the just-released Visual Studio 2022, and lld-link also supported it. Once the linkers supported larger page sizes we added a use_large_pdbs build setting which switches the PDB page size to 8 KiB. However, this setting initially had to be off by default due to a lack of complete tool support.
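For Chromium developers this ends up as a one-line GN argument. The setting name is the one mentioned above; treat the exact effect (8 KiB pages) as a detail that could change over time:

```
# args.gn – opt in to large PDBs (larger page size, raising the 4 GiB limit)
use_large_pdbs = true
```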

Ideally the other tool updates would have shipped in the fall of 2021 but… they didn’t. I reached out to the Windbg and WPA teams and – though I could be wrong – I got the impression that they didn’t realize that they needed to ship updates until I told them.

There are various release channels for windbg (and the many tools associated with it) and WPA, so it’s hard to say when fixes to these tools first shipped. Large PDB support appeared in the Microsoft Store versions and on NuGet sometime in 2022, but it wasn’t until fall 2022, when the Windows 11 22H2 SDK shipped, that it was practical to update all of the versions of all of these tools.

The other thing that held us back was Windows 7. Newer versions of dbghelp.dll had problems running on Windows 7. So, as long as we were running tests on Windows 7 we had to generate PDBs that worked with the old versions of dbghelp.dll. Version 109 of Chrome is the last version that supports Windows 7 so as soon as we had branched for that version I could start working on large PDB support on trunk.

While working on the fix I hit a number of mysterious failures – you can see this by noticing that my first public attempt at a change to switch to large PDBs was on December 28th, more than a month before the change landed. The first problem was that one test was stubbornly failing to load the updated PDB files. I gradually narrowed that down to the test failing to load the new version of dbghelp.dll. I then wasted a lot of time figuring out which DLL import was causing the problems before finally realizing that… the new dbghelp.dll was not being deployed to the test machines. It’s difficult to load a DLL that isn’t there. Hilariously, we had not been deploying dbghelp.dll to our test machines for years, but the system version had been good enough that this bug was never noticed. I deployed some pretty cool “loader snaps” diagnostics (based on this tool) but in the end that was overkill for a “DLL not present” problem.

The next mysterious failure only happened on our official builders, which greatly complicated testing. This failure was because the wrong version of msdia140.dll was being loaded. In an almost perfect parallel to the first issue we had never properly been deploying msdia140.dll. I still don’t know where it had been loaded from before, but copying the right version to our output directory resolved that.

Finally, 37 days and 38 patch sets after uploading the first version of my change I landed it. The new PDB size limit is 8 GiB but we should be able to increase the page size to double this again anytime we want to.
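If you are curious which page size a particular PDB uses, the MSF 7.0 superblock makes this easy to inspect: the file starts with a fixed 32-byte signature, immediately followed by a little-endian uint32 block (page) size. Here is a minimal sketch based on the publicly documented MSF layout – not an official API, so don’t build anything load-bearing on it:

```python
import struct

# 32-byte MSF 7.0 signature that starts every modern PDB file.
MSF_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"

def pdb_page_size(header: bytes) -> int:
    """Return the page (block) size recorded in an MSF 7.0 superblock."""
    if not header.startswith(MSF_MAGIC):
        raise ValueError("not an MSF 7.0 PDB")
    # The uint32 immediately following the signature is the block size.
    (block_size,) = struct.unpack_from("<I", header, len(MSF_MAGIC))
    return block_size

# Example with a synthetic superblock (for a real file, pass the first
# bytes of the .pdb on disk): 8 KiB pages allow PDBs up to 8 GiB.
fake_superblock = MSF_MAGIC + struct.pack("<I", 8192)
print(pdb_page_size(fake_superblock))  # -> 8192
```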

Changes like this can easily be disruptive – it’s hard to anticipate all of the hidden dependencies or places this might fail – so I was pleased when the change landed and I heard almost nothing back. A couple of people complained that various tools couldn’t load Chrome Canary symbols but in all cases an update of the necessary tools to the latest Windows 11 SDK solved the problem. Note that the Windows SDK doesn’t install the Debuggers and Windows Performance Toolkit (which contains WPA) by default so if you depend on those tools you need to be sure to select them when installing or else you will still have the old versions.

Most people don’t work on Chromium, but if you run Chrome you may at some point want to be able to profile or debug the official builds you are running. You can configure your tools to point at Chrome’s symbol server and then symbols and source will be downloaded on demand – but only if you are running large-PDB compatible tools. I know that some game studios are also hitting the 4 GiB limit, so those developers also need the latest tools.
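As a sketch of what that configuration can look like: the cache directory below is arbitrary, and while the Chromium symbol-server URL shown is the one documented by the Chromium project at the time of writing, you should verify both URLs before relying on them:

```
rem Symbol path: local cache at C:\symbols, then Microsoft's public
rem symbol server, then Chromium's symbol server.
set _NT_SYMBOL_PATH=srv*C:\symbols*https://msdl.microsoft.com/download/symbols;srv*C:\symbols*https://chromium-browser-symsrv.commondatastorage.googleapis.com
```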

I’ve updated UIforETW so that it will install the latest version of WPA when you run it. You can find the latest release (currently 1.58) here.

Twitter announcement is here.

Hacker news discussion is here.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048
This entry was posted in Debugging, Programming, Symbols, uiforetw, xperf.

12 Responses to When Debug Symbols Get Large

  1. Chris Davis says:

    So glad this landed. A few weeks ago I found out the HUD perf tool didn’t support > 4GB PDBs. Fix is now shipping.

  2. Sur says:

    I recently used the WPR tool to grab a log, and when opening the ETL log it reported
    >An exception occurred while processing the events: 0x80070032. Aborting processing
    What causes this? Has anyone else experienced this?

  3. Todd says:

    Just develop on Linux and target Wine.

    Worked for the WindowsDefender crew.

  4. akraus1 says:

    @Sur: Yep see https://aloiskraus.wordpress.com/2023/03/09/wpa-fails-with-0x80070032/ how to work around it until a fixed WPA is released.

  5. Andrei says:

    Isn’t it possible to split the Chrome executable in more modules/dlls to make pdb smaller?

    • brucedawson says:

      Yes, that is possible, and in fact Chrome supports what we call “component” builds. This gives us DLLs like base.dll, v8.dll, etc.
      The problem is that this comes with a performance cost. The import/export tables end up being quite large, cross-module inlining is significantly inhibited, and cross-module function calls become more expensive. Therefore we only use these component builds during development.
      As a general rule we try to make the versions that we ship to users run as fast as possible, even if that causes extra work for Chromium developers. In this case we were able to break through the 4 GiB barrier so that we get a fast-as-possible Chrome that we can also debug.

      • Andrei says:

        That is interesting. Knowing that you’ve probably profiled this alternative, would you care to write a post on actual performance penalty, with numbers?

        • brucedawson says:

          Probably not. The cost is highly variable, based on things like how many cross-module calls are happening in a particular inner loop, making it very workload dependent. I suppose if I get energetic I could compare the total size of the DLLs to see how much bigger that is in a component build.
          The reason I probably won’t bother is because we have never had any intent to ship Chrome as a component build. Whether the slowdown is 2% or 20% we’re not going to do it because there is no worthwhile benefit.

          • I dare to suggest that one of the reasons why splitting into separate modules runs slower is that a lot of compile-time optimization can’t be done across them. The caller cannot be sure that this is the only use scenario present, and the DLL must assume that it will be called in all possible conditions. This is at least what I could observe in the microcontroller world.

            • brucedawson says:

              Oh absolutely, however we only use component builds during development when we aren’t doing cross-module optimization anyway. The different options we have are:
              – debug component builds – optimizations off and many DLLs, slowest
              – release component builds – optimizations on and many DLLs
              – debug non-component builds – optimizations off, one DLL (rarely if ever used)
              – release non-component builds – optimizations on, one DLL, quite fast
              – release non-component build with link-time code-gen for cross-module optimizations, even faster
              – release non-component build with profile-guided link-time code-gen, fastest (this is what we ship to customers)

              So, there are multiple reasons why our component builds are slower – we never combine them with the full optimizations that we use when releasing Chrome, because it doesn’t make sense to do so.
