Bugs I Got Other Companies to Fix in 2013

Over the course of a year I fix a lot of bugs – that’s part of my job. But I’m not going to write about that. Instead I want to write about bugs that I found in other companies’ code, which they fixed in 2013. These are bugs that I blogged or tweeted about that then got fixed, and these make me happy.

The main reason I blog is that I enjoy writing and researching, and it feels good to vent and share. But I also like trying to change the world, whether by teaching developers how to use xperf or by reporting bugs. However, reporting bugs is only truly satisfying if it leads to a fix.

This end-of-year post covers bug fixes that shipped in 2013 and I hope to make this an end-of-year tradition.

4 GB allocation

One of my favorite finds has got to be the bug I discovered in the driver for a Western Digital backup drive. The driver would repeatedly tell Windows to allocate 4 GB of temporary memory, which would flush 4 GB of useful data out of memory. This bug feels like a good example of the duty of nerds everywhere to notice and investigate janky behavior in order to make computers safe for everyone. The average consumer using this drive would have noticed Windows being slow but would not have had a hope of figuring out why.

The bug was originally reported in Windows Slowdown, Investigated and Identified. Western Digital saw the blog post and fixed the bug fairly quickly. The fix was reported on in Windows Slowdown, Investigated and Identified. Okay, technically the fix was released in December 2012, but I’m hopeful that nobody will notice that I illicitly pulled it into the 2013 roundup ‘cause I really liked this bug.

Stack corruption

The most important developer-facing bug that got fixed was the Windows 7 64-bit SP1 AVX stack corruption bug. This bug would corrupt the stack when a program you were debugging crashed, which would destroy the call stack and the evidence of where the crash happened. This bug turned trivial crashes into complex forensic investigations. The only reason this bug didn’t cause more outrage was that most developers never realized what was happening. This bug wasted millions of hours of developer time. It certainly wasted many hours for me and my coworkers.

I wasn’t the first to report this bug, but I yelled about it a lot and I like to think I helped convince Microsoft to fix it.

I originally reported on the bug as part of When Even Crashing Doesn’t Work. I then did a poll to show Microsoft that developers actually wanted the bug fixed. When they fixed it I initially encouraged developers to install the hotfix, and then updated that advice when I found that the fix was being pushed through Windows Update.

Most ironic bug

I use the VC++ /analyze feature a lot so it was annoying when I upgraded to VC++ 2012 and found that the /analyze phase was intermittently crashing. I reported the bug and worked with Microsoft to get them a repro. It took a while but they fixed it and that made me selfishly happy. The bug felt ironic because it was a use-after-free crash in the code that parses the SAL annotations that help /analyze find bugs.

/analyze

I tweeted that /analyze didn’t correctly understand “float x[10] = {};” and I also reported the bug through other channels. It was fixed a few days later and the fix shipped in VS 2013. Getting /analyze false-positives fixed is important because every spurious warning wastes time and makes developers less likely to believe the real warnings.
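For context, here is a minimal sketch of what that declaration means in standard C++: empty braces value-initialize every element, so all ten floats start out as zero. The function name is made up for illustration, and this post doesn't say which warning /analyze actually issued, so treat this only as a demonstration of the language rule the tool needed to understand.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper, for illustration only.
// Returns true if every element of an empty-brace-initialized array is zero.
bool empty_braces_zero_everything() {
    // Aggregate initialization with empty braces: all ten elements
    // are value-initialized, which for float means 0.0f.
    float x[10] = {};
    return std::all_of(std::begin(x), std::end(x),
                       [](float v) { return v == 0.0f; });
}
```

A tool that treats `x` as only partially initialized after this declaration would warn about code that is actually correct.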

I’ve reported several dozen /analyze bugs over the years and most of them have been fixed but I haven’t kept track of when the fixes were released. I’ve submitted a new batch of bugs for VS 2013 because I have a vested interest in seeing this tool continue to improve. Maybe some of those will get fixed next year.

Xperf/WPA/WPT

When Microsoft first shipped WPA as an xperf trace viewer it had a number of problems, especially around displaying generic events which I reported on in June of 2012. WPA would sort numbers alphabetically and would incorrectly decode many payloads. And, WPA continued the fine xperfview tradition of not documenting its exquisitely subtle data columns, which meant I had to roll my own documentation. The pre-release version of WPA 8.1 fixed some bugs but added new ones – it wasn’t looking encouraging.

When the final 8.1 version of WPA came out there was much rejoicing as I could finally completely switch from xperfview to WPA. The generic events bugs were finally fixed (16 months later) and tooltips for the data columns were finally added. The documentation I wrote is still handy for those times when a tooltip doesn’t give quite enough information.

Miscellaneous

I’m sure there were other fixes that shipped in 2013 but I can’t think of any other critical ones.

Turnabout is fair play

There were also a few bugs reported on my blog in the comments section. One of these was an occasional hang in Steam, which I believe has been fixed. I love that this reader turned the tools that I write about against the software that I work on. Perfect.

Failure to fix

Unfortunately there are many bugs that I have reported on that remain unfixed.

Alt+Tab in Windows 7 is still unreliable but at least I hacked up a solution. Microsoft is well aware of this bug in Windows Explorer but they have shown no inclination to fix it, despite the constant stream of complaints.

Windows Live Photo Gallery’s performance problems are still there, but at least I’ve managed to partially hack around some of them. It’s frustrating that they haven’t bothered to finish polishing this quite good product. It could be so much better.

Chrome still raises the Windows timer frequency, thus wasting power and slowing down computers.

Mythbusters and hockey still can’t do math, and this year I learned that I should be annoyed by baseball statistics like on-base percentage which is rarely actually calculated as a percentage. Percentage has a specific meaning that is embedded in the word and it irrationally irks my inner statistician when people use it to mean ‘proportion’.

Open source

Inevitably somebody will say that if I focused on open-source software then I wouldn’t be forced to wait for others to fix these bugs. There is some truth in this, but it’s also not that simple. It is often not practical or not possible to fix bugs, even in open-source software. When the Linux time command misled me (by not accurately reporting CPU time consumed) I spent some time discussing the bug with the developers but got no traction. And the power management inefficiencies I reported on (excessive use of subprocesses confused the Linux power management so that the CPUs ran slowly despite the task being CPU bound) have not, to my knowledge, been investigated – and I certainly don’t plan to try rewriting the scheduler.

I’ve had some luck with submitting minor improvements to google-breakpad and I fixed up the HAP Python Debugger, so there definitely can be benefits to bug investigations in open source.

That said, I sure wish I had the source code to Windows Live Photo Gallery. I suspect some of those performance bugs would not be hard to fix. And when a bug in an open-source project is fixed you are more likely to be able to get the fix promptly.

That’s all for 2013. Happy New Year!


About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.

12 Responses to Bugs I Got Other Companies to Fix in 2013

  1. Hehe, being mentioned made me smile. Thanks. Happy New Year to you as well!

  2. dishwasher says:

    Hi Bruce, nice post and congratulations, you are doing really great job pushing companies to fix bugs! Could you please take a look at this VS2013 bug that’s seriously degrading code performance – https://connect.microsoft.com/VisualStudio/feedback/details/812192/inefficient-c-sse2-code-generation ? Would really appreciate it!

    • brucedawson says:

      I took a quick look at the bug report. It might be good adding comments explaining the implications of using movdqu versus movaps. If you can somehow measure the performance impact of the difference (comparing the generated code to assembly language using the code the compiler should generate) then that would be ideal. The first levels of compiler bug triage may be developers who are not expert in the performance implications of SSE assembly language. If you can make their job easier you may have better luck.

  3. Ben Craig says:

    Maybe you can get the Visual Studio team to fix this one:
    http://connect.microsoft.com/VisualStudio/feedback/details/635209/missing-warnings-are-generated-from-vs2010-compiler-that-are-connected-to-64-bit-incompatibility

    Then you wouldn’t need to rely as much on VirtualAlloc tricks to find 64-bit pointer truncations, the compiler could statically do it for you.

  4. Ofek Shilon says:

    Bruce – didn’t you have some compiler bugs you reported fixed too?

  5. shane says:

    People on reddit seem to think they’ve found a floating point bug in Dota 2: http://www.reddit.com/r/DotA2/comments/1v6ng0/basic_math_bug_with_stat_calculations/

    • brucedawson says:

      Interesting. I don’t know enough of the Dota stats to tell whether this is a real problem, Dota 1 compatibility, or just rounding anomalies. I’ll pass it along to the team.

  6. Anon says:

    Re: Linux bugs

    Uh oh! Any chance you can link to your discussions/bug reports? I know things aren’t likely to be fixed any time soon but it would provide a nice reference for future discussions^Warguments…
