Over the course of a year I fix a lot of bugs – that’s part of my job. But I’m not going to write about that. Instead I want to write about bugs that I found in other companies’ code, which they fixed in 2013. These are bugs that I blogged or tweeted about that then got fixed, and these make me happy.
The main reason I blog is that I enjoy writing and researching, and it feels good to vent and share. But I also like trying to change the world, whether by teaching developers how to use xperf, or by reporting bugs. However, reporting bugs is only truly satisfying if it leads to a fix.
This end-of-year post covers bug fixes that shipped in 2013 and I hope to make this an end-of-year tradition.
4 GB allocation
One of my favorite finds has got to be the bug I discovered in the driver for a Western Digital backup drive. The driver would repeatedly tell Windows to allocate 4 GB of temporary memory, which would flush 4 GB of useful data out of memory. This bug feels like a good example of the duty of nerds everywhere to notice and investigate janky behavior in order to make computers safe for everyone. The average consumer using this drive would have noticed Windows being slow but would not have had a hope of figuring out why.
The bug was originally reported in Windows Slowdown, Investigated and Identified. Western Digital saw the blog post and fixed the bug fairly quickly, and I reported on the fix in a follow-up post. Okay, technically the fix was released in December 2012, but I’m hopeful that nobody will notice that I illicitly pulled it into the 2013 roundup ’cause I really liked this bug.
Stack corruption
The most important developer-facing bug that got fixed was the Windows 7 64-bit SP1 AVX stack corruption bug. This bug would corrupt the stack when a program you were debugging crashed, which would destroy the call stack and the evidence of where the crash happened. This bug turned trivial crashes into complex forensic investigations. The only reason this bug didn’t cause more outrage was that most developers never realized what was happening. This bug wasted millions of hours of developer time. It certainly wasted many hours for me and my coworkers.
I wasn’t the first to report this bug, but I yelled about it a lot and I like to think I helped convince Microsoft to fix it.
I originally reported on the bug as part of When Even Crashing Doesn’t Work. I then did a poll to show Microsoft that developers actually wanted the bug fixed. When they fixed it I initially encouraged developers to install the hotfix, and then updated that advice when I found that the fix was being pushed through Windows Update.
Most ironic bug
I use the VC++ /analyze feature a lot so it was annoying when I upgraded to VC++ 2012 and found that the /analyze phase was intermittently crashing. I reported the bug and worked with Microsoft to get them a repro. It took a while but they fixed it and that made me selfishly happy. The bug felt ironic because it was a use-after-free crash in the code that parses the SAL annotations that help /analyze find bugs.
/analyze
I tweeted that /analyze didn’t correctly understand “float x[10] = {};” and I also reported the bug through other channels. It was fixed a few days later and the fix shipped in VS 2013. Getting /analyze false-positives fixed is important because every spurious warning wastes time and makes developers less likely to believe the real warnings.
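To make the false positive concrete, here is a minimal sketch of what that declaration guarantees in standard C++. The post doesn't say which warning /analyze emitted, so the idea that it concerned uninitialized reads is an assumption, and the function name is made up for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: sums an array declared exactly like the one in the tweet.
float SumOfEmptyBraceArray() {
    // Empty braces value-initialize every element, so all ten floats are 0.0f.
    float x[10] = {};

    float sum = 0.0f;
    for (std::size_t i = 0; i < 10; ++i) {
        sum += x[i];  // reads well-defined zeros, not uninitialized memory
    }
    return sum;
}
```

Since every element is zeroed by the language rules, a warning that treated x as uninitialized here would be spurious.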
I’ve reported several dozen /analyze bugs over the years and most of them have been fixed but I haven’t kept track of when the fixes were released. I’ve submitted a new batch of bugs for VS 2013 because I have a vested interest in seeing this tool continue to improve. Maybe some of those will get fixed next year.
Xperf/WPA/WPT
When Microsoft first shipped WPA as an xperf trace viewer it had a number of problems, especially around displaying generic events, which I reported on in June of 2012. WPA would sort numbers alphabetically and would incorrectly decode many payloads. WPA also continued the fine xperfview tradition of not documenting its exquisitely subtle data columns, which meant I had to roll my own documentation. The pre-release version of WPA 8.1 fixed some bugs but added new ones – it wasn’t looking encouraging.
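For anyone who hasn't run into the "numbers sorted alphabetically" symptom, this small sketch shows the difference. It is only an illustration of the behavior, not WPA's actual code, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts the values as plain text, which is what an alphabetical sort does.
std::vector<std::string> SortAsText(std::vector<std::string> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// Sorts the same strings by the integers they represent.
std::vector<std::string> SortAsNumbers(std::vector<std::string> values) {
    std::sort(values.begin(), values.end(),
              [](const std::string& a, const std::string& b) {
                  return std::stoll(a) < std::stoll(b);
              });
    return values;
}
```

Given the values 9, 10, 100 and 2, the text sort yields 10, 100, 2, 9 while the numeric sort yields 2, 9, 10, 100, which is why an alphabetically sorted column of counts or durations is so hard to read.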
When the final 8.1 version of WPA came out there was much rejoicing as I could finally completely switch from xperfview to WPA. The generic events bugs were finally fixed (16 months later) and tooltips for the data columns were finally added. The documentation I wrote is still handy for those times when a tooltip doesn’t give quite enough information.
Miscellaneous
I’m sure there were other fixes that shipped in 2013 but I can’t think of any other critical ones.
Turnabout is fair play
There were also a few bugs reported on my blog in the comments section. One of these was an occasional hang in Steam, which I believe has been fixed. I love that this reader turned the tools that I write about against the software that I work on. Perfect.
Failure to fix
Unfortunately there are many bugs that I have reported on that remain unfixed.
Alt+Tab in Windows 7 is still unreliable but at least I hacked up a solution. Microsoft is well aware of this bug in Windows Explorer but they have shown no inclination to fix it, despite the constant stream of complaints.
Windows Live Photo Gallery’s performance problems are still there, but at least I’ve managed to partially hack around some of them. It’s frustrating that they haven’t bothered to finish polishing this quite good product. It could be so much better.
Chrome still raises the Windows timer frequency, thus wasting power and slowing down computers.
Mythbusters and hockey still can’t do math, and this year I learned that I should be annoyed by baseball statistics like on-base percentage which is rarely actually calculated as a percentage. Percentage has a specific meaning that is embedded in the word and it irrationally irks my inner statistician when people use it to mean ‘proportion’.
Open source
Inevitably somebody will say that if I focused on open-source software then I wouldn’t be forced to wait for others to fix these bugs. There is some truth in this, but it’s also not that simple. It is often not practical or not possible to fix bugs, even in open source software. When the Linux time command misled me (by not accurately reporting CPU time consumed) I spent some time discussing the bug with the developers but got no traction. And the power management inefficiencies I reported on (excessive use of subprocesses confused the Linux power management so that the CPUs ran slowly despite the task being CPU bound) have not, to my knowledge, been investigated – and I certainly don’t plan to try rewriting the scheduler.
I’ve had some luck with submitting minor improvements to google-breakpad and I fixed up the HAP Python Debugger, so there definitely can be benefits to bug investigations in open source.
That said, I sure wish I had the source code to Windows Live Photo Gallery. I suspect some of those performance bugs would not be hard to fix. And when a bug in an open-source project is fixed you are more likely to be able to get the fix promptly.
That’s all for 2013. Happy New Year!
Hehe, being mentioned made me smile. Thanks. Happy New Year to you as well!
Hi Bruce, nice post and congratulations, you are doing a really great job pushing companies to fix bugs! Could you please take a look at this VS2013 bug that’s seriously degrading code performance – https://connect.microsoft.com/VisualStudio/feedback/details/812192/inefficient-c-sse2-code-generation ? Would really appreciate it!
I took a quick look at the bug report. It might be good to add comments explaining the implications of using movdqu versus movaps. If you can somehow measure the performance impact of the difference (comparing the generated code to assembly language using the code the compiler should generate) then that would be ideal. The first levels of compiler bug triage may be developers who are not expert in the performance implications of SSE assembly language. If you can make their job easier you may have better luck.
Thanks a lot for the tips! I will!
Maybe you can get the Visual Studio team to fix this one:
http://connect.microsoft.com/VisualStudio/feedback/details/635209/missing-warnings-are-generated-from-vs2010-compiler-that-are-connected-to-64-bit-incompatibility
Then you wouldn’t need to rely as much on VirtualAlloc tricks to find 64-bit pointer truncations, the compiler could statically do it for you.
For the format string warnings I have been pushing Microsoft to have full format-string analysis as part of regular compiles instead of as part of /analyze. That would get VC++ up to parity with gcc/clang. That would address your main complaint.
https://connect.microsoft.com/VisualStudio/feedback/details/799869/detection-of-format-string-errors-should-be-part-of-the-regular-c-compile-instead-of-analyze
I have long avoided /Wp64 because Microsoft never fully correctly implemented it, and its time is now past. It was a hack to let you find 64-bit bugs while doing 32-bit compiles, and it always gave too many false positives for my tastes. However I would like to see good pointer truncation warnings as part of regular 64-bit compiles.
I don’t think compiler warnings will ever be a complete substitute for run-time testing, however. They are complementary.
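For readers unfamiliar with pointer truncation, here is a minimal sketch of the failure mode. It models addresses as plain 64-bit integers so it behaves the same on any platform, and the function names are hypothetical.

```cpp
#include <cstdint>

// Simulates storing a 64-bit address in a 32-bit integer and reading it back.
std::uint64_t RoundTripThrough32Bits(std::uint64_t address) {
    const std::uint32_t truncated = static_cast<std::uint32_t>(address);
    return truncated;  // the upper 32 bits are gone
}

// Returns true if the address is unchanged by the 32-bit round trip.
bool SurvivesTruncation(std::uint64_t address) {
    return RoundTripThrough32Bits(address) == address;
}
```

An address below 4 GB such as 0x7FFF0000 survives the round trip, while 0x100001234 comes back as 0x1234. That is why this class of bug tends to stay hidden until allocations land at high addresses, and why a compile-time warning would complement the run-time approach.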
Bruce – didn’t you have some compiler bugs you reported fixed too?
The /analyze compiler crash I reported was fixed, and various /analyze warnings I complained about were improved, but no compiler bug fixes (i.e., bad code-gen) that I reported were shipped in 2013.
This bug was reported fixed in 2013, but I haven’t seen it ship yet:
https://connect.microsoft.com/VisualStudio/feedback/details/804947/incorrect-sign-extension-from-int-to-64-bit-int
This one has not been reproed yet:
https://connect.microsoft.com/VisualStudio/feedback/details/812124/code-gen-bug-incorrect-devirtualization-and-inlining-when-building-with-ltcg
People on reddit seem to think they’ve found a floating point bug in Dota 2: http://www.reddit.com/r/DotA2/comments/1v6ng0/basic_math_bug_with_stat_calculations/
Interesting. I don’t know enough of the Dota stats to tell whether this is a real problem, Dota 1 compatibility, or just rounding anomalies. I’ll pass it along to the team.
Re: Linux bugs
Uh oh! Any chance you can link to your discussions/bug reports? I know things aren’t likely to be fixed any time soon but it would provide a nice reference for future discussions^Warguments…
I found that ‘time’ doesn’t accurately report CPU time used when there is lots of process creation going on, and in the same situation the Linux thread scheduler and power management systems interact badly:
https://randomascii.wordpress.com/2013/03/18/counting-to-ten-on-linux/
The scheduler issue isn’t necessarily a bug, but I maintain that the ‘time’ behavior is a bug, or should at least be documented.
On the upside, I reported a bug in QtCreator and a patch was created the next day. Pretty cool.