It’s important to understand the cost of memory allocations, but this cost can be surprisingly tricky to measure. It seems reasonable to measure this cost by wrapping calls to new and delete with timers. However, for large buffers these timers may miss over 99% of the true cost of these operations, and these hidden costs are larger than I had expected.
Further complicating these measurements, it turns out that some of the cost may be charged to another process and will therefore not show up in any timings that you might plausibly make.
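For concreteness, the naive measurement looks something like this (a sketch; the name `time_alloc_ns` is invented here). It times only the allocator calls themselves, which is exactly why it misses the page faults paid on first touch and the page-zeroing work that may be charged to another process:

```cpp
#include <chrono>
#include <cstddef>

// Naive timing wrapper around new/delete. This measures the allocator's
// bookkeeping, but NOT the cost of faulting in and zeroing the pages,
// which for large buffers is paid later, on first touch.
double time_alloc_ns(std::size_t bytes) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    char* p = new char[bytes];
    delete[] p;
    auto stop = clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count();
}
```

To see the hidden cost you would additionally have to write to every page of the buffer before freeing it, and even then some of the zeroing cost lands outside your process.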
When I run into a problematically slow program I immediately reach for a profiler so that I can understand the problem and either fix it or work around it.
This guidance applies even when the slow program is a profiler.
And so it is that I ended up using Windows Performance Toolkit to profile Windows Performance Toolkit. Again. The good news is that once again I was able to learn enough about the problem to come up with a very effective workaround.
Intel’s manuals for their x86/x64 processors clearly state that the fsin instruction (calculating the trigonometric sine) has a maximum error, in round-to-nearest mode, of one unit in the last place. This is not true. It’s not even close.
The worst-case error for the fsin instruction for small inputs is actually about 1.37 quintillion units in the last place, leaving fewer than four bits correct. For huge inputs it can be much worse, but I’m going to ignore that.
I was shocked when I discovered this. Both the fsin instruction and Intel’s documentation are hugely inaccurate, and the inaccurate documentation has led to poor decisions being made.
The great news is that when I shared an early version of this blog post with Intel they reacted quickly and the documentation is going to get fixed!
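Errors like these are quoted in units in the last place (ulps). One way to count the ulps separating two doubles is to compare their bit patterns, since IEEE-754 doubles of the same sign are ordered the same way as their integer representations (a minimal sketch, assuming finite, same-sign, non-negative inputs; `ulp_distance` is a name invented here):

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Ulp distance between two finite, non-negative doubles, computed by
// reinterpreting their bit patterns as integers. Adjacent doubles of the
// same sign have adjacent bit patterns, so the integer difference is the
// number of representable doubles between them.
std::int64_t ulp_distance(double a, double b) {
    std::int64_t ia, ib;
    std::memcpy(&ia, &a, sizeof a);
    std::memcpy(&ib, &b, sizeof b);
    return std::llabs(ia - ib);
}
```

By this measure, a correctly rounded result is within half an ulp of the true answer, which is what Intel's documentation claimed and what fsin spectacularly fails to deliver.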
It was a fairly straightforward bug. A wide-character string function was called with a byte count instead of a character count, leading to a buffer overrun. After finding the problem, the fix was as simple as changing sizeof to _countof. Easy.
But bugs like this waste time. A playtest was cancelled because of the crashes, and because the buffer overrun had trashed the stack it was not trivial to find the bad code. I knew that this type of bug was avoidable, and I knew that there was a lot of work to be done.
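The pattern looks something like this (a sketch; the names `countof` and `copy_name` are illustrative — `_countof` is the MSVC macro, and a template like this is its portable equivalent). With `wchar_t` wider than one byte, passing `sizeof` where a character count is expected overruns the destination by a factor of `sizeof(wchar_t)`:

```cpp
#include <cstddef>
#include <cwchar>

// Portable equivalent of MSVC's _countof: yields the element count of an
// array, not its size in bytes, and refuses to compile for pointers.
template <typename T, std::size_t N>
constexpr std::size_t countof(T (&)[N]) { return N; }

// wcsncpy takes a count of characters. Passing sizeof(dest) -- a count of
// bytes -- tells it the buffer is sizeof(wchar_t) times bigger than it is.
void copy_name(wchar_t (&dest)[16], const wchar_t* src) {
    // wcsncpy(dest, src, sizeof(dest));  // BUG: byte count, overruns dest
    wcsncpy(dest, src, countof(dest));    // correct: character count
    dest[countof(dest) - 1] = L'\0';      // wcsncpy may not null-terminate
}
```

Because `countof` only accepts real arrays, it also catches the related mistake of applying `sizeof` to a pointer that used to be an array.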
I just finished creating the third in a series of training videos that cover Event Tracing for Windows, also known as xperf or the Windows Performance Toolkit. This set of videos, available on WintellectNow, should be enough to teach any experienced programmer how to use this amazing set of tools to investigate tricky performance problems on Microsoft Windows. You can get two weeks of access to all of the videos on WintellectNow by using promo code BDAWSON-14 – no credit card required.
I’m currently watching John Robbins’ excellent WinDBG training video (slightly condensed from Tolstoy’s original version).
“Please write a C++ function that takes a circle’s diameter as a float and returns the circumference as a float.”
It sounds like the sort of question you might get in the first week of a C++ programming class. And yet. This question is filled with subtlety if you dig into it. Let’s try some solutions.
My last post mentioned the ‘standard’ risks of undefined behavior such as having your hard drive formatted or having nethack launched. I even added my own alliterative risk – singing sea shanties in Spanish.
The list of consequences bothered some people, who said that any compiler that would intentionally punish its users in such ways should never be used.
That’s true, but it misses the point. Undefined behavior can genuinely cause these risks, and I don’t know of any C/C++ compiler that can save you. If you follow Apple’s buggy security guidance then it can lead to your customers’ hard drives being formatted.
As of May 19th, one month after my report, I see that Apple’s security guidance has not been fixed.
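A classic illustration of the mechanism (not Apple’s exact code): signed integer overflow is undefined behavior, so the compiler is entitled to assume it never happens and silently delete the very check that was supposed to detect it:

```cpp
// Intended as an overflow check, but x + 1 overflowing is undefined
// behavior for signed int, so an optimizing compiler may assume
// x + 1 > x always holds and fold this function to `return false;`.
bool will_overflow(int x) {
    return x + 1 < x;  // UB when x == INT_MAX; the check may be deleted
}
```

The scary part is that the code looks defensive while providing no defense at all, which is exactly why guidance that leans on undefined operations is dangerous.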