In 1992, near the end of a very slow trip around the world, my fiancée (now wife) and I visited Iran for two weeks. I thought I’d share a few pictures and stories from that visit.
We entered Iran through the Taftan land crossing from Pakistan, having taken an all-night bus trip from Quetta. I don’t mean to criticize Pakistan with the description that follows – I am merely setting up our first impressions of Iran.
The bus trip was horrible. Imagine, if you will, a twelve-hour trip in a school bus (a vehicle famed for its comfortable seats and smooth suspension) on a road that was almost entirely washboard and potholes. We quite enjoyed Pakistan (it was safer then), but that bus trip was not one of our favorite bits.
In the previous episode of “Simple Changes to Shrink Chrome” I discussed how deleting ‘const’ from a few key locations could lead to dramatic size savings, due to a VC++ compiler quirk. In this episode I’ll show how deleting an inline function definition can lead to similar savings.
The savings this time are less important because they are mostly in the .BSS segment, but there are also some modest code-size savings, and some interesting lessons. It’s worth confessing up front that this time the problem being solved was not caused by a compiler quirk – it was entirely self-inflicted.
Doing this investigation has reminded me that the behavior of linkers is best described by chaos theory – the details of their behavior defy prediction by simple heuristics, and the results can be changed dramatically by tiny code changes: the butterfly effect, applied to linkers.
I just completed a series of changes that shrank the Chrome browser’s on-disk size on Windows by over a megabyte, moved about 500 KB of data from its read/write data segments to its read-only data segments, and reduced its private working set by about 200 KB per process. The amusing thing about this series of changes is that they consisted entirely of removing const from some places, and adding const to others. Compilers be weird.
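To make the read/write versus read-only distinction concrete, here is a minimal C++ sketch of the general pattern behind the "adding const" half (it is not the specific Chrome change, and it does not reproduce the VC++ quirk). An array of string pointers where only the characters are const must stay in a writable data section, because the pointers themselves can be reassigned; adding a second const typically lets the compiler place the whole array in read-only data (.rdata with VC++). The table names and the NameAt and RenameFirst helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical table: only the characters are const. The pointers can be
// reassigned at run time, so this array must live in a read/write section.
const char* g_mutableNames[] = {"alpha", "beta", "gamma"};

// Adding a second const makes the pointers immutable too, so the compiler
// can typically place the array in a read-only data section instead.
const char* const g_constNames[] = {"alpha", "beta", "gamma"};

// Legal: the elements of g_mutableNames are writable pointers.
// The same assignment to g_constNames[0] would fail to compile.
void RenameFirst(const char* s) { g_mutableNames[0] = s; }

// Looks up a name in the read-only table, with a debug bounds check.
const char* NameAt(std::size_t i) {
  assert(i < sizeof(g_constNames) / sizeof(g_constNames[0]));
  return g_constNames[i];
}
```

Both tables hold the same strings; the only difference is whether the pointer array can be written, and that alone determines which kind of section it can go in.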
This was originally written in the summer of 2005. I’m reposting it here to get all my writing in one place. For more information on long-distance and high-speed unicycling, see the unicycle section of my blog.
In the spring of 2004 I purchased a Coker unicycle. This unicycle has a 36″-diameter tire and a reinforced air seat with hand grips, and was designed for riding long distances. I started riding my new unicycle to work (an eight-mile commute each way), and on some twenty- to thirty-mile rides with other Seattle-area unicyclists. Around the same time I met Lars Clausen and read his book “One Wheel – Many Spokes”, about his unicycle ride across the United States. These events conspired to get me thinking about doing the STP (Seattle to Portland bike classic) on a unicycle.
The STP is a two-day, 204-mile ride that draws 8,000 cyclists every year. A few people – notably Jack Hughes – have done it before on a unicycle. Other unicyclists have done STP-level distances in a single day, so doing it in two days was clearly quite possible, but not easy.
Microsoft’s VC++ compiler has options (/arch:AVX, /arch:AVX2) to generate instructions from newer instruction sets such as AVX and AVX2, which can lead to more efficient code when running on compatible CPUs. So, an obvious tactic is to compile critical math-heavy functions twice, once with and once without /arch:AVX (or whatever instruction set you want to optionally support), and choose between the two copies at run time.
It seems like a good idea, and it’s been used in various forms for years, but it’s devilishly difficult to do safely. It usually works, but guaranteeing that is trickier than I had realized.
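Here is a minimal C++ sketch of the run-time dispatch half of that tactic, under stated assumptions: the names SumArray, SumBaseline, SumAvx and SumImpl are hypothetical, and both "copies" share one body so the sketch builds as a single file. In a real build the AVX copy would come from a separate .cpp compiled with /arch:AVX. Detection uses the documented MSVC __cpuid and _xgetbv intrinsics (checking both the AVX bit and that the OS saves YMM state), or GCC/Clang's __builtin_cpu_supports.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#include <immintrin.h>
#endif

// Returns true only if the CPU supports AVX *and* the OS saves YMM state.
static bool CpuHasAvx() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
  int info[4];
  __cpuid(info, 1);
  const bool osxsave = (info[2] & (1 << 27)) != 0;  // CPUID.1:ECX bit 27
  const bool avx = (info[2] & (1 << 28)) != 0;      // CPUID.1:ECX bit 28
  if (!osxsave || !avx) return false;
  // XCR0 bits 1 and 2: OS saves SSE (XMM) and AVX (YMM) registers.
  return (_xgetbv(0) & 0x6) == 0x6;
#elif (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx") != 0;
#else
  return false;  // Not x86: never pick the AVX copy.
#endif
}

// Shared kernel. Hypothetically, this source would be compiled twice:
// once normally and once with /arch:AVX.
static float SumImpl(const float* p, std::size_t n) {
  assert(p != nullptr || n == 0);
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += p[i];
  return total;
}

// Stand-ins for the two separately compiled copies.
static float SumBaseline(const float* p, std::size_t n) { return SumImpl(p, n); }
static float SumAvx(const float* p, std::size_t n) { return SumImpl(p, n); }

using SumFn = float (*)(const float*, std::size_t);

// Public entry point: selects an implementation once, on first call
// (function-local static initialization is thread-safe in C++11).
float SumArray(const float* p, std::size_t n) {
  static const SumFn impl = CpuHasAvx() ? SumAvx : SumBaseline;
  return impl(p, n);
}
```

The dispatch itself is the easy part. One plausible way the real version goes wrong: if the shared body were an inline function in a header, both the normal and the /arch:AVX translation units would emit a copy, the linker would keep only one, and non-AVX callers could end up running AVX instructions.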
TL;DR – I can finally record CPU performance counters for processes on Windows.
I’m mostly a Windows developer but I’ll occasionally fire up my Linux box to use the perf tool to examine CPU performance counters. Sometimes you really need to see how many cache misses or branch mispredicts your code is causing, and Windows has been curiously hostile to this endeavor.
Some time ago Windows gained the ability to record CPU performance counters from within ETW events, but (so the story goes) there was no way to enable it. Then the ability to enable this feature was added, but there was virtually no documentation.
There’s good news, and there’s bad news.
The good news is that the latest Windows Performance Analyzer (WPA), the visualization tool for ETW (Event Tracing for Windows) traces, can now load symbols faster than ever before – it’s multi-threaded, and it scans huge PDBs about eight times faster.
The bad news is that it utterly fails to download Chrome’s symbols.
Update: the Creators Update version of WPA fixes this bug, and the latest UIforETW installs it automatically.
Oops. Luckily I was able to diagnose and work around the problem.