Base Ten For (Almost) Everything

It’s 2016 and Windows still displays drive and file sizes using base-2 size prefixes. My 1 TB SSD is shown as 916 GB, and a 449 million byte video file is shown as 428 MB. That is, Windows still insists that “MB” means 2^20 and “GB” means 2^30, even when dealing with non-technical customers.

  1. This makes no sense.
  2. Just because some parts of computers are base 2 doesn’t mean all parts are base 2.
  3. And, actually, most of the visible parts of computers are base-10.

So just stop it. Base 2 prefixes should only be used when there is a compelling advantage for the typical user, and for file and drive sizes in Windows Explorer there are no such advantages. If you think I’m wrong (and I know that lots of people do) then be sure to explain exactly why base-2 size prefixes make sense in the context of file and drive sizes.

My specific use case is that I often end up seeing file sizes in exact bytes – from the “dir” command or from other sources. Eight digit numbers are inconvenient so I want to convert these numbers to MB before sharing them. Dividing by 1,048,576 is much more difficult than dividing by 1,000,000 and I see zero advantage to doing the more complicated division. But, if I do the simple/obvious division then I get different answers from Windows Explorer. Hence this rant.
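To make the arithmetic concrete, here is a small Python sketch of the two conversions, using the 449 million byte video file mentioned above. The function names are made up for illustration and are not part of any Windows API.

```python
# Illustrative helpers (hypothetical names, not part of any Windows API).

def to_decimal_mb(size_bytes):
    """Millions of bytes: the division you can do in your head."""
    return size_bytes / 1_000_000

def to_binary_mib(size_bytes):
    """MiB (2^20 bytes): the unit that Windows Explorer labels as "MB"."""
    return size_bytes / 1_048_576

video = 449_000_000  # the 449 million byte video file from above

print(f"{to_decimal_mb(video):.0f} MB (base 10)")
print(f"{to_binary_mib(video):.0f} MiB (base 2)")
```

The first line prints 449 and the second prints 428, which is the same discrepancy that Explorer shows.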

In this article I am going to use words like thousands, millions, billions, and trillions when talking about base 10, KiB, MiB, GiB, and TiB when talking about base 2, and kB, MB, GB, and TB when quoting other people’s usage. I hope that this will always make it clear what I’m trying to say.

Things that are naturally base 2

Let me say up front that memory sizes, address-space sizes, virtual memory page sizes, cache sizes, register sizes, sector sizes, cluster sizes, and probably a few other things that I’ve forgotten about are naturally base 2. Cool. So, when talking about these things you should use base-2 size prefixes.

However, the only one of these that is ever exposed to a consumer is memory size. A computer might have 8 GiB of RAM and describing that as 8.59 billion bytes is just cumbersome. So go for it and use base-2 prefixes for memory. And, if you want to tell consumers about page sizes and sector sizes then feel free to use base-2 prefixes – but really, why would a consumer care?

Amusingly enough, some Dell brochures have a blanket disclaimer that “GB refers to one billion bytes” and they carefully footnote this even on their memory sizes. This means that when Dell sells you an 8 GB computer they are technically only promising you 7.45 GiB. That’s just weird. It means that they are lying about how much memory their computers contain, but in the wrong direction!

Base 2 prefixes make sense for memory capacity because memory chips have a power-of-two capacity. Base 2 prefixes make sense for address space because n bits can identify 2^n different addresses. Page sizes are base 2 because it allows for easy bit masking to select the page number and the address within the page. Bit masking is, in fact, one of the main advantages of base 2. So yeah, base 2 has its place.
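As a rough illustration of why bit masking favors powers of two, here is a Python sketch that splits an address into a page number and an offset within the page. The 4096-byte page size and the names are assumptions chosen for the example, not something any particular system is required to use.

```python
# Sketch only: the 4096-byte page size and all names here are assumptions.
PAGE_SIZE = 4096                          # 2^12 bytes
PAGE_SHIFT = PAGE_SIZE.bit_length() - 1   # 12 bits of offset
OFFSET_MASK = PAGE_SIZE - 1               # 0xFFF, the low 12 bits

def split_address(addr):
    """Return (page number, offset within page) using a shift and a mask."""
    return addr >> PAGE_SHIFT, addr & OFFSET_MASK

page, offset = split_address(0x12345678)
print(hex(page), hex(offset))
```

With a power-of-two page size the split is one shift and one AND. A page size of, say, 4,000 bytes would need a real division and remainder instead.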

But its place is not everywhere.

Things that come in base-10 sizes

The list of things that are best represented by base 10 includes CPU frequencies, Ethernet speeds, hard drive sizes, and flash drive sizes. One GHz is actually one billion Hz, Gigabit Ethernet runs at one billion bits per second, one TB drives are actually one trillion bytes, and a 32 GB flash drive is actually 32 billion bytes.

Some of these may seem surprising, but the question to ask yourself is “why should (technology x) use base 2?” If there is no compelling reason to use base 2 then using base 10 is the appropriate choice because it then matches the number system that human beings use. Base 10 should be the default, and base 2 should only be used when there is a compelling reason, such as for memory related technologies. Because base 10 is the default, the designers of oscillating crystals, Ethernet, hard drives, and flash drives have sensibly used base 10.

There are some interesting implications from frequencies being base 10, and memories being base 2. If you have 4 GiB of RAM and a bus that can read 256 billion bytes of memory per second then you might think that you could read all of memory 64 times per second, right? But you can’t, because the frequency is base 10 and the memory size is base 2, which adds a 7.4% mismatch. Because 4 GiB is actually 4.29 billion bytes this bus can only read all memory about 60 times per second.

Yes, there is also usually overhead for memory refresh cycles and what-not which mean that the actual read-all-memory passes per second is even lower. My point is that in addition to allowing for that overhead you also need to adjust for GB versus GiB.
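The GiB-versus-billion adjustment can be checked with a few lines of Python. This is a sketch that ignores refresh cycles and every other kind of overhead, and the function name is made up for the example.

```python
GIB = 2**30  # one GiB in bytes

def passes_per_second(memory_bytes, bus_bytes_per_second):
    """How many times per second the bus could read all of memory, ignoring overhead."""
    return bus_bytes_per_second / memory_bytes

bus = 256 * 10**9                          # 256 billion bytes per second (base 10)
naive = passes_per_second(4 * 10**9, bus)  # wrongly treating 4 GiB as 4 billion bytes
actual = passes_per_second(4 * GIB, bus)   # 4 GiB is really 4,294,967,296 bytes

print(f"naive:  {naive:.1f} passes/s")
print(f"actual: {actual:.1f} passes/s")
```

The naive figure is 64.0 and the corrected one is 59.6, so the unit mismatch alone costs more than four passes per second.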

In fact, one of the things that sparked this article was a press-release talking about memory chips that had 256 GB/s of bandwidth. The article then breathlessly pointed out that four of these chips would have 1 TB/s of bandwidth. This is almost certainly wrong. The chips probably have 256 billion bytes per second of bandwidth, so four of them would have 1.024 trillion B/s of bandwidth – neither 1.0 trillion B/s nor 1.0 TiB/s. A minor error, but it amused me.

Wait a minute, flash memory is base 10?

A lot of geeks are surprised when they find out that the capacity of flash memory drives is measured with base-10 units. Given that thumb drives are always 8, 16, 32, or 64 GB it seems reasonable to assume that the “GB” refers to GiB. But it doesn’t. Grab a few flash drives and take a look at their capacity. I just looked at the “32 GB” SD card for my camera and its capacity is 31.91 billion bytes. If flash drives were using base 2 prefixes then that should be 34.36 billion bytes – it’s not even close.

But those should be base 2!!!

Really? Why ‘should’ some of these technologies be based on base-2? There is clearly no reason for frequencies to be base 2, so they aren’t.

Hard drive capacity is the product of sector size (base 2) times sectors/track times tracks/platter times number of platters. Constraining those last three numbers to be powers of two would be ridiculous. One small power of two doesn’t make the whole package a power of two. And, since the capacities aren’t powers of two, there is no good reason to clumsily represent the capacities with base-2 prefixes. Describing 320 billion bytes as 298 GiB doesn’t help anything.
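Here is a quick Python check of that last example. The helper name is made up, and the 320 billion byte figure is simply the drive size used in the paragraph above.

```python
def is_power_of_two(n):
    """True if n is an exact power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0

capacity = 320 * 10**9  # a drive sold as "320 GB"

print(is_power_of_two(capacity))      # the capacity is not a power of two
print(f"{capacity / 2**30:.2f} GiB")  # the same capacity expressed in GiB
```

This prints False and then 298.02 GiB: the base-2 prefix turns a round number into an awkward one without revealing anything about the drive.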

One could argue that hard drive manufacturers use base 10 because it makes their drives look bigger, and I’m sure they don’t mind that aspect of it. But, base 10 being financially convenient isn’t enough to justify the claims of a vast hard drive conspiracy. The hard drive manufacturers are simply using the most convenient and standard units because there is no compelling reason to do otherwise.

Flash drives are more surprising because the underlying chips have power of two raw capacity. But flash drive manufacturers necessarily over-provision in order to leave space for wear leveling, spare sectors, etc. Flash memory already does complex remapping of ‘addresses’ so constraining themselves to power of two capacities would have no benefits. The reason why flash drives normally have sizes like 8, 16, 32, or 64 GB is probably because the 7.4% to 10% overhead that this provides is conveniently close to what they need. If the amount of spare capacity changes then flash drives could end up being sold with 120 GB or 130 GB capacities.

Does it matter?

You could reasonably say that it doesn’t matter if we display base 2 units to the user because they don’t care. That’s a terrible argument because if they don’t care we shouldn’t display any numbers at all. If we are going to display numbers to the user then they should be base 10 unless there is a compelling argument for base 2. For file sizes and disk sizes there is no compelling argument – and this is something that OSX does right.

Do you really want to tell your parents that a 1 TB drive is more than twice as big as a 500 GB drive? Or that a 1,010 GB drive is smaller than a 1 TB drive? This is the sort of madness that base-2 causes, for no good reason. The mixing of base-2 and base-10 is even worse, because you can’t even come close to fitting 320 files that Windows says are 1 GB onto a drive that you purchased as 320 GB – you won’t even fit 300.
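The 320-file example is easy to verify. This Python sketch ignores file system overhead, which would only make things worse, and the function name is made up for the example.

```python
GIB = 2**30

def files_that_fit(drive_bytes, file_bytes):
    """Whole files that fit on the drive, ignoring file system overhead."""
    return drive_bytes // file_bytes

drive = 320 * 10**9   # purchased as "320 GB" (base 10)
one_gb_file = GIB     # a file that Windows reports as "1 GB" (base 2)

print(files_that_fit(drive, one_gb_file))
```

The result is 298, not 320, purely because the drive label and the file size use different meanings of “GB”.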

Do you really enjoy explaining to your friends and relatives why Windows is telling them that their brand new hard drive is smaller than the size listed on the box?

But what about computer nerds, surely they should use base 2 for everything, shouldn’t they? No – only when it makes sense. Using base 2 for anything except memory and bit masks leads to ambiguity and to errors. If you use the wrong unit then you will add 2.4%, 4.9%, 7.4% or 10% error (for KiB, MiB, GiB, and TiB). There are probably many calculations of disk or memory bandwidth that have been off because of the MiB/million discrepancy, and the errors only get worse as disks and frequencies get larger.
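Those error percentages come straight from the ratio of 1024^n to 1000^n, as this short Python sketch shows. The function name is made up for the example.

```python
def mismatch_percent(n):
    """Percent by which 1024**n exceeds 1000**n (n=1 for kilo, 2 for mega, ...)."""
    return (1024**n / 1000**n - 1) * 100

for n, (binary, decimal) in enumerate(
        [("KiB", "kB"), ("MiB", "MB"), ("GiB", "GB"), ("TiB", "TB")], start=1):
    print(f"{binary} vs {decimal}: {mismatch_percent(n):.1f}%")
```

Each step up in prefix multiplies the ratio by another 1.024, which is why the gap keeps growing.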

Compatibility

I used to work at Microsoft so I know something about how they think and I’m sure that the main reason they still use base-2 units in Windows Explorer is simply because that is what they have always done. Fear of breaking something, somewhere, will probably keep them on base-2 prefixes forever. But I want to do my part to convince developers to not repeat Microsoft’s mistake.

If you’re going to show sizes using base 2 then I recommend that you acknowledge the nerdiness of this in the most honest way possible – use hexadecimal. Or cut off your thumbs and we’ll switch the whole world to octal.

Hacker news discussion here, twitter discussion here.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Rants.

57 Responses to Base Ten For (Almost) Everything

  1. brent says:

    “base 10 is the appropriate choice because it then matches the number system that human beings use”

    http://mentalfloss.com/article/31879/12-mind-blowing-number-systems-other-languages

    • brucedawson says:

      Hey, great link. I’m surprised it doesn’t mention base 60 (sexagesimal) used by the Sumerians – the reason that minutes, hours, and degrees work the way they do. And then there’s the odd mixed-base of non-metric systems (12 inches to a foot, three feet to a yard, three teaspoons to a table spoon, etc.).

      I could change it to say “the number system that human beings use the most”, but I doubt I will.

      • IIRC from college – “Base Burroughs” from the 1960’s – IIRC, base 12 for one digit, base 10 for the others -> 100 = 120(base 10). Burroughs line printers had a 120 width I think.

        Burroughs B2500’s and later (to B49xx – RPI-ACM had an old B2700 in ~1983) had BCD ALUs and a base-10 memory address setup.

  2. Pingback: 3 – Base Ten for (Almost) Everything

  3. Pingback: Base Ten for (Almost) Everything – Daily Hackers News

  4. uxcn says:

    Base 2 makes math easier when it matters though, and that would probably be my argument against at least on the technical side. For example, if you want to talk about disks in terms like blocks (e.g. 512b/4096b). An even better example might be IPv4 vs IPv6.

    Using imperial in software is probably still arguably worse.

    • Zr40 says:

      Except it never matters. Computers are quite good in performing calculations, humans not so. And these calculations only happen when displaying data for the user.

    • brucedawson says:

      > Base 2 makes math easier

      Well… base 2 makes *some types* of math easier *for computers*.

      So, if you need to do some bit masking, in the computer, then use base 2. And disks do have sectors that are a small power of two, so bit masking is used to calculate the address within a sector. But none of this has any relationship to what should be shown to consumers.

  5. Nor Treblig says:

    Although I mostly agree with your article there is one big advantage of base 2 (when used with correct prefixes) which is its unambiguousness:

    If it says 1 GiB, it IS 1 GiB (base 2).
    If it says 1 GB then it SHOULD be base 10 (like storage manufacturers always have used) ooor it could be base 2 (e.g. Windows and possibly other applications/OS).

    That’s why I would prefer my OS and applications to use base 2 for storage with correct prefixes (yes, I’m looking at YOU, Microsoft!). Also RAM manufacturers should use correct prefixes.
    I’m fine with using base 10 everywhere else, even storage manufacturers, though I think it wouldn’t be much of a problem to always print both numbers on the packages for reference.

    • brucedawson says:

      Yep, the ambiguity is the worst part of MB/GB/TB. When it matters I’ll often say “20 million bytes” to avoid the ambiguity. Sometimes I’ll use MiB/GiB/TiB, but I rarely find that valuable for file sizes.

      But I don’t find the ambiguity a compelling enough reason to prefer MiB/GiB/TiB for file and drive sizes. And if the ambiguity is a problem we should invent some equally ugly prefixes that unambiguously mean base-10 – perhaps MeB/GeB/TeB – since the bad computer nerds stole the originals.

      • Nor Treblig says:

        I really don’t like the ambiguity and I think you made a good proposal.

        But instead of ‘e’ (MeB…) we could use ‘o’ because it looks more like the zero in “base 10” (OK OK, it’s actually that this results in more funny words: KoB, MoB, GoB, PoT, EoT)🙂

        Maybe a question could be when to use it. I think it should *always* be used for everything computer storage related (i.e. bytes and bits) (of course except for those rare cases when actually base 2 is preferred, e.g. RAM):
        GB -> GoB; GB/s -> GoB/s; Mbit/s -> Mobit/s
        (don’t forget hectobit per second: hbit/s -> Hobit/s, hehe)

  6. adamjk says:

    This whole thing reminds me of the frustrating semantics of ‘date’ in programming. Is it a time, is it just a calendar day, does it include a timezone, is it interoperable with that date stored there? The lack of consistency leads to misunderstanding, mistakes, time waste and annoyance.

    • brucedawson says:

      Good link. But I think he correctly answered the wrong question. The real question is “why does Explorer use base-2 MB instead of base-10 MB?” – and for that the evidence is much murkier. Many parts of the industry (OSX, hard-drive vendors, computer OEMs) do follow this path, and it is a simpler path (IMHO).

      Say it with me: Using base-2 for file sizes makes no sense. Using base-2 for drive sizes makes no sense. Using base-2 for file sizes makes no sense. Using base-2 for drive sizes makes no sense.

  7. The real problem IMO is not the choice of base 2 vs. 10; it’s notation. Suffixes like Kb and words like Kilobyte have meant base-2 multipliers for a few decades (eternity in CS) before this debate really started. Why not adopt new suffixes for base-10 instead? For example KdB / “kidebyte” (Kilo-decimal-bytes), then nobody would be fighting over this.

    Of course “Kilo” means 1000, and K/Kilo are SI standard suffixes for base-10. But consider also this: in a decimal Kb, the multiplier may be a nice base-10 number but the byte is not a fundamental unit; a byte is 2^3 bits, and a bit is 2^1 possible values. So the decimal variant is irremediably broken… it’s a Frankenstein like Kilo-inches or Mega-ounces. That’s ultimately what drags me into the position of sticking to base-2 multipliers AND insisting on their right to the nicer suffixes. Not just because of tradition and convenience for engineers, but because it’s the only coherent unit system.

    Too bad that we have some base-10 HW standards like “Gigabit Ethernet”; but those are arbitrary, there’s no reason why that spec couldn’t have been defined as a binary-Giga – the numbers are close enough that, by the time technology was good enough to transmit 10^9 bits, it was certainly good enough for 2^30. (And again, Gigabit is no longer a nice round decimal number when you convert it to 125 million bytes per second.) There are zero cases I know where the choice for base-10 is “natural”; there are only two cases, those where electronics and digital logic dictate base-2, and those that are completely arbitrary (almost always related to timing, because we can manufacture oscillators of any frequency we want and time is orthogonal to the “space” factors from base-2-dominated logic).

    • CdrJameson says:

      I agree with your point (and I’d personally go for powers of 2 everywhere). If you’re going to use base-10 multipliers then capacity should sensibly be measured in bits, not the rather arbitrary groups of 8-bits we call bytes. ‘Gigabit Ethernet’ is therefore internally consistent.

      Of course everyone would get mightily annoyed when they got the ‘bit’ version of things they were expecting in ‘byte’s but I’m sure they’d get over it.

      • brucedawson says:

        Capacity is measured in bytes because in most cases that is the indivisible atom of computing. Most processors can read or write no less than a byte. Memory allocations are requested in terms of byte counts. Bit granularity is, in most cases, not possible.

        Byte granularity is also appropriate because a 32-bit computer can address 2^32 bytes of memory – the address of any byte can fit in a 32-bit register. Not so with bit granularity. Byte is the appropriate base unit.

        Networking uses bits – I don’t know why. Tradition?

        • Nor Treblig says:

          It’s bits/s because they are normally sent one after another -> It’s the raw speed of the data communication independent of how bytes are assembled later, i.e. of how many bits they consist, if there is a start bit, stop bits, parity etc (e.g. RS232).

          You can convert the speed to bytes/s (e.g. assuming octets), but it won’t represent some actual possible data throughput because of the overhead of all the involved protocols (starting with Ethernet frames (header, addresses, checksums) to higher level protocols e.g. used for file transfers).

          I think it’s good to keep bits/s because it cannot be easily converted to some meaningful bytes/s value.

        • CdrJameson says:

          If we’re going to apply the argument that the user doesn’t need to know implementation details and should have the thing that’s easiest to deal with in maths I don’t see why the user should need to know about bytes.

          Or we could make up a new ten-bit byte because it’s more metric.

          Incidentally, Nintendo cartridge storage capacities are measured in megabits, but as far as I can figure out that’s a base-2 mega.

          • brucedawson says:

            Not sure if serious…

            I’m not sure how moving away from the 8-bit byte gains us anything. It adds a new type and a new confusion, and fails to simplify anything. If bytes were ten bits then it would make conversions between bits and bytes slightly simpler, but they’re not, and pretending doesn’t make it so.

    • brucedawson says:

      kB and MB and GB have meant *both* base-2 and base-10 for decades. Check your history.

      The number of bits in a byte is irrelevant. There are eight ounces in a cup and 256 tablespoons in a gallon, but that doesn’t mean we should fill our cars with %1100 gallons of gas. Please explain why bits-in-a-byte matters in the slightest.

      Base-2 should be used only where it maps much more cleanly to the underlying technology – which for consumer visible numbers means memory capacity and *nothing else*. Therefore, base-10 should be used for everything else (and *is* used for most other things) because we decided centuries ago that base-10 was best for human math.

      The benefits of using base-10 for ease of human calculation (metric, currency, *everywhere*) are huge and base-2 needs a preponderance of evidence and advantages to overcome that.

  8. C. says:

    12 is divisible in more ways than 10, and therefore base 12 is obviously better than base 10 – why do you think most layout grids are base 12? If you’re going to push for arbitrarily changing a well-established standard, why encourage what is essentially a lateral move designed to dumb things down for the poorly educated? Let’s move to a system that’s actually better. Base 12 for everything!

    • brucedawson says:

      Come now, base 12? The Sumerians had it right and we should be using base 60. Learning our timestables would be more cumbersome, but most numbers would then have far fewer digits, and the divisors are many and varied.

  9. Adrian says:

    Hard drives are just another level in the storage subsystem, so using base 10 with those while measuring RAM and cache in base 2 creates more confusion than it clarifies, because you end up with a change in units at an _arbitrary_ level in the stack. That’s why sectors are sized to a power of 2, because they have to interoperate with the lower levels of the stack.

    Reporting of disk drive capacities has evolved. I remember when some hard drive companies used a MB unit that was neither 1000 * 1000 bytes nor 1024 * 1024 bytes but 1000 * 1024 bytes. And nowadays, many manufacturers report formatted capacity rather than raw capacity (though they rarely talk about which level of formatting they refer to).

    How about display resolutions? There are two versions of 4K, neither of which is 4000 pixels wide. There’s 3840 x 2160 and 4096 x 2160. Oh, look, another use of K for base 2.

    “You could reasonably say that it doesn’t matter if we display base 2 units to the user because they don’t care. That’s a terrible argument because if they don’t care we shouldn’t display any numbers at all.”

    That’s a huge non-sequitur. When we’re up to gigabytes and terabytes, most people probably no longer care whether it’s base 10 or base 2, because the difference is relatively insignificant, but the first couple digits do still matter–the order of magnitude matters. Even the base-10 reporting disc drive manufacturers are only reporting one or two sig figs.

    • brucedawson says:

      > Hard drives are just another level in the storage subsystem

      Hmmm. No. They are the first level in the storage subsystem. They are the last level in the memory hierarchy. So, one could justify discussing page-file sizes in terms of binary GB, but that’s it.

      > When we’re up to gigabytes and terabytes, most people probably no longer care
      > whether it’s base 10 or base 2, because the difference is relatively insignificant

      Well that’s a funny thing to say because the discrepancy in the TB range is larger than at any of the smaller prefixes. At 9.95% – which rounds to 10% – it is plenty large enough to affect the first couple of digits. 1.0 TiB versus 1.1 trillion bytes seems important to me.

  10. Adrian H says:

    https://en.wikipedia.org/wiki/Megabyte

    “…one megabyte is one million bytes of information. This definition has been incorporated into the International System of Quantities.”

  11. snoukkis says:

    Type “man resize2fs” in linux for some LOL:

    BEGIN QUOTE (translate underline to UPPERCASE):
    Note: when kilobytes is used above, I mean REAL, power-of-2 kilobytes, (i.e., 1024 bytes), which some politically correct folks insist should be the stupid-sounding “kibibytes”. The same holds true for megabytes, also sometimes known as “mebibytes”, or gigabytes, as the amazingly silly “gibibytes”. Makes you want to gibber, doesn’t it?
    :END QUOTE

    … well at least it’s free (as in free beer) software.

  12. munk says:

    I’m a little late on the comment here (I was avoiding all technology during vacation …), but …

    Whenever I’m presented with a problem like this, where the same nomenclature means two different things, I tend towards a solution where you create a new name with a specific definition. In this case, we already have one created for us. I would just use e-notation.

    I know what the main objection here is, which is that many or most consumers don’t currently understand e-notation. But we’re talking about the same customers who have already had to learn what Kb, Mb, Gb, Tb, etc. mean. Surely they can figure out e-notation if everybody started using it for memory sizes (and RAM sizes, and chip speeds, etc.).

    /just my 2e1 bits …

    • brucedawson says:

      I’m not sure what you mean by memory sizes as distinct from RAM sizes, and I’m not sure why you’re suggesting the e-notation (KiB/MiB, etc.) for chip speeds. It is needed for RAM sizes and nothing else.

      • munk says:

        Sorry, brain fart on the RAM vs. *hard drive* sizes, but in any case I was mistaken thinking that your Dell example was regarding hard drive rather than RAM. So total confusion.

        My suggestion to use e-notation for everything is so the laypeople don’t need to learn the prefixes after giga- and tera-. Although point taken, it’s probably going to be a while before we even need terahertz, much less petahertz.

  13. Pingback: Cars in Canada Get Better Mileage than in the US | Random ASCII

  14. Pingback: Understanding Base-2 vs. Base-10 Numeric Systems | Senior DBA

  15. ryan says:

    You seem to be mainly arguing in defense of the average user. To which I’d say to the average user the measurement of data is abstract and imprecise. A user is really only concerned about the relative size of things, not the exact number of bytes per order of magnitude. The way data is addressed (and often stored) makes powers of two intuitive, at least for the people concerned with that type of thing. Why do it both ways, at the expense of ambiguity and confusion, all on account of the average user that doesn’t know the difference?

    • brucedawson says:

      I am also arguing in favor of myself. Just a few days ago I was copying 150 GB of data over a 100 M-bit Ethernet. I estimated the time as 150 * 10 * 8 seconds. Oops – don’t forget to multiply by 1.074 for the GiB to GB conversion factor because Windows describes file/directory sizes using base-2 GB, but Ethernet is base 10. Even if Ethernet frequencies were base 2 there would still be a 1.024 adjustment for the GB/MB conversion factor. Aaarrrgghh! It’s ridiculous.

      Base-2 KB/MB/GB/TB are rarely useful. They make sense (outside of programming) for memory capacity only. We should use base 10. Or, we should standardize on base-2 and start referring to 95.4 M-bit Ethernet.

      In other words, standardizing on base 2 is never going to happen. Standardizing on base 10 for everything-except-memory-capacity would be easy, and would confuse no one.

      • Nor Treblig says:

        Don’t get me wrong, I’m for base 10 too, just not with the old prefixes.

        But if you do calculations like a dumb user would do, you will always get half-assed results.
        For instance you remembered to incorporate the base 2 prefixes but then completely ignored any network overhead.

        (In numbers: Assuming you are using TCP/IP over Ethernet with no jumbo frames, no packet loss and no additional overhead of higher level protocols you have about 5% of overhead. This is comparable to MiB vs MB (~5%) and also GiB vs GB (~7%))

        So maybe a user is 10% off… it probably won’t matter. And if exact numbers matter someone will be using integers without prefixes anyway (just bytes).

        • brucedawson says:

          And by the “old prefixes” you mean the ones that were used for base-10 first.

          Yes, you should allow for network overhead. It is unfortunate that the errors are cumulative, so that the total error (if you ignore GiB versus GB and network overhead) is over 12%.

          And if a 10% error doesn’t matter then let’s use base 10. Base 10 is the math that we are taught from kindergarten. The base-2 believers have to justify why it should be used, and I’m hearing nothing compelling.

          • Nor Treblig says:

            IMHO it was mixed since the very beginning. I’m solely talking about prefixes used with Bytes and Bits. Using proper SI prefixes with SI units should always stay the same as it ever was. I.e. km, GHz etc.

            With TB, GB, MB, kB or even KB you just never know what you get. Even if all decide let’s use base 10 there will be legacy stuff out there for a very long time.
            IMHO new base 10 prefixes for Bytes and Bits which everybody should use henceforth unless base 2 is really beneficial (RAM) is the best solution.

  16. brucedawson says:

    I’m not sure that a *third* type of prefix (on top of MB/MiB) is going to solve anything. When I need to be unambiguous I say millions or MiB.

    Maybe we need to sue memory makers who claim to have 4 GB of RAM with 256 GB/s of read bandwidth for their base 2/base 10 inconsistency.

    • Nor Treblig says:

      Maybe if it were the other way around, e.g. they promise 4 GiB and then it’s only 4 GB. But like it actually is you could sue and they won’t really care.

      Regarding another prefix:
      Someone like Microsoft has been using its prefixes for decades. They won’t change them so that, for instance, some disk shows 1.23 TB (base 2) in Windows 10 and suddenly 1.35 TB (base 10) in Windows 11; customers would be confused (understandably).
      IF they change something it’s more likely the prefix from TB -> TiB, but that’s not what you actually want.

      • brucedawson says:

        They promise 256 GB/s of bandwidth and only deliver 256 billion bytes/s. If GB really does “mean” binary GB then they are under delivering.

        • Nor Treblig says:

          Giga means billion and you know that. What they say wrongly is 4 GB but then they actually overdeliver. Nobody can sue them because of that.

          • brucedawson says:

            Sadly, drive manufacturers were sued multiple times for saying that giga means billions, and had to settle. Explorer’s usage of giga as 2^30 probably hurt their cause. I agree that Giga *should* mean billion, but wishing has not yet made it so.

  17. Pingback: A modest proposal for a more natural KB | Random ASCII

  18. John Payson says:

    Things like hard drives are often subdivided into power-of-two-sized storage chunks of 512 to 4096 bytes. While one could say that 1GB represents 1,953,125 sectors of 512 bytes, or that 1TB represents 244,140,625 clusters of 4096 bytes, such treatment won’t work for MB when using either size of block, nor for GB when using 4096-byte blocks.

    Problems were avoided for the 2^10 prefix by making the letter uppercase “K”, as distinct from the lowercase “k”; the former prefix may be pronounced “kay” and the latter “kilo”. A “32 kay-hertz crystal” would be 32768Hz; a “32 kilohertz crystal” would be 32000Hz (both frequencies are used, though the former is far more common). It’s too bad other prefixes never worked out so nicely in writing, since the same pronunciation-based distinctions could otherwise work just fine for them.

    • brucedawson says:

      Hard drives are divided into 512 or 4,096 byte sectors. So what? It’s a boring implementation detail. Since the total sizes of these drives have *zero* correlation to powers of two I don’t think you’ve made a compelling case. There are 256 tablespoons in a US gallon, but that doesn’t mean we should use base-2 prefixes for tanker trucks. If the number of sectors was usually a power of two then I’d buy your argument, but in fact that is essentially *never* the case.

      You say that 32,768 Hz is more common than 32,000 Hz – got any reference for that? In my experience frequencies are almost all base-10.

      I think programmers assume that base-2 prefixes make sense because they think that *all* computer units are base two. It just ain’t so.

      • John Payson says:

        Hard drives have a total size which is an integer number of allocation units. It’s possible for a file to take exactly 1.000MiB. It is not possible on most systems for a file to take exactly 1.000 million bytes. I’d also suggest that “million”, “billion”, and “trillion”, are the same length as “mega”, “giga”, and “tera”. Only “quadrillion” is longer than the corresponding prefix “peta”. I suppose “Mebi”, “Gibi” etc. aren’t totally horrible, but a good system should allow hybrid sizes (e.g. multiples of 1,024,000).

        As for 32768Hz vs 32000Hz, I don’t have sales figures available for the two kinds of crystals, but if you examine chip manufacturer’s datasheets, you’ll find a lot of chips which accept a 32,768Hz crystal and report the number of whole seconds elapsed. For some reason, the chips rarely allow read-out of the raw number of counts, and an annoying number insist upon formatting the data as year/month/day/hour/minute/second, often using BCD(!). I have yet to see one that uses a 32,000Hz crystal; even the one I’ve used that allowed a 1/100 second readout produced that by sometimes requiring 327 pulses per count and sometimes requiring 328, rather than always requiring 320.

        • brucedawson says:

          If 32,768 Hz is more common that is an interesting fact. However higher frequencies (GPU, CPU, networking, and memory clocks) are all base 10 so it doesn’t change my basic claim which is that base ten is far more prevalent in computing than most people realize.

          I’m afraid I don’t find your file/disk size arguments compelling. Most math is done in decimal. You need a compelling advantage to justify presenting users with base-2 prefixes. I, for one, don’t want to explain to users that twenty 100-MB files are smaller than two 1-GB files. That’s just dumb.

          Hybrid sizes are the most confusing idea possible. Please, just don’t. In fact, that’s probably at the root of my annoyance with base-2 prefixes. When you say 640 GiB you are saying 6.40 * 10^2 * 2^30. Either use hexadecimal for your file/drive sizes with base-2 prefixes, or use base-10 for everything.

  19. Ken says:

    Here’s the kicker…SI does NOT cover usage for bytes. One could argue that the use of SI prefixes is incorrect in the first place, base whatever it may be.

  20. Philip Bloom says:

    That was very compelling. It really is annoying in terabytes, and probably will be even more ridiculous when home computers start having petabytes. Frankly if you’re not in tech, it makes no sense at all and has wasted tens of thousands of pages of explanations in help files, docs, hardware box printouts and probably many more.

  21. Jay says:

    When transferring files between let say hard drives and the speed is 50MB/s is it in base 10 or base 2? What about in IDM download box or torrent applications is it in base 2 or base 10?

    • brucedawson says:

      Unfortunately there is no standardization. You would have to ask the developers, but they might give you the wrong answer if they aren’t aware of all of the places where base-10 is used in computing.
