skip to Main Content

 

“Hmmmm…  tasty numbers.”

Fun topic for a Friday!

Thought I’d take a break from the Precedence Project and just share something that I find both amusing and fascinating at the same time.  Your mileage may vary of course.

Recently, someone copied me on an email they sent to a colleague, explaining something kinda funky about Excel.  Actually, it’s something funky about ALL spreadsheets, and all computers in general really.

“Bob – as we discussed, Excel should sum 1.12-1.23+0.11 to zero but as you can see, certain orders actually produce the wrong answer.  This is not an excel formatting issue: Excel is actually calculating the wrong answer.  Try it for yourself.  The first time I discovered this it was with a more complex data set and it drove me nuts trying to figure out why my model wasn’t working exactly as it should.  It turns out it is a known limitation of the way in which computers represent "Double Precision Floating Point" numbers, whatever that means.  Garner has tried to explain it to me but it went over my head.  The simpler explanation is to just blame Rob Collie.”

And he included the following example:

clip_image001

Of course, I have nothing to do with this – I wasn’t even in high school at the time Excel was built.  But in my time at Microsoft I DID get embroiled in a very bitter battle related to this topic, so I am at least somewhat qualified to talk about it.  And I like to think that no one is better qualified to make analogies between this topic and cartoon characters – I’m your guy for that.

Why does this happen?  (Condensed Version)

We all know that the fraction “1/3” does not “fit” into our base-10 number system:  .3333 repeating infinitely.

Well, that isn’t peculiar to base 10 – converting fractional numbers between ANY two number systems can result in values that don’t “fit.”  Try fitting the fraction “1/7” into base-5 for example.

And alternatively, if you wanted to convert “1/3” into base-3, it WOULD fit nicely – you would represent “1/3” as “0.1” in base-3 – that looks weird, I know, but it’s true.

Well, computers use a base-2 number system.  So guess what?  Our friendly little base-10 fractions like 0.1 very often don’t “fit” into computer accurately.  They get stored as approximate values, but the difference between the value we expect to see and the value that is truly stored is cleverly hidden in digits that you never see.

image

Converting 0.1 to Binary Yields a Non-terminating Number, so Computers Approximate It

For example, I think Excel limits you to 15 visible digits of precision.  But under the hood, it’s operating on more than 15 digits.  So any errors that occur based on the base-10 vs. base-2 conversion most often only appear in those “dark” digits that you never see.  The difference is INCREDIBLY tiny, like trillionths of a percent, and it’s hidden, so you never really care.

Sometimes though, as you do arithmetic in Excel, that incredibly tiny difference, out at the 16th decimal place or beyond, can ripple into the visible digits, as it does in the example that was sent to me.

Do all spreadsheets have this problem?

Yes, they do.  Well, every spreadsheet but the first one.  VisiCalc didn’t have this problem, but in hindsight they wished they’d had it,  Their solution was much slower in terms of performance:

“At its heart, VisiCalc is about numbers. One of the early decisions we made was to use decimal arithmetic so that the errors would be the same one that an accountant would see using a decimal calculator. In retrospect this was a bad decision because people turn out to not care and it made calculations much slower than they would have been in binary.

We did want to have enough precision to handle large numbers for both scientific calculations and in the unrealistic case it would be used to calculate the United States budget. Of course, as it turned out, that was one of the real applications.”

From http://www.frankston.com/public/?name=ImplementingVisiCalc 

All Right, why is Homer Better Off?

You know why human civilization adopted base-10 numbers?  Because we all start out counting with ten fingers.  There isn’t anything all that special about 10 other than that it matches our finger count.

Here’s the punchline:  in most cartoons, animators have discovered that characters still look ok if you only draw four fingers on a hand.  And that, of course, is faster to do.  So it’s become basically an industry standard.  8-fingered cartoon characters.

 the_simpsons_movie_torrent_bittorrent_download MickeyMousefred_flintstone

 

 

 

 

 

So…  cartoon characters would use a Base-8 number system in their daily lives, not base-10.  And because 8 is really just 2*2*2, ALL fractions expressed in base-8 could be expressed 100% accurately in a base-2, binary computer.

Spreadsheets in Homer’s world don’t have these weird exception cases.

Rob Collie

One of the original engineering leaders behind Power BI and Power Pivot during his 14-year career at Microsoft, Rob Collie founded a consulting company in 2013 that is 100% devoted to “the new way forward” made possible by Power BI and its related technologies. Since 2013, PowerPivotPro has rapidly grown to become the leading firm in the industry, pioneering an agile, results-first methodology never before seen in the Business Intelligence space. A sought-after public speaker and author of the #1-selling Power BI book, Rob and his team would like to help you revolutionize your business and your career.

This Post Has 5 Comments
  1. Ridiculous premise, since 0x10^-16 is such an approximation of zero as to make no difference in any practical application. You explained its origin well, but were the user to simply format the worksheet with consistent decimal places in the SUM() row, the problem disappears.

    1. A post with Homer Simpson in the title has now been labeled a “ridiculous premise.” That’s gold, thanks for taking the time Jon. Did I see you in Columbus at SQL Saturday?

      1. You did, actually, gave me a book and everything. (Great presentation, btw)

        I realized after replying last time that the problem was really that “the fingers you used to dial are too fat”…

  2. To be fair, many civilizations did not opt-in to this base-10 crazayness. The Babylonians, I believe, had a base 60 (can you imagine putting that into Excel?!). I prefer the Piraha’s system, which consists of one, two and many, without any bases at all 🙂

    1. Those crazy Babylonians. I do like the reason why they chose 60 though, which is that it had a very flexible prime factorization of 2x2x3x5 and was thus divisible by more numbers. (Yes I just looked this up, I did not know anything about their base 60 system until just now).

      Still wouldn’t have helped them in a binary world though – that 3 and that 5 wouldn’t have worked out too well 🙂

      And imagine Wall Street in the Piraha system. No need for complex derivatives in order to hide the true value of a transaction, confuse everyone, and ultimately steal everyone else’s stuff. You just had to be better at making the right “many for many” trade. I’ll give you 5 cows for 50 horses and that’s deemed fair. Maybe Wall Street can lobby for legislation that all organizations outside of banking must adopt the Piraha system.

Leave a Comment or Question