I'm Donald Trump and I'm Here to Fire Your Predictive Models

Yeah, It’s Hardly an Original Pun

So…  the United States ran a massive numerical “experiment” last night, and it didn’t turn out the way any of the experts predicted.

What does this tell us “numbers people” about our chosen fields?

Simultaneously, I think the answer is “something important” and “nothing at all.”

But first, an aside.

Aside:  Careful What You Wish For!

For my entire adult life, I’ve longed for a True Outsider candidate for US President.  One who is not beholden to the money of big firms in order to get to office.  One who could truly Shake Things Up.

Well, um, I finally got my wish, but did it have to be This Guy?  He seriously worries me.  I don’t think it’s a risk to PowerPivotPro’s business for me to say that most people reading this recognize that this dude is dangerous – to the USA and the world.

That’s not the same thing as saying he will be dangerous of course – I like to think that right now he’s getting The Talk from some very serious, deliberately-faceless people from the darkest corners of the so-called Deep State (the folks who never leave, regardless of who wins elections).  And they’re explaining to him that his leash is a lot shorter than he expected.  And maybe he’s having some epiphanies.

Anyway, “fingers crossed” is not a strategy, but that accurately describes where I’m at this morning.  Maybe he surprises us in a good way.  Please?

Never Trust Single-Trial Predictions!


Over the past year, I’ve been having a long-running friendly debate with two very dear and very intelligent friends.  In short, the debate has been about how much trust to put into the prediction markets.  They’ve long “liked” the prediction markets, and I’ve long said they’re near-worthless.

In short: Rob 1, Friends 0.  Heh heh.

But in fairness to them, they haven’t really disagreed with me in any entrenched sense.  They’ve been more asking me WHY I haven’t trusted the prediction markets.

My answer has always been that Predictive Analytics fundamentally relies on lots of trials, lots of experiments, lots of training – before it can offer much in the way of accurate advice on the future.  You need to run highly-similar experiments, many times, and feed their results back into the system.

And a single Presidential election, something that happens every four years, is about as far from “lots of similar trials” as you can get.  This election, furthermore, was clearly about as dissimilar to past elections AS YOU COULD POSSIBLY CONCEIVE.

So it’s unfortunate for us Numbers Types that so much of the celebrity attention on Analytics has been focused on this election – one of the places where Analytics is BY FAR at its weakest and least-useful.

I hope it doesn’t undermine us in places where we actually DO make a tremendous difference.

And if you do take some heat, maybe these observations will help deflect it.

“Wait, Rob, Were you Predicting a Trump Win?”

Oh hell no!  Not at all.  I’m almost as surprised as anyone today.  An email I sent to those friends on Saturday sums up where I was at:

[Screenshot of the email – “Prediction Markets and Elections: A Bad Match.” The highlighted part is the important part.]

“Is This the End of Nate Silver?”

Hey, Nate is an amazing guy.  And he has an amazing team.  They are an awesome crew and nothing about this election changes that.  I will continue to read FiveThirtyEight on a regular basis… especially about topics other than politics.

BUT.  In the political sphere, they have an impossible job.  Predicting once-in-a-lifetime events with virtually non-existent prior information is basically impossible.  It’s a wonder, in hindsight, that Nate EVER rose to fame playing the political predictions game.  I feel for him, today, having to defend his entire reputation like this.  On the flip side though, hey, that reputation has been very good to him.

Forget it.  It’s stupid.  It’s more intellectually honest to say “flip a coin” than it is to expect someone to accurately predict these sorts of things, no matter how smart and skilled they happen to be.

(Wouldn’t it be AWESOME for Nate to just come out and say “yeah folks this is impossible, we’re going to cease predicting elections?”  But he can’t do that even if he wants to – too many other people would be hurt, too many corporate masters would be displeased.)

OK, so what ARE Analytics Good At?

MANY THINGS!

First off, we’re amazing at breaking down, tearing apart, and/or digesting cold hard facts.  You know, things that have indisputably actually happened.  Which is where most analytics – and the vast majority of the analytics that add value – are being performed today…  and will continue to be performed well into the future.

Yeah, I’ve heard that dismissive “stop looking in the rearview mirror at the past, look FORWARD to the future” argument made in favor of predictive analytics versus analytics on actual data, and you know what?  Bullshit.  You know, there are two things about the so-called “past” that make it super valuable – 1) There’s no doubt about facts, you can call them 100% accurate predictions if you’d like, and 2) the “past” is also synonymous with where you are right now.  And that is super valuable, to know where you are.  Imagine how useless our smartphones’ mapping technology would be without the GPS component.  Hating on factual analysis is an anti-intellectual attempt by certain vendors to sell their stuff into a market that has yet to even realize 10% of the value of the existing factual data it already has.

As a company, PowerPivotPro alone has PROVABLY created hundreds of millions of dollars of value for our clients via Factual Analysis work.  And that doesn’t include all of the impossible-to-count value created via our books and articles.  Don’t for a moment get fooled that Predictive (or Machine Learning) is in some sense a replacement for Factual work.

And predictive analytics definitely have a valuable role to play!  We’re currently diversifying into them as well, so follow what we DO over the coming years as much as what we SAY.  But they are not going to replace Factual Analysis any time soon.  And we’re only going to deploy predictive analytics responsibly, in cases where there’s a lot of relevant history on which to train the models.

Which brings us to an interesting point:  even future-looking predictive models are 100% reliant on the much-maligned “past.”  Funny, huh?


Rob Collie

One of the founding engineers behind Power Pivot during his 14-year career at Microsoft, and creator of the world’s first cloud Power Pivot service, Rob is one of the foremost authorities on self-service business intelligence and next-generation spreadsheet technology. 

This Post Has 34 Comments

  1. Always a great read. Nailed it once again:
    “Hating on factual analysis is an anti-intellectual attempt by certain vendors to sell their stuff into a market that has yet to even realize 10% of the value of the existing factual data it already has”

  2. I think Nate Silver’s model, and nearly every other political prediction model, had the problem that we’ve all experienced: bad data. Your analysis and predictions are only as good as the data being input. The prediction models were based on third-party polls, very few of which predicted a Trump win (http://www.realclearpolitics.com/epolls/latest_polls/). When most of your data is pointing in one direction, you can say the opposite is going to happen, but your data isn’t going to back you up. Nate Silver hedged by saying there was a lot of uncertainty in the polls, which is why he was actually more bearish on Hillary than other models. I think the most interesting question is, why were the polls/data so bad?

  3. Loved the reference to the shadowy characters and ‘Deep State’ – that’s a pure Deep Throat and X-Files scenario.

    Although for some reason I am thinking of the character Johnny Smith in the Dead Zone (played by Christopher Walken) who stops the political madman getting anywhere near the red button!

  4. What is rarely discussed with analytics, machine learning, and big-data is proper “Experimental Design” and “Survey Sampling” methodology — the “inputs”.

  5. Big Data is a broad church, so it’s hard to talk about particular areas of analysis (financial, social, political, scientific, etc) without being more specific about what aspects of data you’re addressing.

    I don’t claim infallibility on this topic either, but here’s one view. Analytics relies on models, which in turn rely on connections between data. Just because a connection exists doesn’t make it causal (correlation != causation). Some models are more mechanistic and deterministic than others, and we tend to give more credence to the output of those models because (we think) more care and effort went into locking down those inputs and outputs.

    Data can be bad in so many ways – missing, garbled, incomplete, irrelevant, and so on. A model can be built to overcome one or two shortcomings but fails if the data is bad in an unexpected way, or if the connections between the data were built to overcome (say) missing data but not irrelevant data. Bad data is easiest to spot in scientific settings (though Tufte’s classic analysis of the Challenger disaster demonstrates even scientists have blind spots).
    “Acid + base = salt + water,” or Ohm’s Law, are consistent connections of input to output that always hold true. Models that depend on the laws of science can usually be trusted – if they stick to their core competencies.

    Long-Term Capital Management (LTCM) was one of the first financial models to spectacularly succeed and then spectacularly fail, when a Russian bond default caught the market by surprise. Obviously we haven’t yet learned our lessons there. Finance is a special case – not only are there real-world surprises waiting to break your model, but for every bright guy on Wall Street creating one, there are 3 others trying to break it and leave you with the check. High risk, high reward.

    Most of the other models are in-between. Crime statistics and community policing are getting better, but stepping outside the comfort zone to use those models to predict real-estate projects stretches the data connections. Facebook has algorithms to determine sales likelihoods but those are still working out the kinks too.

    Rob says above that “There’s no doubt about facts, you can call them 100% accurate predictions if you’d like.” Ummm, not so much. Do people agree on what the facts are? Just because I have an RFID chip that pings at 2:47pm when it crosses a detector, can I say whether the tagged item was entering or leaving the storeroom? I can build my model to infer from other events its direction of movement, and thus whether it’s likely to reappear at that particular scanner in the future, but those inferences are where my model becomes fuzzy. My inference is no longer a fact; it never was, but users of my model are likely to conclude it was a fact because my logic says so. Is my data collection for the other events as precise and complete as the RFID ping at the storeroom door? And that’s for data elements that my corporation controls – what happens when I’ve got umpteen grad students canvassing neighborhoods for demographic data that goes into my marketing model? Or when I buy supposedly compatible and complementary marketing data from different vendors who maybe didn’t use the same income ranges for their groupings?

    Every time we build a PowerPivot model we’re making those inferences. Some are as close to no-brainers as possible – relating our calendar table to a sales date. Others, though, reflect an inherent bias, however widely held. When I link my item master table to the storeroom RFID data, I’ve made an assumption that the tag reflects the correct item and that the item is properly assigned to the correct category (e.g. a car door part for power windows). The more connections, the more assumptions. (A rough sketch of that assumption at work follows at the end of this comment.)

    I agree with Rob that we’ll get better at this as the tools become simpler to use so that we can properly focus on ensuring the inputs and outputs match the smell test. But some areas are more suited to analytics, sooner, than others, and it can be very hard to determine what those areas are. We need to have more precise terms than “analytics” and “big data” to get there.
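    To make that concrete, here’s a rough sketch of how a single join quietly bakes in that assumption – Python/pandas standing in for a Power Pivot relationship, and every table, column, and tag ID below is invented for illustration:

    ```python
    import pandas as pd

    # RFID pings from the storeroom scanner: just a tag and a timestamp.
    pings = pd.DataFrame({
        "tag_id": ["T001", "T002", "T003"],
        "ping_time": pd.to_datetime(["2016-11-09 14:47", "2016-11-09 15:02", "2016-11-09 15:15"]),
    })

    # Item master: the "lookup" side of the relationship.  Suppose T003 is really
    # a power-window part but was keyed in under the wrong category.
    item_master = pd.DataFrame({
        "tag_id": ["T001", "T002", "T003"],
        "category": ["door handle", "power window", "door handle"],  # T003 miscategorized
    })

    # The join never complains - the bad category flows straight into every
    # downstream pivot, and the resulting report reads as "fact."
    enriched = pings.merge(item_master, on="tag_id", how="left")
    print(enriched.groupby("category").size())
    ```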

  6. P.S.

    It’s also my belief that all models – except the pure scientific ones – eventually fail. We tweak and increase the density of our models to account for new information and keep them relevant, but eventually the density of connections becomes too heavy for anything more than incremental changes. It makes more sense to abandon a model once a set of data or correlation appears – like a Russian bond default – that renders the existing structure incorrect or irrelevant.

    Power Pivot is nimble, and in fact starts to penalize you if your model starts to get too dense. Does this force us into “wide” rather than “deep” analysis? Do we need a different tool for the other? Or would you disagree with the premise?

  7. Rob, you are young and a little full of yourself. But I get some good stuff from you that helps me to learn about things I need to know in my business. So I will just continue to ignore your political stuff. May I suggest you save this and open it about 20 years from now and see what you think about your comments then?

    1. Oh good gravy Ron. Always, always start with an attack, cuz it’s such a combative environment around here – kidding, we keep it polite. If you’re looking for ad hominem, there are many other places for it yes?

      BTW, in 20 years I’m pretty sure we will all still think 2016-era single-trial predictions were a sham. But I suspect that’s not what got your hackles up.

      Anyway, come back with politeness if you come back ok?

    1. Yeah that’s a good one, heard of it before but hadn’t read it until now.

      That seems believably better than the 538 methodology since it leverages historical “trials” more fully and doesn’t get overly complex.

      For instance, in the book “Everything is Obvious, Once You Know the Answer,” the author shows that if you just pick the home team to win in the NFL, you get within 3% of the performance of the prediction markets. If you then factor in simple recent W/L records for each team, you basically get to “par” accuracy with the prediction markets. Informed by many thousands of trials, stays simple, does the job as well as anything we’ve got. (A rough sketch of that kind of baseline follows below.)

      Link to book, if you’re interested: http://amzn.to/2fUYRer
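      For illustration, here’s that rough sketch of a dumb-but-informed baseline – pick the team with the better recent record, falling back to the home team on ties. The games, records, and results below are invented; the point is only how simple the logic stays:

      ```python
      def pick_winner(game, recent_wins):
          """Pick the team with the better recent record; the home team wins ties."""
          home, away = game["home"], game["away"]
          if recent_wins.get(home, 0) >= recent_wins.get(away, 0):
              return home
          return away

      # Made-up games and records, purely to show the mechanics.
      games = [
          {"home": "NE",  "away": "NYJ", "winner": "NE"},
          {"home": "DAL", "away": "PHI", "winner": "DAL"},
          {"home": "CLE", "away": "PIT", "winner": "PIT"},
      ]
      recent_wins = {"NE": 7, "NYJ": 3, "DAL": 6, "PHI": 4, "CLE": 0, "PIT": 5}

      correct = sum(pick_winner(g, recent_wins) == g["winner"] for g in games)
      print(f"baseline got {correct} of {len(games)} games right")
      ```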

  8. I enjoyed reading your piece, but I’m afraid that you’re not doing justice to Nate’s work.

    538 did not predict a win for Hillary, but gave her a 71% chance of winning, hedged with the statement that “it shouldn’t be hard to see how Clinton could lose. She’s up by about 3 percentage points nationally, and 3-point polling errors happen fairly often, including in the last two federal elections.”

    He’s using the historical data, with thousands of simulations, and reaching an uncertain/probabilistic conclusion (a toy sketch of that kind of simulation follows at the end of this comment). I’d totally go with Nate over a coin toss, given appropriate odds based on his probability estimates. Further, 538’s writing has been well worth reading all along – no hubris, lots of humility, and lots of transparency about their process.

    I’ll keep reading Nate’s work on politics since it does a good job leveraging the factual data towards probabilistic predictions that have a long track record of being better than random. Note how much better 538 came out vs. Princeton or Huffpo.
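    For what it’s worth, here’s a toy sketch of that general idea (emphatically NOT 538’s actual model): draw one shared national polling error per trial, shift every state’s margin by it, add some state-level noise, and count how often each side clears 270. Every state, margin, and error size below is invented for illustration:

    ```python
    import random

    # state: (electoral votes, polled Clinton-minus-Trump margin in points) - all invented
    states = {
        "FL": (29, 1.0), "PA": (20, 2.0), "MI": (16, 3.5),
        "WI": (10, 5.0), "NC": (15, -1.0), "OH": (18, -2.5),
    }
    safe_clinton_ev, safe_trump_ev = 200, 230   # pretend the rest of the map is locked in

    def simulate(trials=10_000):
        clinton_wins = 0
        for _ in range(trials):
            national_error = random.gauss(0, 3)      # systematic error shared by every state
            ev = safe_clinton_ev
            for votes, margin in states.values():
                state_error = random.gauss(0, 2)     # state-specific noise
                if margin + national_error + state_error > 0:
                    ev += votes
            clinton_wins += ev >= 270
        return clinton_wins / trials

    print(f"simulated Clinton win probability: {simulate():.0%}")
    ```

    Because the national error is shared across all the states, the misses are correlated – which is exactly how a garden-variety 3-point polling error can flip several states at once.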

    1. I still think Nate is amazing. I just think he’s playing an impossible game (but profiting from the illusion that it IS a tractable problem).

      Hedging ahead of time like that – 71% chance of X, but it’s really easy to see Y happening – doesn’t undermine my point at all, but rather supports what I’m saying, which is… might as well flip a coin. (A prediction which requires such hedging is not much of a prediction at all, in other words.)

      1. Thanks Rob. Nate’s 2008 to present track record is way better than 50/50, but no doubt that he has stumbled in this cycle. I think we are better off building election probabilities based on the best data and conservative probability estimates, and that there’s clear evidence that it’s better than coin flipping (49/50 in 2008 and 50/50 in 2012).

        Also, I believe that the type of hedging he did would have qualified as extremely useful advice in a business situation: Informed risk assessment.

        1. Sure, but when an unsophisticated chap like me is saying “I don’t trust any of this, I think it’s 50/50 Trump wins,” and that coarse prediction ends up more accurate than the experts, it DOES say something about the tractability of that field yes? Plus the American University professor referenced elsewhere in these comments has an even better track record than 538. Why isn’t he more famous than 538?

          I love and respect folks like Nate. At the same time, I try very hard to hold nothing sacred, and I think we’ve reached the “emperor has no clothes” moment for the political prediction game as played by 538. Those same rigorous approaches when applied to other domains however? Much more valuable.

          1. I think you’d agree that we should look at all of the relevant data, and judge based on that. We will disagree, of course, but I don’t think we are anywhere near “emperor has no clothes” for Nate. I totally agree that nothing is sacred, and Nate himself agrees with that given his deep post mortem. The dude iterates and tries to improve based on past misses.

            It’s about probabilities, not being right 100% of the time. Nate admits that. Certainly there are other domains where predictive analytics do a better job. As you explain, this is a tough domain. My point is that Nate adds much value in this arena by claiming that “I am pretty sure that the emperor is naked, but I could be wrong.” With multiple predictions over time, we can judge his track record, and I judge it to be pretty good.

            The American U professor (Lichtman) is less famous because he has added much less value to the field of analytics, and his “predictions” are much squishier (he called his prediction of Al Gore winning the popular vote “accurate,” while at the same time saying that calling Trump’s win was accurate). Regardless, kudos to the man for his work. Nate wrote “The Signal and the Noise,” plus developed a highly successful baseball predictive system, plus great writer, plus great blog, etc, etc.

            If the bottom line for you is that “Nate’s predictions are no better than coin tosses,” I’m cool with that. I come to a different conclusion based on Nate’s track record (plus I greatly appreciate his “I got it wrong” fessing up within our “never claim failure” information environment).

            Thanks again for the fruitful and thought provoking discussion.

  9. For better or worse, I predicted this outcome 18 months ago with the analytics machine I like to call my mindgrapes. What I find particularly interesting is that when he was up, I was told the polls were wrong. When he was down and I would say, “well, I think that doesn’t take into account X,” they would call the polls gospel. Good stuff Robstradamus

  10. Analytics aside, I am remembering the election that occurred in Africa or Asia some years back, where the dictator allowed the election to go forward after all the polling made it very clear that he would definitely win. There was quite an upset: when the votes were counted, he lost by a wide margin. That was because when polled, everyone said that “Of *course* they would vote for their glorious leader!” Polling in the US has not only self-selection but also deliberately chosen inaccurate answers to contend with, and the models didn’t particularly weigh that in as far as I could see.

    I regularly lie when filling out intrusive ‘none of their beeswax’ forms online. My zip code is 99999 and my birthdate is 1/1/1929 and so on. I am fairly certain that some percentage of hubbies in front of wives, or other people who might react poorly, “fudged” a few answers here and there, carefully not expressing support for Trump out loud, regardless of their intended vote.

  11. A little bit harsh IMHO. Ballsy, straight speaking, undoubtedly. Like in politics, folk are yearning for a different viewpoint because the usual modus operandi just hasn’t delivered – rather, it has just protected the vested interests involved.

    What is meant by the “political stuff”? I certainly cannot see any hidden agenda here. A bit different to someone who is selling Tableau ad nauseam and calling Excel shit at every opportunity. Or someone selling a technology completely on the basis of buzz words like “big data” and “machine learning” without any consideration for what is necessary underneath. “Fur coat and no knickers,” as we say in the UK.

    I’m certainly not smoke-blowing here, but I’d posit that the likes of Rob, Avi & Matt are ‘Modern Day Pioneers’. You might not like their opinion but you certainly don’t have to read under sufferance.

    I wouldn’t want to be the person who said to themselves in 20 years, “you know those guys who challenged the status quo and entrenched vested interests were absolutely bang on the money weren’t they?”

  12. Random events are the elements shaping the future, and very frankly, it would seem that this guy Silver and all the others who predicted a Democratic win (even 99% odds!!) are outright amateurs…. I think I know a thing or two about risk analysis and must say that at no time did I read anything about these people factoring random events into their models…. Hope they learn something… or change their trade.

  13. I’ve got to say this article is a little mind blowing. I’ve learned a lot from PowerPivotPro, and sincerely appreciate all your work and knowledge you’ve contributed to the community. However, this is a really disappointing article for several reasons. In addition, the comments you wrote also surprised me.
    The danger in what you’re preaching here is a misrepresentation of what a forecast is; forecasts are probabilistic, representing a range of outcomes rather than just one number. This is an important concept for so many reasons, and how confidently you wrote about (and disregarded) it is worrisome. The distribution of outcomes represents the most honest expression of the uncertainty in the real world. Sure, you finished by saying predictive analytics can be useful in some cases, but you misrepresented the entire concept throughout the rest of the article. The presidential election isn’t a once-in-a-lifetime event, and even if it happens once every 4 years that doesn’t preclude it from being modeled. There are other elections, polls, and historical data for turnout at local levels and based on demographics.

    “1) There’s no doubt about facts, you can call them 100% accurate predictions if you’d like, and 2) the “past” is also synonymous with where you are right now. And that is super valuable, to know where you are.” Absolutely, and I haven’t heard anyone who favors predictive analytics argue against “past facts”. But extending your GPS analogy, another important perspective is where you’re going. Depending on the day, time of day, and destination, the quickest path can vary greatly. If you use Google Maps, you may get a different route for the same starting point and destination. You can see this for yourself by going to Google Maps and playing with the departure time.
    From your comments: “A prediction which requires such hedging is not much of a prediction at all, in other words.” Nate Silver’s forecasting model, which reflects the probability of an outcome, changed many times throughout the election cycle. Making the most of the limited information available requires a willingness to update one’s forecast as newer and better information becomes available. Predicting without recognizing the probability of an alternative outcome is foolish, especially in the face of new information.

    “First off, we’re amazing at breaking down, tearing apart, and/or digesting cold hard facts. You know, things that have indisputably actually happened. “
    You know who else thought that? The ratings agencies for Triple-A mortgage securities. Moody’s used data on the housing market from 1980 to the 2000s to estimate the correlation between mortgage defaults. But the housing collapse was an out-of-sample event, and their models didn’t capture those conditions. If they’d expanded the model to look further back, or at other countries’ housing crises, the models would understandably have changed. It was a quantitative and data-driven model, but the model was dangerously wrong because it assumed that the default risks for different mortgages were uncorrelated, which doesn’t make sense in a housing and credit bubble. How do we reconcile the need to use the past as a guide with our recognition that the future may be different?

    Given that it rained today, what is the probability that it will rain tomorrow? In Power Pivot, I could examine the rain patterns for Seattle in November and conclude that the long-term average suggests that 50 percent of the days will include precipitation. But this isn’t that useful – it doesn’t tell me whether I should leave my house in Seattle with an umbrella or not on any given day in November. The point isn’t that these “past facts” are useless, just that they take on more of a purpose when we apply them to the future and use that to help inform current decision making.
    Your email says that it’s hubris to provide numbers like a 17% chance – as if that’s more accurate than 20% or 50/50. If you are a poker player, you’ve probably come across the concept of expected value. If I’m betting on the election, or betting on investing in a new branch or plant, I want to know my expected value from said investment (or bet). I also want to know the tail risks of that bet. Nate Silver saying there is a 70% chance of Clinton winning is still saying there is a 30% chance of her losing. The 30% isn’t because Nate Silver wants an excuse if he’s wrong; it’s because with a good forecast the probability turns out to be about right over the long run. Research shows that we have trouble distinguishing the difference between a 90% chance of something and a 99.99% chance. If the difference is in a plane landing safely, this has a huge implication on whether we should buy our plane ticket. Likewise, the difference between a 17% chance and a 50% chance also has implications. (A small expected-value sketch follows at the end of this comment.)

    Polls do become more accurate the closer you get to Election Day. A Senate candidate with a five-point lead on the day before the election, for instance, has historically won his race about 95 percent of the time. By contrast, a five-point lead a year before the election translates to just a 59 percent chance of winning—barely better than a coin flip. His model, which is constantly fed new information, helps capture the changing probability of each possible outcome.
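    To put rough numbers on the expected-value point above, here’s a small sketch. The payoffs and probabilities are invented; it’s only meant to show why a 70/30 forecast prices a decision very differently than a coin flip, and why 90% vs. 99.99% can matter enormously:

    ```python
    def expected_value(p_win, payoff_win, payoff_lose):
        """Probability-weighted average outcome of a simple two-outcome bet."""
        return p_win * payoff_win + (1 - p_win) * payoff_lose

    # Betting $100 at even odds on a 70% favorite vs. on a coin flip.
    print(expected_value(0.70, +100, -100))   # +40.0 per bet: worth taking
    print(expected_value(0.50, +100, -100))   #   0.0 per bet: indifferent

    # Why 90% vs. 99.99% matters: a small upside with a catastrophic downside.
    print(expected_value(0.90,   +1, -10_000))    # about -999: never take that flight
    print(expected_value(0.9999, +1, -10_000))    # about 0: routine, acceptable risk
    ```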

  14. I agree with Adam that 538’s predictions weren’t that bad. A 71% chance of a Clinton win means it’s clearly within the realm of possibility that she doesn’t win. In a small sample size (of 1), 50% and 71% can look the same (at very different costs), but that isn’t evidence that a coin flip is just as good as a complex analysis. 538’s methods reflected that Trump had fewer paths to 270 electoral votes than Clinton did. A coin flip ignores that.

    The question that underlies your post is why predict in the first place (i.e. why not just wait and see). I assume there must be value to someone, somewhere. (Perhaps for the news media, who can run headlines and earn clicks – I’m not sure.) Still, where there’s money to be made, the race to build better & better models will continue. The weather is as unpredictable as it gets, but no-one gives up and flips a coin to decide if it’s going to rain or not (or if the hurricane is going to hit Miami or not).

    I will agree that predicting the future is fraught with uncertainty that increases the likelihood of a Black Swan event (which is just lingo for “a surprise”). I won’t even put in a caveat about more history equalling better predictions: more history often equals more misplaced certainty that the future will mirror the past.

    The value of predictive analytics (in politics, weather, or anything) is planning rather than certainty. If you can position yourself for a huge reward at relatively little cost in the event of a surprise outcome, go for it. Don’t invest millions in a giant ark that will sink your business if the flood never comes. But if you can “scale up” your boat building business in the event of a giant flood, you can take advantage of misplaced certainty on the part of others.

    On a side note, your opinion posts, political or otherwise, continue to bring me to this site. I always enjoy what you have to say and the way you say it.

  15. Thanks Leonard.

    For you and Adam actually, let’s stop talking about whether Nate is good or bad, because we all agree he is good. I never once intended to say anything of the sort, and still believe that I never have.

    Stating my point a bit differently: I think it’s a shame that political analytics are currently the “most famous” kind (or at least tied with sports analytics for most famous), because they are inherently one of the least reliable places in which to deploy our toolsets.

    I agree with Leonard that “more history” would not be sufficient, because predictive analytics rely on large volumes of RELEVANT history. Political elections provide neither the volume nor the relevance of history required.

    And when we say predictive is valuable because it positions us to profit, well, we would have lost money betting on any of the predicted outcomes yes?

    Also, when we say a Trump win isn’t evidence that the models are flawed/weak, I disagree – it absolutely IS evidence. I think what you mean to say is that it is not PROOF. And I would agree with that statement. But failed trials absolutely count, we don’t get to say “well it was within acceptable limits” and pretend it is not counter-evidence. A Trump win is actually the most damaging counter-evidence that the real world could possibly have provided us right? When all you are doing is picking between two outcomes, if you end up wrong, or with overly-wide “error bars” forcing you to hedge ahead of time, you are not providing much value. Provide near-certainty, and be correct, or be replaced by a much cheaper coin flip 🙂

    If that still strikes you as “Rob not getting it,” maybe re-read it (this comment) again with the following thought in mind: I’ve been precisely in your shoes before. A few years back I would have said the same things as you guys. And I’m not offering something that I believe to be an anti-intellectual point of view, but rather a meta-intellectual point of view (if such a thing exists).

    Thanks for the discussion and kind words,
    -rob

  16. “Danger,” “disappointment,” “worrisome.” Honestly Ryan I find it dangerous/disappointing/worrisome that those words are thrown around so easily these days :p

    Especially true in intellectual discussions. Let’s welcome the ideas that challenge us, even if in the end we decide to throw them out, and especially so if they come from formerly-trusted and formerly-valuable places. I can be wrong of course, but the tone here sounds like some of you think you are talking to a climate-change-denier 🙂 I like to think of myself as uncommonly thoughtful and careful, not the sort to run my mouth lightly, and I’ve been noodling on this for a long time.

    I sincerely believe that I understand and agree with everything you say about probabilities, range of outcomes, expected value, etc., and yet I *still* believe I am making an intellectually-valid point that you may not be hearing clearly. This couldn’t be more unlike poker. I merely think that national presidential elections are about the single worst place to apply predictive models. What’s controversial about that, really? As a serious professional, would you advise a paying client to put trust in a prediction with so little “training” and only one trial to be right or wrong about? (Unlike in poker where your statistical edge wins out over time).

    I do then take it one step further and say that it’s SUCH a bad place to apply them… that it qualifies as entertainment rather than truly valuable.

    And even if THAT is worthy of argument, sheesh, I think we are still well short of the word “dangerous.”

  17. Not related to that topic… But do you plan any post about alternatives to Azure DataMarket? Or does it already exist? I am in the middle of reading your book, started to learn PowerPivot, started to use Azure DataMarket (so far for a Calendar table)… And today I received an e-mail from the Azure Team: “Thank you for using Azure DataMarket. On March 31, 2017, we’ll retire DataMarket and all its services.”

  18. Rob,

    Do you really consider a 700-person poll as “analytics”? There is no way sample sizes that small could ever be even remotely accurate, as there are so many other variables. Most Bernie supporters felt Hillary stole the election from him and most of everyone else just didn’t like her—even Kanye said he voted Trump.
    A lot of people predicted a Trump win using big data. I didn’t expect him to win Pennsylvania, Michigan, and Wisconsin though.

    At least the Dow Jones is hitting record highs!

    Maybe you can help the pollsters out (other than LA times and IBD/TIPP Tracking who got it right) going forward! They need some POWERful help!!

  19. I see your point. If your goal is to know who’s going to win, what’s the point of predictive analytics that predicts wrong?

    I see value because I look beyond that binary prediction. I look for the fine print…the 29% chance of a surprise that equals risk/opportunity. Knowing that one result would be more of a surprise than the other is interesting, and wouldn’t be reflected by a coin flip. (And isn’t invalidated by the headline prediction being wrong.)

    My thought, beyond politics & sports, is that predictive analytics is always a risk. Netflix will predict a 3 star rating for a movie I’d rate 5 stars and they have relevant history up the whazoo. Predictive analytics will always be a tool in a toolbox, never a certainty.

  20. I’m a little confused by this comment: *Also, when we say a Trump win isn’t evidence that the models are flawed/weak, I disagree – it absolutely IS evidence.*

    I would have thought that you could only call it evidence if the predictions predicted Clinton to win with 100% certainty. Any prediction made with *less* than 100% certainty leaves wriggle room. He left quite a bit…not too much and not too little to leave his reputation in tatters.

    I think what we’re arguing about here is whether his estimate of the wriggle room was a fair one, given what he had to work with. When it comes to estimating wriggle room, I don’t really see predictive analytics for politics as that much different from predictive analytics for sport. Try saying “Never trust single-trial predictions” to a sports betting agency… they’ll laugh and then drive off in a cloud of Ferrari. Ask them how successful they are at picking *winners*, and they’ll probably say 51% of the time. That’s a risky – but profitable – business model. What Silver and others did wasn’t any different, was it? Silver was effectively just stating his odds should you want to place a hypothetical bet with him, no? Gambling odds. “If I was a betting man…”

  21. I find it quite puzzling that the discussion is centered on analytics, with many complaining about the limited data, bad data, etc. A predictive model based on quantitative risk analysis could have been built using whatever data was available PLUS factoring random events into the model. For example, Republicans were talking about a great number of voters who did not disclose their preference for Trump. This is one of a number of random events which should have been used, setting probability-of-occurrence and impact values, using subjective probability distributions and running, say, 30,000 trials. You then need to stress-test the results. It would seem NO ONE did that, and the astonishing, off-the-chart wrong results seem to indicate it. (A minimal sketch of that kind of simulation follows this comment.)
    80% to 99% odds for the Democratic nominee to win? Give me a break!!! I have the impression that these “experts” were consulting and seeking guidance from astrology…
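    As a rough illustration of the approach described above (not anyone’s actual model), here’s a minimal Monte Carlo sketch: start from a polled margin, layer on discrete random events that each carry a subjective probability of occurring and a subjective impact distribution, run tens of thousands of trials, and then stress-test by inflating the event probabilities. Every number below is invented:

    ```python
    import random

    BASE_MARGIN = 3.0    # polled Clinton-minus-Trump margin, in points (invented)
    POLL_NOISE  = 2.5    # std dev of ordinary polling error, in points (invented)

    RISK_EVENTS = [
        # (probability of occurring, (mean impact, std dev of impact) in points)
        (0.30, (-4.0, 1.5)),   # e.g. undisclosed Trump support ("shy voter" effect)
        (0.15, (-2.0, 1.0)),   # e.g. late-breaking news that depresses turnout
    ]

    def simulate(trials=30_000, stress=1.0):
        wins = 0
        for _ in range(trials):
            margin = random.gauss(BASE_MARGIN, POLL_NOISE)
            for prob, (mean, sd) in RISK_EVENTS:
                if random.random() < prob * stress:    # did this random event occur?
                    margin += random.gauss(mean, sd)   # apply its impact
            wins += margin > 0
        return wins / trials

    print(f"base case Clinton win probability:   {simulate():.0%}")
    print(f"stress test (events 1.5x as likely): {simulate(stress=1.5):.0%}")
    ```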

  22. I never thought about the election as a predictive model gone wrong. As an analyst, I was trying to decipher the data and understand how so many polls could have been wrong. But we also need to realize that while we can predict the future and outcomes, there is one thing no one can predict: the emotions of the people who were voting. The majority of rural folks voted for Trump because they have not seen their standard of living improve in the last eight years.
