Monthly Archives: November 2012

Confessions of a Big Data Blasphemer: What if Big Data Doesn’t Work?

I’m Off to Hell Now

Let me say this in advance – I know that I am going to go to hell for writing this, in the same way that my Catholic friends see little hope for my immortal soul because I’m Jewish, and my born-again friends here in the South just shake their heads and hope that I’ll come to my senses before my time is up. But here goes:

What if Big Data doesn’t work?

The mythos of Big Data, the thing that has the industry very excited, completely freaked out, or bored to tears, depending on where you sit, is the belief that we finally have enough information about a person to predict their behavior with some level of statistical rigor. We’ve moved away from talking about Big Data as an analogue of Asimov’s psychohistory: he posited a science of mass behavior, whereas the priests of Big Data espouse modeling at the individual level. But before Lenny sends me to Limbo, or at least Purgatory, I ask again:

What if Big Data doesn’t work?

Big Data’s promise relies on a set of assumptions, any of which may be invalid (or, in fairness, may be invalid today but valid in the future).

  • Big Data assumes a deterministic view of the world: that a person’s behavior is sufficiently consistent, that the determinants of that behavior operate homogeneously across individuals in a given cohort, and that you can actually build a model of that behavior and use it for prediction. We have no reason to believe this is true except blind faith. For the most part, we can’t predict much in advance with a model, whether for human behavior in general or for the narrower domain of consumer behavior. This is why 80% of new product introductions fail. This is why we do experiments, as Howard Moskowitz has [rightfully and righteously] proclaimed on more than one occasion.
  • Big Data assumes that we capture enough of the causal factors involved in a decision, and this may not be true. A simple opt-out will keep marketers from knowing what television shows I watch, because Comcast is prohibited from sharing my viewing data (if it even keeps that data) with anyone else. Nor do they know what radio ads I’ve been exposed to, and because I’m a Luddite at heart and still have a bit of late-1960s paranoia, I don’t have a smartphone, so I’m not getting mobile ads. In short, Big Data only works if it has all the relevant information, and it may never have that if consumer activists and privacy opt-in initiatives prevail.
  • Big Data assumes we know how to ask the proper questions, and this may not be true. Big Data is only as smart as the researcher who is querying the database or creating the model. Contrary to some popular conceptions, it does not recognize patterns on its own, nor does it create statistical models on the fly. While some of this can be automated, the automation itself needs to be programmed: the type of model (linear, Bayesian, structural equation, etc.) must be specified, checks on the soundness of the results (collinearity, Heywood cases, etc.) must be built in, and so forth. If you read the Retailwire daily blogs, you know that retailers have not come close to figuring out this part. IBM’s Watson may be capable of making remarkable connections remarkably fast, but the day when it is cost-effective to ask it how to sell more Charmin to Cottonelle users is not in our near future.

Will Big Data have some big wins? You betcha, if only because every supplier with access will be desperately seeking a highly publicizable example. I just finished listening to Retailwire’s webinar on retailer usage of Big Data, and the short answer is that retailers are not using it as much as you might think, nor in a very sophisticated way. But for every big hit you hear about, you will also hear about the big miscues (see Target and Pregnant Teenager – oops). And you can bet you won’t hear about all the little miscues – the ones where the model says “do this” and “doing this” doesn’t help the business. Because it’s just a model. Because it’s just a model based on imperfect or incomplete data. Because it’s just a human being asking the questions.

I’m off to Purgatory now. Mea culpa, mea culpa, mea maxima culpa.

Originally posted on Greenbookblog.org. To learn more, call Steve Needel at +1-404-944-0248, write us at drsteveasl@gmail.com, or visit us at www.advancedsimulations.com.