In Short

Beware the Big Data Gospel

Earlier this month, I published an article on CNN.com that examined the limits of big data as an instrument of progress. I won’t rehash the arguments of that article here (I do hope you’ll read it), but I want to respond to two critiques of the piece from Marco Lübbecke, a professor of operations research, and Thomas Davenport, author of Big Data at Work, on a .

Lübbecke’s counter-claim that “big data saves lives” is emotionally manipulative and unsupported by any evidence he puts forth. The antithesis Davenport sets up between “data and analytics” on the one hand and “unaided human intuition” on the other is a dangerously misleading simplification that wrongly conflates intuitive knowledge with arbitrary subjectivity. Davenport also fails to account for the fact that data compilation and analysis are performed by human beings and are therefore neither automatic nor objective. As philosopher Michael Polanyi wrote of tacit knowledge, “we can know more than we can tell.” Polanyi’s point remains true no matter how many and one has at one’s disposal.

Quantitative analysis of data has been central to what we’ve come to call science since well before the word “science” existed. Babylonian astronomers, and used that data to predict future eclipses. As science and technology have evolved over the millennia, so too have tools for gathering data and for analyzing it. Many of those tools are invaluable to the scientific endeavor. Adherents to the church of big data, like Lübbecke and Davenport, fail to see what : “Big Data is the ultimate expression of a mode of rationality that equates information with truth and more information with more truth, and that denies the possibility that information processing designed simply to identify ‘patterns’ might be systematically infused with a particular ideology.” My aim is to interrogate that ideology, not to deny that the creation and analysis of quantitative data is a necessary part of science.

A close reading of the examples Lübbecke puts forth to illustrate the life-saving potential of Big Data reveals the hollowness of Davenport’s claim that “at the core of analytical decision-making is not soft fad, but hard science.” Lübbecke cites polio vaccination campaigns as an “outstanding example of the way that Big Data saves lives.” His evidence is a paper at the Centers for Disease Control (CDC), but his reading conflates the very real ability of vaccines to save lives with the life-saving ability of convoluted analytical techniques about how effective vaccines are. The relevant question is the virtue of analytical techniques about how vaccines ought to be optimally applied, not the virtue of the vaccines themselves.

The CDC paper concludes that “sustained intense immunization efforts” are better than “wavering commitments” to immunization. I don’t doubt this is true enough. Common sense would dictate that sustained efforts are better than wavering commitments. But what value does data-driven analysis add to this proposition? Analytic techniques, Lübbecke points out, yield the claim that the Global Polio Eradication Initiative (GPEI) has saved and will save between $40 and $50 billion from 1988 to 2035, and that Vitamin A delivered along with polio vaccines accounts for a further savings of between $17 billion and $90 billion.

Do the numbers $17-90 billion tell us anything that the words “lots of money” do not? The CDC journal article goes on to quote the director of Rotary International’s anti-polio campaign: “We regularly use the $40–50 billion estimate of net benefits of the GPEI as we raise funds to finish polio eradication.” This goes to the point I was trying to make in the original piece. It’s not that anyone should really have confidence that the polio eradication campaign saved $45 billion +/- $5 billion. It’s that saying so is an effective fundraising technique. Pretending that a range of $17-90 billion conveys more information than “a lot of money” is where an uncritical acceptance of the virtue of data goes off the rails.

Lübbecke is correct in his generic call for careful analysis, but he doesn’t follow through on his own prescription.

It’s simply a category mistake to attempt to come up with a specific number for the economic impact of polio eradication. It is not as if there is some accurate figure, say $47,253,238,334, which more sophisticated methodology will allow us to pin down. No such number exists, and all the economists in all the business schools can’t reliably find it. A world in which fewer people die of polio is a different world and, I would argue, a better one. The true case for vaccination is a moral one that rests on lives saved and people spared the ravages of polio, not on a dollar figure of benefit to the economy.

However, the polio example isn’t as laughable as Lübbecke’s other example purporting to demonstrate the life-saving benefits of big data: a that discusses a business-school study of the number of “counterterror agents” the US needs. Lübbecke endorses the model that Kaplan uses, in which the “number of counterterror agents drives the rate with which [terrorist] plots are detected.” But Kaplan’s model is ludicrously oversimplified. He doesn’t clearly define who “counterterror agents” are. Do police officers, DEA agents, and bureaucrats with the Department of Homeland Security count? Do customs agents? Do US Marshals?

Is the probability that a terrorist plot is uncovered really simply a function of the number of agents, as in Kaplan’s model, and not of factors like the agents’ intelligence, legal constraints and technological tools? Kaplan and Lübbecke highlight that though the “model suggests an optimal staffing level of only 2,080 agents,” in 2004 the FBI had 2,398 agents “dedicated to counterterrorism.” (Lübbecke incorrectly states that 2004 is the most recent year for which FBI staffing figures are publicly available, though a quick search finds this , which gives a number of 3,445 FBI agents “addressing counterterrorism matters” in 2009.) In any case, the juxtaposition of the number Kaplan’s model spits out with the number of FBI counterterrorism agents in 2004 is hardly, as Kaplan characterizes it, “interesting,” let alone of tangible life-saving benefit, as Lübbecke claims.

The paradox at the heart of the argument Lübbecke and other cheerleaders for big data make is that they claim to place great value on evidence as opposed to intuition. But rather than present analytical evidence for the value of “evidence,” they merely assert that it is tremendously useful and expect us to believe them.

Lübbecke’s examples point to the silliness of big data’s claim to epistemic superiority, but they don’t adequately illustrate the damage that can be done by big data evangelists like Davenport. To understand that damage, one must parse the political economy of data creation and analysis, something I began to do . In short, using data along the lines Davenport advocates imposes costs on society unequally. As , “There’s a real threat that the negative effects of algorithmic decision-making will disproportionately burden the poorest and most marginalized among us.”

These are not new fights. Steven Shapin, an historian of science, was writing about the 17th century when he remarked, “it is just when the authority of long-established institutions erodes that the solutions to such questions about knowledge come to have special point and urgency…Method, broadly construed, is the preferred remedy for problems of intellectual disorder.” Blind faith in the superiority of “big data” or “well-designed analytics” does not resolve underlying intellectual discord about how society ought to guard itself against terrorism or structure its economy.

Lübbecke and Davenport seek objective certainty where it is not attainable. They do not seriously wrestle with the limitations of data-driven analysis but merely make a fetish of it.

More About the Authors

Konstantin Kakaes

Future Tense Fellow; International Security Program Fellow; National Fellow, 2013
