What is Big Data?
Lots of people are asking and answering this question (this blog included). Scholars in various fields, including the history of science and technology, have begun to tackle it as well.
For good reason. Gobs of private and public funding, influence over policymakers, civil liberties, the future of the planet and the people on it, and, oh yes, the practice of science are (reputedly) at stake.
The most familiar answers to "What is Big Data?" fit into three categories:
1. It's going to change the world.
2. It's going to ruin the world.
3. It's actually nothing new. (Does not exclude #2.)
These are very different answers, but they share a tendency to praise or damn Big Data without giving a satisfactory sense of what exactly Big Data is. Today and tomorrow, I'm going to contrast two approaches to making sense of this category. One tackles Big Data directly; the other goes looking for Big Data in particular sites and actors. As the late Mike Mahoney wrote of the history of computing, the most effective way to answer "What is Big Data?" is to look at particular people, disciplines, institutions, and practices.
|"What is Big Data?" answered in word cloud form, based on the responses of forty "thought leaders" tapped by Berkeley's program in data science. Their full answers: http://datascience.berkeley.edu/what-is-big-data/|
Big Data is a "cultural ideology." That's the answer that sociologist Nathan Jurgenson offers in an essay published a few weeks ago in The New Inquiry. In this essay, Jurgenson neatly weaves together a few threads of the emerging humanistic critique of data utopianism.
First, the promises of Big Data aren't anything new. Rather, they're the latest version of the "rationalist fantasy that enough data can be collected with the “right” methodology to provide an objective and disinterested picture of reality": the old positivist dream of a universal social physics based on patterns in quantitative measurements. The big-data experts may aim to describe society through a bunch of distinct regularities rather than a few master equations, but the totalizing ambition remains the same.
Second, Big Data's bigness generates a dangerous combination of interpretive flexibility and rhetorical certainty. Skillful (or unwary) data scientists can cherry-pick conclusions while dazzling the masses and policymakers with claims of statistical certainty. Writes Jurgenson, "Big Data can be used to give any chosen hypothesis a veneer of science and the unearned authority of numbers."
Third, it's all just a means of legitimizing and extending existing power relations. Claims of "the end of theory" always constitute efforts (knowing or not) to universalize a particular theory. Views from nowhere "must be unmasked as a view from a very specific and familiar somewhere": in this case, from the eyes of a 23-year-old programmer with libertarian leanings and a dearth of life experience living in the Bay Area, or a CEO in his late 30s seeking to make the business of collecting and analyzing data look like a public service and a lucrative business at the same time. In the process, purveyors of Big Data reinforce the social categories that structure the data that they collect, and the relationships implicit in these categories.
|Collecting data can reinforce social categories.|
These are all very reasonable critiques, but I wonder whether Jurgenson is aiming at the correct target. For instance, Jurgenson singles out Dataclysm, the popular recent book by OkCupid President Christian Rudder, as the exemplar of the "cultural ideology" that he critiques. In passing, Jurgenson mentions "similarly inferential sciences like evolutionary psychology and pop-neuroscience" and their popularity, along with Big Data, in mass-market social science.
I suspect that the cultural ideology that Jurgenson wants to critique is more a matter of genre than method. Big Data, evolutionary psychology, and neuroscience (overlapping categories, to be sure) can all be deployed in making sensational, seemingly well-evidenced claims about everyday life, the grist of bestsellers and magazine articles. But if it would be improper to draw a bright line between a field of scientific practice and its presentation in popular media, it would be equally misleading to assume that we can understand the essence of the former by looking at the latter.
Jurgenson notes that there are examples of responsible, thoughtful research with large data sets, singling out the Data & Society Research Institute as an example. Nevertheless, he writes, "the positivist tendencies of data science — its myths of objectivity and political disinterestedness — loom larger than any study or any set of researchers, and they threaten to transform data science into an ideological tool for legitimizing the tech industry’s approach to product design and data collection."
If you scan the bookshelf from Freakonomics to Dataclysm and then check out Wired and watch a TED Talk or two, that conclusion may be unavoidable. But if our aim is to answer the question "What is Big Data?" from the (situated!) perspective of the history of science and technology, it will take a different approach. This requires setting aside our "digital dualism" (a useful term coined by Jurgenson) to piece together the material systems and embodied practices that constitute distinctive ways of using computers and data to generate scientific knowledge.
In other words, we should keep in mind:
Tomorrow, I'm going to turn to a book that exemplifies how to ask and answer situated but broadly significant questions about the production and use of large datasets: Hallam Stevens' Life out of Sequence.