The Limits of Big Data, or Big Data and the Practice of History

Will Big Data have a revolutionary impact on the epistemological foundation of history? As historians work with ever-increasing arrays of information, we need to consider the intersection of this trend with broader discussions around the nature of history and the past itself. At first glance, larger amounts of data would seem to offer the potential of empirical advances in ‘knowing’ history: the utopian promises of a record of all digital communications, for example, or the ability to process the entirety of a national census over several years. In some ways, it is evocative of the modernist excitement around “scientific” history.1 Big Data has substantial implications for historians, as we again move from studies based on positive examples to finding overall trends in extensive computational databases. This is similar to debates from the 1970s and 1980s, when quantitative historians faced criticism over the reproducibility of their results, which were derived from early computer databases and often presented only in tables. Yet we believe that, for all the importance of Big Data, it does not change the fundamental questions of historical knowing that historians face.

For all the excitement around the potential offered by digital methods, it is important to keep in mind that they do not herald a transformation in the epistemological foundation of history. We are still working with traces of the past. Our belief, inspired by numerous philosophers of history, is that the past did happen. Events happened during the time before now: lives were lived, political dynasties rose and fell, working people made do with the limited resources that they had before them. Most of the data about these experiences and events, however, has disappeared: the digital revolution may make it possible to consider more than before, perhaps even by an order of magnitude.2 Yet even with terabytes upon terabytes of archival documents, we are still only seeing traces of the past today. While there is a larger debate to be had about the degree of historical knowing possible, we believe that amidst the excitement of Big Data, this is a point that needs to be considered.

History, then, as a professional practice, involves crafting this available information and transforming it into scholarly narratives. Even with massive arrays of data, historians do not simply cut and paste findings from computer databases; such an approach would be evocative of the “scissors and paste” model noted by historian R.G. Collingwood.3 Having more data is not a bad thing: in practical terms, there is arguably a higher likelihood that historical narratives will be in closer accordance with past events, as we have more traces to base them on. But this is not a certainty. History is not merely a reconstructive exercise, but also a practice of narrative writing and creation.4

The “subjective” appears throughout the methodological chapters that follow: decisions about how many topics to include in a model, for example, what words to exclude from computations, what words to look for, what categories of analysis to assume. Underlying this too are fundamental assumptions made by computational linguists, such as the statistical models employed or even premises around the nature of language itself.5 At a more obvious level, much of what we do also involves transforming data, or altering these traces of the past into new and in many cases – for the purpose of our narrative construction – more fruitful forms. Many of the techniques discussed in this book involve text, yet many sources are more than just texts: they have images, texture, smell, or are located in specific areas. Even if we are attuned to issues of historical context, these qualities can and often will be lost.

On a philosophical level, however, the digital transformation of sources does not represent a significant challenge to historical practices. Historians are always changing their sources and are always engaged in choices and decisions: what notes to take? What digital photograph to take? Who to interview? What sources will lead to a better publication? What will my dean/manager/partner think? To this litany of issues, digital techniques add new questions that we will discuss in this book: how to break up texts (should you separate ‘tokens’ on word breaks, punctuation, and so forth, as discussed in Chapter Two), which topic modeling algorithm to use, and whether to ignore non-textual information.
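To make the tokenization question concrete, here is a minimal sketch of how two equally defensible ways of breaking up a text yield different counts for the same word. The sample sentence and both function names are our own inventions for illustration; they are not drawn from Chapter Two or from any particular toolkit.

```python
import re
from collections import Counter

# Hypothetical sample sentence, invented purely for illustration.
TEXT = "The working-class family's wages fell; the family made do."


def tokenize_on_whitespace(text):
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.lower().split()


def tokenize_on_letters(text):
    """Keep runs of letters only, so hyphens and apostrophes break words."""
    return re.findall(r"[a-z]+", text.lower())


# Count each token under the two tokenization choices.
whitespace_counts = Counter(tokenize_on_whitespace(TEXT))
letters_counts = Counter(tokenize_on_letters(TEXT))

print("whitespace:", whitespace_counts["family"], "occurrence(s) of 'family'")
print("letters:   ", letters_counts["family"], "occurrence(s) of 'family'")
```

Under the first choice, “family's” and “family” are different tokens and “working-class” survives as a single unit; under the second, the possessive collapses into “family” and the compound splits in two. Neither is wrong, but each is a decision the historian makes before any counting begins.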

If we do not believe that Big Data offers a fundamental corrective to the practice of history, however, this also means that we do not believe that the issues briefly outlined above are fundamental shortcomings in our methodologies. Yes, mediums will be transformed, decisions will be made, but this is within the historical tradition. Digital history does not offer truths, but only a new way of interpreting and understanding traces of the past. More traces, yes, but still traces: brief shadows of things that were. Even with the most advanced computer, it is still up to the historian to put these traces together.

