An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

The Limits of Big Data, or Big Data and the Practice of History


A quick proviso is in order here. Will Big Data have a revolutionary impact on the epistemological foundation of history? As historians work with ever-increasing arrays of information, we need to consider the intersection of this trend with broader discussions around the nature of history and the past itself. At first glance, larger amounts of data would seem to offer the potential of empirical advances in ‘knowing’ history: the utopian promise of a record of all digital communications (turned into a dystopian reality by the National Security Agency, unfortunately), for example, or the ability to process the entirety of a national census over several years. In some ways, this is evocative of the modernist excitement around “scientific” history.[1] Big Data has substantial implications for historians, as we again move from studies based on positive examples to the finding of overall trends in extensive computational databases. This is similar to debates from the 1970s and 1980s, when quantitative historians faced criticism about the reproducibility of results that were derived from early computer databases and often presented only in tables. Yet we believe that, for all the importance of Big Data, it does not change the fundamental questions of historical knowing that face historians.

For all the excitement around the potential offered by digital methods, it is important to keep in mind that they do not herald a transformation in the epistemological foundation of history. We are still working with traces of the past. The past did happen, of course. Events happened during the time before now: lives were lived, political dynasties rose and fell, working people made do with the limited resources that they had before them. Most of the data about these experiences and events, however, have disappeared. The digital revolution may make it possible to consider more than before, perhaps even by an order of magnitude.[2] Yet even with terabytes upon terabytes of archival documents, we are still only seeing traces of the past today. While there is a larger debate to be had about the degree of historical knowing possible, we believe that amidst the excitement of Big Data, this is a point that needs to be considered.

History, then, as a professional practice, involves crafting this available information and transforming it into scholarly narratives.[3] Even with massive arrays of data, historians do not simply cut and paste findings from computer databases; such an approach would be evocative of the “scissors and paste” model noted by historian R.G. Collingwood.[4] Having more data is not a bad thing, but, as we note throughout, it does bring its own challenges. With more data, in practical terms, there is arguably a higher likelihood that historical narratives will accord more closely with past events, as we have more traces to base them on. But this is not a certainty. History is not merely a reconstructive exercise, but also a practice of narrative writing and creation.[5]

The “subjective” appears throughout the methodological chapters that follow: decisions about how many topics to search for as we generate a topic model, for example; what words to exclude from computations; what words to look for; what categories of analysis to assume. Underlying this too are fundamental assumptions made by computational linguists, such as the statistical models employed or even premises around the nature of language itself.[6] At a more obvious and apparent level, much of what we do also involves transforming data, or altering these traces of the past into new and in many cases, for the purpose of our narrative construction, more fruitful forms. Many of the techniques discussed in this book involve text, yet many sources are more than just texts: they have images, texture, smell, or are located in specific areas. Even if we are attuned to issues of historical context, these dimensions can and often will be lost.

On a philosophical level, however, the digital transformation of sources does not represent a significant challenge to historical practices. Historians are always changing their sources and are always engaged in choices and decisions: What notes to take? What digital photograph to snap? Who to interview? What sources will lead to a better publication? What will my dean/manager/partner think? What will help build my career? To this litany of issues, digital techniques add new questions that we will discuss in this book: how to break up texts (should you separate ‘tokens’ on word breaks, punctuation, or so forth, as discussed in Chapter Two), what topic modeling algorithm to use, whether to ignore non-textual information, and whether a network is a useful visualization of the results.

If we do not believe that Big Data is here to correct what other historians are doing in their subdisciplines, however, this also means that we do not believe that the issues briefly outlined above are fundamental shortcomings in our methodologies. Yes, media will be transformed and decisions will be made, but this is within the historical tradition. Digital history does not offer direct truths, but only new ways of interpreting and understanding traces of the past. More traces, yes, but still traces: brief shadows of things that were.

But to what end? Trevor Owens draws attention to the purpose behind one’s use of computational power: generative discovery versus justification of a hypothesis. For Owens, if we are using computational power to deform our texts, we are trying to see things in a new light, to create new juxtapositions, to spark new insight.[7] Stephen Ramsay makes a similar point in Reading Machines, discussing the work of Jerome McGann and Lisa Samuels:[8] “Reading a poem backward is like viewing the face of a watch sideways – a way of unleashing the potentialities that altered perspectives may reveal”. Owens’s purpose in setting ‘justification’ against ‘discovery’, though, is not to condemn one approach in favour of the other, but rather to draw attention to the fact that:

When we separate out the context of discovery and exploration from the context of justification we end up clarifying the terms of our conversation. There is a huge difference between “here is an interesting way of thinking about this” and “This evidence supports this claim.”[9]

This, then, is the nub of big data for digital history. Archaeologists have used computers for decades to try to justify or otherwise span the gap between our data and the stories we would like to tell. A digital archaeology that sat within the digital humanities would worry less about that, and concentrate more on discovery and generation, on ‘interesting way[s] of thinking about this’. So too with digital history. Digital approaches to the past that sit within the digital humanities use our computational power to force us to look at the materials differently, to think about them playfully, and to explore what these sometimes jarring deformations could mean.


[1] Robert William Fogel, “‘Scientific History’ and Traditional History,” in Robert William Fogel and G.R. Elton, Which Road to the Past? Two Views of History (New Haven and London: Yale University Press, 1983).

[2] See, for a brief introduction, Keith Jenkins, Rethinking History (London: Routledge, 1991) and Alun Munslow, The Future of History (New York: Palgrave Macmillan, 2010).

[3] In the years ahead, it is possible the digital turn will render the word “narrative” too confining for describing what historians produce. We continue to use the word in this book, although projects like SCALAR and ORBIS are making the term increasingly inaccurate. A more encompassing term may be “historiographies.”

[4] R.G. Collingwood, The Idea of History (London: Oxford University Press, 1965).

[5] A point made by a good number of historians, but see Jenkins and Munslow for concise introductions to this line of reasoning.

[6] A point discussed at the Stanford Digital Humanities Reading Group, as recounted by Mike Widner, “Debating the Methods in Matt Jockers’s Macroanalysis,” Stanford Digital Humanities Blog, https://digitalhumanities.stanford.edu/debating-methods-matt-jockerss-macroanalysis, accessed 6 September 2013.

[7] Trevor Owens, “Deforming Reality with Word Lens,” Trevor Owens, February 3, 2012, http://www.trevorowens.org/2012/02/deforming-reality-with-word-lens/.

[8] Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism (Urbana: University of Illinois Press, 2011), 33.

[9] Owens, “Deforming Reality.”


Source: http://www.themacroscope.org/?page_id=607