We have entered an era of Big Data. Big Data emerges from a world of constant, always-on, notification-laden, checked-in, NSA-metadata-gleaned, pervasive computing. Even our thermostats, as we write this on a chilly January morning, are watching what we do to generate insights from our Big Data lives. This makes decisive movement towards digital methods, over the next ten or twenty years, imperative for the profession. As IBM noted in 2012, “90% of the data in the world today has been created in the last two years alone.” Yet while Big Data is often explicitly framed as a problem for the future, it has already presented fruitful opportunities for studying the past. The most obvious place where this is true is archived copies of the publicly accessible Internet. The advent of the World Wide Web in 1991 has had revolutionary effects on human communication and organization, and its archiving presents a tremendous body of non-commercialized public speech. There is a lot of it, however, and methodologies that work at scale will be needed to explore it. It is this problem that we believe makes the adoption of digital methodologies for history especially important.
As we saw above, historians have been through these questions before. In the 1960s, large censuses met punch-card computing and produced significant scholarly contributions that continue to contextualize more focused studies today. Now, as the digital humanities flourish, we can see the current historical interest in them as having its roots in that initial flurry of interest. To put this into historical context: if the first wave of computational history emerged out of humanities computing, and the second wave developed around textual analysis (and H-Net, Usenet, and GIS), we believe that we are now on the cusp of a third revolution in computational history. Three main factors make this possible: decreasing storage costs, with particular implications for historians; the power of the Internet and cloud computing; and the rise of open-source tools. In many cases we are asking questions similar to those of the 1960s pioneers, just with more powerful and more accessible tools (for all the frustrations that the tools discussed in this book can occasionally produce, they are luckily not punch-card based).
Significant technological advances in how much information can be stored herald a new era of historical research and computing, one that we need to prepare for now. Historical methods need to develop in order to keep up with where our profession might go in the next ten or twenty years. In short, we can retain more of the information produced every day, and the ability to retain information has been keeping up with the growing amount of generated data. As author James Gleick argued:
The information produced and consumed by humankind used to vanish – that was the norm, the default. The sights, the sounds, the songs, the spoken word just melted away. Marks on stone, parchment, and paper were the special case. It did not occur to Sophocles’ audiences that it would be sad for his plays to be lost; they enjoyed the show. Now expectations have inverted. Everything may be recorded and preserved, at least potentially.
This has been made possible by Kryder’s Law, the storage corollary to Moore’s Law (which held that the number of transistors on a microchip would double every two years). Named for the engineer Mark Kryder, it holds, based on past practice, that storage density will double approximately every eleven months. While this law may be more descriptive than predictive, the fact remains that storage has been getting cheaper over the last ten years, which has enabled the storage and, we hope, the long-term digital preservation of invaluable historical resources.
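To get a feel for how different these two doubling periods are, the short Python sketch below compounds each over ten years. It is only an illustration: the function name `growth_factor` and the ten-year horizon are our own choices, and the calculation assumes that each doubling period holds steady for the whole decade, which neither law guarantees.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Growth multiple for a quantity that doubles every `doubling_months` months."""
    return 2 ** (months_elapsed / doubling_months)


DECADE_IN_MONTHS = 120  # illustrative horizon, not a figure from the text

# Doubling periods as quoted in the paragraph above.
moore_decade = growth_factor(DECADE_IN_MONTHS, 24)   # transistors: every two years
kryder_decade = growth_factor(DECADE_IN_MONTHS, 11)  # storage density: every eleven months

print(f"Moore's Law over ten years:  about {moore_decade:,.0f}x")
print(f"Kryder's Law over ten years: about {kryder_decade:,.0f}x")
```

Under these assumptions, a two-year doubling period yields a 32-fold increase over a decade, while an eleven-month doubling period yields an increase of roughly 1,900-fold.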
We store more than we ever did before, and increasingly keep an eye on this digital material to make sure that future generations will be able to fruitfully explore it. The creation or generation of data and information does not, obviously, in and of itself guarantee that it will be kept; for that we have the field of digital preservation. In 2011, humanity created 1.8 zettabytes of information. This is not an outlier: from 2006 until 2011, the amount of data expanded by a factor of nine.
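A ninefold expansion can be restated as an annual rate, which is easier to compare with other growth figures. The sketch below does the conversion in Python. It assumes that the period from 2006 to 2011 counts as five years and that growth compounded evenly from year to year; both are simplifications of ours, not claims made by the underlying report.

```python
import math

growth_multiple = 9      # "expanded by a factor of nine"
years = 2011 - 2006      # assumed five-year interval

# Constant yearly multiplier that compounds to the overall multiple.
annual_factor = growth_multiple ** (1 / years)

# Time needed to double at that constant yearly rate.
doubling_years = math.log(2) / math.log(annual_factor)

print(f"Implied annual growth: about {(annual_factor - 1) * 100:.0f}% per year")
print(f"Implied doubling time: about {doubling_years * 12:.0f} months")
```

On those assumptions, the data created each year grew by about 55 percent annually, doubling roughly every 19 months.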
This data takes a variety of forms, some accessible and some inaccessible to historians. In the latter camp, we have walled gardens and proprietary networks such as Facebook, corporate databases, server logs, security data, and so forth. Barring a leak or the efforts of forward-thinking individuals, historians may never be able to access that data. Yet in the former camp, even if it is smaller than the latter, we have a lot of information: YouTube (seeing 72 hours of video uploaded every single minute); the hundreds of millions of tweets sent every day over Twitter; the blogs, ruminations, comments, and thoughts that make up the public-facing and potentially archivable World Wide Web. Beyond the potentialities of the future, however, we are already in an era of archives that dwarf previously conceivable troves of material. While this book is not about accessing these archives in particular, it does concern itself with the methods necessary to access these sorts of datasets. Historians need to begin to think computationally now so that our profession is ready to access this data in the next generation.
The shift towards widespread digital storage, preserving information longer and conceivably storing the records of everyday people on an ever more frequent basis, represents a challenge to accepted standards of inquiry, ethics, and the role of archivists. How should historians respond to the transitory nature of historical sources, be it the hastily deleted personal blogs held by MySpace or the destroyed websites of GeoCities? How can we even use large repositories such as the over two million messages sent over Usenet in the 1980s alone? Do we have ethical responsibilities to website creators who may have had an expectation of privacy, or at the least had no sense that they were formally publishing their webpage in 1996? These are all questions that we, as professionals, need to tackle. They are, in a word, disruptive.
It is important to pause briefly, however, and situate this claim of a revolutionary shift due to ever-bigger data sets in its own historical context. Humanists have long grappled with medium shifts and earlier iterations of this Big Data moment, which we can perhaps stretch back to the objections of Socrates to the written word itself. As the printing press and bound books replaced earlier forms of scribed scholarly transmission, a similar medium shift threatened existing standards of communication. Martin Luther, the German priest and pivotal figure in the Protestant Reformation, argued that “the multitude of books [were] a great evil”; this sixteenth-century sentiment was echoed by Edgar Allan Poe in the nineteenth century and by Lewis Mumford as recently as 1970. Bigger is certainly not better, at least not inherently, but it should equally not be dismissed out of hand.
A great overview of Big Data can be found in Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution that Will Transform How We Live, Work, and Think (Boston: Eamon Dolan Book, 2013).
Chip Walter, “Kryder’s Law,” Scientific American, 25 July 2005, http://www.scientificamerican.com/article.cfm?id=kryders-law.
John Gantz and David Reinsel, “Extracting Value from Chaos” (IDC iView, June 2011), http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf.
Twitter, “Total Tweets Per Minute | Twitter Developers,” Twitter.com, November 2012, https://dev.twitter.com/discussions/3914; YouTube, “Statistics – YouTube,” YouTube, 29 May 2013, http://www.youtube.com/yt/press/statistics.html.