An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

Big Data

Previous section: The Joys of Big Data for Historians

“How big is big?” we rhetorically ask: big data for literature scholars might mean a hundred novels (“the great unread”),[1] for historians it might mean an entire array of 19th-century shipping rosters,[2] and for archaeologists it might mean every bit of data generated by several seasons of field survey and several seasons of excavation and study – the materials that don’t go into the Geographic Information System. Computer scientists, by contrast, often focus not just on materials of a scope that can’t be read, but on volumes of information that elude processing by conventional computer systems, such as Google’s collection or the shocking amount of information generated by experiments such as CERN’s Large Hadron Collider.

For us, as humanists, big is in the eye of the beholder. If it’s more data than you could conceivably read yourself in a reasonable amount of time, or if it requires computational intervention to make new sense of it, it’s big enough! These are all valid answers. Indeed, there is some valid hesitancy around the use of the term ‘data’ itself, as it has a faint whiff of quantifying and reducing the meaningful life experiences of the past to numbers. We believe that this book outlines various methods to computationally explore historical data in a way that would previously have seemed resistant to qualification. With that proviso in mind, of course, we do take ‘big data’ to be a central concept in this book.

As scholars, we have all used datasets of varying sizes and considered them “big.” At one extreme lie the datasets that stretch the constraints of individual researchers and personal computing, such as the monumental eighty-terabyte sweep of much of the publicly accessible World Wide Web in 2011, made available by the Internet Archive.[3] Such projects may require High Performance Computing and specialized software. But others work with more manageable sets of data: popular music lyrics, proposals or papers submitted to conferences, databases of dissertations, historiographical inquiries. For us, big data is simply more data than you could conceivably read yourself in a reasonable amount of time, or, even more inclusively, information that requires, or can be read with, computational intervention to make new sense of it.

Historians must be open to the digital turn, thanks to the astounding growth of digital sources and an increasing technical ability to process them on a mass scale.[4] Both trends are discussed in this book. Historians are collectively witnessing a profound transformation in how they research, write, disseminate, and interact with their work. As datasets expand into the realm of the big, computational analysis ceases to be “nice to have” and becomes a simple requirement. While not all historians will have to become fluent with data (just as not all historians are oral historians, or use Geographic Information Systems, or work within communities), digital historians will become part of the disciplinary mosaic. Computational skills may be increasingly viewed as akin to language requirements: in some cases, a nice-to-have, and in other programs, a rote requirement.

In this chapter, we introduce you to what big data means, what opportunities it affords, where it came from, and the broader implications of this “era of big data.” New and emerging research tools are driving cutting-edge humanities research, often funded by transnational funding networks. Historians are asking new questions of old datasets with new tools, as well as finding new avenues into previously inaccessible terrain. After this survey of the current state of affairs, we turn our eyes to the historical context of this current scholarly moment. These projects must be situated in the context of an original big data scholar, Father Busa and his Index Thomisticus, as well as the more recent shift from “humanities computing” to the “digital humanities.” Finally, we discuss the broader implications of an “era of big data.” Here we see both the joys of abundance and the dangers of information overload.[5] The contours of this challenge and opportunity are fascinating, and help anchor the discussion that follows. While there will certainly be bumps on the road ahead, we generally see a promising future for an era of big data.

Next Section: Putting Big Data to Good Use: Historical Case Studies

[1] See Moretti, Graphs, Maps, Trees and Margaret Cohen, The Sentimental Education of the Novel (Princeton, New Jersey: Princeton University Press, 1999).

[2] See Trading Consequences, “Trading Consequences | Exploring the Trading of Commodities in the 19th Century,” 2014, http://tradingconsequences.blogs.edina.ac.uk/.

[3] Internet Archive, “80 Terabytes of Archived Web Crawl Data Available for Research,” Internet Archive Blog, October 26, 2012, http://blog.archive.org/2012/10/26/80-terabytes-of-archived-web-crawl-data-available-for-research/.

[4] The foundational text in this field is Cohen and Rosenzweig, Digital History. This work primarily focuses on putting information on the web, whereas we explore various ways to improve your research with that information.

[5] Roy Rosenzweig, “Scarcity or Abundance? Preserving the Past in a Digital Era,” American Historical Review, 108.3 (June 2003): 735-762, available online at http://chnm.gmu.edu/digitalhistory/links/pdf/introduction/0.6b.pdf.

Source: http://www.themacroscope.org/?page_id=597