How big is the data in "big data"? The answer varies depending on who is answering, from large corporate data warehouses to academic researchers. For humanities researchers, it has a variety of meanings, all valid. For literature scholars, "big data" might represent corpora larger than a single researcher could feasibly read: the thousands of novels published in Victorian England, for example, as evocatively proposed by scholar Franco Moretti in his Graphs, Maps, Trees, or the "great unread," the 99% of literature outside the canon, as proposed by Margaret Cohen. An archaeologist might reasonably define their "big data" as every bit of data generated over several seasons of field survey and several seasons of excavation and study: the materials that don't go into the more widely used and better understood Geographic Information System (GIS) databases. For a historian, "big data" might be as expansive as the complete 19th-century transatlantic shipping rosters, tracing the ebbs and flows of commodities as they travel around the world.
Big is in the eye of the beholder. As scholars, we have all worked with datasets of varying sizes and considered them "big." At one extreme lie the datasets that stretch the limits of individual researchers and personal computers, such as the monumental eighty-terabyte sweep of the entire publicly accessible World Wide Web in 2011, made available by the Internet Archive. Such projects may require high-performance computing and specialized software. But others work with more manageable sets of data: popular music lyrics, proposals submitted to conferences, databases of dissertations, historiographical inquiries. For us, big data is simply more data than you could conceivably read yourself in a reasonable amount of time, or, even more inclusively, information that requires computational intervention to make new sense of it.
Historians must be open to the digital turn, because of both the astounding growth of digital sources and an increasing technical ability to process them on a mass scale. Both trends are discussed in this book. We are collectively witnessing a profound transformation in how historians research, write, disseminate, and interact with their work. As datasets expand into the realm of the big, computational analysis ceases to be a "nice to have" and becomes a simple requirement. While not all historians will have to become fluent with data (just as not all historians are oral historians, or use Geographic Information Systems, or work within communities), digital historians will become part of the disciplinary mosaic.
In this chapter, we want to introduce you to what big data means, what opportunities it brings, where it came from, and the broader implications of this "era of big data." New and emerging research tools are driving cutting-edge humanities research, often funded by transnational funding networks. With these tools, researchers are asking new questions of old datasets as well as opening new avenues into previously inaccessible terrain. After this survey of the current state of affairs, we turn to the historical context of this scholarly moment. These projects must be situated in the context of the original big data scholar, Father Roberto Busa, and his Index Thomisticus, as well as the more recent shift from "humanities computing" to the "digital humanities." Finally, we discuss the broader implications of an "era of big data." Here we see both the joys of abundance and the dangers of information overload. The contours of this challenge and opportunity are fascinating, and help anchor the discussion that follows. While there will certainly be bumps in the road ahead, we generally see a promising future for an era of big data.