In this chapter we have introduced you to the key terms that define the digital humanities, advanced the argument that we are all digital historians now, since we all use tools ranging from Google to newspaper databases, offered some cautionary notes, and shown some of the unplumbed depths stretching out before us. All of this calls for rethinking, on a practical (if not epistemological) level, what effect these digital tools will have on our current approaches to studying the past.
Franco Moretti, in his groundbreaking 2005 book Graphs, Maps, Trees, sketched out the potential that this field held for the study of English literature. Scholars had generally focused on a canon of around two hundred novels, which Moretti pointed out was a mere fraction of the total output of nineteenth-century novels:
[C]lose reading won’t help here, a novel a day every day of the year would take a century or so. … And it’s not even a matter of time, but of method: a field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system, that should be grasped as such, as a whole.
Margaret Cohen has elsewhere described this body of unstudied work as the “great unread.” In this lies the importance of the macroscope as a metaphor: it is a way to pull back and find broader structures within data while keeping one’s inquiries firmly rooted in the humanistic tradition. As noted, this does not produce ‘truth’ or more scientific outcomes; rather, it provides a new process for studying the past. There are “great unreads” everywhere, with potential implications for the pre-digital past.
Grappling with such datasets is not new. For historians, the census is probably the best traditional example of a “great unread.” It is an assemblage of traces of the past without parallel: an entry for every person the census takers recorded, with names, birth dates, occupations, locations, and so forth. This record of millions of people who lived in the past has been studied in parts and chunks: initially through case studies of individual cities, or through painstaking statistical tabulations of other dimensions. The computational wave of the 1960s and 1970s allowed for rudimentary computational processing of ever-larger datasets. Finally, within the last few years, new digital methods have seen Natural Language Processing applied to these bodies of data: finding individuals who moved districts or changed occupations, and beginning to sketch out complicated models of change over time involving millions of people. This is harder than it sounds. A person can probably identify, within a margin of error, that a 20-year-old man named Frank Smith with a wife named Marie and two children is the same person, 10 years later, as a 30-year-old man named Frank Smith with a wife named Marie and three children; a computer has difficulty. This is due partly to OCR errors and other minor inconsistencies that must be recognized and resolved, but also to the numerous inferences and rules that such a match requires.
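To give a rough sense of what such "inferences and rules" might look like, here is a minimal sketch in Python. It is not the method used in any actual census-linkage project: the record fields, the census years (1901 and 1911), the similarity threshold, and the age tolerance are all assumptions we have made up for illustration. It uses the standard library's `difflib.SequenceMatcher` to tolerate small spelling or OCR slips in names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class CensusRecord:
    # Hypothetical fields; real census schemas are far richer and messier.
    year: int
    name: str
    age: int
    spouse: str


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """True if two strings are close enough to forgive small OCR or spelling slips."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def plausibly_same_person(earlier: CensusRecord, later: CensusRecord,
                          age_tolerance: int = 2) -> bool:
    """Apply three simple rules; every rule must pass for a plausible match."""
    elapsed = later.year - earlier.year
    if elapsed <= 0:
        return False
    # Rule 1: the names must roughly match.
    if not similar(earlier.name, later.name):
        return False
    # Rule 2: the reported age should advance by about the elapsed years.
    if abs((later.age - earlier.age) - elapsed) > age_tolerance:
        return False
    # Rule 3: the spouse's name should roughly match.
    return similar(earlier.spouse, later.spouse)


# Invented records echoing the Frank Smith example; the years are assumptions.
frank_1 = CensusRecord(1901, "Frank Smith", 20, "Marie")
frank_2 = CensusRecord(1911, "Frank Srnith", 30, "Marie")  # "rn" misread for "m"
other = CensusRecord(1911, "Frank Smith", 52, "Ellen")

print(plausibly_same_person(frank_1, frank_2))  # True
print(plausibly_same_person(frank_1, other))    # False
```

Even this toy version shows why the task is hard. The number of children is deliberately left out because it legitimately changes between censuses, a common name like Frank Smith will produce many false matches, and each threshold is a judgment call that a human reader makes without noticing.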
These “great unreads” are everywhere, presenting alluring topics for scholars. Consider the entire annals of parliamentary records, in varying states of transcription and OCR, stretching back centuries or more. Certainly, historians have an understanding of many of the “important speeches”: the lofty invocations that launched or settled wars, connected nations, or began political scandals or other skulduggery. Much of what happens in a legislature, however, is routine, yet it casts light on the people sitting inside it and on the broader society. In Australia and Canada, for example, private members spend a not inconsequential amount of time giving private statements on matters such as the accomplishments of local sports teams, notable citizens, entrepreneurs, infrastructure deficiencies, birthdays, wedding anniversaries, and other issues of local concern. One would expect these topics to ebb and flow over time.
Big data will accelerate this process, as data are now being generated on a previously unimaginable scale. As historians begin to turn their attention to the 1990s and 2000s, and as the Internet and the World Wide Web become in many cases necessary sources for understanding the social and cultural past, exploring this “great unread” will become essential. Internet comments, blogs, tweets, and more will combine to present an indispensable source. If newspapers form the foundational source for much of twentieth-century historiography, a process that is accelerating as they go online, these networks of communication will be equally important as we begin to study the twenty-first. Could you, we ask rhetorically, even do justice to many topics situated in the 1990s and beyond without drawing on the World Wide Web? In the next two chapters, we focus specifically on what one can learn from large quantities of text. In this chapter, we have learned how to collect it and how to begin to use it; in the next, let us put it to good use.