Chapter 2 Footnotes & Links

These links all worked as of September 18, 2015. Please ping us if you discover broken links.

1 Hal Abelson, Ken Ledeen and Harry Lewis (2008), Blown to Bits: Your Life, Liberty, and Happiness after the Digital Explosion, Boston, MA: Pearson.

2 We are not the first to use such a metaphor, of course. Elaine G. Toms and Heather L. O’Brien (2008), “Understanding the Information and Communication Technology Needs of the E-Humanist,” Journal of Documentation, 64(1), 102–130, use the similar metaphor of the “e-humanist’s workbench.”

3 The differences largely have to do with philosophy: free software carries a much stronger political message, whereas the open source approach focuses on licensing. For more on what these differences may imply for your own research, see Richard Stallman, “Why Open Source Misses the Point of Free Software.”

4 Richard Stallman (1985, revised 1993), “The GNU Manifesto,” accessed 29 July 2013.

5 Christopher M. Kelty (2008), Two Bits: The Cultural Significance of Free Software, Durham, NC: Duke University Press, pp. 99–100.

6 The processes leading to this are discussed in Christopher M. Kelty (2008), Two Bits: The Cultural Significance of Free Software, Durham, NC: Duke University Press, p. 108.

7 See “The Open Source Definition,” OpenSource.Org, accessed 29 July 2013.

8 “Welcome to CommentPress,” accessed 29 July 2013.

9 Tim Wu (2010), The Master Switch: The Rise and Fall of Information Empires, New York, NY: Knopf, p. 287.

10 David Vise (14 December 2004), “Google to Digitize Some Library Collections; Harvard, Stanford, New York Public Library Among Project Participants,” Washington Post, E05. Accessed via Lexis|Nexis. See also

11 Ibid.

12 Stephen Castle (6 May 2005), “Google Book Project Angers France,” The Independent (London), p. 33. Accessed via Lexis|Nexis.

13 Edward Wyatt (25 May 2005), “Challenging Google,” New York Times, E2. Accessed via Lexis|Nexis.

14 Yuki Noguchi (13 August 2005), “Google Delays Book Scanning; Copyright Concerns Slow Project,” Washington Post, D01.

15 Edward Wyatt (25 May 2005), “Challenging Google,” New York Times, E2. Accessed via Lexis|Nexis.

16 Edward Wyatt (20 October 2005), “Publishers Sue Google over Scanning,” New York Times, Finance 13.

17 Miguel Helft and Motoko Rich (30 October 2008), “Google Settles Suit over Putting Books Online,” New York Times, Finance 20.

18 See

19 For more information on HathiTrust itself, see its website. The New York Times article is also available online. See also the overview in Heather Christenson (2011), “HathiTrust,” Library Resources & Technical Services, 55(2), 93–102.

20 Julie Bosman (13 September 2011), “Lawsuit Seeks the Removal of a Digital Book Collection,” New York Times, B7.

21 See Matthew L. Jockers, Matthew Sag and Jason Schultz (3 August 2012), “Brief of Digital Humanities and Law Scholars as Amici Curiae in Authors Guild v. Google,” Social Science Research Network, accessed 26 July 2013.

22 Ibid.

23 Paula Findlen (22 July 2013), “How Google Rediscovered the 19th Century,” Chronicle of Higher Education: The Conversation Blog, accessed 13 August 2013.

24 Roger C. Schonfeld and Jennifer Rutner (7 December 2012), “Supporting the Changing Research Practices of Historians,” ITHAKA S+R, accessed 30 July 2013.

25 Ian H. Witten, Marco Gori and Teresa Numerico (2007), Web Dragons: Inside the Myths of Search Engine Technology, San Francisco, CA: Morgan Kaufmann, pp. 182–183.

26 Ibid., p. 185.

27 Jennifer Slegg (21 May 2013), “Google’s Market Share Drops as Bing Passes 17%,” Search Engine Watch, accessed 31 July 2013.

28 Ted Underwood (Summer 2014), “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago,” Representations, 127(1), 65.

29 This problem has been addressed in depth by one of the co-authors, in Ian Milligan (December 2013), “Illusionary Order: Online Databases, Optical Character Recognition, and Canadian History, 1997–2010,” Canadian Historical Review, 94(4).

30 For background on the project, see Cold North Wind’s fairly detailed website, “About Paper of Record,” accessed 21 June 2012.

31 Maya R. Gupta, Nathaniel P. Jacobson and Eric K. Garcia (2007), “OCR Binarization and Image Pre-Processing for Searching Historical Documents,” Pattern Recognition: The Journal of the Pattern Recognition Society, 40, 389.

32 “Why is it Called Python?” Python documentation, last updated 31 July 2013, accessed 31 July 2013.

33 For many issues concerning the Web, the Internet, and their underlying architectures, governance, and so on, Wikipedia is actually one of the best sources for gaining an initial understanding that helps you make sense of whatever else you might find. In this case, see the article on link rot: Wikipedia contributors, “Link rot,” Wikipedia, The Free Encyclopedia, accessed 8 July 2014.

34 Which can be found online. See also Ian Milligan (11 February 2013), “Exploring the Old Canadian Internet: Spelunking in the Internet Archive.”

35 Many tutorials for getting the most out of Zotero exist; for instance, the ProfHacker column in The Chronicle of Higher Education has written about Zotero several times and points to many useful tutorials, videos, and tablet and smartphone apps that extend Zotero’s functions, making it even more useful. See

36 As discussed in Ian Milligan (28 July 2013), “Quick Gender Detection Using Wolfram|Alpha” (available online), Lincoln Mullen provides a repository of tools for detecting gender from names using the R statistical programming language (we discuss R in more depth in subsequent chapters). For more, see

37 Alison Prentice (2001), “Vivian Pound was a Man? The Unfolding of a Research Project,” Historical Studies in Education/Revue d’histoire de l’éducation, 13(2), 99–112.

38 Ibid., p. 99.

39 For example, at “WGET for Windows (Win32),” ~bpuype/wget/, accessed 31 July 2013.

40 William J. Turkel and Alan MacEachern (May 2008), “The Programming Historian: About this Book,” available online, accessed 12 August 2013.

41 William J. Turkel and Alan MacEachern (May 2008), Programming Historian, ch. 2, accessed 12 August 2013.

42 Peter Holdsworth, “Author page,” Figshare.

43 A useful tutorial by Mike Klaczynski scrapes movie titles from a database. Keep in mind, however, that some (or perhaps all) of these steps could become obsolete if the tool changes materially once it is formally launched (that is, no longer in “beta”).

44 See for more information.

45 A tutorial by Jens Finnas gives detailed instructions online. Also note that Finnas maintains a useful list of resources for “Data Journalism” (in Swedish).

46 In later chapters we describe the statistical programming language R and how to accomplish various text-mining tasks within it. Code written in R generally runs on Windows, Mac, or Linux without modification, provided you have an up-to-date version of the R language installed on your machine. R is extremely versatile. Once you become familiar with it, you will see that you can accomplish many of these kinds of “helper” tasks within R itself, which is advantageous since you can do all of your analyses within one environment. Ben Marwick has written a “script” for R (in essence, a recipe of commands) that separates the rows in a single Excel file into separate text files; it is available online. If you’d like to use this script right away, skip ahead to our sections on R in Chapter 4 to get a sense of the basics, and then follow Ben’s instructions. We thank Ben for writing and contributing this and several other pieces of R code. Eke’s code is also available online. Try it yourself on John Adams’ Diary (which is available for download online). Incidentally, if you sort the rows by date written before running the macro, the CSV will be broken into separate files arranged in chronological order; when the zipped folder of these files is uploaded to Voyant Tools, you can then explore trends over time in the basic Voyant Tools interface, reading from left (older) to right (newer).

47 Franco Moretti (2005), Graphs, Maps, Trees: Abstract Models for Literary History, London, New York: Verso, p. 4.

48 Margaret Cohen (1999), The Sentimental Education of the Novel, Princeton, NJ: Princeton University Press, p. 23.