Let’s begin to delve into things, shall we? We like the (rather common) metaphor of a “toolkit” for historians beginning to tackle Big Data problems: an assortment of software programs that can each shed new light on the past. Some of these programs have explicitly tool-like names, such as MALLET, the MAchine Learning for LanguagE Toolkit; others have more methods-based names, such as Voyant or the Stanford Topic Modeling Toolbox; still others have more whimsical names, such as Python, a programming language named after the 1970s British comedy series Monty Python’s Flying Circus. They share two characteristics: they are free and open source, and historians have fruitfully employed them all in the past. We will introduce several tools throughout this book, but there are of course more. Some other useful places to explore are the DiRT directory of Digital Research Tools at http://dirtdirectory.org/ and the Text Analysis Portal for Research (TAPoR) at http://tapor.ca/. We find TAPoR especially useful, as it provides good write-ups of digital tools, supplies historical context, and helps you learn about the broader field as you click around the various sites.
In On Writing: A Memoir of the Craft, Stephen King tells a story about his Uncle Oren. It’s an instructive story for historians. The screen door was broken. Uncle Oren lugged the enormous toolbox that had once belonged to his father to the door, found that only a small screwdriver was required, and fixed the door. When his young nephew asked why he didn’t just grab the screwdriver in the first place, Uncle Oren said, “I didn’t know what else I might find to do once I got out here, did I? It’s best to have your tools with you. If you don’t, you’re apt to find something you didn’t expect and get discouraged.” Working with digital materials is a bit like King’s suggestions for writing. You want to have a variety of tools handy in order to ask different kinds of questions and to see different kinds of patterns. There is no ‘canon’ of big data tools, but the ones we mention in this chapter and the next are increasingly becoming part of the standard fluency of digital historians.
One tool that we strongly suggest historians use, whether they primarily engage with big data research or with conventional work, is the research tool Zotero. Developed by the Roy Rosenzweig Center for History and New Media at George Mason University, Zotero is a one-stop shop for gathering, organizing, sharing, and citing sources. Available at https://www.zotero.org/ either as an extension for the open-source Mozilla Firefox browser (it runs as a program within the browser) or as a standalone application, Zotero is a good place to begin collecting and saving your material. Once you have it installed, you can begin to extract source information automatically. If you go to an article or book listing on Amazon.com, for example, or on your institution’s library website, a little picture of a book will appear in your browser’s address bar; one click, and the information is added to your database. Even more promising, if you are in a repository like JSTOR, one click will also add the full text of the article and then index it for searching. Finally, when you want to cite material, you can add citations in Word or OpenOffice through Zotero, in the citation format that you want. No more painstaking by-hand changing from Chicago Style footnotes to MLA-style embedded notes! While Zotero is invaluable, it does occasionally have problems when interacting with other programs: it works quite well with Microsoft Word, for example, but applications built within its ecosystem can sometimes break. Still, the ever-present reminder of the Zotero icon in the upper right-hand corner of a Firefox browser, or in the start-up menu of a desktop computer, reassures you that your research database is always at hand.
There is also the problem of ‘link-rot’. The ecosystem of links that pins digital materials together is prone to decay. If you searched for ‘http://geocities.com/’ in, for instance, Google Books (perhaps you are interested in the early history of the world wide web, a time when it seemed everyone had a ‘home page’ hosted on Geocities), the results you retrieve will contain links to pages that are now dead. Zotero has a function that allows you to keep a ‘snapshot’ of any webpage you visit and to store it within your library as part of its citation (it stores the snapshot locally on your computer, or on any other computer on which you’ve installed Zotero). The problem is that no one else can see this snapshot. Another service, called ‘WebCite’, can be used to create a permanent citation to that web resource, which you can then share. In essence, WebCite creates a copy and provides a permanent address for it. Even if the original is taken off-line, a copy will live at http://www.webcitation.org/. Finally, archived versions of websites exist in many corners of the online world, from Google’s cache (click on the down arrow beside a search result to see the ‘cached’ version; this is typically only the most recent version of the page indexed by Google’s spiders) to the Internet Archive Wayback Machine. There are others, and the Memento service (http://www.mementoweb.org/), developed with support from the Library of Congress, brings these together into a portal service. While we will return to Zotero once or twice in the chapters that follow, to point you towards specific plugins that can enhance it for Big Data research, we believe that any historian can benefit from having it installed and from being aware of these other services.
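To make the idea of looking up an archived copy concrete, here is a minimal Python sketch. It assumes the Internet Archive's public 'availability' endpoint at https://archive.org/wayback/available and the JSON layout that endpoint returned at the time of writing; the function names are our own illustrative choices, not part of any library, and the service may change.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Internet Archive's availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the lookup address for a page, optionally near a YYYYMMDD date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the address of the closest archived copy, or None if none is listed.

    Assumes a JSON reply of the form
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(url, timestamp=None):
    """Ask the Wayback Machine for an archived copy (needs a network connection)."""
    with urlopen(build_query(url, timestamp), timeout=30) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling `lookup("http://geocities.com/", "19990101")` asks for the copy closest to 1 January 1999. What comes back depends on what the archive holds on the day you ask, so treat the result as a lead to follow up rather than a guarantee.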
The tools we use all presuppose different ways of looking at the world, and many of them have a bias towards American datasets and linguistic constructions. Wolfram Alpha, for instance (the so-called ‘computational knowledge engine’), allows one to parse text automatically, assigning each name it finds a gender tag. But what of the name ‘Shawn’? Is that a male name or a female name? Consider Shawn Graham (male) and Shawn Colvin (female). Who compiled the list of names for this tool? How were they tagged? Who decides whether ‘Shawn’ is a male or a female name? What about names that at one point were predominantly male (Ainsley) but are now more typically given to females? Alison Prentice, in her brave mea culpa ‘Vivian Pound was a man?’, discusses how her research into the history of women members of the University of Toronto’s Physics Department revolved around the figure of Vivian Pound, whom she thought was a woman:
…it was certainly a surprise, and not a little humbling, to learn in the spring of 2000 that Pound, a physicist who earned a doctorate from the University of Toronto in 1913, was not a female of the species, as I had thought, but a male. In three essays on early twentieth-century women physicists published between 1996 and 1999, I had erroneously identified Vivian Pound not only as a woman, but as the first woman at the University of Toronto to earn a Ph.D. in physics.
In Prentice’s case, the dataset she was using (simple lists of names) led her to erroneous conclusions. Her work demonstrates how to reflect on such an error and build outwards from that reflection. As we use computational analysis, a list of names might itself become a ‘tool’, and tools are rarely straightforward or neutral. Reflecting on what a tool assumes is an essential component of using digital tools (indeed, tools of any kind). In later chapters we will discuss in more detail how the worldviews built into our tools can lead us similarly astray.
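A few lines of Python show how quickly a name list turns into an error-prone tool. This is only a sketch: `NAME_LIST` is a hypothetical lookup table we made up for illustration (it is not how Wolfram Alpha or any real service works), while the three people and their genders come from the discussion above.

```python
# Hypothetical name list of the kind a tagging tool might rely on.
# Each first name is forced into exactly one category.
NAME_LIST = {"Shawn": "male", "Vivian": "female"}

# People mentioned in the text, with the gender the text reports for them.
KNOWN_PEOPLE = [
    ("Shawn Graham", "male"),
    ("Shawn Colvin", "female"),
    ("Vivian Pound", "male"),
]


def tag_gender(full_name, name_list=NAME_LIST):
    """Guess a gender from the first name alone; 'unknown' if it is not listed."""
    first_name = full_name.split()[0]
    return name_list.get(first_name, "unknown")


def find_mistags(people, name_list=NAME_LIST):
    """Return (name, guessed, actual) for every person the list gets wrong."""
    mistags = []
    for full_name, actual in people:
        guessed = tag_gender(full_name, name_list)
        if guessed != actual:
            mistags.append((full_name, guessed, actual))
    return mistags


for name, guessed, actual in find_mistags(KNOWN_PEOPLE):
    print(f"{name}: tagged {guessed}, actually {actual}")
```

With this particular list, Shawn Colvin and Vivian Pound are both mis-tagged. Swapping the entry for ‘Shawn’ simply moves the error to Shawn Graham: a list that assigns one gender per name cannot be right about everyone, whoever compiles it.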
“Why is it Called Python?” Python documentation, last updated 31 July 2013, http://docs.python.org/2/faq/general.html#why-is-it-called-python, accessed 31 July 2013.
For many issues concerning the web, the internet, their underlying architectures, governance, and so on, Wikipedia is actually one of the best sources for gaining the initial understanding that helps make sense of what else you might find. In this case, see the article on link rot: Wikipedia contributors, “Link rot,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Link_rot&oldid=615587779, accessed 8 July 2014.
Which can be found at archive.org/web/. See also Ian Milligan, ‘Exploring the Old Canadian Internet: Spelunking in the Internet Archive’, ianmilligan.ca, 11 February 2013, http://ianmilligan.ca/2013/02/11/exploring-the-old-canadian-internet-spelunking-in-the-internet-archive/.
Many tutorials for getting the most out of Zotero exist; the Profhacker column in The Chronicle of Higher Education, for instance, has written about Zotero several times and contains many useful pointers to tutorials, videos, and tablet and smartphone apps that extend Zotero’s functions and make it even more useful. See http://chronicle.com/blogs/profhacker/tag/zotero.
As discussed in Ian Milligan, “Quick Gender Detection Using Wolfram|Alpha,” 28 July 2013, ianmilligan.ca, available online, http://ianmilligan.ca/2013/07/28/gender-detection-using-wolframalpha/. Lincoln Mullen provides a repository with tools for gender detection of names using the R statistical package (we discuss R in more depth in subsequent chapters). For more, see http://lincolnmullen.com/blog/analyzing-historical-history-dissertations-gender/.
Alison Prentice, “Vivian Pound was a Man? The Unfolding of a Research Project,” Historical Studies in Education/Revue d’histoire de l’éducation, 13, 2 (2001): 99-112, http://historicalstudiesineducation.ca/index.php/edu_hse-rhe/article/view/1860/1961.