This big data, however, is only as useful as the tools we have to interpret it. Luckily, two interrelated trends make interpretation possible: more powerful personal computers and, more significantly, accessible open-source software to make sense of all this data. Prices continue to fall and computers continue to grow more powerful. Of particular significance for research involving large datasets, the amount of Random Access Memory (RAM) in computers continues to increase, and information loaded into RAM can be manipulated and analyzed very quickly. Even humanities researchers with limited budgets can now use computers that would have been prohibitively expensive only a few years ago.
It is, however, the ethos and successes of the open-source movement that have given digital historians, and the broader field of the digital humanities, wind in their sails. Open-source software is a transformative concept that goes beyond simply "free" software: an open-source license means that the code that drives the software is freely accessible, and users are welcome to delve through it, make whatever changes they see fit, and distribute either the original or their altered version. Notable open-source projects include the Mozilla Firefox browser, the Linux operating system, the Zotero reference-management software developed by George Mason University's Center for History and New Media (CHNM), the WordPress and Drupal Content Management System (CMS) platforms, and the freely accessible OpenOffice productivity suite.
For humanists, then, carrying out large-scale data analysis no longer requires a generous salary or expense account. Increasingly, it does not even require potentially expensive training. Take, by way of introduction, the Programming Historian, an open-source textbook dedicated to introducing computational methods to humanities researchers (itself built on the open-source WordPress platform). It also serves as a useful introduction to the potential of these programs, several of which we encountered in our preface:
- Python: An open-source, freely downloadable programming language that allows you to carry out very powerful textual analysis and manipulation. It can download files, break text into easily digestible pieces, and produce some basic visualizations.
- Komodo Edit: An open-source editing environment, allowing you to write your own code, edit it, and quickly pinpoint where errors might have crept in.
- Wget: A program, run on the command line, that lets you download entire repositories of information. Instead of right-clicking on link after link, you can use Wget to quickly download an entire directory or website to your own computer.
- MALLET: The MAchine Learning for LanguagE Toolkit provides an entire package of open-source tools, most notably topic modeling, which takes large quantities of text and finds the ‘topics’ that appear in them.
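To give a flavor of what turning text into "easily digestible pieces" means in practice, here is a minimal Python sketch. It is our own illustration, not code from the Programming Historian, and it assumes the text is already on your computer as a plain string (for instance, a page previously fetched with Wget). The function names `tokenize` and `top_words` are hypothetical; the sketch uses only Python's standard library to split text into lowercase words and count the most frequent ones.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical helper: lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, n=3):
    """Hypothetical helper: return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)


# A tiny invented sample standing in for a real historical source.
sample = "The court met. The court adjourned. The jury deliberated."
print(top_words(sample))
```

A common word like "the" tops the list here; real analyses usually filter out such stopwords before counting.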
These four tools are just the tip of the iceberg, but they represent a significant change. Free tools, with open-source documentation written for and by humanists, allow us to unlock the potential inherent in big data.
Big data represents a key component of this third wave of computational history. By this point, you should have an understanding of what we mean by big data, and of some of the technical opportunities we have to explore it. The question remains, however: what can big data do for humanities researchers? What challenges and opportunities does it present, beyond the short examples provided at the beginning of the chapter?