An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart


AntConc is an invaluable tool for carrying out some forms of textual analysis on data sets. While it does not scale well to the largest datasets, if you have somewhere in the ballpark of 500 or even 1,000 newspaper-length articles you should be able to crunch the data and receive tangible results. AntConc can be downloaded from Dr. Laurence Anthony’s personal webpage at http://www.antlab.sci.waseda.ac.jp/software.html. Anthony, a researcher in corpus linguistics among many other pursuits, created this software to carry out detailed textual analysis. Let’s take a quick tour.

Installation on all three operating systems is a snap: you download the executable directly for OS X or Windows, while on Linux you need to change the file permissions so that the download can run as an executable. Let’s explore a quick example to see what we can do with AntConc.

Once AntConc is running, you can import files by going to the File menu and clicking on either Import File(s) or Import Dir, which allows you to import all the files within a directory. In the screenshot below, we opened a directory containing plain text files of Toronto heritage plaques. The first visualization panel is ‘Concordance.’ We type in the search term ‘York,’ the old name of Toronto (pre-1834), and visualize the results (figure 3.5):

[insert Figure 3.5 The AntConc interface]

Later in this book, we will explore various ways that you could do this yourself using the Programming Historian, but for the rest of your career, quick and dirty programs like this can get you to your results very quickly! In this case, we can see the various contexts in which ‘York’ is being used: North York (a later municipality that existed until 1998), ties to New York state and city, various companies, other boroughs, and so forth. A simple search for the keyword ‘York’ would therefore return many plaques that do not fit our specific query.
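To give a sense of what a concordance view is doing, here is a minimal keyword-in-context sketch in Python. It is not AntConc's own code, and the function name kwic, the width parameter, and the short sample text are all assumptions made up for illustration rather than real plaque data.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: `width` characters either side of each hit."""
    lines = []
    # \b restricts matches to whole words; matching is case-sensitive here.
    for match in re.finditer(r"\b" + re.escape(keyword) + r"\b", text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample text standing in for a directory of plaque files.
sample = (
    "Fort York was established in 1793. "
    "The Town of York became the City of Toronto in 1834. "
    "Trade with New York grew steadily."
)

for line in kwic(sample, "York"):
    print(line)
```

Lining the hits up in a column is what makes the different senses of ‘York’ (the fort, the town, the American city) easy to spot at a glance.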

The other possibilities are even more exciting. The Concordance Plot traces where various keywords appear in files, which can be useful for seeing the overall density of a certain term. For example, in the visualization of newspaper articles below, we trace when frequent media references to ‘community’ in connection with the old website GeoCities declined (figure 3.6):

[insert Figure 3.6 Concordance plot tool in AntConc]

The term was dense in 1998 and 1999, but declined dramatically by 2000, and even more dramatically as that year went on. It turns out, upon some close reading, that this is borne out by the archival record: Yahoo! acquired GeoCities and discontinued the neighbourhood and many internal community functions that had defined that archival community.
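The idea behind a concordance plot can be sketched in a few lines of Python: for each document, record where in the text each hit falls and draw a mark at that relative position. This is only an illustration under stated assumptions. The function names hit_positions and plot_row, the columns parameter, and the two tiny sample documents are hypothetical, and AntConc's own plot may be computed differently.

```python
import re

def hit_positions(text, keyword):
    """Relative position (0.0 = start of text) of each whole-word hit, ignoring case."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return []
    target = keyword.lower()
    return [i / len(tokens) for i, tok in enumerate(tokens) if tok == target]

def plot_row(text, keyword, columns=40):
    """Render one document as a strip of dots with '|' wherever the keyword falls."""
    row = ["."] * columns
    for pos in hit_positions(text, keyword):
        row[min(int(pos * columns), columns - 1)] = "|"
    return "".join(row)

# Invented documents; in practice each value would be the text of one file.
docs = {
    "doc_a.txt": "our community page joined the community today",
    "doc_b.txt": "the page moved and nobody mentioned the community",
}

for name, text in docs.items():
    print(name, plot_row(text, "community"))
```

If the files are ordered by date, reading down the rows shows at a glance when a term becomes sparse.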

Collocates are an especially fruitful realm of exploration. Returning to our Toronto plaque example, if we look for the collocates of ‘York’ we see several interesting results: “Fort” (referring to the military installation Fort York), “Infantry,” “Mills” (the area of York Mills), “radial” (referring to the York Radial Railway), and even slang such as “Muddy” (“Muddy York” being a Toronto nickname). With several documents, one could trace how collocates change over time: perhaps early documents refer to Fort York and subsequently we see more collocates referring to North York? Finally, AntConc also provides options for overall word and phrase frequency, as well as specific n-gram searching.
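At its simplest, finding collocates means counting the words that appear within a few words of the search term. The Python sketch below does exactly that with raw counts. The function name collocates, the window parameter, and the sample sentence are assumptions for illustration, and the sketch makes no attempt to reproduce whatever ranking or statistics AntConc itself applies.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2):
    """Count words occurring within `window` tokens either side of the keyword."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        start = max(0, i - window)
        # Neighbours on the left and on the right, excluding the keyword itself.
        counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts

# Invented sentence standing in for the plaque corpus.
sample = "Fort York stood near Muddy York while Fort York guarded the harbour"

for word, count in collocates(sample, "York", window=1).most_common(3):
    print(word, count)
```

Running the same count separately on earlier and later documents is one simple way to see collocates shift over time, as suggested above.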

A free, powerful program, AntConc deserves to be the next step beyond Wordle for many undergraduates. It takes textual analysis to the next level. Finally, let’s move to the last of the three tools that we explore in this section: Voyant Tools. This set of tools takes some of the graphical sheen of Wordle and weds it to the sophisticated underlying textual analysis of AntConc.

Source: http://www.themacroscope.org/?page_id=637