An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

Data Mining Tools: Techniques and Visualizations

In this chapter, we set up and explore some basic text mining tools and consider what kinds of things they can tell us. We then move on to more complex tools, including how to set some of them up on your own machine rather than relying on their web-based versions. Regular expressions are an important concept that will aid you greatly, so plan to spend some time on that section. Finally, you will learn some of the principles of visualization, so that you can present your results and your argument clearly and effectively.

Now that we have our data, whether through wget, Outwit Hub, or some other tool, we have to start thinking about what to do with it! Luckily, many tools can help us take a large quantity of information and “mine” it for what we are looking for. These range from something as simple as a word cloud, with which we begin this chapter, to more sophisticated approaches such as topic modeling (the subject of Chapter Four) or network analysis (Chapters Six and Seven). Some tools are as easy to run as clicking a button on your computer, while others require some under-the-hood investigation. This chapter aims to introduce you to the main contours of the field, to provide a range of options, and to give you the tools to participate more broadly in this exciting area of research. A key rule to remember is that there is no ‘right’ or ‘wrong’ way to do these forms of analysis: they are tools, and for most historians the real heavy lifting will come once you have the results. Yet we do need to recognize that these tools shape our research: they can occasionally occlude context or mislead us. These questions are at the forefront of this chapter.

Source: http://www.themacroscope.org/?page_id=631