An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

On Topic Modeling

In this section, we explore various ways of creating topic models, what they might mean, and how they might be visualized. We work through a number of examples so that the reader might find a model to adapt to his or her own work.

We want to suggest a number of approaches or tools you can keep in your toolbox, bearing in mind that not all tools are appropriate in all situations, and that you sometimes need to look carefully at a tool to understand how the way it views the world will shape your results. The tools we use for examining large volumes of historical texts are not, by and large, built by historians, and so they do not necessarily share a historian's world view or ways of thinking. To understand how these digital tools have a kind of agency in their own right, it can help to think of them as robots. They are given goals by the user, which they work towards based on instructions written by their creators, and they make judgment calls based on those instructions. In video games, these instructions are sometimes called ‘procedural rhetorics’, in that each process, each set of instructions in the code, embodies a particular way of looking at the world.1 William Uricchio argued in 2005 that the rule-sets of games and simulations (the algorithms that represent the world, and that interact with the player to change that representation and the possibilities for action within it) are traditionally akin to ‘a structuralist understanding of historical process’.2

As with video games, so too with other programmatic representations of the world, especially the digital tools we use to explore and represent it: their processes, their algorithms, are analogous to historiography.

What kind of world view does the ‘topic modeling’ robot hold? Blei tells us that the point of these probabilistic topic models is to ‘discover the hidden thematic structure in large archives of documents’.3 There are a couple of ideas to unpack here. First, let’s consider the idea of ‘topics’. If you are a literary scholar, you will perhaps understand a ‘topic’ rather differently than a historian would (and later on we will look at what a topic might mean in the context of archaeological data).4 Then there is the question of how mathematicians and computer scientists understand a ‘topic’. To answer that, we first have to ask what a ‘document’ is. For the mathematicians, a ‘document’ is simply a collection of words found in differing proportions (so in the real world it could be a blog post, a paragraph, a chapter, a ledger entry, or an entire book). To decompose a document into its constituent ‘topics’, then, we have to imagine a world in which every possible topic is described by its own unique distribution over all possible words: a unique combination of words used in particular proportions.
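To make the mathematicians’ view concrete, here is a minimal Python sketch, assuming a naive tokenizer (lowercase runs of letters) and a made-up vocabulary and topic. A document is reduced to word counts, with its word order thrown away, and a ‘topic’ is a probability distribution that gives some weight to every word in the vocabulary. The names `bag_of_words`, `proportions`, `vocabulary` and `fur_trade` are hypothetical, chosen only for illustration.

```python
from collections import Counter
import re

def bag_of_words(text: str) -> Counter:
    """Reduce a text to word counts; word order is discarded (naive tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def proportions(bag: Counter) -> dict:
    """Express each word's count as a share of all the words in the document."""
    total = sum(bag.values())
    return {word: count / total for word, count in bag.items()}

# A ledger entry, a blog post or a whole book all become the same kind of object.
ledger = bag_of_words("Paid for beaver pelts. Paid for tea.")
reordered = bag_of_words("For tea paid. For beaver pelts paid.")
assert ledger == reordered  # same words in the same proportions: the same 'document'

# Hypothetical toy vocabulary and 'topic': a probability distribution over
# EVERY word in the vocabulary, with most of its weight on a few words.
vocabulary = ["beaver", "pelts", "paid", "for", "tea", "hockey", "goal"]
fur_trade = {"beaver": 0.30, "pelts": 0.30, "paid": 0.15, "for": 0.10,
             "tea": 0.10, "hockey": 0.025, "goal": 0.025}
assert set(fur_trade) == set(vocabulary)
assert abs(sum(fur_trade.values()) - 1.0) < 1e-9

print(proportions(ledger))
```

Real topic modeling tools tokenize more carefully (stopwords, punctuation, spelling variants), but the principle is the same: what survives is which words appear and in what proportions.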

Got that? In the beginning there was the topic. The entire universe of writing is one giant Bulk Barn whose aisles are lined with bins of words: here the bins for Canadian history, there the bins for major league sports (a very small aisle indeed). All documents (your essay, my dissertation, this book) are composed of words plucked from the various topic bins and combined. If we can imagine the world working in this fashion, then we can reverse-engineer any document into its original constituent bins.
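The Bulk Barn story is, in essence, the ‘generative’ story behind latent Dirichlet allocation (LDA), the topic model Blei describes. Below is a toy Python sketch of it, assuming two hypothetical topic bins with made-up words and weights, and sampling each document’s topic proportions from a Dirichlet distribution via normalised gamma draws (the `alpha` value is an arbitrary illustrative choice). A topic modeling tool runs this story in reverse: given only the documents, it estimates the bins and each document’s mix.

```python
import random

# Hypothetical 'bins' (topics): each maps words to probabilities summing to 1.
TOPICS = {
    "canadian_history": {"confederation": 0.4, "fur": 0.3, "railway": 0.3},
    "major_league_sports": {"hockey": 0.5, "goal": 0.3, "season": 0.2},
}

def sample_dirichlet(alpha, k, rng):
    """Draw k topic proportions summing to 1, using normalised gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(length, alpha=0.5, seed=None):
    """Write a toy 'document' by plucking words from the topic bins."""
    rng = random.Random(seed)
    names = list(TOPICS)
    # 1. Decide how much of this document will come from each aisle.
    mix = sample_dirichlet(alpha, len(names), rng)
    words = []
    for _ in range(length):
        # 2. Walk to an aisle, chosen according to the document's mix...
        topic = rng.choices(names, weights=mix)[0]
        # 3. ...and pluck one word from that aisle's bin.
        bin_words = TOPICS[topic]
        words.append(rng.choices(list(bin_words), weights=list(bin_words.values()))[0])
    return dict(zip(names, mix)), words

mix, doc = generate_document(12, seed=42)
print(mix)  # share of each topic in this document
print(doc)  # the words plucked from the bins
```

A small `alpha` tends to produce documents dominated by a single aisle, while a larger one spreads each document’s words more evenly across the aisles.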

  1. Bogost 2007.
  2. Uricchio 2005.
  3. Blei, D. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012. p. 77.
  4. Jockers, Matthew. Macroanalysis. 2013.

Source: http://www.themacroscope.org/?page_id=113