Dynamic Networks in Gephi

Network analysis is great when we are dealing with a static network. Most metric algorithms are built to work with networks that capture one moment in time (where a ‘moment’ can be anything from a single day to a week, a month, or a span of years). Sometimes it makes sense to treat a pattern of interactions (a network) that took place over a long period of time as a single network. For instance, in archaeological network analysis, given the nature of the material, a single network can encompass two hundred years’ worth of interactions. At other times it makes more sense to see the shape of a network as an evolving, dynamic set of relationships, as in correspondence networks.

One could create a series of networks in time-slices – the exchange of letters between individuals in 1836 as one network; all those in 1837 as another; and so on. This can be a good approach (depending on your question), but it is subject to edge effects: the decision on where to draw the boundary changes the shape of the network. Fortunately, Gephi can deal with dynamic data and with edges that exist for varying durations, and can calculate some metrics on the fly (re-visualizing on the fly as well).
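To make the edge effect concrete, here is a minimal Python sketch of the time-slice approach. The letters, the single-letter correspondent names and the calendar-year boundary are all hypothetical, not drawn from the Texan data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical letters: (sender, recipient, date sent).
letters = [
    ("A", "B", date(1836, 12, 20)),
    ("B", "A", date(1837, 1, 5)),   # B's reply, about two weeks later
    ("C", "A", date(1837, 3, 14)),
]

def yearly_slices(letters):
    """Build one directed edge set per calendar year of sending."""
    slices = defaultdict(set)
    for sender, recipient, sent in letters:
        slices[sent.year].add((sender, recipient))
    return dict(slices)

slices = yearly_slices(letters)
for year in sorted(slices):
    print(year, sorted(slices[year]))
```

The exchange between A and B is split across the 1836 and 1837 networks, so neither slice shows it as reciprocal. Move the boundary to, say, 1 July and both letters fall into the same slice, which changes the shape of that network.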

Caption: A dynamic network. This screenshot shows an evolving network of the diplomatic correspondence of the Republic of Texas (using the data extracted in the regex section of chapter 3). It may be seen at http://youtu.be/OvTYN2gN0PA .

But what does ‘duration’ mean for a network? In a correspondence network, perhaps ‘duration’ means ‘the time the letter is uppermost in one’s mind’. Perhaps it means ‘the time until I write a response’, which closes the relationship (and which implies that a correspondence network can have both directed and undirected edges). In Graham’s 2006 network of land-owning relationships deduced from the archaeology and epigraphy of Roman stamped bricks, ‘duration’ was considered to be open-ended: in broad strokes, the stamps represent a kind of rental agreement, so unless we find a parcel of land suddenly being leased under a different name, we can assume the relationship continues. The decisions that the historian makes while assembling, cleaning, and representing her data become the objects that the computer manipulates, so these issues are theoretically significant!
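The three readings of ‘duration’ above can be written down as rules that turn a single letter into a time interval. The sketch below is hypothetical: the model names, the 30-day window standing in for ‘uppermost in one’s mind’, and the use of date.max for ‘open-ended’ are our assumptions, not Gephi settings.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

Interval = Tuple[date, date]

def edge_interval(sent: date, model: str, reply: Optional[date] = None,
                  window_days: int = 30) -> Interval:
    """Return (start, end) for one letter under a chosen reading of 'duration'."""
    if model == "window":
        # Relationship lasts an assumed fixed number of days after sending.
        return (sent, sent + timedelta(days=window_days))
    if model == "until_reply":
        # Relationship closes when the reply is written; with no reply on
        # record we assume it stays open.
        return (sent, reply if reply is not None else date.max)
    if model == "open_ended":
        # Like the brick-stamp leases: once begun, assumed to continue.
        return (sent, date.max)
    raise ValueError(f"unknown duration model: {model!r}")

sent, reply = date(1837, 1, 5), date(1837, 2, 10)
for model in ("window", "until_reply", "open_ended"):
    print(model, edge_interval(sent, model, reply))
```

The same letter yields a one-month edge, a five-week edge, or a permanent edge, depending only on which interpretive rule we choose.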

In this section, we do not provide step-by-step guidance for creating a dynamic network in Gephi because the way that Gephi imagines ‘duration’ and ‘time’ in networks is currently being modified (as of July 2014) in preparation for a new release. Gephi currently imagines dynamic networks as consisting of continuous time (as opposed to discrete points in time). In all likelihood, the broad strokes of what we describe here will still apply, though the details will differ.

Consider these two videos:

These videos capture our Gephi window with the Texan correspondence displayed as a dynamic network. In the first, the timeline slider is set so that the relationship represented by a letter is short-lived, while in the second, once a letter is sent, the relationship persists. A Force Atlas layout algorithm is applied to the nodes as they appear and disappear. As you watch the two videos, you can see quite clearly that how we imagine ‘duration’ makes a significant difference to how the network is visualized and, indeed, to how we could interpret it. What kinds of historical arguments are best supported by the first video? What kinds are best supported by the second? And what kinds are best supported by our initial, static network? These are questions that most historical uses of network analysis and visualization do not ask. We leave them open for the reader.
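The difference between the two videos comes down to a single rule about when an edge stops being visible. A minimal sketch, assuming hypothetical letters and a 30-day window for the short-lived reading (the window length is our arbitrary choice, not the setting used in the videos):

```python
from datetime import date, timedelta

# Hypothetical letters: (sender, recipient, date sent).
letters = [
    ("A", "B", date(1837, 1, 5)),
    ("B", "C", date(1837, 2, 1)),
    ("C", "A", date(1837, 6, 1)),
]

def active_edges(letters, on, persistent, window_days=30):
    """Edges visible on day `on` under a short-lived or persistent reading."""
    visible = set()
    for sender, recipient, sent in letters:
        if sent > on:
            continue  # letter not yet written
        if persistent or on <= sent + timedelta(days=window_days):
            visible.add((sender, recipient))
    return visible

day = date(1837, 6, 15)
print(sorted(active_edges(letters, day, persistent=False)))
print(sorted(active_edges(letters, day, persistent=True)))
```

On 15 June 1837 the short-lived reading shows only the most recent letter, while the persistent reading shows every letter sent so far; a layout algorithm working on these two edge sets will produce quite different shapes.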

For reference, here is how we created the dynamic network. Please remember that newer versions of Gephi might differ significantly in how they treat dynamic networks.

1. We imported our Texan network data into Gephi (version 0.8.2), after reformatting our dates into dd-mm-yyyy format.
2. We created a dynamic time interval column by:
   a. Selecting ‘edges’ under ‘data table’.
   b. Selecting ‘merge columns’ from the options at the bottom.
   c. Under ‘available columns’, selecting ‘date’ and clicking the right-pointing arrow to move ‘date’ under ‘columns to merge’.
   d. In the drop-down menu beside ‘merge strategy:’, selecting ‘create time interval’.
   e. A new options window, ‘time interval creation options’, opened. ‘Start time column’ and ‘end time column’ both had ‘date’ automatically selected. (If we had another column with the end date of the relationship, such as a ‘response’ column holding the dates on which replies were sent, we could have used that as the end time column.) There were two radio buttons, ‘parse numbers’ and ‘parse dates’. We clicked ‘parse dates’, selected the date format dd-mm-yyyy, and clicked ‘ok’. Our data was now a dynamic Gephi network! A button labelled ‘enable timeline’ appeared at the bottom of the Gephi window, and we clicked on it.
3. A timeline bar opened at the bottom of the screen, but it otherwise appeared blank. At its far left was a cogwheel icon, which we clicked to adjust the timeline settings. One of the options was ‘set custom time bounds’. The minimum and maximum under ‘bounds’ set the length of the whole timeline; the start and end under ‘interval’ set the ‘slice’ of the network we wished to see at any given moment.
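Outside Gephi, the logic of these steps can be approximated in a few lines. This is a hedged Python sketch of the concepts, not Gephi’s own implementation: the edge rows, the optional ‘response’ column and the function names are hypothetical. Start and end columns are parsed as dd-mm-yyyy, the bounds are the earliest start and latest end, and an edge appears in the visible slice if its own interval overlaps it.

```python
from datetime import date, datetime

def parse_ddmmyyyy(text: str) -> date:
    """Parse a dd-mm-yyyy string, the date format we selected in Gephi."""
    return datetime.strptime(text, "%d-%m-%Y").date()

# Hypothetical edge rows; 'response' is an optional end-date column.
edges = [
    {"source": "A", "target": "B", "date": "05-01-1837", "response": "10-02-1837"},
    {"source": "B", "target": "C", "date": "01-02-1837", "response": ""},
]

def make_interval(row, start_col="date", end_col="date"):
    """Build (start, end); an empty end cell falls back to the start date."""
    start = parse_ddmmyyyy(row[start_col])
    end = parse_ddmmyyyy(row[end_col] or row[start_col])
    return (start, end)

def time_bounds(intervals):
    """Earliest start and latest end: the length of the whole timeline."""
    return (min(s for s, _ in intervals), max(e for _, e in intervals))

def in_window(interval, win_start, win_end):
    """True if the edge's interval overlaps the visible slice."""
    start, end = interval
    return start <= win_end and end >= win_start

intervals = [make_interval(row, end_col="response") for row in edges]
print(time_bounds(intervals))
visible = [(row["source"], row["target"])
           for row, iv in zip(edges, intervals)
           if in_window(iv, date(1837, 2, 5), date(1837, 2, 15))]
print(visible)
```

With ‘date’ as both start and end column, each edge collapses to a single day; supplying a ‘response’ column stretches it until the reply, which is why only the A to B edge falls inside the 5 to 15 February slice.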

If you are using a Windows machine, Clement Levallois has a utility that creates csv files with correctly formatted dates from an existing spreadsheet.
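On other systems, the same kind of reformatting can be done with Python’s standard csv and datetime modules. A minimal sketch, assuming the source dates are in yyyy-mm-dd form in a column named ‘date’ (both assumptions; adjust in_format and column to match your own spreadsheet export). This is not a description of Levallois’s utility.

```python
import csv
import io
from datetime import datetime

def reformat_dates(in_file, out_file, column="date",
                   in_format="%Y-%m-%d", out_format="%d-%m-%Y"):
    """Copy a CSV, rewriting one date column into dd-mm-yyyy."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        parsed = datetime.strptime(row[column], in_format)
        row[column] = parsed.strftime(out_format)
        writer.writerow(row)

# In-memory files keep the example self-contained; real use would pass
# open(..., newline="") file objects instead.
source = io.StringIO("source,target,date\nA,B,1837-01-05\nB,C,1837-02-01\n")
result = io.StringIO()
reformat_dates(source, result)
print(result.getvalue())
```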

----

The text above didn’t make the final cut for the book; it is much changed from its original incarnation, which you can view on the open draft website. That piece has a bit more detail concerning regex patterns and the like. There are some errors there, not least those caused by WordPress’s habit of stripping out my regex patterns, mistaking them for malicious code!