An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

Dynamic Networks in Gephi

Network analysis is great when we are dealing with a static network. Most metric algorithms are built to work with networks that capture one moment in time (where ‘moment’ can be defined as anything from a single day, to a week, to a month, to a span of years). Sometimes, it will make sense to consider a pattern of interactions (a network) that took place over a long period of time as a single network. For instance, in archaeological network analysis, given the nature of the material, a single network can encompass two hundred years’ worth of interactions.[1] But at other times it makes more sense to see the shape of a network as an evolving, dynamic set of relationships, as in correspondence networks.

One could create a series of networks in time-slices: the exchange of letters between individuals in 1836 as one network, all those in 1837 as another, and so on. This can be a good approach (depending on your question), but it is subject to edge effects: the decision on where to draw the boundary changes the shape of the network. Fortunately, Gephi can deal with dynamic data and with edges that exist for varying durations, and can calculate some metrics on the fly (revisualizing as it goes).
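To make the edge effect concrete, here is a minimal Python sketch of the time-slice approach. The function name `slice_by_year` and the second and third rows of sample data are invented for illustration; only the first row comes from the letters index used later in this section.

```python
from collections import defaultdict
from datetime import date

# Sample edges: (source, target, date sent). Only the first row appears in
# the text; "Person B" and the later dates are hypothetical placeholders.
letters = [
    ("Sam Houston", "J. Pinckney Henderson", date(1836, 12, 31)),
    ("J. Pinckney Henderson", "Sam Houston", date(1837, 1, 2)),
    ("Sam Houston", "Person B", date(1837, 6, 15)),
]

def slice_by_year(edges):
    """Group edges into one edge list per calendar year."""
    slices = defaultdict(list)
    for source, target, sent in edges:
        slices[sent.year].append((source, target))
    return dict(slices)

yearly = slice_by_year(letters)
for year in sorted(yearly):
    print(year, yearly[year])
```

In this sketch a letter sent on 31 December 1836 and a reply sent two days later land in different slices, which is the kind of boundary decision the paragraph above warns about.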

Caption: A dynamic network. This screenshot shows a dynamic, evolving network of the diplomatic correspondence of the Republic of Texas (using the data extracted in the regex section of chapter 3). It may be seen at http://youtu.be/OvTYN2gN0PA .

In the video of the diplomatic correspondence of the Republic of Texas, node degree is calculated on the fly, and each node is resized automatically. Throughout the video, Gephi is constantly applying a layout algorithm to the appearing and disappearing nodes. The time slider at the bottom is set to show only those relationships in existence for approximately half a year. Alternatively, that slider can be set to have only one boundary; that is, once a relationship is displayed, it remains active throughout the analysis and visualization.

That brings up the first issue when considering dynamic networks. What does ‘duration’ mean for a network? In a correspondence network, perhaps ‘duration’ means ‘the time the letter is uppermost in one’s mind’. Perhaps it means ‘the time until I write a response’, thus closing the relationship (which implies that a correspondence network can have both directed and undirected edges). In the network of land-owning relationships deduced from the archaeology and epigraphy of Roman stamped bricks, perhaps ‘duration’ could be considered open-ended (in broad strokes, the stamps represent a kind of rental agreement, and so unless we find a parcel of land suddenly being leased under a different name, we can assume the relationship continues).

Let’s imagine that, in the case of the letters, the duration of the relationship between sender and receiver is one month. Gephi currently treats dynamic networks as existing in continuous time (as opposed to blocks of discrete points in time).

In chapter 3, we used regex to get the letters index into a csv file that we could import into Gephi (here on this website). To make that csv file dynamic, we have to do a bit more work. We need to transform those dates into MM-DD-YYYY format, using what we learned about regex.

While this might not be the most elegant way of doing what we need to do, it does get the job done.

Find (\bDecember\b) ([0-9])
Replace 12-\2

In Notepad++, we would look for (\<December\>) ([0-9]). Of course, one could just search for the word ‘December’ and replace it with ‘12-’, but remember to include the space after the word December in the search. Otherwise you would end up with

12- 24 1844.

Replace all of the months with their digit equivalents (making sure to use 0 where appropriate, i.e. 09 for September), and then we can replace the space between the day and the year with a hyphen. We are looking, then, for a digit, a space, and then a set of four digits:

Find ([0-9]) ([0-9]{4})
Replace \1-\2

That is, replace group 1 with itself, insert a hyphen, and replace group 2 with itself.
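The same month and hyphen replacements can be sketched in Python for readers who would rather script them than repeat twelve searches by hand. This is only an illustration under stated assumptions: the function name `to_mm_dd_yyyy` is invented here, and Python's `re` module stands in for the text editor's find-and-replace.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def to_mm_dd_yyyy(text):
    """Turn 'December 31 1836' into '12-31-1836' wherever it appears."""
    # Step 1: replace each month name (and the space after it) with its
    # two-digit number and a hyphen, keeping the first digit of the day.
    for number, name in enumerate(MONTHS, start=1):
        text = re.sub(rf"\b{name}\b ([0-9])", rf"{number:02d}-\1", text)
    # Step 2: a digit, a space, and four digits become digit-hyphen-year.
    text = re.sub(r"([0-9]) ([0-9]{4})", r"\1-\2", text)
    return text

print(to_mm_dd_yyyy("Sam Houston,J. Pinckney Henderson, December 31 1836"))
```

As with the editor version, the second pattern will also match any other digit-space-four-digit sequence on a line, so it is worth checking the result by eye.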

Save the file. Our csv file looked like this:

Source,target,date
Sam Houston,J. Pinckney Henderson, December 31 1836

And now looks like this:

Source,target,date
Sam Houston,J. Pinckney Henderson, 12-31-1836

There’s just one problem left. Gephi reads dates in dd-mm-yyyy format, but we have our dates as mm-dd-yyyy. We can fix this using regex. The first issue is that some of our days only have single digits, rather than double: 5 rather than 05.

Find: -([0-9])-
Replace: -0\1-

This finds a single digit with a hyphen on either side, and inserts a zero before it. Now, we need to find the mm-dd and switch it around to be dd-mm.
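Both steps, padding the day and then swapping month and day, can be sketched in Python. The swap pattern below is an assumption on our part, since the exact find-and-replace expression for that step is not spelled out in the text; the function name `to_dd_mm_yyyy` is likewise invented for this example.

```python
import re

def to_dd_mm_yyyy(text):
    """Turn mm-dd-yyyy (or mm-d-yyyy) dates into dd-mm-yyyy."""
    # Pad a single-digit day: '-5-' becomes '-05-'.
    text = re.sub(r"-([0-9])-", r"-0\1-", text)
    # Swap the first two groups: month-day-year becomes day-month-year.
    # (Assumed pattern; it expects two-digit month and day at this point.)
    text = re.sub(r"\b([0-9]{2})-([0-9]{2})-([0-9]{4})\b", r"\2-\1-\3", text)
    return text

print(to_dd_mm_yyyy("Sam Houston,J. Pinckney Henderson, 12-31-1836"))
```

Run the swap only once: applying it a second time would switch the dates back to mm-dd-yyyy.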


(It would also be a good idea to search for any commas with a space after them, replacing comma+space with just a comma.)

Open the csv file in a spreadsheet. Before you proceed much further, it would be a good idea to examine the contents of the file for any errors that have occurred: missing dates, dates that did not get converted by our regex pattern because of spacing, bad OCR, and so on. Fix them!
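For a long file, a short script can flag the rows that need attention before you fix them by hand. The sketch below assumes the column is named `date` and that dates should already be in dd-mm-yyyy form; the function name `find_bad_dates` and the two faulty sample rows are invented for illustration.

```python
import csv
import io
from datetime import datetime

def find_bad_dates(csv_text, date_format="%d-%m-%Y"):
    """Return (line_number, raw_value) for each row whose date fails to parse."""
    problems = []
    # skipinitialspace drops the space that follows each comma, if any.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    # The header is line 1, so data rows are counted from line 2
    # (this assumes the file contains no blank lines).
    for line_number, row in enumerate(reader, start=2):
        raw = (row.get("date") or "").strip()
        try:
            datetime.strptime(raw, date_format)
        except ValueError:
            problems.append((line_number, raw))
    return problems

# The first data row follows the text; the other two are hypothetical errors.
sample = (
    "Source,target,date\n"
    "Sam Houston,J. Pinckney Henderson, 31-12-1836\n"
    "Sam Houston,J. Pinckney Henderson, December 31 1836\n"
    "Sam Houston,J. Pinckney Henderson,\n"
)
problems = find_bad_dates(sample)
print(problems)
```

A check like this catches missing or unconverted dates, but it cannot tell whether a well-formed date is the right one, so reading through the file is still worthwhile.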

Gephi does not need an end-date to render this network dynamically. But it will consider your dates to have a duration of exactly the time listed: one day. (It could well be that this is all you need for your analysis, but you will need to think through the implications.) Once a letter is sent, a reciprocal letter, or action dependent on that letter, could be imagined to take place within the following month.[2] To see what this looks like, save your csv file (having cleaned it up). Open Gephi. Select a ‘new project’. Click on the Data Laboratory tab. Click on ‘import spreadsheet’ and select your csv file. Make sure you have ‘edges table’ selected. Gephi should read all of your edges, and load the network without issue. To make it dynamic at this point:

Click on ‘edges’ under ‘data table’.

Click on ‘merge columns’.

Under ‘available columns’ select ‘date’. Click on the right-pointing arrow to move ‘date’ under ‘columns to merge’.

In the drop-down menu beside ‘merge strategy:’ select ‘create time interval’.

A new options window opens, ‘time interval creation options’. You’ll see that ‘start time column’ and ‘end time column’ both have ‘date’ selected. There are two radio buttons, ‘parse numbers’ and ‘parse dates’. Click ‘parse dates’ and change the date format to dd-mm-yyyy. Click OK, and you now have a dynamic network in Gephi! At the bottom of the Gephi window is a button with ‘enable timeline’ on it. Click on this.


A timeline bar will open at the bottom of the screen, but it appears otherwise blank. To the far left is a cogwheel icon. Click on this to adjust the settings for our timeline. One of the options is ‘set custom time bounds’. The minimum and maximum under ‘bounds’ set the length of the timeline; the start and end under ‘interval’ define the ‘slice’ of the network you wish to see at any given moment.

Set your maximum bounds to just after the period we have letters for, and try setting your interval for a year at a time. Click OK. In the timeline you’ll now have years marked. Click and drag the slider so that it only covers ‘1836’. You can also set whether this slider has one edge or two, under the cogwheel >> set time format. Play with the settings to see what works best for you, thinking too about what the implications are for your analysis.
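The difference between a slider with two edges and a slider with one can be sketched in a few lines of Python. This is an illustration of the idea only, not how Gephi implements its timeline; the function name `visible_edges`, the `cumulative` flag, and the sample edges are all invented for the example.

```python
from datetime import date

def visible_edges(edges, window_start, window_end, cumulative=False):
    """Return the edges a timeline slice would show.

    With cumulative=False the slice has two boundaries (a sliding window).
    With cumulative=True only the upper boundary applies, so an edge stays
    visible once it has appeared.
    """
    return [
        (source, target, sent)
        for source, target, sent in edges
        if sent <= window_end and (cumulative or sent >= window_start)
    ]

# Hypothetical edges: (source, target, date sent).
edges = [
    ("A", "B", date(1836, 12, 31)),
    ("B", "A", date(1837, 6, 15)),
    ("A", "C", date(1838, 1, 10)),
]

window = visible_edges(edges, date(1837, 1, 1), date(1837, 12, 31))
growing = visible_edges(edges, date(1837, 1, 1), date(1837, 12, 31), cumulative=True)
print(len(window), len(growing))
```

With the two-edged window only the 1837 letter is visible; with the one-edged version the 1836 letter remains in the network as well, which changes any metric calculated on that slice.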

Hit the play button to see the network of correspondence develop over time. Arrows will jump around the nodes as relationships happen; in this case, mail is sent from a source to a target. If you choose a layout and leave it running while the timeline is playing, the layout will adjust dynamically. You can also choose some dynamic metrics like clustering coefficient and degree. When you click on these, you will also be asked for time settings for these values to be calculated.


Run the metric, and then resize the nodes by dynamic degree. A little ‘infinity’ loop will appear beside the ‘apply’ button. Click the loop, and ‘apply’ becomes ‘auto apply’. Click ‘auto apply’, and the nodes will change size depending on their dynamic degree as the slider plays.

If you are using a Windows machine, Clement Levallois has a utility for creating csv files with the dates correctly formatted for you from an existing spreadsheet.


[1] Graham 2006 Ex Figlinis does this with relationships of landholding around Rome in the first three centuries. See also Graham 2014 On Connecting Stamps: Network Analysis and Epigraphy Nouvelles d’archaeologie 135: xx-yy.

[2] If you wish to have a duration for a relationship, you will need to create a new column with the end date listed. One way of achieving this relatively quickly would be to open the csv in a spreadsheet. Make a new column called ‘end-date’. Insert the end dates as appropriate. You could make this a bit faster by using combinations of the LEFT, RIGHT, and CONCATENATE functions in Excel. For instance, =LEFT(a2,2) will return the first two characters from cell a2. In the next column, you could then transform that value to whatever new value you want, and then use CONCATENATE to join that value to the rest of the date. Then, once you have the start-date and end-date as you want them, in Gephi, you simply select BOTH columns to perform the merge operation.
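As an alternative to spreadsheet functions, the end date could be computed with a short Python sketch. It assumes a one-month duration and start dates already in dd-mm-yyyy format; the function names `add_one_month` and `end_date_string` are invented here, and clamping to the last day of a shorter month is a design choice of this example rather than anything Gephi requires.

```python
import calendar
from datetime import date, datetime

def add_one_month(start):
    """Return the same day one month later, clamped to the month's last day."""
    year = start.year + (start.month // 12)   # roll over after December
    month = start.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

def end_date_string(start_string, fmt="%d-%m-%Y"):
    """Take a dd-mm-yyyy start date and return the end date as a string."""
    start = datetime.strptime(start_string, fmt).date()
    return add_one_month(start).strftime(fmt)

print(end_date_string("31-12-1836"))
```

The returned strings can be written into the ‘end-date’ column, so that both columns are available for the merge operation.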


Source: http://www.themacroscope.org/?page_id=525