Using Gephi to Explore Web Archive Structures

The evolution of inbound links, Canadian political movements, 2005-2014.

In my last post, I discussed how I could take WAT files and extract Gephi graphs from them. In this post, I want to show how I’ve moved past that and am now working with dynamic Gephi graphs. Some of the fruits of this can be seen in the animated GIF at left. It’s been two days of learning, which is among my favourite things to do! Again, this is super preliminary: mostly just liveblogging some of the questions that these files are raising for me.

To generate these, I did the following (mostly following the Import Dynamic Data tutorial, as well as getting helpful hints from Micki Kaufman):

  • Drew on a Gephi file generated for each year of these collections (thanks to my RA, Jeremy Wiebe);
  • Implemented a number of tweaks: ran a modularity detection algorithm and coloured clusters accordingly, played with the filter for ‘Topology -> In Degree Range’ and generated versions with the in-bound link limit set to 2 and to 4, and made sure to export each filtered graph to a new workspace;
  • Each workspace was then exported as a GEXF file;
  • I then started a new project, and opened each GEXF file in turn: making sure to select ‘time series’ and filling out the box asking for the date (I used the year value).
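
For anyone who wants to see the logic of those steps outside of Gephi, here is a minimal sketch in plain Python. It is not what Gephi does internally, just a rough analogue: it applies an in-degree cutoff to each year's links and then records which years each surviving site appears in, which is roughly the information the ‘time series’ import attaches to each node. The domain names, the `yearly_edges` data, and the function names are all made up for illustration, and I am assuming the limit acts as a minimum in-degree.

```python
from collections import defaultdict


def filter_by_in_degree(edges, min_in_degree):
    """Rough analogue of an in-degree range filter (assumed to be a lower bound).

    Keeps nodes whose in-degree is at least min_in_degree, and only the
    edges whose source and target both survive.
    """
    in_degree = defaultdict(int)
    for _source, target in edges:
        in_degree[target] += 1
    kept_nodes = {node for node, degree in in_degree.items() if degree >= min_in_degree}
    kept_edges = [(s, t) for s, t in edges if s in kept_nodes and t in kept_nodes]
    return kept_nodes, kept_edges


def node_years(yearly_edges, min_in_degree):
    """Map each surviving node to the sorted list of years in which it appears."""
    years = defaultdict(list)
    for year in sorted(yearly_edges):
        kept_nodes, _ = filter_by_in_degree(yearly_edges[year], min_in_degree)
        for node in sorted(kept_nodes):
            years[node].append(year)
    return dict(years)


# Hypothetical link data: (linking site, linked-to site) pairs per year.
yearly_edges = {
    2005: [
        ("a.example", "labour.example"),
        ("b.example", "labour.example"),
        ("a.example", "party.example"),
    ],
    2006: [
        ("a.example", "party.example"),
        ("b.example", "party.example"),
        ("c.example", "party.example"),
    ],
}

if __name__ == "__main__":
    for node, years in node_years(yearly_edges, min_in_degree=2).items():
        print(node, years)
```

A site that clears the cutoff in one year but not the next simply drops out of the later time slice, which is the kind of appearing and disappearing that the animation makes visible.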

The results have been illuminating, although analysis is obviously still to come. I wish there were a way to export the ‘chart’ results that one can generate in the ‘ranking’ section of the workbench, but apparently this doesn’t exist.

Due to some memory restrictions, we’re using a subset of years. For the political interest groups and political parties, we’re using 2005, 2006, 2009, 2010, 2012, 2013, and 2014 (so missing 2007, 2008, and 2011); for the labour dataset, we’re using 2006 (start of collection), 2008, 2009, 2010, 2011, 2013, and 2014 (missing 2007 and 2012). This will be fixed soon.

But what can we see? In the labour union collection, the results looked like this:

You can see how it evolves over time, and by zooming in, we see how the ‘Green Party’ disappears as a locus of activity. Similarly, we can see evolutions in the Canadian political party movement, as the following video demonstrates. Notice how labour disappears as a serious in-bound link destination from 2012 onwards (as we see in the NUPGE disappearance as well as the Canadian Labour Congress’ downgrade).

We’re not learning any new historical tidbits yet, but we’re beginning to see a few questions being raised. This, to me, might be the potential of WAT files. The next step will be to look at the link anchor text, and see if we can use that to figure out what on earth might be going on. Are these links to labour unions in the context of a particular political campaign or movement? If so, the hypertext data that the WAT files contain should be useful.
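
As a rough idea of what that next step could look like, here is a small sketch that tallies the anchor text of links pointing at one site. It assumes a WAT record's JSON payload keeps its outbound links under `Envelope` → `Payload-Metadata` → `HTTP-Response-Metadata` → `HTML-Metadata` → `Links`, with `url` and `text` fields on each link; that is my understanding of the format, so check it against your own files. The sample payload, the `union.example` host, and the function name are invented for illustration.

```python
import json
from collections import Counter
from urllib.parse import urlparse


def anchor_texts(wat_payload, target_host):
    """Count anchor texts of links in one WAT JSON payload that point at target_host.

    The nested key path below is an assumption about the WAT layout.
    """
    record = json.loads(wat_payload)
    links = (
        record.get("Envelope", {})
        .get("Payload-Metadata", {})
        .get("HTTP-Response-Metadata", {})
        .get("HTML-Metadata", {})
        .get("Links", [])
    )
    counts = Counter()
    for link in links:
        text = link.get("text", "").strip().lower()
        host = urlparse(link.get("url", "")).hostname or ""
        # Match the host itself or any subdomain of it; relative links are skipped.
        if text and (host == target_host or host.endswith("." + target_host)):
            counts[text] += 1
    return counts


# Hypothetical payload, trimmed down to the fields used above.
SAMPLE_PAYLOAD = json.dumps({
    "Envelope": {"Payload-Metadata": {"HTTP-Response-Metadata": {"HTML-Metadata": {
        "Links": [
            {"path": "A@/href", "url": "http://www.union.example/campaign", "text": "Support the strike"},
            {"path": "A@/href", "url": "http://union.example/", "text": "support the strike "},
            {"path": "A@/href", "url": "http://other.example/", "text": "Home"},
            {"path": "IMG@/src", "url": "/logo.png"},
        ]
    }}}}
})

if __name__ == "__main__":
    print(anchor_texts(SAMPLE_PAYLOAD, "union.example").most_common())
```

Running something like this over each year's records would show whether the wording of links to a given union clusters around a particular campaign.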

Stay tuned.

And because the visualizations I’m using up above are workaday, here are some nicer pictures:

Canadian Political movement from 2013
Canadian Political movement from 2012
Canadian Political movement from 2010
Canadian Political movement from 2009
Canadian Political movement from 2006
Canadian Political movement from 2005
Canadian Labour movement from 2014
Canadian Labour movement from 2013
Canadian Labour movement from 2011
Canadian Labour movement from 2010
Canadian Labour movement from 2009
Canadian Labour movement from 2008
Canadian Labour movement from 2006
