Testing Cohesiveness in GeoCities Neighbourhoods by Extracting and Plotting Locations

The ‘heartland’ of GeoCities?

This weekend, I went back to my old GeoCities archive to play around with the methods I experimented with in my last post on the Wide Web Scrape. One question that I’ve been curious about is whether GeoCities was a community (drawing on an old debate that raged in the 1990s and beyond about virtual communities), and how we could see that: in volunteer networks, neighbourhood volunteers, web rings, guestbooks, links to each other, etc. As with all of my blog posts, this is just me jotting a few things down (the lab notebook model), not a fully thought-out, peer-reviewed piece. But keep reading for the pictures and discussion. 🙂

GeoCities and Neighbourhoods: An Extremely Short Introduction

Welcome to the Neighbourhood! (December 20th, 1996) – click through for the Wayback Machine version of this page.

Before its 1999 acquisition by Yahoo!, GeoCities was arranged into a set of neighbourhoods. It was presented as a “cityscape,” with streets, street numbers, and recognizable urban and geographical landmarks (one might live on a virtual Fifth Avenue or on a festive Bourbon Street), and this was strongly emphasized in press releases and user communications. The central metaphor that governed the admission of new users into GeoCities was that of homesteading. This was a conscious choice, in keeping with the frontier spirit so common during the early days of the Web (harkening back to its communalist roots), and it captured the heady expansionary rhetoric of the time. Users would need new homes, and these new homes would be located in neighbourhoods. This fit the vision of GeoCities founders David Bohnett and John Rezner (who joined in August 1995), who saw “[n]eighbourhoods, and the people that live in them, [as providing] the foundation of community.” When selecting sites, users were presented with a list of various places where their site might belong. Those writing about “[e]ducation, literature, poetry, philosophy” would be encouraged to place their site in Athens; political wonks in CapitolHill; small businesspeople or those working from home in Eureka; and so on. Some neighbourhoods came with restrictions and guidance, such as the more protective and censored EnchantedForest for children. Others were much wider in scope, such as Heartland, focusing on “families, pets, hometown values.”

I could talk your ear off (that paragraph is a compression of several pages I’ve written) but you should get the broad picture by now.

Did Neighbourhood Members Share Similarities? The Cases of Heartland and Athens

One of the questions that I’ve been playing with is how to test the homogeneity of these neighbourhoods. If we extract all the images from CapitolHill (the political neighbourhood), do we see similarities or differences? Did they comment on each other’s guestbooks? How were web rings organized? Did they like linking to each other more than externally?

So the question here is: did they discuss similar places all around the world? Could NER (named entity recognition) and GIS tell us anything?

For this blog post, then, I ran the full text of the Heartland neighbourhood through NER and extracted location names. This text came from the Archive Team snapshot. Sites could have been edited after 1999, but from that point onwards new sites were created under vanity profiles rather than neighbourhoods, so, judging from my close reading, we are dealing with a lot of material that stopped being updated around 2000. Again, Heartland (by far the largest GeoCities neighbourhood, as this visualization demonstrates) focused on things like “family” and “hometown” values.
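
To make the tallying step concrete, here is a minimal sketch of how location entities might be counted once NER has been run. It assumes the NER output has already been parsed into (text, label) pairs and that locations carry a "LOCATION" label; both the input shape and the function name `count_locations` are assumptions for illustration, not the format of any particular tool or the exact workflow used for this post.

```python
from collections import Counter


def count_locations(entities, label="LOCATION"):
    """Tally how often each location name appears in NER output.

    `entities` is an iterable of (text, label) pairs. This shape is an
    assumption for illustration, not the output of a specific NER tool.
    """
    counts = Counter()
    for text, ent_label in entities:
        if ent_label == label:
            # Collapse runs of whitespace so "New  York" and "New York" match.
            counts[" ".join(text.split())] += 1
    return counts


# Toy input, invented purely to show the shape of the data.
sample = [
    ("Iowa", "LOCATION"),
    ("Ohio", "LOCATION"),
    ("Iowa", "LOCATION"),
    ("David Bohnett", "PERSON"),
]
print(count_locations(sample).most_common())  # [('Iowa', 2), ('Ohio', 1)]
```

The resulting counts are what a frequency-weighted map would need: one row per place name, plus the number of times it was mentioned.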

What places were discussed?

A simple plot of places mentioned, without incorporating frequency data. A good first glimpse.

Starting at this point, we get a sense of the simple geographic spread. At a glance, the main topics of discussion are the eastern United States (tapering off in the western States, reflecting their lower population), as well as England, Germany, and to a far lesser extent France and Spain. There are clusters of interest and disinterest. The literal “Heartland” of America, the midwestern States, seems overrepresented. Let’s include frequency data and see what happens:

Zooming in on the main part of the map with frequency data (the larger the circle, the more times an individual place is mentioned).

A few things to point out. If “Canada” was mentioned, the circle is placed in the middle of the administrative unit. In the Canadian case, that circle ends up north of Saskatchewan in the southern part of the Northwest Territories. You can see similar centre-of-unit placements for the United States, England, Scotland, Ireland, France, Norway, etc., as well as for states (California, for example).

Here’s where things get interesting. Heartland is disproportionately drawn from a few midwestern and southern States. Indeed, the most frequent location entities are Iowa, Indiana, Virginia, Kentucky, and Ohio. West coast cities are under-represented, as are southern Canadian ones. In Europe, we see major cities appearing: Rome, for example, as well as London. At first glance, I suspect these are tourist destinations, but this will need further investigation. As I get more time, it would be neat to map against population to get a better list of which areas are over- and under-represented. From my glance, however, Heartland did attract a particular clustering: around a specific, middle-American set of values. Given the breadth of the topic, this is surprising.
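
As an aside on the population comparison mentioned above, one simple way to do it (a sketch, not something I have run on the real data) is to divide each region’s share of mentions by its share of population. A ratio above 1 suggests over-representation and a ratio below 1 suggests under-representation. The function name `representation_ratios` and the toy regions and numbers are hypothetical placeholders.

```python
def representation_ratios(mentions, population):
    """Compare each region's share of mentions to its share of population.

    Returns {region: ratio}, where ratio > 1 means the region is mentioned
    more often than its population share alone would suggest. Only regions
    present in both inputs are compared.
    """
    shared = mentions.keys() & population.keys()
    total_mentions = sum(mentions[r] for r in shared)
    total_population = sum(population[r] for r in shared)
    if total_mentions == 0 or total_population == 0:
        return {}
    ratios = {}
    for region in shared:
        if population[region] == 0:
            continue  # A zero-population region has no meaningful ratio.
        mention_share = mentions[region] / total_mentions
        population_share = population[region] / total_population
        ratios[region] = mention_share / population_share
    return ratios


# Placeholder values, made up for illustration; not real counts or census data.
toy_mentions = {"Region A": 30, "Region B": 10}
toy_population = {"Region A": 100, "Region B": 100}
print(representation_ratios(toy_mentions, toy_population))
```

With equal populations, Region A (three times the mentions) comes out at 1.5 and Region B at 0.5, which is the kind of ranking that could replace my eyeballing of the map.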

When we take the “Athens” neighbourhood – the higher education one – and do a similar thing, we see different results:

Aha! There’s the west coast.

We see some clustering around cities this time – notably San Francisco and Los Angeles, activity in Texas (related to the University of Texas at Austin), Oklahoma (the university), and a few other hubs. It is less concentrated than the Heartland example. And if we pull our gaze back…


Still focused on the United States, southern Canada, and England (unsurprising for an English-language website in the late 1990s), but we also have Australia, as well as a large number of mentions of Israel, Hong Kong, South Korea, and a scattering elsewhere.

What’s Next?

This is the sort of thing that I only had a morning to play with, so I don’t pretend to have anything fully, fully developed here. The next steps will be to:

  • run NER on all the neighbourhoods and extract geodata;
  • plot them so that they can all be compared;
  • find some way to identify over- and under-represented areas;
  • use keyword-in-context to better refine some of these results: for example, we’re seeing a blip for “Washington,” which I’m putting in Washington State. Some of that would be “University of Washington,” but clearly much of it is also Washington DC, George Washington, and so on;
  • make better use of QGIS to produce maps that are more fun to look at;
  • continue to search for community, using the many other tools that we’re tinkering with when playing with WARC files.
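
The keyword-in-context idea from the list above can be sketched with nothing more than a regular expression. This is a rough illustration, not a finished tool: the function name `keyword_in_context`, the default window of 30 characters, and the sample sentence are all assumptions made up for the example.

```python
import re


def keyword_in_context(text, keyword, width=30):
    """Return (left, match, right) tuples for each whole-word hit of `keyword`.

    `width` is the number of characters of context kept on each side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits


# Invented sample sentence, only to show the ambiguity problem.
sample_text = (
    "She studied at the University of Washington before moving "
    "to Washington DC."
)
for left, word, right in keyword_in_context(sample_text, "Washington"):
    print(f"{left!r} [{word}] {right!r}")
```

Reading the left and right context makes it possible to separate “University of Washington” from “Washington DC” before deciding where on the map, if anywhere, a mention belongs.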
