
I Had a Website: The Practices of Archive Team and the Internet Archive in Archiving GeoCities


Conclusion

These two teams do not work in complete isolation and often collaborate: the Internet Archive provides stable storage for Archive Team's torrent, and Archive Team provides the Internet Archive with WARC files and URL lists to crawl. Even so, I've attempted to demonstrate how differently the two teams approach archiving GeoCities. The Internet Archive offers an archive of GeoCities that prioritizes public access over freedom of usage: it favors websites marked as important by the public, stores websites for easy access through the Web, and displays WARC files as rendered views. Archive Team offers an archive of GeoCities that prioritizes freedom of usage over public access: it attempts to archive as many websites as possible without discrimination, stores websites without reliance on a central server, and offers only WARC files without rendered views.

Like traditional archives, web archives require decisions about the selection, maintenance, and storage of materials. The Internet Archive data stored in Sun Microsystems' Santa Clara data center relies on the physical infrastructure and maintenance of the servers. Reflecting on the archival process, Archive Team founder Jason Scott believes that he should have shipped hard drives with copies of the GeoCities archive to multiple locations, so that multiple computers could seed the torrent. Saving digital work is, ironically, dependent on tangible natural and social resources; without them, web archives are nonfunctional. As Scott describes, “there's a chance that if you're hit by a car, things will disappear that maybe shouldn't disappear” (Scott, “Q&A: The Long Term Prospects ”).

The work of web archives like the GeoCities archives by Archive Team and the Internet Archive also ironically reveals that the Web is not permanent storage. Wendy Hui Kyong Chun states in Programmed Visions:

“If our machines' memories are more permanent, if they enable a permanence that we seem to lack, it is because they are constantly refreshed — rewritten — so that their ephemerality endures, so that they may “store” the programs that seem to drive them” (Chun 170).

Chun argues that software (and thus the Web and websites) endures only through regeneration, which relies on the deletion and degeneration of its previous form. So, by revealing previous iterations of what appears to be the same website, the archival efforts of the Internet Archive and Archive Team expose 1) the “ephemerality” of the Web and websites and 2) that the Web and websites have a history. Even sites archived in Archive Team's torrent no longer offer the same user experience as their original form: MIDI files sound different on modern hardware, images hosted on other websites are lost, and Java applet code has long since been deprecated (Connor). The Web is “constantly refreshed”; without this constant energy, websites are lost to closures of web hosting services or lack of archival support. The web crawling and scraping of GeoCities by Archive Team and the Internet Archive thus reveals that the Web is not permanent, continuous storage but requires constant energy to maintain.

GeoCities is representative of a Web now lost. While today's Web is based on “profiles” that directly map people to their web counterparts, abstracting away any unnecessary details, GeoCities relied on geographical metaphors to help users understand the Web and its community structure. I propose a return to the geographic, the material, and the tangible when considering the Web and the products built and served on it today; through this framework, we can reveal the labor, maintenance, and energy in the Web that often go unseen.
