Sorting the Pile: Making Sense of a Networked Archive

The installation of Christo and Jeanne-Claude’s “Gates” in New York City was celebrated by a massive gathering in Central Park. As we walked through the Gates, snapping pictures with our digital cameras, we noticed hundreds of other people doing the same. It occurred to us that the kind of activity this artwork engendered (impromptu community building and prolific content creation) made it a perfect subject for our first networked book experiment.

We decided to see if we could build an archive, then edit and shape it using existing software. We accomplished the first part of this quite handily, gathering over 3,000 images through the Flickr network, 75 story links on our del.icio.us page, and 50 blog posts with 27 comments. But as we moved into the next phase of our project, editing the assembled archives, we quickly discovered limitations inherent in the software, which does not allow community participation in the organization of content. Programs like Flickr, created primarily as collection, storage, and sharing facilitators, do not set up useful editorial structures for understanding an archive. The problem of how to get the collective to find meaning in the collection is the focus of this paper, which describes our experience building a collective memory archive with social software; the limitations we came up against when we began the editorial process; questions the project raised about the role of the editor in a networked environment; and how social software might be modified to enhance the editorial process.

Kim White
The Institute for the Future of the Book, University of Southern California

INTRODUCTION

The Institute for the Future of the Book undertook the Gates Memory Project as an experimental effort, part of a broader ambition to understand the many ways in which books are rapidly evolving in the digital environment. We designed this project to address these issues at both a macro and a micro level.

On the micro level, the project allowed us to examine how a networked archive is created and processed in the communal web environment. It allowed us to identify the advantages and limitations that social software brings to the authoring process. And it gave us some ideas about how far we can push current applications and what modifications are necessary to serve the needs of this emergent literary form.

On the macro level, the project helped us understand the essential nature of networked books and how they are structured. It allowed us to take a closer look at the processing of networked books and how authoring in the networked environment differs from the editorial process of a paper book. We also examined the notion of memory and preservation on the web and asked what role networked archives will play in the creation of digital memory institutions.

THE GATES MEMORY PROJECT

Collaborating with Flickr
Rather than build our own software for the database, we decided to see if we could use tools that were already in existence. We got the idea for the Collective Memory project after we visited the Gates and noticed they were inspiring millions of digital photographs. In order to optimize participation while interest was still at its peak, we needed to have a site up and ready to receive submissions a few days after we came up with the idea. To do this we decided to collaborate with Flickr, an online database that allows users to upload photos, attach metadata, search and visualize photos by metadata, and annotate other users' photos with comments or tag them as favorites. Flickr also supports group forums created by users for the collection and discussion of a particular image type, like the "squared circle" group. The ability to tap into Flickr's network of photo enthusiasts, and to draw content from them, was invaluable. Flickr also offers an open API that allows users to create their own search and visualization functions. Notable examples include Michal Migurski's mappr visualization.

Our expectation when we began this project was that Flickr's architecture would support collection and storage, along with editing and presentation of the images. But when we started trying to organize the archive, we came to a dead end. First, we discovered that the Flickr interface is primarily designed for uploading, storing, and viewing images, and only secondarily for performing searches on them. There’s a search function that is capable of sorting via tags that uploaders have chosen to add to their contributions. You can look through a list of tags and click on a tag to see all photos from all users with that tag. Or you can search for a specific tag; you can also do a Boolean AND or OR search for more than one tag. (A tag, in Flickr, is a string without spaces.) But that's as advanced as a search in Flickr can be.
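The two-operator search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the in-memory index, the `search_tags` function, and the sample photo IDs are hypothetical, and none of it is Flickr's API. It shows how little such a search can express: a photo either carries the requested tags or it doesn't.

```python
from typing import Dict, List, Set

# Hypothetical local index: photo ID -> set of tags chosen by the photo's uploader.
PhotoTags = Dict[str, Set[str]]


def search_tags(index: PhotoTags, tags: List[str], mode: str = "AND") -> List[str]:
    """Return IDs of photos carrying all (AND) or any (OR) of the given tags."""
    # Tags are assumed to be strings without spaces, as in Flickr.
    if any(not t or any(c.isspace() for c in t) for t in tags):
        raise ValueError("tags must be non-empty and contain no spaces")
    wanted = set(tags)
    if mode == "AND":
        hits = [pid for pid, photo_tags in index.items() if wanted <= photo_tags]
    elif mode == "OR":
        hits = [pid for pid, photo_tags in index.items() if wanted & photo_tags]
    else:
        raise ValueError("mode must be 'AND' or 'OR'")
    return sorted(hits)


if __name__ == "__main__":
    index = {
        "p1": {"gatesmemory", "orange"},
        "p2": {"gatesmemory", "snow"},
        "p3": {"centralpark"},
    }
    print(search_tags(index, ["gatesmemory", "snow"], "AND"))  # ['p2']
    print(search_tags(index, ["snow", "centralpark"], "OR"))   # ['p2', 'p3']
```

Everything the project wanted beyond this (knowing who applied a tag, ranking results, combining tags with other metadata) falls outside these two operators.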

For an example of a better system, look at something like Amazon.com, where searching the database is paramount. While each object in the database has its own view (the book, say, with its reviews), its displayed form is not fixed, and the creation of new forms is facilitated. Look at Amazon Lists, for example, which allow customers to create new and interesting content by adding to what’s already there.

Another problem is Flickr’s clear distinction between what uploaders can do and what viewers can do, particularly with respect to metadata (or tags). Uploaders can attach as many tags as they want to an image they upload. However, they cannot add metadata to photos they did not upload. The only exception is the generic "favorites" tag, which allows viewers to tag and assemble a collection of favorites. This "favorites" tag does not contribute enough specific metadata to support meaningful editing of the archive.

MODIFYING SOFTWARE

Most of what we would like to do with Flickr is based on the idea of communal tagging and the capacity for advanced search and visualization functions using the attached metadata. Flickr software would need extensive modification in order to provide this. Before we undertook a custom software project, we had to define precisely what kind of editorial structure we wanted the archive to support. In an April 14, 2005 blog entry entitled "where do we go from here?" we proposed three very different ways to approach the communal editing task for this project.

FIRST IDEA Unless we actively engage them, memories fade.

We've now collected over 3,000 photographs of the Gates in Flickr. Let's try to imagine an interface that captures the perishability of memory and at the same time gives a compelling reason to interact: if you don't, it will all disappear.

This could work in many ways. Here is one:

Rating
Visitors could watch a random slideshow of the archive. Next to each picture there would be a sliding meter between "remember" and "forget." If you love the photo, you will slide it to the top. If you hate the photo, you will slide it all the way down to forget. Most will choose somewhere in the middle. These actions will be registered on some sort of visualization of the entire archive: a grid, say, with every photo arranged as a small tile. Depending on visitors' actions during the slideshow, some tiles will be brighter than others. Some will have paled slightly, some will be very faded out, and some will have disappeared altogether. The more people visit, the more nuanced this memory map will become, and photos will fluctuate in and out of memory. You can either let it go on indefinitely, or eventually freeze it and voilà: your lasting, definitive document.
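The rating mechanism in this first idea can be sketched as follows. This is a hypothetical model, not something the project built: slider positions are assumed to run from 0.0 ("forget") to 1.0 ("remember"), a tile's brightness is assumed to be the average of its ratings, unrated tiles are assumed to start half-bright, and tiles whose average falls below an assumed threshold of 0.1 disappear. The `freeze` method stands in for producing the "lasting, definitive document."

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical parameters: slider readings run from 0.0 ("forget") to 1.0 ("remember").
UNSEEN_BRIGHTNESS = 0.5  # tiles nobody has rated yet start half-bright
VANISH_THRESHOLD = 0.1   # tiles dimmer than this are treated as forgotten


class MemoryMap:
    def __init__(self, photo_ids: List[str]) -> None:
        self.votes: Dict[str, List[float]] = {pid: [] for pid in photo_ids}
        self.frozen: Optional[Dict[str, float]] = None

    def rate(self, photo_id: str, slider: float) -> None:
        """Record one visitor's slider position for a photo."""
        if self.frozen is not None:
            raise RuntimeError("the map has been frozen")
        if not 0.0 <= slider <= 1.0:
            raise ValueError("slider must be between 0.0 and 1.0")
        self.votes[photo_id].append(slider)

    def brightness(self) -> Dict[str, float]:
        """Average rating per tile; 0.0 means the tile has disappeared."""
        if self.frozen is not None:
            return dict(self.frozen)
        result = {}
        for pid, readings in self.votes.items():
            level = mean(readings) if readings else UNSEEN_BRIGHTNESS
            result[pid] = 0.0 if level < VANISH_THRESHOLD else level
        return result

    def freeze(self) -> Dict[str, float]:
        """Stop accepting ratings and return the final snapshot."""
        self.frozen = self.brightness()
        return dict(self.frozen)


if __name__ == "__main__":
    m = MemoryMap(["a", "b", "c"])
    m.rate("a", 1.0)
    m.rate("a", 0.5)
    m.rate("b", 0.0)
    print(m.brightness())  # {'a': 0.75, 'b': 0.0, 'c': 0.5}
```

Averaging is only one possible choice; a time-decaying score would let unvisited photos fade on their own, closer to the premise that memories fade unless we actively engage them.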

SECOND IDEA Archive as Landscape

This interface allows visitors to “play” with the images to build a collective work of art that reconstructs the Gates photo archive as virtual pathways.

We would turn each of the 3,000+ Gates photos into a tile that can be dragged and dropped onto a surface/screen that is scrollable horizontally and vertically. The objective is to lay out a string of photos that connect to each other like dominoes. Orange bleeding off the side of a photo is a possible connection. Visitors look for images that begin where the last one left off. These progressions can proceed vertically, horizontally, or diagonally, growing and spreading as visitors search the archive, finding new connections and adding new tiles. We could also allow users to “edit” the collage, replacing existing photo-connections with better ones.

The Result: threads that build and ribbon across the space will form a kind of virtual memory of movement through the Gates. The landscape of our collage can be examined close up or from far away, using a zoom function. Zooming out gives you an aerial view of the Gates; zooming in lets you see them at ground level.

THIRD IDEA Guest Editors

We would like to invite guest editors to review the photo archive and select pictures that exemplify a single, specific idea about the Gates. Each editor will have something different to say. Our hope is that a multifaceted understanding of the Gates Memory Project will emerge when the various viewpoints are seen/read together.

This approach is a variation on the traditional editing process, wherein editors comb through a collection of photos and choose items that correspond to a particular point they wish to make about the collection. The selection of photos is usually paired with a written essay or collection of essays and turned into a printed book.

We would like to invite anyone to be a guest editor and to look through the Gates photos, calling out specific images by adding metadata. These images could then be gathered by a search function and presented along with the editor's commentary.

We chose the third idea, Guest Editor, because it would provide a framework and a motivation for reader/editors to add metadata to the archive and thus assist in the organization of the whole. It also seemed to have the greatest potential for interesting, multi-faceted, in-depth analysis. Then we began looking for ways to implement our "Guest Editor" idea in Flickr. As previously mentioned, Flickr has an API which provides access to information in their database. We thought that we could use it to build our editing interface in the application. Unfortunately, the API was not enough; other proprietary aspects of the Flickr infrastructure did not allow us to create a communal tagging system.

This left us with two options. We could take all the information out of the Flickr database and put it into our own database (which raises copyright issues), or we could create another front end for the Flickr database. The second option seemed preferable. It allowed us to keep everything we liked about Flickr while giving us the capacity to go further. We identified four things that we wanted to build into the software.

1. A Communal Tagging Feature (Reader-Created Metadata)
We want to add functionality that would allow any reader to add tags or metadata to any photograph. Readers should be able to go through all of the photos tagged “gatesmemory” and add metadata that defines categories or subcategories. It would be important to save information about who creates each new tag. If I make a private tag called “gatesfavorites” and Bob makes a private tag called “gatesfavorites,” there should be a way of distinguishing between the two viewers’ tags.

2. Access To All Metadata
Once this metadata is created, we want to be able to access it easily. We would like to be able to query and sort the Flickr database in more sophisticated ways, organizing related images through advanced search and visualization functions.

3. Visualizations
We would also like reader/authors to be able to create meaningful visualizations of the Flickr photos tagged with "gatesmemory." We had some success in this area: Neil Kandalgaonkar, a.k.a. "brevity," created a visualization tool that combined fifty photos of The Gates, from fifty different Flickr users, into a kind of palimpsest, which gives a marvelous sensation of collective memory. Ben applied Montage-a-google to the Gates tag and got an interesting composite image. We hope that once we are able to tag the images with new metadata, more visualizations will be possible.

4. A Robust Assembly Tool
Even if we devise a way to attach metadata to photos and stories, access the tagged content with the search engine, and pass it through an interesting visualization tool to see how it "looks" in relation to the larger database, we still cannot parlay this into the networked book we imagined because there is no tool available for creating a front end, within the archive, that is flexible enough to allow each editor to assemble the content they have chosen and design their own presentation.
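Items 1 and 2 above can be sketched as a small data model. This is a hedged illustration, not Flickr's API and not software the Institute built; the class and method names and the sample readers "kim" and "bob" are hypothetical. Each tag is stored as a (photo, tag, tagger) record, so two readers' "gatesfavorites" tags remain distinguishable, and queries can filter by tag, by tagger, or both, and rank photos by how many distinct readers applied a tag.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Tagging:
    photo_id: str
    tag: str
    tagger: str  # who applied the tag: the uploader or any reader


class CommunalTags:
    def __init__(self) -> None:
        # A set, so the same reader applying the same tag twice counts once.
        self.taggings: Set[Tagging] = set()

    def add(self, photo_id: str, tag: str, tagger: str) -> None:
        """Any reader may tag any photo; the tagger is stored with the tag."""
        if not tag or any(c.isspace() for c in tag):
            raise ValueError("tags must be non-empty and contain no spaces")
        self.taggings.add(Tagging(photo_id, tag, tagger))

    def photos(self, tag: str, tagger: Optional[str] = None) -> List[str]:
        """Photos carrying `tag`, optionally only as applied by one tagger."""
        return sorted({t.photo_id for t in self.taggings
                       if t.tag == tag and (tagger is None or t.tagger == tagger)})

    def ranked(self, tag: str) -> List[Tuple[str, int]]:
        """Photos ordered by how many distinct readers applied `tag` (ties by ID)."""
        counts = Counter(t.photo_id for t in self.taggings if t.tag == tag)
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    ct = CommunalTags()
    ct.add("p1", "gatesfavorites", "kim")
    ct.add("p2", "gatesfavorites", "bob")
    ct.add("p1", "gatesfavorites", "bob")
    print(ct.photos("gatesfavorites", tagger="kim"))  # ['p1']
    print(ct.photos("gatesfavorites"))                # ['p1', 'p2']
    print(ct.ranked("gatesfavorites"))                # [('p1', 2), ('p2', 1)]
```

Under this model, a guest editor's selection (the third idea) would simply be `photos(tag, tagger=editor)`, and the ranked list is one possible input for the visualizations in item 3.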

What Modifying Software Means
Our experience with Flickr is an interesting case in point regarding the economics of customizing software for networked books. The site administrators were interested in our suggestions, but were unable or unwilling to provide the programming resources necessary to accommodate our needs. They made the API available, and we were welcome to hire our own programmer to build the systems we required. But such services are far too expensive for a non-profit book experiment to sustain. Moreover, the results of this programming would be available to Flickr, which could, in turn, fold them back into its interface to offer enhanced functionality to its users. In other words, we would be investing our own R&D resources in an application that we would then hand over to Flickr. This ingenious method of facilitating product "growth" at no cost to the developer has been taken up by other online service providers. Ben noted this development in his if:book post "Remixing the News":

The API is becoming a powerful tool for creative reinvention of the web. Back in April, I wrote about Dan Gillmor's piece on "Web 3.0." Web 1.0 was the early web, a place you went to read: a series of interconnected brochures. Web 2.0 is the "read-write" web: it's a place you go to interact. Web 3.0 is where we start weaving the disparate pieces into new forms. APIs let you do this. You take one application and design a new front end that shows your point of view. Or you take two applications and mix them together, creating something new and illuminating. (Vershbow)

This may indicate that software development for rapidly growing databases and archives will revolve around similar efforts. Open source software will allow the network of users to build and tweak the tools, just as folksonomies are allowing the community to build and tweak the organization systems.

THE META-ISSUES

Even in its unfinished state, the Gates Memory Project has given us valuable insights into the nature of networked books; how those books are created, edited and distributed; and about how collective memory is assembled and preserved in the networked environment.

Understanding the Networked Book
Understanding the networked aspect of digital books is central to understanding the trajectory of the book’s evolution. Electronic books are, like their paper counterparts, by-products of the surrounding culture. This is why we predict that the networked structure of electronic communications will become an integral part of emerging book forms. The ubiquitous interconnectedness of the World Wide Web, the nature of exchange taking place in social software environments, and networking behaviors like email and text messaging will spawn a type of book that incorporates these new modes of thinking, imagining, understanding, and interacting.

We have devised a preliminary definition of the networked book that includes the following characteristics:

The networked book is an open book
"Our intuitive idea of the text is again changing rapidly. It is no longer that of a closed and protected entity but that of an open and penetrable object which can be copied and interpolated without limits." (Simone 249). The networked book maintains an open structure during all or part of its creation. For example, Lawrence Lessig's Code v.2 uses a wiki to open the editing process of the second edition of Code and Other Laws of Cyberspace to the public in order to "draw upon the creativity and knowledge of the community. This is an online, collaborative book update; a first of its kind. Once the project nears completion, Professor Lessig will take the contents of this wiki and ready it for publication." In other words, at some point the book will be declared finished and will then be closed to further input and adjustment by the community. Another example of an open book structure is Wikipedia. This popular online wiki encyclopedia is entirely and indefinitely open.

The networked book is structurally granular or disaggregated
The book, as we see it, is not inseparable from its paper page-based format, and the networked book manifests a certain “disarticulation of the body of text” and “disaggregated/reaggregated” structure as described by Raffaele Simone in his essay, “The Body of the Text.”

Disarticulation of the body of the text occurs when the text generated by an author is not perceived as closed to external interventions, an entity to which the author can have access only to read (or, to use an information science image, in the manner of ROM, that is "read only"), but as an open entity to which one has access -- for purposes of both reading and writing. When the text is disarticulated it is perceived as an entity which can be disaggregated (broken apart), manipulated, and reaggregated (reassembled) without damaging the text per se or the author. (Simone 239).

Collective memory projects in general must find a way to assemble granular content so that the archive can be presented in a coherent way. The Gates Memory Project, however, is not looking for just one way to reassemble and present content, it is looking for all the ways content can be reassembled and presented. In other words, for the project to be a success, the content must be available for constant reassembly according to the insights of new author/readers.

Another approach to a networked collective memory is Jonathan Harris’ 10 x 10, which builds its content from RSS feeds. The piece selects the most frequently used words from the major news networks to assemble an hourly “portrait” of our world. I would argue that this visualization tool represents a type of structure that we will soon see in networked books. The human editor/programmer creates a search and visualization function that collects, edits, and presents content according to the criteria built into the program. This type of "book" format depends on granular content that can be "manipulated and reaggregated" by the tool.
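The kind of selection described for 10 x 10 (pick the most frequent words from a batch of news items and present them as an hourly portrait) can be sketched as a simple word count. This is a hedged illustration only, not Harris's implementation: the stopword list, the tokenizer, the `hourly_portrait` function, and the sample headlines are all assumptions made for the example. It shows how the editorial criteria live in the program rather than in a human's selection.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical stopword list; a real tool would need a much larger one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is", "at"}


def hourly_portrait(headlines: Iterable[str], size: int = 10) -> List[str]:
    """Return the `size` most frequent non-stopword words across one hour's headlines."""
    counts: Counter = Counter()
    for line in headlines:
        words = re.findall(r"[a-z']+", line.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    # most_common orders equal counts by first occurrence.
    return [word for word, _ in counts.most_common(size)]


if __name__ == "__main__":
    headlines = [
        "Storm hits the coast",
        "Coast guard rescues sailors in storm",
        "Markets rise on storm relief",
    ]
    print(hourly_portrait(headlines, 3))  # ['storm', 'coast', 'hits']
```

Changing the stopword list or the tokenizer changes the "portrait," which is exactly the editorial power the paragraph attributes to the human editor/programmer.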

The networked book is social
An early example of the socially networked book is a semi-closed mailing list that began in 1995. This excerpt from Geert Lovink and Pit Schultz's "Go Paper," their introduction to NET CRITICISM, beautifully describes the way electronic books can be created by a social network.

Each and every participant of [the list] is a contributor-editor, following a potlatch information economy of ring exchange. Copyright is not the most urgent issue here, but the build-up of trust between the subscribers. This bond is based on the utopics of face to face contacts and mutual friendship. The quality of the texts is a product of social filtering of external material and its editing. The goal is a non-hierarchical selection which does not end in entropic noise, but results in a self-organizing meta-stability. (Lovink and Schultz)

The social software environment creates a space for communal authorship. It allows the raw, unedited content to be collectively assembled within the nascent form of the electronic book itself. It also provides a gestational space in which the content can evolve from a discussion into an edited “book” according to the activity of the social network. Wikicities and LiveJournal are more contemporary models for socially networked books.

The multiple-author forum creates a different kind of thinking environment. There is nothing about communal authoring that precludes distinct points of view, but these points of view are mediated by a multitude of voices. This may allow for a more democratic approach to issues and a multifaceted rendering of topics not possible in the single-author print model.

The networked book is edited/processed

The future book will be a networked book, or a "processed book," as Joseph Esposito calls it. To process a book, he says, is more than simply building links to it; it also includes a modification of the act of creation, which tends to encourage the absorption of the book into a network of applications, including but not restricted to commentary. This "processing" creates iterations of the book (critiques, revisions, and trajectories) that accumulate around the original draft. The iterations of Wikipedia are a good example of this principle. The networked book, as a process-based knowledge machine, may encourage a more in-depth examination of ideas.

Processing is slightly different from editing because the sheer volume of content forces the editor to rely on applications, search engines, and other mechanisms to help sort and organize the material. During an email exchange about networked books, Sarah Townsend raised this interesting point: "more and more there's this unfolding tension, this dialectic, between the speed of the media and the amount of time requisite to a proper digestion of their products. The reader is absorbing more of the role of the editor." Networked books attempt to mitigate this tension by setting up useful editorial structures for reader/editors and introducing other "helps."

Processing the Networked Book
It is useful, in this case, to compare the processing of a networked book to the standard editorial procedures found in print culture. Both print books and networked books originate from an idea conceived by a senior editor or an author. However, the participatory framework of a networked book is articulated by a designer and executed by a programmer before the content is written or assembled, thus creating an open book structure. Paper-based books are turned over to the designer, production artist, and printer after the content is finished, resulting in a closed book structure. The networked book is assembled, in whole or in part, by a community of authors according to the thesis imagined by the editor/author and within the space created by the designer and programmer. The community of contributors acts as both author and reader, which is drastically different from the single-author print model, wherein the reader is audience rather than co-creator. In a networked book, content is generated and revised by the community, and the various iterations of the text are often saved so that they can be returned to and discussed. In a paper-based book, content is generally constructed by a single author and revised under the supervision of an editor. The readers have no part in this process, and the revisions are examined and debated only in special cases, usually by scholars or authors rather than by the general readership.

After the content has been generated by the community within the framework of the networked book’s thesis and architecture, the reader/editor implements strategies for marking out meaningful pathways through the material using search engines and visualization applications.

Enduring Knowledge and Virtual Memory Institutions
We wanted to use the Gates Memory Project to explore ways in which the web could be used as a vehicle for establishing a specific collective memory. Our original ambition was “to create a lasting and definitive memory of the Gates.” As we pondered and argued over how to achieve that, we kept tripping over the “lasting and definitive” part. To create stability and permanence is the job of an archive; to facilitate a definitive statement is the job of an editor; and to make that definitive statement is the job of the author/academic. We imagined that all of these steps would take place online. But we had not considered that, perhaps, lasting and definitive is not what the web does best; rather than a fixed and stable archive, the web creates a flexible, fragile archive. Rather than a definitive interpretation of that archive, social software creates a forum for evolving, democratic statements that leave the question permanently open. Thus, the digital environment gives us a new way to think about archiving. It creates a collective memory environment that is truly collective. It provides a memory machine that supports a permeable and perpetually changing "memory," which is, perhaps, akin to the way human memory really functions. And it gives us a way to quickly and efficiently collect and store a large body of work on almost any subject of interest.

CONCLUSION

Social software has enormous potential to help us collect, examine, discuss, organize, and edit large archives. It can set up useful structures to aid the reader/editor in the digestion of content. And it can transform the role of the author, editor, and reader. But we will not realize these potentials unless the tools are honed in appropriate ways. As our experience with the Gates Project points out, we must embrace a new notion of editorial structure and of reader/authorship in order to craft tools and methods that will allow us to understand large bodies of content and make meaningful statements in the networked environment.

REFERENCES

Bryant, Lee. "Smarter, Simpler Social: An Introduction to Online Social Software Methodology." Version 1.0. 18 April 2003.

Esposito, Joseph J. "The Processed Book." First Monday 8.3 (March 2003).

Lovink, Geert and Pit Schultz. "Go Paper: An Introduction to NET CRITICISM, Version 1.0." ZK Proceedings, 1995.
http://www.nettime.org/desk-mirror/zkp/gopaper.txt

Shirky, Clay. "Social Software and the Politics of Groups." 9 March 2003. "Networks, Economics, and Culture" mailing list

Simone, Raffaele. “The Body of the Text.” The Future of the Book. Ed. Geoffrey Nunberg. California: The University of California Press, 1996.

Veltman, Kim H. "New Ways of Scholarly Work in a Networked World." Image, Text, Sound and Technology: A Strategic Research Support Program. University of Calgary: University of Calgary Press, 2004 (in press).

Vershbow, Ben. "Remixing the News." if:book, 3 June 2005.

Comments

I thought it interesting when you talked about Wikipedia as "open" because I tend to associate the idea of open texts with open content, or openness as defined by David Bollier. Wikipedia's success, I think, is dependent upon the open source model of collaboration, which in turn depends upon open content licensing (i.e., Mulgan, Steinberg, and Salem's Wide Open: Open source methods and their future potential). I noticed that the submission instructions for your project say to use a copyleft CC license with the images. Are the comments also copyleft?

I'm not sure how central the notion of the potlatch is to the argument, but it seems a bit romanticized as a metaphor. In a number of the Northwest Coast tribal cultures, the potlatch was not some well-intentioned gifting, but instead a way to bring about the financial ruin of the so-called "honoree." One source said that this was a more common practice further to the north (say, coastal British Columbia as opposed to southern Puget Sound south of Seattle), but whatever the motivation, the potlatch was almost always a display of status and wealth, even if someone had to work all year to acquire what they would be giving away during the potlatch. The metaphor posited by Lovink and Schultz (I couldn't find the piece referenced in the essay) doesn't seem apt; it reflects more a romantic notion of Native American and First Nations cultures.

One can liken the potlatch to the Thai white elephant. Because the white elephant was a revered creature, no expense was to be spared in its care and keeping. When the Thai emperor wanted to "honor" someone, he would assign to them the care and keeping of the white elephant. This would drive that "honoree" into bankruptcy and poverty as they devoted themselves and all their resources to the care of the elephant. The potlatch can function in the same way. This just doesn't seem an appropriate metaphor for collective memory.

I'll be happy to be set straight on how the metaphor is an apt one, but right now I'm just not seeing it. Still, beyond that gripe, I liked the paper and the ideas and thinking behind it. I don't often get the time to visualize where the book may be going. I don't expect it to disappear as we know it, paper and such, but I do think it will evolve as described, into something more widely collaborative and social that provides new ways (such as the brevity piece) to look at old and new things alike.

Bradley

The more I considered Bradley's comment about the overly romantic nature of the potlatch metaphor (in the Lovink and Schultz excerpt), the more I think it may be really useful in terms of the results of this experiment.

I'm reminded of Ronald Sukenick's 98.6 and its disillusioned characters' experimentation with communal living; the wildest potlatch celebrates the community 'wealth'; then they all begin changing their names, metaphorically altering the nature of the collective itself. Riffing off Sukenick then, the "potlatch economy" seems to exemplify the Gates Memory Project quite nicely.

virginia kuhn

Have you thought about using del.icio.us or a similar tool for adding & storing user metadata? It might not be able to do everything you want without programming a bit of a front end, but some kind of combo of del.icio.us user tagging functions and flickr storage functions might not be as hard to implement as relying solely on flickr/yahoo. Just a thought.

I don't think the Future of the Book people want to switch over for this project, but if anyone is thinking of something similar in the future, Drupal will have folksonomy integrated into the core installation (it's available for testing now) for the next major release (most likely sometime in early fall). So it may be possible to implement this kind of structure without any coding.

"I thought it interesting when you talked about Wikipedia as "open" because I tend to associate the idea of open texts with open content"
I am using the term "open" more in terms of structure (how these "books" are assembled and reassembled) rather than content. But it is interesting to think about how those two issues might converge or how one influences the other. How will copyright function in such an environment? And will the "openness" of the networked book's structure (in contrast to the established structure of a paper book) change the content? In other words, will people write differently when they don't know what the final presentation will be?

"submission instructions for your project say to use a copyleft CC license with the images. Are the comments also copyleft?"
No, but that's a great idea.

"I don't think the Future of the Book people want to switch over for this project."
Switching is not out of the question, but since Flickr is an evolving application that regularly adds user-created functionality, we are holding out hope for a communal tagging feature. One example that is moving in the right direction is Geobloggers, which allows uploaders to tag their Flickr photos with GPS data, making them searchable by exact location. You can only add this metadata if you are the image's owner. However, you can add the "GeoTagged" link to someone else's image. According to the site: "if you aren't the owner, you can only add the link as a comment." Using the comment feature as a vehicle for metadata is an interesting work-around.

The suggestion to use Drupal or to graft del.icio.us functionality onto our Flickr database is interesting because it reinforces the notion that networked "books" will require modular software that can be customized according to the particular needs of each project.

Also, wanted to mention that we like Drupal, and are using it for a new project (to be launched soon).