Archival description in OAI-ORE

Deborah Kaplan
Tufts University
deborah.kaplan@tufts.edu

Anne Sauer
Tufts University
anne.sauer@tufts.edu

Eliot Wilczek
Tufts University
eliot.wilczek@tufts.edu

Abstract

This paper proposes using OAI-ORE as the basis for a new method to represent and manage the description of archival collections. This strategy adapts traditional archival description methods for the contemporary reality of digital collections and takes advantage of the power of OAI-ORE to allow for a multitude of non-linear relationships, providing richer and more powerful access and description. A schema for representing finding aids in OAI-ORE would facilitate more sophisticated methods for modeling archival collection descriptions.

1. Introduction

This paper seeks to define a new method for representing and managing description of archival collections using OAI-ORE. This new method has two advantages. Firstly, it adapts traditional archival description methods for the contemporary reality that digital collections, unlike collections of physical materials, are not best described by arrangement location. Secondly, it takes advantage of the power of OAI-ORE to allow for a multitude of non-linear relationships, providing description and access that is not only richer and more powerful, but that also may be a more accurate representation of the multifaceted contexts of many records.

2. Archives and Finding Aids

Archival collections are composed of aggregations of interrelated materials. These materials document the activities of individuals or organizations. Unlike library collections, where holdings are generally of discrete items that can stand on their own, the significance of archival collections lies in their aggregate, or collective, nature. Archivists keep records in collections according to the individual or organization that created or aggregated them. This principle of provenance, along with the principle of original order, is the foundation of archival practice. Archivists work to preserve the original order of the documents within a collection because the physical arrangement and context of the documents before archival deposit are as valuable as the information they contain. The archival finding aid is the tool that was developed to provide archivists with intellectual and physical control of their holdings, and users with the means to discover documents within collections and understand the way documents relate to each other.

Traditionally, the finding aid is a hierarchical and linear narrative document. It begins at a high level, describing the collection as a whole, its creator(s), and how the collection is organized. From there, subgroupings of records, or series, may be described, followed by a container list of boxes and perhaps the folders or items contained within a box. The linear flow of the traditional finding aid closely mirrors the physical arrangement of the documents in hand, serving as a description of the collection and a map to where records are physically located within the actual shelves, boxes, and files. Currently, archivists can encode and electronically share finding aids using EAD (Encoded Archival Description), an XML standard maintained by the Library of Congress and the Society of American Archivists (EAD 2006). Although encoding finding aids in an XML standard has the potential to offer great power and flexibility, the EAD standard does not fundamentally transform archival description. It provides a standardized data structure for the same linear narrative of the traditional paper-based finding aid.

3. The Limits of Finding Aids

Contemporary recordkeeping practices, however, challenge archivists' ability to maintain original order as the basis for describing archival collections. Although original order is still vitally important, it is often complex and multifaceted in institutions with dynamic organizational structures, business functions, and recordkeeping systems that operate in digital environments. The difficulty archives face in documenting undergraduate courses illustrates this challenge.

For example, imagine a course called "EAS 421: Senior Seminar in Widgets, Special Topics" taught by Professor Irene Adler, in the Department of Widgets. A variety of records document this course. These records, however, are distributed across several archival collections. The course syllabus and assignments are managed in the archival collection that holds the records of the university's Learning Management System. The course description is managed in the archival collection that holds departmental records for the Department of Widgets. The lecture notes for the class are held in the manuscript collection comprising Professor Adler's papers, which she has donated to the university. (Professor Emily Pollifax teaches the same class in alternate years. Her lecture notes will be in the manuscript collection holding Professor Pollifax's papers.) Student theses for this course--20-35 page papers--are held in an archival collection of student theses from across university departments. [1]

Using traditional archival description modes, there is no way to describe the class EAS 421. Indeed, teaching is a notoriously under-documented function in university archives, in part because courses cannot be properly documented in a single archival collection, as the records that document a single course are usually created by multiple record creators (Samuels 1992). In order to produce a coherent documentary representation of EAS 421, the archives needs a mechanism that can string together items from five collections. Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines a set of standards to describe aggregations of resources to facilitate reuse (OAI 2010). Using OAI-ORE, individual elements of each of these five traditional archival and manuscript collections can be linked by their relationships to one another. A user-friendly finding aid could then easily be produced which would depict EAS 421 as if it were a coherent collection--which, as far as the end user is concerned, it is.
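As a rough illustration of the aggregation described above, the following Python sketch lists records drawn from the five collections as members of a single EAS 421 aggregation while still recording each record's home collection. The `ore:aggregates` and `dcterms:isPartOf` predicates are existing vocabulary terms; every `example.edu` URI, and the choice to express provenance with `dcterms:isPartOf`, is an assumption made purely for illustration.

```python
# Sketch only: all example.edu URIs below are hypothetical placeholders.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

BASE = "http://example.edu/"
COURSE = BASE + "aggregations/eas421"

# (record, home collection): provenance is kept, not replaced.
RECORDS = [
    (BASE + "records/syllabus", BASE + "collections/lms"),
    (BASE + "records/assignments", BASE + "collections/lms"),
    (BASE + "records/course-description", BASE + "collections/dept-widgets"),
    (BASE + "records/adler-lecture-notes", BASE + "collections/adler-papers"),
    (BASE + "records/pollifax-lecture-notes", BASE + "collections/pollifax-papers"),
    (BASE + "records/student-theses", BASE + "collections/student-theses"),
]


def build_course_aggregation(aggregation, records):
    """Return (subject, predicate, object) triples for the aggregation."""
    triples = []
    for record, collection in records:
        # The course aggregation gathers the record ...
        triples.append((aggregation, ORE_AGGREGATES, record))
        # ... while the record still belongs to its original collection.
        triples.append((record, DCTERMS_IS_PART_OF, collection))
    return triples


triples = build_course_aggregation(COURSE, RECORDS)
collections = {o for _, p, o in triples if p == DCTERMS_IS_PART_OF}
print(len(collections), "collections contribute to", COURSE)
```

The point of the sketch is that membership in the course aggregation is added alongside, not instead of, the statement tying each record to its collection of origin.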

4. Provenance, Relationships, and Digital Realities

This is not merely a matter of producing on-the-fly theme-based collections for the convenience of researchers. This goes to the heart of addressing the limitations of provenance-based arrangement and description. Make no mistake; provenance is vital. The Glossary of Archival and Records Terminology notes that the "principle of provenance or the respect des fonds dictates that records of different origins (provenance) be kept separate to preserve their context" (Pearce-Moses 2005). This is a sound principle; archives preserve records to document the activities of records creators and the individuals, organizations, and societies that interact with the records creators. The context of who originally created these records and how they managed them is just as critical and revealing as the content contained by these records.

The weakness comes not in the principle of provenance, but in its oversimplified application in archival description. Archives arrange records into a record group based on the records' creator. The finding aid serves as an internal tool for the archivist and as a research tool for the user. However, the success of this strategy depends on the archival records having a rigid one-to-one relationship with a creator that can only be understood in a single way. In reality, this is rarely the case (Bearman 1993). The traditional finding aid is not equipped to describe complex relationships between records and a multitude of creators. In addition, traditional archival arrangement and description strategy is not good at documenting activities whose records are spread across multiple record groups in the archives. [2] Professor Adler's lecture notes document the activities of Professor Adler, but they also document the EAS 421 course. A traditional finding aid, even one encoded in EAD, is not equipped to easily represent this second documentary reality of Professor Adler's lecture notes.

In addition, the nature of electronic records and digital objects adds another layer of complexity to archival description and the notion of original order. "Filing" in the digital world is not an electronic equivalent of paper filing. A single digital document may exist in multiple contexts: stored on the network, saved as a draft on the desktop, attached to an email, posted to a wiki, and printed and filed. The flexibility of the digital environment allows people to manage their files using search tools and tagging rather than organizing their files into a particular arrangement.

Moreover, this flexibility exposes the significant shortcomings of using the physical arrangement of documents as the basis for describing the structure and organization of digitized and born-digital documents. What "location" means for digital records is difficult to define. The location of servers in a data center, bits on a disk platter, or files on a filesystem is not relevant for discovery and retrieval; a persistent unique identifier, such as a handle, provides all of the organizational information necessary for retrieval. Instead, a meaningful original order should represent how that document functioned intellectually in the creator's world. This requires detaching original order from physical location.

5. New Ways of Thinking

New approaches should add to, but not undermine, the fundamentals of archival theory. Provenance and original order continue to have primary importance to understanding records. However, the reality of modern records creation is that even in their active use records may exist in multiple contexts and have multiple relationships that describe their significance and value. Our descriptive tools should have the flexibility and power to reflect this instead of forcing us to present multifaceted records in a single hierarchical arrangement.

More radically, for many users, provenance and original order will not have particular significance. Secondary or even tertiary relationships may be where their interest lies, and our access tools should enable multiple avenues of discovery.

OAI-ORE is a standard developed by the Open Archives Initiative to support the discovery and use of aggregations of web-based resources (OAI 2010). The genesis of OAI-ORE lies in a desire to enable the more intelligent use and exchange of aggregates of content on the web. OAI-ORE provides a structure first to identify an aggregation, and then to describe the relationships between the components of the aggregation. It focuses on describing relationships and context so that users of an OAI-ORE resource map can understand the meaning and significance of an object and how it connects to other resources (OAI 2008).

There is nothing inherently archival about OAI-ORE. It was designed to describe aggregations of web resources, and while it can also reference entities that are not on the web, such as people, that is not its main purpose. However, there are several attributes of OAI-ORE that make it intriguing as a structure for representing archival collections:

  • Description at the aggregate level: A core component of OAI-ORE is its foundational concept of describing resources at an aggregate level. Just as archives describe records at the fonds, record group, series, or sub-series level, OAI-ORE sees significance in resources grouped together, and further, codifies that significance by assigning the aggregate a unique identifier.
  • Context: Because OAI-ORE is intended to help users understand and make use of complex aggregations of material, the standard focuses on methods to enable creators of OAI-ORE objects to contextualize resources within an aggregation. This contextualization happens in the Resource Map component of the OAI-ORE model. The Resource Map has its own identifier and can be serialized in RDF/XML, RDFa, or Atom for delivery on the web.
  • Accommodation for non-web resources: OAI-ORE objects can include non-digital resources, such as people, organizations, and physical materials. For example, the referenced resource could include the metadata about a physical report including an identifier and location to assist the user in retrieving--physically--the resource, either by visiting the repository or requesting a copy through conventional reference services. While this application is not the typical use of OAI-ORE, it is expressly allowed and is a key attribute for meeting the needs of archival description.
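The three attributes above can be sketched together in a few lines of Python: a Resource Map with its own identifier describes an Aggregation, which in turn aggregates a physical report carrying an identifier and a shelf location. This is a minimal sketch under stated assumptions. The `ore:describes`, `ore:aggregates`, and `dcterms:identifier` terms are existing vocabulary; the `example.edu` URIs, the `shelfLocation` property, and the identifier and box values are hypothetical. N-Triples is used only because it is simple to emit by hand; it is one RDF syntax among several and is not prescribed by the text.

```python
from dataclasses import dataclass

ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical property for a shelf location; not part of any real vocabulary.
SHELF_LOCATION = "http://example.edu/terms/shelfLocation"


@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to a URI."""
    value: str


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines.

    Only backslashes and double quotes are escaped, which is enough for
    the single-line literals used in this sketch.
    """
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            escaped = obj.value.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


# Hypothetical identifiers: the Resource Map and the Aggregation each get one.
REM = "http://example.edu/rem/widget-reports"
AGG = "http://example.edu/aggregations/widget-reports"
REPORT = "http://example.edu/records/physical-report"  # a paper document

resource_map = [
    (REM, ORE + "describes", AGG),
    (AGG, ORE + "aggregates", REPORT),
    (REPORT, DCTERMS + "identifier", Literal("MS042.001.003")),
    (REPORT, SHELF_LOCATION, Literal("Box 3, Folder 12")),
]

output = to_ntriples(resource_map)
print(output)
```

The report itself is never retrievable over the web; the statements about it simply give a user what is needed to request it through conventional reference services.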

OAI-ORE provides a very open structure with few requirements. This flexibility would allow archives to tailor their application of OAI-ORE to their specific description needs. Michael Witt’s Library Technology Report on OAI-ORE (2010) provides a comprehensive overview of ORE for information professionals with particular focus on its application in a library setting. He identifies several key strengths of OAI-ORE: the ability to uniquely identify aggregations; the ability to provide contextualizing metadata for aggregations; and the possibilities enabled by capturing this contextualizing information in machine readable and actionable form. It is this last strength that makes OAI-ORE particularly attractive for archival description. Witt states that a key challenge for libraries is liberating resources from the silos that hold them. “ORE presents the possibility of breaking down these silos by exposing the semantics of these resources and providing hooks to retrieve them without the need for a human being to read a webpage and click on a link.” OAI-ORE goes beyond the harvesting enabled by its sibling, OAI-PMH, to enable direct retrieval of resources. The "Reuse and Exchange" of OAI-ORE will help archivists get their collections more easily into the hands of users with the robust contextualizing metadata that is essential to enabling users to understand and use archival information.

6. In Practice: Implementing OAI-ORE as an Archival Description Tool

We do not currently use OAI-ORE in our descriptive practices at Tufts, but we are looking for opportunities to implement the standard, which would require upfront development work. Although the openness of OAI-ORE would allow archives to tailor it to their specific description needs, it is not reasonable or strategically savvy to expect each archives to implement OAI-ORE from scratch. To serve as a usable descriptive tool that a wide range of archives could implement, an OAI-ORE-based description standard would require the following resources:

  • Ontology: An ontology is a vocabulary for describing relationships and is needed to provide a framework for representing relationships within archival descriptions. While some relationships contained in generic ontologies could be applicable to archives, there are relationships and concepts within archival collections that warrant the creation of a domain-specific ontology for archives. Ontologies in closely related fields such as museums (CIDOC CRM 2010) and libraries already exist but do not meet the needs of the archives community. There are several well-documented methodologies for ontology development to draw on for this project. One of the simplest is Natalya Noy and Deborah McGuinness's Ontology Development 101 (2001). Jones, Bench-Capon, and Visser (1998) also outline approaches to the process. Generally, ontology development is an iterative process involving the following steps:
    1. Define domain and scope
    2. Evaluate existing ontologies for applicability
    3. Enumerate important terms/concepts
    4. Define classes and class hierarchy
    5. Define properties, or attributes, of classes
    6. Define the types and allowed values for properties
    Once this intellectual work is completed, the ontology could be encoded in an appropriate ontology language such as OWL (Web Ontology Language), a W3C standard.
  • Ontology validation tool: The validation tool would allow archives to check the implementation of the ontology in their OAI-ORE-based archival descriptive metadata.
  • Resource map models: An OAI-ORE resource map identifies the objects in an aggregation (collection or fonds in archival terminology) and defines the boundaries of this aggregation. These models would provide archives with guidelines for constructing resource maps.
  • XSLT for EAD to OAI-ORE and OAI-ORE to EAD: With many institutions managing their description in EAD, it would be essential to deliver crosswalks and a tool to transform EAD documents into OAI-ORE objects and vice versa. Delivering this ability would be essential to the feasibility of including OAI-ORE in the archivist’s arsenal of descriptive tools.
  • Best Practice Documentation/Guidelines: The considerable flexibility of OAI-ORE would leave archivists with a broad range of options for constructing OAI-ORE-based finding aids. Documentation and guidelines would provide archivists with guidance for efficiently creating and deploying effective OAI-ORE finding aids. One of the key components of these guidelines and documentation would be guidance for constructing resource maps. The resource map together with the ontology would form the core of an OAI-ORE-based finding aid.
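To make the ontology and validation-tool items above more concrete, the following Python sketch shows what steps 4 through 6 of ontology development and a very small validation check might look like. The class names, property names, and record identifiers are hypothetical and are not a proposed archival ontology; a real ontology would be encoded in a language such as OWL rather than in Python dictionaries.

```python
# Hypothetical classes (step 4) for an archival domain.
CLASSES = {"Record", "Agent", "Activity", "Collection"}

# Hypothetical properties (steps 5 and 6): name -> (domain class, range class).
PROPERTIES = {
    "createdBy": ("Record", "Agent"),
    "documents": ("Record", "Activity"),
    "partOf": ("Record", "Collection"),
}


def validate(statements, types):
    """Check (subject, property, object) statements against the ontology.

    `types` maps each identifier to its class. Returns a list of error
    messages; an empty list means every statement conforms.
    """
    errors = []
    for subject, prop, obj in statements:
        if prop not in PROPERTIES:
            errors.append(f"unknown property: {prop}")
            continue
        domain, range_ = PROPERTIES[prop]
        if types.get(subject) != domain:
            errors.append(f"{prop}: subject {subject} must be of class {domain}")
        if types.get(obj) != range_:
            errors.append(f"{prop}: object {obj} must be of class {range_}")
    return errors


types = {
    "adler-lecture-notes": "Record",
    "irene-adler": "Agent",
    "eas421": "Activity",
    "adler-papers": "Collection",
}

good = [
    ("adler-lecture-notes", "createdBy", "irene-adler"),
    ("adler-lecture-notes", "documents", "eas421"),
    ("adler-lecture-notes", "partOf", "adler-papers"),
]
bad = [("adler-lecture-notes", "createdBy", "eas421")]

print(validate(good, types))
print(validate(bad, types))
```

Note that the same record carries both a `createdBy` and a `documents` statement, which is the kind of dual documentary role a single hierarchical arrangement has trouble expressing.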

The development of these resources should be based on a set of description requirements for describing archival collections. These requirements should accurately convey the complexity and multi-faceted nature of archival records and manuscripts. This would help ensure that the OAI-ORE resources are able to describe archival holdings of various size, complexity, and format. These requirements should be informed by the literature on archival description and user studies.

Ideally the descriptive flexibility afforded by OAI-ORE would be complemented by data visualization tools. Implemented correctly, OAI-ORE would be used in concert with a data content standard such as DACS or ISAD(G) and ISAAR(CPF) that serves as the underlying syntax for the content of descriptive metadata (DACS 2007; ISAD(G) 2000; ISAAR(CPF) 2004). This layer of structural and descriptive metadata should be managed separately from a presentation layer that delivers information to end-users. This would give archives the flexibility to convey information about their collections or collection materials to their users through a range of tools from simple stylesheets to sophisticated visualization tools such as Tufts’ VUE (Visual Understanding Environment) (VUE 2011).

The history of the development and implementation of EAD illustrates the importance of managing data delivery as a separate task from data management. One of the difficulties with EAD is that it is a document-centric encoding standard. It was designed with a particular delivery strategy (legacy paper-based finding aids) in mind. In contrast to this assumption of a highly specific output format, the underlying data structure was made intentionally permissive and flexible in order to encourage adoption. The experience of implementing EAD in tools such as the Archivists’ Toolkit and Archon has shown how EAD can be unwieldy as a data-exchange format. Indeed, the upcoming revision of EAD will in part attempt to improve its functionality as a data-centric standard (TS-EAD 2010).

In pursuing an OAI-ORE-based description strategy at Tufts, we could continue to provide a traditional, linear, "collection guides" functionality if we encoded our finding aids using OAI-ORE. The Tufts University Digital Library currently displays HTML "collection guides" which are transformed from EAD finding aids encoded in XML. Our Fedora content model has a disseminator that displays the metadata for these collection guides, and offers the finding aid transformed by XSLT into chunks. Since it would be trivial to write a crosswalk which would convert an OAI-ORE collection description into a less informative EAD finding aid, we could produce instances of our finding aid in EAD and use our existing content model to continue to display those finding aids in our digital library. Additionally, we could produce a multitude of traditional-seeming finding aids in EAD by slicing and dicing the OAI-ORE encoded information along different lines. Although encoded as if they were traditional EAD documents, each of these finding aids would present meaningful arrangements rather than reproduce the physical arrangement of archival records.
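The crosswalk described above could be sketched as follows. The text envisions XSLT; this example substitutes Python's standard `xml.etree.ElementTree` module purely to keep the illustration self-contained. The element names (`ead`, `archdesc`, `did`, `unittitle`, `unitid`, `dsc`, `c`) come from EAD 2002, but the output omits required parts such as `eadheader` and would not validate, and the function name, titles, and URIs are hypothetical.

```python
import xml.etree.ElementTree as ET


def aggregation_to_ead(title, members):
    """Flatten an aggregation into a minimal, linear EAD-like document.

    `members` is a list of (uri, title) pairs. Relationships between
    members are discarded, which is why the resulting finding aid is
    less informative than the description it was derived from.
    """
    ead = ET.Element("ead")
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = title

    dsc = ET.SubElement(archdesc, "dsc")
    for uri, member_title in members:
        component = ET.SubElement(dsc, "c", level="item")
        component_did = ET.SubElement(component, "did")
        ET.SubElement(component_did, "unitid").text = uri
        ET.SubElement(component_did, "unittitle").text = member_title
    return ET.tostring(ead, encoding="unicode")


# Hypothetical members of the EAS 421 aggregation.
members = [
    ("http://example.edu/records/syllabus", "Course syllabus"),
    ("http://example.edu/records/adler-lecture-notes", "Lecture notes (Adler)"),
]
ead_xml = aggregation_to_ead("EAS 421: Senior Seminar in Widgets", members)
print(ead_xml)
```

Calling the same function with a different selection of members would yield a different traditional-seeming finding aid from the same underlying description.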

However, we could also provide additional rich and flexible visualizations of our collections using OAI-ORE finding aids. We would not be restricted to the old linear view inspired by the paper finding aid. For example, by leveraging the granular flexibility of OAI-ORE, we could provide a visualization of the portions of the five archival collections that document EAS 421, thereby producing a virtual collection centered on the course. As mentioned above, we could deliver this flexibly assembled archival description by having a Fedora disseminator automatically build a VUE map displaying the rich set of relationships in an OAI-ORE finding aid, allowing the end-user to see the relationships between different collections, record creators, records, business functions, and recordkeeping systems.
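The idea of assembling different views from one description can be shown with a small Python sketch that pivots the same set of statements along different relationships. It does not use Fedora or VUE; it only illustrates the kind of grouping a presentation layer might start from. The relation names and record labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical (record, relation, target) statements about EAS 421 records.
STATEMENTS = [
    ("syllabus", "documents", "EAS 421"),
    ("syllabus", "partOf", "Learning Management System records"),
    ("adler-lecture-notes", "documents", "EAS 421"),
    ("adler-lecture-notes", "createdBy", "Irene Adler"),
    ("adler-lecture-notes", "partOf", "Irene Adler papers"),
    ("pollifax-lecture-notes", "documents", "EAS 421"),
    ("pollifax-lecture-notes", "createdBy", "Emily Pollifax"),
    ("pollifax-lecture-notes", "partOf", "Emily Pollifax papers"),
]


def arrange_by(statements, relation):
    """Group records by the target of one relation, ignoring the others."""
    view = defaultdict(list)
    for record, rel, target in statements:
        if rel == relation:
            view[target].append(record)
    return dict(view)


by_course = arrange_by(STATEMENTS, "documents")    # the virtual collection
by_creator = arrange_by(STATEMENTS, "createdBy")   # a provenance-style view
by_collection = arrange_by(STATEMENTS, "partOf")   # the custodial view

print(by_course)
print(by_creator)
```

Each view is an arrangement by meaning rather than by shelf position, and none of them requires changing the underlying statements.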

7. Conclusion

A schema for representing finding aids in OAI-ORE would allow richer methods for modeling archival collection descriptions. In conjunction with an XSLT to create EAD from the OAI-ORE, archives could switch to modeling their collections in OAI-ORE. For all existing tools, a transformation to EAD would represent the data in a way that pre-existing collection guide tools expect. Augmenting existing collection description methods with OAI-ORE would revolutionize archives' ability to provide more nuanced, flexible, and accurate descriptions of their collections.

8. Notes

  1. Terry Cook presents a similar scenario with a CEO sending an electronic report to her managers in (Cook 1994).
  2. There is a sizable body of literature on the purpose and limitations of finding aids. Some key examples of this literature include (Hurley 1998; Hurley 2000; Light and Hyry 2002; Duff and Johnson 2003; Hostetter 2004). The Bentley Historical Library at the University of Michigan maintains a digital collection about the American intervention in northern Russia at the end of the First World War that explores new ways to formulate archival description. However, this project focuses on discovery and delivery tools rather than scalable description standards. See (BHL 2010; Krause and Yakel 2007; Yakel, Shaw, and Reynolds 2007).

9. References

  • Bearman, D. (1993) "Record-Keeping Systems". Archivaria, Vol. 36, 16-36
  • BHL (2010) "Polar Bear Expedition Digital Collections". Bentley Historical Library, University of Michigan, last checked December 3, 2010 http://polarbears.si.umich.edu/
  • CIDOC CRM (2010) "The CIDOC Conceptual Reference Model". International Council of Museums, last checked April 20, 2010 http://www.cidoc-crm.org/
  • Cook, T. (1994) "Electronic Records, Paper Minds: The Revolution in Information Management and Archives in the Post-Custodial and Post-Modernist Era". Archives and Manuscripts, Vol. 22, No. 2, 300-329
  • DACS (2007) Society of American Archivists. Describing Archives: A Content Standard (Chicago: Society of American Archivists)
  • Duff, W. and Johnson, C. (2003) "Where is the List with all the Names? Information Seeking Behavior of Genealogists". American Archivist, Vol. 66, No. 1, 79-95
  • EAD (2006) "Development of the Encoded Archival Description DTD (EAD Official Site, Library Of Congress)". The Library of Congress, last checked April 5, 2011 http://www.loc.gov/ead/eaddev.html
  • Hostetter, C. (2004) "Online Finding Aids: Are They Practical?". Journal of Archival Organization, Vol. 2, No. 1-2, 117-145
  • Hurley, C. (1998) "The Making and the Keeping of Records: (1) What Are Finding Aids For?". Archives and Manuscripts, Vol. 26, No. 1, 58-77
  • Hurley, C. (2000) "The Making and the Keeping of Records: (2) The Tyranny of Listing". Archives and Manuscripts, Vol. 28, No. 1, 8-23
  • ISAD(G) (2000) International Council on Archives. ISAD(G): General International Standard Archival Description. 2nd ed. (Ottawa: International Council on Archives)
  • ISAAR(CPF) (2004) International Council on Archives. ISAAR(CPF): International Standard Archival Authority Record For Corporate Bodies, Persons and Families. 2nd ed. (Canberra: International Council on Archives)
  • Jones, D., Bench-Capon, T., and Visser, P. (1998) "Methodologies For Ontology Development". CiteSeerx beta, last checked April 20, 2011 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.2437
  • Krause, M. and Yakel, E. (2007) "Interaction in Virtual Archives: The Polar Bear Expedition Digital Collections Next-Generation Finding Aid". American Archivist, Vol. 70, No. 2, 282-314
  • Light, M. and Hyry, T. (2002) "Colophons and Annotations: New Directions for the Finding Aid". American Archivist, Vol. 65, No. 2, 216-230
  • Noy, N. and McGuinness, D. (2001) "Ontology Development 101: A Guide to Creating Your First Ontology." Knowledge Systems Laboratory, last checked April 20, 2011 http://www.ksl.stanford.edu/KSL_Abstracts/KSL-01-05.html
  • OAI (2010) "Open Archives Initiative Protocol - Object Exchange and Reuse". Open Archives Initiative, last checked December 3, 2010 http://www.openarchives.org/ore/
  • OAI (2008) "ORE User Guide - Primer". Open Archives Initiative, last checked December 3, 2010 http://www.openarchives.org/ore/1.0/primer.html
  • Pearce-Moses, R. (2005) "Provenance". in Glossary of Archival and Records Terminology (Chicago: The Society of American Archivists, 2005), http://www2.archivists.org/glossary
  • Samuels, H. (1992) Varsity Letters: Documenting Modern Colleges and Universities (Metuchen, N.J.: Scarecrow Press)
  • TS-EAD (2010) "Technical Subcommittee on Encoded Archival Description (TS-EAD)." Society of American Archivists, last checked April 20, 2011
  • VUE (2011) "Visual Understanding Environment: Search, Organize, Present." Tufts University, last checked April 20, 2011 http://vue.tufts.edu/
  • Witt, M. (2010) "Object Reuse and Exchange (OAI-ORE)". Library Technology Reports, Vol. 46, No. 4
  • Yakel, E., Shaw, S., and Reynolds, P. (2007) "Creating the Next Generation of Archival Finding Aids". D-Lib Magazine, Vol. 13, No. 5/6, http://www.dlib.org/dlib/may07/yakel/05yakel.html