If you build it, will it fly?
Criteria for success in a digital repository
International collaborations have produced a standard describing the functions of a digital repository and the characteristics of one that can be trusted. These results provide an abstract frame of reference for evaluating such repositories, but meaningful evaluation requires that they be supplemented by empirical data on the purpose of each repository and the institutional, cultural and resource context in which it operates. Informed evaluation will consider how a repository balances the competing objectives of preservation and dissemination, whether it is defined primarily in terms of a community of producers or a community of users, and the extent to which it operates in isolation or in collaboration with other institutions.
What constitutes success in a digital repository can only be addressed in context: specifically, the purpose the repository serves and the environment in which it operates. No repository can be called successful in any meaningful sense unless it fulfills its purpose; criteria for success must therefore be derived from its statement of purpose. Similarly, the metrics for estimating success against those criteria must be formulated in light of the culture, constraints and opportunities of its environment.
The Open Archival Information System (OAIS) reference model provides an abstract description of the functions of any system used to preserve any type of information for any significant period of time, together with a detailed delineation of the information management required to ensure not only that the information survives but also that it can be accessed and correctly understood in the future. The model explicitly offers itself as at least a benchmark for evaluating digital repositories. One might assert that a basic purpose all digital repositories must achieve can be derived from the OAIS standard’s definition of an archival information system: “An archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.” (CCSDS, 2001) In this view, the implicit purpose of a digital repository is twofold: to preserve some information for some period of time, and to deliver products and services, derived from the preserved information, that satisfy the needs and/or desires of its designated user community. Following this train of thought, specific success criteria for a given repository could be articulated by describing the nature of the information objects it preserves, the requirements for their preservation, and the dissemination services it provides, as compared with the characteristics, needs, and desires of its designated user community. This approach is consistent with the RLG/OCLC report’s description of a trusted digital repository as “one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future.” [RLG/OCLC] One might then say that a digital repository is successful if it functions as an OAIS in a reliable and trustworthy manner.
However, while that approach is undoubtedly appropriate in particular cases, it is neither necessary nor sufficient as a general rule. It is not necessary because a digital repository is not necessarily an instance of an OAIS. The OAIS model defines three basic functions: ingest, preservation, and dissemination, and different repositories may attribute very different importance to them. Where a repository is responsible for fulfilling all three functions equally, emphasizing one function to the detriment of the others would be relevant to evaluation. In itself, however, such differentiation only characterizes a repository; it should not be used as a basis for evaluating success without first determining whether it is appropriate given the charge or charter under which the repository operates.
In different contexts, there can also be substantial variations in what information a repository should preserve, in what preservation entails, and in what it means to serve the needs of a designated community. Such variations should be factored into evaluation. Their significance is readily apparent even in a cursory view of the differences within and among major categories of repositories, such as libraries, corporate repositories, and government archives.
Libraries are organizations whose primary purpose is to deliver relevant information products – typically books and periodicals – to their users. While there is commonality in the goal of providing users with the information they seek, there is considerable diversity among libraries. In many libraries, the emphasis is on the currency of the collection: readers want new publications, not ones they have already read, or they seek current rather than outdated reference information. Such institutions may implement acquisition and retention policies that result in little need for long-term preservation. In contrast, a library operating under a legal deposit regime is required to provide secure, long-term access to deposited materials. Similarly, libraries which specialize in particular subjects may aim for comprehensive coverage in those areas and face a broad range of preservation requirements derived from materials spanning centuries. Obviously, success in these different cases will mean different things.
There is a huge difference between the service model of a public or educational library and that of a repository whose primary purpose is to protect the interests of its owner. Companies in the pharmaceutical industry, for example, are making substantial investments to preserve electronic laboratory notebooks, but the purpose of this preservation is not – as it might be in a library – to enable future research either in science or in the history of science. Rather, it is to enable the company to protect its intellectual property in potential patent litigation. Whether accomplished through enterprise content management, traditional records management, or some other method, the delivery service for such a repository is narrowly defined: to provide documentary evidence that the company’s lawyers could use to demonstrate that the company had made a certain discovery by a date certain. (Davies, 2005)
Contrast this with the purposes served by preserving digital information in other industries, such as aerospace. In aerospace and other heavy manufacturing industries, there is a long-term need to reuse product data for maintaining and adapting aircraft and other mechanical systems, which are often kept in operation for many decades. The product data can be reused to manufacture replacement parts or to modify an aircraft to accommodate new components, such as new navigation instruments, new cargo handling equipment, or new passenger seats. To be reused, the data will have to be fed into computer-assisted engineering and manufacturing systems in the future. No one knows what such systems will be like 20 or 30 years from now; therefore, no one knows what data formats they will require. Satisfying the needs of this designated community requires a different model of preservation. This need cannot be satisfied – as may be the case with library books and pharmaceutical notebooks – by preserving specific documents in their original form. Keeping product data for its intended reuse requires the ability to transform it into some unknown future format, and the success of the transformation cannot be measured by fidelity to the information object originally entrusted to the repository, but only by whether it enables production of a replacement part that exactly duplicates the original, or successful adaptation of the aircraft. [LOTAR]
A very different context is that of a government archives in a democracy. For institutions such as the National Archives and Records Administration (NARA), the legally designated community of users is anyone who has an interest in records of the government, or in the information they contain, for whatever reason. This clientele is very diverse, ranging from the government itself and other governments at national and lower levels, through academic scholars, law firms, other businesses, and TV and movie producers, to individuals doing family history and, in the case of electronic records, even individuals looking for records of their own lives. NARA supported the development of the OAIS standard from the beginning and required all companies that bid on the development of its Electronic Records Archives system to conform to this model in their designs. However, the diversity of NARA’s user communities inhibits tailoring the system to characteristics of the designated user community, such as knowledge level, as stipulated by the OAIS standard. The inability to tailor dissemination services is not a shortcoming, but a recognition of the basic requirement to serve a highly diverse clientele. In democratic nations, probably the most important purpose of a government archives is to enable citizens to determine what the government has done by reviewing records, which are the instruments and by-products of government activities. Such institutions must emphasize supply-side and archival functions to guarantee that adequate records of government are acquired and that they are preserved as authentic records, even if authentic versions of the records are less than optimal for users’ needs.
Government archives, thus, need to define their service model according to two criteria that are essentially agnostic of the characteristics of the ‘community’ of users; namely, the requirement for preserving and being able to provide authentic records and, within the constraints of this requirement, providing dissemination services to persons with a right of access.
Besides conformance to the OAIS model, the RLG/OCLC report, TRUSTED DIGITAL REPOSITORIES, identifies additional attributes a trusted repository should satisfy: administrative responsibility, organizational viability, financial sustainability, technological and procedural suitability, system security, and procedural accountability. The description of these attributes in TRUSTED DIGITAL REPOSITORIES is often phrased in terms of the behaviors of a repository that merits trust. One might then look for evidence of such behavior as a sign of success. [RLG/OCLC]
Accordingly, the follow-up project, the RLG/NARA Task Force on Digital Repository Certification, reports, “The goal is to develop a single process that can apply to all types and development levels of digital repositories. It is likely that the eventual certification will involve levels of certification which signify an organization’s readiness and development level. Clearly the goal would be to have all repositories meet the highest level of standards and be certified as fully trustworthy, but it is assumed that some organizations may step through the levels as they advance in their development phases.” [RLG-NARA] One might assume that the levels of certification correspond to levels of success. However, such considerations are only intended to inform the judgement of whether a repository is to be trusted. Trustworthiness may be regarded as a precondition of success, but it is not clear evidence of it. Success requires translating the ‘-abilities’ of a trusted repository into actions, and then evaluating the results and outcomes of those actions according to criteria appropriate for the individual repository.
2. FRAMEWORK FOR EVALUATION
The need to take into account the specific purposes and environments of individual repositories across a wide range of possibilities does not mean we cannot construct a general framework addressing success. A framework for organizing information needed to evaluate the success of digital repositories can be articulated along five dimensions: service, orientation, coverage, collaboration, and state. While these dimensions are not entirely independent of each other, breaking them out provides a broader perspective and yields greater depth in understanding how digital repositories can succeed or fail.
An obvious basis for evaluating success of a digital repository is customer service. One might easily identify the customers as the individuals who use materials in the repository; i.e., the members of the ‘designated community’ in the OAIS model. But there are alternatives for determining whom a repository serves. Blythe and Chachra point out:
“A ... tension is mounting today in the “institutional repository” movement. The question is: ‘Should digital object repositories be individual-focused or institution-focused?’ And like the centralized-versus-distributed debate of thirty years ago, there is a developing realization today that although institutional repositories must have institutional organization, coordination, and investment, they will be successful only when they achieve broad and voluntary participation by individuals in the communities they serve.” (Blythe, 2005)
However, in contrast to the basic distinction in the OAIS model between producers and consumers, for many repositories both are ‘customers’; that is, parties to whom the repository promotes and provides its services. A repository needs to serve both its producers and its consumers well. Unless it succeeds in getting producers to deposit materials, there will be little to attract consumers. Conversely, if it does not attract consumers, it will not serve producers well. Thus, to avoid confusion and, more importantly, to promote success, it would be better to speak not of “customers” but of “service groups”; that is, parties directly served by a repository in carrying out its mission. The dichotomy of producer and consumer is too coarse for evaluating the success of a repository in delivering services.
The same individual may require services in more than one role, and even in a single role may value services differently in different contexts. In university digital repositories, for example, a professor as author is a “producer,” as defined in the OAIS model. In this context, the services the repository provides for disseminating scholarly or creative products under appropriate controls would be most relevant to the professor. As educator, the professor may also be a provider of syllabi and other materials to support courses, and may request acquisition of third-party materials, but for the use of students rather than for the professor’s own use. In this role, services that facilitate the development and enhance the value of course materials are more important. The professor as researcher will be a consumer of repository holdings, and in that role will value the depth and breadth of relevant collections and the richness of discovery tools and delivery options. But even defining the different roles an individual may play with respect to a digital repository may be insufficient to appreciate the needs of the groups it serves. An in-depth study of faculty use, or more accurately lack of use, of the institutional repository at the University of Rochester found:
“When it comes to research, a faculty member's strongest ties are usually with a small circle of colleagues from around the world who share an interest in the same field of research.... It is with these colleagues, many of them at other institutions, that researchers most want to communicate and share their work.... In the absence of a strong connection that would naturally bring these documents together into a collection that other scholars would look for, find, and use, there is no compelling reason for the authors to make the submission.” (Foster, 2005)
Moreover, not all repositories are institutional, and not all directly serve individuals. Some repositories, like JSTOR and ARTstor, serve institutions as their primary customers. JSTOR is a not-for-profit organization maintaining, and providing online access to, a trusted archive of important scholarly journals. [JSTOR] ARTstor is a nonprofit organization developing a digital library of images and related information for researchers, teachers, curators, and students. [ARTstor] While both provide individuals with online access, only individuals authorized by participating institutions, which pay license fees, may access the collections. Lessons learned in the development of ARTstor illustrate the importance of recognizing the different interests of various service groups, including content owners, end users and other repositories. ARTstor’s content owners include not only artists, photographers and other producers, but also several types of repositories, including archives, libraries and museums, which have different standards, processes, and expectations. ARTstor’s end users are typically found in such institutions, but they are not themselves archivists, librarians or curators; rather, they are the end users of ARTstor’s clients. ARTstor found that there are significant trade-offs between building “user-driven” collections and accommodating client repositories’ interests in interoperability with their own collections and services. (Marmor, 2006)
Even for institutional repositories, success may very well hinge not only on how the repository exercises its own functions, but also on how it contributes to other activities within the institution: “The "growth industry" for IRs may very well depend upon identifying and implementing creative ways for researchers, students, and other campus professionals to use the scholarly information these repositories contain.” (Walters, 2006)
Clearly, different service groups have different needs. While specific criteria for success will differ according to the various services provided and the groups served, general criteria for evaluating service are:
One can define a spectrum of purposes ranging from prospective to retrospective. In a retrospective repository the emphasis is on preservation of assets, while a repository with a prospective orientation will optimize its ability to satisfy the demands of a user community. If the primary purpose is retrospective, to ensure that assets existing at any point in time are preserved intact for future times, then access to those assets is a by-product. Conversely, if the primary purpose is to support the needs and demands of a user community for information, long-term preservation of information assets may not even be necessary, and at most it will be instrumental to the primary purpose. Preservation in the strict sense may not be necessary where it is important to repurpose or adapt information to capitalize on opportunities offered by new delivery technologies. Obviously, even in repositories which give high priority to service to users, there will often be an instrumental need for preservation, simply because assets that no longer exist cannot be delivered. In many cases, it will also be necessary to preserve information about the context in which information was originally created and used, to enable appropriate interpretation of repurposed or derived products.
Criteria for success can be articulated relative to orientation.
A third dimension for evaluating the success of a digital repository is how well it covers the universe of assets it should or might hold. Coverage should include both acquisition of assets and execution of functions against those assets. The scope of a repository’s holdings may be defined according to several different criteria: institutional, subject, geographic, or personal. Clifford Lynch defines an institutional repository (IR) as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.” (Lynch, 2003) University digital libraries are typically institutional repositories. A subject-oriented repository is one which collects materials on a given subject, typically from a large number of sources, including institutional repositories. Professional societies, special-purpose non-profits such as ARTstor and JSTOR, and commercial enterprises are common creators of subject-oriented repositories. Geographic repositories aggregate information about specific places or areas. They may contain very homogeneous data; for example, the U.S. Census Bureau’s State Data Centers maintain socio-economic data from the bureau about each of their states. [SDC] But geographically oriented repositories may also assemble highly varied information.
For example, the Online Archive of California (OAC) brings together historical materials from a variety of California institutions, including museums, historical societies, and archives. [OAC] Personal digital repositories were defined from an architectural viewpoint by Robert Wilensky: “Personal libraries allow users to create collections of distributed documents with the functionality and discipline of full-fledged library services, yet at the same time, being lightweight, at cost comparable to managing resources in a file system or at a web site.” (Wilensky, 2001) If one distinguishes “collections,” the sets of information assets assembled by an individual, from “repositories,” the mechanisms for hosting the assets, a personal digital library may combine materials housed in many different repositories; it is seen as “a container to collect and synthesize data and information in the way that the individual needs and uses that data and information.” (Gandel, 2004) However, a personal library may also have a public face, where the creator of the collection exposes all or part of it to collaborators or even to the general public (Beagrie, 2005; Borgman, 2003).
Relevant considerations for evaluation of coverage include:
In estimating the success of a repository, it is necessary to consider the environment in which it operates; in particular, whether it can achieve its purpose operating in isolation or whether it collaborates with other organizations in order to succeed. This ‘isolated’ versus ‘collaborative’ spectrum defines a fourth category of criteria for evaluating the success of a digital repository. A repository may be said to operate in isolation if internally it fulfills all the functions described in the OAIS model, from the receipt of Submission Information Packages (SIPs) to the export of Dissemination Information Packages (DIPs). A repository may be said to be collaborative if it relies on any other institution(s) to fulfill any of its core functions. However, one needs to distinguish arrangements where the repository contracts with an outside service bureau for one or more services from those which require true collaboration. A repository is responsible for proper management of any contracts and, presumably, gets value from each contract in direct proportion to what it pays. Therefore, contracts for services would fall towards the ‘isolated’ end of the collaborative spectrum. A more collaborative arrangement would exist where separate institutions independently execute missions or pursue goals which they recognize as complementary and decide to work together to leverage each other’s strengths, or more broadly where they form or join consortia for such purposes.
Over the entire spectrum from isolation to collaboration, one needs to evaluate whether a repository recognizes and exploits possibilities for collaboration. Some repositories may be constrained legally or by their institutional setting from entering into collaborative relationships. In the face of such constraints, one could not fault a repository for not exploiting collaborative possibilities. Absent external constraints on its options, it is legitimate to assess whether a repository which operates in isolation could improve its performance through collaboration. Similarly, one should consider whether a repository which does engage in some collaboration might have other opportunities for improvement through additional collaborations.
For actual collaborations, a relevant criterion for success is whether a collaborative relationship actually improves the repository’s performance over what it could achieve acting in isolation. Evaluation according to this criterion should take into account the nature of the collaboration. An example is the relationship between the Florida Center for Library Automation (FCLA) and its client libraries in publicly funded colleges and universities in Florida. FCLA provides the repository for digital versions of library-owned collections. It relieves individual schools of the need to preserve the digital materials, but does not provide direct services to library users in any of these schools. [FCLA] A success model based on the OAIS standard would ask how well a repository fulfilled all OAIS functions. However, this would be inappropriate for institutions participating in the FCLA program. One should not, for example, fault FCLA for not providing any direct services to library users. In effect, FCLA and its client libraries have split the OAIS functions, with FCLA providing digital preservation and the libraries providing service to end users. This arrangement is asymmetric, but there are other permutations that could be described as collaborative, including peer-to-peer relationships, collaboration on voluntary standards for repository functions, sharing of best practices and lessons learned, and perhaps agreements on coverage.
The LOCKSS (Lots Of Copies Keep Stuff Safe) collaboration exemplifies a peer-to-peer relationship. Each library in this collaboration performs the full range of functions needed in a digital repository, or at least any variations in those functions are at the discretion of each institution, with the sole exception of preservation of digital materials, which is carried out collaboratively. For preservation, each library is responsible for maintaining the authenticity, integrity and availability of its digital collection so that, if needed, other partners can obtain copies from it. (Maniatis, 2005)
The criterion of whether collaboration improves performance can be applied to both the FCLA and LOCKSS cases, but its application needs to be tailored to the context. In Florida, the question for each participating library is whether the centralization of repository and preservation services in FCLA reduces its costs and/or provides it with access to greater or better resources for these functions than it could achieve in isolation. In the LOCKSS alliance, use of the approach probably entails additional expenses over operating in isolation, because each member needs to acquire and operate a LOCKSS appliance, which enables coordination and sharing of copies on an as-needed basis. The question for a library participating in LOCKSS is whether the possibility of recovering lost or damaged items from other partners is worth the additional investment.
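The recovery capability that a LOCKSS member pays for can be made concrete with a minimal sketch. It assumes a deliberately simplified model that is not the actual LOCKSS protocol (Maniatis, 2005, describe a considerably more elaborate polling scheme): each repository is represented as a Python dictionary mapping a hypothetical item identifier to the stored bytes, fixity is checked with SHA-256 digests, and the digest held by a strict majority of partners is treated as authentic. The function name `audit_and_repair`, its status strings and the sample identifiers are illustrative assumptions only.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 fixity value for one preserved item."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(local: dict[str, bytes],
                     partners: list[dict[str, bytes]],
                     item_id: str) -> str:
    """Audit one item in a local repository against partner copies.

    Hypothetical, simplified model (not the real LOCKSS protocol): each
    repository is a dict from item identifier to stored bytes. Returns
    "ok" if the local copy agrees with the partner majority, "repaired"
    if a missing or damaged local copy was replaced from a partner, and
    "undetermined" if partners hold no copy or reach no strict majority.
    """
    partner_copies = [p[item_id] for p in partners if item_id in p]
    if not partner_copies:
        return "undetermined"  # nothing to compare against

    # Treat the digest held by a strict majority of partners as authentic.
    votes = Counter(digest(copy) for copy in partner_copies)
    consensus, count = votes.most_common(1)[0]
    if count <= len(partner_copies) // 2:
        return "undetermined"  # never overwrite without a clear majority

    if item_id in local and digest(local[item_id]) == consensus:
        return "ok"

    # Missing or damaged locally: copy in a partner version matching consensus.
    local[item_id] = next(c for c in partner_copies if digest(c) == consensus)
    return "repaired"


if __name__ == "__main__":
    good = b"Journal issue, April 1999"
    partners = [{"issue-1999-04": good} for _ in range(3)]
    damaged_library = {"issue-1999-04": b"Journal issue, Apr?l 1999"}
    print(audit_and_repair(damaged_library, partners, "issue-1999-04"))  # repaired
    print(audit_and_repair(damaged_library, partners, "issue-1999-04"))  # ok
```

Refusing to overwrite a local copy when partners disagree reflects the point that recovery is only as trustworthy as each partner’s own maintenance of authenticity and integrity; in this sketch, the value of membership grows with the number of partners holding sound copies.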
Realistically, any evaluation of the success of a digital repository must take into account its state of development. The development of a repository can range from just beginning to fully mature, but evaluation should also take into consideration the possibility that a repository may be more accurately described as reinventing itself. One should evaluate a new entity, such as ARTstor, differently from an established library or archives which is trying to extend its collections and services from hard copy to the digital realm. Even in long-established institutions, digital repositories are in relatively early stages of development.
“Libraries and archives are just beginning to grapple with the problem of capturing, managing, distributing, and preserving the digital material that their constituents are producing, and to effectively deal with this content requires not only new technological infrastructure but new policies and procedures, new core competencies of staff, and new business lines and cost models – in other words, significant transformation of the current models of institutional scholarly content management.” (Smith, 2005)
The consequences of this transitional state should be taken into account in applying each of the other four success factors.
Abstract models of what a digital repository should be, what functions it should fulfill, and whether it merits trust need to situate digital repositories along several axes:
Answers to these questions describe the space in which a repository operates, and allow us to contextualize criteria for evaluating how well a repository achieves its objectives, given its resources and constraints.
Beagrie, Neil (2005). Plenty of Room at the Bottom? Personal Digital Libraries and Collections. D-Lib Magazine, 11(6). <http://www.dlib.org/dlib/june05/beagrie/06beagrie.html>
Blythe, Erv and Chachra, Vinod (2005). The Value Proposition in Institutional Repositories. EDUCAUSE Review Articles. <http://www.educause.edu/LibraryDetailPage/666?ID=ERM0559>
Borgman, Christine L. (2003). Personal Digital Libraries: Creating Individual Spaces for Innovation. NSF Workshop on Post-Digital Libraries Initiative Directions. <http://www.sis.pitt.edu/%7Edlwkshop/paper_borgman.html>
Consultative Committee for Space Data Systems (CCSDS) (2001). Reference Model for an Open Archival Information System (OAIS). July 2001. <www.ccsds.org/documents/pdf/CCSDS-650.0-R-5>
Davies, Antony N. and McDonough, Ann (2005). Ensuring the Integrity of Electronic Laboratory Notebook Records. Pharmaceutical Technology.
[FCLA] Florida Center for Library Automation. <www.fcla.edu/FCLAinfo/aboutinfo.html>
Foster, Nancy Fried and Gibbons, Susan (2005). Understanding Faculty to Improve Content Recruitment for Institutional Repositories. D-Lib Magazine, 11(1). <http://www.dlib.org/dlib/january05/foster/01foster.html>
Gandel, P., Katz, R. and Metros, S. (2004). The Weariness of the Flesh: Reflections on the Life of the Mind in an Era of Abundance. EDUCAUSE Review, 39(2), pp. 40–51. <http://www.educause.edu/apps/er/erm04/erm042.asp>
[LOTAR] Project Group “LOTAR” (2002). White Paper for Long Term Archiving and Retrieval of Product Data within the Aerospace Industry (LOTAR): Technical Aspects of an Approach for Application. Version 1.0. <www.prostep.org/file/17291.WP_LOTAR>
Lynch, Clifford A. (2003). Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age. ARL Bimonthly Report, no. 226 (February 2003). <http://www.arl.org/newsltr/226/ir.html>
Maniatis, P., Roussopoulos, M., Giuli, T. J., Rosenthal, D., Baker, M. and Muliadi, Y. (2005). LOCKSS: A Peer-to-Peer Digital Preservation System. ACM Transactions on Computer Systems, 23(1), pp. 2–50. <www.eecs.harvard.edu/~mema/publications/TOCS2005.pdf>
Marmor, Max (2006). Six Lessons Learned: An (Early) ARTstor Retrospective. RLG DigiNews, 10(2). <http://www.rlg.org/en/page.php?Page_ID=20916>
[RLG/OCLC] Research Libraries Group (2002). Trusted Digital Repositories: Attributes and Responsibilities. An RLG/OCLC Report. RLG, Mountain View, CA. May 2002. <www.rlg.org/en/pdfs/repositories.pdf>
[RLG-NARA] RLG-NARA Task Force on Digital Repository Certification (2005). An Audit Checklist for the Certification of Trusted Digital Repositories. Draft for Public Comment. RLG, Mountain View, CA. August 2005. <http://www.rlg.org/en/pdfs/rlgnara-repositorieschecklist.pdf>
Smith, MacKenzie (2005). Exploring Variety in Digital Collections and the Implications for Digital Preservation. Library Trends, 54(1), Summer 2005, pp. 6–15.
Walters, Tyler O. (2006). Strategies and Frameworks for Institutional Repositories and the New Support Infrastructure for Scholarly Communications. D-Lib Magazine, 12(10). <http://www.dlib.org/dlib/october06/walters/10walters.html>
Wilensky, Robert (2001). Personal Libraries: Collection Management as a Tool for Lightweight Personal and Group Document Management. Joint Conference on Digital Libraries 2001.