The nestor Catalogue of Criteria for Trusted Digital Repository Evaluation and Certification
Susanne Dobratz
Humboldt-University Berlin, University Library, 10099 Berlin, Germany
Tel: ++49 30 2093 7070
Email: dobratz [AT] cms [DOT] hu-berlin [DOT] de
Astrid Schoger
Bavarian State Library, Digital Library, 80328, München, Germany
Tel: ++49 89 28638 2600
Email: astrid [DOT] schoger [AT] bsb-muenchen [DOT] de
Stefan Strathmann
Göttingen State and University Library, Papendiek 14, 37073, Göttingen, Germany
Tel: ++49 551 39 78 06
Email: strathmann [AT] sub [DOT] uni-goettingen [DOT] de

Abstract

This paper describes the general approach that nestor - the German "Network of Expertise in Long-Term Storage of Digital Resources" - has taken in designing a catalogue of criteria for trusted digital repositories for long-term preservation. Further developments are intended to lead to the implementation of evaluation schemas and a formal certification process for trusted digital repositories.

Keywords

digital libraries, long-term preservation, certification, trustworthiness, digital repositories

1. Introduction

One of the central challenges to long-term preservation in a digital repository is the ability to guarantee the interpretability of digital objects for users across time. This is endangered by the aging of storage media, the obsolescence of the underlying system and application software, and changes in the technical and organizational infrastructure. Malicious or erroneous human actions also put digital objects at risk. Trustworthy long-term preservation in digital repositories requires technical as well as organizational provisions. A trustworthy digital repository for long-term preservation has to operate according to the repository's aims and specifications.

As the long-term preservation of digital objects is, globally speaking, in its infancy and little experience has been amassed to date, trustworthiness is not intended to "give a declaration of guarantee for five or fifty years, but to enable institutions to develop strategies in order to cope with the continuous change of information technology in a responsible way" [1].

2. Background

In December 2004, the German nestor project (Network of Expertise in Long-term STOrage of Digital Resources - A Digital Preservation Initiative for Germany) set up the nestor Working Group on Trusted Digital Repository Certification, which consists of representatives from national, state and university libraries, federal and state archives, museums, data centers, publishers, and certification experts from Germany and Austria. Taking into account the work of the Digital Repository Certification Task Force of the Research Libraries Group (RLG) in 2002 [4], the nestor group has focused on identifying features and values that may be relevant in evaluating digital object repositories (those that already exist as well as those which are just emerging or, as yet, are only planned). The aim is to form a web of trustworthiness in which digital repositories can function as long-term digital archives within various environments: the library community, the archival world (in a traditional sense), the museum community, and other data producers such as government institutions, world data centers, and publishing houses.

In January 2005, the nestor group carried out a small-scale survey on recent standards and usage within digital repositories. It was followed by a public workshop in June 2005 and an expert round table in March 2006. The first major report in the form of a criteria catalogue was published in June 2006 [2].

3. The Target Group for the nestor Catalogue of Criteria

This catalogue primarily addresses cultural heritage organizations - archives, libraries, and museums - and is designed as guidance for the planning and setup of long-term digital repositories. Furthermore, this catalogue is intended as an orientation guide for commercial and non-commercial service providers, software developers, and third party vendors.

Although the nestor catalogue is focused on application in Germany, and it is crucial to analyze generally accepted criteria with regard to the situation in Germany, it must be discussed internationally and should adhere to international standards. In evaluating repositories, various components must be considered, such as specific legal constraints, the setup of public institutions (financially and with respect to human resources), national organizational decisions, and the status of development in Germany as a whole.

Potential interest groups for trustworthiness are:

  • repository users who want to access trustworthy information - today and in the future,
  • data producers and content providers for whom certification provides a means of quality assurance when choosing potential service providers,
  • resource allocators, funding agencies and other institutions that need to make funding and granting decisions, and
  • long-term digital repositories that want to gain trustworthiness and demonstrate this to the public either to fulfill legal requirements or to survive in the market.

4. Coaching - Self-Audit - Certification

Currently, no method has been developed for the formal certification of long-term digital repositories according to this catalogue. For many of the abstract criteria expressed in the catalogue, it is not yet possible to define accepted standards on which auditing processes could be based. Therefore, nestor has for the moment focused on presenting the paper as a set of guidelines for setting up a trusted digital repository. We are convinced that this will be helpful for many institutions and will stimulate the development of trusted digital repositories. The catalogue can be used as an instrument for self-evaluation at all steps of development, from the concept and specification to implementation.

We regard that as the first step. As a second step, we intend to participate in a national and international standardization process via the German Standardization Organization (DIN) and the International Organization for Standardization (ISO) and to establish a formal certification process, in which the catalogue will function as an auditing tool.

Certification supports repositories that need to provide objective evidence, and it encourages competition even in the public sector. Competition arises in those fields where no formal or legal requirements exist to deliver digital materials to a particular long-term repository. Users will then decide independently where their digital materials will be archived, basing the decision on the services, quality and price offered. In such a scenario, certification provides a quality label for the repository and therefore supports the quality management and assurance of public administration. Whenever data have to be archived, certification can be very important.

5. Concepts Central to the Catalogue of Criteria and the Evaluation of Trusted Repositories

5.1 Trustworthiness

Many digital objects are valuable assets, which can be endangered by decay or loss of integrity and authenticity. Trustworthiness (German: Vertrauenswürdigkeit; see discussion in the DOMEA-concept [8]) of a system means that it operates according to its objectives and specifications (it does exactly what it claims to do). From an information technology (IT) security perspective, integrity, authenticity, confidentiality and availability are important building blocks of trustworthy digital preservation repositories. Integrity refers to the completeness of repository objects and the exclusion of unintended modifications to them. Unintended modifications could arise from malicious or erroneous human behavior, or from technical imperfection, damage, or loss of technical infrastructure. Authenticity here means that the object actually contains what it claims to contain. This is provided by documentation of all changes to the object. Availability is a guarantee (1) of access to the repository by potential users and (2) that the objects within the repository are interpretable. Availability of objects is a central objective, which must be fulfilled in relation to the designated community and its requirements. Confidentiality means that information objects can only be accessed by permitted users.
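To make the distinction between integrity and authenticity more concrete, the following is a minimal sketch in Python, not part of the nestor catalogue. It assumes that integrity is monitored by comparing a stored SHA-256 checksum with a freshly computed one, and that authenticity is supported by logging every intended change together with its reason. The class `StoredObject` and its field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class StoredObject:
    """Hypothetical repository record: content, recorded checksum, change log."""
    identifier: str
    content: bytes
    checksum: str = ""
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Record the checksum at ingest if none was supplied.
        if not self.checksum:
            self.checksum = sha256_of(self.content)
            self.history.append(("ingest", self.checksum))

    def is_intact(self) -> bool:
        """Integrity: the content still matches the recorded checksum."""
        return sha256_of(self.content) == self.checksum

    def apply_documented_change(self, new_content: bytes, reason: str) -> None:
        """Authenticity: an intended change is logged together with its reason."""
        self.content = new_content
        self.checksum = sha256_of(new_content)
        self.history.append((reason, self.checksum))


obj = StoredObject("obj-1", b"original bitstream")
assert obj.is_intact()

# Simulate an unintended modification that bypasses the documented path.
obj.content = b"silently altered bitstream"
assert not obj.is_intact()
```

A checksum comparison of this kind only detects that a bitstream has changed; whether the object remains interpretable for the designated community is a separate question that it does not address.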

There is a wide range of preservation repositories that exist or are under development: from national and state libraries and archives with deposit laws; to media centres having to preserve e-learning applications; to archives for smaller institutions; to world data centres in charge of "raw" data. For more examples see [2].

5.2 Steps towards a Trusted Digital Repository

A long-term digital repository is a complex interrelated system. Implementation of the individual criteria must always be seen in the light of the objectives of the overall system. The implementation of the long-term digital repository as well as the implementation of the individual criteria is executed as a multi-stage process with the following steps:

  1. Conception
  2. Planning and Specification
  3. Realisation and Implementation
  4. Evaluation

Because preservation is a process, these steps cannot be taken as a fixed model. Instead, the steps should be repeated throughout the development and management of digital repositories, when necessary.

The development itself is controlled and monitored by quality management. Quality management [10] defines the quality goals of the long-term digital repository. This includes a list of aims and responsibilities that allow for the definition and monitoring of an appropriate process structure. The quality management component defines all processes and their interdependencies, and verifies that responsibilities are assigned. This also applies to processes external to the organization, e.g. processes in connection with the submission of digital materials. Quality management provides procedures for documentation. The long-term digital repository defines rules for completeness, correctness, currency, understandability, and availability of the documentation, implements those rules, and monitors adherence to them. The quality management component enables the long-term digital repository to adequately respond to substantial changes.

6. Basic Principles for the Derivation of Criteria

6.1 Abstraction

The catalogue's overall aim is to introduce stable criteria for a wide variety of long-term digital repositories and to maintain the criteria over a long period. For this reason, the catalogue criteria have been formulated at a very abstract level. They are enriched by detailed explanations and concrete examples. The latter conform to the current state-of-the-art in terms of technology and organisation. In some cases, they only make sense within the context of a very special preservation task.

6.2 Accordance with OAIS Terminology

The Reference Model for an Open Archival Information System (OAIS) [3] serves - where possible - as the basis for terminology and structure of the catalogue. The OAIS is used to describe core processes, from ingest via archival storage to access. The OAIS also helps to describe the life cycle of digital objects within the repository. The following information packages have been considered: Submission Information Package (SIP) for ingest, Archival Information Package (AIP) for archival storage, and Dissemination Information Package (DIP) for access. The term digital object is regarded as defined in the OAIS information model.
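The life cycle from SIP via AIP to DIP can be illustrated with a small Python sketch. This is a simplification under stated assumptions and not an OAIS-conformant data model: the class fields, the functions `ingest` and `disseminate`, and the convention that metadata keys starting with `internal_` are withheld from users are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SIP:
    """Submission Information Package as delivered by a producer."""
    producer: str
    files: dict       # file name -> content bytes
    metadata: dict


@dataclass(frozen=True)
class AIP:
    """Archival Information Package held in archival storage."""
    identifier: str
    files: dict
    metadata: dict
    provenance: tuple  # documented events, oldest first


@dataclass(frozen=True)
class DIP:
    """Dissemination Information Package delivered to a user."""
    identifier: str
    files: dict
    metadata: dict


def ingest(sip: SIP, identifier: str) -> AIP:
    """Transform a SIP into an AIP and document where it came from."""
    provenance = (f"ingested from producer {sip.producer}",)
    return AIP(identifier, dict(sip.files), dict(sip.metadata), provenance)


def disseminate(aip: AIP, wanted: list) -> DIP:
    """Transform an AIP into a DIP containing only the requested files."""
    files = {name: aip.files[name] for name in wanted if name in aip.files}
    # Assumption: keys prefixed with "internal_" are not shown to users.
    public = {k: v for k, v in aip.metadata.items()
              if not k.startswith("internal_")}
    return DIP(aip.identifier, files, public)


sip = SIP("University Press",
          {"thesis.pdf": b"%PDF-1.4 ...", "notes.txt": b"draft"},
          {"title": "A thesis", "internal_contact": "n/a"})
aip = ingest(sip, "urn:example:0001")
dip = disseminate(aip, ["thesis.pdf"])
```

Keeping the three package types separate makes explicit that what a producer submits, what the repository stores, and what a user receives need not be identical.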

7. Basic Principles for the Application of the Criteria

7.1 Documentation

The goals, concepts, specifications and implementation of a long-term digital repository should be documented adequately. The documentation demonstrates the development status internally and externally. Early evaluation based on documentation may also prevent mistakes and inappropriate implementations. Adequate documentation can help to prove the completeness of the design and architecture of the long-term digital repository at all steps. In addition, quality and security standards require adequate documentation [1].

7.2 Transparency

Transparency is achieved by publishing appropriate parts of the documentation, which allows users and partners to gauge the degree of trustworthiness for themselves. Producers and suppliers are given the opportunity to assess to whom they wish to entrust their digital objects. Internal transparency ensures that any measures can be traced, and it provides documentation of digital repository quality to operators, backers, management and employees. Parts of the documentation which are not suitable for the general public (e.g. company secrets, security-related information) can be restricted to a specified circle (e.g. certification agency). Transparency establishes trust, because it allows interested parties a direct assessment of the quality of the long-term digital repository.

7.3 Adequacy

According to the principle of adequacy, absolute standards cannot be given. Instead, evaluation is based on the objectives and tasks of the long-term digital repository in question. The criteria have to be seen within the context of the special archiving tasks of the long-term digital repository. Some criteria may therefore prove irrelevant in certain cases. Depending on the objectives and tasks of the long-term digital repository, the required degree of fulfilment for a particular criterion may also differ.

7.4 Measurability

In some cases - especially regarding long-term aspects - there are no objectively assessable (measurable) features. In such cases we must rely on indicators showing the degree of trustworthiness. As the fulfillment of a criterion always depends on the designated community, it is not possible to define "hard" criteria for some of them. For example, how can one measure what adequate metadata is?

Transparency also makes the indicators accessible for evaluation.

8. A Metric for Certification Criteria

Three examples of different approaches currently in use in Germany are presented. The Certificate for Document and Publication Services issued by the German Initiative for Networked Information (Deutsche Initiative für Netzwerkinformation: DINI) distinguishes between minimum requirements and recommendations [6]. The DOMEA concept [7], used in the archives domain, deals with requirement groups. Each of the basic and specific requirements can be rated in a range from 0 to 4 points. Within each group, a minimum number of points must be achieved. The IT Grundschutzhandbuch (IT Basic Protection Manual) [9], published by the Federal Office for Information Security, uses an implementation status for each measure.

Through several discussions, the nestor group came to the conclusion that a weighting of the different criteria should be avoided, since this is already implicitly included in the principle of adequacy. One could demand that all criteria of the nestor catalogue be fulfilled up to a certain level. Criteria that allow exceptions have to be marked and justified explicitly, whereupon the equivalence of alternatives has to be proven.
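One possible reading of this unweighted approach is sketched below in Python. It is an illustration only, since nestor has not defined a formal metric: the 0 to 4 scale is borrowed from the DOMEA example above, and the names `Assessment` and `evaluate` as well as the required level are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical self-audit result for one criterion."""
    criterion: str            # e.g. "1.1"
    level: int = 0            # achieved degree of fulfilment, assumed scale 0-4
    exception: bool = False   # True if the criterion is claimed not to apply
    justification: str = ""   # required whenever an exception is claimed


def evaluate(assessments, required_level):
    """Return (passed, problems); no criterion is weighted against another."""
    problems = []
    for a in assessments:
        if a.exception:
            # Exceptions must be marked and justified explicitly.
            if not a.justification.strip():
                problems.append(f"{a.criterion}: exception without justification")
            continue
        if a.level < required_level:
            problems.append(
                f"{a.criterion}: level {a.level} below required {required_level}")
    return (not problems, problems)


audit = [
    Assessment("1.1", level=3),
    Assessment("4.1", level=1),
    Assessment("4.5", exception=True),
]
passed, problems = evaluate(audit, required_level=2)
for line in problems:
    print(line)
```

Because every applicable criterion must reach the required level, a high rating on one criterion cannot compensate for a low rating on another, which is the practical consequence of avoiding weights.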

9. The Catalogue

Based on the initial nestor survey and similar to the approach taken by the Certification of Trusted Repositories Task Force by the National Archives and the Research Libraries Group [5], the nestor working group used abstract criteria in the main catalogue instead of asking very detailed and specific questions (e.g. which metadata is used). The nestor catalogue includes best practice values and provides examples and specific literature references for the listed criteria, despite the need to update such examples regularly. The intention is that this criteria catalogue, and its planned revisions, will help customers to share information and expectations. The criteria composed in this catalogue are seen as a sufficient set to demonstrate the trustworthiness of a digital long-term repository.

9.1 Overview of the Criteria

Within the following table, the term "repository" is used as an abbreviation for "long-term digital repository."

A. Organizational Framework

  1. The repository has defined its goals.
     1.1 selection criteria
     1.2 responsibility for the long-term preservation of the information represented by the digital objects
     1.3 designated community

  2. The repository grants its designated community an adequate usage of the information represented by the digital objects.
     2.1 access for the designated community
     2.2 interpretability of the digital objects by the designated community

  3. Legal and contractual rules are being observed.
     3.1 legal contracts between producers and the repository
     3.2 repository operates on a legal basis regarding archiving
     3.3 repository operates on a legal basis regarding usage

  4. The organisational form is adequate for the digital repository.
     4.1 adequate funding
     4.2 sufficient numbers of qualified staff
     4.3 organisational structure
     4.4 repository engages in long-term planning
     4.5 continuation of preservation tasks even beyond the existence of the repository

  5. Adequate quality management is conducted.
     5.1 definition of processes and responsibilities
     5.2 documentation of elements and processes
     5.3 reaction to substantial changes

B. Object Management

  6. The repository ensures integrity of digital objects during all processing stages:
     6.1 ingest
     6.2 archival storage
     6.3 access

  7. The repository ensures authenticity of digital objects during all processing stages:
     7.1 ingest
     7.2 archival storage
     7.3 access

  8. The repository has a strategic plan for its technical preservation measures.

  9. The repository accepts digital objects from its producers based on defined criteria.
     9.1 specification of SIPs
     9.2 identification of relevant features of the digital objects for the information preservation
     9.3 technical control over its digital objects in order to execute preservation methods

  10. The archival storage of the digital objects is undertaken to defined specifications.
     10.1 definition of AIPs
     10.2 transformation of the SIPs into AIPs
     10.3 storage and readability of the AIPs
     10.4 implementation of preservation strategies for AIPs

  11. The repository permits usage of the digital objects based on defined criteria.
     11.1 definition of DIPs
     11.2 transformation of AIPs into DIPs

  12. The data management system is capable of providing the necessary digital repository function.
     12.1 persistent identification of objects and their relations
     12.2 metadata for content and formal description and identification of the digital objects
     12.3 metadata for structural description of the digital objects
     12.4 metadata for documenting changes made on the digital objects
     12.5 metadata for the technical description of the digital objects
     12.6 metadata for the usage rights and terms of the digital objects
     12.7 The assignment of metadata to the digital objects is guaranteed every time

C. Infrastructure and Security

  13. The IT infrastructure is adequate.
     13.1 The IT infrastructure implements the demands from the object management
     13.2 The IT infrastructure implements the security demands of the object management

  14. The IT infrastructure implements the object management demands.

9.2 Example Criteria

A criterion consists of four parts: the criterion itself, an explanation, possible examples and citations. In the examples below, "DR" stands for "digital repository".

1.1 The digital repository has developed criteria for the selection of its digital objects.

The DR should have laid down which digital objects fall within its ambit. This is often determined by the institution's overall task area, or is stipulated by laws. The DR has developed collection guidelines, selection criteria, evaluation criteria or heritage generation criteria. The criteria may be content-based, formal or qualitative in nature.

Examples:

In the case of both state-owned and non-state-owned archives, the formal responsibility is generally derived from the relevant laws or the entity behind the archive (a state-owned archive accepts the documents of the state government, a corporate archive the documents of the company, a university archive, the documents of the university).

German National Library law:

The Library is tasked with:

1. collecting, making an inventory of, analysing and bibliographically recording a) originals of all media works published since 1913 and b) originals of all foreign media works published in German since 1913, and ensuring the long-term preservation of these works, rendering them accessible to the general public, and providing central library and national library services.

Supported by the state libraries, the Baden-Württemberg online archive (BOA - http://www.boa-bw.de/ ) collects net publications ..."which originate in Baden-Württemberg, or the content of which is related to the state, its towns and villages or inhabitants."

The Oxford Text Archive http://ota.ahds.ac.uk/ collects "high-quality scholarly electronic texts and linguistic corpora (and any related resources) of long-term interest and use across the range of humanities disciplines". The website contains a detailed "collections policy".

The document and publication server of the Humboldt University in Berlin collects "electronic academic documents published by employees of the Humboldt University" http://edoc.hu-berlin.de/e_info/leitlinien.php.

References.

[Erpanet: Erpanet "Appraisal of Scientific Data" conference, 2003]

[Interpares Appraisal Task Force: Appraisal of Electronic Records: A Review of the Literature in English, 2006]

[Wiesenmüller, Heidrun et al.: Auswahlkriterien für das Sammeln von Netzpublikationen im Rahmen des elektronischen Pflichtexemplars : Empfehlungen der Arbeitsgemeinschaft der Regionalbibliotheken, 2004]

9.1 The digital repository specifies its Submission Information Packages, SIPs.

The DR should inform the producers or suppliers, or agree with them, which digital objects (SIPs) it will accept. These agreements should allow the transfer or the collection to be automated, and workflows for submission to the DR to be implemented.

These specifications are the basis for quality checking of the SIPs.

Examples:

SIPs may contain content data and also metadata, e.g. to establish their authenticity.

In the case of harvesting based on offline browsers, only text content, but not audio, video and other multimedia content is collected (through the selection or exclusion of specific file formats).

The file formats of the transfer objects can be validated using JHOVE (cf. http://hul.harvard.edu/jhove/) as a quality check.

The DR should recommend file formats for the SIPs, e.g. GeoTIFF for remote sensing data, or SEED/miniSEED as the format for geodata, as used in GeoFon (http://www.gfz-potsdam.de/geofon/).
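A quality check of SIPs against an agreed list of file formats could be sketched as follows in Python. This is not JHOVE and does not use its interface: it merely inspects the leading bytes of each file, which is far shallower than the format validation JHOVE performs. The function names, the signature table and the set of accepted formats are assumptions made for illustration.

```python
# Well-known leading byte sequences ("magic numbers") of a few formats.
SIGNATURES = {
    "PDF": (b"%PDF-",),
    "JPEG": (b"\xff\xd8\xff",),
    "TIFF": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian variants
}


def identify_format(data: bytes):
    """Return the format name whose signature matches, or None if unknown."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def check_sip_files(files: dict, accepted: set) -> dict:
    """Map each file name in a SIP to an acceptance verdict."""
    report = {}
    for name, data in files.items():
        fmt = identify_format(data)
        if fmt is None:
            report[name] = "rejected: unknown format"
        elif fmt in accepted:
            report[name] = "accepted"
        else:
            report[name] = f"rejected: {fmt} not in agreed formats"
    return report


sip_files = {
    "report.pdf": b"%PDF-1.4 rest of file",
    "photo.jpg": b"\xff\xd8\xff\xe0 rest of file",
    "notes.bin": b"\x00\x01\x02",
}
report = check_sip_files(sip_files, accepted={"PDF", "TIFF"})
```

In this sketch the agreement between producer and DR is reduced to the `accepted` set; a real submission agreement would also cover metadata, transfer method and further quality requirements.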

13.1 The requirements specified by the DR regarding the handling of objects should be implemented by the overall system at all stages of processing.

This includes the main processes (in OAIS: "functional entities") of Ingest, Archival Storage, and Access.

Examples:

Web-Ingest-Module, module for bulk ingest in batch operation

Storage module with access to a different, geographically separate storage system.

Usage module

If the DR policy includes registered users being able to feed their photo collections themselves into the DR, assuming these are available as JPEG files, the DR must then provide a suitable upload interface for users.

References.

[Borghoff, Uwe M. et al., Universität der Bundeswehr München, Fakultät für Informatik, Institut für Softwaretechnologie: nestor - materialien 3: Vergleich bestehender Archivierungssysteme, 2005]

10. Conclusions and Further Work

The work of the nestor project is a first step toward guidelines and criteria for establishing and building trusted digital repositories. Next, it will be necessary to arrive at internationally accepted "common principles". This step will be undertaken as a joint effort of three initiatives: nestor II (the project will be funded 2003-2006 by the German Ministry of Education and Research), the Digital Curation Centre and the project Digital Preservation Europe (DPE) funded by the European Commission, as well as the new OCLC-RLG/NARA Trusted Repository Task Force. The projects have been exchanging information about the current status of their work regularly and came up with 10 basic common principles in January 2005, which were formulated on an abstract level (those will be published in spring 2007). Joint test audits with European digital long-term repositories are planned. All partners agree that international agreements are essential to ensure the interoperability of long-term repositories, their quality and the services they offer on an international level. Nevertheless, all partners also see the urgent necessity to provide guidelines and tools to their national heritage institutions and to make repositories work according to local conditions and national law.

As nestor is always interested in operating on a global scale, interaction, discussion and common developments with major European preservation initiatives like CASPAR or PLANETS are on the agenda.

Acknowledgements

We gratefully acknowledge the whole nestor working group on trusted repository certification, see http://www.longtermpreservation.de/ag-repositories, especially the following colleagues: Dr. Andrea Hänger, Karsten Huth, Max Kaiser, Dr. Christian Keitel, Dr. Jens Klump, Dr. Nikola Korb, Peter Rödig, Dr. Stefan Rohde-Enslin, Kathrin Schröder, and Heidrun Wiesenmüller.

We also thank Robin Dale (Research Libraries Group), Seamus Ross and Andrew McHugh (Digital Curation Centre) for the fruitful discussions and help with the English translation.

References

[1] Liegmann, H. and Schwens, U. (2004) "Die digitale Welt - eine ständige Herausforderung" In: Grundlagen der praktischen Information und Dokumentation: Volume 1, edited by Rainer Kuhlen, Thomas Seeger and Dietmar Strauch (München: Saur), 5th edition. Preprint: http://www.langzeitarchivierung.de/downloads/digitalewelt.pdf

[2] nestor (2006) "Kriterienkatalog vertrauenswürdige digitale Langzeitarchive Version 1 (Entwurf zur öffentlichen Kommentierung)" edited by nestor - Network of Expertise in Long-Term Storage of Digital Resources and nestor Working Group on Trusted Repositories Certification (Frankfurt am Main: nestor), http://nbn-resolving.de/urn:nbn:de:0008-2006060710
English translation: nestor (2006) "Criteria for Trusted Digital Long-Term Preservation Repositories - Version 1 (Request for Public Comment)" edited by nestor - Network of Expertise in Long-Term Storage of Digital Resources and nestor Working Group on Trusted Repositories Certification (Frankfurt am Main: nestor), nestor materials 8, http://nbn-resolving.de/urn:nbn:de:0008-2006060703

[3] OAIS (2002) "Reference Model for an Open Archival Information System (OAIS): CCSDS 650.0-B-1: Blue Book" edited by the Consultative Committee for Space Data Systems, http://ssdoo.gsfc.nasa.gov/nost/wwwclassic/documents/pdf/CCSDS-650.0-B-1.pdf

[4] RLG/OCLC (2002) "Trusted Digital Repositories: Attributes and Responsibilities: An RLG-OCLC Report" edited by RLG/OCLC Working Group on Digital Archive Attributes, http://www.rlg.org/en/pdfs/repositories.pdf

[5] RLG-NARA (2006) "Audit Checklist for Certifying Digital Repositories" Edited by the RLG-NARA Task Force on Digital Repository Certification, http://www.rlg.org/en/pdfs/rlgnara-repositorieschecklist.pdf

[6] DINI (2003) "DINI-Certificate Document and Publication Repositories" edited by the Deutsche Initiative für Netzwerkinformation (DINI) Working Group Electronic Publishing, http://nbn-resolving.de/urn:nbn:de:kobv:11-10046073

[7] DOMEA-concept (Konzept für Dokumenten-Management und elektronische Archivierung in der öffentlichen Verwaltung): http://www.kbst.bund.de/cln_006/nn_836960/Content/Standards/Domea__Konzept/domea__node.html__nnn=true

[8] Henry Gladney (2002): "Perspectives on Trustworthy Information". In Digital Document Quarterly (DDQ), Volume 1, Number 2, 1Q2002, http://home.pacbell.net/hgladney/ddq_1_2.htm

[9] Grundschutzhandbuch (2004) "Leitfaden IT-Sicherheit IT-Grundschutz kompakt" edited by the Bundesamt für Sicherheit in der Informationstechnik, http://www.bsi.de/gshb

[10] DIN EN ISO 9000ff, Quality Management, Beuth-Verlag, 2006, CD-ROM

[11] ISO 15489-1 Information and documentation - Records Management, 2001-09-15

[12] Common Criteria for Information Technology Security Evaluation, Version 2.1, edited by the Bundesamt für Sicherheit in der Informationstechnik, http://www.bsi.bund.de/cc/ccengl/downcc21.htm

[13] Borghoff, Uwe M. et al., Universität der Bundeswehr München, Fakultät für Informatik, Institut für Softwaretechnologie: nestor - materialien 3: Vergleich bestehender Archivierungssysteme, 2005

[14] Erpanet: Erpanet "Appraisal of Scientific Data" conference, 2003

[15] Interpares Appraisal Task Force: Appraisal of Electronic Records: A Review of the Literature in English, 2006

[16] Wiesenmüller, Heidrun et al.: Auswahlkriterien für das Sammeln von Netzpublikationen im Rahmen des elektronischen Pflichtexemplars : Empfehlungen der Arbeitsgemeinschaft der Regionalbibliotheken, 2004