Introduction to a Special Issue on Metadata: Selected papers from the Dublin Core 2001 Conference

Traugott Koch and Stuart Weibel*
Special issue editor, Department of Development & IT, DTV, Denmark
and NetLab, Lund University, Sweden
Email: Traugott.Koch@ub2.lu.se
*Executive Director, Dublin Core Metadata Initiative, OCLC Office of Research, Dublin, OH, USA
Email: weibel@oclc.org

The fertile domain of digital library research can be seen as a midwife in the transformation of physical libraries (static objects organized according to constraints of geographic location and physicality) to information streams, created, directed, diverted and constrained by social, economic and technological processes that are in some ways the same and in others radically different. Metadata can be thought of as a lubricant for information flow, easing the difficulty of discovery and organization of resources. It can also serve as a nozzle -- directing, channeling, and focusing information flow to make it more manageable and effective. The authors and editors of this special issue hope to help illuminate how technological change and global electronic intimacy change practice and opportunity in this rapidly evolving arena.

This issue of the Journal of Digital Information evolved in cooperation with the organizers and the program committee of DC-2001, the International Conference on Dublin Core and Metadata Applications held in Tokyo, Japan. JoDI is grateful for this cooperation, which brings important ideas concerning metadata to the attention of its readership in this focused way. The DC-2001 organizers are also grateful to have this collection of articles from the conference brought before an international readership, which furthers the reach and impact of the event.

Common interests were identified and activities coordinated in an early phase, leading to a large degree of shared peer review for the conference presentations and the JoDI issue. Additional review and revision cycles resulted in the selection of eight papers from more than 50 submissions to the conference track. Beyond the versions published in the conference proceedings, the authors of the selected papers had an additional opportunity to revise and adapt their papers to the specifications of JoDI reviewers and to the interests of the JoDI audience.

Conferences and journals have related but separate and complementary goals. One of the objectives of the DC-2001 conference was to attract reports on pilot projects and the early experiences of practitioners, and to bring these practitioners together. The conference program included many good papers presenting efforts to construct domain-specific metadata profiles or exploring various practical dimensions of metadata applications. The contributions in this special issue focus instead on metadata models, querying of metadata, an architecture for a specific application area, and a first empirical study of experiences with metadata creation.

The papers by Lagoze/Hunter and by Kunze represent two extremes along a scale of increasing complexity in metadata models for different applications.

J. Kunze, A Metadata Kernel for Electronic Permanence

John Kunze (University of California) proposes a basic metadata kernel of four semantic elements, in the interest of simplicity and low cost. He intends to tighten, and to define and control more strictly, approaches contained in the Dublin Core Metadata Element Set and its rules. An Electronic Resource Citation (ERC) is developed as the basic record format. For the sake of human readability this is deliberately not encoded in XML. Several useful ideas, e.g. codes for missing values, could very well be used in other contexts. ERC can serve as a metadata exchange format as well. Moving away from simplicity, the paper demonstrates many extensions useful for the application area in which it has been tested, the preservation of digital documents. With the Archival Resource Key (ARK), Kunze has proposed a persistent identification scheme that allows a service commitment statement to be specified and that links to object-describing metadata. The paper discusses some of the perceived weaknesses of Dublin Core. Kunze sees the metadata kernel development as a necessary parallel strategy to the complex route of XML/RDF encoding, namespaces and application profiles. Even if it might not emerge as a new syntax alternative for metadata, it may well have a positive influence on other metadata schemes.
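
To make the idea of a small, human-readable record concrete, the following is a minimal sketch of reading a plain-text "label: value" record and filling in a code for missing values. It is not Kunze's ERC specification: the four labels (who, what, when, where), the record layout, the missing-value code and the sample values are all assumptions chosen for illustration.

```python
# Hypothetical, simplified record layout: one "label: value" pair per line.
# The label names and the missing-value code below are illustrative assumptions,
# not taken from the ERC specification itself.
KERNEL_LABELS = ("who", "what", "when", "where")
MISSING = "(:unkn)"


def parse_record(text):
    """Parse 'label: value' lines into a dict covering every kernel label."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so values such as URLs stay intact.
        label, _, value = line.partition(":")
        label = label.strip().lower()
        value = value.strip()
        if label in KERNEL_LABELS:
            record[label] = value or MISSING
    # Any kernel element not present in the text gets the missing-value code.
    for label in KERNEL_LABELS:
        record.setdefault(label, MISSING)
    return record


# Invented sample values; the "where" element is deliberately absent.
sample = """\
who: Example, Author
what: An Example Document
when: 2001
"""
record = parse_record(sample)
```

Here `record["where"]` ends up holding the missing-value code, so a consumer can distinguish "not supplied" from an empty string without any XML tooling.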

C. Lagoze and J. Hunter, The ABC Ontology and Model

The ABC metadata model can certainly be said to aim at handling a very high level of complexity and semantic control in large applications. Carl Lagoze (Cornell University) and Jane Hunter (DSTC, Brisbane, Australia) have extended the approach from the Harmony digital library project in cooperation with the CIMI consortium of museums. The model aims to support analysis of and mapping between metadata ontologies from different domains. Communities can develop their own metadata ontologies by extending the entities and relationships of the ABC model, e.g. time, object modification, agency (people, organizations, instruments), places and concepts. ABC incorporates intellectual creation semantics (work, manifestation, item) from the IFLA FRBR work. Highly expressive but expensive descriptions are necessary to support advanced queries, however. A focus of ABC is the ability to model the creation, evolution and transition of objects over time, the lifecycle aspects, which makes it especially suitable for museum, archive and rights management applications, apart from digital resources in general. The model has been applied to a repository of examples of RDF-encoded object descriptions. It will be exciting to see the first full-scale implementation. An adapted search interface is available as well. A complete reference to the model (classes, properties, their definitions and domains) and example applications are provided in the paper.

T. Baker, M. Dekkers, R. Heery, M. Patel and G. Salokhe, What Terms Does Your Metadata Use? Application Profiles as Machine-Understandable Narratives

Metadata registries and application profiles are the focus of activities in the EU "Forum for Metadata Schema Implementors - SCHEMAS". Thomas Baker (GMD, Germany) and his co-authors from the UK and Luxembourg present a possible approach for a metadata registry. The objective reasons for creating registries and the user requirements are investigated. From human-readable declarations of namespaces with their standard definitions of metadata terms and statements about the use of these terms in particular projects or domains in an application profile, the perspective is extended to the longer-term goal of providing a machine-processable basis for automating crosswalks and conversions. It is hoped that large registries will thus promote convergence on good-practice solutions and increased interoperability. A prerequisite for that to happen is agreement on a conceptual model of vocabularies, on the functions of a registry, and on standardised conventions for describing the vocabularies in machine-understandable form. The SCHEMAS work reported here proposes a model based on statements expressed in RDF's basic grammar, which together make up a narrative about the metadata application. Search in a registry is foreseen as querying RDF statements in a process of identifying shared nodes. Finally, problems of validation, mapping, suitable schema standards (RDF Schema, XML Schema) and the risk of semantic drift are discussed.
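
The notion of querying statements to identify shared nodes can be sketched in a few lines. The example below is an illustration under stated assumptions, not the SCHEMAS registry model: statements are plain (subject, predicate, object) tuples rather than real RDF, and the profile names, the "uses" predicate and the "ex:audience" term are hypothetical.

```python
# Each statement is a (subject, predicate, object) triple.
# Profile names, the "uses" predicate and "ex:audience" are hypothetical.
statements = [
    ("profile:A", "uses", "dc:title"),
    ("profile:A", "uses", "dc:creator"),
    ("profile:B", "uses", "dc:title"),
    ("profile:B", "uses", "ex:audience"),
]


def terms_used_by(profile, triples):
    """Return the set of terms a given profile declares that it uses."""
    return {obj for subj, pred, obj in triples
            if subj == profile and pred == "uses"}


def shared_nodes(profile_a, profile_b, triples):
    """Return the terms that appear as objects in both profiles' statements."""
    return terms_used_by(profile_a, triples) & terms_used_by(profile_b, triples)


shared = shared_nodes("profile:A", "profile:B", statements)
```

In this toy data the two profiles meet at a single node, the shared term, which is the kind of overlap a registry search would surface as a candidate for crosswalks.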

C. Anutariya, V. Wuwongse, K. Akama and E. Nantajeewarawat, RDF Declarative Description (RDD): A Language for Metadata

Chutiporn Anutariya (Asian Institute of Technology) and co-authors from Thailand and Japan have developed a metadata modeling language called RDF Declarative Description (RDD). Uniform representation of and reasoning with RDF metadata is accomplished by integrating three components: the RDF data model, with its description facilities and interoperability; Declarative Description theory, for expressiveness in representing ontological and domain axioms; and the Equivalent Transformation (ET) paradigm, for a powerful computational and query processing mechanism. RDD is claimed to overcome weaknesses of RDF Schema, the DAML family of ontology markup languages and several RDF query languages, and to include the capability to represent all RDF-based languages directly.

C. Dyreson, M. Bohlen and C. Jensen, METAXPath

The XPath language for specifying locations in an XML document, a basis for query languages like XSLT, is extended by Curtis Dyreson (Washington State University, USA) and his Danish colleagues into the METAXPath data model and query language. Metadata and data are separated into different dataspaces, creating new levels in a nested XPath document tree. The query language, improved with concepts from an SQL-like query language for semi-structured databases, keeps the two separate. It allows metadata common to a group of nodes to be shared without duplication. The application of METAXPath techniques to a more complete query language, and its implementation, remain to be done.

D. Wen, T. Sakaguchi, S. Sugimoto and K. Tabata, Multilingual Access to Dublin Core Metadata of ULIS Library

Problems of multilingual access to monolingual metadata in databases are addressed by Danyang Wen and colleagues from the University of Library and Information Science, Tsukuba, Japan. Solutions for the display and input of Japanese words, supporting overseas users, are developed. An application offers access to a collection of metadata about international Library and Information Science related Web sites, described in Japanese with the addition of pronunciation information. The display solution is based on multilingual HTML technology (an MHTML Java applet) developed by the authors. Based on statistics of user behavior, candidate Japanese translations are offered as search terms, as are suitable Boolean query expressions. The English language is used as a "switching language" for translations between any other language and Japanese. The service is hampered, though, by the shortage of free online dictionaries and by different character code representations.

A. Apps and R. MacIntyre, zetoc: a Dublin Core Based Current Awareness Service

Ann Apps and Ross MacIntyre from the University of Manchester, UK, describe Dublin Core applications to journal article and conference paper tables of contents in a large current awareness service of the British Library. The zetoc solution is based on a Bath Profile compliant version of the Z39.50 retrieval protocol. Its interworking with Dublin Core records in XML syntax is shown in detail. zetoc and its upcoming enhancements illustrate well the advantages and disadvantages of different approaches to representing bibliographic article and conference paper citations in Dublin Core (as discussed in the DCMI Citation Working Group) and in homegrown metadata formats. Experiments with several other standards, like SFX reference linking, RDF Site Summaries and OpenURL, are foreseen.
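
For readers unfamiliar with Dublin Core in XML syntax, the following is a minimal sketch of serializing a few simple Dublin Core elements. It does not reproduce the zetoc record format: the wrapper element name "record" and the field values are invented for illustration, and only the Dublin Core element set namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build an XML element holding one dc:* child per (name, value) pair."""
    root = ET.Element("record")  # hypothetical wrapper element, not from zetoc
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Invented example values for a journal article description.
article = [
    ("title", "An Example Article"),
    ("creator", "Example, A."),
    ("type", "Text"),
]
record = build_dc_record(article)
xml_text = ET.tostring(record, encoding="unicode")
```

Repeating a pair with the same name, for instance a second creator, simply yields a second element of that name, which is how simple Dublin Core handles repeatable elements.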

J. Greenberg, M. Pattuelli, B. Parsia and W. Robertson, Author-generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization

One of the first empirical studies of "manual" metadata creation by document authors using the Dublin Core metadata standard is the one carried out by Jane Greenberg and her colleagues at the University of North Carolina, Chapel Hill, with authors from the (US) National Institute of Environmental Health Sciences. Although the empirical basis is rather small, this is an important baseline study. There is some encouraging evidence in support of its hypotheses: that authors in an organizational setting can create metadata of acceptable quality for Web resources using Dublin Core, and that a simple Web template with some guidance seems to be a sufficient tool. The metadata-creating authors in this study support the view that metadata is valuable for resource discovery and that authors should be involved in its creation for their own publications. Results from this and similar evaluation studies could contribute to developing metadata metrics and quality measures.