New Applications of Knowledge Organization Systems: introduction to a special issue

Douglas Tudhope and Traugott Koch*, special issue editors
Hypermedia Research Unit, University of Glamorgan, Pontypridd, Wales, UK
Email: dstudhope@glam.ac.uk
*Knowledge Technologies Group, NetLab, Lund University Libraries, Sweden
Email: traugott.koch@lub.lu.se

While Web search engines have made advances in recent years, the problems of keyword searching are well known. Trivial variations in a search statement can produce significantly different results. These problems can be alleviated by controlled vocabularies, which also serve as a resource for expressing an indexing concept or information need. Knowledge Organization Systems/Services (KOS), such as classifications, gazetteers, lexical databases, ontologies, taxonomies and thesauri, model the underlying semantic structure of a domain. Embodied as Web-based services, they can facilitate resource discovery and retrieval. They act as semantic road maps and make possible a common orientation for indexers and future users (whether human or machine).

New networked KOS (NKOS) services and applications are emerging, and we are approaching the stage where we might begin to exploit common representations and protocols for distributed use. We have not yet reached that stage. However, we can draw on a number of technologies that can be combined to yield new solutions. These include developments in information science approaches to automatic indexing and vocabulary mapping, search systems and interfaces, together with contributions from other disciplines. For example, we can formalise and enrich existing representations for automated KOS applications, exploiting the infrastructure of semantic Web encoding schemes and technologies. Language engineering offers a variety of tools and linguistic resources. New Web services standards are emerging, and user-centred approaches are yielding more sophisticated models of how users search for information.

This issue of the Journal of Digital Information has its origins in two recent workshops on Networked Knowledge Organization Systems/Services (NKOS) held at:

Subsequently, a general call for papers on the issue's theme, New Applications of Knowledge Organization Systems, was disseminated in September 2003. Traugott Koch has been the main editor for the issue, assisted by Douglas Tudhope as co-editor.

In total, 13 submissions were received. After review, five papers were selected for this issue on new applications of Networked Knowledge Organization Systems/Services, covering both conceptual aspects and technical implementations. These papers represent a selection of the latest work in the expanding NKOS community. We hope that they will stimulate further work in the area. We also hope that future issues of JoDI may be able to combine NKOS developments with contributions from researchers in related communities such as language engineering and linguistics, ontology engineering and the semantic Web.

The papers published here deal with several types of KOS, discuss a range of standards issues and span the information lifecycle. They are introduced below, starting with reengineering existing KOS, moving on to mapping techniques between vocabularies, then KOS service protocols and interfaces, an application of KOS to teaching scientific concepts, and finally a theoretical and historical foundation for a Semantic Web for Culture.

Reengineering Thesauri for New Applications: The AGROVOC example

Soergel et al. are concerned with the process of moving from a traditional KOS (in their case a thesaurus) to an ontology. Following the approach of the Unified Medical Language System (UMLS), they develop a conceptual model with three levels: concepts, terms and strings. The aim is to retain the terminological richness of a good thesaurus and to augment it with more formally structured hierarchies and more detailed relations better suited to conceptual reasoning. The UN Food and Agriculture Organization's AGROVOC thesaurus is the focus of their analysis and formal modelling. They develop an inventory of specific relationship types with well-defined semantics for the agricultural domain and explore an intelligent 'rules-as-you-go' approach to streamlining the time-consuming reengineering process. A transition procedure is proposed to support the future large-scale effort to transform AGROVOC into an ontology, or 'semantically enriched KOS', for food and agriculture. The appendix proposes a medium-range solution for the XML/RDF encoding of KOS.
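To make the three-level model concrete, here is a minimal Python sketch. It is not the authors' AGROVOC schema: the class names, the `category` field, the relation name `susceptibleTo` and the sample concepts are all hypothetical. It also assumes, for illustration only, that a 'rules-as-you-go' rule is keyed on the semantic categories of the two related concepts, so that one editorial refinement of a generic related-term (RT) link can be proposed for other links with the same category pattern.

```python
from dataclasses import dataclass, field

@dataclass
class TermString:
    """String level: one exact character sequence in a given language."""
    text: str
    lang: str

@dataclass
class Term:
    """Term level: groups strings that are variants of one word form (case, spelling)."""
    term_id: str
    strings: list[TermString] = field(default_factory=list)

@dataclass
class Concept:
    """Concept level: groups synonymous terms, across languages."""
    concept_id: str
    category: str  # hypothetical semantic category used by the rules below
    terms: list[Term] = field(default_factory=list)

def refine(relations, concepts, rules, index, new_rel):
    """An editor replaces one link's relation; the choice is stored as a category-pair rule."""
    src, _, tgt = relations[index]
    relations[index] = (src, new_rel, tgt)
    rules[(concepts[src].category, concepts[tgt].category)] = new_rel

def suggest(relations, concepts, rules):
    """Propose refinements for remaining generic RT links whose category pair matches a rule."""
    proposals = []
    for src, rel, tgt in relations:
        key = (concepts[src].category, concepts[tgt].category)
        if rel == "RT" and key in rules:
            proposals.append((src, rules[key], tgt))
    return proposals

# Hypothetical sample data: one concept carries an English and a French term
concepts = {
    "c_rice": Concept("c_rice", "plant", [Term("t_rice", [TermString("rice", "en")]),
                                           Term("t_riz", [TermString("riz", "fr")])]),
    "c_blast": Concept("c_blast", "disease"),
    "c_wheat": Concept("c_wheat", "plant"),
    "c_rust": Concept("c_rust", "disease"),
}
relations = [("c_rice", "RT", "c_blast"), ("c_wheat", "RT", "c_rust")]
rules = {}

refine(relations, concepts, rules, 0, "susceptibleTo")
print(suggest(relations, concepts, rules))  # [('c_wheat', 'susceptibleTo', 'c_rust')]
```

Returning proposals rather than rewriting links directly leaves the final decision with a human editor, which seems a reasonable choice for a semi-automatic reengineering process.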

Vocabulary Mapping for Terminology Services

Vizine-Goetz et al. discuss results from an OCLC project to create inter-vocabulary associations automatically. The case study mapped the ERIC thesaurus to the Library of Congress Subject Headings by encoding the vocabularies according to MARC (Machine-Readable Cataloging) standards, automatically matching vocabulary terms, and storing mapping data as machine links. Detailed results are provided, and the effectiveness of different mapping techniques is discussed. Human evaluation shows good results for the validity of the automated mappings. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is used to provide access to a vocabulary with mappings, via a browser for human users and through OAI-PMH Web service mechanisms for machines. This is demonstrated by rich links from the paper (and example records) to the experimental services and to subject authority records. First steps are taken to offer access in multiple protocols and formats as part of future terminology services.
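The matching step can be illustrated with a simple normalized-string comparison. This Python sketch is an assumption, not OCLC's actual matching procedure: `normalize` folds case, strips accents and punctuation, and only exact matches of the normalized forms are accepted. The headings are made-up illustrations rather than records from either vocabulary. The OAI-PMH base URL, record identifier and `marcxml` metadata prefix are hypothetical; only the `verb`, `identifier` and `metadataPrefix` request parameters come from the OAI-PMH specification.

```python
import re
import unicodedata
from urllib.parse import urlencode

def normalize(heading: str) -> str:
    """Strip accents, case-fold, turn punctuation (including '--') into spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", heading)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())

def match_vocabularies(source, target):
    """Map each source term to the target headings that share its normalized form."""
    index = {}
    for heading in target:
        index.setdefault(normalize(heading), []).append(heading)
    return {term: index[normalize(term)] for term in source if normalize(term) in index}

def get_record_url(base_url, identifier, metadata_prefix="marcxml"):
    """Build an OAI-PMH GetRecord request URL for one (hypothetical) vocabulary record."""
    query = urlencode({"verb": "GetRecord", "identifier": identifier,
                       "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

# Made-up example headings
source_terms = ["Adult Education", "Teacher Salaries"]
target_headings = ["Adult education", "Teachers--Salaries"]
print(match_vocabularies(source_terms, target_headings))
# {'Adult Education': ['Adult education']}
print(get_record_url("https://example.org/oai", "oai:example.org:vocab-0001"))
```

The second pair shows why exact matching is only a starting point: a singular/plural difference and a subdivision defeat it, which is one reason to compare several mapping techniques.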

KOS at Your Service: Programmatic Access to Knowledge Organisation Systems

Binding and Tudhope review the literature on protocols for distributed access to thesauri and offer suggestions for the further development of thesaurus service protocols for search and browsing. The paper reflects on the experience of building a Web demonstrator for the FACET research project. The Web system provides dynamically generated interface components for finding terms and browsing the thesaurus, building a query and returning ranked results using term expansion. The data requirements of the different interface elements are described in detail, and live links are provided to different parts of the (remote) FACET demonstrator. The authors argue that basing distributed protocol services on the atomic elements of thesaurus data structures and relationships is not necessarily the best approach in bandwidth-limited environments. Interfaces that seek to provide rich KOS content require a service-oriented approach, in which a basic protocol function can return combinations of KOS data elements. The paper also proposes a semantic expansion service as an advanced protocol option for both browsing and query services.
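The contrast between atomic and combined protocol calls, and the idea of semantic expansion, can be sketched in Python. Everything here is hypothetical: the tiny thesaurus fragment, the function names and the per-relationship traversal costs are illustrative assumptions, not the FACET implementation or a proposed protocol. `concept_bundle` returns a term with all its immediate relationships in one response (rather than one request per relationship type), and `semantic_expansion` ranks terms by the cheapest path of broader (BT), narrower (NT) and related (RT) links from the query term, stopping at a cost threshold.

```python
import heapq

# Hypothetical traversal cost for following each relationship type
COSTS = {"BT": 1.0, "NT": 0.5, "RT": 1.5}

# Hypothetical thesaurus fragment (BT/NT links recorded in both directions)
THESAURUS = {
    "furniture": {"NT": ["seating furniture"]},
    "seating furniture": {"BT": ["furniture"], "NT": ["chair", "bench"]},
    "chair": {"BT": ["seating furniture"], "NT": ["armchair"], "RT": ["table"]},
    "armchair": {"BT": ["chair"]},
    "bench": {"BT": ["seating furniture"]},
    "table": {"RT": ["chair"]},
}

def concept_bundle(term):
    """One combined response: the term plus all its immediate BT, NT and RT links."""
    links = THESAURUS.get(term, {})
    return {"term": term, **{rel: list(links.get(rel, [])) for rel in COSTS}}

def semantic_expansion(term, threshold=2.0):
    """Cheapest-path (Dijkstra) traversal; returns (term, cost) pairs ranked by closeness."""
    best = {term: 0.0}
    queue = [(0.0, term)]
    while queue:
        cost, current = heapq.heappop(queue)
        if cost > best[current]:
            continue  # stale queue entry; a cheaper path was already found
        for rel, step in COSTS.items():
            for neighbour in THESAURUS.get(current, {}).get(rel, []):
                new_cost = cost + step
                if new_cost <= threshold and new_cost < best.get(neighbour, float("inf")):
                    best[neighbour] = new_cost
                    heapq.heappush(queue, (new_cost, neighbour))
    return sorted(best.items(), key=lambda item: (item[1], item[0]))

print(concept_bundle("chair"))
print(semantic_expansion("chair"))
```

With these costs, expanding "chair" ranks "armchair" (0.5) ahead of "seating furniture" (1.0), then "bench" and "table" (1.5) and "furniture" (2.0). A client interface could render the bundle in one round trip, which is the bandwidth argument in miniature.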

Building Semantic Tools for Concept-based Learning Spaces -- Knowledge Bases of Strongly-Structured Models for Scientific Concepts in Advanced DL

Smith et al. discuss their work on developing a concept-based Digital Learning Environment, which has been used to teach geography courses at the University of California, Santa Barbara. They argue that the effective teaching of scientific concepts is facilitated by digital library systems that provide integrated knowledge organization structures. These can combine various aspects of scientific knowledge at a conceptual level. Their system is based on a strongly-structured model for scientific concepts that integrates a domain KOS with associated metadata and specialized learning models, concerned for example with the interrelationships of concepts. The operational classroom system also makes use of various visualization tools. Three screens at the front of the class display the lecture in progress (drawn from a collection of reusable presentation materials), together with relevant windows on the concept knowledge base and example learning objects. This allows the lecturer, for example, to explain a concept while simultaneously illustrating its relationships with other concepts and with examples from real-life situations.
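One way to picture a concept knowledge base feeding the classroom screens is the small Python sketch below. The field names, the `prerequisite` relation label, the learning-object identifier and the sample geography concepts are hypothetical, not the authors' strongly-structured model. The point is only that a concept record which links KOS context (broader concepts), typed learning relations and example learning objects lets a single lookup supply both a related-concepts view and an examples view alongside the current slide.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptEntry:
    """Hypothetical concept record combining KOS context, learning relations and examples."""
    name: str
    definition: str
    broader: list[str] = field(default_factory=list)               # domain KOS hierarchy
    relations: dict[str, list[str]] = field(default_factory=dict)  # typed learning relations
    examples: list[str] = field(default_factory=list)              # learning-object identifiers

def lecture_view(kb, name, slide):
    """Assemble content for three screens: the slide, the concept in context, and examples."""
    entry = kb[name]
    related = [(rel, other, kb[other].definition)
               for rel, others in entry.relations.items()
               for other in others if other in kb]
    return {
        "slide": slide,
        "concept": (entry.name, entry.definition, entry.broader),
        "related": related,
        "examples": list(entry.examples),
    }

# Hypothetical sample knowledge base
kb = {
    "drainage basin": ConceptEntry(
        "drainage basin", "Area of land drained by a river and its tributaries",
        broader=["hydrology"]),
    "river discharge": ConceptEntry(
        "river discharge", "Volume of water passing a point per unit time",
        broader=["hydrology"], relations={"prerequisite": ["drainage basin"]},
        examples=["lo:hydrograph-example"]),
}

view = lecture_view(kb, "river discharge", "slide-07")
print(view["related"])
```

Resolving related concepts to their definitions at lookup time keeps the knowledge base as the single source of concept data, so reusable slides need only carry concept names.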

Towards a Semantic Web for Culture

Veltman critically reviews the topical notion of the 'semantic Web'. He argues that it is based on a restricted and static sense of meaning. The paper is long for a conventional journal article. However, as an electronic journal, JoDI is able to publish extended historical treatments of important issues, with supporting data (and appendices). This allows an intersection of new information technology possibilities with reflection in the best tradition of humanities scholarship. A review of the historical context of the semantic Web uncovers various issues, such as the dynamic nature of knowledge, which have not been adequately considered. Veltman claims that a truly cultural semantic Web should facilitate the study of how meaning and knowledge organization can vary from culture to culture. He discusses how the cultural and historical dimensions of diversity are key to understanding practical applications of knowledge organization systems. The paper proposes new ways of visualizing knowledge using a time/space horizon to make possible a 'history of questions as well as of answers'. This has the potential to facilitate the exploration of different theories and world-views while appreciating the mutable and contextually dependent nature of the meaning of terms, crucial issues for today's interrelated global society.