Networked Knowledge Organization Systems

Knowledge Organization Systems can comprise thesauri and other controlled lists of keywords, ontologies, classification systems, clustering approaches, taxonomies, gazetteers, dictionaries, lexical databases, concept maps/spaces, semantic road maps, etc. These schemas enable knowledge structuring and management, knowledge-based data processing and systematic access to knowledge structures in individual collections and digital libraries. Used as interactive information services on the Internet, they have an increased potential to support the description, discovery and retrieval of heterogeneous information resources and to contribute to an overall resource discovery infrastructure.

This issue of the Journal of Digital Information evolved from a workshop on Networked Knowledge Organization Systems (NKOS) held at the Fourth European Conference on Research and Advanced Technology for Digital Libraries (ECDL2000) in Lisbon during September 2000. The focus of the workshop was European NKOS initiatives and projects and options for global cooperation. Workshop organizers were Martin Doerr, Traugott Koch, Douglas Tudhope and Repke de Vries. This group has, with Traugott Koch as the main editor and with the help of Linda Hill, cooperated in the editorial tasks for this special issue.

The issue presents five papers on the general theme of NKOS, covering both conceptual aspects and technical implementation. These papers are introduced below, preceded in each case by a brief biographical note on the lead author.

Ken Miller and Brian Matthews, Having the right connections: the LIMBER project

Ken Miller heads the UK Data Archive Information Development team at the University of Essex, UK, and has researched information retrieval systems for the last 20 years. He is a qualified librarian and information scientist who for the last 14 years has been responsible for the programming development of the UKDA catalogue, keyword index, thesaurus and information retrieval tools. He is a member of the Data Documentation Initiative (DDI) committee, an international group of data producers and archivists with a focus on social science research and the development of a specification for the content, presentation, transport, and preservation of social science technical documentation. Within the LIMBER project he is site project manager responsible for the provision of the European Language Social Science Thesaurus (ELSST).

Miller and Matthews report on the LIMBER Project (Language Independent Metadata Browsing of European Resources). It is funded by the European Union with the purpose of investigating solutions for the sharing of resources from social science datasets across linguistic and discipline boundaries within "a more integrated European environment." The project is developing a four-language multi-lingual social science thesaurus, which is derived from the UK's Humanities and Social Science Electronic Thesaurus (HASSET). To encode the thesaurus, the LIMBER project is extending the RDF schema developed by the Institute for Learning and Research Technology (ILRT, Bristol) in the DESIRE project. Semi-automatic indexing tools will be designed to work with this format to assist in metadata creation and language translation for information retrieval. System architecture, application program interface calls and metadata interoperability are also discussed.

Douglas Tudhope, Harith Alani, Christopher Jones, Augmenting thesaurus relationships: possibilities for retrieval

Douglas Tudhope is reader in the School of Computing at the University of Glamorgan, UK, and lectures on multimedia. He leads the Hypermedia Research Unit at Glamorgan, which focuses on the cultural heritage domain. He is currently directing a UK Engineering and Physical Sciences Research Council (EPSRC) funded research project in collaboration with the UK Science Museum, investigating thesaurus-based retrieval with a view to widening access to digital collections. The project aims to explore the potential of thesaurus facet structure in retrieval and query formulation. Other research interests include the application of interactionist social science perspectives to prototyping and participatory design. He edits the journal New Review of Hypermedia and Multimedia.

Tudhope, with Harith Alani and Christopher Jones, presents an investigation into the value of thesaurus relationships for improved retrieval, particularly the potential of augmenting the associative (Related Term) relationship with hierarchical subtypes of relationships. The use of these relationships in semantic distance measures is investigated. The study also focuses on the spatial inferences that can be derived from relationships among geographic place names in thesauri, where the names may have no associated latitude and longitude coordinates representing geospatial location, or may have centroid point locations only. Voronoi-based techniques are applied for a boundary approximation method.
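As a rough illustration of what a semantic distance measure over thesaurus relationships can look like, the sketch below treats a thesaurus as a graph and takes the cheapest path between two terms as their distance. It is not the measure used by Tudhope, Alani and Jones; the terms, the relationship costs and the function name semantic_distance are all assumptions made for the example.

```python
import heapq

# Hypothetical traversal costs per relationship type (illustrative values only).
COSTS = {"BT": 3.0, "NT": 3.0, "RT": 4.0}

# Toy thesaurus (invented for this sketch): term -> list of (relationship, neighbour).
THESAURUS = {
    "vehicles": [("NT", "cars"), ("NT", "bicycles")],
    "cars": [("BT", "vehicles"), ("RT", "roads")],
    "bicycles": [("BT", "vehicles")],
    "roads": [("RT", "cars")],
}


def semantic_distance(source, target, thesaurus=THESAURUS, costs=COSTS):
    """Return the cheapest total traversal cost from source to target, or None."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, term = heapq.heappop(queue)
        if term == target:
            return dist
        if dist > best.get(term, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for relation, neighbour in thesaurus.get(term, []):
            candidate = dist + costs[relation]
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None  # target is not reachable from source
```

Under these assumed costs, a term reached through a Related Term link counts as slightly further away than one reached through a single hierarchical step. Assigning different costs to subtypes of the associative relationship would be one way to let augmented relationships influence ranking.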

Martin Doerr, Semantic Problems of Thesaurus Mapping

Martin Doerr is senior researcher in computer science with the Information Systems Laboratory at the Foundation for Research and Technology - Hellas (FORTH) in Greece. He has participated in responsible roles in a series of national and international projects on knowledge-based systems, cultural documentation and thesaurus management, including AQUARELLE, Term-IT and the Greek heterogeneous database project POLEMON. He leads a development team for advanced thesaurus management systems and has participated in the development of a standard domain ontology by the International Council of Museums, the CIDOC CRM. He is also chair of the CIDOC CRM Special Interest Group. His research interests are knowledge representation and conceptual modeling, in particular, access to heterogeneous data sources and multilingual thesaurus management.

Doerr investigates the problems of semantic mapping between thesauri and approaches to retaining original meanings and relationships through optimal mapping techniques. Better mappings, and therefore better retrieval performance, will be possible if thesaurus creators pay attention to construction principles that enhance the use of the terminology sets in networked access to distributed collections.

Jane Hunter, MetaNet - A Metadata Term Thesaurus to Enable Semantic Interoperability between Metadata Domains

Jane Hunter is a senior research scientist at the Distributed Systems Technology Centre at the University of Queensland, Australia. She is the project leader of the MAENAD (Multimedia Access for Enterprises across Networks And Domains) project, a principal investigator of the Harmony International Digital Library project, editor of the MPEG-7 Description Definition Language (DDL) and liaison between MPEG and W3C. Her research interests are multimedia metadata modeling and interoperability between metadata standards across domains and media types.

Hunter describes an investigation into an approach to mapping between metadata structures. The procedure involves the creation of a Metadata Term Thesaurus, MetaNet, "to provide the semantic knowledge required to enable machine understanding of equivalence and hierarchical (subtyping) relationships between metadata terms from different domains," the development of an RDF Schema representation for it, and a hybrid mapping approach adding the syntactic capabilities of XSLT. The work is an extension of the work of project Harmony to develop an event-aware metadata model; the thesaurus conversions are from this event model to representative resource-centric models.
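The semantic half of such a mapping can be pictured as a lookup over equivalence and subtype relationships. The sketch below is only an illustration of that idea, not MetaNet's actual content or Hunter's procedure: the terms, the two relationship tables and the function name map_term are hypothetical, and the syntactic (XSLT) step is left out.

```python
# Hypothetical term relationships, invented for this sketch (not taken from MetaNet).
EQUIVALENT = {"author": "creator", "creator": "author"}
BROADER = {"author": "agent", "creator": "agent", "illustrator": "creator"}


def map_term(term, target_vocabulary):
    """Map a term into target_vocabulary.

    Tries an exact match, then an equivalent term, then repeats the same
    checks on successively broader terms. Returns None if nothing matches.
    """
    seen = set()
    current = term
    while current is not None and current not in seen:
        seen.add(current)  # guard against cycles in the broader-term table
        if current in target_vocabulary:
            return current
        equivalent = EQUIVALENT.get(current)
        if equivalent in target_vocabulary:
            return equivalent
        current = BROADER.get(current)
    return None
```

For example, with a target vocabulary containing only "agent", the hypothetical term "illustrator" would be mapped upward through "creator" to "agent", which loses some specificity but preserves a valid broader meaning.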

Stephen Cranefield, Networked knowledge representation and exchange using UML and RDF

Stephen Cranefield is a senior lecturer in the Department of Information Science at the University of Otago, New Zealand, where he works in the areas of software engineering and distributed information systems development. His research is mainly focused on a New Zealand government-funded project to develop an agent-based architecture and associated infrastructure that will enable the construction of applications combining information from distributed and heterogeneous information sources.

Cranefield proposes the use of the Unified Modeling Language (UML) for "representing ontologies and knowledge about particular instances in the domains modeled by those ontologies" and develops mappings from XMI encodings (XML representation of ontologies expressed as UML class diagrams) to Java classes and to RDF schemas as a way to serialize knowledge representations in the form of object diagrams. Serialization is necessary to make ontological models available online so that knowledge can be shared between Java applications. Alternative ontological modelling languages are compared to the UML approach.

Networked Knowledge Organization Systems: introduction to a special issue