Networked Knowledge Representation and Exchange using UML and RDF

Abstract

This paper proposes the use of the Unified Modeling Language (UML) as a language for modelling ontologies for Web resources and the knowledge contained within them. To provide a mechanism for serialising and processing object diagrams representing knowledge, a pair of XSLT stylesheets has been developed to map from XML Metadata Interchange (XMI) encodings of class diagrams to corresponding RDF schemas and to Java classes representing the concepts in the ontologies. The Java code includes methods for marshalling and unmarshalling object-oriented information between in-memory data structures and RDF serialisations of that information. This provides a convenient mechanism for Java applications to share knowledge on the Web.

1 Introduction

The ever increasing amount of information available via the World Wide Web has led to efforts to enhance the efficiency and selectivity of search engines and other automated document processing tools by adding semantic information to Web pages. Techniques used have included the addition of simple keyword/value pairs from an unconstrained vocabulary (e.g. using HTML META tags), the encoding of information in terms of simple structured models such as the Dublin Core metadata elements, and the use of structured data representing instances of the concepts from ontologies defined using an ontology modelling language such as SHOE, OIL or DAML.

The full potential of any approach to semantic markup can only be realised when (and if) its language or vocabulary and any associated technologies become widely known and supported. When a critical mass of Web-based resources can be harvested or can even actively interoperate in terms of knowledge rather than uninterpreted data, many new and beneficial applications should emerge.

The development of the Extensible Markup Language (XML), the many free tools and application programmer interfaces for XML-based processing, and the inclusion of XML support in many commercial products have all helped achieve a degree of standardisation for the transport of structured data across the Web. Similarly, the Resource Description Framework (RDF) and the development of publicly available RDF tools have led to RDF becoming an emerging standard for expressing simple metadata content, with RDF Schema (W3C 2000a) providing a simple mechanism for defining metadata schemas or ontologies.

In the area of ontology-based markup languages there is still much research to be done, and although increasing participation in the OIL and DAML projects seems likely to gain these languages significant followings, it will be some time before any one approach gains the level of recognition and tool support currently enjoyed by the simpler RDF approach.

This paper presents an alternative approach for modelling ontologies and encoding the knowledge content of Web pages, one that is based on an existing standard with a huge following and a rich offering of tools (both commercial and freely available): the Unified Modeling Language (UML) (Booch et al. 1998) from the field of software engineering. UML provides a collection of modelling constructs and an associated graphical notation for the analysis and design of object-oriented software systems. In particular, UML provides an expressive toolkit for modelling a problem domain as a class diagram.

UML has already been selected as the modelling language for the Metadata Coalition, and its benefits for representing ontologies in agent-based systems have been argued elsewhere (Cranefield and Purvis 1999; Cranefield et al. 2000). However, although there is a standard XML-based format for exchanging models (i.e. ontologies) defined using UML, namely the XML Metadata Interchange (XMI) format (OMG 2000b), no practical technology for exchanging instances of a UML model has yet been developed, let alone standardised. When UML is applied to the description of Web resources, this means that a mechanism exists for exchanging ontologies but not the corresponding object-oriented structures encoding the knowledge in a particular resource.

This paper describes a solution to this problem: a procedure for generating from a UML class diagram a specialised RDF schema and a set of Java classes corresponding to the classes in the model. The Java code includes methods for marshalling and unmarshalling object-oriented information between in-memory data structures and an RDF representation of that information (via its XML encoding). The generation is performed by a pair of Extensible Stylesheet Language Transformations (XSLT) stylesheets. The translation from a class diagram to RDF Schema and the encoding of knowledge using RDF are illustrated using a simple example.

2 UML for knowledge representation and exchange

The Unified Modeling Language is a language and associated graphical notation for object-oriented analysis and design. The object-oriented modelling paradigm has become the mainstream technique in the software industry based on the widely accepted view that object-oriented modelling fits well with people's intuitive models of the world (Booch 1994).

UML is a standard from the Object Management Group (OMG). The OMG is a consortium of around 800 member companies and institutions involved in software engineering. Therefore, UML has a very large and rapidly expanding user community and the language is widely taught in universities. There are also many tools available for creating and editing models in UML using direct manipulation of the models' graphical presentation.

It has previously been noted (Cranefield and Purvis 2000) that UML models have a number of features commonly regarded as characteristic of the declarative knowledge representation paradigm (Genesereth and Nilsson 1987):

Knowledge expressed using UML is directly accessible for human comprehension (via its standard graphical presentation) and for machine processing (via the XMI model interchange format and associated software libraries or the application programmer interface defined by the OMG's Meta Object Facility).
Knowledge in a UML model can be changed easily due to the modular nature of object-oriented modelling. Changes to one feature in the model do not generally affect other features.
UML models can be used for purposes that were not anticipated at the time of model creation. In other words, UML is an abstract modelling language, not tied to any particular application.
New knowledge can be derived from UML models by reasoning about their contents. In particular, UML has an associated constraint language, the Object Constraint Language (OCL), that can be used to define derived model elements (those that can be computed from other elements) and to assert arbitrary constraints on the possible instances of a model. This aspect of UML has not been well supported by tools in the past, but a number of OCL-aware tools are beginning to appear (e.g. ModelRun and USE).

With this viewpoint, UML can be regarded as a suitable candidate for knowledge representation. There are several types of diagram defined in UML for modelling the static and dynamic behaviour of a system. For application to knowledge representation and exchange the two relevant diagram types are:

Class diagrams: Class diagrams provide a rich notation for defining classes, their attributes and the relationships between them. They can therefore be used to define ontologies in an object-oriented fashion.
Object diagrams: Given a set of ontologies described using class diagrams, knowledge about the domains described in these ontologies can be expressed as instances of the classes in the ontologies. This knowledge can therefore be formalised as a UML object diagram, which consists of objects, their attribute values and the links between them (which are instances of the associations between classes).

The following subsections present examples of these two types of diagram.

2.1 Ontologies as class diagrams: the family ontology

Figure 1 presents an ontology describing family relationships. To keep the example simple, only the relationships between parents and children are included.

Figure 1. An ontology as a UML class diagram

To understand this UML class diagram, it is sufficient to know the following:

Rectangles depict classes.
- The class name appears at the top (in italics if the class is abstract).
- Any attributes appear below in a separate compartment.
Lines between classes represent association relationships.
- Association ends may be labelled with "rolenames".
- Association ends may be annotated with numbers indicating how many objects may have this association with instances of the class at the other end. '*' means "zero or more".
- An open arrowhead indicates that the association can be traversed in one direction only: the class at one end knows about the association, but not the class at the other end.
- An "ordered" constraint means that there is an order defined on the set of objects at that association end that are related to a common object at the other end. In implementation terms this means that the association end is represented by a list data structure within the class at the other end, rather than a set.
A line with a closed arrowhead represents generalisation, with the arrow pointing to the more general class.
The name of any model element may be preceded by a forward slash character, indicating that the element can be computed from other elements in the model. In Figure 1 the association ends parent, son and daughter are annotated in this way.
The dog-eared rectangle in Figure 1 contains constraints on the class Person expressed using the Object Constraint Language (Warmer and Kleppe 1998). OCL can be used in conjunction with UML in order to constrain the possible models of a specification in ways that cannot be achieved using the UML structural elements alone. In this case the constraints specify how parent, son and daughter are related to mother, father and child.
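The idea of a derived association end can be illustrated in Java. The following is a minimal hand-written sketch, not output of the stylesheets described later. The OCL constraints of Figure 1 are not reproduced in the text, so the rules used here are an assumed reading of them: parent consists of mother and father, and son and daughter are the children that are instances of Man and Woman respectively. The field types and the use of a list for child are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hand-written illustration of the family ontology; the derivation rules
// below are an assumed reading of the OCL constraints in Figure 1.
abstract class Person {
    final String name;
    Woman mother;                                   // null if not known
    Man father;                                     // null if not known
    final List<Person> child = new ArrayList<>();   // assumed to be an "ordered" end

    Person(String name) {
        this.name = name;
    }

    // Derived end /parent: computed from mother and father, never stored.
    Set<Person> parent() {
        Set<Person> parents = new LinkedHashSet<>();
        if (mother != null) parents.add(mother);
        if (father != null) parents.add(father);
        return parents;
    }

    // Derived end /son: the children that are instances of Man.
    List<Man> son() {
        List<Man> sons = new ArrayList<>();
        for (Person c : child) {
            if (c instanceof Man) sons.add((Man) c);
        }
        return sons;
    }

    // Derived end /daughter: the children that are instances of Woman.
    List<Woman> daughter() {
        List<Woman> daughters = new ArrayList<>();
        for (Person c : child) {
            if (c instanceof Woman) daughters.add((Woman) c);
        }
        return daughters;
    }
}

class Man extends Person {
    Man(String name) { super(name); }
}

class Woman extends Person {
    Woman(String name) { super(name); }
}
```

Because the derived ends are computed on demand, they cannot become inconsistent with the stored mother, father and child values.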

UML contains many other modelling constructs besides those mentioned here. An overview of the full language can be found in Booch et al. (1998).

2.2 Knowledge as an object diagram

Figure 2 presents some knowledge about a particular family in the form of an object diagram.

Figure 2. Family knowledge as a UML object diagram

In object diagrams, rectangles denote objects, specifying their class (after an optional name and a colon) and the object's attribute values. The lines between objects show 'links': instances of associations between classes.

3 Online processing of knowledge in UML

While Figures 1 and 2 provide a convenient way to view the family ontology and some particular information about the Smith family, a standard serialised format is needed to enable the information to be made available online and shared between computer applications. Furthermore, to facilitate the development of applications that access and process knowledge from the Web, it is desirable for application programmer interfaces (APIs) to be developed for common programming languages such as Java. These technologies already exist for models expressed as UML class diagrams, but there is currently no convenient way to serialise and process object diagrams.

The OMG's XMI specification (OMG 2000b) defines a mechanism for serialising a UML model as an XML document. This is associated with the OMG's Meta Object Facility (MOF) (OMG 2000a) which defines a 'meta-meta-model' for defining modelling languages such as UML. In the MOF terminology, a modelling language such as UML is a meta-model, allowing models (e.g. ontologies) to be defined. XMI specifies how a model stored in a MOF-based model repository can be represented as an XML document. This document is encoded in terms of a document type definition (DTD) generated automatically from the MOF meta-meta-model definition of the modelling language (the meta-model) used. In particular, there is a DTD for UML and UML models can be serialised as XML documents in terms of this DTD.

XMI is a suitable format for the serialisation of ontologies expressed as UML class diagrams. Furthermore, APIs for convenient processing of the information contained within XMI documents already exist (NSUML) or are under development (JCP 1999). However, XMI does not provide a good solution for serialising object diagrams representing knowledge. The XMI format was designed to allow the interchange of arbitrary UML models. As such, it is based on an XML DTD encoding concepts from the UML meta-model such as class, attribute, association and association end (for class diagrams) and object, attribute binding, link and link end (for object diagrams). Therefore, an XMI encoding of the object diagram in Figure 2 would contain 24 separate but interrelated elements corresponding to the different components of the diagram. It would be much more convenient to have an encoding that is specialised to the ontology (i.e. class diagram) used so that the object diagram could be serialised in a form containing elements corresponding to individual Man and Woman objects.

For the work described in this paper it was decided to map class diagrams to schemas expressed using RDF. RDF is a language based on resource-property-value triples designed for expressing statements about resources on the Web (or anything that has an associated uniform resource identifier). An RDF model consists of a set of triples, each of which can be regarded as a simple logical proposition about a resource.
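The triple model can be sketched in a few lines of Java. This is only an illustration of the data structure, not the RDF API used in this work. The class names Triple and TripleModelSketch, the namespace http://example.org/family# and the blank-node label _:susan are all hypothetical choices made for the example.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// One resource-property-value statement (hypothetical helper class).
final class Triple {
    final String resource;
    final String property;
    final String value;

    Triple(String resource, String property, String value) {
        this.resource = resource;
        this.property = property;
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Triple)) return false;
        Triple t = (Triple) other;
        return resource.equals(t.resource)
                && property.equals(t.property)
                && value.equals(t.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(resource, property, value);
    }
}

class TripleModelSketch {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    // Hypothetical namespace standing in for the family schema.
    static final String F = "http://example.org/family#";

    // An RDF model is a set of triples: stating the same thing twice adds nothing.
    static Set<Triple> susan() {
        Set<Triple> model = new LinkedHashSet<>();
        model.add(new Triple("_:susan", RDF + "type", F + "Woman"));
        model.add(new Triple("_:susan", F + "Person.name", "Susan Smith"));
        model.add(new Triple("_:susan", F + "Person.name", "Susan Smith")); // duplicate, ignored
        return model;
    }
}
```

Each element of the set reads as one proposition, for example "the resource _:susan has the name Susan Smith".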

RDF Schema (W3C 2000a) is a set of predefined resources and relationships between them that define a simple meta-model including concepts of class, property, subclass and subproperty relationships, a primitive type 'Literal', bag and sequence types, and domain and range constraints on properties. Domain schemas (i.e. ontologies) can then be expressed as sets of RDF triples using the (meta-)classes and properties defined in RDF Schema.

To generate RDF schemas from UML class diagrams and to allow instances of these schemas (i.e. knowledge) to be expressed in RDF and 'marshalled' and 'unmarshalled' between RDF and in-memory data structures, two mappings were defined:

From UML class diagrams to RDF schemas
From UML class diagrams to sets of Java classes. These classes correspond to the classes in the class diagrams and also include code to marshal and unmarshal instances of the ontologies to and from RDF representations of that information (using the XML syntax for RDF).

The following subsections present RDF diagrams corresponding to the family ontology and family information shown in Figures 1 and 2 that were generated according to these mappings. The XML encodings of these RDF models are also presented.

3.1 The family ontology in RDF

Figure 3 shows the RDF schema generated from the UML class diagram in Figure 1. Following the usual RDF conventions, ellipses denote resources and are labelled with their uniform resource identifiers (URIs), rectangles denote literals and arrows denote properties.

Property labels
t = rdf:type
s = rdfs:subClassOf
d = rdfs:domain
r = rdfs:range
et = rdfsx:collectionElementType

Name space abbreviations
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs = http://www.w3.org/2000/01/rdf-schema#
rdfsx = http://nzdis.otago.ac.nz/0_1/rdf-schema-x#
f = any new namespace chosen for this schema

Figure 3. Family ontology RDF schema

An RDF/XML encoding for this schema is presented in Appendix 1.

In mapping from a class diagram to an RDF schema it was not a goal to express all details of the model, only enough to facilitate serialisation of model instances. For example, the generated schema in Figure 3 does not capture the information that a person has exactly two parents or that a person's name is expressed as a string. This would require extensions to RDF Schema and is not required because knowledge of these relationships is built into the marshalling code generated for the Java Person class. If an application needs access to a full description of an ontology it can access the XMI description directly using an API for XMI such as NSUML or the forthcoming JMOF (JCP 1999). Also, OCL constraints are ignored. At present these are considered to be documentation for people implementing systems that must conform to the ontology, rather than (for example) inference rules that may be used by an application for reasoning about knowledge.

However, it was decided to make one extension to RDF Schema: the property collectionElementType that has been declared in the namespace http://nzdis.otago.ac.nz/0_1/rdf-schema-x#. RDF bag and sequence types are used as the range of properties that correspond to association ends with multiplicities greater than one (with the choice between them depending on the presence or absence of the UML "ordered" constraint). As RDF does not provide a mechanism for restricting the type of elements in a bag or sequence, this new property was introduced to make this important type information available to human readers. However, this addition is not required for the correct marshalling and unmarshalling of information in the RDF format due to the ontological knowledge built into the generated Java classes.
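The choice of range described above can be sketched as follows. This is an illustration under stated assumptions, not the logic of the actual stylesheet: the class and method names are hypothetical, the value -1 is used here to stand for the '*' upper bound, and attaching collectionElementType to the property resource is an assumption, as is the use of the element class as the range of a single-valued end.

```java
import java.util.ArrayList;
import java.util.List;

class CollectionRangeSketch {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";
    static final String RDFSX = "http://nzdis.otago.ac.nz/0_1/rdf-schema-x#";
    static final int UNBOUNDED = -1;   // stands for the UML multiplicity '*'

    // Returns {subject, predicate, object} statements describing the range
    // of the property generated for one association end.
    static List<String[]> rangeStatements(String property, String elementClass,
                                          int upper, boolean ordered) {
        List<String[]> statements = new ArrayList<>();
        boolean multiValued = upper == UNBOUNDED || upper > 1;
        if (multiValued) {
            // "ordered" selects a sequence; otherwise a bag is used.
            String container = RDF + (ordered ? "Seq" : "Bag");
            statements.add(new String[] {property, RDFS + "range", container});
            // Assumption: the element type is recorded on the property itself.
            statements.add(new String[] {property, RDFSX + "collectionElementType", elementClass});
        } else {
            // Assumption: a single-valued end ranges over the class directly.
            statements.add(new String[] {property, RDFS + "range", elementClass});
        }
        return statements;
    }
}
```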

3.2 Family knowledge in RDF

Figure 4 shows the family knowledge from Figure 2 expressed using RDF. In this figure, the three central unlabelled resources represent the three people being described. They are represented here as anonymous resources, i.e. ones without URIs. The other unlabelled resources are the anonymous bag representing the parents of Susan Smith and the anonymous sequences representing (respectively) the children and daughters of John Smith and the children and daughters of Mary Smith. Note that the four sequences all have the same singleton element: the resource representing Susan Smith.

Property labels
t = rdf:type
1 = rdf:_1
2 = rdf:_2
n = f:Person.name
fr = f:Person.father
s = f:Person.son
p = f:Person.parent
c = f:Person.child
m = f:Person.mother
d = f:Person.daughter

Name space abbreviations
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
f = namespace corresponding to Figure 3

Figure 4. Family knowledge in RDF

An RDF/XML encoding for this information is presented in Appendix 2.

4 Implementation of the mappings

The mappings from UML class diagrams to RDF schemas and sets of Java classes were implemented using two stylesheets in the XSLT language. The inputs to the stylesheets are XMI encodings of class diagrams (using XMI version 1.0 for UML 1.3, the export format supported by the CASE tools Argo/UML 0.8 and Rational Rose 2000 with the Unisys XMI add-in). To be precise, the stylesheets are applied not to an encoding of the diagram itself, but to the model encoded in the XMI document, that is, the declarations of classes and the relationships between them.

XSLT is a language for transforming XML documents into other documents. An XSLT stylesheet consists of a set of templates that match nodes in the input document (represented internally as a tree) and transform them (possibly via the application of other templates) to produce an output tree. The output tree can then be output as text or as an HTML or XML document.

An existing stylesheet for displaying class information from an XMI file as a table in HTML (Objects By Design 1999) was used as a starting point. It was first updated to work with XMI 1.0 files based on the UML 1.3 (rather than 1.1) meta-model. The resulting stylesheet was then converted and extended to produce XMI to Java and XMI to RDFS stylesheets.

4.1 UML to Java

The mapping between UML class diagrams and Java is mostly straightforward: classes and interfaces map to their equivalents in Java and attributes and associations map to Java fields, which may be implemented using a Java list or set depending on the multiplicity and the presence or absence of an "ordered" constraint. Note that in UML at present the ordered constraint can be applied to association ends but not attributes with a multiplicity greater than one. The mapping currently assumes that the UML model uses the OCL primitive types Boolean, Integer, Real and String, and these are mapped to the corresponding Java class types (with Real mapping to Double) instead of the Java primitive types. This allows the possibility of a null value for fields representing attributes or association ends with a multiplicity range that includes zero. Operations declared in the UML classes and interfaces are also declared in the corresponding Java classes and interfaces, and an empty method body is generated.
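The type selection described in this paragraph can be summarised by a small function. The sketch below is an illustration only: the class and method names are hypothetical, -1 stands for the '*' upper bound, and generic type syntax is used purely for readability (the stylesheet itself may well emit untyped collections).

```java
import java.util.Map;

class FieldTypeSketch {
    static final int UNBOUNDED = -1;   // stands for the UML multiplicity '*'

    // OCL primitive types map to Java class types rather than Java primitive
    // types, so that a field can hold null when the multiplicity includes zero.
    static final Map<String, String> PRIMITIVES = Map.of(
            "Boolean", "Boolean",
            "Integer", "Integer",
            "Real", "Double",
            "String", "String");

    // Chooses the declared type of the Java field for an attribute or
    // association end with the given upper multiplicity bound.
    static String javaType(String umlType, int upper, boolean ordered) {
        String element = PRIMITIVES.getOrDefault(umlType, umlType);
        if (upper == 1) {
            return element;                           // single-valued field
        }
        String collection = ordered ? "java.util.List" : "java.util.Set";
        return collection + "<" + element + ">";     // multi-valued field
    }
}
```

For example, an ordered association end of type Person with multiplicity '*' would be declared as a list of Person, while an attribute of type Real with multiplicity 0..1 would be declared as a Double that may be null.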

Java does not allow multiple inheritance of classes, and the stylesheet does not do any restructuring to avoid this: it passes multiple inheritance through for the Java compiler to detect.

Although all navigable association ends, or 'roles', are labelled in Figure 1, these may be omitted from class diagrams. In the case where role names are missing, default names are generated based on the OCL conventions for specifying navigation paths in class diagrams. In particular, the name of the class next to an association end is used as a default role name, with the initial letter in lower case. Any ambiguity due to the lack of role names (e.g. in the case of unlabelled reflexive associations) is not detected and will be caught by the Java compiler when declarations of multiple fields with the same name are encountered. The visibility of attributes and association ends specified in the UML model is respected, but for the purposes of modelling languages it is expected that the designer will make these all public.

Finally, the generated class, interface or field names are checked against a list of Java reserved words and an underscore character is prepended if necessary to avoid a Java compilation error.
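The two naming rules above (default role names and the escaping of reserved words) are simple enough to sketch directly. The code below is an illustration in Java rather than the XSLT actually used; the class and method names are hypothetical, and the reserved-word list is deliberately an incomplete sample.

```java
import java.util.Set;

class NamingSketch {
    // An incomplete sample of Java reserved words, for illustration only.
    static final Set<String> RESERVED = Set.of(
            "abstract", "boolean", "class", "default", "double", "final",
            "int", "interface", "long", "new", "package", "private",
            "public", "static", "void");

    // Default role name: the name of the class at the association end,
    // with its initial letter in lower case.
    static String defaultRoleName(String className) {
        if (className.isEmpty()) {
            return className;
        }
        return Character.toLowerCase(className.charAt(0)) + className.substring(1);
    }

    // Prepends an underscore when a generated name clashes with a reserved word.
    static String escape(String name) {
        return RESERVED.contains(name) ? "_" + name : name;
    }
}
```

Applying both rules in turn, an unlabelled association end next to a class named Package would yield the field name _package.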

A parameter to the stylesheet specifies whether class constructors should be generated. If this option is turned on, the constructors will contain parameters corresponding to all attributes and composition links associated with the class (either inherited or declared in the class).

The output of the stylesheet must be postprocessed to produce separate Java source files for each class and interface.

The generated Java classes include marshalling and unmarshalling methods to serialise object-oriented information as RDF documents using its XML encoding. The aim in this code was to obtain high performance by avoiding the need to reflect on the ontology (using Java reflection or by accessing the XMI or RDFS versions of the ontologies). Instead the marshalling methods explicitly marshal each field in the class that corresponds to an attribute as well as fields corresponding to composition relationships. The actual serialisation to and from RDF is performed by a utility class that uses an existing RDF Java API. The RDF streams produced and consumed by these methods make reference to resources in the RDF encoding of the ontology that is produced by the mapping described in the next section.

As object-oriented data structures consist of a network of interlinked objects, when serialising in-memory knowledge structures to RDF format the marshalling code needs a way of determining which related objects should be serialised along with the particular object of interest. The entry point to the marshalling process is a method of a MarshalHelper class which takes a collection of objects as a parameter. These objects must be instances of one of the generated classes. All objects in the collection are serialised, together with any links between them and any other objects related to these by instances of composition relationships (essentially part-whole relationships). The first element in the collection is assumed to be the main object of interest and the marshal method returns the URI to the serialised form of this object. Cranefield (2001) gives more information about the marshalling framework.
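The selection of objects to serialise amounts to a closure over composition relationships. The following sketch illustrates that traversal only; it is not the MarshalHelper implementation. The interface Composite and its parts() method are hypothetical stand-ins for whatever the generated classes expose, and the serialisation of the links between the selected objects is omitted.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical view of a generated class: the objects it owns through
// composition (part-whole) relationships.
interface Composite {
    List<Composite> parts();
}

class MarshalClosureSketch {
    // Returns the requested objects plus everything reachable from them via
    // composition. Iteration order starts with the requested objects, so the
    // first element is still the main object of interest.
    static Set<Composite> objectsToSerialise(Collection<? extends Composite> requested) {
        Set<Composite> selected = new LinkedHashSet<>();
        Deque<Composite> pending = new ArrayDeque<>(requested);
        while (!pending.isEmpty()) {
            Composite next = pending.removeFirst();
            // add() returns false for an object already selected, which
            // also stops the traversal from looping on cyclic structures.
            if (selected.add(next)) {
                pending.addAll(next.parts());
            }
        }
        return selected;
    }
}
```

Objects that are related to the requested ones only by ordinary associations are not pulled in, which keeps a request for one object from serialising the entire object network.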

4.2 UML to RDFS

The XMI to RDFS stylesheet produces an RDF schema (using the XML encoding) from an XMI document.

One issue that had to be addressed was the problem that RDF properties are first class entities-they are not defined relative to a class. Therefore a given property cannot be defined to have a particular range when applied to objects of one class and another range when applied to objects of a different class. Various solutions to this have been discussed in the www-rdf-interest mailing list (Haustein 2000). The option chosen in this work is to create properties with names of the form Classname.FieldName so that these are unique for each class.

Another issue was that RDF Schema has no notion of an interface. Instead, UML interfaces are modelled as classes, and realisation relationships between classes and interfaces are modelled as subclass relationships.
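Both decisions can be illustrated with a short Java sketch. It is not the XSLT stylesheet itself: the class and method names are hypothetical, the namespace in the usage note is invented, and the statements are printed in an N-Triples-like form purely for readability.

```java
import java.util.ArrayList;
import java.util.List;

class SchemaNamingSketch {
    static final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";

    // Properties are named Classname.FieldName so that the same field name
    // in two classes yields two distinct RDF properties.
    static String propertyUri(String namespace, String className, String fieldName) {
        return namespace + className + "." + fieldName;
    }

    // Generalisations and interface realisations are both emitted as
    // rdfs:subClassOf statements, since RDF Schema has no notion of interface.
    static List<String> subClassStatements(String namespace, String className,
                                           List<String> superclasses,
                                           List<String> interfaces) {
        List<String> statements = new ArrayList<>();
        List<String> parents = new ArrayList<>(superclasses);
        parents.addAll(interfaces);
        for (String parent : parents) {
            statements.add("<" + namespace + className + "> <" + RDFS + "subClassOf> <"
                    + namespace + parent + "> .");
        }
        return statements;
    }
}
```

With a hypothetical namespace http://example.org/family#, the name attribute of Person becomes the property http://example.org/family#Person.name, matching the f:Person.name label used in Figure 4.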

5 Limitations

There are some limitations with the current implementations of the XMI to Java and XMI to RDFS mappings. The stylesheets currently handle basic class diagram elements as well as association classes, 'ordered' constraints on association ends and one-way associations. However, n-ary associations, qualified associations, features with class scope and enumeration types are not supported. There is also no mapping of UML namespaces to Java packages. Association ends are represented by single-valued or set- or list-valued fields depending on their multiplicity. The generated Java classes also distinguish between optional and mandatory fields. However, multiplicity ranges that are specified in finer detail cannot be represented precisely. There are other advanced class diagram features that are also not addressed, but these are considered to be less useful for ontology modelling.

Other approaches to ontology modelling are based on formalisms such as description logic (Donini et al. 1996) which have well defined semantics. In contrast, UML is defined in terms of a meta model with the meaning of the elements described in plain English. However, most of the basic elements useful for ontology modelling are clearly, albeit informally, defined. The lack of precise semantics for UML is being addressed by the Precise UML Group.

Traditional approaches to ontology modelling also allow varying forms of inference to be performed on ontologies and on knowledge encoded in terms of ontologies. Although UML does not currently have this facility, OCL constraints can be used to express logical implications (using an object-oriented syntax). It should be possible to identify particular patterns of OCL constraints forming an inference language that supports tractable reasoning.

6 Related work

A number of other projects have investigated the serialisation of instances of ontological models.

Skogan (1999) defined a mechanism for generating an XML document type definition (DTD) from a UML class diagram and implemented this as a script for the UML-based modelling tool Rational Rose. This is being used for the interchange of geographical information. The mapping is only defined on a subset of UML and many useful features of class diagrams, including generalisation relationships, are not supported.

Work has also been done on producing DTDs (Erdmann and Studer 1999) and XML schemas (Klein et al. 2000) from models expressed in ontology modelling languages (Frame Logic and OIL respectively). The latter work reported that the XML Schema notion of type inheritance does not correspond well to inheritance in object-oriented models, which was a factor in the choice of RDF as a serialisation format in the research described here.

Since its initial design, OIL has been redesigned as an extension of RDFS (Broekstra et al. 2000). This means that an ontology in OIL is also an RDF schema and therefore knowledge about resources in a domain modelled by an OIL ontology can easily be expressed using RDF.

The DAML project is defining an ontology modelling language based on OIL and other prior languages. The UML-based Ontology Toolset project (UBOT) at Lockheed Martin is working on tools to map between UML and DAML representations of ontologies.

The InterDataWorking Web site provides a number of 'gateways' that can be used to convert between different data formats. One particular 'gateway stack' can be used to produce an RDF schema from an XMI document, although no information is given about the mapping and how much of UML is supported. The resulting schema is defined using a mixture of properties and (meta)classes from RDF Schema (such as rdfs:subClassOf) and from Sergey Melnik's RDF representation of the UML metamodel. The schema defines properties and classes that can be referenced when encoding object information in RDF, and could itself be used as an alternative to an XMI encoding for publishing and serialising an ontology modelled using UML. However, as XMI is an OMG standard for model interchange, it is being supported by an increasing number of tools and APIs and there seem to be few advantages in using a different format for encoding UML models. If it is required to annotate an ontology with additional information that is not part of the XMI format (one of Melnik's desiderata) this could be achieved using external annotations and XLink.

Xpetal is a tool that converts models in the 'petal' output format of the UML-based modelling tool Rational Rose to an RDF representation. No details are provided about the mapping from UML to RDF and which UML features are supported.

There is interest within the FIPA agent research community in using object-oriented concepts for representing ontologies and information, and also in the use of XML for representing message structures.

The JADE agent platform allows ontologies to be defined using a frame-based language. An ontology is defined at run time by constructing an ontology object and adding frames to it. A Java class can be associated with a frame so that an application can create instances of the frame as Java objects and insert them in the message content. The JADE messaging system then handles the serialisation of the objects, which can be customised by users for particular content languages.

Botelho and Ramos (2000) have proposed extensions to the FIPA agent communication language to allow object-oriented ontology definitions to be communicated between agents in a propositional format and information about objects to be exchanged using three new proposed speech acts.

Bergenti and Poggi (2000) have also discussed the use of ontologies in UML in the context of multi-agent systems.

Haustein (2001) advocates the use of the XML-based Simple Object Access Protocol, SOAP (W3C 2000b) serialisation format for encoding agent messages based on object-oriented ontology and communication language models.

One of the aims of the XML Protocol Working Group is to design a "mechanism for serializing data representing non-syntactic data models such as object graphs and directed labeled graphs, based on the datatypes of XML Schema". When completed, this, like SOAP, will provide an alternative method for serialising object-oriented information on the Web and within agent-based and other types of distributed systems.

Emorphia Ltd has announced it is extending the FIPA-OS agent platform to make use of the Zeus Java library for XML data binding (Reinhold 1999) to provide built-in marshalling and unmarshalling of agent messages between in-memory object structures and XML documents. This requires XML schemas corresponding to the communication and content languages and the ontologies used. To allow the use of a higher-level modelling language such as UML, it would be necessary to define a mapping between that language and XML Schema.

The work reported in this paper was influenced by the concept of XML binding, and could be described as a "UML binding" facility for Java whereby UML descriptions of data (object diagrams) can be transferred between a Java object encoding and a serialisation format by a single method call. This work has also been applied to agent communication by defining agent communication and content languages as well as ontologies in UML (Cranefield et al. 2001). Java classes can then be automatically generated to provide an API for creating and serialising agent messages.

Another project involving the generation of programming language source code from XMI files is XSL4XMI. This project aims to develop a set of XSLT stylesheets for generating source code in various programming languages from an XMI input document. No code or documents have yet been released.

7 Discussion

7.1 UML and Web architecture

As the Web has developed its own set of architectural requirements and solutions independently of other disciplines such as software engineering, it is a natural question to ask how well the paradigm of object-oriented modelling, and UML in particular, is suited to representing Web resources and concepts.

On the Web a resource is identified by a uniform resource identifier (URI). This combines the separate notions of object identity and object reference used in object-oriented systems. In an object-oriented system there may be multiple distinct references to a single object. In practice this is also possible (and common) on the Web. For example, at the time of writing, http://www.w3c.org/ and http://www.w3.org/ identify the same resource. However, although object identities are not usually available for direct inspection in implementations of object-oriented systems, there is usually a built-in function for comparing two references to determine whether they refer to the same object or to different objects (which may just happen to have the same attribute values). This is not possible on the Web. Therefore, discussions about Web architecture commonly refer to "the" URI for a resource. In the absence of any application-specific logic to implement a notion of resource identity it must be assumed that distinct URIs refer to distinct resources.
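
The contrast drawn above can be illustrated with a short Java sketch. It assumes nothing beyond the standard library: the class name IdentityDemo and its two helper methods are hypothetical names chosen for this example. The first helper uses Java's built-in reference comparison; the second shows that, without application-specific logic, all that can be compared on the Web is the identifiers themselves.

```java
import java.net.URI;

final class IdentityDemo {
    /** True only when both references point at the very same in-memory object. */
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    /** With no resource-identity logic available, only the identifiers can be compared. */
    static boolean sameResource(URI a, URI b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        StringBuilder susan = new StringBuilder("Susan Smith");
        StringBuilder alias = susan;                            // second reference, same object
        StringBuilder twin = new StringBuilder("Susan Smith");  // different object, same value

        System.out.println(sameObject(susan, alias)); // true
        System.out.println(sameObject(susan, twin));  // false

        URI first = URI.create("http://www.w3c.org/");
        URI second = URI.create("http://www.w3.org/");
        // The two URIs are distinct, so they must be treated as distinct resources,
        // even though they may in fact identify the same one.
        System.out.println(sameResource(first, second)); // false
    }
}
```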

This difference between the concepts of identity on the Web and in object-oriented systems does not limit the suitability of UML for expressing ontologies related to the semantic Web. UML assumes that objects have an identity that is implicitly provided by the underlying implementation infrastructure and which does not need to be included as an attribute in the model. UML makes no commitment about the nature of this identity or the way in which links between objects are implemented. It is therefore suitable for modelling information resources on the Web.

An important consideration in designing standards for the Web is that "anyone can say anything about anything" (Berners-Lee 1997). Due to the dynamic, distributed and largely anarchic nature of the Web, the ability to make statements about a resource cannot be restricted (by technical means at least, although there may be legal constraints). One outcome of this is the representation of properties as first class objects in RDF (discussed in Section 4.2). In RDF it is possible to make statements about the properties of a resource without knowing what class(es) it belongs to. A UML object diagram representation of knowledge does not share this design feature. However, this does not disqualify UML from consideration as an ontology modelling language any more than other proposed languages. It is the very nature of an ontology that it is a published vocabulary that particular applications may choose to use when describing resources. For those applications to use the ontology effectively they must know (rather than guess) at least a subset of the concepts defined in the ontology.

Although ontologies are controlled vocabularies, this does not prevent anyone from saying anything about anything on the Web while using them. The same resource may be described by one application using a particular ontology and by another application using a different ontology. For example, the Woman object with name "Susan Smith" described in the object diagram in Figure 2 might be described as an Employee object by another application using a different ontology. This is not a problem if these applications have nothing to do with each other. However, if they need to merge their information there is a need for a mechanism to indicate that the Woman object and the Employee object are descriptions of the same resource. One possible solution is to use one of UML's extension mechanisms, stereotypes, to define a 'virtual' extension of the UML meta-model:

Define a UML class called URI with a string-valued attribute called name.
Declare Resource to be a stereotype of the meta-class Class. This means it is a special type of class with additional semantics.
Define the semantics of Resource as follows: any (model-level) class annotated with «resource» is constrained to have a zero or one to many association with the class URI.

It will then be valid for object diagrams to include links between URI objects and objects belonging to classes with the resource stereotype. In the example discussed above, the Woman object and the Employee object would each have a link to an object representing that resource's URI.
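
The effect of this stereotype can be approximated in Java as sketched below. Java has no stereotypes, so an abstract base class stands in for the «resource» constraint. The names ResourceStereotypeSketch, ResourceStereotyped and describeSameResource, the staffNumber attribute of Employee, and the example.org URI are all hypothetical and appear only for illustration.

```java
final class ResourceStereotypeSketch {
    /** Counterpart of the UML class URI with its string-valued attribute "name". */
    static final class Uri {
        final String name;

        Uri(String name) {
            this.name = name;
        }
    }

    /** Approximates a class carrying the resource stereotype: at most one URI link. */
    static abstract class ResourceStereotyped {
        final Uri uri; // null models the "zero" case of the zero-or-one end

        ResourceStereotyped(Uri uri) {
            this.uri = uri;
        }
    }

    static final class Woman extends ResourceStereotyped {
        final String name;

        Woman(String name, Uri uri) {
            super(uri);
            this.name = name;
        }
    }

    /** An object from a different, hypothetical ontology. */
    static final class Employee extends ResourceStereotyped {
        final String staffNumber; // hypothetical attribute

        Employee(String staffNumber, Uri uri) {
            super(uri);
            this.staffNumber = staffNumber;
        }
    }

    /** Two descriptions concern the same resource when both are linked to the same URI name. */
    static boolean describeSameResource(ResourceStereotyped a, ResourceStereotyped b) {
        return a.uri != null && b.uri != null && a.uri.name.equals(b.uri.name);
    }

    public static void main(String[] args) {
        Uri susanUri = new Uri("http://example.org/people/susan-smith"); // made-up URI
        Woman woman = new Woman("Susan Smith", susanUri);
        Employee employee = new Employee("E-1042", susanUri);
        Woman unlinked = new Woman("Mary Smith", null);

        System.out.println(describeSameResource(woman, employee)); // true
        System.out.println(describeSameResource(woman, unlinked)); // false
    }
}
```

Because one Uri object may be linked to many described objects, applications merging their information need only compare the URI names to discover overlapping descriptions.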

This idea has been explained elsewhere in the context of modelling FIPA agent messaging concepts using UML (Cranefield et al. 2000).

7.2 Choosing between UML and other ontology modelling languages

A number of different languages for ontology modelling have been proposed to date. Why is there a need for another one? Apart from the benefits of UML for ontology modelling discussed in Section 2, the marshalling framework described in this paper provides a convenient mechanism for creating and serialising knowledge structures using an object-oriented API (Java in this case, although it could be easily adapted to other object-oriented languages). When a networked knowledge organisation system is designed using the object-oriented paradigm, UML would be a good candidate for representing ontologies and knowledge. For systems built using other techniques, more traditional ontology and knowledge representation formalisms might be better suited.

One current weakness of UML is its lack of support for inference. However, not all knowledge organisation applications necessarily require the ability to perform inference. Whether or not inference is required may depend as much on the architecture chosen for a particular knowledge organisation application as it does on the problem being solved.

Consider the problem of distributed information retrieval. One important issue to be addressed in this type of application is the conversion of information from one ontology to another so that the results of queries to resources described by different ontologies can be combined. Various architectures have been proposed for solving this problem. One example is the SIMS system (Arens et al. 1996). This is built on top of the LOOM knowledge representation system, and therefore makes full use of LOOM's inference capabilities. An alternative architecture is used in the NZDIS project (Purvis et al. 2000) in which separate ontology translation agents may specialise in translating between different sets of ontologies and are located dynamically by querying a broker agent. These agents may be implemented using inference techniques, or may in fact contain hard-wired code or fixed rewrite rules.
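
The second architecture can be sketched in a few lines of Java. This is not the NZDIS implementation: the class TranslationBrokerSketch, its register and locate methods, the ontology names "family" and "staff", and the rewrite rule from Person.name to Employee.fullName are all assumptions made for this example. It shows only that a translator found through a broker can be a fixed table of rewrite rules with no inference involved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

final class TranslationBrokerSketch {
    // Registered translators, keyed by the (source, target) ontology pair.
    private final Map<String, UnaryOperator<String>> translators = new HashMap<>();

    private static String key(String from, String to) {
        return from + " -> " + to;
    }

    /** A translation agent advertises the ontology pair it specialises in. */
    void register(String from, String to, UnaryOperator<String> translator) {
        translators.put(key(from, to), translator);
    }

    /** A client queries the broker at run time for a suitable translator. */
    Optional<UnaryOperator<String>> locate(String from, String to) {
        return Optional.ofNullable(translators.get(key(from, to)));
    }

    /** A translator built from fixed rewrite rules; unknown terms pass through unchanged. */
    static UnaryOperator<String> fixedRules(Map<String, String> rules) {
        Map<String, String> copy = new HashMap<>(rules);
        return term -> copy.getOrDefault(term, term);
    }

    public static void main(String[] args) {
        TranslationBrokerSketch broker = new TranslationBrokerSketch();

        Map<String, String> rules = new HashMap<>();
        rules.put("Person.name", "Employee.fullName"); // hypothetical rewrite rule
        broker.register("family", "staff", fixedRules(rules));

        String translated = broker.locate("family", "staff")
                                  .map(t -> t.apply("Person.name"))
                                  .orElse("no translator registered");
        System.out.println(translated); // Employee.fullName
    }
}
```

A translator backed by an inference engine could be registered through the same interface, which is why the choice of architecture can matter as much as the problem itself when deciding whether inference is needed.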

8 Conclusions

This paper has illustrated the use of UML for representing ontologies and knowledge about particular instances in the domains modelled by those ontologies. As a widely known and supported modelling language, UML has great potential for describing Web resources in a machine accessible way. Although the XMI specification defines a standard way of serialising UML models, this does not provide a convenient way of serialising knowledge in the form of object diagrams. As a solution to this problem, a pair of mappings from XMI encodings of UML class diagrams to Java classes and to RDF schemas have been defined and implemented using XSLT. The Java classes include code to marshal and unmarshal knowledge expressed as in-memory representations of object diagrams to and from RDF documents in the XML encoding.

There is considerable interest in developing bridges between standards from the domains of object-oriented modelling, distributed object computing and Internet computing (Jagannathan and Fuchs 1999). It is hoped that the specification of mappings such as the ones described here will be addressed by a recognised industry standards body. To aid this process, the work described here will be made publicly available at http://nzdis.otago.ac.nz/projects.

Acknowledgements

This work was done while visiting the Network Computing Group at the Institute for Information Technology, National Research Council of Canada in Ottawa. Thanks are due to Larry Korba and the NRC for hosting me and to the University of Otago for granting leave and financial support.

References

Arens, Y., Knoblock, C. A., and Shen, W.-M. (1996) "Query reformulation for dynamic information integration". Journal of Intelligent Information Systems, 6(2/3):99-130

Bergenti, F. and Poggi, A. (2000) "Exploiting UML in the design of multi-agent systems".
In Engineering Societies in the Agents World, edited by Omicini, A., Tolksdorf, R., and Zambonelli, F., Lecture Notes in Computer Science 1972 (Springer), pp. 106-113
(an earlier version is available at http://lia.deis.unibo.it/confs/ESAW00/pdf/ESAW13.pdf)

Berners-Lee, T. (1997) "Metadata architecture". World Wide Web Consortium Discussion Document
http://www.w3.org/DesignIssues/Metadata.html

Booch, G. (1994) Object-Oriented Analysis and Design with Applications, 2nd edition (Addison-Wesley)

Booch, G., Jacobson, I., and Rumbaugh, J. (1998) The Unified Modeling Language User Guide (Addison-Wesley)

Botelho, L. and Ramos, P. (2000) "Extending the FIPA ACL language: From object based descriptions to relational representations". In Proceedings of the 3rd Iberoamerican Workshop on Distributed AI and Multi-Agent Systems (DAIMAS 2000)
http://iscte.pt/~luis/papers/ACL_Extend.zip

Broekstra, J., Klein, M., Decker, S., Fensel, D., and Horrocks, I. (2000) "Adding formal semantics to the Web: building on top of RDF schema". In Proceedings of the Workshop on the Semantic Web: Models, Architectures and Management, Fourth European Conference on Research and Advanced Technology for Digital Libraries (ECDL'2000)
http://www.ics.forth.gr/proj/isst/SemWeb/proceedings/session2-2/paper.pdf

Cranefield, S. (2001) "UML and the Semantic Web". Discussion Paper 2001/04, Department of Information Science, University of Otago, New Zealand
http://www.otago.ac.nz/informationscience/publctns/complete/papers/dp2001-04.pdf.gz

Cranefield, S., Nowostawski, M., and Purvis, M. (2001) "Implementing agent communication languages directly from UML specifications". Discussion Paper 2001/03, Department of Information Science, University of Otago, New Zealand
http://www.otago.ac.nz/informationscience/publctns/complete/papers/dp2001-03.pdf.gz

Cranefield, S. and Purvis, M. (1999) "UML as an ontology modelling language". In Proceedings of the Workshop on Intelligent Information Integration, 16th International Joint Conference on Artificial Intelligence (IJCAI-99)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-23/cranefield-ijcai99-iii.pdf

Cranefield, S. and Purvis, M. (2000) "Extending agent messaging to enable OO information exchange". In Cybernetics and Systems 2000, Proceedings of the 5th European Meeting on Cybernetics and Systems Research (EMCSR 2000), Trappl, R., editor (Vienna. Austrian Society for Cybernetic Studies)
(an earlier version is available at http://www.otago.ac.nz/informationscience/publctns/complete/papers/dp2000-07.pdf.gz)

Cranefield, S., Purvis, M., and Nowostawski, M. (2000) "Is it an ontology or an abstract syntax? Modelling objects, knowledge and agent messages". In Proceedings of the Workshop on Applications of Ontologies and Problem-Solving Methods, 14th European Conference on Artificial Intelligence (ECAI 2000)
http://delicias.dia.fi.upm.es/WORKSHOP/ECAI00/16.pdf

Donini, F., Lenzerini, M., Nardi, D., and Schaerf, A. (1996) "Reasoning in description logics". In Principles of Knowledge Representation and Reasoning, Brewka, G., editor, Studies in Logic, Language and Information (CSLI Publications), pp. 193-238

Erdmann, M. and Studer, R. (1999) "Ontologies as conceptual models for XML documents". In Proceedings of the 12th Workshop on Knowledge Acquisition, Modeling and Management (KAW'99). Knowledge Science Institute, University of Calgary
http://sern.ucalgary.ca/KSI/KAW/KAW99/papers/Erdmann1/erdmann.pdf

Genesereth, M. R. and Nilsson, N. J. (1987) Logical Foundations of Artificial Intelligence (Morgan Kaufmann)

Hannappel, P. (2000) Summary of recent discussions about an application programming interface for rdf
http://nestroy.wi-inf.uni-essen.de/rdf/sum_rdf_api/

Haustein, S. (2000) "rdf for object serialization". Start of thread on www-rdf-interest mailing list.
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Feb/0157.html

Haustein, S. (2001) "Semantic Web Languages: RDF vs. SOAP Serialization". In Proceedings of the Second International Workshop on the Semantic Web, Hong Kong, May, to be published
http://semanticweb2001.aifb.uni-karlsruhe.de/

Jagannathan, V. and Fuchs, M. (1999) "Workshop report on integrating XML and distributed object technologies". In 8th IEEE International Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises (WET-ICE'99) (IEEE Computer Society Press)
http://www.jeffsutherland.org/xml/IEEE_XML_Report_Draft.htm

JCP (1999) JSR #000040: Metadata API specification. Java Community Process JSR
http://java.sun.com/aboutJava/communityprocess/jsr/jsr_040_mof.html

Klein, M., Fensel, D., van Harmelen, F., and Horrocks, I. (2000) "The relation between ontologies and schema-languages: translating OIL-specifications in XML-Schema". In Proceedings of the Workshop on Applications of Ontologies and Problem-Solving Methods, 14th European Conference on Artificial Intelligence (ECAI 2000)
http://delicias.dia.fi.upm.es/WORKSHOP/ECAI00/7.pdf

Objects By Design (1999) Transforming XMI to HTML
http://www.objectsbydesign.com/projects/xmi_to_html.html

OMG (2000a) "Meta Object Facility specification version 1.3". Object Management Group
http://www.omg.org/technology/documents/formal/meta.htm

OMG (2000b) "XML metadata interchange specification, version 1.0". Object Management Group
http://www.omg.org/technology/documents/formal/xml_metadata_interchange.htm

Purvis, M., Cranefield, S., Bush, G., Carter, D., McKinlay, B., Nowostawski, M., and Ward, R. (2000) "The NZDIS project: an agent-based distributed information systems architecture". In Proceedings of the Hawaii International Conference on System Sciences (HICSS-33). Sprague, Jr., R., editor (IEEE Computer Society Press) (CDROM).
http://nzdis.otago.ac.nz/download/papers/nzdis-project_1-00.pdf

Reinhold, M. (1999) "XML data binding specification". Java Specification Request JSR-000031, Sun Microsystems
http://java.sun.com/aboutJava/communityprocess/jsr/jsr_031_xmld.html

Skogan, D. (1999) "UML as a schema language for XML based data interchange". In Proceedings of the 2nd International Conference on The Unified Modeling Language (UML'99)
http://www.ifi.uio.no/~davids/papers/Uml2Xml.pdf

W3C (2000a) "Resource Description Framework Schema specification 1.0". World Wide Web Consortium
http://www.w3.org/TR/2000/CR-rdf-schema-20000327

W3C (2000b) "Simple Object Access Protocol (SOAP) 1.1". World Wide Web Consortium Note 08 May
http://www.w3.org/TR/SOAP/

Warmer, J. B. and Kleppe, A. G. (1998) The Object Constraint Language: Precise Modeling With UML (Addison-Wesley)

Author details

Stephen Cranefield is a senior lecturer in the Department of Information Science at the University of Otago, New Zealand. His research interests include distributed information systems and agent-based systems. He has a BSc(Hons) in Mathematics from the University of Otago, and a PhD from the Department of Artificial Intelligence at the University of Edinburgh.

Appendix 1. The RDF/XML encoding for the family ontology

The following XML document encodes the RDF triples corresponding to the family ontology (as shown in Figure 3). This was generated by the XSLT script but has been slightly edited for ease of reading. In particular the definition of the entity f was added to provide an abbreviated way of referring to the namespace for this generated schema.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY f 'http://nzdis.otago.ac.nz/0_1/family#'>
]>
<rdf:RDF xml:lang="en" 
         xmlns:f="&f;"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:rdfsx="http://nzdis.otago.ac.nz/0_1/rdf-schema-extensions#">

<rdfs:Class rdf:ID="&f;Person"/>

<rdfs:Class rdf:ID="&f;Man">
  <rdfs:subClassOf rdf:resource="&f;Person"/>
</rdfs:Class>

<rdfs:Class rdf:ID="&f;Woman">
  <rdfs:subClassOf rdf:resource="&f;Person"/>
</rdfs:Class>

<rdf:Property ID="&f;Person.name">
  <rdfs:domain rdf:resource="&f;Person"/>
  <rdfs:range rdf:resource="rdfs:Literal"/>
</rdf:Property>

<rdf:Property ID="&f;Person.parent">
  <rdfs:domain rdf:resource="&f;Person"/>
  <rdfs:range rdf:resource="rdf:Bag"/>
  <rdfsx:containerElementType rdf:resource="&f;Person"/>
</rdf:Property>

<rdf:Property ID="&f;Person.child">
  <rdfs:domain rdf:resource="&f;Person"/>
  <rdfs:range rdf:resource="rdf:Seq"/>
  <rdfsx:containerElementType rdf:resource="&f;Person"/>
</rdf:Property>

<rdf:Property ID="&f;Person.father">
  <rdfs:domain rdf:resource="&f;Person"/>
  <rdfs:range rdf:resource="&f;Man"/>
</rdf:Property>

<rdf:Property ID="&f;Person.son">
  <rdfs:domain rdf:resource="&f;Person"/>
  <rdfs:range rdf:resource="rdf:Seq"/>
  <rdfsx:containerElementType rdf:resource="&f;Man"/>
</rdf:Property>

<rdf:Property ID="&f;Person.mother">
  <rdfs:domain rdf:resource="&f;Person"/>
  <rdfs:range rdf:resource="&f;Woman"/>
</rdf:Property>

<rdf:Property ID="&f;Person.daughter">
  <rdfs:domain rdf:resource="&f;Person"/>
  <rdfs:range rdf:resource="rdf:Seq"/>
  <rdfsx:containerElementType rdf:resource="&f;Woman"/>
</rdf:Property>

</rdf:RDF>

Appendix 2. The RDF/XML encoding for the family knowledge

The following XML document encodes the RDF triples corresponding to the family knowledge (as shown in Figure 4). The original version of this document was generated via an RDF Java API created by Sergey Melnik, but for presentation here the order of elements and the syntax have been altered to give an equivalent encoding that is easier to read.

The anonymous resources have been represented by names such as man1 and woman1 within the namespace associated with this document. However, the best practice for serialising anonymous resources in RDF is an open question (Hannappel 2000).

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE rdf:RDF [
  <!ENTITY a 'http://nzdis.otago.ac.nz/0_1/family#'>
  <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
]>
<rdf:RDF xmlns:a="&a;"
         xmlns:rdf="&rdf;">

<a:Man rdf:ID="man1">
  <a:Person.name>John Smith</a:Person.name>
  <a:Person.daughter rdf:resource="#seq1"/>
  <a:Person.child rdf:resource="#seq2"/>
</a:Man>

<a:Woman rdf:ID="woman1">
  <a:Person.name>Mary Smith</a:Person.name>
  <a:Person.daughter rdf:resource="#seq3"/>
  <a:Person.child rdf:resource="#seq4"/>
</a:Woman>

<a:Woman rdf:ID="woman2">
  <a:Person.name>Susan Smith</a:Person.name>
  <a:Person.parent rdf:resource="#bag"/>
  <a:Person.father rdf:resource="#man1"/>
  <a:Person.mother rdf:resource="#woman1"/>
</a:Woman>

<rdf:Seq rdf:ID="seq1">
  <rdf:li rdf:resource="#woman2"/>
</rdf:Seq>

<rdf:Seq rdf:ID="seq2">
  <rdf:li rdf:resource="#woman2"/>
</rdf:Seq>

<rdf:Seq rdf:ID="seq3">
  <rdf:li rdf:resource="#woman2"/>
</rdf:Seq>

<rdf:Seq rdf:ID="seq4">
  <rdf:li rdf:resource="#woman2"/>
</rdf:Seq>
      	
<rdf:Bag rdf:ID="bag">
  <rdf:li rdf:resource="#man1"/>
  <rdf:li rdf:resource="#woman1"/>
</rdf:Bag>
      	
</rdf:RDF>