Dynamic Adaptation of Content and Structure in Electronic Encyclopaedias
Adaptive functionality has been applied successfully in many areas ranging from user interfaces to hypermedia systems. Digital libraries and electronic encyclopaedias, however, have rarely made use of the power of adaptation.
In this paper, an approach to including adaptation in encyclopaedic environments is presented. The proposal covers a set of adaptation techniques. They enable the system to explain technical terms and replace domain-specific expressions with "plain" words automatically. Moreover, specific terms can be linked to further articles automatically. Blacklisting, whitelisting and general link alteration are employed in order to assure quality standards and to provide users with more appropriate hyperlinks. With navigation support based on the automatic insertion of trails and suggestions of potentially interesting articles, the users' navigation in encyclopaedias can be facilitated.
A first version has been implemented in project "Alexander" and has been made available to a limited public. The system is based on a traditional client-server architecture, where the server-side components perform the actual adaptation. Details of this pilot project are provided.
Keywords: Content Adaptation, Structural Adaptation, Blacklisting, Whitelisting, User Profiles, Electronic Encyclopaedias, Digital Libraries, Web-Based Applications.
Electronic encyclopaedias were introduced in the 1980s, and numerous products with varying coverage and focuses exist today. Most electronic encyclopaedias, however, are rather conventional, inflexible, and static with an emphasis on technical aspects such as data storage, information retrieval, and multimedia content. Moreover, many systems are monolithic in that they offer one body of knowledge that has to be appropriate for all readers.
Users, however, are often interested in different aspects of the same article, or they approach the same piece of information from different perspectives (cf., Vester (1978)). An article on the ginkgo tree, for instance, might be read by a passionate gardener and by a person interested in alternative medicine. The gardener might expect references to articles on how to grow the tree adequately, whereas the person interested in medicine might want to know more about its curativeness.
As this example shows, users have varying goals, diverse backgrounds and different degrees of knowledge. With today's largely non-adaptive encyclopaedias, readers are frequently dissatisfied because too little, too much, or inappropriate information is provided. One possible solution is to own several encyclopaedias in order to have the same knowledge at different levels of detail at hand.
In order to counter these deficiencies, we propose adaptive functionality and personalisation for electronic encyclopaedias. Section 2 gives a brief overview of adaptive systems and describes what kind of information can be adapted and how the adaptation can be implemented. The key part of this paper is section 3, where various adaptation techniques for use in electronic encyclopaedias are presented. Technical aspects are addressed in section 4, and section 5 details a prototype implementation. A number of potential further application areas in section 6 complement the paper.
Traditionally, information providers used to follow a "one size fits all" approach in order to minimise the effort of creation and maintenance, to decrease time-to-market and reduce complexity and expenses. Information, in this context, comprises actual content, organisational structures, user interfaces, network connections, etc. In such static systems, users of an online news service, for instance, are provided with the same news stories—independent of their interests and background. In web-shops, all potential buyers are presented the same featured article of the day. Visitors of online museums are offered the same explanations and the same guided tour. Users of software products have to utilise the same user interface, no matter what their experience is (Brusilovsky and Maybury 2002).
Adaptive environments break up these constraints of static systems and attempt to offer information in ways better suited to users (Benyon and Murray 1993; Holland 1962). They are capable of providing adapted user interfaces, adapted link structures in hypermedia systems, content adapted to the users' knowledge, their aims, or the capabilities of the devices they use, etc.
We distinguish between adaptation technologies and adaptation mechanisms. Technologies of adaptation determine the types of information that can be adapted, whereas adaptation mechanisms describe approaches used to implement them. The following sub-sections provide an overview of the technologies and mechanisms relevant to incorporate adaptive functionality into electronic encyclopaedias.
Many adaptation technologies have been proposed and implemented. Some of the more prominent examples are the adaptation of user interfaces, adaptation of content (content itself is modified), selection of content (which content is presented), the ranking of search results (which content is more relevant), presentation, structural adaptation (alteration of links in hypermedia structures), and navigational support (insertion of useful links).
In order to give a brief introduction to adaptation technologies, the following sub-sections focus on three particular techniques: adaptation of user interfaces, content adaptation and structural adaptation.
User interfaces, especially graphical ones, dominate the users' work with computers. Most user interfaces allow users to make settings in order to adjust certain aspects of the interface, provide default answers to frequent queries, or simply make the work with a software application more comfortable. A refined approach is to allow users to modify menus, short-cuts, and other user interface elements.
However, user interfaces can be adapted in more sophisticated ways. Microsoft, for instance, adapts user interfaces dynamically in several products (Microsoft 2005). Menus are re-ordered so that frequently used menu items are among the first, and less frequently used ones among the last items in a menu. Although the intention is to provide easy and fast access to frequently used functions, this technique is inconsistent with the concept of human visual memory. Users can usually remember the position of menu items, icons and other symbols in a two-dimensional space (Marshall and Shipman 1995; Findlater and McGrenere 2004). When the position of menu items is altered constantly, even experienced users can feel lost in the user interface. This example shows that not every type of adaptation is sensible in certain application areas.
Moreover, some products make use of "agents" that attempt to identify the users' aims and actively assist them in completing their tasks. Additionally, agents can inform users of simpler ways to accomplish a task (Xiao et al. 2004; Schlimmer and Hermens 1993).
A similar approach is taken in recommendation and decision support systems, where the system makes suggestions to users. An example is the highly complex control software of power plants that highlights the required tools and recommended responses in case of a critical situation (cf., Langley 1997). User interfaces for physically impaired people are another example. The adaptive software attempts to predict what the user is about to do and adapts the user interface to proactively present the tools required for the task.
Various aims and requirements can lead to systems that implement content adaptation. E-learning systems are one of the first areas in which this technology was employed. It was recognised in the early stages of computer-supported learning systems that skills and knowledge of the learners vary greatly. Hence, environments that are able to adjust lessons and other learning material to the users' needs were developed (Mödritscher et al. 2004; Pivec and Baumann 2004).
Mobile applications and ubiquitous computing are a field of growing importance for the use of content adaptation. Common demands for mobile devices are small size and light weight, which results in limited capabilities such as small screen sizes, relatively slow network connections, and restricted input methods. Therefore content is adapted to the capabilities of client devices prior to transmission. This means that video clips, for example, are transcoded to lower resolutions and reduced frame rates, and images are transmitted with fewer colours (Fu et al. 2001; Lum and Lau 2002).
Structural adaptation has its roots in hypertext. One of the fundamental elements of hypertext systems is the link between objects, where the structure of a hypertext collection frequently represents a graph with edges (links) and nodes (objects). Structural adaptation corresponds to the alteration of edges in the graph (Stotts and Furuta 1991).
Three basic operations can be distinguished: modification, removal, and insertion of links. When users in a web-based environment select a link that has been altered they are directed to a different resource. A possible application area is e-learning systems, where students may only access the next lesson if they have successfully completed an exam. If they fail the exam, the link is adjusted and points to the beginning of the previous lesson.
From a user's perspective, the removal of links from a hypertext structure can be seen as blacklisting (see below). In corporate computer networks, for instance, this technology can be employed in order to prevent users from accessing web-sites of competitors or resources that are not related to their work.
The insertion of links is often used in navigation support systems. Links to potentially significant information that might otherwise have gone unnoticed or links to resources that have been recommended by other users of the system are presented to the reader (Weber and Specht 1997).
Adaptation mechanisms determine how the actual adaptation is performed by the system. Most adaptive environments accomplish an adaptation in three steps (see figure 1). First, an attempt is made to gather information about the user; the data is stored in a user profile. The data that comprises the user profile can either be entered explicitly by users or can be collected gradually by the system itself. The adaptive system makes use of the data retained in the user profile in order to create a user model. By applying the user model to the original piece of information—a user interface, textual content, a hypertext structure, etc.—an adaptation effect is accomplished.
Figure 1. The common process employed in most adaptive systems to achieve an adaptation effect. The user data stored in the user profile are used to generate a user model that is applied to the original information in order to accomplish the adaptation.
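The three-step process of figure 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation; all names (`build_model`, `adapt`, the profile fields) and the sample data are assumptions.

```python
# Step 1: a user profile holding raw data entered by the user or
# collected by the system (the fields are illustrative).
profile = {"interests": ["botany", "medicine"],
           "background": "gardener"}

def build_model(profile):
    """Step 2: derive a user model from the raw profile data."""
    return {"preferred_topics": set(profile["interests"])}

def adapt(articles, model):
    """Step 3: apply the model to the original information; here,
    articles matching the user's preferred topics are ranked first."""
    return sorted(articles,
                  key=lambda a: a["topic"] not in model["preferred_topics"])

articles = [{"title": "Ginkgo in cooking", "topic": "cuisine"},
            {"title": "Growing ginkgo trees", "topic": "botany"}]
model = build_model(profile)
print([a["title"] for a in adapt(articles, model)])
# → ['Growing ginkgo trees', 'Ginkgo in cooking']
```

The same pipeline shape applies regardless of whether the adaptation target is a user interface, textual content, or a hypertext structure; only the `adapt` step changes.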
In this paper, we distinguish between three generalised kinds of adaptation mechanisms: static, dynamic, and flexible dynamic adaptation. The differences between these three mechanisms are the generation of user profiles and user models, and the format in which original information is provided. A comprehensive overview of adaptation mechanisms can be found in Brusilovsky (1996).
Systems that make use of static adaptation are sometimes denominated adaptable systems. The characteristic of static adaptation is that all possible adaptations are defined explicitly by authors and, in contrast to adaptive systems, retained statically in the system. Moreover, decisions on the use of adaptation are often not made automatically by the system but by the user. Thus, the user has to specify explicitly what is to be adapted and how the adaptation is carried out.
An example is web-sites available in multiple languages. The author prepares a page in various languages, and therefore multiple instances of one information object exist and are stored separately in the system. When a page is to be displayed, readers may choose from the set of languages. Alternatively, the web browser's language settings can be used to make an automated decision.
Similar to hierarchical file systems, the information objects in a static adaptation environment can be depicted graphically as a tree. The tree has several main branches—in our example, one for each language. By making a language preference, the user enters one particular branch of the tree, and the subsequent navigation takes place within this specific sub-tree.
Another example of static adaptation is adaptable user interfaces. Apple's DVD authoring suite "DVD Studio Pro", for instance, offers three pre-defined user interfaces: beginner, intermediate and advanced (DSP 2005). The user interface for beginners includes a basic set of tools that let even unskilled users easily author DVDs. A very high level of abstraction is employed so that users do not have to be familiar with technical details. With the interface for intermediate users, authors have additional tools and options at hand. They can define a number of advanced parameters but have to have a basic knowledge of the underlying technologies. The interface for advanced users makes the full range of tools available. Options and parameters can be modified on their lowest levels, and every aspect of a project can be optimised. Users, however, have to have an in-depth knowledge of the technologies employed.
In this example, the software designer prepares three different, static user interface variants. The user can choose between these pre-defined options. However, once one of the three user interfaces is chosen, the system does not adapt further to the user.
A more advanced approach to adjusting information to the users' needs is dynamic adaptation. With this mechanism, authors do not define statically which information is to be altered in a given situation; instead, the software assesses which portion of information is to be adapted. This decision is derived from a number of parameters. In a learning environment, for instance, a user's learning objective together with the information on the categorisation of a lesson and the user's latest test results can be used to determine that an explanation in the lesson content is inappropriate and needs to be adjusted.
In most cases, dynamic adaptation requires a higher level of structure in the information to be specifically customised. Moreover, the information usually has to be constructed or prepared by authors. Depending on the actual implementation, authors might have to give hints on which information can be adapted and which pieces of information are suitable replacements. In other approaches, particular cases and conditions, in which a certain adaptation is performed, are defined. Thus, authors need both experience and special skills in producing information, which makes development and maintenance more demanding.
Another requirement for dynamic adaptation is a user profile for every user of the system. This information is typically provided by the users on the first use of the system in the form of answers to questions about their experience, background, favourites, and aims. This data is retained in the system and utilised for producing a user model that can be applied in the adaptation process. Most systems allow users to modify the settings in their profile in order to reflect changes in their aims, etc.
An alternative to persistent user profiles is the use of ad-hoc profiles. Every time the user wants to employ the system a question such as "What would you like to do today?" is asked. Depending on the user's answer aspects such as the user interface, the results of a database query, or the suggestions of a recommendation system are adapted.
Dynamic adaptation is a popular mechanism that is employed in areas ranging from learner support systems to online help systems, and general hypertext systems (Pivec and Baumann 2003; Moore et al. 2001). Another example of the use of dynamic adaptation is adaptive user interfaces (see section 2.1.1).
Flexible dynamic adaptation can be seen as an enhanced variant of dynamic adaptation. With this approach, the adaptation mechanism itself, the selection of content to be adapted, and the user profiles can be adapted. Hence, an author, for instance, gives hints on which portions of information can be used in an adaptation. The system, however, has the ability to find more suitable data. Users, on the other hand, fill out user profiles, but the system can adjust profiles in order to express the users' preferences in a better way.
Thus, user profiles are not static but dynamic. As with dynamic adaptation, users may be asked to provide basic information when they use a service for the first time, and the system attempts to adjust the profile. Even more advanced systems generate the user profile automatically, eventually making the explicit contribution of information by the user unnecessary.
Techniques from the field of artificial intelligence are employed in order to implement the adaptive functionality of such systems. Neural networks and machine learning, for instance, are technical foundations of many implementations (Annunziato et al. 2002; Narendra and Parthasarathy 1989).
An example of flexible dynamic adaptation can be found in recommendation systems. When the user submits a query, various selected recommendations are presented by the system. If the users follow any of the recommended answers, it can be assumed that the recommendation was appropriate. Thus, both the correctness of the adaptation algorithm can be confirmed under the given conditions and the user's profile can be adjusted.
A major problem of the flexible dynamic adaptation approach is that it might be difficult to obtain the data necessary to adapt the adaptation process and user profiles. In traditional hypertext environments, for instance, only little information can be gathered automatically—the last document requested by the reader, the time spent viewing the document, etc. However, this information might not be accurate, simply not sufficient to generate a user profile, or expensive to collect or compute (Langley and Fehling 1998).
Therefore flexible dynamic adaptation is used frequently in conjunction with features of dynamic adaptation, i.e., the system attempts to adjust the adaptation mechanism with the feedback manually provided by users. In a database system, for example, the system might ask the user after each query whether the results were appropriate.
Most electronic encyclopaedias such as Wikipedia, the Encyclopædia Britannica, and the Brockhaus Multimedial Premium, one of the most advanced electronic encyclopaedias available to date, offer few or no ways at all to adapt to their users (Wikipedia 2005; Britannica 2005; BMM 2005). Therefore we employ a number of adaptation techniques, some of which are successfully incorporated and tested in various application areas such as e-learning environments or recommendation systems.
In this project, adaptation of both content and structure of articles in the encyclopaedia is utilised, whereas adaptation of the user interface is not yet considered. Mainly dynamic adaptation mechanisms are used. The adaptation is performed at run-time, automatically, and based on user profiles. In this paper, our focus is on the types of adaptation that can be performed and their use rather than on actual adaptation mechanisms or user modelling.
The functionality we propose is part of a larger system that offers communities tools to work actively with content from electronic encyclopaedias and other digital resources (see section 5 as well as Kolbitsch and Maurer 2006a). The environment is web-based. Articles in the encyclopaedia are HTML documents that can:
- contain hyperlinks to articles in the internal repositories;
- include multimedia data such as images and video clips;
- include references to pages and other content from external resources; and
- be organised in a hierarchy of categories and sub-categories.
Not only pages from internal repositories but basically every page available on the World Wide Web and documents from other external resources can be used as the source of the adaptation procedure.
One of the most significant types of adaptation in our project is the fully automated explanation and replacement of terms combined with linking to appropriate resources. Both explanation and replacement can prove to be useful for expressions such as technical terms and domain-specific words. The article on the human heart in an encyclopaedia, for instance, might contain the expression "angina pectoris". Although this term is familiar to most adults, various groups of readers including school children might not understand it.
Therefore we propose three approaches to the explanation and replacement of terms that suit the varying needs of most users. Explanations are provided as appositions or in brackets and are available on two levels of detail. Users with a certain experience in the domain are offered a more elaborate description of the term (an abstract), whereas for users with no appropriate background, including school children, only a short description or a synonym is provided. Moreover, hyperlinks to appropriate resources are inserted automatically. When the same article is viewed by an expert user it does not include explanations but only links to specialised knowledge.
In order to illustrate the approach, the phrase "leading to angina pectoris ..." is adapted. The following examples show the results of the adaptation for a school child, an intermediate user, and a domain expert:
- "leading to a heart disease ..." where "heart disease" is a hyperlink to an article on diseases of the heart in a children's encyclopaedia;
- "leading to angina pectoris, an ischemic disease of the heart causing pain in the chest due to a lack of oxygen supply, ..." where "angina pectoris" is a hyperlink to an article on angina pectoris in a general encyclopaedia;
- "leading to angina pectoris ..." where "angina pectoris" is a hyperlink to a list of selected publications and recommended reading including current research results regarding angina pectoris.
In the examples above, two content adaptations are performed. For the novice user, the more general term "heart disease" is inserted as a replacement. In the second example, "an ischemic disease of the heart ..." is used as an explanation. All examples make structural adaptations by including various hyperlinks.
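The three renderings above could be produced from a per-term database holding a synonym, an abstract, and level-specific link targets. The following sketch illustrates this; the data structure, the level names, and the link targets are assumptions for illustration, not the project's actual design.

```python
# Hypothetical term database: synonym, abstract, and one link target per level.
TERMS = {
    "angina pectoris": {
        "synonym": "a heart disease",
        "abstract": ("an ischemic disease of the heart causing pain in the "
                     "chest due to a lack of oxygen supply"),
        "links": {"novice": "children/heart-diseases",
                  "intermediate": "general/angina-pectoris",
                  "expert": "research/angina-pectoris-reading-list"},
    }
}

def render_term(term, level):
    """Render a term as HTML according to the user's level."""
    entry = TERMS[term]
    link = entry["links"][level]
    if level == "novice":                       # replace with "plain" words
        return f'<a href="{link}">{entry["synonym"]}</a>'
    if level == "intermediate":                 # keep term, add explanation
        return f'<a href="{link}">{term}</a>, {entry["abstract"]},'
    return f'<a href="{link}">{term}</a>'       # expert: link only

print(render_term("angina pectoris", "novice"))
```

Rendering happens at run-time, so the same stored article text yields all three variants depending on the user profile.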
Two questions are crucial for the feasibility of an implementation and need to be answered. Which terms are to be replaced, explained and linked? Where do explanations, replacements, and links stem from?
The solution for the first problem is based on two observations: (1) terms that occur relatively seldom in all articles of the encyclopaedia are usually technical terms, and (2) in many cases technical terms specific to a particular category are not very well known in other domains.
The first finding means, for example, that it does not make sense to explain the relatively frequent word "heart" in an article of a general encyclopaedia, while it is reasonable to describe the term "ischemic". The second approach is more complex, as the following example demonstrates. An article on Pohutukawa trees, for instance, can be found in the category Science and Nature – Biology – Botany – Botany of Australasia. Moiti Island, the home of a variety of Pohutukawa trees, is mentioned in the article on Pohutukawa trees. The term "Moiti Island", however, is classified in the category History and Geography – Australasia – New Zealand and is not part of the category Botany of Australasia. Thus, this term is part of a distinctly different category and therefore a potential candidate for an explanation.
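Observation (1) can be implemented by counting word frequencies across all articles and treating words below a threshold as candidate technical terms. The toy corpus and the threshold below are invented for illustration.

```python
import re
from collections import Counter

# Tiny stand-in corpus; a real system would iterate over all articles.
articles = {
    "heart":  "the heart pumps blood the heart has four chambers",
    "angina": "ischemic disease of the heart causes chest pain",
}

def candidate_terms(articles, max_freq=1):
    """Words occurring at most max_freq times across the whole
    encyclopaedia are candidates for explanation or replacement."""
    counts = Counter()
    for text in articles.values():
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return {word for word, n in counts.items() if n <= max_freq}

print(sorted(candidate_terms(articles)))
```

In this corpus "heart" occurs three times and is therefore not flagged, whereas the rare "ischemic" is; a production system would also need stemming and a sensible corpus-size-dependent threshold.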
Our solution to the second problem, the source of the information required for an adaptation, attempts to involve the user community (see Kolbitsch and Maurer 2006a). It is conceivable that the data structure of articles in environments such as Wikipedia, where content is solely developed by the community, is altered. Two additional fields for a short description and an abstract could be appended to every article and could be filled out gradually by the users of the encyclopaedia.
Alternatively, both the short description and the abstract can be generated in a semi-automatic process. In numerous encyclopaedias including Wikipedia, the first sentence of an article often resembles an abstract. Although this information can be extracted automatically, the data has to be proof-read and confirmed by members of the community in order to ensure the accuracy of the information.
Translation of expressions in foreign languages is an important aspect that is often overlooked. Many authors, for example, do not want to use translations because foreign-language terms are sometimes more precise or simply established in a domain. Therefore terms such as English words in computer-related articles or French expressions used in articles on cuisine might not be fully comprehended by readers.
Hence we make use of technologies for the automatic translation of terms from foreign languages. In order to enable translations, words from foreign languages have to be detected. In a first step, terms that occur infrequently in the categories or in the entire encyclopaedia are determined. If they cannot be found in a dictionary of the current language they are potential candidates for a translation.
For the subsequent actual translation, a simplified approach is taken; more advanced implementations can be found, for instance, in Hutchins (2001) and Ide and Véronis (1998). Terms are reduced to their principal form and looked up in a number of foreign-language dictionaries. The results of these queries are either provided as inline text (in brackets, for example) or in separate windows that can be accessed through hyperlinks.
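The detection-plus-lookup step might be sketched as follows. The current language is assumed to be German here (cf. the Brockhaus example), and both dictionaries are tiny stand-ins for real lexical resources; all data in this sketch is invented.

```python
# Stand-in for a dictionary of the current language (German, assumed).
GERMAN_DICT = {"herz", "krankheit"}

# Stand-ins for foreign-language dictionaries, keyed by language code,
# mapping a foreign term to its current-language translation.
FOREIGN_DICTS = {"en": {"heart": "Herz"},
                 "fr": {"cuisine": "Küche"}}

def translate_foreign_terms(words):
    """Return {word: (language, translation)} for words that are not
    in the current-language dictionary but found in a foreign one."""
    translations = {}
    for word in words:
        if word.lower() in GERMAN_DICT:
            continue                    # known word, no translation needed
        for lang, lookup in FOREIGN_DICTS.items():
            if word.lower() in lookup:
                translations[word] = (lang, lookup[word.lower()])
    return translations

print(translate_foreign_terms(["Herz", "heart", "cuisine"]))
```

The results would then be injected as inline text in brackets or offered in separate windows via hyperlinks, as described above; reduction of terms to their principal form (stemming/lemmatisation) is omitted here.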
Although several approaches have been developed, automatic translations are usually not accurate (Yarowsky 1992). Therefore users should be informed that the translations provided are automatic and might not be precise. In environments where users are actively involved, community feedback can be used to improve the performance of the approach. Similar to rating systems, users can decide whether translations are correct. After a certain number of such ratings the system can dispose of inappropriate translations.
Blacklisting is a technique that prevents the access to, or the use of, services, resources, content or information defined in an exclusion list. It is usually employed as means of censorship, parental control, or in order to filter unwanted and unsolicited information (Balkin et al. 1999). Especially in environments such as the Wikipedia, where content is largely developed by the community, blacklisting can become necessary to maintain the quality.
Implementations exist on various levels including:
- access to networks: physical or logical access to certain networks is not possible;
- access to services: only services such as HTTP can be used in many Internet cafés, for instance, whereas file sharing is blocked;
- access to resources: URLs to certain service providers on the WWW, for example, are filtered in corporate networks; and
- access to information: e.g., certain words and phrases are removed from textual content.
In this project, blacklisting is employed on both the resource and information levels. This means that URLs and phrases can be blacklisted.
Our approach to implementing resource filtering is straightforward. The system retains a list of URLs that are not permitted in the system. Entries in the list can consist of exact addresses or URLs with wildcards. If an article in the encyclopaedia contains a blacklisted URL the link is removed from the article. This strategy makes it possible to prevent users from accessing links to resources that were provided by the community but do not conform to required standards.
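Resource-level blacklisting with exact addresses and wildcards maps naturally onto shell-style pattern matching. The sketch below uses Python's `fnmatch` for the wildcard semantics; the sample patterns and URLs are invented.

```python
from fnmatch import fnmatch

# Hypothetical blacklist: exact URLs or URLs with wildcards.
BLACKLIST = [
    "http://example.com/banned-page",
    "http://*.competitor.example/*",
]

def is_blacklisted(url):
    return any(fnmatch(url, pattern) for pattern in BLACKLIST)

def strip_blacklisted_links(links):
    """Remove blacklisted hyperlinks from an article's link list."""
    return [url for url in links if not is_blacklisted(url)]

print(strip_blacklisted_links([
    "http://example.com/banned-page",
    "http://shop.competitor.example/offer",
    "http://example.org/fine",
]))
# → ['http://example.org/fine']
```

Note that `fnmatch` wildcards also match across `/`, which is convenient here but means patterns should be written with the full URL in mind.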
For blacklisting on the information level, we propose an advanced, context-sensitive filtering mechanism. Conventional blacklisting removes the word "sex", for instance, from an article, if it is a member of the blacklist. With context-sensitive blacklisting, it is possible to define if a word is mandatorily or conditionally blacklisted. While mandatory blacklisting leads to the same results as conventional blacklisting, the decision if a conditionally blacklisted word is removed from an article depends on the context. The article on "sex", for example, might be part of two categories: biology and psychology. If the word "sex" is used in another article in one of these categories, it is not filtered because it can be assumed that, within the context of the category, the word is necessary, appropriate, and not offensive. Whenever the word is used in articles that are not in these categories it is filtered.
How "broad" the context for conditional blacklisting is depends on the actual implementation. It can be rather narrow by limiting the context to the exact sub-categories of articles, or rather broad by choosing top-level categories. Although it is possible to let authors assign the context for words on the blacklist manually, the cost for doing so is most likely too high.
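Context-sensitive blacklisting amounts to checking the intersection of an article's categories with a word's permitted contexts. A minimal sketch, with invented blacklist entries:

```python
# Hypothetical blacklist: each word is mandatorily or conditionally
# blacklisted; conditional entries list their permitted categories.
BLACKLIST = {
    "sex":      {"mode": "conditional",
                 "contexts": {"biology", "psychology"}},
    "badword":  {"mode": "mandatory"},
}

def must_filter(word, article_categories):
    """Return True if the word has to be removed from an article
    belonging to the given categories."""
    entry = BLACKLIST.get(word.lower())
    if entry is None:
        return False
    if entry["mode"] == "mandatory":
        return True                     # filtered regardless of context
    # conditional: keep the word only inside its permitted categories
    return not (entry["contexts"] & set(article_categories))

print(must_filter("sex", ["biology"]))   # permitted context
print(must_filter("sex", ["history"]))   # outside permitted contexts
```

How broad the context is then becomes a question of which category level is passed in: exact sub-categories for a narrow context, top-level categories for a broad one.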
In consideration of the sheer number of documents and resources available on the World Wide Web and the amount of information produced every day, it is almost impossible to maintain lists of all unwanted resources (Zakon 2005). Hence, blacklisting is not always an ideal approach and whitelisting can be favourable.
With whitelisting, a list of all resources that may be accessed is retained. This technique can be used to enforce rather restrictive control of external resources in order to be capable of maintaining quality standards. However, it also enables parental control and can be particularly valuable when encyclopaedias are used in learning environments (Lennon and Maurer 2003). In this case, linking to external material can become problematic, and whitelisting can be employed efficiently by allowing access only to a small number of accredited external content providers.
Although from an ethical perspective this notion is worrisome, it should be mentioned that both blacklisting and whitelisting can be used to prevent users from accessing content and services provided by critics and competitors.
The automatic alteration of links during run-time can be used to provide better explanations and offer more appropriate further articles. In this context, links do not only comprise hyperlinks in text documents but also links to inline images, sound files and similar media documents. As such, link alteration can basically be seen as a generalisation of blacklisting and whitelisting. While these two techniques are used to disable links that meet certain criteria, general link alteration is employed to modify links based on a given set of rules.
An article on heart diseases, for instance, contains a hyperlink to the article on echocardiography. Depending on the user's skills that are retained in the user profile, the link can point to the corresponding article in various sources such as a general encyclopaedia or a children's dictionary. Based on the user's preferences, hyperlinks can also point to different areas of interest. For an electrical engineer, for example, the term echocardiography could be presented as a hyperlink pointing to a detailed description of the technical design of the apparatuses available, whereas for a medical doctor the same link could point to an article describing cases in which echocardiography is an appropriate diagnostic measure. In these examples primarily hyperlinks to other articles are modified.
For an implementation, the system has to determine both the experience of users in various categories (e.g., beginner in history) and the level of complexity of resources in the system. Information on the users' skills might be part of the users' profiles. The content source (e.g., children's encyclopaedia), on the other hand, can be an indication for the complexity of resources.
When links point to resources whose levels are more advanced, the system attempts to find appropriate resources matching the users' skills. If such replacements can be found, the links are modified; otherwise, they remain unaltered. Alternatively, links can be removed (cf., blacklisting) in order to avoid confronting users with unsuitable information.
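The replacement rule described above can be sketched as follows; the level ordering, the table of article instances, and the function name are illustrative assumptions:

```python
# Sketch of user-sensitive link alteration. Levels are ordered
# beginner < intermediate < expert; all data structures are illustrative.
LEVELS = {"beginner": 0, "intermediate": 1, "expert": 2}

# topic -> {level: article id}; several instances of the same article
# may exist, each originating from a different source.
INSTANCES = {
    "echocardiography": {
        "beginner": "children/echo",
        "expert": "medical/echo",
    },
    "ischemia": {
        "expert": "medical/ischemia",
    },
}

def adapt_link(topic, target_level, user_level, remove_unsuitable=False):
    """Return the article id a link should point to, or None to drop it."""
    if LEVELS[target_level] <= LEVELS[user_level]:
        return INSTANCES[topic][target_level]          # suitable as-is
    # Look for the most advanced instance not exceeding the user's skills.
    for level in sorted(LEVELS, key=LEVELS.get, reverse=True):
        inst = INSTANCES[topic].get(level)
        if inst is not None and LEVELS[level] <= LEVELS[user_level]:
            return inst                                # rewrite the link
    if remove_unsuitable:
        return None                                    # cf. blacklisting
    return INSTANCES[topic][target_level]              # leave unaltered
```

For a beginner, the expert article on echocardiography would thus be replaced by the children's dictionary entry, while a link for which no suitable instance exists is either kept or removed, depending on policy.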
This strategy is, of course, not only applicable to hyperlinks that point to further articles or external web pages. It can also be employed for links to other resources such as inline images, multimedia animations or video clips. However, this approach has to be evaluated extensively in order to avoid unwanted results.
Adaptive systems can enable the implementation of enhanced navigational aids. We utilise trails, an established method for presenting popular or pre-defined paths through hyperstructures (Bush 1945; DeRoure et al. 2001).
This mechanism usually consists of a tracking module and a navigation component. The tracking module collects information on the users' navigational behaviour in a hypertext system. It retains weights for every actual connection C_AB between any two nodes A and B, and whenever the user navigates from A to B, the weight of C_AB is incremented. As an alternative to the dynamic generation of trails, they can be defined manually by authors as ordered series of edges in a directed graph.
Figure 2. A trail, where the user has already viewed the articles on heart diseases and angina pectoris. The suggestions for further articles are "atherosclerosis" and "ischemia", where the connection from angina pectoris to ischemia is more popular.
When a user views an article the navigation component offers a trail containing (or starting at) the current node. The trail consists of the articles most recently requested by the user and a number of suggested further articles (see figure 2). The selection of articles suggested in a trail is based on the weight of connections starting at the current node. However, more advanced selection mechanisms might also take user profiles into consideration and produce trails that contain further articles that were requested by users with similar experience and preferences.
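A minimal sketch of the tracking module and the weight-based selection, assuming an in-memory weight table; class and method names are illustrative:

```python
# Sketch of the tracking module and trail suggestion. The weight of a
# connection C_AB is incremented on every navigation from A to B.
from collections import defaultdict

class TrailTracker:
    def __init__(self):
        self.weights = defaultdict(int)   # (A, B) -> weight of C_AB

    def record(self, a, b):
        """Called whenever a user navigates from article a to article b."""
        self.weights[(a, b)] += 1

    def suggest(self, node, n=2):
        """Most popular successors of `node`, heaviest connection first."""
        out = [(b, w) for (a, b), w in self.weights.items() if a == node]
        out.sort(key=lambda t: t[1], reverse=True)
        return [b for b, _ in out[:n]]
```

With the navigation history of figure 2, the connection from angina pectoris to ischemia would carry the higher weight and therefore be listed first among the suggestions.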
While trails are a graphic depiction of recommended articles and can serve as direct navigational aids, the use of text-based suggestions is a method that can facilitate the accidental encountering of information (Erdelez 1997). With this approach, potentially interesting articles are presented to the user in personalised messages such as "Did you know that ..." or "You might also be interested in ...". A user interested in electrical engineering reading an article on heart diseases, for example, might be confronted with the suggestion "You might also be interested in how a cardiac pacemaker works." The links to these articles are generated automatically by the system on the basis of the user's aims and statistical information on the navigational behaviour of all users (cf., section 3.6). Thus, readers are actively supported in accidentally encountering information that might not have been found otherwise.
In addition to this, articles are complemented with a "recently in the news" section. When the user reads an article on, for instance, biochemistry, a news service such as Google News is queried in order to find out whether biochemistry has been in the news lately. If so, references to the most relevant news stories from various news services are appended to the article in the encyclopaedia on-the-fly. This approach makes it possible to provide users with a range of external resources and up-to-date information.
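Assuming the news service exposes an RSS feed, the on-the-fly lookup can be sketched as follows; the feed content and the function name are illustrative:

```python
# Sketch of the "recently in the news" section. We assume the news
# service exposes an RSS feed; the feed below is a hard-coded example.
import xml.etree.ElementTree as ET

def recent_news(rss_xml, term, limit=3):
    """Return (title, link) pairs for feed items mentioning `term`."""
    root = ET.fromstring(rss_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", "")
        link = item.findtext("link", "")
        if term.lower() in title.lower():
            hits.append((title, link))
    return hits[:limit]

SAMPLE_FEED = """<rss><channel>
  <item><title>Advances in biochemistry</title>
        <link>http://news.example.org/1</link></item>
  <item><title>Local election results</title>
        <link>http://news.example.org/2</link></item>
</channel></rss>"""
```

In a deployed system, the feed would of course be fetched from the news provider at request time and the resulting references appended to the rendered article.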
Before a prototype implementation is presented in section 5, technical aspects, including requirements for articles and user accounts, are discussed in the following sub-sections.
Articles in the encyclopaedia are stored as HTML pages or in a different, potentially proprietary format that can easily be converted to HTML documents. A particular structure is not required; simple formatting elements such as "heading" or "paragraph" suffice. Links to other articles or to external resources as well as references to images and similar media objects are included as anchors or can be stored in a link database.
Every article is part of at least one category. Articles can, however, be part of multiple categories and sub-categories. Articles can originate from various sources including general and specialised encyclopaedias, children's dictionaries, or scientific journals. This means that an article on the human heart may exist in several instances, each from a different source.
In order to be able to perform context and user sensitive link alteration (see section 3.5), a level of specialisation has to be assigned to articles. Specialisation is specified in three levels mirroring the users' skills: beginner, intermediate, and expert. Thus, an article on the human heart could, for example, be categorised as "medicine, intermediate".
Additionally, articles have to have a small set of descriptive metadata including a short description and an abstract attached. These fields are required to enable the replacement and explanation of technical terms (see section 3.1).
The set of data including categorisation, level of specialisation, origin of the content and descriptions is stored in a feature vector. It may be extended with additional metadata when necessary. Authors could, for example, define manually which adaptations are to be applied or that certain articles are not to be adapted at all.
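The feature vector can be sketched as a simple record; the field names are illustrative assumptions, and the vector may be extended with additional metadata as described above:

```python
# Sketch of an article's feature vector. Field names are illustrative;
# `extra` holds optional author-defined metadata, e.g. an instruction
# that an article is not to be adapted at all.
from dataclasses import dataclass, field

@dataclass
class FeatureVector:
    categories: set            # e.g. {"Medicine"}; at least one category
    level: str                 # "beginner" | "intermediate" | "expert"
    source: str                # e.g. "children's encyclopaedia"
    short_description: str
    abstract: str
    extra: dict = field(default_factory=dict)   # e.g. {"no_adaptation": True}

heart = FeatureVector({"Medicine"}, "intermediate",
                      "general encyclopaedia",
                      "The human heart",
                      "The heart is a muscular organ ...")
```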
As in many other adaptive systems, the user's knowledge is represented in an overlay model (Brusilovsky 1996). The overlay model uses the same structure as the subject domain, i.e., the user model is based on the same categories and features as the articles retained in the system.
In the proposed system, users can define both their skills and experience, and their aims and interests. These data are retained in two separate user profiles.
A user profile is a structured set of data stored in a matrix. The first dimension of the matrix consists of categories and sub-categories. The second dimension represents the three levels of specialisation (beginner, intermediate, and expert). With this approach, users can define simple attributes such as "beginner in medicine" (profile: skills, category: Medicine, level of specialisation: beginner) or "interest in geography of Australia" (profile: aims, category: History and Geography — Australasia — Australia, level of specialisation: expert).
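A minimal sketch of such a profile matrix follows; the category list is abbreviated for illustration, and real weights would be derived from the initial questionnaire and subsequent observation:

```python
# Sketch of a user profile as a category x specialisation matrix.
# Categories are abbreviated; weights are illustrative.
CATEGORIES = ["Medicine", "History and Geography"]
LEVELS = ["beginner", "intermediate", "expert"]

def empty_profile():
    return {c: {l: 0.0 for l in LEVELS} for c in CATEGORIES}

skills = empty_profile()
aims = empty_profile()          # skills and aims are kept separately

skills["Medicine"]["beginner"] = 1.0            # "beginner in medicine"
aims["History and Geography"]["expert"] = 1.0   # strong interest
```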
In many innovative systems, especially those requiring metadata, the source of the required additional information remains unclear. This section highlights this aspect for the most significant types of data in the system.
The metadata for user profiles is provided by the users on the first use of the system. Users give weighted answers to a set of questions, which enables the system to generate profiles automatically. Over time, the system can update the profiles by collecting data on the users' behaviour in the system.
In most encyclopaedias, a categorisation of articles is available (Wikipedia 2005; BMM 2005). In addition to this, large encyclopaedic environments such as Xipolis contain articles not only from a single source but from a number of encyclopaedias and domain specific dictionaries (Xipolis 2005). In this case, an article on one topic exists in several instances, and therefore information on both the category and the level of specialisation of articles is known.
As detailed above (see section 3.1), generation of abstracts and short descriptions can be based on an automatic extraction from the articles in the encyclopaedia. Human intervention will, however, be necessary in order to ensure the accuracy and quality of the information.
Especially for online encyclopaedias, performance becomes an issue. Adaptation must not consume too much time, as the system's responsiveness would otherwise deteriorate. Therefore, two approaches to increasing performance are suggested: offline adaptation and delayed adaptation.
With offline adaptation, the data that is required for performing the actual adaptation is extracted and collected when the system load is low (e.g., during the night). The data is stored in an internal cache in a format immediately suitable for the adaptation process. When a document is requested, data is fetched from the cache (rather than from the original documents) and employed for the adaptation.
Recent technologies such as AJAX make delayed adaptation possible (e.g., van Veen 2006). With this approach, initially a largely unadapted document is sent to the user. However, at the same time a separate process for the generation of the information required for adapting and complementing this document is forked. As soon as the information becomes available it is "sent" to the user and inserted in the document on-the-fly. The results generated in this process are stored in the internal cache described above.
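A minimal sketch of delayed adaptation combined with the internal cache described above; in a deployed system the enrichment would be pushed to the browser via AJAX rather than joined synchronously, and the cache would be persistent:

```python
# Sketch of delayed adaptation: the unadapted document is delivered
# immediately, while a background thread prepares the enrichment and
# stores it in the internal cache for later delivery and reuse.
import threading

CACHE = {}

def compute_enrichment(doc_id):
    # Placeholder for the expensive adaptation step.
    CACHE[doc_id] = f"related links for {doc_id}"

def serve(doc_id):
    if doc_id in CACHE:                      # cache hit: adapt at once
        return "document", CACHE[doc_id]
    worker = threading.Thread(target=compute_enrichment, args=(doc_id,))
    worker.start()
    return "document", worker                # enrichment follows later

body, worker = serve("biochemistry")
worker.join()                                # in practice: pushed via AJAX
```

A second request for the same document then finds the enrichment in the cache and can be adapted immediately, which is exactly the effect offline adaptation aims at.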
In an ongoing research project initiated in 2005 at the Institute for Information Systems and Computer Media (IICM) at Graz University of Technology, several aspects of collaboration and adaptation in electronic encyclopaedias are tested with a closed user group of more than 700 individuals. "Alexander" is based on the notion of combining a vast body of encyclopaedic knowledge with contemporary news articles and building an electronic community for developing and maintaining this knowledge base (Presse 2006b).
As a source for encyclopaedic entries, Alexander builds on the Brockhaus Multimedial, a prime German encyclopaedia with more than 185,000 textual entries and about 23,000 images (BMM 2005). These articles are complemented with news stories from the online version of "Die Presse", a major Austrian newspaper (Presse 2006a). Current news items are periodically added to Alexander's database and kept for reference. In addition to this, the members of the community can write new articles and add them to the knowledge base.
A pilot test of the system's core functionality started in September 2006 and is scheduled for at least three months. The test phase is designed to collect data on the acceptance of Alexander's key features and to gain experience in encyclopaedias with both collaborative and adaptive features. New features are gradually added to Alexander.
Users of the community are encouraged to create new content and discuss existing information. All content created by users can be set "community editable", resulting in wiki-like articles.
Unlike other collaborative electronic encyclopaedias such as Wikipedia, Alexander employs a hierarchy of expert users for verifying the quality of content generated by the community (cf., Kolbitsch and Maurer 2006a). While "plain" users can generate new content and ask questions (see below), "experts" are also allowed to post answers to questions and decide on the appropriateness of user authored information. In addition to this, "core experts" are also allowed to modify and delete content. Moreover, it is the core experts' task to identify new experts in various domains.
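The hierarchy of roles can be sketched as a simple rank-based permission check; the permission names are illustrative assumptions, not Alexander's actual interface:

```python
# Sketch of Alexander's user hierarchy: plain users < experts < core
# experts. Permission and role names are illustrative.
ROLE_RANK = {"plain": 0, "expert": 1, "core expert": 2}

PERMISSIONS = {
    "create_content": "plain",
    "ask_question": "plain",
    "answer_question": "expert",
    "approve_content": "expert",
    "modify_content": "core expert",
    "delete_content": "core expert",
    "appoint_expert": "core expert",
}

def may(role, action):
    """A role may perform an action if its rank is at least as high as
    the minimum rank required for that action."""
    return ROLE_RANK[role] >= ROLE_RANK[PERMISSIONS[action]]
```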
Content authored by expert users is specifically labelled in order to denote a potentially higher quality. However, all users in the environment can rate the content generated by other users and therefore high quality content can also be identified by a large number of positive ratings.
Retrieving information from both the encyclopaedic and news content is another aspect that differentiates Alexander from other encyclopaedias. On the one hand, users can utilise conventional queries to find articles in all parts of the knowledge base. On the other hand, users can submit questions to the system. These questions can either be general or refer to a particular article.
When a question is posted, the knowledge base and a database with previously answered questions are searched for semantically equivalent queries. If such questions can be found, they are presented to the user along with the corresponding answers. If similar questions have not been asked before the natural language search facility of the Brockhaus encyclopaedia is employed for finding appropriate articles. Questions that cannot be answered automatically are relayed to the community, and experts, for instance, have the ability to post new answers (new content) to the knowledge base.
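The cascade can be sketched as follows; plain token overlap stands in here for the semantic matching and the Brockhaus natural language search used in the actual system, and the threshold and data are illustrative:

```python
# Sketch of the question answering cascade. Token overlap (Jaccard
# similarity) is a stand-in for semantic matching; a return value of
# None means the question is relayed to the community of experts.
ANSWERED = {"how does the heart work": "See article 'Human heart'."}

def similarity(a, b):
    """Jaccard similarity of the two questions' token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def answer(question, threshold=0.6):
    best = max(ANSWERED, key=lambda q: similarity(q, question))
    if similarity(best, question) >= threshold:
        return ANSWERED[best]          # previously answered question
    return None                        # relay to the community
```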
The prototype implementation of Alexander builds on the Hyperwave Information Server (Hyperwave 2006) and utilises Hyperwave's link management. This not only ensures link integrity and prevents broken links but also facilitates structural adaptation.
Pages in Alexander are adapted to the role of the user and the current contents of the knowledge base. While plain users might only have access to a restricted view of articles, a more detailed listing is generated for expert users. Moreover, the main page of Alexander is adapted to present recently asked questions, modified or newly created articles, and information from the news.
The current prototype version of Alexander also realises information encountering (cf., section 3.7). When articles from the encyclopaedia, news stories, discussions, or user authored articles are displayed, they are complemented with similar or related articles from the knowledge base. Users can also have further articles and links to internal and external resources displayed that are provided by the community. For performance reasons, delayed adaptation is employed and this information is requested asynchronously (see section 4.4).
So far only a subset of the proposed adaptive functionality could be implemented. However, more adaptive features are planned for further releases.
Although electronic encyclopaedias are the prime application area for the proposed functionality, there are several other fields that can make use of this concept. In fact, most systems dealing with encyclopaedic knowledge, and digital libraries in general, can benefit from the adaptive features presented.
A database storing user manuals of technical devices, for example, usually offers the same information to a general audience, no matter what the readers' backgrounds and aims are. Senior citizens, however, might have to be addressed differently than electrical engineers. In such a scenario, adaptive features including explanation and linking might lead to a better comprehension of the content.
Domain specific databases that focus on a general audience might be confronted with similar problems. In medical databases, for instance, many different types of users want to look up diseases—not only physicians. Since articles in these archives are often intended for medical doctors they contain numerous technical terms, Latin words, and references to rather specific further articles. In this case, explanation, translation, the automatic insertion of links and redirection of existing links to "plain" resources can facilitate understanding the complex material.
In learning environments such as web-based training or learner support systems, adaptation can be employed to provide more detailed descriptions where needed. This is mainly achieved through explanation and linking. Moreover, with blacklisting and whitelisting it becomes possible to have unsuitable content and links to unaccredited resources removed from learning material (Lennon and Maurer 2003).
In this paper, we presented an approach to adaptation in electronic encyclopaedias that makes the implementation of techniques such as automatic explanation of terms, link alteration, and navigational support possible. With this concept, both the content and the structure of articles stored in an encyclopaedic "knowledge base" can be accommodated to the aptitudes of the individual users. Moreover, it is an attempt to provide advanced functionality in electronic encyclopaedias.
A prototype implementation including several features of the proposed concept for adaptation in electronic encyclopaedias is available—project "Alexander". It will show if the adaptive features actually work in encyclopaedias and whether they are accepted by users. Since quantitative feedback is not available yet, results from the pilot installation of Alexander will be published in an upcoming paper.
The first author would like to thank Steven Mitter for providing in-depth information on the architecture of the "Brockhaus Multimedial" software.
This paper was supported by the Styria Professorship for Revolutionary Media Technologies.
Alexander (2006), "aLEXander", http://www.iicm.edu/ask-lex/.
Annunziato, M., Lucchetti, M., and Pizzuti, S. (2002) "Adaptive Systems and Evolutionary Neural Networks: a Survey". In Proceedings of EUNITE02, Albufeira, Portugal, September 2002. See also http://home.tele2.it/stefanopizzuti/papers/HMAS2002p_13486.pdf.
Balkin, J. M., Noveck, B. S., and Roosevelt, K. (1999) "Filtering the Internet: A Best Practices Model". http://islandia.law.yale.edu/isp/backup%2010-2003/papers/Filters0208.pdf, Accessed September 19th, 2005.
BMM (2005) "Brockhaus Multimedial Premium", http://www.brockhaus-multimedial.de/.
Benyon, D., and Murray, D. (1993) "Developing Adaptive Systems to Fit Individual Aptitudes". In Proceedings of the 1st International Conference on Intelligent User Interfaces, Orlando, FL, U.S.A., pp. 115-121.
Britannica (2005) "Encyclopædia Britannica", http://www.britannica.co.uk/.
Bush, V. (1945) "As We May Think". Atlantic Monthly, July 1945, pp. 101-108. See also http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm, Accessed February 8th, 2005.
DeRoure, D., Hall, W., Reich, S., et al. (2001) "Memoir – An Open Distributed Framework for Enhanced Navigation of Distributed Information." Information Processing and Management, Volume 37, Issue 1, 2001, pp. 53-74. See also http://eprints.ecs.soton.ac.uk/4240/01/2001-4240.pdf.
DSP (2005) "Apple DVD Studio Pro", http://www.apple.com/finalcutstudio/dvdstudiopro/.
Erdelez, S. (1997) "Information encountering: a conceptual framework for accidental information discovery". In Proceedings of an International Conference on Information Seeking in Context (ISIC), Tampere, Finland, 1997, pp. 412-421.
Findlater, L., and McGrenere, J. (2004) "A Comparison of Static, Adaptive, and Adaptable Menus". In Proceedings of the 2004 SIGCHI Conference on Human Factors in Computing Systems (CHI 2004), Vienna, Austria, April 2004, pp. 89-96.
Fu, X., Shi, W., Akkerman, A., et al. (2001) "CANS: Composable, Adaptive Network Services Infrastructure". In Proceedings of the 3rd USENIX Symposium on Internet Technologies and Systems (USITS01), San Francisco, CA, U.S.A., March 2001. See also http://www.usenix.org/events/usits01/full_papers/fu/fu.pdf.
Hutchins, W. J. (2001) "Machine translation over fifty years". Histoire, Epistemologie, Langage, Volume 23, Issue 1, pp. 7-31. See also http://ourworld.compuserve.com/homepages/wjhutchins/HEL.pdf.
Hyperwave (2006) "Hyperwave Information Server", http://www.hyperwave.com/.
Ide, N., and Véronis, J. (1998) "Word Sense Disambiguation: The State of the Art." Computational Linguistics, Volume 24, Issue 1, March 1998, pp. 1-40. See also http://www.up.univ-mrs.fr/~veronis/pdf/1998wsd.pdf.
Kolbitsch, J., and Maurer, H. (2006a) "Community Building around Encyclopaedic Knowledge", Journal of Computing and Information Technology, Volume 14, Issue 3, September 2006, pp. 175-190. See also http://dx.doi.org/10.2498/cit.1000703.
Kolbitsch, J., and Maurer, H. (2006b) "Transclusions in an HTML-Based Environment", Journal of Computing and Information Technology, Volume 14, Issue 2, June 2006, pp. 161-173. See also http://dx.doi.org/10.2498/cit.2006.02.07.
Langley, P. (1997) "Machine Learning for Adaptive User Interfaces". In Proceedings of the 12th German Annual Conference on Artificial Intelligence, Freiburg, Germany, September 1997, pp. 53-62. See also http://lyonesse.stanford.edu/~langley/papers/adapt.ki97.ps.
Langley, P., and Fehling, M. (1998) "The Experimental Study of Adaptive User Interfaces". Technical Report, Institute for the Study of Learning and Expertise, March 1998, Palo Alto, CA, U.S.A. See also http://lyonesse.stanford.edu/~langley/papers/adapt.exper.ps.
Lennon, J. A., and Maurer, H. (2003) "Why it is Difficult to Introduce e-Learning into Schools and Some New Solutions". In Journal of Universal Computer Science, Volume 9, Issue 10, October 2003, pp. 1244-1257. See also http://www.jucs.org/jucs_9_10/why_it_is_difficult/.
Lum, W. Y., and Lau, F. C. M. (2002) "A Context-Aware Decision Engine for Content Adaptation." Pervasive Computing, Volume 1, Issue 3, July 2002, pp. 41-49. See also http://doi.ieeecomputersociety.org/10.1109/MPRV.2002.1037721.
Marshall, C. C., and Shipman, F. M. (1995) "Spatial Hypertext: Designing for Change." Communications of the ACM, Volume 38, Issue 8, August 1995, pp. 88-97. See also http://www.csdl.tamu.edu/~shipman/papers/cacm-1995.pdf.
Microsoft Corp. (2005) "Adaptive Systems and Interaction", http://research.microsoft.com/adapt/, Accessed August 7th, 2005.
Moore, A., Brailsford, T. J., and Stewart, C. D. (2001) "Personally tailored teaching in WHURLE using conditional transclusion". In Proceedings of the Twelfth ACM Conference on Hypertext and Hypermedia, Århus, Denmark, August 2001, pp. 163-164. See also http://whurle.sourceforge.net/whurle-ht01-short-paper-final.pdf.
Mödritscher, F., García-Barrios, V. M., and Gütl, C. (2004) "The Past, the Present and the Future of adaptive E-Learning. An Approach within the Scope of the Research Project AdeLE". In Proceedings of the International Conference on Interactive Computer Aided Learning (ICL2004), Villach, Austria, September 2004. See also http://www.iicm.edu/iicm_papers/icl2004/adaptive_e-learning/adaptiv_e-learning.pdf.
Narendra, K. S., and Parthasarathy, K. (1989) "Adaptive Identification and Control of Dynamical Systems using Neural Networks". In Proceedings of the 28th Conference on Decision and Control, Tampa, FL, U.S.A., December 1989, pp. 1737-1738.
Phan, T., Zorpas, G., and Bagrodia, R. (2002) "An Extensible and Scalable Content Adaptation Pipeline Architecture to Support Heterogeneous Clients". In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02), Vienna, Austria, April 2002, pp. 507-516. See also http://pcl.cs.ucla.edu/papers/files/icdcs2002-cap.pdf.
Pivec, M., and Baumann, K. (2003) "The Role of Adaptation and Personalisation in Classroom-Based Learning and in e-Learning". In Journal of Universal Computer Science, Volume 10, Issue 1, January 2004, pp. 73-89. See also http://www.jucs.org/jucs_10_1/the_role_of_adaptation/.
Die Presse (2006a) "Die Presse", http://diepresse.at/.
Die Presse (2006b) "'Alexander: Wikipedia-Alternative aus Graz", Die Presse, September 6th, 2006, http://diepresse.at/Artikel.aspx?channel=h&ressort=ho&id=583147&archiv=false, Accessed October 27th, 2006.
Schlimmer, J. C., and Hermens, L. A. (1993) "Software Agents: Completing Patterns and Constructing User Interfaces". In Journal of Artificial Intelligence Research, Volume 1, November 1993, pp. 61-89. See also http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume1/schlimmer93a.pdf.
Thau, R. S. (2003) "Apache API notes", http://modules.apache.org/doc/API.html, Accessed May 21st, 2005.
van Veen, T. (2006) "Serving Services in Web 2.0", Ariadne, Issue 47, April 2006. See http://www.ariadne.ac.uk/issue47/vanveen/.
Weber, G., and Specht, M. (1997) "User Modeling and Adaptive Navigation Support in WWW-Based Tutoring Systems". In Proceedings of the Sixth International Conference on User Modeling (UM97), Chia Laguna, Italy, June 1997, pp. 289-300. See also http://www.cs.uni-sb.de/UM97/ps/WeberG.ps.
Wikipedia (2005) "Wikipedia", http://www.wikipedia.org/.
Xiao, J., Stasko, J., and Catrambone, R. (2004) "An Empirical Study of the Effect of Agent Competence on User Performance and Perception". In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'04), New York, NY, U.S.A., July 2004, pp. 178-185. See also http://www.aamas2004.org/proceedings/024_xiaoj_competence.pdf.
Xipolis (2005) "Xipolis", http://www.xipolis.net/.
Yarowsky, D. (1992) "Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora". In Proceedings of the 14th Conference on Computational Linguistics (COLING-92), Nantes, France, August 1992, pp. 454-460.
Zakon, R. (2005) "Hobbes' Internet Timeline v8.1", http://www.zakon.org/robert/internet/timeline/#Growth, Accessed September 19th, 2005.
Josef Kolbitsch is a PhD student at Graz University of Technology. His research interests include electronic encyclopaedias, digital libraries and hypermedia. – www.kolbitsch.org
Christian Safran is a PhD student at Graz University of Technology. In his research, he focuses on adaptive e-learning environments and revolutionary media technologies. He is also tightly involved in project "Alexander". – www.essayen.net
Hermann Maurer is professor and dean of the faculty of computer science at Graz University of Technology. He is author of some twenty books and more than 600 contributions in various publications. Recently, he has also published "XPERTS", a series of science fiction novels. – www.iicm.edu/hmaurer