Searching and Browsing in a Digital Library of Historical Maps and Newspapers


Steve Jones, Matt Jones, Malcolm Barr and Te Taka Keegan
Department of Computer Science
University of Waikato
Private Bag 3105, Hamilton, New Zealand
Tel: +64 7 838 4021
Email: { stevej, mattj, tetaka }@cs.waikato.ac.nz

Abstract

Digital libraries can empower end users through on-line provision of previously inaccessible materials, synergistic integration of related information collections, and tailoring of access mechanisms for target user groups. In this paper we describe the HistoryMap system that supports access to digitised collections of historical maps and newspapers, integrating searching and browsing between the two. We report on our solution to providing place name searching across maps that vary in accuracy, scale and orientation, and how newspaper text is dynamically reconfigured to include hyperlinks to maps containing given locations. Both the user interface and software architecture of the system are described, as are a usability study of the system and discussion sessions with target end users. Although some surface level usability problems were revealed by the study, target users of the system are enthusiastic about its potential.

Keywords

digital libraries, historic archives, cross-collection access

1. Introduction

A defining attribute of digital library systems is that information collections are catalogued, organised and presented in ways that match not only the characteristics of the information, but also those of the intended users. They differ substantially in these respects from other information retrieval systems such as Internet search engines. However, although the creation of focussed collections of documents (which may be text, images, audio, video and other media) has numerous advantages, it can also have the drawback that potential synergies resulting from the integration of material from multiple collections are often overlooked. The New Zealand Digital Library (NZDL) project (www.nzdl.org) has built numerous standalone collections using its open source Greenstone software (Witten et al. 2000). There are collections focussing on topics such as disaster relief management, computer science research, oral history, music videos, indigenous peoples, aircraft photographs and first aid. Although many of these collections combine materials from multiple sources, the end result is a set of monolithic collections. In many circumstances this is advantageous. For example, many of the NZDL's humanitarian collections, such as the Humanity Development Library, are distributed on self-contained CDs to developing countries, and it is more important to maximise the resources available in a single collection than to provide multiple collections.

Greenstone and other digital library technologies support distributed networked access to collections (Bainbridge et al. 2002), and it is entirely possible that inter-collection relationships will exist across different libraries and information providers. Therefore, the most obvious solution of building relationships between collections when they are created will be, in many cases, impractical, because providers do not have control over all existing related collections, and are also unable to predict which related collections will be created in the future. We believe that a better solution is to establish relationships dynamically, at the time of user access. As a result, the linkages can be up to date, and the establishment of relationships may be specified or changed by the user.

Although potentially useful, simultaneous or parallel searching and browsing of multiple collections presents a number of information retrieval and usability issues. Our focus is on the user's experience of such activities. In particular, we are interested in rendering relationships between holdings in collections explicit when the collections differ in the media of the documents within them. The challenge here is to maintain the organisational, presentational and access characteristics that are tailored to each collection, but to also extend them to reveal related information in other collections.

One goal of the NZDL Project is to empower end-users by providing the freely-distributed Greenstone software, and targeted information collections (Witten et al. 2001). For example, the Africa Collection for Transition contains 300 publications on emergency and disaster management in Sub-Saharan Africa; the Virtual Disaster Library, developed in conjunction with the World Health Organisation, contains 25,000 pages of information on disaster management and reduction; and the First Aid in Pictures collection illustrates basic first aid techniques by images only.

In the work that we present here we have worked with two targeted collections. The first, Niupepa (Apperley et al. 2001), is a collection of historical newspapers published in New Zealand in the Maori and English languages. Maori are the indigenous people of New Zealand, and key aspects of Maori culture are ancestry and tribal association with land. The newspapers report on a critical period in the colonisation of New Zealand by European settlers, including the land wars that took place over a 30-year period. However, although a major historical and linguistic resource, the newspapers contain very few maps to provide a context for the very large number of geographic references.

Therefore, we have developed a collection of historical maps of New Zealand, ranging from Captain Cook's map of 1770 to a present day topographical map, for use in conjunction with the Niupepa collection. This paper describes an interactive system that allows users to interact with each collection independently, but also to choose that searching and browsing activities in one collection drive corresponding functions in the other, to automatically reveal related items of interest. This automation goes beyond simple automated searches, and dynamically integrates resources from one collection into the other.

In the next section we describe the Niupepa collection, and the significance of land to the Maori people. We then describe the HistoryMap system that integrates use of the Niupepa and map collections, illustrating how it is used, and explaining the underlying implementation. We then discuss how the system relates to other work in the area, and then present the results of an observational study of the system in use.

2. The Niupepa collection

The Niupepa collection (www.nzdl.org/niupepa) is both an historical resource that reports on events in the formative years of colonial New Zealand, and a linguistic resource that uniquely records the language of Maori (the indigenous people of New Zealand) and its evolution in a written form. Most publications are entirely in the Maori language, some entirely in English, and some a mixture of the two. It is a collection of forty-two newspaper titles that were published in New Zealand between 1842 and 1933, and contains 1750 issues. There are 18,000 individual newsprint pages in total.

The collection was originally gathered from dispersed locations in New Zealand and recorded on microfiche, which was then held in the National Library. Although a step forward, this method of storage had two key drawbacks: it was not easily accessible and it could not be searched. The NZDL group began a project to digitise the newspapers, provide a full-text searchable index for them and make them freely available over the WWW.

All 18,000 individual pages were scanned from 35mm photographic negatives of the newspapers. As we might expect with documents of such an age, the image quality varies considerably, both in the staining and marking on the paper and in the typesetting (some pages, for example, have extremely high information density). In order to provide a full-text index to the collection, the text of each page was extracted using Optical Character Recognition (OCR) software with an integrated Maori dictionary. The extracted text of each page was checked manually against the original document by fluent Maori speakers, who corrected any OCR errors. An index is built using the text from each page. Hence, a search term entered by a user is compared against the index, which then provides a list of pages containing the search term. These are then displayed to the user in a standard textual result list. A number of attributes are stored for each page: the extracted text, an image of the original page, commentaries, bibliographic information, and where available, English language abstracts.
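
The index lookup described above can be illustrated with a toy inverted index. This is a minimal sketch and not the Greenstone indexing code: the class name PageIndex, its methods and the naive tokenisation are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index (hypothetical): maps each word to the ids of pages containing it. */
class PageIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Adds the (already corrected) text of one page to the index. */
    void addPage(int pageId, String text) {
        // Split on anything that is not a Unicode letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(pageId);
            }
        }
    }

    /** Returns the ids of pages containing the term, in ascending order. */
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null
                ? Collections.<Integer>emptySortedSet()
                : Collections.unmodifiableSortedSet(hits);
    }
}
```

A query term is lower-cased and looked up directly; the page ids returned would then be turned into the textual result list shown to the user.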

A user can carry out a full-text search of the collection, and by default is provided with the extracted text of any document that they choose to view. They can additionally choose to view the original page image (an example is shown in Figure 1). The newspapers can also be browsed by series, issue or date. The user interface to the collection can be presented in either the Maori or English languages.


Figure 1: The Niupepa collection. (a) extracted text of a search result document; (b) image of result document in published form.

3. The significance of land to Maori

To Maori people land is everything. They have a deep spiritual relationship with land stemming from the traditional concept of the basic origin of mankind. As direct descendants of the skyfather Ranginui and the earthmother Papatuanuku, they hold the land in the highest of regards. The land is acknowledged in ceremonies from birth to death; it is not merely a part of life, but life itself. It is a source of spiritual, emotional, cultural and economic sustenance. It provides a basis for identity and a place of belonging.

The relationship between Maori and land is reflected and reinforced in the social organisation. Tribes are located in geographically distinct regions usually based around the landing and early movement of the original voyaging canoes. Although some boundaries have undergone small shifts the general location has remained stable over a remarkably long period of time. Consequently the connection to the land is not only a connection to a specific place and area but it is also a connection to a tribe, to a sub-tribe and to a family. It is a feeling of being home, of being tangata whenua (of that land). And in many cases it is something that has been passed down for up to 30 generations.

The close relationship with the land is illustrated in the many Maori place names that have survived to serve as a record of the passage of generations. The names invoke tribal accounts of mythology and legend, historic scenes of war and of peace, remembrances of achievement and of failure. The names also serve to distinguish places of particular tribal importance, for example a historic pa site (fortified village) or an ancestral urupa (burial location). Consequently, to reside in a tribal area or return to a historic place, or even to speak a Maori place name, is an act of paying homage to tribal mana and prestige.

4. The historic map collection

The map collection is sourced from the Map Library of the University of Waikato. The Map Library holds relatively recent topographical maps of New Zealand, but also holds a large number of original and reproduction maps dating back to 1770. A number of the maps had previously been digitised, and we built our initial collection from these, focussing on the North Island of New Zealand, and the local Waikato region in particular.

Each map image was processed to automatically produce a map thumbnail and a set of images at different levels of scaling, to support zooming in/out of maps by the user. One approach to identifying locations present on a map would be to attempt OCR. However, as can be seen later in Figure 3, the maps are hand-drawn and annotated, annotations are rotated from the horizontal by varying degrees or follow curved paths, and there are many underlying marks that would prevent accurate text recognition.

Therefore, locations on a map are identified by reference to a separate database that holds the latitude and longitude of tens of thousands of locations in New Zealand. To associate a map with a set of locations present within it, three values are established by hand for each map. The first two estimate the latitude/longitude coordinates of the top left and bottom right extremes of the area covered by the map. The third value estimates the degree of rotation of the map from a true north-south orientation. In most cases this is a relatively small value, but some maps are highly rotated. For example, one map places south at the top and north at the bottom (which incidentally looks completely correct from a Maori perspective).
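
One way the three hand-estimated values could be used to decide whether a location falls on a map, and where to draw its marker, is sketched below. This is an illustrative sketch only, under stated assumptions: the class MapExtent and its methods are hypothetical, the rotation is taken to be the clockwise angle of the map's "up" direction from true north, and degrees of latitude and longitude are treated as planar units (a simplification that is tolerable for estimates made by hand).

```java
/** Geographic extent of a scanned map (hypothetical class, planar approximation). */
class MapExtent {
    final double topLeftLat, topLeftLon, bottomRightLat, bottomRightLon;
    /** Assumption: clockwise angle, in degrees, of the map's "up" direction from true north. */
    final double rotationDegrees;

    MapExtent(double topLeftLat, double topLeftLon,
              double bottomRightLat, double bottomRightLon, double rotationDegrees) {
        this.topLeftLat = topLeftLat;
        this.topLeftLon = topLeftLon;
        this.bottomRightLat = bottomRightLat;
        this.bottomRightLon = bottomRightLon;
        this.rotationDegrees = rotationDegrees;
    }

    /** Offset of a location from the map centre, expressed in the map's own (right, up) axes. */
    private double[] toMapFrame(double lat, double lon) {
        double east = lon - (topLeftLon + bottomRightLon) / 2.0;
        double north = lat - (topLeftLat + bottomRightLat) / 2.0;
        double theta = Math.toRadians(rotationDegrees);
        double right = east * Math.cos(theta) - north * Math.sin(theta);
        double up = east * Math.sin(theta) + north * Math.cos(theta);
        return new double[] { right, up };
    }

    /** True if the location lies within the rectangle covered by the map. */
    boolean contains(double lat, double lon) {
        double[] corner = toMapFrame(bottomRightLat, bottomRightLon);
        double[] p = toMapFrame(lat, lon);
        return Math.abs(p[0]) <= Math.abs(corner[0]) && Math.abs(p[1]) <= Math.abs(corner[1]);
    }

    /** Pixel position (x to the right, y downwards) of a location on an image of the given size. */
    int[] toPixel(double lat, double lon, int imageWidth, int imageHeight) {
        double[] corner = toMapFrame(bottomRightLat, bottomRightLon);
        double halfWidth = Math.abs(corner[0]);
        double halfHeight = Math.abs(corner[1]);
        double[] p = toMapFrame(lat, lon);
        int x = (int) Math.round((p[0] + halfWidth) / (2 * halfWidth) * imageWidth);
        int y = (int) Math.round((halfHeight - p[1]) / (2 * halfHeight) * imageHeight);
        return new int[] { x, y };
    }
}
```

With a rotation of 180 degrees the same arithmetic handles a map drawn with south at the top: a place to the geographic north-west of the map centre lands in the lower right of the image.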

5. The HistoryMap system

There are two components to the HistoryMap user interface. The first facilitates searching and browsing of a set of digitized maps, and the second emulates an HTML browser providing searching and browsing access to the Niupepa collection via the standard NZDL-provided interface.

Although during conventional operation the two components are integrated to provide cross-collection multimedia searching, a user may choose to use either independently. In this case the user may prefer to access Niupepa documents using a standard web browser. If a user wishes to solely interact with the map collection they can use the map component only.

5.1 Accessing maps

The map component supports users in either searching or browsing activities. If the user elects to browse rather than undertake a direct search they are presented with a high-level overview map, on which they can click to indicate a particular location of interest. The display is then updated to show a more detailed map of the selected geographic area. The user is additionally supplied with links to other maps of that area, arranged in chronological order. Each link has an associated map thumbnail and the year in which the map was created. When a link is selected the corresponding map is displayed, giving an alternative historical view of the area under consideration.

When large maps have been segmented into smaller regions, users are presented with navigational arrows that allow them to browse adjacent regions to the north, south, east or west. Such navigation could lead to confusion about the user's context within the wider area supported by the high-level map. Therefore each map segment has an associated high-level overview in which the current location is marked.

Other maps that have not been segmented have been rendered at different levels of detail. To begin, the user is shown the full map within the available window space. The zoom-in tool provides access to a higher resolution image (Figure 2), and can be repeatedly used until the most detailed rendering of the map is displayed (at which point the tool becomes inactive). The zoom-out tool has the opposite effect.


Figure 2: (a) overview of selected result map; (b) zoomed in version of result map. Query location is marked by a red circle on both maps.

Users can also carry out direct searches for places of interest by entering place name text into the query box and selecting the search operation. HistoryMap responds with a result list in the form of chronologically ordered map thumbnails (Figure 3). Each thumbnail represents a map covering the geographic area within which the search location is found. The result list may be segmented by location for two reasons. First, there may be multiple locations with the same place name, and so the result list is segmented by each instance of the location, and each segment is labelled with a description of its broad regional location. Second, the query may contain multiple locations, and the result list is segmented by each distinct location (as in Figure 3).


Figure 3: chronologically ordered set of maps resulting from a search for "hamilton rotorua".

The place name database that we use contains many more locations than are labelled on the maps in the collection. Although it may be feasible to gather contemporary maps that contain all names in the database, this is certainly not the case for historical maps. The older that these maps are, the more sparse the place name information they contain, and the more likely that historical names have been superseded since the map was created.

Consequently, place name searches within the map collection are driven by location (defined by latitude and longitude) rather than the occurrence of names within maps. As a result, users can identify the location of contemporary places on historical maps on which they are not explicitly marked. Places searched for by users are marked symbolically on any map by a small icon.
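
The location-driven search and the segmented, chronologically ordered result list can be sketched as follows. This is a simplified illustration rather than the HistoryMap code: the classes Place, HistoricMap and LocationSearch are hypothetical, maps are assumed to be north-up bounding boxes (rotation is ignored), and the sample coordinates in the usage note are approximate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One entry of the place name database (only the fields needed here). */
class Place {
    final String name;
    final String district;
    final double lat;
    final double lon;

    Place(String name, String district, double lat, double lon) {
        this.name = name;
        this.district = district;
        this.lat = lat;
        this.lon = lon;
    }
}

/** A map described by a north-up bounding box and its year of creation. */
class HistoricMap {
    final String imageFile;
    final int year;
    final double northLat, westLon, southLat, eastLon;

    HistoricMap(String imageFile, int year,
                double northLat, double westLon, double southLat, double eastLon) {
        this.imageFile = imageFile;
        this.year = year;
        this.northLat = northLat;
        this.westLon = westLon;
        this.southLat = southLat;
        this.eastLon = eastLon;
    }

    /** A map matches because it covers the place, not because the name is printed on it. */
    boolean covers(Place p) {
        return p.lat <= northLat && p.lat >= southLat && p.lon >= westLon && p.lon <= eastLon;
    }
}

class LocationSearch {
    /** Returns one result segment per place with the queried name, each ordered by year. */
    static Map<String, List<HistoricMap>> search(String query, List<Place> gazetteer,
                                                 List<HistoricMap> maps) {
        Map<String, List<HistoricMap>> segments = new LinkedHashMap<>();
        for (Place place : gazetteer) {
            if (!place.name.equalsIgnoreCase(query)) {
                continue;
            }
            List<HistoricMap> hits = new ArrayList<>();
            for (HistoricMap map : maps) {
                if (map.covers(place)) {
                    hits.add(map);
                }
            }
            hits.sort(Comparator.comparingInt(m -> m.year));
            // Segment label: place name plus its broad regional location.
            segments.put(place.name + " (" + place.district + ")", hits);
        }
        return segments;
    }
}
```

Calling LocationSearch.search("hamilton", gazetteer, maps) with two places named Hamilton yields two labelled segments, so a 1770 map can be returned for a town that did not exist when the map was drawn.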

5.2 Accessing historical newspapers

HistoryMap emulates the standard HTML web interface to the Niupepa collection. Users can search and browse the newspaper collection by the methods described earlier, including searching, series browsing and date browsing.

5.3 Integrating maps and newspapers

The primary aim of HistoryMap is to facilitate integrated access to historical map and newspaper resources, and this is achieved in a number of ways. Whenever a user enters an explicit location search in the map component, the same query is issued to the NZDL Niupepa collection over the network (or on the same machine if it is installed locally). Therefore the newspaper component display is updated with the results of a full-text search for the location specified by the user, giving side-by-side access to related maps and newspapers.

This integrated search also works in the opposite direction. When search terms are entered in the newspaper component, the map collection is searched in parallel, resulting in the same side-by-side access. Further integration is achieved by manipulating the content of the newspaper documents. When the user selects a newspaper item for viewing, the document text is processed in order to identify occurrences of place names that appear in the place name database. A small map-link icon is automatically inserted after each place name, and when selected activates a search for the corresponding place name in the map component (the newspaper display remains unchanged). Rollover text gives a query preview for each map-link: the matching places and their general locale are listed (Figure 4). Map-links for place names that were within the query for which the current document was returned are emphasised for ease of identification.
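
The two-way coupling of searches can be pictured as a small mediator that forwards each query to the other component. The sketch below is an assumption about one possible structure, not a description of the actual implementation; the names SearchComponent and SearchCoupler are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Anything that can display results for a query, e.g. the map or newspaper component. */
interface SearchComponent {
    void showResultsFor(String query);
}

/** Hypothetical mediator that keeps the components' searches in step. */
class SearchCoupler {
    private final List<SearchComponent> components = new ArrayList<>();
    private boolean coupled = true;

    void register(SearchComponent component) {
        components.add(component);
    }

    /** When false, each component is searched independently of the others. */
    void setCoupled(boolean coupled) {
        this.coupled = coupled;
    }

    /** Called when the user issues a query in one component. */
    void userSearched(SearchComponent source, String query) {
        source.showResultsFor(query);
        if (!coupled) {
            return;
        }
        // Issue the same query to every other registered component.
        for (SearchComponent other : components) {
            if (other != source) {
                other.showResultsFor(query);
            }
        }
    }
}
```

Because forwarding goes through the mediator rather than from component to component, a forwarded query cannot bounce back to its source.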


Figure 4: Links are automatically inserted for placenames, with roll-over text (left); result page after link selection.

When investigating events in a particular geographic region it is difficult to determine exactly which places will have been reported in the newspapers as associated with those events. The map component therefore provides an area search facility. The user can drag a selection box of arbitrary size over a map and is then provided with a dialogue which lists all places located within the region that also occur in the place name database (Figure 5). Except for very detailed maps this list contains many more places than are shown on the map. The user may then select items from the list, which are then issued as queries to the newspaper collection.


Figure 5: area-based query and results.

6. HistoryMap system architecture

The HistoryMap system is written as a Java 1.4.1 application, using the Swing user interface toolkit. It essentially comprises two components (one handles map searching and the other newspaper searching) with appropriate communication between the two (Figure 6).


Figure 6: HistoryMap system architecture.

6.1 Data sources

The contemporary New Zealand place name database that we use is freely available from Land Information New Zealand (http://www.linz.govt.nz/). It contains information on 57,000 places of a broad range of types, including inhabited areas and areas of geological, geographic or historical interest. The coverage of locations in New Zealand is wide-ranging and detailed but not exhaustive. In addition, this contemporary resource does not track place name changes over the last 200 years, nor does it indicate alternative location names, which is a particular issue in respect of names in the Maori and English languages. Therefore we have exploited a small (66 items) additional list of Maori-English place name equivalents. Each location in the database has the following attributes:

  • location id
  • name of the location
  • easting
  • northing
  • type of location
  • district
  • latitude
  • longitude

A detailed contemporary map of New Zealand has also been sourced in digital form from LINZ. This map has a scale of 1:1,000,000, and in addition to using thumbnails of the entire map, we have segmented it into numerous smaller map squares for presentation in the system. Historical maps have been sourced from the Map Library of the University of Waikato Central Library. Maps used in the system range from Captain Cook's map of New Zealand from 1770, through the 19th century to 1953. As we might expect, these maps vary greatly in their accuracy and detail, and this is one of the very attributes that render them interesting to those that may access them on-line.

Each map has the following attributes:

  • image file name
  • latitude and longitude of the top left corner
  • latitude and longitude of the bottom right corner
  • map orientation (to allow deviation from the standard of north at the top of the map)
  • detail level
  • year of creation

6.2 Data storage

The most basic format of both place name and map data is ASCII text files. The place file contains the attributes of one location on each line, and the map file has the attributes of one map on each line. Whenever the data is amended (locations or maps are added, removed or edited), the two text files are pre-processed, to store them on disk as Java objects representing the run-time data structures that support searching and other types of access.
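
The pre-processing step can be illustrated with a short sketch that parses one line of the place file and stores the resulting objects using standard Java serialisation. The field order follows the attribute list in Section 6.1, but the tab delimiter, the class names PlaceRecord and PlaceStore, and the sample values in the usage note are assumptions; the actual file layout is not specified here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** One location, with the eight attributes listed in Section 6.1. */
class PlaceRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    final int id;
    final String name;
    final double easting;
    final double northing;
    final String type;
    final String district;
    final double latitude;
    final double longitude;

    PlaceRecord(int id, String name, double easting, double northing,
                String type, String district, double latitude, double longitude) {
        this.id = id;
        this.name = name;
        this.easting = easting;
        this.northing = northing;
        this.type = type;
        this.district = district;
        this.latitude = latitude;
        this.longitude = longitude;
    }

    /** Parses one line of the place file; a tab-separated layout is assumed. */
    static PlaceRecord parse(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 8) {
            throw new IllegalArgumentException("expected 8 fields, got " + f.length);
        }
        return new PlaceRecord(Integer.parseInt(f[0].trim()), f[1].trim(),
                Double.parseDouble(f[2].trim()), Double.parseDouble(f[3].trim()),
                f[4].trim(), f[5].trim(),
                Double.parseDouble(f[6].trim()), Double.parseDouble(f[7].trim()));
    }
}

/** Writes and reads the parsed records as serialised Java objects. */
class PlaceStore {
    static void save(List<PlaceRecord> places, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(new ArrayList<>(places));
        }
    }

    @SuppressWarnings("unchecked")
    static List<PlaceRecord> load(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (List<PlaceRecord>) ois.readObject();
        }
    }
}
```

At start-up the application would only need to call load, avoiding a re-parse of the text files on every run.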

Two run-time data structures are used. The first is a hash table of locations, where location names serve as the keys. The data associated with each key is an array of strings, where each array item represents one location with the key as the place name. Each string contains the multiple location attributes listed above. 13% of the location database entries occur more than once, and the hash table therefore contains just over 40,000 location names.
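
A minimal sketch of the first structure is given below: a hash table keyed by place name whose values hold one attribute string per location sharing that name. The class name PlaceNameTable is hypothetical, and the case-insensitive key normalisation is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hash table from place name to the attribute strings of every location with that name. */
class PlaceNameTable {
    private final Map<String, List<String>> byName = new HashMap<>();

    /** Registers one location; several locations may share the same name. */
    void add(String name, String attributeLine) {
        byName.computeIfAbsent(key(name), k -> new ArrayList<>()).add(attributeLine);
    }

    /** Returns the attribute strings of all locations with this name (possibly none). */
    List<String> lookup(String name) {
        List<String> found = byName.get(key(name));
        return found == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(found);
    }

    /** Number of distinct names, which is smaller than the number of locations. */
    int distinctNames() {
        return byName.size();
    }

    private static String key(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```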

The second data structure supports searching in two dimensions: the latitude and longitude of a location. To do so we have implemented a k-d tree where k=2, and therefore each node has zero, one or two children. The search key (or discriminator) for this structure is dependent upon the level of the tree under consideration. In this case the discriminator will be either longitude or latitude, which we label 0 and 1 respectively. In the general case, the discriminator at level n is n mod k. For our case the discriminator is either 0 or 1, being 0 at the root node, 1 at level 1, 0 at level 2 and so on. Data comparisons are exactly as with a binary tree. Left- or right-branching is determined by comparison between the current tree node's discriminator and the corresponding value of the node of interest. Left branches are followed when the node of interest's value is less than the tree node value, and right branches otherwise. Our 2-d tree of locations was constructed pseudo-randomly by adding items in alphabetical order. Range searching, or finding locations within an area that is described by the latitude and longitude of its top-left and bottom-right corners, requires about R + log N comparisons, where R is the number of points to find and N is the number of points in the tree. A search for an individual point takes 19 comparisons, on average. Figure 7 shows the first three levels of the tree, with the discriminators emphasised at each level.
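
The insertion and range search rules just described can be sketched as follows. This is an illustrative 2-d tree written for this explanation, not the HistoryMap source; the class name TwoDTree is hypothetical, the search area is passed as minimum and maximum longitude and latitude rather than as corner points, and the coordinates in the usage note are approximate.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal 2-d tree of named points; discriminator 0 is longitude, 1 is latitude. */
class TwoDTree {
    private static final class Node {
        final String name;
        final double[] coords; // coords[0] = longitude, coords[1] = latitude
        Node left;
        Node right;

        Node(String name, double lon, double lat) {
            this.name = name;
            this.coords = new double[] { lon, lat };
        }
    }

    private Node root;

    /** Inserts a point, branching left when its value is less than the node's. */
    void insert(String name, double lon, double lat) {
        Node fresh = new Node(name, lon, lat);
        if (root == null) {
            root = fresh;
            return;
        }
        Node current = root;
        int level = 0;
        while (true) {
            int d = level % 2; // discriminator for this level
            if (fresh.coords[d] < current.coords[d]) {
                if (current.left == null) { current.left = fresh; return; }
                current = current.left;
            } else {
                if (current.right == null) { current.right = fresh; return; }
                current = current.right;
            }
            level++;
        }
    }

    /** Returns the names of all points inside the given rectangle (bounds inclusive). */
    List<String> range(double minLon, double minLat, double maxLon, double maxLat) {
        List<String> found = new ArrayList<>();
        collect(root, 0, new double[] { minLon, minLat }, new double[] { maxLon, maxLat }, found);
        return found;
    }

    private void collect(Node node, int level, double[] min, double[] max, List<String> found) {
        if (node == null) {
            return;
        }
        if (node.coords[0] >= min[0] && node.coords[0] <= max[0]
                && node.coords[1] >= min[1] && node.coords[1] <= max[1]) {
            found.add(node.name);
        }
        int d = level % 2;
        // The left subtree holds values strictly less than this node's value.
        if (min[d] < node.coords[d]) {
            collect(node.left, level + 1, min, max, found);
        }
        // The right subtree holds values greater than or equal to this node's value.
        if (max[d] >= node.coords[d]) {
            collect(node.right, level + 1, min, max, found);
        }
    }
}
```

For example, after inserting Auckland (174.76, -36.85), Hamilton (175.28, -37.78), Kerepehi (175.54, -37.30), Rotorua (176.25, -38.14) and Wellington (174.78, -41.29), a call to range(175.0, -38.5, 176.5, -37.0) visits only the subtrees that can intersect the rectangle and reports Hamilton, Rotorua and Kerepehi. The same call would serve an area search driven by a selection box dragged over a map.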


Figure 7: Extract of the location data structure.

6.3 Dynamic link insertion

Links to related maps are inserted into newspaper documents dynamically, after the document text has been retrieved from the NZDL, and prior to its display in HistoryMap. The document text, which is in HTML format, is parsed to identify any text segments that match locations in the place name database. Where matches are found, the HTML content is modified to insert links (and their icons) to relevant holdings in the map collection. The user's search terms are further highlighted in the text.
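
A simplified version of this processing step is sketched below. It is not the HistoryMap parser: the class MapLinkInserter, the maplink: link scheme and the icon file name are invented for illustration, tags are recognised with a simple regular expression rather than a full HTML parser, and escaping of the inserted attribute values is omitted.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Inserts a map-link icon after each known place name found in the text of an HTML page. */
class MapLinkInserter {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private final Pattern placePattern;

    MapLinkInserter(Collection<String> placeNames) {
        List<String> names = new ArrayList<>(placeNames);
        // Longest names first, so a long name wins over a shorter name it starts with.
        names.sort(Comparator.comparingInt(String::length).reversed());
        StringBuilder alternation = new StringBuilder();
        for (String name : names) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(name));
        }
        // "(?!)" never matches, which is what we want when no names are supplied.
        String body = names.isEmpty() ? "(?!)" : "(?:" + alternation + ")";
        placePattern = Pattern.compile(
                "(?<![\\p{L}\\p{N}])" + body + "(?![\\p{L}\\p{N}])",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns the page with links added; markup inside tags is left untouched. */
    String insertLinks(String html) {
        StringBuilder out = new StringBuilder();
        Matcher tags = TAG.matcher(html);
        int textStart = 0;
        while (tags.find()) {
            out.append(linkText(html.substring(textStart, tags.start())));
            out.append(tags.group());
            textStart = tags.end();
        }
        out.append(linkText(html.substring(textStart)));
        return out.toString();
    }

    private String linkText(String text) {
        Matcher m = placePattern.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group();
            String link = name + "<a href=\"maplink:" + name
                    + "\"><img src=\"maplink.gif\" alt=\"map\"></a>";
            m.appendReplacement(sb, Matcher.quoteReplacement(link));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Because the links are computed from the retrieved text each time, swapping in a different collection of place names changes the links without touching the stored documents.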

This on-the-fly modification of the web page markup prior to rendering could potentially impact upon the user's experience in terms of system responsiveness. However, the current size of the database and documents that we are dealing with results in no perceptible delay, even when the system is running on a 500 MHz laptop computer.

An alternative approach would be to embed these links in the documents via an off-line batch process. However, the dynamic approach has a number of strong benefits over this alternative. The source documents do not need to be amended for use in conjunction with the map system, which might render them unsuitable for presentation in other contexts. Also, there is no need to review the accuracy of the links, and possibly amend a large number of documents, whenever the place name database is revised. In fact, completely different alternative databases may take the place of the current one.

Dynamic insertion provides the user with much greater control and flexibility with respect to such annotations. A simple example is that HistoryMap allows users to turn link insertion on and off. Further options that could be easily provided may insert links:

  • for query term items only
  • for locations of a particular set of user-selected types (such as mountains or railway stations)
  • that indicate the number of matching maps
  • that indicate the years of matching maps
  • derived from a particular place name database
  • derived by integrating multiple databases
  • and so on
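
Some of the options above could be expressed as a small settings object consulted before each link is inserted. The sketch below is purely hypothetical: the class LinkOptions and its fields are invented for illustration, and only the on/off switch is described as an existing HistoryMap feature.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical user settings that decide whether a place name receives a map-link. */
class LinkOptions {
    boolean linksEnabled = true;
    boolean queryTermsOnly = false;
    /** Location types to link, e.g. mountains; an empty set means any type. */
    Set<String> allowedTypes = Collections.emptySet();

    boolean shouldLink(String placeName, String placeType, Set<String> queryTerms) {
        if (!linksEnabled) {
            return false;
        }
        if (queryTermsOnly && !containsIgnoreCase(queryTerms, placeName)) {
            return false;
        }
        return allowedTypes.isEmpty() || containsIgnoreCase(allowedTypes, placeType);
    }

    private static boolean containsIgnoreCase(Set<String> values, String wanted) {
        for (String value : values) {
            if (value.equalsIgnoreCase(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```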

7. Related work

There have been a number of other geospatial digital-library system investigations; indeed, a recent ACM/IEEE Digital Library conference had a session on map collections, and the May 2004 issue of D-LIB Magazine (www.dlib.org/dlib/may04/05contents.html) focusses on georeferencing and geospatial data in the digital library context. Much of the emphasis has been on the technical architectures needed for accessing diverse ranges of resources such as maps, satellite images and models. Take, for instance, the extensive ADEPT framework (Janée and Frew 2002) developed by the Alexandria Digital Library project. Challenging and vital issues including meta-data definition and development of distributed query and storage schemes are being addressed. With HistoryMap, we have had to face some of these aspects with the added difficulties presented by old maps and documents. Location names change over time, detailed coordinate information is not available in many cases and the mapping accuracies of older maps can be variable.

Other projects have looked at the sorts of interactive facilities a digital library might support. G-Portal, for example, is an architecture for integrating location information with other resources (Lim et al. 2002). One scenario described by its developers has students collecting fauna and flora taxonomic data and linking this to geographic information. Once the data has been created, users are able to access the information either by using geospatial queries (e.g. clicking on a map) or by selecting items from the textual classification data. The two sources are synchronised, so if a user selects information using one method, the display relating to the other source is updated. This scheme is similar to ours; however, because our focus has been on user issues, we have investigated a richer range of relationships between collections and the forms of interaction these might support. A more recent use of the G-Portal framework has illustrated the role of the sorts of user-centred perspective that we see as important (Theng et al. 2005).

Outside of the digital-library research community, there have been further map user interface investigations. Geographical Information Systems (GIS) are tools aimed specifically at managing complex geo-spatial information. In contrast, the sort of mapping content of interest to us is rather less dynamic and sophisticated. However, it has been noted that GIS need to be made more usable, and experimental interfaces are being evaluated (e.g. Rauschert et al. 2002). Human-computer interaction researchers have also reported techniques for accessing maps including panning and zooming (e.g. Hornbaek et al. 2002).

8. Usability study

We carried out an observational study to gather impressions of how people responded to the HistoryMap system. Our aim was to identify any difficulties in operating the system features. Unlike the focus group observations, discussed in the next section, this exploration considered surface-level, interface issues rather than HistoryMap's value as a resource for target users.

8.1 Subjects

Fourteen participants were involved in the study, all of whom had computing and Web experience. Four of the participants were librarians specialising in New Zealand documents or maps; the others were computer science students. Three had not used a digital library before, and one had used a digital library but not the New Zealand Digital Library. Of the ten who had used the New Zealand Digital Library, four had used the Niupepa collection. All were fluent English speakers.

8.2 Materials

We carried out the study in the single-user suite of a usability laboratory, with video, audio and screen capture capabilities. The HistoryMap software was installed on a 1.6 GHz Windows-based desktop personal computer, running under Java 1.4.1. The machine's capabilities meant an instantaneous response to user actions, except for networked queries to the NZDL Niupepa collection, where retrieval time was no worse than in a web search engine like Google.

8.3 Comparing coupled and uncoupled collections

Seven participants used HistoryMap with the collections coupled. That is, interactions with one collection window caused changes in the information presented in the other. The other seven users accessed the two collections via two independent windows: interactions with one collection did not cause any changes in the information presented in the other. That is, there were no automatic retrievals from either collection.

8.4 Procedure

Each subject was given a participant workbook on arrival at the usability laboratory, and directed to read an overview of the study session and a statement of their rights as a participant. Once consent to continue had been given by a subject, they read a brief written summary of the nature and purpose of a digital library, and had opportunity to ask any questions of the investigator. Subjects then completed a background questionnaire, indicating their familiarity with search engines, digital libraries, the Niupepa collection and online access to geographical maps.

The HistoryMap system was demonstrated by the investigator (one of the authors). Each access method for the Niupepa collection (searching, series browsing and date browsing) was described and example tasks demonstrated. The map interface was then briefly described; no detailed feature-based training was given. After these explanations, the subject was given time to explore the system themselves. After this undirected familiarisation, the HistoryMap system was reset to its initial state and two directed tasks were presented to the subject, one after the other.

For each task, subjects were asked to carry out a number of information-seeking steps. They were encouraged to "think-aloud" - to make comments and ask questions - as they proceeded. The first task required them to find information about a particular location in a given year, by first accessing materials in the Niupepa collection and then accessing a related map in the map collection. In the second task, the degree of cross-referring between collections was greater. Subjects were required to find information about locations that were geographically close to a town (Kerepehi), the site of an historical meeting in 1861. This required a search of the Niupepa collection to find a report of the meeting, identifying its specific location. Then, subjects had to find maps that showed that location. Finally, using a present-day map, they were instructed to identify other places near the meeting site and to search for Niupepa articles about one of these locations.

After the two tasks, subjects completed a post-task questionnaire that captured their subjective views on the usability of the system, and its good and bad aspects. The investigator then interviewed the subjects regarding specific events that had been observed during the session.

8.5 Data captured

During the sessions, the investigator sat with the participant, making notes of key events and comments. Each session was also videotaped, with images of the subject's screen and the subject themselves mixed into a single image. The events of interest included:

  • incorrect use of the system
  • software bug
  • observation of a particular user action
  • important user actions
  • recoverable user errors
  • non-recoverable user errors
  • events to be followed up in the post-task interview

8.6 Results

8.6.1 General map browsing and searching

All users found it easy to learn how to use the system, quickly identifying the meaning and purpose of the search location red circle, the timeline style of map search results, map navigation arrows and the overall browsing scheme. However, some features of the interface did confuse users. Eleven (of the fourteen) clicked on maps in an attempt to zoom-in to a more detailed view; the system does not support this function, requiring users to use the toolbar zoom-in and out buttons (see Figure 2). Seven users clicked on the map thumbnail images, available in search and browse lists, in an attempt to view the full-size document. Again, this action is not accommodated; the system requires users to select the textual link alongside the thumbnail.

Some of the more sophisticated features of the system were not used or fully understood by many of the participants. The area select tool (see Figure 5) was discovered by only six users, and then accidentally. If a user clicks on a map image, the tool presents a list of locations for a default-sized region around the cursor location; our six users activated the pop-up box when they clicked on a map, attempting to enlarge the image. The area select tool also allows users to specify a map region and view a list of locations within it; none of the users attempted to do this. Additionally, there were some negative comments on the poor legibility of several of the maps, which was compounded by the limited zoom capabilities.
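The two behaviours of the area select tool amount to the same region query with different bounds. The Python sketch below illustrates this under stated assumptions: locations are held as pixel coordinates on the map image, the default region is a square, and the names `Location`, `DEFAULT_HALF_SIZE`, `locations_in_region` and `locations_near_click` are hypothetical. The place names and coordinates are invented sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    x: float  # assumed: pixel coordinates on the map image
    y: float


DEFAULT_HALF_SIZE = 50  # assumed half-width of the default click region


def locations_in_region(locations, left, top, right, bottom):
    """Names of all locations inside a user-specified rectangle."""
    return sorted(
        loc.name
        for loc in locations
        if left <= loc.x <= right and top <= loc.y <= bottom
    )


def locations_near_click(locations, click_x, click_y, half_size=DEFAULT_HALF_SIZE):
    """Names of locations in a default-sized square centred on a click."""
    return locations_in_region(
        locations,
        click_x - half_size,
        click_y - half_size,
        click_x + half_size,
        click_y + half_size,
    )


# Invented sample data for illustration only.
sample = [
    Location("Place A", 100, 100),
    Location("Place B", 130, 90),
    Location("Place C", 400, 300),
]
near_click = locations_near_click(sample, 110, 100)
in_drawn_region = locations_in_region(sample, 350, 250, 450, 350)
```

A single click therefore behaves as a region selection with a preset size, which may explain why users who clicked in order to zoom were instead shown a location list.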

8.6.2 Using the map and Niupepa collections together

Seven of the participants used the uncoupled system, with the two browser windows - one for each collection - acting independently. None of these users experienced difficulty in working with these two separate sources to complete the tasks. Just one person indicated a desire to have an automatic way of searching both collections.

With the coupled system, four of the seven users strongly agreed that it was easy to use, while the other three disagreed. The biggest problem for all seven users was the feedback given about the interactions between the two collections. Subjects wanted it made more obvious when information was automatically updated. For example, when some users clicked on the map icons embedded in Niupepa documents, they waited for something to happen, not realising that the map window had been updated with the new search. Two users also spent time searching for a way to switch collections within the browser windows, assuming that there was a direct way from the browser components to view the alternate collection.

8.7 Discussion

Encouragingly, all users were able to complete the tasks with little difficulty. No users seemed to be daunted by the need to cross-refer between two information sources. This highlights the increasing digital information-seeking competencies and experience of people like our participants. They are used to having several documents available - perhaps editing one while referring to a couple of others sourced from different locations, both locally and via the Internet. It is perhaps unsurprising, then, that they were not challenged by our two-source tasks. As we have tried to do with HistoryMap, digital library developers should take advantage of these growing skills, moving away from the predominant single information-source view of access.

All of the interaction difficulties encountered by the participants are easy to address. At the surface-level, there were no major design flaws. Users are very much conditioned by their Web search and browse experiences. Web site developers have long been advised to ensure their new sites conform to interaction designs seen in existing content familiar to their users (Nielsen 2002). Digital library designers also need to heed such wisdom. In our study, participants wanted to interact with the map images in ways common on the web: clicking on a thumbnail to see the full document and expecting a clickable zoom-in function to magnify maps. Many of our participants particularly praised aspects of the system that were like those found in Web-searches.

Conversely, the more sophisticated interactions our system supports, such as the area select tool with its pop-up location lists, are not common on the Web, and this perhaps explains why they were not discovered or understood as easily. We could improve the surface-level usability by making the existence of the tools more obvious through, for instance, some on-screen instruction or a tool icon. However, even without such cues, we would expect real, longer-term, frequent users to easily understand the operation and utility of these features (see the discussion in the next section).

For users to fully appreciate the power of the coupled system, there is a need to give clearer indications of the dynamic relationship between the two browser windows. Users were sometimes unaware of changes as they could not fully view the contents of one of the browser windows. This was because they were able to minimize browser windows, or obstruct the view of one by the other. One option would be to fix the two browser displays so they were always visible. However, users are likely to react negatively to this reduction in flexibility, wishing sometimes, for example, to use the entire display area for a focussed examination of one of the collections.

Alternatively, the system could signal updates in ways that users will notice. For instance, if the user searches one collection and a parallel search occurs in the other, occluded or non-visible, browser, the window or its toolbar icon could flash or be highlighted in some other way. In the case of a user clicking on an embedded map-link in the Niupepa collection, the map collection window could be made visible and active so the user can give their full attention to it.

Other interaction studies of adaptive or intelligent user interfaces have demonstrated how the usefulness and usability of automated supports can be greatly enhanced when the user understands why and how information is being provided (e.g. Horvitz 1999). Our study identified simple presentation design changes which should substantially improve a user's awareness of the system's operation.

9. Focus group observations

The observational study described above was supplemented by software demonstrations and discussions with focus groups. These groups consisted of Maori academics and students at two tertiary education institutions. In each focus group meeting, the nature and content of the Niupepa and map collections was explained, and the capabilities of the HistoryMap system shown via a demonstration and commentary.

Attendees were invited to ask questions or make suggestions regarding the system. The most striking observation was the almost unanimous desire of attendees to immediately try out a query of interest to themselves. Commonly this would be either a geographic query (such as a birthplace) or a query about a particular person (such as a great grandparent). Some queries were related to historical events. The system demonstrator responded to a continuous stream of "now try this...", "search for this..." requests. Also noticeable was the conceptual ease with which group members made transitions between the two collections, instructing the demonstrator to switch between the two components of the interface. In some cases a sequence of search goals, guided by the information retrieved at each stage, was specified, involving transitions between the two collections.

From discussions, desirable enhancements to the system were identified. The first addresses the limitations of the result list format. Currently the system segments the map result list by locations specified in the location query. When a map is viewed, only a single location (matching the segment its thumbnail was selected from) is marked, even though multiple query locations may appear on the map. Consequently a potential enhancement is to rank a result list according to the number of query locations contained within a map, and to then mark all query locations on the selected map.
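The proposed ranking can be sketched in a few lines of Python. This is an illustration under assumptions rather than the system's code: the map collection is represented as a dictionary from a map identifier to the place names found on that map, the function name `rank_maps` is hypothetical, and the sample entries are invented.

```python
def rank_maps(map_index, query_locations):
    """Order maps by how many of the query locations each one contains.

    Returns a list of (map_id, matched_locations) pairs. All matched
    locations are returned so that every one can be marked on the map,
    not just the one whose result segment the thumbnail came from.
    """
    query = set(query_locations)
    results = []
    for map_id, locations in map_index.items():
        matched = query & set(locations)
        if matched:
            results.append((map_id, sorted(matched)))
    # Most matches first; ties are broken by map identifier for stability.
    results.sort(key=lambda item: (-len(item[1]), item[0]))
    return results


# Invented sample index for illustration only.
map_index = {
    "map-a": ["Place A", "Place B", "Place D"],
    "map-b": ["Place B"],
    "map-c": ["Place C"],
}
ranked = rank_maps(map_index, ["Place A", "Place B"])
```

Here `map-a` is listed ahead of `map-b` because it shows both query locations, and `map-c` is omitted because it shows neither.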

The second enhancement would support searching by location type. A search of the map collection would allow results to be restricted to a particular kind of location, such as a pa, lake, town and so on. The database that we use has pre-assigned categories for each location (there are 75 categories in total), and so the system can be extended to provide this functionality. A further, related enhancement would support the identification of different category locations on a map. In this case, when looking at a result map the user could select a location category, and the map would then be annotated to highlight corresponding locations. Multiple categories could be displayed on the same map.
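Both category-based enhancements reduce to simple filters once each location carries its pre-assigned category. The Python sketch below assumes that each map is stored as a list of (name, category) pairs; the function names `maps_with_category` and `labels_for_categories`, and the sample entries, are hypothetical and do not reflect the actual database schema.

```python
def maps_with_category(map_index, category):
    """Identifiers of maps showing at least one location of the given type."""
    return sorted(
        map_id
        for map_id, locations in map_index.items()
        if any(cat == category for _, cat in locations)
    )


def labels_for_categories(locations, categories):
    """Group the locations on one map by each selected category.

    Several categories can be requested at once, so that different
    location types can be highlighted on the same map.
    """
    wanted = set(categories)
    labels = {}
    for name, cat in locations:
        if cat in wanted:
            labels.setdefault(cat, []).append(name)
    return labels


# Invented sample data for illustration only.
typed_index = {
    "map-a": [("Site 1", "pa"), ("Site 2", "lake")],
    "map-b": [("Site 3", "town")],
    "map-c": [("Site 4", "pa"), ("Site 5", "mine")],
}
pa_maps = maps_with_category(typed_index, "pa")
overlay = labels_for_categories(typed_index["map-c"], ["pa", "mine"])
```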

10. Conclusions and future work

In this paper we have described the HistoryMap system, which integrates browsing and searching across two information collections: one containing historical maps, and the other historical newspapers. The system is designed to meet the particular information seeking needs and strategies of our target user population. It addresses the difficulty of extracting location information from historical maps by providing a simple mechanism by which maps can be added to the collection, and then retrieved according to the locations within them, independent of the level of detail at which they are presented. Cross-collection browsing avoids costly pre-processing of newspaper content by inserting cross-collection hyperlinks at access time. As a result, link presentation can be tailored to reflect the user's search focus. An observational study of HistoryMap has served to highlight a number of usability issues with the user interface, and focus groups have confirmed the enthusiasm with which sample target users react to the system.
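Access-time link insertion of the kind summarised above can be sketched as follows. This is a minimal Python illustration, not the HistoryMap code: the function name `link_place_names`, the URL pattern `/maps?location=...`, the CSS class names and the use of exact whole-word matching are all assumptions.

```python
import re
from urllib.parse import quote


def link_place_names(text, place_names, focus=None):
    """Wrap known place names in hyperlinks to a (hypothetical) map search.

    The stored newspaper text is left unchanged; links are generated each
    time a page is served. `focus` is the location the user is currently
    searching for, which receives a distinct style.
    """
    if not place_names:
        return text
    # Longest names first, so multi-word names win over their prefixes.
    names = sorted(place_names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")

    def to_link(match):
        name = match.group(0)
        css = "map-link focus" if name == focus else "map-link"
        # Assumed URL scheme for the map collection search.
        return '<a class="%s" href="/maps?location=%s">%s</a>' % (css, quote(name), name)

    return pattern.sub(to_link, text)


linked = link_place_names("A meeting was held at Kerepehi.", ["Kerepehi"])
focused = link_place_names("A meeting was held at Kerepehi.", ["Kerepehi"], focus="Kerepehi")
```

Because the links are computed per request, the same article can be rendered with different emphasis depending on the query that led the user to it.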

Figure 8: Greenstone 3 implementation of the map collection. Multiple location types can be dynamically labelled on each map (left). Search results are ordered by occurrence of query locations on each map (right).

We are currently developing the system within the new Greenstone 3 open source software infrastructure. This new implementation builds upon the HistoryMap system described here, rectifying the usability issues revealed by the study. In particular it extends the system to include the desirable developments expressed by the focus groups. These are illustrated in Figure 8. To the left of the figure is a sample map-viewing page, showing multiple location types (pa sites, cemeteries and mines). These are marked either through a search restricted to location type, or by the user choosing to dynamically add marked location types whilst viewing the map. To the right is a sample query result page, in which the result map thumbnails are ordered according to how many of the multiple query locations are shown within the map. The next stage of this work will be to carry out further usability evaluation and end-user discussions to determine the usability and utility of this new implementation.

Note

Unfortunately we are currently unable to make the HistoryMap system publicly accessible due to copyright restrictions on some of the map resources.

References

Apperley, M., Cunningham, S.J., Keegan, T. and Witten, I.H. (2001) "Niupepa: a Historical Newspaper Collection". Communications of the ACM, 44(5), pp. 86-87.

Bainbridge, D., Buchanan, G., McPherson, J., Jones, S., Mahoui, A. and Witten, I.H. (2001) "Greenstone: a Platform for Distributed Digital Library Applications". In Fifth European Conference on Research and Advanced Technology for Digital Libraries, (September 4-9, Darmstadt, Germany), pp. 137-148.

Hornbaek, K., Bederson, B.B. and Plaisant, C. (2002) "Navigation patterns and usability of zoomable user interfaces with and without an overview". ACM Transactions on Computer-Human Interaction (TOCHI), 9(4), pp. 362-389.

Horvitz, E. (1999) "Principles of mixed-initiative user interfaces". In Proceedings of CHI'99: Human Factors in Computing Systems, (Pittsburgh, PA, USA), ACM Press, pp. 159-166.

Jan, G. and Frew, J. (2002) "The ADEPT Digital Library Architecture". In Proceedings of Digital Libraries'02: The Second ACM/IEEE-CS Joint Conference on Digital Libraries, (Portland, Oregon, USA), ACM Press, pp. 342-350.

Lim, E.-P., Goh, D.H.-L., Liu, Z., Ng, W.-K., Khoo, S.-G. and Higgins, S.E. (2002) "G-Portal: a map-based digital library for distributed geospatial and georeferenced resources". In Proceedings of Digital Libraries'02: The Second ACM/IEEE-CS Joint Conference on Digital Libraries, (Portland, Oregon, USA), ACM Press, pp. 351-358.

Nielsen, J., Ed. (2002) Coordinating User Interfaces for Consistency, Morgan Kaufmann.

Rauschert, I., Agrawal, P., Sharma, R., Fuhrmann, S., Brewer, I. and MacEachren, A. (2002) "Designing a human-centered, multimodal GIS interface to support emergency management". In Proceedings of the Tenth ACM International Symposium on Advances in Geographic Information Systems, (McLean, Virginia, USA), ACM Press, pp. 119-124.

Theng, Y.-L., Goh, D.H.-L., Lim, E.-P., Liu, Z., Yin, M., Pang, M.L.-S. and Wong, P.B.-B. (2005) "Applying Scenario-based Design and Claims Based Analysis to the Design of a Digital Library of Geography Examination Resources". Information Processing and Management, 41, pp. 23-40.

Witten, I.H., Boddie, S., Bainbridge, D. and McNab, R.J. (2000) "Greenstone: a Comprehensive Open-Source Digital Library Software System". In Proceedings of Digital Libraries'00: The Fifth ACM Conference on Digital Libraries, (San Antonio, TX, USA), ACM Press, pp. 113-121.

Witten, I.H., Loots, M., Trujillo, M.F. and Bainbridge, D. (2001) "The Promise of Digital Libraries in Developing Countries". Communications of the ACM, 44(5), pp. 82-85.