Building Semantic Tools for Concept-based Learning Spaces: Smith et al.: JoDI

Building Semantic Tools for Concept-based Learning Spaces: Knowledge Bases of Strongly-Structured Models for Scientific Concepts in Advanced Digital Libraries

Terence R. Smith, Marcia L. Zeng* and the ADEPT Project Team**
Department of Computer Science, University of California, Santa Barbara,
Santa Barbara, CA 93106, USA
Email: smithtr@cs.ucsb.edu
*School of Library and Information Science, Kent State University,
Kent, Ohio 44242, USA
Email: mzeng@kent.edu; Web: http://www.slis.kent.edu/~mzeng/
** The Alexandria Digital Earth Prototype (ADEPT) Project Team, University of California, Santa Barbara
Web: http://www.alexandria.ucsb.edu/

Abstract

Applying conventional principles of knowledge organization, representation, and other semantic tools, we have constructed a model for scientific concepts and employed knowledge bases and visualization tools to represent knowledge concerning scientific concepts. Strongly-structured models, such as the integration of a taxonomy (or thesaurus) with metadata (or attribute-value pairs) and domain-specific markup languages, as well as specialized models for learning scientific concepts, focus on such attributes as objective representations, operational semantics, use, and interrelationships of concepts. All of these play important roles in constructing representations of knowledge in most domains of science. Instructional activities for undergraduate teaching and learning are greatly facilitated with the use of such integrated semantic tools.

1 Beyond "Information Container"-based Learning Spaces

A strength of digital library (DL) technology lies in its ability to incorporate services supporting content-based organization of, access to, and use of its collections in learning applications. Many applications of DL technology, such as the National Science Foundation's program for constructing a national DL to support education in science, engineering, mathematics, and technology (NSF 2003), depend for success on this strength.

A significant portion of DL development activities that support learning is focused on providing access to educational materials in terms of traditional 'information containers', such as electronic versions of books, journal articles, theses and dissertations, images, and videos. Metadata schemas supporting description and retrieval of these resources have been adopted by various DL communities in order to provide electronic access to such learning materials in a timely manner. The idea of access to DL collections by content has been raised repeatedly, although no consensus has developed concerning the appropriate level of granularity for such access. While traditional containers remain the standard level of granularity in organizing knowledge, there is little evidence that this level of granularity is the most effective mode for accessing library collections for learning purposes. The question therefore arises as to whether there exist well-defined and generally acceptable levels of granularity for organizing, accessing, and using scientific knowledge that are both implementationally and economically feasible.

The richness and diversity of the learning materials that may be accessed from both local and networked sources, including DLs, becomes one of the characteristics of electronic learning environments. Such heterogeneity, however, makes it difficult for instructors to create and maintain an environment in which knowledge is organized and integrated in terms of controllable and consistent sets of scientific concepts. This is, we believe, one of the weaknesses of using Web portals to provide access to un-integrated collections of learning materials. Another question therefore arises as to how to organize and represent knowledge in digital learning environments.

The approach taken in this paper to answering these questions is based on the premise that 'scientific concepts' and 'relationships between concepts' provide a powerful, and perhaps the only, level of granularity with which to support effective access and use for learning. As we show below, digital learning environments are capable of supporting the construction of knowledge bases of Strongly-Structured Models (SSMs) of concepts and of services that support the integration of diverse information sources based on such SSMs. We have therefore designed and implemented specific semantic tools for use in environments where heterogeneous learning materials from many distributed sources may be explicitly created and/or integrated, accessed, and organized according to the SSMs.

1.1 Concepts, Science, and Learning

Understanding the knowledge in some domain of science requires students to understand how sets of concepts and their interrelationships are developed and applied in representing the phenomena of the domain. A large literature has developed in various fields, including psychology, education, and cognitive science, concerning the development, use, and evaluation of the conceptual bases for various domains of scientific knowledge. Within the context of human knowledge in general, Binwal and Lalhmachhuana (2001) define "knowledge representation as a systematic way of codifying human knowledge" and affirm that any knowledge representation system requires an ontological commitment as to what set of concepts is to be used in representing some aspect of the world. They further argue that "a central part of knowledge representation consists of elaborating: (1) a set of abstract objects; (2) concepts and other entities; and (3) the relations that may hold between them." In a related point of view, Gärdenfors (2000) argues that the central question for any theory of knowledge representation is how concepts should be modeled.

In relation to learning, there is a growing consensus that science education should be a meaningful activity in which students learn to think like scientists rather than solely remember information. This strongly suggests the importance of students developing a deep understanding of scientific concepts and their role in the scientific approach, including their representation, creation, use, and evaluation. The National Science Education Standards (NRC 1996), which make frequent reference to concepts, call for students to engage in scientific reasoning in which they 'become familiar with modes of scientific inquiry, rules of evidence, ways of formulating questions, and ways of proposing explanations'. The success of such activities depends on the depth of understanding of the concepts underlying the practice of science.

We may interpret in terms of concepts three generally accepted principles concerning students' acquisition of deep understanding of some domain of scientific knowledge (Mayer 1991, Mayer 2001, Bruer 1993). These principles, derived from cognitive science research into learning and education, are:

  • Domain-specific learning: Students learn best when cognitive skills are taught in the context of a specific domain of knowledge rather than in contexts that are more general. An interpretation of this principle in terms of concepts suggests the importance of explicitly supporting students' acquisition of a deep understanding of the concepts, concept interrelationships, and reasoning skills related to developing and using scientific representations of phenomena in any domain of scientific knowledge.
  • Case-based learning: Students learn best when cognitive skills are learned in the process of solving authentic problems rather than when pieces of information are presented as isolated facts to be learned. A concept-based interpretation suggests the importance of constructing a learning environment in which specific problems are presented in ways to motivate the use of sets of concepts and interrelationships germane to the problems.
  • Scaffolded learning: Students learn best when the task difficulty is adjusted to meet their capabilities. The interpretation suggests providing students with subsets of concepts and concept representations appropriate to their level of knowledge.

These principles and interpretations indicate the value of constructing learning environments based on, and making explicit use of, a knowledge organization and representation system containing structured models of concepts and their interrelationships. Pedagogic advantages arising from such an approach include concept-based access to scientific knowledge and the efficient re-use and re-purposing of knowledge bases of structured concept representations.

1.2 A Concept-based Digital Learning Environment: ADEPT DLE

This section provides an overview of a concept-based Digital Learning Environment (DLE) that is based on our abstract SSM of concepts. This DLE has been developed and implemented by the Alexandria Digital Earth Prototype (ADEPT) project team and is currently in use for teaching geography courses at the University of California, Santa Barbara (UCSB). The following section describes in detail the underlying abstract SSM of concepts.

DLEs offer two advantages over purely print-based learning environments. The first is the richness and diversity of the learning materials that may be accessed from both local and networked sources, including DLs. The second is the emerging availability of knowledge representation and Web technologies for integrating such materials at the level of scientific concepts. In particular, such technologies allow us to extend conventional knowledge organization systems and structures, such as thesauri, gazetteers, and ontologies, into more informative representations of concepts and their interrelationships. We may create digital knowledge bases of information about concepts for use in organizing learning materials for presentation. The ADEPT DLE takes advantage of such technologies in support of its basic assumption concerning the conceptual organization of science learning materials.

The ADEPT DLE has been established for, and tested in, teaching a relatively large class at the undergraduate level in the domain of Physical Geography in 2002 and 2003 at UCSB. At the front-end, in current applications of the DLE in classroom settings, items from each of the three collections (knowledge bases of concepts, DL collection of lecture presentations, and DL collection of learning objects) are separately projected onto three different screens (termed, from left to right, the "Knowledge Window", the "Lecture Window", and the "Collection Window"). An edited photograph of the three screens taken during a class presentation is shown in Figure 1.


Figure 1. Classroom presentation: (left to right)
the Knowledge Window, the Lecture Window, and the Collection Window

Presentations shown in the Lecture Window form the backbone of the whole class. The structure of each lecture may be organized as shown in Table 1. The contents of the Knowledge Window and Collection Window in Figure 1 are controlled through icons and links in the Lecture Window, although direct search over the knowledge bases and the learning object collections may be directed from these windows if desired. For example, by clicking on the icons listed under a concept (see right side of Figure 2), corresponding objects such as images, maps, figures, videos, and simulation models that exemplify the phenomena and the associated concepts will be projected on the Collection Window. 

Table 1: Organizing Subheads for a Lecture Model
  • Identification of Scientific Phenomena 
    • TOPIC, SUBTOPIC, SUB-SUBTOPIC
    • OBSERVATIONAL PROCEDURE
    • EXAMPLE
  • Representation of Scientific Phenomena
    • FACT
    • CONCEPT
    • THEORY
  • Understanding of Scientific Phenomena
    • QUESTION/ANSWER
    • PROBLEM/SOLUTION
    • HYPOTHESIS/EVALUATION
    • STATEMENT/DERIVATION
    • PREDICTION/TEST
    • COMMENT

Figure 2. Lecture Window
- illustration of a lecture from the physical geography class

During a presentation, the conceptualization may be shown as a dynamic graph in the Knowledge Window (Figure 1). As the instructor teaches the concepts, relationships between the concept being studied and a set of related concepts are automatically centered on the screen (Figure 3). The conceptualization operation allows users (instructors or students) to associate a given concept occurring in a presentation (such as Stream Velocity) with other concepts occurring in the presentation (such as Depth, Slope, Roughness), providing a network structure among the concepts. Graphic displays of "concept spaces" visually indicate the relationships and important subsets of concepts, particularly subsets that constitute ontological commitments for representing given phenomena. These provide students with large-scale (and even global) views of the structure of concept spaces. Figure 3 shows the output of the Java applet that represents the Conceptualizations of the concept Mass Movement from the physical geography class. This representation shows four levels of conceptualization and the structure of concepts that may be used in "explaining" the concept Mass Movement. For example, one may note that Figure 3 includes the hierarchical structuring MassMovement -- Mechanics -- ForceBalance -- Stability.


Figure 3. Visual presentation of the concept space
-- the concept of mass movement from the physical geography course
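
As an illustration only, the levels of a conceptualization such as the one in Figure 3 can be recovered by a breadth-first walk over the links between concepts. The sketch below is not the ADEPT Java applet: the dictionary `CONCEPTUALIZATIONS` and the function `levels` are hypothetical names, and the only links included are the ones mentioned in the text.

```python
from collections import deque

# Hypothetical link table: concept -> concepts used to explain it.
# Only the links named in the text are included.
CONCEPTUALIZATIONS = {
    "MassMovement": ["Mechanics"],
    "Mechanics": ["ForceBalance"],
    "ForceBalance": ["Stability"],
    "StreamVelocity": ["Depth", "Slope", "Roughness"],
}


def levels(root, links, max_depth=None):
    """Breadth-first walk returning concepts grouped by distance from root."""
    seen = {root}
    frontier = deque([(root, 0)])
    grouped = {}
    while frontier:
        concept, depth = frontier.popleft()
        grouped.setdefault(depth, []).append(concept)
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the requested level
        for nxt in links.get(concept, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return grouped


mass_movement_levels = levels("MassMovement", CONCEPTUALIZATIONS)
stream_levels = levels("StreamVelocity", CONCEPTUALIZATIONS, max_depth=1)
```

Here `mass_movement_levels` groups the chain MassMovement, Mechanics, ForceBalance, Stability into four levels, and `max_depth` limits how much of the concept space is shown at once.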

The three screens represent materials from supporting ADEPT collections that include:

  1. One or more knowledge bases (KBs) containing collections of SSMs of relevant scientific concepts that are necessary and/or sufficient for representing knowledge in some domain of science.
  2. Collections of learning objects from the DL that may be used for exemplifying and illustrating concepts in terms of their representation, meaning, use, and interrelationships.
  3. Collections of re-usable presentation materials (such as lectures or laboratory sessions) that both integrate items from the KBs and learning object collections and may be organized as trajectories through the KBs of concepts.

A current implementation of the DLE is being employed operationally in undergraduate classroom settings. We view this implementation as one possibility in a large class of tailorable DLEs in which logically distinct DL collections and associated services are integrated in terms of KBs of SSMs of scientific concepts. The focus of this implementation is on the preparation and presentation of materials supporting lecture-type modes in the domain of elementary physical geography/geology. However, other modes of presentation, including laboratory sessions, have also been developed. The DLE is designed to be generally applicable to all domains of science (ADEPT 2003).

2 SSMs of Concepts

2.1 Conventional and New Models of Concepts

In developing ADEPT's DLE, we have studied a range of knowledge organization structures that are currently employed by library and DL technology. These structures represent spaces of scientific concepts and their interrelationships to support the organization of, access to, and use of scientific knowledge, including (see, for example, NKOS 2000, Binwal and Lalhmachhuana 2001, Hill et al. 2002):

  1. Term lists, such as authority files, glossaries, and dictionaries;
  2. Metadata-like models, such as directories and gazetteers;
  3. Classification and categorization schemes, such as subject headings, categorization schemes, classification schemes, and taxonomies;
  4. Relationship models, such as thesauri, semantic networks, concept maps, frames, and ontologies;
  5. Metadata content standards, and particularly those parts dealing with knowledge representation;
  6. Domain-specific content markup languages represented in the form of Document Type Definitions or XML schemas;
  7. General knowledge representation languages, such as first order predicate calculus, description logics, conceptual graphs, general frame systems, and object-oriented modeling languages.

Semantic tools listed in groups (1) to (3) and some in group (4) are considered to be conventional knowledge organization systems/structures/services (KOS). Conventional representations of concepts have value in learning environments. They support, for example, access to traditional knowledge containers, such as texts and journals, in which term-based representations of concepts occur. They are also of value in supporting high-level graphical views (or "concept maps") of the interrelationships among concepts. On the other hand, a weakness of many of these models in DL environments arises from the relatively weak structures that they typically employ. For example, these representations almost invariably take the form of linguistic terms. In many text-based learning materials, concepts and their interrelationships are primarily denoted by linguistic terms or alphanumeric symbols. This is reflected in the widespread use in textbooks of glossaries of terms and symbols.

Many of the important and defining characteristics of scientific concepts, such as their representation, semantics, properties, relationships to other concepts, and use, cannot be represented in such simple linguistic terms. Such knowledge is typically distributed in unstructured ways throughout the learning materials. Consequently, most conventional KOS, especially when used individually, cannot easily support access to, or integration of, knowledge concerning many of the attributes of concepts that make them useful in scientific modeling activities. Their importance is largely restricted to accessing knowledge in terms of the traditional 'information containers' whose contents may, for example, be accessed by such term-based concept representations as subject headings. Such models of concepts are of limited value in providing the 'deep' organization of, and access to, scientific knowledge that is important for learning.

From the first to the fourth group listed above, there is a clear trend towards more strongly structured models. As we note in the Discussion (section 4), a further integration of KOS principles and the elements of these groups may be viewed as creating domain-specific knowledge-based semantic tools.

2.2 Theory Underlying SSMs for Concepts

Gärdenfors (2000), in constructing a convincing theory of cognitive (or non-objective) representations of concepts, explicitly lays the ground for a theory of objective (including scientific) representations of concepts and their interrelationships. He describes three interrelated classes of theories concerning the cognitive representation of concepts, namely: (1) low-level connectionist/associationist/procedural approaches; (2) high-level symbolic approaches; and (3) bridging-level conceptual approaches.

Gärdenfors gives priority to the third class of conceptual representation, for which he develops a representational theory based on geometrical structures in well-defined conceptual spaces. This contrasts with representations based on either associations or symbols. His focus on the conceptual level reflects the well-known weaknesses of representations based on associations or symbols. The semantics of concept representations based on the connectionist approach are encoded implicitly in terms of procedures; those of concept representations based on the symbolic approach are bound up with essentially syntactic operations on the symbols. He implies, however, that all three classes may be combined to form an integrated and powerful knowledge representation system.

In the context of the conceptual level, he defines a natural concept as being represented by "a set of regions in a number of domains together with an assignment of salience weights to the domains and information about how the regions in different domains are correlated." A domain is a multidimensional space, the quality dimensions of which may take the form of (sometimes measurable) single dimensions or integral dimensions comprised of several interrelated dimensions (such as the three dimensions of color representation), or combinations of these two types. Hence Gärdenfors is able to define: (1) primitive concepts as relating to a single dimension; (2) properties as concepts defined by integral dimensions; (3) simple concepts as defined by a single domain; (4) complex concepts as involving several domains; and (5) objects as points in a conceptual (representation) space.
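
A minimal sketch may help make this definition concrete. It assumes, purely for illustration, that each region is an axis-aligned box and that matching is a salience-weighted vote over domains; it omits the information about correlations between regions. The class names `Region` and `Concept`, the concept `RedApple`, and all numeric bounds and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Axis-aligned box in one domain: dimension name -> (low, high)."""
    bounds: dict

    def contains(self, point):
        # A point lies in the region if every coordinate is within bounds.
        return all(lo <= point[dim] <= hi for dim, (lo, hi) in self.bounds.items())


@dataclass
class Concept:
    """A set of regions, one per domain, with salience weights."""
    name: str
    regions: dict   # domain name -> Region
    salience: dict  # domain name -> weight

    def match(self, obj):
        """Salience-weighted share of domains whose region contains the object."""
        total = sum(self.salience.values())
        hit = sum(w for d, w in self.salience.items()
                  if self.regions[d].contains(obj[d]))
        return hit / total


# Hypothetical bounds and weights, chosen only for illustration.
red_apple = Concept(
    name="RedApple",
    regions={
        # Color is an integral domain of three interrelated dimensions.
        "color": Region({"hue": (0.0, 0.1), "saturation": (0.5, 1.0),
                         "brightness": (0.3, 0.9)}),
        "taste": Region({"sweetness": (0.4, 1.0)}),
    },
    salience={"color": 3, "taste": 1},
)

# An object is a point: one coordinate per quality dimension.
fruit = {"color": {"hue": 0.05, "saturation": 0.8, "brightness": 0.6},
         "taste": {"sweetness": 0.7}}
green_fruit = {"color": {"hue": 0.3, "saturation": 0.8, "brightness": 0.6},
               "taste": {"sweetness": 0.7}}
```

With these values, `red_apple.match(fruit)` is 1.0 because the point falls inside both regions, while `red_apple.match(green_fruit)` is 0.25 because only the lower-salience taste domain matches.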

Gärdenfors' main contributions to a theory of objective representations of concepts and interrelationships are apparent on noting that: (1) the three interrelated classes of theories of cognitive representations of concepts have natural and direct counterparts in theories of scientific (or objective) representation of concepts; (2) his representational theory of concepts at the conceptual level translates almost verbatim to the language of scientific representations; and (3) the intermediate class of SSMs of concepts at the conceptual level is highly expressive, semantically-rich, and very useful in applications.

Binwal and Lalhmachhuana (2001) review objective knowledge representation structures and support this idea from a different perspective. They note that structured object systems, such as semantic nets and frames, provide bases for especially useful knowledge representation structures characterized by attribute-value pairs containing both procedural and declarative representations. They lend themselves well to model-fitting approaches. SSMs of concepts introduced in the next section may be interpreted as special cases of Gärdenfors' theory.

2.3 SSMs of Concepts in Various Domains of Science

Various scientific disciplines have developed models of scientific concepts that represent their key attributes in explicit form. These models focus on such attributes as the objective representations, operational semantics, use, and interrelationships of concepts, all of which play important roles in constructing representations of phenomena that further the understanding of scientific domains of knowledge. We characterize such representations as "strongly-structured" models of concepts. A number of scientific domains have developed such models. Owing to space limitations, we discuss only two examples.

2.3.1 SSMs of Concepts in Materials Science

The MatML Working Group of the National Institute of Standards and Technology (NIST) has constructed a model for representing concepts relating to materials science (MatML 2001). In particular, they have created a schema for Materials Property Data (MatML 2002). At the highest level, the MatML_Doc element (root element) contains one or more Material elements, each of which describes a material and its properties.

The information contained by the Material element is compartmentalized into five major elements:

  1. BulkDetails element contains a description of the bulk material;
  2. ComponentDetails element contains a description of each component of the bulk material (useful for complex materials systems such as composites or welds);
  3. Metadata element contains descriptions of data found in the document;
  4. Graphs element encodes two dimensional graphics;
  5. Glossary.

The ComponentDetails element, for example, is composed at the next level down of the following elements:

  • Name
  • Class
  • Subclass
  • Specification
  • Source
  • Form
  • ProcessingDetails
    • Name, ParameterValue, Result, Notes
  • Geometry
    • Shape, Dimensions, Orientation, Notes
  • Characterization
    • Formula, ChemicalComposition, PhaseComposition, DimensionalDetails, Notes
  • PropertyData
    • Data, Qualifier, Uncertainty, ParameterValue
  • AssociationDetails
    • Associate, Relationship, Notes
ComponentDetails contains a description of a component within the bulk material and may occur zero or more times within the Material element. ComponentDetails may be used to describe complex materials systems such as welds (e.g. the base metal, the heat affected zone, and the weld metal) or composites (e.g. the whiskers, fibers, and matrix of a fiber-reinforced composite material).
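
The nesting described above can be sketched with Python's standard `xml.etree.ElementTree` module. The element names are those listed in the text; the function name `build_weld_document` and the text values (including the `Class` value) are hypothetical. The result is not validated against the MatML schema, which may require further child elements or a particular element order.

```python
import xml.etree.ElementTree as ET


def build_weld_document():
    """Build a MatML-like tree for a weld with three components."""
    doc = ET.Element("MatML_Doc")                # root element
    material = ET.SubElement(doc, "Material")    # one material in this document
    ET.SubElement(material, "BulkDetails")       # left empty in this sketch
    # ComponentDetails may occur zero or more times within Material.
    for part in ("base metal", "heat affected zone", "weld metal"):
        component = ET.SubElement(material, "ComponentDetails")
        ET.SubElement(component, "Name").text = part
        ET.SubElement(component, "Class").text = "metal"  # illustrative value
    return doc


doc = build_weld_document()
component_names = [c.findtext("Name") for c in doc.iter("ComponentDetails")]
xml_text = ET.tostring(doc, encoding="unicode")
```

The list `component_names` then holds the three weld components in document order, and `xml_text` holds the serialized tree.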

2.3.2 SSMs of Concepts in Chemistry

Recording, storing, and retrieving information on chemical substances have been critical in the progress of chemistry (Weisgerber 1997). The discipline of chemistry has developed an authoritative knowledge representation structure for representing concepts relating to chemical substances. Chemical compounds, in terms of their syntheses, properties, and applications, lie at the core of chemistry. The Chemical Abstracts Service (CAS), a division of the American Chemical Society, produces the largest and most comprehensive databases of chemical information, including Chemical Abstracts (CA) and the CAS Registry. CA contains millions of document records from the chemical journal and patent literature. The CAS Registry, which contains records of 22 million organic and inorganic substances and 35 million sequences, represents a wide variety of chemical substance concepts, including the world's largest collection of concepts relating to such substance subclasses as: Organic compounds, Inorganic compounds, Metals, Alloys, Minerals, Coordination compounds, Organometallics, Elements, Isotopes, Nuclear particles, Proteins and Nucleic Acids, Polymers, and Nonstructural materials (CAS 2003).

These activities have led to the development of a structured model of chemical substances. CAS extended nomenclature principles of the International Union of Pure and Applied Chemistry (IUPAC) to develop unique names for any substance. When a chemical substance is first encountered in the literature processed by CAS, its molecular structure diagram, systematic chemical name, molecular formula, and other identifying information are added to the CAS Chemical Registry and the substance is assigned a unique CAS Registry Number. Since 1967 many non-CAS publications have adopted the CAS Registry Number to identify chemical substances.

CAS' SSMs for chemical substance concepts include:

  • Information about generic, proprietary, and trade names of a substance;
  • Three distinct, and well-defined, representations of any chemical substance concept including:
    • Illustrative structural diagrams: identifying the molecular skeleton of a ring system or stereoparent for convenience in interpreting associated entries in the Chemical Substance Index. Such diagrams indicate:
      • The chemical elements present;
      • The connection among atoms in a molecule of the substance;
      • The types of connecting bonds;
      • The arrangements of the atoms in space.
    • Molecular formulas: representing invariant properties of a chemical substance and derivable from its molecular structure.
    • Index of Ring Systems: listing names of cyclic skeletons contained in organic chemical compounds in the order determined by their ring analysis.
  • Relationships between chemical substance concepts.
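
The statement that a molecular formula is derivable from the molecular structure can be illustrated with a short sketch. It assumes a structure is given simply as a list of atoms (one element symbol per atom) and orders the formula by the Hill convention (carbon, then hydrogen, then the remaining elements alphabetically; purely alphabetical when no carbon is present). The choice of ordering and the function name `molecular_formula` are assumptions, not details taken from the CAS model.

```python
from collections import Counter


def molecular_formula(atoms):
    """Hill-order formula from a list of element symbols, one per atom."""
    counts = Counter(atoms)
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    # A count of 1 is conventionally left implicit.
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


# Two different structures (different bonds) built from the same atoms.
ethanol_atoms = ["C", "C", "O"] + ["H"] * 6         # CH3-CH2-OH
dimethyl_ether_atoms = ["C", "O", "C"] + ["H"] * 6  # CH3-O-CH3
water_atoms = ["H", "O", "H"]

ethanol_formula = molecular_formula(ethanol_atoms)
ether_formula = molecular_formula(dimethyl_ether_atoms)
water_formula = molecular_formula(water_atoms)
```

Ethanol and dimethyl ether yield the same formula, which shows why the structural diagram, with its connections and bond types, is kept as a separate representation alongside the formula.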

The General Subject Index of CA includes general concepts relating to chemical substances, such as classes of substances, physical representations, chemical concepts and phenomena, biochemical and biological subjects (other than specific biochemicals), and concepts relating to animals and plants (especially their common and scientific names) (CAS 1999).

3 Structure and Components of the ADEPT SSMs of Concepts

3.1 Elements of ADEPT SSMs of Scientific Concepts

Informed and motivated, in part, by the work of various scientific groups in constructing detailed, objective models of the concepts underlying their domains, as exemplified above, as well as by the work of Gärdenfors (2000), ADEPT has developed SSMs of concepts for scientific domains in terms of a frame-based knowledge representation system with slots and attribute-value fillers. Such SSMs extend significantly the typically thesauri-like definitions of concepts that have traditionally been used in library environments (Smith et al. 2002a, b). We also note the influence of evolving theories and applications of "ontologies" that involve not only the term representations of concepts and their interrelations, but also of the values of their properties.


Figure 4. Elements of ADEPT SSMs of scientific concepts

The current ADEPT SSMs are intended to integrate all of the information about a concept that is required for undergraduate levels of education. Figure 4 describes the main elements of the SSMs, which we now describe in further detail. Figure 5 shows items in the concept KB, as partially displayed through a browser.

  • The domain context of a concept refers to the domain of knowledge to which the concept is relevant. For example, the concept Water, having distinct connotations for physical geography, chemistry, agriculture, and meteorology, will have different SSMs for different contexts. Furthermore, many of the terms that are used to represent concepts may be quite ambiguous if the context is not prescribed. For example, Transport in the context of physical geography typically denotes the transport of earth materials by some fluid medium.
  • Terms are the simple (linguistic) representations that do little more than denote a concept, although they may be employed in reasoning about hierarchical and partitive relationships.
  • Descriptions are natural language representations of the concept. Many of those employed in the current KB are drawn from glossaries of books (with citations) and similar materials.
  • Class of a concept relates to some chosen classification, such as our classification into abstract, methodological, and concrete concepts, and associated subclasses (discussed in the next section).
  • Historical origin(s) take the form of natural language information about the history and evolution of a concept.
  • Example(s) provide one or at most a few representations of prototypical examples of the concept. For example, photographic and image representations of Landforms are used extensively in the physical geography KB.
  • Defining operations provide scientifically objective descriptions of activities that define the semantics of the concept.
  • Hierarchical relations are specific, thesaurus-like relations between the terms denoting concepts, such as ISA and PARTITIVE relationships. While such terms provide bases for inference (for example: a ISA b AND b ISA c IMPLIES a ISA c), these terms and relationships are essentially without any strong scientific semantics.
  • Conceptualizations: a "conceptualization", which may be viewed as a "weak" representation of a concept, is a relationship between the term representation of a concept and a set of other concepts. This element is intended to provide answers to the question: "What other concepts in the KB do you require in order to explain/understand a given concept?" For example, an answer to this question for the concept of Average Stream Velocity might be Roughness, Water Surface Slope, and Flow Depth, since these are the concepts that enter into the most basic scientific representations of the concept. We note that conceptualizations typically expand in a hierarchical manner, as in the case: Polygon --> Linesegment --> Point --> RealNumber, and may be graphically represented as a "concept map".
  • Scientific classifications, which should not be confused with the classifications of library science, may be viewed as a moderately strong scientific representation of a concept. Many scientific classifications represent a concept as some (generally complex-shaped) region in a space the dimensions of which may be given numerical scales. A good example is the classification of Igneous Rocks as regions in some space characterized by chemical and mineral composition. Scientific classifications may be viewed as providing a simple representation of a concept in terms of other concepts.
  • Scientific representations are those representations of a concept that are given in terms of scientific "languages" and from which useful information may be derived with various deductive and inductive procedures. Important classes of such representations include Data, Graphical, Mathematical and Computational representations. For example, a frequently used and powerful scientific representation of the concept of Average Stream Velocity is the Chezy Equation, in which C is a constant, R is the Hydraulic Radius, and D is the Flow Depth.
  • Properties are concepts that "characterize" some concepts and the scientific representations of which may be manipulated to obtain a characteristic value of a concept. For example, properties of a Drainage Basin that may be computed from its representation in terms of a Digital Elevation Model (DEM) include its Area and its (Strahler) Stream Order.
  • Causal relations relate to term and scientific representations of causal relationships between one concept and others.
  • Co-relations relate to co-relational, and often statistical, rather than causal relationships between one concept and others.
  • Applications describe the uses to which a specific scientific concept is put.

Figure 5. Items in the concept KB, as partially displayed through a browser
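
Two of the elements listed above lend themselves to a small illustration: the transitive inference over hierarchical relations (a ISA b AND b ISA c IMPLIES a ISA c) and the hierarchical expansion of conceptualizations. The following Python sketch is illustrative only and is not the ADEPT implementation. The dictionary and function names are assumptions, and the ISA terms are hypothetical; the conceptualization entries are the examples given in the text.

```python
from collections import deque

# Hypothetical ISA facts (term -> broader terms); these terms are illustrative only.
ISA = {
    "Granite": {"IgneousRock"},
    "IgneousRock": {"Rock"},
}

# Conceptualizations from the examples in the text:
# concept -> concepts required to explain/understand it.
REQUIRES = {
    "Polygon": {"Linesegment"},
    "Linesegment": {"Point"},
    "Point": {"RealNumber"},
    "Average Stream Velocity": {"Roughness", "Water Surface Slope", "Flow Depth"},
}


def reachable(relation, start):
    """Return every term reachable from `start` by following `relation` one or more times."""
    seen = set()
    queue = deque(relation.get(start, ()))
    while queue:
        term = queue.popleft()
        if term not in seen:
            seen.add(term)
            queue.extend(relation.get(term, ()))
    return seen


def isa(a, b):
    """True if a ISA b, either stated directly or implied by transitivity."""
    return b in reachable(ISA, a)
```

Under these assumptions, `isa("Granite", "Rock")` holds although it was never stated directly, and `reachable(REQUIRES, "Polygon")` yields the full expansion Linesegment, Point, and RealNumber. The same traversal serves both relations, which reflects the point made above that neither carries strong scientific semantics on its own.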

It is suggested that generalizations and refinements of similar models could be used for developing SSMs of scientific concepts that are acceptable to scientific communities. For example, to indicate the degree to which our general model of concepts is consistent with the SSMs in other domains, we mapped the CAS model for chemical substance concepts into our general model (Smith et al. 2002a).

3.2 A Use-based Classification of Concepts

For the value spaces of "class of concept", we have developed a use-based classification of scientific concepts, informally based on a National Research Council publication on scientific education (NRC 1996). This classification provides one basis for characterizing concepts in terms of objective scientific operations specifying operational semantics. For example, an abstract mathematical concept, such as an arithmetic equation, may be defined in part by the objective syntactic operations that may be carried out on a symbolic representation of an equation; or a measurable concept, such as the average velocity of a river, may be defined in terms of the objective operations that may be used to determine the average velocity of a river. We classify concepts that are interpretable within the span of this basis as operationally interpretable concepts. It follows that there are many concepts that cannot be so classified, including such uninterpretable concepts as the "way" of Lao Tzu.

The basic form of our classification of scientific concepts is shown in Figure 6. The classification recognizes that the concepts employed by scientists cover a broad range of contexts, as illustrated, for example, by such concepts as polygon, experiment, dataset, multiple linear regression, hypothesis, momentum, hydraulic geometry, and heat diffusion equation.


Figure 6. Use-based classification of scientific concepts

We believe it is important for students to understand the main contexts of scientific activity in which a given concept finds application. Hence, one application of the classification lies in providing students with a "model" of scientific activities.

At the highest level, the classification represents operationally interpretable concepts as belonging to one of three classes:

  • Abstract concepts: have operational semantics defined in terms of syntactic (or computational) manipulations of symbolic representation. Three possible subclasses include, but are not limited to:
    • syntactic (linguistic) concepts;
    • logical concepts;
    • mathematical concepts.
  • Methodological concepts: have semantics defined in terms of various classes of scientifically well-defined operations that may be carried out in relation to them. Possible subclasses of this class include, but are not limited to, concepts relating to procedures for:
    • identification/characterization;
    • representation;
    • understanding;
    • application;
    • communication.
  • Concrete concepts: have semantics defined in terms of scientifically well-defined operations that provide the concept with an interpretation. The concept of river discharge, for example, has a characterizing set of operations that defines the concept in terms of various sets of measurement procedures that may be carried out in real-world contexts. Such procedures determine the amount of water passing across some cross-section of a river during a given interval of time. These concepts are, by and large, the class of concepts used in model and theory construction and include the important subclasses of:
    • measurable concepts;
    • recognizable concepts;
    • interpreted abstract concepts.

Figure 6 also indicates the existence of non-trivial relationships between the three top-level classes. For example, the classical heat diffusion equation is a concrete concept in the sense that it is an (implicit) representation of the heat diffusion process using variables that are measurable. It may, however, also be viewed as a certain mathematical concept, namely a specific partial differential equation whose terms have been given an interpretation in terms of measurable concepts, such as temperature and spatial location.

We note in particular that this classification of concepts permits the representation of relatively high-level concepts, such as topics, that generalize over various (sub)classes of concepts.
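
As a rough illustration of how such a classification might be held in a KB, the sketch below stores the three top-level classes with the subclasses listed above and lets a concept carry more than one subclass, so that a concept such as the heat diffusion equation can fall under two top-level classes at once. The names `CLASSIFICATION`, `CONCEPT_SUBCLASSES`, and `top_level_classes` are assumptions, as are the particular subclass assignments given to the two example concepts.

```python
# Top-level classes and the subclasses named in the text
# (the text notes that these lists are not exhaustive).
CLASSIFICATION = {
    "abstract": ["syntactic", "logical", "mathematical"],
    "methodological": [
        "identification/characterization",
        "representation",
        "understanding",
        "application",
        "communication",
    ],
    "concrete": ["measurable", "recognizable", "interpreted abstract"],
}

# Hypothetical subclass assignments for two concepts discussed in the text.
CONCEPT_SUBCLASSES = {
    "river discharge": {"measurable"},
    "heat diffusion equation": {"interpreted abstract", "mathematical"},
}


def top_level_classes(concept):
    """Return the top-level classes under which a concept's subclasses fall."""
    subclasses = CONCEPT_SUBCLASSES[concept]
    return {top for top, subs in CLASSIFICATION.items() if subclasses & set(subs)}
```

With these assumed assignments, `top_level_classes("heat diffusion equation")` returns both `"abstract"` and `"concrete"`, while `top_level_classes("river discharge")` returns only `"concrete"`.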

3.3 Design and Implementation of the Concept KB

We have developed designs for a KB of concepts with thin client access through a Web browser that supports the creation, access, and display of concepts.

Representations of the SSMs may be accessed in two forms: visual and textual. The visual forms are intended to show the interrelationships among concepts and to provide "global" views of the structure of the concept spaces. In particular, we have focused on providing visualizations based on the conceptualization element of the SSMs. Textual forms allow instructors and students to browse through the contents of the elements of some SSMs of concepts. Visual representations of the concept spaces have been implemented in two clients, an OpenGL graph visualization tool and a lightweight Java applet (Ancona 2002). The Java Graph Applet has the advantage of a live database connection through a PHP script on the server. The visualizations currently center on a specific user-selected concept and show a view of the KB from that concept. The Java Graph client allows for the fine-tuning of a graph with the ability to change the number of relationship levels visible, and even to hide individual nodes to focus on the specific topic at hand. Conversely, many parameters can be passed to the applet via a PHP script, allowing for links to dynamically generated graphs centered on any concept in the KB. It also has the ability to save a graph to a database so it can easily be retrieved via a URL as part of a lecture.
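
The concept-centered view described here, with an adjustable number of visible relationship levels and individually hidden nodes, can be approximated by a depth-limited breadth-first traversal. The sketch below is an assumption about how such a view might be computed and does not describe the actual applet code. The function name `centered_view` and its parameters are hypothetical.

```python
from collections import deque


def centered_view(edges, center, levels, hidden=frozenset()):
    """Return {concept: distance} for concepts within `levels` links of `center`.

    Hidden concepts are omitted, and paths that pass through them are not followed.
    """
    # Build an undirected adjacency map from the list of concept pairs.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == levels:
            continue  # do not expand beyond the requested number of levels
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance and neighbour not in hidden:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Example links taken from the conceptualization chain used earlier in the paper.
EDGES = [("Polygon", "Linesegment"), ("Linesegment", "Point"), ("Point", "RealNumber")]
```

Calling `centered_view(EDGES, "Polygon", 2)` keeps Polygon, Linesegment, and Point and drops RealNumber, which lies three links away. Hiding Linesegment leaves only Polygon, because in this sketch a hidden node also cuts off everything reachable only through it.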

Basic input and editing features for SSMs have been implemented. The current system allows listing/searching of concepts through a Concepts Control page, where users can search for a Concept to begin editing it, view it, or create a new Concept. Since a Concept consists of several separate parts, both the concept entry/edit page and concept viewing page use a JavaScript collapsing menu system, which allows users to view any combination of the concept parts.

The current KB of approximately 1200 concepts from the domain of physical geography was created, using these input tools, by a small (five-person) group of "experts". Each expert entered a given concept, using reference materials to support the process. Our experience is that a typical concept takes 0.5-1.5 hours to create, although efficiency increases rapidly with experience. Weekly reviews of created concept models were held to discuss concept models and to provide uniformity over the KB. The design, implementation, and evaluation of the ADEPT DLE are reported in a number of conference papers (Janee and Frew 2002, Smith et al. 2003).

4 Discussion

4.1 Semantic Tools based on SSMs and Conventional KOS

The ADEPT SSMs of scientific concepts integrate various principles of conventional knowledge organization and representation structures and other semantic tools, as illustrated in Figure 7.


Figure 7. Elements in the ADEPT SSMs and the related structures used in conventional KOS and other semantic tools

The elements of the ADEPT model may be found in various conventional KOS and other semantic tools, including thesaurus, classification, semantic network, concept map, faceted analysis, and taxonomy. Many of the principles and elements used by various KOS provide the ADEPT model with sound foundations.

It is clear that any single knowledge organization and representation structure is unlikely to satisfy the needs of concept-based learning spaces. For example, by controlling the vocabulary and associating terms that have hierarchical and associative relationships, a good thesaurus could satisfy the needs for facilitating retrieval of documents and achieving consistency in the indexing of documents for information storage and retrieval systems (ANSI/NISO 1993). By connecting concepts in a useful structure, a good classification could be descriptive, explanatory, heuristic, fruitful, and perhaps also elegant, parsimonious, and robust (Kwasnik 1992).

Most phenomena are understood to have several, perhaps overlapping, but separate sets of attributes and relationships, depending on the context and goal of the representation. In any event, no one hierarchical classification is able to capture all aspects of a particular domain (Kwasnik 1999). Faceted analysis resolves such issues by introducing more than one way to view the world. However, while the flexibility and pragmatic appeal have made it a popular approach, there are some limitations in terms of knowledge representation and creation, especially in the difficulty of establishing appropriate facets and establishing relationships among facets (Kwasnik 1999). In a thesaurus, relationships beyond hierarchies are represented as associative relationships, which reflect other attributes to some extent, but in a loosely connected way. It is possible that such semantic tools are restricted by their own syntax and structure.

Semantic networks have received renewed attention in networked environments. A semantic network, as defined by Quillian (1968), is a graph structure in which nodes (or vertices) represent concepts, while the arcs between these nodes represent relations among concepts. A notable example is the Unified Medical Language System (UMLS) Semantic Network, which was created to provide a consistent categorization of all concepts represented in the UMLS Metathesaurus and to provide a set of useful relationships between these concepts. With 135 semantic types and 54 relationships, concepts and their relationships are represented in concept maps, as illustrated in section 3 of the UMLS Knowledge Sources (NLM 2003). The semantic network approach, which represents multiple aspects of an entity through annotated links between nodes, has overcome many of the limitations discussed in the preceding paragraph. Nevertheless, its maximal benefits can only be obtained when it is integrated with other semantic tools. We also note that the generation of concept maps representing semantic networks requires much stronger underlying concept models.

Consequently, the ADEPT semantic tools cannot be categorized in terms of conventional semantic tools, although they integrate various elements from such tools. With the components in the ADEPT KBs that are built with SSMs, as well as the enabling technologies, students are able to 'zoom-in' to a concept space (see Figure 5), observing and studying the knowledge structure of a particular concept, while not losing the context of the concept, including its characteristics and relationships. The conceptualization allows students to 'zoom-out' to see large portions of the concept space and navigate to other concepts they may wish to explore (see Figure 3). Such conceptualizations typically expand into a network and have value in providing students with incrementally-constructed but global views of the conceptual structure of the learning materials. A concept map is naturally interrelated with all the other elements in the SSMs of concepts. The knowledge bases with SSMs enable dynamic and scalable concept-centered maps and presentations to be generated. This mechanism also solves problems of concept map generation because it would be impractical to rely on the preparation of many maps in advance, even though most concept maps reported in the literature and available on the Web are "hand-built". Overall, the knowledge organization and representation models developed by ADEPT lay a foundation for DLEs, while current computational technologies make the implementation of the models possible.

Semantic tools are moving towards the integration of stronger structures. Ontologies, for example, are specific concept models representing complex relationships between objects, including the rules and axioms missing from semantic networks. In some applications such as Protégé, an ontological work would be the combination of a taxonomy, metadata schemas (or attribute-value pairs), and instances. Ontologies typically have fairly intricate structure both in terms of including objects with dozens of properties and many inter-relationships, and also in terms of having fairly deep hierarchies. The controlled vocabularies of the upper level ontologies have significant size as well (McGuinness 2002).

Domain-specific markup languages such as MatML (Materials Markup Language), MathML (Mathematical Markup Language), and CML (Chemical Markup Language) have evolved dramatically during the last few years. Almost all of them exhibit SSMs for concepts. The issue of not only discovery of, but also discovery within, resources directly relates to the issue of resource decomposition. Some objects described by a metadata record may be atomic, but many other objects are true information containers, information rich and with both a complex semantic structure and a complex internal "document" organization. These objects are structurally decomposable and semantically decomposable. The integration of resource level metadata descriptions with specialized semantic or structural markup documents has now become one of the major research problems emerging in DL development (Shreve and Zeng 2003).

We note that fields that have been involved in creating such new semantic tools have now developed beyond traditional library and information science. Such fields include knowledge engineering, knowledge representation, qualitative modeling, language engineering, database design, information retrieval and extraction, informatics in various domains, and knowledge management and organization.

4.2 SSMs and Concept-based (Digital) Learning Spaces

As discussed in previous sections, understanding the knowledge in some domain of science requires students to understand how sets of concepts and their interrelationships are developed and applied in representing the phenomena of the domain. Most approaches to teaching and learning, however, deal with concepts in relatively implicit ways. For example, students are rarely presented with SSMs of concepts and their interrelationships. As a result, they too often emerge from a course of study with: (1) limited notions of the nature of concepts; (2) large, but poorly-structured, memorized sets of terms denoting concepts; (3) incompletely structured associations of relationships between concepts; and (4) partial knowledge of how to use concepts in creating representations of phenomena or how to create new concepts.

We believe it is both feasible and valuable to organize, access, and use scientific knowledge explicitly in terms of the sets of concepts that underlie some domain of scientific knowledge rather than in terms of the containing information objects.

With current DL technology, it is possible to develop:

  • SSMs for representing concepts and their interrelationships;
  • domain-specific knowledge bases of such representations;
  • associated DL collections of 'illustrative materials' concerning different aspects and attributes of the concepts;
  • services supporting the creation, modification, viewing, and use of concepts for various purposes in learning contexts.

In particular, KBs of concepts and associated collections may be employed, together with associated services, in creating tailored courses of instruction that take the form of a 'trajectory' through the space of concepts underlying some domain of knowledge. Not only do such KBs of concepts facilitate the creation of different ways of organizing knowledge about some domain in terms of its basis of concepts and relationships, but they also support a greater focus on critical aspects of concepts.
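
One simple way to picture such a 'trajectory' is as an ordering of concepts in which each concept is introduced only after the concepts needed to explain it. The sketch below assumes that the conceptualization element is used as the prerequisite relation and applies a topological sort from Python's standard `graphlib` module (Python 3.9 or later). This is an illustrative assumption, not a description of how ADEPT constructs courses, and the names `REQUIRES` and `trajectory` are hypothetical.

```python
from graphlib import TopologicalSorter

# Conceptualization chain from the text: concept -> concepts it requires.
REQUIRES = {
    "Polygon": {"Linesegment"},
    "Linesegment": {"Point"},
    "Point": {"RealNumber"},
}


def trajectory(requires):
    """Order concepts so that each appears after every concept it requires."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so required concepts are emitted before the concepts that need them.
    return list(TopologicalSorter(requires).static_order())
```

For the chain above, `trajectory(REQUIRES)` places RealNumber before Point, Point before Linesegment, and Linesegment before Polygon. If the prerequisite relation contained a cycle, `static_order()` would raise `graphlib.CycleError`, so an instructor-facing tool would need to handle that case explicitly.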

We believe that such explicitly concept-based learning leads to a deeper understanding on the part of students of the: (1) nature, structure, and classes of the concepts that, together with the interrelationships between the concepts, provide a basis for scientific development in specific domains of knowledge; (2) scientific roles of various classes of concepts across the spectrum of scientific activities; and (3) global structure of some domain of scientific knowledge in terms of the underlying framework of concepts. Advantages that may accrue to instructors include the efficient reuse and re-purposing of the KBs of concepts and the associated collections and services in creating instructional support materials.

5 Conclusions

Our experiments in developing and employing a DLE in undergraduate classes have reinforced the obvious idea that one needs to be able to integrate heterogeneous DL materials at a conceptual level for much instructional work. Just having Web portals and search engines as the navigation and access tools to un-integrated digital collections of learning materials is insufficient. In our case, the three sets of collections (a set of knowledge bases for over 1200 scientific concepts, a learning object collection of approximately 2000 objects, and a collection of reusable presentation materials) are integrated and made usable with the ADEPT DLE model and architecture.

We have also found that the construction and display of elementary knowledge organization systems is insufficient for DL-supported advanced instructional purposes. We believe that such systems should be integrated with other scientific knowledge in an interactive manner, particularly when such knowledge is represented as the content of traditional information containers. Un-integrated and elementary knowledge structures cannot easily provide access to knowledge concerning many of the attributes of concepts that make them useful in scientific modeling activities.

Applying conventional principles of knowledge organization systems and other semantic tools, we have constructed domain-specific SSMs and employed knowledge base and visualization tools to represent the knowledge concerning scientific concepts. Instructional activities are greatly facilitated with the use of such integrated knowledge organization structures. SSMs, such as the integration or combination of taxonomy (or thesaurus), metadata (or attribute-value pairs) and domain-specific markup languages, as well as the specialized models for learning scientific concepts, focus on such attributes as objective representations, operational semantics, use, and interrelationships of concepts. All of these play important roles in constructing representations of knowledge in most domains of science. We believe that the development and use of DLs in the concept-based learning environments that are based on the use of KBs of SSMs of concepts will lead to significant benefits to students and instructors.

Acknowledgments

The work described here was partially supported by the NSF-DARPA-NASA Digital Libraries Initiative and the NSF NSDL initiative under NSF IR94-11330, NSF IIS-9817432, DUE-0121578, and UCAR S02-36644.

References

ADEPT (2003) Virtual Learning Environment http://www.alexandria.ucsb.edu/research/learning/index.htm

Ancona, D. (2002) "Visual explorations for the Alexandria Digital Earth Prototype". Second International Workshop on Visual Interfaces to Digital Libraries, JCDL, Portland, Oregon, 2002

ANSI/NISO (1993, R2003) Z39.19-2003 Guidelines for the Construction, Format, and Management of Monolingual Thesauri (Bethesda, MD: National Information Standards Organization)

Binwal, J. C. and Lalhmachhuana (2001) "Knowledge representation: concept, techniques, and the analytico-synthetic paradigm". Knowledge Organization, 28(1):5-16

Bruer, J. T. (1993) Schools for Thought: A Science of Learning in the Classroom (Cambridge, MA: MIT Press)

CAS (1999) 1999 CA Index Guide (Columbus, OH: Chemical Abstracts Service)

CAS (2003) The CAS Registry. http://www.cas.org/EO/regsys.html

Gärdenfors, P. (2000) Conceptual Spaces: The Geometry of Thought (Cambridge, MA: MIT Press)

Hill, L., Buchel, O., Janee, G., and Zeng, M. L. (2002) "Integration of knowledge organization systems into digital library architectures: Position paper". Thirteenth ASIS&T SIG/CR Workshop on Reconceptualizing Classification Research, Philadelphia, PA, November http://alexandria.sdc.ucsb.edu/~lhill/paper_drafts/KOSpaper7-2-final.doc

Janee, G. and Frew, J. (2002) "The ADEPT Digital Library Architecture". In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '02), Portland, OR, July

Kwasnik, B. H. (1992) "The role of classification structures in reflecting and building theory". In Advances in Classification Research, Vol.3, Proceedings of the 3rd ASIS SIG/CR Classification Research Workshop, edited by R. Fidel, et al. (Medford, NJ: Learned Information), pp. 63-81

Kwasnik, B. H. (1999) "The role of classification in knowledge representation and discovery". Library Trends, 48(1):22-47

MatML (2001) MatML Overview: XML for Materials Property Data, National Institute of Standards and Technology http://www.matml.org/

MatML (2002) MatML Schema, Version 3.0, prepared by E.F. Begley on behalf of the MatML Working Group http://www.matml.org/schema.htm

Mayer, R. E. (1991) The Promise of Educational Psychology: Learning in the Content Areas (Upper Saddle River, NJ: Merrill Prentice Hall)

Mayer, R. E. (2001) Teaching for Meaningful Learning (Upper Saddle River, NJ: Merrill Prentice Hall)

McGuinness, D. L. (2002) "Ontologies come of age". In Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential, edited by D. Fensel, et al. (Cambridge, MA: MIT Press), pp. 171-192 http://www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-(with-citation).htm

NKOS (Networked Knowledge Organization Systems/Services) (2000) Taxonomy of Knowledge Organization Sources/Systems, draft June 7 (revised July 31). Based on Hodge, Gail (2000) "Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files". CLIR Pub91, April http://nkos.slis.kent.edu/KOS_taxonomy.htm

NLM (2003) "UMLS® Semantic Network". In UMLS® Knowledge Sources, 14th edition - November release, section 3 (National Library of Medicine) http://www.nlm.nih.gov/research/umls/META3.HTML

NRC (National Research Council) (1996) National Science Education Standards (Washington, DC: National Academy Press)

NSF (2003) "National Science, Technology, Engineering, and Mathematics Education Digital Library (NSDL) - Program Solicitation". NSF 03-530 http://www.nsf.gov/pubs/2003/nsf03530/nsf03530.htm

Quillian, M. R. (1968) "Semantic memory". In Semantic Information Processing, edited by M. Minsky (Cambridge, MA: MIT Press), pp. 216-270

Shreve, G. M. and Zeng, M. L. (2003) "Integrating resource metadata and domain markup in an NSDL collection". In DC-2003: Proceedings of the International DCMI Metadata Conference and Workshop, Seattle, WA, September 28-October 2, pp. 223-229 http://www.siderean.com/dc2003/604_paper62.pdf

Smith, T. R., Zeng, M. L. and ADEPT Knowledge Team (2002a) Structured Models of Scientific Concepts as a Basis for Organizing, Accessing, and Using Learning Materials, Technical Report 2002-04, UCSB Department of CS

Smith, T. R., Zeng, M. L. and ADEPT Knowledge Team (2002b) "Structured models of scientific concepts for organizing learning materials". In Challenges in knowledge representation and Organization for the 21st century. Integration of knowledge across boundaries: proceedings of the seventh international ISKO conference, Granada, Spain, July, edited by Lopez-Huertas, et al., pp. 232-239

Smith, T. R., et al. (2003) "The ADEPT Concept-based Digital Learning Environment". In Proceedings of the 7th European Conference on Digital Libraries (ECDL 2003), Trondheim, Norway, August (Springer-Verlag)

Weisgerber, D. W. (1997) "Chemical Abstracts Service Chemical Registry System: history, scope, and impacts". Journal of the American Society for Information Science, 48(4):349-360