A Model-driven Method for the Design and Deployment of Web-based Document Management Systems

Federica Paganelli and Maria Chiara Pettenati
Department of Electronics and Telecommunications, University of Florence,
v. S. Marta 3, 50139, Firenze, Italy
Email: federica.paganelli@unifi.it, mariachiara.pettenati@unifi.it

Abstract

Most existing Document Management Systems (DMSs) are designed according to an approach which is technology-driven rather than based on standard methodologies. Related shortcomings are vendor dependence, expensive maintenance and poor interoperability. Information model-driven methodologies could help DMS designers to solve these issues. As a matter of fact, information models can provide a technology-independent abstract representation of information systems' functionalities. Based on standard formalisms, they are useful to designers to describe the managed domain and to developers to understand and develop the modeled entities according to a standard methodological approach. However, while information models are commonly used by software designers for the design of information systems, such as databases and digital libraries, their use in DMS design is still in its infancy. This paper provides a contribution in this research area proposing a method for Web-based DMS design based on an information model, named Document Management and Sharing information Model (DMSM). We have also developed a set of tools, the DMSM Framework, that provides designers with DMS design and deployment facilities. Based on this instrumental support, the proposed method facilitates the design and fast prototyping of DMSs, dealing with requirements of open standard compliance, cost effectiveness and uniform access to heterogeneous data sources.

1. Introduction

Document Management (DM) is a critical issue for every kind of organization, where considerable effort is spent on properly creating, distributing and managing documents. While only part of an organization's information is stored in relational databases, a significant percentage is available in unstructured digital formats (the451, 2002). Documents available in unstructured formats (called unstructured documents throughout this paper) are commonly used text and multimedia documents. In a typical company, reports, contracts and agreements are available as word-processor documents, marketing presentations as slideshows, technical seminars as a/v files and streaming media, and product descriptions as images and CAD files. The characteristics of unstructured documents pose several challenges for their effective management. This situation is due to several factors (Paganelli, 2004; Fisher & Sheth, 2004):

Any system that is required to access and manage unstructured documents effectively should deal with these critical aspects.

1.1 High-level requirements for Document Management Systems design

A Document Management System (DMS) is "the ensemble of applications which enable the automatic execution of storage, organization, transmission, retrieval, manipulation, update and eventual disposition of documents to fulfill an organizational purpose" (Sprague, 1995, p. 32).
In order to deal with the above-mentioned issues related to unstructured document management, DMS design should meet the following non-functional requirements:

1.2 Our contribution

Starting from these considerations, this paper proposes a method for the design and deployment of Document Management Systems in organizations, founded on an XML-based information model, the Document Management and Sharing Model (DMSM), fully described in previous works (Paganelli, 2004; Paganelli et al., 2005). The DMSM aims to represent, in the form of digital metadata, a set of formal document characteristics and properties that are relevant to document management and that render business and organizational information explicit, in a way that promotes information reuse, user-driven extensibility and interoperability with heterogeneous systems. The serialization of the DMSM in XML, named Document Management and Sharing Markup Language (DMSML) (Paganelli, 2004), provides a declarative language supporting the design, deployment and operation of a DMS.

The proposed method aims at defining general guidelines and a standard methodological approach for DMS design. Requirements and design specifications are organized and defined using DMSM modeling entities. DMS deployment is then based on the DMSML, an XML-based declarative language.
In order to provide instrumental support to the proposed method and to facilitate the definition of metadata-based technical specifications from socio-organizational requirements, we have developed a DMSML Framework, described in this paper in its prototype version. The DMSML Framework is an integrated set of tools, which provide intuitive and user-friendly interfaces for the creation of DMSML specifications and the deployment of a Web-based DMS, customized according to those specifications. Thanks to the DMSML Framework features, the proposed method supports the conception and deployment of a document management solution that meets the requirements of a design method based on information models (i.e. the DMSM) and of open standard compliance.

The paper is organized as follows: Section 2 discusses the main benefits of information models and evaluates current approaches for DMS design for commercial as well as for open source solutions. Section 3 describes the main characteristics of the DMSM, grounding the proposed method. Section 4 details the DMSM-driven method for DMS design. Section 5 describes the architecture of the DMSML Framework and Section 6 shows the facilities provided by the DMSML Framework for DMS design, development and deployment. Section 7 discusses the results and provides insights into future work and Section 8 concludes the paper.

2. Background

2.1 Benefits of Information models for Document Management System Design

Information models are abstract and technology-independent representations of managed objects, as defined in literature (Pras & Schoenwaelder, 2003). Information models (IMs) are used in the early stages of the software development cycle for analysis purposes and business requirement elicitation. An information model can be specified in an informal way (e.g. using natural language) or by means of standard formalisms. In the latter case the features of an information system can be represented in a way which enables both human and machine understanding. The advantages of information models based on standard formalisms in the design of complex information systems are universally recognized:

Thanks to these advantages, information models are well recognized and commonly used in the design and development of information systems in several application domains. Some of these domains are closely related to Document Management, such as enterprise modeling, database, hypermedia system and digital library design, to mention a few.

Enterprise Modeling methods include: business and business process modeling methods, such as the Fundamental Business Processing Modeling Language (FBPML) (Chen-Burger et al., 2002) and the Web Information Exchange Diagram (WIED) (Tongrungrojana & Lowe, 2004), organizational modeling (van der Aalst et al., 2003), and capability and enterprise ontologies (Ushold et al., 1998). Database design methodologies are traditionally based on information models. The relational model (Elmasri & Navathe, 2003) was the first formal database model. More recently, models were defined for object-oriented (Elmasri & Navathe, 2003) and semi-structured databases (Graves, 2001). Relevant contributions in the field of hypermedia information system design are: Dexter model (Halasz & Schwartz, 1994), WEBML (Ceri et al., 2000), and Ariadne (Montero et al., 2004). The 5S Formal Framework (Gonçalves et al., 2004) represents one of the most relevant attempts to provide a comprehensive formalization for Digital Libraries design, providing the formal foundation for the definition of a Digital Library (DL) declarative language and a DL generator tool.

Information models, together with metadata and markup languages, are widely recognized as mechanisms enabling high-quality DMS design (Ginsburg, 2001; Salminen et al., 2000; Murphy, 1998). Based on these seminal contributions, other works (Päivärinta, 2001; Karjalainen et al., 2000) provide high-level guidelines and principles for DMS requirement elicitation, but they also highlight the need for a methodology for translating socio-organizational requirements into metadata-based technical specifications. Despite that, the study of model-based methods for the design and development of Document Management Systems is still in its infancy.

Although the above-mentioned information models can provide useful hints and guidelines, ad-hoc conceptual and methodological frameworks should be developed for the organizational document management field. For instance, digital library concepts cannot be easily adapted to the organizational context. As a matter of fact, the author-publisher-reader model, which is typical of digital library information models, cannot be conveniently used to model the information lifecycle inside an organization because it cannot properly express business process requirements and the roles and responsibilities defined in an organizational environment (Murphy, 1998). Hypermedia information system design methods focus on navigation, presentation, structure and behavior issues, which differ from DMS design requirements. As a matter of fact, "in hypermedia applications, information is split into a number of self-contained and unstructured nodes that are connected to related nodes by means of links" (Montero et al., 2004). On the contrary, when dealing with unstructured documents, information is provided by a chunk of content which does not explicitly contain direct links to other information items. Enterprise modeling provides useful instruments to model the organizational context in terms of actors, organizational roles and processes, but documents are usually considered as information resources supporting specific process steps, rather than as "first-class" entities. As a consequence, enterprise models do not aim at supporting traditional DMS features (e.g. document classification, search and retrieval, etc.). Database models deal with a different kind of content (i.e. mostly structured information), but can provide useful guidelines for model-driven design. As a matter of fact, our approach is based on conceptual and logical model-driven design, which derives from widely accepted model-driven database design methodologies.

2.2 Current approaches for document management system design

At present, several existing Document Management Systems are available in the market, both as proprietary and open source solutions. According to Moore and Markham (2002), some of the most important solutions in terms of offered features and market diffusion in the domain of Document Management are: Documentum, FileNet, IBM Lotus Notes, Interwoven, Microsoft SharePoint, and Stellent. Among the open source products, OpenCMS, Apache Lenya, MARIAN, and Xinco deserve to be mentioned (1).
These systems provide a wide range of functionalities supporting the employees in the use of organizational information. An evaluation of DMS products according to some functional and technical requirements has been provided by Hendley (2005). For the purpose of this paper, we will evaluate some of these products according to their compliance with the following requirements for DMS: open information model, standard compliance, model-driven design methodology.

The analysis synthesized in Table 1 refers to two commercial products, FatWire Content Server and Documentum, and two open source products: MARIAN and Xinco.
The analysis of these products highlights that only one product, MARIAN, is based on an information model, the 5S (Streams, Structures, Spaces, Scenarios, Societies) Formal Model (Gonçalves et al., 2004), and that a design methodology based on the 5S model is in progress. The other products provide neither an open and publicly available information model nor a model-driven methodological approach for DMS design and deployment (the publicly available methodology of FatWire seems not to be based on an information model).

Compliance with technical standards is a requirement commonly understood and addressed by means of wide adoption of industrial standards, such as XML and related standards (Sall, 2002), LDAP (Lightweight Directory Access Protocol) (Yeong et al., 1993), SOAP (Simple Object Access Protocol) (Mitra, 2003), Internet protocols, such as HTTP (Hypertext Transfer Protocol) and FTP (File Transfer Protocol) and Java-related specifications. On the other hand, compliance with business standards is partially accomplished. As a matter of fact, while descriptive metadata standards - e.g. Dublin Core (Dublin Core Metadata Initiative, 2003) - are often used in open source solutions, metadata standards for lifecycle and access policy descriptions are scarcely used.

Even if the analysis of commercial products is limited by the lack of documentation about some requirements (especially about the use of an open information model), the overall finding of this analysis is that these products do not completely address the above-mentioned high-level requirements for DMSs. Most commercial systems have monolithic and closed architectures, provide platform-specific solutions and adopt proprietary encoding formats and algorithms (Stickler, 2001). Moreover, both commercial and open solutions rarely adopt standard modeling methodologies (Stickler, 2001; Paganelli et al., 2005). This leads to several disadvantages: poor interoperability among heterogeneous systems, limited portability across platforms, and expensive system deployment, maintenance and extension activities, which are thus often not affordable for small and medium enterprises. Generally, open source solutions deal better with requirements of open standard compliance, but do not completely fulfill the requirements of open information model and model-driven design methodology.

Based on these evaluation results, this paper aims at providing a contribution towards the definition of an information model and model-driven design methodology for DMSs, described in the following Sections.

Table 1: DMSs Evaluation results (n.a.: information not available)

| DMS | Open information model | Standard compliance: technical standards | Standard compliance: business standards | Standard compliance: metadata standards | Model-driven design methodology |
|---|---|---|---|---|---|
| FatWire Content Server | n.a. | yes (LDAP, XML, SOAP and Internet protocols, Java specifications) | n.a. | no | a methodology is available, but it is not based on an information model |
| Documentum | n.a. | yes (LDAP, XML, SOAP and Internet protocols) | n.a. | no | n.a. |
| MARIAN | yes (open data model) | yes (Internet protocols and XML) | no standards for lifecycle and access policy | yes (Dublin Core compliant) | the study of a standard method is in progress, based on the 5S Formal Model |
| Xinco | n.a. | yes (Internet protocols, SOAP and XML) | no standards for lifecycle and access policy | no | no |

3. Document Management and Sharing Model

DMSM is an information model for Document Management Systems, representing the most relevant properties of digital documents in the form of metadata. The aim of DMSM is to provide modeling constructs which facilitate the design of a DMS, meeting the above-mentioned requirements of information model-driven design, standard compliance, uniform access to heterogeneous data sources and cost effectiveness.

Figure 1 shows the most important steps of the process leading to the DMSM specification: the definition of high-level requirements for DMS design, the analysis of relevant properties for document management and the analysis of metadata specification principles. This section describes the features of DMSM which are relevant for the description of the DMS design method. A detailed description of DMSM is outside the scope of this paper. Further details can be found in previous works (Paganelli, 2004; Paganelli et al., 2004).


Figure 1. Schema of the process leading to the Document Management and Sharing information Model specification

In order to define DMSM core properties we analysed organizational digital documents as objects which:

In order to represent these aspects, DMSM consists of three sub-models: a Descriptive Information Model, a Collaboration Model and a Process Model, which respectively allow the representation of descriptive, collaboration- and process-related characteristics of unstructured documents:

The DMSM model uses some existing metadata standards, in order to promote interoperability, to create a framework of Document Management metadata, and to take advantage of existing standard contributions. DMSM uses a part of the Dublin Core metadata set (Dublin Core Metadata Initiative, 2003) in the Descriptive Information Model, the eXtensible Access Control Markup Language (XACML) (OASIS, 2003) in the Collaboration Model and the Petri Net Markup Language (PNML) (Weber & Kindler, 2002) in the Process Model.

3.1 DMSM metadata specification

The DMSML metadata specification includes two levels of modeling abstraction:

In Figure 2 we provide an extract of the DMSM, showing a part of the conceptual representation of the DMSM Information Descriptive Model (Figure 2a) and its logical representation in XML Schema Language (Figure 2b). Figure 2c shows an instance of the DMSM for a project proposal document. The DMSM instance is an XML document which contains DMSM metadata labels and values, describing a specific document, and is valid against the syntactical rules encoded in DMSML. An example of a syntactical rule is that an element "document" should contain an "identifier", a "title", at least one "creator", etc.
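The kind of syntactical rule quoted above can be illustrated with a minimal Python sketch. It is not the DMSML schema itself: the element names come from the rule in the text, while the sample instance, the `RULES` table, the upper bound of one on "identifier" and "title", and the function `check_document` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified instance; element names follow the rule quoted
# in the text, not the actual DMSML schema.
INSTANCE = """
<document>
  <identifier>prop-2005-001</identifier>
  <title>Project proposal</title>
  <creator>F. Paganelli</creator>
  <creator>M. C. Pettenati</creator>
</document>
"""

# (element name, minimum occurrences, maximum occurrences or None for unbounded).
# The upper bounds are an assumption; the text only states what must be present.
RULES = [("identifier", 1, 1), ("title", 1, 1), ("creator", 1, None)]


def check_document(xml_text):
    """Return a list of rule violations for a <document> element."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "document":
        errors.append("root element must be 'document'")
    for name, lowest, highest in RULES:
        count = len(root.findall(name))
        if count < lowest:
            errors.append(f"'{name}' occurs {count} time(s), expected at least {lowest}")
        elif highest is not None and count > highest:
            errors.append(f"'{name}' occurs {count} time(s), expected at most {highest}")
    return errors


if __name__ == "__main__":
    print(check_document(INSTANCE))
```

In the framework described in this paper such rules are encoded in XML Schema and checked by a schema validator; the hand-written check only makes the occurrence constraints explicit.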


Figure 2. Example of the DMSM Information Descriptive Model: a. conceptual model; b. Logical model (XML Schema); c. instance document (XML)

The 2-layered modeling approach facilitates the following steps of DMS design:

Consequently, DMSML can support the design and configuration of a DMS according to the specific requirements of an organization, providing specific methods and mechanisms to exploit the business knowledge owned by end users, and leveraging compliance with standard formalisms and existing metadata specifications. For the sake of clarity, Figure 3 provides a graphical representation of the main DMSML components: Information Descriptive Model, Collaboration Model, and Process Model. The complete specification can be found in a previous work (Paganelli, 2004).


Figure 3. Graphical representation of DMSML main components: Information Descriptive Model, Collaboration Model, and Process Model

4. Method for DMS design and development

This section describes the method for DMS design and development based on the DMSM information model. This DMSM-driven method covers the whole cycle of activities of DMS development. The iterative process includes the following stages, as shown in Figure 4: Preliminary Meeting, Critical Factors Analysis, Specification of a DMSM-based Solution, DMS Design, Development and Deployment, and Testing and Evaluation. Some steps include semi-structured interviews, based on reference questionnaires. In order to propose a generally applicable approach, in this paper we describe the main objectives of the interviews and the suggested profile of the interviewees. As a matter of fact, questions should be tailored to the specific characteristics and critical factors of the target organization, and the questions and their order might consequently need to be modified on the fly. An example of a reference questionnaire is shown in Table 2; other examples can be found in a previous work (Paganelli, 2004).


Figure 4. DMSM Method

4.1 Preliminary Meeting

The first step envisages a meeting with some organization representatives. The aim is to delineate the profile of the organization and the organization's strategy for information management, in order to highlight existing inefficiencies, problems and critical factors. Two kinds of questionnaires are used for this activity.

The first questionnaire (Questionnaire A - Organization Profile) is focused on basic information about the organization's profile, such as generic information describing the organization's business goals, services and/or products offered to the market, typology of customers, partners and competitors, size (e.g. number of employees) and geographical distribution of the company's sites. This questionnaire has to be submitted to at least one person who has a deep knowledge of the company (e.g. an executive or top manager).

The second questionnaire (Questionnaire B - Practices and Applications for Unstructured Document Management in the Organization) aims to delineate the organizational strategy for information management, focusing especially on unstructured documents. The aim is to collect information about information systems in use and existing policies for document management, to understand how these policies are formalized and shared in the target organization (e.g. formalized as written procedures, tacitly shared and based on practice, etc.) and to highlight the critical factors and unresolved issues (e.g. obstacles to a DMS purchase in an organization which does not yet have a DMS). In this case, the interviewees should know which information systems are in use and how end users use them to share and manage documents for organizational purposes (e.g. a representative of the IT staff, and people who supply input to and/or use output of the system).

4.2 Critical Factors Analysis

The critical factors discovered during the first stage should then be analyzed in order to find the causes of possible inefficiencies in DM strategies and/or the factors that should be improved (e.g. bad practices, deficiencies of IT tools, lack of formalized procedures). Based on these considerations, the next step is to plan a corrective intervention. In the context of this work, the intervention is conceived as the definition of an effective solution for unstructured document management. The DMSML model can help in the formalization of a DM strategy which effectively supports the organization's processes.

4.3 DMSM-based Solution Specification

Based on the DMSM model, this stage aims to design a solution for unstructured document management, dealing with the requirements of the target organization. The first step consists of classifying the documents in use in the organization, in collaboration with some organization employees. According to the DMSM model, for each document class (e.g. technical report, project documentation and technical offers), the questions should collect the descriptive information and the collaboration- and process-related properties that are relevant for document management.

An example of a generally-applicable questionnaire form is provided in Table 2. The collected information should then be used in order to define the DMS specifications, organized in a Descriptive Information Model, Collaboration Model and Process Model and encoded in the DMSML syntax. Based on the collected information, the need to extend/modify the DMSML labels should then be evaluated. For instance, we can imagine that a technical offer or the technical specifications for a project should be labeled with the name of the project they refer to. In that case, the model should be extended by adding a "project" label, to further characterize and easily retrieve documents which are related on a project affiliation basis. The use of XML Schema as the encoding language facilitates the extension of the information model and the use of external metadata schemas, by means of standard mechanisms, such as xs:any, xs:import, and xs:include (Sall, 2002).
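The "project" label extension described above can be sketched at the instance level as follows. This is only an illustration in Python: in DMSML the extension would be declared in the XML Schema (for example through the mechanisms cited above), whereas here the label is simply added to in-memory metadata records. The document titles and the helper names `add_label` and `find_by_label` are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_label(document, label, value):
    """Attach an extra metadata label (e.g. 'project') to a document record."""
    child = ET.SubElement(document, label)
    child.text = value
    return child


def find_by_label(documents, label, value):
    """Return the documents carrying the given label with the given value."""
    return [d for d in documents
            if any(e.text == value for e in d.findall(label))]


# Hypothetical metadata records for three documents.
offer = ET.fromstring("<document><title>Technical offer</title></document>")
spec = ET.fromstring("<document><title>Technical specifications</title></document>")
report = ET.fromstring("<document><title>Annual report</title></document>")

# Extend the two project-related records with the new label.
add_label(offer, "project", "ProjectA")
add_label(spec, "project", "ProjectA")

# Retrieve documents related by project affiliation.
related = find_by_label([offer, spec, report], "project", "ProjectA")
titles = [d.findtext("title") for d in related]
```

Here `titles` contains only the two records labeled with "ProjectA", which is the retrieval benefit the extension is meant to provide.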

Table 2. Questionnaire C - Document Properties

QUESTIONNAIRE C - Document Class Properties

a. Description

a.1 Please briefly describe the document (name, purpose, related project/organizational process, etc.)

a.2 How can this document be classified (meeting minutes, mail, report, etc.)?

a.3 How is it identified (sequential number, code, date)?

b. Collaboration

b.1 What is the access policy for this document?

b.2 How is the access policy specified and interpreted by the system?

c. Process

c.1 Is there a predefined procedure for the management of this document (e.g. guidelines, protocols, etc.)?

c.2 Is a template available?

c.3 Describe the steps of its lifecycle

d. Management

d.1 How do you usually search for this document (e.g. by title, author, keywords, project name, etc.)?

d.2 Does the document refer to other document typologies?

d.3 If it does, how? (e.g. annotations, bibliographic references, URLs, etc.)

d.4 How is versioning managed?

e. IT support

e.1 Which features are provided by the DMS for the management of this document?

  • Notification
  • Access control
  • Versioning
  • Others

e.2 Which should be provided?

f. Personal Experiences

f.1 According to your experience, what are the current problems in the management of this document type?

f.2 Would you suggest a new procedure, new features or a new solution for DM?

4.4 DMS Design, Development and Deployment

This step is focused on the design, development and deployment of the DMS. The DMSML specifications provide the formal foundation for DMS design and development. Thanks to the XML syntax, the DMSML-based specifications can be interpreted by a CASE tool for the automatic generation of DMS code. These specifications (e.g. access policies) can also be automatically enforced by the DMS during its operation.

In order to facilitate the DMSML-based design and the automation of the development and deployment stages, we developed a set of tools and applications, named DMSML Framework. Further information about the DMSML Framework is provided in Sections 5 and 6.

It is worth observing that this method aims to be general and technology-independent, and it could benefit from different CASE and fast prototyping tools, other than those provided by the DMSML Framework.

4.5 Testing and evaluation

A selected group of organization employees (a group of users) should then test the DMS during their working activities. This step aims to evaluate the capability of a DMSML-based Document Management solution to address the critical factors discovered and analysed in the first two steps of the method, as well as the level of usability of the DMSML Framework Prototype. This investigation in the organization is supported by two kinds of questionnaires:

5. DMSML Framework

The DMSML Framework is an integrated set of software tools which provide the user with automated support for DMS design, deployment and maintenance, according to the specifications encoded in the DMSML declarative language.

The DMSML Framework consists of three parts, as shown in Figure 5:


Figure 5. DMSML Framework Prototype: Functional Architecture

5.1 DMS Configurator

The DMS Configurator is a Java application. Its architecture consists of an Interface, which uses the JavaSwing Graphic Toolkit and other Graphic Utilities (e.g. images, etc.) and the DMS Configurator Core, built on top of the Java Virtual Machine (Figure 6.a). The DMS Configurator Core is composed of five main components:


Figure 6. DMSML Framework Prototype three-tiered Architecture: a. DMS Configurator, b. DMS Generator, c. DMS Web Application Architecture

5.2 DMS Generator

Both the DMS Generator and the DMS Web Application are web applications designed according to J2EE (Java 2 Enterprise Edition) specifications. Both are characterized by a multi-tier architecture, consisting of a Client, an Application Logic (composed of an Interaction and a Business Logic side), and a Data tier (Figure 6b).

The Client is a standard web browser. The Interaction side is realized by means of JSPs. The Business Logic contains a template of a DMS Web Application (i.e. a set of DM libraries) and a set of APIs, called DMSG (DMS Generator) APIs. The DMSG APIs are a set of Java classes which customize the template according to specific configuration parameters, encoded in the DMSML language. Based on the features of the DMSML model, the DMS Generator allows a completely declarative approach for the design and deployment of a Document Management System for a target organization.

5.3 DMS Web Application

Analogously to the DMS Generator, the DMS Web Application has a multi-tier architecture, based on J2EE specifications, as shown in Figure 6c.

The client side is a standard Web browser. The Interaction part is realized by means of JSPs and it provides the user with core Document Management features. The Business Logic is composed of a set of DMS APIs, implemented by Java classes, which provide basic functions for the management of workspaces, folders and documents. The DMS APIs consist of several components:

6. Designing and deploying a DMS using the DMSML Framework prototype

The DMSML Framework Prototype offers support to the DMS designer during the steps of DMSML-based Solution Specification and DMS Design, Development and Deployment.

6.1 DMSML-based Specification

The DMS Configurator provides the DMS designer with a sequential set of graphical windows, which progressively guide the user in the DMS configuration, through the definition of the workspace, the organizational schema and the folder structure. The DMS Configurator lets the designer specify the workspace entity, characterizing the information items in terms of Descriptive Information Model, Collaboration Model and Process Model.

First, the interface enables the user to specify the workspace organization in folders and sub-folders. For instance, in the case of project documentation management, the designer can distinguish the following folders, each related to a project execution phase: Analysis, Specification, Development, Accounting. The graphical window depicted in Figure 7.a helps the user in specifying the folder organization, according to the DMSML Information Descriptive Model. Figure 7.b is an excerpt of a DMSML instance document representing the folders' organization (e.g. folder "ProjectA" and subfolders "Analysis", "Specification", "Development", "Accounting"), automatically encoded by the DMS Configurator in the DMSML syntax. The user can specify some properties for each folder: for instance "title", "creator", "affiliation", and "document types" that can be assigned to that folder. The system provides some default document types (e.g. technical report, brochure, etc.), but it also enables the user to insert ad-hoc labels. Analogously to the previous example, Figure 8 shows the graphical window for the specification of folder properties (Figure 8.a) and the resulting DMSML document instance (Figure 8.b).
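The encoding step performed by the DMS Configurator for the folder organization can be approximated with the following Python sketch. It is not the Configurator's code (the tool is a Java application). The element names and the DMSML namespace are taken from the excerpt in Figure 7.b; binding the "dc" prefix to the standard Dublin Core element namespace is an assumption, since the excerpt does not declare it, and the helpers `build_folder` and `folder_titles` are hypothetical.

```python
import xml.etree.ElementTree as ET

DMSML_NS = "http://det.unifi.it/dmsml"       # namespace shown in Figure 7.b
DC_NS = "http://purl.org/dc/elements/1.1/"   # assumed binding of the "dc" prefix


def build_folder(parent, title, subfolders=()):
    """Append a <folder> with a Dublin Core title, plus optional subfolders."""
    folder = ET.SubElement(parent, f"{{{DMSML_NS}}}folder")
    description = ET.SubElement(folder, f"{{{DMSML_NS}}}itemDescription")
    ET.SubElement(description, f"{{{DC_NS}}}title").text = title
    for name in subfolders:
        build_folder(folder, name)
    return folder


def folder_titles(workspace):
    """List folder titles in document order (parent before its subfolders)."""
    return [t.text for t in workspace.iter(f"{{{DC_NS}}}title")]


workspace = ET.Element(f"{{{DMSML_NS}}}workspace")
build_folder(workspace, "ProjectA",
             ["Analysis", "Specification", "Development", "Accounting"])

if __name__ == "__main__":
    ET.register_namespace("dc", DC_NS)
    print(ET.tostring(workspace, encoding="unicode"))
```

The recursive helper mirrors the nesting of `folder` elements in the excerpt: each project phase becomes a child folder of "ProjectA".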

The system provides graphical support for the definition of lifecycle models. Figure 9.a shows the graphical representation of the lifecycle template for documents which should be evaluated by a group of reviewers and consequently accepted or rejected. The document lifecycle is a process specified in terms of a sequence of tasks. The execution of a task is usually triggered by a transition condition, which can be automatic, time-dependent (e.g. a deadline) or caused by a user action or by an external event, and it is associated with an evolution of the document state (e.g. from "draft" to "in_review", to "accepted", or "refused"). In Figure 9.a circles represent the states of documents (or "places" in the Petri Net language) and rectangles represent the transitions from one state to another. The lifecycle of the document is built upon the concatenation of these states and transitions. Figure 9.b shows an excerpt of the DMSML representation of this lifecycle template.
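The review lifecycle described above can be read as a small state machine, sketched here in Python. The states are those named in the text; the transition names ("submit", "accept", "reject"), the class `Lifecycle` and the table `REVIEW_TEMPLATE` are assumptions for illustration and are not taken from the DMSML Process Model or from PNML. The sketch also simplifies a Petri net to a single marked place per document.

```python
class Lifecycle:
    """A document lifecycle: named transitions between document states."""

    def __init__(self, transitions, initial):
        # transitions maps a transition name to (source state, target state)
        self.transitions = transitions
        self.state = initial

    def enabled(self):
        """Names of the transitions that can fire from the current state."""
        return sorted(name for name, (source, _) in self.transitions.items()
                      if source == self.state)

    def fire(self, name):
        """Fire a transition, moving the document to its target state."""
        source, target = self.transitions[name]
        if source != self.state:
            raise ValueError(f"transition '{name}' is not enabled in state '{self.state}'")
        self.state = target
        return self.state


# Hypothetical template for documents evaluated by reviewers.
REVIEW_TEMPLATE = {
    "submit": ("draft", "in_review"),
    "accept": ("in_review", "accepted"),
    "reject": ("in_review", "refused"),
}

if __name__ == "__main__":
    doc = Lifecycle(REVIEW_TEMPLATE, "draft")
    doc.fire("submit")
    print(doc.state, doc.enabled())
```

Keeping the template as plain data separate from the `Lifecycle` class reflects the idea that one template can be assigned to many documents.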

These lifecycle models serve as a collection of templates which can then be assigned to documents in order to accordingly enforce their evolution during their "life". At design time, the user can assign a lifecycle template to the document types previously defined. In order to accommodate a certain level of flexibility, this pre-assignment can be modified by document creators by means of a proper interface offered by the DMS.

Finally, the designer can specify the access control policies which regulate access to the information items on the basis of the roles and responsibilities defined in the organization, as illustrated in Figure 10.a. The DMS Configurator automatically generates the DMSML instance document (Figure 10.b) and checks the validity of the specification against the DMSML rules.
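To make the idea of role-based access policies concrete, the following sketch evaluates a request against a small rule set. It is a deliberate simplification rather than an XACML engine: the roles, the rule set and the is_permitted helper are hypothetical, and the combining behaviour (deny overrides permit, with denial as the default) is an assumption chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A role-based rule: the effect of `action` for subjects holding `role`."""
    role: str
    action: str
    effect: str  # "Permit" or "Deny"


# Hypothetical rule set; only the "accept" action appears in Figure 10.b.
RULES = [
    Rule("reviewer", "accept", "Permit"),
    Rule("reviewer", "read", "Permit"),
    Rule("author", "read", "Permit"),
    Rule("author", "submit", "Permit"),
]


def is_permitted(rules, roles, action):
    """Return True if some rule permits the action and no matching rule denies it."""
    effects = {
        rule.effect
        for rule in rules
        if rule.role in roles and rule.action == action
    }
    return "Permit" in effects and "Deny" not in effects


print(is_permitted(RULES, {"reviewer"}, "accept"))
print(is_permitted(RULES, {"author"}, "accept"))
```

Defaulting to denial when no rule matches keeps the sketch conservative: an action that the designer has not explicitly granted to a role is refused.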


<workspace xmlns="http://det.unifi.it/dmsml">
  <folder>
    <itemDescription>
      <dc:title>ProjectA</dc:title>
      ...
    </itemDescription>
    <folder>
      <itemDescription>
        <dc:title>Analysis</dc:title>
      </itemDescription>
    </folder>
    <folder>
      <itemDescription>
        <dc:title>Specification</dc:title>
      </itemDescription>
    </folder>
    ..... (other folders)
  </folder>
</workspace>
7.a DMS Configurator interface for folders' organization specification
7.b DMSML instance document excerpt for folders' organization specification (DMSML Descriptive Information Model)

Figure 7. DMS specification: organization in folders and subfolders


<folder>
  <itemDescription>
    <dc:title>Specification</dc:title>
    <dc:creator>F. Paganelli</dc:creator>*
    <dc:description>documentation about ProjectA specification step</dc:description>
    <dc:date>2005-09-24</dc:date>*
    <affiliation>http://det.unifi.it</affiliation>*
    <contactInfo>
      <contactName>F. Paganelli</contactName>*
      <address>via S. Marta 3, Firenze</address>*
      <telephoneNumber>+39 055 4796382</telephoneNumber>*
      <faxNumber>+39 055 488883</faxNumber>*
      <e-mail>federica.paganelli@unifi.it</e-mail>*
      <url>http://radar.det.unifi.it/people/Paganelli/index.html</url>*
    </contactInfo>
    <documentTypes>
      <documentType>technical reports</documentType>
      <documentType>specifications</documentType>
      <documentType>brochures</documentType>
    </documentTypes>
  </itemDescription>
</folder>
* these metadata are automatically filled by the system
8.a DMS Configurator interface for folders' properties specification
8.b DMSML instance document excerpt for folders' properties specification (DMSML Descriptive Information Model)

Figure 8. DMS specification: folders' characteristics definition


<lifecycle>
  <name>lifecycleTemplate</name>
  <description>lifecycle of document subjected to review</description>
  <tasks>
    <task>
      <name>submission</name>
      <description>document submission</description>
      <transition type="userAction">
        <name>submit</name>
      </transition>
      <inputState>
        <inputStateName>draft</inputStateName>
      </inputState>
      <outputState>
        <outputStateName>in_review</outputStateName>
      </outputState>
      <automaticActions>
        <notification>
          <description>reviewers are notified that a new document has been submitted</description>
          <receivers>
            <receiverID>reviewer1</receiverID>
            <receiverID>reviewer2</receiverID>
          </receivers>
          <message>"A new document has been submitted for review"</message>
        </notification>
      </automaticActions>
    </task>
    (other tasks)
  </tasks>
</lifecycle>
9.a DMS Configurator interface for lifecycle templates specification
9.b DMSML instance document excerpt for lifecycle templates specification (DMSML Process Model)

Figure 9. DMS specification: lifecycle templates

<xacml:Policy PolicyId="document_revisionPolicy">
  <xacml:Description>Access Policy for the action: "accept document"</xacml:Description>
  <xacml:Target>
    <xacml:Subjects>
      <xacml:AnySubject/>
    </xacml:Subjects>
    <xacml:Resources>
      <xacml:AnyResource/>
    </xacml:Resources>
    <xacml:Actions>
      <xacml:Action>
        <xacml:ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
          <xacml:AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">accept</xacml:AttributeValue>
          <xacml:ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:1.0:action:action-id" DataType="http://www.w3.org/2001/XMLSchema#string"/>
        </xacml:ActionMatch>
      </xacml:Action>
    </xacml:Actions>
  </xacml:Target>
  <xacml:Rule RuleId="document_revision_Rule" Effect="Permit"> ... </xacml:Rule>
  (other rules)
</xacml:Policy>

10.a DMS Configurator interface for access policies' specification

10.b DMSML instance document excerpt for access policies' specification (DMSML Collaboration Model)

Figure 10. DMS specification: access policies


6.2 Design, Development and Deployment of the Document Management System

The DMSML specification is processed by the DMS Generator in order to customize the DMS template according to the organization's specific requirements (Figure 11). The DMS Generator web interface enables the user to upload the DMSML specification, called the Business Configuration Document, together with the technical parameters (e.g. database connections, IP addresses, etc.) encoded in an XML document, named the Technical Configuration Document. Figure 12 shows an excerpt of a Technical Configuration Document specifying the parameters for a connection to an SQL database.

The DMS Web Application offers an intuitive interface with basic Document Management functionalities. The browsing and metadata-based search interfaces are shown in Figure 13 and Figure 14, respectively.

Figure 11. DMS Generator graphical interface

<system xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="technology.xsd">
  <sql>
    <sqldriver>com.microsoft.jdbc.sqlserver.SQLServerDriver</sqldriver>
    <connection>jdbc:microsoft:sqlserver://localhost:3106</connection>
    <user>admin</user>
    <password>dmsml</password>
  </sql>
  <display-name>DMSWebApp</display-name>
  <context-root>dmswebapp</context-root>
  (other parameters)
</system>

Figure 12. Technical Configuration document excerpt
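A tool consuming the Technical Configuration Document could extract the connection parameters along the following lines. This is a minimal sketch, not the DMS Generator's code: the XML is the Figure 12 excerpt with the elided parameters left out, and the read_technical_config function and the keys of the returned dictionary are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

# Excerpt from Figure 12; the elided "(other parameters)" are simply omitted.
TECHNICAL_CONFIG = """\
<system xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="technology.xsd">
  <sql>
    <sqldriver>com.microsoft.jdbc.sqlserver.SQLServerDriver</sqldriver>
    <connection>jdbc:microsoft:sqlserver://localhost:3106</connection>
    <user>admin</user>
    <password>dmsml</password>
  </sql>
  <display-name>DMSWebApp</display-name>
  <context-root>dmswebapp</context-root>
</system>
"""


def read_technical_config(xml_text):
    """Extract the database and deployment parameters from the configuration."""
    root = ET.fromstring(xml_text)
    sql = root.find("sql")
    if sql is None:
        raise ValueError("missing <sql> element")
    return {
        "driver": sql.findtext("sqldriver", default="").strip(),
        "url": sql.findtext("connection", default="").strip(),
        "user": sql.findtext("user", default="").strip(),
        "display_name": root.findtext("display-name", default="").strip(),
        "context_root": root.findtext("context-root", default="").strip(),
    }


config = read_technical_config(TECHNICAL_CONFIG)
print(config["driver"], config["context_root"])
```

The password is intentionally not copied into the returned dictionary; a real deployment tool would have to decide how such credentials are stored and passed on.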

Figure 13. DMS Web Application: browsing interface

Figure 14. DMS Web Application: search interface

7. Discussion

The DMS Web Application aims to cover the above-mentioned requirements for DMSs: Design methodology based on information models, Standard compliance, Uniform access to heterogeneous formats, and Cost effectiveness.

To this end, we have proposed a DMS design method which makes extensive use of the Document Management and Sharing information Model throughout the steps of preliminary analysis, critical factor analysis, design, development and deployment, and testing and evaluation of a DMS in an organization.

The DMSM is a metadata specification which encompasses descriptive, as well as collaborative and process-dependent properties of organizational documents. The DMSM provides a formal, lower-level (structural) description of an information model for DMSs and supports the conception of a completely declarative approach for DMS design and automatic deployment.

The XML serialization of the model (DMSML) is a declarative language which allows the mapping of organizational requirements into machine-understandable technical DMS specifications. As a matter of fact, a DMSML instance contains XML tags enabling the description of the workspace configuration and folder organization, the creation or reuse of a document resource classification schema, the specification of the lifecycle and the access policies assigned to documents either separately or on a document type basis.

First, this work addresses the need for standard methodological approaches to DMS design by proposing a generally applicable and technology-independent method based on the DMSM information model. While the specifications in most available products are embedded in proprietary workflow engines or collaborative applications, DMSML is a declarative language based on an open and standard-compliant data model.

Moreover, the DMSML Framework Prototype provides automation support for the design method, reducing the need for technical expertise in DMS configuration (the DMS designer is not concerned with the DMSML syntax) and deployment (he/she only has to upload two XML documents and the system automatically deploys a customized DMS).

Secondly, standard compliance has been achieved in two ways: the DMSML language integrates three existing metadata standards (Dublin Core, XACML and PNML), and the DMSML Framework is based on standard Web development specifications (i.e. J2EE), and standard languages and technologies, such as XML and XSLT (Sall, 2002).

The other requirements (i.e. Uniform access to heterogeneous data sources and Cost effectiveness) have been partially addressed.
As a matter of fact, the use of web standards and protocols allows access to information stored in heterogeneous locations, but does not effectively support information retrieval, indexing and processing across heterogeneous repositories. The client side is implemented by standard Web browsers, thus providing users with a well-known and uniform paradigm of access, search and retrieval to documents available in heterogeneous formats and stored in heterogeneous locations.
Cost effectiveness is promoted by several factors. The DMS Web Application, like the whole DMSML Framework, is based on open source technologies. Furthermore, the instrumental support provided by the DMS Configurator and the DMS Generator speeds up the design, development and deployment of the DMS solution and hides some technical complexities (such as XML syntax). Because of these cost savings, the DMS Web Application is a candidate Document Management solution that is also suitable for addressing SMEs' requirements, but this hypothesis needs to be carefully validated in target organizations.

These issues are going to be addressed in on-going and future activities. Firstly, we are experimenting with the proposed methodology and the use of the DMSML Framework for the management of scientific documentation (papers, theses, project documentation, etc.) in our Department. We have also planned an evaluation activity in a small enterprise. The selected SME is an Italian consulting firm which provides IT services and products to a wide range of customer enterprises. Consulting firms are highly data-intensive companies, since they depend heavily on the expertise of their people and the documented information produced during their business activities.

Better management of heterogeneous and distributed content repositories could be achieved by adopting metadata harvesting protocols, which gather metadata about content for resource discovery across heterogeneous repositories. One of the most important harvesting protocols is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (Lagoze & Van de Sompel, 2002), which is widely adopted for digital libraries and cultural heritage information systems.
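As a rough illustration of what metadata harvesting involves, the sketch below builds an OAI-PMH ListRecords request and reads Dublin Core titles from a response. It rests on several assumptions: the repository base URL is a placeholder, the sample response is a heavily trimmed hand-written document rather than real repository output, and the function names are hypothetical. Only the "ListRecords" verb, the "metadataPrefix" and "from" arguments, the "oai_dc" format and the namespace URIs come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL, optionally restricted to recent records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def harvested_titles(response_xml):
    """Return the Dublin Core titles found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [title.text for title in root.iter(f"{{{DC_NS}}}title")]


# Trimmed, hand-written response used only to exercise the parser.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>ProjectA specification</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# Placeholder base URL; a real harvester would point at an actual repository.
url = list_records_url("http://repository.example.org/oai", from_date="2005-09-24")
print(url)
print(harvested_titles(SAMPLE_RESPONSE))
```

Because the harvested records are expressed in Dublin Core, which DMSML already integrates, metadata gathered in this way could in principle be aligned with the descriptive part of the model.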

A higher degree of effective and uniform access to heterogeneous and distributed data sources would require a system to better deal with interoperability requirements. The long-term objective would be that of "enabling a machine to read documents of varying degrees of structures from heterogeneous data sources and understand the meaning of each document in order to find associations among those documents" (Fisher & Sheth, 2004). Achieving that kind of interoperability is obviously not a trivial task. Ontology-driven metadata extraction and annotation mechanisms can be used to support advanced classification techniques and to provide metadata with contextual relevance within a given domain. These techniques could provide a normalized "semantic" view of heterogeneous data, providing a certain degree of machine understanding and processing, across syntactical and structural differences of information sources.

As regards cost and feasibility evaluation, the DMSML Framework has been developed in an academic setting as a prototype. Consequently, usability tests and a user-centered re-design of the existing prototype interface should be performed, together with a market analysis and a business plan, in order to promote the transfer of this research into industrial application. At present, we are evaluating the possibility of creating an open source project based on the DMSML Framework, in order to benefit from the advantages of cooperative software development.

8. Conclusions

This paper described a DMS design method based on the DMSM information model. DMSM is a metadata specification which encompasses descriptive, collaborative and process-related properties of organizational documents. The method encompasses the stages of Preliminary Meeting, Critical Factors Analysis, Specification of a DMSM-based Solution, DMS Design, Development and Deployment, and Testing and Evaluation. The DMSML (i.e. the XML serialization of DMSM) enables a declarative design approach and the DMSM Framework Prototype (i.e. a set of tools for DMSML editing and DMS generation) facilitates automatic development and deployment of a DMS for a target organization.

This model-driven method satisfies two basic requirements for DMS design: a design methodology based on information models, and standard compliance. We also described future research activities aimed at evaluating the method in an SME and addressing the requirements of uniform access to heterogeneous formats and cost effectiveness.

References

Booch, G., Jacobson, I., & Rumbaugh, J. (1998). Unified Modeling Language User's Guide (Boston: Addison-Wesley)

Ceri, S., Fraternali, P., & Bongio, A. (2000) "Web modeling language (WebML): a modeling language for designing web sites". Computer Networks, Vol. 33 (1-6), 137-157

Chen-Burger, Y. H., Tate, A., & Robertson, D. (2002) "Enterprise Modelling: A Declarative Approach for FBPML". In Proceedings European Conference of Artificial Intelligence, Knowledge Management and Organisational Memories Workshop

Dublin Core Metadata Initiative (2003) Dublin Core Metadata Element Set, version 1.1: Reference description http://www.dublincore.org

Elmasri, R., & Navathe, S.B. (2003) Fundamentals of Database Systems (Addison Wesley)

Fisher, M., & Sheth, A. (2004) "Semantic Enterprise Content Management". Practical Handbook of Internet Computing, edited by Munindar P. Singh (Baton Rouge: Chapman Hall & CRC Press)

Ginsburg, M. (2001) "Openness: The Key To Effective Intranet Document Management". In Proceedings of International Symposium on Information Systems and Engineering ISE'2001, Las Vegas, USA

Gonçalves, M. A., Fox, E. A., Watson, L. T., & Kipp, N. A. (2004) "Streams, structures, spaces, scenarios, societies (5s): A formal model for digital libraries". ACM Transactions on Information Systems, Vol. 22 No. 2, 270-312

Graves, M. (2001) Designing XML Databases (Prentice Hall)

Halasz, F., & Schwartz, M. (1994) "The Dexter Hypertext Reference Model". Communications of the ACM, Vol. 37(2)

Hendley, T. (2005) Managing Information and Documents. The definitive guide Cimtech Ltd http://www.doconsite.co.uk/

Karjalainen, A., Päivärinta, T., Tyrväinen, P., & Rajala, J. (2000) "Genre-based metadata for enterprise document management". In Proceedings of the 33rd Hawaii International Conference on System Sciences HICSS (Los Alamitos CA: IEEE Computer Society), pp. 3013-3023

Lagoze, C., & Van de Sompel, H. (2002) The Open Archives Initiative Protocol for Metadata Harvesting. Open Archives Initiative http://www.openarchives.org/OAI/openarchivesprotocol.html

Mitra, N. (2003). SOAP Version 1.2 Part 0: Primer. W3C Recommendation. Retrieved October 2003 from http://www.w3.org/TR/soap12-part0/

Montero, S., Díaz, P., Dodero, J. M., & Aedo, I. (2004) "AriadneTool: A Design Toolkit for Hypermedia Applications". Journal of Digital Information, Vol. 5 No. 2 Article No. 280 http://jodi.ecs.soton.ac.uk/Articles/v05/i02/Montero/

Moore, C., & Markham, R. (2002) "Enterprise Content Management: A Comprehensive Approach for Managing Unstructured Content". Giga Information Group, Inc. http://www.msiinet.com/html/pdfs/essecm3.pdf

Murphy, L.D. (1998) "Digital document metadata in organizations: Roles, analytical approaches, and future research directions". In Proceedings of the 31st Hawaii International Conference on System Sciences: Digital Documents (Los Alamitos CA: IEEE Computer Society), pp. 267-276

OASIS (2003) Extensible Access Control Markup Language (XACML), V. 1.0 http://www.oasis-open.org

Paganelli, F. (2004) A Metadata Model for Unstructured Document Management in Organizations. Phd dissertation, Department of Electronics and Telecommunications, University of Florence, Italy

Paganelli, F., Abou Khaled, O., Pettenati, M.C.P., & Giuli, D. (2004) "A Metadata Model for the Design and Deployment of Document Management Systems". In Proceedings of ICWE 2004 (Springer Verlag), pp. 589-590

Paganelli, F., Pettenati, M.C.P., & Giuli, D. (2005) "A Metadata-based Approach for Unstructured Document Management in Organizations". To be published in Information Resource Management Journal (IDEAGroup)

Pras, A., & Schoenwaelder, J. (2003) RFC 3444 - On the Difference between Information Models and Data Models. The Internet Engineering Task Force http://www.faqs.org/rfcs/rfc3444.html

Päivärinta, T. (2001) A Genre-Based Approach to Developing Electronic Document Management in the Organization. PhD thesis dissertation, University of Jyvaskyla, Finland

Sall, K. B. (2002) XML Family of Specifications: A Practical Guide (Boston: Addison-Wesley Professional)

Salminen, A., Lyytikäinen, V., & Tiitinen, P. (2000) "Putting documents into their work context in document analysis". Information Processing and Management, 36(4), 623-641

Sprague, R. H. Jr. (1995) "Electronic document management: Challenges and opportunities for information systems managers". MIS Quarterly, 19(1), 29-50 http://www.cba.hawaii.edu/sprague/MISQ/MISQfina.htm.

Stickler, P. (2001) "Metia-a generalized metadata driven framework for the management and distribution of electronic media". In Proceedings of Dublin Core Conference 2001, pp. 235-241

Tongrungrojana, R., & Lowe, D. (2004) "WIED: A Web Modelling Language for Modelling Architectural-Level Information Flows". Journal of Digital Information, Vol. 5 No. 2 Article No. 283 http://jodi.tamu.edu/Articles/v05/i02/Tongrungrojana/

the451 (2002) "Unstructured Data Management: the elephant in the corner", last updated 2003 http://www.the451.com

Uschold, M., King, M., Moralee, S., & Zorgios, Y. (1998) "Enterprise ontology". The Knowledge Engineering Review: Special Issue on Putting Ontologies to Use, Vol. 13(1), 31-89

van der Aalst, W.M.P. (1998) "The Application of Petri Nets to Workflow Management". J. of Circuits, Systems, and Computers. Vol. 8(1), 21-66

van der Aalst, W.M.P., Kumar, A., & Verbeek, H.M.W. (2003) "Organizational Modeling in UML and XML in the context of Workflow Systems". In Proceedings of the 18th Annual ACM Symposium on Applied Computing (SAC 2003) edited by H. Haddad and G. Papadopoulos

Weber, M., & Kindler, E. (2002) "The petri net markup language". Advances in Petri Nets, LNCS series (Springer Verlag)

Yeong, W., Howes, T., & Kille, S. (1993). X.500 lightweight directory access protocol. IETF RFC 1487. Retrieved October 15, 2002, from http://www.ietf.org/rfc/rfc1487.txt

Links

Documentum http://www.documentum.com

FileNet http://www.filenet.com

IBM Lotus Notes http://www.ibm.com

Interwoven http://www.interwoven.com

Microsoft SharePoint http://www.microsoft.com/sharepoint/default.mspx

Stellent http://www.stellent.com

OpenCMS http://www.opencms.org

Apache Lenya http://lenya.apache.org

DSpace http://www.dspace.org

MARIAN http://www.dlib.vt.edu/products/marian.html

Xinco http://www.xinco.org/

Sun's XACML Implementation http://sourceforge.net/projects/sunxacml

Notes

1. More precisely, MARIAN is a digital library system (DLS). It is taken into account in the context of this work because it is a good example of a system based on an information model (the 5S formal model), and because DLSs and DMSs have features in common.