1. Introduction
Document Management (DM) is a critical issue for every kind of organization, where a lot of effort is spent in properly creating, distributing and managing documents. While just some organizational information is stored in relational databases, a relevant percentage is available in unstructured digital formats (the451, 2002). Documents available in unstructured formats (also called throughout this paper unstructured documents) are commonly-used text and multimedia documents. In a typical company, reports, contracts and agreements are available as word-processor documents, marketing presentations as slideshows, technical seminars as a/v files and streaming media, and product description as images and CAD files. The characteristics of unstructured documents pose several challenges for their effective management. This situation is due to several factors (Paganelli, 2004; Fisher & Sheth, 2004):
- Characteristics of unstructured formats. Unstructured formats do not provide an explicit, formal and separate representation of content structure (i.e. "logical structure") and presentation instructions (i.e. "physical structure"). In unstructured formats, logical and physical structures are generally blended and cannot be processed separately, and applications do not have explicit references to specific content elements. Consequently, software applications have limited capabilities regarding content processing and rendering functionalities (e.g. searching and indexing). Currently effective indexing, retrieval and processing would require the system to be able to access document content with a degree of granularity that cannot be provided by unstructured document formats.
- Wide adoption of proprietary formats. Most commonly-used formats are proprietary and binary. Information about logical and physical structure, when available, cannot be easily interpreted and processed by heterogeneous applications. As a consequence, the reuse of information across heterogeneous communities, using heterogeneous applications for document creation and sharing, can be extremely cumbersome.
- Heterogeneity of formats. Data are stored in heterogeneous formats which differ structurally and syntactically. This situation leads to an inherent inefficiency in digital data management. For instance, while a human perceives reading an e-mail, a word-processor document or a web page in the same manner, a machine processes different material in regard to structure, syntax and internal representation (Fisher & Sheth, 2004).
- Distribution of data sources. Organizational information is distributed among various physical locations on a network (e.g. mail servers, HTTP servers, PCs, etc.). Access to distributed data sources requires different protocols. Moreover, users may need specific interfaces to search for and access heterogeneous and distributed data sources (e.g. mail client, web browser, etc.). This non-uniform access can disorient users and compromise the quality of their work.
Any system intended to access and manage unstructured documents effectively should deal with these critical aspects.
1.1 High-level requirements for Document Management Systems design
A Document Management System (DMS) is "the ensemble of applications which enable the automatic execution of storage, organization, transmission, retrieval, manipulation, update and eventual disposition of documents to fulfill an organizational purpose" (Sprague, 1995, p. 32).
In order to deal with the above-mentioned issues related to unstructured document management, DMS design should conveniently match the following non-functional requirements:
- Design method based on information models. The adoption of a method based on information models and standard formalisms should be considered as a basic requirement for the design and development of a high-quality document management solution (Ginsburg, 2001; Salminen et al., 2000; Murphy, 1998). Benefits of information models for information system design are discussed in Section 2.1.
- Standard compliance. Compliance with international and widely adopted standards promotes interoperability among heterogeneous information systems, and facilitates heterogeneous data sources management.
- Uniform access to heterogeneous data sources. The interface of a DMS should provide users with a uniform access paradigm for searching and browsing distributed information available as documents in heterogeneous formats and stored in heterogeneous locations.
- Cost effectiveness. Implementation of a DMS in an organization has a relatively high cost, in terms of user license, required storage space and maintenance costs. On the other hand, efficient solutions for document management can introduce cost savings (e.g. reducing storage costs and time spent retrieving critical documents).
1.2 Our contribution
Moving from these considerations, this paper proposes a method for the design and deployment of Document Management Systems in organizations. The method is founded on an XML-based information model, the Document Management and Sharing Model (DMSM), fully described in previous works (Paganelli, 2004; Paganelli et al., 2005). The DMSM aims to represent, in the form of digital metadata, the formal characteristics and properties of documents which are relevant to document management, and to render business and organizational information explicit in a way which promotes information reuse, user-driven extensibility and interoperability with heterogeneous systems. The serialization of the DMSM in XML, named Document Management and Sharing Markup Language (DMSML) (Paganelli, 2004), provides a declarative language supporting the design, deployment and operation of a DMS.
The proposed method aims at defining general guidelines and a standard methodological approach for DMS design. Requirements and design specifications are organized and defined using DMSM modeling entities. DMS deployment is then based on the DMSML, an XML-based declarative language.
In order to provide instrumental support to the proposed method and to facilitate the definition of metadata-based technical specifications from socio-organizational requirements, we have developed a DMSML Framework, described in this paper in its prototype version. The DMSML Framework is an integrated set of tools which provide intuitive and user-friendly interfaces for the creation of DMSML specifications and the deployment of a Web-based DMS, customized according to those specifications. Thanks to the DMSML Framework features, the proposed method supports the conception and deployment of a document management solution that meets the requirements of a design method based on information models (i.e. the DMSM) and of open standard compliance.
The paper is organized as follows: Section 2 discusses the main benefits of information models and evaluates current approaches for DMS design for commercial as well as for open source solutions. Section 3 describes the main characteristics of the DMSM, grounding the proposed method. Section 4 details the DMSM-driven method for DMS design. Section 5 describes the architecture of the DMSML Framework and Section 6 shows the facilities provided by the DMSML Framework for DMS design, development and deployment. Section 7 discusses the results and provides insights into future work and Section 8 concludes the paper.
2. Background
2.1 Benefits of Information models for Document Management System Design
Information models are abstract and technology-independent representations of managed objects, as defined in literature (Pras & Schoenwaelder, 2003). Information models (IMs) are used in the early stages of the software development cycle for analysis purposes and business requirement elicitation. An information model can be specified in an informal way (e.g. using natural language) or by means of standard formalisms. In the latter case the features of an information system can be represented in a way which enables both human and machine understanding. The advantages of information models based on standard formalisms in the design of complex information systems are universally recognized:
- IMs make requirements explicit and formalize them in a precise way. More specifically, they are useful to designers to describe the managed domain and its entities, to operators to understand the modeled entities, and to implementers as a guide to the functionality that must be implemented by means of specific technologies (Pras & Schoenwaelder, 2003). Moreover, information models can be used as a formal basis for the development of access and query paradigms of application interfaces made available to end users for information browsing and retrieval.
- They provide an abstract representation of features of an information system, by representing them in a technology-independent way. Moreover this abstraction from implementation details can promote interoperability and reuse of system design.
- Based on modeling formalisms, proper tools can be developed enabling the automatic or semi-automatic code generation.
Thanks to these advantages, information models are well recognized and commonly used in the design and development of information systems in several application domains. Some of them are strictly related to Document Management, such as enterprise modeling, database, hypermedia system and digital library design, to mention a few.
Enterprise Modeling methods include: business and business process modeling methods, such as the Fundamental Business Processing Modeling Language (FBPML) (Chen-Burger et al., 2002) and the Web Information Exchange Diagram (WIED) (Tongrungrojana & Lowe, 2004), organizational modeling (van der Aalst et al., 2003), and capability and enterprise ontologies (Ushold et al., 1998). Database design methodologies are traditionally based on information models. The relational model (Elmasri & Navathe, 2003) was the first formal database model. More recently, models were defined for object-oriented (Elmasri & Navathe, 2003) and semi-structured databases (Graves, 2001). Relevant contributions in the field of hypermedia information system design are: Dexter model (Halasz & Schwartz, 1994), WEBML (Ceri et al., 2000), and Ariadne (Montero et al., 2004). The 5S Formal Framework (Gonçalves et al., 2004) represents one of the most relevant attempts to provide a comprehensive formalization for Digital Libraries design, providing the formal foundation for the definition of a Digital Library (DL) declarative language and a DL generator tool.
Information models, together with metadata and markup languages, are widely recognized as mechanisms enabling high-quality DMS design (Ginsburg, 2001; Salminen et al., 2000; Murphy, 1998). Based on these seminal contributions, other works (Päivärinta, 2001; Karjalainen et al., 2000) provide high-level guidelines and principles for DMS requirement elicitation, but they also highlight the need for a methodology for translating socio-organizational requirements into metadata-based technical specifications. Despite that, the study of model-based methods for design and development of Document Management Systems is still in its infancy.
Although the above-mentioned information models can provide useful hints and guidelines, ad-hoc conceptual and methodological frameworks should be developed for the organizational document management field. For instance, digital library concepts cannot be easily adapted to the organizational context. As a matter of fact, the author-publisher-reader model, which is typical of digital library information models, cannot be conveniently used to model the information lifecycle inside an organization because it cannot properly express business process requirements and roles and responsibilities defined in an organizational environment (Murphy, 1998). Hypermedia information system design methods focus on navigation, presentation, structure and behavior issues which differ from DMS design requirements. As a matter of fact, "in hypermedia applications, information is split into a number of self-contained and unstructured nodes that are connected to related nodes by means of links" (Montero et al., 2004). On the contrary, when dealing with unstructured documents, information is provided by a chunk of content which does not explicitly contain direct links to other information items. Enterprise modeling provides useful instruments to model the organizational context in terms of actors, organizational roles and processes, but documents are usually considered as information resources supporting specific process steps, rather than as "first-class" entities. As a consequence, enterprise models do not aim at supporting traditional DMS features (e.g. document classification, search and retrieval, etc.). Database models deal with a different kind of content (i.e. mostly structured information), but can provide useful guidelines for model-driven design. As a matter of fact, our approach is based on conceptual and logical model-driven design, which derives from widely-accepted model-driven database design methodologies.
2.2 Current approaches for document management system design
At present, several existing Document Management Systems are available in the market, both as proprietary and open source solutions. According to Moore and Markham (2002), some of the most important solutions in terms of offered features and market diffusion in the domain of Document Management are: Documentum, FileNet, IBM Lotus Notes, Interwoven, Microsoft SharePoint, and Stellent. Among the open source products, OpenCMS, Apache Lenya, MARIAN, and Xinco deserve to be mentioned (1).
These systems provide a wide range of functionalities supporting the employees in the use of organizational information. An evaluation of DMS products according to some functional and technical requirements has been provided by Hendley (2005). For the purpose of this paper, we will evaluate some of these products according to their compliance with the following requirements for DMS: open information model, standard compliance, model-driven design methodology.
The analysis synthesized in Table 1 refers to two commercial products, FatWire Content Server and Documentum, and two open source products: MARIAN and Xinco.
The analysis of these products highlights that only one product, MARIAN, is based on an information model, the 5S (Streams, Structures, Spaces, Scenarios, Societies) Formal Model (Gonçalves et al., 2004), and a design methodology is in progress, based on the 5S model. The other products provide neither an open and publicly available information model nor a model-driven methodological approach for DMS design and deployment (the publicly available methodology of FatWire seems not to be based on an information model).
Compliance with technical standards is a requirement commonly understood and addressed by means of wide adoption of industrial standards, such as XML and related standards (Sall, 2002), LDAP (Lightweight Directory Access Protocol) (Yeong et al., 1993), SOAP (Simple Object Access Protocol) (Mitra, 2003), Internet protocols, such as HTTP (Hypertext Transfer Protocol) and FTP (File Transfer Protocol) and Java-related specifications. On the other hand, compliance with business standards is partially accomplished. As a matter of fact, while descriptive metadata standards - e.g. Dublin Core (Dublin Core Metadata Initiative, 2003) - are often used in open source solutions, metadata standards for lifecycle and access policy descriptions are scarcely used.
Even if the analysis of commercial products is limited by the lack of documentation about some requirements (especially about the use of an open information model), the overall remark of this analysis is that these products do not completely address the above-mentioned high-level requirements for DMSs. Most commercial systems have monolithic and closed architectures, provide platform-specific solutions and adopt proprietary encoding formats and algorithms (Stickler, 2001). Moreover both commercial and open solutions rarely adopt standard modeling methodologies (Stickler, 2001; Paganelli et al., 2005). This leads to several disadvantages: poor interoperability among heterogeneous systems, limited portability across platforms, and expensive system deployment, maintenance and extension activities, which are thus often not affordable for small-medium enterprises. Generally, open source solutions better deal with requirements of open standard compliance, but do not completely fulfill the requirements of open information model and model-driven design methodology.
Based on these evaluation results, this paper aims at providing a contribution towards the definition of an information model and model-driven design methodology for DMSs, described in the following Sections.
Table 1: DMSs Evaluation results (n.a.: information not available)
| DMS | Open information model | Standard compliance: technical standards | Standard compliance: business standards | Standard compliance: metadata standards | Model-driven design methodology |
| --- | --- | --- | --- | --- | --- |
| FatWire Content Server | n.a. | Yes: LDAP, XML, SOAP and Internet protocols, Java specifications | n.a. | No | A methodology is available, but it is not based on an information model |
| Documentum | n.a. | Yes: LDAP, XML, SOAP and Internet protocols | n.a. | No | n.a. |
| MARIAN | Yes: open data model | Yes: Internet protocols and XML | No standards for lifecycle and access policy | Yes: Dublin Core compliant | The study of a standard method is in progress, based on the 5S Formal Model |
| Xinco | n.a. | Yes: Internet protocols, SOAP and XML | No standards for lifecycle and access policy | No | No |
3. Document Management and Sharing Model
DMSM is an information model for Document Management Systems, representing digital documents' most relevant properties in the form of metadata. The aim of DMSM is to provide modeling constructs which facilitate the design of DMS, matching with the above-mentioned requirements of information model-driven design, standard compliance, uniform access to heterogeneous data sources and cost effectiveness.
Figure 1 shows the most important steps of the process leading to the DMSM specification: the definition of high-level requirements for DMS design, the analysis of relevant properties for document management and the analysis of metadata specification principles. This section describes the features of DMSM which are relevant for the description of the DMS design method. A detailed description of DMSM is beyond the scope of this paper. Further details can be found in previous works (Paganelli, 2004; Paganelli et al., 2004).
Figure 1. Schema of the process leading to the Document Management and Sharing information Model specification
In order to define DMSM core properties we analysed organizational digital documents as objects which:
- need to be identified and searched for;
- are shared among colleagues for the same or related purposes;
- are characterized by different states (e.g. draft, submitted for review, final, etc.) during their lifecycle.
In order to represent these aspects, DMSM consists of three sub-models: a Descriptive Information Model, a Collaboration Model and a Process Model, which respectively allow the representation of descriptive, collaboration- and process-related characteristics of unstructured documents:
- The Descriptive Information Model represents the set of properties which describe and identify the document (e.g. Title, Creator, Date, Description, Document Type, Subject, Contact, Affiliation). These properties are generally used for search and indexing purposes.
- The Collaboration Model formalizes how the human resources are structured (organizational schema) and how access to information resources is regulated on the basis of organizational roles or responsibilities of individuals (access policy). This model allows the description of access policies to information resources in a customizable and standard way, on either a role or an individual basis. Organizational models then map roles and organizational functions and units to individuals or groups. The DMSML organization model specifies the organizational units, individuals and related organizational roles. In order to satisfy changing requirements (e.g. the setup of a short-term project) it may also be extended with groups or external entities which are not institutional members of the organizational model, but may be defined ad-hoc for specific purposes and have a short life.
- The Process Model includes the modeling primitives describing document lifecycles and has its theoretical foundation on the Petri Net process model (van der Aalst, 1998). A document lifecycle usually consists of the following stages: creation, review, publication, access, archive and deletion. A specific lifecycle may not implement all these stages, or may implement others, depending on document types. The document lifecycle is a process specified in terms of a sequence of tasks, performed by some actors.
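As an illustration of the Petri Net view of a document lifecycle, the following minimal sketch models document states as places and tasks as transitions. It is written in Python for readability and is not part of DMSML or PNML; the class names, the state names and the task names (`submit`, `approve`, `reject`, `archive`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    """A task: consumes one token from each input place, produces one in each output place.

    Assumption of this sketch: each place is listed at most once per tuple.
    """
    name: str
    inputs: tuple
    outputs: tuple


@dataclass
class LifecycleNet:
    """A tiny Petri net; the marking maps each place (document state) to a token count."""
    transitions: dict
    marking: dict = field(default_factory=dict)

    def enabled(self, name):
        # A task is enabled when every input place holds at least one token.
        task = self.transitions[name]
        return all(self.marking.get(place, 0) > 0 for place in task.inputs)

    def fire(self, name):
        # Firing moves the document from the input states to the output states.
        if not self.enabled(name):
            raise ValueError(f"task '{name}' is not enabled in the current marking")
        task = self.transitions[name]
        for place in task.inputs:
            self.marking[place] -= 1
        for place in task.outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


def make_report_lifecycle():
    """Hypothetical lifecycle: draft -> under_review -> published -> archived."""
    tasks = [
        Transition("submit", ("draft",), ("under_review",)),
        Transition("approve", ("under_review",), ("published",)),
        Transition("reject", ("under_review",), ("draft",)),
        Transition("archive", ("published",), ("archived",)),
    ]
    return LifecycleNet({t.name: t for t in tasks}, {"draft": 1})
```

In this sketch a rejected document simply returns to the draft state; a lifecycle for another document type could omit or add tasks, as noted above.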
The DMSM model uses some existing metadata standards, in order to promote interoperability, to create a framework of Document Management metadata, and to take advantage of existing standard contributions. DMSM uses a part of the Dublin Core metadata set (Dublin Core Metadata Initiative, 2003) in the Descriptive Information Model, the eXtensible Access Control Markup Language (XACML) (OASIS, 2003) in the Collaboration Model and the Petri Net Markup Language (PNML) (Weber & Kindler, 2002) in the Process Model.
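To make the idea of role-based and individual-based access policies more concrete, the following minimal Python sketch evaluates a request against a list of rules. It does not reproduce XACML syntax or semantics; the `Organization` and `Rule` classes, the role names and the deny-by-default behaviour are assumptions made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """Hypothetical organizational schema: maps each individual to a set of roles."""
    roles_by_user: dict = field(default_factory=dict)

    def roles_of(self, user):
        return self.roles_by_user.get(user, set())


@dataclass(frozen=True)
class Rule:
    """Grants a set of actions on a document type to a role or to an individual.

    Assumption of this sketch: role names and user names never collide.
    """
    subject: str
    document_type: str
    actions: frozenset


def is_permitted(org, rules, user, document_type, action):
    """Return True if any rule grants the action to the user or to one of the user's roles."""
    subjects = {user} | set(org.roles_of(user))
    return any(
        rule.subject in subjects
        and rule.document_type == document_type
        and action in rule.actions
        for rule in rules
    )
```

A request that matches no rule is refused, which is one possible design choice among several.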
3.1 DMSM metadata specification
The DMSML metadata specification includes two modeling levels of abstraction:
- conceptual modeling, based on the UML graphical notation (Booch et al., 1998). It provides an abstract and technology-independent representation of concepts and relations among concepts. Conceptual models enable people with low technical expertise to understand the meaning of data and promote common understanding among technical and non-technical staff (i.e. end users).
- logical modeling. This level translates domain-related concepts and relationships into data constructs which are expressed in a rigorous and standard logical data modeling paradigm. Our modeling approach is based on the XML Schema modeling paradigm (Sall, 2002). The XML serialization of the DMSM is called Document Management and Sharing Markup Language (DMSML).
In Figure 2 we provide an extract of the DMSM, showing a part of the conceptual representation of the DMSM Information Descriptive Model (Figure 2a) and its logical representation in XML Schema Language (Figure 2b). Figure 2c shows an instance of the DMSM for a project proposal document. The DMSM instance is an XML document which contains DMSM metadata labels and values, describing a specific document, and is valid against the syntactical rules encoded in DMSML. An example of a syntactical rule is that an element "document" should contain an "identifier", a "title", at least one "creator", etc.
Figure 2. Example of the DMSM Information Descriptive Model: a. conceptual model; b. Logical model (XML Schema); c. instance document (XML)
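The syntactical rule mentioned above can be illustrated with a small check written in Python. This is only a sketch: a real deployment would validate instances against the DMSML XML Schema itself. The element names come from the example rule; the absence of XML namespaces and the "exactly one" cardinality for "identifier" and "title" are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Assumed cardinalities for this sketch, not taken from the DMSML schema.
REQUIRED_ONCE = ("identifier", "title")
REQUIRED_AT_LEAST_ONCE = ("creator",)


def check_document(xml_text):
    """Return a list of rule violations; an empty list means the record passes."""
    root = ET.fromstring(xml_text)
    if root.tag != "document":
        return [f"root element is '{root.tag}', expected 'document'"]
    problems = []
    for name in REQUIRED_ONCE:
        count = len(root.findall(name))
        if count != 1:
            problems.append(f"expected exactly one <{name}>, found {count}")
    for name in REQUIRED_AT_LEAST_ONCE:
        if not root.findall(name):
            problems.append(f"expected at least one <{name}>")
    return problems
```

Returning a list of messages rather than raising on the first violation lets a designer see all problems of an instance document at once.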
The 2-layered modeling approach facilitates the following steps of DMS design:
- discussion and common understanding among software designers and end users, in order to define socio-organizational requirements, thanks to the conceptual abstraction;
- translation of socio-organizational requirements into metadata-based technical system specifications, by means of DMSML machine-understandable syntax. As a matter of fact DMSML is a declarative language for DMS design which encodes document descriptive properties, access policies and lifecycles in XML syntax.
Consequently, DMSML can support the design and configuration of a DMS according to the specific requirements of an organization, providing specific methods and mechanisms to exploit the business knowledge owned by end users, and leveraging on the compliance with standard formalisms and existing metadata specifications. For the sake of clarity, Figure 3 provides a graphical representation of DMSML main components: Information Descriptive Model, Collaboration Model, and Process Model. The complete specification can be found in a previous work (Paganelli, 2004).
Figure 3. Graphical representation of DMSML main components: Information Descriptive Model, Collaboration Model, and Process Model
4. Method for DMS design and development
This section describes the method for DMS design and development based on the DMSM information model. This DMSM-driven method covers the whole cycle of activities of DMS development. The iterative process includes the following stages, as shown in Figure 4: Preliminary Meeting, Critical Factors Analysis, Specification of a DMSM-based Solution, DMS Design, Development and Deployment, and Testing and Evaluation. Some steps include semi-structured interviews, based on reference questionnaires. In order to propose a generally-applicable approach, in this paper we describe the main objectives of the interviews and the suggested profile of the interviewees. As a matter of fact, questions should be tailored to the specific characteristics and critical factors of the target organization, and the questions and their order might consequently need to be modified on the fly. An example of a reference questionnaire is shown in Table 2; other examples can be found in a previous work (Paganelli, 2004).
Figure 4. DMSM Method
4.1 Preliminary Meeting
The first step envisages a meeting with some organization representatives. The aim is to delineate the profile of the organization and the organization's strategy for information management, in order to highlight existing inefficiencies, problems and critical factors. Two kinds of questionnaires are used for this activity.
The first questionnaire (Questionnaire A - Organization Profile) is focused on basic information about the organization's profile, such as generic information describing the organization's business goals, services and/or products offered to the market, typology of customers, partners and competitors, size (e.g. number of employees) and geographical distribution of the company's sites. This questionnaire has to be submitted to at least one person who has a deep knowledge of the company (e.g. an executive or top manager).
The second questionnaire (Questionnaire B - Practices and Applications for Unstructured Document Management in the Organization) aims to delineate the organizational strategy for information management, focusing especially on unstructured documents. The aim is to collect information about information systems in use and existing policies for document management, to understand how these policies are formalized and shared in the target organization (e.g. formalized as written procedures, tacitly shared and based on practice, etc.) and to highlight the critical factors and unresolved issues (e.g. obstacles to the purchase of a DMS in an organization which does not yet have one). In this case, the interviewees should know which information systems are in use and how end users use them to share and manage documents for organizational purposes (e.g. a representative of the IT staff, and people who supply input to and/or use output of the system).
4.2 Critical Factors Analysis
The critical factors discovered during the first stage should then be analyzed in order to find the causes of possible inefficiencies in DM strategies and/or the factors that should be improved (e.g. bad practices, deficiencies of IT tools, lack of formalized procedures). Based on these considerations, the following step aims to plan a solving intervention. In the context of this work, the intervention is conceived as the definition of an effective solution for unstructured document management. The DMSML model can help in the formalization of a DM strategy which effectively supports the organization's processes.
4.3 DMSM-based Solution Specification
Based on the DMSM model, this stage aims to design a solution for unstructured document management, dealing with the requirements of the target organization. The first step consists in the classification of documents in use in the organization, in collaboration with some organization employees. According to the DMSM model, for each document class (e.g. technical report, project documentation and technical offers), the questions should collect the descriptive, collaboration-related and process-related properties which are relevant for document management.
An example of a generally-applicable questionnaire form is provided in Table 2. The collected information should then be used in order to define the DMS specifications, organized in a Descriptive Information Model, Collaboration Model and Process Model and encoded in the DMSML syntax. Based on the collected information, the need to extend/modify the DMSML labels should then be evaluated. For instance, we can imagine that a technical offer or the technical specifications for a project should be labeled with the name of the project they refer to. In that case, the model should be extended by adding a "project" label, to further characterize and easily retrieve documents which are related on a project affiliation basis. The use of XML Schema as the encoding language facilitates the extension of the information model and the use of external metadata schemas, by means of standard mechanisms, such as xs:any, xs:import, and xs:include (Sall, 2002).
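The "project" extension described above can be sketched in Python as follows. The snippet works directly on in-memory XML records and does not show the XML Schema extension mechanisms themselves; the helper names `add_label` and `find_by_label`, and the un-namespaced element names, are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def add_label(document, name, value):
    """Append an extra metadata label (e.g. 'project') to a document record."""
    child = ET.SubElement(document, name)
    child.text = value
    return document


def find_by_label(documents, name, value):
    """Return the records carrying the given label with the given value."""
    return [
        doc for doc in documents
        if any(element.text == value for element in doc.findall(name))
    ]
```

For instance, after labeling a technical offer and a technical specification with the same project name, `find_by_label(records, "project", "Alpha")` would return both records, whatever their document type.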
Table 2: Questionnaire C - Document Class Properties
a. Description
a.1 Please briefly describe the document (name, purpose, related project/organizational process, etc.)
a.2 How can this document be classified (meeting minutes, mail, report, etc.)?
a.3 How is it identified (sequential number, code, date)?
b. Collaboration
b.1 What is the access policy for this document?
b.2 How is the access policy specified and interpreted by the system?
c. Process
c.1 Is there a predefined procedure for the management of this document (e.g. guidelines, protocols, etc.)?
c.2 Is a template available?
c.3 Describe the steps of its lifecycle
d. Management
d.1 How do you usually search for this document (e.g. by title, author, keywords, project name, etc.)?
d.2 Does the document refer to other document typologies?
d.3 If it does, how (e.g. annotations, bibliographic references, URLs, etc.)?
d.4 How is versioning managed?
e. IT support
e.1 Which features are provided by the DMS for the management of this document?
e.2 Which should be provided?
f. Personal Experiences
f.1 According to your experience, what are the current problems in the management of this document type?
f.2 Would you suggest a new procedure, new features or a new solution for DM?
4.4 DMS Design, Development and Deployment
This step is focused on the design, development and deployment of the DMS. The DMSML specifications provide the formal foundation for DMS design and development. Thanks to the XML syntax, the DMSML-based specifications can be interpreted by a CASE tool for the automatic generation of DMS code. These specifications (e.g. access policies) can also be automatically enforced by the DMS during its operation.
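The following Python sketch illustrates, under simplifying assumptions, how a declarative XML specification could be interpreted by a tool and then enforced at run time. The element and attribute names used here (`documentType`, `lifecycle/state`, `access/reader`) are hypothetical and do not reproduce the actual DMSML vocabulary.

```python
import xml.etree.ElementTree as ET


def load_specification(xml_text):
    """Turn a declarative specification into a plain configuration dictionary."""
    root = ET.fromstring(xml_text)
    config = {}
    for doc_type in root.findall("documentType"):
        config[doc_type.get("name")] = {
            # Lifecycle states in declaration order.
            "states": [s.text for s in doc_type.findall("lifecycle/state")],
            # Roles allowed to read documents of this type.
            "readers": [r.text for r in doc_type.findall("access/reader")],
        }
    return config


def can_read(config, document_type, role):
    """Enforce the configured access rule at run time (deny by default)."""
    entry = config.get(document_type)
    return entry is not None and role in entry["readers"]
```

Because the configuration is derived entirely from the specification, changing a policy in this sketch means editing the XML instance rather than the application code.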
In order to facilitate DMSML-based design and the automation of the development and deployment stages, we developed a set of tools and applications, named the DMSML Framework. Further information about the DMSML Framework is provided in Sections 5 and 6.
It is worth observing that this method aims to be general and technology-independent, and it could benefit from different CASE and fast prototyping tools, other than those provided by the DMSML Framework.
4.5 Testing and evaluation
A selected group of organization employees (a group of users) should then test the DMS during their working activities. This step aims to evaluate the capability of a DMSML-based Document Management solution to address the critical factors discovered and analysed in the first two steps of the method, as well as the level of usability of the DMSML Framework Prototype. This investigation in the organization is supported by two kinds of questionnaires:
- Questionnaire D - DMSML Impact in the Organization, submitted to the group of users, in order to verify if the new solution for DMS has corrected the critical factors previously identified. The results have to be analyzed and interpreted in order to correct/refine the Document Management strategy for the target organization (as shown in Figure 4).
- Questionnaire E - Usability of the DMSML Framework Prototype, submitted to the group of users and to the person responsible for IT, to evaluate the level of usability of the DMSML Framework functionalities, thus providing feedback for interface re-design in the step of DMS Design, Development and Deployment and for a possible refinement of the DMSM-based Solution Specification (as shown in Figure 4).
5. DMSML Framework
The DMSML Framework is an integrated set of software tools which provide the user with automated support for DMS design, deployment and maintenance, according to the specifications encoded in the DMSML declarative language.
The DMSML Framework consists of three parts, as shown in Figure 5:
- a DMS Configurator, which offers a user-friendly graphical interface facilitating the specification of a DMS solution. It enables a user (i.e. a super-user such as a DMS designer or a system administrator) to generate a DMSML instance document, containing the specifications tailored to a target organization, by means of graphical formalisms hiding the complexity of the XML syntax. To this end, the DMS Configurator acts through a wizard which progressively guides the user through the definition of the workspace, the organizational schema, the folder structure and, finally, a set of lifecycle templates and access policies, to be assigned to documents or document types. The resulting DMSML instance (called the DMSML specifications), which contains business and organizational information such as organizational schemas and access policies, is then processed by the DMS Generator.
- a DMS Generator, which is a web-based application enabling the user (i.e. a super-user) to deploy a DMS by uploading the DMSML instance through a standard Web browser. It customizes a DMS template (i.e. a set of Document Management libraries), according to the DMSML-based specifications, and it deploys a DMS compliant with those requirements.
- a DMS Web Application, which provides basic Document Management features, accessible through a standard web browser. Its configuration is described by the DMSML specifications, in terms of documents' descriptive properties, access policies and lifecycle management. The DMS Web Application configuration and deployment is supported by the DMS Configurator and the DMS Generator facilities. The functions provided by the DMS Web Application include: facilities for navigation, document upload, version control, document lifecycle management, access control, search functions (both metadata and full-text based), and log file recording. It is worth observing that the document search is based not only on descriptive metadata (such as title, author, etc.), but also on administrative metadata, related to lifecycle steps or access control rules (for instance, search for all documents in the state "draft", or all the documents which can be accessed by a project manager). The extensibility of the DMSML model allows a DMS designer to define ad-hoc search policies for target organizations.
Figure 5. DMSML Framework Prototype: Functional Architecture
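The combined search over descriptive and administrative metadata described for the DMS Web Application can be illustrated with a minimal sketch. It is not the prototype's implementation: the `Document` class, the `search` function, the role names and the sample records are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str                      # descriptive metadata
    author: str                     # descriptive metadata
    state: str                      # administrative metadata: lifecycle state
    readable_by: set = field(default_factory=set)  # administrative: roles allowed to read


def search(documents, **criteria):
    """Return the documents that satisfy every given criterion.

    `title`, `author` and `state` are compared for equality; the special
    criterion `role` matches documents that the given role may read.
    """
    results = []
    for doc in documents:
        matches = True
        for key, value in criteria.items():
            if key == "role":
                matches = value in doc.readable_by
            else:
                matches = getattr(doc, key) == value
            if not matches:
                break
        if matches:
            results.append(doc)
    return results


# Made-up sample records, for illustration only.
docs = [
    Document("Requirements", "Rossi", "draft", {"project_manager", "author"}),
    Document("Budget", "Bianchi", "accepted", {"project_manager"}),
    Document("Design notes", "Rossi", "draft", {"author"}),
]

drafts = search(docs, state="draft")
pm_drafts = search(docs, state="draft", role="project_manager")
print([d.title for d in drafts])
print([d.title for d in pm_drafts])
```

The two queries correspond to the examples given above: all documents in the state "draft", and the drafts that a project manager can access.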
5.1 DMS Configurator
The DMS Configurator is a Java application. Its architecture consists of an Interface, which uses the Java Swing graphical toolkit and other graphic utilities (e.g. images), and the DMS Configurator Core, built on top of the Java Virtual Machine (Figure 6.a). The DMS Configurator Core is composed of five main components:
- an XML Schema Parser, which verifies the validity of the XML document against the DMSML specifications. The XML Schema Parser also contains an XML parser
- the JDOM API, used to create, access and manipulate XML documents
- an XPath Engine, which validates XPath expressions
- a Rule Manager, which interprets and enforces the rules associated to the user actions
- a set of Basic Services, such as logging and data storage facilities.
Figure 6. DMSML Framework Prototype three-tiered Architecture: a. DMS Configurator, b. DMS Generator, c. DMS Web Application Architecture
5.2 DMS Generator
Both the DMS Generator and the DMS Web Application are web applications designed according to the J2EE (Java 2 Enterprise Edition) specifications. Both are characterized by a multi-tier architecture, consisting of a Client, an Application Logic (composed of an Interaction and a Business Logic side), and a Data tier (Figure 6b).
The Client is a standard web browser. The Interaction side is realized by means of JSPs. The Business Logic contains a template of a DMS Web Application (i.e. a set of DM libraries) and a set of APIs, called DMSG (DMS Generator) APIs. The DMSG APIs are a set of Java classes which customize the template according to specific configuration parameters, encoded in the DMSML language. Based on the features of the DMSML model, the DMS Generator allows a completely declarative approach for the design and deployment of a Document Management System for a target organization.
5.3 DMS Web Application
Analogously to the DMS Generator, the DMS Web Application has a multi-tier architecture, based on J2EE specifications, as shown in Figure 6c.
The client side is a standard Web browser. The Interaction part is realized by means of JSPs and it provides the user with core Document Management features. The Business Logic is composed of a set of DMS APIs, implemented by Java classes, which provide basic functions for the management of workspaces, folders and documents. The DMS APIs consist of several components:
- a Document Manager, providing facilities for navigation, document upload, version control, etc. This part is mainly based on the Descriptive Information Model specifications (e.g. folders' organization, title, creator, etc.)
- a Lifecycle Manager, which enforces the evolution of the document across the lifecycle steps, as specified according to the Process Model
- an Access Manager, which should guarantee that users execute only authorized actions, according to the organizational access policies (in the Collaboration Model). As the Collaboration Model is based on the XACML standard for access policy specification, the Access Manager is based on Sun's XACML Implementation, which is an access control policy evaluation engine written entirely in Java
- a Search Engine, enabling metadata-based and full-text document search
- a History component, which records log files
- Basic Services, such as monitoring and connection to database services.
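The kind of decision taken by the Access Manager can be illustrated with a deliberately simplified sketch. It does not use XACML or Sun's XACML Implementation: the `Rule` tuple, the `evaluate` function, the role and action names, and the choice to deny when no rule matches are all assumptions made here. The first-match behaviour only loosely mirrors XACML's first-applicable rule-combining algorithm.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    role: str      # who is asking (hypothetical role name)
    action: str    # what they want to do
    doc_type: str  # document type the rule applies to
    effect: str    # "Permit" or "Deny"


def evaluate(rules, role, action, doc_type):
    """Return the effect of the first matching rule.

    Assumption of this sketch: a request that matches no rule is denied.
    """
    for rule in rules:
        if (rule.role, rule.action, rule.doc_type) == (role, action, doc_type):
            return rule.effect
    return "Deny"


# Hypothetical policy for technical reports.
RULES = [
    Rule("reviewer", "read", "technical_report", "Permit"),
    Rule("reviewer", "write", "technical_report", "Deny"),
    Rule("project_manager", "write", "technical_report", "Permit"),
]

print(evaluate(RULES, "reviewer", "read", "technical_report"))
print(evaluate(RULES, "guest", "read", "technical_report"))
```

A real XACML engine evaluates much richer targets and conditions and can also return "NotApplicable" or "Indeterminate"; the sketch only shows the request/decision shape.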
6. Designing and deploying a DMS using the DMSML Framework prototype
The DMSML Framework Prototype offers support to the DMS designer during the steps of DMSML-based Solution Specification and DMS Design, Development and Deployment.
6.1 DMSML-based Specification
The DMS Configurator provides the DMS designer with a sequential set of graphical windows, which progressively guide the user in the DMS configuration, throughout the definition of the workspace, the organizational schema and the folder structure. The DMS Configurator makes it possible to specify the workspace entity, characterizing the information items in terms of the Descriptive Information Model, the Collaboration Model and the Process Model.
First, the interface enables the user to specify the workspace organization in folders and sub-folders. For instance, in the case of project documentation management, the designer can distinguish the following folders, each related to a project execution phase: Analysis, Specification, Development, Accounting. The graphical window depicted in Figure 7.a helps the user in specifying the folder organization, according to the DMSML Descriptive Information Model. Figure 7.b is an excerpt of a DMSML instance document representing the folders' organization (e.g. folder "ProjectA" and subfolders "Analysis", "Specification", "Development", "Accounting"), automatically encoded by the DMS Configurator in the DMSML syntax. The user can specify some properties for each folder: for instance "title", "creator", "affiliation", and the "document types" that can be assigned to that folder. The system provides some default document types (e.g. technical report, brochure, etc.), but it also enables the user to insert ad-hoc labels. Analogously to the previous example, Figure 8 shows the graphical window for folder properties' specification (Figure 8.a) and the resulting DMSML document instance (Figure 8.b).
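The folder organization of this example can be sketched as an XML document generated programmatically. The sketch below is only indicative: the `workspace` and `folder` element names and the namespace are taken from the figure excerpts, while the nesting of folders and the `title` attribute are assumptions, since the full DMSML schema is not reproduced here. The helper name `build_workspace` is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as it appears in the Figure 7.b excerpt.
DMSML_NS = "http://det.unifi.it/dmsml"


def build_workspace(project, phases):
    """Build a workspace with one project folder and one subfolder per phase.

    Nested <folder> elements and the `title` attribute are assumptions of
    this sketch, not the actual DMSML encoding.
    """
    ET.register_namespace("", DMSML_NS)  # serialize with a default namespace
    workspace = ET.Element(f"{{{DMSML_NS}}}workspace")
    project_folder = ET.SubElement(workspace, f"{{{DMSML_NS}}}folder", title=project)
    for phase in phases:
        ET.SubElement(project_folder, f"{{{DMSML_NS}}}folder", title=phase)
    return workspace


workspace = build_workspace(
    "ProjectA", ["Analysis", "Specification", "Development", "Accounting"]
)
xml_text = ET.tostring(workspace, encoding="unicode")
print(xml_text)

# Read the subfolder titles back with a namespace-aware path query.
ns = {"d": DMSML_NS}
subfolders = [f.get("title") for f in workspace.findall("d:folder/d:folder", ns)]
print(subfolders)
```

In the prototype this encoding is produced by the DMS Configurator from the graphical interface, so the designer never writes such code or XML by hand.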
The system provides graphical support for the definition of lifecycle models. Figure 9.a shows the graphical representation of the lifecycle template for documents which should be evaluated by a group of reviewers and consequently accepted or rejected. The document lifecycle is a process specified in terms of a sequence of tasks. The execution of a task is usually triggered by a transition condition, which can be automatic, time-dependent (e.g. a deadline) or caused by a user action or by an external event, and it is associated with an evolution of the document state (e.g. from "draft" to "in_review", to "accepted", or "refused"). In Figure 9.a circles represent the states of documents (or "places" in the Petri Net language) and rectangles represent the transitions from one state to another. The lifecycle of the document is built upon the concatenation of these states and transitions. Figure 9.b shows an excerpt of the DMSML representation of this lifecycle template.
These lifecycle models serve as a collection of templates which can then be assigned to documents in order to accordingly enforce their evolution during their "life". At design time, the user can assign a lifecycle template to the document types previously defined. In order to accommodate a certain level of flexibility, this pre-assignment can be modified by document creators by means of a proper interface offered by the DMS.
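The review lifecycle described above can be sketched as a small state machine in which states play the role of Petri Net places and named transitions move a document from one state to another. This is an illustrative simplification, not the DMSML Process Model or the Lifecycle Manager: the `Lifecycle` class and the transition names ("submit", "accept", "refuse") are assumptions; only the state names come from the text.

```python
class Lifecycle:
    """A document lifecycle made of states (places) and named transitions."""

    def __init__(self, initial, transitions):
        # transitions maps a transition name to a (source_state, target_state) pair
        self.state = initial
        self.transitions = transitions

    def enabled(self):
        """Names of the transitions that can fire from the current state."""
        return sorted(
            name for name, (src, _) in self.transitions.items() if src == self.state
        )

    def fire(self, name):
        """Fire a transition and move the document to its target state."""
        src, dst = self.transitions[name]
        if src != self.state:
            raise ValueError(
                f"transition {name!r} is not enabled in state {self.state!r}"
            )
        self.state = dst
        return self.state


# Template for documents that are reviewed and then accepted or refused.
REVIEW_TEMPLATE = {
    "submit": ("draft", "in_review"),
    "accept": ("in_review", "accepted"),
    "refuse": ("in_review", "refused"),
}

doc = Lifecycle("draft", REVIEW_TEMPLATE)
doc.fire("submit")
options = doc.enabled()  # transitions available while the document is in review
doc.fire("accept")
print(options, doc.state)
```

Because the template is plain data, the same dictionary can be assigned to several document types, which mirrors the idea of reusable lifecycle templates.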
Finally, the designer can specify the access control policies which regulate the access to the information items on the basis of roles and responsibilities defined in the organization, as illustrated in Figure 10.a. The DMS Configurator automatically generates the DMSML instance document (Figure 10.b) and checks the validity of the specification according to the DMSML rules.
7.a DMS Configurator interface for folders' organization specification
7.b DMSML instance document excerpt for folders' organization specification (DMSML Descriptive Information Model); the excerpt opens with <workspace xmlns="http://det.unifi.it/dmsml">
Figure 7. DMS specification: organization in folders and subfolders
8.a DMS Configurator interface for folders' properties specification
8.b DMSML instance document excerpt for folders' properties specification (DMSML Descriptive Information Model); the excerpt opens with <folder>
Figure 8. DMS specification: folders' characteristics definition
9.a DMS Configurator interface for lifecycle templates specification
9.b DMSML instance document excerpt for lifecycle templates specification (DMSML Process Model); the excerpt opens with <lifecycle><name>lifecycleTemplate</name>
Figure 9. DMS specification: lifecycle templates
10.a DMS Configurator interface for access policies' specification
10.b DMSML instance document excerpt for access policies specification (DMSML Collaboration Model); the excerpt opens with <xacml:Policy PolicyId="document_revisionPolicy">
Figure 10. DMS specification: access policies
6.2 Design, Development and Deployment of the Document Management System
The DMSML specification is processed by the DMS Generator in order to properly customize the DMS template according to the organization's specific requirements (Figure 11). The DMS Generator web interface enables the user to upload the DMSML specification, called the Business Configuration Document, together with the technical parameters (e.g. connection to databases, IP addresses, etc.) encoded in an XML document, named the Technical Configuration Document. Figure 12 shows an excerpt of a Technical Configuration Document specifying the parameters for a connection to an SQL database.
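As a rough illustration of how a generator might read the technical parameters, the sketch below parses a made-up Technical Configuration Document. Only the `system` root element and its schema attributes come from the Figure 12 excerpt. The `database`, `host`, `port` and `name` elements, their values, and the `read_database_settings` helper are hypothetical, since the actual schema (technology.xsd) is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: everything inside <system> is invented for this sketch.
TECHNICAL_CONFIG = """\
<system xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="technology.xsd">
  <database>
    <host>localhost</host>
    <port>3306</port>
    <name>dms</name>
  </database>
</system>
"""


def read_database_settings(xml_text):
    """Extract the (hypothetical) database connection settings as a dict."""
    root = ET.fromstring(xml_text)
    db = root.find("database")
    if db is None:
        raise ValueError("no <database> element found")
    return {
        "host": db.findtext("host"),
        "port": int(db.findtext("port")),
        "name": db.findtext("name"),
    }


settings = read_database_settings(TECHNICAL_CONFIG)
print(settings)
```

Keeping these parameters in a document separate from the Business Configuration Document means that the same DMSML specification could, in principle, be deployed on different infrastructures.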
The DMS Web Application offers an intuitive interface with basic Document Management functionalities. The browsing and metadata-based search interfaces are shown in Figure 13 and Figure 14, respectively.
Figure 11. DMS Generator graphical interface
Figure 12. Technical Configuration Document excerpt; the excerpt opens with <system xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="technology.xsd">
Figure 13. DMS Web Application: browsing interface
Figure 14. DMS Web Application: search interface
7. Discussion
The DMS Web Application aims to cover the above-mentioned requirements for DMSs: Design methodology based on information models, Standard compliance, Uniform access to heterogeneous formats, and Cost effectiveness.
To this end, we have proposed a DMS design method which makes extensive use of the Document Management and Sharing Information Model (DMSM) throughout the steps of preliminary analysis, critical factor analysis, design, development and deployment, and testing and evaluation of a DMS in an organization.
The DMSM is a metadata specification which encompasses descriptive, as well as collaborative and process-dependent properties of organizational documents. The DMSM provides a formal, lower-level (structural) description of an information model for DMSs and supports the conception of a completely declarative approach for DMS design and automatic deployment.
The XML serialization of the model (DMSML) is a declarative language which allows the mapping of organizational requirements into machine-understandable technical DMS specifications. As a matter of fact, a DMSML instance contains XML tags enabling the description of the workspace configuration and folder organization, the creation or reuse of a document resource classification schema, the specification of the lifecycle and the access policies assigned to documents either separately or on a document type basis.
This work has helped to address the need for standard methodological approaches to DMS design by proposing a generally applicable and technology-independent method based on the DMSM information model. While the specifications in most available products are embedded in proprietary workflow engines or collaborative applications, DMSML is a declarative language based on an open and standard-compliant data model.
Moreover, the DMSML Framework Prototype provides automation support for the design method, reducing the need for technical expertise in DMS configuration (the DMS designer is not concerned with the DMSML syntax) and deployment (he/she only needs to upload two XML documents and the system automatically deploys a customized DMS).
Secondly, standard compliance has been achieved in two ways: the DMSML language integrates three existing metadata standards (Dublin Core, XACML and PNML), and the DMSML Framework is based on standard Web development specifications (i.e. J2EE), and standard languages and technologies, such as XML and XSLT (Sall, 2002).
The other requirements (i.e. Uniform access to heterogeneous data sources and Cost effectiveness) have been partially addressed.
As a matter of fact, the use of web standards and protocols allows access to information stored in heterogeneous locations, but does not effectively support information retrieval, indexing and processing across heterogeneous repositories. The client side is implemented by standard Web browsers, thus providing users with a well-known and uniform paradigm of access, search and retrieval to documents available in heterogeneous formats and stored in heterogeneous locations.
Cost effectiveness is promoted by several factors: the DMS Web Application, as well as the whole DMSML Framework, is based on open source technologies. Furthermore, the tool support provided by the DMS Configurator and the DMS Generator speeds up the design, development and deployment of the DMS solution and hides some technical complexities (such as the XML syntax). Because of these cost savings, the DMS Web Application is a candidate Document Management solution that may also suit SMEs' requirements, but this hypothesis needs to be carefully validated in target organizations.
These issues are going to be addressed in on-going and future activities. Firstly, we are experimenting with the proposed methodology and the use of the DMSML Framework for the management of scientific documentation (papers, theses, project documentation, etc.) in our Department. We have also planned an evaluation activity in a small enterprise. The selected SME is an Italian consulting firm which provides IT services and products to a wide range of customer enterprises. Consulting firms are highly data-intensive companies, since they depend heavily on the expertise of their people and the documented information produced during their business activities.
Better management of heterogeneous and distributed content repositories could be achieved by adopting metadata harvesting protocols, which gather metadata about content for resource discovery across heterogeneous repositories. One of the most important harvesting protocols is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (Lagoze & Van de Sompel, 2002), which is widely adopted for digital libraries and cultural heritage information systems.
A higher degree of effective and uniform access to heterogeneous and distributed data sources would require a system to better deal with interoperability requirements. The long-term objective would be that of "enabling a machine to read documents of varying degrees of structures from heterogeneous data sources and understand the meaning of each document in order to find associations among those documents" (Fisher & Sheth, 2004). Achieving that kind of interoperability is obviously not a trivial task. Ontology-driven metadata extraction and annotation mechanisms can be used to support advanced classification techniques and to provide metadata with contextual relevance within a given domain. These techniques could provide a normalized "semantic" view of heterogeneous data, providing a certain degree of machine understanding and processing, across syntactical and structural differences of information sources.
As regards cost and feasibility evaluation, the DMSML Framework has been developed in an academic setting as a prototype. Consequently, usability tests and a user-centered re-design of the existing prototype interface should be performed, together with a market analysis and a business plan, in order to promote the transfer of this research into industrial application. At present, we are evaluating the possibility of creating an open source project, based on the DMSML Framework, in order to benefit from the advantages of cooperative software development.
8. Conclusions
This paper described a DMS design method based on the DMSM information model. DMSM is a metadata specification which encompasses descriptive, collaborative and process-related properties of organizational documents. The method encompasses the stages of Preliminary Meeting, Critical Factors Analysis, Specification of a DMSM-based Solution, DMS Design, Development and Deployment, and Testing and Evaluation. The DMSML (i.e. the XML serialization of DMSM) enables a declarative design approach, and the DMSML Framework Prototype (i.e. a set of tools for DMSML editing and DMS generation) facilitates automatic development and deployment of a DMS for a target organization.
This model-driven method satisfies two basic requirements for DMS design: a design methodology based on information models, and standard compliance. We also described future research activities aimed at evaluating the method in an SME and at addressing the requirements of uniform access to heterogeneous formats and cost effectiveness.