The Italian project Sapienza Digital Library (SDL) is undertaken by Europe's largest campus, the Università di Roma "La Sapienza" (Sapienza), and by Cineca, the Italian supercomputing center, a non-profit consortium made up of 47 Italian universities.
The project aims to build an infrastructure supporting the preservation, management, and dissemination of digital resources, and to create a novel weaving factory for producing digital resources that can be customized for different digital environments and reused in diverse application contexts. In setting out the future application scenario, the project took into account the large amount of research and knowledge material produced by such a large and ancient university, the varied interests of a large, multidisciplinary community of stakeholders and, last but not least, the potential uses of that material by general and specialized user communities. The project was, indeed, conceived to manage the integration of a large volume of multiformat materials and to make them accessible through different devices, in order to meet the needs and expectations of diverse communities: local, global, and future.
The current state of experience in digital libraries, digital resource management, digitization, and the evolution of dissemination tools suggested examining new cost-effective solutions for the submission, archiving, and dissemination stages of this weaving factory. Against this background, the idea emerged of exploiting two frameworks, Fedora Commons (FC) and MediaMosa (MM), that are usually adopted in different contexts: preservation and multimedia services, respectively. The two frameworks were combined to reduce preservation costs while optimizing dissemination services.
2. Project overview
The project was undertaken by the Università di Roma "La Sapienza", one of the largest universities in the world, and Cineca, the Italian supercomputing center, a non-profit consortium made up of 49 Italian universities, several prestigious research institutions, and the Ministry of Education, University and Research. The project aims to provide the University with a brand-new digital library management system that supports all the services associated with the essential OAIS functions: submission, archiving, and dissemination. The core digital repository of this system holds the digital resources owned or managed by Sapienza University. The METS standard was chosen as the container for all the metadata needed to meet the different descriptive requirements of the resources.

Figure 1 gives an overview of the main architecture, which is organized in three layers. The lowest layer is the digital repository, managed by FC and supported by integrated systems such as MM. The layer named "Other Services" represents the other integrated systems that handle the processing, caching, and delivery of images in different resolutions and formats. At present, this layer comprises a JPEG2000 server for processing images in JPEG2000 format, server components for image display such as Djatoka, and services such as an XSLT transformer and the Mulgara triplestore, which holds the RDF relations among the repository objects. These services and systems are integrated through RESTful web services that expose the disseminators of the FC Content Models, which were designed ad hoc, and through the JMS messaging system (ActiveMQ engine).
Figure 1. Architecture overview.
All the repository's services, and those defined as disseminators by the FC Content Models, are exposed to the integration layer, built on J2EE technology, which in turn exposes them to the portal layer, built with the Drupal CMS. Among the services integrated through the FC disseminators, the Sapienza Digital Library project first activated the management, transcoding, and delivery of multimedia content, which is the focus of this paper.
3. Related work
The DuraCloud initiative develops "value-added services" to facilitate both access to and reuse of content. One of the DuraCloud pilots concerns audio/video access services, but it focuses on cloud storage and delivery rather than on content transcoding. In the second half of 2010, the Variations on Video project began, led by Indiana University (IU) in partnership with Northwestern University. This analysis and planning project envisages the integration of Opencast Matterhorn with an FC repository and is intended to define the scope, requirements, and technical architecture for extending IU's open source Variations digital music library system to support the management and delivery of digital video collections. Other systems address the transcoding phase without such integration: essentially, they connect the repository directly to the content delivered by the streaming servers. Although this solution is efficient and straightforward, it may make the system less adaptable to technological change.
4. Audiovisual content ingestion and modeling: METS ingestion negotiation for the FC audiovisual content model
Sapienza's digital resources reside in a dark repository physically located in Rome and organized according to a local naming system. The Cineca FC repository is located in Bologna and has been configured to receive packages of resources based on a customized naming system. The transfer of resources between the repositories was based on an agreed, shared METS structure, negotiated to match both the requirements of the FC content model and Sapienza's organizational requirements. The initial negotiation of the METS structure led to the choice of an atomistic approach to content modeling, given the size of the University, the expected rapid growth in the number of items to submit, and the variety of resource types. From the digital library system's point of view, the METS structure was customized to accommodate the feasible structural differences that affect the FC model and the services it yields. Figure 2 shows how a multipart video, available in two different formats, can be modeled so that the relevant services are associated with it.
Figure 2. FC atomistic Content Model for video.
These services are determined by the value of the USE attribute, which must be present on the mets:fileGrp elements within the mets:fileSec. The negotiated METS must set this attribute to declare the type of the objects in each group and, therefore, the services they require. For example, a group with USE="Source" is associated with streaming, embedding, and downloading services. Sapienza's digital objects, renamed according to the local organizational business rules and accompanied by the negotiated METS file, are transferred to a remote storage system, from which they are ingested into Cineca's FC repository. Once the digital package (objects and METS files) has been submitted and archived, it is ready for management and, most relevant to this article, for the access services.
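To make the role of USE concrete, here is a minimal Python sketch that reads a METS document and lists, for each mets:file, the services implied by the USE value of its group. It assumes a flat fileSec (no nested fileGrp elements); the SERVICES_BY_USE table and the file IDs are hypothetical, and only the "Source" entry comes from the text above.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical USE -> services table. Only "Source" is documented in the
# text; further values would be added as they are negotiated.
SERVICES_BY_USE = {
    "Source": ["streaming", "embedding", "downloading"],
}


def services_for_mets(mets_xml: str) -> dict:
    """Map each mets:file ID to the services implied by its fileGrp USE."""
    root = ET.fromstring(mets_xml)
    services = {}
    for grp in root.findall("mets:fileSec/mets:fileGrp", METS_NS):
        use = grp.get("USE")
        if use is None:
            # The negotiated METS requires USE on every fileGrp.
            raise ValueError("mets:fileGrp without USE attribute")
        for f in grp.findall("mets:file", METS_NS):
            services[f.get("ID")] = SERVICES_BY_USE.get(use, [])
    return services


SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="Source">
      <mets:file ID="VID01" MIMETYPE="video/mpeg"/>
      <mets:file ID="VID02" MIMETYPE="video/mpeg"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

result = services_for_mets(SAMPLE)
print(result["VID01"])  # ['streaming', 'embedding', 'downloading']
```

A group with an unknown USE value maps to an empty service list rather than failing, so new group types can be ingested before their services are defined.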
5. MediaMosa platform as a transcoding and delivery service
MM is an open source Multimedia Asset Management Platform, released under the GNU General Public License (GPL) and based on the Drupal framework. It is designed to support content streaming applications by providing a back-end audio-video infrastructure. MM aims to be a web-service-oriented media management and distribution system that provides agnostic multimedia content delivery. It offers a platform for third-party development of content streaming applications, as well as web services for multimedia management. MM is based on a Service Oriented Architecture (SOA); it provides a storage platform for any type of content and supplies services such as video playback, authentication, authorization (by domain, realm, group, or a combination of these), upload (PUT, POST, FTP), transcoding (converting media files from one format to another), media management, search, OAI-PMH support, logging, and statistics. Cineca has used the MM platform since the end of 2008 as a back end for e-learning and multimedia services. Since MM focuses on managing and delivering audio-video content, and since the SDL project will deal with heterogeneous content and a large number of objects, it was decided to integrate and exploit the MM services that were already deployed. Specifically, FC was integrated with two macro-functionalities of MM: transcoding and delivery (video playback). Transcoding converts a video file from one format to another and can also change its bit rate or resolution. MM supports all formats that FFmpeg can transcode, including MPEG, WMV, and Flash. The MM content model relies on assets and media files: an asset is composed of a set of media files (the original one and all its transcoded versions), so the same asset can be delivered using a number of different protocols and formats. In MM, a media file always belongs to exactly one asset.
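The asset/media-file relationship can be illustrated with a small in-memory model. This is a hypothetical sketch of the invariant described above (one asset, many media files; each media file in exactly one asset), not MM's actual schema or API: all class and field names are our own.

```python
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    mediafile_id: str
    container: str           # e.g. "mpeg", "flv", "mp4"
    is_original: bool = False


@dataclass
class Asset:
    asset_id: str
    mediafiles: list = field(default_factory=list)


class AssetStore:
    """Toy store enforcing that each media file belongs to exactly one asset."""

    def __init__(self):
        self.assets = {}   # asset_id -> Asset
        self.owner = {}    # mediafile_id -> asset_id

    def create_asset(self, asset_id):
        if asset_id in self.assets:
            raise ValueError(f"asset {asset_id} already exists")
        self.assets[asset_id] = Asset(asset_id)
        return self.assets[asset_id]

    def attach(self, asset_id, mf):
        if mf.mediafile_id in self.owner:
            raise ValueError(
                f"{mf.mediafile_id} already belongs to {self.owner[mf.mediafile_id]}")
        self.assets[asset_id].mediafiles.append(mf)
        self.owner[mf.mediafile_id] = asset_id

    def formats(self, asset_id):
        """All containers in which the asset can be delivered."""
        return [mf.container for mf in self.assets[asset_id].mediafiles]


store = AssetStore()
store.create_asset("asset1")
store.attach("asset1", MediaFile("mf-orig", "mpeg", is_original=True))
store.attach("asset1", MediaFile("mf-flv", "flv"))
store.attach("asset1", MediaFile("mf-mp4", "mp4"))
print(store.formats("asset1"))  # ['mpeg', 'flv', 'mp4']
```

Keeping the owner index separate from the assets makes the "exactly one asset" check a single dictionary lookup.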
Transcoding is carried out according to transcoding profiles, which specify parameters such as bit rate, frame size, and container; the subsequent upload phase is driven by a job management system. The MM platform provides access and delivery services for audio-video content, supporting media playback by means of a playback ticket, as well as media download. Furthermore, MM provides a Play Proxy module that streams and downloads the video content, both original and transcoded, and makes it available to third-party applications.
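Since the formats MM supports are those FFmpeg can transcode, a transcoding profile can be pictured as a set of FFmpeg options. The sketch below turns a profile into an FFmpeg command line. The profile fields and the "web_low" profile are illustrative assumptions, not MM's own profile definition; -b:v (video bit rate), -s (frame size), and -f (output format) are standard FFmpeg output options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscodingProfile:
    name: str
    video_bitrate: str   # FFmpeg bit-rate syntax, e.g. "500k"
    width: int
    height: int
    muxer: str           # FFmpeg output format (container), e.g. "mp4"
    extension: str       # file extension for the output


def ffmpeg_args(profile, src, dst_stem):
    """Build an FFmpeg command line that applies `profile` to `src`."""
    dst = f"{dst_stem}.{profile.extension}"
    return [
        "ffmpeg", "-y",                             # overwrite existing output
        "-i", src,                                  # input file
        "-b:v", profile.video_bitrate,              # target video bit rate
        "-s", f"{profile.width}x{profile.height}",  # output frame size
        "-f", profile.muxer,                        # output container
        dst,
    ]


WEB_LOW = TranscodingProfile("web_low", "500k", 640, 360, "mp4", "mp4")
cmd = ffmpeg_args(WEB_LOW, "lecture.mpg", "lecture_web_low")
print(" ".join(cmd))
# A job worker could run it with subprocess.run(cmd, check=True).
```

Building an argument list rather than a shell string avoids quoting problems with file names when a job worker eventually executes the command.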
6. Fedora Commons and MediaMosa integration
The integration between FC and MM covers two aspects: the synchronization of audio-video content, through the definition of a video content model in FC, and the mapping of FC dissemination methods to MM RESTful methods for content delivery.
Figure 3. FC and MM integration.
Audio-video content synchronization is based on a one-to-one association between FC video objects and the corresponding MM multimedia assets. The sync mechanism automatically propagates content: an ingestion into FC triggers the corresponding transcoding in MM. Similarly, when a purge request is sent to FC, the corresponding content in MM is purged as well. Through the definition of a video content model with a set of access disseminators, it is possible to trigger audio-video content delivery by means of the MM RESTful API.
6.1 Syncing audiovisual content
The FC platform uses a messaging system based on the Java Message Service (JMS). The default messaging provider is ActiveMQ, and gSearch uses the messaging system for metadata and content indexing. MM was extended with a synchronization module, written in PHP and based on Stomp, which receives API-M messages (in particular those for methods such as ingest and purge) and uses MM's FTP Upload Batch module.
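The listener side of the synchronization module can be sketched as follows. The original module is written in PHP; this Python version only illustrates the dispatch logic and leaves the STOMP connection to a client library. It assumes that each API-M message is an Atom entry whose title carries the API-M method name (ingest, purgeObject) and whose summary carries the PID, as in Fedora 3.x; the sample message and the sdl:1234 PID are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_apim_message(body):
    """Return (method, pid) from an API-M Atom entry.

    Assumed layout: <title> holds the API-M method name and
    <summary> holds the PID of the affected object.
    """
    entry = ET.fromstring(body)
    method = entry.findtext("atom:title", namespaces=ATOM_NS)
    pid = entry.findtext("atom:summary", namespaces=ATOM_NS)
    if not method or not pid:
        raise ValueError("message lacks a method name or a PID")
    return method.strip(), pid.strip()


def dispatch(body, on_ingest, on_purge):
    """Route an API-M message to the ingest or purge handler."""
    method, pid = parse_apim_message(body)
    if method == "ingest":
        on_ingest(pid)
    elif method == "purgeObject":
        on_purge(pid)
    # Other API-M methods (e.g. datastream modifications) are ignored.
    return method, pid


MESSAGE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title type="text">ingest</title>
  <summary type="text">sdl:1234</summary>
</entry>"""

ingested, purged = [], []
first = dispatch(MESSAGE, ingested.append, purged.append)
print(first)  # ('ingest', 'sdl:1234')
```

Passing the handlers in as callables keeps the message parsing independent of the MM side, so the same dispatcher serves both the ingest and purge flows.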
For an ingest action in FC, the synchronization module, shown in Figure 4, performs the following operations:
- gets the PID of the selected object;
- verifies, through FC's RESTful API, that the object actually contains video, by checking the content model associated with the object;
- retrieves the source of the object's datastream;
- creates an ingestion package for the MM FTP batch upload, specifying the selected transcoding profiles, and deposits it in the FTP area.
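The four ingest steps can be sketched as a single handler. The relationship URI is Fedora 3's standard hasModel predicate; everything else is an assumption: the sdl:VideoCModel content-model PID is hypothetical, get_relations and get_source_url stand in for calls to FC's RESTful API, and the JSON manifest is an illustrative stand-in for MM's actual batch package format.

```python
import json
import tempfile
from pathlib import Path

# Fedora 3 relationship linking an object to its content model.
HAS_MODEL = "info:fedora/fedora-system:def/model#hasModel"
# Hypothetical PID of the SDL video content model.
VIDEO_CMODEL = "info:fedora/sdl:VideoCModel"


def handle_ingest(pid, get_relations, get_source_url, ftp_dir, profiles):
    """Create an MM batch package for `pid` if it is a video object.

    Step 1 (getting the PID) is done by the caller from the API-M message.
    """
    # Step 2: check the content model via (predicate, object) pairs.
    if (HAS_MODEL, VIDEO_CMODEL) not in get_relations(pid):
        return None
    # Step 3: locate the source datastream.
    source_url = get_source_url(pid)
    # Step 4: write a package (hypothetical JSON manifest) into the FTP area.
    package = {"pid": pid, "source": source_url, "profiles": profiles}
    target = Path(ftp_dir) / (pid.replace(":", "_") + ".json")
    target.write_text(json.dumps(package, indent=2))
    return target


RELATIONS = {"sdl:1": [(HAS_MODEL, VIDEO_CMODEL)], "sdl:2": []}
with tempfile.TemporaryDirectory() as ftp_area:
    video_pkg = handle_ingest("sdl:1", RELATIONS.get,
                              lambda pid: f"http://fc.example.org/{pid}/SOURCE",
                              ftp_area, ["web_low"])
    manifest = json.loads(video_pkg.read_text())
    skipped = handle_ingest("sdl:2", RELATIONS.get, lambda pid: "",
                            ftp_area, ["web_low"])
print(manifest["profiles"], skipped)  # ['web_low'] None
```

Objects without the video content model are simply skipped, which mirrors the verification step: only video objects reach MM.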
For a purge action in FC, the same sync module performs the following operations:
- gets the PID of the selected object;
- checks if a corresponding MM asset exists;
- deletes the MM asset and all its related media files.
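The purge steps can be illustrated against an in-memory stand-in for MM. MediaMosaStub and its fields are hypothetical and only model the one-to-one PID-to-asset association; a real module would issue the corresponding MM REST calls instead.

```python
class MediaMosaStub:
    """In-memory stand-in for MM, used only to illustrate the purge steps."""

    def __init__(self):
        self.assets = {}        # asset_id -> list of mediafile ids
        self.asset_by_pid = {}  # one-to-one: FC PID -> MM asset_id


def handle_purge(pid, mm):
    """Delete the MM asset linked to `pid`; return the removed media files."""
    # Steps 1-2: look up the asset corresponding to the purged FC object.
    asset_id = mm.asset_by_pid.get(pid)
    if asset_id is None or asset_id not in mm.assets:
        return []
    # Step 3: remove the asset together with all its media files.
    removed = mm.assets.pop(asset_id)
    del mm.asset_by_pid[pid]
    return removed


mm = MediaMosaStub()
mm.assets["a1"] = ["mf-orig", "mf-flv", "mf-mp4"]
mm.asset_by_pid["sdl:1"] = "a1"
removed = handle_purge("sdl:1", mm)
missing = handle_purge("sdl:99", mm)
print(removed, missing)  # ['mf-orig', 'mf-flv', 'mf-mp4'] []
```

Returning an empty list for an unknown PID makes the purge idempotent: a repeated or stray purge message does no harm.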
Figure 4. Syncing content.
6.2 Delivering audiovisual content
The FC video content model has four main access disseminators: getThumbnail, which returns the URI of the thumbnail generated by MM; getStreaming, which provides the streaming URI; getDownload, which provides the download URI; and getEmbedding, which supplies the embedding code for streaming playback. These four disseminators are mapped to the MM RESTful play call, which takes different parameters depending on the access method. Given an FC object with the specified video content model (RDF relationship), any of the methods above can be invoked to obtain, transparently, the content processed by MM.
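The mapping from disseminators to the play call can be sketched as two URL builders. The Fedora side follows the Fedora 3.x REST dissemination path (/objects/{pid}/methods/{sDef}/{method}). The MM side assumes a play call of the form /asset/{asset_id}/play with user_id and response parameters, and the response value chosen for each disseminator is our assumption. Host names, the sdl:VideoSDef service definition, the asset ID, and the user ID are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed correspondence between FC disseminators and the "response"
# parameter of the MM play call; verify against the deployed MM version.
RESPONSE_BY_METHOD = {
    "getThumbnail": "still",
    "getStreaming": "plain",
    "getDownload": "download",
    "getEmbedding": "object",
}


def fedora_dissemination_url(fedora_base, pid, sdef_pid, method):
    """URL a client calls on FC to invoke a disseminator."""
    return (f"{fedora_base}/objects/{quote(pid, safe=':')}"
            f"/methods/{quote(sdef_pid, safe=':')}/{method}")


def mm_play_url(mm_base, asset_id, method, user_id):
    """URL the disseminator forwards to on MM for the given access method."""
    if method not in RESPONSE_BY_METHOD:
        raise ValueError(f"unsupported disseminator: {method}")
    query = urlencode({"user_id": user_id,
                       "response": RESPONSE_BY_METHOD[method]})
    return f"{mm_base}/asset/{quote(asset_id)}/play?{query}"


fc_url = fedora_dissemination_url("http://localhost:8080/fedora",
                                  "sdl:1234", "sdl:VideoSDef", "getEmbedding")
mm_url = mm_play_url("https://mm.example.org", "a1B2c3", "getEmbedding", "sdl")
print(fc_url)  # http://localhost:8080/fedora/objects/sdl:1234/methods/sdl:VideoSDef/getEmbedding
print(mm_url)  # https://mm.example.org/asset/a1B2c3/play?user_id=sdl&response=object
```

Keeping the disseminator-to-parameter mapping in one table means that adding an access method on the FC side requires only a new table entry.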
Integrating the Fedora Commons repository with the MediaMosa DAM makes it possible to reuse existing contextual elements of the established digital library system, such as knowledge, frameworks, and services. The chosen solution achieves economies of scale and adds value, offering a sound trade-off between the two systems despite their overlapping functions. Together with the development of the relevant back-office interfaces, the digital library will gain on-demand transcoding services that allow content to be delivered to different devices in a more flexible and interoperable manner.