Journal of Digital Information
https://jodi-ojs-tdl.tdl.org/jodi

The Journal of Digital Information published its final issue in 2012 and has ceased publication. This site is maintained by the Texas Digital Library for archiving purposes and contains issues published between 1998 and 2012. JoDI published peer-reviewed papers on the management, presentation and uses of information in digital environments.

Texas Digital Library | en-US | ISSN 1368-7506

DAR: A Modern Institutional Repository with a Scalability Twist
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5396

The Digital Assets Repository (DAR) is an institutional repository developed at the Bibliotheca Alexandrina to manage the full lifecycle of a digital asset: its creation and ingestion, its metadata management, its storage and archival, and the mechanisms needed for publishing and dissemination. DAR was designed with a focus on integration with different sources of digital objects and metadata, as well as with applications built on top of the repository. As a modern repository, the system architecture demonstrates a modular design relying on best-of-breed components and a flexible content model for digital objects that is based on current standards and relies heavily on RDF triples to define relations. In this paper we demonstrate the building blocks of DAR as an example of a modern repository, discussing how the system addresses the challenges an institution faces in consolidating its assets, with a focus on solving scalability issues.

Youssef Mikhail, Noha Adly, Magdy Nagi
Copyright (c) 2012-03-08. Vol. 13, No. 1.
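The DAR abstract notes that the content model "relies heavily on RDF triples to define relations". As a rough, hypothetical illustration of that general technique (the object URIs and the "repo" relation vocabulary below are invented for the example and are not taken from DAR), structural relations between repository objects can be recorded as plain triples:

# Hypothetical sketch: expressing relations between repository objects as RDF
# triples. The "repo" vocabulary and object URIs are invented for illustration.
from rdflib import Graph, Namespace

REPO = Namespace("http://example.org/repo/terms#")   # invented relation vocabulary
OBJ = Namespace("http://example.org/repo/objects/")  # invented object identifiers

g = Graph()
g.bind("repo", REPO)

# A scanned page belongs to a book, which belongs to a collection.
g.add((OBJ["page-0001"], REPO.isPartOf, OBJ["book-42"]))
g.add((OBJ["book-42"], REPO.isMemberOfCollection, OBJ["rare-books"]))
g.add((OBJ["book-42"], REPO.hasThumbnail, OBJ["book-42.thumbnail"]))

print(g.serialize(format="turtle"))

Queries and integrity checks can then run over the graph itself rather than over application-specific link tables, which is the usual appeal of modelling relations in RDF.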
Document Viewers for Non-Born-Digital Files in DSpace
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5600

As more institutions work with large and diverse types of content for their digital repositories, there is an inherent need to evaluate, prototype, and implement user-friendly websites, regardless of the digital files' size, format, location or the content management system in use. This article provides an overview of the need for, and current development of, Document Viewers for digitized objects in DSpace repositories, including a local viewer developed for a newspaper collection and four other viewers currently implemented in DSpace repositories. According to the DSpace Registry, 22% of institutions are currently storing "Images" in their repositories and 21% are using DSpace for non-traditional IR content such as: Image Repository, Subject Repository, Museum Cultural, or Learning Resources. The combination of current technologies such as the Djatoka Image Server, IIPImage Server, DjVu Libre, and the Internet Archive BookReader, as well as the growing number of digital repositories hosting digitized content, suggests that the DSpace community would probably benefit from an "out-of-the-box" Document Viewer, especially one for large, high-resolution, multi-page objects.

Elias Tzoc
Copyright (c) 2012-03-08. Vol. 13, No. 1.

Repository as a Service (RaaS)
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5872

In his oft-quoted seminal paper ‘Institutional Repositories: Essential Infrastructure For Scholarship In The Digital Age’, Clifford Lynch (2003) described the Institutional Repository as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.” This paper seeks instead to define the repository service at a more primitive level, without the specialism of being an ‘Institutional Repository’. It looks at how a repository can be viewed as providing a service within appropriate boundaries, what that could mean for the future development of repositories and for our expectations of what repositories should be, and how repositories could fit into the set of services required to deliver an Institutional Repository service as described by Lynch.

Stuart Lewis, Kim Shepherd, Yin Yin Latt, Andrea Schweer, Adam Field
Copyright (c) 2012-03-08. Vol. 13, No. 1.

Chempound - a Web 2.0-inspired repository for physical science data
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5873

Chempound is a new-generation repository architecture based on RDF, semantic dictionaries and linked data. It has been developed to hold any type of chemical object expressible in CML and is exemplified by crystallographic experiments and computational chemistry calculations. In both examples the repository can hold more than 50,000 entries, which can be searched via SPARQL endpoints and through pre-indexing of key fields. The Chempound architecture is general and adaptable to other fields of data-rich science.

Sam Adams, Peter Murray-Rust
Copyright (c) 2012-03-08. Vol. 13, No. 1.
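The Chempound entry highlights SPARQL endpoints as the search interface over its RDF content. The sketch below shows what such a query could look like using the standard SPARQL 1.1 protocol over HTTP; the endpoint URL and the ex: vocabulary are placeholders invented for the example, not Chempound's actual terms.

# Hypothetical SPARQL query against a repository endpoint, using the standard
# SPARQL 1.1 protocol. The endpoint URL and ex: vocabulary are placeholders.
import requests

ENDPOINT = "http://repo.example.org/sparql"  # placeholder endpoint

QUERY = """
PREFIX ex: <http://example.org/chem#>
SELECT ?entry ?formula
WHERE {
  ?entry a ex:CrystalStructure ;
         ex:molecularFormula ?formula .
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()

# Standard SPARQL JSON results: one binding per solution, one entry per variable.
for binding in response.json()["results"]["bindings"]:
    print(binding["entry"]["value"], binding["formula"]["value"])

Pre-indexed key fields would typically back a simpler keyword search alongside this kind of structured query.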
Building a Community of Curatorial Practice at Penn State: A Case Study
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5874

The Penn State University Libraries and Information Technology Services (ITS) collaborated on the development of Curation Architecture Prototype Services (CAPS), a web application for ingest and management of digital objects. CAPS is built atop a prototype service platform providing atomistic curation functions in order to address current and emerging requirements in the Libraries and ITS for digital curation, defined as “... maintaining and adding value to a trusted body of digital information for future and current use; specifically, the active management and appraisal of data over the entire life cycle” (Pennock, 2006). Additional key goals for CAPS were the application of an agile-style methodology to the development process and an assessment of the resulting tool and of stakeholders’ experience in the project. This article focuses in particular on the community-building aspects of CAPS, which emerged from a combination of agile-style approaches and our commitment to engage stakeholders actively throughout the process, from the construction of use cases, to decisions on metadata standards, to the ingest and management functionalities of the tool. The ensuing community of curatorial practice effectively set the stage for the next iteration of CAPS, which will be devoted to planning and executing the development of a production-ready, enterprise-quality infrastructure to support publishing and curation services at Penn State.

Patricia Hswe, Michael J. Giarlo, Michelle Belden, Kevin Clair, Daniel Coughlin, Linda Klimczyk
Copyright (c) 2012-03-08. Vol. 13, No. 1.

REDDNET and Digital Preservation in the Open Cloud: Research at Texas Tech University Libraries on Long-Term Archival Storage
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5875

In the realm of digital data, vendor-supplied cloud systems still leave the user with responsibility for the curation of digital data. Some of the very tasks users thought they were delegating to the cloud vendor may turn out to be the users' responsibility after all; for example, cloud vendors most often require that users maintain archival copies. Beyond the better-known vendor cloud model, we examine curation in two other models: in-house clouds, and what we call "open" clouds, which are neither in-house nor vendor clouds. In open clouds, users come aboard as participants or partners, for example by invitation. In open cloud systems users can develop their own software and data management, control access, and purchase their own hardware while running securely in the cloud environment. Doing so still requires working within the rules of the cloud system, but in some open cloud systems those restrictions and limitations can be worked around easily, with surprisingly little loss of freedom. It is in this context that REDDnet (Research and Education Data Depot network) is presented as the place where the Texas Tech University (TTU) Libraries have been conducting research on long-term digital archival storage. The REDDnet network by year's end will be at 1.2 petabytes (PB), with an additional 1.4 PB for a related project (Compact Muon Solenoid Heavy Ion [CMS-HI]); additionally, there are over 200 TB of tape storage. These numbers exclude any disk space which TTU will be purchasing during the year. National Science Foundation (NSF) funding covering REDDnet and CMS-HI was in excess of $850,000, with $850,000 earmarked for REDDnet. In the terminology used above, REDDnet is an open cloud system that invited the TTU Libraries to participate; this means that we run software which fits the REDDnet structure. We are completing the final design of our system and moving into the first stages of construction, and we have decided to purchase one-half petabyte of disk storage in the initial phase. The concerns, deliberations and testing are presented here along with our initial approach.

James Brewer, Tracy Popp, Joy Perrin
Copyright (c) 2012-03-08. Vol. 13, No. 1.

CLIF: Moving repositories upstream in the content lifecycle
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5876

The UK JISC-funded Content Lifecycle Integration Framework (CLIF) project has explored the management of digital content throughout its lifecycle, from creation through to preservation or disposal. Whilst many individual systems offer the capability of carrying out lifecycle stages to varying degrees, CLIF recognised that only by facilitating the movement of content between systems could the full lifecycle take advantage of systems specifically geared towards different stages of the digital lifecycle. The project has also placed the digital repository at the heart of this movement, and has explored this through carrying out integrations between Fedora and Sakai, and between Fedora and SharePoint. This article describes these integrations in the context of lifecycle management and highlights the issues discovered in enabling the smooth movement of content as required.

Simon Waddington, Richard Green, Chris Awre
Copyright (c) 2012-03-08. Vol. 13, No. 1.

Kindura: Repository services for researchers based on hybrid clouds
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5877

The paper describes the investigations and outcomes of the JISC-funded Kindura project, which is piloting the use of hybrid cloud infrastructure to provide repository-focused services to researchers. The hybrid cloud services integrate external commercial cloud services with internal IT infrastructure, which has been adapted to provide cloud-like interfaces. The system provides services to manage and process research outputs, primarily focusing on research data. These services include both repository services, based on the Fedora Commons repository, and common services such as preservation operations that are provided by cloud compute services. Kindura is piloting the use of DuraCloud, open source software developed by DuraSpace, to provide a common interface for interacting with cloud storage and compute providers. A storage broker integrates with DuraCloud to optimise the usage of available resources, taking into account factors such as cost, reliability, security and performance. The development is focused on the requirements of target groups of researchers.

Simon Waddington, Jun Zhang, Gareth Knight, Mark Hedges, Jens Jensen, Roger Downing
Copyright (c) 2012-03-08. Vol. 13, No. 1.

Beyond The Low Hanging Fruit: Data Services and Archiving at the University of New Mexico
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5878

Open data is becoming increasingly important in research. While individual researchers are slowly becoming aware of its value, funding agencies are taking the lead by requiring that data be made available, and by requiring data management plans to ensure the data is available in a usable form. Some journals also require that data be made available. However, in most cases, "available upon request" is considered sufficient. We describe a number of historical examples of data use and discovery, then describe two current test cases at the University of New Mexico. The lessons learned suggest that an institutional data services program needs not only to facilitate fulfilling the mandates of granting agencies but also to realize the true value of open data. Librarians and institutional archives should actively collaborate with their researchers. We should also work to find ways to make open data enhance a researcher's career. In the long run, better quality data and metadata will result if researchers are engaged and willing participants in the dissemination of their data.

Robert Olendorf, Steve Koch
Copyright (c) 2012-03-08. Vol. 13, No. 1.
Building the Hydra Together: Enhancing Repository Provision through Multi-Institution Collaboration
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5879

In 2008 the University of Hull, Stanford University and the University of Virginia decided to collaborate with Fedora Commons (now DuraSpace) on the Hydra project. This project has sought to define and develop repository-enabled solutions for multiple digital content management needs that are multi-purpose and multi-functional, in such a way as to allow their use across multiple institutions. This article describes the evolution of Hydra as a project, but most importantly as a community that can sustain the outcomes from Hydra and develop them further. The data modelling and technical implementation are touched on in this context, and examples of the Hydra heads in development or production are highlighted. Finally, the benefits of working together, and of having worked together, are explored as a key element in establishing a sustainable open source solution.

Chris Awre, Tom Cramer
Copyright (c) 2012-03-08. Vol. 13, No. 1.

Cloud as Infrastructure at the Texas Digital Library
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5881

In this paper, we describe our recent work in using cloud computing to provision digital library services. We consider our original and current motivations, technical details of our implementation, the path we took, and our future work and lessons learned. We also compare our work with other digital library cloud efforts.

Peter Nuernberg, John Leggett, Mark McFarland
Copyright (c) 2012-03-08. Vol. 13, No. 1.

Sheer Curation of Experiments: Data, Process, Provenance
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5883

This paper describes an environment for the “sheer curation” of the experimental data of a group of researchers in the fields of biophysics and structural biology. The approach involves embedding data capture and interpretation within researchers' working practices, so that it is automatic and invisible to the researcher. The environment captures not just the individual datasets generated by an experiment, but the entire workflow that represents the “story” of the experiment, including intermediate files and provenance metadata, so as to support the verification and reproduction of published results. As the curation environment is decoupled from the researchers’ processing environment, the provenance is inferred from a variety of domain-specific contextual information, using software that implements the knowledge and expertise of the researchers. We also present an approach to publishing the data files and their provenance according to linked data principles, using OAI-ORE (Open Archives Initiative Object Reuse and Exchange) and OPMV.

Mark Hedges, Tobias Blanke, Stella Fabiane, Gareth Knight, Eric Liao
Copyright (c) 2012-03-08. Vol. 13, No. 1.
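The sheer-curation entry above describes publishing data files and their provenance as linked data via OAI-ORE and OPMV. As a rough, hypothetical sketch of what such a description might look like (the resource URIs are invented, and the exact OPMV terms used here are assumptions that should be checked against the published vocabulary), an ORE aggregation of an experiment's files with a minimal provenance statement could be serialised like this:

# Hypothetical ORE/OPMV description of an experiment's files and provenance.
# All URIs are invented; ORE/OPMV term usage is illustrative only and should
# be verified against the published vocabularies.
from rdflib import Graph, Namespace, RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")
OPMV = Namespace("http://purl.org/net/opmv/ns#")
EX = Namespace("http://example.org/experiments/42/")

g = Graph()
g.bind("ore", ORE)
g.bind("opmv", OPMV)

agg = EX["aggregation"]
raw = EX["raw-data.dat"]
result = EX["fitted-curve.csv"]
process = EX["curve-fitting-run"]

# ORE aggregation grouping the experiment's files.
g.add((agg, RDF.type, ORE.Aggregation))
g.add((agg, ORE.aggregates, raw))
g.add((agg, ORE.aggregates, result))

# Minimal OPMV-style provenance: the result was generated by a process
# that used the raw data.
g.add((process, RDF.type, OPMV.Process))
g.add((result, OPMV.wasGeneratedBy, process))
g.add((process, OPMV.used, raw))

print(g.serialize(format="turtle"))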
FISHNet: encouraging data sharing and reuse in the freshwater science community
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5884

This paper describes the FISHNet project, which developed a repository environment for the curation and sharing of data relating to freshwater science, a discipline whose research community is distributed thinly across a variety of institutions and usually works in relative isolation, as individual researchers or within small groups. As in other “small sciences”, these datasets tend to be small and “hand-crafted”, created to address particular research questions rather than with a view to reuse, so they are rarely curated effectively, and the potential for sharing and reusing them is limited. The paper addresses a variety of issues and concerns raised by freshwater researchers as regards data sharing, describes our approach to developing a repository environment that addresses these concerns, and identifies the system's potential impact within the research community.

Mark Hedges, Mike Haft, Gareth Knight
Copyright (c) 2012-03-08. Vol. 13, No. 1.

Preserving and delivering audiovisual content integrating Fedora Commons and MediaMosa
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5911

The article describes the integrated adoption of Fedora Commons and MediaMosa for managing a digital repository. The integration was trialled during the development of a cooperative project, the Sapienza Digital Library (SDL). The functionalities of the two applications were exploited to build a weaving factory for archiving, preserving and disseminating multi-format and multi-protocol audio and video content in different usage contexts. The integration was achieved by means of both repository-to-repository interaction and the mapping of the video Content Model's disseminators to MediaMosa's RESTful services. The outcomes of this integration will lead to more flexible management of the dissemination services, as well as reducing the overproduction of different dissemination formats.

Matteo Bertazzo, Angela Di Iorio
Copyright (c) 2012-03-08. Vol. 13, No. 1.

Visualizing Research Data Records for their Better Management
https://jodi-ojs-tdl.tdl.org/jodi/article/view/5917

As academia in general, and research funders in particular, place ever greater importance on data as an output of research, the value of good research data management practices becomes ever more apparent. In response, the Innovative Design and Manufacturing Research Centre (IdMRC) at the University of Bath, UK, with funding from the JISC, ran a project to draw up a data management planning regime. In carrying out this task, the ERIM (Engineering Research Information Management) project devised a visual method of mapping out the data records produced in the course of research, along with the associations between them. This method, called Research Activity Information Development (RAID) Modelling, is based on the Unified Modelling Language (UML) for portability. It is offered to the wider research community as an intuitive way for researchers both to keep track of their own data and to communicate this understanding to others who may wish to validate the findings or reuse the data.

Alexander Ball, Mansur Darlington, Thomas Howard, Chris McMahon, Steve Culley
Copyright (c) 2012-03-08. Vol. 13, No. 1.