Curation Architecture Prototype Services (CAPS) is a web application for ingest and management of digital objects built atop a prototype service platform providing atomistic curation functions (or curation microservices). The goals of the CAPS project, a collaboration between the Penn State University Libraries and Information Technology Services (ITS), were to develop a back-end architecture for management of digital objects, to engage library curators, to address current and emerging curatorial needs, to apply a stakeholder-driven development methodology, and to assess the costs and benefits of this approach. The project was chartered in November of 2010 and given an ambitious deadline of March 2011 to synchronize with the institution’s budget schedule.
The development team consisted of a project manager, a software developer, a curator, a metadata librarian, an archivist, and a technical architect. In addition to the development team, stakeholders from throughout the libraries were invited to contribute design ideas and use cases in order to ensure that the application was responsive to demonstrated needs among the community of content curators within the Penn State University Libraries. These stakeholders included representation from Special Collections, the Maps Library, Digitization & Preservation, and the Arts & Architecture Library.
Early in the CAPS project, community engagement became a dominant theme. The experience of applying an agile approach, which involved daily meetings of the development team and weekly discussions with stakeholders during the four-month period of the project, emerged as a community-building process. Stakeholder participation did not end upon completion of the prototype, however; when development concluded, the team asked stakeholders to test and debug the prototype, and the development team administered a survey for stakeholders to evaluate their project experience.
The expression “community of practice” describes a “group of people who share an interest in a domain of human endeavor and engage in a process of collective learning that creates bonds between them: a tribe, a garage band, a group of engineers working on similar problems” (Wenger, 2004). The CAPS project produced a similar experience of shared learning in the context of digital curation. This article details how development and stakeholder activities threw into sharp relief a collective understanding of the functionalities and features needed in a basic curation tool and, as a result, built a community of practice to draw upon for further development of curation services at Penn State.
2. Evolving a Community
The development of CAPS was preceded by two projects sponsored jointly by the Libraries and ITS, laying the foundation for building a community of practice around digital curation. These projects were a platform review, in which the four systems used at the Libraries to deliver and manage digitized collection content were assessed, and a microservices tools pilot project, in which curatorial use cases were gathered to test curation microservices-based tools as a proof of concept.
Both projects solicited input from librarians across the spectrum of academic library services, from public to technical to information technology. As such they were instrumental in presenting curation as a shared endeavor, rather than as an activity carried out in isolation by one unit or department or employee, on one platform or another, in the Libraries.
From these projects, it became clear that Penn State librarians were engaging in curatorial work even if they did not refer to it as such. In turn, by eliciting and documenting their observations about these platforms, the team was amassing a knowledge base about legacy systems from a curator’s perspective, which could prove useful in future decision-making processes tied to application development and in the identification of best practices for curatorial workflows. Additionally, any prototype development resulting from the platform review and the pilot project would automatically have stakeholders with whom to consult and whose regular feedback would inform meaningful iterations of the prototype; that is, a side effect of the assessment and pilot projects was the construction of a local community around curatorial practices, a community that was crucial for driving the development of the CAPS project.
The platform review, which the newly hired Digital Library Architect and Digital Collections Curator conducted over several months in early 2010, investigated the following systems: CONTENTdm, used mainly for image collections; DPubS, for open-access journals and monographs; ETD-db, for electronic theses and dissertations; and Olive ActivePaper Archive, for digitized historic newspapers.
Through demonstrations of these platforms and through interviews with colleagues who use them, the architect and curator found many deficiencies. For example, because each of these systems requires different workflows, different training, and different back-end technologies, they are effectively siloized. Perhaps even more importantly, each application emphasizes content delivery over content management, so management must be manually coordinated across heterogeneous work environments. Management of digital content is thus handled partially in the aforementioned applications, partially in assorted filesystems, and partially in personal spreadsheets and databases. In addition, the review revealed gaps in the types of content being curated, the foremost examples of which were electronic business records, which the university is legally mandated to preserve, and research data, for which the National Science Foundation and other funding agencies require management plans in proposals for grant funding.
The review validated collective suppositions about these platforms and highlighted a couple of advantages to them. It confirmed what had been suspected from the start: without a centralized or unified architecture to curate digital content, the Libraries’ systematic management of such content is haphazard, burdensome, insufficient, and unsustainable -- particularly if the Libraries are to continue scaling their efforts up and out. At the same time, findings from the review suggested that having widely used legacy applications has proved valuable in the following ways. First, they presented an opportunity to analyze the gap between curators' needs and the functionality provided by the applications. Second, they created a base of potential users for next-generation curation systems -- namely “curator types” whose needs are not currently met by legacy delivery applications.
Following the platform review, a group consisting of the architect, the curator, a software developer, and an archivist from the Special Collections Library explored tools based on curation microservices. They focused their investigation particularly on whether microservices-based tools and specifications could help Penn State curators manage digital content usually destined for the four digital content delivery applications, thereby addressing the Libraries’ need to manage content in a more programmatic and less siloized way.
The team also endeavored to understand more clearly how curators at Penn State do their work; that is, the practice of digital curation needed to be situated within the Penn State University Libraries’ operational context. To this end the team gathered from curators a set of use cases that painted a clear, if necessarily limited, picture of curatorial practices and needs at Penn State. Examples of use cases included automation of naming conventions for electronic records items, automated extraction and generation of non-descriptive metadata (such as preservation and technical metadata), and migration workflows that transfer data on optical media to networked storage and support scheduled fixity verifications.
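One of these use cases, automated generation of non-descriptive metadata, can be illustrated with a minimal sketch. It is not taken from the CAPS code base; the function name `extract_technical_metadata` and the record keys are assumptions chosen for illustration, and only the Python standard library is used.

```python
import hashlib
import mimetypes
from pathlib import Path


def extract_technical_metadata(path: Path) -> dict:
    """Return a small technical metadata record for one file (hypothetical keys)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # The MIME type is guessed from the file extension only, not from content.
    mime_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "filename": path.name,
        "size_bytes": path.stat().st_size,
        "mime_type": mime_type or "application/octet-stream",
        "sha256": digest.hexdigest(),
    }
```

A record of this kind, stored at ingest, is also what a later scheduled fixity check would compare against: the checksum is recomputed and matched to the stored value.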
The scope of the microservices pilot project was soon expanded to include building a web-based curation services platform prototype, called Curation Architecture Prototype Services (CAPS), with requirements driven by the curatorial use cases gathered in the pilot project phase. The prototype would include building an architecture (or platform) for curation microservices and a web-based curatorial tool for ingest and management of digital objects, supporting commonly needed operations such as identification, verification, description, versioning, auditing, and storage. The pilot project team expanded to include the Libraries' metadata librarian and a project manager with experience creating web applications. In addition, the curators the team interviewed in the two earlier projects became stakeholders for CAPS. They would spur the iterative development of the project, which in turn prepared them for testing the completed prototype.
An aspect of CAPS that exemplifies progression of a community of practice is its deployment of microservices, a technical approach that leverages existing code and specifications. The curation microservices model is based on principles of service-oriented architecture, wherein independent curation functions form a platform. Abrams et al. (2011) describe micro-services as “the decomposition of repository function into a highly granular orthogonal set of independent but interoperable components that can be freely composed in strategic combinations towards user ends.” The separation of these functions lends itself to code reuse as well as flexibility at the application layer; an application may invoke curation services that are appropriate to its features and ignore those that are not.
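The idea of composing independent curation functions can be sketched as follows. This is an illustration of the general pattern under stated assumptions, not the CAPS implementation: `DigitalObject`, the three service functions, and `compose` are all hypothetical names.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical in-memory stand-in for an ingested object."""
    content: bytes
    metadata: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def identify(obj: DigitalObject) -> DigitalObject:
    # Assign an identifier only if the object does not already have one.
    obj.metadata.setdefault("identifier", str(uuid.uuid4()))
    obj.events.append("identify")
    return obj


def record_fixity(obj: DigitalObject) -> DigitalObject:
    obj.metadata["sha256"] = hashlib.sha256(obj.content).hexdigest()
    obj.events.append("fixity")
    return obj


def describe(title: str):
    """Build a description service for a given title."""
    def service(obj: DigitalObject) -> DigitalObject:
        obj.metadata["title"] = title
        obj.events.append("describe")
        return obj
    return service


def compose(*services):
    """Chain independent services into one pipeline, applied left to right."""
    def pipeline(obj: DigitalObject) -> DigitalObject:
        for service in services:
            obj = service(obj)
        return obj
    return pipeline
```

An ingest application might call `compose(identify, record_fixity, describe("Sample title"))`, while an auditing tool might use `compose(record_fixity)` alone; each application invokes only the services it needs.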
A strength of platform-based approaches is that applications can be tailored to institution- and user-specific needs -- as opposed to applications in which the user interface is tightly coupled to back-end functionality -- while behind-the-scenes functionality can be based on open code and specifications developed by the curation community. Leveraging code and specifications developed elsewhere aligned Penn State's efforts with those of peer institutions in the curation community and supported the notion of relay-supporting archives (Janée et al. 2009), allowing successor archives to make sense of content ingested via CAPS. “This results in a curation environment that is comprehensive in scope, yet flexible with regard to local policies and practices and sustainable despite the inevitability of disruptive change in technology and user expectation” (Abrams et al. 2010).
The CAPS project may be viewed as building yet another repository system from scratch, and to some extent this is a fair criticism, though it fails to account for the application's layered approach. (See Figure 1.) While CAPS was not built atop widely deployed repository system software, it was built atop widely deployed components (open-source code and open-community specifications), each of which was abstracted in the code and thus replaceable by other components. The ability to mix, match, and swap small components reflects a key benefit of the platform model. “Curation goals are better served by concentrating on long-lived content sustained by a constantly evolving repertoire of nimble, commodified services” (Abrams et al. 2009).
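What it means for a component to be abstracted and thus replaceable can be shown with a short sketch. The interface and both back ends below are hypothetical and are not drawn from CAPS; they only illustrate how application code can depend on an abstraction rather than on one concrete component.

```python
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical interface that application code depends on."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Keeps objects in a dictionary; convenient for tests."""
    def __init__(self) -> None:
        self._items = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


class DirectoryStorage:
    """Writes each object to a file under a root directory."""
    def __init__(self, root: Path) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def ingest(storage: StorageBackend, key: str, data: bytes) -> bytes:
    """Store an object and read it back, without knowing which back end is used."""
    storage.put(key, data)
    return storage.get(key)
```

Because `ingest` refers only to the interface, swapping `InMemoryStorage` for `DirectoryStorage` (or for some other back end with the same two methods) requires no change to the calling code.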
Figure 1. CAPS Architecture diagram.
All of the software libraries and tools that power CAPS, such as Python, Git, Django, jQuery, and MySQL, are released under open-source licenses. Building the platform and application upon widely used open-source projects has the benefit of aligning development efforts with the broader technology community. The projects' large developer bases allowed the development team to find answers when implementation difficulties were encountered; because these tools are used broadly both within and without the library community, the issues encountered had already been raised, documented, and, in most cases, resolved by other developers.
Another benefit of working with open-source components, especially those from the curation community, was rapid identification and resolution of software deficiencies. In one instance, namely developing with the Python-based BagIt library, the team discovered a bug that prevented CAPS from completing file fixity verifications. The team reported the bug to the BagIt library's developer and within 48 hours the code had been patched, tested, and deployed publicly for the benefit of all developers using the code. Although this is but one anecdote, it validated the team's decision to work with open-source technologies and open specifications, and more importantly demonstrated the agility of the community developing such technologies.
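For readers unfamiliar with what a file fixity verification involves in a BagIt context, the following sketch checks payload files against a bag's checksum manifest. It deliberately uses only the Python standard library rather than the BagIt library mentioned above, and it is not a full bag validation: it assumes a manifest named `manifest-<algorithm>.txt` whose lines each hold a checksum followed by a path relative to the bag directory, and it ignores tag files and files absent from the manifest. The function name `verify_manifest` is an assumption.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir: Path, algorithm: str = "sha256") -> list:
    """Return the manifest paths that are missing or whose checksum differs."""
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<checksum> <relative path>"; the path may contain spaces.
        expected, relative_path = line.split(maxsplit=1)
        target = bag_dir / relative_path
        if not target.is_file():
            failures.append(relative_path)
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty list means every file named in the manifest is present and unchanged; any returned path indicates a file that should be investigated.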
Not only did the CAPS project allow the team to use more open-source software, but it also gave the team experience collaborating with peer institutions on such software, including the BagIt library. In addition, it presented the team with the opportunity to contribute directly to open-source software communities: coding and testing of CAPS occurred entirely out in the open on GitHub, a social coding platform.
Reinforcing the theme of direct community engagement in building the CAPS software, stakeholders were engaged from the start of development in determining what metadata standards would be supported in the CAPS prototype. Designing a basic data dictionary for the CAPS project involved discussions with stakeholders to determine their curation needs and the development of a metadata framework that was easily extensible but still allowed basic content description and access services to function.
Stakeholders and members of the project team began by suggesting use cases to which metadata would apply; examples included applying retention and records management policies to digital resources, generating and documenting preservation events, and providing descriptive information about the resource.
The Dublin Core Metadata Element Set (DCMES) was used to meet basic descriptive metadata needs, due to its simplicity, to stakeholder familiarity with it as a descriptive metadata schema, and to its use in CONTENTdm, a system widely used by Penn State curators. In addition to Dublin Core, a handful of technical metadata fields, mapped to various elements within the PREMIS and MIX metadata standards, were defined in the CAPS data dictionary. The data dictionary was compiled through close collaboration with stakeholders in the University Libraries, analyzing in particular the curation and preservation data needs of units and employees responsible for these functions.
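The shape of such a data dictionary can be sketched in a few lines. The fifteen DCMES element names are those of the standard, but the technical field names on the left of the mapping and their pairing with particular PREMIS and MIX elements are assumptions made for illustration; the actual CAPS data dictionary is not reproduced here.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DCMES_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical local field names mapped to (standard, element) pairs.
TECHNICAL_FIELDS = {
    "file_size": ("PREMIS", "size"),
    "checksum": ("PREMIS", "messageDigest"),
    "checksum_algorithm": ("PREMIS", "messageDigestAlgorithm"),
    "format_name": ("PREMIS", "formatName"),
    "image_width": ("MIX", "imageWidth"),
    "image_height": ("MIX", "imageHeight"),
}


def undefined_fields(record: dict) -> list:
    """Return, in sorted order, the record keys the dictionary does not define."""
    return sorted(
        name for name in record
        if name not in DCMES_ELEMENTS and name not in TECHNICAL_FIELDS
    )
```

Extending a dictionary of this kind amounts to adding an entry to the mapping, which is one way to keep a framework extensible while basic description continues to rely on the fixed Dublin Core elements.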
4. Agile Project Management
A methodology inspired by Agile software development (Highsmith & Cockburn, 2001) was used for project management and development in order to accomplish the objectives of this project within the short time frame. Applying agile principles to the development process, the development team involved stakeholders throughout the development of the prototype, rather than merely at the beginning and end as happens typically in traditional (e.g., "waterfall") project management practices. Development goals and scope evolved organically by way of stakeholder input, week after week.
The agile process workflow utilized for CAPS may be characterized as follows:
- Design and develop software (daily)
- Share progress and discuss blocking issues with development team (daily)
- Share deliverables with stakeholders (weekly)
- Elicit stakeholder feedback (weekly)
- Define deliverables for following week (weekly)
Several critical factors enabled the agile methodology to work successfully for this project. From the time the project was chartered, a strong communication plan was executed by the development team and the stakeholders. Most importantly, key managers and administrators fully supported the project and cleared the way for team members to devote the necessary time to do the work without competing priorities. They also ensured that all of the skills necessary to successful completion of the project were represented on the team.
Additionally, the team was empowered by supervisors and administrators to make provisional decisions in order to get the work done. This allowed work to progress without waiting for higher-level review and approval. Without this level of administrative support and commitment, the project could easily have faltered. Instead, a strong group commitment to the project was fostered -- another key factor in the project's success and in the successful use of agile methodology.
The development team spent a significant amount of time between weekly meetings coding for CAPS, which in turn informed the structured stakeholder involvement that was crucial for guiding the direction of the project. The previous platform review and pilot projects had also given the team an idea of what to show stakeholders to spark discussion with them. Developers began by mocking up, with a professional graphic designer, a series of potential user interface designs displaying the intended functionality in CAPS, and discussions soon took shape around user requirements, desired functionality, and changes to current workflows. Stakeholder feedback from these discussions guided CAPS development for each week of the project.
An agile methodology was effective for CAPS primarily because stakeholders set priorities for development in real time and had a strong sense of what was possible to accomplish week to week. Decisions on what to develop were based on issues that were the most important to stakeholders at each step in the process.
This project management approach could have led to an unmanageable task list, but since all parties were involved in the process, there was a shared understanding of what was reasonable to accomplish within the project's timeline. The team communicated the importance of acknowledging all stakeholder requirements, further building goodwill: each requirement was documented, whether or not it could be accomplished during the prototyping phase, so that future phases of development would be able to draw on the postponed requirements. The CAPS project thus not only enabled continuity with stakeholders but also showed the development team’s commitment and responsiveness to curatorial needs.
Constant stakeholder engagement throughout the development phase, during which input was encouraged and documented, ensured that when the prototype was ready for testing, bug reporting, and design feedback, not only would stakeholders already be familiar with the prototype and the range of its functionalities, they also would have a vested interest in seeing the realization of a successful working example of a curatorial tool.
For the short, informal assessment phase following completion of the prototype, the development team set up a wiki for stakeholders to report their experiences with the prototype and its design, according to the following categories -- which pertain to the types of views available in the tool (see Figure 2):
- Dashboard view
- Ingest view
- Management view
Figure 2. CAPS Dashboard View
Stakeholders provided feedback for all but the final three categories (which, in retrospect, likely required having a period of time longer than a week in which to test the prototype's management of digital objects). On the wiki, stakeholders expressed general satisfaction with the dashboard interface and pointed to instances where object and batch ingest functionalities warranted fixes and enhancements, affording the project team a logical point of departure for the next phase of development.
The assessment phase included a short survey for stakeholders to complete, evaluating their experience with the project's process and the CAPS tool; four out of the six stakeholders took the survey. To the question, "The CAPS project team did a good job of listening to my concerns as a stakeholder," all respondents said "Strongly agree." Most respondents were also pleased with the frequency and content of communication (including meeting weekly and having questions answered by the development team clearly, directly, and in a timely fashion); with the prioritization of stakeholder requirements; and with the mock-ups of the project deliverables, which half the stakeholders strongly agreed reflected their priorities for the prototype.
6. A Community of Curatorial Practice as an Outcome
A key outcome of the CAPS project was the genesis of community building around curatorial practices at Penn State. The project goals reflect multiple levels of collaborative participation: in particular, the collaboration of the development team and the collaboration of stakeholders -- a combination of subject specialists, archivists, and technical staff. Through these collaborations, a community of practice around digital curation has taken shape. Curators at Penn State are more informed about curation tools, specifications, and architectures and increasingly have a vested interest in continuing to participate in their development at Penn State. In turn, the development team has a deeper understanding of the curatorial needs of these colleagues. Because stakeholders were closely involved in the development of the prototype, they became instant test users of the resulting platform.
More critically, by building a community of curatorial practice, work is beginning to take place across department- and system-specific boundaries, which bodes well for both curators and developers. Curatorial workflows have the potential to be more uniform, and more closely aligned, when not bound to siloized systems, and crossing such boundaries brings these possibilities to light. The inherent flexibility and interoperability of a platform-based approach mean that the work of curators need not be confined to a particular application, and the same can be said of software developers, who are not only working on a platform based on the curatorial requirements of their users but also drawing on code that is openly available for distribution, sharing, and re-purposing.
Since the platform-based approach is emerging as an alternative to monolithic full-stack solutions, there has not been much reporting or documentation on the experience of curators using tools based upon it. Departing from this custom, the CAPS project has been particularly attentive to capturing what stakeholders think of the iteratively developed prototype, an approach the team intends to continue in future phases and hopes to make more widely known by reporting on the team's experiences at conference venues and via other community channels. Documenting the process of stakeholder engagement, such as user testing, use cases, and discussions with developers, and then sharing reports about these activities with the larger digital curation community could serve as one model for how to manage and leverage user engagement in application prototyping and development.
7. Next Steps
The CAPS project proved the viability of developing a prototype curation tool in a very short time frame, with direct involvement of stakeholders, using principles of agile development. Yet community building, particularly when centered on a new idea within a highly bureaucratic organization such as a large research library, comes with many challenges, such as competing projects and priorities, the need for new and continuing resources (both financial and human), and the need to maintain the momentum and goodwill engendered by the CAPS experience. To address this last challenge, the team has established monthly curator roundtable lunches, which will be attended not only by the development team and stakeholders but also by other Penn State employees with curatorial responsibilities on campus, such as a data archivist at one of Penn State's research institutes.
In addition, many of the principles the team put to use in engaging with stakeholders apply equally to the activity of engaging end users, such as faculty and students. Most of the ideas for additional pilot projects and for additional features are based on use cases that have been documented and gathered from consultations with users outside the project and from discussions with stakeholders, who interact heavily with faculty and students. The CAPS team intends, through future phases of development including the design of a public interface, to understand the needs and requirements of end users by engaging them regularly as well.
Since the conclusion of the prototype development phase, the CAPS team has begun preliminary planning for the next phase of the project. The team will plan and execute the development of a production-ready, enterprise-quality platform to support publishing and curation services at Penn State. Project planning for this phase will involve work in the areas of software development, infrastructure (storage and security), and project management, and will include many of the principles of agile development and quality assurance testing detailed in the CAPS development process. Additional metadata development goals, such as documentation of preservation events and the integration of user annotations into the metadata record for a digital object, will be initiated or, where work on them already exists, expanded upon.
Above all, engagement with stakeholders will continue to be a key priority as it has become part of the culture, as will expanding the stakeholder community to include users outside the libraries, such as staff at departments and offices on campus generating electronic records; scholarly end users of digital library collections; and faculty researchers in need of research data curation support.