April 1, 2010

Digital Curation Planning One-on-One Unit Interviews

During January, February, and March 2010, members of the MSU Digital Curation Planning Team held one-on-one meetings with selected respondents to our October 2009 baseline data questionnaire. Because we could not interview every survey participant, we focused on units that reported using content management systems and/or digital repositories. These units would have concentrations of digital content that they are trying to manage and/or preserve. They were likely to be open to help and suggestions that we might offer, and might already have solutions in place that could be of use to other units. We were also interested in talking to units that created digital content documenting MSU history in some way and that might have sent non-digital content to the MSU Archives in the past. Units interviewed included:

The meetings were structured as informal, two-hour conversations rather than formal interviews and were held at the unit offices. Discussions included how the digital content to be preserved relates to the mission of the unit; whether it was of ongoing use or of archival value—that is, whether it documents the activity of the unit or the university; the file formats used; and the storage infrastructure, including any space issues. Regarding the unit’s content management system and/or digital repository, we asked about the system(s) they use, why they chose it, and how they use it. We also asked about their ingest, archival storage/preservation, and access processes and workflows. Finally, we asked about metadata stored with or related to the content and any file naming conventions.

We are currently analyzing the results of the interviews and have made some general observations to date. Each of the units developed solutions that fit the nature of its data and the needs of its users. Some use commercial applications, and some use open-source software. The Turfgrass Information Center, for example, has long used Cuadra STAR as its database/content management system, and the Department of Theatre uses the relatively new open-source ResourceSpace repository solution. Some units, such as Broadcasting Services, hold digital content of archival value to the university.

The interviewed units exhibited some very positive trends related to preservation and curation. First, most backed up their data in some fashion. Many of them demonstrated good use of metadata, and many were using repository software with very good access and discovery interfaces to manage their content. Importantly, many of the units had strong support from their management and stable funding. Most of the units expressed interest in appraisal and curation guidelines, and they do need some help. Although the units back up their data, the backups tend to be located very close to production servers, often in the same building, if not the same room. Some of the units create only minimal metadata for their digital content, and we found little in the way of digital curation policies.

We are also comparing the metadata schemas of the units interviewed with the Dublin Core set of metadata elements. Six of the ten units interviewed had metadata schemas to share. Three units use metadata schemas based on Dublin Core, with slight variations to reflect local needs. The Department of Art & Art History uses the Image Resource Information System (IRIS) data standard for cataloging and management of art images, and its metadata is based on the Visual Resources Association Core (VRA Core) and the Cataloging Cultural Objects (CCO) guide to good cataloging practices. Physical Plant’s metadata is specified for use in its commercial engineering content management system for managing facilities assets. Finally, the Turfgrass Information Center uses indexing terms it has specified in the Cuadra STAR system for cataloging bibliographic information on all things turfgrass.
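
As a rough illustration of what such a comparison involves, the sketch below checks a unit's field names against the 15 elements of the Dublin Core Metadata Element Set. It is not the team's actual method; the function name and the example field list are hypothetical and do not represent any interviewed unit's schema.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def compare_to_dublin_core(unit_fields):
    """Split a unit's field names into Dublin Core matches and local additions.

    Matching is by exact name after trimming and lowercasing, so a field
    such as "Author" would NOT be mapped to "creator" automatically.
    """
    normalized = {name.strip().lower() for name in unit_fields}
    return {
        "shared": sorted(normalized & DUBLIN_CORE),
        "local_only": sorted(normalized - DUBLIN_CORE),
        "unused_dc": sorted(DUBLIN_CORE - normalized),
    }


# Hypothetical field list for illustration only (an assumption, not real data).
example_fields = ["Title", "Creator", "Date", "Format", "Production Name", "Venue"]
report = compare_to_dublin_core(example_fields)
print("Shared with Dublin Core:", report["shared"])
print("Local additions:", report["local_only"])
print("Dublin Core elements not used:", report["unused_dc"])
```

A name-based match like this only flags candidates; fields that carry the same meaning under a different label still need to be mapped by hand.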

December 8, 2009

Digital Curation Planning Survey Results

The baseline data questionnaire administered recently by the MSU Digital Curation Planning Project team yielded 90 responses: 23 from academic departments, 31 from administrative services units, 9 from research centers, and 27 from technology services units.

The academic departments represented covered a wide range of fields, from agricultural economics, nursing, and veterinary medicine to math and science education, physics and astronomy, telecommunications, business, athletics, and the arts. Likewise, administrative units ranged from the Controllers Office, Inventory/Capital Asset Management, the Office of Planning and Budget, the Office of the President/Board of Trustees, and the MSU Libraries to Broadcasting Services, University Relations, and Virtual University Design and Technology (vuDAT), among others. The research centers included the Cyclotron, the Julian Samora Research Institute, and MATRIX. In contrast, all of the technology services responses came from six units: Academic Technology Services (ATS), Administrative Information Services (AIS), Agriculture and Natural Resources (ANR) Technology Services, Enterprise Business Systems Projects (EBSP), Enterprise Information Stewardship (EIS), and Health Information Technology.

The types of digital content making up the largest proportion of a given unit’s content varied considerably. Digital and scanned photos and images, word processing documents, and research data sets topped several of the academic departments’ lists, while administrative units reported large proportions of paper imaging documents, word processing and spreadsheet documents, and databases. Research data, audio/video, word processing documents, and programming code predominated at the research centers. As might be expected, technology services units noted that most of their digital content consisted of code, databases, and web pages.

File formats comprising the largest proportion of a given unit’s digital content were similarly varied. Among the academic departments surveyed, PDFs, SPSS and SAS statistical formats, TIFFs, JPEGs, MySQL, and Camtasia video formats were all noted. Various database formats, TIFFs, text, and MS Office formats, as well as audio and video formats, predominated at the administrative units. The research centers reported sizeable concentrations of video formats, PHP code, MS Word, and SAS, and the technology services units carry large proportions of text and programming code formats.

In terms of storage, nearly all of the units store digital content on hard drives, and most also use some combination of different types of removable media as well as network storage; one unit even reported storing data on cassette tapes. Seventeen units plan to increase online storage capacity in the near future, most from 1-10 TB, with some planning expansion of up to 50 TB.

Several units have implemented or plan to implement content management system (CMS) and/or digital repository software. CMS solutions noted include SharePoint, FileMaker, Subversion, Alfresco, Trac, Adobe Version Cue, Mura CMS, Drupal, Portfolio Server 9, Madison Digital Image Database (MDID), IRIS, Cascade CMS, Document Viewer, DotNetNuke, Intrafinity, and an internal wiki-based system, as well as other in-house-developed software. Physical Plant uses the Facilities Administration Management Information System (FAMIS). In some cases, the CMS doubles as a unit’s digital repository. Other digital repository solutions in use include KORA, ResourceSpace, the Concurrent Versions System (CVS) and Git version control software, carefully organized web and file servers, and some “homegrown” solutions.

Many of the respondents provided additional comments expressing great interest in and enthusiasm for the digital curation planning project’s goal of establishing naming conventions and other digital curation standards. One administrative unit noted, “This is a timely survey, because our unit is at a point where we HAVE [sic] to choose which data to delete off our servers, as we are accumulating more than we can afford to store. We need university guidelines and related archival resources.” Another asked for guidelines on how to handle archive-worthy files at the time of creation, rather than storing everything up front and subjecting the unit to an arduous appraisal process later. Some respondents also expressed interest in guidance on choosing a digital asset management system.

September 30, 2009

Baseline Data Questionnaire: An Opportunity to Help Preserve MSU’s Digital Assets

Is your MSU department or unit responsible for large quantities of data that document the activities and scholarship of Michigan State University? Are you concerned about the future of your unit’s unique digital assets?

If you would like to participate in this important initiative that will result in effective preservation and management guidelines for the university’s digital information, please click here to fill out a short questionnaire about your digital environment. It should only take approximately ten minutes to complete. The questionnaire will be available through October 16, 2009.

