Data Curation and the ADR-Round Table (10/16/09)
Attendees:
Holley Long-CU Boulder, Dawn Paschal-CSU, Jack Maness-CU Boulder, Thea Lindquist-CU Boulder, Joe Kraus-DU, Patricia Andersen-CSM, Megan Tomeo-CSM, Jeff Kuntzman-Health Sciences, Chris Brown-DU, Jessica Branco Colati-Alliance, Rose Nelson-Alliance, George Machovec-Alliance
Conference call: Barb Losoff-CU Boulder, Carol Ou-Colorado College, Matt Gottfried-Colorado College, Katie Lage-CU Boulder, Dennis Moser-UWYO, Gabby Weirsma-CU
ADR has been charged with being able to support datasets. This is the impetus behind the round table. The conversation centered around the following four questions:
- What types of datasets are libraries currently responsible for or interested in curating?
- Current and anticipated faculty/researcher and library relationships
- Access needs, storage needs, and functional requirements of dataset tools
- Mandates from campuses and funding agencies
Staff from each library discussed how datasets were being dealt with in their institution.
Health Sciences-Don't have specific information about datasets but anticipate that the library will be involved with curation of datasets. NIH OA initiative will be a large impetus in creation of datasets. HIPAA might prevent Health Sciences from storing or at least making accessible some data. Datasets can be archived and kept "dark". There is encryption on the ADR, so this may satisfy HIPAA requirements.
CSU-lots of data being generated in various departments. Need for curation. Data is being stored in various departments. Lots of GIS data. Atmospheric science has datasets. NSF grant-signal processing and traces data. Proteomics and genomics-generating lots of data. Interest in legacy data. Faculty generating articles with attached data that could be datasets.
No mandate to store datasets but submitted a grant to NSF to purchase hardware to support data curation in their repository. Some grant funds will be allocated to training faculty on digital rights issues in regards to datasets.
DU-subscribes to datasets through ICPSR. They also have departmental datasets from political science and some hard sciences. There are also lots of GIS datasets. There is some interest in having the library curate these datasets. However, there are also some concerns from faculty about ceding control of datasets to libraries. Librarians have had some informal discussions with faculty about datasets, but no formal policy or discussion has occurred.
CU-anticipates the need for curating datasets. They have started to interview faculty members about their datasets. Faculty don't yet see the role of the library in dataset curation. A lot of this departmental data is proprietary and restricted, so access issues will be central to the discussion. Some faculty prefer a disciplinary repository for their data. Librarians anticipate that discussions with engineers may be even more challenging. Need to clarify for faculty how the library can play a role in curation and in protecting data and restricting access.
Departments at CU are across the board in how they preserve and access their data. Some departments do sophisticated backups and curation; others back up data on their personal computers.
There's also an issue with information not being recorded and existing only in one person's head. What happens when that person leaves? This also leads to the issue of who owns the data. If a researcher leaves, who owns the data, the faculty member or CU?
Grant initiatives may drive more need for curation and collaborative work with the library in doing this.
Datasets are growing, e.g. weather event datasets.
Humanities-economic and population history. There may be some barriers with technology. Early modern English diplomats dataset. Searching and encoding markup of text. Can search for word patterns, usages. When did certain words evolve?
Challenge-all different types of formats. Not really any uniformity.
CSM-right now concentrated on thesis submission. Datasets attached to theses. Trying to find out what datasets enhance the theses or are necessary to the theses.
UWYO-just getting started with datasets. Are an information partner in a Geography project. Looking at preservation of geographical data on a large scale.
Linguistic data-large sets of text (linguistic corpus). Focuses on geomap portions first; collaborative effort among 3 departments on campus.
CC-haven't actively curated datasets, but see the ADR as a possibility for data curation. See some pilot opportunities for GIS datasets. Thinking of archiving a GIS dataset as a learning object. Getting people on board to add their metadata is the challenge. Use XML to convert the GIS datasets to MODS.
Metadata-ask stakeholders for advice on metadata. Very helpful to get them to label the work.
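As an illustration of the GIS-to-MODS idea in the CC notes, here is a minimal Python sketch that builds a small MODS record from descriptive metadata supplied by a dataset's creator. It is not the workflow CC described; the function name gis_to_mods, the input field names, and the sample values are all hypothetical, and only a handful of common MODS elements are used.

```python
import xml.etree.ElementTree as ET

# MODS version 3 namespace published by the Library of Congress.
MODS_NS = "http://www.loc.gov/mods/v3"


def gis_to_mods(record):
    """Build a minimal MODS record from a dict of GIS dataset metadata.

    The dict keys (title, creator, abstract, place, date) are assumptions
    made for this sketch, not a standard GIS metadata schema.
    """
    ET.register_namespace("mods", MODS_NS)

    def el(parent, tag, text=None):
        # Create a namespaced child element, optionally with text.
        child = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
        if text is not None:
            child.text = text
        return child

    mods = ET.Element(f"{{{MODS_NS}}}mods")
    el(el(mods, "titleInfo"), "title", record["title"])
    el(el(mods, "name"), "namePart", record["creator"])
    el(mods, "typeOfResource", "cartographic")
    el(mods, "abstract", record["abstract"])
    el(el(mods, "subject"), "geographic", record["place"])
    el(el(mods, "originInfo"), "dateCreated", record["date"])
    return ET.tostring(mods, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(gis_to_mods({
        "title": "Example County Roads",
        "creator": "Example GIS Lab",
        "abstract": "Road centerlines for a sample county.",
        "place": "Example County",
        "date": "2009",
    }))
```

In practice the input would more likely come from the dataset's existing geospatial metadata, which is where the stakeholder labeling mentioned above would help.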
Questions/Challenges-Data Curation
How do you curate something that is continuously changing (e.g. weather data)? Do you have incremental backups or do it on a yearly process? How do you set up protocols for these deposits? This is a good question to ask. ADR may serve as a backup, but your research is ongoing and the curation may be in a different place.
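One possible way to think about incremental deposits of a changing dataset is a checksum manifest: record a digest for every file at each deposit, then compare manifests to see which files are new or modified. The Python sketch below is only an illustration under that assumption; the function names are hypothetical and it does not describe how the ADR handles deposits.

```python
import hashlib
from pathlib import Path


def checksum(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(previous, current):
    """List files that are new or whose checksum differs from the last deposit."""
    return sorted(
        name for name, digest in current.items()
        if previous.get(name) != digest
    )
```

Only the files returned by changed_files would need to go into the next incremental deposit; a yearly full deposit would simply store everything in the current manifest.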
There is an expectation that the library will be responsible for not only archiving the data, but making it accessible with viewers. Library can support some formats but not all. Text is a standard format that can be viewed by several applications, but there are other formats that don't have a standard viewer. Some repositories have the statement or disclaimer that they support certain data formats and not others.
Good cataloging practice is to include software version and technical specs in metadata so even if the library doesn't support the particular viewer, someone accessing the data will know what they need to view it.
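The cataloging practice above could be sketched as a small technical metadata record that travels with each file. In this Python illustration the field names, the list of supported formats, and the sample values are all hypothetical; a real repository would use its own schema and format policy.

```python
from dataclasses import dataclass

# Hypothetical list of formats the repository can display directly.
SUPPORTED_FORMATS = {"text/plain", "text/csv", "application/pdf"}


@dataclass
class TechnicalMetadata:
    """Technical details a user needs in order to open a deposited file."""
    filename: str
    mime_type: str
    software: str
    software_version: str


def access_note(meta):
    """Say whether the repository can display the file, or what is needed to view it."""
    if meta.mime_type in SUPPORTED_FORMATS:
        return f"{meta.filename}: viewable in the repository."
    return (
        f"{meta.filename}: no repository viewer; "
        f"open with {meta.software} {meta.software_version}."
    )
```

A record for a file in an unsupported format would then still tell the user which software and version produced it, which is the point of recording those details.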
Long term access-doesn't come up with faculty necessarily, but it's a big question. It's difficult to get the standardization across everything.
It's important to curate data so that we understand our past and can build on to the future. There is also the challenge of figuring out how to preserve such large sets of data and determining when data can be purged. These are policy issues-what to curate, when to purge? Perhaps the question isn't to purge the data, but when to offload it.
Resources on Datasets:
Michael Witt
Institutional Repositories and Research Data Curation in a Distributed Environment
Library Trends - Volume 57, Number 2, Fall 2008, pp. 191-201
E-ISSN: 1559-0682 Print ISSN: 0024-2594
DOI: 10.1353/lib.0.0029
UKOLN Digital Preservation Publications
http://www.ukoln.ac.uk/preservation/publications/
Joe Kraus-Resources
http://delicious.com/jokrausdu/data_curation
Stella Conference
http://denver-stella.pbworks.com/
http://denver-stella.pbworks.com/Attendee-List
