Data Curation and the ADR-Round Table (10/16/09)
Written by Rose Nelson   
Monday, 26 October 2009
Attendees:

Holley Long-CU Boulder, Dawn Paschal-CSU, Jack Maness-CU Boulder, Thea Lindquist-CU Boulder, Joe Kraus-DU, Patricia Andersen-CSM, Megan Tomeo-CSM, Jeff Kuntzman-Health Sciences, Chris Brown-DU, Jessica Branco Colati-Alliance, Rose Nelson-Alliance, George Machovec-Alliance

Conference call: Barb Losoff-CU Boulder, Carol Ou-Colorado College, Matt Gottfried-Colorado College, Katie Lage-CU Boulder, Dennis Moser-UWYO, Gabby Weirsma-CU

The ADR has been charged with supporting datasets, and this is the impetus behind the round table. The conversation centered on the following four topics:

· What types of datasets are libraries currently responsible for, or interested in, curating?
· Current and anticipated relationships between faculty/researchers and the library
· Access needs, storage needs, and functional requirements of dataset tools
· Mandates from campuses and funding agencies

Staff from each library discussed how datasets are being handled at their institutions.

Health Sciences: The library doesn't yet have specific information about datasets but anticipates being involved with dataset curation. The NIH OA initiative will be a large impetus in the creation of datasets. HIPAA may prevent Health Sciences from storing, or at least from making accessible, some data. Datasets can be archived and kept "dark." The ADR has encryption, which may satisfy HIPAA requirements.

CSU: Lots of data is being generated and stored in various departments, and there is a need for curation. Lots of GIS data. Atmospheric science has datasets. An NSF grant involves signal processing and trace data. Proteomics and genomics are generating lots of data. There is interest in legacy data. Faculty are writing articles with attached data that could become datasets.

There is no mandate to store datasets, but CSU submitted a grant proposal to NSF to purchase hardware to support data curation in its repository. Some grant funds will be allocated to training faculty on digital rights issues regarding datasets.

DU: subscribes to datasets through ICPSR. It also has departmental datasets from political science and some hard sciences, as well as lots of GIS datasets. There is some interest in having the library curate these datasets; however, some faculty are concerned about ceding control of their datasets to the library. Librarians have had some informal discussions with faculty about datasets, but no formal policy or discussion has occurred.

CU: anticipates the need for curating datasets and has started interviewing faculty members about their datasets. Faculty don't yet see the library's role in dataset curation. A lot of this departmental data is proprietary and restricted, so access issues will be central to the discussion. Some faculty prefer a disciplinary repository for their data. Librarians anticipate that discussions with engineers may be even more challenging. The library needs to clarify for faculty how it can play a role in curation, data protection, and restricting access.

Departments at CU vary widely in how they preserve and access their data. Some departments do sophisticated backups and curation; others back up data on their personal computers.

There is also an issue with information that is never recorded and exists only in one person's head. What happens when that person leaves? This also raises the question of who owns the data: if a researcher leaves, does the data belong to the faculty member or to CU?

Grant initiatives may increase the need for curation and for collaborative work with the library on it.

Datasets are growing, e.g., weather event datasets.

Humanities: economic and population history. There may be some technological barriers. One example is an early modern English diplomats dataset, whose text is encoded with markup so that it can be searched for word patterns and usages. When did certain words evolve?

Challenge: data comes in many different formats, with little uniformity.

CSM: currently concentrating on thesis submission, including datasets attached to theses. Trying to determine which datasets enhance the theses and which are necessary to them.

UWYO: just getting started with datasets. The library is an information partner in a geography project that is looking at preservation of geographic data on a large scale.

Linguistic data: large sets of text (a linguistic corpus). The project focuses on the geomap portions first and is a collaborative effort among three departments on campus.

CC: has not actively curated datasets, but sees the ADR as a possibility for data curation and sees some pilot opportunities for GIS datasets. Thinking of archiving a GIS dataset as a learning object. The challenge is getting people on board to add their metadata. Plan to use XML to convert the GIS dataset metadata to MODS.
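
The XML-to-MODS conversion could look roughly like the sketch below. It assumes the GIS metadata arrives as FGDC CSDGM XML (an assumption; the notes do not name the source schema), maps only a handful of fields, and `fgdc_to_mods` is a hypothetical helper, not part of the ADR or of any MODS tool.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def fgdc_to_mods(fgdc_xml: str) -> ET.Element:
    """Hypothetical crosswalk: map a few FGDC CSDGM fields to a minimal MODS record."""
    src = ET.fromstring(fgdc_xml)  # assumed root element: <metadata>
    mods = ET.Element(f"{{{MODS_NS}}}mods")

    def sub(parent, tag, text=None):
        # Create a MODS-namespaced child element, optionally with text.
        el = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
        if text is not None:
            el.text = text
        return el

    title = src.findtext("idinfo/citation/citeinfo/title")
    if title:
        sub(sub(mods, "titleInfo"), "title", title.strip())

    # FGDC <origin> (originator) becomes a MODS <name>.
    for origin in src.findall("idinfo/citation/citeinfo/origin"):
        sub(sub(mods, "name"), "namePart", (origin.text or "").strip())

    pubdate = src.findtext("idinfo/citation/citeinfo/pubdate")
    if pubdate:
        sub(sub(mods, "originInfo"), "dateIssued", pubdate.strip())

    abstract = src.findtext("idinfo/descript/abstract")
    if abstract:
        sub(mods, "abstract", abstract.strip())

    # Mark the resource as cartographic (geospatial) material.
    sub(mods, "typeOfResource", "cartographic")
    return mods


if __name__ == "__main__":
    # Hypothetical sample record, for illustration only.
    sample = """<metadata><idinfo>
      <citation><citeinfo>
        <origin>Example Department</origin>
        <pubdate>2009</pubdate>
        <title>Example land cover layer</title>
      </citeinfo></citation>
      <descript><abstract>Hypothetical sample record.</abstract></descript>
    </idinfo></metadata>"""
    print(ET.tostring(fgdc_to_mods(sample), encoding="unicode"))
```

A real crosswalk would cover many more elements (keywords, spatial extent, contacts); the point is that stakeholder-supplied metadata only has to be entered once and can then be transformed.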

Metadata: ask stakeholders for advice on metadata. It is very helpful to get them to label the work.

Questions/Challenges: Data Curation

How do you curate something that is continuously changing (e.g., weather data)? Do you make incremental backups, or deposit on a yearly cycle? How do you set up protocols for these deposits? These are good questions to ask. The ADR may serve as a backup, but while research is ongoing, the active curation may take place elsewhere.
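
For continuously changing data, one minimal way to support incremental deposits instead of a single yearly snapshot is to keep a checksum manifest of each deposit and compare it with the current state of the files. This sketch assumes file-based data in one directory tree; `build_manifest` and `incremental_changes` are hypothetical names, and nothing here describes how the ADR actually ingests deposits.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large data files are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def incremental_changes(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and list what a new incremental deposit would need to contain."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    modified = sorted(k for k in set(current) & set(previous) if current[k] != previous[k])
    return {"added": added, "modified": modified, "removed": removed}
```

Storing each manifest alongside its deposit gives a record of what changed between versions, which is one possible input to the deposit protocol the group asked about.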

There is an expectation that the library will be responsible not only for archiving the data but also for making it accessible with viewers. The library can support some formats, but not all. Text is a standard format that can be viewed with several applications, but other formats have no standard viewer. Some repositories include a statement or disclaimer that they support certain data formats and not others.

Good cataloging practice is to include software version and technical specs in metadata so even if the library doesn't support the particular viewer, someone accessing the data will know what they need to view it.
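
As a sketch of that cataloging practice, the record below carries the software name, version, and technical specs with each dataset, and a helper uses them to tell a user what they need when the repository cannot display the format. The field names, the `SUPPORTED_FORMATS` set, and `access_note` are hypothetical illustrations, not an ADR schema.

```python
from dataclasses import dataclass, field

# Hypothetical list of formats a repository might state it can display.
SUPPORTED_FORMATS = {"text/plain", "text/csv", "application/pdf"}


@dataclass
class DatasetRecord:
    title: str
    mime_type: str
    software: str = ""          # software needed to read the data
    software_version: str = ""
    technical_specs: dict[str, str] = field(default_factory=dict)


def access_note(record: DatasetRecord) -> str:
    """Say whether the repository can display the file and, if not, what the user needs."""
    if record.mime_type in SUPPORTED_FORMATS:
        return f"{record.title}: viewable in the repository ({record.mime_type})."
    needs = " ".join(part for part in (record.software, record.software_version) if part)
    if not needs:
        return f"{record.title}: format {record.mime_type} is not supported and no software is recorded."
    specs = ", ".join(f"{k}={v}" for k, v in record.technical_specs.items())
    note = f"{record.title}: format {record.mime_type} is not supported; requires {needs}"
    return note + (f" ({specs})." if specs else ".")


if __name__ == "__main__":
    # Hypothetical record; the software name and specs are placeholders.
    rec = DatasetRecord("Example GIS layer", "application/zip",
                        "ExampleGIS", "2.1", {"projection": "example"})
    print(access_note(rec))
```

The last branch shows why recording the software matters: without it, the note can only say the format is unsupported.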

Long-term access: doesn't necessarily come up with faculty, but it is a big question. It is difficult to achieve standardization across everything.

It's important to curate data so that we understand our past and can build for the future. There is also the challenge of figuring out how to preserve such large datasets and of determining when data can be purged. These are policy issues: what to curate, and when to purge? Perhaps the question is not when to purge the data, but when to offload it.

Resources on Datasets:

Michael Witt
Institutional Repositories and Research Data Curation in a Distributed Environment
Library Trends - Volume 57, Number 2, Fall 2008, pp. 191-201
E-ISSN: 1559-0682 Print ISSN: 0024-2594
DOI: 10.1353/lib.0.0029

UKOLN Digital Preservation Publications
http://www.ukoln.ac.uk/preservation/publications/

Joe Kraus-Resources
http://delicious.com/jokrausdu/data_curation

Stella Conference
  http://denver-stella.pbworks.com/
  http://denver-stella.pbworks.com/Attendee-List

© 2009 Colorado Alliance of Research Libraries