Data Curation and the ADR-Round Table (10/16/09)
Written by Rose Nelson
Monday, 26 October 2009
Attendees: Holley Long-CU Boulder, Dawn Paschal-CSU, Jack Maness-CU Boulder, Thea Lindquist-CU Boulder, Joe Kraus-DU, Patricia Andersen-CSM, Megan Tomeo-CSM, Jeff Kuntzman-Health Sciences, Chris Brown-DU, Jessica Branco Colati-Alliance, Rose Nelson-Alliance, George Machovec-Alliance
Conference call: Barb Losoff-CU Boulder, Carol Ou-Colorado College, Matt Gottfried-Colorado College, Katie Lage-CU Boulder, Dennis Moser-UWYO, Gabby Weirsma-CU
The ADR has been charged with supporting datasets, which was the impetus for the round table. The conversation centered on the following four questions:
· What types of datasets are libraries currently responsible for or interested in curating?
· Current and anticipated faculty/researcher and library relationships
· Access needs, storage needs, and functional requirements of dataset tools
· Mandates from campuses and funding agencies
Staff from each library discussed how datasets are being handled at their institution.
Health Sciences-Don't have specific information about datasets but anticipate that the library will be involved in curating them. The NIH OA initiative will be a large impetus for the creation of datasets. HIPAA might prevent Health Sciences from storing, or at least from making accessible, some data. Datasets can be archived and kept "dark." There is encryption on the ADR, so this may satisfy HIPAA requirements.
CSU-Lots of data is being generated and stored in various departments, and there is a need for curation. Lots of GIS data. Atmospheric science has datasets. NSF grant: signal processing and traces data. Proteomics and genomics are generating lots of data. There is interest in legacy data. Faculty are generating articles with attached data that could be datasets.
There is no mandate to store datasets, but CSU submitted a grant proposal to NSF to purchase hardware to support data curation in their repository. Some grant funds will be allocated to training faculty on digital rights issues regarding datasets.
DU-Subscribes to datasets through ICPSR. They also have departmental datasets from political science and some hard sciences. There are also lots of GIS datasets. There is some interest in having the library curate these datasets. However, some faculty are concerned about ceding control of datasets to libraries. Librarians have had some informal discussions with faculty about datasets, but no formal policy or discussion has occurred.
CU-Anticipates the need for curating datasets. They have started to interview faculty members about their datasets. Faculty don't yet see the role of the library in dataset curation. A lot of this departmental data is proprietary and restricted, so access issues will be central to the discussion. Some faculty prefer a disciplinary repository for their data. Librarians anticipate that discussions with engineers may be even more challenging. Faculty need clarification on how the library can play a role in curation, data protection, and restricted access.
Departments at CU vary widely in how they preserve and access their data. Some departments do sophisticated backups and curation; others back up data on their personal computers. There's also an issue with information not being recorded and existing only in one person's head. What happens when that person leaves? This also leads to the issue of who owns the data. If a researcher leaves, who owns the data: the faculty member or CU?
Grant initiatives may drive more need for curation and for collaborating with the library on it.
Datasets are growing, e.g. weather events datasets.
Humanities-Economic and population history. There may be some barriers with technology. Early modern English diplomats dataset. Searching and encoding markup of text. Can search for word patterns and usages. When did certain words evolve?
Challenge-all different types of formats. Not really any uniformity.
CSM-Right now concentrated on thesis submission. Datasets are attached to theses. Trying to find out which datasets enhance the theses and which are necessary to them.
UWYO-Just getting started with datasets. They are an information partner in a Geography project, looking at preservation of geographical data on a large scale. Linguistic data: large sets of text (linguistic corpus). Focusing on geomap portions first. Collaborative effort among 3 departments on campus.
CC-Haven't actively curated datasets, but see the ADR as a possibility for data curation. See some pilot opportunities for GIS datasets. Thinking of archiving a GIS dataset as a learning object. Getting people on board to add their metadata is the challenge. Use XML to convert the GIS dataset metadata to MODS.
Metadata-Ask stakeholders for advice on metadata. It is very helpful to get them to label the work.
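As a rough illustration of the GIS-to-MODS idea mentioned above, the following Python sketch builds a minimal MODS record from a flat metadata record. The input field names ("title", "creator", "abstract") and the function name are assumptions made for this example, not anything the group specified; a real conversion would map many more fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the MODS version 3 schema.
MODS_NS = "http://www.loc.gov/mods/v3"


def gis_to_mods(record: dict[str, str]) -> str:
    """Build a minimal MODS document from a flat GIS metadata record.

    The keys "title", "creator", and "abstract" are hypothetical field
    names chosen for this sketch.
    """
    ET.register_namespace("mods", MODS_NS)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = record["title"]
    name = ET.SubElement(mods, f"{{{MODS_NS}}}name")
    ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = record["creator"]
    # "cartographic" is the MODS resource type used for maps and GIS layers.
    ET.SubElement(mods, f"{{{MODS_NS}}}typeOfResource").text = "cartographic"
    ET.SubElement(mods, f"{{{MODS_NS}}}abstract").text = record["abstract"]
    return ET.tostring(mods, encoding="unicode")
```

Calling gis_to_mods with a record supplied by the data owner returns the MODS XML as a string, which could then be reviewed before deposit.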
Questions/Challenges-Data Curation
How do you curate something that is continuously changing (e.g. weather data)? Do you make incremental backups or back up on a yearly schedule? How do you set up protocols for these deposits? This is a good question to ask. The ADR may serve as a backup, but the research is ongoing and the curation may happen in a different place.
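One possible way to approach incremental deposits, sketched here in Python, is to keep a checksum manifest from the previous deposit and deposit only the files that are new or have changed. This is an illustration under assumptions, not a protocol the group agreed on; the function names are hypothetical and the sketch ignores deleted files and very large files.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return paths that are new or whose contents differ from the last deposit."""
    return sorted(p for p, digest in current.items() if previous.get(p) != digest)
```

A yearly process would simply run the same comparison once a year; the manifest saved at each deposit also doubles as a fixity record for the files already archived.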
There is an expectation that the library will be responsible not only for archiving the data, but also for making it accessible with viewers. The library can support some formats but not all. Text is a standard format that can be viewed by several applications, but other formats don't have a standard viewer. Some repositories carry a statement or disclaimer that they support certain data formats and not others.
Good cataloging practice is to include software version and
technical specs in metadata so even if the library doesn't support the
particular viewer, someone accessing the data will know what they need to view
it.
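A minimal Python sketch of what such a technical metadata record might look like follows. The field names and the function are assumptions for illustration only; they are not taken from any particular metadata schema discussed at the round table.

```python
import json


def technical_metadata(file_name: str, file_format: str,
                       software: str, software_version: str,
                       notes: str = "") -> str:
    """Return a JSON record of what a user needs in order to open a file.

    All field names here are hypothetical.
    """
    required = {
        "file_name": file_name,
        "file_format": file_format,
        "software": software,
        "software_version": software_version,
    }
    # Refuse to build a record that leaves out any of the viewing requirements.
    missing = [key for key, value in required.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing technical metadata: {', '.join(missing)}")
    return json.dumps({**required, "notes": notes}, indent=2)
```

Rejecting incomplete records at deposit time is one way to make sure the software and version information is captured while the depositor still remembers it.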
Long-term access doesn't necessarily come up with faculty, but it's a big question. It's difficult to standardize across everything.
It's important to curate data so that we understand our past and can build toward the future. There is also the challenge of figuring out how to preserve such large sets of data and determining when data can be purged. These are policy issues: what to curate, when to purge? Perhaps the question isn't when to purge the data, but when to offload it.
Resources on Datasets:
Michael Witt
Institutional Repositories and Research Data Curation in a Distributed Environment
Library Trends - Volume 57, Number 2, Fall 2008, pp. 191-201
E-ISSN: 1559-0682 Print ISSN: 0024-2594
DOI: 10.1353/lib.0.0029
UKOLN Digital Preservation Publications
http://www.ukoln.ac.uk/preservation/publications/
Joe Kraus-Resources
http://delicious.com/jokrausdu/data_curation
Stella Conference
http://denver-stella.pbworks.com/
http://denver-stella.pbworks.com/Attendee-List