Support for Your Research Data Management Needs!

By Brian Bevirt
02/08/2018 - 6:00pm

As the power of supercomputers, satellites, and other research instruments increases, the massive datasets they generate must be stored, managed, and preserved for efficient analysis and use over many years. Along with many other funding agencies, the National Science Foundation requires that grant proposals include a plan for how research data will be managed and shared. This requirement meets an important scientific need: datasets are becoming harder to use as they grow rapidly in both volume and diversity. In her work as CISL’s Data Curation and Stewardship Coordinator, Sophie Hou is developing a pilot program to help NCAR scientists manage the large and growing volume of research resources – including data and software scripts – that they use in their work.

Sophie Hou is CISL’s Data Curation and Stewardship Coordinator, and she develops the Digital Asset Services Hub (DASH) to support, engage, and train researchers who manage digital assets. (Photo by Eliott Foust)

Sophie notes that “this pilot program is not only an opportunity for scientists at NCAR, but also for library and information sciences students. The students receive data management training specific to the requirements of NCAR’s Earth System scientists, and the scientists receive competent assistance with their data management needs. With input from NCAR’s DSET and DASH programs, this pilot program has so far employed three students from the University of Illinois at Urbana-Champaign (UIUC) iSchool to complete five different projects during the last two semesters.”

The support for this pilot program comes from:

  • Time Sophie dedicates, as part of her job, to coordinating logistics for the students and to training them to handle the specific needs of their assigned NCAR projects.

  • Time contributed by DSET members who identify NCAR researchers with data management needs.

  • Time researchers spend working with a student (1-2 hours per week), which is typically less than the time they would have to spend on data management and curation if they were to perform the tasks on their own.

  • Credit hours awarded by the UIUC iSchool toward the students’ degree programs.

Sophie noted that “we are first partnering with the iSchool at UIUC because we have experience with its reputable library and information sciences program, and because it already has a special track for data curation. Additionally, UIUC iSchool students can use distance learning to participate in its programs, so the students already know how to collaborate with remotely located team members. If resources permit, we want to expand and partner not only with the information sciences programs at other universities, but also with other relevant programs at more universities. We also have diversity goals to expand the range and types of students who we hope will enter this field.

“The most immediate value produced by our initial efforts was that the students were able to help create metadata records for many key datasets from several NCAR labs. This student contribution is very significant to our DSET/DASH activities because while the researchers know they need to provide metadata records for their datasets, they often do not have enough time to process these records. Consequently, by helping researchers create their metadata records, the students are helping NCAR meet its data-sharing requirements.”

In reciprocal fashion, NCAR provided the research environment where the students collaborated with scientists to produce and preserve valuable data resources. The people involved in this pilot program are working to ensure that its benefits far outweigh its costs, and they are considering options for gathering the resources to expand it for researchers and the next generation of data professionals.

Describing her ideas about developing this pilot program, Sophie said, “To the best of my knowledge, because the field of ‘data professionals’ is so new, people in the past had mainly been trained on the job, often without having the opportunity to be recognized for doing work that is now categorized as data stewardship: management, curation, and preservation. Going forward, to facilitate the growth and recognition of this work as a profession, it would definitely be helpful to incorporate practical training in degree programs, while continually identifying and including complementary skill areas as the profession matures. This will give aspiring data professionals a balanced foundation in academic, theoretical, and applied data stewardship as well as opportunities to evolve as data needs change over time.”

Because this field is growing quickly and demand is high, the path to becoming, and being recognized as, a data professional is not well defined. Consequently, it is difficult for people interested in data stewardship to learn how to pursue a career in this field. The guidance and hands-on experience that NCAR can provide will allow next-generation professionals not only to understand the types of work that exist in a research center, but also to contribute their knowledge and develop their skills. This experience will help them evaluate the additional training they might need and identify opportunities they can pursue further – all while providing value for scientists and their research, which includes:

  • Optimized data value resulting from consistent metadata.

  • Expanded and improved data curation.

  • Faster and easier data discovery and access.

  • Enhanced scientific understanding because data and metadata are more usable.

Please contact the team at datahelp@ucar.edu if you have a research project to which a data stewardship student could contribute, or if you have thoughts and ideas for expanding this pilot program!

You are also encouraged to use the Digital Asset Services Hub (DASH) and the Data Stewardship Engineering Team (DSET) for guidance in meeting your data stewardship responsibilities more efficiently and in improving the scientific value of all your data products.