Innovative use of CISL supercomputer provides quality assurance for atmospheric data

By Brian Bevirt
08/21/2012 - 12:00am

When Ashley Bell was pursuing her B.S. in Biology at Ithaca College in New York, she never imagined that a summer research project in Colorado would lead her toward a Ph.D. in statistics. In 2010, she began an internship in CISL to study wind using radiosonde observations and regional climate models, which led to another CISL internship using statistical analysis to evaluate quality control methods for historical radiosonde data. She didn't expect her research experience at NCAR and a paper she co-authored about slug glue (Bradshaw et al., 2011) would be key to her selection for a prestigious graduate research fellowship from the National Science Foundation. By pursuing her diverse research interests, Ashley discovered the wealth of opportunities offered by CISL's SIParCS program, IMAGe's network of interdisciplinary collaborators, the Colorado School of Mines (CSM), and NSF research funding. Ashley's career path highlights the benefits of developing connections through collaboration.

Ashley Bell and Megan Yoder visiting NWTC in 2010
Ashley (left) and fellow SIParCS intern Megan Yoder, shown in this 2010 photo, both worked on wind-related research projects. The photo was taken during their tour of the National Renewable Energy Laboratory's National Wind Technology Center near Boulder, Colorado. —Photo by Amanda Hering, CSM

First SIParCS internship

Ashley established her connection with NCAR when she was a junior at Ithaca College. Studying as a double major in Biology and Mathematics, she took on a small statistics research project about the impact of climate change on maple syrup production in upstate New York using regional climate models. While she was preparing to present her research results at the Whalen Symposium at Ithaca College, her project advisor, Dr. Tom Pfaff, suggested she consider a statistics internship in CISL's SIParCS program. Dr. Pfaff knew about SIParCS through his collaboration with Doug Nychka in CISL's Institute for Mathematics Applied to Geosciences (IMAGe). Ashley applied to SIParCS in the winter of 2009 and was selected to perform a study about wind output from climate models and North American Regional Climate Change Assessment Program (NARCCAP) data to reconstruct vertical wind profiles. Her project mentors included Dr. Steve Sain and Dr. Doug Nychka from IMAGe and Dr. Amanda Hering of CSM. Following the internship and supported by SIParCS, Ashley presented her results at the October 2010 workshop on environmetrics sponsored by the American Statistical Association.

New career path

During the fall semester of her senior year at Ithaca College, Ashley decided to change from pursuing a career in biology and instead apply to graduate programs in statistics. Attending Colorado School of Mines appeared ideal given her existing connections with NCAR and CSM, and the potential to continue her SIParCS research. Ashley said, "Mines has a small statistics department, but my connections to NCAR and the resources that are available to me there and at Mines are ideal for successfully completing my thesis." In the spring of 2011, Ashley was accepted to CSM and selected Hering as her advisor.

Ashley presenting graduate research plan to NCAR staff in 2012
Ashley Bell presented her graduate research outline to NCAR staff. In the audience (lower left) is her SIParCS mentor and ongoing collaborator, Joey Comeaux, a software engineer in CISL's Data Support Section. —Photo by Brian Bevirt, CISL

Second SIParCS internship

Also during her senior year, Ashley applied for a second SIParCS project in CISL, this time working with Joey Comeaux, a software engineer in CISL's Data Support Section. This project focused on quality control of radiosonde data. It built on an initial collaboration between Nychka and Comeaux to develop a robust statistical analysis method to evaluate atmospheric radiosonde data. Ashley's SIParCS work focused on evaluating two different methods of flagging unusual measurements based on a statistic known as a Z score, drawing on ideas from the discipline of robust statistics.
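To illustrate the general idea of Z-score flagging, here is a minimal Python sketch. The article does not say which two methods Ashley compared, so the pairing below (a classical Z score based on the mean and standard deviation, and a robust Z score based on the median and the median absolute deviation) is an assumption chosen only to show why robust statistics can matter. The temperature values, the threshold of 3, and all function names are hypothetical.

```python
import statistics


def classical_z_scores(values):
    """Z scores using the sample mean and sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def robust_z_scores(values):
    """Z scores using the median and the scaled median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust Z scores are undefined")
    return [(v - med) / scale for v in values]


def flag(z_scores, threshold=3.0):
    """Return the indices of observations whose |Z| exceeds the threshold."""
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]


# Hypothetical temperatures (deg C) at one pressure level.
# The value 55.0 stands in for a gross error such as a transmission glitch.
temps = [-52.1, -51.8, -52.5, -51.9, -52.3, -52.0, 55.0, -51.7, -52.2, -52.4]

classical_flags = flag(classical_z_scores(temps))
robust_flags = flag(robust_z_scores(temps))

print("classical:", classical_flags)  # prints: classical: []
print("robust:   ", robust_flags)     # prints: robust:    [6]
```

In this small sample the bad value inflates the mean and standard deviation so much that its own classical Z score stays below 3, while the median-based version flags it. This masking effect is one reason ideas from robust statistics are attractive for quality control.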

Relevance of the research

Radiosonde observations form the backbone of our understanding of the vertical structure of the atmosphere, so archiving, curating, and serving this massive data record is an important program within CISL. A radiosonde is an instrument borne aloft by a balloon to collect data along a generally vertical path up to altitudes of 21–47 kilometers (13–29 miles). The archived records contain more than 60 years of data about wind speed and direction, temperature, pressure, and relative humidity and are available for several thousand locations. Radiosondes are currently launched twice daily at a network of approximately 800 stations across the globe and are valuable for weather forecasting. Examining the trends in radiosonde measurements over time can help in tracing the effects of climate change at different levels of the atmosphere. Quality issues surround the radiosonde data, however, due to problems such as transmission dropouts and data archiving. "Radiosonde data is tricky because it contains errors, but it may also contain information about extreme events," noted Hering. Methods to evaluate the quality of radiosonde data have been evolving for decades, and Ashley’s work represents some new techniques for discerning unusual observations. "This connection between statisticians and the Data Support Section, with Ashley as the linchpin, is an example of efforts to improve the data services that CISL provides to the scientific community," said Nychka.

Graduate research challenges

Although Ashley’s internships engaged some interesting applications, she was then faced with transforming her summer projects into part of her graduate studies and research. Ashley began working on her master's thesis in the fall of 2011. Her focus is to test the capabilities of various quality control procedures by developing a statistical simulation to mimic the various climatological signals in historical radiosonde data. This research will determine the optimal method to evaluate the quality of radiosonde data by working with at least 1,000 different data scenarios that are run 1,000 times each. "On my laptop, one scenario takes more than a day to complete," said Ashley. "On Lynx, the same run only takes an hour, and I can run many of them simultaneously. Tim Hoar [Associate Scientist in IMAGe's Data Assimilation Research Section] worked with me to write a script that submits these jobs to Lynx remotely. Without the supercomputer, I couldn't complete the one million runs required for this stage of my project in a reasonable amount of time." The simulation results are key to determining which method is best suited for the radiosonde data. Additionally, the final recommended procedure will allow scientists to determine which observations they believe are erroneous and which are extreme events.
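The structure of such a simulation study can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Ashley's actual code: the annual sine cycle, the Gaussian noise, the error rates and sizes, the robust Z-score rule, and every function name are hypothetical, and the sketch runs 4 scenarios with 20 replicates each rather than 1,000 scenarios with 1,000 runs each.

```python
import math
import random
import statistics


def simulate_series(rng, n_obs=200, error_rate=0.02, error_size=25.0):
    """Synthetic series: assumed annual cycle plus noise, with gross errors injected."""
    series, is_error = [], []
    for t in range(n_obs):
        value = 10.0 * math.sin(2.0 * math.pi * t / 365.0) + rng.gauss(0.0, 1.0)
        bad = rng.random() < error_rate
        if bad:
            value += rng.choice([-1.0, 1.0]) * error_size
        series.append(value)
        is_error.append(bad)
    return series, is_error


def robust_flags(series, threshold=3.0):
    """Flag values whose median/MAD-based Z score exceeds the threshold."""
    med = statistics.median(series)
    mad = statistics.median([abs(v - med) for v in series])
    scale = 1.4826 * mad  # nonzero here because the noise is continuous
    return [abs(v - med) / scale > threshold for v in series]


def evaluate_scenario(error_rate, error_size, n_reps, seed):
    """Return (hit rate, false alarm rate) pooled over n_reps simulated series."""
    rng = random.Random(seed)
    hits = errors = false_alarms = clean = 0
    for _ in range(n_reps):
        series, is_error = simulate_series(
            rng, error_rate=error_rate, error_size=error_size
        )
        for flagged, bad in zip(robust_flags(series), is_error):
            if bad:
                errors += 1
                hits += flagged
            else:
                clean += 1
                false_alarms += flagged
    hit_rate = hits / errors if errors else float("nan")
    return hit_rate, false_alarms / clean


# Each scenario is one (error rate, error size) pair; seeds make runs repeatable.
scenarios = [(rate, size) for rate in (0.01, 0.05) for size in (10.0, 30.0)]
results = {
    scenario: evaluate_scenario(*scenario, n_reps=20, seed=i)
    for i, scenario in enumerate(scenarios)
}

for (rate, size), (hit, false_alarm) in results.items():
    print(f"rate={rate:.2f} size={size:4.1f} hit={hit:.2f} false_alarm={false_alarm:.4f}")
```

Because each scenario depends only on its own parameters and seed, the scenarios are independent of one another, which is what makes it practical to run many of them simultaneously as separate jobs. Note that the simple rule here ignores the seasonal cycle, so it is a deliberately weak baseline; comparing it against better procedures is the kind of question such a simulation is meant to answer.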

Ashley with Dr. Amanda Hering of CSM
Amanda Hering (right) was one of Ashley's SIParCS mentors in 2010 and is now her graduate research advisor at the Colorado School of Mines in Golden, Colorado. Hering is a tenure-track assistant professor in applied math and statistics at CSM, and her experience with winds and wind forecasting supports her interest in applying spatial and temporal statistics to environmental data. She uses radiosonde data to validate climate models – specifically wind observations to validate the 3D wind data in NARCCAP – to test the model's accuracy in capturing observed wind behavior. Including winds in climate models is new because past models did not have sufficient resolution. —Photo by Brian Bevirt, CISL

NSF Fellowship

Given the strength of her academic and research record, Ashley's mentors encouraged her to seek an NSF graduate research fellowship, and in spring 2012 she was successful in winning one of the 11 awarded in statistics. This NSF award provides a three-year fellowship including tuition and a stipend. Nychka notes, "Her proposal was strong because she had a good mix of an interesting applied problem combined with statistical research. Adding to that were her links with NCAR, the SIParCS experience with radiosonde data, plus a demonstrated ability to publish successfully."

Benefits of collaboration

Ashley's work represents a unique collaboration of statistical research at IMAGe and CSM in conjunction with the quality assurance efforts of CISL's Data Support Section, and it benefits the community supported by the National Science Foundation. CISL directly contributes to her project with office space and access to NCAR's supercomputing facilities. This research is moving forward quickly because the interdisciplinary research environment nurtured within CISL builds on input from staff at multiple universities, operational meteorologists, the modeling community, and the research data curation discipline. "We've had students come to IMAGe for a specific project, who have then returned as postdocs, taken jobs as university faculty, then sent their students here," Nychka concluded. "This process keeps refreshing our workforce with new talent, it reinvigorates others at the universities, and it is a healthy model for a national center."

Reference: Bradshaw, A., M. Salt, A. Bell, M. Zeitler, N. Litra, and A.M. Smith, 2011: Cross-linking by protein oxidation in the rapidly setting gel-based glues of slugs. The Journal of Experimental Biology, 214, 1699-1706.