New IMAGe Course Refines Science of Extremes

By Marijke Unger
06/30/2016 - 2:00pm

This year, CISL's Institute for Mathematics Applied to Geosciences (IMAGe) started a new “in-reach” initiative intended to offer NCAR scientists a data learning experience, using a hands-on approach that teaches staff how to use software and perform data analysis at the same time.

The course series, entitled “Beyond P-Values,” is geared to the experienced scientist or engineer who wants to focus more intensively on key data analysis concepts and skills that are pertinent to her or his specific area of scientific inquiry. Typically this would mean going beyond the basic statistics course and learning more specialized methods.  

A P-value is a number between zero and one that measures how compatible a data set is with a given hypothesis; a small value counts as evidence against that hypothesis. For example, it can be used to answer questions like whether temperatures are increasing over time, or whether a candidate has a majority in a poll. P-values tend to be more useful for simple kinds of data problems than for the complex scientific data questions we see at NCAR. P-values can only serve to cast doubt on a hypothesis, not to inform it, so the approach lacks the subtlety needed for many areas of research. They also have the disadvantage of reducing the analysis to a single number, which can hide important features of the data.
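As a rough illustration of the poll question, the sketch below computes a one-sided P-value with an exact binomial calculation in Python. The poll figures (540 of 1,000 respondents) and the function name `binomial_p_value` are invented for this example and do not come from the course materials; the hypothesis being tested is that the candidate's true support is exactly 50 percent.

```python
from math import comb


def binomial_p_value(successes: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact binomial P-value.

    Returns P(X >= successes) for X ~ Binomial(n, p0), i.e. the chance of
    seeing a result at least this favorable if true support were only p0.
    Intended for modest n; very large n would need a log-space version.
    """
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical poll (assumed numbers): 540 of 1,000 respondents favor the candidate.
p = binomial_p_value(540, 1000)
print(f"P-value = {p:.4f}")
```

A small result here says only that a 50 percent split would rarely produce such a poll. It does not say how large the candidate's lead is or how uncertain that lead is, which is the kind of information the single number leaves out.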

The first of the two courses in the 2016 series was organized by Philippe Naveau and took place at NCAR on March 28-30. The subject was “Statistics of Extremes,” and topics included introductions to univariate and multivariate extreme value theory in the climate sciences, as well as detection and attribution for extremes.

“Much research has been done based on mean data, and the statistics for that have been really well worked out, so solutions are easier to compute,” explained Doug Nychka, director of IMAGe. “But when we start looking at minima and maxima, which is interesting for climate research, it’s not clear that the same methods work. Philippe has been developing statistical methods specifically designed to analyze those extreme values.”

Philippe Naveau and participants in the first of two Beyond P-Values courses held at the NCAR Mesa Lab in April 2016 (photo by Marijke Unger).

The courses are in high demand and, for now, are limited to 12 NCAR staff. In each course, experts in the field give overview lectures, followed by hands-on training with relevant tools such as the open-source R statistical software and packages tailored to the analysis of extremes.

“The added benefit of the courses is that they offer a platform for connecting scientists working in specific disciplines with statisticians who can help them,” said Dorit Hammerling, a statistician in IMAGe spearheading the Beyond P-Values series. “They actually forge collaborations within the three-day workshops as they work together.”

Organizers conducted a pre-workshop survey and discovered that participants wanted to learn how to model extreme events such as the 2013 flood in Boulder. They were also interested in learning the basics of statistical programming. The makeup of the group surprised the organizers, who had expected mostly students or postdocs; the course actually attracted many senior scientists.

The courses were led by five instructors: two from Colorado State University, two from NCAR, and Philippe Naveau, who is a visiting scientist from France’s Centre National de la Recherche Scientifique.

The second course in the series, “Introduction to Bayesian Statistics,” will take place July 11-13, also at NCAR. Bayesian statistics is an emergent area of statistics applicable to many problems and especially relevant in the context of uncertainty quantification. By the end of the course, participants will fit and evaluate models for a dataset of atmospheric CO2 concentrations taken from ice-core measurements. Alex Gitelman from Oregon State University in Corvallis is the lead organizer for this session.

Next year’s sessions will focus on spatial statistics, time series, and spectral analysis. The long-run goal is to repeat the most popular courses at two- to three-year intervals.