Workshop in data analytics for teachers

By Brian Bevirt
08/27/2013 - 12:00am

Doug Nychka of NCAR’s Computational and Information Systems Laboratory (CISL) and Timothy Robinson of the University of Wyoming (UW) want to change the way people learn statistics and interact with data. Statistics is traditionally taught with simple methods and small data sets. Although this approach has merit for beginning students, real-world applications of the discipline are messy, and statisticians must contend with uncertain values and missing data. Yet these real-world data sets tell rich stories and sometimes offer intriguing clues about how nature works. Such possibilities stimulate the curiosity and imagination of future scientists and engineers, and that enthusiasm for discovery is what Nychka and Robinson want to cultivate as they work to develop and enrich the nation’s future workforce.

An example of a real-world geoscience data set used in the workshop: the locations of the most severe tornadoes (EF4 in blue and EF5 in magenta) recorded by the U.S. National Weather Service from 1950 to 2012.

Nychka and Robinson began their collaboration to broaden participation in math and statistics. Robinson cites “Globally Challenged: Are U.S. Students Ready to Compete?”, a 2011 study from Harvard’s Program on Education Policy and Governance, which found that the U.S. class of 2011 had a 32% proficiency rate in mathematics, ranking the U.S. 32nd among the nations evaluated. Analyzing this finding, the study notes, “While the 42 percent math proficiency rate for U.S. white students is much higher than the averages for [U.S.] students from African American and Hispanic backgrounds, U.S. white students are still surpassed by all students in 16 other countries.” Nychka and Robinson want to develop young people’s interest in useful applications of statistics as early in their lives as possible.

The workshop targeted professors who teach freshman- and sophomore-level math, science, and statistics courses. It particularly encouraged attendance by community college faculty who teach introductory statistics and by high school AP Statistics teachers in Wyoming and along the Front Range. Faculty at minority-serving institutions and historically black colleges and universities were also invited.

As Director of CISL’s Institute for Mathematics Applied to the Geosciences (IMAGE), Doug Nychka has a mandate to teach and mentor upcoming generations of scientists. Timothy Robinson’s educational expertise arises from his roles as a UW professor of statistics and Interim Director of Wyoming WWAMI Medical Education. Their workshop “Data analytics for the Geosciences using R,” held 16–19 June at NCAR’s Mesa Lab campus in Boulder, Colorado, is the first step in their plan to empower instructors to teach young people practical applications of statistics and modern data analysis in an interactive learning environment. It is also the first in a planned series of events that will give instructors engaging tools for teaching statistics and stimulating their students’ interest in discovery.

Some of the participants in the new NCAR-UW workshop are shown using the R software language for statistical computing, graphics, and interactive data analysis. Participants felt that the workshop helped them leap forward in their R skills. Several said they had tried to get started with R before but had never gotten far enough past the initial learning curve to use it in their classes. “Data analytics for the Geosciences using R” is a new type of workshop intended to multiply the efforts of statistics experts who want to make a beneficial impact on future U.S. scientists and engineers.

The inaugural hands-on tutorial workshop first taught participants how to use the R statistical software language. Participants then explored substantial data sets using a variety of statistical methods. Although the data sets were drawn primarily from the environmental and Earth sciences, participants were encouraged to use data from other fields that matched their interests. The interactive setting encouraged participants to contribute ideas for complementing and enhancing traditional statistics instruction. The format also let participants collaborate on developing data analysis curricula built around data that interest both them and their students. The goal is for participants to stress the relevance of statistics and data analysis to their students’ daily lives as well as to their career plans.

In the post-workshop debriefing, participants said they planned to use what they had learned to help their students develop (a) data analysis skills, (b) techniques for handling larger and more complex data sets, and (c) an appreciation for the value of more advanced statistical methods. Feedback that participants gave throughout the workshop will help improve it the next time it is offered, and the process has also started an interest group for exchanging training materials.

R is a freely available, open-source software package developed by the international statistics community. It is a standard tool for current statistical methodology in industry, government, and health care. Thousands of contributors and an estimated two million users worldwide make up this active community. R is flexible enough to support users ranging from beginning statistics students to scientists and engineers pursuing cutting-edge data analysis for research and commercial applications. It runs on many types of computing platforms, from laptops to supercomputers.