New CMIP Analysis Platform will help researchers access and use data

By Marijke Unger
02/06/2016 - 1:45am

NCAR’s Computational and Information Systems Laboratory (CISL) is adding resources to support big data needs related to the Coupled Model Intercomparison Project (CMIP). With support from the National Science Foundation, CISL’s new CMIP Analysis Platform will help scientists analyze data produced in CMIP Phase 5 (CMIP5), and will be an important resource for processing even larger data sets once CMIP6 activities begin.

The objective of CMIP is to better understand past, present and future climate changes in a multi-model context, whether those changes arise from natural, unforced variability or in response to changes in radiative forcing. An important part of CMIP is making the multi-model output publicly available in a standardized format.

The CMIP Analysis Platform at CISL will make use of the Geyser and Caldera analysis clusters and the GLADE disk system, and will provide up to 500 terabytes of CMIP5 data. That total includes all the CMIP5 data published by NCAR and will include other CMIP5 data requested by users.

The goal of the CMIP Analysis Platform is to support analysis that would otherwise be difficult or impossible for university researchers to conduct because of a lack of disk space, analysis software or computing capability. The existing 1.8 petabytes of CMIP5 data is housed in many different places, which complicates both access to the information and its usability.

These challenges will only increase with the even larger data sets expected from the upcoming CMIP6 activities. CISL is using CMIP5 data to roll out the CMIP Analysis Platform and prepare for a larger-scale environment to be deployed for analysis of CMIP6 data.

“The CMIP Analysis Platform will make the data more usable,” said Dave Hart, Head of CISL's User Services Section. “We are also providing storage on our GLADE system, and we are helping with curating the data. You can think of it as a lending library, where we are not only procuring the particular volume a user is interested in, but we also hold up the very heavy tome while they read and analyze it.”

The CMIP Analysis Platform is available to any researcher who is eligible for a small university allocation. That group includes scientists with NSF awards as well as graduate students and postdoctoral researchers working on dissertation or postdoctoral projects. Users with existing projects on the Yellowstone environment can also access the CMIP5 data.

Interested users can request allocations on the CMIP Analysis Platform and ask for specific CMIP5 data sets to be added to the file system. CISL staff will locate the data and add it to the GLADE space. Users can then use the Geyser and Caldera systems to perform their analyses. To support the widest possible range of research, data sets will be rotated in and out regularly based on user requests and demand for the various sets of data.

“What makes this setup unique is that it combines large-scale storage, large-scale analysis clusters, and the data,” said Hart. “We are a one-stop shop that will help make the information really usable.”