Applying the science and technology of data assimilation

Jul 11, 2017

This article is part of a CISL News series describing the many ways CISL improves modeling beyond providing supercomputing systems and facilities. These articles briefly describe CISL modeling projects and how they benefit the research community.

CISL’s Data Assimilation Research Section (DAReS) has been developing a community software facility for ensemble data assimilation since 2002. Their Data Assimilation Research Testbed (DART) includes software interfaces to many types of models, allowing the models to use observed data to produce initial conditions for predictions. Using DART allows model developers to compare predictions with subsequent observations, and helps them identify and repair weaknesses in their models. DART has been widely adopted throughout the scientific research community because it offers great benefits to modelers, and because the DAReS staff’s expertise in data assimilation science has made it so effective. In reciprocal fashion, community researchers provide specific feedback to DAReS scientists to help them determine the best paths for new development in the DART facility.

In addition, the DAReS team provides customer service to the community’s modeling enterprise by working directly with researchers and developers. “We collaborate on nearly all of NCAR’s community models and with almost every lab at NCAR, as well as with many U.S. universities, U.S. national labs and defense labs, and universities and government labs in Europe,” said Jeff Anderson, DAReS section manager. “It is difficult to create software that interoperates well with large, realistic geophysical models, and DART provides proven methods and software tools that allow scientists to focus on their research questions. It has been very successful in allowing graduate students, university faculty, and other scientific groups to apply data assimilation in novel ways without making the substantial investment of engineering new software.”

Research models such as the Community Earth System Model (CESM) represent complex physical systems with interacting components that model the Earth’s atmosphere, oceans, ice, and land. The CESM and other large-scale models integrate these components through time to make inferences or predictions about planetary-scale dynamic systems such as weather and climate. Data assimilation incorporates millions of past measurements of these dynamic systems into the computer models, providing realistic initial conditions for projections about how they will evolve days, years, and centuries into the future.


These plots show measured and modeled zonal mean temperatures between 70ºN and 90ºN during the January 2009 sudden warming of the stratosphere. The bottom plot shows the observed temperatures (in kelvin; see legend at right), the center plot shows how this state of the atmosphere was simulated by the specified-dynamics version of the WACCM model, and the top plot shows WACCM’s improved result after DART data assimilation was applied.

Model output improved by data assimilation

The key point in this figure is that WACCM+DART captures both the stratospheric warming and the mesospheric cooling that are seen in the observations. In the specified-dynamics version of WACCM, the elevated stratopause that forms at high altitudes around day 30 descends too fast compared to the observations; in the WACCM+DART simulation, the elevated stratopause is maintained at a high altitude. This has implications for the descent of chemical species from the mesosphere into the stratosphere, and accurate representation of mesospheric dynamics is also important for ionospheric variability during sudden stratospheric warming events. (Figure courtesy of Nick Pedatella, HAO)

Because observations and model predictions both have errors, it is important to quantify this uncertainty when making forecasts. DART’s ensemble data assimilation methods provide modelers with sets of equally likely initial conditions for producing an ensemble of forecasts. Differences between these forecasts give information that can help scientists identify the most important shortcomings in individual models. DART ensemble assimilation tools can also help modelers ‘tune’ parameters in their models so model forecasts better fit observations.
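The core idea behind an ensemble update can be illustrated with a small sketch. The code below is a textbook single-variable ensemble Kalman adjustment (shift the ensemble mean toward the observation, shrink the spread), written in plain Python purely for illustration; it is not DART’s actual code, and the variable names and numbers are invented for the example.

```python
import statistics

def ensemble_kalman_update(ensemble, obs, obs_var):
    """Adjust an ensemble of prior state estimates toward an observation.

    A minimal single-variable ensemble adjustment step, for illustration
    only -- real systems like DART handle high-dimensional model states,
    localization, and inflation.
    """
    prior_mean = statistics.mean(ensemble)
    prior_var = statistics.pvariance(ensemble)
    # Posterior variance combines prior and observation uncertainty.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    # Posterior mean is a precision-weighted average of prior and observation.
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    # Contract each member around the new mean so the updated ensemble
    # has exactly the posterior mean and variance.
    shrink = (post_var / prior_var) ** 0.5
    return [post_mean + shrink * (x - prior_mean) for x in ensemble]

# Example: a five-member prior ensemble (e.g., temperatures in °C)
# and one observation with error variance 0.25.
prior = [14.2, 15.1, 15.8, 14.9, 16.0]
posterior = ensemble_kalman_update(prior, obs=15.5, obs_var=0.25)
```

After the update, the ensemble mean has moved toward the observation and the ensemble spread has narrowed, reflecting the information the observation added; the remaining spread is the quantified uncertainty that an ensemble of forecasts started from these members would carry forward.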

A recent example of how DART benefits model development appears in the figure at right. Nick Pedatella (HAO) applied DART to the Whole Atmosphere Community Climate Model (WACCM) and used it to quickly improve the model’s short-term forecasts. DART helped him analyze the model’s output to identify the kinds of issues that could be involved, and the model developers then addressed those issues in the model’s code, significantly improving its predictions. This work is valuable to three NCAR labs: CGD developed the WACCM dynamics, HAO provided code development for WACCM to simulate space weather, and ACOM uses WACCM for high-altitude atmospheric chemistry. The figure compares short-term model forecasts generated with DART (top) against another method of constraining the model with observations (specified-dynamics WACCM, middle) and the available observations (bottom).

Data assimilation, with its comparison of model outputs to observations, is a key technology for modeling because it is the best method we have to evaluate and validate prediction models. The capability to perform data assimilation on climate models is recent: historically, it was too difficult and expensive for modeling groups, but as computers have become more capable and software such as DART has matured, validating model predictions is now practical. The ongoing development and improvement of models requires scientific assurance that a model produces accurate forecasts for the right reasons. Data assimilation helps developers progress from a model that produces something that isn’t right, to understanding why it isn’t right, to code changes that produce consistently accurate results. Finally, data assimilation adds meaning to those results by quantifying the uncertainty in model predictions.