SIParCS 2022 - Ed Liu

Ed Liu, Drexel University

Improving the Speed and Scalability of the Data Assimilation Research Testbed

Recorded Talk

The Data Assimilation Research Testbed (DART) is developed by the Data Assimilation Research Section (DAReS) and is widely used by researchers to conduct ensemble data assimilation. DART improves the predictions of weather and ocean models by combining model forecasts with observations. This work aims to improve the speed and scalability of DART through code profiling and the identification of computational bottlenecks. Detailed profiling exposed a significant bottleneck in the close observation caching subroutine. Removing the redundant caching process reduced the computational time for runs with larger numbers of states by over 50%. A second problem was a computationally expensive DART job: a high-resolution MIT General Circulation Model (MITgcm) run of the Red Sea. The simulation could not be run even on the extreme memory nodes (4 TB of memory) because the huge netCDF state files consist largely of fill values, which carry no useful information. Offline processing reduced the state file from over 10 GB to less than 1 GB by removing the fill values and reordering the state variables. Key subroutines for communication between the DART and MITgcm source codes were changed to accommodate the new state file. With these changes, the MITgcm case ran successfully on Cheyenne, and the computational time for a smaller version of the run was reduced significantly. Future work involves moving the currently offline processing into the DART run itself, which will reduce the preprocessing work required of users.
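
The idea of removing fill values from a state file can be illustrated with a minimal sketch. This is not the DART or MITgcm code; the fill value, the function names `pack_state` and `unpack_state`, and the toy data are all assumptions made for illustration. The sketch keeps only the valid entries of a flattened field together with their original positions, so the full field can be rebuilt when needed.

```python
# Hypothetical marker for grid cells with no valid model data (e.g. land).
FILL_VALUE = -999.0


def pack_state(flat_field, fill_value=FILL_VALUE):
    """Keep only valid entries; return them with their original positions."""
    indices = [i for i, v in enumerate(flat_field) if v != fill_value]
    values = [flat_field[i] for i in indices]
    return values, indices


def unpack_state(values, indices, size, fill_value=FILL_VALUE):
    """Rebuild the full-size field, restoring fill values where data is absent."""
    full = [fill_value] * size
    for i, v in zip(indices, values):
        full[i] = v
    return full


# Toy flattened field: mostly fill values, two valid cells.
field = [FILL_VALUE, 20.1, FILL_VALUE, FILL_VALUE, 19.8, FILL_VALUE]
values, indices = pack_state(field)
restored = unpack_state(values, indices, len(field))
print(len(field), "->", len(values))
```

In this toy case only the valid entries and their positions are stored, which is the same kind of saving, at much smaller scale, as the state file reduction described above.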

Mentors: Helen Kershaw, Jeffrey Anderson

Slides and poster