Data Compression of Climate Simulation Data


John Dennis
Application Scalability and Performance Group
National Center for Atmospheric Research
Boulder, Colorado


While the rate at which earth system model simulations can generate data has increased recently, the cost of online and offline storage has not kept pace. This gap between the rate of improvement in compute costs and in storage costs is a direct consequence of fundamental computing trends. Although climate science is currently limited by the number of compute cycles available, we believe that storage volume will become the primary limitation in the very near future. It is not a question of ‘if’ but ‘when’ storage volume will be the limiting factor.

We explore data compression as a way to significantly reduce storage volume. We examine several lossy compression algorithms, including Apax compression from Samplify and a wavelet-based algorithm, among others. We compare both the quality and the speed of compression. We evaluate quality by comparing the differences introduced by compression against the differences among an ensemble of runs created by applying a small perturbation to the initial conditions. We observe that a compression ratio of up to 5-to-1 is achievable for a large number of climate variables.
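
The ensemble-based quality check described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: uniform quantization stands in for the lossy codecs (it is not Apax or a wavelet method), the "ensemble" is a synthetic field with added Gaussian noise rather than model runs with perturbed initial conditions, and the function names and the RMSE-based acceptance rule are hypothetical rather than the metric used in this work.

```python
import math
import random


def quantize(values, step):
    """Stand-in lossy codec (assumption): round to the nearest multiple of step."""
    return [round(v / step) * step for v in values]


def rmse(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def make_ensemble(base, n_members, noise, seed=0):
    """Toy ensemble (assumption): the base field plus independent Gaussian noise.

    Real ensemble members would come from rerunning the model with slightly
    perturbed initial conditions.
    """
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise) for v in base] for _ in range(n_members)]


def within_ensemble_spread(original, reconstructed, ensemble):
    """Hypothetical acceptance rule: the compression error must not exceed the
    mean RMSE between the original run and the ensemble members."""
    compression_error = rmse(original, reconstructed)
    spread = sum(rmse(original, member) for member in ensemble) / len(ensemble)
    return compression_error <= spread


# Synthetic temperature-like field in kelvin (illustrative values only).
field = [280.0 + 10.0 * math.sin(0.1 * i) for i in range(200)]
ensemble = make_ensemble(field, n_members=10, noise=0.5)

fine = quantize(field, step=0.1)    # small quantization step, small error
coarse = quantize(field, step=5.0)  # large quantization step, large error

print("fine step accepted:  ", within_ensemble_spread(field, fine, ensemble))
print("coarse step accepted:", within_ensemble_spread(field, coarse, ensemble))
```

The idea behind such a rule is that differences among ensemble members give a natural scale for what counts as a negligible change: if the error introduced by compression is no larger than those differences, the reconstructed data is treated as being of acceptable quality.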