SIParCS 2020 - Lucas Hayne


Lucas Hayne, University of Colorado, Boulder

Using Neural Networks for Two-Dimensional Scientific Data Compression


In 2019, the Computational and Information Systems Laboratory (CISL) at the National Center for Atmospheric Research (NCAR) provided access to more than 5.1 petabytes of data through the Research Data Archive, which is more than Netflix's entire master library of videos. Moreover, the volume of climate data keeps growing rapidly: one recent climate simulation generated 260 terabytes of data every 16 seconds. Both the sheer volume of existing data and the rapid generation of new data place immense strain on storage systems, so the need for effective ways to address these challenges continues to grow. This project explores one such approach: lossy compression with neural networks. Neural network compressors have achieved state-of-the-art performance on natural images, but these advances have yet to be adapted to scientific data. This research assesses how an existing variational neural network compression algorithm, trained on natural images, performs on two-dimensional scientific datasets. Used out of the box, the algorithm achieves peak signal-to-noise ratios (PSNR) at low bitrates that come close to those of state-of-the-art scientific data compressors such as SPECK, SZ, and ZFP. Additionally, by retraining the network on climate data through transfer learning, we show that its strong performance extends to higher bitrates. These preliminary results pave the way for future research into neural network compression of scientific data.
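The comparison above is expressed as PSNR at a given bitrate. As a rough illustration of how these two metrics are commonly computed for a two-dimensional field, here is a minimal pure-Python sketch. It assumes PSNR uses the original field's value range (maximum minus minimum) as the peak, a convention common in scientific-compression evaluations (natural-image PSNR typically uses 255 instead); the talk does not state its exact definition. The function names `psnr` and `bits_per_value` are hypothetical.

```python
import math


def _flatten(field):
    """Flatten a 2D field given as a list of rows into a single list."""
    return [v for row in field for v in row]


def psnr(original, reconstructed):
    """PSNR in decibels between two 2D fields of equal shape.

    Assumption: the peak is the value range (max - min) of the original
    field, rather than a fixed pixel maximum such as 255.
    """
    a = _flatten(original)
    b = _flatten(reconstructed)
    if len(a) != len(b) or not a:
        raise ValueError("fields must be non-empty and the same size")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # lossless reconstruction
    value_range = max(a) - min(a)
    if value_range == 0:
        raise ValueError("constant original field: PSNR is undefined")
    return 20 * math.log10(value_range / math.sqrt(mse))


def bits_per_value(compressed_bytes, num_values):
    """Bitrate: average number of stored bits per data value."""
    return 8 * compressed_bytes / num_values
```

For example, a 512×512 field of 32-bit floats compressed to 65,536 bytes is stored at `bits_per_value(65536, 512 * 512) == 2.0`, a 16:1 reduction; a compressor is then judged by how high a PSNR it reaches at that bitrate.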

Mentors: Samuel Li, John Clyne 
