CISL Seminar Series - Data Reduction at Exascale: Lossy Compression of Scientific Data

04/18/2018 - 10:00am to 11:00am
ML Main Seminar Room
Peter Lindstrom

With the exascale age fast approaching, new solutions are needed to cope with the fire hose of numerical data generated in simulations, experiments, and observations.  Not only does data motion constitute a significant performance bottleneck and power consumer in parallel computing, but large data volumes also strain storage and bandwidth resources for archiving, disseminating, and analyzing the data.  Data compression presents a partial solution to reducing data volumes, but traditional lossless compression rarely achieves more than a factor-of-two reduction, which falls one to two orders of magnitude short of the expected needs for exascale computing.  Moreover, floating-point data is already contaminated with error from several sources, including round-off, truncation, iteration, model, and sensor errors.  Rather than spending precious storage to accurately represent what is effectively noise or information not needed for the task at hand, we advocate the use of lossy compression to dramatically boost compression ratios.

In this talk, Peter will discuss fpzip and zfp: two open-source, high-speed compressors designed for floating-point data.  These compressors can bound relative or absolute compression-induced errors to limit data loss to an acceptable level.  He will discuss error distributions associated with these compressors and how they might influence climate data analysis.  In addition to being a tool for reducing I/O, zfp is an alternative, more economical floating-point representation on which numerical computations are possible.  zfp provides C++ compressed array classes that replace conventional, uncompressed floating-point arrays with minimal code changes.  He will present results of using fpzip and zfp to store arrays in compressed form within physics simulations, during data analysis, and in visualization, where 100x data reduction or more is possible.
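
As a rough illustration of what bounding the absolute compression-induced error means, the following Python sketch quantizes a signal onto a uniform grid so that every reconstructed value stays within a chosen tolerance, then compares the result against plain lossless compression of the raw doubles. This is not the algorithm used by fpzip or zfp; the function names, the test signal, and the tolerance are all hypothetical and chosen only for illustration.

```python
import math
import struct
import zlib


def quantize(values, tol):
    """Map each value to the index of the nearest grid point (grid spacing 2*tol)."""
    step = 2.0 * tol
    return [round(v / step) for v in values]


def dequantize(codes, tol):
    """Reconstruct approximate values from grid indices."""
    step = 2.0 * tol
    return [c * step for c in codes]


def max_abs_error(original, restored):
    """Largest pointwise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(original, restored))


# Hypothetical smooth test signal standing in for simulation output.
data = [math.sin(0.01 * i) for i in range(10000)]
tol = 1e-3  # assumed absolute error tolerance

codes = quantize(data, tol)
restored = dequantize(codes, tol)

# Lossless baseline: deflate the raw 64-bit doubles.
raw_bytes = struct.pack(f"{len(data)}d", *data)
lossless_size = len(zlib.compress(raw_bytes))

# Lossy path: deflate the 32-bit integer grid indices instead.
code_bytes = struct.pack(f"{len(codes)}i", *codes)
lossy_size = len(zlib.compress(code_bytes))

print("max abs error:", max_abs_error(data, restored))
print("raw bytes:", len(raw_bytes))
print("lossless bytes:", lossless_size)
print("lossy bytes:", lossy_size)
```

Rounding to the nearest grid point keeps each reconstructed value within half a grid step of the original, which is the tolerance `tol` up to floating-point round-off. Discarding the low-order bits that fall below the tolerance is what lets the generic lossless coder shrink the data much further than it can on the raw doubles.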


Peter Lindstrom is a Computer Scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, where he leads several research efforts on data compression.  He is the chief architect of the fpzip and zfp floating-point compressors for scientific data and leads the effort to make zfp production-ready for the U.S. Department of Energy's Exascale Computing Project.  Peter received a Ph.D. in Computer Science from Georgia Institute of Technology in 2000 and B.S. degrees in Computer Science, Mathematics, and Physics from Elon University.  His primary research areas include data compression, scientific visualization, computer graphics, and scientific computing.