CISL Seminar: What’s all the fuss about Zarr?

06/16/2021 - 1:00pm to 2:00pm
Virtual

Speaker: Joe Hamman, UCAR-CGD

Abstract
Data stored as multidimensional arrays (i.e. N-dimensional arrays) is ubiquitous across the computational sciences. Zarr is a new community-supported open source data specification and API for storing chunked, compressed, N-dimensional arrays in flexible storage containers. Spurred on by the exponential growth in scientific data and the adoption of new computational architectures like cloud computing, Zarr is finding rapid adoption across the geosciences and beyond.

The Zarr project began in 2016 with an initial implementation designed to serve the scientific Python community. Zarr’s simple, open specification has led to its implementation in several other languages including Java, Julia, JavaScript, and C++. Notably, recent versions of the NetCDF-C library now support reading and writing data in Zarr format, unlocking new opportunities for data storage and processing related to HPC-centric modeling and data analysis workflows.

In this seminar, I will provide an overview of the Zarr project, targeting researchers and scientific programmers who are new to Zarr but familiar with scientific computing. I will highlight applications that use Zarr in both HPC and cloud environments, making use of both the Python and NetCDF-C implementations.

Bio
Joe Hamman is a scientist at the National Center for Atmospheric Research (NCAR) and the technology director at the non-profit CarbonPlan. His work focuses on the development of data and software tools for applications across the geosciences. He is an active contributor to numerous open source software projects (e.g. Xarray, Dask, Zarr) and a founding member of the Pangeo Project. He holds both an MS and a PhD from the University of Washington, a BS from the University of Arizona, and a PE in Washington State.

June 16, 2021
1-2pm MT
Virtual

Email Taysia (taysia@ucar.edu) for an invitation.