The end of an archive: NCAR powers off HPSS

By Staff
10/06/2021 - 3:15pm

After more than 10 years in service as a long-term repository for curated data archives and modeling data, NCAR's High-Performance Storage System (HPSS) was officially retired on October 1, 2021.

The tape archive made its debut in March 2011 as a follow-on to the 25-year-old NCAR Mass Storage System (MSS). During the transition from MSS, some 70 million files – approximately 12 petabytes of data – were migrated into HPSS. At its peak in early 2020, the volume of data archived in HPSS had grown to more than 93 petabytes and 300 million files.

HPSS initially consisted of two robotic tape libraries at NCAR’s Mesa Lab in Boulder, Colorado. When the NCAR-Wyoming Supercomputing Center opened its doors in 2012 in Cheyenne, Wyoming, two new robotic libraries were installed there for ongoing production purposes and the libraries in Boulder were converted to hold disaster recovery copies. Eventually, two more libraries were installed in Cheyenne. CISL engineers kept HPSS technologically current during its decade of service by migrating data from old to new tape media twice and by deploying four generations of tape drives. HPSS was busy, too. On average:

  • A tape was mounted every 15 seconds.
  • A file was read every three seconds.
  • A file was written every second.
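
For a rough sense of scale, the average intervals above can be converted into daily counts. The snippet below is a minimal sketch: the function and variable names are illustrative only, and it assumes the averages held uniformly across a 24-hour day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(interval_seconds):
    """Return how many events occur in one day at one event per interval."""
    return SECONDS_PER_DAY / interval_seconds


# Average intervals, in seconds, taken from the list above.
average_intervals = {"tape mounts": 15, "file reads": 3, "file writes": 1}

for activity, seconds in average_intervals.items():
    print(f"{activity}: about {events_per_day(seconds):,.0f} per day")
```

At these averages, that works out to roughly 5,760 tape mounts, 28,800 file reads, and 86,400 file writes per day.
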
Image showing the interior of one of the two HPSS tape libraries at the Mesa Lab in Boulder, Colorado. NCAR image.

Deciding on disk

Storing long-term, archival data on tape was much more cost-effective than storing it on disk when NCAR deployed HPSS in 2011, particularly for high-performance applications. Since 2011, however, advancements in disk storage technology and evolving user requirements have made tape increasingly less attractive for certain types of large-scale data storage. The deployments of the Globally Accessible Data Environment – best known as GLADE – and the NCAR Campaign Storage system, for example, have demonstrated that multi-petabyte disk storage systems are both better suited to the workflow demands of NCAR’s user communities and easier to deploy than tape-based storage systems. CISL now maintains a small tape-based archive known as Quasar, which stores disaster recovery copies of curated data collections that have an indefinite lifetime.

Consequently, CISL made HPSS read-only on January 20, 2020. That step began a data evacuation phase that gave users time to move their data off the system before the official HPSS decommission date of October 1, 2021.

History of MSS and HPSS: A tale of two archives

The Mass Storage System that preceded HPSS was deployed in the late 1970s. Developed by the NCAR Scientific Computing Division – which later became CISL – MSS used a range of tape technologies over the years. These included seven- and nine-track half-inch tapes, Ampex Terabit Memory tape drives (the same tape drives and media then used by television networks for video recording), Storage Technology's robotic Automated Cartridge Systems, and eventually Storage Technology SL-8500 robotic digital storage libraries.

MSS required a team of seven systems engineers to maintain its operating software and port it to new robotic library and tape drive technologies. In the 1980s, its design contributed to the development of the IEEE Mass Storage System Reference Model (see Coleman and Miller, 1990 and Shiers, 1994). A team of engineers from Lawrence Livermore National Laboratory and IBM (Teaff, Watson, and Coyne, 1995) later used the IEEE MSS Reference Model to develop the HPSS system.

Over time, HPSS became a popular data storage and archive solution for the U.S. Department of Energy (DOE) laboratories. As NCAR's MSS data holdings grew, and with rapid changes in storage technologies taking place in the first decade of the 21st century, continuing to maintain a one-of-a-kind system that was developed and maintained in-house became untenable in the long run. After a lengthy evaluation of competing technologies and the projected growth of NCAR's data holdings, the DOE/IBM HPSS system was identified as the most viable option.

Finally, in 2010 and early 2011, engineers in CISL’s former Mass Store Systems Group wrote and tested HPSS enhancement software that enabled HPSS to read the tape cartridges written by its predecessor. At 5 p.m. on Sunday, March 27, 2011, the NCAR MSS was powered down. Over the next 48 hours, the engineers migrated MSS metadata into HPSS and ran a final test to verify that HPSS was able to read the 12 petabytes of MSS data.