Migrating files from HPSS

NCAR’s High-Performance Storage System (HPSS) will reach its end of life on October 1, 2021, and users who have data holdings on that system have been advised to begin moving their data to an alternative storage system and deleting it from HPSS. Depending on individual use cases, appropriate storage alternatives could include the NCAR Campaign Storage file system or a remote site.

Early action is necessary because of the tape archive’s limited bandwidth and the time needed to complete large data transfers. Also, HPSS will be put into read-only mode on January 20, 2020, so users need to review their workflows and stop writing data to tape if they have not already done so. Getting a late start on the process increases the likelihood that important data will be lost when HPSS is shut down.

The process of evaluating which data to move or delete should begin immediately in coordination with the principal investigator (PI) of the project with which the files are associated. The information below provides guidance on how to accomplish the data migration.

Related training: Tutorial video and slides


Involve the PI and project team

Discuss with the project PI your intention to migrate your project's files from HPSS. The PI may have specific plans for coordinating the migration, especially if multiple individuals are involved in the project.

If the plan is to migrate data to Campaign Storage, use the gladequota command to determine if the project has a Campaign Storage allocation. If it does not, the PI must submit a request and specify the project members who are to be given write permission in Campaign Storage. The PI can make these requests through the NCAR Resource Allocation System or by contacting CISL.
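As a quick check, the gladequota output can be filtered for Campaign Storage entries. The following is a minimal sketch, not an official tool: the function name is hypothetical, and it assumes that the relevant lines of gladequota output contain the word "campaign" (for example, as part of a path). Confirm that assumption against the actual output on your system.

```shell
# Hypothetical helper: show only the gladequota lines that mention "campaign".
# Assumption: Campaign Storage entries contain that word; verify on your system.
show_campaign_quota() {
    gladequota | grep -i 'campaign'
}

# Usage (on Cheyenne or Casper):
#   show_campaign_quota || echo "No Campaign Storage entries found; ask the PI."
```

The function returns a non-zero status when no matching line is found, so it can be used in a simple conditional as shown in the usage comment.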

If the plan is to migrate the files to a remote site, the PI will also have a key coordination role, which will likely include making the file storage arrangements, among other tasks.


Identify your HPSS directories and files

Each file or directory in HPSS is associated with a project. To see a list of projects to which you have access, log in to Cheyenne or Casper and run the id command. The output will look something like this:

uid=8061(nad) gid=1234(ncar) groups=73704(ucbk0099),7087(cesm0099)

The projects to which you have access are identified by eight-character strings – project codes – inside the parentheses. A project may have an HPSS directory name under which all the associated directories and files reside – /CESM, for example. You may also own project files and directories in a /home/username directory, or in /USERNAME.
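To pull the project codes out of a long group list, you can filter the group names by length. This is a rough sketch: the function name is hypothetical, and it relies only on the eight-character convention described above, so any other group whose name happens to be eight characters long would also be printed.

```shell
# Hypothetical helper: print group names that are exactly eight characters
# long, one per line. Reads a space-separated list of names on stdin,
# which is the format printed by `id -Gn`.
filter_project_codes() {
    tr ' ' '\n' | awk 'length($0) == 8'
}

# Usage (on Cheyenne or Casper):
#   id -Gn | filter_project_codes
```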

You can identify holdings in your /home/username or /USERNAME directory by executing one of these commands:

hsi ls -lRU /home/username
hsi ls -lRU /USERNAME

The hsi commands above may generate numerous lines of output. Consider redirecting their output to a file so the information is easier to review.
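One way to save a listing is sketched below. The function name and file names are hypothetical. The sketch captures both standard output and standard error, on the assumption that your hsi installation may write listings to either stream.

```shell
# Hypothetical helper: save a recursive HPSS listing to a local file.
# $1: HPSS directory (for example /home/username)
# $2: local file that will receive the listing
save_hpss_listing() {
    # Capture stdout and stderr, since hsi may write listings to either.
    hsi ls -lRU "$1" > "$2" 2>&1
}

# Usage:
#   save_hpss_listing /home/username hpss_home_listing.txt
#   less hpss_home_listing.txt
```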

CISL staff are developing tools for identifying more extensive HPSS holdings. If you already have a good understanding of which directories and files you need to review, proceed as described below.

Exercise caution

Neither HPSS files nor Campaign Storage files are backed up. When making any changes to your directories, using either POSIX commands or Globus, be sure you are making them in the intended directory. When copying files, take care to avoid unintentionally overwriting files. Exercise the same caution you normally use when changing files and directories in any other storage resource that is not backed up.

 


Organize your HPSS files and directories

Examine your HPSS holdings for the project, including your HPSS /home/username directory and your /USERNAME directory if one exists. If the project's files are organized in one or more directories and subdirectories the way you want them, proceed to the next step. If they are not, organizing them at this point is a good idea.

You will most likely need to deal with unwanted files and directories. We recommend clearly separating them from those you want to keep. For example, create a new directory for the files and directories that you want to keep and use the hsi mv command to move them into it.

Examples:

  • If you have a flat, catch-all directory, you might create a set of organized subdirectories and move the files to them accordingly.
  • If your project has several individual files or directories scattered across many directories having no common parent, move them under a parent HPSS directory.
  • If you have htar files that you want to migrate, treat them as you would treat tar files. You can place many of them in a common directory.
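The "keep" directory approach can be sketched as a small shell function. The function name and the paths in the usage comment are hypothetical. The sketch assumes that hsi mkdir accepts -p and that hsi mv moves a source into an existing target directory the way the Unix commands do, so try it on a small test directory first.

```shell
# Hypothetical helper: gather HPSS files and directories under one parent.
# $1: HPSS directory to create for the items you want to keep
# remaining arguments: HPSS paths to move into that directory
move_to_keep_dir() {
    keep_dir=$1
    shift
    # Assumption: hsi mkdir supports -p (create parents, no error if present).
    hsi mkdir -p "$keep_dir" || return 1
    for item in "$@"; do
        hsi mv "$item" "$keep_dir/" || return 1
    done
}

# Usage:
#   move_to_keep_dir /home/username/keep /home/username/run1 /home/username/run2
```

The function stops at the first failed command so that a problem with one item does not go unnoticed among later moves.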

Caveat: When moving files and directories under a common HPSS directory, it is important to keep in mind the size of the directory in light of what you intend to do with it. Directories of several terabytes in size may be difficult to transfer to a remote site. Likewise, huge tar files may be difficult to untar on a remote system. The best way to find out what size works is to try some file-transfer experiments.


Copy directories from HPSS

For each project directory that you intend to copy, we recommend following one of the processes listed below, depending on the migration situation you are addressing. Review the information at each link to make sure you understand any limitations and pitfalls in the process. Also, no matter which transfer utility you use, be sure to keep the output from each transfer as a record of the number of bytes that were transferred.

1. HPSS → Campaign Storage

  • Log in to data-access.ucar.edu and change directories to your Campaign Storage directory.
  • Execute the command hsi cget -RA full_HPSS_directory_pathname.
  • This will copy the HPSS directory into the directory where you ran the hsi command.
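The steps above can be wrapped in a small function that also keeps the hsi output as a record of the transfer. This is a sketch only: the function name, the destination path, and the log file name are hypothetical placeholders.

```shell
# Hypothetical helper: copy one HPSS directory into a local destination
# directory and save the hsi output as a transfer record.
# $1: destination directory (for example your Campaign Storage directory)
# $2: full HPSS directory pathname
# $3: log file for the hsi output
fetch_from_hpss() {
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$1" && hsi cget -RA "$2" ) > "$3" 2>&1
}

# Usage (after logging in to data-access.ucar.edu; paths are placeholders):
#   fetch_from_hpss /path/to/your/campaign/dir /home/username/keep "$HOME/keep_transfer.log"
```

If the destination directory does not exist, cd fails and hsi is never run, which avoids copying data into an unintended directory.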

2. HPSS → remote site where Globus is installed

  • Log in to Casper or Cheyenne and change directories to your GLADE scratch space.
  • Execute the command hsi cget -RA full_HPSS_directory_pathname.
  • Copy the files to the remote site using either the Globus web interface or command line interface.
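For the command line route, a transfer might be submitted as sketched below. The function name, endpoint IDs, and paths are placeholders, and the sketch assumes the Globus CLI is installed and that you have already authenticated with globus login.

```shell
# Hypothetical helper: submit a recursive Globus transfer of one directory.
# $1: source endpoint ID        $2: source path
# $3: destination endpoint ID   $4: destination path
submit_globus_transfer() {
    globus transfer --recursive --label "HPSS migration" "$1:$2" "$3:$4"
}

# Usage (IDs and paths are placeholders):
#   submit_globus_transfer SOURCE_ENDPOINT_ID /glade/scratch/username/keep \
#       DEST_ENDPOINT_ID /data/keep
```

Globus runs the transfer asynchronously, so keep the task information that the command prints along with your other transfer records.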

3. HPSS → remote site where Globus is not installed

  • Log in to Casper or Cheyenne and change directories to your GLADE scratch space.
  • Execute the command hsi cget -RA full_HPSS_directory_pathname.
  • Copy the directories from your scratch space to the remote site using scp, sftp, or bbcp. The best choice is bbcp because it transfers files faster than single-streaming utilities.
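A simple way to prefer bbcp but fall back to scp is sketched here. The function name, the stream count of 8, and the remote host in the usage comment are illustrative assumptions. Note that bbcp must also be installed on the remote system.

```shell
# Hypothetical helper: copy a directory to a remote site, using bbcp when it
# is available locally and falling back to scp otherwise.
# $1: local directory   $2: remote target in user@host:/path form
push_to_remote() {
    if command -v bbcp >/dev/null 2>&1; then
        # -r copies recursively; -s sets the number of parallel streams.
        bbcp -r -s 8 "$1" "$2"
    else
        scp -r "$1" "$2"
    fi
}

# Usage (host and paths are placeholders):
#   push_to_remote /glade/scratch/username/keep username@remote.example.org:/data/
```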

Users can also run batch jobs on the Casper system, using the hpss partition, to execute transfers from HPSS to the Campaign Storage file system. Sample batch jobs are shown on this page: Example scripts for Casper users.


Review copied directories, remove them from HPSS

Review the directories and files after they have been copied to your target destination. Confirm that their organization and extent are exactly what you want, and review the output from the transfer utilities to confirm that each transfer completed properly.
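To compare a copied directory against your transfer records, you can total its file count and size. This sketch uses a hypothetical function name and assumes GNU find (for the -printf option), which is typical on Linux systems.

```shell
# Hypothetical helper: print the number of regular files under a directory
# and their combined size in bytes.
# $1: directory to summarize
summarize_tree() {
    # Assumption: GNU find, whose -printf '%s\n' prints each file's size.
    find "$1" -type f -printf '%s\n' |
        awk '{ n++; bytes += $1 } END { printf "%d files, %.0f bytes\n", n, bytes }'
}

# Usage:
#   summarize_tree /glade/scratch/username/keep
```

Compare the totals with the byte counts recorded in the transfer output before you ask the PI for approval to delete anything from HPSS.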

The final step of migration – removing your files from HPSS – is very important. After you have copied your data to the Campaign Storage system or a remote system, let the PI know you are ready to delete the files from HPSS. After you get the PI's approval, delete the files. If you have questions about how to do this, please contact CISL.


Getting help

Contact CISL for additional guidance, information, or assistance: Getting help.