Migrating files from HPSS


Updated 2/19/2021: The Copy directories section includes new information about potential issues with certain files that were carried over from the Mass Storage System to HPSS on or before February 27, 2011.

Users with data holdings on NCAR’s High-Performance Storage System (HPSS) were advised in October 2019 to begin moving their data to alternative storage systems because HPSS will reach its end of life on October 1, 2021. Depending on individual use cases, appropriate storage alternatives could include the NCAR Campaign Storage file system or a remote site.

CISL recommended early action because of the tape archive’s limited bandwidth and the time needed to complete large data transfers. Getting a late start on the process increases the likelihood that important data will be lost when HPSS is shut down.

The information below provides guidance on how to accomplish the data migration in coordination with the principal investigator (PI) of the project with which the files are associated.

Related training: Tutorial video and slides

Involve the PI and project team

Discuss with the project PI your intention to migrate your project's files from HPSS. The PI may have specific plans for coordinating the migration, especially if multiple individuals are involved in the project.

If the plan is to migrate data to Campaign Storage, use the gladequota command to determine if the project has a Campaign Storage allocation. If it does not, the PI will need to request one.

  • Each NCAR lab has an allocation of Campaign Storage space, and the labs manage how those allocations are used.
  • Universities can request allocations as described on the Campaign Storage web page.

If the plan is to migrate the files to a remote site, the PI will also have a key coordination role, likely including arranging for storage space at that site.

Identify your HPSS directories and files

CISL provides lists of users' HPSS directories and files to help with the migration and deletion process. These lists are updated weekly.

Files and directories are identified on two lists: one organized by the users who own them and another by the projects with which they are associated. Information provided includes file names, sizes, creation dates, and last-access dates. The lists are in these locations:

  • /glade/work/csgteam/hpssreports/current/byusers/userID.data.gz
  • /glade/work/csgteam/hpssreports/current/byprojects/projectID.data.gz

Replace userID with your own user ID, or projectID with the project code.
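The .gz suffix indicates gzip compression, so the reports can be read in place with standard tools. The sketch below assumes the uncompressed reports are plain text; since their column layout is not described here, it searches with a literal text match rather than by field. The function names and the REPORT_ROOT override are hypothetical conveniences, and PROJECT_CODE in the example is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: locate and search the weekly HPSS holdings reports.
# REPORT_ROOT (hypothetical) defaults to the location listed above.
REPORT_ROOT=${REPORT_ROOT:-/glade/work/csgteam/hpssreports/current}

# Print the report path for a user ID (kind=byusers) or project code (kind=byprojects).
hpss_report_path() {
  local kind=$1 id=$2
  printf '%s\n' "${REPORT_ROOT}/${kind}/${id}.data.gz"
}

# Print every report line that contains a literal string, such as a directory name.
# gzip -dc decompresses to stdout; grep -F treats the pattern as fixed text.
hpss_report_grep() {
  local kind=$1 id=$2 pattern=$3
  gzip -dc "$(hpss_report_path "$kind" "$id")" | grep -F -- "$pattern"
}

# Examples:
#   gzip -dc "$(hpss_report_path byusers "$USER")" | less
#   hpss_report_grep byprojects PROJECT_CODE /home/smith/run1
```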

Exercise caution

Neither HPSS files nor Campaign Storage files are backed up. When making any changes to your directories, using either POSIX commands or Globus, be sure you are making them in the intended directory. When copying files, take care to avoid unintentionally overwriting files. Exercise the same caution you normally use when changing files and directories in any other storage resource that is not backed up.


Organize your HPSS files and directories

Examine your HPSS holdings for the project, including your HPSS /home/username directory, and also /$USER if such a directory exists. If the project's files are organized in one or more directories and subdirectories the way you want them, proceed to the next step. If they are not, organizing them at this point is a good idea.

You will most likely need to deal with unwanted files and directories. We recommend clearly separating unwanted files and directories from those you want to keep. For example, create a new directory for the files and directories that you do want to keep and use the HPSS mv command to move them to that directory.


  • If you have a flat, catch-all directory, you might create a set of organized subdirectories and move the files to them accordingly.
  • If your project has several individual files or directories scattered across many directories having no common parent, move them under a parent HPSS directory.
  • If you have htar files that you want to migrate, treat them as you would treat tar files. You can place many of them in a common directory.
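A minimal sketch of the create-a-directory-then-move approach, driving the HPSS mkdir and mv commands through hsi. The function name hpss_gather and the HSI variable are hypothetical; setting HSI=echo prints the commands instead of running them, which is a cheap way to review a reorganization before changing anything on HPSS. The sketch assumes hsi mv renames a path the way Unix mv does, so each item is moved to an explicit KEEP_DIR/name target.

```shell
#!/usr/bin/env bash
# Sketch: gather HPSS files and directories you want to keep under one new parent.
# HSI (hypothetical) defaults to the hsi client; set HSI=echo for a dry run.
HSI=${HSI:-hsi}

# Usage: hpss_gather KEEP_DIR HPSS_PATH...
# Creates KEEP_DIR, then moves each HPSS_PATH to KEEP_DIR/<basename>, one at a time,
# stopping at the first command that reports failure.
hpss_gather() {
  local keep=${1%/}
  shift
  $HSI mkdir "$keep" || return 1
  local src
  for src in "$@"; do
    src=${src%/}
    $HSI mv "$src" "$keep/$(basename "$src")" || return 1
  done
}

# Example dry run (paths are placeholders):
#   HSI=echo hpss_gather /home/smith/keep /home/smith/run1 /home/smith/misc/plots
```

Moving items one at a time, rather than with wildcards, also keeps the number of concurrent HPSS file actions low.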

Caveat: When moving files and directories under a common HPSS directory, it is important to keep in mind the size of the directory in light of what you intend to do with it. Directories of several terabytes in size may be difficult to transfer to a remote site. Likewise, huge tar files may be difficult to untar on a remote system. The best way to find out what size works is to try some file-transfer experiments.

Copy directories from HPSS

For each project directory that you intend to copy, we recommend following one of the processes below, depending on where the data is going. Review the information for each process to make sure you understand its limitations and pitfalls. Also, no matter which transfer utility you use:

  • Be aware of the concurrent file transfer limits described below.
  • Be sure to keep the output from each transfer as a record of the number of bytes that were transferred.
  • Owners of files that were carried over from the Mass Storage System (MSS) to HPSS on or before February 27, 2011, may run into problems moving those files with the hsi cget -RA command recommended below. In those cases, use the non-optimized hsi cget -R command instead. Such files are located in an HPSS directory tree that begins with the file owner's username (or the original file owner's username) in all capital letters, such as a /SMITH tree. (Files in /home/smith are not affected.)
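The capital-letters rule above can be checked mechanically before each copy. This is a heuristic sketch, assuming that only trees whose top-level name consists entirely of capital letters (plus digits or underscores) hold carried-over Mass Storage System files; cget_opts is a hypothetical name, and a directory that fails with -RA should still be retried with -R whatever the check says.

```shell
#!/usr/bin/env bash
# Heuristic sketch: choose hsi cget options for a full HPSS directory pathname.
# Trees whose top-level name is all capitals (e.g. /SMITH/...) may hold files carried
# over from the Mass Storage System, so use the non-optimized -R for them.
cget_opts() {
  local top=${1#/}   # drop the leading slash
  top=${top%%/*}     # keep only the first path component
  case $top in
    ''|*[![:upper:][:digit:]_]*) printf '%s\n' "-RA" ;;  # empty or has non-capitals
    [[:upper:]]*)                printf '%s\n' "-R" ;;   # all capitals, e.g. SMITH
    *)                           printf '%s\n' "-RA" ;;  # e.g. starts with a digit
  esac
}

# Example: hsi cget $(cget_opts "$dir") "$dir"
```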

1. HPSS → Campaign Storage

  • Log in to data-access.ucar.edu and change directories to your Campaign Storage directory.
  • Run this command: hsi cget -RA full_HPSS_directory_pathname
  • This will copy the HPSS directory into the directory where you ran the hsi command.
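The Campaign Storage steps above, plus the advice to keep each transfer's output, fit in one small function. This is a sketch: hpss_pull, the HSI variable, and the log-file argument are hypothetical, and $CAMPAIGN_DIR in the example stands for your own Campaign Storage directory. The subshell changes directory only for hsi, so your shell stays where it was, and all hsi output is appended to the log.

```shell
#!/usr/bin/env bash
# Sketch: copy one HPSS directory into a Campaign Storage directory and keep a log.
# HSI (hypothetical) defaults to the hsi client; set HSI=echo for a dry run.
HSI=${HSI:-hsi}

# Usage: hpss_pull DEST_DIR FULL_HPSS_DIR LOG_FILE
hpss_pull() {
  local dest=$1 src=$2 log=$3
  case $src in
    /*) ;;
    *) echo "hpss_pull: '$src' is not a full HPSS pathname" >&2; return 1 ;;
  esac
  # cd happens only inside the subshell; the log path is resolved from the caller's
  # directory. stdout and stderr (the record of bytes transferred) go to LOG_FILE.
  ( cd "$dest" && $HSI cget -RA "$src" ) >>"$log" 2>&1
}

# Example (on data-access.ucar.edu; paths are placeholders):
#   hpss_pull "$CAMPAIGN_DIR" /home/smith/run1 ~/hpss_run1.log
```

Running tail -f on the log from a second session shows progress while the copy runs.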

Users can also run batch jobs on the Casper system, using the hpss partition, to execute transfers from HPSS to the Campaign Storage file system. Sample batch jobs are shown on this page: Example scripts for Casper users.

2. HPSS → remote site where Globus is installed

Do not attempt to transfer files directly from HPSS to a remote site using the Globus service. Follow these steps:

  • Log in to Casper or Cheyenne and change directories to your GLADE scratch space.
  • Run the following command to copy the data to your scratch space:
    hsi cget -RA full_HPSS_directory_pathname
  • Copy the files from your scratch space to the remote site using either the Globus web interface or the Globus command-line interface (CLI).
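The two-step route can be sketched with the Globus CLI's globus transfer command and its --recursive and --label options. Assumptions: the globus CLI is installed and you have already run globus login; the collection (endpoint) IDs and destination path are placeholders you look up yourself; stage_and_push, HSI, and GLOBUS are hypothetical names; and, as with the Campaign Storage copy, hsi cget -RA is assumed to create a subdirectory named after the HPSS directory inside the current directory.

```shell
#!/usr/bin/env bash
# Sketch: stage an HPSS directory on GLADE scratch, then submit a Globus transfer.
# HSI and GLOBUS (hypothetical) default to the real clients; set them to echo for a dry run.
HSI=${HSI:-hsi}
GLOBUS=${GLOBUS:-globus}

# Usage: stage_and_push FULL_HPSS_DIR SCRATCH_DIR SRC_COLLECTION_ID DST_COLLECTION_ID DST_DIR
stage_and_push() {
  local hpss_dir=${1%/} scratch=$2 src_id=$3 dst_id=$4 dst_dir=$5
  local name
  name=$(basename "$hpss_dir")
  # Step 1: copy from HPSS to scratch. Globus is never pointed at HPSS directly.
  ( cd "$scratch" && $HSI cget -RA "$hpss_dir" ) || return 1
  # Step 2: submit a recursive Globus transfer from scratch to the remote site.
  $GLOBUS transfer --recursive --label "HPSS migration $name" \
    "$src_id:$scratch/$name" "$dst_id:$dst_dir"
}
```

globus transfer returns as soon as the task is submitted; the transfer itself runs asynchronously and can be monitored in the Globus web interface.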

3. HPSS → remote site where Globus is not installed

  • Log in to Casper or Cheyenne and change directories to your GLADE scratch space.
  • Run this command to copy the data to your scratch space: hsi cget -RA full_HPSS_directory_pathname
  • Copy the directories from your scratch space to the remote site using scp, sftp, or bbcp. bbcp is usually the best choice because it transfers data over multiple parallel streams, which is faster than single-stream utilities such as scp and sftp.
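A sketch of the last hop for sites without Globus: prefer bbcp when it is on your PATH and fall back to scp otherwise. bbcp must also be installed on the remote host. The stream count (-s 8) is an assumed starting value to tune with test transfers, not a CISL recommendation; push_remote and the remote address in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy a staged directory from GLADE scratch to a remote host.
# Usage: push_remote LOCAL_DIR USER@HOST:REMOTE_DIR
push_remote() {
  local src=$1 dest=$2
  if command -v bbcp >/dev/null 2>&1; then
    # -r copies the directory tree; -s sets the number of parallel network streams.
    bbcp -r -s 8 "$src" "$dest"
  else
    # Single-stream fallback; usually slower for large directories.
    scp -r "$src" "$dest"
  fi
}

# Example: push_remote /glade/scratch/smith/run1 smith@remote.example.edu:/data/
```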

Concurrent file transfer limits

The number of concurrent transfers an individual user can execute is limited to help ensure that all users have reasonable access to HPSS. A global limit on concurrent file actions also applies; when the system is especially busy, it sometimes causes users to receive "EIO" error notices even if they have not exceeded their individual limit.

To reduce the incidence of such errors, follow these recommendations:

  • Do not submit more than one hsi cget -RA command to run concurrently. Each such command requests multiple file actions.
  • Request no more than five (5) individual file actions to be executed concurrently, whether using hsi or htar and regardless of where you start the transfers (Cheyenne, Casper, data-access nodes, for example).
  • Do not run more than one cget command that uses wildcards to retrieve multiple files (such as hsi cget *file) at the same time, whether in a single session or across multiple sessions.
  • Be aware that HPSS may still be busy executing an action after it appears to you to have been completed. By running commands in quick succession, you may inadvertently reach your limit because some actions remain in progress. Also, once HPSS begins to execute an action, it may continue to completion even if you cancel the command.
  • If you encounter EIO errors, submit fewer htar commands simultaneously or in rapid succession. Each htar read opens both an index file and a tar file, so you might reach your limit sooner than you expect.
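One way to stay within these limits when several directories need copying is to run the copies strictly one after another, never in the background, and to pause between them so HPSS can finish any lingering actions. This is a sketch: hpss_pull_serial, the HSI variable, and the HPSS_PAUSE setting with its 60-second default are assumptions, not CISL-published values.

```shell
#!/usr/bin/env bash
# Sketch: copy several HPSS directories into DEST_DIR one at a time.
# HSI (hypothetical) defaults to the hsi client; set HSI=echo for a dry run.
HSI=${HSI:-hsi}

# Usage: hpss_pull_serial DEST_DIR FULL_HPSS_DIR...
# HPSS_PAUSE (hypothetical, seconds) is the wait between copies; 60 is a guess.
hpss_pull_serial() {
  local dest=$1
  shift
  local src status=0 first=1
  for src in "$@"; do
    # Give HPSS time to finish actions from the previous copy before starting the next.
    [ "$first" -eq 1 ] || sleep "${HPSS_PAUSE:-60}"
    first=0
    # No trailing '&': each cget -RA must finish before the next one starts.
    if ! ( cd "$dest" && $HSI cget -RA "$src" ); then
      echo "hpss_pull_serial: copy of $src failed; rerun it later" >&2
      status=1
    fi
  done
  return "$status"
}
```

A failed directory is reported and skipped rather than retried immediately, since an instant retry is exactly the kind of rapid succession the recommendations warn against.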

Review copied directories and remove them from HPSS

Review the directories and files after they have been copied to your target destination. Confirm that their organization and extent are exactly what you want and review the output from the transfer utilities to confirm that the transfers were completed properly.
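For a quick check on the extent of a copied directory, a file count and byte total can be set beside the byte counts in the transfer output you saved. This sketch is a rough cross-check, not a substitute for hpss_verify; dir_summary is a hypothetical name, and it assumes GNU find (for -printf), as found on typical Linux systems.

```shell
#!/usr/bin/env bash
# Sketch: count regular files and total bytes under a copied directory.
# Requires GNU find: -printf '%s\n' prints each file's size in bytes.
dir_summary() {
  find "$1" -type f -printf '%s\n' |
    awk '{ n++; bytes += $1 } END { printf "%d files, %.0f bytes\n", n, bytes }'
}

# Example: dir_summary "$CAMPAIGN_DIR/run1"
```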


After copying a directory to Campaign Storage or GLADE using hsi cget -RA as described above, use the hpss_verify command to compare the destination directory with its HPSS source. It checks that each source file exists in the destination and has the same size, and it reports any discrepancies. It does NOT perform a byte-for-byte comparison or compute checksums as commands like md5sum do.

The hpss_verify command is available on Casper, Cheyenne, and the data-access nodes. Execute it with no arguments for usage details.

The command is especially useful if a directory copy was interrupted and run again, as sometimes happens with large directories. In that case, one or more files may have been copied only partially, and hpss_verify will flag them.

Remove files

The final step of migration, removing your files from HPSS, is very important. After you have copied your data to Campaign Storage or a remote system, let the PI know you are ready to delete the files from HPSS. After you get the PI's approval, delete the files by following the appropriate example(s) below.

Example 1. Remove an entire HPSS directory, leaving no empty directory names behind.

hsi rm -R full_HPSS_directory_pathname
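Because nothing on HPSS is backed up, a small guard around hsi rm -R can catch the most damaging typos before they run. This sketch is hypothetical: it refuses relative and top-level paths, asks for confirmation, and honors HSI=echo for a dry run. None of these checks come from CISL, and they do not replace the PI's approval.

```shell
#!/usr/bin/env bash
# Sketch: guarded wrapper around "hsi rm -R".
# HSI (hypothetical) defaults to the hsi client; set HSI=echo for a dry run.
HSI=${HSI:-hsi}

# Usage: hpss_rm_dir FULL_HPSS_DIR
hpss_rm_dir() {
  local dir=${1%/} answer
  case $dir in
    /*/*) ;;  # at least two components, e.g. /home/smith/run1
    *) echo "hpss_rm_dir: refusing '$dir' (need a full path below the top level)" >&2
       return 1 ;;
  esac
  printf 'Delete HPSS directory %s and everything under it? [y/N] ' "$dir" >&2
  read -r answer
  # Anything other than a lowercase y skips the deletion.
  if [ "$answer" != y ]; then
    echo "hpss_rm_dir: skipped $dir" >&2
    return 1
  fi
  $HSI rm -R "$dir"
}
```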

Example 2. Remove a set of files. In this example, "filelist.txt" is an ASCII text file with one line per file to delete, each in this form: rm full_HPSS_filename

hsi < filelist.txt
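If the HPSS paths to delete are already in a plain list (for example, lines copied from your holdings report), the command file can be generated rather than typed. A sketch, assuming one full pathname per line and no spaces in pathnames; make_rm_list and the file names in the example are hypothetical. Review the generated file before feeding it to hsi.

```shell
#!/usr/bin/env bash
# Sketch: turn a list of full HPSS file pathnames into an hsi command file.
# Usage: make_rm_list PATHS_FILE COMMAND_FILE
make_rm_list() {
  # Keep only absolute paths (lines starting with "/"); blank or relative lines are dropped.
  awk '/^\// { print "rm " $0 }' "$1" > "$2"
  # Report how many rm commands were written so the count can be sanity-checked.
  echo "$(wc -l < "$2") rm commands written to $2" >&2
}

# Example:
#   make_rm_list delete_paths.txt filelist.txt
#   less filelist.txt        # review before running
#   hsi < filelist.txt
```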

Getting help

Contact CISL for additional guidance, information, or assistance: Getting help.