Migrating files from HPSS


Users with data holdings on NCAR’s High-Performance Storage System (HPSS) were advised in October 2019 to begin moving their data to alternative storage systems because HPSS will reach its end of life in October 2021. Depending on individual use cases, appropriate storage alternatives could include the NCAR Campaign Storage file system or a remote site.

CISL recommended early action because of the tape archive’s limited bandwidth and the time needed to complete large data transfers. Getting a late start on the process increases the likelihood that important data will be lost when HPSS is shut down.

The information below provides guidance on how to accomplish the data migration in coordination with the principal investigator (PI) of the project with which the files are associated.

Related training: Tutorial video and slides


Involve the PI and project team

Discuss with the project PI your intention to migrate your project's files from HPSS. The PI may have specific plans for coordinating the migration, especially if multiple individuals are involved in the project.

If the plan is to migrate data to Campaign Storage, use the gladequota command to determine if the project has a Campaign Storage allocation. If it does not, the PI will need to request one.

  • Each NCAR lab has an allocation of Campaign Storage space and the labs manage how those allocations are used.
  • Universities can request allocations as described on the Campaign Storage web page.

If the plan is to migrate the files to a remote site, the PI will also have a key coordination role, which will likely include making the file storage arrangements.


Identify your HPSS directories and files

CISL provides lists of users' HPSS directories and files to help with the migration and deletion process. These lists are updated weekly.

Files and directories are identified on two lists: one organized by the users who own them and another by the projects with which they are associated. The information provided includes file names, sizes, creation dates, and the dates the files were last accessed. The lists can be found here:

  • /glade/work/csgteam/hpssreports/current/byusers/userID.data.gz
  • /glade/work/csgteam/hpssreports/current/byprojects/projectID.data.gz

Insert your own user ID or the project code where indicated.
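As a minimal sketch of how you might take a first look at one of these lists, the functions below build the report path and print its first lines without unpacking it to disk. The function names report_path and peek_report are hypothetical helpers, not CISL tools, and the column layout of the reports is not described here, so inspect the output before trying to parse it.

```shell
# Build the path to a CISL HPSS holdings report.
# $1: "byusers" or "byprojects"; $2: your user ID or project code.
report_path() {
    printf '/glade/work/csgteam/hpssreports/current/%s/%s.data.gz\n' "$1" "$2"
}

# Print the first 20 lines of a report, decompressing on the fly.
# Returns non-zero if the report does not exist or is not readable.
peek_report() {
    path="$(report_path "$1" "$2")"
    if [ -r "$path" ]; then
        gzip -dc "$path" | head -n 20
    else
        echo "report not readable: $path" >&2
        return 1
    fi
}

# Usage (run on a system where /glade/work is mounted):
#   peek_report byusers "$USER"
#   peek_report byprojects YOUR_PROJECT_CODE
```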

CISL staff are developing additional tools for identifying and managing extensive HPSS holdings. If you already have a good understanding of which directories and files you need to review, proceed as described below.

Exercise caution

Neither HPSS files nor Campaign Storage files are backed up. When making any changes to your directories, using either POSIX commands or Globus, be sure you are making them in the intended directory. When copying files, take care to avoid unintentionally overwriting files. Exercise the same caution you normally use when changing files and directories in any other storage resource that is not backed up.

 


Organize your HPSS files and directories

Examine your HPSS holdings for the project, including your HPSS /home/username directory, and also /$USER if such a directory exists. If the project's files are organized in one or more directories and subdirectories the way you want them, proceed to the next step. If they are not, organizing them at this point is a good idea.

You will most likely need to deal with unwanted files and directories. We recommend clearly separating unwanted files and directories from those you want to keep. For example, create a new directory for the files and directories that you do want to keep and use the HPSS mv command to move them to that directory.

Examples:

  • If you have a flat, catch-all directory, you might create a set of organized subdirectories and move the files to them accordingly.
  • If your project has several individual files or directories scattered across many directories having no common parent, move them under a parent HPSS directory.
  • If you have htar files that you want to migrate, treat them as you would treat tar files. You can place many of them in a common directory.
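The reorganization described above could be sketched as follows. This is an illustration only: the directory names (proj_keep, run01, old/run02) are hypothetical, and the run_hsi wrapper is an assumption added here so that the commands are printed rather than executed until you set DRY_RUN=0. It assumes the hsi mkdir and mv subcommands behave like their POSIX counterparts; check the hsi help output before relying on specific options.

```shell
# Commands are only printed by default. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print or run a single hsi command string, depending on DRY_RUN.
run_hsi() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "hsi $*"
    else
        hsi "$@"
    fi
}

# Hypothetical layout: gather two scattered directories under one parent.
HPSS_HOME="/home/${USER:-username}"
KEEP="$HPSS_HOME/proj_keep"

run_hsi "mkdir -p $KEEP"
run_hsi "mv $HPSS_HOME/run01 $KEEP/"
run_hsi "mv $HPSS_HOME/old/run02 $KEEP/"
```

Reviewing the printed commands first is one way to apply the caution advised earlier, since HPSS files are not backed up.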

Caveat: When moving files and directories under a common HPSS directory, it is important to keep in mind the size of the directory in light of what you intend to do with it. Directories of several terabytes in size may be difficult to transfer to a remote site. Likewise, huge tar files may be difficult to untar on a remote system. The best way to find out what size works is to try some file-transfer experiments.
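One way to use the result of a transfer experiment is to turn the measured rate into a rough time estimate for the full directory. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and a sustained rate; the function name estimate_hours and the example figures are illustrative, not measured values.

```shell
# Estimate the hours needed to move a directory.
# $1: size in terabytes; $2: sustained transfer rate in megabytes per second.
estimate_hours() {
    awk -v tb="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 3600 }'
}

# Example: 5 TB at a sustained 100 MB/s
estimate_hours 5 100   # prints 13.9
```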


Copy directories from HPSS

For each project directory that you intend to copy, we recommend following one of the processes listed below, depending on the migration situation you are addressing. Review the information at each link to make sure you understand any limitations and pitfalls in the process. Also, no matter which transfer utility you use, be sure to keep the output from each transfer as a record of the number of bytes that were transferred.

1. HPSS → Campaign Storage

  • Log in to data-access.ucar.edu and change directories to your Campaign Storage directory.
  • Execute the command hsi cget -RA full_HPSS_directory_pathname.
  • This will copy the HPSS directory into the directory where you ran the hsi command.
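The copy step, together with the advice to keep the output of each transfer, might be wrapped as in the sketch below. The function name copy_from_hpss and the log file naming are assumptions made for illustration. The function only prints the command unless DRY_RUN=0 is set, and it is meant to be run from the destination directory.

```shell
# Copy one HPSS directory into the current directory and keep the hsi output.
# $1: full HPSS directory pathname. Set DRY_RUN=0 to actually run the copy.
copy_from_hpss() {
    src="$1"
    log="hsi_cget_$(basename "$src")_$(date +%Y%m%d_%H%M%S).log"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "hsi cget -RA $src  (output would be saved to $log)"
    else
        # Capture both stdout and stderr so the log is a complete record.
        hsi cget -RA "$src" 2>&1 | tee "$log"
    fi
}

# Usage, after changing to your destination directory:
#   DRY_RUN=0 copy_from_hpss full_HPSS_directory_pathname
```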

Users can also run batch jobs on the Casper system, using the hpss partition, to execute transfers from HPSS to the Campaign Storage file system. Sample batch jobs are shown on this page: Example scripts for Casper users.

2. HPSS → remote site where Globus is installed

Do not attempt to transfer files directly from HPSS to a remote site using the Globus service. Instead, stage the files on GLADE first:

  • Log in to Casper or Cheyenne and change directories to your GLADE scratch space.
  • Run the following command to copy the data to your scratch space:
    hsi cget -RA full_HPSS_directory_pathname
  • Copy the files from your scratch space to the remote site using either the Globus web interface or command line interface.

3. HPSS → remote site where Globus is not installed

  • Log in to Casper or Cheyenne and change directories to your GLADE scratch space.
  • Execute the command hsi cget -RA full_HPSS_directory_pathname.
  • Copy the directories from your scratch space to the remote site using scp, sftp, or bbcp. bbcp is usually the best choice because it uses multiple streams and therefore transfers files faster than single-stream utilities such as scp and sftp.

Review copied directories, remove them from HPSS

Review the directories and files after they have been copied to your target destination. Confirm that their organization and extent are exactly what you want and review the output from the transfer utilities to confirm that the transfers were completed properly.

hpss_verify

After copying a directory to Campaign Storage or GLADE using hsi cget -RA as described above, use the hpss_verify command to compare the destination directory with its HPSS source. The comparison checks that each source file exists in the destination directory with the same size and reports any discrepancies. It does NOT perform a byte-for-byte comparison or compute checksums as commands like md5sum do.

The hpss_verify command is available on Casper, Cheyenne, and the data-access nodes. Execute it with no arguments for usage details.

The command is especially useful if the directory copy was interrupted and run again, as sometimes happens with large directories. In that case, one or more of the files may have been copied only partially and the command will flag such files.
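Because hpss_verify compares only file existence and size, you may want an additional content check when copying a staged directory onward, for example from GLADE scratch to a remote site. The sketch below is one possible approach using the standard find and GNU md5sum utilities. The function names make_manifest and check_manifest are hypothetical, and the sketch does not compare anything against HPSS itself.

```shell
# Write an MD5 manifest for every file under a directory.
# $1: directory to scan; $2: manifest file to create (keep it outside $1).
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# Check a copy of the directory against a manifest.
# $1: directory to check; $2: manifest file, given as an absolute path.
# Prints only mismatches and returns non-zero if any file differs.
check_manifest() {
    ( cd "$1" && md5sum --quiet -c "$2" )
}

# Usage:
#   make_manifest /path/to/staged_dir /path/to/staged_dir.md5
#   ...transfer the directory and the manifest to the remote site...
#   check_manifest /remote/path/to/copied_dir /remote/path/to/staged_dir.md5
```

Checksumming reads every byte of every file, so it can take a long time on multi-terabyte directories.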

Remove files

The final step of migration – removing your files from HPSS – is very important. After you have copied your data to the Campaign Storage system or a remote system, let the PI know you are ready to delete the files from HPSS. After you get the PI's approval, delete the files. If you have questions about how to do this, please contact CISL.


Getting help

Contact CISL for additional guidance, information, or assistance: Getting help.