Managing files with HSI

Important considerations | Invoking HSI | Cautions | Commands and examples

The HSI interface is one of two primary tools for transferring data to and from HPSS within the Yellowstone environment. If you need to use HSI on NCAR systems that are outside of the Yellowstone environment, see Kerberos and HSI.

If you need to archive large numbers of individual files, use HTAR instead of HSI.

See File size guidelines for HPSS for additional information that will help you use this system efficiently.

Failure to follow these guidelines can result in suspension of your access to HPSS.

Also see HSI Help.


Important considerations

File deletion is permanent

Files that are deleted from or overwritten in HPSS cannot be recovered. To avoid losing data, carefully review the Cautions section below.

Concurrent transfer limits

HPSS is shared by a large number of users, and there are individual and global limits to the number of file actions that can be executed concurrently. See Use and storage policies for details regarding these limits.

Also see Optimizing HPSS file retrieval. Configuring your requests based on tape location can result in quicker retrievals in some cases.

Bulk file operations

If you need to perform an operation on 100,000 or more of your HPSS files (with commands such as chacct, chcos, chgrp, cp, mv, and rm, among others), contact the CISL Help Desk or Consulting Services Group for assistance so we can avoid slowing the system for all users.

See Submitting HPSS file metadata change requests for details that you will need to provide.

Best practices

Review and follow CISL best practices for managing your files and data transfers. They will help you make the most efficient use of your computing and storage allocations.


Invoking HSI

HSI can be invoked in several modes, each of which is described here.

  • As an interactive command interface, in which you start an HSI session and then execute commands to archive, fetch and manage files
  • Batch mode, in which you execute the commands from your Linux command line
  • By submitting a job to the “hpss” queue

Interactive command interface

If you are working with our HPC, analysis, or visualization systems*, just enter hsi on your command line to start an HSI session.

Once your session starts, your command prompt will look like this, with your own username.

[HSI]/home/username->

You can execute commands from there to archive or retrieve files and so on. See “Commands and examples” below.

To leave the HSI environment and return to your shell prompt, enter quit.

Batch mode

Run HSI commands from your shell prompt without starting an interactive HSI session first.

Precede your commands with “hsi” as in this example:

-bash-4.1$ hsi cput xxx : yyy

The cput command is executed in an HSI session, and then control returns to the shell. This is how you would put HSI commands in a script, or in a “system” call from a running program.

Another batch mode option is to create a file that contains the desired HSI commands and then execute one of the following:

-bash-4.1$ hsi in filename
-bash-4.1$ hsi < filename
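
As a rough sketch of the command-file approach, the following script writes two HSI commands to a temporary file and passes that file to hsi in. The file name results.nc is a placeholder, and the hsi call is skipped (the commands are only printed) when hsi is not on your PATH.

```shell
#!/bin/sh
# Sketch: collect HSI commands in a file, then run them in one batch session.
# results.nc is a placeholder name, not real data.

cmdfile=$(mktemp)

# One HSI command per line, exactly as you would type them in a session.
cat > "$cmdfile" <<'EOF'
cput results.nc
ls -l results.nc
EOF

# Run the command file only where HSI is installed; otherwise just show it.
if command -v hsi >/dev/null 2>&1; then
    hsi in "$cmdfile"
else
    echo "hsi not found; commands that would run:"
    cat "$cmdfile"
fi
```

Keeping the commands in a file makes it easy to review them before anything is written to HPSS.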

Submit job to “hpss” queue

Batch (LSF) and cron jobs can use HSI in the same way as the interactive and non-interactive jobs described above. On Yellowstone, use the hpss queue for these jobs.

Here is one example of how to submit a job on Yellowstone to execute an HPSS transfer:

-bash-4.1$ bsub -n 1 -q hpss -W 2:00 -P project_code hsi cget mydata 
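
If you submit transfers like this often, a small helper can assemble the command for you. This is only a sketch: hpss_get_job is a hypothetical function name, the project code and file name are placeholders, and the function prints the bsub line for review rather than submitting it.

```shell
# Sketch: assemble the bsub command for an HPSS retrieval job.
# hpss_get_job is a hypothetical helper; it prints the command, it does not submit it.
hpss_get_job() {
    project=$1
    file=$2
    walltime=${3:-"2:00"}   # wall-clock limit, defaults to the 2:00 used above
    # -n 1: one job slot; -q hpss: the hpss queue; -W: run limit; -P: project code
    echo "bsub -n 1 -q hpss -W $walltime -P $project hsi cget $file"
}

hpss_get_job UABC0001 mydata
```

Run the printed line on Yellowstone once you have checked the project code and file name.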

Cautions

  • The cp, mv, put, and get commands can overwrite data at their targets without warning. This is a problem if you mistakenly remove or overwrite data, because it cannot be recovered. To help prevent inadvertently overwriting your HPSS files with these commands, establish directory permissions carefully. See Permissions and data safety for how to do that.
  • Use cput and cget rather than put and get to avoid data loss. Unlike put, which unconditionally clobbers its target, the conditional cput command will not overwrite a file with the same name. Similarly, using cget rather than get will prevent you from inadvertently overwriting a file on your local drive when you retrieve an HPSS file with the same name. Use put and get only when you know that you want to overwrite existing data.
  • The cp command resets a file's project code to your default code, so be especially careful with this if you have multiple projects to which you can charge. If you have files that are associated with non-default groups, cp will reset the group ID as well.
  • The rm command will not ask you to confirm that you want to remove a file unless you include the -i option.
  • HSI does not mirror your GLADE directory structure or create directories by default when you archive files. If you need to preserve that structure in HPSS, carefully follow the instructions below for using cput and the -R option.
  • When fetching files from HPSS (with either get or cget), make sure that you have sufficient room in your GLADE file space. If you exceed your GLADE quota, the transfer will fail.

Commands and examples

HSI commands include some familiar ones, such as cp and ls, that resemble their Linux and UNIX counterparts. 

For example:

  • ls lists the contents of a directory
  • rm permanently removes a file
  • mkdir creates a directory
  • rmdir deletes a directory

Some HSI commands, however, have additional options. For example, the HSI command ls has option -U for identifying the project code associated with a file. We recommend getting familiar with each command's options to be sure that you get the results you want when managing your HPSS holdings. Some of the most frequently used HSI commands are discussed below. A complete list is available here.

Transferring files

Review Bulk file operations.

⇒ cput

Executing this command in an interactive HSI session writes a file from your current working directory in GLADE to your home directory in the HPSS archive.

[HSI]/home/username-> cput filename

To write the file to HPSS with a different name, follow this example.

[HSI]/home/username-> cput filename : newfilename

Absolute path names for local or HPSS files also are acceptable. The local file always comes before the colon with both the cput and cget commands.

To put a set of files into a target directory, create the directory if it doesn't already exist, cd into it, and then run cput.

[HSI]/home/username-> cd /home/username/targetdir; cput file_pattern

UNIX users sometimes try to do the following, where targetdir is an existing directory (or a directory that is to be created with the -P option). HSI does not support this:

[HSI]/home/username-> cput file_pattern : /home/username/targetdir
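
The supported sequence (create the directory, cd into it, then cput) can also be run in batch mode. The sketch below builds the HSI command string and prints it. It assumes that HSI's mkdir accepts -p so that an existing directory is not an error; put_into_dir and the file pattern are hypothetical names used for illustration.

```shell
# Sketch: build the HSI commands that archive a set of files into one directory.
# Assumption: HSI's mkdir accepts -p, as the UNIX mkdir does.
put_into_dir() {
    target=$1
    pattern=$2
    # Create the directory, cd into it, then cput the matching local files.
    printf 'mkdir -p %s; cd %s; cput %s\n' "$target" "$target" "$pattern"
}

# The pattern is quoted so that the shell passes it to HSI unexpanded.
hsi_cmds=$(put_into_dir /home/username/targetdir 'file*.nc')
echo "$hsi_cmds"
# On a system with HSI installed: hsi "$hsi_cmds"
```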

⇒ cput and the -R option

Use cput with the -R option to archive a local directory and its contents to HPSS. Change to the target directory before you execute the cput command.

For example, to put the local directory mydir into an HPSS target directory /home/username/test, run this from your command line.

-bash-4.1$ hsi "cd /home/username/test; cput -h -R mydir"

The -h option preserves symlinks that you might have in your source directory, but otherwise it is not required.

The result in this example is a directory /home/username/test/mydir that contains all the files and directories from your local mydir directory.

Because cput does not overwrite a file with the same name, only new files are archived. If files change in your local directory after you archive them to HPSS, use put when you need to overwrite the older, archived files.

Also see Confirming HPSS transfers.

⇒ cget

Use cget to retrieve an HPSS file into your current working directory on your local machine:

[HSI]/home/username-> cget filename

To read the HPSS file into your current working directory with a different name, follow this example:

[HSI]/home/username-> cget newfilename : filename
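
Because the local name comes before the colon for both commands, it is easy to reverse the arguments by mistake. The following sketch uses a hypothetical helper, hsi_pair, that always takes the local name first and prints the resulting HSI command. The file names are placeholders.

```shell
# Sketch: build cput/cget arguments with the local name first.
# hsi_pair is a hypothetical helper, not part of HSI.
hsi_pair() {
    verb=$1
    local_name=$2
    hpss_name=$3
    case $verb in
        cput|cget)
            printf '%s %s : %s\n' "$verb" "$local_name" "$hpss_name"
            ;;
        *)
            echo "hsi_pair: expected cput or cget, got '$verb'" >&2
            return 1
            ;;
    esac
}

hsi_pair cput results.nc run01_results.nc   # local results.nc to HPSS run01_results.nc
hsi_pair cget copy.nc run01_results.nc      # HPSS run01_results.nc to local copy.nc
```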

⇒ cget and the -R option

Use cget with the -R option to retrieve a directory and its contents from HPSS into the current working directory on your local machine.

Here is a simple example:

-bash-4.1$ hsi cget -R MyData

⇒ find

The find command can be useful for confirming that a transfer was successful.

Here is a simple example, using find after starting an HSI session:

[HSI]/home/username-> find . -mtime -90 -print

The "." refers to your current HPSS working directory, which is your home directory when a session starts, and -mtime -90 selects files that have been created or modified within the past 90 days. The specified directory is searched recursively.

Also see Confirming HPSS transfers.
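
One way to use a file listing for confirmation is to compare it with a listing of your local files. The sketch below assumes you have saved both listings as text files with one relative path per line. The helper name missing_from_hpss and the sample paths are made up for illustration, and real HSI find output may need cleanup before it matches this format.

```shell
# Sketch: report local files that do not appear in a saved HPSS listing.
# Assumes both lists hold one relative path per line, e.g. ./mydir/a.nc.
missing_from_hpss() {
    local_list=$1
    hpss_list=$2
    # comm needs sorted input; -23 keeps lines found only in the first file.
    sort "$local_list" > "$local_list.sorted"
    sort "$hpss_list" > "$hpss_list.sorted"
    comm -23 "$local_list.sorted" "$hpss_list.sorted"
}

# Demo with made-up listings.
workdir=$(mktemp -d)
printf '%s\n' ./mydir/a.nc ./mydir/b.nc ./mydir/c.nc > "$workdir/local.txt"
printf '%s\n' ./mydir/a.nc ./mydir/c.nc > "$workdir/hpss.txt"
missing_from_hpss "$workdir/local.txt" "$workdir/hpss.txt"   # prints ./mydir/b.nc
```

Any path that is printed exists locally but is absent from the HPSS listing, so it is a candidate for another cput.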

Changing ownership

To change ownership of a file or files, submit a request to cislhelp@ucar.edu.

Users do not have the necessary permissions to run the chown command.

Setting permissions

Review Bulk file operations.

⇒ chmod

Use the chmod command to set or change permissions on your files and directories to protect your data or to give others access to them. This is often done recursively by using the -R option.

See Permissions and data safety for detailed examples.

⇒ chgrp

Use the HSI chgrp command to change the associated group for existing files and directories. This is often done recursively with the -R option.

[HSI]/home/username-> chgrp newgroup myfile
[HSI]/home/username-> chgrp -R newgroup mydir

See this link for more details: Changing user and group for files/directories

⇒ newgrp

Use the newgrp command to change your current effective group within an HSI session.

[HSI]/home/username-> newgrp groupname

To see what your current effective group is, just enter the command with no arguments.

[HSI]/home/username-> newgrp

If you need to change your default primary group for HPSS, request the change in an email to cislhelp@ucar.edu.

See this link for more information: Users and groups (HPSS)

Managing charges

Review Bulk file operations.

⇒ chacct

Use the HSI chacct command if you need to change the project code—or “account ID,” in HPSS—with which existing files and directories are associated. A project code is used for the purpose of charging against your HPSS storage allocation.

Enter the command and desired project code, and identify the relevant files as shown in this example. This is often done recursively with the -R option.

[HSI]/home/username-> chacct UABC0001 myfile
[HSI]/home/username-> chacct -R UABC0001 mydir

See Projects and charges for more information.

⇒ newacct

Use the HSI newacct command to associate an individual HSI session with a project code that is not your default project code.

Enter the command and the project code that you want to use for your session as shown in this example.

[HSI]/home/username-> newacct UABC0001

See Projects and charges for more information.

* UCAR users: To use HSI on NCAR systems that are outside of the Yellowstone environment, see Kerberos and HSI.

 
