Managing files with HSI


The HSI interface is one of two primary tools for transferring data to and from HPSS within the Yellowstone environment. See Kerberos and HSI for how to use HSI on NCAR systems that are outside of the Yellowstone environment.

Also see HSI Help.

Use HTAR instead of HSI if you need to archive large numbers of individual files whose aggregate size is less than several gigabytes. If you have 1,000 1-MB files or 250 4-MB files, for example, HTAR will archive them in one large HTAR file. This makes much more efficient use of the system and benefits all users. It also reduces the time it takes to retrieve files later.

Warning: Do not try to use HSI to move hundreds or thousands of individual files to HPSS. Using HSI to transfer large numbers of small files significantly slows the entire system for all users.

Failure to follow these guidelines can result in suspension of your access to HPSS.
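
As a rough illustration of the guidance above, the following sketch counts the regular files in a local directory and suggests a tool. The function name suggest_tool and the default threshold of 100 files are hypothetical choices made for this example; they are not CISL limits.

```shell
#!/bin/sh
# Hypothetical helper: suggest "htar" or "hsi" for a local directory.
# The default threshold of 100 files is an illustrative assumption,
# not a CISL rule.
suggest_tool() {
    dir=$1
    max_files=${2:-100}
    # Count the regular files under the directory.
    nfiles=$(find "$dir" -type f | wc -l | tr -d ' ')
    if [ "$nfiles" -gt "$max_files" ]; then
        echo "htar"
    else
        echo "hsi"
    fi
}
```

For example, suggest_tool mydir prints htar when mydir holds more than 100 files. In that case a command such as htar -cvf mydir.tar mydir bundles the directory into one archive file; check the HTAR documentation for the options available on your system.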


Important considerations

File deletion is permanent

Files that are deleted from or overwritten in HPSS cannot be recovered. To avoid inadvertent data loss, carefully review the Cautions section below.

Concurrent transfer limits

HPSS is shared by a large number of users, and there are individual and global limits to the number of file actions that can be executed concurrently. See Use and storage policies for details regarding these limits.

Also see Optimizing HPSS file retrieval. Configuring your requests based on tape location can result in quicker retrievals in some cases.

Bulk file operations

If you need to perform an operation on 100,000 or more of your HPSS files (with commands such as chacct, chcos, chgrp, cp, mv, and rm, among others), contact the CISL Help Desk or Consulting Services Group for assistance so we can avoid slowing the system for all users.

See Submitting HPSS file metadata change requests for details that you will need to provide.

Best practices

Review and follow CISL best practices for managing your files and data transfers. They will help you make the most efficient use of your computing and storage allocations.


Invoking HSI

HSI can be invoked in several modes:

  • As an interactive command interface
  • As a prefix to one or more commands to be executed from your UNIX command line
  • By submitting a job to the "hpss" queue

Interactive command interface

If you are working with our HPC, analysis, or visualization systems*, just enter hsi on your command line to start an HSI session.

To exit the HSI environment, enter quit.

Non-interactive (batch) mode

You can run HSI commands from your command line without starting an HSI session first. To do this, enter the command as in this example:

hsi cput xxx : yyy

The command will be executed in an HSI session, then control will return to the shell. This is how you would put HSI commands in a script, or in a “system” call from a running program. Another batch mode option is to create a file containing the desired HSI commands and execute one of the following:

hsi in filename
hsi < filename
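
The command-file approach could look like the sketch below. The file name hsi_commands.txt and the paths and file names inside it are placeholders, and the sketch assumes one HSI command per line. The guard around the hsi call is only there so the script can be tried on a machine without the HSI client.

```shell
#!/bin/sh
# Write a few HSI commands to a file, one per line.
# The file name and the paths inside it are placeholders.
cmdfile=hsi_commands.txt
cat > "$cmdfile" <<'EOF'
cd /home/username/targetdir
cput file1
cput file2
EOF

# Run the file through HSI only if the hsi client is installed here.
if command -v hsi >/dev/null 2>&1; then
    hsi in "$cmdfile"
else
    echo "hsi not found; would run: hsi in $cmdfile"
fi
```

Using cput in the command file keeps the run from overwriting HPSS files that already exist with the same names.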

Submit job to "hpss" queue

Batch (LSF) and cron jobs can use HSI in the same way as the interactive and non-interactive jobs described above. On Yellowstone, use the hpss queue for these jobs.

Here is one example of how to submit a job on Yellowstone to execute an HPSS transfer:

bsub -n 1 -q hpss -W 2:00 -P project_code hsi cget mydata 
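
The same request can also be kept in a job file, which is easier to reuse than a long command line. The sketch below writes such a file and submits it only if bsub is available. The job file name, the job name (-J), and the output file (-o) are assumptions added for illustration; the other options mirror the one-line example above.

```shell
#!/bin/sh
# Write an LSF job file equivalent to the one-line bsub example.
# hpss_get.lsf, the job name, and the output file are placeholders.
jobfile=hpss_get.lsf
cat > "$jobfile" <<'EOF'
#!/bin/sh
#BSUB -n 1
#BSUB -q hpss
#BSUB -W 2:00
#BSUB -P project_code
#BSUB -J hpss_get
#BSUB -o hpss_get.%J.out
hsi cget mydata
EOF

# Submit only where the LSF client is installed.
if command -v bsub >/dev/null 2>&1; then
    bsub < "$jobfile"
else
    echo "bsub not found; would run: bsub < $jobfile"
fi
```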

Cautions

  • Some commands—including mv, put, and cp—can overwrite data at their targets and result in your losing HPSS archive data. To help prevent inadvertently overwriting your HPSS files with these commands, establish directory permissions carefully. See Permissions and data safety for how to do that.
  • The cp command resets a file's project code to your default code, so be especially careful with this if you have multiple projects to which you can charge. If you have files that are associated with non-default groups, cp will reset the group ID, as well.
  • Use cput and cget rather than put and get to avoid data loss. Unlike put, which unconditionally clobbers its target, the conditional cput command will not overwrite a file with the same name. Similarly, using cget rather than get will prevent you from inadvertently overwriting a file on your local drive when you retrieve an HPSS file with the same name. Use put and get only when you know that you want to overwrite existing data.
  • When fetching files from HPSS (with either get or cget), make sure that you have sufficient room in your GLADE file space. If you exceed your GLADE quota, the transfer will fail.
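
One way to act on the last caution is to check free space before starting a retrieval. The sketch below uses the standard df command; the function name has_room_kb is hypothetical. Note that df reports free space on the file system, which is not necessarily the same as what remains under your GLADE quota, so treat this as a rough first check only.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file system holding
# directory $2 (default: current directory) has at least $1 KB free.
# This checks file system free space, not your GLADE quota.
has_room_kb() {
    need_kb=$1
    dir=${2:-.}
    # With df -P, the fourth column of the second line is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}
```

A typical use would be has_room_kb 5000000 . && hsi cget filename, which starts the transfer only when about 5 GB appears to be available.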

Commands and examples

Some commonly used commands are unique to HSI while others generally work the same way as their UNIX counterparts.

For example:

  • ls lists the contents of a directory
  • rm permanently removes a file
  • mkdir creates a directory
  • rmdir deletes a directory

Some other commands are implemented differently in HSI than they are in UNIX or Linux. Review the following information to avoid getting some unexpected results.

Transferring files

Review Bulk file operations.

⇒ cput

Executing this command in an interactive HSI session writes a file from your current working directory in GLADE to your home directory in the HPSS archive.

[HSI]/home/username-> cput filename

To write the file to HPSS with a different name, follow this example.

[HSI]/home/username-> cput filename : newfilename

Absolute path names for local or HPSS files also are acceptable. The local file always comes before the colon with both the cput and cget commands.

To put a set of files into a target directory, change to that directory in your interactive session and run cput.

[HSI]/home/username-> cd /home/username/targetdir; cput file_pattern

UNIX users sometimes try to do the following, where targetdir is an existing directory (or a directory that is to be created with the -P option). HSI does not support this:

[HSI]/home/username-> cput file_pattern : /home/username/targetdir

⇒ cput and the -R option

You can specify a source with the recursive option (-R) in HSI, but you cannot specify a target with it. If you try, the command will interpret your source, your target, and the token “:” all as sources. This can produce unexpected and even damaging results.

Correct: The proper way to use this command is to change to the target directory first, then execute your cput command with the -R option. For example, to put the local directory mydir into an HPSS target directory /home/username/test, run this from your command line.

hsi "cd /home/username/test; cput -R mydir"

The result is a directory /home/username/test/mydir that contains all the files and directories from your local mydir directory.

Incorrect: Here’s an example of how this often is done incorrectly by trying to write the source file tree rooted at mydir to /home/username/test/mydir:

hsi cput mydir : /home/username/test/mydir

You will get an error message recognizing that mydir is a directory and telling you that you need to use the -R option. Take care to execute this correctly, as shown above, or you risk overwriting valuable data.
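
To make the correct form harder to get wrong in scripts, it can be wrapped in a small function. This is a sketch only: the function name cput_tree and the DRY_RUN variable are hypothetical. With DRY_RUN set, the function prints the HSI command instead of running it, so you can inspect it first.

```shell
#!/bin/sh
# Hypothetical wrapper: change to an HPSS target directory, then put a
# local directory there recursively, following the correct form above.
# Set DRY_RUN=1 to print the command instead of running it.
cput_tree() {
    target=$1
    src=$2
    # Refuse to continue if the source is not a local directory.
    if [ ! -d "$src" ]; then
        echo "error: $src is not a local directory" >&2
        return 1
    fi
    cmd="cd $target; cput -R $src"
    if [ -n "${DRY_RUN:-}" ]; then
        printf 'hsi "%s"\n' "$cmd"
    else
        hsi "$cmd"
    fi
}
```

For example, DRY_RUN=1 cput_tree /home/username/test mydir prints the command without contacting HPSS.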

⇒ cget

Use cget to retrieve an HPSS file into your current working directory on your local machine:

[HSI]/home/username-> cget filename

To read the HPSS file into your current working directory with a different name, follow this example:

[HSI]/home/username-> cget newfilename : filename

Also see Confirming HPSS transfers.

Changing ownership

To change ownership of a file or files, submit a request to cislhelp@ucar.edu.

Users do not have the necessary permissions to run the chown command.

Setting permissions

Review Bulk file operations.

⇒ chmod

Use the UNIX chmod command to set or change permissions on your files and directories to protect your data or to give others access to them. This is often done recursively with the -R option.

See these links for detailed examples:

⇒ chgrp

Use the UNIX chgrp command to change the associated group for existing files and directories. This is often done recursively with the -R option.

[HSI]/home/username-> chgrp newgroup myfile
[HSI]/home/username-> chgrp -R newgroup mydir

See this link for more details: Changing user and group for files/directories

⇒ newgrp

Use the newgrp command to change your current effective group within an HSI session.

[HSI]/home/username-> newgrp groupname

To see what your current effective group is, just enter the command with no arguments.

[HSI]/home/username-> newgrp

If you need to change your default primary group for HPSS, request the change in an email to cislhelp@ucar.edu.

See this link for more information: Users and groups (HPSS)

Managing charges

Review Bulk file operations.

⇒ chacct

Use the HSI chacct command if you need to change the project code—or “account ID,” in HPSS—with which existing files and directories are associated. A project code is used for the purpose of charging against your HPSS storage allocation.

Enter the command and desired project code, and identify the relevant files as shown in this example. This is often done recursively with the -R option.

[HSI]/home/username-> chacct UABC0001 myfile
[HSI]/home/username-> chacct -R UABC0001 mydir

See Projects and charges for more information.

⇒ newacct

Use the HSI newacct command to associate an individual HSI session with a project code that is not your default project code.

Enter the command and the project code that you want to use for your session as shown in this example.

[HSI]/home/username-> newacct UABC0001

See Projects and charges for more information.

* UCAR users: To use HSI on NCAR systems that are outside of the Yellowstone environment, see Kerberos and HSI.
