Configuring Globus for unattended workflows


Many users initiate Globus transfers between NCAR file systems from batch jobs or cron jobs so the transfers can run unattended. Some workflows (long field campaigns, for example) continue for many months and require regular transfers between endpoints, which means the endpoints must be reactivated each time their credentials expire. To support such workflows, CISL provides a utility that works with InCommon certificates so those reactivations can be done automatically.

The sections below describe how to activate the NCAR GLADE and NCAR Campaign Storage endpoints automatically. The basic steps are:

  1. Obtaining a free InCommon certificate by requesting it from CISL.
  2. Running a CISL utility called gcert to convert the certificate to X.509 format and activate the endpoints.
  3. Using another CISL utility called gci (for Globus Campaign-Storage Interface) to initiate transfers between the endpoints.

Users who do not need to make unattended transfers can extend the default Globus credential lifetime as documented here to reduce how frequently they need to activate endpoints manually. 


Automating endpoint activation

To enable users to reactivate the NCAR GLADE and NCAR Campaign Storage endpoints automatically when their credentials expire, CISL provides free InCommon certificates and the gcert and gci utilities.

To request an InCommon certificate, which can be used for up to three years, contact cislhelp@ucar.edu. You will receive an email when your request is fulfilled. The subject line will be:

Invitation Email - You have requested email certificate validation.

[Image: InCommon certificate]

The email will include a link to an InCommon form that you will submit to get your personal certificate. Leave the optional PIN field blank when you fill out the form.

After you submit the form, download your .p12 certificate file. Then, follow the example commands below to:

  1. Copy (scp) the file to your home directory on GLADE.
  2. Log in to Cheyenne.
  3. Run gcert to convert the certificate into an X.509 certificate that Globus can use. (It also activates these endpoints for you: NCAR GLADE, NCAR Campaign Storage.)
  4. Delete the .p12 file from GLADE.

Example commands:

scp username_ucar_edu.p12 username@cheyenne.ucar.edu:~/
ssh username@cheyenne.ucar.edu
gcert username_ucar_edu.p12
rm username_ucar_edu.p12

When the activation expires, run gcert again. It will use your existing X.509 certificate to reactivate the endpoints. (The gcert utility does not work in PBS jobs, but gci does, as noted below.)


Using gci to initiate file transfers

The CISL gci utility simplifies the process of copying files between the GLADE and Campaign Storage file systems. It provides a number of features not available in the standard Globus CLI:

  1. Activates endpoints automatically if an InCommon certificate is available, and can be used within PBS batch and interactive jobs on Cheyenne
  2. Provides simpler syntax for initiating file transfers
  3. Loads the Globus interface into your environment automatically
  4. Gathers and prepends endpoint IDs for you
  5. Terminates transfers if they enter a “permission denied” state instead of retrying
  6. Handles relative file paths better

The gci utility provides these subcommands:

cget   – conditionally copy files from Campaign Storage to GLADE if newer
cput   – conditionally copy files from GLADE to Campaign Storage if newer
get    – copy files from Campaign Storage to GLADE
put    – copy files from GLADE to Campaign Storage
mkdir  – create a new directory on Campaign Storage

Transfers using gci are formatted with the source and destination paths separated by a colon (/source/file:/destination/file). Here are some examples:

gci put data1.dat:lab/group/$USER/data1.dat
gci cput /glade/scratch/$USER/data2.dat:lab/group/$USER/
gci get /gpfs/csfs1/lab/group/$USER/data2.dat:data2.dat
gci cget lab/group/$USER/data2.dat:

Note that gci accepts both relative and absolute paths. The second example demonstrates that there is no need to name the destination file; if no name is given, the destination file assumes the name of the source file. (This is not the case in the Globus CLI, where a similarly configured transfer would simply fail.) 

In the final example, no GLADE destination path is specified. In that case, the file is copied to the user’s present working directory.

Recursive transfers are also possible:

gci cput -r datadir/:lab/group/$USER/datadir1
gci get -r /gpfs/csfs1/lab/group/$USER/datadir2:`pwd`

In the second example, the datadir2 directory is transferred from Campaign Storage to the present working directory on GLADE.
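
For unattended use in a cron or PBS job, the put-style commands above can be wrapped in a small shell function. The following is only a sketch under stated assumptions: the function name archive_path and the GCI override variable are hypothetical, not part of the CISL tools, and the sketch assumes that gci exits with a nonzero status when a transfer cannot be started. It uses only the cput subcommand, the -r option, and the source:destination syntax shown above.

```shell
# Sketch of an unattended archive step for a cron or PBS job.
# Hypothetical names: archive_path and the GCI variable are not part of gci.
GCI="${GCI:-gci}"   # set GCI=echo to preview the command without transferring

archive_path() {
    local src="$1"    # file or directory on GLADE (relative or absolute path)
    local dest="$2"   # Campaign Storage path, e.g. lab/group/$USER/

    if [ -d "$src" ]; then
        # Directory: recursive conditional copy, source written with a trailing slash
        "$GCI" cput -r "${src%/}/:$dest"
    elif [ -f "$src" ]; then
        # Single file: conditional copy (transfers only if the source is newer)
        "$GCI" cput "$src:$dest"
    else
        echo "archive_path: no such file or directory: $src" >&2
        return 1
    fi
}
```

Running with GCI=echo prints the gci command line instead of executing it, which can be a convenient way to check the paths before scheduling the job.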

Globus’ batch transfer mode, in which the user provides a list of file and/or directory transfer commands via standard input, is also accessible via gci. Here are example gets and puts using batch mode:

gci cput --batch << EOF
settings    lab/group/$USER/model/settings
binary      lab/group/$USER/model/binary
input       lab/group/$USER/model/data/input
output      lab/group/$USER/model/data/output
EOF

gci cget --batch lab/group/$USER/: < getlist

The source and destination paths on the command line are optional in batch mode. If they are provided, they are prepended to the paths given in the batch input. Batch input can be given at the command line or via a redirected input file.

This example demonstrates how to mix single file transfers and recursive directory transfers in the same batch file:

file1.txt /glade/scratch/$USER/file1.txt
file2.txt /glade/scratch/$USER/file2.txt
dir1/ /glade/scratch/$USER/dir1/ --recursive
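
A batch list like the one above can also be generated rather than written by hand. The function below is a minimal sketch, not a CISL-provided tool: the name make_batch_list is hypothetical, and it assumes that paths contain no whitespace, since each batch line separates source and destination with spaces. It prints one line per entry in a directory and marks subdirectories with --recursive, following the format of the example above.

```shell
# Hypothetical helper: print one batch line per entry in a directory.
# Files become plain "source destination" pairs; subdirectories get --recursive.
# Assumes no whitespace in any path.
make_batch_list() {
    local src_dir="${1%/}"    # directory to list, trailing slash removed
    local dest_dir="${2%/}"   # destination directory, trailing slash removed
    local entry name

    for entry in "$src_dir"/*; do
        [ -e "$entry" ] || continue   # skip the unmatched glob of an empty directory
        name="${entry##*/}"           # last path component
        if [ -d "$entry" ]; then
            printf '%s/ %s/%s/ --recursive\n' "$entry" "$dest_dir" "$name"
        else
            printf '%s %s/%s\n' "$entry" "$dest_dir" "$name"
        fi
    done
}
```

Because batch input is read from standard input, the output should be usable either through a pipe, as in make_batch_list run01 lab/group/$USER/run01 | gci cput --batch, or by saving it to a file and redirecting that file as in the getlist example.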