Using the Globus Command Line Interface

29 November, 2018

Brian Vanderwende
CISL Consulting Services Group

Globus is fully supported on CISL systems

  • Fast and robust transfers within NCAR and to outside systems
  • Data sharing spaces are provided on request to support collaboration
  • Globus is the only way to access Campaign Storage*
  • Does not currently support symlinks or preserve POSIX permissions(!)

*Campaign Storage provides 5-year (publication scale) retention on spinning disk

Access the CLI on CISL systems

Available in your environment by default on the data access nodes

ssh -l <user> data-access.ucar.edu

Also accessible as part of the NCAR Package Library for Python on Cheyenne login nodes and Casper

In [2]:
# Load the Python 2.7 package library on Casper
# (the variable simply disables the Python virtual
# environment prompt to improve display in the
# Jupyter Notebook)
export VIRTUAL_ENV_DISABLE_PROMPT=1
module load python
ncar_pylib 20181029
Now using NPL version 20181029
Use deactivate to remove NPL from environment
In [4]:
pip show globus-cli
Name: globus-cli
Version: 1.9.0
Summary: Globus CLI
Home-page: https://github.com/globus/globus-cli
Author: Stephen Rosen
Author-email: sirosen@globus.org
License: UNKNOWN
Location: /glade/u/apps/dav/opt/python/2.7.14/intel/17.0.1/pkg-library/20181029/lib/python2.7/site-packages
Requires: six, jmespath, cryptography, click, globus-sdk, requests, configobj
Required-by: 

First, authenticate with the Globus service

globus login

  • This command requires interactivity and should not be used programmatically
  • Globus credentials should be effectively permanent given regular usage
In [5]:
# Show all activated login sessions and time of authentication
globus session show
For information on your primary identity or full identity set see
  globus whoami

Username              | ID                                   | Auth Time           
--------------------- | ------------------------------------ | --------------------
vanderwb@globusid.org | 19b6f770-32a4-400a-ae97-0a6fc94917fe | 2018-11-27 10:24 MST
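
Since login is interactive, an unattended script can only check for an existing login and stop early. A minimal sketch, assuming `globus whoami` exits with a non-zero status when no login is cached (worth confirming for your CLI version); the function name `require_globus_login` is hypothetical.

```shell
# Hypothetical helper: stop a script early when no Globus login is cached.
# Assumes "globus whoami" exits with a non-zero status when logged out.
require_globus_login() {
    if ! globus whoami > /dev/null 2>&1; then
        echo "Not logged in to Globus; run 'globus login' interactively first." >&2
        return 1
    fi
}
```

A script could call `require_globus_login || exit 1` before any endpoint or transfer commands.
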
In [6]:
# Provide a summary of all globus CLI commands
globus list-commands
=== globus ===

    rename          Rename a file or directory on an endpoint
    list-commands   List all CLI Commands
    transfer        Submit a transfer task (asynchronous)
    mkdir           Make a directory on an endpoint
    update          Update the Globus CLI to its latest version
    version         Show the version and exit
    logout          Logout of the Globus CLI
    get-identities  Lookup Globus Auth Identities
    rm              Delete a single path; wait for it to complete
    login           Log into Globus to get credentials for the Globus CLI
    delete          Submit a delete task (asynchronous)
    whoami          Show the currently logged-in primary...
    ls              List endpoint directory contents

=== globus endpoint ===

    search          Search for Globus endpoints
    activate        Activate an endpoint
    show            Display a detailed endpoint definition
    deactivate      Deactivate an endpoint
    create          Create a new endpoint
    update          Update attributes of an endpoint
    my-shared-endpoint-list
                    List all shared endpoints on an endpoint by...
    is-activated    Check if an endpoint is activated
    local-id        Display UUID of locally installed endpoint
    delete          Delete a given endpoint

=== globus endpoint permission ===

    create          Create an access control rule, allowing new...
    delete          Delete an access control rule, removing...
    list            List of permissions on an endpoint
    update          Update an access control rule, changing...
    show            Show a permission on an endpoint

=== globus endpoint server ===

    add             Add a server to an endpoint
    delete          Delete a server belonging to an endpoint...
    list            List all servers belonging to an endpoint
    update          Update attributes of a server on an endpoint
    show            Show a server belonging to an endpoint

=== globus endpoint role ===

    create          Create a role on an endpoint
    delete          Remove a role from an endpoint
    list            List of assigned roles on an endpoint
    show            Show full info for a role on an endpoint

=== globus session ===

    boost           Boost your CLI auth session
    show            Show your current CLI auth session

=== globus bookmark ===

    rename          Change a bookmark's name
    create          Create a bookmark for the current user
    show            Given a bookmark name or ID resolves bookmark...
    list            List bookmarks for the current user
    delete          Delete a bookmark

=== globus task ===

    event-list      List Events for a given task
    show            Show detailed information about a specific...
    list            List tasks for the current user
    update          Update a task
    generate-submission-id
                    Get a submission ID
    pause-info      Show why an in-progress task is currently paused
    cancel          Cancel a task
    wait            Wait for a task to complete

=== globus config ===

    show            Show a value from the Globus config file
    init            Initialize all settings in the Globus Config file
    set             Set a value in the Globus config file
    remove          Remove a value from the Globus config file
    filename        Output the path of the config file

Finding and using data endpoints

To work with endpoints (data sources and destinations), we need to search for and save the endpoint IDs

In [7]:
globus endpoint search "NCAR GLADE" --filter-owner-id ncar@globusid.org
ID                                   | Owner             | Display Name
------------------------------------ | ----------------- | ------------
d33b3614-6d04-11e5-ba46-22000b92c6ec | ncar@globusid.org | NCAR GLADE  

Script-friendly output can be obtained using the "jq" and "format" options

In [8]:
globus endpoint search "NCAR GLADE"      \
    --filter-owner-id ncar@globusid.org  \
    --jq "DATA[0].id" --format UNIX
d33b3614-6d04-11e5-ba46-22000b92c6ec

The jq entry should be quoted to avoid shell interpretation of special characters!
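
To see why the quotes matter, here is a small demonstration that runs without the Globus CLI. Square brackets are shell glob characters, so an unquoted `DATA[0].id` is replaced by a matching file name if one exists in the current directory. The temporary directory and the file `DATA0.id` are made up for illustration.

```shell
# Demonstrate why the jq expression needs quotes: square brackets are
# shell glob characters, so an unquoted DATA[0].id can match a file name.
demo_dir=$(mktemp -d)
touch "$demo_dir/DATA0.id"          # a file the pattern happens to match

unquoted=$(cd "$demo_dir" && echo DATA[0].id)     # glob expands to the file
quoted=$(cd "$demo_dir" && echo "DATA[0].id")     # quotes keep it intact

echo "unquoted: $unquoted"          # prints: unquoted: DATA0.id
echo "quoted:   $quoted"            # prints: quoted:   DATA[0].id

rm -r "$demo_dir"
```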

In [3]:
# Here we store the output (ID) into a bash variable
EPGLADE=$(globus endpoint search "NCAR GLADE"   \
            --filter-owner-id ncar@globusid.org \
            --jq "DATA[0].id" --format UNIX)
globus endpoint show $EPGLADE
Display Name:              NCAR GLADE
ID:                        d33b3614-6d04-11e5-ba46-22000b92c6ec
Owner:                     ncar@globusid.org
Activated:                 True
Shareable:                 True
Department:                CISL/OSD
Keywords:                  file share
Endpoint Info Link:        https://www2.cisl.ucar.edu/resources/glade
Contact E-mail:            cislhelp@ucar.edu
Organization:              National Center for Atmospheric Research
Department:                CISL/OSD
Other Contact Info:        1850 Table Mesa Dr.
Boulder, CO 80305
(303) 497-2400
Visibility:                True
Default Directory:         /~/
Force Encryption:          False
Managed Endpoint:          True
Subscription ID:           8fd6296d-74cc-11e4-a56d-12313922b1c7
Legacy Name:               ncar#gridftp
Local User Info Available: True
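
In a script, the lookup above can be wrapped so that an empty search result is caught before the ID is used. This is only a sketch: the function name `get_endpoint_id` is hypothetical, and treating an empty or `None` result as "no match" is an assumption about how the CLI prints a missing value.

```shell
# Hypothetical helper: print the ID of the first endpoint matching a name
# search, or fail when the search comes back empty.
# Assumes an empty or "None" result means no match was found.
get_endpoint_id() {
    local name="$1" owner="${2:-ncar@globusid.org}" id
    id=$(globus endpoint search "$name"      \
            --filter-owner-id "$owner"       \
            --jq "DATA[0].id" --format UNIX) || return 1
    if [ -z "$id" ] || [ "$id" = "None" ]; then
        echo "No endpoint found for '$name'" >&2
        return 1
    fi
    echo "$id"
}
```

Usage would look like `EPGLADE=$(get_endpoint_id "NCAR GLADE") || exit 1`.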

Activating Globus endpoints

In addition to authentication with the Globus service, you must activate the endpoints you wish to use. For NCAR CISL endpoints, two-factor authentication is used via a proxy service. This step also requires interactivity!

globus endpoint activate --myproxy $EPGLADE

--no-autoactivate tells Globus to create a new credential, rather than using a cached one that may be close to expiring

globus endpoint activate --no-autoactivate --myproxy $EPGLADE

By default, endpoint authentications last 24 hours. For NCAR CISL endpoints, you can request up to 720 hours:

globus endpoint activate --no-autoactivate --myproxy --myproxy-lifetime 720 $EPGLADE

Both GLADE and Campaign Storage use the same two-factor authentication - once you activate one, the other is also activated

In [11]:
# Return whether endpoint is activated (good practice in scripts)
globus endpoint is-activated $EPGLADE

# The return code ($?) is non-zero if the endpoint is not active
echo "Return code = $?"
d33b3614-6d04-11e5-ba46-22000b92c6ec is activated
Return code = 0
In [4]:
# Get the expiration time of the endpoint credential (not in default output)
globus endpoint is-activated --jq "expire_time" --format UNIX $EPGLADE

# Get the number of seconds until expiration
globus endpoint is-activated --jq "expires_in" --format UNIX $EPGLADE
2018-12-15 18:59:12+00:00
1296293

The expire_time and expires_in fields are only available if the endpoint is active!
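
The `expires_in` value is a plain number of seconds, which is easy to compare in scripts but hard to read. A small sketch that converts it to days, hours, and minutes with integer arithmetic; the function name `format_remaining` is hypothetical.

```shell
# Hypothetical helper: turn a number of seconds (such as the expires_in
# value) into days, hours, and minutes using integer arithmetic.
format_remaining() {
    local secs="$1"
    local days=$(( secs / 86400 ))
    local hours=$(( (secs % 86400) / 3600 ))
    local mins=$(( (secs % 3600) / 60 ))
    echo "${days}d ${hours}h ${mins}m"
}

format_remaining 1296293    # the expires_in value shown above; prints 15d 0h 4m
```

The same arithmetic shows that a 720-hour activation corresponds to 2592000 seconds, or 30 days.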

Navigation and directories

In [13]:
# Retrieve NCAR Campaign Storage endpoint ID and look at contents
EPSTORE=$(globus endpoint search "NCAR Campaign Storage" \
            --filter-owner-id ncar@globusid.org          \
            --jq "DATA[0].id" --format UNIX)
globus ls $EPSTORE | head -n 5
acom/
asp/
cesm/
cgd/
cisl/

We would rather look at the space we own on Campaign Storage...

Environment variable bookmarks

In [14]:
# Display the space I own on Campaign Storage
CSHOME=/gpfs/csfs1/cisl/csg/vanderwb
globus ls -l $EPSTORE:$CSHOME
Permissions | User     | Group | Size | Last Modified             | File Type | Filename  
----------- | -------- | ----- | ---- | ------------------------- | --------- | ----------
2755        | vanderwb | csg   | 4096 | 2018-07-09 16:19:36+00:00 | dir       | bou_flood/

Native Globus bookmarks

I find these bookmarks a bit clunky to use, but they are shared between the CLI and the web interface (and across scripts)

In [15]:
# Create a web-visible bookmark to my Campaign Storage directory
# (path needs to have trailing slash to be accepted by Globus)
globus bookmark create $EPSTORE:/gpfs/csfs1/cisl/csg/vanderwb/ "CSHOME"
Bookmark ID: 30371fb2-f393-11e8-8cc8-0a1d4c5c824a
In [17]:
# A sub-command resolves the bookmark name to ENDPOINT_ID:PATH, which we give to the ls command
globus ls -l $(globus bookmark show "CSHOME")
Permissions | User     | Group | Size | Last Modified             | File Type | Filename  
----------- | -------- | ----- | ---- | ------------------------- | --------- | ----------
2755        | vanderwb | csg   | 4096 | 2018-07-09 16:19:36+00:00 | dir       | bou_flood/
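
When the endpoint and the path are needed separately (for example, to append a subdirectory), the ENDPOINT_ID:PATH string can be split with shell parameter expansion. A sketch with the value hard-coded so it runs without the CLI; the variable names `BM`, `BM_EP`, and `BM_PATH` are made up for illustration.

```shell
# Split ENDPOINT_ID:PATH (the form passed to "globus ls" above) into its
# two parts with shell parameter expansion.
# The value is hard-coded here so the snippet runs without the CLI; in
# practice it would come from: BM=$(globus bookmark show "CSHOME")
BM="6b5ab960-7bbf-11e8-9450-0a6d4e044368:/gpfs/csfs1/cisl/csg/vanderwb/"

BM_EP=${BM%%:*}     # everything before the first colon
BM_PATH=${BM#*:}    # everything after the first colon

echo "Endpoint: $BM_EP"
echo "Path:     $BM_PATH"
```
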
In [18]:
# List all of the bookmarks I've created in the CLI and on the web
globus bookmark list
Name   | Bookmark ID                          | Endpoint ID                          | Endpoint Name         | Path                          
------ | ------------------------------------ | ------------------------------------ | --------------------- | ------------------------------
CSHOME | 30371fb2-f393-11e8-8cc8-0a1d4c5c824a | 6b5ab960-7bbf-11e8-9450-0a6d4e044368 | NCAR Campaign Storage | /gpfs/csfs1/cisl/csg/vanderwb/
CSTORE | bc68e858-f38b-11e8-8cc8-0a1d4c5c824a | 6b5ab960-7bbf-11e8-9450-0a6d4e044368 | NCAR Campaign Storage | /gpfs/csfs1/cisl/csg/vanderwb/
In [19]:
# Deleting a bookmark is simple...
globus bookmark delete "CSHOME"
Bookmark '30371fb2-f393-11e8-8cc8-0a1d4c5c824a' deleted successfully

Transfers using the CLI

All transfer commands follow the same basic format:

globus transfer SOURCE:PATH DESTINATION:PATH

There are three types of transfers possible:

  1. Single file transfers
  2. Recursive directory transfers
  3. Batch mode transfers (list of files)

Scheduling a single file transfer

In [20]:
# Create bash bookmark to our data source location on GLADE
SRCDIR=/glade/work/vanderwb/tutorials/globus-cli/sample_data

# Make a directory to place files within on Campaign Storage
globus mkdir $EPSTORE:${CSHOME}/tut_files
The directory was created successfully
In [23]:
# Submit transfer request for processing
# (the filename must be provided for both source and destination)
globus transfer $EPGLADE:${SRCDIR}/namelist.input           \
                $EPSTORE:${CSHOME}/tut_files/namelist.input
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 2d9d72c8-f394-11e8-8cc8-0a1d4c5c824a
In [24]:
# List only the latest task - the one we just submitted
globus task list --limit 1
Task ID                              | Status    | Type     | Source Display Name | Dest Display Name     | Label
------------------------------------ | --------- | -------- | ------------------- | --------------------- | -----
2d9d72c8-f394-11e8-8cc8-0a1d4c5c824a | SUCCEEDED | TRANSFER | NCAR GLADE          | NCAR Campaign Storage | NULL 

Recursively transfer directories

In [25]:
# Submit recursive transfer request and store transfer ID for use in other commands
TID=$(globus transfer --recursive               \
        $EPGLADE:${SRCDIR}/wps_input            \
        $EPSTORE:${CSHOME}/tut_files/wps_input  \
        --label "WPS Input Data Transfer"       \
        --jq "task_id" --format UNIX)
echo "Transfer $TID submitted..."

# Pause while transfer occurs (stop waiting if not complete after 1 hour;
# the timeout ends the wait but does not cancel the task)
# Task wait returns zero if the task was successful
globus task wait $TID --timeout 3600
echo "Transfer $TID finished with return code = $?"
Transfer 6d30cd2c-f394-11e8-8cc8-0a1d4c5c824a submitted...
Transfer 6d30cd2c-f394-11e8-8cc8-0a1d4c5c824a finished with return code = 0
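
If a task really should be stopped when the wait runs out, the cancel has to be issued explicitly. A minimal sketch, assuming `globus task wait` exits non-zero when the timeout is reached or the task failed; the function name `wait_or_cancel` is hypothetical.

```shell
# Hypothetical helper: wait for a task, and cancel it if the wait ends
# without success.
# Assumes "globus task wait" exits non-zero when the timeout is reached
# (or the task failed) and zero when the task finished successfully.
wait_or_cancel() {
    local tid="$1" timeout="${2:-3600}"
    if globus task wait "$tid" --timeout "$timeout"; then
        echo "Task $tid finished"
    else
        echo "Task $tid not finished after ${timeout}s; cancelling" >&2
        globus task cancel "$tid"
        return 1
    fi
}
```

Usage would look like `wait_or_cancel $TID 3600`. Whether cancelling is the right response depends on the workflow; leaving the task to finish on its own is often fine.
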
In [26]:
# Show status and summary of transfer
globus task show $TID
Label:                   WPS Input Data Transfer
Task ID:                 6d30cd2c-f394-11e8-8cc8-0a1d4c5c824a
Is Paused:               False
Type:                    TRANSFER
Directories:             1
Files:                   4
Status:                  SUCCEEDED
Request Time:            2018-11-29 05:05:45+00:00
Faults:                  0
Total Subtasks:          6
Subtasks Succeeded:      6
Subtasks Pending:        0
Subtasks Retrying:       0
Subtasks Failed:         0
Subtasks Canceled:       0
Subtasks Expired:        0
Completion Time:         2018-11-29 05:05:51+00:00
Source Endpoint:         NCAR GLADE
Source Endpoint ID:      d33b3614-6d04-11e5-ba46-22000b92c6ec
Destination Endpoint:    NCAR Campaign Storage
Destination Endpoint ID: 6b5ab960-7bbf-11e8-9450-0a6d4e044368
Bytes Transferred:       241541795
Bytes Per Second:        45515115
In [27]:
# Show all events that have occurred in this transfer (useful for debugging)
globus task event-list $TID
Time                      | Code      | Is Error | Details                                                                    
------------------------- | --------- | -------- | ---------------------------------------------------------------------------
2018-11-29 05:05:51+00:00 | SUCCEEDED |        0 | {"files_succeeded":4}                                                      
2018-11-29 05:05:51+00:00 | PROGRESS  |        0 | {"bytes_transferred":241541795,"duration":1.81,"mbps":1066.17}             
2018-11-29 05:05:49+00:00 | STARTED   |        0 | {"concurrency":8,"parallelism":4,"pipelining":20,"type":"GridFTP Transfer"}

Creating batch transfers

In [28]:
# Globus provides globbing support using the filter option to ls
# Here we ask for all netCDF files in $SRCDIR
globus ls $EPGLADE:$SRCDIR --filter '~*.nc'
met_em.d01.2001-10-25_00:00:00.nc
met_em.d01.2001-10-25_06:00:00.nc
In [31]:
# The batch file needs to have full source and destination paths
# We use a sed command to format our ls output and store it as a bash variable
SOUT="${SRCDIR}/\1 ${CSHOME}/tut_files/\1"
BF=$(globus ls $EPGLADE:$SRCDIR --filter '~*.nc' | sed "s|\(.*\)|${SOUT}|")
echo "$BF"
/glade/work/vanderwb/tutorials/globus-cli/sample_data/met_em.d01.2001-10-25_00:00:00.nc /gpfs/csfs1/cisl/csg/vanderwb/tut_files/met_em.d01.2001-10-25_00:00:00.nc
/glade/work/vanderwb/tutorials/globus-cli/sample_data/met_em.d01.2001-10-25_06:00:00.nc /gpfs/csfs1/cisl/csg/vanderwb/tut_files/met_em.d01.2001-10-25_06:00:00.nc
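
The sed approach works for these paths, but sed gives `&` and the delimiter character special meaning in the replacement text. As an alternative sketch, the same batch lines can be built with a read loop and printf. The function name `make_batch_lines` is hypothetical, and file names are assumed to contain no whitespace.

```shell
# Hypothetical helper: read file names on stdin and print one
# "SOURCE_PATH DEST_PATH" line per name, as --batch expects.
# Assumes names contain no whitespace (batch lines are space-separated).
make_batch_lines() {
    local src="$1" dest="$2" name
    while IFS= read -r name; do
        [ -n "$name" ] || continue           # skip blank lines
        printf '%s/%s %s/%s\n' "$src" "$name" "$dest" "$name"
    done
}

# File names are supplied literally here so the snippet runs without the CLI
printf '%s\n' "met_em.d01.2001-10-25_00:00:00.nc" \
              "met_em.d01.2001-10-25_06:00:00.nc" |
    make_batch_lines /glade/work/vanderwb/tutorials/globus-cli/sample_data \
                     /gpfs/csfs1/cisl/csg/vanderwb/tut_files
```

In practice the input would be the output of the `globus ls` command instead of the literal names.
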
In [32]:
# Finally, we use the variable contents as stdin to our globus batch transfer
# In this transfer, we ask Globus to only copy files that are new or updated
globus transfer $EPGLADE $EPSTORE   \
    --label "WPS Data Addendum"     \
    --jq "task_id" --format UNIX    \
    --sync-level mtime              \
    --batch <<< "$BF"
d0159972-f399-11e8-8cc8-0a1d4c5c824a

We can check on this transfer in the web GUI as well...
https://www.globus.org

Deleting files on endpoints

The delete command also supports single file, recursive, and batch modes

In [33]:
globus delete --recursive $EPSTORE:${CSHOME}/tut_files
Message: The delete has been accepted and a task has been created and queued for execution
Task ID: 07ee7ddc-f39a-11e8-8cc8-0a1d4c5c824a
In [34]:
globus ls $EPSTORE:$CSHOME
bou_flood/

Aside: Determine Campaign Storage usage using gladequota

In [35]:
gladequota
Current GLADE space usage: vanderwb

  Space                                 Used       Quota    % Full      # Files
---------------------------------- ----------- ----------- --------- -----------
/glade/scratch/vanderwb                0.09 TB    10.00 TB    0.90 %     1485444
/glade/work/vanderwb                  80.81 GB  1024.00 GB    7.89 %      548125
/glade/p_old/work/vanderwb           116.47 GB   512.00 GB   22.75 %      639725
/glade/u/home/vanderwb                28.78 GB    50.00 GB   57.56 %      115728
---------------------------------- ----------- ----------- --------- -----------
/glade/u/apps                        930.02 GB  1024.00 GB   90.82 %    10634664
/glade/p/cisl/CSG                   1160.27 GB     0.00 GB    0.00 %      593508
/glade/u/sampledata                   51.82 GB  1024.00 GB    5.06 %         104
/glade/p_old/cesm0005                883.32 TB   900.00 TB   98.15 %     2810054
/glade/collections/cmip               59.19 TB  3000.00 TB    1.97 %      258901
/glade/p_old/CMIP                     28.81 TB   200.00 TB   14.40 %      195313
/glade/p_old/CSG                       1.23 TB     1.73 TB   71.10 %      762843
/glade/u/hpssusrs                      0.03 TB     5.00 TB    0.60 %        6421
/glade/p_old/ncldev                    0.96 TB     2.27 TB   42.29 %      211480
---------------------------------- ----------- ----------- --------- -----------
Campaign: vanderwb (user total)        2.37 GB         n/a       n/a          44
Campaign: /cisl/csg                 4210.02 GB     0.00 GB    0.00 %         986
(Campaign usage as of: Wed Nov 28 22:05:03 MST 2018)

/glade/scratch  - 44.7% used (6707 TB used out of 15000 TB total)

Example: tcsh workflow script

In [36]:
pygmentize -f 16m workflow.csh
#!/bin/tcsh

# In this example, we run a hypothetical CFD model to produce daily
# forecasts. Analysis data is stored on Campaign Storage after it is
# produced from the raw output.

# Use input forecast start time, or use yesterday
if ( $#argv == 1 ) then
    set TIMESTR="$1"
else
    set TIMESTR="yesterday"
endif

set FY=`date -d "$TIMESTR" '+%Y'`
set FM=`date -d "$TIMESTR" '+%m'`
set FD=`date -d "$TIMESTR" '+%d'`

# Declare paths to use in script
set FCST=${FY}${FM}${FD}
set RDADIR=/glade/collections/rda/data/ds083.3
set SRCDIR=/glade/work/${USER}/FCSTMOD
set RUNDIR=/glade/scratch/${USER}/FCSTMOD/$FCST
set CSDIR=/gpfs/csfs1/cisl/csg/vanderwb/fcst_archive

# Load Python to get the CLI
module load python
ncar_pylib 20181024

# Create and populate run directory
mkdir -p $RUNDIR
cd $RUNDIR
ln -s ${SRCDIR}/*.exe .

# Link the static grid data
ln -s ${SRCDIR}/data/grid.dat .

# Gather initial data from RDA
ln -s ${RDADIR}/${FY}/${FY}${FM}/*${FY}${FM}${FD}00.f00* init.dat

# Run our model
./model.exe

# Run post-processing to generate analysis
./analysis.exe

# Retrieve endpoint IDs and store them as variables
set EPGLADE=`globus endpoint search 'NCAR GLADE'            \
                --filter-owner-id ncar@globusid.org         \
                --jq 'DATA[0].id' --format UNIX`
set EPSTORE=`globus endpoint search 'NCAR Campaign Storage' \
                --filter-owner-id ncar@globusid.org         \
                --jq 'DATA[0].id' --format UNIX`

# Check if endpoint is activated
# (we don't care about output, only return code)
globus endpoint is-activated $EPGLADE >& /dev/null

if ( $status > 0 ) then
    echo "Fatal: NCAR endpoints aren't activated." > globus.log
    echo "Aborting transfer..." >> globus.log
    echo "Failed: $FCST to Campaign Storage!" > ~/GLOBUS-ERROR.$FCST
    exit 1
else
    set EXPIRE=`globus endpoint is-activated                \
                    --jq expire_time -F unix $EPGLADE`
    echo "NCAR endpoints active until $EXPIRE" > globus.log
endif

# Check if destination directory exists; if not, create it
globus ls ${EPSTORE}:$CSDIR >& /dev/null

if ( $status != 0 ) then
    globus mkdir ${EPSTORE}:$CSDIR >>& globus.log
endif

set DESTDIR=${CSDIR}/$FCST
globus mkdir ${EPSTORE}:${DESTDIR} >>& globus.log

# Start copy of GLADE data holdings to CS
set BATCHFMT="${RUNDIR}/\1 ${DESTDIR}/\1"
ls -1 fcst*.nc | sed "s|\(.*\)|${BATCHFMT}|" > globus-batch.txt

globus transfer $EPGLADE $EPSTORE                           \
    --label "$FCST - copy forecast to CS"                   \
    --batch < globus-batch.txt >>& globus.log
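
The "create the directory only if it is missing" step from the script above can be written as a reusable function. This sketch is in bash rather than tcsh, and the function name `ensure_remote_dir` is hypothetical. Like the script, it assumes `globus ls` exits non-zero when the path does not exist.

```shell
# Hypothetical helper: create a directory on an endpoint only when a
# listing of it fails.
# Assumes "globus ls" exits non-zero for a path that does not exist.
ensure_remote_dir() {
    local ep="$1" dir="$2"
    if ! globus ls "${ep}:${dir}" > /dev/null 2>&1; then
        globus mkdir "${ep}:${dir}"
    fi
}
```

Calling it for both the archive directory and the per-forecast directory would keep a rerun for the same date from logging a mkdir error.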

Example: bash data preserver script

In [3]:
pygmentize -f 16m preserve.sh
#!/bin/bash

# Declare paths to use in script
CASE=BF2013-ens
CASENAME="Sept 2013 WRF Ensemble"
LOGDIR=/glade/work/${USER}/WRF/${CASE}/logs
SRCDIR=/glade/scratch/${USER}/WRF/$CASE
DESTDIR=/gpfs/csfs1/cisl/csg/vanderwb/$CASE
TDATE=$(date +%y%m%d-%H%M%S)

# Mail message to send for endpoint failure
function errormail {
# Recipient is assumed to be the submitting user
mail -s "ENDPOINT INACTIVE - transfer to CS failed"         \
     -r "${USER}<${USER}@ucar.edu>"                         \
     "${USER}@ucar.edu" << EOM
Transfer to Campaign Storage at $TDATE failed.
Globus endpoints need activation. Run:

globus endpoint activate --myproxy
    --myproxy-lifetime <HOURS> $1
EOM
}

function warnmail {
# Recipient is assumed to be the submitting user
mail -s "NCAR CISL Globus credential expires soon"          \
     -r "${USER}<${USER}@ucar.edu>"                         \
     "${USER}@ucar.edu" << EOM
NCAR CISL endpoints expire on $EXPIRE
and should be reactivated soon. Run:

globus endpoint activate --no-autoactivate
    --myproxy --myproxy-lifetime <HOURS> $1
EOM
}

# Load Python to get the CLI
module load python
ncar_pylib 20181024

cd $LOGDIR

# Retrieve endpoint IDs and store them as variables
EPGLADE=$(globus endpoint search 'NCAR GLADE'               \
            --filter-owner-id ncar@globusid.org             \
            --jq 'DATA[0].id' --format UNIX)
EPSTORE=$(globus endpoint search 'NCAR Campaign Storage'    \
            --filter-owner-id ncar@globusid.org             \
            --jq 'DATA[0].id' --format UNIX)

# Check if endpoint is activated
# (we don't care about output, only return code)
globus endpoint is-activated $EPGLADE >& /dev/null

if [[ $? != 0 ]]; then
    echo "Fatal: NCAR endpoints aren't activated." > log.$TDATE
    echo "Aborting transfer..." >> log.$TDATE
    errormail $EPGLADE
    exit 1
else
    EXPIRE=$(globus endpoint is-activated                   \
                --jq "expire_time" -F unix $EPGLADE)
    echo "NCAR endpoints active until $EXPIRE" > log.$TDATE

    # If credential has less than five days until expiry,
    # send a warning email
    TIMELEFT=$(globus endpoint is-activated                 \
                --jq "expires_in" -F unix $EPGLADE)

    if [[ $TIMELEFT -le 432000 ]]; then
        warnmail $EPGLADE
    fi
fi

# Start copy of GLADE data holdings to CS
# Use modification time to determine which files to copy
TID=$(globus transfer --recursive --sync-level mtime        \
        --label "$CASENAME - $TDATE backup"                 \
        ${EPGLADE}:$SRCDIR ${EPSTORE}:$DESTDIR              \
        --jq task_id --format UNIX)

# Wait for task to complete so that we can log what happened
# (make sure we don't wait forever)
globus task wait $TID --timeout 21600

# Output information about transfer
globus task show $TID >> log.$TDATE
globus task show -t $TID > files.$TDATE
globus task event-list $TID > events.$TDATE
globus ls ${EPSTORE}:${DESTDIR} > ls.$TDATE
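
The 432000 in the expiry check is five days expressed in seconds. A small sketch that makes the threshold explicit, so the warning window can be changed by editing a number of days; the function name `expires_within_days` is hypothetical.

```shell
# Hypothetical helper: succeed when SECONDS_LEFT is at most DAYS days.
expires_within_days() {
    local seconds_left="$1" days="$2"
    [ "$seconds_left" -le $(( days * 86400 )) ]
}

# 5 days * 86400 s/day = 432000 s, the threshold used in the script above
if expires_within_days 1296293 5; then
    echo "reactivate soon"
else
    echo "credential is fine for now"
fi
```

In the script, the first argument would be `$TIMELEFT` rather than a literal value.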

CISL Help Desk / Consulting

https://www2.cisl.ucar.edu/user-support/getting-help

  • Walk-in: ML 1B Suite 55
  • Email: cislhelp@ucar.edu
  • Phone: 303-497-2400

Specific questions from today and/or feedback:

  • Email: vanderwb@ucar.edu