CISL best practices


Updated 4/15/2019

The practices described below will help you make the most of your computing and storage allocations.


Using shared resources

Be considerate of others in the user community when you work with these shared computing and storage resources. Here are a few key issues to keep in mind.


Use login nodes only for their intended purposes

You can run short, non-memory-intensive processes on the login nodes. These include tasks such as text editing or running small serial scripts or programs. Memory-intensive processes that slow login node performance for all users are killed automatically and the responsible parties are notified by email. See Appropriate use of login nodes for more information.

Use the Cheyenne and Casper nodes that best meet your needs

The Cheyenne system and Casper nodes are configured for distinct purposes. Cheyenne is best used for running climate and weather models and simulations while the heterogeneous Casper cluster of nodes is for other specialized work. Most Casper nodes are used for analyzing and visualizing data while others feature large-memory, dense GPU configurations that support explorations in machine learning and deep learning.

This documentation explains how to get jobs running on the most appropriate system for your work and on the individual types of nodes that will best meet your needs.

For expert assistance or guidance in using these resources, contact the CISL Consulting Services Group.

Don't monopolize compute resources

Consider what impact you might have on the work of others and schedule jobs accordingly. For example, avoid writing job submission scripts that rapidly fill the scheduler with potentially concurrent compute resource requests. Contact the Consulting Services Group for guidance if your workload requires you to submit numerous jobs in a short time frame. CISL monitors the use of these resources and will kill jobs when necessary to ensure fair access for all users.

Limit your use of shared licenses

Users share a limited number of licenses for running IDL, MATLAB, Mathematica, and some other applications. Be familiar with and follow the established license-use guidelines to ensure fair access for all users. CISL reserves the right to kill jobs/tasks of users who monopolize these licenses.


Managing allocations

Monitor usage charges

Check your usage charges frequently to help ensure that you are using CISL resources as efficiently as possible. Also make sure that others who are authorized to charge against your allocation understand how to use those resources efficiently. Understand how your choice of queues affects charges against your computing allocation and be aware of other allocation-related policies. See Managing allocations and charges.

If you are authorized to charge your work against multiple projects, check your usage charges and storage holdings for each project on a regular basis. This will help ensure that you are charging jobs correctly and help you avoid overspending your allocations.

Optimize on a single processor

Minimize your use of computing resources and conserve your allocation by optimizing your code on a single processor before running larger jobs in production. Use optimizing libraries if your code lends itself to that.

Always specify project codes

Always specify a project code for charging purposes when using HPSS – even if it is your default project code. Why? Your default may change. By always specifying the project code, you make it easier to change in your jobs and scripts when necessary. It also helps you manage HPSS charges and avoid surprises down the road.

Remove unneeded data

Periodically examine your GLADE, HPSS, and NCAR Campaign Storage holdings and remove files that you no longer need. This reduces charges against your storage allocation and makes these systems more efficient for everyone.
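As a rough sketch of how such a periodic review might start, assuming a POSIX shell and the standard find utility, the helper below lists files that have not been modified for a given number of days. It only prints candidates for review and does not delete anything. The function name, the path, and the threshold are illustrative, not CISL tools or recommendations.

```shell
#!/bin/sh
# Sketch only: list_stale_files is a hypothetical helper name.
# Print regular files under a directory that were last modified
# more than the given number of days ago. Nothing is deleted.
list_stale_files() {
    # $1: directory to scan; $2: age threshold in days
    find "$1" -type f -mtime +"$2" -print
}

# Example (the path and the 180-day threshold are illustrative):
#   list_stale_files "/glade/scratch/$USER" 180
```

Reviewing the printed list before removing anything keeps the cleanup deliberate.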

Contact CISL consultants

Before you run a set of jobs that will consume a large portion of your allocation – a long experiment, for example – ask the Consulting Services Group to review your job configuration. One of the consultants may be able to suggest an economical workflow that will help you conserve computing resources. This is especially important if you are unfamiliar with job configuration or with how to manage your allocation efficiently.


Writing job scripts 

Avoid hardcoding in your job scripts

Use relative paths and environment variables instead of hardcoding directory names in your job scripts. Hardcoding in scripts and elsewhere can make debugging your code more difficult and also complicate situations in which others need to copy your directories to build and run your code as themselves.

Here’s one simple example of what not to do in your script:

cd /glade/scratch/joe/code/running_directory

Instead, replace your hardcoded username with $USER:

cd /glade/scratch/$USER/code/running_directory

Better yet, assume that you will launch the job from your working directory so you don’t need to include the path in your script at all.
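One way to do this is sketched below. It assumes a PBS-style scheduler that sets PBS_O_WORKDIR to the directory from which the job was submitted; the scheduler is not named in this document, so treat that variable as an assumption. The helper name job_workdir is hypothetical.

```shell
#!/bin/sh
# Sketch only: PBS_O_WORKDIR is an assumption (set by PBS-style schedulers
# to the directory from which the job was submitted).
# job_workdir is a hypothetical helper name.
job_workdir() {
    # Fall back to the current directory when the variable is unset or empty.
    printf '%s\n' "${PBS_O_WORKDIR:-$PWD}"
}

cd "$(job_workdir)" || exit 1
# From here on, refer to inputs and outputs with relative paths, e.g. ./run.log
```

With this pattern the script contains no user-specific or system-specific directory names, so a colleague can copy the directory and submit the job unchanged.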

Use comments in job scripts

When setting a variable in your job scripts or startup files, include a comment with the date and a brief description of the variable's purpose. This practice helps keep variables from being carried forward into jobs and environments where they are inappropriate. The example below documents a variable that is not set, or not appropriate, in most other scripts.

# yyyy-mm-dd Context: Cheyenne MPT peak_memusage job. 
# Variable MPI_SHEPHERD is set in this job in order to
# enable peak_memusage. Do not propagate it to other MPT  
# jobs as it may cause significant slowdown or timeout.

setenv MPI_SHEPHERD true

Prepare for debugging and troubleshooting

Arrange the script, source code, and data used in your job in a few directories to make it easy for others to copy and debug if necessary. Also include a README file that describes the environment needed to configure, build, and run the code and that identifies the required modules and environment variables. To confirm that the setup is complete, ask a colleague or a CISL Consulting Services Group consultant to copy and run the code themselves.


Managing files

Set permissions when you create files

Set permissions when you create a file. While you can change file ownership and permissions after the fact, establishing them when you create the file will simplify your life and save you time and effort later.
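As an illustration, assuming a POSIX shell, a umask set at the top of a script or startup file determines the permissions of every file and directory created afterward. The value 027 and the file names below are example choices, not CISL recommendations.

```shell
#!/bin/sh
# Sketch only: 027 is an example value, not a CISL recommendation.
# With umask 027, new files are created as rw-r----- (640) and
# new directories as rwxr-x--- (750).
umask 027

# Hypothetical temporary location used only for this demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/results.txt"
mkdir "$demo_dir/output"

# Show the resulting permissions.
ls -ld "$demo_dir/results.txt" "$demo_dir/output"
```

Choosing the mask once, up front, avoids a later pass with chmod over files that already exist.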

Configure jobs to avoid massive directories

Ensemble runs, data assimilation runs, and other jobs can generate tens or hundreds of thousands of output files, log files, and other files over time. Such large numbers of files can be difficult to manage and remove from GLADE file spaces when they are no longer needed. Configuring jobs to place no more than 2,000 to 3,000 files in a single directory will make them easier to manage.
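One possible way to stay under that limit, assuming each output file has a sequential integer index, is to derive a numbered subdirectory from the index so that every directory holds a fixed maximum number of files. The function name, the directory layout, and the file naming below are illustrative only.

```shell
#!/bin/sh
# Sketch only: bucket_dir, the "output" root, and the naming scheme
# are hypothetical; 2000 is the low end of the guideline above.
FILES_PER_DIR=2000

# Print the subdirectory for the file with the given zero-based index.
# The index must be a plain decimal integer without leading zeros.
bucket_dir() {
    printf 'output/%04d\n' $(( $1 / FILES_PER_DIR ))
}

# Example: decide where member 4500 of an ensemble writes its file.
# A real job would run mkdir -p "$dir" before writing.
dir=$(bucket_dir 4500)
echo "$dir/member_4500.nc"
```

Because the subdirectory is computed from the index alone, every task in a job can find its own directory without coordinating with the others.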

See Removing large numbers of files for how to remove massive accumulations of files.

Use scratch space for temporary files

The GLADE scratch file space is a temporary space for data that will be analyzed and removed within a short amount of time. It is also the recommended space for temporary files that would otherwise reside in small /tmp or /var/tmp directories that many users share. See Storing temporary files with TMPDIR for more information.
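A minimal sketch of pointing TMPDIR at scratch space follows, assuming a POSIX shell. The helper name and the "temp" subdirectory are hypothetical; the scratch path in the usage comment follows the /glade/scratch/$USER form used elsewhere in this document.

```shell
#!/bin/sh
# Sketch only: use_scratch_tmpdir and the "temp" subdirectory name are
# hypothetical choices made for this example.
use_scratch_tmpdir() {
    # $1: scratch root for the current user, e.g. /glade/scratch/$USER
    TMPDIR="$1/temp"
    mkdir -p "$TMPDIR" || return 1
    export TMPDIR
}

# In a job script, call it before starting programs that write temporary files:
#   use_scratch_tmpdir "/glade/scratch/$USER"
```

Many programs honor TMPDIR when creating temporary files, so setting it once near the top of a job script is usually enough.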

Use the most appropriate storage system

Review and understand the intended uses of GLADE, the High Performance Storage System (HPSS), and the NCAR Campaign Storage file system. The HPSS tape archive, for example, is for long-term data storage. Rather than routinely copying output to HPSS right after you complete a simulation run, use the large /glade/scratch space and then save your data to your work space, project space, or Campaign Storage after post-processing. Individual NCAR labs and project leads for universities that have Campaign Storage space establish their own workflows and storage policies.

Store large files

Storing large files, such as tar files, is more efficient than storing numerous small files. In the case of GLADE disk storage, this is because the system allocates a minimum amount of space for each file, no matter how small. That amount varies depending on which of several file spaces holds the file. See this GLADE documentation for details.
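For example, a directory of small files can be packed into a single tar file before it is stored. The sketch below assumes a POSIX shell and a standard tar utility; the helper name and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch only: bundle_dir is a hypothetical helper name.
# Pack a directory of small files into a single tar file.
bundle_dir() {
    # $1: directory to pack; $2: tar file to create
    parent=$(dirname "$1")
    name=$(basename "$1")
    # -C keeps the paths inside the archive relative to the parent directory.
    tar -cf "$2" -C "$parent" "$name"
}

# Example (paths are illustrative):
#   bundle_dir run_001/logs run_001_logs.tar
#   tar -tf run_001_logs.tar    # list the contents to verify the archive
```

Listing the archive with tar -tf before removing the original files is a cheap check that everything was captured.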

Also see File size guidelines for HPSS.

Avoid sharing home spaces

If you have an account for using the supercomputers and the analysis and visualization systems that CISL manages, you have your own /glade/u/home directory. Other users have their own home directories, too, so it isn't necessary to share by giving others write permission. Sharing often leads to unnecessary confusion over file ownership as your work progresses.

If you and your colleagues need to write files to a common space, consider using a work space or project space.

Also: Do not share your HPSS home directory.

Organize for efficiency

Organize your files and keep them that way. Arrange them in same-purpose trees, for example. Say you have 20 TB of Mount Pinatubo volcanic aerosols data. Keep the files in a subdirectory such as /glade/u/home/$USER/pinatubo rather than scattered among unrelated files or in multiple directories. Specialized trees are easier to share with other users and to transfer to other users or projects as necessary.

Back up critical files

Back up files that are critical to your project. While HPSS, for example, is a highly reliable system for long-term storage, files stored there are not backed up. In the unlikely event that your files are affected by breakage of a storage tape, CISL may not be able to restore them quickly or entirely. It is possible to store two copies of some files in HPSS, but also consider using another repository.

Similarly, with the exception of users' /glade/u/home spaces, the GLADE and Campaign Storage file systems are not backed up. You are responsible for replicating any data that you feel should be stored at an additional location.

Don't leave orphaned files

Don't leave orphaned files behind. Before your involvement in a project ends, transfer your files or arrange for someone else to take ownership of the files.


Transferring data

Use Globus to transfer files

CISL recommends using Globus to transfer large files or data sets between the GLADE centralized file service and remote destinations such as XSEDE facilities. (Transferring files between GLADE and the NCAR Campaign Storage file system requires the use of Globus.) In addition to web and command line interfaces, Globus offers a feature called Globus Connect Personal that enables users to move files easily to and from laptop or desktop computers and other systems.

Secure Copy Protocol (SCP) works well for transferring a few relatively small files between most systems.

Use HTAR to archive large numbers of files

If you have large numbers of small (< 1 MB) individual files that you need to store on tape in HPSS, CISL recommends using HTAR rather than HSI to archive them. Transferring hundreds or thousands of small files with HSI commands significantly slows the entire system for all users. HTAR is more efficient for archiving and it reduces the time it takes to retrieve files later. Failure to follow this guideline can result in suspension of your access to HPSS.

Also see File size guidelines for HPSS.