CISL best practices

The practices described below will help you make the most of your computing and storage allocations.

Sharing resources

Limit your use of shared licenses

Users of the resources that CISL manages share a limited number of licenses for running IDL, MATLAB, Mathematica, and some other applications. Be familiar with and follow the license-use guidelines that have been established to ensure fair access for all users. CISL reserves the right to kill jobs/tasks of users who monopolize these licenses.

Don't monopolize compute resources

Be considerate of other users when planning and scheduling jobs. To keep from monopolizing these resources, for example, avoid writing job submission scripts that rapidly fill the scheduler with potentially concurrent compute resource requests. Contact the Consulting Services Group for guidance if your workload will require you to submit numerous jobs in a short timeframe. CISL monitors the use of these resources and will kill jobs when necessary to ensure fair access.
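One way to keep a script from flooding the scheduler is to submit jobs in small batches with a pause between batches. The following is a minimal sketch of that idea, not a CISL-provided tool. The `submit` callable, the batch size, and the pause length are all hypothetical placeholders; check with the Consulting Services Group for values that suit your workload.

```python
import time
from typing import Callable, Iterable


def submit_in_batches(
    jobs: Iterable[str],
    submit: Callable[[str], None],
    batch_size: int = 10,
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Submit jobs in small batches, pausing between batches.

    `submit` is a hypothetical callable that hands one job script to the
    scheduler; it is not a CISL or scheduler API. Returns the number of
    jobs submitted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    submitted = 0
    for job in jobs:
        # Pause before starting each new batch after the first one.
        if submitted and submitted % batch_size == 0:
            sleep(pause_seconds)
        submit(job)
        submitted += 1
    return submitted
```

The `sleep` parameter exists only so the pause can be replaced during testing; in normal use the default `time.sleep` applies.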

Managing files

Use HPSS only for long-term storage

Use the HPSS tape archive to store only the data that you need to save long-term. Rather than routinely copying output to HPSS right after you complete a simulation run, for example, use the large GLADE scratch space and save the data to HPSS only after post-processing.

Using the tape archive only for long-term storage helps conserve your storage allocation and allows the HPSS system to run more efficiently for everyone.

Use scratch space for temporary files

The GLADE scratch file space is a temporary space for data that will be analyzed and removed within a short amount of time. It is also the recommended space for temporary files that would otherwise reside in small /tmp or /var/tmp directories that many users share. See Storing temporary files with TMPDIR for more information.
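As an illustration, a script can create its temporary working directory under whatever location the TMPDIR environment variable names. This sketch assumes you have already set TMPDIR to a directory in your scratch space; the function name `make_work_dir` and the `job_` prefix are hypothetical.

```python
import os
import tempfile


def make_work_dir(prefix: str = "job_") -> str:
    """Create a private temporary directory under $TMPDIR if it is set.

    Falls back to Python's default temporary location when TMPDIR is
    unset or empty.
    """
    base = os.environ.get("TMPDIR") or tempfile.gettempdir()
    # Make sure the base directory exists before creating a child in it.
    os.makedirs(base, exist_ok=True)
    return tempfile.mkdtemp(prefix=prefix, dir=base)
```

Because `tempfile.mkdtemp` creates a uniquely named directory that only the creating user can access, concurrent jobs do not collide on file names.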

Store large files

Storing large files, such as tar files, is more efficient than storing numerous small files. In the case of GLADE disk storage, this is because the system allocates a minimum amount of space for each file, no matter how small. That amount varies depending on which of several file spaces holds the file. See this GLADE documentation for details.

Also see File size guidelines for HPSS.
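The sketch below shows one way to combine a directory of small files into a single tar file using Python's standard `tarfile` module. It is an illustration only; the function name `bundle_directory` is hypothetical, and the tar file should be written outside the source directory so that it does not try to include itself.

```python
import os
import tarfile


def bundle_directory(src_dir: str, tar_path: str) -> int:
    """Pack every file under src_dir into one uncompressed tar file.

    Member names are stored relative to src_dir. Returns the number of
    files added. tar_path should be located outside src_dir.
    """
    count = 0
    with tarfile.open(tar_path, "w") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count
```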

Configure jobs to avoid massive directories

Ensemble runs, data assimilation runs, and other jobs can generate tens or hundreds of thousands of output files, log files, and other files over time. Such large numbers of files can be difficult to manage and remove from GLADE file spaces when they are no longer needed. Configuring jobs to place no more than 2,000 to 3,000 files in a single directory will make them easier to manage. See Removing large numbers of files for how to remove massive accumulations of files.
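One possible way to respect that limit is to spread output across numbered subdirectories based on a running file index. The sketch below assumes your job can assign each output file a sequential index; the function name, the four-digit directory naming, and the choice of 2,000 as the cap are illustrative assumptions.

```python
import os

MAX_FILES_PER_DIR = 2000  # lower end of the 2,000 to 3,000 range above


def output_path(base_dir: str, index: int, name: str) -> str:
    """Return a path that places file number `index` in a numbered subdirectory.

    Files 0-1999 go to base_dir/0000, files 2000-3999 to base_dir/0001,
    and so on. The subdirectory is created if it does not exist yet.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    bucket = index // MAX_FILES_PER_DIR
    subdir = os.path.join(base_dir, f"{bucket:04d}")
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, name)
```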

Avoid sharing storage spaces

If you have an account for using the supercomputing, analysis, and visualization systems that CISL manages, you have your own home directory in the GLADE environment. Other users have their own home directories, too, so it isn't necessary to share by giving others write permission.

Sharing often leads to unnecessary confusion over file ownership as your work progresses. If you and your colleagues need to write files to a common space, consider using a work space or project space, and plan to back the files up in HPSS when appropriate. (Do not share your HPSS home directory.)

Set permissions when you create files

Set permissions when you create a file. While file ownership and permissions can be changed after the fact, establishing them when you create the file will simplify your life and save you time and effort later.
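In a script, permissions can be requested at the moment a file is created rather than changed afterward. The following sketch uses Python's `os.open` for this. The function name and the default mode of 0o640 (owner read/write, group read) are assumptions for illustration; note that the process umask can still remove bits from the requested mode.

```python
import os


def create_file_with_mode(path: str, data: bytes, mode: int = 0o640) -> None:
    """Create a new file and write data, requesting `mode` at creation time.

    O_EXCL makes the call fail if the file already exists, so an existing
    file's permissions are never silently reused. The effective mode is
    `mode` with any bits in the current umask cleared.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
```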

Always specify project codes

Always specify a project code for charging purposes when using HPSS—even if it is your default project code. Why? Your default may change. By always specifying the project code, you make it easier to change in your jobs and scripts when necessary. It also helps you manage HPSS charges and avoid surprises down the road.

Organize for efficiency

Organize your files and keep them that way. Arrange them in same-purpose trees, for example. Say you have 20 TB of Mount Pinatubo volcanic aerosols data. Keep the files in a subdirectory such as /glade/u/home/$USER/pinatubo rather than scattered among unrelated files or in multiple directories. Specialized trees are easier to share with other users and to transfer to other users or projects as necessary.

Back up critical files

Back up files that are critical to your project. HPSS is a highly reliable system, but files stored there are not backed up. In the unlikely event that your files are affected by breakage of a storage tape, CISL may not be able to restore them quickly or entirely. Consider storing two copies of your most critical files in HPSS or keeping one in HPSS and one in another repository. You are responsible for replicating any data that you feel should be stored at an additional location.

Remove unneeded data

Periodically examine your GLADE and HPSS holdings and remove unwanted, unneeded files. This reduces charges against your storage allocation and makes these systems more efficient for everyone.
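A periodic review of disk holdings can start with a list of files that have not been modified for some time. The sketch below only reports candidates and deletes nothing; the function name and the idea of using modification time as the criterion are assumptions, and it applies to files on a mounted file system such as GLADE, not to HPSS.

```python
import os
import time
from typing import Iterator, Optional, Tuple


def stale_files(
    top: str, days: float, now: Optional[float] = None
) -> Iterator[Tuple[str, int]]:
    """Yield (path, size_in_bytes) for files not modified in `days` days."""
    cutoff = (time.time() if now is None else now) - days * 86400
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                yield path, info.st_size
```

Review the reported paths before removing anything, since an old modification time does not by itself mean a file is unneeded.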

Don't leave orphaned files

Don't leave orphaned files behind. Before your involvement in a project ends, transfer your files or arrange for someone else to take ownership of the files.

Managing allocations

Monitor usage charges

Check your usage charges frequently to help ensure that you are using CISL resources as efficiently as possible, and make sure that others who are authorized to charge against your allocation understand how to use them efficiently. Understand how your choice of queues affects charges against your computing allocation, and be aware of other allocation-related policies. See Managing allocations and charges.

If you are authorized to charge your work against multiple projects, check your usage charges and storage holdings for each project on a regular basis. This will help ensure that you are charging jobs correctly and help you avoid overspending allocations.

Contact CISL consultants

Before you run a set of jobs that will consume a large portion of your allocation—a long experiment, for example—ask the Consulting Services Group to review your job configuration. One of the consultants may be able to suggest an economical workflow that will help you conserve computing resources. This is especially important if you are unfamiliar with job configuration or with how to manage your allocation efficiently.

Optimize on a single processor

Minimize your use of computing resources and conserve your allocation by optimizing your code on a single processor before running larger jobs in production. Use optimizing libraries if your code lends itself to that.

Transferring data

Use HTAR to archive large numbers of files

CISL recommends using HTAR rather than HSI if you need to archive large numbers of individual files that are smaller than 1 MB. Failure to follow this guideline can result in suspension of your access to HPSS.

Transferring hundreds or thousands of small files with HSI commands significantly slows the entire system for all users. HTAR is more efficient for archiving and reduces the time it takes to retrieve files later.

Also see File size guidelines for HPSS.
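Before archiving a directory, it can help to check how many of its files fall under the 1 MB threshold mentioned above. This sketch counts them so you can decide whether bundling is warranted. The function name is hypothetical, and reading "1 MB" as 1,000,000 bytes is an assumption.

```python
import os
from typing import Tuple

SMALL_FILE_BYTES = 1_000_000  # assumption: "1 MB" read as 10**6 bytes


def count_small_files(top: str, limit: int = SMALL_FILE_BYTES) -> Tuple[int, int]:
    """Return (small, total): files under `limit` bytes and all files below `top`."""
    small = total = 0
    for root, _dirs, files in os.walk(top):
        for name in files:
            try:
                size = os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += 1
            if size < limit:
                small += 1
    return small, total
```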

Use Globus to transfer files

CISL recommends using Globus to transfer large files or data sets between the GLADE centralized file service and remote destinations such as XSEDE facilities. Globus provides a convenient, easy-to-use interface and offers a feature called Globus Connect Personal that enables users to move files easily to and from laptop or desktop computers and other systems. Secure Copy Protocol (SCP) works well for transferring a few relatively small files between systems.