Starting jobs on Casper nodes

Interactive jobs | Batch jobs | GPU requests | NVMe storage | Script examples | Compiling code

Updated 7/15/2020: This page was revised to reflect changes in how to request the use of GPUs.

To run jobs on the Casper cluster, users submit them with the open-source Slurm Workload Manager. Procedures for starting both interactive jobs and batch jobs are described below. Also:

  • Compile your code on Casper nodes if you will run it on Casper. (See compiling your code below.)
  • See Calculating charges to learn how core-hours charges are calculated for jobs that run on Casper.

Begin by logging in on Casper (casper.ucar.edu) or Cheyenne (cheyenne.ucar.edu).

Users may have up to 36 jobs running concurrently on the Casper cluster. Jobs submitted in excess of the 36-job limit will be put into “Pending” state by the Slurm workload manager. As running jobs complete, pending jobs will be released for execution in the order in which they were submitted.

Wall-clock

The wall-clock limit on the Casper cluster is 24 hours.

Specify the wall-clock time your job needs as in the examples below, which use the hours:minutes:seconds format; it can be shortened to minutes:seconds.
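
For example, either of the following requests 12 hours of wall-clock time, whether on the command line or in a batch script:

execdav --time=12:00:00
#SBATCH --time=12:00:00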


Interactive jobs

Starting a remote command shell with execdav

Run the execdav script/command to start an interactive job. Invoking it without an argument will start an interactive shell on the first available DAV node. The default wall-clock time is 6 hours.

The execdav command accepts all Slurm resource requests, as detailed by man salloc. Some common requests, which can be combined as shown in the example after this list, include:

  • --account=project_code (defaults to the DAV_PROJECT value that you set in your start file)
  • --time=00:00:00 (defaults to 6 hours)
  • --ntasks=number_of_tasks (defaults to 1 task)
    • When you launch the interactive job, your login shell uses 1 task or "slot," so adjust --ntasks by requesting enough cores to account for that.
  • --mem=nG
    • Use this if you want to specify how much memory to use per node, from 1 to 1100 gigabytes.
      Example: --mem=300G
    • If you do not specify memory per node, the default memory available is 1.87G per core that you request.
  • --constraint=skylake (default)
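
A request that combines several of these options might look like the following; insert your own project code in place of project_code and adjust the values for your work:

execdav --account=project_code --time=02:00:00 --ntasks=4 --mem=50G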

* * *

Starting a virtual desktop with vncmgr

If your work with complex programs such as MATLAB and VAPOR requires the use of virtual network computing (VNC) server and client software, use vncmgr instead of execdav.

Using vncmgr simplifies configuring and running a VNC session in a Casper batch job; see the vncmgr documentation for details.

* * *

Using exechpss

The exechpss command is used to initiate HSI and HTAR file transfers.

See the HSI and HTAR documentation for examples.


Batch jobs

Prepare a batch script by following one of the examples below. Be aware that the system does not import your Cheyenne environment, so make sure your script loads the software modules that you will need to run the job.

Basic Slurm commands

When your script is ready, run sbatch to submit the job.

sbatch script_name

To check on your job's progress, run squeue.

squeue -u $USER

To get a detailed status report, run scontrol show job followed by the job number.

scontrol show job nnn

To kill a job, run scancel with the job number.

scancel nnn

Requesting GPU access

Two types of GPUs are available on the Casper system:

  • NVIDIA Tesla V100 GPUs for intensive GPGPU computing and machine learning/deep learning
  • NVIDIA Quadro GP100 GPUs for visualization and light GPGPU workloads

The V100 GPUs are intended for intensive computation. A feature called GPU isolation limits their use to jobs that reserve them as consumable resources with the Slurm --gres option. These examples show how to reserve V100 GPUs (1 with execdav; 2 in a batch job directive):

execdav --gres=gpu:v100:1
#SBATCH --gres=gpu:v100:2
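
As a fuller sketch, the batch directives below reserve a single V100 GPU and run nvidia-smi so you can confirm the job sees the device; the project code is a placeholder, and the task and time requests should be adjusted for your workload.

#!/bin/bash -l
#SBATCH --job-name=gpu_test
#SBATCH --account=project_code
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --partition=dav
#SBATCH --gres=gpu:v100:1

### Confirm that the reserved GPU is visible to the job
nvidia-smi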

The GP100 GPUs are not subject to GPU isolation because visualization tasks can typically be shared on a single GPU. To ensure that your job is placed on a node with the GP100 resource available, follow this example to specify it as a constraint:

#SBATCH --constraint=gp100

A "v100" feature constraint also exists, as shown in the table below. If you use it, your job will be placed on a node with V100 GPUs, but will not be able to access them for computational tasks. Use --gres instead.

Node requests can be constrained by various hardware and software features. The following table shows how many nodes have which features and consumable resources. Minimizing feature constraints and resource reservations decreases the length of time your job waits in the queue. When you do use them, make sure that your constraints and reservations don't conflict. (For example, don't constrain your job to nodes with X11 support while reserving V100 GPUs.)

Node count   Constraints/features          Consumable resources
12           skylake (default)             (none)
8            skylake, gpu, x11, gp100      gpu:gp100:1
2            skylake, gpu, v100, 4xv100    gpu:v100:[1-4]
4            skylake, gpu, v100, 8xv100    gpu:v100:[1-8]


NVMe node-local storage

Each Casper node has 2 TB of local NVMe solid-state disk (SSD) storage. Some of this space is used to augment memory, which reduces the likelihood of jobs failing because of excessive memory use.

NVMe storage can also be used directly while a job is running, which is recommended only for I/O-intensive jobs. Data stored in /local_scratch/$SLURM_JOB_ID are deleted when the job ends.

To use this disk space while your job is running, include the following in your batch script after customizing as needed.

### Copy input data to NVMe (can check that it fits first using "df -h")
cp -r /glade/scratch/$USER/input_data /local_scratch/$SLURM_JOB_ID

### Run script to process data (NCL example takes input and output paths as command line arguments)
ncl proc_data.ncl /local_scratch/$SLURM_JOB_ID/input_data /local_scratch/$SLURM_JOB_ID/output_data

### Move output data before the job ends and your output is deleted
mv /local_scratch/$SLURM_JOB_ID/output_data /glade/scratch/$USER/
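
To confirm that your input data will fit before you copy it, you can check the free space on the node-local disk from within your job:

df -h /local_scratch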

Script examples

The examples below show how to create a script for running an MPI job. They use the long form of #SBATCH directives, in which each option begins with two dashes, as in #SBATCH --partition=dav. For information on all options, run man sbatch.

See the related documentation pages for more script examples.

For tcsh users

Insert your own project code where indicated and customize other settings as needed for your own job.

#!/bin/tcsh
#SBATCH --job-name=mpi_job
#SBATCH --account=project_code
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00
#SBATCH --partition=dav
#SBATCH --output=mpi_job.out.%j

setenv TMPDIR /glade/scratch/$USER/temp
mkdir -p $TMPDIR

### Run program
srun ./executable_name

For bash users

Insert your own project code where indicated and customize other settings as needed for your own job.

#!/bin/bash -l
#SBATCH --job-name=mpi_job
#SBATCH --account=project_code
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00
#SBATCH --partition=dav
#SBATCH --output=mpi_job.out.%j

export TMPDIR=/glade/scratch/$USER/temp
mkdir -p $TMPDIR

### Run program
srun ./executable_name

Compiling your code

CISL recommends using the default Intel, GNU or PGI compilers for parallel programs.

  1. Load the compiler.
  2. Load the openmpi module if you plan to use MPI.
  3. Compile your code as you usually do.

Serial programs can use any compiler.
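
As a sketch of these steps for an MPI program: the module names below are typical of the Casper environment, but confirm them with module avail, and the source file name is only an example.

### Load a compiler (intel shown here; gnu and pgi are alternatives) and Open MPI
module load intel
module load openmpi

### Compile as usual with the MPI wrapper (use mpicc for C code)
mpif90 -o executable_name mpi_program.f90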