Running jobs


Users schedule their jobs to run on the Yellowstone, Geyser, and Caldera clusters by submitting them through Platform LSF.

Most production computing jobs run in batch queues on the 1.5-petaflops Yellowstone high-performance computing (HPC) system. Shared-node batch jobs and some exclusive-use batch jobs also may run on the Geyser and Caldera clusters. Interactive queues are available on both Geyser and Caldera.

Use of login nodes

Users can run short, non-memory-intensive processes interactively on the system's login nodes. These include tasks such as text editing or running small serial scripts or programs. Do not run programs or models that consume more than a few minutes of CPU time, more than a few GB of memory, or excessive I/O resources. You can compile models and programs on the login nodes, but no more than eight (8) compilation processes may execute simultaneously. With GNU make, you typically control this with the argument that follows the “-j” option (for example, make -j 8).
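
One way to stay within the eight-process limit is to compute the -j value instead of typing it by hand. The following is a minimal sketch, not a site-provided tool: the function name make_jobs is hypothetical, and it assumes getconf is available to report the core count when no argument is given.

```shell
# Hypothetical helper: choose a -j value for GNU make that never exceeds
# the login-node limit of eight simultaneous compilation processes.
make_jobs() (
    # $1: desired number of parallel jobs; defaults to the online core count
    want="${1:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}"
    limit=8
    if [ "$want" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$want"
    fi
)

# Usage on a login node:
#   make -j "$(make_jobs)"
```

The function body runs in a subshell, so its variables do not leak into your login session.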

All tasks that are run on login nodes are run “at risk.” If any task consumes excessive resources (determined at the discretion of the CISL consultants or Yellowstone system administrators), a system administrator will kill the process or processes and you will be notified.

Preparing jobs

Select the most appropriate queue for each job and provide accurate wall-clock times in your job script. This will help us fit your job into the earliest possible run opportunity.

Check for backfill windows; you may be able to adjust your wall-clock estimate and have your job fit an available window.

Note the system's usable memory and configure your job script to maximize performance.

Submitting jobs

To submit a simple MPI batch job, follow the instructions below.

To submit an interactive job, see Running interactive applications.

See Platform LSF examples for additional sample scripts.

Batch jobs

To submit a batch job, use the command bsub with the redirect sign (<) and the name of your batch script file.

bsub < script_name

We recommend passing the options to bsub in a batch script file rather than as numerous individual options on the command line.

Include these options in your script:

  • -J job_name
  • -P project_code
  • -R "span[ptile=n]" (to run n tasks per node)
  • -W [hour:]minute
  • -e error_file_name
  • -o output_file_name
  • -n number_of_tasks
  • -q queue_name
  • -w dependency_expression (if applicable)
  • -B (if you want to receive an email when the job starts)
  • -N (if you want to receive the job report by email when the job finishes)

Use the same name for your output and error files if you want the data stored in a single file rather than separately.
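
Because -W takes its value as [hour:]minute, a run-time estimate in minutes has to be converted before it goes into the script. The sketch below shows one way to do the arithmetic; the function name lsf_walltime is hypothetical and is not part of LSF.

```shell
# Hypothetical helper: convert a run-time estimate in whole minutes into
# the hour:minute form used with the bsub -W option.
lsf_walltime() (
    total_minutes="$1"
    hours=$(( total_minutes / 60 ))      # whole hours
    minutes=$(( total_minutes % 60 ))    # remaining minutes
    printf '%02d:%02d\n' "$hours" "$minutes"
)

# Usage:
#   lsf_walltime 90     # prints 01:30
#   lsf_walltime 6      # prints 00:06
```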

Batch script for pure MPI job

Here is a batch script example that requests four nodes (16 MPI tasks per node, 64 tasks in total) with a six-minute wall-clock limit on Yellowstone in the regular queue.

# LSF batch script to run an MPI application
#BSUB -P project_code        # project code
#BSUB -W 00:06               # wall-clock time (hrs:mins)
#BSUB -n 64                  # number of tasks in job
#BSUB -R "span[ptile=16]"    # run 16 MPI tasks per node
#BSUB -J myjob               # job name
#BSUB -o myjob.%J.out        # output file name in which %J is replaced by the job ID
#BSUB -e myjob.%J.err        # error file name in which %J is replaced by the job ID
#BSUB -q regular             # queue

# run the executable
mpirun.lsf ./myjob.exe
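
The node count in the example follows from the -n and ptile values: 64 tasks at 16 tasks per node occupy four nodes. As a rough check before submitting, the same arithmetic can be sketched as a small function. The name nodes_needed is hypothetical, and the sketch assumes that tasks fill each node up to the ptile value, so that only the last node may be partly used.

```shell
# Hypothetical helper: number of nodes occupied by a job that runs
# $1 tasks in total with $2 tasks per node (the ptile value).
nodes_needed() (
    tasks="$1"
    ptile="$2"
    # Integer ceiling division: a partly filled node still counts as a node.
    echo $(( (tasks + ptile - 1) / ptile ))
)

# Usage:
#   nodes_needed 64 16    # prints 4
```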


Troubleshooting tips

Error message

mpirun.lsf: LSF_PJL_TYPE is undefined. Exit ...

What to do

Use the bsub command as described in the Batch jobs section above.

A common mistake that leads to this error message is trying to execute ./job.lsf rather than bsub < job.lsf. The error results when the mpirun.lsf command in a job script is executed in the absence of environment variables that are provided when you submit a job correctly.
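
A job script can detect this mistake itself and stop before reaching mpirun.lsf. The following guard is only a sketch: the function name in_lsf_job is hypothetical, and it relies on the LSB_JOBID environment variable, which LSF sets for jobs it starts, rather than on the LSF_PJL_TYPE variable named in the error message.

```shell
# Hypothetical guard: succeeds only when the script is running as an LSF job.
# Assumption: LSB_JOBID is set by LSF inside a submitted job and is absent
# when the script is executed directly (for example, ./job.lsf).
in_lsf_job() {
    [ -n "${LSB_JOBID:-}" ]
}

# Usage near the top of a job script:
#   in_lsf_job || { echo "Submit this script with: bsub < $0" >&2; exit 1; }
```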

* * * * * * * * * *

Error message

Your job has been rejected.
You must declare a wall-clock time with the bsub -W directive.
If you have specified this directive and your job is still
rejected, verify that you have not exceeded your GLADE quotas
(use "gladequota") and that you are properly redirecting job
file input (e.g., bsub < jobfile).

To take advantage of backfill, the declared wall-clock time
should be less than the maximum wall-clock limit for the queue
to which you are submitting the job.

What to do

Check each factor noted in the message. Simply forgetting to include the < in bsub < jobfile is a common mistake. Review the documentation above regarding how to submit jobs.

* * * * * * * * * *

Error message

Jobs submitted with 32 tasks per node using batch option -R "span[ptile=32]" are killed with this error message:

ERROR: 0031-758 AFFINITY: [ys0116] Oversubscribe: 32 tasks in total,  each task requires 1 resource, but
there are only 16 available resource. Affinity cannot be applied

What to do

Submit your job with environment variable MP_TASK_AFFINITY set to cpu as shown here:

export MP_TASK_AFFINITY=cpu (for bash/sh/ksh users)
setenv MP_TASK_AFFINITY cpu   (for csh/tcsh users)

Monitoring jobs

To get information about your unfinished jobs, use the command bjobs.

To get information regarding unfinished jobs for a user group, add -u and the group name.

To list the unfinished jobs of all users, use all in place of the group name.

bjobs -u all
354029  siliu  RUN  regular  yslogin1-ib  32*ys0101-ib  *p.cpp.exe  May 30 10:03
354032  siliu  RUN  regular  yslogin1-ib  32*ys0104-ib  *p.cpp.exe  May 30 10:22
354033  siliu  RUN  regular  yslogin1-ib  32*ys0105-ib  *p.cpp.exe  May 30 10:35
354036  siliu  RUN  regular  yslogin1-ib  32*ys0119-ib  *p.cpp.exe  May 30 10:54
354037  siliu  RUN  regular  yslogin1-ib  32*ys0123-ib  *p.cpp.exe  May 30 11:33

You can suppress lines that show each individual node used in large jobs by piping the output through grep as follows.

bjobs -u all | grep -v "^    "
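
The same listing can be condensed further. The sketch below counts jobs per state; the function name count_jobs_by_state is hypothetical, and it assumes the column layout shown in the sample output above, with the job ID in the first column and the job state in the third.

```shell
# Hypothetical helper: read bjobs output on stdin and print one line per
# job state with the number of jobs in that state.
count_jobs_by_state() {
    # Only lines whose first field is a numeric job ID are counted, which
    # skips the header line and indented per-node continuation lines.
    awk '$1 ~ /^[0-9]+$/ { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage:
#   bjobs -u all | count_jobs_by_state
```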

For information about your own unfinished jobs in a queue, use -q and the queue name.

bjobs -q queue_name

For a summary of batch jobs that have already run, use bhist. Sample output:

Summary of time in seconds spent in various states:
354029  siliu  *cpp.exe  2    0     55   0     0     0     57
354032  siliu  *cpp.exe  2    0     55   0     0     0     57
354033  siliu  *cpp.exe  2    0     55   0     0     0     57
354036  siliu  *cpp.exe  2    0     54   0     0     0     56
354037  siliu  *cpp.exe  2    0     54   0     0     0     56

Other useful commands include:

bpeek – Allows you to watch the error and output files of a running batch job. This is particularly useful for monitoring a long-running job; if the job isn't running as you expected, you may want to kill it to preserve computing time and resources.

bpeek jobid

bkill – Removes a queued job from LSF, or stops and removes a running job. Use it with the Job ID, which you can get from the output of bjobs.

bkill jobid

tail – When used with the -f option to monitor a log file, this enables you to view the log as it changes. To use it in the Yellowstone environment, also disable inotify as shown in this example to ensure that your screen output gets updated properly. Note that the option is spelled with three leading hyphens.

tail ---disable-inotify -f /glade/scratch/username/filename.log

Checking backfill windows

Run the bfill command before submitting your job to see if backfill windows are available. With that information, you may be able to adjust your job script wall-clock estimate and have your job start more quickly than it might otherwise.

The bfill command parses and reformats output from the native LSF bslots command.

For Yellowstone, as shown in the sample output below, bfill reports the backfill window's duration and how many nodes are available.

For Geyser and Caldera, where jobs most typically run on shared nodes, bfill indicates:

  • the time available (often "Unlimited" – up to the queue's wall-clock limit),
  • the number of entire nodes available,
  • and other slots that a job could potentially use.

When system use is high, few large backfill windows are likely to be available on Yellowstone. Some large windows might become available as the system is drained prior to jobs launching in the capability queue or prior to system maintenance downtimes.

Use the bfill information as general guidance. The nodes and slots are NOT guaranteed to be available at job submission.

bfill output sample

----- Current backfill windows -----
Yellowstone:  00:16:18 -     1 nodes
Yellowstone: Unlimited -    16 nodes
Geyser: Unlimited -  0 entire nodes, plus 1140 slots
Caldera: Unlimited - 12 entire nodes, plus  122 slots
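
To compare a planned job against these windows, you can filter the report by node count. This is a minimal sketch that assumes the Yellowstone lines keep exactly the layout of the sample above; the function name yellowstone_windows is hypothetical, and the duration field is printed as is without interpreting its format.

```shell
# Hypothetical helper: read bfill output on stdin and print the duration of
# every Yellowstone backfill window that offers at least $1 nodes.
yellowstone_windows() {
    # Assumed fields: $1 = "Yellowstone:", $2 = duration, $4 = node count.
    awk -v need="$1" '$1 == "Yellowstone:" && $4 + 0 >= need + 0 { print $2 }'
}

# Usage:
#   bfill | yellowstone_windows 4
```

With the sample output above, asking for four nodes matches only the Unlimited window. As with bfill itself, treat the result as guidance, since the nodes are not guaranteed to be free when the job is submitted.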