Running jobs

Submitting jobs | Monitoring jobs | Checking backfill windows

Users schedule their jobs to run on the Yellowstone, Geyser, and Caldera clusters by submitting them through Platform LSF.

Most production computing jobs run in batch queues on the 1.5-petaflops Yellowstone high-performance computing (HPC) system. Shared-node batch jobs and some exclusive-use batch jobs also may run on the Geyser and Caldera clusters. Interactive queues are available on both Geyser and Caldera.

Use of login nodes

Users can run short, non-memory-intensive processes interactively on the system's login nodes. These include tasks such as text editing or running small serial scripts or programs.

You can compile models and programs on the login nodes, but do not run more than eight (8) simultaneous compilation processes. The degree of parallelism is typically controlled by the argument to the “-j” option of your GNU make command.
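
For example, this invocation keeps a parallel build within the eight-process limit:

make -j 8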

All tasks that you run on login nodes run “at risk.” If any task, or any set of concurrent tasks run by an individual user, consumes excessive resources, the task or tasks will be killed and you will be notified.

Do not run programs or models that consume excessive amounts of CPU time, more than a few GB of memory, or excessive I/O resources. Instead, use the Yellowstone batch nodes or the shared nodes on the Geyser and Caldera clusters. Many tasks can be performed easily on Geyser and Caldera by using the execgy and execca scripts.

Preparing jobs

Select the most appropriate queue for each job and provide accurate wall-clock times in your job script. This will help us fit your job into the earliest possible run opportunity.

Check for backfill windows; you may be able to adjust your wall-clock estimate and have your job fit an available window.

Note the system's usable memory and configure your job script to maximize performance.


Submitting jobs

To submit a simple MPI batch job, follow the instructions below.

To start an interactive job on Yellowstone, follow the example here.

To start an interactive job on Geyser or Caldera, see Running interactive applications.

See Platform LSF examples for additional sample scripts.

Batch jobs

⇒ bsub

To submit a batch job, use the command bsub with the redirect sign (<) and the name of your batch script file.

bsub < script_name

We recommend specifying options as #BSUB directives in your batch script file rather than as numerous individual arguments on the bsub command line.
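
The two forms are equivalent. For example, either of these sets a six-minute wall-clock limit, with the second form preferred:

bsub -W 00:06 < script_name      # option on the command line
#BSUB -W 00:06                   # directive inside script_name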

Include these options in your script:

  • -J job_name
  • -P project_code
  • -R "span[ptile=n]" to set the number of tasks per node
  • -W [hour:]minute
  • -e error_file_name
  • -o output_file_name
  • -n number_of_tasks
  • -q queue_name
  • -w dependency_expression (if applicable; see the example after this list)
  • -B (to receive an email when the job starts)
  • -N (to receive the job report by email when the job finishes)
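
For example, an LSF dependency expression such as done() makes a job wait until a previously submitted job finishes successfully; the job ID here is illustrative:

#BSUB -w "done(354029)"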

Use the same name for your output and error files if you want the data stored in a single file rather than separately.
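
For example, these directives send both output and error data to one combined file:

#BSUB -o job_name.%J.log
#BSUB -e job_name.%J.log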

Loading modules in a batch script

Users sometimes need to execute module commands from within a batch job, for example to load an application such as NCL or to add or remove other modules.

To make sure the module commands are available, insert the appropriate line below near the top of your batch script.

In a tcsh script:

source /glade/u/apps/opt/lmod/4.2.1/init/tcsh

In a bash script:

source /glade/u/apps/opt/lmod/4.2.1/init/bash

Once that line is included, you can add the module purge command if necessary and then load only the modules needed to establish the software environment that your job requires.
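
For example, the command section of a tcsh batch script might begin with the following sketch; the module name is a placeholder for whichever modules your job actually needs:

source /glade/u/apps/opt/lmod/4.2.1/init/tcsh
module purge
module load ncl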

Batch script for pure MPI job

Here is a batch script example for a job that will use four nodes (16 MPI tasks per node) for six minutes on Yellowstone. Insert your own project code, job name and executable, and specify a queue.

#!/bin/tcsh
#
# LSF batch script to run an MPI application
#
#BSUB -P project_code        # project code
#BSUB -W 00:06               # wall-clock time (hrs:mins)
#BSUB -n 64                  # number of tasks in job         
#BSUB -R "span[ptile=16]"    # run 16 MPI tasks per node
#BSUB -J job_name            # job name
#BSUB -o job_name.%J.out     # output file name in which %J is replaced by the job ID
#BSUB -e job_name.%J.err     # error file name in which %J is replaced by the job ID
#BSUB -q queue_name          # queue

#run the executable
mpirun.lsf ./myjob.exe

Troubleshooting tips

Error message


mpirun.lsf: LSF_PJL_TYPE is undefined. Exit ...

What to do

Use the bsub command as described above.

A common mistake that leads to this error is executing the script directly (./job.lsf) rather than submitting it with bsub < job.lsf. The mpirun.lsf command fails when it runs without the environment variables that LSF provides when a job is submitted correctly.
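
For example:

./job.lsf        # wrong: runs the script without the LSF environment
bsub < job.lsf   # correct: submits the script as a batch job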

* * * * * * * * * *

Error message


Your job has been rejected.
You must declare a wall-clock time with the bsub -W directive.
If you have specified this directive and your job is still
rejected, verify that you have not exceeded your GLADE quotas
(use "gladequota") and that you are properly redirecting job
file input (e.g., bsub < jobfile).

To take advantage of backfill, the declared wall-clock time
should be less than the maximum wall-clock limit for the queue
to which you are submitting the job.

What to do

Check each factor noted in the message. Simply forgetting to include the < in bsub < jobfile is a common mistake. Review the documentation above regarding how to submit jobs.

* * * * * * * * * *

Error message


Jobs submitted with 32 tasks per node using batch option -R "span[ptile=32]" are killed with this error message:

ERROR: 0031-758 AFFINITY: [ys0116] Oversubscribe: 32 tasks in total,  each task requires 1 resource, but
there are only 16 available resource. Affinity cannot be applied

What to do

Set the environment variable MP_TASK_AFFINITY to cpu in your shell before submitting the job, as shown here:

export MP_TASK_AFFINITY=cpu   (for bash/sh/ksh users)
setenv MP_TASK_AFFINITY cpu   (for csh/tcsh users)
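
Alternatively, the variable can be set inside the batch script itself, before the mpirun.lsf line; a sketch for a tcsh script:

setenv MP_TASK_AFFINITY cpu
mpirun.lsf ./myjob.exe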

* * * * * * * * * *

Error message


Queue level per job host limit is exceeded. Job not submitted.

What to do

Your job script requests more nodes than are available for the queue, so you need to select a different queue or revise your batch job script. Review the job size parameters for the queue.
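
To review a queue's configuration and limits, you can use the native LSF bqueues command; for example:

bqueues -l queue_name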


Monitoring jobs

⇒ bjobs

The bjobs command provides information on unfinished jobs. The following examples show some commonly used options and arguments. To learn more, log in to Yellowstone and refer to the man pages.

Run bjobs by itself for the status of your own jobs.
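
bjobs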

For information about your unfinished jobs in an individual queue, use -q and the queue name.

bjobs -q queue_name

To get information regarding unfinished jobs for a user group, add -u and the group name.
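
bjobs -u group_name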

To list all unfinished jobs, use all as shown.

bjobs -u all

Sample output:

JOBID   USER      STAT QUEUE    FROM_HOST    EXEC_HOST     JOB_NAME SUBMIT_TIME
354029  ehaskell  RUN  regular  yslogin1-ib  32*ys0101-ib  job_113  May 30 10:03
354032  ehaskell  RUN  regular  yslogin1-ib  32*ys0104-ib  job_114  May 30 10:22
354038  jmathers  RUN  regular  yslogin5-ib  32*ys0118-ib  *p.cpp.exe  May 30 10:35
354039  jmathers  RUN  regular  yslogin5-ib  32*ys0119-ib  *p.cpp.exe  May 30 10:54

When large jobs are running, the output identifies each individual node that is in use. To suppress those lines, you can pipe the output through grep as follows.

bjobs -u all | grep -v "^    "

The -o option lets you customize your bjobs output by specifying which fields to include, as in these examples.

Your own jobs

bjobs -o "jobid project queue stat submit_time mem delimiter=','" -u $USER

Sample output:

JOBID,USER,PROJ_NAME,QUEUE,STAT,SUBMIT_TIME,MEM
192896,bbaggins,P8675309,geyser,RUN,Nov 13 09:40,79 Mbytes
192899,bbaggins,P8675309,geyser,RUN,Nov 13 09:40,61 Mbytes
192902,bbaggins,P8675309,geyser,RUN,Nov 13 09:41,51 Mbytes

All unfinished jobs

bjobs -o "jobid user project queue stat submit_time mem delimiter=','" -u all

Sample output:

JOBID,USER,PROJ_NAME,QUEUE,STAT,SUBMIT_TIME,MEM
187483,ttritt,UABC0003,regular,RUN,Nov 12 22:56,2.6 Gbytes
187484,ttritt,UABC0003,regular,RUN,Nov 12 22:56,2.7 Gbytes
187486,ttritt,UABC0003,regular,RUN,Nov 12 22:56,2.4 Gbytes
187670,jdenver,P12345678,regular,RUN,Nov 12 23:46,15.8 Gbyte
187900,jccash,P87654321,regular,RUN,Nov 13 00:15,21 Gbytes
187902,jccash,P87654321,regular,RUN,Nov 13 00:15,21 Gbytes
187964,jccash,P87654321,regular,RUN,Nov 13 00:26,21 Gbytes

⇒ bhist

Use bhist with no arguments to get a report on the status of your running, pending, and suspended jobs.
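
bhist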

Sample output:

Summary of time in seconds spent in various states:
JOBID   USER      JOB_NAME PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
354029  jmathers  job_113   2    0     55   0     0     0     57
354032  jmathers  job_114   2    0     55   0     0     0     57

Use bhist and the Job ID if you want information about an individual job.

bhist 354029

Use the following example to get a detailed report covering a specified number of recent event log files (in this case, 10) and save the output to a file.

bhist -a -l -n 10 > file.report

⇒ Other frequently used commands

bpeek – Allows you to watch the error and output files of a running batch job. This is particularly useful for monitoring a long-running job; if the job isn't running as you expected, you may want to kill it to preserve computing time and resources.

bpeek jobid

bkill – Removes a queued job from LSF, or stops and removes a running job. Use it with the Job ID, which you can get from the output of bjobs.

bkill jobid

tail – When used with the -f option to monitor a log file, this enables you to view the log as it changes. To use it in the Yellowstone environment, also disable inotify as shown in this example (the option does take three leading dashes) to ensure that your screen output gets updated properly.

tail ---disable-inotify -f /glade/scratch/username/filename.log

Checking backfill windows

⇒ bfill

Run the bfill command before submitting your job to see if backfill windows are available. With that information, you may be able to adjust your job script wall-clock estimate and have your job start more quickly than it might otherwise.

The bfill command parses and reformats output from the native LSF bslots command.
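
Run it with no arguments:

bfill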

For Yellowstone, as shown in the sample output below, bfill reports the backfill window's duration and how many nodes are available.

For Geyser and Caldera, where jobs most typically run on shared nodes, bfill indicates:

  • the time available (often "Unlimited" – up to the queue's wall-clock limit),
  • the number of entire nodes available,
  • and other slots that a job could potentially use.

When system use is high, few large backfill windows are likely to be available on Yellowstone. Some large windows might become available as the system is drained prior to jobs launching in the capability queue or prior to system maintenance downtimes.

Use the bfill information as general guidance. The nodes and slots are NOT guaranteed to be available at job submission.

bfill output sample

----- Current backfill windows -----
Yellowstone:  00:16:18 -     1 nodes
Yellowstone: Unlimited -    16 nodes
Geyser: Unlimited -  0 entire nodes, plus 1140 slots
Caldera: Unlimited - 12 entire nodes, plus  122 slots