Running jobs

Cheyenne HPC production jobs run in batch queues on the system's exclusive-use compute nodes. Login nodes can be used as described below.

Using login nodes

Users may run short, non-memory-intensive processes interactively on the Cheyenne system's login nodes. These include tasks such as text editing or running small serial scripts or programs.

However, to preserve a reasonable balance between user convenience and login node performance, the login nodes may not be used to run processes that consume excessive resources.

This restriction applies to individual processes that consume excessive CPU time, more than a few GB of memory, or excessive I/O resources. It also applies collectively to multiple concurrent tasks run by a single user.

Processes that use excessive resources on the login nodes are terminated automatically. Affected users are informed by email that their sessions were terminated due to "CPU/memory oversubscription." They are also advised to run such processes on batch nodes or interactively on the Geyser or Caldera clusters.

See Checking memory use for how to use the peak_memusage utility.

Allocation use and thresholds

When a project's usage exceeds its allocation limit, subsequent jobs are not rejected; they are redirected automatically to the low-priority "standby" queue. Jobs that run successfully from the standby queue are not charged against the allocation.

The same policy applies when projects that are subject to 30-day and 90-day usage thresholds exceed those thresholds. These are usually large-scale, non-university projects and NCAR divisional allocations whose usage is constrained to remain below a percentage of the monthly or quarterly portion of the allocation (120% and 105%, respectively, by default) over moving 30-day or 90-day windows.
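As a worked example of how such a threshold could be computed (the allocation size below is hypothetical, and pro-rating the window by days is an assumption; only the 120% and 105% default percentages come from the policy above):

```python
# Hypothetical worked example of the 30-day and 90-day usage thresholds.
# The annual allocation figure is illustrative; the 1.20 and 1.05 factors
# are the defaults quoted in the policy. Whether the "monthly portion" is
# pro-rated by days (as here) or taken as annual/12 is an assumption.

def usage_threshold(annual_allocation, window_days, factor):
    """Core-hours a project may consume within a moving window of
    `window_days` before its jobs are redirected to the standby queue."""
    window_portion = annual_allocation * (window_days / 365.0)
    return window_portion * factor

annual = 3_650_000  # hypothetical annual allocation, in core-hours

# 30-day window: 120% of the ~monthly portion (default)
limit_30 = usage_threshold(annual, 30, 1.20)   # 360,000 core-hours

# 90-day window: 105% of the ~quarterly portion (default)
limit_90 = usage_threshold(annual, 90, 1.05)   # 945,000 core-hours

print(f"30-day threshold: {limit_30:,.0f} core-hours")
print(f"90-day threshold: {limit_90:,.0f} core-hours")
```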

As with any other submitted job, a job in the standby queue remains queued until it runs or is killed.

Fair share

HPC "facility" fair share

CISL manages scheduling priorities to ensure fair access to the system by all of these stakeholder groups: the university community, the NCAR community, the Climate Simulation Laboratory (CSL), and the Wyoming community.

The fair-share policy takes the community-wide usage balance into account along with several additional factors. These include the submitting user's currently running jobs and recently completed jobs. The scheduling system uses a dynamic-priority formula to weigh these factors, calculate each job's priority, and make scheduling decisions.
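The formula itself is internal to the scheduler, but a dynamic-priority calculation of this general kind can be sketched as a weighted sum. The factor names and weights below are purely illustrative, not Cheyenne's actual formula:

```python
# Illustrative sketch of a fair-share dynamic-priority calculation.
# The factors mirror those named above (community-wide usage balance,
# the user's currently running jobs, recently completed jobs); the
# weights and the formula are hypothetical, not the scheduler's rule.

def job_priority(group_share_used, user_running_jobs,
                 user_recent_core_hours, queue_wait_hours,
                 w_share=100.0, w_running=5.0, w_recent=0.001, w_wait=1.0):
    """Higher value = scheduled sooner. Each usage term lowers priority;
    time already spent waiting in the queue raises it."""
    priority = w_wait * queue_wait_hours
    priority -= w_share * group_share_used         # stakeholder group near/over its share
    priority -= w_running * user_running_jobs      # user's currently running jobs
    priority -= w_recent * user_recent_core_hours  # user's recently completed work
    return priority

# A job from a heavy recent user in an over-served group ranks below a
# comparable job from a lightly active user, even with equal wait time.
busy = job_priority(0.8, 4, 50_000, queue_wait_hours=2)
idle = job_priority(0.2, 0, 1_000, queue_wait_hours=2)
print(busy < idle)  # True: the less-active user's job ranks higher
```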
