Using computing resources


Running programs

Most production computing jobs run in batch queues on the 1.5-petaflops Yellowstone high-performance computing (HPC) system. Shared-node batch jobs and some exclusive-use batch jobs also may run on the Geyser and Caldera clusters. Interactive queues are available on both Geyser and Caldera.

For details, see "Queues and charges" and "Running jobs."

Permitted use of login nodes

Users may run short, non-memory-intensive processes interactively on the Yellowstone system's login nodes. These include tasks such as text editing or running small serial scripts or programs.

To balance user convenience against login node performance, however, the login nodes may not be used to run processes that consume excessive resources.

This applies to any individual process that consumes excessive CPU time, more than a few GB of memory, or excessive I/O resources. It also applies to the combined load of multiple concurrent tasks run by a single user.

Processes that use excessive resources on the login nodes are terminated automatically. Affected users are informed by email that their sessions were terminated due to "CPU/memory oversubscription." They are also advised to run such processes on batch nodes or interactively on the Geyser or Caldera clusters.


Fair share

HPC "facility" fair share

CISL manages scheduling priorities to ensure fair access to the system by all of these stakeholder groups: the university community, the NCAR community, the Climate Simulation Laboratory (CSL), and the Wyoming community.

The fair-share policy takes the community-wide usage balance into account along with several additional factors, including the submitting user's currently running jobs and recently completed jobs. The LSF batch scheduler uses a dynamic-priority formula to weigh these factors, calculate each job's priority, and make scheduling decisions.
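The effect of a dynamic-priority formula can be illustrated with a minimal sketch. It follows a simplified form of the general formula described in LSF documentation, in which a user's share count is divided by a weighted sum of recent CPU time, run time, and job slots in use. The function name, the argument names, and the factor values are illustrative assumptions, not CISL's actual configuration.

```python
def dynamic_priority(shares, cpu_time, run_time, job_slots,
                     cpu_time_factor=0.7,   # placeholder weight (assumption)
                     run_time_factor=0.7,   # placeholder weight (assumption)
                     run_job_factor=3.0):   # placeholder weight (assumption)
    """Return a fair-share priority: higher means scheduled sooner.

    shares    -- shares assigned to the user or group
    cpu_time  -- recent (decayed) CPU time consumed
    run_time  -- run time of currently running jobs
    job_slots -- job slots currently in use
    """
    # The denominator grows with recent and current usage, so heavier
    # users receive a lower priority. It is never zero because the
    # (1 + job_slots) term is at least 1.
    usage = (cpu_time * cpu_time_factor
             + run_time * run_time_factor
             + (1 + job_slots) * run_job_factor)
    return shares / usage


# Two hypothetical users with equal shares but different recent usage.
idle_user = dynamic_priority(shares=100, cpu_time=0, run_time=0, job_slots=0)
busy_user = dynamic_priority(shares=100, cpu_time=5000, run_time=3600, job_slots=64)
```

In this sketch the user with no recent activity ends up with the higher priority, which is the behavior the fair-share policy is intended to produce.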


Allocation use and thresholds

When a project's usage exceeds its allocation limit, subsequent jobs are redirected automatically to the low-priority "standby" queue rather than being rejected. No charge is applied if the job runs successfully.

The same courtesy is applied when projects that are subject to 30-day and 90-day usage thresholds exceed those thresholds. These usually are large-scale, non-university projects and NCAR divisional allocations. Their usage over a moving 30-day window is constrained to remain below a percentage of the monthly portion of the allocation (120% by default), and their usage over a moving 90-day window is constrained to remain below a percentage of the quarterly portion (105% by default).
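The threshold arithmetic can be sketched as follows. This is an illustration only: it assumes the allocation is spread evenly over a 12-month period, and the function names and the example allocation size are hypothetical.

```python
def usage_thresholds(allocation, months=12, pct_30_day=120, pct_90_day=105):
    """Return (30-day threshold, 90-day threshold) in allocation units.

    Assumes the allocation is divided evenly across `months` months;
    the percentages are the defaults stated in the text.
    """
    monthly_portion = allocation / months
    quarterly_portion = allocation / (months / 3)
    return (monthly_portion * pct_30_day / 100,
            quarterly_portion * pct_90_day / 100)


def exceeds_threshold(usage_30_day, usage_90_day, allocation):
    """True if usage in either moving window is over its threshold."""
    limit_30, limit_90 = usage_thresholds(allocation)
    return usage_30_day > limit_30 or usage_90_day > limit_90


# Hypothetical project with a 1,200,000 core-hour allocation.
limit_30, limit_90 = usage_thresholds(1_200_000)
```

For this hypothetical allocation, the monthly portion is 100,000 core-hours and the quarterly portion is 300,000, so the thresholds work out to 120,000 core-hours over any 30 days and 315,000 over any 90 days.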

As with any other job submission, a job placed in the standby queue remains there until it runs or is killed.