Determining computational resource needs

An important part of preparing an allocation request is documenting your code’s performance and scalability to demonstrate that your stated resource needs are reasonable.

These guidelines are intended to help you gather and present the data you need to support your request. Ultimately, you will complete a table like the one shown here to include in your application. With accompanying narrative that describes each experiment or experimental configuration, it will communicate clearly to the reviewers what computing resources you need and how you will use those resources.

For early guidance on how to estimate core-hours needed on the Cheyenne system, see this Cheyenne documentation.

You can use a table like this as a starting point. Review the documentation for the relevant allocation opportunity to learn what else is required.

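| Experiment (experimental configuration) | Core-hours per simulation | Number of simulations | Total core-hours |
|-----------------------------------------|---------------------------|-----------------------|------------------|
|                                         |                           |                       |                  |
|                                         |                           |                       |                  |
| Totals                                  |                           |                       |                  |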

CESM and WRF

For CESM, see the timing tables for CESM 1.0, CESM 1.1, or CESM 1.2 provided by the CESM team at NCAR for information that will help you develop a statement of resource requirements for your project. (See the CESM models page for guidance in selecting the appropriate CESM version.) The computational cost of CESM experiments typically is expressed in core-hours (also known as processor element-hours or “pe-hrs”) per simulated year, so the “Cost pe-hrs/yr” column gives the value you need to calculate the cost of a simulation: multiply it by the number of years you plan to simulate. Your allocation request should justify that the number of years proposed and the model resolution chosen are sufficient and necessary to answer your scientific question(s).
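As a rough illustration of that arithmetic, the following Python sketch multiplies a per-year cost by the number of simulated years. The function name and the numbers are hypothetical placeholders, not values from any CESM timing table; substitute the “Cost pe-hrs/yr” figure for your own configuration.

```python
def cesm_core_hours(cost_pe_hrs_per_year: float, simulated_years: float) -> float:
    """Core-hours for one simulation: cost per simulated year times years simulated."""
    if cost_pe_hrs_per_year < 0 or simulated_years < 0:
        raise ValueError("cost and years must be non-negative")
    return cost_pe_hrs_per_year * simulated_years


# Hypothetical inputs for illustration only (not taken from a timing table).
cost_per_year = 1500.0   # assumed "Cost pe-hrs/yr" for the chosen configuration
years = 100              # assumed number of simulated years

print(f"Core-hours per simulation: {cesm_core_hours(cost_per_year, years):,.0f}")
```

The result is the “Core-hours per simulation” entry for that experiment's row in the request table.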

For WRF projects, review the guidance on our Optimizing WRF performance on Yellowstone page. Follow those recommendations as you do some benchmark runs to estimate the number of core-hours you will need for each planned simulation. Cite that page in your core-hour request and describe for the review panel how you estimated your resource requirements.

Other codes and models

Proposed experiments may be different enough from the documented CESM and WRF simulations that you need to run your code several times to determine how many core-hours you’ll need. If you are using other codes or models that do not have well-known or published performance information, you will need to document the performance and scalability of your codes to complete your resource request. (A reference to a web site or paper with performance and scalability details for the code or model is acceptable for the purposes of an allocation request.)

Presumably you’ve run your code on a multi-core system and have at least a general idea of what resources you will need in order to run at the same scale or larger on the Yellowstone, Geyser, or Caldera systems.

To begin fine-tuning your general idea into a specific request for resources, consider these questions:

  • How large is any data set that you need to load?
  • How much memory needs to be available for you to complete a run? (The job_memusage tool on Yellowstone can tell you how much memory your program uses.)

Your answers will help you calculate the minimum number of nodes you can use. (Yellowstone has 16 cores and 25 GB of usable memory per node. See our Resources overview for additional information.)

Memory needed / memory per node = minimum nodes

That can serve as your starting point unless you already know that a larger number will work.
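The node estimate above can be sketched in Python as follows. The 25 GB of usable memory and 16 cores per node are the Yellowstone figures quoted above; the function name and the 400 GB example are assumptions for illustration. The division is rounded up because a job cannot use a fraction of a node.

```python
import math

USABLE_MEMORY_PER_NODE_GB = 25.0  # Yellowstone figure quoted in the text
CORES_PER_NODE = 16               # Yellowstone figure quoted in the text


def minimum_nodes(memory_needed_gb: float,
                  memory_per_node_gb: float = USABLE_MEMORY_PER_NODE_GB) -> int:
    """Smallest whole number of nodes whose combined usable memory fits the job."""
    if memory_needed_gb <= 0 or memory_per_node_gb <= 0:
        raise ValueError("memory values must be positive")
    return max(1, math.ceil(memory_needed_gb / memory_per_node_gb))


# Hypothetical job that needs 400 GB in total.
nodes = minimum_nodes(400.0)
print(nodes, "nodes,", nodes * CORES_PER_NODE, "cores")  # 16 nodes, 256 cores
```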

It is unlikely that you will need to use anything approaching Yellowstone’s 4,500+ nodes, but you may need anything from a few dozen to hundreds or several thousand cores.

Document your code’s performance with test runs on the Yellowstone system if possible, or on a reasonably similar platform and software stack. You will want to illustrate in a graph that your code performs well at increasingly higher scale, showing at least four points indicating the results of your runs (as in Figure 1).

[Figure 1: scaling graph]

If your application’s performance is not already well documented, it may be necessary to establish a baseline by running it on a single node before scaling. On Yellowstone, your parallel code would be executed on a total of 16 cores on a single node, or the equivalent of 32 with hyper-threading support.

From that baseline, you can generate scaling data by making a series of runs on progressively greater numbers of nodes to determine the optimum number to use.

Start by documenting the smallest run that you know will work. For example, say you’ve run your parallel code on a system like Janus using 4,000 cores, or you’ve had an opportunity to do some similar-size test runs on Yellowstone. Do additional larger runs to demonstrate that the code scales as you expect it to scale. To ensure accuracy, it can help to do several runs at each point on different days to detect variations that might result from changes in the machine’s workload.

Based on the hypothetical results shown in Figure 1, a run could be done efficiently using 10,000 cores. The job would complete more quickly, using less wall-clock time, than it would using 4,000 cores.
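One way to turn repeated test runs into the points for such a graph is sketched below in Python. It takes the median wall-clock time at each core count, then reports speedup and parallel efficiency relative to the smallest run. The timings are invented for illustration and the function name is hypothetical; using the median is a design choice to dampen day-to-day variation, not a requirement of the allocation process.

```python
from statistics import median


def scaling_table(timings):
    """Summarize strong-scaling runs.

    timings maps a core count to a list of wall-clock times (seconds) from
    repeated runs. Returns rows of (cores, median_time, speedup, efficiency),
    all measured relative to the smallest core count.
    """
    core_counts = sorted(timings)
    base_cores = core_counts[0]
    base_time = median(timings[base_cores])
    rows = []
    for cores in core_counts:
        t = median(timings[cores])
        speedup = base_time / t
        # Ideal speedup equals the ratio of core counts; efficiency compares to it.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, t, speedup, efficiency))
    return rows


# Hypothetical wall-clock times in seconds, three runs per core count.
example = {
    4000: [1000, 1010, 990],
    6000: [700, 690, 705],
    8000: [540, 550, 545],
    10000: [450, 455, 460],
}

print(f"{'cores':>7} {'time (s)':>10} {'speedup':>8} {'efficiency':>10}")
for cores, t, speedup, eff in scaling_table(example):
    print(f"{cores:>7} {t:>10.1f} {speedup:>8.2f} {eff:>10.2f}")
```

Plotting speedup (or wall-clock time) against cores from these rows gives at least the four points recommended above.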

To convert that into total core-hours needed for one type of simulation, multiply core-hours per simulation (the wall-clock hours needed to complete one simulation times the number of cores used) by the number of times you are proposing to run that type of simulation.

Core-hours per simulation x total simulations = total core-hours

That gives you the information you need to fill out one row of your table. Repeat the process for each type of simulation, then add the figures in the last column to get the overall core-hour total.
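The table arithmetic can be sketched in a few lines of Python. The experiment names, wall-clock times, core counts, and run counts below are hypothetical and only illustrate the calculation.

```python
def core_hours_per_simulation(wall_clock_hours: float, cores: int) -> float:
    """Wall-clock hours for one simulation times the number of cores used."""
    return wall_clock_hours * cores


# (experiment, wall-clock hours per simulation, cores, number of simulations)
# All values are hypothetical.
experiments = [
    ("Control", 12.0, 4000, 5),
    ("Sensitivity", 6.0, 10000, 20),
]

rows = []
for name, hours, cores, n_sims in experiments:
    per_sim = core_hours_per_simulation(hours, cores)
    rows.append((name, per_sim, n_sims, per_sim * n_sims))

# Sum the last column to get the overall request.
grand_total = sum(total for _, _, _, total in rows)

for name, per_sim, n_sims, total in rows:
    print(f"{name:<14}{per_sim:>14,.0f}{n_sims:>8}{total:>16,.0f}")
print(f"{'Totals':<14}{'':>14}{'':>8}{grand_total:>16,.0f}")
```

Each printed line corresponds to one row of the request table, and the final line to its Totals row.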

Strong vs. weak scaling

As you do your test runs, keep in mind that strong scaling, or the degree to which performance improves as more processors are applied to a fixed problem size, is more useful for these purposes than weak scaling, which reflects the ability to solve larger problem sizes with more processors. Strong scaling will reflect the improvement in speed (resulting in reduced overall run time) as more processors are used.

At some point, using more processors is likely to have little additional benefit, because the potential for increasing speed is limited by the ratio of serial to parallel code in the application. Figure 2 shows such a scenario, with little or no performance benefit from using more than 5,000 cores. If you identify such a point in your test runs, it would make little sense to base your allocation request on higher numbers of cores.

[Figure 2: scaling graph]
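The limit imposed by serial code is commonly modeled with Amdahl's law, which this page does not name; the Python sketch below uses it as an assumption to show why speedup flattens. The 0.1% serial fraction is a made-up value, and real codes also lose time to communication and I/O, which this simple model ignores.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Ideal speedup over one processor under Amdahl's law.

    serial_fraction is the share of run time that cannot be parallelized
    (0.0 to 1.0).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


# Hypothetical code with 0.1% serial work: the speedup can never exceed
# 1 / 0.001 = 1000, no matter how many cores are used.
for cores in (1000, 2500, 5000, 10000, 20000):
    print(f"{cores:>6} cores: speedup {amdahl_speedup(0.001, cores):7.1f}")
```

In this model, each doubling of the core count buys a smaller gain than the last, which is the kind of flattening to look for in your own test runs.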

Reporting on performance

When submitting your allocation request, provide documentation on how you generated your scaling data. Include a graph similar to those shown here to illustrate the results.

Describe how flexible your code is regarding the number of processors it can use and why you chose a certain number on which to base your request. It also is helpful to include information on the portability of the code to other platforms and on your team’s knowledge and experience with systems similar to Yellowstone.

Two sample proposals are available at these links:

They are specific to large university allocations but are good examples of documenting performance that you can follow for other types of requests.