WRF scaling and timing

Scaling results | Run time results

The Consulting Services Group (CSG) ran numerous Weather Research and Forecasting (WRF) modeling system jobs on the Cheyenne supercomputer. The jobs included large and small runs, with different domain sizes and time steps.

The scaling analysis provided in the next section may be helpful in answering questions such as:

  • "Is it possible to solve a problem with such-and-such resolution in a timely manner?"
  • "If I use more cores I will have the results more quickly, but with this resolution will my run be in the efficient strong-scaling regime, an intermediate one, or in the very inefficient one dominated by I/O and initialization instead of computing?"

Numbers from the figures below can help you develop some back-of-the-envelope estimates of what will happen if you increase or decrease the core counts of your runs, so that you can find one that is optimal for you both in terms of time-to-solution and in terms of your allocation.

If you're preparing an allocation request, while these plots provide some guidance, you do need to run some tests on Cheyenne or a comparable system to support the validity of your core-hour estimate for your own physics parameterization, and to make sure you account for the overhead from initialization and I/O. (Different I/O settings and frequency will affect your runs differently.)

Moreover, if you are primarily concerned with performance, we recommend using the latest Intel compiler and SGI's MPT library that is available on Cheyenne. This combination gave us the best performance as observed during our experiments when running WRF with various compilers and MPI libraries. The simulations below use WRFV4.0 compiled with Intel 18.0.1 and SGI's MPT 2.18

Also see Optimizing WRF performance for related documentation and Determining computational resource needs for additional information on preparing allocation requests.


Scaling results

Figure 1 shows scaling results from four different simulations: Hurricane Maria at 1km and 3km resolutions and the official CONUS benchmarks from http://www2.mmm.ucar.edu/wrf/WG2/bench/ at 12km and 2.5km resolution. The CONUS benchmark uses the CONUS physics suite introduced from WRFv3.9 instead of the older physics combination used in the official benchmark. Note that both axes are logarithmic, so a small distance between points corresponds to a large difference in values.

Figure 1: scaling results

Figure 1

Notice the range of the x-axis. Total grid points per core varies from ~100 to ~10^6. This means that as grid points per core is reduced, the number of nodes increases, and vice-versa. When expressing the scaling as a function of WRF grid points per core, all the cases scaled similarly on Cheyenne. WRF has linear strong scaling. This means the computational part of WRF is making effective use of the parallel computational resources available to it. Increasing the number of cores a run uses decreases the computation time proportionately – while initialization and I/O time may increase – and about the same number of core-hours are used for computation. On the side of the plot where there are no blue or red points, the limiting factor is available memory. An increasingly large part of the domain must be held in the physical memory of each node, causing a lower core-count limit when the memory needed exceeds available memory.


Run time results

Figure 2 below shows the total run time for WRF jobs with increasing numbers of cores. A core count of 36,864 corresponds to 1,024 nodes on Cheyenne. The results are based on simulations of Hurricane Maria at 1km resolution having a single output file, which used a domain of about 372 million grid points.

The total run time of WRF can be broken down into the initialization time, computation time, and output writing time. As illustrated, initialization and writing output times remain relatively fixed, increasing only slightly as you move to larger core counts. The numbers below may vary depending on the number of restart and output files your simulations have, but we expect the trend to be similar.

Figure 2: WRF runtime results

Figure 2

 

Related training courses