Accelerator Technology


The DG-kernel performs a gradient operation from the Discontinuous Galerkin version of the HOMME atmospheric dynamical core.  This kernel is very similar to the expensive vector calculus operations currently in CAM-SE.  Because the benchmark is compact, it was possible to write multiple versions of the DG-kernel to explore the impact of various programming models on performance.  Versions of the DG-kernel were created in Fortran, CUDA C, CUDA Fortran, OpenACC, and F2C-ACC.  The benchmark was run on several NVIDIA GPUs (M2070Q, K20, K20X), the Intel Xeon Phi, and Intel Sandy Bridge (the microprocessor in NCAR’s Yellowstone).  Figure 1 plots the execution rate of the DG-kernel for the best-performing programming model on each processor.  The Intel Xeon Phi achieves over 6.7x the performance of a single socket of Intel Sandy Bridge, while CUDA C on the NVIDIA K20X achieves a slightly lower rate.  This illustrates the potential of these accelerator micro-architectures.


 DG Kernel

Figure 1: Performance of the DG-kernel on NVIDIA GPUs (M2070Q, K20, K20X), Intel Sandy Bridge (SNB), and Intel Xeon Phi (SE10x).
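
To make the kind of operation the DG-kernel exercises more concrete, the following is a minimal sketch of a per-element gradient on a tensor-product grid, written in plain Python for readability. It is not the HOMME or DG-kernel source. The 3-point node set, the differentiation matrix D, and the function names element_gradient and sample are all assumptions chosen for illustration; the real kernel may use a different element size, data layout, and metric terms.

```python
# Illustrative sketch only: NOT the actual DG-kernel code.
# Assumption: one reference element on [-1, 1] x [-1, 1] with 3 nodes
# per direction (-1, 0, 1). The element size used by HOMME may differ.
NODES = [-1.0, 0.0, 1.0]

# 1D Lagrange differentiation matrix for the nodes above:
# D[i][k] is the derivative of the k-th Lagrange basis function at node i.
D = [
    [-1.5,  2.0, -0.5],
    [-0.5,  0.0,  0.5],
    [ 0.5, -2.0,  1.5],
]


def element_gradient(u):
    """Return (du/dx, du/dy) at every node of one element.

    u[i][j] holds the field value at (NODES[i], NODES[j]).
    Each derivative is a small dense matrix product along one direction.
    """
    n = len(D)
    dudx = [[sum(D[i][k] * u[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
    dudy = [[sum(D[j][k] * u[i][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
    return dudx, dudy


def sample(f):
    """Evaluate a function f(x, y) at the element's nodes."""
    return [[f(x, y) for y in NODES] for x in NODES]


if __name__ == "__main__":
    # Hypothetical test field: u = x^2 + 3xy, so du/dx = 2x + 3y, du/dy = 3x.
    u = sample(lambda x, y: x * x + 3.0 * x * y)
    dudx, dudy = element_gradient(u)
    for row in dudx:
        print(row)
```

In a full model this small computation is repeated for every element, vertical level, and time step, which is one plausible reason such a compact loop nest is a convenient target for comparing programming models.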


HOMME and CESM on Accelerators

The performance of HOMME and CESM has been examined on both NVIDIA GPU and Intel Xeon Phi architectures.  This work has concentrated on understanding how to use each architecture most efficiently.  The ASAP group has successfully executed HOMME on both NVIDIA GPU and Intel Xeon Phi architectures, and CESM on Intel Xeon Phi.  The ASAP group has ongoing collaborations with Intel.

Performance Identification in Large Applications

In collaboration with researchers at the Barcelona Supercomputing Center (BSC), members of ASAP have applied Extrae, Paraver, and clustering tools to automatically identify performance problems within HOMME, CESM, and DART.