SIParCS 2015 - Ta

Tuan Ta, University of Mississippi

Optimizations of Scientific Computation across Languages and Platforms

(Slides) (Recorded Talk)

Performance, scalability, and portability are three important factors in any parallel algorithm, and parallelizing an algorithm often requires programmers to make trade-offs among them. While OpenMP and CUDA are widely used languages that target only CPUs and NVIDIA GPUs, respectively, OpenCL provides a platform-independent programming model that works on both CPUs and GPUs. In this project, we investigate how these three programming languages perform for scientific computation across different platforms. We choose a solver for the Shallow Water Equations (SWE) using radial basis function-generated finite difference (RBF-FD) methods as our benchmark program. We optimize the algorithm using OpenMP, CUDA, and OpenCL on a high-end CPU and GPU, and then compare the performance of our best-effort optimizations across hardware platforms. Experimental results show that a sequential implementation of the SWE RBF-FD algorithm can be accelerated by up to 15x on a 2-socket computing node and 32x on a single GPU. In addition, the performance of our OpenCL implementation, which runs on both the CPU and the GPU, is comparable to that of the platform-dependent languages (i.e., OpenMP and CUDA). We conclude that OpenCL does not necessarily force a performance vs. portability trade-off, and it is a good alternative to either OpenMP or CUDA.
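To illustrate the kind of kernel being parallelized, the sketch below shows how an RBF-FD derivative operator, which is a sparse weighted sum over each node's nearest neighbors, might be applied to a field with OpenMP. This is a minimal illustration only, not the project's actual code: the function name `apply_rbf_fd`, the flattened weight and neighbor-index arrays, and the stencil size `k` are all assumptions chosen for clarity.

```c
#include <stddef.h>
#include <omp.h>

/* Illustrative sketch (not the project's code): applying an RBF-FD
 * differentiation operator.  Each node's derivative is a weighted sum
 * over its k nearest neighbors, so the loop over nodes is independent
 * and parallelizes naturally -- the same pattern the CUDA and OpenCL
 * ports would map to one GPU thread per node.
 */
void apply_rbf_fd(size_t n_nodes, int k,
                  const double *w,    /* n_nodes x k stencil weights (row-major) */
                  const size_t *idx,  /* n_nodes x k neighbor indices            */
                  const double *u,    /* input field at the nodes                */
                  double *du)         /* output: approximated derivative         */
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n_nodes; ++i) {
        double acc = 0.0;
        for (int j = 0; j < k; ++j)
            acc += w[i * k + j] * u[idx[i * k + j]];
        du[i] = acc;
    }
}
```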

Mentors: Raghu Kumar and Rich Loft, CISL TDD