From June 27th to 30th, NVIDIA Corporation and NCAR partnered to conduct a three-and-a-half-day “hackathon” workshop in which students and researchers learned the theory and practice of parallelizing programs for GPU architectures using the OpenACC directive-based paradigm. The hackathon drew 16 attendees: six NCAR researchers, four NCAR student assistants and SIParCS interns, five NVIDIA employees, and one visitor from the Korea Institute of Science and Technology Information. The first two days included half-day morning lectures, in which the attendees learned the vocabulary and syntax of OpenACC directives, which allow a compiler to parallelize code for GPUs automatically.
The hackathon attendees brought their own code to refine during the event and worked on it after learning basic skills in the lectures. CISL’s Operations and Services Division set up a special queue during the hackathon to give the attendees dedicated access to the Caldera system, which has two NVIDIA K20X GPUs per node.
"The hackathon answered a lot of questions we had about compilers and code optimization techniques," said SIParCS intern Pranay Reddy Kommera. "The hands-on experience was really important and interesting."
Many attendees worked on portions of weather and climate codes, ranging from simple kernels and dynamical cores to complete applications. For some of the students, it was their first chance to work on GPUs.
For example, Ankita Manjrekar, a student assistant at NCAR, explored a new way to use GPUs: task parallelism. GPU cards are usually thought of as highly parallel arrays of simple processors called “CUDA cores” that must work in lockstep to solve problems with lots of data, an approach known as data parallelism. But what do you do if you have lots of little tasks, none of which has much data parallelism? How can you map them to a GPU? The key is to recognize that the GPU’s architecture is more complex: inside a GPU there are typically 8 to 20 streaming multiprocessors (SMs), each of which has 64 to 192 CUDA cores, and the SMs can work independently. Ankita exploited this fact, taking a set of small test problems and mapping each one to a single SM inside the GPU using OpenACC. Her approach shows that the GPU’s SMs can work not only independently but also efficiently, provided there is enough work in each problem. NVIDIA has asked Ankita to write up her task parallelism results from the hackathon for their magazine, PGInsider, which has a circulation of 50,000 developers worldwide.
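To give a flavor of what mapping one small task to one SM can look like in OpenACC, here is a minimal sketch in C. It is not Ankita's code: the function name `sum_tasks`, the sizes `NUM_TASKS` and `TASK_SIZE`, and the toy workload (summing each task's data) are all hypothetical, chosen only for illustration. The sketch assumes the common behavior on NVIDIA hardware in which each OpenACC gang becomes a thread block that runs on a single SM, so the outer `gang` loop spreads tasks across SMs while the inner `vector` loop uses the CUDA cores within one SM.

```c
#include <stddef.h>

#define NUM_TASKS 16   /* hypothetical: number of small independent tasks */
#define TASK_SIZE 256  /* hypothetical: data elements per task */

/* Sum the data of each task independently.
 * in  holds NUM_TASKS * TASK_SIZE values, task t starting at t * TASK_SIZE.
 * out receives one sum per task.
 * With an OpenACC compiler, each gang handles one task and the vector
 * lanes of that gang share the inner loop. A compiler without OpenACC
 * support ignores the pragmas and runs the same loops serially. */
void sum_tasks(const float *in, float *out)
{
    #pragma acc parallel loop gang num_gangs(NUM_TASKS) vector_length(128) \
        copyin(in[0:NUM_TASKS * TASK_SIZE]) copyout(out[0:NUM_TASKS])
    for (int t = 0; t < NUM_TASKS; ++t) {
        float total = 0.0f;

        /* Data parallelism inside a single task. */
        #pragma acc loop vector reduction(+:total)
        for (int i = 0; i < TASK_SIZE; ++i) {
            total += in[(size_t)t * TASK_SIZE + i];
        }
        out[t] = total;
    }
}
```

In this arrangement the number of gangs matches the number of tasks, so adding more tasks gives the GPU more SMs to keep busy, while the amount of work inside each task determines how well a single SM is used.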
"As new student, it was a really fun immersion learning experience for me to get to learn OpenACC and make use of it to solve complex problems," said Ankita. "I am looking forward to writing up my results for the magazine."