Performance Analysis of MPI over InfiniBand on Yellowstone

08/01/2013 - 3:05am to 3:25am
ML - Main Seminar Room
Zhengyang Liu

Abstract

The Message Passing Interface (MPI) is a message-passing standard that parallel applications use to communicate. On the Yellowstone supercomputer at NCAR, performance degradation has been observed for applications using more than 1000 cores. We suspect that contention in the underlying InfiniBand interconnect contributes to the degraded performance. The ultimate goal of the project is to understand the causes of poor MPI application performance and to propose solutions that address them. We developed microbenchmarks to measure the throughput and latency achievable under ideal conditions in a test environment. The same set of benchmarks was run in the production Yellowstone environment, and the two sets of measurements were compared to identify hotspots in the network. We also used the benchmarks to understand the effects of various MPI tuning parameters on performance. Network sniffing techniques were used to capture InfiniBand packets on the wire in order to understand the interactions between the MPI software stack and the InfiniBand network.
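
The abstract does not say how the microbenchmarks turn raw timings into latency and throughput figures. The sketch below shows one common convention for ping-pong style measurements: one-way latency is taken as half the median round-trip time, and throughput as message size divided by that one-way time. The function names, the use of the median, and the slowdown ratio used to compare the test and production environments are all assumptions made for illustration, not the project's actual method.

```python
from statistics import median


def summarize_pingpong(round_trip_seconds, message_bytes):
    """Return (one_way_latency_s, throughput_bytes_per_s) for one message size.

    Hypothetical helper: assumes each sample is a full round trip
    (send plus echo), so the one-way time is half the round trip.
    """
    if not round_trip_seconds:
        raise ValueError("need at least one round-trip sample")
    if message_bytes <= 0:
        raise ValueError("message size must be positive")
    # The median is less sensitive to occasional outliers than the mean.
    one_way = median(round_trip_seconds) / 2.0
    if one_way <= 0:
        raise ValueError("round-trip times must be positive")
    return one_way, message_bytes / one_way


def slowdown(test_latency_s, production_latency_s):
    """Ratio of production latency to test-environment latency.

    Hypothetical metric: values well above 1.0 would mark a candidate hotspot.
    """
    return production_latency_s / test_latency_s


if __name__ == "__main__":
    # Made-up sample timings, for illustration only.
    latency, throughput = summarize_pingpong([4e-6, 4e-6, 8e-6], 1024)
    print(f"one-way latency: {latency:.2e} s")
    print(f"throughput: {throughput:.3e} bytes/s")
```

Computing the same summary for each node pair in both environments gives two comparable sets of numbers. A ratio such as `slowdown` is one simple way to flag pairs whose production latency departs from the test baseline.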
