HOMME Trace Analysis

07/31/2014 - 9:25am
Mesa Lab Main Seminar Room


Fabrice Mizero, SIParCS Intern
(University of Virginia)

The High-Order Method Modeling Environment (HOMME) is the default dynamical core within CAM and consumes a significant fraction of the total cycles on Yellowstone. Although HOMME’s communication library has been shown to scale to large processor counts, its performance on Yellowstone’s InfiniBand network has shown unexpected variability. Our work this summer has consisted of analyzing several traces from HOMME runs to determine the source of this high variability in network latency. We have developed a statistics-based methodology that consists of monitoring congestion and analyzing message arrival times using Extrae. We have applied this method to two suspected causes of performance variability: OS jitter and network congestion. OS jitter introduces delays in an application’s execution due to kernel interrupts. Network congestion increases communication time due to queuing delays within the InfiniBand network. We present the results of our analysis and suggest a potential solution that uses virtual lanes to prevent congestion due to head-of-line blocking.
