Evaluating the Impact of InfiniBand Routing Algorithms on Network Performance

08/01/2013 - 3:25am to 3:45am
ML - Main Seminar Room
Fabrice Mizero


The InfiniBand Architecture provides high-bandwidth, low-latency communication for parallel message-passing applications. However, network congestion, ineffective routing, topological issues, and the way the subnet manager handles dysfunctional links can all lead to unexpectedly poor network performance. In particular, the InfiniBand subnet manager repeatedly scans the entire network to identify dysfunctional links, and each dysfunctional link it finds forces new routing tables to be recalculated and distributed. In large-scale networks with many dysfunctional links, this recalculation overhead can be significant. Our research this summer has focused on identifying the main cause of the latency issues on Yellowstone, correlating them with the overhead of OpenSM subnet-manager tasks, and evaluating how changes to the routing algorithm could reduce that overhead.
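To make the recalculation overhead more concrete, here is a minimal Python sketch built on simplifying assumptions. The switches form a toy leaf-spine topology, routes come from plain breadth-first min-hop search (similar in spirit to OpenSM's min-hop routing engine, but not its actual code), and every sweep that finds a dysfunctional link rebuilds all tables from scratch. The names `leaf_spine`, `min_hop_tables`, and `sweep`, along with the `work` counter used as a cost measure, are hypothetical and exist only for illustration.

```python
from collections import deque

def leaf_spine(num_leaves, num_spines):
    """Build a toy two-level topology: every leaf switch links to every spine."""
    adj = {}
    for i in range(num_leaves):
        adj[f"L{i}"] = {f"S{j}" for j in range(num_spines)}
    for j in range(num_spines):
        adj[f"S{j}"] = {f"L{i}" for i in range(num_leaves)}
    return adj

def min_hop_tables(adj):
    """Compute a next-hop table for every switch with one BFS per destination.

    Returns (tables, work): tables[switch][dst] is the neighbor to forward to,
    and work counts adjacency inspections as a rough cost measure.
    """
    tables = {sw: {} for sw in adj}
    work = 0
    for dst in adj:
        seen = {dst}
        queue = deque([dst])
        while queue:
            node = queue.popleft()
            for nbr in sorted(adj[node]):  # sorted for deterministic tie-breaking
                work += 1
                if nbr not in seen:
                    # nbr is first reached from node, so node is nbr's next hop to dst
                    seen.add(nbr)
                    tables[nbr][dst] = node
                    queue.append(nbr)
    return tables, work

def sweep(adj, dysfunctional_links):
    """Hypothetical sweep: drop dysfunctional links, then rebuild every table."""
    live = {sw: set(nbrs) for sw, nbrs in adj.items()}  # copy; input is untouched
    for a, b in dysfunctional_links:
        live[a].discard(b)
        live[b].discard(a)
    return min_hop_tables(live)

if __name__ == "__main__":
    fabric = leaf_spine(num_leaves=4, num_spines=2)
    healthy, cost = min_hop_tables(fabric)
    degraded, cost_after = sweep(fabric, [("L0", "S0")])
    print(healthy["L0"]["L1"], degraded["L0"]["L1"], cost, cost_after)
    # prints: S0 S1 96 84
```

A single failed link (`L0`-`S0`) changes only a few routes, yet the sketch recomputes every table, and its cost (one BFS per destination, so roughly switches × links) grows with the size of the fabric. The sketch does not model distributing the new tables to the switches, which is the other half of the overhead described above.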