SIParCS 2017-Zhenzhen Liu

Zhenzhen Liu, Stevens Institute of Technology

Supercomputer InfiniBand Fabric Analysis

Recorded Talk

Today's highly parallel supercomputers require an efficient internal network of fiber-optic cables and switches to interconnect thousands of nodes that contain hundreds of thousands of processors. Engineers at the National Center for Atmospheric Research (NCAR) not only monitor the performance (data throughput and latency) of this network fabric, but also study ways to increase its efficiency by analyzing its topology, routing, and design. The results of this work help them collaborate with users to optimize applications running on the system. InfiniBand is a communications standard for supercomputing interconnects that features very high throughput and very low latency. NCAR uses the information visualization framework Tulip both to perform static analysis of the InfiniBand interconnect fabrics of its supercomputers and to visualize the results so that they are easier to understand. Four types of Tulip plugins are used for static analysis of the InfiniBand fabrics of NCAR’s flagship supercomputers. The first is a shortest-path analysis based on the physical topology of the network: Dijkstra’s algorithm computes the shortest data-flow paths from a chosen source node to every other node. The second is a shortest-path analysis, again using Dijkstra’s algorithm, between a selected source node and a selected target node. The third is a logical routing analysis, based on the routing information held by the switches, from a chosen source node to every other node. The fourth traces the logical routing path between a selected source node and a selected target node. Because logical routing must balance traffic to limit route contention, routed paths can be longer, and therefore incur greater latency, than the shortest paths through the physical topology. Comparing the logical routes with the physical shortest paths shows where the routing can be improved.
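
The comparison between physical shortest paths and logical routes can be illustrated with a minimal sketch in plain Python. This is not the Tulip plugin code from the project: the six-node fabric, the node names (`n1`, `leaf1`, `spine1`, ...), the unit link weights, and the forwarding tables keyed by node name are all hypothetical, chosen only to show a routed path that is one hop longer than the shortest path.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths; returns (dist, prev) dictionaries."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    return dist, prev

def shortest_path(graph, source, target):
    """Shortest path between one source and one target, or None."""
    dist, prev = dijkstra(graph, source)
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def routed_path(forwarding, source, target, max_hops=64):
    """Follow forwarding[node][target] -> next hop until target is reached."""
    path = [source]
    while path[-1] != target:
        next_hop = forwarding.get(path[-1], {}).get(target)
        if next_hop is None or len(path) > max_hops:
            return None  # missing table entry or a routing loop
        path.append(next_hop)
    return path

# Hypothetical physical topology: every link has weight 1 (one hop).
FABRIC = {
    "n1": {"leaf1": 1},
    "n2": {"leaf2": 1},
    "leaf1": {"n1": 1, "spine1": 1, "spine2": 1},
    "leaf2": {"n2": 1, "spine1": 1},
    "spine1": {"leaf1": 1, "leaf2": 1, "spine2": 1},
    "spine2": {"leaf1": 1, "spine1": 1},
}

# Hypothetical forwarding tables: leaf1 sends traffic for n2 through spine2,
# for example to keep load off the leaf1-spine1 link.
FORWARDING = {
    "n1": {"n2": "leaf1"},
    "leaf1": {"n2": "spine2"},
    "spine2": {"n2": "spine1"},
    "spine1": {"n2": "leaf2"},
    "leaf2": {"n2": "n2"},
}

physical = shortest_path(FABRIC, "n1", "n2")
logical = routed_path(FORWARDING, "n1", "n2")
print("physical:", physical, "hops:", len(physical) - 1)
print("logical: ", logical, "hops:", len(logical) - 1)
```

In this toy fabric the physical shortest path takes 4 hops while the routed path takes 5, which is the kind of gap such a comparison is meant to expose. A real analysis would read the topology and the switch routing information from the fabric itself rather than from hand-written dictionaries.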

Mentors: Nathan Rini, Tom Kleespies