Real Time Power and Performance Monitoring of Supercomputer Applications

07/31/2014 - 10:15am
Mesa Lab Main Seminar Room

Shankar Prajapati, SIParCS Intern
(Claflin University)

Supercomputing centers strive to provide users with a productive and efficient supercomputing environment, not only through observed performance, but also through quantitative analysis of sampled usage patterns, resource measurements, and theoretical modeling. A particularly important question is how best to optimize applications so that they use the available system resources efficiently. To answer it, we need real-time measurements, as accurate as possible, of actual jobs running on the supercomputer. Current approaches involve running jobs inside specialized profiling software, which places a burden on users and cannot be applied to all nodes at all times, and this prevents system-wide analysis. We set up and configure real-time data collection at the system level on a test cluster, within the limitations of the available hardware monitoring. In theory, this could be scaled to a petaflop-class supercomputer. The data collected includes power usage from Intel processors, PDUs, and node power supplies, and is gathered while selected jobs are running. Analysis of the sampled data helps us begin to understand how jobs affect the real-time power usage of supercomputers.

Video Presentation