SIParCS 2020 - Steven Rivera

Steven Rivera, University of Puerto Rico

Cross Reference Monitoring of Supercomputers and Support Infrastructure

(Slides)

Supercomputers provide robust computing capabilities for many kinds of work, from business to scientific research. Within the Cheyenne system, a variety of frameworks monitor and analyze individual pieces of equipment as well as many of the system's components. The data these cybernetic systems produce can be used to build better supercomputer cooling and electrical infrastructure. In the long term, including batch system software will provide better management and sustainability of robust operations. To operate large-scale systems successfully and correlate the information gathered, software programs such as Grafana and Metasys are used to find and address severe impacts that can lead to lost work.

Issues were found in specific parts of the system, such as racks that showed anomalous behavior. The research was carried out by entering data queries in the Grafana software tool, searching for sensors and temperature units, and comparing their behavior across distinct time frames. Upgrading and maintaining the Cheyenne system is important because it lets us run work in several kinds of environments; by supporting its performance we can sustain future jobs and lower its expenses.
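
The comparison across time frames described above can be illustrated with a minimal sketch. It is not the project's actual method or any Grafana API: the function name, the rack labels, the sample readings, and the three-standard-deviation threshold are all assumptions made for illustration, and the readings are presumed to have already been exported from the monitoring tool.

```python
from statistics import mean, stdev


def flag_anomalous_racks(baseline, recent, threshold=3.0):
    """Return racks whose recent mean temperature deviates from the
    baseline mean by more than `threshold` baseline standard deviations.

    `baseline` and `recent` map a rack name to a list of temperature
    readings (hypothetical data, e.g. exported from a dashboard query).
    """
    flagged = []
    for rack, base_vals in baseline.items():
        recent_vals = recent.get(rack)
        # Skip racks with no recent data or too few baseline samples.
        if not recent_vals or len(base_vals) < 2:
            continue
        spread = stdev(base_vals)
        if spread == 0:
            continue  # a constant baseline gives no scale to compare against
        z = abs(mean(recent_vals) - mean(base_vals)) / spread
        if z > threshold:
            flagged.append(rack)
    return sorted(flagged)


# Hypothetical readings in degrees Celsius for two time frames.
baseline = {
    "rack-01": [21.0, 21.5, 22.0, 21.5],
    "rack-02": [22.0, 22.5, 22.0, 22.5],
}
recent = {
    "rack-01": [21.4, 21.6],
    "rack-02": [27.0, 28.0],
}

anomalies = flag_anomalous_racks(baseline, recent)
print(anomalies)
```

A simple deviation score like this only highlights racks worth a closer look; a real analysis would also need to account for workload and seasonal changes in the facility.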

 

Mentors: Michael Kercher, Jonathan Roberts