SIParCS 2018 - Rohith Uppala
Using Machine-Learning to Simplify the Identification of Code Optimization
Many of the scientific applications that execute on large-scale parallel computing platforms run in a sub-optimal fashion. Frequently, modest changes or optimizations to the internal calculations of an application can significantly reduce its time-to-solution and improve its code quality. While these code modifications are often easy to make, it is non-trivial to know exactly which modification should be made. The optimization process requires a lot of analysis and human effort. Using simulation or manual techniques to measure performance and identify which parts of the source code need to be optimized is often too slow. Our idea is to decrease the human effort and the time needed to identify code optimizations by using machine learning techniques to determine which code changes should be applied to which sections of code, based on a detailed performance analysis of the application.
In this project, we use machine learning techniques to suggest the file name and line number where vectorizing the code may yield a performance gain. We want to mimic what a human does by training a Random Forest model on hardware counter data. The hardware counter data is generated by building and running the code and by using interposition and folding tools. We use hyperparameter tuning techniques to adjust the model so that it solves the machine learning problem as well as possible. To handle large datasets efficiently, we use dimensionality reduction techniques. Our final machine learning model is able to identify the file name and line number where vectorizing the code can yield a performance gain.
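The workflow described above can be illustrated with a minimal sketch, assuming scikit-learn as the library. The abstract does not name the tools, the features, or the specific techniques, so everything concrete here is an assumption made for illustration: the data is synthetic, the file names and line numbers are hypothetical, PCA stands in for the dimensionality reduction step, and a cross-validated grid search stands in for the hyperparameter tuning step.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for hardware counter data (assumption): one row per
# code region, one column per counter. Real data would come from the
# interposition and folding tools mentioned in the abstract.
n_regions, n_counters = 200, 20
X = rng.normal(size=(n_regions, n_counters))

# Hypothetical label: 1 if vectorizing the region gave a performance gain.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hypothetical (file name, line number) for each code region.
locations = [("file_%d.F90" % (i % 5), 10 + i) for i in range(n_regions)]

# Dimensionality reduction followed by a Random Forest classifier.
pipeline = Pipeline([
    ("reduce", PCA(random_state=0)),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Hyperparameter tuning by exhaustive grid search with 3-fold cross-validation.
param_grid = {
    "reduce__n_components": [5, 10],
    "forest__n_estimators": [50, 100],
    "forest__max_depth": [None, 5],
}
search = GridSearchCV(pipeline, param_grid, cv=3)

# Hold out the last 50 regions to play the role of unseen code.
n_train = 150
search.fit(X[:n_train], y[:n_train])

# Report the locations where the model predicts a gain from vectorization.
predicted = search.predict(X[n_train:])
suggestions = [loc for loc, flag in zip(locations[n_train:], predicted) if flag == 1]

print("Best parameters:", search.best_params_)
print("Suggested locations:", suggestions[:5])
```

Wrapping both steps in one pipeline lets the grid search tune the number of retained components together with the forest settings, and keeps the reduction step fitted only on training folds during cross-validation.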
Mentors: John Dennis, Youngsung Kim