SIParCS 2015 - Haider

Adnan Haider, Illinois Institute of Technology

Lessons from Post-processing Climate Data on Modern Flash-based HPC Systems

(Slides) (Recorded Talk)

Climate data post-processing applications allow scientists to discover trends in their data. However, these applications spend on average 90% of their runtime waiting for I/O to complete, which slows the rate of scientific discovery. Flash-based high-performance computing (HPC) systems are believed to improve post-processing performance. To quantify the performance that flash devices provide, we tested two systems, Gordon and Wrangler, which deploy different flash devices and storage architectures. Comparing performance on these systems yielded three main lessons. First, a mismatch between storage architecture and I/O workload can hide the benefits of flash, increasing runtime by 4x. Second, hybrid I/O halves flash storage consumption while decreasing runtime by 6x. Third, after tuning Gordon's architecture, we found that a local flash architecture could be a feasible alternative to a pooled architecture if scalability and interconnect bottlenecks are alleviated.
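To illustrate why the 90% I/O share matters, the following is a minimal sketch of an Amdahl-style estimate, not a method taken from the talk. It assumes runtime splits cleanly into an I/O part and a compute part, and that only the I/O part gets faster. The function name `overall_speedup` and the I/O speedup factors are hypothetical; only the 90% figure comes from the abstract.

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Estimate whole-application speedup when only the I/O share gets faster.

    io_fraction: share of the original runtime spent waiting for I/O (0..1).
    io_speedup:  factor by which the I/O portion alone is accelerated (> 0).
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0.0:
        raise ValueError("io_speedup must be positive")
    # The compute share is unchanged; the I/O share shrinks by io_speedup.
    new_runtime = (1.0 - io_fraction) + io_fraction / io_speedup
    return 1.0 / new_runtime


if __name__ == "__main__":
    IO_FRACTION = 0.90  # average I/O wait share quoted in the abstract
    # Hypothetical I/O speedup factors, chosen only for illustration.
    for factor in (2, 5, 10, 100):
        print(f"I/O {factor:>3}x faster -> overall "
              f"{overall_speedup(IO_FRACTION, factor):.2f}x")
```

Under these assumptions, the overall speedup can never exceed 1 / (1 - 0.90) = 10x, however fast the storage becomes, because the remaining 10% of the runtime is untouched.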

Mentors: John Dennis and Sheri Mickelson, CISL TDD