Leadership Computing Directions at Oak Ridge National Laboratory: Navigating the Transition to Heterogeneous Architectures

 

James J. Hack
Director, National Center for Computational Sciences
Oak Ridge National Laboratory
Oak Ridge, Tennessee

 
Abstract


The National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) was selected as the Leadership Computing Facility (LCF) by the U.S. Department of Energy in 2004. It adopted a high-performance, scalable commodity architecture positioned to grow with advances in processor and interconnect technologies, along with evolutionary improvements to the programming environment and associated tools. The final upgrade, to the AMD six-core Istanbul processor, made the Jaguar supercomputer the number one ranked system in the world in the November 2009 release of the TOP500 High-Performance Linpack (HPL) results. More importantly, the 18,688-node Jaguar Cray XT5 quickly demonstrated sustained computational performance exceeding 1 PF on five large-scale computational applications.

In 2011 ORNL began a physical upgrade to convert Jaguar from a Cray XT5 into a Cray XK6 system to be named Titan. The upgrade proceeded in two distinct phases. The first, completed in early 2012, replaced all of the XT5 node boards with XK6 boards and brought AMD Opteron 6274 16-core processors, 600 terabytes of system memory, Cray’s new Gemini network, and 960 NVIDIA Tesla X2090 processors. In the final phase, the 960 Tesla processors were removed to make room for 18,688 next-generation Kepler processors. The outcome placed Titan in the number one position on the November 2012 TOP500 list, demonstrating 17.59 PF on the HPL benchmark, a 10-fold increase over the 2009 Jaguar performance.

Although the new Titan architecture has produced real success stories of significantly increased sustained performance, the migration to a heterogeneous architecture has presented new challenges to the application community, most notably the development of a programming strategy that allows applications to run efficiently on the hybrid architecture while remaining portable to other, more conventional computer architectures. This talk will review the deployment of petascale capabilities at ORNL that has led to the current architectural direction and will discuss the preparations aimed at ensuring a successful transition to heterogeneous architectures for some key simulation problems, including global atmospheric modeling.
