IBM iDataPlex/FDR-IB - Yellowstone

IBM IDATAPLEX/FDR-IB Supercomputer — IBM

In use: June 4, 2012 - December 30, 2017

Production use

Peak teraflops: 1,509.58

Linpack teraflops: 1,257.62

Processors: 72,576

Processors per node: 16

Nodes per frame: 72

Frames: 63

Clock speed: 2.60 GHz

Memory: 145.15 TB

Storage: 16,000 TB

Electrical power consumption: 1,165 kW

Predecessor: IBM p6-p575

The Yellowstone supercomputer was installed at the NCAR-Wyoming Supercomputing Center (NWSC) during the summer of 2012, underwent a rigorous acceptance testing period, and began full-scale production on December 20, 2012. CISL Director Anke Kamrath noted that Yellowstone “gave us a big boost in computing capacity over the Bluefire system. And it was the inaugural computer at the NWSC – a historic moment in NCAR’s long history of supercomputing. In addition to the sheer volume of computing it offered, Yellowstone provided NCAR with the capability to simulate Earth system physics in greater detail for longer time frames than ever before.”

Placing Yellowstone into production service culminated a decade of CISL efforts to plan, design, and construct the NWSC while selecting the first world-class supercomputer to be housed there. When it was installed, Yellowstone was the 13th-most powerful computer in the world, and it provided the largest-ever increase in NCAR’s computational capacity: 30 times more than Bluefire, its predecessor, which had served NCAR since the spring of 2008, and 15 million times more powerful than NCAR’s first supercomputer, the Cray 1-A, installed at NCAR in 1977. Over its entire service life, Yellowstone was heavily utilized and reliable: it had a total downtime of 1.96%, and its user communities utilized 92.57% of its capacity throughout those five-plus years.

While Yellowstone was still in its acceptance testing period and during its early months of user production, CISL selected a few well-prepared science proposals that tested the limits of the supercomputer’s capabilities. Those Accelerated Scientific Discovery (ASD) projects included diverse, large-scale codes that allowed researchers to answer scientific questions in a short time by using large, dedicated portions of the new system. Such dedicated use of the system is rarely feasible in a fully allocated production environment. The ASD projects made global climate predictions in the highest detail that was possible at that time; projected future summertime ozone over the United States; simulated clouds and atmospheric eddies on a global scale; analyzed earthquake hazards; and investigated magnetism dynamics inside the sun. Yellowstone also supported numerous special and real-time computing campaigns throughout its lifetime.

One of the world’s first petascale computers

Yellowstone’s peak performance was rated at 1.51 petaflops, making it NCAR's first petascale computer. A "petascale computer" is a system that can perform calculations at a rate exceeding 1 petaflops (1 million billion floating-point operations per second). Sustained performance is typically measured by running the LINPACK benchmark across the entire system, the same benchmark used to rank systems on the TOP500 list of the world's most powerful supercomputers. The world’s first petascale computer, the IBM "Roadrunner" system at Los Alamos National Laboratory, went into service in the summer of 2008, and Yellowstone went into production just over four years later.

Yellowstone was so large and required so much power and cooling that the NCAR Mesa Lab Computing Facility in Boulder, Colorado, could not accommodate it. CISL had foreseen this trend in supercomputing technology and the limitations of the Mesa Lab's facility a decade in advance and built the NWSC to house NCAR’s first petascale computer.

Yellowstone was an IBM iDataPlex supercomputer comprising 4,536 IBM dx360 M4 nodes, each containing dual 2.6-GHz Intel Xeon E5-2670 (Sandy Bridge EP) processors and 32 gigabytes of DDR3-1600 memory. It had a Mellanox FDR InfiniBand interconnect, configured in a single-plane full fat-tree topology. Physically, Yellowstone consisted of 63 "double-wide" interconnected cabinets that occupied nearly 2,000 square feet of the NWSC computer room floor. The cabinets were arranged in four 64-foot rows and a partial fifth row.
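The headline figures in the specification list can be cross-checked from the per-node numbers given above. The following is a minimal sketch rather than an official NCAR calculation. It assumes each core completes 8 double-precision floating-point operations per clock cycle, a value that is not stated in this article but is the usual figure for Sandy Bridge processors with AVX. The constant and function names are illustrative only.

```python
# Values quoted in this article's specification list and hardware description.
FRAMES = 63
NODES_PER_FRAME = 72
CORES_PER_NODE = 16          # listed as "Processors per node"
CLOCK_GHZ = 2.6
MEMORY_PER_NODE_GB = 32
LINPACK_TFLOPS = 1257.62

# Assumption (not stated in the article): 8 double-precision floating-point
# operations per core per clock cycle.
FLOPS_PER_CYCLE = 8


def derive_specs():
    """Recompute the aggregate figures from the per-node values."""
    nodes = FRAMES * NODES_PER_FRAME
    cores = nodes * CORES_PER_NODE
    # cores * GHz * flops/cycle gives gigaflops; divide by 1000 for teraflops.
    peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000
    # Decimal units: 1 TB = 1000 GB.
    memory_tb = nodes * MEMORY_PER_NODE_GB / 1000
    # Fraction of theoretical peak achieved on the LINPACK benchmark.
    linpack_efficiency = LINPACK_TFLOPS / peak_tflops
    return {
        "nodes": nodes,
        "cores": cores,
        "peak_tflops": peak_tflops,
        "memory_tb": memory_tb,
        "linpack_efficiency": linpack_efficiency,
    }


if __name__ == "__main__":
    for name, value in derive_specs().items():
        print(f"{name}: {value:,.2f}")
```

Under that assumption, the computed node count, core count, peak teraflops, and total memory agree with the listed values of 4,536 nodes, 72,576 cores, 1,509.58 teraflops, and 145.15 TB, and the LINPACK result works out to roughly 83% of peak.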

Yellowstone was augmented by a trio of data-analysis and visualization systems (Caldera, Geyser and Pronghorn) and NCAR's Globally Accessible Data Environment (GLADE), which was built upon an IBM GPFS-based high-performance data storage system.

During its service life, Yellowstone provided 2.49 billion CPU hours to complete 18.64 million jobs for 2,958 different users. Yellowstone contained 72,576 processor cores for user jobs. While NCAR’s computational science had advanced to the level where many applications, such as those used for the ASD, could use the entire system, the system's typical production workload hosted jobs that, on average, used nearly 3,000 cores in parallel. For reference, the Bluefire supercomputer offered a total of only 4,096 processor cores, and with the exception of dedicated time (where the whole system was given to a single job), only a few of the largest production jobs ever used as many as 1,000 cores.

Yellowstone also dramatically improved upon Bluefire’s energy efficiency. Yellowstone required approximately twice the electrical power, 1.2 megawatts, to do 30 times the work of Bluefire. This is one of the key reasons that supercomputers become obsolete in 4-5 years: new systems can perform significantly more work in less time while consuming less power. Yellowstone became obsolete for the same reasons.
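The efficiency comparison above reduces to a single ratio: work done per unit of electrical power. As a rough sketch using only the approximate figures quoted in this section (30 times the work for about twice the power), the improvement can be estimated as follows. The function name is illustrative, and the result is only as precise as the rounded inputs.

```python
def efficiency_gain(work_ratio, power_ratio):
    """Return how many times more work per unit of power the newer system delivers.

    work_ratio:  newer system's throughput divided by the older system's.
    power_ratio: newer system's power draw divided by the older system's.
    """
    return work_ratio / power_ratio


# Approximate figures from the text: 30x the work for roughly 2x the power.
gain = efficiency_gain(work_ratio=30, power_ratio=2)
print(f"Work per unit of power improved by about {gain:.0f}x")
```

By these figures, Yellowstone delivered on the order of 15 times more computation per unit of power than Bluefire.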

CISL’s computational scientists and consultants provided enormous support for users transitioning onto Yellowstone. Efficiently utilizing ever-larger numbers of CPUs in parallel was (and is) a complex challenge that required a great deal of advance planning, ongoing refinement, and user training. Computational scientists, software engineers, and programmers throughout CISL collaborated to modify user codes to run on Yellowstone, then they trained users to run their codes with increasing efficiency. This ongoing effort also helped NCAR’s research communities transition from Yellowstone to its successor, Cheyenne.

Yellowstone provided production computing for longer than all but two other supercomputers at NCAR; those exceptions are the first supercomputer, the 1977-vintage CRAY 1-A (serial number 3), which was operated for more than 11 years, and the 1990s Cray Y-MP/8 named Shavano, at 7.1 years. It can be said that supercomputers live fast and retire young because technology advances quickly and each new generation of supercomputer is significantly more powerful, faster, and energy-efficient – thus cheaper to operate – than the one before it. Yellowstone was one of the most important and scientifically productive supercomputers in NCAR’s history and performed its last calculation on December 30, 2017.