Yellowstone was NCAR’s first petascale computer

By Brian Bevirt
01/23/2018 - 2:15pm

NCAR’s outgoing flagship supercomputer began full-scale production on 20 December 2012 and performed its last calculation on 30 December 2017. CISL Director Anke Kamrath noted that “Yellowstone gave us a big boost in computing capacity over the Bluefire system. And it was the inaugural computer at the NWSC – a historic moment in NCAR’s long history of supercomputing. In addition to the sheer volume of computing it offered, Yellowstone also provided NCAR with the capability to simulate Earth System physics in greater detail for longer time frames than ever before.”

Yellowstone front panels
NCAR built the NCAR-Wyoming Supercomputing Center to handle the power and space needed for modern supercomputers. Yellowstone’s attractive front panels were installed facing the viewing windows from the visitor area. (Photo by Marijke Unger.)

Early history

Placing Yellowstone into production service culminated a decade of CISL efforts to plan, design, and construct the new NCAR-Wyoming Supercomputing Center (NWSC) while selecting the first world-class computer to be housed there. When it was installed at the NWSC, Yellowstone was the 13th-most powerful computer in the world, and it provided the largest-ever increase in NCAR’s computational capacity: 30 times more than Bluefire, its predecessor from 2008, and 15 million times more powerful than NCAR’s first supercomputer, the CRAY 1-A installed at NCAR in 1977. For its entire service life, Yellowstone had a total downtime of 1.96%, and its user communities utilized 92.57% of its capacity throughout those five-plus years.

Availability and usage graph
An effective job management system, strong user demand, and rapid user adaptation to the new system architecture kept Yellowstone usage within 5% of its maximum capacity for five years. The red line shows system availability, and the blue line shows how effectively it was utilized throughout its lifetime. (Chart by Tom Engel.)

While Yellowstone was still in its acceptance testing period, and during its early months of production, CISL selected a few well-prepared science proposals that tested the limits of the system’s capabilities. Those Accelerated Scientific Discovery (ASD) projects included diverse, large-scale codes that allowed researchers to answer scientific questions in a very short time by using large, dedicated portions of the new system. Such dedicated use of the system is rarely feasible in a fully allocated production environment. The ASD projects made global climate predictions in the highest detail that was possible at that time, projected future summertime ozone over the U.S., simulated clouds and atmospheric eddies on a global scale, analyzed earthquake hazards, and investigated magnetism dynamics inside the Sun. Yellowstone supported numerous special computing campaigns throughout its lifetime.

High-resolution climate model output
Snapshot from an ASD simulation showing latent heat flux (gray scale) overlaid on sea surface temperature from year 14 of a high-resolution CESM run (Small, R.J., et al., 2014: A new synoptic-scale resolving global climate simulation using the Community Earth System Model. Journal of Advances in Modeling Earth Systems, 6, 1065-1094, doi:10.1002/2014MS000363). Warmest ocean temperatures are red, followed by yellow, green, and blue, which is coldest. Note the influence of Gulf Stream meanders on a cold-air outbreak in the Northwest Atlantic (red arrow) and a low-temperature wake behind a tropical cyclone in the Indian Ocean (blue arrow). Features like these required world-class supercomputing because neither was well simulated by lower-resolution climate models.

One of the world’s first petascale computers

Yellowstone’s peak performance was rated at 1.54 petaflops. A petascale computer is a system that can perform complex calculations at a rate exceeding 1 petaflops (1 million billion floating-point operations per second). This performance is rated by running standard benchmark tests – such as the LINPACK software library that performs numerical linear algebra – on the entire system. The world’s first petascale computer went into service in 2008, and Yellowstone went into production in 2012. Yellowstone was so large and demanded so much power and cooling that it could not be operated at NCAR’s Mesa Lab Computing Facility in Boulder, Colorado. CISL foresaw this need a decade in advance and built the NWSC facility to house NCAR’s first petascale computer.
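As a rough illustration of the units involved, the short Python sketch below converts the 1.54-petaflops peak rating quoted above into operations per second and checks it against the 1-petaflops threshold. The constant and function names are illustrative only and are not part of any NCAR tooling.

```python
# Unit conversion for the performance figures quoted in the article.
FLOPS_PER_PETAFLOPS = 10**15  # 1 million billion operations per second


def to_flops(petaflops):
    """Convert a rate in petaflops to floating-point operations per second."""
    return petaflops * FLOPS_PER_PETAFLOPS


def is_petascale(petaflops):
    """A system counts as petascale if its rate exceeds 1 petaflops."""
    return to_flops(petaflops) > FLOPS_PER_PETAFLOPS


yellowstone_peak_pflops = 1.54  # peak rating stated in the article
print(f"{to_flops(yellowstone_peak_pflops):.3e} operations per second")
print("petascale:", is_petascale(yellowstone_peak_pflops))
```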

System interconnect cables
Massively parallel supercomputers require efficient communication between their processors. A network of more than 4,500 orange fiber-optic cables was organized above Yellowstone’s 63 racks to connect its 72,576 processor cores. The yellow Ethernet cables were part of its administrative network. (Photo by Carlye Calvin.)

CISL’s high-performance computing (HPC) efforts focus on providing robust, reliable, and secure HPC resources in a production research environment, and on supporting that environment for thousands of users with projects spanning NCAR, universities, and the broader research community. Dedicated to Earth System sciences, Yellowstone competently provided both capability and capacity for the computing needs of multiple research communities for five full years. (Computing capability refers to using the maximum computing power to solve a single large problem in the shortest possible time. Computing capacity refers to the total amount of work that can be performed on the entire computing complex, which is typically divided among numerous smaller jobs.)

Petascale computing is important because it allows us to simulate the Earth’s physical processes in ever-greater detail. Simulation is important because it is our only tool for projecting what could happen to our planet’s physical systems in the future. We don’t have identical planets that we can test to find out what happens in various future scenarios, and even if we did, playing out those test scenarios in real time could not give us answers today. So we rely on simulation, which some people call the third pillar of the scientific method: computer models that allow us to estimate global-scale outcomes of changes in the Earth’s systems.

Computer models are built on the laws of physics and data from observations to study nature’s complex, interrelated forces. Climate and weather models are verified and validated by running them with data from past observations to see how accurately they reproduce current conditions. When the models can achieve that, scientists then run them with current data to estimate future conditions. Short-term weather forecasting, and hurricane prediction in particular, has improved dramatically since weather models came into regular use. Models are not perfect predictors of what will happen, but when one or more models are run with slightly different variables – called ensemble runs – scientists can use statistics to improve their ability to glimpse possible future outcomes. Models allow us to project the consequences of various changes to the interrelated physical systems being simulated, and as supercomputing power increases, so does our ability to project future conditions.
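The idea behind an ensemble run can be sketched in a few lines of Python. The example below is only a toy: it uses the logistic map as a stand-in for a chaotic forecast model, perturbs the starting value slightly for each ensemble member, and summarizes the outcomes with a mean and a standard deviation. The model, the function names, and every parameter value are assumptions chosen for illustration; real climate and weather models are vastly more complex.

```python
import random
import statistics


def toy_model(x0, steps=30, r=3.9):
    """Iterate the logistic map, a simple chaotic system used here as a
    stand-in for a forecast model (not a real weather or climate model)."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def run_ensemble(base_state, members=50, perturbation=1e-3, seed=0):
    """Run the toy model many times from slightly different starting values."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = []
    for _ in range(members):
        x0 = base_state + rng.uniform(-perturbation, perturbation)
        outcomes.append(toy_model(x0))
    return outcomes


outcomes = run_ensemble(0.5)
ensemble_mean = statistics.mean(outcomes)
ensemble_spread = statistics.stdev(outcomes)
print(f"mean outcome: {ensemble_mean:.3f}, spread: {ensemble_spread:.3f}")
```

A single run of such a model gives one answer with no indication of how sensitive it is to the starting conditions. The spread across members is what lets scientists attach statistics to a range of possible outcomes.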

Performance comparisons

During its service life, Yellowstone provided 2.49 billion CPU hours to complete 18.64 million jobs for 2,958 different users. Yellowstone contained 72,576 processor cores for user jobs, and NCAR’s computational science had advanced to the level where the average job was routinely using almost 3,000 cores in parallel. For reference, the previous supercomputer Bluefire offered a total of only 3,744 processor cores for user jobs, and very few of the largest jobs ever used as many as 1,000 processors.

Yellowstone also improved on Bluefire’s power efficiency. Yellowstone required only 4 times the power to do 30 times the work of Bluefire. These are the main reasons why supercomputers become obsolete in four to five years: new systems can perform significantly more work in less time while consuming less power per unit of work.

Yellowstone became obsolete for these same reasons. NCAR’s new flagship supercomputer Cheyenne contains more than twice as many processor cores as Yellowstone, has a peak speed about 3.5 times faster, performs about 3 times Yellowstone’s typical user workload, and consumes a total of only 25% more electricity. CISL is already planning a replacement for Cheyenne in about four years. New system hardware and architecture are major factors in advancing supercomputing technology, but other factors also contribute to progress.
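The efficiency comparisons in this section reduce to a simple ratio of work done to power consumed. The Python sketch below works through that arithmetic using only the figures quoted above. It treats the stated ratios as rough averages, and the function name is illustrative.

```python
def work_per_unit_power(work_ratio, power_ratio):
    """Relative work done per unit of electricity, new system vs. old."""
    return work_ratio / power_ratio


# Yellowstone vs. Bluefire: 30 times the work for 4 times the power.
yellowstone_gain = work_per_unit_power(30, 4)

# Cheyenne vs. Yellowstone: about 3 times the typical user workload
# for 25% more electricity.
cheyenne_gain = work_per_unit_power(3, 1.25)

print(f"Yellowstone vs. Bluefire: {yellowstone_gain:.1f}x work per unit of power")
print(f"Cheyenne vs. Yellowstone: {cheyenne_gain:.1f}x work per unit of power")
```

By this measure each generation did several times more work per unit of electricity than the one before it.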

Computational science contributions to performance

Computational science advances also contribute to reducing the power consumed by supercomputers. CISL computational scientists continually develop optimizations to speed up the codes for current and future supercomputer architectures to produce “more science per watt of electricity.” By reducing the time to complete a simulation, code optimization also reduces the electrical power required to do that work.

CISL’s computational scientists provided enormous support for users transitioning onto Yellowstone. Efficiently utilizing ever-larger numbers of CPUs in parallel was (and is) a complex challenge that required a great deal of advance planning, ongoing refinement, and user training. Computational scientists, software engineers, and programmers throughout CISL all collaborated to modify user codes to run on Yellowstone, and then trained users to run their codes with increasing efficiency. This effort is always ongoing, and it also helped NCAR’s research communities transition from Yellowstone to NCAR’s new supercomputer, Cheyenne.

Despite Yellowstone’s 30x increase in computing capacity, an even larger increase in scientific demand required CISL to handle a record number of requests for its computing time in 2015. The fall 2015 requests from university researchers were 3.33 times greater than the core-hours available, 25% more than the previous record requests for computing time on Yellowstone. These demands demonstrate the research community’s growing needs for petascale computing resources tailored to the Earth System sciences, and they motivate CISL to continue planning for new supercomputers.

Yellowstone’s value to scientific advancement

Yellowstone’s scientific work was guided by NCAR’s mission to study the Sun-Earth System. Almost all of its computer time was allocated to the fields of climate dynamics, ocean sciences, weather prediction, meteorology, atmospheric chemistry, other Earth sciences, solar physics, geospace sciences, and fluid dynamics. Researchers performed a broad range of scientific investigations throughout Yellowstone’s lifetime, such as:

  • Understanding and forecasting severe weather such as hurricanes, tornadoes, heat waves, droughts, and floods – including in real time during weather emergencies
  • Resolving the interaction of water droplets and ice crystals within clouds
  • Resolving the three-dimensional behavior of eddies in ocean currents
  • Understanding the formation and evolution of atmospheric rivers
  • Modeling both large-scale and small-scale processes inside the Sun
  • Modeling sunspots

Sunspot simulation
NCAR scientists and colleagues used Yellowstone to model the complex magnetic fields that form a sunspot. Comprehensive 3D computer simulations increased scientific understanding of the forces operating below the visible surface of the Sun. (Visualization courtesy of Matthias Rempel.)

Developing our understanding of these physical processes produces direct benefits for society. From human health and safety to resource management and policy, Yellowstone simulations have helped us adapt and prepare for a changing world. The outcomes of this research enhance human safety, reduce property damage, and increase economic efficiency each year. Weather forecasting improvements provide additional useful and timely information for decision making that helps people interpret and use weather forecasts in both everyday and high-impact weather situations. This research included studies of:

  • Clear-air turbulence that affects aircraft
  • Changes to agriculture and growing seasons
  • Adaptation planning for changes in water availability and other natural resources
  • Simulating, predicting, and preventing wildland fires

Yellowstone ran at near-maximum performance almost every day for five years, providing information that advances our understanding of the Earth’s atmosphere, hydrosphere, biosphere, lithosphere, and human vulnerabilities to the changing balance between these systems. Yellowstone also advanced our understanding of physical processes inside the Sun as well as the dynamical processes in the space between the Sun and the Earth. Most of this work nudges our knowledge forward incrementally, and the special computing campaigns like those of the ASD program move our knowledge forward by large steps.

NCAR assigned a digital object identifier to Yellowstone to help track publications that credit the computer’s contribution to scientific research. A study tallied these references and reported that the authors of 357 papers published in 2015 referenced Yellowstone as a factor in their research. This number does not include research that may have used Yellowstone but did not explicitly acknowledge it. This data point indicates that Yellowstone likely supported hundreds of research papers each year, perhaps totaling thousands in its lifetime.

The era of Big Data

The Yellowstone supercomputing environment was organized around a central file system that served data to the supercomputer, data analysis and visualization computers, the data archive, science gateways, data portals, and networks. This structural improvement in data management and data workflows reduced the amount of memory required to process jobs and increased user efficiency by minimizing data transfers and maximizing the speed at which research could be conducted.

A recent research project provides an example of petascale computing in the era of Big Data. A major data reanalysis project used Yellowstone to produce a groundbreaking new data set that gives scientists an unprecedented look at the future of weather. (A “reanalysis” uses a single, consistent analysis scheme to represent the entire state of a natural system with a high level of accuracy and detail.) This 193-terabyte data set was generated by running the NCAR-based Weather Research and Forecasting (WRF) model at an extremely high resolution across the entire contiguous United States for 26 simulated years: half in the current climate and half in the future climate expected if the atmosphere continues to change at its current rate.

Led by Roy Rasmussen and including more than 15 scientists from across NCAR and the broader research community, the research team created the data set using WRF to simulate weather across the U.S. between 2000 and 2013. They initiated the model using a separate reanalysis data set constructed from observations. The simulations showed excellent similarity to radar images of the weather observed during that time period. Named CONUS 1, the project took more than a year to run on Yellowstone, and it produced data that allows scientists to explore in detail how today’s weather would look in a warmer, wetter atmosphere.

With confidence that WRF had accurately simulated today’s weather, the scientists ran the model for a second 13-year period using the same reanalysis data but with an increased temperature and a corresponding increase in atmospheric water vapor (because a warmer atmosphere holds more moisture). The second data set shows how weather events from the recent past – including named hurricanes and other distinctive weather events – would look in our expected future climate.

This data set has already proven to be a rich resource for people interested in how individual types of weather will respond to climate change. It showed that rainfall from intense thunderstorms will become more intense and more frequent and will cover larger areas. It also indicated how snowpack in the West will change in the high-elevation Rockies and in the coastal ranges. Scientists have also used the CONUS 1 data set to examine changes to rainfall-on-snow events, the speed of snowmelt, and more.

More work will be done to refine these projections, and scientists are already preparing a CONUS 2 simulation to be run on the Cheyenne supercomputer. That next data set will cover two 20-year periods and will probably take another year to complete. CONUS 2 will help scientists understand how a warming climate will affect future storm tracks and local weather across the country.

Data-centric computing for understanding our changing Earth System

Yellowstone was one of the most important supercomputers in NCAR’s history, and the technical details of its value to NCAR were summarized in “The Yellowstone Workload Study” completed at the midpoint of its service life. At more than five years, Yellowstone provided production computing for longer than any other supercomputer at NCAR except for the first “super”computer, the 1977-vintage CRAY 1-A (serial number 003) which was operated for more than 11 years, and the mid-1980s Cray X-MP/8 named Shavano, at 7.1 years. The Yellowstone user community pressed it to within 5% of its maximum capacity (92.57% of the system was actively running user jobs during the 97.50% of the time it was available throughout its service life). Supercomputers are run hard and retired young because technology advances quickly and each new generation is significantly more powerful, faster, and cheaper to operate than the one before it.
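The "within 5%" figure can be checked from the two percentages quoted above. The Python sketch below assumes that the phrase refers to the gap between the time the system was available and the time it was running user jobs. That reading is an interpretation, not something the article states explicitly, and the variable names are illustrative.

```python
availability_pct = 97.50  # share of service life the system was available
utilization_pct = 92.57   # share of service life spent running user jobs

# Gap between availability and utilization, in percentage points.
idle_while_available = availability_pct - utilization_pct

# Fraction of the available time that was spent running user jobs.
utilization_of_available = utilization_pct / availability_pct

print(f"available but idle: {idle_while_available:.2f} percentage points")
print(f"utilization of available time: {utilization_of_available:.1%}")
```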