Bluefire’s contribution to supercomputing at NCAR

By Brian Bevirt
04/08/2015 - 12:00am
Bluefire node opened to show coolant plumbing
Photo by Brian Bevirt (NCAR CISL), © 2008 UCAR
A closed loop of copper tubing carried liquid coolant from a heat exchanger in the bottom of each cabinet to copper heat sinks attached to the processors. The heat exchanger regulated the coolant temperature, keeping it low enough to cool the chips yet warm enough to avoid condensation inside the system. A separate chilled-water loop connected the heat exchangers in each cabinet with a large reservoir that was cooled by the computer facility’s chilled-water system. Floor vents directly in front of the system supplied chilled air to the intake fans in the front doors of each processor cabinet. (The doors are not yet installed in this photo.)

When Bluefire arrived at NCAR on 24 April 2008, it replaced three smaller supercomputers and almost quadrupled their combined capacity while being three times more energy efficient. Now that Bluefire has been removed from the Mesa Lab Computing Facility, it’s time to review its place in the history of supercomputing at NCAR.

Bluefire, an IBM Power 575 Hydro-Cluster system, had the distinction of being the first NCAR supercomputer more than one million times more powerful than the center's first, the CRAY-1A installed at NCAR in 1977. When it began operations on 30 May 2008, Bluefire’s peak speed of approximately 77 teraflops (77 trillion floating-point operations per second) made it the 25th-fastest supercomputer in the world at that time. Originally expected to provide high-performance computing for NCAR scientists and other NSF-funded researchers until 2011, Bluefire was so useful that NCAR kept operating it until 31 January 2013.
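For a rough sense of scale, the short sketch below works through the arithmetic implied by those figures. It is illustrative only, using just the numbers quoted above rather than any formal benchmark comparison, and it shows what a one-million-fold speedup implies about the CRAY-1A-era baseline.

# Illustrative arithmetic based only on the figures quoted in this article;
# not an official benchmark comparison.
bluefire_peak_tflops = 77
bluefire_peak_flops = bluefire_peak_tflops * 1e12    # 7.7e13 floating-point operations per second

# "More than one million times more powerful" implies a late-1970s baseline of roughly:
implied_baseline_flops = bluefire_peak_flops / 1e6   # about 77 million operations per second

print(f"Bluefire peak: {bluefire_peak_flops:.2e} FLOP/s")
print(f"Implied 1977 baseline: {implied_baseline_flops / 1e6:.0f} MFLOPS")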

Bluefire coolant reservoir
Photo courtesy of Carlye Calvin (UCAR), © 2008 UCAR
In this installation photo, two 1,500-gallon chilled-water tanks (background) are located near Bluefire (standing out of frame to the left of the open floor panels). To mitigate possible condensation, a worker insulates the chilled-water pipes coming from the reservoir and the warm-water pipes returning from Bluefire. This chilled-water system terminated at heat exchangers in the bottom of each cabinet. A separate liquid-coolant loop regulated the temperature inside each of Bluefire’s 11 cabinets.

NCAR’s CRAY-1A had one very powerful processor (for the 1970s) that ran so hot it required liquid coolant to control its internal temperature. As parallel-processing HPC designs evolved, air-cooled supercomputers became common at NCAR in the 1990s and 2000s: by distributing the computational workload among hundreds to thousands of processors, these systems could keep each chip within its operating temperature range using forced air alone. When Bluefire arrived, CISL had to re-embrace liquid cooling of processors because the density and speed of its thousands of processors produced too much heat for air cooling alone to remove from its cabinets. In addition to forced-air cooling, Bluefire used water cooling both at the surface of the processors (see photo above showing the inside of one node) and in the rear door of each cabinet to prevent the exhaust air stream from overheating the computer room.

Bluefire’s primary water-cooling system used a liquid-to-liquid heat exchanger in the bottom of each cabinet to regulate the temperature at the surface of each of its 4,064 processors, drawing on a 3,000-gallon chilled-water reservoir nearby (see photo at right). One closed loop carried chilled water between the reservoir and the heat exchangers, and another closed loop distributed coolant from the heat exchangers to the processors. The second water-cooling system passed the hot air exhausted from the back of the computer across a large coolant coil built into the rear door of each cabinet, cooling nearly 60 percent of the heated air before it entered the computer room. Bluefire’s unique water-based cooling system was 33 percent more energy efficient than air-cooled systems, helping it to be three times more energy efficient per rack than its air-cooled predecessor.

Bluefire’s purpose was to advance research into severe weather and the future of Earth’s climate. Scientists at NCAR and universities across the country used it to accelerate research into climate change, including future patterns of precipitation and drought around the world, changes to agriculture and growing seasons, and the complex influence of global warming on hurricanes, among many other important scientific questions. Researchers also used it to improve weather forecasting models so society can better anticipate where and when dangerous storms may strike.

As soon as Bluefire was operational but before it went into full production, NCAR and NSF provided very large allocations of its computing resources to a small number of geoscience researchers who were prepared to use Bluefire’s full capabilities. These Accelerated Scientific Discovery projects allowed scientists to perform breakthrough simulations that each required hundreds of Bluefire’s processors for dedicated runs lasting up to four months. This dedicated use of the system is not possible in a full production environment, and these projects served to test Bluefire’s capability limits: its ability to solve the most complex problems in the shortest possible time. Nine of these projects used more than 3.7 million processor hours on Bluefire over four months. Every year thereafter, Bluefire supported many investigations into challenging questions, including projects related to snowfall and snowpack predictions, water resource adaptation planning, the formation and evolution of atmospheric rivers, the Earth’s water cycle, real-time convective storm forecasting, real-time hurricane forecasting, ocean-atmosphere interactions, mesoscale eddies in the global ocean, clear-air turbulence affecting aircraft, and small-scale processes in the Sun’s corona.

Bluefire simulation of structure inside sunspot
Image courtesy of Matthias Rempel (NCAR HAO), © 2009 UCAR
This visualization from a highly detailed simulation shows bipolar magnetic structure diverging from two sunspots. The upper panel shows the vertical magnetic field in the solar photosphere, while the lower panel shows the corresponding subsurface magnetic field strength on a vertical plane through the center of the sunspot pair. Enabled by NCAR’s Accelerated Scientific Discovery program on Bluefire, this computation used 1.8 billion grid points and required about 250,000 CPU hours. The simulation covered about two hours of solar time, which is sufficient to capture the magneto-convective origin of sunspot fine structure. Simulations of this size became feasible only with supercomputers as powerful as Bluefire. Detailed modeling of the origin, dynamical evolution, and decay of sunspots is essential for understanding the solar magnetic cycle and its impact on the Earth via solar flares and coronal mass ejections. Radiative energy losses associated with sunspots give rise to variations in the solar heating of the Earth and lead to variations in Earth’s climate that are not yet fully understood.

Bluefire’s computing power helped scientists create the first-ever comprehensive computer model of sunspots (see visualization at right). This simulation helped scientists better understand the detailed structure of sunspots and the physical processes underlying their formation, evolution, and decay. It also laid the foundation for a deeper understanding of the connections between the Sun’s varying activity and Earth’s atmosphere. Perhaps Bluefire’s most noteworthy achievement was supporting research into the Earth’s future climate when it ran an extensive series of highly detailed simulations for the Fifth Assessment Report of the United Nations’ Nobel Prize-winning Intergovernmental Panel on Climate Change (IPCC-AR5).

Through its 1,677 days of production service from October 2008 through January 2013, Bluefire delivered more than 162 million core-hours and ran more than 7.3 million individual jobs. Its overall availability to users was above 97 percent, a remarkably high level of average uptime through more than four years in a computer room built half a century ago.

Bluefire cabinet being moved into storage
Photo by Brian Bevirt (NCAR CISL), © 2015 UCAR
One of the cabinets for Bluefire’s fiber-optic internal network is being moved into a storage room at NCAR’s Mesa Lab in March 2015.

Bluefire was replaced by Yellowstone, a system so large that it had to be housed in a new facility built in Wyoming by NCAR, the State of Wyoming, the University of Wyoming, and the National Science Foundation. Yellowstone, an IBM iDataPlex cluster, represents the largest increase in computational capacity and capability in NCAR history: it is 30 times more powerful than Bluefire. Yellowstone has 72,288 processors and a peak speed of 1,500 teraflops, while Bluefire had 4,064 processors and a peak speed of 77 teraflops. In June 2014, Yellowstone was ranked as the 29th most powerful computer in the world, and it now provides about 500 million core-hours of computing per year to support approximately 5 million production jobs. Yellowstone also requires water cooling (far more than Bluefire), and so will all its successor systems for the foreseeable future. The lessons learned from operating Bluefire have paid valuable dividends in the way the new computing center in Wyoming was designed and is operated.

Bluefire was recently disassembled and removed from the Mesa Lab Computing Facility (see photo at right), having made many significant contributions to the history of computing at NCAR. It may yet be reassembled and pressed into a new term of service by another laboratory that can benefit from its still-impressive capabilities.