Cheyenne Acceptance Testing Complete

By Eliott Foust
01/12/2017 - 2:00pm

The National Center for Atmospheric Research’s (NCAR) new supercomputer, Cheyenne, is set to begin production operations in January 2017. Cheyenne debuts as the 20th-fastest supercomputer in the world, and it will allow researchers to tackle challenging problems by running sophisticated models at higher resolutions.

With Cheyenne three times more powerful than its predecessor, Yellowstone, the team overseeing installation and testing planned extensively and worked diligently to ensure a successful installation. The physical installation was completed in only two weeks in September, significantly faster than Yellowstone’s installation four years earlier. The design of the NCAR-Wyoming Supercomputing Center (NWSC), which facilitates receiving and accommodating new systems with convenient entry points and raised floors for wiring, once again demonstrated its value. Silicon Graphics International (SGI) pre-assembled Cheyenne in Chippewa Falls, Wisconsin, which also sped up the installation. While SGI was assembling and delivering the computing hardware, DataDirect Networks (DDN) delivered the 200-petabyte high-performance disk storage system that was procured along with Cheyenne.

After the quick installation, staff members from NCAR’s Computational and Information Systems Laboratory (CISL), SGI, and DDN focused on integration and acceptance testing. During this phase, Cheyenne was subjected to resilience tests that confirmed the preservation of all system configuration files and defaults and comprehensively verified the interoperation of all components. CISL also ran benchmark tests on the full system to verify that it would operate as expected. Dave Hart, User Services Section Manager, said, “All the testing and benching we’ve done until now suggests that SGI’s benchmark projections are on the mark. So users should see substantial core-for-core performance improvements on their codes.” The system was accepted on January 2, 2017.

During acceptance testing, Cheyenne underwent real-world stress tests in the form of two unplanned power outages that forced shutdowns of the entire system. Irfan Elahi, High-End Services Section Manager and Project Manager of Cheyenne, says two power outages in ten days is unexpected, especially considering that Yellowstone experienced only ten unplanned outages in a four-year period. However, CISL staff members considered these events fortunate because they gained new insight into the system’s resilience and learned more about Cheyenne’s architecture.

The first group of Cheyenne users will conduct large-scale computational experiments as part of the Accelerated Scientific Discovery (ASD) phase. These computationally demanding projects require a significantly larger allocation of resources than the average research project and must be completed within a two-month period. These users have already prepared their code to run on Cheyenne, so this phase will double as a final test that allows CISL staff to evaluate how Cheyenne performs with real users and real applications. Aaron Andersen, Deputy Director of the Operations and Services Division, says ASD users “are ready to hit the ground running with a big chunk of the system.”

Despite the rapid pace of the installation, CISL staff have kept their expectations realistic. The system has undergone stress tests, and the staff have been challenged throughout the installation and testing process. With a fully installed system that has been proven to meet, if not exceed, performance expectations, Hart expects that “a lot of science will get done on Cheyenne.”