Casper


Updated 1/15/2019

The Casper cluster is a heterogeneous system of specialized data analysis and visualization resources and large-memory, multi-GPU nodes. Casper is the successor to the Geyser and Caldera clusters, which were decommissioned at the end of 2018.

NCAR's Casper system, procured from PCPC Direct, Ltd., consists of 26 Supermicro nodes featuring Intel Skylake processors.

  • 22 Supermicro SuperWorkstation nodes are used for data analysis and visualization jobs. Each node has 36 cores and up to 384 GB of memory. Nine of the nodes also feature an NVIDIA GPU.
  • 4 additional nodes feature large-memory, dense GPU configurations to support explorations in machine learning (ML) and deep learning (DL) in atmospheric and related sciences.

See the hardware table below for more detailed specifications.

Job scheduler: Users run jobs on the Casper cluster by logging in to Cheyenne and submitting them with the Slurm Workload Manager.
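
As a rough illustration of the submission step, the Python sketch below builds a minimal Slurm batch script and hands it to `sbatch`. The `#SBATCH` options used are standard Slurm options, but the partition name (`dav`), the account code (`PROJECT0001`), the `analyze.py` command, and the resource values are placeholders assumed for illustration; take the real values from the Casper job documentation.

```python
import subprocess
import tempfile


def build_batch_script(command, *, job_name="casper_job", account="PROJECT0001",
                       partition="dav", ntasks=1, walltime="00:30:00",
                       mem_gb=None, gpus=0):
    """Return the text of a minimal Slurm batch script.

    The account and partition defaults are placeholders, not confirmed
    Casper settings.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={walltime}",
    ]
    if mem_gb is not None:
        lines.append(f"#SBATCH --mem={mem_gb}G")      # memory per node
    if gpus:
        lines.append(f"#SBATCH --gres=gpu:{gpus}")    # request GPUs only when needed
    lines += ["", command, ""]
    return "\n".join(lines)


def submit(script_text):
    """Write the script to a temporary file and submit it with sbatch.

    This only works where the Slurm client tools are installed and on PATH.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as fh:
        fh.write(script_text)
        path = fh.name
    result = subprocess.run(["sbatch", path], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()  # sbatch's confirmation message


# Hypothetical single-GPU analysis job; print the script for review.
script = build_batch_script("python analyze.py", mem_gb=50, gpus=1)
print(script)
```

Calling `submit(script)` from a login session would pass the generated script to Slurm. Building the script text separately from submitting it makes it easy to inspect the directives before anything reaches the scheduler.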

Operating system: CentOS 7


Hardware

Data Analysis & Visualization nodes

22 Supermicro 7049GP-TRT SuperWorkstation nodes
Up to 384 GB DDR4-2666 memory per node
2 18-core 2.3-GHz Intel Xeon Gold 6140 (Skylake) processors per node
2 TB local NVMe Solid State Disk
Mellanox VPI EDR InfiniBand dual-port interconnect
(one port configured for FDR and one as 100 GbE)
Intel 10 Gb dual-port Ethernet
NVIDIA Quadro GP100 GPU on each of 9 nodes

Machine Learning/Deep Learning nodes

2 Supermicro 1029GQ-TVRT SuperServer nodes
768 GB DDR4-2666 memory per node
2 18-core 2.3-GHz Intel Xeon Gold 6140 (Skylake) processors per node
2 TB local NVMe Solid State Disk
Mellanox VPI EDR InfiniBand dual-port interconnect
(one port configured for FDR and one as 100 GbE)
Intel 10 Gb dual-port Ethernet
NVIDIA Tesla V100 SXM2 GPUs with NVLink

2 Supermicro 4029GP-TVRT SuperServer nodes
1152 GB DDR4-2666 memory per node
2 18-core 2.3-GHz Intel Xeon Gold 6140 (Skylake) processors per node
2 TB local NVMe Solid State Disk
Mellanox VPI EDR InfiniBand dual-port interconnect
(one port configured for FDR and one as 100 GbE)
Intel 10 Gb dual-port Ethernet
NVIDIA Tesla V100 SXM2 GPUs with NVLink
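
The per-node figures above can be tallied into cluster-wide totals. The Python sketch below simply encodes the table; the `NodeGroup` type and its field names are illustrative, and the memory total is an upper bound because the data analysis and visualization nodes are listed as having "up to" 384 GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """One group of identical nodes from the hardware table (illustrative type)."""
    label: str
    nodes: int
    cores_per_node: int        # 2 processors x 18 cores each
    max_mem_gb_per_node: int   # upper bound where the table says "up to"


CASPER = [
    NodeGroup("Data analysis & visualization (7049GP-TRT)", 22, 2 * 18, 384),
    NodeGroup("ML/DL (1029GQ-TVRT)", 2, 2 * 18, 768),
    NodeGroup("ML/DL (4029GP-TVRT)", 2, 2 * 18, 1152),
]

total_nodes = sum(g.nodes for g in CASPER)
total_cores = sum(g.nodes * g.cores_per_node for g in CASPER)
max_total_mem_gb = sum(g.nodes * g.max_mem_gb_per_node for g in CASPER)

print(f"{total_nodes} nodes, {total_cores} cores, "
      f"up to {max_total_mem_gb} GB of memory")
```

With the figures listed here, the totals come to 26 nodes, 936 cores, and at most 12,288 GB of memory.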

Transition from using Geyser and Caldera

Geyser and Caldera users can prepare to run jobs on Casper nodes by taking these steps:

  • Review the documentation:
      • Starting jobs on Casper nodes
      • Starting TurboVNC on Casper nodes
      • Compiling GPU code on Casper
      • Compiling multi-GPU MPI/CUDA code on Casper

  • Create or revise job scripts for use on the desired nodes. Casper job scripts are similar to those for Geyser and Caldera.
  • Recompile your code on the new system. See Compiling code.
  • Register for upcoming training events when they are publicized in the CISL Daily Bulletin.

Casper and Cheyenne mount the central GLADE file systems. This means you can analyze your data files in place, without sending large amounts of data across a network or creating copies in multiple locations.