Documentation

Casper cluster

Created by Unknown User (bjsmith), last modified on 2023-02-27

The Casper cluster is a system of specialized data analysis and visualization resources; large-memory, multi-GPU nodes; and high-throughput computing nodes.

Casper is composed of 100 nodes featuring Intel Skylake or Cascade Lake processors.

22 Supermicro SuperWorkstation nodes are used for data analysis and visualization jobs. Each node has 36 cores and up to 384 GB memory.
- 9 of these nodes also feature an NVIDIA Quadro GP100 GPU.
- 3 nodes feature a single NVIDIA Ampere A100 GPU.
10 nodes feature large-memory, dense GPU configurations to support explorations in machine learning (ML) and deep learning (DL) and general-purpose GPU (GPGPU) computing in atmospheric and related sciences.
- 4 of these nodes feature 4 NVIDIA Tesla V100 GPUs
- 6 of these nodes feature 8 NVIDIA Tesla V100 GPUs
64 high-throughput computing (HTC) nodes for small computing tasks using 1 or 2 CPUs.
- 62 HTC nodes have 384 GB of available memory
- 2 HTC nodes have 1.5 TB of available memory
4 nodes are reserved for Research Data Archive workflows.

See the hardware summary table below for detailed specifications.

Operating system: CentOS 7.8

Logging in on an NCAR system

To log in, start your terminal or Secure Shell client and run an ssh command as shown here:

ssh -X username@system_name.ucar.edu 
OR 
ssh -X username@system_name.hpc.ucar.edu

Some users (particularly on Macs) need to use -Y instead of -X when calling SSH to enable X11 forwarding.

You can use this shorter command if your username for the system is the same as your username on your local computer:

ssh -X system_name.ucar.edu 
OR 
ssh -X system_system_name.hpc.ucar.edu

After running the ssh command, you will be asked to authenticate to finish logging in.

Hardware

Data Analysis & Visualization nodes	22 Supermicro 7049GP-TRT SuperWorkstation nodes Up to 384 GB DDR4-2666 memory per node 2 18-core 2.3-GHz Intel Xeon Gold 6140 (Skylake) processors per node 2 TB local NVMe Solid State Disk 1 Mellanox ConnectX-4 100Gb Ethernet connection (GLADE, Campaign Storage, external connectivity) 1 Mellanox ConnectX-6 HDR100 InfiniBand link 1 NVIDIA Quadro GP100 GPU 16GB PCIe on each of 9 nodes 1 NVIDIA Ampere A100 GPU 40 GB PCIe on each of 3 nodes
Machine Learning/Deep Learning & General Purpose GPU (GPGPU) nodes	4 Supermicro SuperServer nodes with 4 V100 GPUs 768 GB DDR4-2666 memory per node 2 18-core 2.3-GHz Intel Xeon Gold 6140 (Skylake) processors per node 2 18-core 2.6-GHz Intel Xeon Gold 6240 (Cascade Lake) processors per node 2 TB local NVMe Solid State Disk 1 Mellanox ConnectX-4 100Gb Ethernet connection (GLADE, Campaign Storage, external connectivity) 2 Mellanox ConnectX-6 HDR200 InfiniBand adapters. HDR100 link on each CPU socket 4 NVIDIA Tesla V100 32GB SXM2 GPUs with NVLink 6 Supermicro SuperServer nodes with 8 V100 GPUs 1152 GB DDR4-2666 memory per node 2 18-core 2.3-GHz Intel Xeon Gold 6140 (Skylake) processors per node 2 TB local NVMe Solid State Disk 1 Mellanox ConnectX-4 100Gb Ethernet connection (GLADE, Campaign Storage, external connectivity) 2 Mellanox ConnectX-6 HDR200 InfiniBand adapters, HDR100 link on each CPU socket 8 NVIDIA Tesla V100 32GB SXM2 GPUs with NVLink
High-Throughput Computing nodes	62 small-memory workstation nodes 384 GB DDR4-2666 memory per node 2 18-core 2.6-GHz Intel Xeon Gold 6240 (Cascade Lake) processors per node 1.6 TB local NVMe Solid State Disk 1 Mellanox ConnectX-5 100Gb Ethernet VPI adapter (GLADE, Campaign Storage, external connectivity) 1 Mellanox ConnectX-6 HDR200 InfiniBand VPI adapter. HDR100 link on each CPU socket 2 large-memory workstation nodes 1.5 TB DDR4-2666 memory per node 2 18-core 2.3-GHz Intel Xeon Gold 6240 (Cascade Lake) processors per node 1.6 TB local NVMe Solid State Disk 1 Mellanox ConnectX-5 100Gb Ethernet VPI adapter (GLADE, Campaign Storage, external connectivity) 1 Mellanox ConnectX-6 HDR200 InfiniBand VPI adapter, HDR100 link on each CPU socket
Research Data Archive nodes (reserved for RDA use)	4 Supermicro Workstation nodes 94 GB DDR4-2666 memory per node 2 16-core 2.3-GHz Intel Xeon Gold 5218 (Cascade Lake) processors per node 1.92 TB local Solid State Disk 1 Mellanox ConnectX-6 VPI 100Gb Ethernet connection (GLADE, Campaign Storage, internal connectivity)