Compiling multi-GPU MPI-CUDA code on Casper

Follow the example below to build and run a multi-GPU, MPI/CUDA application on the Casper cluster. The example uses the Intel compiler, which is loaded by default.


Log in to Cheyenne, then copy the sample files from the following directory to your own GLADE file space:

/glade/u/home/csgteam/Examples/mpi_cuda_hello

Run execdav to start an interactive job on a GPU-accelerated Casper node. Request 5 cores for this example. Your login shell in the interactive job occupies 1 core (or "slot") of that request, which leaves 4 for the executable you launch in a later step.

execdav -C gp100 -n 5

Load the CUDA module when your job starts.

module load cuda

Use the NVIDIA compiler (nvcc) to compile portions of your code that contain CUDA calls. (As an alternative to doing each of the following compiling and linking steps separately, you can run make to automate those steps. The necessary makefile is included with the sample files.)

nvcc -c gpu_driver.cu
nvcc -c hello.cu

Compile any portions of the code containing MPI calls.

mpicc -c main.c

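Link the object files. Because nvcc compiles the .cu files as C++, the link step needs the C++ runtime library, which mpicxx supplies automatically.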

mpicxx -o hello gpu_driver.o hello.o main.o

Alternatively, link with mpicc and add the C++ standard library explicitly:

mpicc -o hello gpu_driver.o hello.o main.o -lstdc++
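
The compile and link steps above can also be collected in a small shell script. The following is a minimal sketch and is not the makefile shipped with the sample files. The helper names run and build_hello and the DRY_RUN variable are made up for this illustration. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/bash
# Sketch only: the manual build steps from this page in one place.
# DRY_RUN is a hypothetical switch for this example:
#   DRY_RUN=1 (default) prints each command, DRY_RUN=0 also runs it.
DRY_RUN=${DRY_RUN:-1}

# Print a command, then run it unless we are in dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "0" ]; then
        return 0
    fi
    "$@"
}

# Compile the CUDA sources with nvcc, compile the MPI source with mpicc,
# then link with mpicxx. The && chain stops at the first failing step.
build_hello() {
    run nvcc -c gpu_driver.cu &&
    run nvcc -c hello.cu &&
    run mpicc -c main.c &&
    run mpicxx -o hello gpu_driver.o hello.o main.o
}

build_hello
```

If you save the script under a name of your choice, for example build.sh, you can run DRY_RUN=0 bash build.sh from the directory that contains the sample files, after loading the CUDA module.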

Launch the executable with srun, using the 4 cores that remain from your request.

srun -n 4 ./hello

Sample output (lines from different tasks can appear in a different order from run to run):

[task 2] Contents of data before kernel call: HdjikhjcZ
there are 1 gpus on host casper26
[task 2] is using gpu 0 on host casper26
[task 2] Contents of data after kernel call: Hello World!
Using 4 MPI Tasks
[task 0] Contents of data before kernel call: HdjikhjcZ
there are 1 gpus on host casper26
[task 0] is using gpu 0 on host casper26
[task 0] Contents of data after kernel call: Hello World!
[task 3] Contents of data before kernel call: HdjikhjcZ
there are 1 gpus on host casper26
[task 3] is using gpu 0 on host casper26
[task 3] Contents of data after kernel call: Hello World!
[task 1] Contents of data before kernel call: HdjikhjcZ
there are 1 gpus on host casper26
[task 1] is using gpu 0 on host casper26
[task 1] Contents of data after kernel call: Hello World!
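
In the output above, every task uses GPU 0 because the node reports a single GPU. On a node with several GPUs, one common way to spread tasks across devices is a small wrapper script that sets CUDA_VISIBLE_DEVICES per task before starting the program. The sketch below is an illustration, not part of the sample files and not necessarily how the sample program selects its GPU. It relies on SLURM_LOCALID, which srun sets to the node-local task ID. NUM_GPUS and pick_gpu are hypothetical names introduced here, and you must supply the GPU count yourself.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of the sample files): one GPU per task,
# assigned round-robin by node-local task ID.

# pick_gpu LOCAL_ID NUM_GPUS: print the GPU index for that task.
pick_gpu() {
    echo $(( $1 % $2 ))
}

# SLURM_LOCALID is set by srun for each task; fall back to 0 outside a job.
# NUM_GPUS is an assumption supplied by the user; fall back to 1.
local_id=${SLURM_LOCALID:-0}
num_gpus=${NUM_GPUS:-1}

# The CUDA runtime only sees the devices listed in CUDA_VISIBLE_DEVICES.
export CUDA_VISIBLE_DEVICES=$(pick_gpu "$local_id" "$num_gpus")
echo "local task $local_id -> CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"

# Start the real program if one was given on the command line.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Saved under a name of your choice, for example bind_gpu.sh, the wrapper would be used as NUM_GPUS=4 srun -n 4 bash bind_gpu.sh ./hello on a node with 4 GPUs. Each task then sees exactly one device, which the program addresses as GPU 0.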