Compiling multi-GPU MPI/CUDA code on Caldera

To build and run a multi-GPU MPI/CUDA application on the Caldera cluster, follow the example below. It uses the Intel compiler, which is loaded by default.

Log in to Cheyenne, then copy the sample files from the following directory to your own GLADE file space:

/glade/p/CSG/Examples/mpi_cuda_hello

Run execgpu to start an interactive job on a Caldera node. Request 4 cores for this example.

execgpu -n 4

Load the CUDA module when your job starts.

module load cuda

Use the NVIDIA compiler (nvcc) to compile the portions of your code that contain CUDA calls. (Instead of running each of the following compile and link steps separately, you can run make to automate them; the necessary makefile is included with the sample files.)

nvcc -c gpu_driver.cu
nvcc -c hello.cu
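
For orientation, here is a minimal, hypothetical sketch of what the CUDA side might look like, written as one file rather than the sample's separate gpu_driver.cu and hello.cu. The kernel, the run_kernel entry point, its parameters, and the round-robin GPU assignment are illustrative assumptions, not the actual sample code; the real kernel's decoding scheme may differ.

// Hypothetical sketch, not the actual sample code: selects a GPU for the
// calling MPI task, then runs a kernel that decodes a scrambled string.
#include <stdio.h>
#include <unistd.h>      // gethostname

__global__ void decode(char *data)
{
    data[threadIdx.x] += threadIdx.x;   // shift each character by its index
}

extern "C" void run_kernel(char *host_data, int len, int rank)
{
    int ngpus;
    char host[64];
    char *dev_data;

    // Round-robin the MPI tasks across the GPUs on this node.
    cudaGetDeviceCount(&ngpus);
    cudaSetDevice(rank % ngpus);
    gethostname(host, sizeof(host));
    printf("[task %d] using GPU %d on host %s\n", rank, rank % ngpus, host);

    cudaMalloc((void **)&dev_data, len);
    cudaMemcpy(dev_data, host_data, len, cudaMemcpyHostToDevice);
    decode<<<1, len>>>(dev_data);       // one thread per character
    cudaMemcpy(host_data, dev_data, len, cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
}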

Compile the portions of the code that contain MPI calls with mpicc.

mpicc -c main.c
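
Likewise, here is a minimal, hypothetical sketch of what main.c might contain, assuming a run_kernel entry point like the one sketched above. The actual sample code on GLADE will differ in its details.

/* Hypothetical sketch, not the actual sample code: each MPI task
 * scrambles a string on the host, hands it to the CUDA side to
 * decode, and prints the result. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

void run_kernel(char *data, int len, int rank);  /* compiled with nvcc */

int main(int argc, char **argv)
{
    int i, rank, ntasks, len;
    char data[] = "Hello World!";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    if (rank == 0)
        printf("Using %d MPI Tasks\n", ntasks);

    /* Scramble the string on the host so the GPU has work to do. */
    len = (int)strlen(data);
    for (i = 0; i < len; i++)
        data[i] -= i;

    printf("[task %d] Contents of data before kernel call: %s\n", rank, data);
    run_kernel(data, len, rank);
    printf("[task %d] Contents of data after kernel call:  %s\n", rank, data);

    MPI_Finalize();
    return 0;
}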

Link the object files with mpicc. (If the linker reports unresolved CUDA runtime symbols, you may also need to add -lcudart and the path to the CUDA runtime library.)

mpicc -o hello gpu_driver.o hello.o main.o

Launch the executable.

mpiexec -np 4 ./hello

Sample output:

[task 1] Contents of data before kernel call: HdjikhjcZ
Using 4 MPI Tasks
[task 0] Contents of data before kernel call: HdjikhjcZ
[task 2] Contents of data before kernel call: HdjikhjcZ
[task 3] Contents of data before kernel call: HdjikhjcZ
[task 3] using GPU 1 on host caldera16
[task 1] using GPU 1 on host caldera16
[task 0] using GPU 0 on host caldera16
[task 2] using GPU 0 on host caldera16
[task 1] Contents of data after kernel call:  Hello World!
[task 3] Contents of data after kernel call:  Hello World!
[task 2] Contents of data after kernel call:  Hello World!
[task 0] Contents of data after kernel call:  Hello World!