Compiling multi-GPU MPI/CUDA code on Caldera

To build and run a multi-GPU, MPI/CUDA application on the Caldera cluster, follow the example below. It uses the Intel compiler, which is loaded by default when you log in to Yellowstone.

Copy the sample files from the following directory to your own GLADE file space:

/glade/p/CSG/Examples/mpi_cuda_hello

Load the CUDA module.

module load cuda

Cheyenne users

Documentation in development.

Yellowstone users

Start an interactive job (or run a batch job) using Caldera's gpgpu queue as shown.

bsub -Is -q gpgpu -R "span[ptile=2]" -W 1:00 -n 2 -P project_code $SHELL
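
If you prefer to submit a batch job instead, a minimal LSF batch script might look like the following sketch. The job name and output file names are placeholders; the queue, resource, wall-clock, task-count, and project settings match the interactive command above.

#!/bin/bash
#BSUB -J mpi_cuda_hello          # job name (placeholder)
#BSUB -q gpgpu                   # Caldera's GPU queue
#BSUB -n 2                       # two MPI tasks
#BSUB -R "span[ptile=2]"         # place both tasks on one node
#BSUB -W 1:00                    # wall-clock limit (hh:mm)
#BSUB -P project_code            # replace with your project code
#BSUB -o hello.%J.out            # standard output file
#BSUB -e hello.%J.err            # standard error file

module load cuda
mpirun.lsf ./hello

Submit the script with bsub < script_name once you have built the executable as described below.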

Using the NVIDIA compiler (nvcc), compile the portions of your code that contain CUDA calls. (As an alternative to running each of the following compile and link steps separately, you can run make to automate them; the necessary makefile is included with the sample files.)

nvcc -c gpu_driver.cu
nvcc -c hello.cu
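
For orientation, the CUDA portion of such a code typically consists of a kernel plus a host-callable driver routine that the MPI code invokes. The sketch below is hypothetical: the function names, the trivial kernel, and the device-selection logic are illustrative assumptions, not the contents of the sample files.

// Hypothetical sketch of a CUDA source file; not the sample's actual code.
#include <cuda_runtime.h>

__global__ void hello_kernel(char *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;   // placeholder transformation of the buffer
}

// C linkage so the MPI code in main.c can call this routine.
extern "C" void launch_hello(char *host_data, int n, int rank)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 0)
        cudaSetDevice(rank % ndev);        // one GPU per MPI task

    char *dev_data;
    cudaMalloc((void **)&dev_data, n);
    cudaMemcpy(dev_data, host_data, n, cudaMemcpyHostToDevice);

    hello_kernel<<<1, n>>>(dev_data, n);

    cudaMemcpy(host_data, dev_data, n, cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
}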

Compile any portions of the code containing MPI calls with mpiicpc.

mpiicpc -c main.c
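
The MPI portion might be structured like the following sketch. It assumes a host-callable routine such as the hypothetical launch_hello() shown above; the sample's actual main.c differs in its details.

/* Hypothetical sketch of main.c; not the sample's actual code. */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

/* Defined in the CUDA code (hypothetical name; see the sketch above). */
#ifdef __cplusplus
extern "C"
#endif
void launch_hello(char *data, int n, int rank);

int main(int argc, char **argv)
{
    int rank, ntasks;
    char data[] = "some input data";   /* placeholder buffer */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    printf("[task %d of %d] before kernel call: %s\n", rank, ntasks, data);

    /* Each MPI task hands its buffer to a GPU via the driver routine. */
    launch_hello(data, (int)strlen(data), rank);

    printf("[task %d of %d] after kernel call:  %s\n", rank, ntasks, data);

    MPI_Finalize();
    return 0;
}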

Link the object files with mpiicpc.

mpiicpc -o hello gpu_driver.o hello.o main.o
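
For reference, a makefile that automates the compile and link steps above might look like this sketch; it illustrates the idea and is not necessarily the makefile shipped with the samples. Note that each recipe line must begin with a tab character.

# Sketch of a makefile automating the steps above.
hello: gpu_driver.o hello.o main.o
	mpiicpc -o hello gpu_driver.o hello.o main.o

gpu_driver.o: gpu_driver.cu
	nvcc -c gpu_driver.cu

hello.o: hello.cu
	nvcc -c hello.cu

main.o: main.c
	mpiicpc -c main.c

clean:
	rm -f hello *.o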

Launch the executable with mpirun.lsf.

mpirun.lsf ./hello

Sample output:

Execute poe command line: poe  ./hello
Using 2 MPI Tasks
[task 0] Contents of data before kernel call: HdjikhjcZ
[task 1] Contents of data before kernel call: HdjikhjcZ
[task 0] using GPU 0 on host caldera07
[task 1] using GPU 1 on host caldera07
[task 0] Contents of data after kernel call:  Hello World!
[task 1] Contents of data after kernel call:  Hello World!