Math Kernel Library (MKL)


The Intel Math Kernel Library (MKL) is a library of optimized, threaded, general-purpose math routines. CISL recommends using it on Cheyenne for applications that otherwise spend substantial computational time in non-optimized routines that perform the same calculations.

In addition to using the examples below, you may find these links useful.

Check the version numbers to confirm that they are compatible with those of the Intel compiler and MKL you are using.
 


Prerequisites

Before running the examples below, load these environment modules:

  • ncarenv
  • intel
  • ncarcompilers
  • mkl
  • mpt
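
As a minimal sketch, the modules listed above can be loaded in one command. The snippet assumes the module command is available in your session, as it is in a Cheyenne login shell, and that the mkl module defines MKLROOT (Examples 2 and 3 rely on that variable). The guard only keeps the snippet harmless on systems without environment modules.

```shell
# Modules named in the list above.
MODULES="ncarenv intel ncarcompilers mkl mpt"

if command -v module >/dev/null 2>&1; then
    module load $MODULES                 # load all five modules
    module list                          # confirm what is loaded
    echo "MKLROOT=${MKLROOT:-unset}"     # MKLROOT is used in Examples 2 and 3
else
    echo "module command not found; run this in a Cheyenne login session"
fi
```

The module load command takes the same form in tcsh, the shell used by the batch scripts on this page.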

Loading the MKL module gives you access to many MKL examples. Examples 2 and 3 below show how to copy the files and untar them in a directory of your own.

The examples below assume that you are using SGI's Message Passing Toolkit (MPT) MPI library and that the programs being run were compiled with Intel 16. Contact cislhelp@ucar.edu for assistance with adapting them for other cases.

Example 1: Sequential program calling an MKL routine

This example shows how to use the Intel Fortran compiler, ifort, to compile a program that depends on MKL. Although the program itself is sequential (it contains no OpenMP directives), it is well worth compiling and linking it with ifort's -qopenmp option: MKL is threaded, so the MKL routine runs faster when that option is used.

The program in this example calls the BLAS subroutine DGEMM, which is included in MKL. To begin, create a file named tdgemm.f90 (the name used in the compile command below) with the following source code.

program main
! This program uses subroutine DGEMM
! to perform matrix-matrix operation
! c := alpha*op( a )*op( b ) + beta*c
!
 implicit none
  integer :: i, j
  ! alpha and beta are the scalars in c := alpha*a*b + beta*c (set below)
  double precision :: alpha, beta
  integer :: m, n, k
  parameter (m=16000, k=16000, n=16000)
  double precision :: a(m,k), b(k,n), c(m,n)
!
  print *, "This example computes real matrix C=alpha*A*B+beta*C"
  print *, "using Intel® MKL function DGEMM, where a, b, and c"
  print *, "are matrices and alpha and beta are double precision "
  print *, "scalars"
  print *, ""

  print *, "Initializing data for matrix multiplication C=A*B for "
  print 10, " matrix a(",m," x",k, ") and matrix b(", k," x", n, ")"
  print *, ""
  alpha = 1.0d0
  beta = 0.0d0
!
  print *, "Initializing matrix data"
  a = 1.0d0
  b = 1.1d0
  c = 0.0d0
!
  print *, "Computing matrix product using Intel® MKL DGEMM "
  print *, "subroutine"
  call DGEMM('n','n',m,n,k,alpha,a,m,b,k,beta,c,m)
  print *, "Computations completed."
  print *, ""
!
  print *, "Top left corner of matrix A:"
  print 20, ((a(i,j), j = 1,min(k,6)), i = 1,min(m,6))
  print *, ""
  print *, "Top left corner of matrix B:"
  print 20, ((b(i,j),j = 1,min(n,6)), i = 1,min(k,6))
  print *, ""
  print *, "Top left corner of matrix C:"
  print 30, ((c(i,j), j = 1,min(n,6)), i = 1,min(m,6))
  print *, ""
!
 10   format(a,i5,a,i5,a,i5,a,i5,a)
 20   format(6(f12.0,1x))
 30   format(6(es12.4,1x))
!
  print *, "Example completed."
end program main

Be sure you have the necessary environment modules loaded. (See Prerequisites above.) Compile the program as shown here:

ifort -O2 -qopenmp -o tdgemm.exe tdgemm.f90

Next, create a batch script like the following example and submit it with the qsub command. The job runs on one node with 8 OpenMP threads:

#!/bin/tcsh
#PBS -N omp
#PBS -A project_code
#PBS -l walltime=00:10:00
#PBS -q regular
#PBS -j oe
#PBS -m abe
#PBS -M email_address
#PBS -l select=1:ncpus=1:ompthreads=8

mkdir -p /glade/scratch/username/temp
setenv TMPDIR /glade/scratch/username/temp

### Run the executable
omplace -nt $OMP_NUM_THREADS ./tdgemm.exe

#endjob
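
Submitting the script might look like the following sketch. The file name run_tdgemm.pbs is hypothetical; use whatever name you saved the batch script under. The guard means the snippet does nothing harmful on a machine without PBS.

```shell
JOB_SCRIPT=run_tdgemm.pbs   # hypothetical file name for the batch script above

if command -v qsub >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    qsub "$JOB_SCRIPT"      # prints the job ID on success
    qstat -u "$USER"        # list your queued and running jobs
else
    echo "qsub or $JOB_SCRIPT not available; run this on a Cheyenne login node"
fi
```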

The expected output includes lines like this:

  1.7600E+04   1.7600E+04   1.7600E+04   1.7600E+04   1.7600E+04   1.7600E+04
  1.7600E+04   1.7600E+04   1.7600E+04   1.7600E+04   1.7600E+04   1.7600E+04
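
The value 1.7600E+04 can be checked by hand from the data the program sets: every element of a is 1.0, every element of b is 1.1, alpha is 1, beta is 0, and k is 16000. Each element of c is therefore the same sum:

```latex
\begin{aligned}
c_{ij} &= \alpha \sum_{l=1}^{k} a_{il}\, b_{lj} + \beta\, c_{ij} \\
       &= 1.0 \times \sum_{l=1}^{16000} (1.0)(1.1) + 0.0 \times 0.0 \\
       &= 16000 \times 1.1 = 17600 = 1.7600 \times 10^{4}
\end{aligned}
```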

It is a good exercise to rerun the job with this alternative for the select statement, then compare the timings of the two jobs:

#PBS -l select=1:ncpus=1:ompthreads=1

Compare the timings by running qstat -xwu $USER. In one test, the job with 8 threads ran in 38 seconds and the single-threaded job took 4 minutes, 20 seconds.
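
To put those two timings in perspective, the sketch below computes the speedup and the parallel efficiency from the figures quoted above. It uses whole-job wall-clock times, which include program startup and matrix initialization as well as the DGEMM call, so treat the result as a rough indication only.

```shell
# Timings reported in the text above, in seconds.
T1=260      # 1 thread: 4 minutes, 20 seconds
T8=38       # 8 threads: 38 seconds
THREADS=8

# Speedup = single-thread time / multi-thread time.
SPEEDUP=$(awk -v t1="$T1" -v t8="$T8" 'BEGIN { printf "%.2f", t1 / t8 }')
# Efficiency = speedup / number of threads, as a percentage.
EFFICIENCY=$(awk -v t1="$T1" -v t8="$T8" -v n="$THREADS" \
    'BEGIN { printf "%.0f", 100 * t1 / (t8 * n) }')

echo "Speedup with $THREADS threads: ${SPEEDUP}x"   # prints 6.84x
echo "Parallel efficiency: ${EFFICIENCY}%"          # prints 86%
```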


Example 2: Intel-supplied sequential program calling an MKL routine

The official MKL example uses a program named dgemmx.f, which reads its input data from the file dgemmx.d. After you get the program by following the instructions below, you can study the Fortran code to see how it checks for and reports errors. The shorter, illustrative program in Example 1 has no such checks, although both examples perform the same matrix multiplication by calling DGEMM.

Here's how to get the example program and dependencies on Cheyenne.

  1. Load the minimum recommended set of environment modules. See Prerequisites above.
  2. Create a clean directory on Cheyenne, for example /glade/scratch/username/mkl_dgemm.
  3. Change to that directory with the cd command, then run these commands.
cp $MKLROOT/examples/examples_core_f.tgz .
gunzip examples_core_f.tgz;  tar xvf examples_core_f.tar
cd blas
make libintel64 function=dgemm
cd _results/intel_lp64_parallel_intel_iomp5_intel64_lib/
cat dgemmx.res

Expected numerical output (last four lines):

OUTPUT DATA
ARRAY C
     8.950     8.950     7.750     7.750     7.750
    19.110    19.110    17.910    17.910    17.910

Example 3: Using MKL with the MPI parallel programming model

This example uses an MKL Fast Fourier Transform (FFT) and an MKL makefile to build a binary executable file. This will familiarize you with MKL's extensive example files and help you adapt them to the applications that you run on the Cheyenne system. The examples will guide you in calling MKL routines correctly and in using the library in your applications.

Here's how to get the example program and dependencies on Cheyenne.

  1. Load the minimum recommended set of environment modules. See Prerequisites above.
  2. Create a clean directory on Cheyenne, for example /glade/scratch/username/cluster.
  3. Change to that directory with the cd command, then run these commands.
cp $MKLROOT/examples/examples_cluster_f.tgz .
gunzip examples_cluster_f.tgz;  tar xvf examples_cluster_f.tar
cd cdftf
make libintel64 example=dm_complex_2d_double_ex1 mpi=custom

The make command builds the executable and then tries to run it; the run step fails unless you execute make from a PBS interactive job. Alternatively, run the executable from a batch script like the following:

#!/bin/tcsh
#PBS -N mpi_job
#PBS -A project_code
#PBS -l walltime=00:01:00
#PBS -q regular
#PBS -j oe
#PBS -l select=1:ncpus=36:mpiprocs=36
#PBS -m abe
#PBS -M email_address

mkdir -p /glade/scratch/username/temp
setenv TMPDIR /glade/scratch/username/temp

### Run the executable
mpiexec_mpt omplace _results/intel_custom_lp64_intel64_lib_parallel/dm_complex_2d_double_ex1.exe < data/dm_complex_2d_double_ex1.dat

# endjob

Finally, check the results of the run. If your test is successful, your job's output file will include the message "TEST PASSED" towards the end.
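
One way to check for that message is sketched below. The helper name check_passed is made up for illustration, and the output file name in the usage comment is an assumption based on PBS defaults for a job named mpi_job with stdout and stderr joined; substitute the name of your own job's output file.

```shell
# check_passed FILE: succeed if FILE contains the text "TEST PASSED".
check_passed() {
    if grep -q "TEST PASSED" "$1"; then
        echo "$1: TEST PASSED found"
    else
        echo "$1: TEST PASSED not found" >&2
        return 1
    fi
}

# Usage (hypothetical job ID):
#   check_passed mpi_job.o1234567
```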