Parallel analysis using Rmpi

HowTo by RubenArslan

This page documents the steps that were necessary to run a parallel MCMCglmm analysis using MPI (Message Passing Interface) on the GWDG Intel cluster.

1. Preparation

1a. Get an account (support@gwdg.de).

1b. Log in with your standard user name and password (e.g. in the Mac OS X Terminal): ssh username@gwdu102.gwdg.de

1c. Create your own R library in the home directory. To find out which directory R expects for the user library: Rscript -e "Sys.getenv('R_LIBS_USER')"

To create: mkdir -p ~/R/x86_64-redhat-linux-gnu-library/3.1 (this path will change with different R versions).
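Because the library path changes with the R version, the directory can also be created from whatever R reports instead of typing it by hand. The following is a minimal sketch, not part of the original steps: expand_tilde is a hypothetical helper name, and it assumes R may report R_LIBS_USER with a leading "~", which mkdir would otherwise take literally when the value is quoted.

```shell
# Hypothetical helper: replace a leading "~" with $HOME.
expand_tilde() {
  case "$1" in
    "~")   printf '%s\n' "$HOME" ;;
    "~/"*) rest=${1#??}              # drop the first two characters ("~/")
           printf '%s\n' "$HOME/$rest" ;;
    *)     printf '%s\n' "$1" ;;
  esac
}

# Only runs where Rscript is available (e.g. on the cluster login node).
if command -v Rscript >/dev/null 2>&1; then
  lib_dir=$(expand_tilde "$(Rscript -e "cat(Sys.getenv('R_LIBS_USER'))")")
  if [ -n "$lib_dir" ]; then
    mkdir -p "$lib_dir"
    echo "R user library: $lib_dir"
  fi
fi
```

After this, install.packages() should be able to install into the user library without asking to create it.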

1d. Prepare MPI: Run the module load commands below in the terminal. If MPI or MKL has been updated since this was written, the module load will fail. In that case, use

module avail

to find out the current versions and adjust the module names accordingly.

module load intel/mkl/64/11.2/2015.3.187
echo $PATH
module load openmpi/intel/64/1.8.5
echo $PATH

The entry that was added to $PATH after loading openmpi shows where MPI is installed. This location is needed as a configuration parameter (--with-mpi) for the Rmpi installation.
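Spotting the new entry by eye in two long echo $PATH outputs is error-prone. As a sketch, the comparison can be done with a small shell function. path_added is a hypothetical helper name, and the paths in the demo call are made up for illustration.

```shell
# Hypothetical helper: print the entries of the second PATH-style
# string that do not occur in the first one.
path_added() {
  old=":$1:"
  printf '%s\n' "$2" | tr ':' '\n' | while IFS= read -r entry; do
    [ -n "$entry" ] || continue
    case "$old" in
      *":$entry:"*) ;;                    # already present before
      *) printf '%s\n' "$entry" ;;        # newly added entry
    esac
  done
}

# Demo with made-up paths:
path_added "/usr/bin:/bin" "/opt/example-mpi/bin:/usr/bin:/bin"
```

On the cluster, the intended use would be to store the old value first (before=$PATH), run module load openmpi/intel/64/1.8.5, and then call path_added "$before" "$PATH".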

1e. Install the necessary packages:

# Open R
R
# install Rmpi
install.packages(
  "Rmpi",
  repos = "http://ftp5.gwdg.de/pub/misc/cran/",
  dependencies = TRUE,
  configure.args = c(
    # this is where MPI is located after using the module loader, unless MPI was updated
    "--with-mpi=/cm/shared/apps/openmpi/intel/64/1.8.5/"
  )
)
# most other packages are easier to install
install.packages(c("foreach","doMPI","coda","MCMCglmm"), repos = "http://ftp5.gwdg.de/pub/misc/cran/")

When you're done installing packages, you can quit R with q()

2. Test

mpirun -H localhost -n 3 Rscript -e "library(doMPI); cl <- startMPIcluster(2); registerDoMPI(cl); foreach(i=1:3) %dopar% print(i); closeCluster(cl); mpi.quit()"

This should count from 1 to 3.

3. Your real analysis.

In my case this involved

3a. Installing Transmit (or another SFTP client) to upload my scripts and data files to my home directory.

3b. Repeating step 1d after each login (1b), namely

module load intel/mkl/64/11.2/2015.3.187 && module load openmpi/intel/64/1.8.5

(this could be put into a job script for ease of use).

3c. Sending my script out on the cluster with bsub:

bsub -a openmpi -q mpi-short -W 1:00 -n 40 -R np20 mpirun.lsf R --slave -f "parallelised_script.r"
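The module loading from 3b and the bsub options from 3c could be combined into one job script, as suggested above. The following is only a sketch under assumptions: the file name mcmcglmm_job.sh is made up, the #BSUB lines simply restate the options of the command above as embedded LSF directives, and it assumes that the module command is available inside batch jobs on this cluster, which has not been verified here.

```shell
# Write a job script that restates the bsub options as #BSUB directives.
# The file name is hypothetical; module versions are those from step 1d.
cat > mcmcglmm_job.sh <<'EOF'
#!/bin/sh
#BSUB -a openmpi
#BSUB -q mpi-short
#BSUB -W 1:00
#BSUB -n 40
#BSUB -R np20
module load intel/mkl/64/11.2/2015.3.187
module load openmpi/intel/64/1.8.5
mpirun.lsf R --slave -f "parallelised_script.r"
EOF

echo "Wrote mcmcglmm_job.sh"
```

The script would then be submitted with bsub < mcmcglmm_job.sh, so that LSF reads the #BSUB lines.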

A stripped-down example of my parallel MCMCglmm analysis:

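library(doMPI)
# Attach to the MPI workers started by mpirun and register them with foreach
cl <- startMPIcluster()
registerDoMPI(cl)
cat("Step 1\n")
# Run one independent chain per worker; the seed makes the parallel random numbers reproducible
Model <- foreach(i = 1:clusterSize(cl), .options.mpi = list(seed = 1337)) %dopar% {

    cat("Step 2\n")
    # Each worker loads the package and the data itself
    library(MCMCglmm)
    load("mydata.rdata")
    nitt <- 7000; thin <- 50; burnin <- 3000
    MCMCglmm(outcome ~ pred,
        random = ~idParents,
        family = "poisson",
        data = mydata,
        pr = FALSE, saveX = TRUE, saveZ = TRUE,
        nitt = nitt, thin = thin, burnin = burnin)
}
library(coda)
# Collect the posterior samples ($Sol) of all chains in one mcmc.list for coda
mcmclist <- mcmc.list(lapply(Model, function(x) x$Sol))
save(Model, mcmclist, file = "Model.rdata")
closeCluster(cl)
mpi.quit()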

These resources helped me do this:

Scientific Computing