======= Parallel analysis using Rmpi =======

===== HowTo by =====
[[https://psych.uni-goettingen.de/de/biopers/team/arslan/|RubenArslan]]

This documents the steps that were necessary to run a parallel MCMCglmm analysis using MPI (Message Passing Interface) on the GWDG Intel cluster.

==== 1. Preparation ====

1a. Get an account (support@gwdg.de).

1b. Log in with your standard user name and password (e.g. in the Mac OS X Terminal).
''ssh username@gwdu102.gwdg.de''

1c. Create your own R library in your home directory. To find out which directory R uses as the user library:
''Rscript -e "Sys.getenv('R_LIBS_USER')"''
To create it:
''mkdir -p ~/R/x86_64-redhat-linux-gnu-library/3.1''
(this path changes with the R version).

1d. Prepare MPI by executing the following in the terminal. If MPI or MKL have been updated in the meantime, ''module load'' will fail; use ''module avail'' to find the current versions.

  module load intel/mkl/64/11.2/2015.3.187
  echo $PATH
  module load openmpi/intel/64/1.8.5
  echo $PATH

The directory that was added to ''$PATH'' after loading openmpi is needed as a configuration parameter for the Rmpi installation (see 1e).

1e. Install the necessary packages. Open R by typing ''R'' in the terminal, then run:

  # install Rmpi
  install.packages(
    "Rmpi",
    repos = "http://ftp5.gwdg.de/pub/misc/cran/",
    dependencies = TRUE,
    configure.args = c(
      # this is where MPI is located after using the module loader, unless MPI was updated
      "--with-mpi=/cm/shared/apps/openmpi/intel/64/1.8.5/"
    )
  )
  # most other packages are easier to install
  install.packages(c("foreach", "doMPI", "coda", "MCMCglmm"), repos = "http://ftp5.gwdg.de/pub/misc/cran/")

When you're done installing packages, you can quit R with ''q()''.

==== 2. Test ====

  mpirun -H localhost -n 3 Rscript -e "library(doMPI); cl <- startMPIcluster(2); registerDoMPI(cl); foreach(i=1:3) %dopar% print(i); closeCluster(cl); mpi.quit()"

This should count from 1 to 3.

==== 3. Your real analysis ====

In my case this involved:

3a.
Installing Transmit (or another SFTP client) to upload my scripts and data files to my home directory.

3b. Repeating step 1d after each login (1b), namely

  module load intel/mkl/64/11.2/2015.3.187 && module load openmpi/intel/64/1.8.5

(this could be put into a job script for ease of use).

3c. Submitting my script to the cluster with bsub:

  bsub -a openmpi -q mpi-short -W 1:00 -n 40 -R np20 mpirun.lsf R --slave -f "parallelised_script.r"

A stripped-down example of my parallel MCMCglmm analysis:

  library(doMPI)
  cl <- startMPIcluster()
  registerDoMPI(cl)
  cat("Step 1\n")
  # run one independent chain per worker
  Model = foreach(i = 1:clusterSize(cl), .options.mpi = list(seed = 1337)) %dopar% {
    cat("Step 2\n")
    library(MCMCglmm)
    load("mydata.rdata")
    nitt = 7000; thin = 50; burnin = 3000
    MCMCglmm(outcome ~ pred,
      random = ~idParents,
      family = "poisson",
      data = mydata,
      pr = FALSE, saveX = TRUE, saveZ = TRUE,
      nitt = nitt, thin = thin, burnin = burnin)
  }
  library(coda)
  # collect the posterior samples (Sol) of all chains in one mcmc.list
  mcmclist = mcmc.list(lapply(Model, FUN = function(x) { x$Sol }))
  save(Model, mcmclist, file = "Model.rdata")
  closeCluster(cl)
  mpi.quit()

These resources helped me do this:
  * [[http://www.stats.uwo.ca/faculty/yu/Rmpi/install.htm]]
  * [[http://stackoverflow.com/questions/25082565/run-rmpi-on-cluster-specify-library-path/]]

[[Kategorie: Scientific Computing]]
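Step 3b mentions that the module loading could be put into a job script. The following is a minimal sketch of such a wrapper, not a tested GWDG recipe. The function names ''load_modules'' and ''submit'', the ''SUBMIT'' switch and the ''DEFAULT_*'' variables are hypothetical names chosen for illustration; the module versions and the bsub options are copied from steps 1d and 3c and will need adjusting if the cluster software was updated.

```bash
# Module versions from step 1d; check `module avail` if loading fails.
MKL_MODULE="intel/mkl/64/11.2/2015.3.187"
MPI_MODULE="openmpi/intel/64/1.8.5"

# Defaults taken from the bsub call in step 3c.
DEFAULT_CORES="40"
DEFAULT_WALLTIME="1:00"

# Load the modules only where the `module` command exists (i.e. on the cluster).
load_modules() {
  if command -v module >/dev/null 2>&1; then
    module load "$MKL_MODULE" && module load "$MPI_MODULE"
  else
    echo "module command not found; skipping module load" >&2
  fi
}

# Usage: submit SCRIPT [CORES] [WALLTIME]
# Prints the bsub command by default (dry run); set SUBMIT=1 to really submit.
submit() {
  if [ -z "${1:-}" ]; then
    echo "usage: submit SCRIPT [CORES] [WALLTIME]" >&2
    return 1
  fi
  # Replace the function's arguments with the full bsub command line.
  set -- bsub -a openmpi -q mpi-short -W "${3:-$DEFAULT_WALLTIME}" \
    -n "${2:-$DEFAULT_CORES}" -R np20 mpirun.lsf R --slave -f "$1"
  if [ "${SUBMIT:-0}" = "1" ]; then
    load_modules && "$@"
  else
    echo "$*"
  fi
}
```

After sourcing this file, ''submit parallelised_script.r'' only prints the command that would be run, which makes it easy to check the options before setting ''SUBMIT=1'' to actually hand the job to bsub.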