SLURM Sbatch Script for Spark Applications

Spark applications achieve parallelism by distributing computations to multiple executors under the supervision of a driver process. Each executor uses its cores to run independent tasks of the application. The executor processes are placed on the nodes of a cluster by a cluster manager. The Apache Spark software provides components for setting up a standalone cluster on a set of nodes. The Spark standalone cluster has a master for controlling the driver, running on one core of a node called the master host, and one or more workers controlling the executors, running on a prescribed number of cores on different worker nodes. The master host can simultaneously be one of the worker nodes.

The SLURM workload manager allocates a set of hardware resources. It grants exclusive use of a number of cores on a number of different nodes of GWDG's compute cluster. The command sbatch <scriptfile> reads from <scriptfile> the resources to be allocated and the commands to be executed on these resources.

The following script file start-standalone.script starts a Spark standalone cluster on resources allocated by the SLURM management system:


#!/bin/bash
#SBATCH --partition=medium
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2048
#SBATCH --time=60:00
#SBATCH --output=outfile-%J

sleep infinity

The Spark standalone cluster provides as many executors as specified in the --ntasks option, with a number of cores per executor as specified in the --cpus-per-task option and an amount of memory per executor equal to the number of cores times the value specified in the --mem-per-cpu option. The SLURM system allocates as many nodes as are needed to supply the resources for this configuration of the Spark application. The number of workers in the Spark standalone cluster is equal to the number of allocated nodes. The Spark standalone cluster stays alive for the time requested with the --time option, or until the command scancel <slurm_job_id> is issued, where <slurm_job_id> is the job id returned by the sbatch command.
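
The resource arithmetic described above can be checked with a few lines of shell. This is only a sketch: the helper names executor_memory_mb and total_cores are hypothetical, and the numbers are those of the example batch script.

```shell
#!/bin/bash
# Sketch: derive the Spark executor layout from the sbatch options above.
# The helper names are illustrative and not part of SLURM or Spark.

# Memory per executor in MB: cores per executor times memory per core.
executor_memory_mb() {
  local cpus_per_task=$1 mem_per_cpu=$2
  echo $(( cpus_per_task * mem_per_cpu ))
}

# Total number of cores available to the application.
total_cores() {
  local ntasks=$1 cpus_per_task=$2
  echo $(( ntasks * cpus_per_task ))
}

# Values from the example: --ntasks=4 --cpus-per-task=2 --mem-per-cpu=2048
echo "executors:           4"
echo "cores per executor:  2"
echo "memory per executor: $(executor_memory_mb 2 2048)M"
echo "total cores:         $(total_cores 4 2)"
```

With the example values, each of the 4 executors gets 2 cores and 4096 MB, for 8 cores in total.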

The configuration of the Spark standalone cluster and the SparkContext is set by sourcing the following script:

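# Load Java and Spark
module load JAVA/jdk1.8.0_31 spark

export SPARK_CONF_DIR=~/SparkConf
mkdir -p $SPARK_CONF_DIR

# Files read by Spark from $SPARK_CONF_DIR. The definitions of $env and
# $conf are missing in this listing; the standard file names are assumed.
env=$SPARK_CONF_DIR/spark-env.sh
conf=$SPARK_CONF_DIR/spark-defaults.conf

# Environment for the master and worker processes
echo "export SPARK_LOG_DIR=~/SparkLog" > $env
echo "export SPARK_WORKER_DIR=~/SparkWorker" >> $env
echo "export SLURM_MEM_PER_CPU=$SLURM_MEM_PER_CPU" >> $env
echo 'export SPARK_WORKER_CORES=`nproc`' >> $env

# Environment for the ssh sessions on the worker nodes (overwrites ~/.bashrc)
echo "export SPARK_HOME=$SPARK_HOME" > ~/.bashrc
echo "export JAVA_HOME=$JAVA_HOME" >> ~/.bashrc
echo "export SPARK_CONF_DIR=$SPARK_CONF_DIR" >> ~/.bashrc

# One line per allocated node
scontrol show hostname $SLURM_JOB_NODELIST > $SPARK_CONF_DIR/slaves

# Defaults for the SparkContext
echo "spark.default.parallelism" $(( $SLURM_CPUS_PER_TASK * $SLURM_NTASKS )) > $conf
echo "spark.submit.deployMode" client >> $conf
echo "spark.master" spark://`hostname`:7077 >> $conf
echo "spark.executor.cores" $SLURM_CPUS_PER_TASK >> $conf
echo "spark.executor.memory" $(( $SLURM_CPUS_PER_TASK*$SLURM_MEM_PER_CPU ))M >> $conf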
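
The arithmetic in the configuration script depends on SLURM environment variables that exist only inside a suitable job allocation. A guard like the following could be placed before that arithmetic; it is a minimal sketch, and require_vars is a hypothetical helper name, not part of SLURM or Spark.

```shell
#!/bin/bash
# Sketch: fail early if a SLURM variable used by the configuration is unset.
# require_vars is a hypothetical helper name.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} expands the variable whose name is stored in $name
    if [ -z "${!name:-}" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return $missing
}

# Possible use before writing the Spark configuration files:
if require_vars SLURM_NTASKS SLURM_CPUS_PER_TASK SLURM_MEM_PER_CPU; then
  echo "SLURM variables are set"
else
  echo "not running inside a suitable SLURM allocation" >&2
fi
```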

The command that starts the Spark standalone cluster returns as soon as the master and worker processes have been started. The standalone cluster is not ready for executing Spark applications until the connections between master and workers have been established and the worker processes have been registered with the master process. Sourcing the following script waits until all workers have registered:

# Number of worker hosts listed in the slaves file
num_workers=`cat $SPARK_CONF_DIR/slaves|wc -l`
echo number of workers to be registered: $num_workers
# Most recent log files of the master and of the workers
master_logfile=`ls -tr ${SPARK_LOG_DIR}/*master* |tail -1`
worker_logfiles=`ls -tr ${SPARK_LOG_DIR}/*worker* |tail -$num_workers`
# Seconds between checks; the value is not given in this listing, 3 is assumed
steptime=3
for i in {1..100}
do
  sleep $steptime
  num_reg=` grep 'registered' $worker_logfiles|wc -l`
  if [ $num_reg -eq $num_workers ]
  then
    break
  fi
done
echo registered workers after $((i * steptime)) seconds  :
for file in $worker_logfiles
do
  grep 'registered' $file
done
grep 'Starting Spark master' $master_logfile
grep 'Registering worker' $master_logfile|tail -$num_workers
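
The waiting loop above stops after 100 steps without reporting whether registration actually succeeded. One way to make a timeout visible is to wrap the polling in a function that returns a non-zero status. The sketch below assumes nothing about Spark beyond the log lines shown in this page; wait_for_count is a hypothetical helper, and it accepts "at least" the expected number of matches instead of exact equality.

```shell
#!/bin/bash
# Sketch: poll log files until a pattern has matched the expected number of
# times, and report failure on timeout. wait_for_count is a hypothetical helper.
# Usage: wait_for_count PATTERN EXPECTED TRIES INTERVAL FILE...
wait_for_count() {
  local pattern=$1 expected=$2 tries=$3 interval=$4
  shift 4
  local i count
  for (( i = 1; i <= tries; i++ )); do
    # cat merges all files so that grep -c yields a single total
    count=$(cat "$@" 2>/dev/null | grep -c -- "$pattern")
    if [ "$count" -ge "$expected" ]; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}

# Demonstration with a temporary file standing in for a worker log
demo_log=$(mktemp)
echo "INFO Worker: Successfully registered with master" > "$demo_log"
if wait_for_count 'registered' 1 3 1 "$demo_log"; then
  echo "all workers registered"
else
  echo "timeout while waiting for workers" >&2
fi
rm -f "$demo_log"
```

In the batch script, a failed wait could then be used to stop the job early instead of leaving an unusable cluster running until the time limit.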

The script signals that the standalone cluster is ready by writing corresponding information lines into the output file specified in the --output option of the batch script:

number of workers to be registered: 3
registered workers after 114 seconds :
19/08/11 12:26:25 INFO Worker: Successfully registered with master spark://
19/08/11 12:27:07 INFO Worker: Successfully registered with master spark://
19/08/11 12:26:29 INFO Worker: Successfully registered with master spark://
19/08/11 12:26:24 INFO Master: Starting Spark master at spark://
19/08/11 12:26:25 INFO Master: Registering worker with 4 cores, 8.0 GB RAM
19/08/11 12:26:29 INFO Master: Registering worker with 2 cores, 4.0 GB RAM
19/08/11 12:27:07 INFO Master: Registering worker with 2 cores, 4.0 GB RAM 

Only after these lines have appeared in the output file can Spark applications be submitted, from the same login node from which the SLURM batch job was started:

gwdu102 > module load JAVA spark
gwdu102 > export SPARK_CONF_DIR=~/SparkConf
gwdu102 > spark-submit  ~/SparkApp/target/scala-2.11/sparkapp_2.11-0.1.0-SNAPSHOT.jar
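
Checking the output file by eye can be replaced by a small test before calling spark-submit. The following is a sketch only: cluster_ready is a hypothetical helper that relies on the line formats shown in the sample output above, and the name of the output file depends on the --output option of the batch script.

```shell
#!/bin/bash
# Sketch: decide from the job's output file whether the cluster is ready.
# cluster_ready is a hypothetical helper; it relies on the line formats
# shown in the sample output.
cluster_ready() {
  local outfile=$1 expected registered
  # Number announced by the line "number of workers to be registered: N"
  expected=$(sed -n 's/^number of workers to be registered: *//p' "$outfile" | head -n 1)
  [ -n "$expected" ] || return 1
  registered=$(grep -c 'Successfully registered with master' "$outfile")
  [ "$registered" -ge "$expected" ]
}

# Demonstration with lines taken from the sample output
demo=$(mktemp)
cat > "$demo" <<'EOF'
number of workers to be registered: 3
19/08/11 12:26:25 INFO Worker: Successfully registered with master spark://
19/08/11 12:27:07 INFO Worker: Successfully registered with master spark://
19/08/11 12:26:29 INFO Worker: Successfully registered with master spark://
EOF
if cluster_ready "$demo"; then
  echo "cluster is ready"
fi
rm -f "$demo"
```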

To execute the Spark application as a batch job, the batch script has to be modified by replacing the command sleep infinity with

spark-submit ~/SparkApp/target/scala-2.11/sparkapp_2.11-0.1.0-SNAPSHOT.jar

Values for the Spark environment variables SPARK_CONF_DIR, SPARK_LOG_DIR and SPARK_WORKER_DIR have to be supplied in the script. The files from successive executions of Spark applications are kept in the SPARK_LOG_DIR and SPARK_WORKER_DIR directories; they have to be deleted by hand when no longer needed. The second group of echo commands in the script file creates the file ~/.bashrc, overwriting any existing one. This file is sourced by the ssh commands to set the environment on the worker nodes before the worker processes are started. Any existing ~/.bashrc file should therefore be saved before sourcing the script file.
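
Saving an existing ~/.bashrc can be done with a plain cp; the sketch below wraps this in a function so that a missing file is not an error. The helper name backup_file and the ".bak.<timestamp>" suffix are arbitrary choices made for illustration.

```shell
#!/bin/bash
# Sketch: keep a copy of a file before it gets overwritten.
# backup_file is a hypothetical helper; the backup suffix is arbitrary.
backup_file() {
  local file=$1
  local backup
  [ -f "$file" ] || return 0          # nothing to save
  backup="$file.bak.$(date +%Y%m%d%H%M%S)"
  # -p preserves mode and timestamps; print the backup name on success
  cp -p "$file" "$backup" && echo "$backup"
}
```

Calling backup_file ~/.bashrc before the configuration script is sourced prints the name of the copy, which can be moved back once the Spark cluster is no longer needed.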

An introduction to the Apache Spark system can be found in the chapter user provided application documentation of GWDG's HPC documentation, under the heading Parallel Processing with Spark on GWDG’s Scientific Compute Cluster. Detailed explanations for setting up a Spark standalone cluster on resources allocated by SLURM sbatch jobs are given in the sections Setting up a Spark Standalone Cluster, Submitting to the Spark Standalone Cluster and Running Spark Applications on GWDG's Scientific Compute Cluster.

wiki/hpc/slurm_sbatch_script_for_spark_applications.txt · Last modified: 2019/08/11 15:23 by ohaan