======= SLURM Sbatch Script for Spark Applications =======
Spark applications use parallelism by distributing computations to multiple executors under the supervision of a driver process. Each executor uses its cores to run independent tasks of the application. The executor processes are placed on the nodes of a cluster under the supervision of a cluster manager. The Apache Spark software provides the components for setting up its own standalone cluster manager.

The SLURM workload manager allocates sets of hardware resources: it grants exclusive use of a number of cores on a number of nodes of GWDG's compute cluster. The ''sbatch'' script described below starts a Spark standalone cluster on the resources allocated by SLURM.

The following script file sets up the Spark standalone cluster inside a SLURM batch job:

<code>
#!/bin/bash

#SBATCH --partition=medium
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2048
#SBATCH --time=60:00
#SBATCH --output=outfile-%J

. setenv.sh
$SPARK_HOME/sbin/start-all.sh
. wait-worker.sh

sleep infinity
</code>

The Spark standalone cluster provides as many executors as specified by the ''--ntasks'' option. Each executor uses the number of cores given by ''--cpus-per-task'' and an amount of memory equal to the product of ''--mem-per-cpu'' and the number of cores per executor.
The Spark standalone cluster stays alive for the walltime requested with the ''--time'' option.

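The relation between the SBATCH options and the executor resources can be verified with a few lines of shell arithmetic. The variable assignments below mirror the options of the script above; inside a running job, SLURM sets these variables itself:

```shell
#!/bin/bash
# Values mirroring the #SBATCH options above; inside a job, SLURM
# exports SLURM_NTASKS, SLURM_CPUS_PER_TASK and SLURM_MEM_PER_CPU itself.
SLURM_NTASKS=4          # --ntasks: one executor per task
SLURM_CPUS_PER_TASK=2   # --cpus-per-task: cores per executor
SLURM_MEM_PER_CPU=2048  # --mem-per-cpu: memory per core in MB

executors=$SLURM_NTASKS
cores_per_executor=$SLURM_CPUS_PER_TASK
mem_per_executor=$(( SLURM_MEM_PER_CPU * SLURM_CPUS_PER_TASK ))
total_cores=$(( executors * cores_per_executor ))

echo "executors: $executors, cores each: $cores_per_executor"
echo "memory per executor: ${mem_per_executor}M, total cores: $total_cores"
```

With four executors of two cores each, a Spark application can run eight tasks in parallel, each task having 2048 MB of memory available.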
This configuration of the Spark standalone cluster and of the SparkContext is set by sourcing the script ''setenv.sh'':

<code - setenv.sh>
# Prepare the configuration of the Spark standalone cluster.
# The directories ~/SparkConf, ~/SparkLog and ~/SparkWorker are
# examples; any writable locations can be used.
module load JAVA spark

export SPARK_CONF_DIR=~/SparkConf
mkdir -p $SPARK_CONF_DIR

env=$SPARK_CONF_DIR/spark-env.sh
echo "export SPARK_LOG_DIR=$HOME/SparkLog"            >  $env
echo "export SPARK_WORKER_DIR=$HOME/SparkWorker"      >> $env
echo "export SPARK_WORKER_CORES=$SLURM_CPUS_PER_TASK" >> $env
echo 'export SPARK_MASTER_HOST=`hostname -f`'         >> $env
echo 'export SPARK_LOCAL_IP=`hostname -f`'            >> $env

echo "export JAVA_HOME=$JAVA_HOME"      >> $env
echo "export SPARK_DAEMON_MEMORY=1024m" >> $env
echo "export SPARK_WORKER_MEMORY=$(( SLURM_CPUS_PER_TASK * SLURM_MEM_PER_CPU ))m" >> $env

scontrol show hostname $SLURM_JOB_NODELIST > $SPARK_CONF_DIR/slaves

conf=$SPARK_CONF_DIR/spark-defaults.conf
echo "spark.master spark://$(hostname -f):7077"  >  $conf
echo "spark.submit.deployMode client"            >> $conf
echo "spark.executor.cores $SLURM_CPUS_PER_TASK" >> $conf
echo "spark.executor.memory $(( SLURM_CPUS_PER_TASK * SLURM_MEM_PER_CPU ))m" >> $conf
echo "spark.default.parallelism $(( SLURM_NTASKS * SLURM_CPUS_PER_TASK ))"   >> $conf
</code>
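
''setenv.sh'' uses two kinds of quoting on purpose: double-quoted strings are expanded while the script runs on the first node of the job, whereas single-quoted strings are written to ''spark-env.sh'' verbatim, so that commands in backticks, such as ''hostname'', are evaluated only when a node sources the file. A minimal sketch of the difference (the temporary file created by ''mktemp'' stands in for ''spark-env.sh''):

```shell
#!/bin/bash
# Contrast of immediate and deferred expansion as used in setenv.sh;
# the temporary file stands in for spark-env.sh.
cores=2
demo=$(mktemp)

echo "export CORES=$cores"    >  $demo  # double quotes: $cores expanded now
echo 'export HOST=`hostname`' >> $demo  # single quotes: written verbatim

cat $demo
# export CORES=2
# export HOST=`hostname`
```

This is what makes one generated ''spark-env.sh'' valid on every node of the job: each worker fills in its own host name when it sources the file.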

The Spark standalone cluster is started by the ''start-all.sh'' command in the batch script. The script ''wait-worker.sh'' then waits until all worker processes have registered with the master:

<code - wait-worker.sh>
# Wait until all worker processes have registered with the Spark master.
. $SPARK_CONF_DIR/spark-env.sh
num_workers=`cat $SPARK_CONF_DIR/slaves | wc -l`
echo number of workers to be registered: $num_workers
master_logfile=`ls -tr ${SPARK_LOG_DIR}/*master* | tail -1`
worker_logfiles=`ls -tr ${SPARK_LOG_DIR}/*worker* | tail -$num_workers`
steptime=3
for i in {1..100}
do
    sleep $steptime
    num_reg=`grep 'registered' $worker_logfiles | wc -l`
    if [ $num_reg -eq $num_workers ]
    then
        break
    fi
done
echo registered workers after $((i * steptime)) seconds :
for file in $worker_logfiles
do
    grep 'registered' $file
done
grep 'Starting Spark master' $master_logfile
grep 'Registering worker' $master_logfile
</code>
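
The heart of ''wait-worker.sh'' is the polling loop: it repeatedly counts the matching lines in the worker log files and stops as soon as the count reaches the expected number of workers. The loop can be tried in isolation with artificial log files (directory, file names and log lines below are made up for the demonstration; no Spark installation is needed):

```shell
#!/bin/bash
# Polling loop demonstration with artificial worker logs.
logdir=$(mktemp -d)
echo 'INFO Worker: Successfully registered with master' > $logdir/worker-1.out
echo 'INFO Worker: Successfully registered with master' > $logdir/worker-2.out

num_workers=2
steptime=1
for i in {1..10}
do
    sleep $steptime
    num_reg=$(grep 'registered' $logdir/worker-*.out | wc -l)
    if [ $num_reg -eq $num_workers ]
    then
        break
    fi
done
echo "registered workers after $((i * steptime)) seconds"
```

In the real script the log directory comes from ''spark-env.sh'', and up to 100 iterations of 3 seconds give the workers five minutes to appear before the loop gives up.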
The ''wait-worker.sh'' script documents the progress of the worker registration in the output file of the SLURM job:

<code>
number of workers to be registered: 3
registered workers after 114 seconds :
19/08/11 12:26:25 INFO Worker: Successfully registered with master spark://
19/08/11 12:27:07 INFO Worker: Successfully registered with master spark://
19/08/11 12:26:29 INFO Worker: Successfully registered with master spark://
19/08/11 12:26:24 INFO Master: Starting Spark master at spark://
19/08/11 12:26:25 INFO Master: Registering worker 10.108.102.34:
19/08/11 12:26:29 INFO Master: Registering worker 10.108.102.46:
19/08/11 12:27:07 INFO Master: Registering worker 10.108.102.47:
</code>
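
The master URL that ''spark-submit'' needs can be extracted from the job's output file with ''grep''. The sketch below works on an artificial log line; in practice the input would be the ''outfile-%J'' written by the batch job, and the host name would be that of the job's first node:

```shell
#!/bin/bash
# Extract the spark:// master URL from a job output file;
# file name and host name are artificial stand-ins.
outfile=$(mktemp)
echo '19/08/11 12:26:24 INFO Master: Starting Spark master at spark://node01.example.org:7077' > $outfile

master_url=$(grep -o 'spark://[^ ]*' $outfile | head -1)
echo $master_url
# spark://node01.example.org:7077
```

The URL can then be passed explicitly, e.g. ''spark-submit --master $master_url ...'', instead of relying on the ''spark.master'' entry in ''spark-defaults.conf''.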
Only after these lines have appeared in the output file can Spark applications be submitted, from the same login node on which the SLURM batch job was started:

<code>
gwdu102 > module load JAVA spark
gwdu102 > export SPARK_CONF_DIR=~/SparkConf
gwdu102 > spark-submit $SPARK_HOME/examples/src/main/python/pi.py 100
</code>

Here ''~/SparkConf'' stands for the configuration directory written by ''setenv.sh''; setting ''SPARK_CONF_DIR'' to it lets ''spark-submit'' pick up the master URL from ''spark-defaults.conf''. The ''pi.py'' example from the Spark distribution serves as a test application.

For executing the Spark application as a batch job, the batch script shown above can be extended by replacing the command ''sleep infinity'' with the commands:

<code>
spark-submit ~/path/to/application.py
$SPARK_HOME/sbin/stop-all.sh
</code>

Here ''~/path/to/application.py'' stands for the user's own Spark application; ''stop-all.sh'' shuts the standalone cluster down after the application has finished.

Values for the Spark environment variables in ''spark-env.sh'' and for the properties in ''spark-defaults.conf'' can be adapted to the needs of the application; they are described in the configuration section of the Apache Spark documentation.

An introduction to the Apache Spark system can be found in the corresponding chapter of the GWDG documentation.

wiki/hpc/slurm_sbatch_script_for_spark_applications.txt · Last modified: 2019/08/11 15:23 by 127.0.0.1