The local dataset containing the integers from //1// to //1E9// is distributed across the executors using the ''parallelize'' function. Each element is mapped to a random point //(x,y)// with //0 < x, y < 1//, sampled from a uniform distribution, and the dataset is filtered down to those points that lie inside the unit circle. Consequently, the ratio of the points conforming to this rule to the total number of points approximates the area of one quarter of the unit circle, which allows us to extract an estimate for the number //Pi// in the last line.
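For reference, this computation can be sketched in a few lines of PySpark. This is a minimal sketch only: the session setup, the ''NUM_SAMPLES'' constant and the helper function are illustrative and not taken from the original example above.
<code python>
from random import random

from pyspark.sql import SparkSession

# Illustrative session setup; on the SCC you would typically connect to
# the cluster you deployed beforehand.
spark = SparkSession.builder.appName("PiEstimate").getOrCreate()

NUM_SAMPLES = 10**9  # the integers from 1 to 1E9, as described above

def inside_unit_circle(_):
    # Sample a uniformly distributed point (x, y) with 0 < x, y < 1 ...
    x, y = random(), random()
    # ... and keep it only if it lies inside the unit circle.
    return x * x + y * y < 1

count = (spark.sparkContext
         .parallelize(range(1, NUM_SAMPLES + 1))
         .filter(inside_unit_circle)
         .count())

# count/NUM_SAMPLES approximates pi/4, the area of a quarter of the
# unit circle, so multiplying by 4 yields the estimate for Pi.
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))
</code>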

===== Configuration =====
By default, Spark's [[https://spark.apache.org/docs/latest/configuration.html#application-properties|scratch space]] is created in ''/tmp/$USER/spark''. If you find that the ''2G'' size of the partition where this directory is stored is insufficient, you can configure a different directory for this purpose, for example in the ''scratch'' filesystem, before deploying your cluster:
<code>
export SPARK_LOCAL_DIRS=/scratch/users/$USER
</code>
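If you want to keep the scratch data of a particular cluster separate, you can, for instance, point the variable at a dedicated subdirectory and create it before deploying (the subdirectory name here is only an illustration):
<code>
export SPARK_LOCAL_DIRS=/scratch/users/$USER/spark
mkdir -p "$SPARK_LOCAL_DIRS"
</code>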
  
===== Further reading =====
You can find a more in-depth tour of the Spark architecture, features and examples (based on Scala) in the [[https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:parallel_processing_with_spark_on_gwdg_s_scientific_compute_cluster|HPC wiki]].
  
 --- //[[christian.koehler@gwdg.de|ckoehle2]] 2020/11/10 13:59//