  
**fat-fas+** and **fat-fmz+**\\
These partitions are for jobs that require more than 256 GB of RAM on a single node. Nodes of the fat+ partitions have 512 GB, 1.5 TB or 2 TB of RAM.
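
For example, a job that needs more memory than a standard node offers could request one of these partitions like this (a minimal sketch; the memory value is only a placeholder):
<code>
#SBATCH -p fat-fas+
#SBATCH --mem=400G
</code>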
  
**gpu** - A partition for nodes containing GPUs. Please refer to the [[en:services:application_services:high_performance_computing:running_jobs_slurm#gpu_selection|GPU selection]] section below.
  
====  Available QOS  ====
Jobs can be run interactively or in batch mode. We generally recommend
using batch mode. If you need to run a job interactively, you can find
information about that in the [[en:services:application_services:high_performance_computing:running_jobs_slurm#interactive_session_on_the_nodes|corresponding section]].
Batch jobs are submitted to the cluster using the 'sbatch' command
and a jobscript or a command:\\
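For example (a minimal sketch; the script name and the wrapped command are placeholders):
<code>
# submit an existing job script (placeholder name)
sbatch myjobscript.sh

# or wrap a single command without writing a script
sbatch --wrap="hostname"
</code>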
Line 93: Line 93:
  
 ====  "​sbatch"​ options ​ ==== ====  "​sbatch"​ options ​ ====

**<nowiki>-A all</nowiki>**\\
Specifies the account 'all' for the job. This option is //mandatory// for users who have access to special hardware and want to use the general partitions (see the combined example below).
  
**<nowiki>-p <partition></nowiki>**\\
Specifies the partition the job should run in.\\
**<nowiki>-o <file></nowiki>**\\
Store the job output in "file" (otherwise written to slurm-<jobid>). ''%J'' in the filename stands for the jobid.\\

**<nowiki>--noinfo</nowiki>**\\
Some meta-information about your job will be added to your output file. If you do not want that, you can suppress it with this flag.\\

**<nowiki>--mail-type=[ALL|BEGIN|END]</nowiki>\\
<nowiki>--mail-user=your@mail.com</nowiki>** \\
Receive e-mails when the job starts, ends, or both. There are more options; refer to the sbatch man page for more information about mail types. If you have a GWDG mail address, you do not need to specify the mail-user.\\
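
Put together, the options described above might appear at the top of a job script like this (a minimal sketch; the output file name and mail address are placeholders):
<code>
#SBATCH -A all
#SBATCH -p <partition>
#SBATCH -o job-%J.out
#SBATCH --mail-type=END
#SBATCH --mail-user=your@mail.com
</code>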
  
====  Resource Selection  ====
  
**<nowiki>--mem-per-cpu=<size[units]></nowiki>**\\
Required memory per CPU instead of per node. <nowiki>--mem</nowiki> and <nowiki>--mem-per-cpu</nowiki> are mutually exclusive.\\
=== Example ===
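A sketch of a job script header combining typical resource requests (the values are placeholders, not recommendations):
<code>
#SBATCH -p <partition>
#SBATCH -n 10
#SBATCH --mem-per-cpu=2G
</code>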
  
==== GPU selection ====
  
In order to use a GPU, submit your job to the ''gpu'' partition and request a GPU count and, optionally, the model. The CPUs of the nodes in the gpu partition are evenly distributed among the GPUs. So if you request a single GPU on a node with 20 cores and 4 GPUs, you can get up to 5 cores reserved exclusively for you; the same applies to memory. For example, if you want 2 GPUs of model Nvidia GeForce GTX 1080 with 10 CPUs, you can submit a job script with the following flags:
<code>
#SBATCH -p gpu
#SBATCH -n 10
#SBATCH -G gtx1080:2
</code>
  
You can also omit the model selection; here is an example of selecting 1 GPU of any available model:
<code>
#SBATCH -p gpu
#SBATCH -n 10
#SBATCH -G 1
</code>

There are different options to select the number of GPUs, such as ''%%--gpus-per-node%%'', ''%%--gpus-per-task%%'' and more. See the [[https://slurm.schedmd.com/sbatch.html|sbatch man page]] for details.
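
For instance, a multi-node job could request GPUs per node like this (a sketch; the node and GPU counts are placeholders):
<code>
#SBATCH -p gpu
#SBATCH -N 2
#SBATCH --gpus-per-node=2
</code>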
  
Currently we have several generations of NVidia GPUs in the cluster, namely:
<code>
#SBATCH -p gpu
#SBATCH -G k40:2
</code>
Our Tesla K40 GPUs are of the Kepler generation.
<code> sinfo -p gpu --format=%N,%G </code> shows a list of hosts with GPUs, as well as their type and count.
  
=====  Miscellaneous Slurm Commands  =====
  
While ''sbatch'' is arguably the most important Slurm command, you may also find the following commands useful:
**<nowiki>sacct -j <jobid> --format=JobID,User,UID,JobName,MaxRSS,Elapsed,Timelimit</nowiki>**\\
Get job information even after the job has finished.\\
**Note on ''sacct'':** Depending on the parameters given, ''sacct'' chooses a time window in a rather unintuitive way. This is documented in the DEFAULT TIME WINDOW section of its man page. If you unexpectedly get no results from your ''sacct'' query, try specifying the start time with, e.g. ''<nowiki>-S 2019-01-01</nowiki>''.\\
The ''<nowiki>--format</nowiki>'' option knows many more fields like **Partition**, **Start**, **End** or **State**; refer to the man page for the full list.
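For example, to also see where a job ran, when it started and ended, and in which state it finished (a sketch; ''<nowiki><jobid></nowiki>'' is a placeholder):
<code>
sacct -j <jobid> --format=JobID,JobName,Partition,State,Start,End,Elapsed
</code>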
  
**<nowiki>scancel</nowiki>**\\