  
**medium**\\
This is our general purpose partition, usable for serial and SMP jobs with up to 24 tasks, but it is especially well suited for large MPI jobs. Up to 1024 cores can be used in a single MPI job, and the maximum runtime is 48 hours.
  
**fat**\\
This is the partition for SMP jobs, especially those requiring lots of memory. Serial jobs with very high memory requirements also belong in this partition. Up to 64 cores and up to 512 GB are available on one host. Maximum runtime is 48 hours.\\
The nodes of the fat+ partition are also present in this partition, but will only be used if they are not needed for larger jobs submitted to the fat+ partition.
  
**fat+**\\
This partition is meant for very memory intensive jobs that require more than 512 GB of RAM on a single node. Nodes of the fat+ partition have 1.5 or 2 TB of RAM. You are required to specify your memory needs on job submission to use these nodes (see [[en:services:application_services:high_performance_computing:running_jobs_slurm#resource_selection|resource selection]]).\\
As general advice: try your jobs on the smaller nodes in the fat partition first, work your way up, and don't be afraid to ask for help here.
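As an illustration, a minimal job script sketch for the fat+ partition; the memory value, runtime, core count and program name are placeholders you need to adapt to your actual job:

<code bash>
#!/bin/bash
#SBATCH -p fat+               # partition for jobs needing more than 512 GB of RAM
#SBATCH --mem=600G            # placeholder: state your real memory requirement
#SBATCH -n 1
#SBATCH -c 24
#SBATCH -t 24:00:00           # requested runtime, within the limits of the partition

./my_memory_hungry_program    # placeholder for your application
</code>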
  
**gpu** - A partition for nodes containing GPUs. Please refer to [[en:services:application_services:high_performance_computing:running_jobs_slurm#gpu_selection]]
  
-====  ​Available QOS  ====+====  ​Runtime limits (QoS)  ====
 If the default time limits are not sufficient for your jobs, you can use a "​Quality of Service"​ or **QOS** to modify those limits on a per job basis. We currently have two QOS. If the default time limits are not sufficient for your jobs, you can use a "​Quality of Service"​ or **QOS** to modify those limits on a per job basis. We currently have two QOS.
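As a sketch of how a QOS is attached to a job at submission time; the QOS name ''long'' below is only a placeholder, not necessarily one of our two QOS:

<code bash>
# request a QOS on the command line ...
sbatch --qos=long myjob.sh

# ... or inside the job script itself:
#SBATCH --qos=long
</code>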
  
**<nowiki>-c <cpus per task></nowiki>**\\
The number of CPUs per task. The default is one CPU per task.

**<nowiki>-c vs -n</nowiki>**\\
As a rule of thumb: if you run your code on a single node, use -c; for multi-node MPI jobs, use -n.\\
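Two minimal job script sketches illustrating this rule of thumb; partition, task counts and program names are placeholders only:

<code bash>
#!/bin/bash
# Single-node SMP job (e.g. OpenMP): one task with several CPUs
#SBATCH -p medium
#SBATCH -n 1
#SBATCH -c 8

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program    # placeholder for your application
</code>

<code bash>
#!/bin/bash
# Multi-node MPI job: many tasks, one CPU each, distributed by Slurm
#SBATCH -p medium
#SBATCH -n 256

srun ./my_mpi_program    # placeholder for your application
</code>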
  
**<nowiki>-N <minNodes[,maxNodes]></nowiki>**\\
**-C scratch[2]**\\
The node must have access to shared ''/scratch'' or ''/scratch2''.

**-C fmz / -C fas**\\
The node has to be at that location. This is pretty similar to -C scratch / -C scratch2, since the nodes at the FMZ have access to scratch and those at the Faßberg location have access to scratch2. It is mainly provided for easy compatibility with our old partition naming scheme.

**-C [architecture]**\\
Request a specific CPU architecture. Available options are: abu-dhabi, ivy-bridge, haswell, broadwell. See [[en:services:application_services:high_performance_computing:start#hardware_overview|this table]] for the corresponding nodes.
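A short sketch of combining such constraints in a job script; the particular combination and the program name are just examples:

<code bash>
#!/bin/bash
#SBATCH -p medium
#SBATCH -C "scratch2&broadwell"   # node must see /scratch2 AND have Broadwell CPUs
#SBATCH -n 24

srun ./my_program                 # placeholder for your application
</code>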
  
  
**<nowiki>sacct -j <jobid> --format=JobID,User,UID,JobName,MaxRSS,Elapsed,Timelimit</nowiki>**\\
Get job information even after the job has finished.\\
**Note on ''sacct'':** Depending on the parameters given, ''sacct'' chooses a time window in a rather unintuitive way. This is documented in the DEFAULT TIME WINDOW section of its man page. If you unexpectedly get no results from your ''sacct'' query, try specifying the start time with, e.g., ''<nowiki>-S 2019-01-01</nowiki>''.\\
The ''<nowiki>--format</nowiki>'' option knows many more fields, like **Partition**, **Start**, **End** or **State**; for the full list refer to the man page.
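For example, to also see where, when and with what result a finished job ran (the job ID is a placeholder):

<code bash>
sacct -j 1234567 -S 2020-01-01 \
      --format=JobID,JobName,Partition,State,Start,End,Elapsed
</code>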
  
**<nowiki>scancel</nowiki>**\\