===== Nodes' specification =====

**Currently there are the following "mpi" nodes:**
  * 168 nodes with 20 Intel Ivy-Bridge CPU cores
  * 101 nodes with 24 Intel Broadwell CPU cores
  * 72 nodes with 16 Intel Haswell CPU cores
  * 32 nodes with 24 Intel Haswell CPU cores
Broadwell nodes have 128 GB of memory, Ivy- and Sandy-Bridge nodes 64 GB, and Haswell nodes 128 or 256 GB.

**Current "fat" nodes:**
  * 5 nodes with 48 AMD CPU cores (with 128 GB of memory)
  * 25 nodes with 64 AMD CPU cores (with 256 GB of memory)
  * 15 nodes with 24 Intel Broadwell CPU cores (with 512 GB of memory)
  * 5 nodes with 40 Intel Haswell CPU cores (with 1.5 TB of memory)
  * 1 node with 32 Intel Haswell CPU cores (with 2 TB of memory)

**There are "gpu" nodes as well:**
  * 20 nodes with 1 NVidia GTX 770 GPU
  * 15 nodes with 2 NVidia GTX 980 GPUs
  * 10 nodes with 2 NVidia K40 GPUs
The GTX nodes are intended for single precision CUDA applications, the K40 nodes for double precision or memory intensive CUDA applications.

The ''mpi'' nodes provide the bulk of the compute power of our compute cluster and are meant for all types of applications. They are especially well suited for large MPI jobs, as they have a balanced compute to network performance ratio. The ''fat'' nodes are meant for shared memory parallelized workloads scaling beyond 16 cores, and for all applications requiring more than 64 GB of memory on a single node.

===== Interactive session on the nodes =====

As stated before, ''bsub'' is used to submit jobs to the cluster. For example, to avoid running large tests on the frontend (a good idea!), you can get an interactive session (with the bash shell) on one of the ''mpi'' nodes with

<code>
bsub -ISs -q mpi-short -n 16 -R 'span[hosts=1]' -R np16 /bin/bash</code>
\\
''-ISs'' requests support for an interactive shell, and ''-q mpi-short'' the corresponding queue. ''-n 16 -R <nowiki>'span[hosts=1]'</nowiki> -R np16'' ensures that you get a Sandy-Bridge (16 core, ''np16'') node exclusively (see below; use ''-n 64'' for the tp and fat queues). You will get a shell prompt as soon as a suitable node becomes available. Single thread, non-interactive jobs can be submitted with

<code>
bsub -q mpi ./myexecutable</code>

===== MPI jobs =====

Note that a single thread job submitted as above will share its execution host with other jobs. It is therefore expected that it does not use more than the memory available per core! On the ''mpi'' nodes this amount is 4 GB, as it is on the newer ''fat'' nodes. If your job requires more, you must request additional cores. For example, if your single thread job requires 64 GB of memory, you must submit it like this:

<code>
bsub -q mpi -n 16 ./myexecutable</code>
\\
OpenMPI jobs can be submitted as follows:

<code>
bsub -q mpi -n 256 -a openmpi mpirun.lsf ./myexecutable</code>
\\
For Intel MPI jobs it suffices to use ''-a intelmpi'' instead of ''-a openmpi''. Please note that LSF will not load the correct modules (compiler, library, MPI) for you. You either have to do that before executing ''bsub'', in which case your setup will be copied to the execution hosts, or you have to use a job script and load the required modules there.

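For illustration, a minimal job script following the second approach might look like the sketch below; the module names are placeholders and have to be replaced by the modules your application was actually built with:

<code>
#!/bin/bash
#BSUB -q mpi
#BSUB -n 256
#BSUB -a openmpi

# Placeholder module names - load whatever your application needs.
module load intel
module load openmpi

mpirun.lsf ./myexecutable
</code>

Such a script would be submitted with ''bsub < jobscript.sh''.
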
-A new feature in LSF is ''​pinning''​ support. ''​Pinning''​ (in its most basic version) means instructing the operating system to not apply its standard scheduling algorithms to your workloads, but instead keep processes on the CPU core they have been started on. This may significantly improve performance for some jobs, especially on the ''​fat''​ nodes with their high CPU core count. ''​Pinning''​ is managed via the MPI library, and currently only OpenMPI is supported. There is not much experience with this feature, so we are interested in your feedback. Here is an example: 
- 
-<​code>​ 
-bsub -R "​select[np16] span[ptile=16] affinity[core(1):​cpubind=core]"​ -q mpi -n 256 -a openmpi mpirun.lsf ./​myexecutable</​code>​ 
-\\ 
-The affinity string ''"​affinity[core(1):​cpubind=core]"''​ means that each task is using one core and that the binding should be done based on cores (as opposed to sockets, NUMA units, etc). Because this example is for a pure MPI application,​ x in ''​core(x)''​ is one. In an SMP/MPI hybrid job, x would be equal to the number of threads per task (e. g., equal to ''​OMP_NUM_THREADS''​ for Openmp/MPI hybrid jobs). 
- 
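A hybrid OpenMP/MPI submission might then look roughly like the following sketch, assuming 4 threads per task and 4 tasks per 16-core ''np16'' node; the task and thread counts are illustrative only, and you should verify the resulting placement for your application:

<code>
# Sketch: 4 OpenMP threads per MPI task, 4 tasks per 16-core node.
export OMP_NUM_THREADS=4
bsub -R "select[np16] span[ptile=4] affinity[core(4):cpubind=core]" \
     -q mpi -n 64 -a openmpi mpirun.lsf ./myexecutable
</code>
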
===== SMP jobs =====

Shared memory parallelized jobs can be submitted with

<code>
bsub -q mpi -n 8,20 -R 'span[hosts=1]' -a openmp ./myexecutable</code>
\\
The ''span'' option is required; without it, LSF will assign cores to the job from several nodes, if that is advantageous from the scheduling perspective.

===== Using the fat+ queue =====

Nodes with a lot of memory are very expensive and should not normally be used for jobs which could also run on our other nodes. Therefore, please note the following policies:

  * Your job must need more than 250 GB RAM.
  * Your job must use at least a full 512 GB node or half a 1.5 TB or 2 TB node:

  * For a full 512 GB node:
<code>
#BSUB -x
#BSUB -R "maxmem < 600000"
</code>

  * For half a 1.5 TB node (your job needs more than 500 GB RAM):
<code>
#BSUB -n 20
#BSUB -R span[hosts=1]
#BSUB -R "maxmem < 1600000 && maxmem > 600000"
</code>

  * For a full 1.5 TB node (your job needs more than 700 GB RAM):
<code>
#BSUB -x
#BSUB -R "maxmem < 1600000 && maxmem > 600000"
</code>

  * For half a 2 TB node (your job needs more than 700 GB RAM):
<code>
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -R "maxmem > 1600000"
</code>

  * For a full 2 TB node (your job needs more than 1.5 TB RAM):
<code>
#BSUB -x
#BSUB -R "maxmem > 1600000"
</code>

The 512 GB nodes are also available in the fat queue, without these restrictions. However, fat jobs on these nodes have a lower priority compared to fat+ jobs.

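Putting these pieces together, a job script for a full 512 GB node might look like the following sketch; the queue name ''fat+'', the walltime, and the output file name are assumptions/placeholders to be adapted to your setup:

<code>
#!/bin/bash
#BSUB -q fat+
#BSUB -x
#BSUB -R "maxmem < 600000"
#BSUB -W 24:00
#BSUB -o fatjob.%J.out

./myexecutable
</code>
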
===== CPU architecture selection =====

Our cluster provides four generations of Intel CPUs and two generations of AMD CPUs. However, the main difference between these CPU types is whether they support Intel's AVX2 or not. For selecting this we have introduced the x64inlvl (for x64 instruction level) label:

<code>
x64inlvl=1 : Supports only AVX
x64inlvl=2 : Supports AVX and AVX2
</code>

In order to choose an AVX2 capable node you therefore have to include
<code>
#BSUB -R "x64inlvl=2"
</code>
in your submission script.

If you need to be more specific, you can also directly choose the CPU generation:

<code>
amd=1 : Interlagos
amd=2 : Abu Dhabi

intel=1 : Sandy Bridge
intel=2 : Ivy Bridge
intel=3 : Haswell
intel=4 : Broadwell
</code>

So, in order to choose any AMD CPU:
<code>
#BSUB -R amd
</code>
In order to choose an Intel CPU of at least Haswell generation:
<code>
#BSUB -R "intel>=3"
</code>
This is equivalent to ''x64inlvl=2''.

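For illustration, such a requirement can simply be combined with the submission options shown earlier; the following sketch requests an OpenMPI job restricted to Haswell or newer Intel nodes (the core count is chosen arbitrarily):

<code>
bsub -q mpi -n 48 -R "intel>=3" -a openmpi mpirun.lsf ./myexecutable
</code>
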
===== GPU selection =====

In order to use a GPU you should submit your job to the ''gpu'' queue and request GPU shares. Each node equipped with a GPU provides as many GPU shares as it has cores, independent of how many GPUs are built in. So, for example, on the nodes which have 24 CPU cores, the following would give you exclusive access to the GPUs:
<code>
#BSUB -R "rusage[ngpus_shared=24]"
</code>
Note that you do not necessarily have to request 24 cores with ''-n 24'' as well, as jobs from the MPI queue may utilize free CPU cores that you do not need. The latest "gpu" nodes have two GPUs each, and you should use both, if possible.

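For example, a job using such a node's GPUs exclusively could be submitted roughly like this (''./mycudaapp'' is a placeholder for your CUDA application):

<code>
bsub -q gpu -R "rusage[ngpus_shared=24]" ./mycudaapp
</code>
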
If you request fewer shares than there are cores available, other jobs may also utilize the GPUs. However, we currently have no mechanism to select a specific GPU for a job. This would have to be handled in the application or your job script.

A good way to use the nodes which have 2 GPUs with jobs only working on one GPU is to put two such jobs together in one job script and preselect a GPU for each one, as sketched below.

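One way to do this preselection (a sketch, not an official recipe) is the ''CUDA_VISIBLE_DEVICES'' environment variable, which restricts each application instance to the listed GPU; ''./mycudaapp'' and its inputs are placeholders:

<code>
#!/bin/bash
#BSUB -q gpu
#BSUB -R "ngpus=2"
#BSUB -R "rusage[ngpus_shared=24]"

# Start one instance per GPU, each pinned to its own device.
CUDA_VISIBLE_DEVICES=0 ./mycudaapp input1 &
CUDA_VISIBLE_DEVICES=1 ./mycudaapp input2 &

# Wait for both instances to finish before the job ends.
wait
</code>
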
Currently we have several generations of NVidia GPUs in the cluster, selectable in the same way as the CPU generations:

<code>
nvgen=1 : Kepler
nvgen=2 : Maxwell
nvgen=3 : Pascal
</code>

Most of our GPUs are commodity graphics cards and only provide good performance for single precision calculations. If you need double precision performance, or error correcting memory (ECC RAM), you can select the Tesla GPUs with
<code>
#BSUB -R tesla
</code>
Our Tesla K40 cards are of the Kepler generation (nvgen=1).

If you want to make sure to run on a node equipped with two GPUs, use:
<code>
#BSUB -R "ngpus=2"
</code>

===== Memory selection =====

Note that the following paragraphs are about **selecting** nodes with enough memory for a job. The mechanism to actually **reserve** that memory does not change: the memory you are allowed to use equals memory per core times the number of slots (-n option) requested.

You can select a node either by currently available memory (mem) or by maximum available memory (maxmem). If you request complete nodes, the difference is actually very small, as a free node's available memory is close to its maximum memory. All requests are in MB.

To select a node with more than about 500 GB of available memory use:
<code>
#BSUB -R "mem>500000"
</code>
To select a node with more than about 6 GB of maximum memory per core use:
<code>
#BSUB -R "maxmem/ncpus>6000"
</code>
(Yes, you can do basic math in the requirement string!)

It bears repeating: none of the above is a memory reservation. If you actually want to reserve "mem" memory, the easiest way is to combine ''-R "mem>..."'' with ''-x'' for an exclusive job.

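For example, the following combination (a sketch; the threshold is arbitrary) places the job alone on a node that currently has more than about 200 GB of free memory, so the job effectively has that memory to itself:

<code>
#BSUB -x
#BSUB -R "mem>200000"
</code>
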
Finally, note that the ''-M'' option just denotes the memory limit of your job per core (in KB). This is of no real consequence, as we do not enforce these limits and it has no influence on the host selection.


Besides the options shown in this article, you can of course use the options for controlling walltime limits (-W), output (-o), and your other requirements as usual. You can also continue to use job scripts instead of the command line (with the ''#BSUB <option> <value>'' syntax).

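As a closing illustration, here is a sketch of a complete job script combining several of the options from this page; the walltime, output file name, and core counts are placeholders to be adapted:

<code>
#!/bin/bash
#BSUB -q mpi
#BSUB -n 8,20
#BSUB -R 'span[hosts=1]'
#BSUB -R "x64inlvl=2"
#BSUB -a openmp
#BSUB -W 48:00
#BSUB -o smpjob.%J.out

./myexecutable
</code>

As before, it would be submitted with ''bsub < jobscript.sh''.
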
Please consult the LSF man pages if you need further information.

[[Kategorie: Scientific Computing]]