OUTDATED: Latest Nodes

For the new version, see: Running Jobs Slurm

Most new nodes are equipped with 2×12-core Intel Broadwell CPUs. The following nodes have been added to the cluster:

  • 76 nodes with 128 GB memory (mpi queue)
  • 15 nodes with two NVidia GTX 980 GPUs for single-precision CUDA applications (gpu queue and, with lower priority, mpi queue)
  • 10 nodes with two NVidia K40 GPUs for double-precision or memory-intensive CUDA applications (gpu queue and, with lower priority, mpi queue)
  • 15 nodes with 512 GB memory (fat+ queue and, with lower priority, fat queue)
  • 5 nodes with 1.5 TB memory and 4×10-core Haswell CPUs (fat+ queue)

In order to accommodate the new systems we have introduced numerous configuration changes to our job management. However, most of you only need to know one thing for now:

The new nodes use a new scratch file system, /scratch2, and they do not have access to /scratch! For your convenience, /scratch is a symlink to /scratch2 on the new nodes, but the new nodes cannot use any files stored in the existing /scratch.

Requesting nodes with access to the existing /scratch works as before:

#BSUB -R scratch

To request the new scratch2:

#BSUB -R scratch2

If you use scratch space only for storing temporary data and do not need to access previously stored data, you can request either /scratch or /scratch2:

#BSUB -R "scratch||scratch2"

In that case, use /scratch/${USERID} for the temporary data (don't forget to create this directory on /scratch2 as well). On the new nodes the data will then be stored in /scratch2 via the symlink mentioned above.
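
A minimal sketch of such a job script (the application call and its --tmpdir option are placeholders; ${USERID} stands for your user ID, as above):

#!/bin/sh
#BSUB -q mpi
#BSUB -n 24
#BSUB -R "scratch||scratch2"

# Create the directory if it does not exist yet. On the new nodes
# /scratch is a symlink, so the data actually ends up in /scratch2.
mkdir -p /scratch/${USERID}

# Placeholder: run your application with its temporary files there.
./my_application --tmpdir /scratch/${USERID}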

The following changes are not relevant to everyone, but please skim over them to see what might affect you or be useful to you:

1. The "latest" tag:

Both the previous and the new acquisition currently carry the latest tag, in order not to interfere with already submitted jobs. However, in about a week the latest tag will be removed from the older nodes. After that, a job submission combining -R latest and -R scratch can never start, as the latest nodes cannot access /scratch. The same may hold for other combined requests involving latest.
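
For example, once the tag has moved, a job submitted with the following combination would wait in the queue forever:

#BSUB -R latest
#BSUB -R scratch

If you need both the new nodes and scratch space, request latest together with scratch2 instead.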

2. CPU architecture selection

With the upgrade, our cluster provides four generations of Intel CPUs and two generations of AMD CPUs. However, the most important difference between these CPU types is whether or not they support Intel's AVX2 instructions. To select for this we have introduced the x64inlvl (for x64 instruction level) label:

x64inlvl=1 : Supports only AVX
x64inlvl=2 : Supports AVX and AVX2

In order to choose an AVX2-capable node you therefore have to include

#BSUB -R "x64inlvl=2"

in your submission script.

If you need to be more specific, you can also directly choose the CPU generation:

amd=1 : Interlagos
amd=2 : Abu Dhabi

intel=1 : Sandy Bridge
intel=2 : Ivy Bridge
intel=3 : Haswell
intel=4 : Broadwell

So, in order to choose any AMD CPU:

#BSUB -R amd

In order to choose an Intel CPU of at least Haswell generation:

#BSUB -R "intel>=3"

This is equivalent to x64inlvl=2.

3. CPU slot count

The npxx resource requirement has been replaced by ncpus=xx. For example, in order to choose a node with 20 CPU slots, your submission script should contain

#BSUB -R "ncpus=20"

The old syntax (-R np20) will continue to work and will also be supported on the new nodes. However, it is now deprecated, will be removed with the next cluster extension, and will also disappear from our documentation.

Please note: Like -R np20, -R "ncpus=20" does not replace the -n <x> statement. -n <x> requests the number of cores for your job, while -R "ncpus=20" only tells the batch system that your job may run only on nodes with the given number of cores. Both can be combined, as sketched below.
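
For example, a sketch that reserves a complete 20-core node by combining both requests and keeping all slots on one host:

#BSUB -n 20
#BSUB -R "ncpus=20"
#BSUB -R span[hosts=1]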

4. Memory selection

Note that the following paragraphs are about selecting nodes with enough memory for a job. The mechanism for actually reserving that memory does not change: the memory you are allowed to use equals the memory per core times the number of slots requested (-n option).

You can select a node either by its currently available memory (mem) or by its maximum available memory (maxmem). If you request complete nodes, the difference is actually very small, as a free node's available memory is close to its maximum memory. All values are in MB.

To select a node with more than about 500 GB memory use:

#BSUB -R "maxmem>500000"

To select a node with more than about 6 GB memory per core use:

#BSUB -R "maxmem/ncpus>6000"

(Yes, you can do basic math in the requirement string!)

It bears repeating: none of the above is a memory reservation. If you actually want to reserve memory, the easiest way is to combine -R "maxmem>..." with -x for an exclusive job, as sketched below.
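
A minimal sketch of such a reservation, reusing the example threshold from above:

#BSUB -x
#BSUB -R "maxmem>500000"

With -x, no other job can run on the selected node, so all of its memory is effectively reserved for your job.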

Finally, note that the -M option merely denotes the per-core memory limit of your job (in KB). This is of no real consequence, as we do not enforce these limits, and the option has no influence on host selection.

5. Using the fat+ queue

Nodes with a lot of memory are very expensive and should not normally be used for jobs that could also run on our other nodes. Therefore, please note the following policies:

  • Your job must need more than 250 GB RAM.
  • Your job must use at least a full 512 GB node, or at least half of a 1.5 TB or 2 TB node:
  • For a full 512 GB node:
#BSUB -x
#BSUB -R "maxmem < 600000"
  • For half a 1.5 TB node (your job needs more than 500 GB RAM):
#BSUB -n 20
#BSUB -R span[hosts=1]
#BSUB -R "maxmem < 1600000 && maxmem > 600000"
  • For a full 1.5 TB node (your job needs more than 700 GB RAM):
#BSUB -x
#BSUB -R "maxmem < 1600000 && maxmem > 600000"
  • For half a 2 TB node (your job needs more than 700 GB RAM):
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -R "maxmem > 1600000"
  • For a full 2 TB node (your job needs more than 1.5 TB RAM):
#BSUB -x
#BSUB -R "maxmem > 1600000"

The 512 GB nodes are also available in the fat queue, without these restrictions. However, fat jobs on these nodes have a lower priority compared to fat+ jobs.
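
For illustration, a minimal sketch of a complete submission for a full 512 GB node, assuming the fat+ queue is selected with -q (wall clock limit and application name are placeholders):

#!/bin/sh
#BSUB -q fat+
#BSUB -W 24:00
#BSUB -x
#BSUB -R "maxmem < 600000"
./my_big_memory_application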

6. GPU selection

In order to use a GPU you should submit your job to the gpu queue and request GPU shares. Each node equipped with GPUs provides as many GPU shares as it has cores, independent of how many GPUs are built in. So on the new nodes, which have 24 cores, the following will give you exclusive access to the node's GPUs:

#BSUB -R "rusage[ngpus_shared=24]"

Note that you do not necessarily have to request all 24 cores with -n 24 as well: jobs from the mpi queue may utilize any CPU cores you leave free. The new nodes have two GPUs each, and you should use both if possible.

If you request fewer shares than the node has cores, other jobs may also utilize its GPUs. However, we currently have no mechanism for assigning a specific GPU to a job. This has to be handled in the application or in your job script.

A good way to use the new nodes with jobs that only work on one GPU is to put two of them together in one job script and preselect a GPU for each, as sketched below.
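
A minimal sketch of this pattern, assuming both application runs honor the standard CUDA_VISIBLE_DEVICES environment variable (the application name is a placeholder):

#!/bin/sh
#BSUB -q gpu
#BSUB -R "rusage[ngpus_shared=24]"
#BSUB -R "ngpus=2"

# Preselect one GPU for each run and wait for both to finish.
CUDA_VISIBLE_DEVICES=0 ./my_gpu_application input0 &
CUDA_VISIBLE_DEVICES=1 ./my_gpu_application input1 &
wait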

Currently we have two generations of NVidia GPUs in the cluster, selectable in the same way as CPU generations:

nvgen=1 : Kepler
nvgen=2 : Maxwell

Most of our GPUs are commodity graphics cards and only provide good performance for single-precision calculations. If you need double-precision performance or error-correcting memory (ECC RAM), you can select the Tesla GPUs with

#BSUB -R tesla

Our Tesla K40 cards are of the Kepler generation (nvgen=1).

If you want to make sure that your job runs on a node equipped with two GPUs, use:

#BSUB -R "ngpus=2"

7. New frontend

The new frontend gwdu103 has 2×12-core Intel Broadwell CPUs and 64 GB memory. If you compile a program on gwdu103, it will often be automatically optimized for Broadwell CPUs (intel=4). In that case it will probably also run on Haswell (intel=3) nodes, but not on older Intel generations or on AMD nodes.
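
If your program must also run on older nodes, you can explicitly target a lower instruction level when compiling. A hedged GCC example (the exact flags depend on your compiler and toolchain):

gcc -O2 -march=haswell -o myprog myprog.c    # runs on all x64inlvl=2 nodes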

The same policies apply as for all other frontends: you may use it to compile software, to copy data to and from the new scratch file system, and for short tests of your compiled binaries. You must not use it to run long-running tasks, especially CPU- or memory-intensive ones.