

Exclusive jobs

An exclusive job uses all of its allocated nodes exclusively, i.e., it never shares a node with another job. This is useful, for example, if you need all of a node's memory (but not all of its CPU cores), or for SMP/MPI hybrid jobs.

To submit an exclusive job, add -x to your bsub options. For example, to submit a single-task job that uses a complete fat node with 256 GB of memory, you could use:

bsub -x -q fat -R big ./mytask

(-R big requests a 256 GB node, excluding the 128 GB nodes in the fat queue)

For submitting an OpenMP/MPI hybrid job with a total of 8 MPI processes, spread evenly across 2 nodes, use:

export OMP_NUM_THREADS=4
bsub -x -q mpi -n 8 -R "span[ptile=4]" -a intelmpi mpirun.lsf ./hybrid_job

(each MPI process creates 4 OpenMP threads in this case).

Please note that fairshare evaluation and accounting are based on the number of job slots allocated. The first example above would therefore count as 64 slots for both fairshare and accounting.

Using exclusive jobs makes it unnecessary to reserve all of a node's slots explicitly (e.g., with span[ptile='!']) and then set the number of processes via the MPI library's mpiexec or mpiexec.hydra, as we explain in our introductory course. This makes submitting a hybrid job as an exclusive job more straightforward; the explicit approach is sketched below for comparison.
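For comparison, here is a minimal sketch of the explicit, non-exclusive approach mentioned above. It assumes Intel MPI's mpiexec.hydra and 16-core nodes in the mpi queue; the slot counts and the executable name ./hybrid_job are placeholders and have to be adjusted to the actual node size.

#!/bin/sh
#BSUB -q mpi
#BSUB -W 01:00
#BSUB -o out.%J
#BSUB -n 32                    # e.g. two complete nodes, assuming 16 cores each
#BSUB -R "span[ptile='!']"     # fill whole nodes only
#BSUB -a intelmpi

export OMP_NUM_THREADS=4
# run only 8 MPI processes (4 per node) on the 32 reserved slots;
# mpiexec.hydra is expected to pick up the host list from the LSF allocation
mpiexec.hydra -n 8 -ppn 4 ./hybrid_job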

However, exclusive jobs also have a disadvantage: LSF does not reserve the additional job slots required to obtain a node exclusively. Therefore, when the cluster is very busy, an exclusive job that needs many nodes may wait significantly longer.

A Note On Job Memory Usage

LSF will try to fill up each node with processes up to its job slot limit. Therefore, no process in your job may use more memory than is available per core! If your per-core memory requirement is higher, you have to request additional job slots in order to allow your job to use the memory of those slots as well. If your job's memory usage increases with the number of processes, you have to leave the additional job slots empty, i.e., not run processes on them. For example, a fat node with 256 GB and 64 job slots provides roughly 4 GB per slot, so a program that runs only 8 threads but needs substantially more than 32 GB should request correspondingly more slots, as the following recipe shows.

Recipe: Reserving Memory for OpenMP

The following job script recipe demonstrates using empty job slots for reserving memory for OpenMP jobs:

#!/bin/sh
#BSUB -q fat
#BSUB -W 00:10
#BSUB -o out.%J
#BSUB -n 64                  # request all 64 slots of the node ...
#BSUB -R big                 # ... on a 256 GB fat node
#BSUB -R "span[hosts=1]"     # place all slots on a single host

export OMP_NUM_THREADS=8     # but run only 8 OpenMP threads
./myopenmpprog


Disk Space Options

You have the following options for providing disk space to your jobs:

/local
This is the local hard disk of the node. It is a fast option for storing temporary data; on the gwda, gwdd, dfa, dge, dmp, dsu and dte nodes it is even very fast (SSD based). Files on the local disks are deleted automatically. A minimal usage sketch follows after this list.

/scratch
This is the shared scratch space, available on the gwda and gwdd nodes and on the frontends gwdu101 and gwdu102. You can use -R scratch to make sure you get a node with access to the shared /scratch. It is very fast and there is no automatic file deletion, but also no backup! We may have to delete files manually when we run out of space; you will receive a warning before this happens.

/scratch2
This space is the same as the /scratch described above, except that it is ONLY available on the dfa, dge, dmp, dsu and dte nodes and on the frontend gwdu103. You can use -R scratch2 to make sure you get a node with access to this space.

$HOME
Your home directory is available everywhere, permanent, and backed up. The disk space attributed to you can be increased. It is comparatively slow, however.
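As announced above, here is a minimal sketch of using /local for temporary data. It assumes /local is writable for your user on the compute nodes; the queue, runtime and program name ./myprog are placeholders.

#!/bin/sh
#BSUB -q mpi
#BSUB -n 1
#BSUB -W 01:00
#BSUB -o out.%J

# create a private temporary directory on the node-local disk
MYLOCAL=`mktemp -d /local/${USER}.XXXXXXXX`

# point the program at the local directory (how depends on the program)
TMPDIR=${MYLOCAL} ./myprog

# clean up; /local is also cleaned automatically, but removing the data
# immediately frees the space for other jobs
rm -rf ${MYLOCAL}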

Recipe: Using ''/scratch''

This recipe shows how to run Gaussian09 using /scratch for temporary files:

#!/bin/sh
#BSUB -q fat
#BSUB -n 64
#BSUB -R "span[hosts=1]"
#BSUB -R scratch              # request a node with access to /scratch
#BSUB -W 24:00
#BSUB -C 0
#BSUB -a openmp

# set up the Gaussian09 environment
export g09root="/usr/product/gaussian"
. $g09root/g09/bsd/g09.profile

# create a private scratch directory and point Gaussian at it
mkdir -p /scratch/${USER}
MYSCRATCH=`mktemp -d /scratch/${USER}/g09.XXXXXXXX`
export GAUSS_SCRDIR=${MYSCRATCH}

g09 myjob.com myjob.log

# remove the temporary files
rm -rf $MYSCRATCH


Using ''/scratch2''

Currently the latest nodes do NOT have access to /scratch; they only have access to the shared /scratch2.

If you use the scratch space only for storing temporary data and do not need to access data stored there previously, you can request either /scratch or /scratch2:

#BSUB -R "scratch||scratch2"

In this case /scratch2 is linked to /scratch on the latest nodes, so you can simply use /scratch/${USER} for your temporary data (don't forget to create this directory on /scratch2). On the latest nodes the data will then be stored in /scratch2 via the mentioned symlink; a minimal example follows below.
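A minimal sketch of such a job script (the queue, runtime and program name ./myprog are placeholders) could look like this:

#!/bin/sh
#BSUB -q mpi
#BSUB -n 1
#BSUB -W 01:00
#BSUB -o out.%J
#BSUB -R "scratch||scratch2"    # any node with access to /scratch or /scratch2

# on the latest nodes /scratch is a symlink to /scratch2,
# so this path works on both kinds of nodes
mkdir -p /scratch/${USER}
MYSCRATCH=`mktemp -d /scratch/${USER}/job.XXXXXXXX`

./myprog ${MYSCRATCH}

rm -rf ${MYSCRATCH}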

Miscellaneous LSF Commands

While bsub is arguably the most important LSF command, you may also find the following commands useful:

bjobs
Lists current jobs. Useful options are: -p, -l, -a, <jobid>, -u all, -q <queue>, -m <host>.

bhist
Lists older jobs. Useful options are: -l, -n, <jobid>.

lsload
Status of cluster nodes. Useful options are: -l, <hostname>.

bqueues
Status of batch queues. Useful options are: -l, <queue>.

bhpart
Why do I have to wait? bhpart shows current user priorities. Useful options are: -r, <host partition>.

bkill
The Final Command. It has two use modes:

  1. bkill <jobid>: This kills a job with a specific jobid.
  2. bkill <selection options> 0: This kills all jobs fitting the selection options. Useful selection options are: -q <queue>, -m <host>.
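For illustration, a few typical invocations (the job ID 123456 is a placeholder; the mpi queue is used as an example):

bjobs -p                 # why are my jobs still pending?
bjobs -u all -q mpi      # everybody's jobs in the mpi queue
bjobs -l 123456          # full details of a single job
bhist -l 123456          # the same for an older, already finished job
bqueues -l mpi           # detailed status and limits of the mpi queue
bhpart                   # current user priorities in all host partitions
bkill 123456             # kill one specific job
bkill -q mpi 0           # kill all of your jobs in the mpi queue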

Have a look at the respective man pages of these commands to learn more about them!

Getting Help

The following sections show you where you can find status information and where you can get support in case of problems.

Information sources

Using the GWDG Support Ticket System

Write an email to hpc@gwdg.de. In the body:

  • State that your question is related to the batch system.
  • State your user id ($USER).
  • If you have a problem with your jobs, please always send the complete standard output and error output!
  • If you have a lot of failed jobs, send at least two outputs. You can also list the job IDs of all failed jobs to help us understand your problem even better.
  • If you don’t mind us looking at your files, please state this in your request. You can limit your permission to specific directories or files.

