====  A Note On Job Memory Usage  ====

LSF will try to fill up each node with processes up to its job slot limit. Therefore, no process in your job may use more memory than is available //per core//! If your per-core memory requirement is too high, you have to request more job slots in order to allow your job to use the memory of these slots as well. If your job's memory usage increases with the number of processes, you have to leave additional job slots //empty//, i.e., do not run processes on them.

====  Recipe: Reserving Memory for OpenMP  ====

The following job script recipe demonstrates using empty job slots to reserve memory for an OpenMP job: it requests 64 slots on a single host but starts only 8 threads, so each thread can use the memory of 8 slots.

<code>
#!/bin/sh
#BSUB -q fat
#BSUB -W 00:10
#BSUB -o out.%J
#BSUB -n 64                # reserve 64 job slots ...
#BSUB -R big
#BSUB -R "span[hosts=1]"   # ... all on a single host

export OMP_NUM_THREADS=8   # ... but start only 8 OpenMP threads
./myopenmpprog
</code>
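
To submit such a script, pipe it into ''bsub'', which reads the ''#BSUB'' directives from the script header (a minimal example; the script file name is a placeholder):

<code>
bsub < myjobscript.sh
</code>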
\\
====  Disk Space Options  ====

You have the following options for attributing disk space to your jobs:

**/local**\\
This is the local hard disk of the node. It is a fast option for storing temporary data - in the case of the ''gwda, gwdd, dfa, dge, dmp, dsu'' and ''dte'' nodes even a very fast, SSD based one. Files on the local disks are deleted automatically.\\
\\
**/scratch**\\
This is the shared scratch space, available on the ''gwda'' and ''gwdd'' nodes and on the frontends ''gwdu101'' and ''gwdu102''. You can use ''-R scratch'' to make sure you get a node with access to shared /scratch. It is very fast; there is no automatic file deletion, but also no backup! We may have to delete files manually when we run out of space. You will receive a warning before this happens.\\
\\
**/scratch2**\\
This space is the same as /scratch described above, except that it is **ONLY** available on the nodes ''dfa, dge, dmp, dsu'' and ''dte'' and on the frontend ''gwdu103''. You can use ''-R scratch2'' to make sure you get a node with access to that space.\\
\\
**$HOME**\\
Your home directory is available everywhere, permanent, and comes with backup. Your attributed disk space can be increased. It is comparably slow, however.
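
For the first option, a job can stage its temporary data on the node-local disk and clean up afterwards. A minimal sketch (the program name and the ''mktemp'' directory layout are illustrative assumptions, not a fixed convention):

<code>
#!/bin/sh
#BSUB -q fat
#BSUB -n 1
#BSUB -W 01:00
#BSUB -o out.%J

# create a private temporary directory on the node-local disk
MYLOCAL=`mktemp -d /local/${USER}.XXXXXXXX`

# run the program with its temporary files on the fast local disk
TMPDIR=${MYLOCAL} ./myprog

# clean up (files on /local are also deleted automatically)
rm -rf ${MYLOCAL}
</code>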

====  Recipe: Using ''/scratch''  ====

This recipe shows how to run Gaussian09 using ''/scratch'' for its temporary files:

<code>
#!/bin/sh
#BSUB -q fat
#BSUB -n 64
#BSUB -R "span[hosts=1]"
#BSUB -R scratch    # request a node with access to shared /scratch
#BSUB -W 24:00
#BSUB -C 0
#BSUB -a openmp

# set up the Gaussian09 environment
export g09root="/usr/product/gaussian"
. $g09root/g09/bsd/g09.profile

# create a private scratch directory and point Gaussian at it
mkdir -p /scratch/${USER}
MYSCRATCH=`mktemp -d /scratch/${USER}/g09.XXXXXXXX`
export GAUSS_SCRDIR=${MYSCRATCH}

g09 myjob.com myjob.log

# remove the temporary files when the job is done
rm -rf ${MYSCRATCH}
</code>
\\
====  Using ''/scratch2''  ====
Currently the latest nodes do **not** have access to ''/scratch''. They only have access to the shared ''/scratch2''.

If you use scratch space only for storing temporary data, and do not need to access data stored previously, you can request either /scratch or /scratch2:
<code>
#BSUB -R "scratch||scratch2"
</code>
In that case, ''/scratch2'' is linked to ''/scratch'' on the latest nodes. You can simply use ''/scratch/${USERID}'' for your temporary data (don't forget to create that directory on ''/scratch2''). On the latest nodes the data will then be stored in ''/scratch2'' via the mentioned symlink.
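
Put together, a job that only needs temporary scratch space could look like this (a sketch under the assumptions above; the program name, slot count, and runtime are placeholders):

<code>
#!/bin/sh
#BSUB -q fat
#BSUB -n 16
#BSUB -W 12:00
#BSUB -R "span[hosts=1]"
#BSUB -R "scratch||scratch2"   # accept a node with either scratch space

# on the latest nodes /scratch is a symlink to /scratch2,
# so this path works on both kinds of nodes
mkdir -p /scratch/${USER}
MYSCRATCH=`mktemp -d /scratch/${USER}/job.XXXXXXXX`

TMPDIR=${MYSCRATCH} ./myprog

rm -rf ${MYSCRATCH}
</code>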

=====  Miscellaneous LSF Commands  =====

While ''bsub'' is arguably the most important LSF command, you may also find the following commands useful:

**bjobs**\\
Lists current jobs. Useful options are: ''-p, -l, -a, <jobid>, -u all, -q <queue>, -m <host>''.\\
\\
**bhist**\\
Lists older jobs. Useful options are: ''-l, -n, <jobid>''.\\
\\
**lsload**\\
Status of cluster nodes. Useful options are: ''-l, <hostname>''.\\
\\
**bqueues**\\
Status of batch queues. Useful options are: ''-l, <queue>''.\\
\\
**bhpart**\\
Why do I have to wait? ''bhpart'' shows current user priorities. Useful options are: ''-r, <host partition>''.\\
\\
**bkill**\\
The Final Command. It has two modes of use:

  -  ''bkill <jobid>'': This kills the job with the given jobid.
  -  ''bkill <selection options> 0'': This kills all jobs matching the selection options. Useful selection options are: ''-q <queue>, -m <host>''.
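
A few example invocations (the jobid, queue, and host names below are placeholders):

<code>
bjobs -l 123456       # detailed information on one of your jobs
bhist -l 123456       # history of an older job
lsload gwda001        # current load of a specific node
bqueues -l fat        # full status of the fat queue
bkill 123456          # kill the job with this jobid
bkill -q fat 0        # kill all of your jobs in the fat queue
</code>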

Have a look at the respective man pages of these commands to learn more about them!

=====  Getting Help  =====
The following sections show you where you can get status information and where you can get support in case of problems.

====  Information Sources  ====

  *  Cluster status page
    *  [[http://lsf.gwdg.de/lsfinfo/]]
  *  HPC announce mailing list
    *  [[https://listserv.gwdg.de/mailman/listinfo/hpc-announce]]

====  Using the GWDG Support Ticket System  ====

Write an email to <hpc@gwdg.de>. In the body:
  *  State that your question is related to the batch system.
  *  State your user id (''$USER'').
  *  If you have a problem with your jobs, please //always send the complete standard output and error//!
  *  If you have a lot of failed jobs, send at least two outputs. You can also list the jobids of all failed jobs to help us understand your problem even better.
  *  If you don't mind us looking at your files, please state this in your request. You can limit your permission to specific directories or files.

[[Kategorie: Scientific Computing]]