====  Recipe: Reserving Memory for OpenMP  ====

The following job script recipe demonstrates using empty job slots to reserve memory for OpenMP jobs:

<code>
#!/bin/sh
#BSUB -q fat
#BSUB -W 00:10
#BSUB -o out.%J
#BSUB -n 64
#BSUB -R big
#BSUB -R "span[hosts=1]"

export OMP_NUM_THREADS=8
./myopenmpprog
</code>
\\
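If the ''-n'' slot count in the script changes, the thread count can be derived from LSF's ''LSB_DJOB_NUMPROC'' environment variable instead of being hard-coded; a minimal sketch, assuming the 8-slots-per-thread ratio of the recipe above (the ''64'' fallback only makes the snippet runnable outside a batch job):

```shell
#!/bin/sh
# Sketch: derive the OpenMP thread count from the slots LSF granted.
# LSB_DJOB_NUMPROC is set by LSF inside a running job; the fallback of
# 64 mirrors the "#BSUB -n 64" request and lets the snippet run anywhere.
SLOTS=${LSB_DJOB_NUMPROC:-64}
# One thread per 8 reserved slots - the ratio used in the recipe above.
export OMP_NUM_THREADS=$((SLOTS / 8))
echo "running with $OMP_NUM_THREADS threads on $SLOTS reserved slots"
```

This way the memory reservation (slots) and the parallelism (threads) stay in a fixed ratio when the script is edited.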
====  Disk Space Options  ====

You have the following options for attributing disk space to your jobs:

**/local**\\
This is the local hard disk of the node. It is a fast option for storing temporary data - and on the ''gwda, gwdd, dfa, dge, dmp, dsu'' and ''dte'' nodes, which use SSDs, a very fast one. Files on the local disks are deleted automatically.\\
\\
**/scratch**\\
This is the shared scratch space, available on the ''gwda'' and ''gwdd'' nodes and on the frontends ''gwdu101'' and ''gwdu102''. You can use ''-R scratch'' to make sure you get a node with access to shared /scratch. It is very fast; there is no automatic file deletion, but also no backup! We may have to delete files manually when we run out of space. You will receive a warning before this happens.\\
\\
**/scratch2**\\
This space is the same as /scratch described above, except that it is **ONLY** available on the ''dfa, dge, dmp, dsu'' and ''dte'' nodes and on the frontend ''gwdu103''. You can use ''-R scratch2'' to make sure you get a node with access to that space.\\
\\
**$HOME**\\
Your home directory is available everywhere, permanent, and backed up. Your attributed disk space can be increased. It is comparatively slow, however.
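For temporary per-job data on the node-local disk, a common pattern is to create a unique directory with ''mktemp''; a short sketch, assuming programs pick up the standard ''TMPDIR'' variable (the /tmp fallback only makes the snippet runnable off the cluster):

```shell
#!/bin/sh
# Sketch: place temporary files on the node-local disk described above.
# /local is the cluster path; fall back to /tmp when it is not available.
BASE=/local
[ -d "$BASE" ] && [ -w "$BASE" ] || BASE=/tmp
# Create a unique per-job directory so concurrent jobs cannot collide.
MYTMP=$(mktemp -d "${BASE}/tmp.XXXXXXXX")
export TMPDIR="$MYTMP"   # many programs put their scratch files under TMPDIR
echo "temporary files go to $TMPDIR"
```

Remember that files on /local are deleted automatically, so anything worth keeping must be copied elsewhere before the job ends.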
- 
====  Recipe: Using ''/scratch''  ====

This recipe shows how to run Gaussian09 using ''/scratch'' for temporary files:

<code>
#!/bin/sh
#BSUB -q fat
#BSUB -n 64
#BSUB -R "span[hosts=1]"
#BSUB -R scratch
#BSUB -W 24:00
#BSUB -C 0
#BSUB -a openmp

export g09root="/usr/product/gaussian"
. $g09root/g09/bsd/g09.profile

mkdir -p /scratch/${USER}
MYSCRATCH=$(mktemp -d /scratch/${USER}/g09.XXXXXXXX)
export GAUSS_SCRDIR=${MYSCRATCH}

g09 myjob.com myjob.log

rm -rf ${MYSCRATCH}
</code>
\\
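One caveat in the recipe above: the final ''rm -rf'' is skipped if ''g09'' aborts the script early. A possible refinement, sketched here with /tmp standing in for /scratch so it runs anywhere, is to attach the cleanup to an exit trap:

```shell
#!/bin/sh
# Sketch: scratch-directory cleanup that also runs when the job fails.
# /tmp stands in for the cluster path /scratch/${USER} used above.
BASE="/tmp/${USER:-g09demo}"
mkdir -p "$BASE"
MYSCRATCH=$(mktemp -d "${BASE}/g09.XXXXXXXX")
trap 'rm -rf "$MYSCRATCH"' EXIT   # fires on normal exit and on most failures
export GAUSS_SCRDIR="$MYSCRATCH"
echo "Gaussian scratch dir: $GAUSS_SCRDIR"
```

With the trap in place the explicit ''rm -rf'' at the end of the script becomes unnecessary.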
- 
====  Using ''/scratch2''  ====
Currently the latest nodes do **NOT** have access to ''/scratch''. They only have access to the shared ''/scratch2''.

If you use scratch space only for storing temporary data, and do not need to access data stored previously, you can request either /scratch or /scratch2:
<code>
#BSUB -R "scratch||scratch2"
</code>
For this case, ''/scratch2'' is linked to ''/scratch'' on the latest nodes. You can simply use ''/scratch/${USERID}'' for your temporary data (don't forget to create the directory on ''/scratch2''). On the latest nodes the data will then be stored in ''/scratch2'' via the mentioned symlink.
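Inside the job script, the available scratch file system can also be resolved at run time, so the same script works on both node generations; a minimal sketch (the /tmp fallback is only there so the snippet runs off the cluster):

```shell
#!/bin/sh
# Sketch: pick whichever shared scratch space exists on this node.
# On the latest nodes /scratch is a symlink to /scratch2, so checking
# /scratch first covers both cases described above.
SCRATCH_BASE=""
for d in /scratch /scratch2; do
    if [ -d "$d" ] && [ -w "$d" ]; then
        SCRATCH_BASE="$d"
        break
    fi
done
# Fallback so the snippet also runs on machines without scratch space.
[ -n "$SCRATCH_BASE" ] || SCRATCH_BASE=/tmp
U=${USER:-$(id -un)}
mkdir -p "${SCRATCH_BASE}/${U}"
echo "scratch base: ${SCRATCH_BASE}"
```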
- 
=====  Miscellaneous LSF Commands  =====

While ''bsub'' is arguably the most important LSF command, you may also find the following commands useful:

**bjobs**\\
Lists current jobs. Useful options are: ''-p, -l, -a, <jobid>, -u all, -q <queue>, -m <host>''.\\
\\
**bhist**\\
Lists older jobs. Useful options are: ''-l, -n, <jobid>''.\\
\\
**lsload**\\
Status of cluster nodes. Useful options are: ''-l, <hostname>''.\\
\\
**bqueues**\\
Status of batch queues. Useful options are: ''-l, <queue>''.\\
\\
**bhpart**\\
Why do I have to wait? ''bhpart'' shows current user priorities. Useful options are: ''-r, <host partition>''.\\
\\
**bkill**\\
The final command. It has two modes of use:

  -  ''bkill <jobid>'': kills the job with the given job ID.
  -  ''bkill <selection options> 0'': kills all jobs matching the selection options. Useful selection options are: ''-q <queue>, -m <host>''.
- 
Have a look at the respective man pages of these commands to learn more about them!

=====  Getting Help  =====
The following sections show you where you can get status information and where you can get support in case of problems.
====  Information Sources  ====

  *  Cluster status page
    *  [[http://lsf.gwdg.de/lsfinfo/]]
  *  HPC announce mailing list
    *  [[https://listserv.gwdg.de/mailman/listinfo/hpc-announce]]

====  Using the GWDG Support Ticket System  ====

Write an email to <hpc@gwdg.de>. In the body:
  *  State that your question is related to the batch system.
  *  State your user ID (''$USER'').
  *  If you have a problem with your jobs, please //always send the complete standard output and error//!
  *  If you have a lot of failed jobs, send at least two outputs. You can also list the job IDs of all failed jobs to help us understand your problem even better.
  *  If you don't mind us looking at your files, please state this in your request. You can limit your permission to specific directories or files.

[[Kategorie: Scientific Computing]]