When working with complex code that requires a certain environment to work, e.g. specific software in a specific version, it might be useful to provide this code to other users as a singularity container.
The user may then work with (execute) this singularity container on the GWDG HPC system as is explained here. One obstacle, however, is that by default one cannot execute slurm commands from within the singularity container (the same holds for pbs commands, which are outdated and therefore not discussed here).
This page serves as a tutorial on how to use slurm from a singularity container on the SCC of the GWDG (as of 2020-05-12). The explanations assume that the singularity container <my_container> models a UNIX-like operating system. (At least for this case the procedure described below was tested to work.)
In order to execute a slurm command from inside a singularity container follow these steps:
singularity shell -B /opt/slurm,/usr/lib64/libreadline.so.6,/usr/lib64/libhistory.so.6,/usr/lib64/libtinfo.so.5,/var/run/munge,/usr/lib64/libmunge.so.2,/usr/lib64/libmunge.so.2.0.0,/run/munge <my_container>
>> export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
>> echo "slurmadmin:x:300:300::/opt/slurm/slurm:/bin/false" >> /etc/passwd
>> echo "slurmadmin:x:300:" >> /etc/group
>> <any_other_code>
>> /opt/slurm/bin/<any_slurm_command> <any_optional_arguments>
>> <any_other_code>
Two steps are necessary in the procedure:
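First, the container needs access to the slurm installation on the host system, i.e. the SCC, and to various other required libraries. Therefore, one needs to bind the following paths/files when starting the container:
1. /opt/slurm
2. /usr/lib64/libreadline.so.6
3. /usr/lib64/libhistory.so.6
4. /usr/lib64/libtinfo.so.5
5. /var/run/munge
6. /usr/lib64/libmunge.so.2
7. /usr/lib64/libmunge.so.2.0.0
8. /run/munge
such that the final command for executing a command in <my_container> is: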
singularity exec -B /opt/slurm,/usr/lib64/libreadline.so.6,/usr/lib64/libhistory.so.6,/usr/lib64/libtinfo.so.5,/var/run/munge,/usr/lib64/libmunge.so.2,/usr/lib64/libmunge.so.2.0.0,/run/munge <my_container> <any_command_to_be_executed_within_my_container>
Here, /opt/slurm (1.) lets the container know where to find slurm on the host system, and the libraries libreadline, libhistory and libtinfo (2. - 4.) are required by slurm itself. (Note that problems may occur if your container needs a different major version of any of these libraries!) The remaining files/paths make munge available inside the container. munge is the authentication service which the GWDG uses to ensure that only registered users operate on the system (with their respective rights).
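The bind list is long and easy to mistype, so it may help to assemble it from a list of paths. The following is a minimal sketch, not part of the original procedure: the helper names slurm_bind_arg and missing_bind_paths are hypothetical, and the paths are simply the ones given above for the SCC as of 2020-05-12.

```shell
# Paths to bind, one per line (taken from the bind command above).
SLURM_BIND_PATHS='/opt/slurm
/usr/lib64/libreadline.so.6
/usr/lib64/libhistory.so.6
/usr/lib64/libtinfo.so.5
/var/run/munge
/usr/lib64/libmunge.so.2
/usr/lib64/libmunge.so.2.0.0
/run/munge'

# Hypothetical helper: join a newline-separated list with commas,
# which is the form the -B option of singularity expects.
slurm_bind_arg() {
  printf '%s\n' "$1" | paste -sd, -
}

# Hypothetical helper: print every listed path that does not exist
# on the machine where this is run.
missing_bind_paths() {
  printf '%s\n' "$1" | while IFS= read -r p; do
    [ -e "$p" ] || printf '%s\n' "$p"
  done
}

# Possible usage on the host:
#   missing_bind_paths "$SLURM_BIND_PATHS"
#   singularity exec -B "$(slurm_bind_arg "$SLURM_BIND_PATHS")" <my_container> <command>
```

Running missing_bind_paths on the host before starting the container shows whether any of the library versions listed here are absent, for example after a system update.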
After the container has been started, the following commands have to be executed once within the container so that it can make use of the libraries/paths introduced in the previous section:
export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
echo "slurmadmin:x:300:300::/opt/slurm/slurm:/bin/false" >> /etc/passwd
echo "slurmadmin:x:300:" >> /etc/group
The first command just adds a new path to the library path, thereby letting the container know that further libraries can be found in /usr/lib64. The other two commands let the container know the user's identity on the host by creating an account inside the container that does not require a password (the x in the password field is only a placeholder). Normally such an account would not have any rights. Here, however, it only matters that the container knows the user's identity on the host, since all slurm requests are redirected to the host, where the user has their native rights. The two commands add this information to the files /etc/passwd and /etc/group.
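Since the entries only need to be added once, a setup script that may run repeatedly against the same writable files could guard the two echo commands. This is a minimal sketch under that assumption; append_once is a hypothetical helper, not something provided by singularity or slurm.

```shell
# Hypothetical helper: append a line to a file only if exactly this
# line is not already present in it.
append_once() {
  line=$1
  file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Inside the container, the two entries from this tutorial would then
# be added like this:
#   append_once "slurmadmin:x:300:300::/opt/slurm/slurm:/bin/false" /etc/passwd
#   append_once "slurmadmin:x:300:" /etc/group
```

Calling append_once twice with the same arguments leaves a single entry in the file, so the script can be re-run without accumulating duplicate lines.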
Starting a singularity container is quite costly for the operating system of the host and, if too many requests to start a singularity container arrive at the same time, it might fail. It is, hence, beneficial to keep the calls to singularity itself to a minimum.
Therefore, the exec command of singularity is well suited, with the commands explained in Step 2 moved into a shell (sub-)script. A call to singularity that executes code incorporating a slurm command on the host system may look like this:
<any_other_code>
singularity exec -B /opt/slurm,/usr/lib64/libreadline.so.6,/usr/lib64/libhistory.so.6,/usr/lib64/libtinfo.so.5,/var/run/munge,/usr/lib64/libmunge.so.2,/usr/lib64/libmunge.so.2.0.0,/run/munge <my_container> bash <a_file_to_be_executed_within_the_container>.sh
<any_other_code>
where <a_file_to_be_executed_within_the_container>.sh contains:
<any_other_code>
export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
echo "slurmadmin:x:300:300::/opt/slurm/slurm:/bin/false" >> /etc/passwd
echo "slurmadmin:x:300:" >> /etc/group
/opt/slurm/bin/<any_slurm_command> <any_optional_arguments>
<any_other_code>
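If the /opt/slurm bind is forgotten, the slurm call inside the script fails with a generic "not found" error. The following minimal sketch wraps the call so that the failure points at the likely cause. The function slurm_cmd and the variable SLURM_BIN_DIR are hypothetical names introduced for illustration; only the default directory /opt/slurm/bin comes from this tutorial.

```shell
# Directory holding the slurm binaries; the default is the path used
# throughout this tutorial. SLURM_BIN_DIR itself is a hypothetical name.
SLURM_BIN_DIR=${SLURM_BIN_DIR:-/opt/slurm/bin}

# Hypothetical helper: run a slurm command by its full path and give
# a hint about the bind option if the binary cannot be found.
slurm_cmd() {
  name=$1
  shift
  if [ ! -x "$SLURM_BIN_DIR/$name" ]; then
    echo "slurm command '$name' not found in $SLURM_BIN_DIR; was /opt/slurm bound with -B?" >&2
    return 127
  fi
  "$SLURM_BIN_DIR/$name" "$@"
}

# Possible usage inside the container script:
#   slurm_cmd <any_slurm_command> <any_optional_arguments>
```

The return value 127 follows the shell convention for "command not found", so calling scripts can treat both cases alike.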
The prefix to the slurm command (/opt/slurm/bin/) may be omitted when the PATH variable is updated inside the singularity container to include /opt/slurm/bin. The corresponding export may be added to <a_file_to_be_executed_within_the_container>.sh after exporting the LD_LIBRARY_PATH variable.
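The exact command for updating PATH is not preserved in this text. One way to do it is sketched below, assuming that /opt/slurm/bin should take precedence over other entries; path_prepend_once is a hypothetical helper that also avoids adding the directory twice.

```shell
# Hypothetical helper: put a directory at the front of PATH unless it
# is already listed there.
path_prepend_once() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Make the slurm commands of the host installation callable by name.
path_prepend_once /opt/slurm/bin
```

After this, a call such as /opt/slurm/bin/<any_slurm_command> can be written as <any_slurm_command>, provided /opt/slurm was bound when the container was started.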