{"id":23421,"date":"2023-09-07T09:25:57","date_gmt":"2023-09-07T07:25:57","guid":{"rendered":"https:\/\/info.gwdg.de\/news\/?p=23421"},"modified":"2023-10-13T08:02:02","modified_gmt":"2023-10-13T06:02:02","slug":"using-apptainer-containers-to-manage-your-python-environments","status":"publish","type":"post","link":"https:\/\/info.gwdg.de\/news\/using-apptainer-containers-to-manage-your-python-environments\/","title":{"rendered":"Using apptainer containers to manage your Python environments"},"content":{"rendered":"<p dir=\"auto\" data-sourcepos=\"4:1-4:183\">When it comes to our High-Performance Computing (HPC) systems, efficiency is the name of the game. But are you managing your Python environments efficiently?<\/p>\n<p dir=\"auto\" data-sourcepos=\"8:1-8:485\">Our HPC systems use network file systems that are optimized for accessing a small number of large files. This is particularly true for the parallel file systems used for the \/scratch directories (which use the Lustre file system). A typical Python environment with its many thousands of small files creates overhead for the file system and can slow down the loading time of your program significantly, as well as slowing down the file system for all other users of the cluster.<\/p>\n<p dir=\"auto\" data-sourcepos=\"10:1-11:183\">In this article we will learn how to put the Python environment inside a single container file instead, which greatly reduces the overhead on the file system. 
The container definition files you use can also easily be tracked with version control to satisfy the requirements of scientific data management, just like the source code you write.<\/p>\n<p dir=\"auto\" data-sourcepos=\"10:1-11:183\">A more permanent version of this article with updates and fixes can be found at <a href=\"https:\/\/gitlab-ce.gwdg.de\/hpc-team-public\/science-domains-blog\/-\/blob\/main\/20230907_python-apptainer.md\" class=\"external\" rel=\"nofollow\">https:\/\/gitlab-ce.gwdg.de\/hpc-team-public\/science-domains-blog\/-\/blob\/main\/20230907_python-apptainer.md<\/a><\/p>\n<h2 dir=\"auto\" data-sourcepos=\"13:1-13:29\">Sign up for NHR@G\u00f6ttingen<\/h2>\n<p dir=\"auto\" data-sourcepos=\"14:1-14:358\">If you don&#8217;t have access yet, you can create an NHR account at <a href=\"https:\/\/zulassung.hlrn.de\/\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" class=\"external\">https:\/\/zulassung.hlrn.de\/<\/a>. This is free for all researchers associated with German universities. Is this your first time on any cluster? 
Check out our <a href=\"https:\/\/gitlab-ce.gwdg.de\/hpc-team-public\/science-domains-blog\/-\/blob\/main\/20230417_cluster-practical.md\" class=\"external\" rel=\"nofollow\">bonus material on cluster concepts<\/a>.<\/p>\n<h3 dir=\"auto\" data-sourcepos=\"16:1-16:41\"><a id=\"user-content-apptainer-on-the-emmy-grete-systems\" class=\"anchor\" href=\"#apptainer-on-the-emmy-grete-systems\" aria-hidden=\"true\"><\/a>Apptainer on the Emmy &amp; Grete systems<\/h3>\n<p dir=\"auto\" data-sourcepos=\"17:1-17:70\">You can load the <code>apptainer<\/code> module on the login servers <code>glogin[0-9]<\/code>:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-12\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"18:1-20:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">module load apptainer\r\n<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"21:1-21:432\">You can run <code>apptainer build<\/code> commands on the login nodes to create your container files, and submit a slurm job that uses <code>apptainer run<\/code> to run them on the compute nodes. It is currently not possible to run them directly on the login nodes. Note that for testing you can use the test queues or interactive sessions, so you don&#8217;t wait for your job to start only to discover later that you forgot a semicolon in your code.<\/p>\n<p dir=\"auto\" data-sourcepos=\"23:1-23:92\">To use the container you have to submit a slurm job that runs it on one of the compute nodes.<\/p>\n<h3 dir=\"auto\" data-sourcepos=\"24:1-24:38\"><a id=\"user-content-i-want-to-test-it-on-my-own-system\" class=\"anchor\" href=\"#i-want-to-test-it-on-my-own-system\" aria-hidden=\"true\"><\/a>I want to test it on my own system<\/h3>\n<p dir=\"auto\" data-sourcepos=\"25:1-25:361\">You can also install apptainer on your local Linux computer or virtual machine (VM). 
You can find the latest releases on <a href=\"https:\/\/github.com\/apptainer\/apptainer\/releases\/\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" class=\"external\">https:\/\/github.com\/apptainer\/apptainer\/releases\/<\/a>. If you don&#8217;t use Linux on your computer, you can also create a VM with a Linux installation to test on (for example by using VirtualBox, VMware Fusion or Parallels Desktop).<\/p>\n<p dir=\"auto\" data-sourcepos=\"27:1-27:147\">On Ubuntu 22.04 LTS Desktop the following should be sufficient to install version 1.2.2 (which might be outdated by the time you are reading this):<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-13\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"28:1-32:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">sudo nala install fuse-overlayfs<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">wget https:\/\/github.com\/apptainer\/apptainer\/releases\/download\/v1.2.2\/apptainer_1.2.2_amd64.deb<\/span>\r\n<span id=\"LC3\" class=\"line\" lang=\"plaintext\">sudo dpkg -i apptainer_1.2.2_amd64.deb<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"33:1-33:242\"><strong>Note:<\/strong> <code>nala<\/code> is a more powerful and intuitive replacement for apt. If you don&#8217;t use <code>nala<\/code> yet you can install it with <code>sudo apt install nala<\/code>, or use the old <code>apt install<\/code> instead of <code>nala install<\/code>. 
Check out <code>nala history<\/code> in particular!<\/p>\n<h3 dir=\"auto\" data-sourcepos=\"35:1-35:62\"><a id=\"user-content-can-i-run-it-on-the-gwdg-scientific-compute-cluster-scc\" class=\"anchor\" href=\"#can-i-run-it-on-the-gwdg-scientific-compute-cluster-scc\" aria-hidden=\"true\"><\/a>Can I run it on the GWDG Scientific Compute Cluster (SCC)?<\/h3>\n<p dir=\"auto\" data-sourcepos=\"36:1-36:205\">We currently do not support building your own containers on the SCC. If you bring your own container image <code>container.sif<\/code> though (for example created on your own system, see previous section), you can run it:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-14\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"37:1-40:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">module load singularity<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">singularity run container.sif<\/span><\/code><\/pre>\n<\/div>\n<h2 dir=\"auto\" data-sourcepos=\"41:1-41:19\"><a id=\"user-content-general-workflow\" class=\"anchor\" href=\"#general-workflow\" aria-hidden=\"true\"><\/a>General workflow<\/h2>\n<p dir=\"auto\" data-sourcepos=\"42:1-42:68\">When working with apptainer, there are usually three steps involved:<\/p>\n<ol dir=\"auto\" data-sourcepos=\"43:1-46:0\">\n<li data-sourcepos=\"43:1-43:61\">Writing a container definition file (<em>.def<\/em>) (or <em>recipe<\/em>)<\/li>\n<li data-sourcepos=\"44:1-44:94\">Using <code>apptainer build<\/code> to create a container image file (<em>.sif<\/em>) (or <em>baking<\/em> the recipe)<\/li>\n<li data-sourcepos=\"45:1-46:0\">Using <code>apptainer run<\/code> to run the container (if we continue with the metaphor, getting scientific results is like eating a cake)<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p dir=\"auto\" data-sourcepos=\"47:1-47:210\">It is not necessary to do all of the steps on the same computer or even by the 
same person: you can exchange recipes with your colleagues, and also make container image files you created available for download.<\/p>\n<p dir=\"auto\" data-sourcepos=\"49:1-49:127\">In this tutorial we will exemplify the process by providing recipes that you can build &amp; run on our Emmy &amp; Grete clusters.<\/p>\n<h2 dir=\"auto\" data-sourcepos=\"51:1-51:36\"><a id=\"user-content-editing-text-files-on-the-cluster\" class=\"anchor\" href=\"#editing-text-files-on-the-cluster\" aria-hidden=\"true\"><\/a>Editing text files on the cluster<\/h2>\n<p dir=\"auto\" data-sourcepos=\"52:1-54:329\">Please keep in mind that you don&#8217;t have to use terminal text editors like <code>vim<\/code> and <code>emacs<\/code>: if you use an SFTP\/SCP client like <em>FileZilla<\/em>, <em>CyberDuck<\/em> or <em>Transmit<\/em>, or a text editor with a built-in client, you can keep using the graphical text editor you are already familiar with. You need to provide the same hostname, username and SSH key as when logging in via the terminal. The details depend on your operating system and SFTP\/SCP client, and the setup might be more involved for Windows users.<\/p>\n<figure id=\"attachment_23428\" aria-describedby=\"caption-attachment-23428\" style=\"width: 1886px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/info.gwdg.de\/news\/wp-content\/uploads\/2023\/09\/20230907_python-apptainer-figure1.png\" class=\"external\" rel=\"nofollow\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-23428\" src=\"https:\/\/info.gwdg.de\/news\/wp-content\/uploads\/2023\/09\/20230907_python-apptainer-figure1.png\" alt=\"\" width=\"1896\" height=\"1216\" \/><\/a><figcaption id=\"caption-attachment-23428\" class=\"wp-caption-text\">Python apptainer<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p dir=\"auto\" data-sourcepos=\"52:1-54:329\">On Linux your file browser probably already supports SFTP\/SCP without having to install anything. 
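Connecting is easier if you define a host alias in $HOME\/.ssh\/config. A minimal sketch of such an entry (the host name, user name and key path are placeholders; look up the actual login node address and your user name in our documentation):

```text
# Hypothetical ~/.ssh/config entry -- adapt all values to your own account
Host glogin-gpu
    HostName <login-node-address>
    User <your-username>
    IdentityFile ~/.ssh/id_ed25519
```

With such an entry in place, typing ssh glogin-gpu is enough to connect, and SFTP\/SCP clients can reuse the same alias.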
(If you use Gnome and you have set up the <code>$HOME\/.ssh\/config<\/code> on your computer so that e.g. you only have to type <code>ssh glogin-gpu<\/code> to connect to Grete, just enter <code>ssh:\/\/glogin-gpu<\/code> in Files &gt; + Other Locations &gt; Connect To Server.)<\/p>\n<h2 dir=\"auto\" data-sourcepos=\"55:1-55:39\"><a id=\"user-content-miniconda-container-physics-example\" class=\"anchor\" href=\"#miniconda-container-physics-example\" aria-hidden=\"true\"><\/a>Miniconda container: Physics Example<\/h2>\n<p dir=\"auto\" data-sourcepos=\"56:1-56:73\">Don&#8217;t worry, you will not need any physics knowledge to run this example.<\/p>\n<p dir=\"auto\" data-sourcepos=\"58:1-59:47\">We want to install QuSpin (<a href=\"https:\/\/quspin.github.io\/QuSpin\/\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" class=\"external\">https:\/\/quspin.github.io\/QuSpin\/<\/a>) in a container to simulate a many-body quantum system. Create the following Apptainer Definition File:<\/p>\n<h3 dir=\"auto\" data-sourcepos=\"61:1-61:16\"><a id=\"user-content-quspindef\" class=\"anchor\" href=\"#quspindef\" aria-hidden=\"true\"><\/a><code>quspin.def<\/code><\/h3>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-15\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"62:1-69:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">Bootstrap: docker<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">From: continuumio\/miniconda3<\/span>\r\n<span id=\"LC3\" class=\"line\" lang=\"plaintext\">%post<\/span>\r\n<span id=\"LC4\" class=\"line\" lang=\"plaintext\">    conda update -y conda<\/span>\r\n<span id=\"LC5\" class=\"line\" lang=\"plaintext\">    conda install -y python=3.10<\/span>\r\n<span id=\"LC6\" class=\"line\" lang=\"plaintext\">    conda install -y -c weinbe58 omp quspin<\/span><\/code><\/pre>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"70:1-70:415\">If we build this file, the official miniconda3 container will be fetched from <a href=\"https:\/\/hub.docker.com\/r\/continuumio\/miniconda3\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" class=\"external\">https:\/\/hub.docker.com\/r\/continuumio\/miniconda3<\/a> and the commands in the <code>%post<\/code> section will be run to install the QuSpin package before creating a single Singularity Image Format (SIF) file. If you read this article in the far future, you might have to adjust the Python version from 3.10 to one that is currently supported by QuSpin.<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-16\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"71:1-73:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">$ apptainer build quspin.sif quspin.def<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"74:1-74:52\">The creation of the SIF file can take a few minutes.<br \/>\nWe can now fetch an example from the QuSpin project and try out our new container (we use srun here to start a slurm job on the Emmy cluster):<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-17\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"77:1-86:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">$ wget https:\/\/quspin.github.io\/QuSpin\/downloads\/be9497383fff21e4d03309a4d1a24ce1\/example5.py<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">$ srun -p medium40:test .\/quspin.sif python example5.py<\/span>\r\n<span id=\"LC3\" class=\"line\" lang=\"plaintext\">srun: job 4813650 queued and waiting for resources<\/span>\r\n<span 
id=\"LC4\" class=\"line\" lang=\"plaintext\">srun: job 4813650 has been allocated resources<\/span>\r\n<span id=\"LC5\" class=\"line\" lang=\"plaintext\">Hermiticity check passed!<\/span>\r\n<span id=\"LC6\" class=\"line\" lang=\"plaintext\">Symmetry checks passed!<\/span>\r\n<span id=\"LC7\" class=\"line\" lang=\"plaintext\">Particle conservation check passed!<\/span>\r\n<span id=\"LC8\" class=\"line\" lang=\"plaintext\">$ <\/span><\/code><\/pre>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"87:1-87:154\"><strong><br \/>\nNote:<\/strong> If you are reading this in the future and the examples have changed, feel free to substitute a different example script from the QuSpin website.<\/p>\n<p dir=\"auto\" data-sourcepos=\"89:1-89:184\">Congratulations! You might have just simulated your first many-body quantum system! Feel free to explore the other QuSpin examples and simulate your favourite many-body quantum system.<\/p>\n<p dir=\"auto\" data-sourcepos=\"91:1-91:107\">Instead of executing the <code>.sif<\/code> file directly, you can also use the <code>apptainer run<\/code> command:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-18\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"92:1-94:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">srun -p medium40:test apptainer run quspin.sif python example5.py <\/span><\/code><\/pre>\n<\/div>\n<hr data-sourcepos=\"96:1-97:0\" \/>\n<p dir=\"auto\" data-sourcepos=\"98:1-98:286\"><strong>NOTE:<\/strong> By default, your home directory on the cluster ($HOME) is mounted inside the container at the same location. If you need other directories, in particular $WORK and $TMPDIR, to be available inside the container, add e.g. 
<code>--bind $WORK,$TMPDIR<\/code> to your <code>apptainer run<\/code> command:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-19\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"99:1-101:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">apptainer run --bind $WORK,$TMPDIR quspin.sif python example5.py <\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"102:1-102:157\">Otherwise the files in your $WORK and $TMPDIR directories on Emmy\/Grete will not be visible inside the container, since they are not subdirectories of $HOME.<\/p>\n<p dir=\"auto\" data-sourcepos=\"104:1-104:297\"><strong>NOTE:<\/strong> If you have installed Python modules directly into your $HOME with pip, this might cause all sorts of problems (not just with apptainer but also with conda and venv environments). In that case, consider running apptainer with <code>--no-home<\/code> and explicitly binding only the directories you need with <code>--bind<\/code>.<\/p>\n<p dir=\"auto\" data-sourcepos=\"106:1-106:251\">You can check if you have such modules contaminating your container\/environment by running <code>python -m pip list --user<\/code> inside the container\/environment and, <strong>if you are sure you don&#8217;t need them<\/strong>, remove them with <code>python -m pip uninstall &lt;pkgname&gt;<\/code>.<\/p>\n<h3 dir=\"auto\" data-sourcepos=\"108:1-108:50\"><a id=\"user-content-running-the-quspin-example-on-your-own-machine\" class=\"anchor\" href=\"#running-the-quspin-example-on-your-own-machine\" aria-hidden=\"true\"><\/a>Running the QuSpin example on your own machine<\/h3>\n<p dir=\"auto\" data-sourcepos=\"109:1-109:156\">You will not be using the slurm resource manager if you are following the examples on your own machine. 
In that case you can execute the container directly:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-20\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"110:1-112:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">.\/quspin.sif python example5.py<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"113:1-113:102\">The other commands were run directly on the login node and should also work on your computer.<\/p>\n<h2 dir=\"auto\" data-sourcepos=\"115:1-115:51\"><a id=\"user-content-python-container-with-pip-deep-learning-example\" class=\"anchor\" href=\"#python-container-with-pip-deep-learning-example\" aria-hidden=\"true\"><\/a>Python container with pip: Deep Learning Example<\/h2>\n<p dir=\"auto\" data-sourcepos=\"116:1-116:638\">In this example we will move the Python environment that is used for <strong>A Beginner\u2019s Guide to Deep Learning with GPUs on Grete<\/strong> at <a href=\"https:\/\/info.gwdg.de\/news\/a-beginners-guide-to-deep-learning-with-gpus-on-grete\/\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" class=\"external\">https:\/\/info.gwdg.de\/news\/a-beginners-guide-to-deep-learning-with-gpus-on-grete\/<\/a> into a container. Since <code>pip<\/code> is used to install the modules, we can use the official Python container and do not have to mix pip with conda (which can break your environments if you are not careful). Also, it is always good to be less reliant on commercial software like Anaconda. 
We just need the Apptainer Definition File (.def) and a file listing the modules we want pip to install, which we will call <code>requirements.txt<\/code>.<\/p>\n<p dir=\"auto\" data-sourcepos=\"118:1-118:348\">In this case we take the <code>requirements.txt<\/code> from the course GitLab repository: <a href=\"https:\/\/gitlab-ce.gwdg.de\/dmuelle3\/deep-learning-with-gpu-cores\/-\/blob\/main\/code\/requirements.txt\" class=\"external\" rel=\"nofollow\">https:\/\/gitlab-ce.gwdg.de\/dmuelle3\/deep-learning-with-gpu-cores\/-\/blob\/main\/code\/requirements.txt<\/a>. Note that we use Python version 3.8 to match what is used in the course. You can check out what other versions are currently available at <a href=\"https:\/\/hub.docker.com\/_\/python\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" class=\"external\">https:\/\/hub.docker.com\/_\/python<\/a>.<\/p>\n<h2 dir=\"auto\" data-sourcepos=\"120:1-120:22\"><a id=\"user-content-deep-learningdef\" class=\"anchor\" href=\"#deep-learningdef\" aria-hidden=\"true\"><\/a><code>deep-learning.def<\/code><\/h2>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-21\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"121:1-130:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">Bootstrap: docker<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">From: python:3.8<\/span>\r\n<span id=\"LC3\" class=\"line\" lang=\"plaintext\"><\/span>\r\n<span id=\"LC4\" class=\"line\" lang=\"plaintext\">%files<\/span>\r\n<span id=\"LC5\" class=\"line\" lang=\"plaintext\">    $PWD\/requirements.txt requirements.txt<\/span>\r\n<span id=\"LC6\" class=\"line\" lang=\"plaintext\"><\/span>\r\n<span id=\"LC7\" class=\"line\" lang=\"plaintext\">%post<\/span>\r\n<span id=\"LC8\" class=\"line\" lang=\"plaintext\">    pip install --root-user-action=ignore -r requirements.txt<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" 
data-sourcepos=\"131:1-131:30\">We run the following commands:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-22\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"132:1-135:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">wget https:\/\/gitlab-ce.gwdg.de\/dmuelle3\/deep-learning-with-gpu-cores\/-\/raw\/main\/code\/requirements.txt<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">apptainer build --nv deep-learning.sif deep-learning.def<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"137:1-137:85\">We can now test our container on the Grete cluster. We will use a simple test script:<\/p>\n<h2 dir=\"auto\" data-sourcepos=\"139:1-139:31\"><a id=\"user-content-test-python-environmentpy\" class=\"anchor\" href=\"#test-python-environmentpy\" aria-hidden=\"true\"><\/a><code>test-python-environment.py<\/code><\/h2>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-23\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"140:1-159:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">import torch<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">import torch.nn as nn<\/span>\r\n<span id=\"LC3\" class=\"line\" lang=\"plaintext\">import torch.optim as optim<\/span>\r\n<span id=\"LC4\" class=\"line\" lang=\"plaintext\">import torch.utils.collect_env<\/span>\r\n<span id=\"LC5\" class=\"line\" lang=\"plaintext\">import sklearn<\/span>\r\n<span id=\"LC6\" class=\"line\" lang=\"plaintext\"><\/span>\r\n<span id=\"LC7\" class=\"line\" lang=\"plaintext\">print(sklearn.show_versions())<\/span>\r\n<span id=\"LC8\" class=\"line\" lang=\"plaintext\"><\/span>\r\n<span id=\"LC9\" class=\"line\" lang=\"plaintext\">print(torch.__config__.show())<\/span>\r\n<span id=\"LC10\" class=\"line\" lang=\"plaintext\"><\/span>\r\n<span id=\"LC11\" 
class=\"line\" lang=\"plaintext\">print(torch.utils.collect_env.get_pretty_env_info())<\/span>\r\n<span id=\"LC12\" class=\"line\" lang=\"plaintext\"><\/span>\r\n<span id=\"LC13\" class=\"line\" lang=\"plaintext\">if torch.cuda.is_available() and torch.cuda.device_count() &gt; 0:<\/span>\r\n<span id=\"LC14\" class=\"line\" lang=\"plaintext\">    print(\"Active CUDA device:\",<\/span>\r\n<span id=\"LC15\" class=\"line\" lang=\"plaintext\">          torch.cuda.get_device_name(torch.cuda.current_device()))<\/span>\r\n<span id=\"LC16\" class=\"line\" lang=\"plaintext\">else:<\/span>\r\n<span id=\"LC17\" class=\"line\" lang=\"plaintext\">    print(\"No active CUDA devices.\")<\/span>\r\n<span id=\"LC18\" class=\"line\" lang=\"plaintext\"><\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"160:1-160:52\">We start an interactive job on a Grete compute node:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-24\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"161:1-163:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">$ salloc -t 01:00:00 -p grete:interactive -N1 -G V100:1<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"164:1-164:29\">And then run the test script:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-25\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"165:1-169:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">$ module load apptainer<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">$ apptainer run --bind $WORK,$TMPDIR --nv deep-learning.sif python test-python-environment.py <\/span>\r\n<span id=\"LC3\" class=\"line\" lang=\"plaintext\">$ exit<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"170:1-170:160\"><strong>NOTE:<\/strong> To get 
<em>NVIDIA CUDA<\/em> support, the <code>--nv<\/code> option to <code>apptainer run<\/code> is required. On <em>AMD ROCm<\/em> platforms this is replaced by <code>--rocm<\/code>.<\/p>\n<p dir=\"auto\" data-sourcepos=\"172:1-172:39\"><strong>Example output on Grete (shortened)<\/strong><\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-26\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"173:1-191:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">glogin9:~\/deep-learning-test $ salloc -t 01:00:00 -p grete:interactive -N1 -G V100:1<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">[... snip ...]<\/span>\r\n<span id=\"LC3\" class=\"line\" lang=\"plaintext\">salloc: Nodes ggpu02 are ready for job<\/span>\r\n<span id=\"LC4\" class=\"line\" lang=\"plaintext\">ggpu02:~\/deep-learning-test $ module load apptainer<\/span>\r\n<span id=\"LC5\" class=\"line\" lang=\"plaintext\">ggpu02:~\/deep-learning-test $ apptainer run --bind $WORK,$TMPDIR --nv deep-learning.sif python test-python-environment.py <\/span>\r\n<span id=\"LC6\" class=\"line\" lang=\"plaintext\">[... 
snip ...]<\/span>\r\n<span id=\"LC7\" class=\"line\" lang=\"plaintext\">Versions of relevant libraries:<\/span>\r\n<span id=\"LC8\" class=\"line\" lang=\"plaintext\">[pip3] numpy==1.24.2<\/span>\r\n<span id=\"LC9\" class=\"line\" lang=\"plaintext\">[pip3] torch==2.0.0<\/span>\r\n<span id=\"LC10\" class=\"line\" lang=\"plaintext\">[pip3] torchvision==0.15.1<\/span>\r\n<span id=\"LC11\" class=\"line\" lang=\"plaintext\">[conda] Could not collect<\/span>\r\n<span id=\"LC12\" class=\"line\" lang=\"plaintext\">Active CUDA device: Tesla V100S-PCIE-32GB<\/span>\r\n<span id=\"LC13\" class=\"line\" lang=\"plaintext\">ggpu02:~\/deep-learning-test $ exit<\/span>\r\n<span id=\"LC14\" class=\"line\" lang=\"plaintext\">exit<\/span>\r\n<span id=\"LC15\" class=\"line\" lang=\"plaintext\">salloc: Relinquishing job allocation 4812554<\/span>\r\n<span id=\"LC16\" class=\"line\" lang=\"plaintext\">salloc: Job allocation 4812554 has been revoked.<\/span>\r\n<span id=\"LC17\" class=\"line\" lang=\"plaintext\">glogin9:~\/deep-learning-test $ <\/span><\/code><\/pre>\n<\/div>\n<hr data-sourcepos=\"193:1-194:0\" \/>\n<p dir=\"auto\" data-sourcepos=\"195:1-195:220\">Similarly the submit scripts from the course can be changed to use apptainer instead of anaconda3. 
For example, if we put the container file at $WORK\/deep-learning.sif, the code in <code>submit_train.sh<\/code> could look like this:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-27\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"197:1-201:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">#SBATCH ...<\/span>\r\n<span id=\"LC2\" class=\"line\" lang=\"plaintext\">module load apptainer<\/span>\r\n<span id=\"LC3\" class=\"line\" lang=\"plaintext\">apptainer run --nv --bind \/scratch $WORK\/deep-learning.sif python train.py<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"202:1-202:184\">Here we have bound the entire <code>\/scratch<\/code> directory so we have access to $WORK, $TMPDIR and also the data folder in <code>\/scratch\/projects\/workshops\/gpu-workshop<\/code> from inside the container.<\/p>\n<p dir=\"auto\" data-sourcepos=\"204:1-204:213\">To get the debug output that was previously provided by <code>python -m torch.utils.collect_env<\/code> in the submit script, you can for example add the following line to the <code>if __name__ == \"__main__\":<\/code> block in <code>train.py<\/code>:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-28\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"205:1-207:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">print(torch.utils.collect_env.get_pretty_env_info())<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"208:1-208:239\">A more advanced usage would be to create a shell script that is executed inside the container (<code>apptainer run ... run_train.sh<\/code>) that allows you to run e.g. 
multiple Python commands and emit extra debugging information about the container.<\/p>\n<h3 dir=\"auto\" data-sourcepos=\"210:1-210:57\"><a id=\"user-content-running-the-deep-learning-example-on-your-own-machine\" class=\"anchor\" href=\"#running-the-deep-learning-example-on-your-own-machine\" aria-hidden=\"true\"><\/a>Running the deep learning example on your own machine<\/h3>\n<p dir=\"auto\" data-sourcepos=\"211:1-211:156\">You will not be using the slurm resource manager if you are following the examples on your own machine. In that case you can execute the container directly:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-29\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"212:1-214:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">apptainer run --nv deep-learning.sif python test-python-environment.py<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"215:1-215:174\">If you don&#8217;t have a CUDA stack, you can leave out the <code>--nv<\/code> flag. You should still see some output from PyTorch, even though the &#8220;Active CUDA device: &#8230;&#8221; line will be missing.<\/p>\n<p dir=\"auto\" data-sourcepos=\"217:1-217:102\">The other commands were run directly on the login node and should also work on your computer.<\/p>\n<h2 dir=\"auto\" data-sourcepos=\"218:1-218:18\"><a id=\"user-content-using-a-sandbox\" class=\"anchor\" href=\"#using-a-sandbox\" aria-hidden=\"true\"><\/a>Using a sandbox<\/h2>\n<p dir=\"auto\" data-sourcepos=\"219:1-219:233\">While you are testing how to best build your container, you can skip the creation of the <code>.sif<\/code> file and instead create a directory that contains all the individual files inside the container. 
If we change the earlier QuSpin example:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-30\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"220:1-222:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">apptainer build --sandbox quspin quspin.def<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"223:1-223:119\">You can now run commands inside the container (for a sandbox, this also works on the login nodes of NHR@G\u00f6ttingen!):<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-31\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"224:1-226:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">apptainer shell quspin<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"228:1-228:131\">You can even make changes to the container (for example, to install software with apt, yum or pip) if you use the following command:<\/p>\n<div class=\"gl-relative markdown-code-block js-markdown-code\">\n<pre id=\"code-32\" class=\"code highlight js-syntax-highlight language-plaintext white\" lang=\"plaintext\" data-sourcepos=\"229:1-231:3\"><code><span id=\"LC1\" class=\"line\" lang=\"plaintext\">apptainer shell --writable quspin<\/span><\/code><\/pre>\n<p>&nbsp;<\/p>\n<\/div>\n<p dir=\"auto\" data-sourcepos=\"233:1-233:282\">If you are satisfied with the container, you should add all the necessary commands to the <code>.def<\/code> file and build a <code>.sif<\/code> file for use on the cluster as shown in the previous sections, since the sandbox has all the same disadvantages for the file system that Python environments have.<\/p>\n<h2 dir=\"auto\" data-sourcepos=\"234:1-234:19\"><a id=\"user-content-other-containers\" class=\"anchor\" href=\"#other-containers\" aria-hidden=\"true\"><\/a>Other containers<\/h2>\n<p 
dir=\"auto\" data-sourcepos=\"235:1-235:389\">A few other interesting containers to start from are <a href=\"https:\/\/hub.docker.com\/r\/condaforge\/miniforge3\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" class=\"external\">miniforge3 from conda-forge<\/a>, with access to the large community-driven package repository, and <a href=\"https:\/\/hub.docker.com\/r\/nvidia\/cuda\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" class=\"external\">NVIDIA CUDA<\/a>, with the latest version of the NVIDIA CUDA Deep Neural Network library (cuDNN). Please be careful to only use containers from reputable sources!<\/p>\n<p dir=\"auto\" data-sourcepos=\"237:1-237:53\">I wish you a fun and educational time with apptainer!<\/p>\n<h2 dir=\"auto\" data-sourcepos=\"237:1-237:53\">Author<\/h2>\n<p>Niklas B\u00f6lter<br \/>\nhttps:\/\/gitlab-ce.gwdg.de\/hpc-team-public\/science-domains-blog\/-\/blob\/main\/20230907_python-apptainer.md<\/p>\n<p>CUDA is a registered trademark of NVIDIA Corporation. ROCm is a registered trademark of Advanced Micro Devices, Inc.<\/p>\n<details>\n<summary>Appendix: Example output on Grete (full, click to expand)<\/summary>\n<p>login9:~\/deep-learning-test $ salloc -t 01:00:00 -p grete:interactive -N1 -G V100:1<br \/>\nsalloc: Pending job allocation 4812554<br \/>\nsalloc: job 4812554 queued and waiting for resources<br \/>\nsalloc: job 4812554 has been allocated resources<br \/>\nsalloc: Granted job allocation 4812554<br \/>\nsalloc: Waiting for resource configuration<br \/>\nsalloc: Nodes ggpu02 are ready for job<br \/>\nggpu02:~\/deep-learning-test $ module load apptainer<br \/>\nggpu02:~\/deep-learning-test $ apptainer run --bind $WORK,$TMPDIR --nv deep-learning.sif python test-python-environment.py<br \/>\nINFO: fuse: warning: library too old, some operations may not work<br \/>\nINFO: underlay of \/usr\/bin\/nvidia-smi required more than 50 (517) bind mounts<\/p>\n<p>System:<br \/>\npython: 3.8.17 (default, Jul 28 2023, 06:03:56) 
[GCC 12.2.0]<br \/>\nexecutable: \/usr\/local\/bin\/python<br \/>\nmachine: Linux-4.18.0-425.19.2.el8_7.x86_64-x86_64-with-glibc2.34<\/p>\n<p>Python dependencies:<br \/>\nsklearn: 1.2.2<br \/>\npip: 23.0.1<br \/>\nsetuptools: 65.6.3<br \/>\nnumpy: 1.24.2<br \/>\nscipy: 1.10.1<br \/>\nCython: None<br \/>\npandas: 1.5.3<br \/>\nmatplotlib: 3.7.1<br \/>\njoblib: 1.2.0<br \/>\nthreadpoolctl: 3.1.0<\/p>\n<p>Built with OpenMP: True<\/p>\n<p>threadpoolctl info:<br \/>\nuser_api: openmp<br \/>\ninternal_api: openmp<br \/>\nprefix: libgomp<br \/>\nfilepath: \/usr\/local\/lib\/python3.8\/site-packages\/torch\/lib\/libgomp-a34b3233.so.1<br \/>\nversion: None<br \/>\nnum_threads: 1<\/p>\n<p>user_api: blas<br \/>\ninternal_api: openblas<br \/>\nprefix: libopenblas<br \/>\nfilepath: \/usr\/local\/lib\/python3.8\/site-packages\/numpy.libs\/libopenblas64_p-r0-15028c96.3.21.so<br \/>\nversion: 0.3.21<br \/>\nthreading_layer: pthreads<br \/>\narchitecture: SkylakeX<br \/>\nnum_threads: 1<\/p>\n<p>user_api: openmp<br \/>\ninternal_api: openmp<br \/>\nprefix: libgomp<br \/>\nfilepath: \/usr\/local\/lib\/python3.8\/site-packages\/scikit_learn.libs\/libgomp-a34b3233.so.1.0.0<br \/>\nversion: None<br \/>\nnum_threads: 1<\/p>\n<p>user_api: blas<br \/>\ninternal_api: openblas<br \/>\nprefix: libopenblas<br \/>\nfilepath: \/usr\/local\/lib\/python3.8\/site-packages\/scipy.libs\/libopenblasp-r0-41284840.3.18.so<br \/>\nversion: 0.3.18<br \/>\nthreading_layer: pthreads<br \/>\narchitecture: SkylakeX<br \/>\nnum_threads: 1<br \/>\nNone<br \/>\nPyTorch built with:<br \/>\n&#8211; GCC 9.3<br \/>\n&#8211; C++ Version: 201703<br \/>\n&#8211; Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications<br \/>\n&#8211; Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)<br \/>\n&#8211; OpenMP 201511 (a.k.a. 
OpenMP 4.5)<br \/>\n&#8211; LAPACK is enabled (usually provided by MKL)<br \/>\n&#8211; NNPACK is enabled<br \/>\n&#8211; CPU capability usage: AVX2<br \/>\n&#8211; CUDA Runtime 11.7<br \/>\n&#8211; NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86<br \/>\n&#8211; CuDNN 8.5<br \/>\n&#8211; Magma 2.6.1<br \/>\n&#8211; Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=\/opt\/rh\/devtoolset-9\/root\/usr\/bin\/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,<\/p>\n<p>PyTorch version: 2.0.0+cu117<br \/>\nIs debug build: False<br \/>\nCUDA used to 
build PyTorch: 11.7<br \/>\nROCM used to build PyTorch: N\/A<\/p>\n<p>OS: Debian GNU\/Linux 12 (bookworm) (x86_64)<br \/>\nGCC version: (Debian 12.2.0-14) 12.2.0<br \/>\nClang version: Could not collect<br \/>\nCMake version: version 3.26.1<br \/>\nLibc version: glibc-2.36<\/p>\n<p>Python version: 3.8.17 (default, Jul 28 2023, 06:03:56) [GCC 12.2.0] (64-bit runtime)<br \/>\nPython platform: Linux-4.18.0-425.19.2.el8_7.x86_64-x86_64-with-glibc2.34<br \/>\nIs CUDA available: True<br \/>\nCUDA runtime version: Could not collect<br \/>\nCUDA_MODULE_LOADING set to: LAZY<br \/>\nGPU models and configuration: GPU 0: Tesla V100S-PCIE-32GB<br \/>\nNvidia driver version: 530.30.02<br \/>\ncuDNN version: Could not collect<br \/>\nHIP runtime version: N\/A<br \/>\nMIOpen runtime version: N\/A<br \/>\nIs XNNPACK available: True<\/p>\n<p>CPU:<br \/>\nArchitecture: x86_64<br \/>\nCPU op-mode(s): 32-bit, 64-bit<br \/>\nAddress sizes: 46 bits physical, 48 bits virtual<br \/>\nByte Order: Little Endian<br \/>\nCPU(s): 80<br \/>\nOn-line CPU(s) list: 0-79<br \/>\nVendor ID: GenuineIntel<br \/>\nModel name: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz<br \/>\nCPU family: 6<br \/>\nModel: 85<br \/>\nThread(s) per core: 2<br \/>\nCore(s) per socket: 20<br \/>\nSocket(s): 2<br \/>\nStepping: 7<br \/>\nCPU(s) scaling MHz: 38%<br \/>\nCPU max MHz: 3900.0000<br \/>\nCPU min MHz: 1000.0000<br \/>\nBogoMIPS: 5000.00<br \/>\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms 
invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities<br \/>\nL1d cache: 1.3 MiB (40 instances)<br \/>\nL1i cache: 1.3 MiB (40 instances)<br \/>\nL2 cache: 40 MiB (40 instances)<br \/>\nL3 cache: 55 MiB (2 instances)<br \/>\nNUMA node(s): 2<br \/>\nNUMA node0 CPU(s): 0-19,40-59<br \/>\nNUMA node1 CPU(s): 20-39,60-79<br \/>\nVulnerability Itlb multihit: KVM: Mitigation: VMX unsupported<br \/>\nVulnerability L1tf: Not affected<br \/>\nVulnerability Mds: Not affected<br \/>\nVulnerability Meltdown: Not affected<br \/>\nVulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable<br \/>\nVulnerability Retbleed: Mitigation; Enhanced IBRS<br \/>\nVulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl<br \/>\nVulnerability Spectre v1: Mitigation; usercopy\/swapgs barriers and __user pointer sanitization<br \/>\nVulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence<br \/>\nVulnerability Srbds: Not affected<br \/>\nVulnerability Tsx async abort: Mitigation; TSX disabled<\/p>\n<p>Versions of relevant libraries:<br \/>\n[pip3] numpy==1.24.2<br \/>\n[pip3] torch==2.0.0<br \/>\n[pip3] torchvision==0.15.1<br \/>\n[conda] Could not collect<br \/>\nActive CUDA device: Tesla V100S-PCIE-32GB<br \/>\nggpu02:~\/deep-learning-test $ exit<br \/>\nexit<br \/>\nsalloc: Relinquishing job allocation 4812554<br \/>\nsalloc: Job allocation 4812554 has been revoked.<br \/>\nglogin9:~\/deep-learning-test $<\/p>\n<\/details>\n","protected":false},"excerpt":{"rendered":"<p>When it comes to our High-Performance Computing (HPC) systems, efficiency is the name of the game. But are you managing your Python environments efficiently? 
Our HPC systems use network file systems that are optimized to access a small number of large files. This is particularly true for the parallel file systems that are used for &#8230; <a title=\"Using apptainer containers to manage your Python environments\" class=\"read-more\" href=\"https:\/\/info.gwdg.de\/news\/using-apptainer-containers-to-manage-your-python-environments\/\" aria-label=\"Mehr Informationen \u00fcber Using apptainer containers to manage your Python environments\">Weiterlesen<\/a><\/p>\n","protected":false},"author":166,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,129],"tags":[],"class_list":["post-23421","post","type-post","status-publish","format-standard","hentry","category-alle","category-wissenschaftliche-domaenen"],"_links":{"self":[{"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/posts\/23421","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/users\/166"}],"replies":[{"embeddable":true,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/comments?post=23421"}],"version-history":[{"count":17,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/posts\/23421\/revisions"}],"predecessor-version":[{"id":23446,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/posts\/23421\/revisions\/23446"}],"wp:attachment":[{"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/media?parent=23421"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/categories?post=23421"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/tags?post=23421"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}
]}}