2025-07-19

Using PyTorch NGC Containers with Apptainer

Introduction

This post provides tips for using PyTorch NGC containers provided by NVIDIA with Apptainer.

PyTorch NGC containers come with NVIDIA libraries such as CUDA and cuDNN pre-installed, making it easy to train models on GPUs. However, since they are distributed as Docker images, they must be converted to Apptainer's SIF format before they can be used in HPC environments. In addition, the procedure for installing extra libraries differs slightly from the Docker workflow.

This post focuses on these differences and explains how to use PyTorch NGC containers with Apptainer.

Prerequisites

This guide assumes Apptainer is installed in your HPC environment. If it is not, please ask your system administrator to install it.

Steps

1. Load Apptainer

Apptainer may be loaded automatically in some environments; if you need to load it manually, run the following command. The module name and available versions vary by HPC environment, so adjust as necessary.

$ module load Apptainer/<version>
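
If you are unsure which versions your site provides, the module system can usually list them. As a sketch, assuming an Environment Modules or Lmod setup (the module name may be capitalized differently on your system):

$ module avail Apptainer
$ module spider Apptainer   # Lmod only: lists all versions and how to load them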

2. Download the PyTorch NGC Container

$ apptainer pull docker://nvcr.io/nvidia/pytorch:<version>

Here, <version> specifies the desired PyTorch NGC version. You can check the available versions on the NGC Catalog page. For example, to use PyTorch 25.04-py3, you can run the following command:

$ apptainer pull docker://nvcr.io/nvidia/pytorch:25.04-py3

The converted container file is named after the image tag, in the format pytorch_<version>-py3.sif. For example, running the command above generates a file named pytorch_25.04-py3.sif.
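
Note that apptainer pull also caches the downloaded layers, by default under ~/.apptainer/cache, which can quickly eat into a small home-directory quota. A minimal sketch, assuming your project storage has more room:

$ export APPTAINER_CACHEDIR=$PROJ_HOME/.apptainer_cache   # redirect the pull cache
$ apptainer pull docker://nvcr.io/nvidia/pytorch:25.04-py3
$ apptainer cache clean                                   # optionally reclaim the space afterwards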

3. Move the Container to an Appropriate Directory

By default, the downloaded container file is saved in the current directory. I recommend moving it to a dedicated location such as your project directory. In my case, I move it to a containers folder within my project directory as follows:

$ mkdir -p $PROJ_HOME/containers
$ mv pytorch_25.04-py3.sif $PROJ_HOME/containers

4. Run the Container

Next, start a shell inside the downloaded container. Specify the --nv option to enable GPU support, and use the --bind option to mount your project directory inside the container so that your project files are accessible from within it.

$ apptainer exec \
    --nv \
    --bind $PROJ_HOME:$PROJ_HOME \
    $PROJ_HOME/containers/pytorch_25.04-py3.sif \
    /bin/bash

In some HPC environments, home directories and Lustre-based file storage might be automatically mounted. In such cases, the --bind option is not necessary. Refer to your HPC documentation to confirm whether this is the case.
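
Once inside the container, it is worth confirming that the GPU is actually visible before going further. A quick check (the output will vary with your hardware and driver):

$ nvidia-smi
$ python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"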

5. Install Additional Required Libraries

PyTorch NGC containers ship with a working PyTorch environment, but if you want additional libraries such as Transformers, you need to install them yourself. I recommend doing these extra installations in a virtual environment. Here is an example of installing the Transformers library with pip:

$ python3 -m venv --system-site-packages $PROJ_HOME/envs/test
$ source $PROJ_HOME/envs/test/bin/activate
$ unset PIP_CONSTRAINT
$ pip install transformers

These commands create a virtual environment in envs/test within your project directory and install the Transformers library there. The --system-site-packages option lets the virtual environment see the packages already installed in the container, including PyTorch.

Unsetting the PIP_CONSTRAINT environment variable temporarily lifts the version lock that the PyTorch NGC container sets via /etc/pip/constraint.txt. This is a key difference from the Docker workflow: Apptainer mounts the container filesystem read-only by default, so you cannot simply edit that file; unsetting the environment variable in your session achieves the same effect.
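
As a quick sanity check, you can confirm that the virtual environment still sees the container's PyTorch alongside the newly installed library:

$ python3 -c "import torch, transformers; print(torch.__version__, transformers.__version__)"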

Once the installation is complete, exit the container.

$ exit

6. Use the Container for Your Tasks

For Interactive Jobs

When running the container for an interactive job, you can use the same command shown in step 4:

$ apptainer exec \
    --nv \
    --bind $PROJ_HOME:$PROJ_HOME \
    $PROJ_HOME/containers/pytorch_25.04-py3.sif \
    /bin/bash

If you need to set library environment variables within the container, refer to the following example; the exact paths may vary depending on your environment. To find the CUDA version a given container ships with, consult NVIDIA's official Frameworks Support Matrix. The 25.04-py3 release uses CUDA 12.9, so set it as follows:

export CUDA_HOME=/usr/local/cuda-12.9
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export TMPDIR=/tmp
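
The installed CUDA directory name can differ between container releases, so it is safer to check what actually exists before hard-coding the path. For example (assuming the image includes the nvcc compiler, which NGC PyTorch images normally do):

$ ls -d /usr/local/cuda*
$ $CUDA_HOME/bin/nvcc --version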

For Job Scripts

When running the container using a job script, use the apptainer exec command as shown below. Here’s an example of a Slurm job script:

#!/bin/bash
#SBATCH --job-name=pytorch_job
#SBATCH --output=pytorch_job.out
#SBATCH --error=pytorch_job.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --partition=gpu

module load Apptainer/<version>
PROJ_HOME=/path/to/your/project
apptainer exec \
    --nv \
    --bind $PROJ_HOME:$PROJ_HOME \
    $PROJ_HOME/containers/pytorch_25.04-py3.sif \
    python3 $PROJ_HOME/scripts/train.py
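
Assuming you save this script under a name like pytorch_job.sh (the filename is arbitrary), submit it and monitor it as usual:

$ sbatch pytorch_job.sh
$ squeue -u $USER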

If you need to specify many command-line arguments, I recommend splitting the logic into an execution script (run inside the container) and a Slurm job script (a thin wrapper around apptainer exec). Here is an example:

Execution Script (saved as $PROJ_HOME/scripts/run.sh)
#!/bin/bash

# Activate the virtual environment created in step 5
source $PROJ_HOME/envs/test/bin/activate

# Paths may vary by container release; see the note under "For Interactive Jobs"
export CUDA_HOME=/usr/local/cuda-12.9
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export TMPDIR=/tmp

# Forward all command-line arguments to the training script
python3 $PROJ_HOME/scripts/train.py "$@"
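
Since the Slurm job script below invokes this execution script directly, make sure it is executable:

$ chmod +x $PROJ_HOME/scripts/run.sh
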
Slurm Job Script
#!/bin/bash
#SBATCH --job-name=pytorch_job
#SBATCH --output=pytorch_job.out
#SBATCH --error=pytorch_job.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --partition=gpu

module load Apptainer/<version>
PROJ_HOME=/path/to/your/project
apptainer exec \
    --nv \
    --bind $PROJ_HOME:$PROJ_HOME \
    $PROJ_HOME/containers/pytorch_25.04-py3.sif \
    $PROJ_HOME/scripts/run.sh "$@"

With this split, the Slurm job script stays a thin wrapper that calls the execution script, and any arguments you pass at submission time flow through both layers of "$@" to your Python script.
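
For example, if the wrapper is saved as pytorch_job.sh (the flags below are hypothetical placeholders for whatever arguments your train.py accepts):

$ sbatch pytorch_job.sh --epochs 10 --batch-size 32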

Conclusion

This post explained how to use PyTorch NGC containers with Apptainer.

PyTorch NGC containers come with most of the necessary libraries like CUDA pre-installed, allowing you to quickly set up an experimental environment. If you have not used them before, give them a try!