
Running PyTorch on GPUs

To run an AI workload on a GPU machine, the kernel driver and user-space libraries from the GPU vendor (AMD or NVIDIA) need to be installed. Once the driver and software are in place, AI frameworks like PyTorch and TensorFlow must be used in a build that targets that GPU. Because most AI applications run on top of these popular frameworks, the tedious installation steps are usually hidden from the application developer. This article highlights how the hardware, driver, software, and framework layers fit together when running AI applications or workloads.

This article covers the Linux operating system, the ROCm software stack for AMD GPUs, the CUDA software stack for NVIDIA GPUs, and PyTorch as the AI framework. Docker plays a crucial role in bringing up the user-space portion of the stack, allowing different workloads to be launched in parallel.

AI software stack on a node with 8 AMD GPUs

The above diagram shows the AI software stack on a node with 8 AMD GPUs.

The hardware layer consists of a node with the usual CPU, memory, etc., plus the GPU devices. A node can have a single GPU, but larger AI models require a lot of GPU memory just to load, so it is common to put more than one GPU in a node. Within a node, the GPUs are interconnected via XGMI (AMD) or NVLink (NVIDIA). A cluster has multiple such nodes, and GPUs on one node can communicate with GPUs on another node, typically over InfiniBand or Ethernet/RoCE. Which interconnect is used depends on the underlying GPU hardware and network fabric.
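Per-GPU memory is the main reason multi-GPU nodes are common. As a quick illustration, the snippet below (a minimal sketch, assuming a GPU-enabled PyTorch build like the one set up later in this article) enumerates the GPUs PyTorch can see and prints the total memory of each:

import torch

# Requires a ROCm- or CUDA-enabled PyTorch build (see the Docker sections below).
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No GPU visible to PyTorch")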

Kernel driver installation

At the software layer, the AMD or NVIDIA GPU driver needs to be installed. It is not uncommon to install the entire ROCm or CUDA software package on the native host OS, including the kernel driver. Since we will be using a Docker container to launch the AI workload, the user-space ROCm or CUDA software is redundant on the native host OS; however, having it there lets us verify, via user-space tools such as rocm-smi or nvidia-smi, that the underlying kernel driver is working properly.

Launch a ROCm- or CUDA-based Docker container

Once the GPU driver is installed, ROCm-based or CUDA-based Docker images can be used on AMD and NVIDIA GPU nodes, respectively.

AMD and NVIDIA periodically release Docker images for different Linux flavors. This is one of the advantages of Dockerized applications over running applications directly on the native OS: we can have an Ubuntu 22.04 host OS with just the GPU driver installed and then launch CentOS- or Ubuntu 20.04-based containers with different ROCm versions in parallel.

Launch a ROCm-based Docker container

ROCm Docker images are available on Docker Hub; the dev-ubuntu-22.04 repository is used below.

docker run -it --rm --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rocm/dev-ubuntu-22.04

The above command maps all GPU devices to the container. You can also target specific GPUs (more info at “Running ROCm Docker Containers”).

Once the container is running, check that the GPUs are listed (for example, with rocm-smi inside the container).

From here you can download the PyTorch source and build it for AMD GPUs (instructions are on the PyTorch GitHub repository), or run any other workload that has ROCm support.

Launch a ROCm-based PyTorch Docker container

If PyTorch does not need to be built from source (in most cases it does not), you can directly download a ROCm-based PyTorch Docker image. Just make sure the ROCm kernel driver is installed, and then start the PyTorch-based container.

PyTorch Docker images with ROCm support can be found on Docker Hub (rocm/pytorch).

docker run -it --rm --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined rocm/pytorch

Once the container is running, verify that the GPUs are listed, as described earlier.

Let’s try out a few snippets in the PyTorch framework to check the GPUs, the ROCm/HIP version, etc.

root@node:/var/lib/jenkins# python3
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.1.2+git70dfd51'
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
8
>>> torch.version.hip
'6.1.40091-a8dbc0c19'
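To confirm that computation actually runs on the GPU rather than silently on the CPU, a short check like the following can be run in the same session (a minimal sketch; it assumes the rocm/pytorch container above, where torch.cuda.is_available() returned True):

>>> x = torch.randn(1024, 1024, device="cuda")  # tensor allocated on GPU 0
>>> y = x @ x                                   # matrix multiply executed on the GPU
>>> y.device
device(type='cuda', index=0)

Note that ROCm builds of PyTorch still expose the GPU through the torch.cuda API and the "cuda" device string, so existing CUDA-targeted PyTorch code runs unmodified on AMD GPUs.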

Conclusion

In conclusion, this article highlights the importance of matching the software stack to the underlying GPU hardware. Selecting the wrong software stack for a particular GPU type may lead to a silent fallback to the default device (i.e., the CPU), thereby underutilizing the GPU’s computational power.
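This fallback is easy to see in the device-selection pattern commonly used in PyTorch code (a minimal sketch, not tied to any particular release): if the installed build cannot see the GPU, everything quietly runs on the CPU.

import torch

# If the PyTorch build does not match the GPU stack (ROCm or CUDA),
# is_available() returns False and the workload silently runs on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")

x = torch.randn(1024, 1024, device=device)
y = x @ x  # executes on the GPU only when the stack is set up correctly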

Have fun with GPU programming!