Setup Nvidia GPU for Docker
Requirements
- Operating System
  - Debian
  - Ubuntu
- Nvidia GPU, 700 series or newer
- Docker
Getting Started
Update the OS
sudo apt update
sudo apt full-upgrade
Install Docker
Install Docker by following the official Docker installation guide for your distribution.
Setup for Nvidia Drivers
Install required packages
sudo apt install linux-headers-$(uname -r) make pciutils wget libc-dev libc6-dev gcc g++
Blacklist the Nouveau drivers
echo "blacklist nouveau" | sudo tee -a /etc/modprobe.d/blacklist.conf
Regenerate the kernel initramfs
sudo update-initramfs -u
Backup GRUB
sudo cp /etc/default/grub /etc/default/grub.bak
Edit GRUB
sudo nano /etc/default/grub
Find the GRUB_CMDLINE_LINUX line and edit it to look like the line below
GRUB_CMDLINE_LINUX="quiet rd.driver.blacklist=nouveau"
Save and exit with Ctrl+X, then confirm with Y and Enter
Regenerate GRUB
sudo grub-mkconfig -o /boot/grub/grub.cfg
Reboot to load the new GRUB configuration.
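Once the machine is back up, it can be worth confirming that Nouveau really stayed unloaded before running the Nvidia installer. The sketch below is one way to check, assuming the kernel exposes the loaded module list at /proc/modules with the module name in the first column. The function name nouveau_loaded is made up for this example.

```shell
# Return success if the given module list (in /proc/modules format)
# contains a module named exactly "nouveau" in its first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit found ? 0 : 1 }' "$1"
}

modules_file=/proc/modules   # assumed location of the loaded module list

if [ ! -r "$modules_file" ]; then
    echo "cannot read $modules_file, skipping the check"
elif nouveau_loaded "$modules_file"; then
    echo "nouveau is still loaded; recheck the blacklist file and GRUB settings"
else
    echo "nouveau is not loaded"
fi
```

If the check reports that Nouveau is still loaded, the initramfs or GRUB configuration was probably not regenerated after the edits.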
Download Nvidia Drivers
Double check what GPU you have with lspci
lspci | grep VGA
The output might look like the line below. In this case the GPU is an Nvidia Quadro T400, specifically the TU117GLM.
01:00.0 VGA compatible controller: NVIDIA Corporation TU117GLM [Quadro T400 Mobile] (rev a1)
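Some Nvidia cards, for example compute-only cards and some laptop GPUs, are listed by lspci as a "3D controller" rather than a "VGA compatible controller", so a filter on VGA alone can miss them. The sketch below is a slightly broader filter. The function name nvidia_gpus is hypothetical, and it is run here against the sample line above rather than live lspci output.

```shell
# Read lspci-style lines on stdin and print the description of every
# Nvidia device listed as a VGA or 3D controller.
nvidia_gpus() {
    grep -E 'VGA compatible controller|3D controller' \
        | grep -i 'nvidia' \
        | sed 's/^.*controller: //'
}

# Sample line taken from the output above.
sample='01:00.0 VGA compatible controller: NVIDIA Corporation TU117GLM [Quadro T400 Mobile] (rev a1)'
printf '%s\n' "$sample" | nvidia_gpus

# On a real machine the input would come from lspci instead:
#   lspci | nvidia_gpus
```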
Go to the Nvidia driver site at https://www.nvidia.com/download/index.aspx.
Use the form to find your GPU. Make sure Operating System is set to Linux 64-bit and Download Type is Production Branch.
Click the Search button. On the next page, take note of the Version and Release Date fields, then click the Download button. On the Download page, right-click the Agree & Download button and select Copy Link. The copied text should look something like the URL below, with the version in the path and file name matching what you noted before.
https://us.download.nvidia.com/XFree86/Linux-x86_64/535.146.02/NVIDIA-Linux-x86_64-535.146.02.run
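In the example URL the version number appears twice, once as a directory and once in the file name. The sketch below builds a URL from a version string using that pattern. The pattern is inferred only from the single example above, so treat the result as a guess to compare against the link you copied, not as a guaranteed download location. The function name driver_url is hypothetical.

```shell
# Build a driver download URL from a version string, following the
# pattern seen in the example URL (assumption: the pattern holds for
# other versions too, which is not guaranteed).
driver_url() {
    local version="$1"
    printf 'https://us.download.nvidia.com/XFree86/Linux-x86_64/%s/NVIDIA-Linux-x86_64-%s.run\n' \
        "$version" "$version"
}

driver_url 535.146.02
```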
Download the driver file to your Linux machine using wget. If you copy the command below, make sure to update the URL to the correct path!
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.146.02/NVIDIA-Linux-x86_64-535.146.02.run
The output should look like below
--2023-12-11 23:46:32-- https://us.download.nvidia.com/XFree86/Linux-x86_64/535.146.02/NVIDIA-Linux-x86_64-535.146.02.run
Resolving us.download.nvidia.com (us.download.nvidia.com)... 192.229.211.70, 2606:2800:21f:3aa:dcf:37b:1ed6:1fb
Connecting to us.download.nvidia.com (us.download.nvidia.com)|192.229.211.70|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 341737575 (326M) [application/octet-stream]
Saving to: ‘NVIDIA-Linux-x86_64-535.146.02.run’
NVIDIA-Linux-x86_64-535.146.02.run 100%[============================================================================>] 325.91M 73.9MB/s in 4.3s
2023-12-11 23:46:37 (75.2 MB/s) - ‘NVIDIA-Linux-x86_64-535.146.02.run’ saved [341737575/341737575]
Verify the file downloaded
ls -halt
You should see a file like below
-rw-r--r-- 1 user group 326M Dec 4 02:53 NVIDIA-Linux-x86_64-535.146.02.run
Make it executable. Make sure to update the file name in the command below.
chmod +x NVIDIA-Linux-x86_64-535.146.02.run
Use ls to verify the changes.
Install Nvidia Driver
Run the installer
sudo ./NVIDIA-Linux-x86_64-535.146.02.run
Follow the prompts. If you don’t have a desktop environment or you’re running a headless system, don’t install nvidia-xconfig or any of the X.org components. The 32-bit components are usually not required either.
Verify the driver installed
nvidia-smi
The output should look like the example below. Note the Driver Version and CUDA Version, and make sure these are appropriate for the service using the GPU.
Tue Dec 12 01:01:41 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02 Driver Version: 545.29.02 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA T400 Off | 00000000:01:00.0 Off | N/A |
| 53% 48C P0 N/A / 31W | 0MiB / 2048MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
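If you want to record or compare these two values in a script, they can be pulled out of the header line of the table. The sketch below assumes the header keeps the "Driver Version:" and "CUDA Version:" labels shown above; the layout can differ between driver releases. The function name smi_versions is hypothetical, and it is run against the sample header rather than a live GPU.

```shell
# Read nvidia-smi table output on stdin and print the driver and CUDA
# versions found on the header line.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Header line copied from the sample output above.
sample='| NVIDIA-SMI 545.29.02 Driver Version: 545.29.02 CUDA Version: 12.3 |'
printf '%s\n' "$sample" | smi_versions

# On a machine with the driver installed:
#   nvidia-smi | smi_versions
```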
Install Nvidia Container Toolkit
Add the Nvidia Container repos to apt
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
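The sed step in the command above is what ties the repository to the key that was just saved: it inserts a signed-by option into every deb line so apt only trusts that keyring for this source. The sketch below applies the same substitution to one sample line so the effect is visible. The sample line is illustrative and assumed to resemble the real list file, and the function name add_signed_by is hypothetical.

```shell
# Keyring path used by the command above.
keyring=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Apply the same substitution as the setup command: add a signed-by
# option to each "deb https://" entry read from stdin.
add_signed_by() {
    sed "s#deb https://#deb [signed-by=${keyring}] https://#g"
}

# Illustrative input line (assumed shape of an entry in the list file).
printf '%s\n' 'deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /' \
    | add_signed_by
```

After running the real command, you can compare this against the contents of /etc/apt/sources.list.d/nvidia-container-toolkit.list.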
Install the Nvidia Container Toolkit from apt
sudo apt update
sudo apt install nvidia-container-toolkit -y
Configure Docker to use the toolkit
sudo nvidia-ctk runtime configure --runtime=docker
Restart Docker
sudo systemctl restart docker
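The configure step registers an nvidia runtime in the Docker daemon configuration, and a quick look at that file can confirm it took effect. The sketch below assumes the configuration lives at the default path /etc/docker/daemon.json. It is a plain text match for the runtime name, not a real JSON parse, and the function name has_nvidia_runtime is hypothetical.

```shell
# Return success if the given Docker daemon config file is readable and
# mentions a runtime named "nvidia" (simple text match, not a JSON parse).
has_nvidia_runtime() {
    local config="$1"
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

config=/etc/docker/daemon.json   # assumed default location

if has_nvidia_runtime "$config"; then
    echo "nvidia runtime is present in $config"
else
    echo "nvidia runtime not found in $config; rerun the nvidia-ctk configure step"
fi
```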
Verify Docker can use the GPU
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
The result should look like the example below. Take note of the Driver Version and CUDA Version; they should match what you saw before.
Tue Dec 12 06:02:37 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02 Driver Version: 545.29.02 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA T400 Off | 00000000:01:00.0 Off | N/A |
| 53% 47C P0 N/A / 31W | 0MiB / 2048MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Docker Compose
The same command can be used with a Docker Compose file like the one below. Make sure to update the timezone.
version: '3'
services:
  ubuntu:
    environment:
      - TZ=America/New_York
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
      - NVIDIA_VISIBLE_DEVICES=all
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    image: ubuntu
    command: nvidia-smi
A minimal Docker Compose file is also possible
version: '3'
services:
  ubuntu:
    environment:
      - TZ=America/New_York
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
      - NVIDIA_VISIBLE_DEVICES=all
    runtime: nvidia
    image: ubuntu
    command: nvidia-smi
Put the contents of either code block into a docker-compose.yaml file and run it with the command below
docker compose up
Watch the log output for any errors.