From 6a9e1847d7db3c2145fab60a60b886054811cc13 Mon Sep 17 00:00:00 2001
From: Brandon Rozek
Date: Sat, 29 Mar 2025 10:52:39 -0400
Subject: [PATCH] New post

---
 content/blog/ollama-cuda-podman-quadlets.md | 153 ++++++++++++++++++++
 1 file changed, 153 insertions(+)
 create mode 100644 content/blog/ollama-cuda-podman-quadlets.md

diff --git a/content/blog/ollama-cuda-podman-quadlets.md b/content/blog/ollama-cuda-podman-quadlets.md
new file mode 100644
index 0000000..cce15c7
--- /dev/null
+++ b/content/blog/ollama-cuda-podman-quadlets.md
@@ -0,0 +1,153 @@

---
title: "Setting up Ollama with CUDA on Podman Quadlets"
date: 2025-03-29T09:59:55-04:00
draft: false
tags: []
math: false
medium_enabled: false
---

[Open WebUI](https://www.openwebui.com/) provides a nice chat interface for interacting with LLMs over Ollama and OpenAI-compatible APIs. Using [Ollama](https://ollama.com/), we can self-host many different open-source LLMs! This post documents the steps I took to get Ollama working with CUDA under my Podman setup. Given how fast machine learning projects iterate, I wouldn't be surprised if these exact steps stop working at some point. In that case, the links to the official documentation throughout this post should hopefully help.

I'll assume that you already have the NVIDIA driver installed on your machine. The steps vary by OS/distribution and by how recent a driver you want, but I generally recommend sticking with what's packaged in your distribution's repository. This is to minimize headaches...

With that, our first step is to install `nvidia-container-toolkit`. This package contains the libraries and scripts that let containers use the GPU.

```bash
sudo dnf install nvidia-container-toolkit
```

At the time of writing, instructions for installing the toolkit can be found on [NVIDIA's website](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

We can use this toolkit to generate a Container Device Interface (CDI) specification, which Podman uses to expose the GPU to containers.

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

**Note:** Every time you update your NVIDIA driver, you'll need to re-run this command.

NVIDIA also documents the steps for configuring CDI on [their website](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#running-a-workload-with-cdi).

From here, we should make sure that the toolkit found the appropriate GPU(s) and generated their CDI entries.

```bash
nvidia-ctk cdi list
```

I only have one GPU on my machine, so it outputs something like the following. The same card shows up three times: by index, by UUID, and as part of `all`.

```
INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-52785a8a-f8ca-99b9-0312-01a1f59e789b
nvidia.com/gpu=all
```

If you want a container to be able to access all of the GPUs, use the `nvidia.com/gpu=all` device. Otherwise, reference a specific GPU by its index or UUID (there's a short example after the next step).

Then, we restart Podman so that the CDI files are loaded.

```bash
sudo systemctl restart podman
```
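As a quick example of targeting a single GPU, the sketch below passes only the `nvidia.com/gpu=0` device from the `nvidia-ctk cdi list` output above into the same CUDA test image used in the next step. `nvidia-smi -L` simply lists the GPUs visible inside the container, so you can confirm which card that device name maps to.

```bash
# Expose only the first GPU (CDI index 0) and list what the container can see
sudo podman run --rm \
    --device nvidia.com/gpu=0 \
    docker.io/nvidia/cuda:11.0.3-base-ubuntu20.04 \
    nvidia-smi -L
```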
For our first test, we'll make sure that a container can access the GPU by running the `nvidia-smi` command.

```bash
sudo podman run --rm \
    --device nvidia.com/gpu=all \
    docker.io/nvidia/cuda:11.0.3-base-ubuntu20.04 \
    nvidia-smi
```

For my GPU, it outputs:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04             Driver Version: 570.124.04     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:02:00.0  On |                  N/A |
|  0%   50C    P8             19W / 170W  |    1546MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
```

Now we are ready to set up Ollama! To save time when running the `systemd` commands later, let's pull the image ahead of time.

```bash
sudo podman pull docker.io/ollama/ollama
```

We'll have to save the models somewhere, so in this example we'll store them in `/opt/ollama`.

```bash
sudo mkdir /opt/ollama
```

Let's configure the Quadlet. Save the following to `/etc/containers/systemd/ollama.container`:

```ini
[Container]
ContainerName=ollama
HostName=ollama
Image=docker.io/ollama/ollama
AutoUpdate=registry
Volume=/opt/ollama:/root/.ollama
PublishPort=11434:11434
AddDevice=nvidia.com/gpu=all

[Unit]

[Service]
Restart=always

[Install]
WantedBy=default.target
```

This file specifies the options that we would otherwise pass to the `podman run` command:

- Publish port 11434: this is the port we'll use when sending messages to Ollama from Open WebUI. Of course, you're welcome to use other networking tricks to pull that off.
- Mount the folder `/opt/ollama` on the host to `/root/.ollama` within the container: we don't want to re-download the LLM models each time the container is recreated!
- Add the device `nvidia.com/gpu=all`: this is the CDI device we generated earlier, and it's what gives the container access to the GPU.

For the moment of truth, let's start it! Quadlet files are turned into systemd units by a generator, so reload systemd first to pick up the new `ollama` service, then start it.

```bash
sudo systemctl daemon-reload
sudo systemctl start ollama
```

I won't show how to configure Open WebUI in this post, but we can make sure that everything is working by looking at the Ollama container itself.

```bash
sudo podman exec -it ollama /bin/bash
```

We'll perform a test with a smaller model (about 1.2 GB):

```bash
ollama run llama3.2:1b
```

Depending on your Internet connection, this will take a couple of minutes to download and load onto the GPU.

When it's done, the prompt will be replaced with:

```
>>>
```

From here you can chat with the LLM!
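Since the point of all this is CUDA, it's also worth double-checking that the model is running on the GPU rather than quietly falling back to the CPU. Here's a small sketch, assuming a reasonably recent Ollama image that includes the `ollama ps` subcommand: run it from the host while the model from the previous step is still loaded, and the PROCESSOR column should report something like `100% GPU`.

```bash
# While llama3.2:1b is loaded, ask Ollama which processor it is using
sudo podman exec ollama ollama ps
```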
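Finally, before wiring up Open WebUI (which talks to this same API), you can verify that Ollama is reachable on the published port from the host. The sketch below assumes the `llama3.2:1b` model from the test above has already been pulled; `/api/generate` is Ollama's completion endpoint, and `"stream": false` returns the whole response as a single JSON object.

```bash
# Request a single, non-streaming completion from the local Ollama API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```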