Run Ollama with NVIDIA GPU Acceleration on Proxmox LXC (2026)

April 28, 2026 | Tutorial | ~17 min read

This content is a preview and may change or be removed before publication.

Intermediate | ~30 min

Prerequisites

• Proxmox VE 8.1 or later with IOMMU enabled
• An NVIDIA GPU physically installed in the host

Tools

• SSH terminal
• Web browser

Software

• debian: 13
• ollama: 0.22.0
• proxmox-ve: 9.1
• nvidia-driver: 595.58.03


Running large language models locally is one of the best things you can do with a homelab. No API keys, no usage fees, no data leaving your network. The bottleneck is speed — running inference on a CPU is painfully slow, but a mid-range NVIDIA GPU turns a 30-second response into a 2-second one.

This guide walks through the full stack: installing the NVIDIA driver on a Proxmox host, passing the GPU through to an unprivileged LXC container using Proxmox VE's modern dev0: device passthrough syntax, and installing Ollama inside the container for GPU-accelerated inference. By the end, you will have a dedicated Ollama LXC that any other service on your network can call for fast, local AI.

We are using the dev0: syntax introduced in Proxmox VE 8.1, which replaces the old lxc.cgroup2.devices.allow and lxc.mount.entry approach that you will see in older guides. The new syntax is cleaner, handles both cgroup permissions and device bind mounts in a single config line, and works with unprivileged containers out of the box.

What You Will End Up With

• NVIDIA driver 595.58.03 installed on the Proxmox host with DKMS (survives kernel updates)
• A dedicated Debian 13 LXC with full GPU access via dev0: passthrough
• Ollama serving models on the local network with GPU acceleration
• A test inference confirming everything works end-to-end

For this guide, the Ollama LXC takes CT ID 105 and IP 10.1.20.105. The GPU is an NVIDIA GeForce RTX 3060 12GB. Substitute your own values.

NOTE

About 25 to 30 minutes of hands-on time. The NVIDIA driver compilation is the longest wait.

Step 1: Identify Your GPU on the Proxmox Host

On the Proxmox host, check which NVIDIA GPU is installed and what bus address it is on:

List NVIDIA PCI devices

lspci | grep -i nvidia

You will see output like:

Output:
09:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate]
09:00.1 Audio device: NVIDIA Corporation GA106 High Definition Audio Controller

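Note the bus address, 09:00 in this example. Modern NVIDIA cards are multi-function devices: the VGA controller at .0, the audio device at .1, and on Turing and newer cards, sometimes USB and serial bus controllers at .2 and .3. All of these functions belong to the same physical GPU. For LXC passthrough the bus address only serves to tell cards apart: the container receives the /dev/nvidia* device nodes created by the host driver (Step 4), not the PCI functions themselves.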

If you have multiple NVIDIA GPUs and want to pass only one to the LXC (keeping others for VM passthrough or other use), note which bus address belongs to which card.
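
On a host with several cards, the bus addresses can be pulled out of the lspci listing with a small filter. The following is a minimal sketch; the function name nvidia_bus_addrs is hypothetical and it assumes the lspci line layout shown above (address first, then the description).

```shell
# Print the unique bus addresses (without the .N function suffix) of every
# NVIDIA device found in `lspci` output supplied on stdin.
nvidia_bus_addrs() {
    grep -i 'nvidia' | awk '{ print $1 }' | sed 's/\.[0-9a-f]*$//' | sort -u
}

# Usage on the Proxmox host:
#   lspci | nvidia_bus_addrs
```

Each printed address corresponds to one physical card, regardless of how many functions it exposes.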

Step 2: Blacklist Nouveau and Reboot

The open-source nouveau driver will try to claim the GPU before the proprietary NVIDIA driver can. We need to blacklist it. We also blacklist nvidiafb since we do not need framebuffer support on a headless server.

Create the blacklist file

nano /etc/modprobe.d/blacklist-nvidia.conf

Paste the following, then save with Ctrl+X, Y, Enter:

/etc/modprobe.d/blacklist-nvidia.conf

blacklist nouveau
blacklist nvidiafb
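
If you would rather script this than paste into nano, the same two lines can be written non-interactively. This is a sketch only; write_nvidia_blacklist is a hypothetical helper name, and the optional path argument exists so the function can be tried against a scratch file first.

```shell
# Write the nouveau/nvidiafb blacklist to the given path
# (defaults to the file used in this guide).
write_nvidia_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nvidia.conf}"
    printf '%s\n' 'blacklist nouveau' 'blacklist nvidiafb' > "$target"
}

# Usage on the Proxmox host (as root):
#   write_nvidia_blacklist
#   cat /etc/modprobe.d/blacklist-nvidia.conf
```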

If you have a single NVIDIA GPU and want it available for LXC passthrough, skip the next section and jump straight to the initramfs rebuild below.

Multi-GPU: Selective VFIO Binding (Optional)

If you have multiple NVIDIA GPUs and want one dedicated to VM passthrough (bound to vfio-pci) while the other uses the NVIDIA driver for LXC passthrough, you need a udev rule that binds by bus address rather than by vendor ID.

First, make sure the vfio modules load at boot:

Create the VFIO modules file

nano /etc/modules-load.d/vfio.conf

Paste the following, then save with Ctrl+X, Y, Enter:

/etc/modules-load.d/vfio.conf

vfio
vfio_iommu_type1
vfio_pci

Now create a udev rule that only binds the VM slot to vfio-pci. Replace 0000:41:00 with the bus address of the GPU you want reserved for VMs:

Create the selective VFIO udev rule

nano /etc/udev/rules.d/10-vfio-nvidia.rules

Paste the following, then save with Ctrl+X, Y, Enter:

/etc/udev/rules.d/10-vfio-nvidia.rules

SUBSYSTEM=="pci", KERNEL=="0000:41:00.*", ATTR{driver_override}="vfio-pci"

Flag	What it does
`SUBSYSTEM=="pci"`	Only matches PCI devices. Keep.
`KERNEL=="0000:41:00.*"`	Matches all functions at bus address 41:00 (VGA, audio, USB, etc.). Replace with the bus address of the GPU you want bound to vfio-pci for VM passthrough.
`ATTR{driver_override}="vfio-pci"`	Forces vfio-pci as the driver. Keep.

The wildcard .* at the end catches all functions on that bus address. The GPU at the other bus address (our LXC GPU) is not matched, so the NVIDIA driver claims it instead.
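
To avoid typos when substituting your own bus address, the rule can be generated from the address. A minimal sketch, with vfio_udev_rule as a hypothetical helper name; it assumes you pass the full domain:bus:device form (for example 0000:41:00) without a function suffix.

```shell
# Print a udev rule that forces vfio-pci onto every function of one PCI slot.
# $1 is the full domain:bus:device address, e.g. 0000:41:00 (no .function).
vfio_udev_rule() {
    printf 'SUBSYSTEM=="pci", KERNEL=="%s.*", ATTR{driver_override}="vfio-pci"\n' "$1"
}

# Usage on the Proxmox host (as root):
#   vfio_udev_rule 0000:41:00 > /etc/udev/rules.d/10-vfio-nvidia.rules
```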

Rebuild the Initramfs and Reboot

WARNING

This step is critical. The blacklist (and the udev rule, if you created one) is baked into the initramfs at build time. If you skip this, the old initramfs loads at boot with nouveau still active, and the NVIDIA driver installer in the next step will fail. This is the single most common failure in GPU passthrough setups.

Rebuild the initramfs

update-initramfs -u -k $(uname -r)

Now reboot so the kernel comes up with nouveau blocked:

Reboot the host

reboot

After the host comes back up, SSH in and verify nouveau is no longer loaded:

Verify nouveau is not loaded

lsmod | grep nouveau

No output means nouveau is successfully blocked. If you still see it listed, double-check that /etc/modprobe.d/blacklist-nvidia.conf exists and contains the blacklist nouveau line, rebuild the initramfs again, and reboot.
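
A plain grep also matches modules that merely list nouveau as a dependent, so a scripted check is more reliable when it compares only the first column of lsmod. The sketch below assumes the standard lsmod layout (module name in column one); module_loaded is a hypothetical helper name.

```shell
# Return 0 if the named kernel module appears in `lsmod` output on stdin.
# Only the first column (the module name) is compared, so modules that
# merely mention the name in their "Used by" column do not count.
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

# Usage on the Proxmox host:
#   if lsmod | module_loaded nouveau; then
#       echo "nouveau is still loaded: recheck the blacklist and initramfs"
#   else
#       echo "nouveau is blocked"
#   fi
```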

Step 3: Install the NVIDIA Driver on the Host

The NVIDIA driver needs kernel headers to compile its modules. Install them along with build-essential for the compiler toolchain:

Install kernel headers and build tools

apt-get install -y proxmox-headers-$(uname -r) build-essential dkms

Flag	What it does
`proxmox-headers-$(uname -r)`	Kernel headers matching your running PVE kernel. The `$(uname -r)` automatically fills in the correct version. Keep.
`build-essential`	C compiler and make. Needed for NVIDIA module compilation. Keep.
`dkms`	Dynamic Kernel Module Support. Automatically recompiles the NVIDIA modules when the kernel is updated. Keep.

Download the NVIDIA driver. We use the 595.58.03 production branch — the latest Linux driver as of April 2026:

Download the NVIDIA driver

wget -P /root https://us.download.nvidia.com/XFree86/Linux-x86_64/595.58.03/NVIDIA-Linux-x86_64-595.58.03.run

Make it executable and run the installer:

Make the installer executable

chmod +x /root/NVIDIA-Linux-x86_64-595.58.03.run

Run the NVIDIA installer

bash /root/NVIDIA-Linux-x86_64-595.58.03.run --dkms --silent

Flag	What it does
`--dkms`	Registers the modules with DKMS so they automatically rebuild on kernel updates. Keep.
`--silent`	Runs without the ncurses UI. On a headless Proxmox host there is no display server, so the installer's questions about X libraries are irrelevant. Keep.

This takes 2-5 minutes while it compiles the kernel modules. You will see a few warnings about X library paths and 32-bit compatibility libraries — both are harmless on a headless server.

Verify the driver is loaded and sees your GPU:

Verify the NVIDIA driver

nvidia-smi

You should see a table showing your GPU, the driver version (595.58.03), CUDA version, temperature, and memory usage. If you get command not found or No devices were found, the most likely causes are:

• Nouveau still loaded: run lsmod | grep nouveau. If it appears, the blacklist did not take effect. Go back to Step 2 and check that /etc/modprobe.d/blacklist-nvidia.conf exists, rebuild the initramfs, and reboot.
• GPU bound to vfio-pci: run lspci -nnks <your-bus-address> and check Kernel driver in use. If it says vfio-pci instead of nvidia, your udev rule is matching the wrong GPU. Check the bus address in your rule.

Verify DKMS registered the modules:

Check DKMS status

dkms status

You should see nvidia/595.58.03 listed as installed.
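
For an unattended check (for example after a kernel upgrade), the status line can be tested in a script. This sketch assumes the "name/version, kernel, arch: installed" layout that matches the nvidia/595.58.03 entry described above; dkms_installed is a hypothetical helper name, and other DKMS releases may format the line differently.

```shell
# Return 0 if `dkms status` output on stdin lists module $1 at version $2
# with the state "installed". Assumes lines start with "name/version".
dkms_installed() {
    awk -v p="$1/$2" 'index($0, p) == 1 && /installed/ { ok = 1 } END { exit !ok }'
}

# Usage on the Proxmox host:
#   dkms status | dkms_installed nvidia 595.58.03 && echo "DKMS module OK"
```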

Step 4: Create the Ollama LXC with GPU Passthrough

On the Proxmox host, create the container:

Create the Ollama LXC

pct create 105 local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst \
    --hostname ollama \
    --cores 4 \
    --memory 8192 \
    --swap 2048 \
    --rootfs local-lvm:20 \
    --net0 name=eth0,bridge=vmbr0,ip=10.1.20.105/24,gw=10.1.20.1 \
    --unprivileged 1 \
    --features nesting=1 \
    --onboot 1 \
    --password

Flag	What it does
`105` (positional)	Container VMID. Replace with the next free ID on your host.
`local:vztmpl/debian-13-standard_13.1-2_amd64.tar.zst`	OS template. Keep. If missing, run `pveam update && pveam download local debian-13-standard_13.1-2_amd64.tar.zst`.
`--hostname ollama`	Container hostname. Keep or rename to taste.
`--cores 4`	CPU cores. Keep — Ollama uses CPU for prompt processing even with GPU offload.
`--memory 8192`	RAM in MB. Keep — Ollama loads model metadata into system RAM alongside the GPU VRAM.
`--swap 2048`	Swap in MB. Keep — safety net for larger models.
`--rootfs local-lvm:20`	20 GB root filesystem. Replace `local-lvm` if your storage pool is named differently. Models can be large (4-8 GB each), so 20 GB gives room for a few.
`--net0 name=eth0,bridge=vmbr0,ip=10.1.20.105/24,gw=10.1.20.1`	Network attachment. Replace `bridge`, `ip`, and `gw` with your environment's values.
`--unprivileged 1`	Unprivileged container. Keep — the `dev0:` syntax works with unprivileged containers.
`--features nesting=1`	Allows nested namespaces, which systemd in current Debian releases needs to run cleanly inside an unprivileged container. Keep.
`--onboot 1`	Auto-start on host boot. Keep.
`--password`	Prompts for a root password. Keep.

Now add the GPU device passthrough lines. These use PVE's dev0: syntax to pass the NVIDIA device nodes into the container:

Add GPU passthrough to the container config

nano /etc/pve/lxc/105.conf

Add the following lines at the end of the file, then save with Ctrl+X, Y, Enter:

/etc/pve/lxc/105.conf (append)

dev0: /dev/nvidia0,gid=44
dev1: /dev/nvidiactl,gid=44
dev2: /dev/nvidia-uvm,gid=44
dev3: /dev/nvidia-uvm-tools,gid=44
dev4: /dev/nvidia-caps/nvidia-cap1,gid=44
dev5: /dev/nvidia-caps/nvidia-cap2,gid=44

Line	What it does
`/dev/nvidia0`	The primary GPU device. If you have multiple GPUs on the NVIDIA driver, `nvidia0` is the first one. Keep.
`/dev/nvidiactl`	NVIDIA control device — shared across all GPUs. Keep.
`/dev/nvidia-uvm`	Unified Virtual Memory — required for CUDA compute workloads like LLM inference. Keep.
`/dev/nvidia-uvm-tools`	UVM tools interface. Keep.
`/dev/nvidia-caps/nvidia-cap1`, `nvidia-cap2`	NVIDIA capability devices. Keep.
`gid=44`	Sets the group inside the container to GID 44 (the `video` group on Debian). This lets the `ollama` user access the devices without running as root. Keep.
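
The devN: lines follow a regular pattern (an index, a device path, a gid), so they can be generated rather than typed. A minimal sketch, with gen_dev_lines as a hypothetical helper name; it only prints the lines so you can review them before adding them to the container config.

```shell
# Print devN: lines for a list of device paths, all with the same gid.
# $1 is the GID to apply inside the container; the rest are device paths.
gen_dev_lines() {
    gid="$1"
    shift
    n=0
    for path in "$@"; do
        printf 'dev%d: %s,gid=%s\n' "$n" "$path" "$gid"
        n=$((n + 1))
    done
}

# Usage on the Proxmox host (prints the six lines used in this guide):
#   gen_dev_lines 44 /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm \
#       /dev/nvidia-uvm-tools /dev/nvidia-caps/nvidia-cap1 \
#       /dev/nvidia-caps/nvidia-cap2
```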

Start the container:

Start the container

pct start 105

Verify the GPU devices are visible inside the container:

Check GPU devices inside the container

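pct exec 105 -- sh -c 'ls -la /dev/nvidia*'

The sh -c wrapper makes the wildcard expand inside the container rather than in the host shell. You should see all six device nodes listed with the video group.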

Step 5: Install the NVIDIA Userspace Libraries in the Container

The container shares the host's kernel (and its NVIDIA kernel modules), but it needs its own copy of the userspace libraries — nvidia-smi, libcuda.so, and friends. We use the same .run installer with --no-kernel-module so it only installs the userspace components.

On the Proxmox host, copy the installer into the container:

Copy the NVIDIA installer into the container

pct push 105 /root/NVIDIA-Linux-x86_64-595.58.03.run /root/NVIDIA-Linux-x86_64-595.58.03.run --perms 755

Now enter the container:

Enter the container

pct enter 105

The minimal Debian 13 template ships without a generated locale, which causes noisy Perl warnings on every package operation. Fix that before doing anything else:

Generate the en_US.UTF-8 locale

sed -i 's/^# en_US.UTF-8/en_US.UTF-8/' /etc/locale.gen && locale-gen

Now install the NVIDIA userspace libraries:

Install NVIDIA userspace libraries

bash /root/NVIDIA-Linux-x86_64-595.58.03.run --no-kernel-module --silent

Flag	What it does
`--no-kernel-module`	Skips kernel module compilation entirely. The container uses the host's kernel modules via the passed-through devices. Keep.
`--silent`	No UI prompts. Keep.

Verify the driver sees the GPU from inside the container:

Verify GPU access

nvidia-smi

You should see the same GPU table as on the host — same driver version, same GPU model, same VRAM. If you get Failed to initialize NVML: Unknown Error, the device nodes are not correctly passed through. Go back to Step 4 and check the dev0: lines in the container config.

Clean up the installer to save disk space:

Remove the installer

rm /root/NVIDIA-Linux-x86_64-595.58.03.run

TIP

The NVIDIA driver version inside the container must match the host exactly. If you update the host driver later, you need to re-run this step inside the container with the matching new version.
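
One way to catch a version drift after a host driver update is to compare the two version strings from the host. The sketch below uses nvidia-smi's query mode to read the driver version; driver_versions_match is a hypothetical helper name, and CT ID 105 is the one used in this guide.

```shell
# Compare two driver version strings and report whether they match exactly.
# Returns 0 on a match and 1 on a mismatch.
driver_versions_match() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: host=$1 container=$2"
        return 1
    fi
}

# Usage on the Proxmox host:
#   host_ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   ct_ver=$(pct exec 105 -- nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_versions_match "$host_ver" "$ct_ver"
```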

Step 6: Install Ollama

Ollama provides a one-line install script (curl -fsSL https://ollama.com/install.sh | bash), but piping a remote script to bash means trusting whatever that URL serves at the moment you run it. For a service that will have GPU access and run inference on your network, it is worth doing the install manually. It only takes a few extra commands, and you will understand exactly what is on your system.

Install curl and zstd — both are needed to download and extract the Ollama release archive, and neither is in the minimal Debian template:

Install dependencies

apt update && apt install -y curl zstd

Download the official Ollama release tarball and extract it into /usr. This places the ollama binary at /usr/bin/ollama and its shared libraries under /usr/lib/ollama:

Download and extract Ollama

curl -fsSL https://ollama.com/download/ollama-linux-amd64.tar.zst | tar x --zstd -C /usr

Verify the binary is in place:

Verify Ollama binary

ollama --version

Now create a dedicated system user to run the Ollama service. Running it as its own user rather than root is standard practice — it limits what the process can access if it is ever compromised:

Create the ollama system user

useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama

Flag	What it does
`-r`	System account (no aging, low UID). Keep.
`-s /bin/false`	No login shell — this user only runs the service. Keep.
`-U`	Creates a matching `ollama` group. Keep.
`-m -d /usr/share/ollama`	Creates the home directory at `/usr/share/ollama`, where Ollama stores downloaded models. Keep.

Add the ollama user to the video group so it can access the GPU devices we passed through in Step 4:

Grant GPU access to the ollama user

usermod -a -G video ollama

Create the systemd service file:

Create the Ollama service file

nano /etc/systemd/system/ollama.service

Paste the following, then save with Ctrl+X, Y, Enter:

/etc/systemd/system/ollama.service

[Unit]
Description=Ollama Service
After=network-online.target
 
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
 
[Install]
WantedBy=multi-user.target

Enable and start the service:

Enable and start Ollama

systemctl daemon-reload && systemctl enable ollama && systemctl start ollama

Verify Ollama is running:

Check Ollama service status

systemctl status ollama --no-pager

You should see Active: active (running). Check the logs to confirm GPU detection:

Check GPU detection in logs

journalctl -u ollama --no-pager | grep -i "inference compute"

You should see a line mentioning your GPU model and VRAM — something like NVIDIA GeForce RTX 3060 with 12.0 GiB. If you see no GPU line, restart the service with systemctl restart ollama and check again.

Step 7: Configure Ollama for Network Access

By default, Ollama only listens on localhost:11434. If you want other containers or machines on your network to use this Ollama instance (for example, Paperless-GPT running on a different LXC), you need to change the listen address.

Create a systemd override

mkdir -p /etc/systemd/system/ollama.service.d

Open the override file

nano /etc/systemd/system/ollama.service.d/override.conf

Paste the following, then save with Ctrl+X, Y, Enter:

/etc/systemd/system/ollama.service.d/override.conf

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Reload systemd and restart Ollama

systemctl daemon-reload && systemctl restart ollama

Ollama now listens on all interfaces. Any machine on your LAN can reach it at http://10.1.20.105:11434.
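
The drop-in created in this step can also be written without an editor. This is a sketch under the assumption that you want the same file layout as above; write_ollama_override is a hypothetical helper name, and the second argument exists only so the function can be tried in a scratch directory.

```shell
# Write a systemd drop-in that sets OLLAMA_HOST for the ollama service.
# $1 is the listen address (host:port); $2 is the drop-in directory and
# defaults to the path used in this guide.
write_ollama_override() {
    listen="$1"
    dir="${2:-/etc/systemd/system/ollama.service.d}"
    mkdir -p "$dir"
    printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "$listen" > "$dir/override.conf"
}

# Usage inside the container (as root):
#   write_ollama_override 0.0.0.0:11434
#   systemctl daemon-reload && systemctl restart ollama
```

Passing the container's own address (10.1.20.105:11434 in this guide) instead of 0.0.0.0:11434 would bind Ollama to that single interface rather than all of them.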

WARNING

There is no authentication on the Ollama API. Anyone who can reach port 11434 can use your GPU for inference. On a home network behind a firewall this is fine. On a shared or public network, restrict access with firewall rules.
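
As one possible shape for such rules, the sketch below prints (but does not apply) iptables commands that keep loopback traffic working, allow a single source subnet to reach the Ollama port, and drop everything else on that port. It assumes iptables is installed in the container and that no other rules are in place; ollama_fw_rules is a hypothetical helper name, and the Proxmox firewall is an equally valid place to enforce the same policy.

```shell
# Print (not apply) iptables rules that restrict the Ollama port to one
# source subnet. $1 is the allowed subnet in CIDR form; $2 is the port
# and defaults to 11434. Review the output before running any of it.
ollama_fw_rules() {
    subnet="$1"
    port="${2:-11434}"
    # Loopback must stay open: the ollama CLI talks to the server on localhost.
    printf 'iptables -A INPUT -i lo -j ACCEPT\n'
    printf 'iptables -A INPUT -p tcp --dport %s -s %s -j ACCEPT\n' "$port" "$subnet"
    printf 'iptables -A INPUT -p tcp --dport %s -j DROP\n' "$port"
}

# Usage inside the container:
#   ollama_fw_rules 10.1.20.0/24
```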

Step 8: Pull a Model and Test Inference

Pull a small model first to verify everything works before committing to a multi-gigabyte download:

Pull a test model

ollama pull llama3.2:1b

This downloads the 1.3 GB Llama 3.2 1B model. Once it finishes, test inference:

Test inference

ollama run llama3.2:1b "What is Proxmox? Reply in two sentences."

You should get a coherent response in a few seconds. If this takes 30+ seconds, the GPU is not being used — see the verification step below.
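
Other services will talk to Ollama over HTTP rather than through the CLI, so it is worth running the same test through the API. The sketch below builds the JSON body for the /api/generate endpoint with streaming turned off; ollama_generate_payload is a hypothetical helper name, and the escaping covers only backslashes and double quotes in the prompt.

```shell
# Build the JSON body for Ollama's /api/generate endpoint.
# Escapes backslashes and double quotes in the prompt; other control
# characters (newlines, tabs) are not handled in this sketch.
ollama_generate_payload() {
    model="$1"
    prompt=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$model" "$prompt"
}

# Usage from any machine that can reach the container:
#   ollama_generate_payload llama3.2:1b "What is Proxmox? Reply in two sentences." \
#       | curl -s http://10.1.20.105:11434/api/generate -d @-
```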

For actual workloads like document classification or coding assistance, pull a larger model:

Pull an 8B model for production use

ollama pull llama3.1:8b

This downloads about 4.9 GB. The 8B parameter model fits entirely in the 12 GB of VRAM on an RTX 3060, with room to spare for context.

Step 9: Verify GPU Acceleration

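After running any prompt, check that the model is loaded into VRAM (not system RAM). Do this within a few minutes of the prompt: by default Ollama unloads an idle model after about five minutes, and the list below then comes back empty: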

Check where the model is loaded

curl -s http://localhost:11434/api/ps | python3 -m json.tool

Look for the size_vram field. If it is equal to (or very close to) the size field, the entire model is in GPU VRAM. If size_vram is 0, the model is running on CPU only.
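
The comparison between the two fields is easier to read as a percentage. A minimal sketch, assuming you copy the size and size_vram values (both in bytes) from the JSON output by hand; vram_percent is a hypothetical helper name and the result is truncated to a whole number.

```shell
# Print the share of a loaded model that sits in VRAM, as a whole percent.
# $1 is the "size" field and $2 the "size_vram" field from /api/ps (bytes).
vram_percent() {
    if [ "$1" -le 0 ]; then
        echo "size must be positive" >&2
        return 1
    fi
    echo $(( $2 * 100 / $1 ))
}

# Usage, with the two values taken from the /api/ps output:
#   vram_percent <size> <size_vram>
```

A result of 100 means the whole model is on the GPU, 0 means CPU only, and anything in between means the model is split across VRAM and system RAM.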

You can also watch GPU utilization during inference in a second terminal:

Watch GPU utilization

watch -n 1 nvidia-smi

During inference, you should see GPU-Util spike to 50-100% and Memory-Usage increase by the model size.
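
For logging or a quick one-line check, nvidia-smi's query mode prints the same numbers as CSV. The sketch below formats one such line; gpu_summary is a hypothetical helper name and it assumes the three fields are requested in the order shown in the usage comment, with the nounits option so the values are bare numbers.

```shell
# Turn one CSV line from nvidia-smi's query mode into a readable summary.
# Expected stdin: "utilization, memory used (MiB), memory total (MiB)".
gpu_summary() {
    awk -F', *' '{ printf "GPU %d%% busy, %d of %d MiB VRAM used\n", $1, $2, $3 }'
}

# Usage inside the container while a prompt is running:
#   nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
#       --format=csv,noheader,nounits | gpu_summary
```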

If the model is not using the GPU:

• Check that nvidia-smi works inside the container. If it does not, the device passthrough is broken.
• Check Ollama's startup log: run journalctl -u ollama --no-pager | head -30 and look for lines about GPU discovery. You should see your GPU model listed.
• Restart Ollama. Sometimes the GPU is not detected on first boot if the NVIDIA driver loaded after Ollama started. Run systemctl restart ollama and try again.

Standard Homelab Wiring (Optional Bonus)

These steps wire the Ollama LXC into the homelab stack — local DNS, HTTPS via Caddy, backups, and monitoring. Ollama is an API service with no built-in authentication, so the Caddy reverse proxy does not add security here. I set it up purely for uniformity — every service in the homelab gets a clean https://name.hake.rodeo URL, and I prefer consistency over having one oddball http://ip:port endpoint. Skip this section entirely if you do not run these services, or cherry-pick the parts that apply.

Pi-hole Local DNS Record

Open your Pi-hole admin UI and click Local DNS Records in the left sidebar.

Domain: ollama.hake.rodeo
IP Address: 10.1.20.101 (your Caddy LXC, not the Ollama LXC)

Click Add. Verify:

Verify DNS resolution

dig +short ollama.hake.rodeo @10.1.20.100

You should see 10.1.20.101.

Caddy Reverse Proxy

On the Proxmox host, enter the Caddy container:

Enter the Caddy container

pct enter 101

Open the Caddyfile

nano /etc/caddy/Caddyfile

Add the following block at the end of the file:

/etc/caddy/Caddyfile (append)

ollama.hake.rodeo {
    tls {
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    reverse_proxy 10.1.20.105:11434
}

Save with Ctrl+X, Y, Enter, then reload:

Reload Caddy

systemctl reload caddy

Verify with a quick API call:

Test the reverse proxy

curl -s https://ollama.hake.rodeo/api/tags

You should see a JSON response listing your pulled models. Other services on your network can now reach Ollama at https://ollama.hake.rodeo instead of http://10.1.20.105:11434 — both work, use whichever you prefer.

PBS Backup Job

In the Proxmox web UI, go to Datacenter > Backup, double-click your existing backup job, and tick 105 (ollama) in the VM selection list. Save.

Ollama model files are large — a single 8B model is around 5 GB, and you will probably accumulate several. These files are freely re-downloadable with ollama pull, so backing them up wastes storage and slows down every backup run. PBS supports a file called .pxarexclude that works like .gitignore — place it inside the container, and PBS will skip the matched paths during backup.

On the Proxmox host, enter the Ollama container:

Enter the Ollama container

pct enter 105

Create a .pxarexclude file at the filesystem root:

Create the PBS exclusion file

nano /.pxarexclude

Paste the following, then save with Ctrl+X, Y, Enter:

/.pxarexclude

/usr/share/ollama/.ollama/models/

This tells PBS to skip the entire models directory during backup. Your Ollama configuration, service files, and systemd overrides are all still backed up — only the large, re-downloadable model blobs are excluded.

After a restore, just re-pull your models:

Re-pull models after a restore

ollama pull llama3.2:1b
ollama pull llama3.1:8b

TIP

Run ollama list before a planned rebuild to note which models you have installed. The config is tiny, but the model names are easy to forget.

Exit the container when done:

Exit the container

exit

Uptime Kuma Monitors

Open Uptime Kuma and add two monitors:

Monitor 1 — Ping:

Monitor Type: Ping
Friendly Name: Ollama LXC
Hostname: 10.1.20.105
Heartbeat Interval: 60 seconds

Monitor 2 — HTTP:

Monitor Type: HTTP(s)
Friendly Name: Ollama API
URL: https://ollama.hake.rodeo/api/tags
Heartbeat Interval: 60 seconds
Expected Status Code: 200

Next Steps

• Other services. Any container or VM on your network can now call https://ollama.hake.rodeo (or http://10.1.20.105:11434 directly) for AI inference: Paperless-GPT, Open WebUI, LangChain apps, custom scripts, or anything else that supports the Ollama API.
• More models. Run ollama list to see what you have, and browse ollama.com/library for the full model catalog. For a 12 GB GPU, stick to models at or below 8B parameters for fast inference, or try 13B-14B quantized models if you do not mind slightly slower generation.
• Persistent storage. Models are stored in /usr/share/ollama/.ollama/models/. If you plan to pull many models, consider mounting a larger storage volume to this path.
• Vision models. Models like llava and minicpm-v can process images alongside text. Useful for OCR replacement or image analysis pipelines.
• GPU monitoring. Run nvidia-smi any time you want to check temperature, power draw, and VRAM usage. NVIDIA GPUs throttle at high temperatures, so if your homelab runs warm, keep an eye on the thermal column.
