Advisories

Identical GPUs need bus-address-based vfio binding, not PCI ID

When two physically identical GPUs (same model, same PCI IDs) are installed, e.g. 2× Quadro RTX 4000, you cannot selectively vfio-bind only one of them using `vfio-pci ids=<VID:DID>`, because the ID match captures both cards. The fix is a udev rule keyed on the PCI bus address (`KERNEL=="0000:45:00.*"`) that sets `driver_override="vfio-pci"`. This works because bus addresses are unique per slot, and udev fires at device discovery (before any driver autoloads).
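
A minimal sketch of such a rule, printed by a shell function so the bus address is a parameter. The `0000:45:00` address is the one from the example above; the function name and the rules file name in the comment are assumptions, not part of the original setup.

```shell
# Print a udev rule that pins every function of one PCI slot to vfio-pci.
# $1 is the bus address without the function suffix, e.g. 0000:45:00.
vfio_override_rule() {
  printf 'ACTION=="add", SUBSYSTEM=="pci", KERNEL=="%s.*", ATTR{driver_override}="vfio-pci"\n' "$1"
}

vfio_override_rule 0000:45:00
# To install as root (file name is hypothetical):
#   vfio_override_rule 0000:45:00 > /etc/udev/rules.d/10-vfio-by-address.rules
```

The trailing `.*` matches all functions at that address, so a GPU's audio function is pinned together with its video function.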

vfio-pci softdep alone can race against auto-loading PCI drivers — use udev driver_override

Putting a device's PCI ID in `vfio-pci ids=` plus `softdep <rival_module> pre: vfio-pci` is only enough when the rival driver is loaded by modprobe chains. If the kernel auto-binds the device at PCI discovery time (before vfio-pci has even been requested), softdep never runs and the rival driver wins. Observed with `xhci-pci-renesas` claiming the Renesas uPD720201 at 3.4s while vfio-pci didn't load until 7.9s. The robust fix is a udev rule that sets `driver_override="vfio-pci"` on the device at discovery — this pins the binding regardless of module load order.
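
To see which driver actually won the race after a boot, the sysfs `driver` symlink of the device can be read. The helper below is a sketch; the function name and the `SYSFS_ROOT` override (only there so the function can be exercised against a fake tree) are assumptions.

```shell
# Print the driver currently bound to a PCI device, or "(none)".
# SYSFS_ROOT defaults to /sys; overriding it is only useful for testing.
bound_driver() {
  dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1"
  if [ -L "$dev/driver" ]; then
    basename "$(readlink "$dev/driver")"
  else
    echo "(none)"
  fi
}

# Example (the address is hypothetical): bound_driver 0000:45:00.0
```

If this prints the rival driver rather than `vfio-pci`, the softdep did not take effect for that device.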

RAIDZ2 + large recordsize hits 4K random IOPS hard — tune per-dataset for VM/DB workloads

On a 4-drive RAIDZ2 NVMe pool with recordsize at 128K–1M (128K is the ZFS default), random 4K write IOPS drop to the single-digit thousands due to read-modify-write amplification across all 4 drives per write. The same drives individually deliver over 1M IOPS. For VM/DB workloads on the pool, override per dataset: `recordsize=16K` on CT parent datasets, `blocksize=16K` on `zfspool:` storage entries so VM zvols inherit it. Sequential throughput is unaffected (measured 7.5 GB/s write, 7.0 GB/s read on 4× SN850X RAIDZ2).
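
One way to find datasets that still carry a large recordsize is to filter the output of `zfs get`. The sketch below only parses text, so it changes nothing; the function name is an assumption and `data-pool` is borrowed from the migration note further down, so substitute your own pool.

```shell
# Read "name<TAB>recordsize-in-bytes" lines on stdin and print the datasets
# whose recordsize is larger than the limit in bytes (default 16384 = 16K).
large_recordsize() {
  awk -F '\t' -v limit="${1:-16384}" '$2 + 0 > limit + 0 { print $1 }'
}

# Intended use on the host:
#   zfs get -H -p -r -t filesystem -o name,value recordsize data-pool | large_recordsize
# Then, for each dataset that backs containers or databases:
#   zfs set recordsize=16K <dataset>
```

A changed recordsize applies only to blocks written afterwards; existing data keeps the block size it was written with.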

Affects: Jack

zfs send -R preserves unprivileged CT subuid ownership cleanly across pools

`zfs send -R` preserves UIDs/GIDs (including high-numbered unprivileged CT subuids like 100000, 100100, 100999, 101000), xattrs, POSIX ACLs, and ZFS properties across the migration. No manual chown needed on the target. Verified during hdd-pool → data-pool migration (2026-04-17) with spot checks on Immich (100999:100991), Syncthing (100100:100101), and claude-drops (100000:100000).

Affects: Jack

VM hostpci0 config uses bus address, not PCI ID — breaks on GPU reshuffle

Proxmox VM configs with `hostpci0: 0000:XX:00` pin the passthrough to a bus *address*, not a vendor/device ID. If you swap cards between slots or add new GPUs, the address can now resolve to a different card than intended, breaking passthrough or silently grabbing the wrong device. Update VM configs whenever PCIe topology changes.
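
After a topology change, it can help to list the addresses each VM config points at and compare them with what `lspci` reports there. A small sketch, assuming the usual `hostpciN: <address>,<options>` line format; the function name and VMID 100 are hypothetical.

```shell
# Print the PCI address behind each hostpciN entry of a Proxmox VM config.
# Input is the config text on stdin; output is one "hostpciN address" per line.
hostpci_addresses() {
  sed -n 's/^\(hostpci[0-9][0-9]*\): *\([^,]*\).*/\1 \2/p'
}

# Intended use on the host; lspci -s then shows which card sits at each address:
#   hostpci_addresses < /etc/pve/qemu-server/100.conf |
#     while read -r key addr; do echo "$key -> $(lspci -s "$addr")"; done
```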

Threadripper PRO dual-POST with black screen between is normal (memory training)

M12SWA-TF (and other WRX80 boards) show the splash screen at POST, go dark for 10–30 seconds, then POST again before booting. This is DIMM retraining — expected behavior on cold boots, topology changes (PCIe card add/remove, bifurcation changes), and some warm boots. Only concerning if the second POST fails to complete.

Affects: Jack

Add-in PCIe USB cards have no UEFI option ROM — no keyboard in BIOS

PCIe USB 3.0 add-in cards (verified on Renesas uPD720201) are not enumerated by BIOS/POST because they ship without a UEFI option ROM. Keyboards plugged into such cards will not work during BIOS setup — use an onboard USB port, or IPMI iKVM, for BIOS work. The card does work fine once the Linux kernel initializes it.

Affects: Jack

BIOS bifurcation is per-slot and does not follow the card

Moving an ASUS Hyper M.2 card (or any PCIe card requiring bifurcation) between motherboard slots requires re-enabling x4x4x4x4 bifurcation on the new slot. The setting is keyed to the slot number, not the card. Moving a card out of a bifurcation-enabled slot means only the first M.2 device on the card will enumerate (classic "only 1 of 4 drives visible" symptom) until bifurcation is set on the new slot.

Affects: Jack

High-bandwidth USB webcams through virtual USB can crash the host

Passing a 4K webcam (Logitech BRIO) through QEMU's virtual USB controller caused isochronous transfer failures that cascaded into a full host crash. Lower-bandwidth cameras (1080p OBSBOT) work fine. For 4K webcams, use a dedicated PCIe USB card with controller passthrough instead.

Use VFIO PCI IDs to bind specific GPUs in multi-GPU systems

With multiple NVIDIA GPUs of different models, list the target GPU's device IDs in `vfio-pci ids=` in /etc/modprobe.d/vfio.conf and add `softdep nvidia pre: vfio-pci`, so that VFIO claims the target GPU before nvidia claims all of them. This lets one GPU go to a VM while the others stay available for LXC passthrough. ID matching cannot separate cards that share the same IDs.

Stay on NVIDIA 580.x while Pascal GPUs are installed

Driver 590+ drops Pascal support entirely. Pin to 580.x LTS while a Tesla P4 or other Pascal card is in use. Once the Pascal cards are retired, upgrading to 590+ becomes possible.

NVIDIA open kernel modules only support Turing and newer

Driver 580.x with open kernel modules (default for .run installer) won't probe Pascal GPUs (Tesla P4). Use --kernel-module-type proprietary when installing. Error: 'not supported by open nvidia.ko because it does not include the required GPU'.

Pi-hole DNS records for *.hake.rodeo must point to Caddy, not the service

Local DNS records for *.hake.rodeo subdomains must resolve to Caddy (10.1.10.101), not to the individual service IP. Caddy terminates TLS and reverse proxies to the backend. Pointing DNS directly to the service bypasses Caddy and causes TLS handshake failures.

Affects: Pi-hole, Caddy

Install locales before PostgreSQL on fresh Debian 13 LXCs

If PostgreSQL is installed before locale-gen on a fresh Debian 13 LXC, the default cluster uses SQL_ASCII/C encoding. Databases created in that cluster inherit this encoding. Install locales and generate en_US.UTF-8 BEFORE installing PostgreSQL, or recreate databases with explicit UTF-8 encoding.
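
A quick guard that can run before the PostgreSQL install step. This is a sketch that only inspects the output of `locale -a`; the function name is an assumption.

```shell
# Succeed if the `locale -a` listing on stdin contains en_US.UTF-8.
# glibc prints the name as "en_US.utf8", so both spellings are accepted.
has_en_us_utf8() {
  grep -Eqi '^en_US\.utf-?8$'
}

# Intended use inside the container, before installing PostgreSQL:
#   locale -a | has_en_us_utf8 || echo "generate en_US.UTF-8 first"
```

After the install, `SHOW server_encoding;` in psql should report UTF8 if the order was right.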

Cloudflare cfut_ tokens require caddy-dns/cloudflare module update

Cloudflare now issues API tokens with a cfut_ prefix. The caddy-dns/cloudflare module v0.2.3 rejects this format. Rebuild xcaddy with @latest to get support for the new format.

Affects: Caddy

Paperless-ngx API token must be for an admin user

Paperless-ngx API tokens inherit the user's permissions. `User.objects.first()` may return a non-admin user. Always explicitly get the admin user when generating tokens for integrations.

dhcpcd overwrites resolv.conf on static IP VMs

Debian 13 ships dhcpcd-base which overwrites /etc/resolv.conf on boot. On a static IP system with no DHCP server, it writes an empty file — breaking all DNS. Purge dhcpcd-base after switching to static IP.

Debian netinst may skip static IP config

The Debian 13 netinst installer may not offer manual network configuration and silently assigns DHCP. Configure static IP post-install via /etc/network/interfaces.

Alpine helper scripts use a different filename

Alpine variants use prefixed filenames: `alpine-syncthing.sh` not `syncthing.sh`. Verify the exact filename from the script discovery page before writing the curl URL.

Long-running cron jobs need lock files

Cron-triggered scripts that run longer than the cron interval will overlap, causing concurrent instances to compete for the same files. Add a lock file check at the top of the script.
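
A minimal sketch of such a check. It uses a lock directory instead of a lock file, because `mkdir` either creates the directory or fails in one atomic step; the function names and the lock path are hypothetical.

```shell
# Take an exclusive lock by creating a directory; mkdir is atomic, so only one
# caller can succeed. Returns non-zero if another instance holds the lock.
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

release_lock() {
  rmdir "$1" 2>/dev/null
}

LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/example-job.lock}"   # hypothetical path

if acquire_lock "$LOCK_DIR"; then
  # Release the lock whenever the script exits, including on errors.
  trap 'release_lock "$LOCK_DIR"' EXIT
  echo "lock acquired, running job"
  # ... long-running work goes here ...
else
  echo "previous run still active, skipping" >&2
fi
```

If the script is killed with SIGKILL the trap does not run, and the stale directory has to be removed by hand.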

Affects: FileBot

Shell ! escaping breaks shebangs via SSH

When a script is written through an SSH heredoc, zsh/bash can escape the `!` in `#!/bin/bash` to `#\!/bin/bash`, which breaks the shebang. Write the script to a local temp file first, then copy it over with scp and `pct push`.
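
A small check that can run on the local temp file before it is pushed. It is a sketch; the function name and the file name in the comment are hypothetical.

```shell
# Succeed only if the file starts with an unescaped "#!".
shebang_ok() {
  [ "$(head -n 1 "$1" | cut -c 1-2)" = '#!' ]
}

# Intended use before `pct push`:
#   shebang_ok ./job.sh || echo "shebang was mangled, rewrite the file locally"
```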

Affects: FileBot

--no-install-recommends can skip critical runtime deps

FileBot lists Java, mediainfo, and libchromaprint as Recommends, not Depends. With `--no-install-recommends` it installs but won't run. Always check which Recommends were skipped.
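
One way to review what would be skipped is to list the Recommends field before installing. The sketch below parses package metadata from stdin; the function name is an assumption, and it presumes the package is visible to `apt-cache` (for a local .deb, `dpkg-deb -f <file> Recommends` prints the same field).

```shell
# Read package metadata (as printed by `apt-cache show <pkg>`) on stdin and
# print each Recommends entry on its own line.
list_recommends() {
  sed -n 's/^Recommends: //p' | tr ',' '\n' | sed 's/^ *//'
}

# Intended use before installing with --no-install-recommends:
#   apt-cache show filebot | list_recommends
```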

Affects: FileBot, Samba

Jellyfin apt repo requires correct Debian codename

Jellyfin repo has separate packages per Debian release. Bookworm packages depend on libs not in trixie. Use `trixie` as codename — confirm with version suffix `+deb13`.

Affects: Jellyfin

pct set -delete doesn't unmount from running containers

`pct set -delete mpX` only updates config — mount stays active until reboot. Reboot before `zfs destroy` or it fails with "dataset is busy."

Affects: Jack

Fresh Debian 13 LXC lacks common tools

debian-13-standard template lacks curl, sudo, dig/dnsutils, and generated locales. Install needed tools before validation steps.

Affects: Jack

Debian 13 ships PostgreSQL 17, not 16

`postgresql-16` doesn't exist in Debian 13 repos. Use PostgreSQL 17 — fully compatible, no reason for older version.

Affects: Immich

New ZFS bind mounts need host-side chown for unprivileged LXCs

New ZFS datasets are owned by host root (UID 0), but unprivileged LXC root maps to UID 100000. Run `chown 100000:100000` on the host before mounting.
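
The mapping is a fixed offset, so host-side IDs for other container users can be computed the same way. A sketch assuming the 100000 offset described above; the function name and the path in the comment are hypothetical.

```shell
# Map a UID or GID inside an unprivileged container to its host-side value,
# assuming the 100000 offset.
host_id() {
  echo $(( 100000 + $1 ))
}

host_id 0     # container root, prints 100000
host_id 999   # prints 100999
# Intended use on the host:
#   chown "$(host_id 0):$(host_id 0)" /data-pool/new-dataset
```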

Affects: Jack

proxmox-headers not pve-headers in PVE 9

Kernel headers package renamed from `pve-headers-$(uname -r)` to `proxmox-headers-$(uname -r)` in PVE 9. Old name is transitional.

Container disk too small for NVIDIA .run installer

NVIDIA .run installer extracts ~1.5GB to /tmp and installs ~1.5GB. 4GB rootfs is not enough — SIGBUS on extraction. Use at least 8GB rootfs.

nvidia-persistenced service not created by .run installer

The .run installer installs nvidia-persistenced binary but doesn't create a systemd service. Create it manually or /dev/nvidia* devices disappear after idle.
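
A minimal unit could look like the sketch below, printed by a shell function so it can be reviewed before it is written anywhere. The binary path, running the daemon as root, and the function name are assumptions; adjust them to the actual install.

```shell
# Print a minimal systemd unit for nvidia-persistenced.
# Assumptions: the binary is at /usr/bin/nvidia-persistenced and runs as root.
persistenced_unit() {
  cat <<'EOF'
[Unit]
Description=NVIDIA Persistence Daemon

[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target
EOF
}

persistenced_unit
# To install as root:
#   persistenced_unit > /etc/systemd/system/nvidia-persistenced.service
#   systemctl daemon-reload && systemctl enable --now nvidia-persistenced
```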

NVIDIA apt driver 550.x fails on kernel 6.17

Debian 13 apt nvidia-driver (550.x) fails DKMS on kernel 6.16+. The 550 branch is EOL. Must use the .run installer from nvidia.com with driver 580+.

Don't use tag=10 for VLAN 10 containers

VLAN 10 is the native (untagged) VLAN. Containers on VLAN 10 should have no tag= setting. Adding tag=10 causes frames to be dropped by the switch.

VLAN-aware bridge required for multi-VLAN LXCs

vmbr0 needs `bridge-vlan-aware yes` and `bridge-vids 2-4094` for VLAN tags on container NICs to work. Host traffic stays untagged.
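
A quick way to confirm the bridge stanza carries the setting, sketched as a text check on an interfaces-style file. The function name is an assumption.

```shell
# Read an /etc/network/interfaces-style file on stdin and succeed only if the
# given bridge stanza contains "bridge-vlan-aware yes".
bridge_is_vlan_aware() {
  awk -v br="$1" '
    $1 == "iface" { in_br = ($2 == br) }   # a new iface stanza starts here
    in_br && $1 == "bridge-vlan-aware" && $2 == "yes" { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Intended use on the host:
#   bridge_is_vlan_aware vmbr0 < /etc/network/interfaces || echo "vmbr0 is not VLAN-aware"
```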

Affects: Jack

Don't trust helper script defaults

Community helper scripts often leave incorrect config. After every deployment, read the actual config files and verify every value — don't just test "does it respond."

Affects: Pi-hole

wtmpdb-rotate.timer shows as failed (cosmetic)

Debian 13 template references wtmpdb-rotate.timer but the unit file doesn't exist. `systemctl is-system-running` reports degraded. Cosmetic only — ignore it.

Affects: Caddy

Debian 13 samba pulls full AD stack

`apt install samba` on Debian 13 pulls samba-ad-dc, Kerberos, LDAP via Recommends. Use `--no-install-recommends` for a simple file server.

Affects: Samba

Debian 13 tmpfs /tmp causes OOM in small containers

Debian 13 mounts /tmp as tmpfs sized to host RAM (not container cgroup). A 512MB container on a 256GB host OOM-kills on apt installs. Fix: `systemctl mask tmp.mount` then reboot.

Interactive /dev/tty commands crash in LXC

Commands that read /dev/tty for passwords (e.g., `vaultwarden hash`) panic in LXC — no TTY available via pct exec. Use alternative CLI tools that accept piped input.

Affects: Vaultwarden

pct exec has a minimal PATH

`pct exec` runs with PATH=/sbin:/bin:/usr/sbin:/usr/bin — /usr/local/bin is excluded. Use full paths or `bash -lc` for tools installed there.

Affects: Pi-hole

Static IP containers don't get DHCP DNS

LXC containers with static IPs skip DHCP entirely, so DNS must be set explicitly via `pct set <VMID> -nameserver <DNS_IP>` then reboot.

Affects: Pi-hole

Debian 13 LXCs require nesting

Systemd 256+ (Debian 13) requires nesting=1 on unprivileged LXC containers. Without it, systemd can't create namespaces — services fail with credential errors.

Affects: Jack