
OpenClaw on AMD Strix Halo: Set Up a Self-Hosted AI Agent with 128GB Unified Memory

Tutorial · 14 min read · Intermediate · ~45 min

Prerequisites

  • AMD Ryzen AI Max+ / Strix Halo system (GMKtec EVO X2, Framework Desktop, or similar)
  • Ubuntu Server 24.04 installed
  • SSH access from your workstation
  • Basic Linux CLI experience

Tools

  • SSH terminal
  • Web browser

Software

  • ROCm 7.2
  • Model: GLM-4.7-flash
  • Kernel 6.18.0
  • Ollama 0.18.3
  • Ubuntu 24.04
  • OpenClaw 2026.3.24

OpenClaw has been getting a lot of attention lately — it's one of the most popular open-source AI agent projects right now, and I wanted to see what it could actually do. Not just chat, but execute real tasks: run commands, manage files, interact with services. The kind of thing you'd want an AI to do in a homelab.

So I set it up on a GMKtec EVO X2 with AMD's Ryzen AI Max+ 395 and 128GB of unified memory. This hardware is interesting because the CPU and GPU share the same memory pool — no discrete GPU needed, and you can run models that wouldn't fit on most consumer graphics cards.

Everything here is 100% CLI-based and fully self-hosted. No cloud accounts, no API keys, no data leaving your network. By the end, you'll have an AI agent running on your own hardware that you can chat with from a browser.

NOTE

This is Part 1 of a 6-part series. We start with the agent itself, then build up to automated monitoring with Uptime Kuma (Part 2), a self-hosted Matrix command center (Part 3), a local model shootout comparing 6 models (Part 4), backup and restore (Part 5), and production hardening (Part 6).

Prerequisites

  • An AMD Strix Halo system with Ubuntu Server 24.04 installed. This tutorial uses the GMKtec EVO X2 with the Ryzen AI Max+ 395 and 128GB LPDDR5X-8000. Other Strix Halo systems will work too — the Minisforum MS-S1 MAX, Beelink GTR9 Pro, and GEEKOM A9 Mega all use the same chip.
  • SSH access to the machine
  • An internet connection for downloading packages and models
  • Basic comfort with the Linux command line

TIP

Ubuntu Server (not Desktop) is the right choice here. No desktop environment needed — everything is CLI-based, and you'll save resources for AI inference.

Step 1: Update and Verify Your System

Before changing anything, update the system and check what we're working with. SSH into your machine:

sudo apt update && sudo apt upgrade -y

Now check the baseline:

uname -r
lspci | grep -i display
free -h | head -2

You should see something like this:

Terminal showing kernel 6.8.0, GPU as Device 1586 (unrecognized), and 124GB RAM on Strix Halo

Notice the GPU shows as "Device 1586" — the stock Ubuntu 24.04 kernel (6.8.0) doesn't have the Strix Halo device IDs yet. That changes after we upgrade the kernel and install ROCm. Seeing 124GB of the 128GB is also expected — firmware and the iGPU reserve a small amount.
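
If you want to see the raw PCI ID behind that generic name, add -nn to lspci — it prints numeric vendor:device pairs next to the device names:

# print numeric vendor:device IDs alongside device names
lspci -nn | grep -i display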

NOTE

Your IP address, hostname, and username will differ from this tutorial. Wherever you see specific IPs or usernames, substitute your own values.

Step 2: Upgrade the Kernel to 6.18

AMD's ROCm documentation requires kernel 6.18.4+ for stable Strix Halo support. The KFD driver fixes for queue creation and memory availability landed in 6.18.4. Without this, GPU compute workloads may fail or behave unpredictably.

NOTE

Why not the Ubuntu HWE or OEM kernel? We evaluated three options:

  • Ubuntu HWE kernel (linux-generic-hwe-24.04) ships kernel 6.17 — close, but missing critical 6.18.4 Strix Halo fixes.
  • Ubuntu OEM kernel (linux-oem-24.04c) tops out at 6.14 — too old.
  • Canonical mainline builds from kernel.ubuntu.com provide kernel 6.18 with all Strix Halo fixes. Not signed (secure boot must be disabled), but built by Canonical from upstream sources.

Mainline 6.18 is the only option that meets AMD's requirements.

WARNING

If secure boot is enabled in your BIOS, mainline kernels won't boot. Disable secure boot first. If something goes wrong, your old kernel remains installed — select it from GRUB Advanced Options to roll back.

TIP

The exact filenames below may change with point releases. Check kernel.ubuntu.com/mainline for the latest 6.18.x version before downloading.

cd /tmp
wget -q https://kernel.ubuntu.com/mainline/v6.18/amd64/linux-headers-6.18.0-061800-generic_6.18.0-061800.202511302339_amd64.deb
wget -q https://kernel.ubuntu.com/mainline/v6.18/amd64/linux-headers-6.18.0-061800_6.18.0-061800.202511302339_all.deb
wget -q https://kernel.ubuntu.com/mainline/v6.18/amd64/linux-image-unsigned-6.18.0-061800-generic_6.18.0-061800.202511302339_amd64.deb
wget -q https://kernel.ubuntu.com/mainline/v6.18/amd64/linux-modules-6.18.0-061800-generic_6.18.0-061800.202511302339_amd64.deb

You should see four .deb files in /tmp:

Terminal showing four kernel 6.18 .deb packages downloaded in /tmp, highlighted in red

Install all four:

sudo apt install -y ./linux*6.18*.deb
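
Before rebooting, a quick sanity check that all four packages registered with dpkg:

# list the freshly installed 6.18 kernel packages
dpkg -l | grep 6.18.0-061800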

Step 3: Reboot and Verify the New Kernel

sudo reboot

After reconnecting (about 60 seconds):

uname -r

Terminal showing uname -r output: 6.18.0-061800-generic

The GPU will still show as "Device 1586" — that's fine. It gets its proper name after ROCm.

Step 4: Install ROCm 7.2

ROCm is AMD's equivalent of NVIDIA's CUDA. Without it, all AI inference runs on CPU — which works, but it's painfully slow compared to GPU acceleration.

NOTE

ROCm install flags for Strix Halo: AMD recommends --usecase=rocm for the full compute stack and --no-dkms because Ryzen APUs should use inbox kernel drivers, not compiled DKMS modules. If you're on an Instinct datacenter GPU, you'd want DKMS — but for consumer hardware like Strix Halo, skip it.

TIP

The exact URL below may change. Check AMD's ROCm install docs for the current version.

wget -q https://repo.radeon.com/amdgpu-install/7.2/ubuntu/noble/amdgpu-install_7.2.70200-1_all.deb
sudo apt install -y ./amdgpu-install_7.2.70200-1_all.deb

This next command downloads 2-3GB of packages and takes several minutes:

sudo amdgpu-install -y --usecase=rocm --no-dkms

Add your user to the GPU access groups:

sudo usermod -aG render,video $USER

The group changes require a reboot to take effect — we'll reboot after the next two steps.
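
After that reboot, you can confirm the membership stuck:

# should print render and video
groups | tr ' ' '\n' | grep -E 'render|video'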

Step 5: Configure Unified Memory (TTM)

This is where Strix Halo gets interesting. By default, the GPU can only map about 15.5GB of system memory. But with 128GB of unified memory sitting right there, that's a waste. The Translation Table Manager (TTM) controls how much memory the GPU can access.

NOTE

AMD recommends keeping BIOS dedicated VRAM small (0.5GB) and using the shared TTM pool instead. We're setting 100GB for GPU access, leaving about 28GB for the OS and applications. Adjust based on your needs.

sudo apt install -y pipx
pipx install amd-debug-tools
sudo $HOME/.local/bin/amd-ttm --set 100

You should see: "Successfully set TTM pages limit to 26214400 pages (100.00 GB)". The tool asks about rebooting — say no, we have one more step.

Verify the config was written:

cat /etc/modprobe.d/ttm.conf

TIP

For Strix Halo, the parameter is ttm.pages_limit — NOT amdttm.pages_limit (that's for Instinct datacenter GPUs). The amd-ttm tool handles this correctly.
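
The pages number isn't magic — it's the requested size divided by the 4KiB page size, which you can verify with shell arithmetic:

# 100 GiB in 4 KiB pages: 100 * 1024^3 / 4096 = 26214400
echo $((100 * 1024**3 / 4096))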

Step 6: Set Strix Halo ROCm Environment Variables

These two environment variables are the most important configuration in this entire tutorial. Without them, everything appears to work — but it doesn't work right.

sudo nano /etc/profile.d/rocm-strix-halo.sh

Paste the following, then save (Ctrl+X, Y, Enter):

/etc/profile.d/rocm-strix-halo.sh
# Strix Halo (gfx1151) ROCm configuration
export HSA_OVERRIDE_GFX_VERSION=11.5.1
export HSA_ENABLE_SDMA=0

Make it executable:

sudo chmod +x /etc/profile.d/rocm-strix-halo.sh

WARNING

HSA_OVERRIDE_GFX_VERSION=11.5.1 is the single most critical setting. Without it, ROCm doesn't recognize gfx1151 and silently falls back to CPU. No error message — your model runs, it responds, everything looks fine, but it's 10x slower than it should be and there's no indication anything is wrong. This one env var fixes it.

WARNING

HSA_ENABLE_SDMA=0 disables a bugged DMA engine in Strix Halo's unified memory path. Without it, compute output corrupts after 4-5 conversational turns — the model starts outputting repetitive single characters. If your AI starts speaking gibberish after a few conversations, this is why.

Step 7: Reboot and Verify the GPU

Both the TTM config and group membership changes require a reboot.

sudo reboot

After reconnecting, source the env vars and check the GPU:

source /etc/profile.d/rocm-strix-halo.sh
rocminfo | grep -E "Name:|Marketing"

You should see gfx1151 and the GPU recognized:

Terminal showing rocminfo output with gfx1151 GPU architecture and AMD Radeon Graphics marketing name

Compare this to Step 1 where the GPU was "Device 1586". Now it's properly recognized with full memory access.

TIP

If rocminfo only shows CPU, check echo $HSA_OVERRIDE_GFX_VERSION — if it's empty, the profile script wasn't sourced. Verify that /etc/profile.d/rocm-strix-halo.sh exists, then source it manually.

Step 8: Install Ollama

Ollama is the local LLM runtime that handles model downloading, GPU loading, and inference. It's what actually runs the AI models — OpenClaw connects to it as a backend.

NOTE

Why Ollama and not SGLang, vLLM, or llama-server? OpenClaw has a dedicated Ollama provider that uses Ollama's native API (/api/chat) for tool calling. The widely reported "Ollama breaks tool calling" bug is actually in OpenClaw's OpenAI compatibility layer (/v1), not in Ollama itself. With the native provider, tool calling works.

curl -fsSL https://ollama.com/install.sh -o /tmp/ollama-install.sh
sudo sh /tmp/ollama-install.sh

Terminal showing Ollama install complete with AMD GPU ready message

Ollama auto-detects AMD hardware and downloads the ROCm build — no manual configuration needed.
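
Two quick checks before moving on:

# confirm the binary installed and the service came up
ollama --version
systemctl is-active ollama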

Step 9: Configure Ollama for Strix Halo

The environment variables from Step 6 only apply to interactive shells. Systemd services need their own env configuration. We also bind Ollama to all interfaces so other devices on the network can access it.

sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo nano /etc/systemd/system/ollama.service.d/override.conf

Paste the following, then save:

/etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama

Check the logs to verify the GPU is detected:

sudo journalctl -u ollama --no-pager -n 10

Ollama service logs showing gfx1151 GPU with 101.0 GiB total VRAM and 262144 default context

101GB of VRAM detected — that's the TTM config working. The 262K default context window is massive, enabled by the large memory pool.
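
Since Ollama now binds to all interfaces, you can also sanity-check it from another machine on your LAN (substitute your server's IP — /api/tags returns the installed models as JSON):

# run from your workstation, not the server
curl -s http://your_server_ip:11434/api/tags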

Step 10: Install Node.js and OpenClaw

OpenClaw is a Node.js application. Install Node.js 22 LTS from NodeSource:

curl -fsSL https://deb.nodesource.com/setup_22.x -o /tmp/nodesource.sh
sudo bash /tmp/nodesource.sh
sudo apt install -y nodejs

Install OpenClaw:

sudo npm install -g openclaw@latest

Verify both:

node --version
openclaw --version

Terminal showing Node.js v22.22.0 and OpenClaw 2026.3.24 installed

Step 11: Run OpenClaw Onboarding

OpenClaw has an interactive onboarding wizard that configures the LLM backend, downloads a model, and sets up the gateway service.

openclaw onboard

The first screen is a security warning — read it and accept:

OpenClaw interactive onboarding screen showing the security warning and recommended baseline settings

Walk through each prompt with these choices:

  • Setup mode: Manual
  • What to set up: Local gateway (this machine)
  • Workspace directory: default (~/.openclaw/workspace)
  • Model/auth provider: Ollama
  • Ollama base URL: default (http://127.0.0.1:11434)
  • Ollama mode: Local (local models only)
  • Default model: ollama/glm-4.7-flash (the recommended default)
  • Gateway port: default (18789)
  • Gateway bind: LAN (0.0.0.0) — needed so Uptime Kuma can reach the webhook API in Part 2
  • Gateway auth: Token
  • Tailscale: Off
  • Gateway token: Generate/store plaintext token (default) — leave blank to auto-generate
  • Configure chat channels: No — we add Matrix manually in Part 3
  • Search provider: Skip for now
  • Configure skills: No
  • Enable hooks: Skip for now — we configure the webhook API manually in Part 2
  • Install gateway service: Yes
  • Gateway service runtime: Node (recommended)
  • Enable bash shell completion: Yes

The onboarding downloads GLM-4.7-flash via Ollama and creates the gateway service:

OpenClaw onboarding summary showing all configuration choices

Terminal showing OpenClaw gateway service installed as a user systemd service with lingering enabled

NOTE

GLM-4.7-flash is a 30B MoE (Mixture of Experts) model with only 3B parameters active per token. That's why it's fast — 30 billion parameters of knowledge, but only 3 billion fire for each response.

The onboarding installs a user-level systemd service (openclaw-gateway.service) with lingering enabled — this means it survives logouts on headless servers. No custom system service needed.

Verify it's running:

systemctl --user status openclaw-gateway

NOTE

The dashboard URL with token is shown during onboarding. You can get it again anytime with openclaw dashboard --no-open.

Step 12: Personalize Your Agent

OpenClaw's workspace has several personality files that change how the agent interacts with you. Without customization, the agent doesn't know who it is or who you are.

ls ~/.openclaw/workspace/

The key files:

  • SOUL.md — the agent's core personality, values, and boundaries. The default is solid — leave it for now.
  • IDENTITY.md — the agent's name, vibe, and self-concept. Empty — fill this in.
  • USER.md — information about you. Empty — fill this in.
  • MEMORY.md — the agent builds this over time. Don't touch it.

Edit the identity file:

nano ~/.openclaw/workspace/IDENTITY.md

Here's an example — make it your own:

~/.openclaw/workspace/IDENTITY.md
# IDENTITY.md - Who Am I?
 
- **Name:** PinchWorth
- **Creature:** AI butler with a lobster problem
- **Vibe:** Dry wit, competent, gets things done without drama

Now tell the agent about yourself:

nano ~/.openclaw/workspace/USER.md

Paste something like this, then save:

~/.openclaw/workspace/USER.md
# USER.md - About Your Human
 
- **Name:** Alex
- **What to call them:** Alex
- **Timezone:** America/New_York
- **Notes:** Runs a homelab with Proxmox, several self-hosted services. Prefers concise answers.
 
## Context
 
Homelab enthusiast. Runs Proxmox with ZFS storage, Caddy for reverse proxy, Pi-hole for DNS. This machine is a dedicated AI workstation. Interested in automation and monitoring.

Save both files, then restart OpenClaw:

systemctl --user restart openclaw-gateway

TIP

The more context you put in USER.md (timezone, projects, preferences), the more useful the agent becomes. It uses this to tailor responses to your specific situation.

Step 13: Access the WebChat Dashboard

OpenClaw has a built-in Control UI with a chat interface. No Discord or cloud service needed — it runs entirely on your hardware.

The Control UI requires a secure context (HTTPS or localhost). Since the gateway is on a remote machine, we use an SSH tunnel to make it appear as localhost in your browser.

From your workstation (not the server), open a tunnel:

ssh -N -L 18789:127.0.0.1:18789 your_user@your_server_ip

NOTE

Replace your_user@your_server_ip with your actual username and IP. The -N flag means "don't open a shell, just tunnel." Leave this running while you use the dashboard. This works the same on Windows (PowerShell), macOS, and Linux.
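
If you'll be opening this tunnel regularly, a host alias in ~/.ssh/config on your workstation saves the typing (the alias name here is arbitrary — pick anything):

~/.ssh/config
Host openclaw-tunnel
    HostName your_server_ip
    User your_user
    LocalForward 18789 127.0.0.1:18789

After that, ssh -N openclaw-tunnel does the same thing as the full command above.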

Get the tokenized dashboard URL (on the server):

openclaw dashboard --no-open

Copy the full URL (including the #token=... part) and open it in your browser.

WARNING

The token in the URL authenticates your session — treat it like a password.

Step 14: Your First Agent Task

Navigate to the Chat section in the Control UI. The first message takes about 30 seconds — that's Ollama loading the model from disk into GPU memory. After that, responses are 2-5 seconds.

Try a real task:

What are my system specs?

OpenClaw WebChat interface showing the agent's system specs response with CPU, RAM, storage, and GPU details in a table

The agent correctly identifies the hardware — AMD Ryzen AI Max+ 395, 124GB RAM, storage, and the Radeon 8060S. All from running commands on the machine. Tool calling confirmed working through Ollama's native API.

Step 15: Quick Demo — Check Homelab Services

Give the agent a taste of what's coming in Part 2. If you have services running on your network:

Check if these services are responding: https://pihole.hake.rodeo, https://vault.hake.rodeo, https://jellyfin.hake.rodeo

NOTE

Replace those URLs with services on your own network. The agent can check anything reachable from the machine.
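
For comparison, this is roughly the raw check the agent runs for each URL — a bare status code, with none of the interpretation:

# prints only the HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" https://pihole.hake.rodeo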

OpenClaw agent response showing three homelab services checked: Pi-hole 403 (expected), Vault 200 OK, Jellyfin 302 redirect — all interpreted intelligently

Pi-hole returned 403 — the agent knows that's expected because the admin UI needs authentication. Jellyfin returned 302 — it recognized that as a login redirect. Vault returned 200 — running. "All three services are online."

That's the difference between a monitoring tool and an AI agent. Uptime Kuma would tell you "403 — down." The agent knows it's not.

Step 16: Verify Reboot Persistence

If it doesn't survive a reboot, it's not done.

sudo reboot

After reconnecting (boot takes about 90 seconds):

uname -r
systemctl --user status openclaw-gateway --no-pager | head -5

You should see kernel 6.18 and the gateway running. Open the WebChat dashboard (re-establish the SSH tunnel first) and send a message — the agent should respond after about 30 seconds (cold model reload).
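
If you'd rather not run these checks by hand after every reboot, here's a small health-check sketch covering the whole stack (it assumes the paths and names used in this tutorial):

#!/usr/bin/env bash
# post-reboot sanity check: kernel, TTM limit, HSA env, Ollama, gateway
echo "kernel:  $(uname -r)"
cat /etc/modprobe.d/ttm.conf
source /etc/profile.d/rocm-strix-halo.sh
echo "gfx override: ${HSA_OVERRIDE_GFX_VERSION:-unset}"
echo "ollama:  $(systemctl is-active ollama)"
echo "gateway: $(systemctl --user is-active openclaw-gateway)"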

Troubleshooting

GPU not detected by ROCm — rocminfo shows only CPU. The HSA_OVERRIDE_GFX_VERSION=11.5.1 env var is not set. Check echo $HSA_OVERRIDE_GFX_VERSION. For Ollama, verify the systemd override at /etc/systemd/system/ollama.service.d/override.conf. This is the number one gotcha — no error, just slow inference.

ROCm only sees 15.5GB of memory — TTM page limit not configured. Run amd-ttm --set 100 and reboot. Verify with cat /etc/modprobe.d/ttm.conf — should show options ttm pages_limit=26214400.

Model output corrupts after several conversations — SDMA bug. Set HSA_ENABLE_SDMA=0 in both /etc/profile.d/rocm-strix-halo.sh and the Ollama service override.

Kernel 6.18 won't boot — Secure boot is blocking the unsigned mainline kernel. Disable secure boot in BIOS, or select the old kernel from GRUB Advanced Options.

OpenClaw shows "thinking" but never responds — Ollama isn't running or HSA env vars are missing from the Ollama service. Check sudo systemctl status ollama and sudo journalctl -u ollama -n 20. It's almost always the backend.

amd-ttm not found — It's in amd-debug-tools, installed via pipx, not pip or apt. The binary lands at ~/.local/bin/amd-ttm.

First message takes 30+ seconds — Normal. Ollama loads the model from disk to GPU on the first request after boot. Subsequent responses are 2-5 seconds.

systemctl --user commands fail — Make sure you're running as your user, not root. User services don't work with sudo.

Summary

Here's what we built:

  • Kernel 6.18 with Strix Halo GPU support
  • ROCm 7.2 for GPU-accelerated AI inference
  • 100GB of GPU-accessible memory via TTM configuration
  • Ollama running the GLM-4.7-flash model with 101GB VRAM
  • OpenClaw with interactive onboarding and WebChat access
  • A personalized AI agent that knows who it is and who you are

Everything runs locally. No cloud, no API keys, no data leaving your network.

In Part 2, we connect this agent to Uptime Kuma for automated incident response. When a service goes down, Uptime Kuma detects it in seconds and triggers the agent to investigate — running diagnostic scripts on your Proxmox host, analyzing the output, and posting an investigation report. The agent goes from chatbot to incident responder.
