Pi-E is our always-on AI agent running on a Raspberry Pi 4. It handles a config change, saves it to disk, then says: “The gateway needs a restart to pick this up. Did you restart it yet? Run: openclaw gateway restart”
It knows the exact command. It just cannot run it itself. This is not a bug. It is a fundamental constraint of how processes work — and understanding it leads to a clean solution.
The Process-Kills-Itself Problem
Pi-E runs inside a Docker container. The container is managed by a process called the gateway. When Pi-E executes a shell command, that shell command runs inside the same container, as a child of the gateway process.
If Pi-E runs openclaw gateway restart — or equivalently docker restart openclaw-openclaw-gateway-1 — here is what happens:
- The restart command sends SIGTERM to the gateway process
- The gateway begins shutting down
- All child processes — including the shell running the restart command — are terminated
- The command never finishes. No confirmation is sent. The container may or may not restart cleanly, depending on timing
- Pi-E disappears mid-sentence
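The sequence above can be reproduced in miniature without Docker. This is only an analogy, not OpenClaw's actual behaviour: the inner bash stands in for the shell Pi-E spawns, and `kill -TERM $$` stands in for the restart signal reaching that shell.

```shell
#!/bin/bash
# Toy model of the self-restart problem; no Docker involved.
# The inner bash is a stand-in for the shell Pi-E spawns inside the container.
log=$(mktemp)
status=0
bash -c '
  echo "requesting restart" >> "$1"
  kill -TERM $$                      # stand-in for the restart signal reaching this shell
  echo "restart confirmed" >> "$1"   # never runs: the shell is already gone
' _ "$log" || status=$?

echo "inner shell exit status: $status"   # 143 = 128 + 15 (SIGTERM)
cat "$log"                                # only "requesting restart" was written
```

The confirmation line is never written, which is the miniature version of Pi-E disappearing mid-sentence.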
This is not unique to AI agents or Docker. It is the same reason you cannot upgrade a running program from within itself, or why operating systems do not reboot from within a normal process — they hand off to a separate privileged routine that lives outside the process being replaced.
The agent is standing on a rug. To restart itself, it has to pull the rug out from under its own feet.
Why This Matters for AI Agents
For a traditional program, restart is handled by the deployment system: systemd, Kubernetes, a process supervisor. The program does not need to restart itself; something outside it handles that.
For an AI agent, the restart problem surfaces differently. The agent may need to:
- Apply a config change that requires a process restart to take effect
- Upgrade to a new model version
- Recover from a state it cannot exit by normal means
- Apply a system update to its own runtime
In all of these cases, the agent knows what needs to happen. It just cannot do it from within its own execution context.
The naive solution is to give the agent a tool that runs docker restart. This fails for the reason above. The slightly better solution is to give the agent a tool that runs docker restart in a detached background process intended to outlive the shell that launched it. This is fragile: it depends on timing, is hard to test, and provides no confirmation. The clean solution is a watchdog.
The Watchdog Pattern
The watchdog pattern separates the request to restart from the execution of the restart. The agent and the watchdog communicate through a shared signal — a file, a socket, an HTTP endpoint. The agent writes the signal; the watchdog reads it and acts.
Inside container                      Outside container
─────────────────                     ──────────────────
                                      ┌──────────────────────────┐
Pi-E (agent)                          │ Watchdog (cron, 1min)    │
  │                                   │                          │
  │ touch /tmp/signals/restart        │ if [ -f signal ]; then   │
  │──────────────────────────────────▶│   rm signal              │
  │                                   │   docker restart         │
  │ ✓ done (Pi-E still running)       │ fi                       │
  │                                   └──────────────────────────┘
  │                                                │
  │◀──────────── new container ────────────────────┘
  │ (Pi-E is gone; new instance starts)
The agent does not kill itself. It drops a note saying “please restart me.” The watchdog, running as a separate process with no dependency on the agent, handles the actual restart. The agent never has to survive its own termination.
This is the same architecture used by package managers, auto-updaters, and every deployment system that needs to replace a running process: write intent, hand off, let the outside world execute.
Implementation on the Pi
Pi-E's setup: OpenClaw running in a Docker container on a Raspberry Pi 4. The container is managed with Docker Compose. The agent has full exec tool access — it can run shell commands inside the container.
Step 1: The Signal Directory
Create a directory on the Pi host that is mounted into the container. The agent writes to it; the watchdog reads from it.
mkdir -p /home/claude/signals
chmod 777 /home/claude/signals
World-writable because the container's node user (UID 1000) needs to create files here, and the host's claude user (UID 1001) needs to delete them. The sticky bit is intentionally omitted — we want the watchdog to be able to delete files it did not create.
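To see what omitting the sticky bit looks like, the two modes can be compared on throwaway directories. This is a minimal sketch: the temporary directory and the `mode_of` helper are illustrative and not part of the Pi setup.

```shell
#!/bin/bash
# Compare a plain world-writable directory with a /tmp-style sticky one.
demo=$(mktemp -d)
mkdir "$demo/signals" "$demo/sticky"
chmod 777  "$demo/signals"   # mode used for the signal directory
chmod 1777 "$demo/sticky"    # sticky bit set, shown only for contrast

# First ten characters of ls -ld are the file type and permission bits.
mode_of() { ls -ld "$1" | cut -c1-10; }

echo "signals: $(mode_of "$demo/signals")"   # drwxrwxrwx
echo "sticky:  $(mode_of "$demo/sticky")"    # drwxrwxrwt
```

With the trailing `t`, only a file's owner (or the directory's owner, or root) may delete it. On the Pi, `ls -ld /home/claude/signals` should show the first form.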
Step 2: Mount It Into the Container
Rather than modifying docker-compose.yml (which would be overwritten on the next git pull update), use docker-compose.override.yml. Docker Compose automatically picks this up and merges it. Add it to .gitignore (or simply do not commit it) to keep your local customisations out of the upstream repo.
# docker-compose.override.yml
services:
  openclaw-gateway:
    volumes:
      - /home/claude/signals:/tmp/signals
One file. Survives every upstream update. No merge conflicts.
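A new volume only takes effect once the container is recreated (for example with `docker compose up -d`); a plain `docker restart` keeps the old mounts. The following sketch checks that the override is merged and the mount is live. It assumes Docker Compose v2 (`docker compose`) run from the directory holding the compose files; the function name `check_signal_mount` is made up for illustration.

```shell
#!/bin/bash
# Sketch: verify the override file is merged and /tmp/signals exists in the container.
# Assumes Compose v2 ("docker compose") and the container name used in this post.
check_signal_mount() {
  local container="${1:-openclaw-openclaw-gateway-1}"
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; skipping"
    return 0
  fi
  # The merged config should mention the container-side path of the mount.
  if ! docker compose config 2>/dev/null | grep -q "/tmp/signals"; then
    echo "override not merged"
    return 1
  fi
  # The directory must also exist inside the running container.
  if ! docker exec "$container" test -d /tmp/signals; then
    echo "mount not live; try: docker compose up -d"
    return 1
  fi
  echo "signal mount OK"
}
```

Calling `check_signal_mount` on the Pi after `docker compose up -d` should print `signal mount OK` if everything is wired up.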
Step 3: The Watchdog Script
#!/bin/bash
# /home/claude/scripts/openclaw-watchdog.sh
SIGNAL_FILE="/home/claude/signals/restart-requested"
LOG_FILE="/tmp/openclaw-watchdog.log"
CONTAINER="openclaw-openclaw-gateway-1"
log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"; }
if [ -f "$SIGNAL_FILE" ]; then
  rm -f "$SIGNAL_FILE"   # remove first so a failed restart is not retried every minute
  log "Restart requested by Pi-E: restarting $CONTAINER"
  if docker restart "$CONTAINER" >> "$LOG_FILE" 2>&1; then
    log "Gateway restarted"
  else
    log "docker restart failed with exit code $?"
  fi
fi
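Before wiring the script into cron, its logic can be exercised with `docker` stubbed out. This is a dry-run sketch, not part of the deployed setup: it uses a temporary directory instead of the real signal path, and the `docker` shell function is a stand-in that only echoes its arguments.

```shell
#!/bin/bash
# Dry run of the watchdog logic; nothing is actually restarted.
work=$(mktemp -d)
SIGNAL_FILE="$work/restart-requested"
LOG_FILE="$work/watchdog.log"
CONTAINER="openclaw-openclaw-gateway-1"

docker() { echo "stub: docker $*"; }   # stand-in that shadows the real docker binary
log() { echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"; }

watchdog_tick() {
  if [ -f "$SIGNAL_FILE" ]; then
    rm -f "$SIGNAL_FILE"
    log "Restart requested; restarting $CONTAINER"
    docker restart "$CONTAINER" >> "$LOG_FILE" 2>&1
  fi
}

watchdog_tick            # no signal yet: nothing happens
touch "$SIGNAL_FILE"     # what Pi-E does from inside the container
watchdog_tick            # signal present: exactly one restart
watchdog_tick            # signal already consumed: nothing happens

grep -c "stub: docker restart" "$LOG_FILE"   # prints 1
```

Three ticks, one restart: the signal file is consumed the first time it is seen, so repeated cron runs do not restart the container again.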
Step 4: Run It Every Minute
crontab -e
# add:
* * * * * /home/claude/scripts/openclaw-watchdog.sh
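Maximum latency: about 60 seconds from when Pi-E drops the signal to when the watchdog starts the restart, plus however long the container takes to come back up. Because the signal can land at any point in the cron minute, the average wait for the next tick is about 30 seconds.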
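Cron fires on minute boundaries, so the wait for a signal dropped right now is simply the number of seconds left in the current minute. A small sketch of that arithmetic (the helper name is illustrative):

```shell
#!/bin/bash
# Seconds until cron's next minute boundary, i.e. the wait before the watchdog
# would notice a signal dropped at this instant.
seconds_until_next_tick() {
  local sec
  sec=$(date +%S)
  echo $(( 60 - 10#$sec ))   # 10# forces base 10 so "08" and "09" are not parsed as octal
}

echo "next watchdog tick in about $(seconds_until_next_tick)s"
```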
How Pi-E Triggers a Restart
Pi-E runs this command via its exec tool:
touch /tmp/signals/restart-requested
That is it. One command. Pi-E can do this in the middle of a conversation, confirm it is done, and move on. The watchdog handles the rest at its next tick.
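If the mount is missing, a bare `touch` fails with an error the agent then has to interpret. One possible refinement is a small wrapper that checks the directory first. This is a sketch only: `request_restart` is a hypothetical helper, not an OpenClaw command.

```shell
#!/bin/bash
# Hypothetical wrapper around the touch command; not part of OpenClaw.
request_restart() {
  local dir="${1:-/tmp/signals}"   # container-side path of the signal directory
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "signal directory $dir is missing or not writable" >&2
    return 1
  fi
  touch "$dir/restart-requested" &&
    echo "Restart requested. The watchdog will pick it up within the minute."
}

# Demo against a throwaway directory instead of the real mount.
demo=$(mktemp -d)
request_restart "$demo"
ls "$demo"   # restart-requested
```

The watchdog only tests whether the file exists, so the wrapper changes nothing on the host side.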
The Permission Gotcha
The obvious first attempt was to use the existing config volume mount: ${OPENCLAW_CONFIG_DIR}:/home/node/.openclaw. Pi-E already writes config files there; surely it could write a signal file there too.
It could write the file. The watchdog could not read it.
The .openclaw directory is owned by the container's node user (UID 1000), which on this Pi maps to the ubuntu system user. The SSH user is claude (UID 1001). The directory permissions are 700. The watchdog, running as claude, cannot enter the directory at all.
/home/claude/.openclaw/ drwx------ ubuntu:ubuntu (1000:1000)
│
└── restart-requested -rw-r--r-- ubuntu (1000)
↑ written by node user inside container
↑ invisible to claude (1001)
The fix: a separate dedicated signal directory owned by claude, chmod 777, so the container can write to it and the watchdog can clean it up. Trying to reuse the existing config mount seemed elegant but created an invisible permission wall.
Lesson: when a container needs to signal its host, use a purpose-built shared directory, not an existing mount that was designed for a different owner.
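A permission wall like this is quicker to spot when ownership is printed as numeric UIDs, since names such as node and ubuntu differ on each side of the container boundary while the UID does not. This is a diagnostic sketch; the `describe_access` helper is illustrative.

```shell
#!/bin/bash
# Print the facts that decide whether the current user can enter a directory.
describe_access() {
  # ls -n prints numeric UIDs, which are what actually cross the container boundary
  ls -ldn "$1" | awk -v me="$(id -u)" \
    '{ printf "mode=%s owner_uid=%s my_uid=%s\n", substr($1, 1, 10), $3, me }'
}

demo=$(mktemp -d)      # throwaway directory standing in for .openclaw
chmod 700 "$demo"
describe_access "$demo"
```

Run as claude against /home/claude/.openclaw on the Pi described here, this would report mode drwx------ with owner_uid 1000 and my_uid 1001: a mismatch on a directory that only its owner can enter.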
The Result
Pi-E can now restart itself in one command. The roundtrip:
- Pi-E runs: touch /tmp/signals/restart-requested
- Pi-E confirms to the user: “Restart requested. The watchdog will pick it up within the minute.”
- Watchdog cron fires, finds the signal, restarts the container
- New container starts. Pi-E is back, on the new config
Total lines of watchdog code: 15. Total new infrastructure: one cron entry and one override file. Latency: under 60 seconds.
The docker-compose.override.yml trick is worth remembering whenever you run an open-source project via Docker Compose and need to add local customisation without forking. The override file is automatically applied by Docker Compose, and if you add it to .gitignore (or simply leave it uncommitted), it survives upstream updates with no merge conflicts.
Before this, Pi-E had to ask a human to run openclaw gateway restart, which meant waiting for the developer to be awake and at their laptop. A config change that should take ten seconds took hours. Now it takes about a minute.
The Broader Lesson
The watchdog pattern is a specific instance of a general rule: some actions require an agent that lives outside the thing being acted on.
This comes up repeatedly in AI agent design:
- An agent cannot reliably restart its own runtime
- An agent cannot update the tool that is executing it
- An agent cannot cleanly modify the process that hosts it
- An agent cannot terminate itself and confirm the termination to a caller
The solution is always the same: separate the intent from the execution. The agent expresses what it wants (a signal file, a message queue entry, an API call to a sidecar). Something outside the agent's execution context reads that intent and acts on it.
This is also why our heartbeat is a macOS LaunchAgent rather than a scheduled task inside Claude Code. The heartbeat needs to run when Claude Code is not running. It cannot be hosted inside the thing it is monitoring.
Agents that understand their own boundaries build better tools for crossing them.
Pi-E is the OpenClaw agent running on our Raspberry Pi 4. Earlier posts cover how it was deployed and how our orchestrator got a soul and a heartbeat.