OpenClaw Update Runbook

Use when updating OpenClaw or debugging an OpenClaw instance after an update. This skill acts as a structured update runbook with emphasis on gateway startup...

SkillRank score: 8.2/10

evaluated by implexa, claude-haiku-4-5 · 2026-06-20

claw-update-runbook provides a systematic post-update diagnostic workflow for openclaw instances, covering service state, plugin conflicts, config drift, and model routing with explicit failure patterns and recovery sequences.

structure: 9.0
trigger phrases: 8.0
procedure: 9.0
edge cases: 8.0
documentation: 8.0

---
name: openclaw-update-runbook
description: Use when updating OpenClaw or debugging an OpenClaw instance after an update. This skill acts as a structured update runbook with emphasis on gateway startup, service-manager state, plugin registry and install drift, bundled-vs-npm/clawhub plugin confusion, stale config carried across upgrades, channel health, task ledger corruption, and logs that explain why the updated system is slow, disconnected, or half-broken.
version: 1.0.6
metadata:
  openclaw:
    emoji: "🦞"
---

# OpenClaw Update Runbook

Use this skill when an OpenClaw host was just updated, is about to be updated, or is behaving strangely after an update. It is a generic operator runbook, not a release-specific checklist.

This skill is meant to be installed as a folder, not copied as a single file. It expects `references/failure-patterns.md` to exist locally beside `SKILL.md` inside the same skill bundle.

The goal is not only to get it running, but to prove which layer is broken:

- service lifecycle and service-manager state
- host package version
- plugin/package compatibility
- config drift
- model/provider runtime routing
- channel health
- task ledger health
- cron/session isolation and channel-lane ownership
- runtime performance
- command-path and update-channel assumptions
- self-update hazards when an agent updates the gateway that is running it
- supply-chain and package-integrity spot checks after plugin/npm churn

## Quick workflow

1. Establish the real starting state.
   For remote multi-host updates, first prove SSH reachability to each host
   with a short timeout. If a host cannot be reached directly or through an
   available jump host, record it as a transport/access blocker instead of an
   OpenClaw update failure, because no OpenClaw command has executed on that
   host yet.

   If you are connected over non-interactive SSH, do not assume the
   login-shell `PATH` is available. First locate the binary with common install
   paths such as a package-manager prefix and `~/.local/bin/openclaw`, then
   export the correct `PATH` for the audit session.

   If the gateway process is owned by a different OS user than the SSH login
   user, run OpenClaw diagnostics as the gateway service user. The SSH user can
   have no `openclaw` on PATH, or a private package-manager shim can be
   unreadable, while the LaunchAgent/systemd service is healthy under another
   home directory. Derive the service user, state dir, CLI path, and port from
   the live process/service definition before running `doctor` or editing
   config.

   Check:
   - `openclaw --version`
   - `openclaw update status`
   - `openclaw status --deep`
   - `openclaw doctor --non-interactive --no-workspace-suggestions`
   - `openclaw channels status --deep`
   - `openclaw tasks audit`
   - current model routing: agent defaults, agent-level model maps, fallback chains, and cron payload models
   - recent successful sessions for the primary model and runtime, not just the display model name

2. Verify the gateway is actually managed correctly.
   Look at service-manager state, running PID, and `/health`.
   Derive the service label/name and gateway port from `openclaw status --deep`
   and/or the service definition instead of guessing them.
   Do not trust only one of:
   - the host's service manager
   - process list
   - health endpoint

   It is common to have:
   - a service definition present but not loaded
   - a detached gateway process still serving traffic
   - the service manager and the live process disagreeing

3. Separate bundled plugins from globally installed plugins.
   First inspect plugin health:
   - `openclaw plugins doctor`
   - `openclaw plugins list --json`
   - `openclaw plugins inspect <id>`

   Important rule:
   - If a capability is supposed to be bundled, verify whether a stale global npm install is shadowing it.
   - If a capability is not bundled, check npm and ClawHub before assuming config is wrong.
   - For special runtime plugins such as `codex`, compare `plugins inspect <id>`
     with `plugins list --json`; inspect can report a runtime as loaded while
     raw plugin metadata still says disabled.
   - For ClawHub/runtime plugins such as `codex`, compare the plugin version
     against the host version even when `plugins doctor` is clean. Use
     `openclaw plugins update <id> --dry-run` to see whether an official
     matching package exists before changing broader model config.

4. Check for config carried across the upgrade that no longer validates.
   Pay attention to:
   - `tools.web.search.provider`
   - `plugins.allow`
   - `plugins.entries.*`
   - model aliases and fallback chains
   - runtime mappings for `openai/*`, `openai-codex/*`, `codex`, and `pi`
   - cron job payload model refs, which can be normalized separately from agent defaults
   - update channel metadata

   If doctor says a provider or plugin is unknown, inspect the actual config file and do not assume `doctor --fix` fully cleaned it.

5. Compare plugin install records to what exists on disk.
   Inspect:
   - `~/.openclaw/plugins/installs.json`
   - `~/.openclaw/npm/node_modules/@openclaw/...`
   - `~/.openclaw/extensions/...`

   Look for:
   - recorded install paths that do not exist
   - recorded versions drifting from installed versions
   - ClawHub-installed runtime plugins under `~/.openclaw/extensions/<id>` that
     load successfully but lag the host cohort
   - npm install records where `resolvedSpec`, integrity, and installed version
     are exact, but the stored `spec` is still a bare package name such as
     `@openclaw/discord`
   - package specs rewritten or preserved during `openclaw update --channel ...`
   - external plugins that lack a release for the selected channel and were
     installed from a fallback tag such as `@latest`
   - source-only TypeScript plugin packages with no compiled `dist/`
   - plugin runtime deps removed from third-party plugin directories

6. Inspect recent gateway logs before changing too much.
   Read:
   - `~/.openclaw/logs/gateway.log`
   - `~/.openclaw/logs/gateway.err.log`
   - `/tmp/openclaw/openclaw-YYYY-MM-DD.log`

   Prioritize recent startup lines and warnings involving:
   - plugin load failures
   - config validation
   - provider fallback attempts and primary-route auth or module failures
   - update lifecycle messages such as service stop fallbacks,
     config overwrites/backups, and service reload timing
   - channel auth (if a channel returns 401/auth-failure post-update, inspect `~/.openclaw/service-env/*.env` for token-line quote corruption — see Pattern #23 — before assuming the upstream credential was rotated)
   - context-engine fallback
   - active-memory timeouts
   - event loop degradation
   - task restart blocking
   - transient post-restart UI/websocket scope errors that clear after the
     gateway is ready

7. Audit runtime/task health after the upgrade.
   Check for:
   - stale running tasks
   - lost tasks
   - delivery failures
   - timestamp inconsistencies
   - cron jobs whose persisted `sessionKey` points at a live channel lane
     such as `agent:<agent>:discord:direct:*` despite `sessionTarget: isolated`

   A successful package update can still leave the system unhealthy if stale tasks block restarts or keep the audit red.

8. Prove the primary model route, not just overall agent success.
   Run a narrow direct agent smoke test with a fresh session id and inspect the returned metadata:
   - final provider and model
   - runtime or harness id
   - `fallbackAttempts`
   - provider auth errors
   - module load errors
   - schema validation errors

   Treat `status: ok` as insufficient if the primary model failed and a fallback provider completed the run.
   Treat a clean `plugins doctor` as insufficient for runtime plugins until a
   fresh direct agent run proves that the intended harness can load and execute.

9. If the update was initiated from inside OpenClaw, audit it as a special risk.
   An OpenClaw agent can sometimes update the package it is running under, but
   that path has repeatedly left hosts with the package changed and the managed
   service unloaded or not restarted. From an outside SSH shell, verify:
   - whether the requested version actually installed
   - whether the managed service is loaded/running after the update
   - whether the gateway `/health` endpoint and channels recovered
   - whether a fresh `openclaw gateway restart` repairs an installed-but-unloaded
     service without any further package changes

   Do not treat the agent conversation's final message as authoritative. Trust
   the post-update host state.

10. Test at least one representative cron path.
    Check:
    - cron payload model counts
    - model counts by `agentId` so temporary provider workarounds can be
      rolled back without flattening full-size and mini cron routes together
    - persisted `sessionKey` values, especially channel/direct-message keys on
      isolated cron jobs
    - named or high-value cron job status
    - manual `cron run` behavior
    - whether `--expect-final` actually waits for final completion on the current build
    - recent run history for the specific job id, not only current job state,
      so stale last-run errors are separated from active regressions

    If cron verification only proves enqueue, state that clearly in the handoff notes.

11. Run a targeted npm/plugin supply-chain spot check when plugin installs changed.
    This is especially important after a failed plugin install, external plugin
    fallback, or public npm compromise advisory. Check:
    - whether `openclaw security audit --deep` flags unpinned npm plugin specs
      after plugin update churn
    - exact installed package versions against the advisory list
    - plugin install roots such as `~/.openclaw/npm/node_modules`
    - global OpenClaw/npm roots such as `/opt/homebrew/lib/node_modules`
    - obvious malicious lifecycle hooks in `package.json`
    - persistence artifacts named by the advisory
    - lockfiles and config files for strong IoCs

    State the limits of the check: a live-system scan cannot prove a package was
    never installed and removed earlier.

12. Re-run the narrowest fix, then verify again.
    Common fix sequence:
    - stop gateway cleanly
    - update host package
    - refresh plugin registry if needed
    - repair or update broken plugin installs
    - restart gateway
    - re-run `doctor`, `plugins doctor`, `status --deep`, `channels status --deep`, and `tasks audit`
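
The cron isolation check from step 7 can be sketched as a small filter. This is a minimal sketch, assuming the cron jobs have already been loaded into a list of dictionaries. The `sessionTarget` and `sessionKey` names come from this runbook; the `id` field, the overall record shape, and the `find_leaky_isolated_jobs` helper are hypothetical and not confirmed OpenClaw output.

```python
import re
from typing import Any, Dict, List

# Matches keys shaped like agent:<agent>:<channel>:direct:<rest>,
# following the agent:<agent>:discord:direct:* example in step 7.
CHANNEL_LANE = re.compile(r"^agent:[^:]+:[^:]+:direct:.+")


def find_leaky_isolated_jobs(jobs: List[Dict[str, Any]]) -> List[str]:
    """Return ids of jobs marked isolated whose sessionKey looks like a channel lane."""
    leaky = []
    for job in jobs:
        if job.get("sessionTarget") != "isolated":
            continue  # only isolated jobs are expected to stay off channel lanes
        key = job.get("sessionKey") or ""
        if CHANNEL_LANE.match(key):
            leaky.append(str(job.get("id", "<unknown>")))  # "id" is an assumed field
    return leaky
```

A job flagged here is only a candidate for a broken isolation boundary; confirm against the job's recent run history before editing it.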

## Where to look first

Use this order when diagnosing post-update failures:

- Service state: service manager, PID, `/health`
- Host version: `openclaw --version`
- Plugin mismatch: `openclaw plugins doctor`
- Config drift: `openclaw doctor`
- Channel reality: `openclaw channels status --deep`
- Task ledger: `openclaw tasks audit`
- Model/runtime route reality: direct smoke metadata and fallback attempts
- Runtime symptoms: gateway logs
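
The ordering above can be sketched as a loop that stops at the first failing layer. This is a rough sketch and rests on one unverified assumption: that each command exits non-zero when it finds a problem. The runbook does not confirm exit-code behaviour, so read the command output as well. Service state, the smoke test, and log review are left out because they are not single commands; `CHECKS`, `run_command`, and `first_failing_layer` are illustrative names.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Layer name -> command, in the diagnosis order listed above.
CHECKS = [
    ("host version", ["openclaw", "--version"]),
    ("plugin mismatch", ["openclaw", "plugins", "doctor"]),
    ("config drift", ["openclaw", "doctor", "--non-interactive", "--no-workspace-suggestions"]),
    ("channel reality", ["openclaw", "channels", "status", "--deep"]),
    ("task ledger", ["openclaw", "tasks", "audit"]),
]


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if not on PATH, 124 on timeout)."""
    try:
        return subprocess.run(list(argv), capture_output=True, text=True, timeout=120).returncode
    except FileNotFoundError:
        return 127
    except subprocess.TimeoutExpired:
        return 124


def first_failing_layer(run: Callable[[Sequence[str]], int] = run_command) -> Optional[str]:
    """Return the first layer whose command exits non-zero, or None if all pass."""
    for layer, argv in CHECKS:
        if run(argv) != 0:  # ASSUMPTION: non-zero exit means the layer is unhealthy
            return layer
    return None
```

Passing `run` as a parameter keeps the ordering logic testable without a live host.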

## When to open references

Start with this file first.

Open [references/failure-patterns.md](references/failure-patterns.md) when:

- `doctor` or `plugins doctor` points to a known-looking regression
- `channels status` or logs disagree with the apparent service health
- plugin installs, install records, or config state do not match what is on disk
- the update completed, but the host is still slow, disconnected, noisy, or half-broken

Use the reference file for symptom matching and concrete examples after the main workflow has narrowed the likely failure area.

## Bundled vs external plugin rule

Do not assume a broken plugin means "plugin missing."

There are three common cases:

- Bundled plugin exists in the host package, but stale config still points at an old provider/plugin id.
- Bundled plugin exists, but a globally installed npm plugin shadows it and is on the wrong version.
- Plugin is not bundled, so the fix is to inspect npm or ClawHub and reconcile install records.

A channel plugin is a good example of the second case: a host can upgrade correctly while still loading an older globally installed plugin package.

If the feature is not bundled, check npm and ClawHub before rewriting config.
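
The three cases can be written down as a small decision function. This is a sketch only: the inputs are facts the operator has already gathered by hand (from `plugins list --json`, the global npm root, and the config file), and the `PluginCase` enum and `classify` helper are hypothetical names, not OpenClaw APIs.

```python
from enum import Enum
from typing import Optional


class PluginCase(Enum):
    STALE_CONFIG = "bundled, but config points at an old provider/plugin id"
    SHADOWED = "bundled, but a global npm install shadows it on the wrong version"
    EXTERNAL = "not bundled; inspect npm or ClawHub and reconcile install records"
    CONSISTENT = "no mismatch detected by this check"


def classify(
    is_bundled: bool,
    config_id_known: bool,
    bundled_version: Optional[str],
    global_version: Optional[str],
) -> PluginCase:
    """Map hand-collected facts about one plugin onto the cases above."""
    if not is_bundled:
        return PluginCase.EXTERNAL
    # A global install that differs from the bundled version is a shadowing candidate.
    if global_version is not None and global_version != bundled_version:
        return PluginCase.SHADOWED
    if not config_id_known:
        return PluginCase.STALE_CONFIG
    return PluginCase.CONSISTENT
```

Shadowing is checked before stale config because a shadowed plugin can also produce unknown-id warnings, and fixing the shadow first is the smaller change.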

## Fixing mindset

Prefer the smallest fix that makes state consistent again:

- refresh registry before reinstalling everything
- update one stale plugin before removing all plugins
- inspect the actual config file when helper commands appear to succeed but warnings remain
- verify whether a third-party plugin needs local runtime deps before deleting plugin-side `node_modules`

Do not stop at "service is up." A good finish means:

- the right version is installed
- the gateway is managed correctly
- channels are connected
- the intended primary model route succeeds without an unexpected fallback
- cron payload models and representative cron jobs are healthy
- plugin doctor is clean or explained
- task audit is not carrying a fresh blocking error
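
The primary-route criterion can be checked mechanically once the smoke-test metadata is in hand. This is a minimal sketch, assuming the metadata is a dictionary with `status`, `provider`, and `model` keys and that `fallbackAttempts` is a list. Only `fallbackAttempts` and `status: ok` are named in this runbook; the other key names and the `check_primary_route` helper are assumptions for illustration.

```python
from typing import Any, Dict, List


def check_primary_route(
    meta: Dict[str, Any], expected_provider: str, expected_model: str
) -> List[str]:
    """Return reasons the run does not prove the primary route; empty means it does."""
    problems = []
    if meta.get("status") != "ok":
        problems.append(f"run status is {meta.get('status')!r}")
    # ASSUMPTION: fallbackAttempts is a list with one entry per attempted fallback.
    attempts = meta.get("fallbackAttempts") or []
    if attempts:
        problems.append(f"{len(attempts)} fallback attempt(s) recorded")
    if meta.get("provider") != expected_provider:
        problems.append(f"final provider is {meta.get('provider')!r}, expected {expected_provider!r}")
    if meta.get("model") != expected_model:
        problems.append(f"final model is {meta.get('model')!r}, expected {expected_model!r}")
    return problems
```

A run with `status: ok` still fails this check if a fallback provider completed it, which is the case the finish criteria are meant to catch.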

## Handoff notes

If the upgrade exposed an OpenClaw bug rather than local drift, collect enough information for the next operator or project/support contact. Do not assume the user has any particular external account or wants a public report created.

- exact version before and after
- relevant config keys
- primary model route before and after, including runtime id
- direct smoke result metadata, especially `fallbackAttempts`
- cron model map before and after any temporary workaround, including mini
  routes and inherited/default model cases
- exact first bad cron run timestamps from `openclaw cron runs --id <id>`,
  not just the time the operator noticed the issue
- any cron `sessionKey` values that crossed channel/session boundaries, after
  replacing channel ids and account ids with placeholders
- plugin source path actually loaded
- installed package version and file layout for any failing npm plugin
- whether the plugin was bundled or globally installed
- gateway OS service user and command path when they differ from the SSH user
- exact update command and selected channel
- whether external plugins used channel-specific versions or fallbacks
- service stop/restart messages, especially if the service manager needed a fallback stop/unload path
- `doctor`/`plugins doctor` warning text
- the specific log lines around startup failure or restart

Sanitize handoff notes before sharing externally:
- remove hostnames, usernames, IPs, machine names, tokens, account ids, channel ids, and personal job names
- replace local paths with placeholders such as `<state>`, `<global-openclaw>`, and `~/.openclaw`
- summarize private prompt/session contents instead of quoting them
- keep exact version numbers, package names, model ids, runtime ids, and error classes when they are needed to reproduce the bug
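
A first pass at sanitizing can be automated before a manual read-through. The sketch below covers only home-directory usernames, IPv4 addresses, and secret-looking `KEY=value` lines; hostnames, channel ids, account ids, and job names still need manual review. The rules and the `sanitize` helper are illustrative, not an OpenClaw feature.

```python
import re

_RULES = [
    # Home directories: drop the username, keep the rest of the path.
    (re.compile(r"/(?:home|Users)/[^/\s]+"), "~"),
    # IPv4 addresses. Three-part versions such as 1.0.6 do not match;
    # a four-part version string would, so review those by hand.
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip>"),
    # KEY=value lines whose key contains TOKEN, SECRET, or KEY (case-insensitive).
    (re.compile(r"(?im)^(\s*[A-Z0-9_]*(?:TOKEN|SECRET|KEY)[A-Z0-9_]*\s*=\s*).+$"), r"\1<redacted>"),
]


def sanitize(text: str) -> str:
    """Apply each redaction rule in order and return the cleaned text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

For example, `sanitize("/Users/alice/.openclaw/logs/gateway.log")` yields `~/.openclaw/logs/gateway.log`, while a version string such as `1.0.6` is left intact so the bug stays reproducible.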

For concrete regression patterns and example symptoms, read [references/failure-patterns.md](references/failure-patterns.md).

## Updating this skill

When another operator or agent learns something new from a different OpenClaw host:

- do not delete existing workflow steps unless they are clearly wrong
- do not replace an existing failure pattern with a narrower one
- prefer additive updates over rewrites
- add new regression patterns to `references/failure-patterns.md`
- only tighten the main workflow in this file if the new lesson changes the recommended audit order for most hosts

If a new finding is host-specific or uncertain, add it as a new failure pattern with:

- symptom
- what to inspect
- why it matters

Do not silently erase older patterns just because the current host did not hit them.

OpenClaw Update Runbook

intent

use this runbook after an openclaw host update to systematically isolate whether the failure is service lifecycle, plugin mismatch, config drift, runtime routing, or task ledger corruption. the skill moves from host reachability and binary location through gateway health, plugin/config validation, live session routing, and cron/task audit. apply it before opening failure-patterns.md, and hand off to failure-patterns.md when the main workflow narrows the likely broken layer.

inputs

ssh access to the target openclaw host (direct or through jump host)
timeout for ssh reachability checks (recommend 5-10 seconds per host)
optional: service manager type (systemd, launchd, supervisor, etc.) if known in advance
optional: alternate gateway port if non-standard
optional: service user name if gateway runs under a different OS user than ssh login user
access to ~/.openclaw/ state directory (readable by the service user or current ssh user)
access to recent gateway logs at ~/.openclaw/logs/gateway.log, ~/.openclaw/logs/gateway.err.log, /tmp/openclaw/openclaw-YYYY-MM-DD.log
optional: references/failure-patterns.md in the same skill folder for pattern matching after the main workflow

external connections:

openclaw binary on the target host (locate via common paths: package-manager prefix, ~/.local/bin/openclaw, service definition, process list)
openclaw service manager (systemd, launchd, supervisord, or custom) and its definition/state file
openclaw gateway /health endpoint at the derived gateway port
clawhub registry (if plugin updates or supply-chain checks are needed)
npm registry (if external plugins or supply-chain checks are needed)

context required:

version before and after the update (if known)
update channel selected (if known)
whether the update was initiated from inside openclaw (agent self-update) or externally
whether external plugins are installed and their source (bundled, npm, clawhub, or local)
primary model and runtime names before the update (for smoke testing)
representative cron job ids (if cron verification is part of the update scope)

procedure

step 1: establish real starting state and ssh reachability

input: target host list, ssh connection details, ssh timeout (5-10 seconds recommended)

1a. for each remote host, verify ssh reachability with a short timeout. if a host cannot be reached directly or through an available jump host, record it as a transport/access blocker and do not diagnose openclaw on that host yet. note the jump host path for later diagnostic commands.

1b. after ssh connects, check whether the shell is an interactive login shell. if it is non-interactive (e.g., an automated deploy session), do not assume the login-shell PATH contains openclaw; locate the binary explicitly in step 1c.

1c. locate the openclaw binary by checking common install paths:

package-manager prefix (e.g., /opt/homebrew/bin/openclaw, /usr/local/bin/openclaw)
user-local: ~/.local/bin/openclaw
system: /usr/bin/openclaw, /bin/openclaw
service definition: extract ExecStart from systemd unit or launchd plist

1d. if the gateway process is owned by a different OS user than the ssh login user, derive the service user from the live process or service definition. run all diagnostic commands in steps 1f-1m as the gateway service user (use sudo -u <service-user> or su - <service-user> if permissions allow). record the service user, state dir, cli path, and port for later reference.

1e. export the correct PATH for the audit session once the binary location is confirmed.

1f. run openclaw --version to confirm the binary is executable and note the current installed version.

1g. run openclaw update status to check whether an update is in progress or pending.

1h. run openclaw status --deep to get service name, gateway port, state dir, and running process state.

1i. run openclaw doctor --non-interactive --no-workspace-suggestions to collect initial config and provider validation warnings.

1j. run openclaw channels status --deep to check channel connectivity and authentication state.

1k. run openclaw tasks audit to detect stale, lost, or corrupted tasks in the task ledger.

1l. inspect model routing configuration manually (do not rely on doctor alone):

agent default model (usually ~/.openclaw/config.json or derived from service env)
agent-level model maps (if any agents have overrides)
fallback chains for the primary model
cron payload models (separate from agent defaults)

1m. query recent successful sessions for the primary model and runtime (not just the display model name) to confirm actual routing behavior before the update. use openclaw tasks list --agent <agent> --model <model> --limit 5 or similar.

output: confirmed host reachability, binary location, current version, service name, gateway port, service user, initial health state (version, doctor warnings, channel status, task audit result, model routing state)

decision point: if host is unreachable, treat as transport blocker and skip to handoff. if binary is not found, the update may have failed before installation; escalate to package-manager or download logs. if service user differs from ssh user and permissions deny sudo access, note that diagnostic scope is limited to what the ssh user can read.
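
steps 1c and 1e can be sketched as a path probe plus a PATH update. this is a minimal sketch, assuming a posix host; the candidate list mirrors the paths named in step 1c, and `find_openclaw_binary` and `path_with_binary_dir` are illustrative helper names, not part of openclaw.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations taken from step 1c; extend per host as needed.
DEFAULT_CANDIDATES = (
    "/opt/homebrew/bin/openclaw",
    "/usr/local/bin/openclaw",
    "~/.local/bin/openclaw",
    "/usr/bin/openclaw",
    "/bin/openclaw",
)


def find_openclaw_binary(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[Path]:
    """Return the first candidate that is a regular, executable file, else None."""
    for raw in candidates:
        path = Path(raw).expanduser()
        if path.is_file() and os.access(path, os.X_OK):
            return path
    return None


def path_with_binary_dir(binary: Path, current_path: str) -> str:
    """Prepend the binary's directory to PATH unless it is already listed."""
    bin_dir = str(binary.parent)
    parts = [p for p in current_path.split(os.pathsep) if p]
    if bin_dir in parts:
        return current_path
    return os.pathsep.join([bin_dir, *parts])
```

the probe runs as the current user, so a binary that is readable only by the gateway service user will be reported as missing; that is the case step 1d covers.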

step 2: verify gateway is actually managed correctly

input: service name, gateway port, service manager type (derived from step 1h or known in advance)

2a. derive the service manager type from openclaw status --deep or the host OS (systemd on linux, launchd on macos, etc.).

2b. query the service manager for the gateway service:

systemd: systemctl status <service-name> and systemctl is-active <service-name>
launchd: launchctl list | grep openclaw and launchctl print system/<service-name> (for a per-user LaunchAgent, use launchctl print gui/<uid>/<service-name> instead)
supervisord: supervisorctl status openclaw
custom: consult the operator's service definition file

2c. record the service state (active, inactive, failed, loaded, unloaded).

2d. query the process list for openclaw gateway processes: ps aux | grep openclaw or pgrep -a openclaw. record any running pids and their command lines.

2e. compare service manager state with the running process list. note any discrepancies:

service manager says active but no process is running
service manager says inactive but a process is running
multiple gateway processes running (possible orphaned or detached instance)

2f. query the /health endpoint at the gateway port: curl -s http://localhost:<port>/health | jq . or similar. record http status and response body.

2g. compare /health response with service manager and process list. note any further discrepancies (e.g., endpoint is responding but service manager says inactive).

2h. if discrepancies exist, do not assume service manager is authoritative. record all three states clearly.

output: service manager state, running process list, gateway pid(s), /health status code and response, reconciliation of any conflicts

decision point: if service manager and process list disagree, the gateway may be detached, the service definition may be stale, or the service manager may not have reloaded after the update. if /health is not responding, the gateway process may be hung or bound to a different port. proceed to step 3 (plugin health) before attempting restart; restarting may mask a deeper config or plugin load failure.
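
the comparison in steps 2e-2h can be sketched as a pure function over the three signals. this is a minimal sketch, assuming the service state, pid list, and /health result were already collected by hand; `GatewayObservation` and `reconcile` are illustrative names, not part of openclaw.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GatewayObservation:
    service_active: bool  # what the service manager reports
    pids: List[int]       # gateway pids seen in the process list
    health_ok: bool       # whether /health answered successfully


def reconcile(obs: GatewayObservation) -> List[str]:
    """List every disagreement between service manager, process list, and /health."""
    findings = []
    if obs.service_active and not obs.pids:
        findings.append("service manager says active but no process is running")
    if not obs.service_active and obs.pids:
        findings.append("service manager says inactive but a process is running")
    if len(obs.pids) > 1:
        findings.append("multiple gateway processes running")
    if obs.health_ok and not obs.service_active:
        findings.append("/health responds but service manager says inactive")
    if obs.pids and not obs.health_ok:
        findings.append("process is running but /health is not responding")
    return findings
```

the function never picks a winner among the three signals; it only records where they disagree, which matches step 2h.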

step 3: separate bundled plugins from globally installed plugins

input: openclaw cli path, state dir

3a. run openclaw plugins doctor to check plugin health. note any reported errors or disabled plugins.

3b. run openclaw plugins list --json to get the full plugin registry with ids, versions, and metadata.

3c. for each plugin reported as failed or disabled, run openclaw plugins inspect <id> to examine detailed metadata.

3d. for special runtime plugins such as codex, compare the output of plugins inspect <id> with the entry in plugins list --json. note whether the plugin is marked as disabled in list but loaded in inspect, or vice versa.

3e. check for plugin shadowing:

if a capability is supposed to be bundled, check whether a stale global npm install is shadowing it. run npm list -g @openclaw/<plugin> or equivalent for the global npm root. compare the global version with the bundled version in the host package.
if a capability is not bundled, check npm and clawhub before assuming config is wrong.

3f. for clawhub/runtime plugins such as codex, run openclaw plugins update <id> --dry-run to check whether an official matching package exists for the current host version. record the dry-run result.

3g. inspect the disk layout of plugins:

bundled: usually under <host-package>/lib/plugins/ or similar
npm global: npm root -g and check for @openclaw/ packages
npm local: ~/.openclaw/npm/node_modules/@openclaw/...
clawhub: ~/.openclaw/extensions/<id>/

output: plugin doctor warnings, full plugin list with versions, plugin shadowing analysis (global vs bundled), dry-run availability for each failed plugin, disk layout of bundled/npm/clawhub plugins

decision point: if plugin doctor is clean, plugin config is likely valid (but not proven; see step 8). if a plugin is shadowed by a global npm install, either update the global package or uninstall it and refresh the local plugin registry. if a plugin is not bundled and not installed locally or globally, the fix is to run openclaw plugins install <id> or update the config to use a bundled alternative. if a bundled plugin exists but clawhub has a newer version, decide whether to update based on the host cohort version and update channel.

step 4: check for config carried across the upgrade that no longer validates

input: state dir, openclaw binary, openclaw doctor output from step 1i

4a. inspect the main config file at ~/.openclaw/config.json (or the path derived from openclaw status --deep). search for these keys that often drift across upgrades:

tools.web.search.provider
plugins.allow
plugins.entries.*
model aliases and fallback chains
runtime mappings for openai/*, openai-codex/*, codex, and pi
cron job payload model refs
update channel metadata

4b. for each key found, verify that the referenced provider or plugin id exists in the current host version. cross-check against openclaw plugins list --json output from step 3b.

4c. if doctor reports an unknown provider or plugin, do not assume doctor --fix fully cleaned it. inspect the actual config file directly and search for the unknown id. if found, decide whether to remove it, rename it to a valid bundled alternative, or update it.

4d. run openclaw doctor --fix and re-run openclaw doctor to confirm all warnings are resolved. if warnings persist after --fix, re-inspect the config file manually.

4e. if a runtime mapping or model alias was changed, verify the new mapping is correct for the intended routing. do not rely on config validation alone; see step 8 for live smoke testing.

output: config file audit report, list of stale or unknown providers/plugins found, doctor warnings before and after --fix, any manual config edits made and their rationale

decision point: if config carries many stale refs, the upgrade may have changed plugin ids or deprecated a capability. check the release notes or version diff for rename/deprecation patterns. if manual edits are needed, make one change at a time and re-test after each change (see step 12).
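
the cross-check in step 4b can be sketched as a walk over the parsed config. this is a rough sketch under stated assumptions: the config is nested json whose structure follows the dotted keys in step 4a, `plugins.allow` is a list of ids, and `plugins.entries` is a mapping keyed by plugin id. none of that layout is confirmed by this runbook, and `get_path` and `find_unknown_refs` are hypothetical helpers.

```python
from typing import Any, Dict, List, Set


def get_path(config: Dict[str, Any], dotted: str) -> Any:
    """Walk a dotted key such as 'tools.web.search.provider'; None if absent."""
    node: Any = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def find_unknown_refs(
    config: Dict[str, Any], known_plugin_ids: Set[str], known_providers: Set[str]
) -> List[str]:
    """Report config references that the current host does not recognise."""
    findings = []
    provider = get_path(config, "tools.web.search.provider")
    if provider is not None and provider not in known_providers:
        findings.append(f"tools.web.search.provider: unknown provider {provider!r}")
    # ASSUMPTION: plugins.allow is a list of plugin ids.
    for plugin_id in get_path(config, "plugins.allow") or []:
        if plugin_id not in known_plugin_ids:
            findings.append(f"plugins.allow: unknown plugin {plugin_id!r}")
    # ASSUMPTION: plugins.entries is a mapping keyed by plugin id.
    for plugin_id in get_path(config, "plugins.entries") or {}:
        if plugin_id not in known_plugin_ids:
            findings.append(f"plugins.entries: unknown plugin {plugin_id!r}")
    return findings
```

the known id sets would come from the plugins list --json output in step 3b; the sketch only reads the config and never edits it.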

step 5: compare plugin install records to disk state

input: state dir, plugin list from step 3b, disk paths from step 3g

5a. read ~/.openclaw/plugins/installs.json (or equivalent install record file) and note each recorded plugin, version, and install path.

5b. for each recorded plugin install, verify the install path exists on disk. if not, note as a missing install.

5c. for each plugin install path that exists on disk, compare the recorded version in installs.json with the actual package.json version in the installed directory.

5d. for npm-installed plugins in ~/.openclaw/npm/node_modules/@openclaw/..., check whether resolvedSpec, integrity, and installed version are exact matches in installs.json. note any entries where the stored spec is a bare package name (e.g., @openclaw/discord without a version pin) rather than a pinned version.

5e. for clawhub-installed runtime plugins under ~/.openclaw/extensions/<id>/, check whether the plugin loaded successfully (from step 3) but has a version lag relative to the host cohort. run openclaw plugins inspect <id> and compare the version with the host version (step 1f).

5f. for external plugins (non-bundled, non-clawhub), check whether a release exists for the selected update channel. if not, the plugin may have been installed from a fallback tag such as @latest. note this in the handoff.

5g. for any source-only TypeScript plugin packages, check whether a compiled dist/ directory exists. if not, the plugin may need to be rebuilt locally or is not compatible with the current host version.

5h. for third-party plugins, verify that runtime dependencies listed in their package.json are present in the plugin directory (not removed or purged by the update).

output: install record audit (recorded vs actual paths, version drift, pin status), missing installs, plugins with unresolved channel versions, plugins needing rebuild or missing deps

decision point: if install records are out of sync with disk, run openclaw plugins refresh to resync. if a plugin is recorded but missing, run openclaw plugins install <id> to reinstall it. if a plugin was installed from @latest and the update selected a new channel, decide whether to pin the external plugin to the new channel release or use a bundled alternative.
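
steps 5b-5d can be sketched as an audit over install records. this is a rough sketch: the record field names `id`, `version`, and `installPath` are hypothetical (only `spec` and `resolvedSpec` are named in this runbook), and the real layout of installs.json should be checked before adapting it. `is_exact_pin` and `audit_install_records` are illustrative helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def is_exact_pin(spec: str) -> bool:
    """True when an npm spec ends in @<version starting with a digit>."""
    # Skip the first character so the '@' of a scoped name is not counted.
    suffix = spec[1:].partition("@")[2]
    return suffix[:1].isdigit()


def audit_install_records(records: List[Dict[str, Any]]) -> List[str]:
    """Compare each record with disk: path exists, version matches, spec is pinned."""
    findings = []
    for rec in records:
        plugin_id = str(rec.get("id", "<unknown>"))  # "id" is an assumed field
        raw_path = rec.get("installPath")            # "installPath" is an assumed field
        install_dir = Path(str(raw_path)).expanduser() if raw_path else None
        if install_dir is None or not install_dir.is_dir():
            findings.append(f"{plugin_id}: recorded install path missing on disk")
            continue
        manifest = install_dir / "package.json"
        if not manifest.is_file():
            findings.append(f"{plugin_id}: no package.json in install path")
        else:
            installed = json.loads(manifest.read_text(encoding="utf-8")).get("version")
            if installed != rec.get("version"):
                findings.append(f"{plugin_id}: recorded {rec.get('version')} but installed {installed}")
        spec = rec.get("spec")
        if isinstance(spec, str) and not is_exact_pin(spec):
            findings.append(f"{plugin_id}: stored spec {spec!r} is not an exact version pin")
    return findings
```

a tag such as @latest is deliberately reported as unpinned, since step 5f treats fallback tags as something to note in the handoff.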

step 6: inspect recent gateway logs before changing too much

input: state dir, log paths

6a. identify the gateway log files:

~/.openclaw/logs/gateway.log (standard output and info)
~/.openclaw/logs/gateway.err.log (error output)
/tmp/openclaw/openclaw-YYYY-MM-DD.log (temporary session logs)

6b. for each log file, read the most recent 200-500 lines and prioritize entries involving:

plugin load failures or warnings
config validation errors or warnings
provider fallback attempts and auth failures on the primary route
update lifecycle messages (service stop fallbacks, config overwrites, service reload timing)
channel authentication failures (401 responses, token corruption, see Pattern #23 in references/failure-patterns.md)
context-engine fallback attempts
active-memory timeouts or errors
event loop degradation or warnings
task restart blocking or ledger errors
transient post-restart UI/websocket scope errors

6c. if the update was just applied, search for startup messages. note any service stop, config backup, or service reload messages and whether they succeeded or required a fallback path.

6d. search for "error", "fail", "warn", "ERR", "timeout" and manually inspect each occurrence in context.

6e. if logs show a provider fallback on the primary route, note the fallback chain and which provider succeeded.

6f. if channel auth returned 401 or similar, inspect ~/.openclaw/service-env/*.env for token-line quote corruption or whitespace issues (see Pattern #23).

output: log audit report with exact timestamps and line text for critical errors, list of provider fallbacks with chain details, any service lifecycle messages, any channel auth issues

decision point: if logs show plugin load failures, investigate that plugin further (see steps 3 and 5). if logs show provider auth failures, verify credentials in service-env before restarting. if logs show event loop or memory degradation, proceed to step 8 for smoke testing and step 12 for targeted fixes. if logs are clean but status checks show failures, the failure is likely not in startup or auth; move to step 7 (task/channel audit) or step 8 (smoke test).
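
the keyword pass in steps 6b and 6d can be sketched as a filter over the tail of a log. this is a minimal sketch, assuming the log has already been read into lines; the keyword set is the one listed in step 6d, and `scan_tail` is an illustrative helper. each hit still needs to be read in context.

```python
import re
from typing import Iterable, List, Tuple

# "error", "fail", "warn", and "timeout" match in any case; "ERR" must be an
# upper-case standalone token so words like "deferred" are not flagged.
PATTERN = re.compile(r"(?i:error|fail|warn|timeout)|\bERR\b")


def scan_tail(lines: Iterable[str], tail: int = 500) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for keyword hits in the last `tail` lines."""
    all_lines = list(lines)
    start = max(0, len(all_lines) - tail)
    return [
        (index + 1, line.rstrip("\n"))
        for index, line in enumerate(all_lines[start:], start=start)
        if PATTERN.search(line)
    ]
```

line numbers refer to the whole file rather than the tail, so they can be quoted directly in the log audit report.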

step 7: audit runtime/task health after the upgrade

input: state dir, openclaw cli

7a. re-run openclaw tasks audit (from step 1k) and note any new errors, stale tasks, lost tasks, or delivery failures.

7b. run openclaw tasks list --limit 20 --sort timestamp-desc and inspect the 20 most recent tasks. note timestamps, status, model, runtime, and any error messages.

7c. for any stale tasks (timestamp > 1 hour old and still in a running state), attempt to cancel them: openclaw tasks cancel <task-id>. note whether the cancellation succeeds or blocks.

7d. if cron jobs are configured, run openclaw cron list --json and inspect each job's lastRun, lastStatus, and sessionKey.

7e. for any cron job with sessionTarget: isolated, verify that its persisted sessionKey does not point to a live channel lane (e.g., agent:<agent>:discord:direct:*). if it does, the isolation boundary may be broken; note this.

7f. run openclaw cron runs --id <job-id> --limit 5 for at least
