Allow your claws to do things remotely on a Desktop machine via MCP

SkillRank score: 8.2/10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

remote-claws enables full remote desktop control via mcp, including screenshots, mouse/keyboard input, command execution, browser automation, and file operations. ui elements can be targeted either by screen coordinates or by element name.

structure: 9.0
trigger phrases: 9.0
procedure: 8.0
edge cases: 7.0
documentation: 8.0

original SKILL.md from clawhub:

---
name: remote-claws
description: "Full remote desktop control of a machine via Remote Claws MCP. Use when asked to: take a screenshot of the remote desktop; click, type, or drag with the mouse/keyboard on the remote machine; run commands or scripts; automate a Chromium browser on the remote machine; read or write files on the remote machine."
homepage: https://github.com/wentbackward/remote-claws
---

# Remote Claws — Remote Desktop Control

Controls a remote machine over MCP/SSE. All 39 tools are provided by the remote-claws MCP server registered in openclaw.json.

## When to Use This Skill

Use Remote Claws tools whenever you need to interact with the remote desktop machine — taking screenshots, clicking buttons, typing text, running commands, automating a browser, or transferring files. If the user asks you to do something "on the remote machine" or "on Windows," these are your tools.

## Strategy

1. **Screenshot first.** Before clicking or typing, take a `desktop_screenshot` to see what's on screen. Use the coordinates from the screenshot to target actions.
2. **Prefer browser tools for web tasks.** `browser_*` tools use CSS selectors and are resolution-independent. Only use `desktop_*` tools for web tasks if the browser tools can't reach something (e.g. browser dialogs, file pickers).
3. **Prefer element names over coordinates.** `desktop_click_element` and `desktop_get_element_text` target UI controls by name — more reliable than coordinate clicking, which breaks when windows move.
4. **Exec is async.** `exec_run` starts a command and returns immediately. Use `exec_get_output` with `wait=true` if you need to block until it finishes.
5. **Re-screenshot after actions.** Windows may move, dialogs may appear. Take a fresh screenshot to verify the result before proceeding.

## Tool Groups

### Desktop (mouse, keyboard, screenshots)
- `desktop_screenshot` — capture full screen or region [x, y, width, height]
- `desktop_mouse_click` — left/right/middle click at x, y
- `desktop_mouse_move` — move cursor to x, y
- `desktop_mouse_drag` — drag from start to end coordinates
- `desktop_type_text` — type ASCII text at current focus (ASCII only)
- `desktop_press_key` — press key or combo: "enter", "ctrl+c", "alt+f4"
- `desktop_scroll` — scroll at x,y; direction "up" or "down"
- `desktop_find_window` — find windows by title or class_name substring
- `desktop_focus_window` — bring window to foreground by title
- `desktop_list_elements` — list UI controls (buttons, fields) inside a window
- `desktop_click_element` — click a named UI element (more reliable than coords)
- `desktop_get_element_text` — read the value of a named UI element

### Browser (Chromium via Playwright — CSS selectors)
- `browser_navigate` — go to a URL
- `browser_click` — click element by CSS selector
- `browser_fill` — set input value (handles Unicode, triggers change events)
- `browser_type` — type keystroke-by-keystroke (appends, does not clear)
- `browser_press_key` — key press e.g. "Enter", "Control+a"
- `browser_get_text` — extract visible text from element (default: body)
- `browser_get_html` — get HTML markup of element
- `browser_eval_js` — run JavaScript in page context
- `browser_screenshot` — screenshot page or element
- `browser_wait_for` — wait for element state: visible/hidden/attached/detached
- `browser_select_option` — select a dropdown option by value or label
- `browser_go_back` / `browser_go_forward`
- `browser_tabs_list` / `browser_tab_new` / `browser_tab_close`

### Exec (run commands, async)
- `exec_run` — start command; returns process_id immediately
- `exec_get_output` — read stdout/stderr; set wait=true to block
- `exec_send_input` — send a line to stdin of a running process
- `exec_kill` — terminate a process
- `exec_list` — list all tracked processes
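
Because `exec_run` returns before the command finishes, a blocking run takes two calls. The Python sketch below assumes a hypothetical `call_tool(name, arguments)` function that forwards one tool call to the remote-claws server, returns the result as a dict, and raises `TimeoutError` if a call takes too long. Only `process_id` and `wait` come from the list above; the `command` argument and the `exit_code` result field are assumed names, so check the server's actual tool schemas.

```python
from typing import Any, Callable, Dict

# Hypothetical client hook: call_tool(tool_name, arguments) -> result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_and_wait(call_tool: CallTool, command: str) -> Dict[str, Any]:
    """Start a command with exec_run, then block on exec_get_output.

    Only process_id and wait come from the tool list; "command" and the
    result fields are assumed names for illustration.
    """
    started = call_tool("exec_run", {"command": command})  # returns immediately
    pid = started["process_id"]
    try:
        # wait=True blocks until the remote process finishes.
        return call_tool("exec_get_output", {"process_id": pid, "wait": True})
    except TimeoutError:
        # Don't leave a stuck process running on the remote machine.
        call_tool("exec_kill", {"process_id": pid})
        raise


def succeeded(result: Dict[str, Any]) -> bool:
    """Exit code 0 means success; "exit_code" is an assumed field name."""
    return result.get("exit_code") == 0
```

For long jobs, call `exec_run` alone and check on them later with `exec_get_output` without `wait`; reserve the blocking form for commands whose result the next step depends on.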

### Files (base64 encoded)
- `file_write` — write base64 content to a path
- `file_read` — read file as base64 (use offset/limit for large files)
- `file_list` — list directory; supports glob patterns, recursive
- `file_delete` — delete file or empty directory
- `file_move` — move or rename file/directory
- `file_info` — get size, created, modified timestamps
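
Since file content travels as base64, writes need an encoding step, and large reads can be split with `offset`/`limit`. This is a sketch under stated assumptions: a hypothetical `call_tool(name, arguments)` client hook, a `path` argument, offsets counted in bytes of the original file, and a `content` result field holding each slice as its own base64 string. None of these names are confirmed by this page.

```python
import base64
from typing import Any, BinaryIO, Callable, Dict

# Hypothetical client hook: call_tool(tool_name, arguments) -> result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def encode_for_write(data: bytes) -> str:
    """Turn raw bytes into the base64 text that file_write expects."""
    return base64.b64encode(data).decode("ascii")


def copy_remote_file(call_tool: CallTool, path: str, out: BinaryIO,
                     chunk_size: int = 1024 * 1024) -> int:
    """Stream a remote file into `out` using file_read offset/limit.

    Assumes offset and limit count bytes of the original file and that each
    response holds that slice as its own base64 string under "content".
    Returns the number of bytes copied.
    """
    offset = 0
    while True:
        result = call_tool("file_read",
                           {"path": path, "offset": offset, "limit": chunk_size})
        chunk = base64.b64decode(result["content"])  # decode each chunk separately
        if not chunk:
            break  # nothing left to read
        out.write(chunk)
        offset += len(chunk)
        if len(chunk) < chunk_size:
            break  # a short read means end of file
    return offset
```

Writing each decoded chunk straight to `out` (a local file, for instance) keeps memory use bounded by `chunk_size` instead of the file size.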

## Authentication & Security

The remote-claws MCP server requires a bearer token, configured in `openclaw.json` when registering the server. The server will reject unauthenticated connections with 401.

The server also supports IP allowlisting (`allowed_ips`), host header validation (`allowed_hosts`), and per-tool permission policies (`permissions.json`) to restrict which tools are available. See the [setup guide](https://github.com/wentbackward/remote-claws/blob/master/remote-claws-openclaw-setup-guide.md) and [README](https://github.com/wentbackward/remote-claws#security) for configuration details.

## Important Notes

- Screenshots are JPEG, max 1280x960. Coordinates are absolute pixels.
- `desktop_type_text` is ASCII only. For Unicode, use `browser_fill` or the clipboard: run `powershell Set-Clipboard -Value '<text>'` with `exec_run`, then `desktop_press_key ctrl+v`.
- File content is base64 encoded. Decode after reading.
- The browser launches on first use and stays open across calls. Sessions persist (cookies, local storage).
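
The clipboard workaround has one sharp edge: the text ends up inside a PowerShell single-quoted string, so quote characters in it must be escaped. A minimal sketch of building that command string follows; it assumes `exec_run` accepts the whole command line as one string, and it deliberately does not handle double quotes or newlines, whose treatment depends on how the server launches the process.

```python
# PowerShell treats the ASCII apostrophe and the typographic single quotes
# U+2018..U+201B alike, so all of them must be doubled inside '...'.
_SINGLE_QUOTES = "'\u2018\u2019\u201a\u201b"


def clipboard_command(text: str) -> str:
    """Build the Set-Clipboard command used for the Unicode workaround.

    Inside a single-quoted PowerShell string everything is literal except the
    quote characters, which are escaped by doubling them. Double quotes and
    newlines are not handled here.
    """
    escaped = "".join(ch * 2 if ch in _SINGLE_QUOTES else ch for ch in text)
    return f"powershell Set-Clipboard -Value '{escaped}'"
```

Pass the result to `exec_run`, wait for it to finish, then send `desktop_press_key` with `ctrl+v` to the focused field.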


extracted decision logic from strategy into explicit if-else decision points, documented external mcp server connection and auth requirements as inputs, formalized the original procedure into 6 numbered steps with clear input/output, added edge cases for unicode, large files, timeouts, and permission errors, and spelled out success criteria in output contract and outcome signal sections.

Remote Claws: Remote Desktop Control

Item: Allow your claws to do things remotely on a Desktop machine via MCP
Rating: 8.2
Author: Implexa

Controls a remote machine over MCP/SSE. All 39 tools are provided by the remote-claws MCP server registered in openclaw.json.

intent

use remote claws when you need to interact with a desktop machine remotely: take screenshots, click buttons, type text, run commands, automate a browser, or transfer files. if the user asks you to do something "on the remote machine" or "on Windows," these are your tools. deploy this skill when direct desktop control beats api calls or manual steps, especially for UI automation, visual verification, or complex multi-step workflows that require seeing the screen state between actions.

inputs

MCP Server Registration

remote-claws MCP server registered in openclaw.json with bearer token authentication
env var or config: REMOTE_CLAWS_TOKEN (bearer token; requests without a valid token are rejected with 401)
env var: REMOTE_CLAWS_HOST (server endpoint, e.g. http://localhost:8000)
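
a minimal sketch of turning these two settings into connection details, assuming an mcp client that lets you attach http headers to the sse connection. the `Authorization: Bearer` header format is standard; the function name is hypothetical, and the exact endpoint path is not specified on this page.

```python
import os
from typing import Dict, Mapping, Optional


def connection_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Build the endpoint and auth header from REMOTE_CLAWS_HOST / REMOTE_CLAWS_TOKEN."""
    env = os.environ if env is None else env
    token = env.get("REMOTE_CLAWS_TOKEN")
    host = env.get("REMOTE_CLAWS_HOST")
    if not token:
        # The server would answer 401 anyway; failing early gives a clearer message.
        raise RuntimeError("REMOTE_CLAWS_TOKEN is not set")
    if not host:
        raise RuntimeError("REMOTE_CLAWS_HOST is not set")
    return {
        "host": host.rstrip("/"),  # e.g. http://localhost:8000
        "headers": {"Authorization": f"Bearer {token}"},
    }
```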

Security Configuration (optional but recommended)

allowed_ips: ip allowlist if configured on server side
allowed_hosts: host header validation list
permissions.json: per-tool restrictions (some tools may be disabled by policy)

Context from User

target task description (what to click, type, run, or screenshot)
optional: target window title, application name, or URL
optional: file paths for read/write operations

Network & Connection

active network connection to remote machine endpoint
low latency preferred for interactive workflows (screenshots + actions)

procedure

1. take a screenshot. call desktop_screenshot with no region params to capture the full screen. this is your ground truth. examine the returned JPEG to identify window positions, buttons, text fields, and current state.
2. identify the target. from the screenshot, locate the ui element or window where the action needs to happen. note its approximate coordinates, visible label, or window title.
3. choose your tool category:
   - if the task is web automation (form filling, link clicking, javascript execution), use browser_* tools with css selectors. they are resolution-independent and more reliable. if no browser is running yet, browser_navigate launches one.
   - if the task is desktop ui interaction (windows ui, file dialogs, native apps), use desktop_* tools. prefer desktop_click_element with element names over desktop_mouse_click with coordinates.
   - if the task is running commands or scripts, use exec_* tools.
   - if the task is reading or writing files, use file_* tools.
4. execute the action. call the appropriate tool with the coordinates, selectors, or element names from step 2. for text input, use desktop_type_text (ascii only) or browser_fill (unicode ok). for commands, start them with exec_run, which returns a process_id immediately, and call exec_get_output with wait=true only if you need to block until they finish.
5. re-screenshot and verify. after each action, call desktop_screenshot again and compare it with the previous one. confirm the action took effect (button state changed, text appeared, dialog closed, process started). if the result is unexpected, troubleshoot and retry.
6. repeat. loop through steps 2-5 until the task is complete, then take a final screenshot as evidence.
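
the screenshot, act, verify loop can be sketched in python as follows. `call_tool` is a hypothetical client hook, the empty argument dict for a full-screen `desktop_screenshot` is an assumption, and `verify` stands in for the judgment you make when comparing screenshots. retrying blindly is only safe for actions that can be repeated without harm.

```python
from typing import Any, Callable, Dict

# Hypothetical client hook: call_tool(tool_name, arguments) -> result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]
# Caller-supplied check: verify(before_screenshot, after_screenshot) -> done?
Verify = Callable[[Dict[str, Any], Dict[str, Any]], bool]


def act_and_verify(call_tool: CallTool, tool: str, args: Dict[str, Any],
                   verify: Verify, attempts: int = 3) -> Dict[str, Any]:
    """Screenshot, act, re-screenshot, and retry until verify() passes.

    Returns the final screenshot so it can be shown as evidence.
    """
    before = call_tool("desktop_screenshot", {})  # full screen, no region
    for _ in range(attempts):
        call_tool(tool, args)
        after = call_tool("desktop_screenshot", {})
        if verify(before, after):
            return after
        before = after  # judge the next attempt against the newest state
    raise RuntimeError(f"{tool} did not reach the expected state after {attempts} attempts")
```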

decision points

if user asks to interact with a web page and a browser is already open: use browser_* tools (css selectors, unicode support, resolution-independent). skip to step 4 with browser tools.
if user asks to interact with a web page but no browser is running: call browser_navigate to start one and load the url, then use browser_* tools.
if the task requires typing unicode or non-ascii characters: use browser_fill (if in a browser context) or clipboard workaround: exec_run "powershell Set-Clipboard -Value '<text>'" followed by desktop_press_key ctrl+v. do not use desktop_type_text for non-ascii.
if a command needs to run in the background (e.g. file downloads, long-running processes): use exec_run (returns immediately with process_id). do not wait. call exec_get_output with wait=true only if you need to block until completion.
if exec_get_output with wait=true times out or hangs: the remote process may be stuck. call exec_kill with the process_id to terminate it. then take a screenshot to assess the state.
if a file is very large (> 10 MB): use file_read with offset and limit params to stream in chunks rather than loading the entire file into memory. decode base64 per chunk on your end.
if the server returns 401 unauthorized: the bearer token is missing, expired, or invalid. check REMOTE_CLAWS_TOKEN env var and openclaw.json configuration. request a fresh token from the operator.
if the server returns 403 forbidden: the tool is most likely disabled by a permissions.json policy set by the operator. fall back to manual steps or ask the operator to enable it.
if a screenshot or action times out (network latency > 30 seconds): the remote machine or network is degraded. wait a moment, retry, or escalate to the user that the remote session is unstable.
if desktop_click_element fails because the element name is ambiguous or not found: call desktop_list_elements to enumerate the named elements in the target window and retry with the exact name. if that still fails, fall back to desktop_mouse_click with coordinates from the screenshot.
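
the element-name fallback as a python sketch. `call_tool`, `ToolError`, and the argument and result field names (`window`, `name`, `elements`, `x`, `y`, `button`) are hypothetical stand-ins for whatever your mcp client and the server's tool schemas actually use.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical client hook: call_tool(tool_name, arguments) -> result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class ToolError(Exception):
    """Hypothetical error raised by call_tool when a tool call fails."""


def click_by_name_or_coords(call_tool: CallTool, window: str, name: str,
                            fallback_xy: Tuple[int, int]) -> str:
    """Try the name, then an exact name from desktop_list_elements,
    then a coordinate click. Returns which strategy was used."""
    try:
        call_tool("desktop_click_element", {"window": window, "name": name})
        return "name"
    except ToolError:
        pass
    listed = call_tool("desktop_list_elements", {"window": window})
    # Case-insensitive substring match against the enumerated element names.
    matches = [e["name"] for e in listed.get("elements", [])
               if name.lower() in e.get("name", "").lower()]
    if len(matches) == 1:  # only retry when the match is unambiguous
        try:
            call_tool("desktop_click_element", {"window": window, "name": matches[0]})
            return "listed-name"
        except ToolError:
            pass
    x, y = fallback_xy  # coordinates read off the latest screenshot
    call_tool("desktop_mouse_click", {"x": x, "y": y, "button": "left"})
    return "coordinates"
```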

output contract

For screenshots:

JPEG image file, max 1280x960 pixels, saved locally or embedded in response.
coordinates are absolute screen pixels, top-left origin (0, 0).

For text reads (browser_get_text, desktop_get_element_text):

plain text string, utf-8 encoded.

For html reads (browser_get_html):

html markup string, utf-8 encoded.

For command execution (exec_get_output):

stdout and stderr as strings (utf-8). if wait=true, includes exit code (0 = success, non-zero = error).

For file reads (file_read):

base64-encoded file content. caller must base64-decode to get binary or text data.

For file writes (file_write):

success confirmation: file path and byte count written.

For element lists (desktop_list_elements):

array of element metadata objects (element name, type, position, visible state, text content).

For window finds (desktop_find_window):

array of matching window objects with title, class_name, x, y, width, height.
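
two bits of coordinate arithmetic follow from this contract: a point picked from a region screenshot needs the region's origin added back, and a window's center is a reasonable desktop_mouse_click target. the sketch assumes region screenshots come back unscaled; if the server shrinks an image to fit the 1280x960 cap, you would also need to apply the scale factor. both function names are hypothetical.

```python
from typing import Any, Mapping, Sequence, Tuple


def region_to_screen(region: Sequence[int], px: int, py: int) -> Tuple[int, int]:
    """Map pixel (px, py) inside a region screenshot [x, y, width, height]
    to absolute screen coordinates (top-left origin), assuming no rescaling."""
    x, y, width, height = region
    if not (0 <= px < width and 0 <= py < height):
        raise ValueError("point lies outside the captured region")
    return x + px, y + py


def window_center(window: Mapping[str, Any]) -> Tuple[int, int]:
    """Center point of a desktop_find_window result (uses x, y, width, height)."""
    return (window["x"] + window["width"] // 2,
            window["y"] + window["height"] // 2)
```

for example, a button at (10, 20) in a screenshot of region [100, 200, 300, 150] sits at absolute (110, 220).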

outcome signal

user confirms the action visually: you present the final screenshot showing the desired state (button clicked, form filled, file downloaded, command output visible).
task completes without errors: no 401/403/timeout errors, no stuck processes, no file i/o failures.
process exit code is 0: for command execution, stdout shows expected output and exit code is 0.
file operation succeeds: file_write returns byte count > 0; file_read returns base64 string; file_delete confirms path is gone.
browser or window state matches expectation: new url loaded, element value changed, dialog closed, tab opened/closed.
no manual fallback needed: you completed the entire workflow using remote claws tools without asking the user to take over.
