---
name: computer-use
description: Full desktop computer use for headless Linux servers. Xvfb + XFCE virtual desktop with xdotool automation. 17 actions (click, type, scroll, screenshot, drag, etc). Unlike OpenClaw's browser tool, operates at the X11 level so websites cannot detect automation. Includes VNC for live viewing.
version: 1.2.1
---
# Computer Use Skill
Full desktop GUI control for headless Linux servers. Creates a virtual display (Xvfb + XFCE) so you can run and control desktop applications on VPS/cloud instances without a physical monitor.
## Environment
- **Display**: `:99`
- **Resolution**: 1024x768 (XGA, Anthropic recommended)
- **Desktop**: XFCE4 (minimal — xfwm4 + panel only)
## Quick Setup
Run the setup script to install everything (systemd services, flicker-free VNC):
```bash
./scripts/setup-vnc.sh
```
This installs:
- Xvfb virtual display on `:99`
- Minimal XFCE desktop (xfwm4 + panel, no xfdesktop)
- x11vnc with stability flags
- noVNC for browser access
All services auto-start on boot and auto-restart on crash.
## Actions Reference
| Action | Script | Arguments | Description |
|--------|--------|-----------|-------------|
| screenshot | `screenshot.sh` | — | Capture screen → base64 PNG |
| cursor_position | `cursor_position.sh` | — | Get current mouse X,Y |
| mouse_move | `mouse_move.sh` | x y | Move mouse to coordinates |
| left_click | `click.sh` | x y left | Left click at coordinates |
| right_click | `click.sh` | x y right | Right click |
| middle_click | `click.sh` | x y middle | Middle click |
| double_click | `click.sh` | x y double | Double click |
| triple_click | `click.sh` | x y triple | Triple click (select line) |
| left_click_drag | `drag.sh` | x1 y1 x2 y2 | Drag from start to end |
| left_mouse_down | `mouse_down.sh` | — | Press mouse button |
| left_mouse_up | `mouse_up.sh` | — | Release mouse button |
| type | `type_text.sh` | "text" | Type text (50 char chunks, 12ms delay) |
| key | `key.sh` | "combo" | Press key (Return, ctrl+c, alt+F4) |
| hold_key | `hold_key.sh` | "key" secs | Hold key for duration |
| scroll | `scroll.sh` | dir amt [x y] | Scroll up/down/left/right |
| wait | `wait.sh` | seconds | Wait then screenshot |
| zoom | `zoom.sh` | x1 y1 x2 y2 | Cropped region screenshot |
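The table maps each action name to a script and an argument order. A thin dispatcher over that mapping might look like the sketch below. `action_command` and `SCRIPTS_DIR` are hypothetical names introduced for illustration, not part of the skill; the function only prints the command line and does not run it.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: turns an action name from the table into the
# script invocation listed there. SCRIPTS_DIR is an assumed location.
SCRIPTS_DIR="${SCRIPTS_DIR:-./scripts}"

# Print the command line for an action without executing it.
action_command() {
  local action="$1"; shift
  local script
  local -a extra=()
  case "$action" in
    screenshot|cursor_position|mouse_move|key|hold_key|scroll|wait|zoom)
      script="$action.sh" ;;
    left_click|right_click|middle_click|double_click|triple_click)
      # click.sh takes the button or mode as its last argument.
      script="click.sh"; extra=("${action%_click}") ;;
    left_click_drag) script="drag.sh" ;;
    left_mouse_down) script="mouse_down.sh" ;;
    left_mouse_up)   script="mouse_up.sh" ;;
    type)            script="type_text.sh" ;;
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  local -a cmd=("$SCRIPTS_DIR/$script" "$@" "${extra[@]}")
  echo "${cmd[*]}"
}
```

For example, `action_command left_click 512 384` prints the `click.sh` invocation with `left` appended, matching the `left_click` row.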
## Usage Examples
```bash
export DISPLAY=:99
# Take screenshot
./scripts/screenshot.sh
# Click at coordinates
./scripts/click.sh 512 384 left
# Type text
./scripts/type_text.sh "Hello world"
# Press key combo
./scripts/key.sh "ctrl+s"
# Scroll down
./scripts/scroll.sh down 5
```
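When a calling script needs the pointer location in structured form, the output of `xdotool getmouselocation --shell` can be converted to JSON. The sketch below assumes that approach; the skill's own `cursor_position.sh` may work differently, and `location_to_json` is a hypothetical helper.

```shell
# Convert `xdotool getmouselocation --shell` output (lines such as X=512,
# Y=384, SCREEN=0, WINDOW=...) read from stdin into a small JSON object.
location_to_json() {
  local line x="" y=""
  while IFS= read -r line; do
    case "$line" in
      X=*) x="${line#X=}" ;;
      Y=*) y="${line#Y=}" ;;
    esac
  done
  # Fail if either coordinate was missing from the input.
  [ -n "$x" ] && [ -n "$y" ] || return 1
  printf '{"x": %d, "y": %d}\n' "$x" "$y"
}

# Live use (requires xdotool and DISPLAY=:99):
#   xdotool getmouselocation --shell | location_to_json
```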
## Workflow Pattern
1. **Screenshot** — Always start by seeing the screen
2. **Analyze** — Identify UI elements and coordinates
3. **Act** — Click, type, scroll
4. **Screenshot** — Verify result
5. **Repeat**
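The act-then-verify part of this loop can be sketched as a small wrapper. `run`, `act_and_verify`, `SCRIPTS_DIR`, and `DRY_RUN` are illustrative names, not part of the skill; with `DRY_RUN=1` the commands are printed instead of executed, which is handy for checking a plan before touching the desktop.

```shell
SCRIPTS_DIR="${SCRIPTS_DIR:-./scripts}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Perform one action, then take a screenshot to verify the result.
# Stops early if the action itself fails.
act_and_verify() {
  local script="$1"; shift
  run "$SCRIPTS_DIR/$script" "$@" || return 1
  run "$SCRIPTS_DIR/screenshot.sh"
}
```

Usage: `DISPLAY=:99 act_and_verify click.sh 512 384 left`. If the action script already captures its own screenshot, the explicit second capture is redundant and can be dropped.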
## Tips
- Screen is 1024x768, origin (0,0) at top-left
- Click to focus before typing in text fields
- Use `ctrl+End` to jump to page bottom in browsers
- Most actions take a screenshot automatically after a 2-second delay
- Long text is chunked (50 chars) with 12ms keystroke delay
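The chunking in the last tip can be illustrated as follows. This is a sketch of the idea, not the contents of `type_text.sh`; `chunk_text` and `type_chunked` are hypothetical helpers, and the text is assumed to contain no newlines.

```shell
# Split a string into fixed-size chunks, printing one chunk per line.
chunk_text() {
  local text="$1" size="${2:-50}"
  while [ -n "$text" ]; do
    printf '%s\n' "${text:0:size}"
    text="${text:size}"
  done
}

# Type each 50-character chunk with a 12 ms delay between keystrokes.
# Requires xdotool and a running display (DISPLAY=:99).
type_chunked() {
  local chunk
  while IFS= read -r chunk; do
    xdotool type --delay 12 -- "$chunk"
  done < <(chunk_text "$1" 50)
}
```

Keeping the splitting separate from the typing makes the chunk boundaries easy to inspect without a display.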
## Live Desktop Viewing (VNC)
Watch the desktop in real-time via browser or VNC client.
### Connect via Browser
```bash
# SSH tunnel (run on your local machine)
ssh -L 6080:localhost:6080 your-server
# Then open this URL in your local browser: http://localhost:6080/vnc.html
```
### Connect via VNC Client
```bash
# SSH tunnel
ssh -L 5900:localhost:5900 your-server
# Connect VNC client to localhost:5900
```
### SSH Config (recommended)
Add to `~/.ssh/config` for automatic tunneling:
```
Host your-server
HostName your.server.ip
User your-user
LocalForward 6080 127.0.0.1:6080
LocalForward 5900 127.0.0.1:5900
```
Then just `ssh your-server` and VNC is available.
## System Services
```bash
# Check status
systemctl status xvfb xfce-minimal x11vnc novnc
# Restart if needed
sudo systemctl restart xvfb xfce-minimal x11vnc novnc
```
### Service Chain
```
xvfb → xfce-minimal → x11vnc → novnc
```
- **xvfb**: Virtual display :99 (1024x768x24)
- **xfce-minimal**: Watchdog that runs xfwm4+panel, kills xfdesktop
- **x11vnc**: VNC server with `-noxdamage` for stability
- **novnc**: WebSocket proxy with heartbeat for connection stability
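Because the services form a chain, the first inactive one is usually the place to start looking. A minimal sketch of that check, assuming the four unit names above and using only `systemctl is-active --quiet`; `first_down_service` is a hypothetical helper name.

```shell
# Services in chain order, as listed above.
SERVICES=(xvfb xfce-minimal x11vnc novnc)

# Print the first service in the chain that is not active and return 1.
# Print nothing and return 0 when every service is up.
first_down_service() {
  local svc
  for svc in "${SERVICES[@]}"; do
    if ! systemctl is-active --quiet "$svc"; then
      echo "$svc"
      return 1
    fi
  done
  return 0
}
```

Usage: `down="$(first_down_service)" || sudo systemctl restart "$down"`.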
## Opening Applications
```bash
export DISPLAY=:99
# Chrome — only use --no-sandbox if the kernel lacks user namespace support.
# Check: cat /proc/sys/kernel/unprivileged_userns_clone
# 1 = sandbox works, do NOT use --no-sandbox
# 0 = sandbox fails, --no-sandbox required as fallback
# Using --no-sandbox when unnecessary causes instability and crashes.
if [ "$(cat /proc/sys/kernel/unprivileged_userns_clone 2>/dev/null)" = "0" ]; then
  google-chrome --no-sandbox &
else
  google-chrome &
fi
xfce4-terminal & # Terminal
thunar & # File manager
```
**Note**: Snap browsers (Firefox, Chromium) have sandbox issues on headless servers. Use Chrome `.deb` instead:
```bash
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f
```
## Manual Setup
If you prefer to install the packages yourself before running `setup-vnc.sh`:
```bash
# Install packages
sudo apt install -y xvfb xfce4 xfce4-terminal xdotool scrot imagemagick dbus-x11 x11vnc novnc websockify
# Run the setup script (generates systemd services, masks xfdesktop, starts everything)
./scripts/setup-vnc.sh
```
For a fully manual setup, read `setup-vnc.sh`: it generates all systemd service files inline, so it contains the exact service definitions.
## Troubleshooting
### VNC shows black screen
- Check if xfwm4 is running: `pgrep xfwm4`
- Restart desktop: `sudo systemctl restart xfce-minimal`
### VNC flickering/flashing
- Ensure xfdesktop is masked (check `/usr/bin/xfdesktop`)
- xfdesktop causes flicker due to clear→draw cycles on Xvfb
### VNC disconnects frequently
- Check noVNC has `--heartbeat 30` flag
- Check x11vnc has `-noxdamage` flag
### x11vnc crashes (SIGSEGV)
- Add `-noxdamage -noxfixes` flags
- The DAMAGE extension causes crashes on Xvfb
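To confirm that the running x11vnc process actually has both flags, its command line can be inspected. The sketch below is an assumption about how one might automate that check; `has_flags` and `check_x11vnc_flags` are hypothetical helpers built on `pgrep -o -a`.

```shell
# Succeed only if the command line contains every given flag as a whole word.
has_flags() {
  local cmdline=" $1 "; shift
  local flag
  for flag in "$@"; do
    case "$cmdline" in
      *" $flag "*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# Inspect the oldest running x11vnc process (pgrep -a prints PID and command line).
check_x11vnc_flags() {
  local cmdline
  cmdline="$(pgrep -o -a x11vnc)" || { echo "x11vnc is not running"; return 2; }
  if has_flags "$cmdline" -noxdamage -noxfixes; then
    echo "flags ok"
  else
    echo "missing -noxdamage and/or -noxfixes"
    return 1
  fi
}
```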
## Requirements
Installed by `setup-vnc.sh`:
```bash
xvfb xfce4 xfce4-terminal xdotool scrot imagemagick dbus-x11 x11vnc novnc websockify
```
## Summary
Control a full desktop GUI on headless Linux servers by spawning a virtual X11 display (Xvfb + XFCE4) and automating mouse and keyboard input via xdotool. Use this skill to run and interact with any desktop application on VPS or cloud instances without a physical monitor or browser restrictions. Unlike browser-based tools, X11-level automation bypasses website bot detection.
## Inputs
### Display Environment
- `DISPLAY=:99` (Xvfb virtual framebuffer on display 99)
### System Dependencies
- x11vnc (`-noxdamage -noxfixes` flags mandatory)
### External Services / VNC Access (Optional)
- `http://localhost:6080/vnc.html` via SSH `LocalForward`
### Setup Script
- `./scripts/setup-vnc.sh` installs all packages, generates systemd service files, masks xfdesktop, and auto-starts services
### System State
- `/proc/sys/kernel/unprivileged_userns_clone` status (1 = sandbox enabled, 0 = use `--no-sandbox` for Chrome)
## Steps
1. **Install and Setup**: Run `./scripts/setup-vnc.sh` to install all dependencies, generate systemd services, and auto-start the virtual desktop stack. Services auto-restart on crash and auto-start on reboot.
2. **Verify Desktop Stack**: Check that all services are running with `systemctl status xvfb xfce-minimal x11vnc novnc`. If any service is down, restart with `sudo systemctl restart xvfb xfce-minimal x11vnc novnc`.
3. **Export Display Variable**: Set `export DISPLAY=:99` in your shell or script before running any actions.
4. **Take Initial Screenshot**: Run `./scripts/screenshot.sh` to capture the current desktop state and receive a base64-encoded PNG. This establishes a visual baseline.
5. **Identify Target Coordinates**: Parse the screenshot to locate UI elements (buttons, text fields, menu items). Note the X,Y pixel coordinates (origin at top-left). Use `./scripts/zoom.sh x1 y1 x2 y2` to capture and enlarge a specific region if precision is needed.
6. **Move Mouse (Optional)**: Run `./scripts/mouse_move.sh x y` to move the cursor to the target coordinates without clicking. Useful for hover-triggered UI changes.
7. **Click Element**: Execute one of the click actions:
   - `./scripts/click.sh x y left` for left click
   - `./scripts/click.sh x y right` for right click
   - `./scripts/click.sh x y middle` for middle click
   - `./scripts/click.sh x y double` for double click (text selection)
   - `./scripts/click.sh x y triple` for triple click (select entire line)
8. **Type Text**: Use `./scripts/type_text.sh "your text here"` to enter text. Text is chunked into 50-character segments with a 12ms delay between keystrokes to avoid buffer overflow. Always click a text field first to focus it.
9. **Press Keys**: Execute `./scripts/key.sh "key_combo"` to press single keys or key combinations (e.g. `Return`, `ctrl+c`, `alt+F4`, `ctrl+s`, `ctrl+End`). Use `ctrl+End` to jump to the bottom of web pages in browsers.
10. **Hold Key Down**: Run `./scripts/hold_key.sh "key" seconds` to hold a modifier or key for a specified duration. The output shows that the key was held.
11. **Drag and Drop**: Execute `./scripts/drag.sh x1 y1 x2 y2` to click at the start coordinates, hold, move to the end coordinates, and release.
12. **Mouse Down / Mouse Up**: Use `./scripts/mouse_down.sh` and `./scripts/mouse_up.sh` for fine-grained drag control when step 11 is insufficient (e.g. custom drag curves). Down presses the button, up releases it.
13. **Scroll**: Run `./scripts/scroll.sh direction amount [x y]`, where direction is `up`, `down`, `left`, or `right`, and amount is the number of wheel notches (e.g. 5). The optional x,y centers the scroll at a specific point; it defaults to the screen center. Use in web pages and scrollable lists.
14. **Wait and Capture**: Execute `./scripts/wait.sh seconds` to pause until animations or network requests complete, then auto-capture a screenshot. Typical waits are 1-3 seconds.
15. **Get Cursor Position**: Run `./scripts/cursor_position.sh` to retrieve the current mouse X,Y coordinates as JSON (e.g. `{"x": 512, "y": 384}`). Useful for debugging or conditional logic.
16. **Verify Result**: Take a screenshot with `./scripts/screenshot.sh` to confirm the action succeeded (button pressed, text entered, page loaded, etc.).
17. **Loop or Exit**: Repeat steps 4-16 for subsequent tasks, or stop when the goal is reached.
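For the `zoom.sh x1 y1 x2 y2` action, the two corner points have to become a crop rectangle at some point. How `zoom.sh` does this is not shown here; the sketch below is one possible conversion into ImageMagick's `WxH+X+Y` geometry, with `crop_geometry` as a hypothetical helper.

```shell
# Convert two corner points into an ImageMagick crop geometry WxH+X+Y.
# The corners may be given in either order.
crop_geometry() {
  local x1="$1" y1="$2" x2="$3" y2="$4"
  local left=$((   x1 < x2 ? x1 : x2 ))
  local top=$((    y1 < y2 ? y1 : y2 ))
  local width=$((  x1 < x2 ? x2 - x1 : x1 - x2 ))
  local height=$(( y1 < y2 ? y2 - y1 : y1 - y2 ))
  echo "${width}x${height}+${left}+${top}"
}

# Example use (requires ImageMagick; file names are placeholders):
#   convert full.png -crop "$(crop_geometry 100 200 400 500)" +repage region.png
```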
## Edge Cases
### Chrome Sandbox Check (Required)
- Check `/proc/sys/kernel/unprivileged_userns_clone`:
  - `1`: kernel supports user namespaces, run `google-chrome &` without flags.
  - `0`: kernel does not support sandboxing, run `google-chrome --no-sandbox &` as fallback.
- Using `--no-sandbox` when unnecessary causes crashes and instability, so only add it if the check returns 0.
### Snap vs. DEB Browser Selection
- Snap browsers have sandbox issues on headless servers; use the Chrome `.deb` package instead (see the installation command under Opening Applications).
### VNC Black Screen (Desktop Not Rendering)
- Check whether xfwm4 is running: `pgrep xfwm4`.
- Restart the desktop: `sudo systemctl restart xfce-minimal`.
### VNC Flickering or Flashing
- Confirm x11vnc runs with the `-noxdamage -noxfixes` flags (check the systemd service file or `ps aux | grep x11vnc`).
### VNC Connection Drops Frequently
- Confirm noVNC has the `--heartbeat 30` flag in its systemd service.
- Confirm x11vnc has the `-noxdamage -noxfixes` flags enabled.
### x11vnc Process Crashes (SIGSEGV)
- Add the `-noxdamage -noxfixes` flags.
### Long Text Input (>1KB)
- Split the input across several `type_text.sh` calls with small waits between them to avoid buffer overflow.
### Screenshot Timeout or Blank Output
- Check that Xvfb is running: `ps aux | grep Xvfb`.
### Network Timeouts or Rate Limits (Web Automation)
- Add waits (e.g. `./scripts/wait.sh 2` before critical actions).
- Call `type_text.sh` multiple times with waits between calls.
### Empty Result Sets (No UI Elements Found)
- Wait for the UI to finish loading, e.g. `./scripts/wait.sh 3`.
- Check that the application is running (`pgrep appname`).
### SSH Tunnel Not Connected
- Check the forwarding settings in `~/.ssh/config` or the SSH command.
- Test with `ssh -L 6080:localhost:6080 user@server`, then run `curl http://localhost:6080/` from the local machine.