KasmVNC-based virtual desktop for headless Linux with AI-first automation and human handoff. Use when most steps are automated but a user must manually inter...
--- name: virtual-remote-desktop description: KasmVNC-based virtual desktop for headless Linux with AI-first automation and human handoff. Use when most steps are automated but a user must manually intervene for captcha/risk-control/login approval, then return to automation. Includes requirement-driven setup for mobile/desktop takeover and browser mobile/desktop rendering. --- # Virtual Remote Desktop (KasmVNC edition) ## What this skill is for Use this when the workflow is: 1. AI runs browser automation most of the time 2. Captcha / risk-control / MFA appears 3. User takes over remotely for 1-3 minutes 4. AI continues automatically This version replaces x11vnc+noVNC with **KasmVNC** and keeps computer-use style action scripts for AI control. ## Core commands ### 0) Install KasmVNC (one-time) ```bash bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/install_kasmvnc.sh ``` This installer also prepares required runtime tools: - `fluxbox` (lightweight desktop) - `xdotool` + `scrot` (computer-use actions) - `xauth` ### 1) Requirement-driven start (recommended) Before starting, always confirm these user requirements: 1. takeover device: **手机** or **电脑** 2. website render mode: **手机网页** or **桌面网页** 3. access mode: **本地隧道** (`127.0.0.1`) or **临时公网** (`0.0.0.0`) 4. network quality: **弱网 / 普通 / 良好** Use guided script (interactive Q&A): ```bash bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/start_vrd_guided.sh ``` Preview config without starting: ```bash bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/start_vrd_guided.sh --dry-run ``` ### 1.1) Direct start (manual env) ```bash bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/start_vrd.sh ``` Important env vars: - `AUTO_LAUNCH_URL` (optional): open target page automatically - `KASM_BIND` (default `127.0.0.1`, safer) - `AUTO_STOP_IDLE_SECS` (default `900`) - `BROWSER_MOBILE_MODE=1` (launch browser with mobile emulation) - `BROWSER_DEVICE=iphone14pro|pixel7|ipad` Example: ```bash AUTO_LAUNCH_URL="https://example.com/login" \ AUTO_STOP_IDLE_SECS=1200 \ bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/start_vrd.sh ``` Mobile-friendly VNC stream example (better phone takeover UX): ```bash MOBILE_MODE=1 MOBILE_PRESET=phone \ AUTO_STOP_IDLE_SECS=900 \ bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/start_vrd.sh ``` Mobile stream options: - `MOBILE_MODE=1` enables mobile defaults - `MOBILE_PRESET=phone|tablet` sets default resolution (`960x540` / `1280x720`) - `KASM_MAX_FPS` can be lowered further (e.g. `18`) on weak networks Browser mobile emulation (website renders as mobile page): ```bash AUTO_LAUNCH_URL="https://example.com" \ BROWSER_MOBILE_MODE=1 BROWSER_DEVICE=iphone14pro \ bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/start_vrd.sh ``` Notes: - Browser mobile emulation changes UA + viewport + touch behavior. - It is different from `MOBILE_MODE` (which optimizes VNC stream size). ### 2) Check status / health ```bash bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/status_vrd.sh bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/health_vrd.sh ``` ### 3) Stop desktop ```bash bash /home/ubuntu/.openclaw/workspace/skills/virtual-remote-desktop/scripts/stop_vrd.sh ``` --- ## AI automation actions (computer-use style) All actions run on the active virtual display from `pids.env`. ```bash # screenshot (base64) bash scripts/action_screenshot.sh # click / type / key / scroll bash scripts/action_click.sh 500 420 left bash scripts/action_type.sh "hello" bash scripts/action_key.sh "ctrl+l" bash scripts/action_scroll.sh down 4 # helpers bash scripts/action_mouse_move.sh 800 300 bash scripts/action_cursor_position.sh bash scripts/action_wait.sh 2 ``` Recommended loop: 1. `action_screenshot.sh` 2. analyze 3. `action_click/type/key/...` 4. screenshot verify 5. repeat --- ## Best handoff pattern (AI ↔ User) When captcha/risk-control appears: 1. AI sends user the KasmVNC URL + username/password 2. User manually solves challenge 3. User replies “done” 4. AI runs screenshot + validation step 5. AI resumes automation This avoids full manual operation while keeping recovery fast. --- ## UX presets ### Preset A: safe local tunnel (recommended) - `KASM_BIND=127.0.0.1` - user connects via SSH tunnel - best for security ### Preset B: temporary public takeover - `KASM_BIND=0.0.0.0` - short `AUTO_STOP_IDLE_SECS` (e.g. 300) - use only for urgent remote intervention --- ## Notes - KasmVNC uses HTTPS + per-user auth (username/password). - This skill stores runtime files in `~/.openclaw/vrd-data` by default. - Browser profile persistence is controlled by `CHROME_PROFILE_DIR`. - If captcha frequency is high, keep one long-lived profile to reduce repeated challenges.
don't have the plugin yet? install it then click "run inline in claude" again.