Control physical devices over HTTP with unified commands for screen reading, input actions, app launch, navigation, and audio output using the Claw Use proto...
# Claw Use — Device Control for AI Agents Give your AI agent eyes, hands, and a voice on real devices. Claw Use is a **protocol + skill** for AI agents to control physical devices over HTTP. The `cu` CLI provides a unified interface — the same commands work across any device that implements the Claw Use API. ## Supported Devices | Platform | Implementation | Status | |----------|---------------|--------| | Android | [claw-use-android](https://github.com/4ier/claw-use-android) | ✅ Available | | iOS | claw-use-ios | 🔮 Planned | | Desktop | claw-use-desktop | 🔮 Planned | ## Prerequisites - `cu` CLI installed (ships with claw-use-android, or install standalone) - At least one device running a Claw Use implementation - Device and agent on the same network (or connected via Tailscale) ## Setup ```bash # Add a device with a friendly name cu add redmi 192.168.0.105 <token> cu add pixel 100.80.1.10 <token> # List devices cu devices # ▸ redmi 192.168.0.105 online v1.2.0 # pixel 100.80.1.10 offline # Switch default cu use pixel # Target a specific device cu -d redmi screenshot ``` ## Core API (all platforms) Every Claw Use implementation exposes the same HTTP endpoints: ### Perception — read the device ```bash cu screen # UI tree (semantic: element text, bounds, state) cu screen -c # compact mode (interactive elements only) cu screenshot # visual capture (JPEG, configurable quality) cu notifications # system notifications cu status # device health dashboard ``` ### Action — control the device ```bash cu tap <x> <y> # tap coordinates cu click <text> # tap by visible text (semantic click) cu type "text" # type text (CJK supported) cu swipe up|down|left|right cu scroll up|down|left|right cu back / cu home # system navigation cu launch <app> # open an application cu open <url> # open URL cu intent '<json>' # platform-specific intent (Android) ``` ### Audio ```bash cu tts "hello" # speak through device speaker cu say "你好" # alias ``` ### Device State ```bash cu wake # wake screen cu lock / cu unlock # lock/unlock (PIN required for unlock) ``` ## Workflow Patterns ### Navigate and interact ```bash cu launch org.telegram.messenger cu screen -c # see what's on screen cu click "Search" cu type "John" cu click "John, last seen recently" cu type "Hey!" cu click "Send" ``` ### Visual + semantic dual-channel ```bash cu screen -c # semantic: what elements exist cu screenshot 50 720 /tmp/look.jpg # visual: what it actually looks like ``` ### Multi-device orchestration ```bash cu -d phone1 launch com.whatsapp cu -d phone2 screenshot cu -d tablet open "https://example.com" ``` ## For Agent Developers Claw Use is designed as a **protocol**, not just an app. To add support for a new platform: 1. Implement the [Claw Use HTTP API spec](https://github.com/4ier/claw-use-android#api-details) 2. Expose endpoints on a configurable port (default: 7333) 3. Support token auth via `X-Bridge-Token` header 4. Return JSON responses matching the documented schemas The `cu` CLI and this skill work automatically with any compliant implementation. ## Tips - **`cu screen -c`** is the primary perception tool — compact mode filters noise - **`cu click`** by text is more reliable than `cu tap` when text is visible - **`cu screenshot`** when you need visual context the UI tree can't capture - Auto-unlock is transparent: locked devices auto-unlock before any command - Combine with Tailscale for remote access from anywhere
don't have the plugin yet? install it then click "run inline in claude" again.