Openclaw Skill Browser Use

Autonomous browser automation for AI agents. Two tools: agent-browser (CLI Playwright for step-by-step control) and browser-use (Python autonomous agent that...

SkillRank score: 7.3/10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

openclaw-skill-browser-use provides two complementary tools for web automation: agent-browser for step-by-step CLI control and browser-use for autonomous Python-based agents. together they handle navigation, form filling, data extraction, screenshots, and complex multi-step flows.

structure: 7.0
trigger phrases: 8.0
procedure: 8.0
edge cases: 6.0
documentation: 8.0

original SKILL.md from clawhub:

---
name: Browser Use
description: >
  Autonomous browser automation for AI agents. Two tools: agent-browser (CLI Playwright for step-by-step control)
  and browser-use (Python autonomous agent that decides what to do on pages). Navigate, click, fill forms,
  scrape data, manage sessions, and run complex multi-step browser tasks.
read_when:
  - Automating web interactions beyond simple fetch
  - Filling forms or completing multi-step web flows
  - Scraping structured data from dynamic pages
  - Running an autonomous browsing agent for complex tasks
  - Testing or interacting with authenticated web apps
  - Taking screenshots or recording browser sessions
metadata:
  clawdbot:
    emoji: "🌐"
    requires:
      bins: ["node", "npm", "python3"]
      system: ["chromium", "xvfb"]
allowed-tools: Bash(agent-browser:*,browser-use-agent:*,xvfb-run:*)
---

# Browser Use — Autonomous Browser Automation

Two complementary tools for browser automation:

| Tool | Best for | How it works |
|------|----------|-------------|
| **agent-browser** | Step-by-step control, scraping, form filling | CLI commands, you drive each action |
| **browser-use** | Complex autonomous tasks | Python agent that decides actions itself |

## Quick Start

### agent-browser (recommended for most tasks)

```bash
# Navigate and inspect
agent-browser open "https://example.com"
agent-browser snapshot -i          # Get interactive elements with @refs

# Interact using refs
agent-browser click @e3            # Click element
agent-browser fill @e2 "text"      # Fill input (clears first)
agent-browser press Enter          # Press key

# Extract data
agent-browser get text @e1         # Get element text
agent-browser get attr @e1 href    # Get attribute
agent-browser screenshot /tmp/p.png # Screenshot

# Done
agent-browser close
```

### browser-use (autonomous agent)

```bash
# Run a full autonomous browsing task
browser-use-agent "Find the pricing for Notion and compare plans"
```

The agent will navigate, click, read pages, and return a structured result.

## agent-browser — Full Reference

### Navigation
```bash
agent-browser open <url>           # Navigate to URL
agent-browser back                 # Go back
agent-browser forward              # Go forward
agent-browser reload               # Reload page
agent-browser close                # Close browser
```

### Snapshot (page analysis)
```bash
agent-browser snapshot             # Full accessibility tree
agent-browser snapshot -i          # Interactive elements only (recommended)
agent-browser snapshot -c          # Compact output
agent-browser snapshot -d 3        # Limit depth to 3
agent-browser snapshot -s "#main"  # Scope to CSS selector
agent-browser snapshot -i --json   # JSON output for parsing
```
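
When a script drives `agent-browser`, it usually needs the refs out of the snapshot text before it can interact. The following is a minimal sketch, not the tool's own parser. It assumes refs appear in the output either as `@e12` or as `ref=e12`; the sample text and the `extract_refs` name are hypothetical, so check them against real `snapshot -i` output before relying on the pattern.

```python
import re

# Assumption: a ref shows up as "@e12" or "ref=e12" somewhere in the snapshot text.
REF_PATTERN = re.compile(r"(?:@|\bref=)(e\d+)\b")


def extract_refs(snapshot_text: str) -> list[str]:
    """Return unique refs in first-seen order, normalized to the '@eN' form."""
    seen: dict[str, None] = {}
    for match in REF_PATTERN.finditer(snapshot_text):
        seen.setdefault("@" + match.group(1), None)
    return list(seen)


# Hypothetical snapshot excerpt, for illustration only.
sample = """
- textbox "Search" [ref=e1]
- button "Go" [ref=e2]
- link "Docs" @e3
- button "Go again" [ref=e2]
"""

print(extract_refs(sample))  # prints ['@e1', '@e2', '@e3']
```

For anything beyond a quick script, parsing `snapshot -i --json` is likely more robust than a regular expression over the text form.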

### Interactions (use @refs from snapshot)
```bash
agent-browser click @e1            # Click
agent-browser dblclick @e1         # Double-click
agent-browser fill @e2 "text"      # Clear and type (use this for inputs)
agent-browser type @e2 "text"      # Type without clearing
agent-browser press Enter          # Press key
agent-browser press Control+a      # Key combination
agent-browser hover @e1            # Hover
agent-browser check @e1            # Check checkbox
agent-browser uncheck @e1          # Uncheck checkbox
agent-browser select @e1 "value"   # Select dropdown option
agent-browser scroll down 500      # Scroll page
agent-browser scrollintoview @e1   # Scroll element into view
agent-browser drag @e1 @e2         # Drag and drop
agent-browser upload @e1 file.pdf  # Upload files
```

### Extract Data
```bash
agent-browser get text @e1         # Get element text
agent-browser get html @e1         # Get innerHTML
agent-browser get value @e1        # Get input value
agent-browser get attr @e1 href    # Get attribute
agent-browser get title            # Page title
agent-browser get url              # Current URL
agent-browser get count ".item"    # Count matching elements
```

### Wait
```bash
agent-browser wait @e1             # Wait for element
agent-browser wait 2000            # Wait milliseconds
agent-browser wait --text "Done"   # Wait for text to appear
agent-browser wait --url "/dash"   # Wait for URL pattern
agent-browser wait --load networkidle  # Wait for network idle
```

### Screenshots, PDF & Recording
```bash
agent-browser screenshot path.png      # Save screenshot
agent-browser screenshot --full        # Full page screenshot
agent-browser pdf output.pdf           # Save as PDF
agent-browser record start ./demo.webm # Start recording
agent-browser record stop              # Stop and save
```

### Sessions (parallel browsers)
```bash
agent-browser --session s1 open "https://site1.com"
agent-browser --session s2 open "https://site2.com"
agent-browser session list
```

### State (persist auth/cookies)
```bash
agent-browser state save auth.json     # Save session (cookies, storage)
agent-browser state load auth.json     # Restore session
```

### Cookies & Storage
```bash
agent-browser cookies                  # Get all cookies
agent-browser cookies set name value   # Set cookie
agent-browser cookies clear            # Clear cookies
agent-browser storage local            # Get all localStorage
agent-browser storage local set k v    # Set value
```

### Tabs & Frames
```bash
agent-browser tab                      # List tabs
agent-browser tab new [url]            # New tab
agent-browser tab 2                    # Switch to tab
agent-browser frame "#iframe"          # Switch to iframe
agent-browser frame main               # Back to main frame
```

### Browser Settings
```bash
agent-browser set viewport 1920 1080
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set media dark
```

### JavaScript
```bash
agent-browser eval "document.title"    # Run JS in page context
```

## browser-use — Autonomous Agent

For complex tasks where you want the agent to figure out the browsing steps:

```bash
browser-use-agent "Your task description here"
```

### Custom Script (advanced)

```python
# Run via: /opt/browser-use/bin/python3 script.py
import asyncio
import os

from browser_use import Agent, Browser
from langchain_anthropic import ChatAnthropic


async def run():
    browser = Browser()
    llm = ChatAnthropic(
        model='claude-sonnet-4-20250514',
        api_key=os.environ['ANTHROPIC_API_KEY'],  # raises KeyError if unset
    )
    agent = Agent(
        task="Compare pricing on 3 competitor sites",
        llm=llm,
        browser=browser,
    )
    try:
        # Cap the run so a confused agent cannot loop indefinitely.
        return await agent.run(max_steps=15)
    finally:
        # Close the browser even if the agent raises.
        await browser.close()


# Print the result so it is not silently discarded.
print(asyncio.run(run()))
```

You can swap the LLM for any langchain-compatible model (OpenAI, Anthropic, etc).
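
One way to make the swap explicit is a small factory that picks the chat model by name. This is a sketch under assumptions: `make_llm` is a hypothetical helper, the OpenAI branch assumes the `langchain-openai` package is installed and that `OPENAI_API_KEY` is set, and `gpt-4o` is only a placeholder model name. Imports are lazy so that only the package for the chosen provider is needed.

```python
import os


def make_llm(provider: str):
    """Build a chat model to pass as Agent(llm=...)."""
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic

        return ChatAnthropic(
            model="claude-sonnet-4-20250514",
            api_key=os.environ["ANTHROPIC_API_KEY"],
        )
    if provider == "openai":
        # Assumption: langchain-openai is installed in the same environment.
        from langchain_openai import ChatOpenAI

        return ChatOpenAI(
            model="gpt-4o",  # placeholder; use a model your account offers
            api_key=os.environ["OPENAI_API_KEY"],
        )
    raise ValueError(f"unsupported provider: {provider!r}")
```

In the custom script, `llm = make_llm("openai")` would then replace the `ChatAnthropic(...)` call.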

## Standard Workflow

```bash
# 1. Open page
agent-browser open "https://example.com"

# 2. Snapshot to see what's on the page
agent-browser snapshot -i

# 3. Interact with elements using @refs from snapshot
agent-browser fill @e1 "search query"
agent-browser click @e2

# 4. Wait for new page to load
agent-browser wait --load networkidle

# 5. Re-snapshot (refs change after navigation!)
agent-browser snapshot -i

# 6. Extract what you need
agent-browser get text @e5

# 7. Close when done
agent-browser close
```
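
The same seven steps can be sketched as a Python function that shells out to the CLI. The names `cli_runner` and `search_and_read` are hypothetical, and the refs `@e1`, `@e2`, and `@e5` are placeholders copied from the example above; a real script would read them from each snapshot. The runner is injectable so the sequence can be exercised without a browser.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def cli_runner(argv: Sequence[str]) -> str:
    """Run one command and return its stdout; raises on a non-zero exit code."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


def search_and_read(url: str, query: str, run: Runner = cli_runner) -> str:
    """Open, snapshot, interact, wait, re-snapshot, extract, close."""

    def ab(*args: str) -> str:
        return run(["agent-browser", *args])

    ab("open", url)
    try:
        ab("snapshot", "-i")
        ab("fill", "@e1", query)            # placeholder ref
        ab("click", "@e2")                  # placeholder ref
        ab("wait", "--load", "networkidle")
        ab("snapshot", "-i")                # refs change after navigation
        return ab("get", "text", "@e5").strip()
    finally:
        ab("close")                         # close even if a step fails
```

The `try`/`finally` keeps step 7 from being skipped when an earlier command exits with an error.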

## Important Rules

1. **Always `snapshot -i` after navigation** — refs change on every page load
2. **Use `fill` not `type`** for inputs — fill clears existing text first
3. **Wait after clicks that trigger navigation** — `wait --load networkidle`
4. **Close the browser when done** — `agent-browser close`
5. **Google/Bing block headless browsers** (CAPTCHA) — use DuckDuckGo or `web_search` instead
6. **Save auth state** for sites requiring login — `state save/load`
7. **Use `--json`** when you need machine-parseable output
8. **Use sessions** for parallel browsing — `--session <name>`
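
Rule 1 is the easiest to break in a script, so it can help to enforce it in code. The sketch below is a hypothetical guard (`RefGuard` is not part of either tool) that rejects any command using an `@ref` unless a snapshot has been taken since the last navigation command. It only knows about the explicit navigation verbs from the reference above; a click that triggers navigation has to be reported with `mark_navigated()`.

```python
NAVIGATING = {"open", "back", "forward", "reload"}


class StaleRefError(RuntimeError):
    """Raised when an @ref is used without a fresh snapshot."""


class RefGuard:
    def __init__(self) -> None:
        self.fresh = False  # no snapshot taken yet

    def mark_navigated(self) -> None:
        """Call after a click or key press that loaded a new page."""
        self.fresh = False

    def check(self, argv: list[str]) -> None:
        """Validate one command (without the 'agent-browser' prefix)."""
        verb = argv[0]
        uses_ref = any(arg.startswith("@") for arg in argv[1:])
        if uses_ref and not self.fresh:
            raise StaleRefError(f"run 'snapshot -i' before: {' '.join(argv)}")
        if verb == "snapshot":
            self.fresh = True
        elif verb in NAVIGATING:
            self.fresh = False
```

Calling `guard.check([...])` before each command turns a confusing "element not found" into an immediate, descriptive error.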

## Troubleshooting

- **Element not found**: Re-run `snapshot -i` to get current refs
- **Page not loaded**: Add `wait --load networkidle` after navigation
- **CAPTCHA on search engines**: Use DuckDuckGo or the `web_search` tool instead
- **Auth expired**: Re-login and `state save` again
- **Display errors**: The install script sets up Xvfb for headless rendering; if no display is available, run the command under `xvfb-run`

separated agent-browser (manual) from browser-use-agent (autonomous) into distinct decision paths, added explicit inputs/outputs for each of 7 procedure steps, documented anthropic API as external connection, called out stale @refs and CAPTCHA edge cases in decision points, defined success criteria and outcome signals clearly.

intent

automate web interactions beyond simple HTTP requests when you need to navigate dynamic pages, fill forms, scrape data from JavaScript-rendered content, manage authenticated sessions, or run complex multi-step browser workflows. use agent-browser for step-by-step control where you drive each action, or browser-use for autonomous tasks where an AI agent decides what to do on the page.

inputs

system requirements

- node and npm (for the agent-browser CLI)
- python3, version 3.11 or newer for current browser-use releases (for the browser-use autonomous agent)
- chromium browser (headless or with a display)
- xvfb (X virtual framebuffer, for headless rendering on linux)

external connections

- anthropic API (for the browser-use agent): set the ANTHROPIC_API_KEY env var to a valid claude API key. required for autonomous tasks unless you swap in another model.
- langchain compatibility: browser-use accepts any langchain-compatible LLM (openai, anthropic, etc). swap the LLM class in the python script if you use a non-anthropic model.

setup steps

1. ensure chromium is installed: apt-get install chromium-browser (linux) or brew install chromium (macos).
2. install xvfb if running headless: apt-get install xvfb (linux).
3. set ANTHROPIC_API_KEY if using the browser-use agent: export ANTHROPIC_API_KEY=sk-...
4. make sure agent-browser and browser-use-agent are available in PATH (provided by the openclaw runtime).
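
a quick preflight check can confirm the setup before any browser work starts. this is a minimal sketch: `preflight` and `REQUIRED_BINS` are hypothetical names, and chromium and xvfb are left out because their binary names vary by platform. the lookup function and environment are injectable so the check can be exercised without touching the real system.

```python
import os
import shutil
from typing import Callable, Iterable, Mapping, Optional

# Binaries named in the skill metadata and setup steps.
REQUIRED_BINS = ("node", "npm", "python3", "agent-browser", "browser-use-agent")


def preflight(
    bins: Iterable[str] = REQUIRED_BINS,
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
    need_api_key: bool = True,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = [f"missing binary: {name}" for name in bins if which(name) is None]
    if need_api_key and not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems
```

pass `need_api_key=False` when only agent-browser will be used, since the key matters only for the autonomous agent.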

procedure

step 1: open a browser session

input: target URL

agent-browser open "https://example.com"

output: browser window opens, page loads. on success, ready for snapshot.

step 2: analyze page structure (snapshot)

input: none (or optional CSS selector for scoping)

agent-browser snapshot -i

output: list of interactive elements with @ref identifiers (e.g., @e1, @e2). use -i flag to get only clickable/fillable elements. use --json for machine-readable output.

step 3a: interact with elements (manual control via agent-browser)

input: @ref from snapshot, interaction type (click, fill, press, hover, etc.), optional text/value

agent-browser click @e3
agent-browser fill @e2 "search query"
agent-browser press Enter
agent-browser select @e5 "option-value"
agent-browser check @e1
agent-browser upload @e7 /path/to/file.pdf

output: element state changes (text entered, button clicked, checkbox toggled). page may load new content.

step 3b: run autonomous task (browser-use agent)

input: task description as string

browser-use-agent "find pricing for notion and compare plans"

output: structured result object containing final page state and extracted data. the run is bounded by max_steps, which the custom script example sets to 15.

step 4: wait for page state changes

input: wait condition (element, time, text, URL pattern, or network state)

agent-browser wait --load networkidle
agent-browser wait @e1
agent-browser wait --text "Loaded"
agent-browser wait 2000

output: blocks until condition met or timeout (default 30s). critical after navigation or form submission.
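
when a script needs a condition that the wait subcommand does not cover (for example, a count reaching some number), the same block-until-met-or-timeout behavior can be sketched as a polling loop. `wait_until` is a hypothetical helper, and its 30 second default simply mirrors the timeout stated above.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll condition() until it is true (returns True) or timeout elapses (False)."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

the condition is always checked at least once, so a timeout of 0 still reports a state that is already true.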

step 5: extract data

input: element @ref, data type (text, html, value, attribute, title, url, count)

agent-browser get text @e4
agent-browser get attr @e6 href
agent-browser get url
agent-browser get title
agent-browser get count ".item-card"

output: string (text content, HTML, URL, title) or integer (count). ready for processing downstream.
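
downstream code usually wants typed values rather than raw stdout. the helpers below are a sketch under one assumption: that each get command prints only the bare value, possibly with surrounding whitespace. `parse_count` and `clean_text` are hypothetical names.

```python
def parse_count(stdout: str) -> int:
    """Convert the output of 'get count' to an int, rejecting anything else."""
    text = stdout.strip()
    if not text.isdecimal():
        raise ValueError(f"expected a non-negative integer, got {text!r}")
    return int(text)


def clean_text(stdout: str) -> str:
    """Collapse runs of whitespace so scraped text compares reliably."""
    return " ".join(stdout.split())


print(parse_count("12\n"))                  # prints 12
print(clean_text("  Pro   plan\n $10 "))    # prints Pro plan $10
```

failing loudly on unexpected output is deliberate: an error message printed in place of a number should not be silently treated as zero.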

step 6: persist session state (optional, for authenticated flows)

input: file path for state storage

agent-browser state save ./auth-session.json

output: session file written containing cookies, localStorage, sessionStorage. to restore it in a later run, use agent-browser state load ./auth-session.json.
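
a script can decide between restoring saved state and logging in again by checking the state file first. this is a minimal sketch: `load_command` is a hypothetical helper, and the 12 hour freshness window is an arbitrary assumption, since session lifetimes differ per site.

```python
import os
import time
from typing import Optional


def load_command(
    path: str,
    max_age_seconds: float = 12 * 3600,  # assumed window; tune per site
    now: Optional[float] = None,
) -> Optional[list[str]]:
    """Return the 'state load' command if the file is usable, else None."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return None
    now = time.time() if now is None else now
    is_fresh = (now - info.st_mtime) <= max_age_seconds
    if info.st_size > 0 and is_fresh:
        return ["agent-browser", "state", "load", path]
    return None  # caller logs in, then runs: agent-browser state save <path>
```

a fresh file does not guarantee the server still accepts the session, so the script should still verify that the page shows a logged-in state after loading.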

step 7: close browser

input: none

agent-browser close

output: browser process terminates. session ends.

decision points

if you need to drive each action explicitly, use agent-browser: step-by-step CLI control where you control snapshot, click, wait, extract. good for scraping, form filling, testing flows you fully understand.

if you need autonomous task execution, use browser-use-agent: give the agent a task description and max_steps, it figures out what to click/navigate/extract. good for open-ended exploration or complex multi-page flows where step sequence is not predetermined.

if element is not found after interaction: re-run snapshot -i because element @refs change after page navigation. old refs are stale.

if page content loads asynchronously (JS renders data): use wait --load networkidle after navigation to ensure network idle. use wait --text "pattern" if you know text that appears after load.

if you need to test auth-protected pages: state save after login, then state load to restore auth in future runs. avoids re-logging in on every session.

if facing CAPTCHA on google/bing search: switch to duckduckgo or use web_search instead. these search engines commonly challenge headless browsers with a CAPTCHA.

if running multiple browser sessions in parallel: use --session <name> flag to isolate state. e.g., agent-browser --session s1 open url1 and agent-browser --session s2 open url2 run independently.
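
parallel sessions can be sketched from python with a thread pool, one named session per URL. `session_argv`, `run_cli`, and `open_all` are hypothetical names; only the `--session <name>` flag and the `open` command come from the reference above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def session_argv(session: str, *args: str) -> list[str]:
    """Build an agent-browser command bound to one named session."""
    return ["agent-browser", "--session", session, *args]


def run_cli(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(argv).returncode


def open_all(
    targets: dict[str, str],
    run: Callable[[Sequence[str]], int] = run_cli,
) -> dict[str, int]:
    """Open each URL in its own session concurrently; map session -> exit code."""

    def open_one(item: tuple[str, str]) -> tuple[str, int]:
        name, url = item
        return name, run(session_argv(name, "open", url))

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        return dict(pool.map(open_one, targets.items()))
```

threads are enough here because each worker only waits on a child process; every later command for a session must carry the same `--session` name.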

if you need machine-readable output (parsing JSON): use snapshot --json, get with --json flag, or parse browser-use-agent result object in python.

output contract

agent-browser outputs

- snapshot: plaintext accessibility tree (or JSON with the --json flag). contains element @refs, text, attributes, hierarchy. use it to identify targets.
- get text/html/value/attr: plaintext string. empty string if element not found.
- screenshot: PNG file at the specified path (e.g., /tmp/page.png). full-page with the --full flag.
- pdf: PDF file at the specified path (e.g., output.pdf).
- recording: WebM file at the specified path (e.g., ./demo.webm). written only after record start is followed by record stop.
- state save: JSON file containing cookies and storage blobs.

browser-use-agent outputs

result object: python dict or JSON with keys: success (bool), final_url (str), extracted_data (varies by task), steps_taken (int), summary (str). exact schema depends on agent logic and task.

outcome signal

agent-browser success:

- snapshot returns a non-empty element list with visible @refs.
- interactions execute without error messages.
- get text @e1 returns a non-empty string if the element contains text.
- screenshot file is written to disk with size > 0.
- wait completes before its timeout; a logged timeout means failure.
- close returns exit code 0.
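
the screenshot signal can be checked a little more strictly than size > 0 by looking for the PNG file signature. `screenshot_ok` is a hypothetical helper; the eight signature bytes are the standard PNG header.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def screenshot_ok(path: str) -> bool:
    """True when the file exists and starts with the PNG signature."""
    try:
        with open(path, "rb") as fh:
            return fh.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
    except OSError:
        return False
```

an empty or missing file fails this check as well, so it subsumes the size test.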

browser-use-agent success:

- script exits with exit code 0.
- result object has success: true.
- extracted_data contains expected fields (varies by task).
- steps_taken is > 0 and <= max_steps (did not timeout or fail early).
- stdout contains a summary of what the agent did (readable action log).
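
these signals can be sketched as a validator over the result object. it assumes the result has been converted to a dict with the keys listed in the output contract; since the exact schema depends on the agent and task, treat the key names and the `result_problems` helper as illustrative rather than as the tool's actual API.

```python
# Keys and types taken from the output contract above (assumed, not guaranteed).
EXPECTED_TYPES = {"success": bool, "final_url": str, "steps_taken": int, "summary": str}


def result_problems(result: dict, max_steps: int = 15) -> list[str]:
    """Return a list of failed checks; an empty list means the run looks good."""
    problems = []
    for key, kind in EXPECTED_TYPES.items():
        if key not in result:
            problems.append(f"missing key: {key}")
        elif not isinstance(result[key], kind):
            problems.append(f"{key} should be {kind.__name__}")
    if "extracted_data" not in result:
        problems.append("missing key: extracted_data")
    if result.get("success") is not True:
        problems.append("success is not true")
    steps = result.get("steps_taken")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(steps, int) and not isinstance(steps, bool):
        if not 0 < steps <= max_steps:
            problems.append(f"steps_taken out of range: {steps}")
    return problems
```

returning every failed check at once, instead of stopping at the first, makes a failed run easier to diagnose from a single log line.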

user knows it worked when:

- extracted data matches expected content (e.g., pricing found, form filled, link scraped).
- screenshot shows the page in the expected state (form filled, button clicked, data visible).
- session state file is written after login (confirms auth preserved).
- no error logs or timeout messages in stderr.
- returned data can be parsed and used downstream (non-null, valid format).
