Open Browser Use

Platform-neutral guidance for using Open Browser Use, the open-source Chrome automation stack for AI agents. Use when an agent needs to install, verify, trou...

SkillRank score: 8.3 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-07-21

open-browser-use provides platform-neutral automation for real Chrome profiles via CLI, SDKs, and MCP. covers multi-profile selection, tab lifecycle management, session isolation, and file/clipboard operations with explicit operating rules to protect user data.

structure: 9.0
trigger phrases: 8.0
procedure: 9.0
edge cases: 7.0
documentation: 8.0

---
name: open-browser-use
description: Platform-neutral guidance for using Open Browser Use, the open-source Chrome automation stack for AI agents. Use when an agent needs to install, verify, troubleshoot, or operate Open Browser Use through its browser extension, native CLI, JavaScript SDK, Python SDK, Go SDK, or Browser Use style JSON-RPC methods; use for tasks involving real Chrome tabs, user tab claiming, CDP commands, downloads, file choosers, clipboard helpers, or session cleanup.
---

# Open Browser Use

## Overview

Open Browser Use connects an MV3 Chrome extension, a local native messaging host, a CLI, SDKs, and an optional stdio MCP server so agents can automate a real Chrome profile. It is not Codex.app-specific; adapt the commands, MCP config, and SDK examples to the agent runtime you are operating in.

## Core Workflow

1. Check setup with `open-browser-use ping` or `obu ping`. If it fails because setup is missing, read [references/installation.md](references/installation.md).
2. Pick the right Chrome profile if multiple are installed. See "Multi-profile handling" below before issuing browser commands.
3. Choose a unique browser session id for the current agent task before opening or claiming tabs. Prefer the surrounding runtime's conversation/session id when available; otherwise create a short unique id such as `obu-<task-slug>-<timestamp>`. Reuse that same id for every Open Browser Use command in this task.
4. Name the current browser task group before opening or claiming tabs. Use a short task label followed by ` - OBU`; if no better task label is available, use `Task - OBU`.
5. Use the CLI for simple inspection or one-shot actions: `info`, `tabs`, `user-tabs`, `history`, `open-tab`, `navigate`, `cdp`, and `call`.
6. Use `open-browser-use run` / `obu run` for CLI-level multi-step orchestration when a small line-oriented action plan is enough and writing SDK code would be unnecessary.
7. If the surrounding agent runtime supports local MCP servers, configure `obu mcp` and call the exposed browser tools directly. Use the `run_action_plan` MCP tool for the same line-oriented orchestration from MCP. Read [references/sdk-and-protocol.md](references/sdk-and-protocol.md).
8. Use the JavaScript, Python, or Go SDK for larger multi-step workflows, event subscriptions, richer control flow, or when the surrounding agent runtime already runs code. Read [references/sdk-and-protocol.md](references/sdk-and-protocol.md).
9. Before ending browser work, release or keep session tabs with `open-browser-use finalize-tabs --session-id "$OBU_SESSION_ID" --keep '<json-array>'`, the MCP `finalize_tabs` tool, or the SDK `finalizeTabs` / `finalize_tabs` / `FinalizeTabs` method.
10. If communication fails after setup, read [references/troubleshooting.md](references/troubleshooting.md).
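
The session id convention in step 3 (`obu-<task-slug>-<timestamp>`) could be generated with a small helper like the one below. This is a minimal sketch: `obu_session_id` is a hypothetical function name, not part of the CLI, and the slug rules (lowercase, hyphen-separated) are an assumption rather than a documented requirement.

```sh
# Hypothetical helper: build a task-unique session id of the form
# obu-<task-slug>-<timestamp>. Not part of the open-browser-use CLI.
obu_session_id() {
  # Lowercase the label, turn every run of non-alphanumerics into one hyphen,
  # then trim leading and trailing hyphens.
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-*//; s/-*$//')
  # Fall back to "task" when the label is empty or has no usable characters.
  printf 'obu-%s-%s\n' "${slug:-task}" "$(date +%Y%m%d%H%M%S)"
}
```

A task would typically call it once, for example `export OBU_SESSION_ID="$(obu_session_id "Docs scan")"`, and reuse that value for every later command.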

## Operating Rules

- Treat the browser as the user's real Chrome profile. Do not inspect cookies, passwords, session stores, or unrelated browser data.
- Ask the user before installing the extension, opening Chrome for them, enabling extension permissions, uploading local files, reading/writing clipboard data, submitting forms, purchasing, deleting, sending, or making other externally visible changes.
- Do not assume Codex.app helpers, Node REPL globals, or a bundled plugin UI are available. Use the installed `open-browser-use` / `obu` CLI or the published SDKs.
- Do not guess tab ids. List tabs first, then use ids returned by `tabs`, `user-tabs`, `open-tab`, or SDK calls.
- Prefer `claim-tab` / `claimUserTab` for existing user tabs. Claiming should be based on the current `user-tabs` result and visible evidence such as URL, title, recency, or group.
- Use `--socket` only when the user or runtime provides an explicit socket. Otherwise let the CLI and SDKs discover the active socket registry.
- Do not rely on the CLI fallback session `obu-cli` for agent tasks. Always pass a task-unique `--session-id` to CLI and MCP commands, or set `sessionId` / `session_id` / `SessionID` in SDK clients. The fallback exists for quick manual use and can reuse stale task groups across unrelated agent sessions.
- Direct CLI subcommands and `open-browser-use run` can share the same browser session only when they use the same explicit `--session-id`. Finalize that same session before ending browser work.
- Use `call --method <method> --params '<json>'` only when no safer convenience command or SDK wrapper exists.

## Multi-profile handling

Some users run Chrome with several profiles (work, personal, side accounts). If
more than one profile has the Open Browser Use extension installed, the agent
must decide which profile this task should operate on rather than silently
picking whatever Chrome window happens to be active.

1. Before any browser command, list installed profiles:

```sh
open-browser-use profiles --connected
```

Columns: `DIRECTORY` (stable id like `Default`, `Profile 1`), `DISPLAY NAME`
(what the user sees in the Chrome avatar menu), `VERSION`, and `CONNECTED`
(whether that profile's host is currently reachable). JSON output is
available via `--json`.

2. If exactly one profile is installed and connected, proceed without asking.
If it is installed but not connected, ask the user to open Chrome on that
profile before running browser commands.

3. If multiple profiles are installed and the user did not already specify
which one to use, ask before the first browser command. List both directory
name and display name so the user can recognize them, and include whether
each profile is connected.

4. If the chosen profile is not connected, ask the user to open Chrome on that
profile before retrying. Do not silently fall back to a different connected
profile.

5. After the user has chosen, pass `--profile <selector>` to every CLI / MCP
command for the rest of the task. The selector accepts either the directory
name (`Default`, `Profile 1`) or the display name (`Eva`, `cookiy.com`),
case-insensitive. Do not switch profiles mid-task.

6. If `--profile` does not match any running host, the CLI prints which
profiles are currently connected. Ask the user to open Chrome on the chosen
profile, then retry; do not silently fall back to a different profile.

7. For MCP, lock the profile at server start:

```toml
[mcp_servers.open_browser_use]
command = "obu"
args = ["mcp", "--session-id", "obu-<task-id>", "--profile", "<selector>"]
```

Do not pass profile as a per-tool-call argument — the MCP server applies the
start-time selector to every call.

8. Do not remember the user's profile choice across unrelated tasks. A future
task may belong to a different profile; ask again rather than assuming.
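
The profile rules above reduce to a small decision function. The sketch below deliberately takes its inputs as plain arguments instead of parsing `profiles --json`, because the JSON shape is not described here. `obu_profile_action` and its output labels are hypothetical names chosen for illustration.

```sh
# Hypothetical helper mirroring the multi-profile rules. Arguments:
#   $1  number of profiles that have the extension installed
#   $2  "yes" if the user already named a profile, otherwise "no"
#   $3  "yes" if the profile that would be used is connected, otherwise "no"
obu_profile_action() {
  installed=$1
  user_chose=$2
  target_connected=$3
  if [ "$installed" -lt 1 ]; then
    echo "see-installation"          # no profile has the extension yet
  elif [ "$installed" -gt 1 ] && [ "$user_chose" != "yes" ]; then
    echo "ask-which-profile"         # rule 3: never pick silently
  elif [ "$target_connected" != "yes" ]; then
    echo "ask-user-to-open-chrome"   # rules 2, 4, 6: no silent fallback
  else
    echo "proceed"
  fi
}
```

For example, `obu_profile_action 2 no yes` prints `ask-which-profile` even though a connected profile exists, which matches the rule against silently picking the active window.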

## Common CLI Actions

```sh
export OBU_SESSION_ID="obu-docs-scan-$(date +%Y%m%d%H%M%S)"
open-browser-use ping --session-id "$OBU_SESSION_ID"
open-browser-use info --session-id "$OBU_SESSION_ID"
open-browser-use name-session --session-id "$OBU_SESSION_ID" --name "Task - OBU"
open-browser-use tabs --session-id "$OBU_SESSION_ID"
open-browser-use user-tabs --session-id "$OBU_SESSION_ID"
open-browser-use history --session-id "$OBU_SESSION_ID" --query "example" --limit 20
open-browser-use open-tab --session-id "$OBU_SESSION_ID" --url https://example.com
open-browser-use navigate --session-id "$OBU_SESSION_ID" --tab-id <tab-id> --url https://example.com
open-browser-use cdp --session-id "$OBU_SESSION_ID" --tab-id <tab-id> --method Runtime.evaluate --params '{"expression":"document.title"}'
open-browser-use finalize-tabs --session-id "$OBU_SESSION_ID" --keep '[]'
```

For CLI-level orchestration without writing SDK code, use a line-oriented
action plan:

```sh
open-browser-use run --session-id "$OBU_SESSION_ID" -c '
name-session "Docs scan - OBU"
open-tab https://docs.browser-use.com
wait-load domcontentloaded
page-info
finalize-tabs []
'
```

All action lines in one plan share a single session/turn. `open-tab` and `claim-tab` set the
default tab for later tab-scoped actions such as `wait-load`, `page-info`,
`navigate`, `cdp`, `move-mouse`, and `wait-file-chooser`.
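
When the same plan shape is needed for several pages, the plan text can be generated rather than typed. The sketch below uses only the action names shown in the example plan. `obu_scan_plan` is a hypothetical helper, and opening several tabs in one plan is an assumption based on `open-tab` resetting the default tab.

```sh
# Hypothetical helper: print a line-oriented action plan that names the task
# group, visits each URL in turn, and discards all session tabs at the end.
obu_scan_plan() {
  label=$1
  shift
  printf 'name-session "%s - OBU"\n' "$label"
  for url in "$@"; do
    # Each open-tab becomes the default tab for the two actions that follow.
    printf 'open-tab %s\n' "$url"
    printf 'wait-load domcontentloaded\n'
    printf 'page-info\n'
  done
  printf 'finalize-tabs []\n'
}
```

The output can be passed straight to the runner, for example `open-browser-use run --session-id "$OBU_SESSION_ID" -c "$(obu_scan_plan "Docs scan" https://docs.browser-use.com)"`.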

Use `obu` as the short alias when available.

## MCP Usage

For runtimes that can launch local MCP servers over stdio, use:

```toml
[mcp_servers.open_browser_use]
command = "obu"
args = ["mcp", "--session-id", "obu-<task-or-conversation-id>"]
```

Use a fresh `--session-id` value per agent task or conversation. If the runtime
has a stable conversation/session id, derive the MCP `--session-id` from it.

The MCP server exposes tools including `user_tabs`, `open_tab`, `claim_tab`,
`navigate`, `wait_load`, `page_info`, `cdp`, `history`, `run_action_plan`,
`finalize_tabs`, and unrestricted `call`.

Use `run_action_plan` when the runtime wants to execute the same compact action
plan format available through `open-browser-use run` without shelling out for
each individual browser operation.
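
Because the session id should change per task, the MCP config fragment may be easier to generate than to edit by hand. This sketch prints the same TOML keys shown above, with the optional `--profile` pair from the multi-profile section. `obu_mcp_toml` is a hypothetical helper, and it assumes the values contain no quotes or backslashes, since it does no TOML escaping.

```sh
# Hypothetical helper: print an MCP server entry for one task.
#   $1  session id (required)
#   $2  profile selector (optional)
# Assumes neither value contains quotes or backslashes; no escaping is done.
obu_mcp_toml() {
  session_id=$1
  profile=${2:-}
  args="\"mcp\", \"--session-id\", \"$session_id\""
  if [ -n "$profile" ]; then
    # Lock the profile at server start rather than per tool call.
    args="$args, \"--profile\", \"$profile\""
  fi
  printf '[mcp_servers.open_browser_use]\ncommand = "obu"\nargs = [%s]\n' "$args"
}
```

Where the output goes depends on the runtime's MCP config location, which is not covered here.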

## Tab Lifecycle

- Session tabs are tabs Open Browser Use has created or claimed for the current agent workflow.
- Use one unique session id per agent task or conversation. Do not share the fallback `obu-cli` session across unrelated tasks.
- Task session groups should be named from the task, using the pattern `<short task> - OBU`. Use `Task - OBU` as the fallback name.
- Keep no tabs by default: `open-browser-use finalize-tabs --session-id "$OBU_SESSION_ID" --keep '[]'`.
- Keep a tab only when the user needs that live page after the turn. Omit research, source, search, intermediate, duplicate, blank, error, and login/navigation tabs after extracting what you need.
- Keep a tab with `status: "deliverable"` when the tab itself is the user-facing output or requested open page, such as a created or edited document, dashboard, checkout/cart, submitted form result, or a page the user explicitly asked to inspect directly.
- Keep a tab with `status: "handoff"` only when the task is still in progress and the user or a later turn should continue from the current task group, such as a page waiting for user input, login, approval, payment, CAPTCHA, or an unfinished workflow.
- Handoff tabs stay in the task session group. Deliverable tabs move to the shared `✅ Open Browser Use` tab group.
- Run finalization as the last Open Browser Use browser action for the turn. Do not call Open Browser Use browser tools after finalizing; if more browser work is needed, do it first and finalize once with the final tab disposition.
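
In a shell script, one way to keep finalization as the last action even when an earlier step fails is an `EXIT` trap. This is a sketch, not a documented pattern: `obu_finalize_on_exit`, `OBU_KEEP`, and `OBU_BIN` are hypothetical names, and the trap replaces any `EXIT` trap the script already set.

```sh
# Hypothetical helper: register finalize-tabs to run when the shell exits.
# OBU_KEEP holds the JSON array passed to --keep (defaults to [], keep nothing).
# OBU_BIN is an assumed override for the CLI name.
# Note: this overwrites any existing EXIT trap.
obu_finalize_on_exit() {
  trap '"${OBU_BIN:-open-browser-use}" finalize-tabs --session-id "$OBU_SESSION_ID" --keep "${OBU_KEEP:-[]}"' EXIT
}
```

The variables are expanded when the trap fires, so a script can set `OBU_KEEP` late, once it knows which tabs are deliverable or handoff.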

## File Choosers, Downloads, And Clipboard

- File uploads use the intercepted file chooser flow: start waiting, trigger the chooser in the page, then set absolute local paths with `set-file-chooser-files` or the SDK equivalent.
- Downloads can be observed with SDK notification handlers or Browser Use methods such as `waitForDownload` and `downloadPath`.
- Clipboard helpers operate through the current controlled tab and should be treated as sensitive user actions.
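
Since the file chooser flow expects absolute local paths, it can help to resolve and check a path before handing it over. The helper below is a sketch: `obu_abs_path` is a hypothetical name, and it covers only path resolution, not the chooser commands themselves, whose exact arguments are described in the references.

```sh
# Hypothetical helper: print the absolute path of an existing regular file,
# or fail with a message so a bad path never reaches the file chooser.
obu_abs_path() {
  if [ ! -f "$1" ]; then
    echo "obu_abs_path: not a regular file: $1" >&2
    return 1
  fi
  # Resolve the containing directory physically, then re-attach the file name.
  dir=$(cd "$(dirname -- "$1")" && pwd -P) || return 1
  printf '%s/%s\n' "$dir" "$(basename -- "$1")"
}
```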

## References

- [references/installation.md](references/installation.md): one-time CLI and browser extension setup, including cases where user cooperation is required.
- [references/sdk-and-protocol.md](references/sdk-and-protocol.md): JavaScript, Python, Go, socket, and JSON-RPC usage details.
- [references/troubleshooting.md](references/troubleshooting.md): connection failures, stale sockets, extension/native host checks, and permission issues.


extracted and formalized 6 implexa components from procedural overview, added explicit decision trees for multi-profile handling and tab disposition, documented inputs with setup guidance and external references, added edge cases (stale sockets, auth expiry, empty results, network timeouts), clarified output contract with success criteria, and specified outcome signals for CLI, MCP, and SDK usage.

Item: Open Browser Use
Rating: 8.3
Author: Implexa

intent

Open Browser Use is an open-source Chrome automation stack that connects an MV3 extension, native messaging host, CLI, and SDKs so agents can control a real Chrome profile. use this skill when your agent needs to automate browser actions (navigate, extract data, interact with forms, upload files, monitor downloads), manage multiple Chrome profiles, claim existing user tabs, run multi-step workflows, or handle file uploads and clipboard operations. not codex.app-specific; adapt commands and config to your agent runtime.

inputs

required:

open-browser-use CLI installed and in PATH (or alias obu available)
Chrome with the Open Browser Use extension installed on the profile you'll use
the extension must be enabled and permissions granted (user cooperation required for initial setup)

conditional:

--profile <selector> if multiple Chrome profiles have the extension installed (pass directory name like Default, Profile 1 or display name like Eva, case-insensitive)
--session-id <unique-id> for all CLI and MCP commands (required to isolate agent task state; reuse the same id across all commands in one task)
--socket <path> only if user or runtime provides explicit socket path; otherwise CLI auto-discovers
agent runtime that supports local stdio MCP servers (optional, for richer tool exposure)

external connections:

Chrome browser with real profile (native connection via native messaging host)
references/installation.md for setup steps (CLI install, extension install, browser permissions)
references/sdk-and-protocol.md for JavaScript, Python, Go SDK usage
references/troubleshooting.md for connection and permission failures

procedure

1. verify setup. run open-browser-use ping --session-id "$OBU_SESSION_ID" (where OBU_SESSION_ID is a unique task id like obu-task-slug-$(date +%s)). if it fails, read references/installation.md before proceeding. common failures: extension not installed, native host not registered, Chrome not running.
2. check and select Chrome profile (if multiple installed). run open-browser-use profiles --connected. if more than one profile has the extension installed and the user has not already named one, ask user which profile to use (include directory name, display name, and whether each is connected). do not switch profiles mid-task. add --profile <selector> to all subsequent CLI and MCP commands.
3. create task session group. run open-browser-use name-session --session-id "$OBU_SESSION_ID" --name "<task-label> - OBU" where task-label is short (e.g., checkout, bug-filing, docs-scan). if no label fits, use Task - OBU.
4. choose method of interaction based on task scope:
   - simple one-shot actions: use CLI commands directly (e.g., open-tab, navigate, info, tabs, cdp). example: open-browser-use open-tab --session-id "$OBU_SESSION_ID" --url https://example.com.
   - small multi-step workflows: use open-browser-use run with a line-oriented action plan (name-session, open-tab, wait-load, page-info, etc.). one session id per plan, actions share state.
   - large multi-step workflows or event subscriptions: write SDK code (JavaScript, Python, Go) for richer control flow and event handling. pass sessionId / session_id / SessionID to SDK constructor.
   - MCP-aware runtimes: configure obu mcp --session-id "$OBU_SESSION_ID" in your MCP config and call tools directly (e.g., user_tabs, open_tab, claim_tab, run_action_plan).
5. list and claim user tabs (if needed). run open-browser-use user-tabs --session-id "$OBU_SESSION_ID" to see existing tabs the user has open. if the task requires an existing tab, claim it with open-browser-use claim-tab --session-id "$OBU_SESSION_ID" --tab-id <id> (prefer claim over opening new tabs when possible; decide based on URL, title, recency, or group name).
6. perform browser actions. execute navigation, interaction, extraction, or file operations:
   - navigate: open-browser-use navigate --session-id "$OBU_SESSION_ID" --tab-id <id> --url <url> or SDK equivalent.
   - wait for load: wait-load domcontentloaded in action plans or SDK waitLoad / wait_load methods.
   - extract page data: open-browser-use cdp --session-id "$OBU_SESSION_ID" --tab-id <id> --method Runtime.evaluate --params '{"expression":"document.title"}' or SDK cdp / execute_cdp_method.
   - file uploads: start waiting with wait-file-chooser, trigger file chooser in page, then set paths with set-file-chooser-files or SDK equivalent. use absolute local paths.
   - downloads: observe with SDK notification handlers or Browser Use waitForDownload / downloadPath methods.
   - clipboard: use SDK or call --method only when no safer convenience command exists; treat as sensitive user action.
7. finalize tabs (required before ending browser work for the turn). run open-browser-use finalize-tabs --session-id "$OBU_SESSION_ID" --keep '<json-array>' or MCP finalize_tabs / SDK finalizeTabs / finalize_tabs / FinalizeTabs. pass empty array [] to discard all session tabs by default. keep tabs only if user needs them after the turn (see decision points below). this is the last Open Browser Use action for the turn; do not call browser tools after finalization.

decision points

if multiple Chrome profiles installed:

list profiles with open-browser-use profiles --connected.
if exactly one profile is installed and connected, proceed without asking. if it is installed but not connected, ask user to open Chrome on that profile first.
if multiple profiles are installed and the user has not specified one, ask user which to use (list directory and display name, and whether each is connected). pass --profile <selector> to all commands for rest of task.
if chosen profile is not connected, ask user to open Chrome on that profile, then retry. do not silently fall back to a different profile.

if choosing between claiming an existing user tab and opening a new tab:

run open-browser-use user-tabs first.
if a relevant tab already exists (matching URL, title, recency, or group), use claim-tab and reuse it.
else open a new tab with open-tab.

if deciding tab disposition (keep or discard):

discard by default: pass --keep '[]' to discard all session tabs after extraction.
keep with status "deliverable": keep if the tab itself is the user-facing output (created document, dashboard, checkout cart, submitted form result, or page user explicitly asked to inspect). deliverable tabs move to the shared ✅ Open Browser Use tab group.
keep with status "handoff": keep only if task is in progress and user or later turn should continue (page waiting for input, login, approval, payment, CAPTCHA, unfinished workflow). handoff tabs stay in the task session group.
discard research, source, search, intermediate, duplicate, blank, error, and login/navigation tabs after extracting data.
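
the keep-or-discard rules can be written down as a lookup so the default stays "discard". this is only a sketch: obu_tab_disposition and the role labels it accepts are hypothetical names for illustration, and the agent still has to judge which role a tab actually plays.

```sh
# Hypothetical helper: map a tab's role in the task to a disposition.
# Role labels are illustrative; anything unrecognized is discarded.
obu_tab_disposition() {
  case "$1" in
    # The tab itself is the user-facing output.
    document|dashboard|cart|form-result|requested-page)
      echo "deliverable" ;;
    # The task is unfinished and someone must continue from this tab.
    awaiting-input|awaiting-login|approval|payment|captcha|unfinished)
      echo "handoff" ;;
    # Research, search, intermediate, duplicate, blank, error, and the rest.
    *)
      echo "discard" ;;
  esac
}
```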

if communication fails after setup passes:

read references/troubleshooting.md for stale sockets, extension/native host issues, or permission errors.
common fixes: restart Chrome, re-enable extension permissions, reinstall native host, check socket path in CLI output.
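
after applying a fix such as restarting Chrome, a short retry loop around ping can confirm the connection is back before resuming work. this is a sketch under stated assumptions: obu_wait_ping is a hypothetical helper, OBU_BIN is an assumed override for the CLI name, and it relies on ping exiting non-zero on failure, which matches the exit-code contract described below.

```sh
# Hypothetical helper: retry ping until it succeeds or attempts run out.
#   $1  maximum attempts (default 3)
#   $2  seconds to sleep between attempts (default 2)
# Assumes ping exits non-zero when the host is unreachable.
obu_wait_ping() {
  attempts=${1:-3}
  delay=${2:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "${OBU_BIN:-open-browser-use}" ping --session-id "$OBU_SESSION_ID" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

a caller might use it as obu_wait_ping 5 2 || echo "still unreachable; see references/troubleshooting.md".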

if using MCP with multiple profiles:

pass --profile <selector> at server start in MCP config (e.g., args = ["mcp", "--session-id", "obu-<task-id>", "--profile", "Default"]), not per tool call.
the selector applies to every MCP tool invocation for that server.

if mixing an action plan (obu run) with direct CLI commands:

both can share one browser session only if they use the same explicit --session-id.
finalize that session once before ending task.
do not call Open Browser Use tools after finalization; if more work is needed, do it first, then finalize once with final tab disposition.

if user did not specify which profile to use:

ask before first browser command, list both directory and display name, note which are connected.
do not assume or remember choice across unrelated tasks.

if no safer CLI or SDK wrapper exists for an operation:

use open-browser-use call --method <method> --params '<json>' as fallback; this is unrestricted but less safe than convenience commands.

output contract

success is defined by:

CLI commands return exit code 0 and valid JSON output (when --json flag is used).
session id persists across all commands in the task (same --session-id value).
ping returns connection status; info and tabs return browser state.
user-tabs returns array of existing tabs (may be empty).
open-tab and navigate return updated tab data with valid tab_id.
cdp and call return JSON-RPC response (success or error).
finalize-tabs confirms which tabs were moved to shared group and which were discarded.
for SDK usage, constructors accept sessionId / session_id / SessionID parameter; methods return promises/futures with tab or action data.
for MCP, server starts with obu mcp and lists tools on client query; tool calls return JSON with content field.
tab group moves complete on the finalize-tabs call; no dangling session tabs remain.

outcome signal

you know it worked when:

ping --session-id <id> returns connection status without error.
tabs and user-tabs show expected tab count and metadata (url, title, group, id).
open-tab returns a new tab id and browser displays the requested URL.
navigate updates the tab url without opening a new tab.
cdp or page-info returns page title, url, and extracted text without timeout.
file chooser accepts paths and upload completes without error.
download is observable in downloads panel or SDK notification.
finalize-tabs --keep '[]' confirms all tabs moved out of session group (empty session).
if keeping tabs with status "deliverable", they appear in shared ✅ Open Browser Use group after finalization.
if keeping tabs with status "handoff", they stay in the task session group and user can resume from them.
no stale task groups or dangling tabs remain after task ends.
for MCP: tools are callable via runtime's tool interface without shell invocation.
for SDK: async methods resolve with tab/action data; event handlers fire on page events (load, download, input).
