Use when an agent needs to drive a real Chrome window via opencli — inspect a page, fill forms, click through logged-in flows, or extract data ad-hoc. Covers...
---
name: opencli-browser
description: Use when an agent needs to drive a real Chrome window via opencli — inspect a page, fill forms, click through logged-in flows, or extract data ad-hoc. Covers the selector-first target contract, compound form fields, stale-ref handling, network capture, and the agent-native envelopes the CLI returns. Not for writing adapters — see opencli-adapter-author for that.
allowed-tools: Bash(opencli:*), Read, Edit, Write
---
# opencli-browser
The first reader of this CLI is an agent, not a human. Every subcommand returns a structured envelope that tells you exactly what matched, how confident the match is, and what to do if it didn't. Lean on those envelopes — do not guess.
This skill is for **driving a live browser** to accomplish an agent task. If you are building a reusable adapter under `~/.opencli/clis/<site>/` use `opencli-adapter-author` instead.
---
## Prerequisites
```bash
opencli doctor
```
Until `doctor` is green, nothing else will work. Typical failures: Chrome not running, extension not installed, debug port blocked by 1Password / other extensions. The doctor output tells you which.
---
## Window lifecycle
- `opencli browser *` commands already keep the automation session alive between calls. The window stays open until you run `opencli browser close` or the idle timeout expires.
- `--focus` (or `OPENCLI_WINDOW_FOCUSED=1`) opens the automation window in the foreground. Use it when you want to watch the page live.
- `--live` (or `OPENCLI_LIVE=1`) is mainly for browser-backed adapter commands such as `opencli xiaohongshu note ...`. It keeps the adapter's automation window open after the command returns so you can inspect the final page state.
---
## Mental model
1. **Selector-first target contract.** Every interaction command (`click`, `type`, `select`, `get text/value/attributes`) takes one `<target>`, which is *either* a numeric ref from `state`/`find` *or* a CSS selector. Use `--nth <n>` to disambiguate multiple CSS matches.
2. **Every envelope reports `matches_n` and `match_level`.** `match_level` is `exact`, `stable`, or `reidentified` — the CLI already rescued moderate DOM drift for you, but the level tells you how confident to be.
3. **Compact output first, full payload on demand.** `state` is a budget-aware snapshot; `get html --as json` supports `--depth/--children-max/--text-max`; `network` returns shape previews and you re-fetch a single body with `--detail <key>`. If you emit a giant payload you are burning context you did not need to burn.
4. **Structured errors are machine-readable.** On failure the CLI emits `{error: {code, message, hint?, candidates?}}`. Branch on `code`, not on message strings.
---
## Critical rules
1. **Always inspect before you act.** Run `state` or `find` first. Never hard-code a ref or selector from memory across sessions — indices are per-snapshot.
2. **Prefer numeric ref over CSS once you have it.** Numeric refs survive mild DOM shifts because the CLI fingerprints each tagged element. A CSS selector written by hand will break the first time the site re-renders.
3. **Read `match_level` after every write.** `exact` = all good. `stable` = the element is the same but some soft attrs drifted — your action still applied. `reidentified` = the original ref was gone and the CLI found a unique replacement; double-check you hit the right element.
4. **Use the `compound` field for form controls.** Do not regex-guess a date format, do not `state` twice to get the full `<select>` options list. The compound envelope has the format string, full option list up to 50, `options_total` for overflow, and `accept`/`multiple` for `<input type=file>`.
5. **Verify writes that matter.** After `type <target> <text>`, run `get value <target>`. After `select`, run `get value`. Autocomplete widgets, React controlled inputs, and masked fields all silently eat characters. The CLI cannot detect this for you.
6. **`state` → action → `state` after a page change.** Navigations, form submits, and SPA route changes invalidate refs. Take a fresh snapshot. Do not reuse refs from before the transition.
7. **Chain with `&&`.** A chained sequence runs in one shell so refs acquired by the first command stay live for the second. Separate shell invocations lose the session context you just set up.
8. **`eval` is read-only.** Wrap the JS in an IIFE and return JSON. If you need to *change* the page, use the structured `click` / `type` / `select` / `keys` commands instead — they produce structured output and fingerprints, `eval` does not.
9. **Prefer `network` to screen-scraping.** If a page you care about fetches its data from a JSON API, the API is almost always more reliable than scraping the rendered DOM. Capture once, inspect the shape, then `--detail <key>` the body you need.
---
## Target contract (`<target>` for click / type / select / get text|value|attributes)
```
<target> ::= <numeric-ref> | <css-selector>
```
- **Numeric ref** — the `[N]` index from `state` or `find`. Cheap, resilient to soft DOM drift.
- **CSS selector** — anything `querySelectorAll` accepts. Must be unambiguous on write ops, or pair with `--nth <n>`.
### Envelope on success
```json
{ "clicked": true, "target": "3", "matches_n": 1, "match_level": "exact" }
```
```json
{ "value": "kalevin@example.com", "matches_n": 1, "match_level": "stable" }
```
### match_level
| level | meaning | you should |
|-------|---------|------------|
| `exact` | Fingerprint agreed on tag + strong IDs with at most one soft drift | Proceed. |
| `stable` | Tag + strong IDs still agree, soft signals (aria-label, role, text) drifted | Proceed, but if *what* you typed/clicked matters, re-check with `get value` or `state`. |
| `reidentified` | Original ref was gone; a unique live element matched the fingerprint and was re-tagged with the old ref | Double-check you hit the right element before chaining more writes. |
### Structured error codes
Branch on these, not on the human message:
| code | meaning |
|------|---------|
| `not_found` | Numeric ref is no longer in the DOM. Re-`state`. |
| `stale_ref` | Ref exists but the element at that ref changed identity. Re-`state`. |
| `invalid_selector` | CSS was rejected by `querySelectorAll`. Fix the selector. |
| `selector_not_found` | CSS matches 0 elements. Try `find` with a looser selector. |
| `selector_ambiguous` | CSS matches >1 and no `--nth`. Add `--nth` or narrow the selector. |
| `selector_nth_out_of_range` | `--nth` beyond match count. |
| `option_not_found` | `select` couldn't find an option matching that label/value. Error envelope includes `available: string[]` of the real option labels. |
| `not_a_select` | `select` was called on a non-`<select>` element. |
Error envelope always includes `error.code` and `error.message`. Target errors (`selector_not_found`, `selector_ambiguous`, etc.) often add `error.candidates: string[]` with suggested selectors. `option_not_found` adds `error.available: string[]` instead.
---
## Command reference
### Inspect
| command | purpose |
|---------|---------|
| `browser state` | Snapshot: text tree with `[N]` refs, scroll hints, hidden-interactive hints, `compounds (N):` sidecar for date/select/file refs. |
| `browser find --css <sel> [--limit N] [--text-max N]` | Run a CSS query and return one entry per match with `{nth, ref, tag, role, text, attrs, visible, compound?}`. Allocates refs for matches the prior snapshot didn't tag. Cheap alternative to `state` when you already know the selector. |
| `browser frames` | List cross-origin iframe targets. Pass the index to `--frame` on `eval`. |
| `browser screenshot [path]` | Viewport PNG. No path → base64 to stdout. Prefer `state` when you just need structure. |
### Get (read-only)
| command | returns |
|---------|---------|
| `browser get title` | plain text |
| `browser get url` | plain text |
| `browser get text <target> [--nth N]` | `{value, matches_n, match_level}` |
| `browser get value <target> [--nth N]` | `{value, matches_n, match_level}` |
| `browser get attributes <target> [--nth N]` | `{value: {attr: val, ...}, matches_n, match_level}` |
| `browser get html [--selector <css>] [--as html\|json] [--depth N] [--children-max N] [--text-max N] [--max N]` | Raw HTML, or structured tree. JSON tree nodes have `{tag, attrs, text, children[], compound?}`. Truncation reported via `truncated: {depth?, children_dropped?, text_truncated?}`. |
### Interact
| command | notes |
|---------|-------|
| `browser click <target> [--nth N]` | Returns `{clicked, target, matches_n, match_level}`. |
| `browser type <target> <text> [--nth N]` | Clicks first, then types. Returns `{typed, text, target, matches_n, match_level, autocomplete}`. `autocomplete: true` means a combobox/datalist popup appeared after typing — you almost always need `keys Enter` or a follow-up `click` on the suggestion to commit the value. |
| `browser select <target> <option> [--nth N]` | Matches option by label first, then value. Use `compound` from `find`/`state` to see exactly what labels are available. |
| `browser keys <key>` | `Enter`, `Escape`, `Tab`, `Control+a`, etc. Runs against the focused element. |
| `browser scroll <direction> [--amount px]` | `up` / `down`. Default amount `500`. |
### Wait
```bash
browser wait selector "<css>" [--timeout ms] # wait until the selector matches
browser wait text "<substring>" [--timeout ms] # wait until the text appears
browser wait time <seconds> # hard sleep, last resort
```
Default timeout `10000` ms. SPA routes, login redirects, and lazy-loaded lists need `wait` before `state`/`get`.
### Extract
- **`browser eval <js> [--frame N]`** — Run an expression in the page (or in a cross-origin frame via `--frame`). Wrap in an IIFE and return JSON. Read-only: no `document.forms[0].submit()`, no clicks, no navigations. If the result is a string, stdout is the raw string; otherwise it's JSON.
- **`browser extract [--selector <css>] [--chunk-size N] [--start N]`** — Markdown extraction of long-form content with a continuation cursor. Returns `{url, title, selector, total_chars, chunk_size, start, end, next_start_char, content}`. Loop on `next_start_char` until it is `null`. Auto-scopes to `<main>`/`<article>`/`<body>` if you don't pass `--selector`.
### Network
```bash
browser network # shape preview + cache key list
browser network --detail <key> # full body for one cached entry
browser network --filter "field1,field2" # keep only entries whose body shape contains ALL fields as path segments
browser network --all # include static resources (usually noise)
browser network --raw # full bodies inline — large; use sparingly
browser network --ttl <ms> # cache TTL (default 24h)
```
List entries look like `{key, method, status, url, ct, size, shape, body_truncated?}`. Detail envelope is `{key, url, method, status, ct, size, shape, body, body_truncated?, body_full_size?, body_truncation_reason}`. Cache lives in `~/.opencli/cache/browser-network/` so you can re-inspect without re-triggering the request.
### Tabs & session
| command | purpose |
|---------|---------|
| `browser tab list` | JSON array of `{index, page, url, title, active}`. The `page` string is the tab identity you pass as `<targetId>` to `tab select` / `tab close`, or to `--tab <targetId>` on any subcommand. (`--tab`'s placeholder is historical — the value is always `page`.) |
| `browser tab new [url]` | Open a new tab. Prints the new `page` string. |
| `browser tab select [targetId]` | Make a tab the default. All subcommands accept `--tab <targetId>` to target one without changing the default. |
| `browser tab close [targetId]` | Close by `page`. |
| `browser back` | History back on the active tab. |
| `browser close` | Close the automation window when done. |
---
## Compound form controls
Every date/time, select, and file input carries a `compound` field. Use it — do not regex attributes.
### Date family
```json
{
"control": "date",
"format": "YYYY-MM-DD",
"current": "2026-04-21",
"min": "2026-01-01",
"max": "2026-12-31"
}
```
`control` is one of `date | time | datetime-local | month | week`. `format` is a concrete template string — type into the field using that exact format, or `select` by label if the site wraps the native input in a custom widget.
### Select
```json
{
"control": "select",
"multiple": false,
"current": "United States",
"options": [
{ "label": "United States", "value": "us", "selected": true },
{ "label": "Canada", "value": "ca" }
],
"options_total": 137
}
```
`options[]` is capped at 50 entries. **`current` is always correct** even when the selected option is past the cap — it's computed by scanning every option, not from the truncated list. If `options_total > options.length` and you need an option that isn't in `options[]`, call `browser select <target> "<label>"` directly — the CLI matches against the live DOM, not the truncated list.
### File
```json
{
"control": "file",
"multiple": true,
"current": ["report.pdf", "cover.png"],
"accept": "application/pdf,image/*"
}
```
Do not invent file paths. Upload is done via the normal click flow — respect `accept` when telling the user what to upload.
### Where compounds show up
- `browser find --css <sel>` entries: inline on each match.
- `browser get html --as json` tree nodes: inline on matching nodes.
- `browser state` snapshot: in a `compounds (N):` sidecar keyed by numeric ref, so you can tell at a glance which `[N]` entries have rich metadata.
---
## Cost guide
Think about payload size per call. Budgets exist for a reason.
| command | rough cost | when to use |
|---------|-----------|-------------|
| `state` | medium (bounded by internal budget) | First call on any page, after every nav, when you need refs. |
| `find --css <sel>` | small | You already know the selector — one query, compact entries. |
| `get title` / `get url` | tiny | Sanity checks between steps. |
| `get text/value/attributes` | tiny per call | Verifying one specific field. |
| `get html` (raw) | can be huge | Avoid on unbounded pages. Always pair with `--selector` and a budget. |
| `get html --as json --depth 3 --children-max 20` | medium | When you need to reason about structure, not a specific field. |
| `screenshot` | large | Only when the page is visual (CAPTCHA, charts). Prefer `state`. |
| `extract` | medium per chunk | Long-form reading. Loop via `next_start_char`. |
| `network` (default) | small | First look at APIs. |
| `network --detail <key>` | varies | Pull one body. |
| `network --raw` | huge | Only after `--filter` narrowed the candidate set. |
| `eval "JSON.stringify(...)"` | controlled | Targeted extraction when none of the above fit. |
Rule of thumb: **one `state` per page transition, one `find` per follow-up query, one `get`/`click`/`type` per action.** If your plan involves >10 calls per page you are probably scraping instead of interacting — consider `extract` or `network`.
---
## Chaining rules
**Good — one shell, live session:**
```bash
opencli browser open "https://news.ycombinator.com" \
&& opencli browser state \
&& opencli browser click 3
```
**Bad — each line is a fresh shell, refs from call 1 are already forgotten when call 2 runs.** (Only a problem if you rely on shell-scoped state; browser refs themselves persist in-page, but interleaving unrelated shells invites races.) Prefer `&&` when the steps are meant to be atomic.
**Never** chain a write and then an immediate `state` without a `wait` if the action causes a network round-trip — you will snapshot the pre-response DOM and make bad decisions off stale data.
---
## Recipes
### Fill a login form
```bash
opencli browser open "https://example.com/login"
opencli browser state # find [N] for email, password, submit
opencli browser type 4 "me@example.com"
opencli browser type 5 "hunter2"
opencli browser get value 4 # verify (autocomplete can eat chars)
opencli browser click 6 # submit
opencli browser wait selector "[data-testid=account-menu]" --timeout 15000
opencli browser state # fresh refs on the logged-in page
```
### Pick from a long dropdown
```bash
opencli browser state # sidebar shows [12] <select name=country>
opencli browser find --css "select[name=country]"
# the compound.options_total is 137, but compound.current is "" — unselected.
opencli browser select 12 "Uruguay"
opencli browser get value 12 # { value: "uy", match_level: "exact" }
```
### Scrape a list via network instead of DOM
```bash
opencli browser open "https://news.ycombinator.com"
opencli browser network --filter "title,score"
# -> find the /topstories entry, note its key
opencli browser network --detail topstories-a1b2
```
### Read a long article in chunks
```bash
opencli browser open "https://blog.example.com/long-post"
opencli browser extract --chunk-size 8000
# -> content + next_start_char: 8000
opencli browser extract --start 8000 --chunk-size 8000
# ...until next_start_char is null
```
### Cross-origin iframe
```bash
opencli browser frames
# -> [{"index": 0, "url": "https://checkout.stripe.com/...", ...}]
opencli browser eval "(() => document.querySelector('input[name=cardnumber]')?.value)()" --frame 0
```
---
## Pitfalls
- **Do not submit forms via `eval "document.forms[0].submit()"`** — modern sites intercept with JS handlers and silently drop the call. Either `click` the submit button via its ref, or (if you know the GET URL) just `open` it directly.
- **Do not reuse refs across a page transition.** `wait` for the new state, then re-`state`. Old refs will either 404 or (worse) `reidentify` onto a similarly-shaped element on the new page.
- **`match_level: reidentified` is a warning, not an error.** The action went through, but if you are chaining 5 more writes that all depend on that being the right element, verify with a `get text` or `get value` before continuing.
- **Budget-aware commands silently cap.** `get html --as json` with default budgets will return `truncated: {...}`. If your downstream logic needs the whole subtree, raise `--depth` / `--children-max` or tighten the selector.
- **`autocomplete: true` on a `type` response is not an error.** It means a suggestion popup is open and your value isn't committed yet. Typically `keys Enter` to accept the first suggestion, or `click` the one you want.
- **`network --filter` is AND-semantics on path segments.** `--filter "title,score"` keeps entries whose body shape contains *both* `title` and `score` as path segments, at any depth. It is not a regex.
- **Screenshots are for humans, not for agents.** Use `state` + `find` unless the page is genuinely visual (captcha, chart). Screenshots burn tokens and rarely add signal an agent can act on.
---
## Troubleshooting
| symptom | fix |
|---------|-----|
| `opencli doctor` red: "Browser not connected" | Start Chrome with `--remote-debugging-port=9222`, or rerun the extension install. |
| `attach failed: chrome-extension://...` | Disable 1Password / other CDP-hungry extensions temporarily. |
| `selector_not_found` right after `state` | Page mutated. `wait selector "..."` then retry. |
| `stale_ref` across every command | You are reusing refs from a prior page. Re-`state`. |
| `click` succeeds but nothing happens | The element is probably a decorative wrapper stealing clicks from the real target. `find --css "..."` with a narrower selector and retry on the inner element. |
| `type` appears to finish but value is wrong | Autocomplete, masked input, or React controlled re-render. Verify with `get value`. Add `keys Enter` or re-type. |
| Giant `get html` output | Pass `--selector` + `--as json --depth 3 --children-max 20 --text-max 200`. |
| Network cache seems stale | Bump `--ttl` down, or let it expire. The cache lives at `~/.opencli/cache/browser-network/`. |
---
## See also
- `opencli-adapter-author` — turning what you just figured out into a reusable `~/.opencli/clis/<site>/<command>.js`.
- `opencli-autofix` — when an existing adapter breaks, this skill walks you through `OPENCLI_DIAGNOSTIC` and filing a fix.
don't have the plugin yet? install it then click "run inline in claude" again.