Headless Google Meet bot that joins meetings and captures live captions as transcripts.
---
name: open-utter
description: Headless Google Meet bot that joins meetings and captures live captions as transcripts.
homepage: https://github.com/sumansid/openutter
metadata: { "openclaw": { "emoji": "๐ฆฆ", "requires": { "bins": ["node"] } } }
---
# OpenUtter
## STOP โ Ask the user before doing anything
**When the user asks to join a meeting, you MUST do these steps IN ORDER. Do NOT skip ahead.**
**Step A:** Read the file `~/.openutter/auth-meta.json` (it may not exist โ that's fine).
**Step B:** Ask the user how they want to join. Do NOT launch the bot yet.
If auth-meta.json exists and has an `email` field, ask:
> "How would you like to join the meeting?
> 1. Join as **user@gmail.com** (authenticated โ no host approval needed)
> 2. Join as **OpenUtter Bot** (guest โ host must admit)
> 3. Join with a custom name (guest)"
If auth-meta.json does NOT exist, ask:
> "What name should the bot use to join the meeting?"
Default to "OpenUtter Bot" if the user doesn't have a preference.
**Step C:** WAIT for the user to answer. Only then launch the bot with the right flags.
- User chose authenticated โ use `--auth`
- User chose guest โ use `--anon --bot-name "Their Name"` (always ask for a name)
**The bot will refuse to start without `--auth` or `--anon`.
When using `--anon`, `--bot-name` is also required.**
---
A meeting bot that joins Google Meet meetings via Playwright browser automation
and captures live captions as a real-time transcript.
## Prerequisites
- `playwright-core` (ships with openclaw)
- Chromium browser: `npx playwright-core install chromium`
## Join a Meeting
**IMPORTANT: Always run these commands with `background:true`** โ the bot is a long-running
process that stays in the meeting. Do not wait for it to complete; background it immediately
and poll for status updates.
### Launch command
```bash
exec background:true command:"npx tsx skills/openutter/scripts/utter-join.ts https://meet.google.com/abc-defg-hij --auth|--anon --channel <current-channel> --target <current-chat-id>"
```
**IMPORTANT:** Always pass `--channel` and `--target` from the current conversation context.
The bot uses these to send screenshots and status images directly to the user's chat.
Options (required โ bot will error without one):
- `--auth` โ join using saved Google account (~/.openutter/auth.json)
- `--anon --bot-name "Name"` โ join as a guest with this display name (both required together)
Other options:
- `--headed` โ show the browser window (for debugging)
- `--camera` โ join with camera on (default: off)
- `--mic` โ join with microphone on (default: off)
- `--duration 60m` โ auto-leave after duration (supports ms/s/m/h)
## Live Caption Transcript
Captions are automatically captured whenever the bot is in a meeting. After joining,
the bot enables Google Meet's built-in live captions and captures the text via a
MutationObserver. Captions are deduplicated (Meet updates word-by-word) and flushed
to a transcript file every 5 seconds.
**Transcript location:** `~/.openclaw/workspace/openutter/transcripts/<meeting-id>-<YYYY-MM-DD>.txt`
**Format:**
```
[14:30:05] Alice: Hey everyone, let's get started
[14:30:12] Bob: Sounds good, I have the updates ready
[14:30:25] Alice: Great, go ahead
```
When the meeting ends, the bot prints `[OPENUTTER_TRANSCRIPT] <path>` with the transcript file path.
## Get Transcript (what are they saying?)
**When the user asks "what are they saying?", "what's happening?", "summarize the meeting",
or anything about meeting content โ run this script. Do NOT use utter-screenshot.ts for this.**
```bash
exec command:"npx tsx skills/openutter/scripts/utter-transcript.ts"
```
Use `--last 20` to get only the last 20 lines (for long meetings).
Read the output and summarize it for the user in natural language.
## Take a Screenshot (visual context only)
If the user asks to **see** the meeting (e.g. "send me a screenshot", "what does it look like"):
```bash
exec command:"npx tsx skills/openutter/scripts/utter-screenshot.ts"
```
Send the screenshot image to the user via `message`. Do NOT read the screenshot yourself.
## How It Works
1. **Join**: Launches headless Chromium, navigates to the Meet URL, enters the bot name, clicks "Ask to join", and waits for the host to admit it.
2. **Caption capture**: After joining, the bot clicks the CC button to enable Google Meet's live captions, then injects a MutationObserver to capture caption text from the DOM. Captions are deduplicated (partial word-by-word updates are ignored) and flushed to a transcript file every 5 seconds.
## Authentication (Optional)
By default the bot joins as a guest and needs host admission. To join as an authenticated
Google user (no admission needed), run the auth script once:
```bash
npx tsx skills/openutter/scripts/utter-auth.ts
```
This opens a headed browser โ sign into Google, then press Enter. The session is saved to
`~/.openutter/auth.json` and automatically loaded on future joins. Re-run if the session expires.
## Files
- `~/.openutter/auth.json` โ saved Google session (cookies + localStorage, from utter-auth.ts)
- `~/.openutter/chrome-profile/` โ persistent Chromium profile (used when no auth.json)
- `~/.openutter/config.json` โ bot configuration (optional)
- `~/.openclaw/workspace/openutter/transcripts/` โ live caption transcripts
- `~/.openclaw/workspace/openutter/on-demand-screenshot.png` โ on-demand screenshot from a running meeting
- `~/.openclaw/workspace/openutter/debug-join-failed.png` โ screenshot on join failure
- `~/.openclaw/workspace/openutter/debug-admit-failed.png` โ screenshot on admission failure
- `~/.openclaw/workspace/openutter/joined-meeting.png` โ confirmation screenshot after joining
## Headless VM Tips
- Chrome flags `--use-fake-ui-for-media-stream` and `--use-fake-device-for-media-stream` are set automatically, so no real camera/mic hardware is needed.
- No X11/Wayland display is required โ runs fully headless.
- Use `--duration` to auto-leave after a set time.
## Agent Behavior โ MANDATORY
After launching the bot with `exec background:true`, you MUST poll the process
to check for success/failure and send screenshots back to the user.
### Step 1: Poll for output
After starting the background exec, poll the process every 10โ15 seconds:
```
process action:poll
```
### Step 2: Parse markers and send images using the message tool
The bot prints machine-readable markers. When you see them, you MUST use the
`message` tool to send the screenshot image to the user.
**On success** โ bot prints `[OPENUTTER_SUCCESS_IMAGE] <path>`:
```
message action:"send" media:"./openutter/joined-meeting.png" content:"Successfully joined the meeting!"
```
**On screenshot request** โ bot prints `[OPENUTTER_SCREENSHOT] <path>`:
```
message action:"send" media:"./openutter/on-demand-screenshot.png" content:"Here's the current meeting view"
```
**On failure** โ bot prints `[OPENUTTER_DEBUG_IMAGE] <path>` (or exits non-zero):
```
message action:"send" media:"./openutter/debug-join-failed.png" content:"Could not join the meeting. Here is what the bot saw"
```
**CRITICAL: ALWAYS use the `message` tool with `media:"./openutter/<filename>.png"` to send screenshots.**
Use relative paths only (starting with `./`). Never use absolute paths or ~ paths.
NEVER just describe what happened in text. The user MUST receive the actual image.
**When the user asks "send me screenshot" or "what do you see"**, run
`utter-screenshot.ts`, then use the `message` tool with `media:"./openutter/on-demand-screenshot.png"`.
### Step 3: When user asks about meeting content
**CRITICAL: When the user asks what's happening, what someone said, or anything about
meeting content โ run `utter-transcript.ts`. NEVER use `utter-screenshot.ts` for this.**
```bash
exec command:"npx tsx skills/openutter/scripts/utter-transcript.ts"
```
Read the output and summarize it for the user. Use `--last 20` for long meetings.
**On meeting end** โ bot prints `[OPENUTTER_TRANSCRIPT] <path>`:
Run `utter-transcript.ts` and provide a summary to the user.
### When to use which script
| User asks... | Use this script |
|-------------------------------------------|--------------------------|
| "what are they saying?" | `utter-transcript.ts` |
| "what's happening in the meeting?" | `utter-transcript.ts` |
| "summarize the meeting" | `utter-transcript.ts` |
| "what did they talk about?" | `utter-transcript.ts` |
| "send me a screenshot" | `utter-screenshot.ts` |
| "what does the meeting look like?" | `utter-screenshot.ts` |
**NEVER read or analyze screenshot images to understand meeting content.**
### Screenshot files
- `~/.openclaw/workspace/openutter/joined-meeting.png` โ confirmation screenshot after joining
- `~/.openclaw/workspace/openutter/debug-join-failed.png` โ join button not found
- `~/.openclaw/workspace/openutter/debug-admit-failed.png` โ not admitted / blocked / timed out
## Troubleshooting
- **Join button not found**: Google Meet UI changes occasionally. The debug screenshot shows what the bot saw โ send it to the user.
- **Not admitted**: The bot joins as a guest and needs host approval. Ask the host to admit "OpenUtter Bot". If timed out, the debug screenshot is sent automatically.
- **No captions captured**: The CC button selector may change with Meet updates. If the transcript is empty, captions may not have been enabled. Try `--headed` to verify the CC button is clicked.
- **Headless blocked**: The bot uses stealth patches to bypass headless detection. If Google Meet blocks it, try `--headed` for debugging.
don't have the plugin yet? install it then click "run inline in claude" again.