---
name: scrask-bot
version: 4.3.0
description: >
  When the user sends a screenshot via any chat surface (Telegram, iMessage, Slack, etc.),
  parse it for events and tasks using OpenClaw's configured vision LLM by default, with
  optional Gemini fast-path and Claude fallback for users who bring their own keys. Then
  delegate creation to the user's installed calendar / task skills. Scrask does not write
  to any store itself; it emits structured intent and the agent routes it.
author: sandip
metadata:
  openclaw:
    emoji: "🦞"
    # Invocation: implicit by default (agent reads the Trigger Conditions section
    # of this manifest and routes), with explicit override via the aliases below.
    # If any alias appears at the start of a user message (with or without `@`
    # or `/` prefix), the platform must dispatch to scrask regardless of the
    # implicit trigger conditions.
    invocation:
      mode: hybrid # 'implicit' | 'explicit' | 'hybrid'
      aliases:
        - scrask
        - "scrask this"
        - screenshot
        - "screenshot to calendar"
    # No mandatory env vars. Default 'auto' provider routing uses OpenClaw's
    # configured vision LLM when no skill-level keys are set, so the skill works
    # out of the box.
    requires:
      env: []
      bins:
        - python3
    optional_env:
      # GEMINI_API_KEY: enables the cheap+fast Gemini-first routing in 'auto' mode,
      # and is required if you pin --provider gemini.
      # ANTHROPIC_API_KEY: enables Claude fallback in 'auto' mode (when Gemini
      # confidence is shaky), and is required if you pin --provider claude.
      - GEMINI_API_KEY
      - ANTHROPIC_API_KEY
    suggests:
      # Calendar destination skills (any one is enough)
      - calctl
      - accli
      - apple-calendar
      - brainz-calendar
      - gcal-pro
      # Task destination skills (any one is enough)
      - apple-reminders
      - things-mac
      - notion
    config:
      vision_provider:
        type: string
        description: >
          'auto' (default) routes by what you have: GEMINI_API_KEY → Gemini-first
          with Claude fallback; else ANTHROPIC_API_KEY → Claude only; else falls
          back to OpenClaw's configured vision LLM. 'openclaw' always uses the
          platform LLM. 'gemini' / 'claude' pin a specific provider (and require
          the matching key).
        default: auto
      fallback_threshold:
        type: number
        description: "Worst per-field confidence floor for auto mode. If any per-field score drops below this, Claude reruns the parse."
        default: 0.60
      timezone:
        type: string
        description: "User's IANA timezone. Used when none is detected in the screenshot."
        default: "UTC"
      confidence_threshold:
        type: number
        description: "Legacy 0.0–1.0 per-item gate. Kept for backward-compatible callers; the new thresholds below drive clarification behaviour."
        default: 0.75
      actionable_threshold:
        type: number
        description: "Top-level 'is this actually an event/task?' gate. Below this, the parser flags needs_actionable_confirmation."
        default: 0.70
      type_threshold:
        type: number
        description: "Per-item 'calendar or task list?' gate. Below this type_confidence, the parser emits a type clarification."
        default: 0.70
      field_threshold:
        type: number
        description: "Per mandatory field. Below this confidence (or null value) the parser emits a targeted clarification question for that field."
        default: 0.70
---
# Scrask Bot
## Overview
Scrask is a **screenshot-to-intent parser**. The user sends a screenshot via whatever chat surface
they have wired into OpenClaw (Telegram, iMessage, Slack, etc.). Scrask:
1. Decides whether the screenshot contains any actionable content (event, reminder, task). If not, ignores it.
2. Extracts every actionable item — a single screenshot may yield both an event and a task.
3. Emits structured intent JSON.
4. The OpenClaw agent then delegates each item to the user's installed destination skill:
   - `destination: "calendar"` → `calctl` / `accli` / `apple-calendar` / `brainz-calendar` / `gcal-pro` / etc.
   - `destination: "task"` → `apple-reminders` / `things-mac` / `notion` / etc.
Scrask never writes to a store directly. No service account JSON, no OAuth, no API keys for the
calendar/task layer — that's the destination skill's job.
## Invocation
Scrask is invoked in two ways. The platform tries explicit invocation first; if no alias matches, it falls back to the implicit trigger conditions.
### Explicit override (checked first)
If the user message begins with any of these aliases (case-insensitive, with or without a `@` or `/` prefix), the platform dispatches to Scrask regardless of the implicit conditions below:
- `scrask`
- `scrask this`
- `screenshot`
- `screenshot to calendar`
Examples that force-route to Scrask:
- `scrask this` (with an attached image)
- `@scrask` (with an attached image)
- `/scrask` (with an attached image)
- `screenshot to calendar` (with an attached image)
When invoked explicitly with no image attached, Scrask responds with a brief prompt asking the user to attach a screenshot, then stops. Do not run the parser without an image.
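
The explicit-override check can be sketched in a few lines of Python. This is only an illustration, not the platform's dispatcher: the helper name `matches_scrask_alias` is hypothetical, and the sketch assumes an alias must be followed by whitespace or the end of the message (so `screenshots` would not match `screenshot`), a detail the manifest does not specify.

```python
import re

# Aliases copied from the manifest's `invocation.aliases` list.
ALIASES = ("scrask", "scrask this", "screenshot", "screenshot to calendar")

# Optional "@" or "/" prefix, then an alias, then whitespace or end of message.
# Longer aliases are tried first so "scrask this" is preferred over "scrask".
_ALIAS_RE = re.compile(
    r"^[@/]?("
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")(?=\s|$)",
    re.IGNORECASE,
)


def matches_scrask_alias(message: str) -> bool:
    """Return True if the message begins with a Scrask alias (hypothetical helper)."""
    return _ALIAS_RE.match(message.strip()) is not None
```

A dispatcher built on this would still need to check for an attached image before running the parser.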
### Implicit (default, used when no alias matches)
The OpenClaw agent reads the incoming message and activates Scrask when:
1. The user sends a message in any connected chat surface that contains an **image attachment**.
2. The image appears to be a **screenshot** — not a photo of a person, place, or physical object.
3. No other skill has already claimed the image.
Do not activate (implicitly) for:
- Photos of people, places, food, scenery.
- Screenshots of code, errors, or UI bugs (leave for other skills).
- Images the user explicitly asks to edit, describe, or analyze for another purpose.
The implicit path is the one users will hit by default. The explicit aliases exist for two cases:
1. **Debugging / power-user override** — force Scrask to run on an ambiguous image the agent would otherwise route elsewhere (or skip).
2. **Recovery** — if the agent misses an obvious screenshot, the user can recover with `scrask this` instead of resending.
## Step-by-Step Instructions
### Step 1: Acknowledge Immediately
Reply on the user's current chat surface so they know the skill is working:
> "📸 Got it — analyzing your screenshot..."
### Step 2: Run the Parser
```bash
python3 {baseDir}/scripts/scrask_bot.py \
--image-path "<path-to-temp-image>" \
--provider "$CONFIG_VISION_PROVIDER" \
--timezone "$CONFIG_TIMEZONE" \
--confidence-threshold "$CONFIG_CONFIDENCE_THRESHOLD" \
--actionable-threshold "$CONFIG_ACTIONABLE_THRESHOLD" \
--type-threshold "$CONFIG_TYPE_THRESHOLD" \
--field-threshold "$CONFIG_FIELD_THRESHOLD"
```
The script reads credentials from the environment — never pass them on the command line.
In default `auto` mode it routes by what is available:
- `GEMINI_API_KEY` set → Gemini-first with Claude fallback (cheap + fast path).
- `ANTHROPIC_API_KEY` set (no Gemini key) → Claude only.
- Neither set → OpenClaw's configured vision LLM, read from the platform-injected env vars
`OPENCLAW_VISION_PROVIDER`, `OPENCLAW_VISION_KEY`, and optional `OPENCLAW_VISION_MODEL`.
So the skill works out of the box for any OpenClaw user with a vision-capable LLM
configured at the platform level. Bringing your own Gemini key only adds the cost-and-speed
optimisation on top.
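
The routing rules above can be summarised as a small decision function. This is a sketch under stated assumptions, not the code in `scrask_bot.py`: the function name `resolve_provider` and the returned labels (such as `"gemini+claude-fallback"`) are hypothetical and exist only to make the branches visible.

```python
import os
from typing import Mapping, Optional


def resolve_provider(vision_provider: str = "auto",
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Return an illustrative routing label for the given config and environment."""
    env = os.environ if env is None else env
    has_gemini = bool(env.get("GEMINI_API_KEY"))
    has_claude = bool(env.get("ANTHROPIC_API_KEY"))

    if vision_provider == "auto":
        if has_gemini:
            # Gemini first; Claude is a fallback only when its key is present.
            return "gemini+claude-fallback" if has_claude else "gemini"
        if has_claude:
            return "claude"
        # Neither key set: use the platform-level vision LLM.
        return "openclaw"

    if vision_provider not in ("gemini", "claude", "openclaw"):
        raise ValueError(f"unknown provider: {vision_provider}")
    # Pinned providers require their matching key.
    if vision_provider == "gemini" and not has_gemini:
        raise RuntimeError("--provider gemini requires GEMINI_API_KEY")
    if vision_provider == "claude" and not has_claude:
        raise RuntimeError("--provider claude requires ANTHROPIC_API_KEY")
    return vision_provider
```

Passing `env` explicitly keeps the function testable without touching the real environment.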
The script returns JSON with:
- `success` — whether parsing worked
- `no_actionable_content` — true if nothing actionable was found
- `actionable_confidence` — 0.0–1.0, how sure the parser is the screenshot is actionable
- `needs_actionable_confirmation` — true if `actionable_confidence` is in the maybe band;
the bot should confirm "is this actually an event or task?" before dispatching
- `items[]` — one entry per detected item with:
- `type`, `destination`, `confidence` (legacy aggregate), `type_confidence`
- `confidences{}` — per-field 0.0–1.0 scores (`title`, `date`, `time`, `location`,
`participants`, `description`, `priority`, …)
- `needs_confirmation` — true when there is at least one outstanding clarification
- `clarifications[]` — targeted questions to ask the user, e.g.
`{ "field": "time", "question": "What time is dinner with Priya?", "reason": "low_confidence" }`
- all the extracted fields (`title`, `date`, `time`, `location`, `participants`, etc.)
- `summary_text` — chat-ready preview of what was found; send this verbatim, do not rephrase
- `screenshot_summary`, `parse_notes` — context
### Step 3: Handle the Output
**If `no_actionable_content` is true:**
Silently ignore the screenshot. If the user clearly meant for Scrask to act on it (for example,
by using an explicit alias), reply with the `summary_text` field, which is a polite "couldn't find anything" message.
**If `success` is true:**
Send the `summary_text` value back to the user on the same chat surface. Then process each item.
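
As a rough illustration, the top-level branching on the parser output might look like the sketch below. The function name `next_action`, the `explicit` flag (true when the user invoked Scrask through an alias), and the returned action labels are all hypothetical; only the JSON field names come from the parser output described above.

```python
def next_action(result: dict, explicit: bool) -> str:
    """Map the parser's top-level flags to an illustrative action label."""
    if not result.get("success"):
        return "report_error"
    if result.get("no_actionable_content"):
        # Stay silent unless the user clearly asked Scrask to act.
        return "send_summary" if explicit else "ignore"
    if result.get("needs_actionable_confirmation"):
        # summary_text already asks "Is this actually an event or task?"
        return "confirm_actionable"
    # Send summary_text, then handle each entry in items[].
    return "process_items"
```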
### Step 4: Route Each Item to a Destination Skill
Check the top-level gate first, then handle every item in `items[]`:
**If `needs_actionable_confirmation: true` (top level):**
Send `summary_text` (which already opens with "Is this actually an event or task?") and wait for
the user. On "yes", proceed item-by-item below. On "no", reply "Got it, skipped ✓" and stop.
**For each item — if `needs_confirmation: false` (no outstanding clarifications):**
Invoke the appropriate destination skill **without** asking the user first.
- `destination: "calendar"` → invoke the user's installed calendar skill. Preference order:
`calctl` → `accli` → `apple-calendar` → `brainz-calendar` → `gcal-pro` → first available.
- `destination: "task"` → invoke the user's installed task skill. Preference order:
`apple-reminders` → `things-mac` → `notion` → first available.
Pass the item fields (`title`, `date`, `time`, `end_time`, `end_date`, `location`, `participants`,
`description`, `recurrence`, `online_link`, etc.) to whatever creation command that skill exposes.
If `end_date` is present and different from `date`, treat the item as a multi-day event.
**For each item — if `needs_confirmation: true`:**
The `clarifications[]` array lists the specific things to ask. Each entry has:
- `field` — which field needs clarification (e.g. `"time"`, `"date"`, `"type"`)
- `question` — the user-facing question (already pre-formatted with the item title)
- `reason` — `"missing"` (value is null) or `"low_confidence"` (extracted but uncertain) or
`"low_type_confidence"` (unsure whether this is a calendar event or a task)
The `summary_text` already renders these as a bullet list. Ask the user the questions in order
and patch the corresponding fields with their replies. Once every clarification is resolved,
route the item to the destination skill as above. If the user says **skip** at any point, drop
the item and confirm "Got it, skipped ✓".
For the special case of `field: "type"`, the user's reply determines whether the item routes to
`calendar` or `task` — update `destination` accordingly before dispatch.
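
The routing step can be sketched as two small helpers: one that picks a destination skill by preference order, and one that patches an item with a clarification reply. Both names (`pick_skill`, `apply_reply`) are hypothetical, and the sketch assumes a `type` reply has already been normalised to `"event"` or `"task"`. The "first available" fallback for skills outside the listed order is left out, because it depends on knowing which other installed skills can create calendar entries or tasks.

```python
from typing import Iterable, Optional

# Preference orders as listed in Step 4.
PREFERENCE = {
    "calendar": ["calctl", "accli", "apple-calendar", "brainz-calendar", "gcal-pro"],
    "task": ["apple-reminders", "things-mac", "notion"],
}


def pick_skill(destination: str, installed: Iterable[str]) -> Optional[str]:
    """Return the most-preferred installed skill for a destination, or None."""
    available = set(installed)
    for name in PREFERENCE[destination]:
        if name in available:
            return name
    return None


def apply_reply(item: dict, field: str, reply: str) -> dict:
    """Return a copy of the item with one clarification resolved."""
    patched = dict(item)
    if field == "type":
        # Assumption: reply is already normalised to "event" or "task".
        patched["type"] = reply
        patched["destination"] = "calendar" if reply == "event" else "task"
    else:
        patched[field] = reply
    # Drop the clarification that was just answered.
    remaining = [c for c in item.get("clarifications", []) if c["field"] != field]
    patched["clarifications"] = remaining
    patched["needs_confirmation"] = bool(remaining)
    return patched
```

Returning a patched copy rather than mutating the item keeps the original parser output available if the user later says **skip**.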
### Step 5: Confirm Saves
After each destination skill returns, relay a one-line confirmation to the user. Examples:
- `📅 Added to Calendar via calctl: **Team Standup** — 2026-03-01 at 09:00`
- `🔔 Added to Reminders: **Pay electricity bill** (due 2026-02-28)`
- `✅ Added to Things: **Send Sandip my resume**`
If the destination skill errors, surface the error and ask whether to retry with a different destination.
## Edge Cases
| Scenario | Behavior |
|---|---|
| Single screenshot has both an event and a task | Process each independently; route to its own destination. |
| Event implies a prep step (e.g. dinner at a restaurant → book table) | The parser emits BOTH an event and a prep reminder. Inferred fields on the prep reminder land in the 0.65–0.80 band, so most prep reminders hit `needs_confirmation: true` with targeted clarifications (typically `time` and `date`). |
| Multi-day event (trip, conference) | `end_date` is set and differs from `date`. Pass both to the calendar skill (e.g. `calctl add --date --end-date --all-day`). |
| Rescheduled / cancelled event | Parser extracts the NEW date; `parse_notes` flags it as a reschedule. Confirm with user before overwriting any existing entry. |
| Screenshot is in Hindi, Tamil, or another language | Title and description are already in English; `language` holds the ISO code. Save as-is. |
| Recurring event ("every Monday") | Pass `recurrence` and `recurrence_day` to the calendar skill. |
| Date has already passed | Flag in the reply: "⚠️ This date has already passed. Save anyway?" |
| Screenshot of someone's calendar | `already_in_calendar_hint: true` — reply: "Looks like this is already in your calendar 🗓️" and skip. |
| No calendar / task skill installed | Reply with the missing-skill hint and stop. |
| Zoom/Meet link found | Pass `online_link` to the calendar skill; it should set both location and description. |
| Meme / non-actionable screenshot | `no_actionable_content: true` — ignore silently unless user clearly asked for action. |
## Configuration
```json
{
"skills": {
"entries": {
"scrask-bot": {
"enabled": true,
"env": {
// Both keys are OPTIONAL in v4.2+. Without either, Scrask uses
// OpenClaw's configured vision LLM via the platform-injected
// OPENCLAW_VISION_* env vars. Setting GEMINI_API_KEY opts into
// the cheap+fast Gemini routing. Setting ANTHROPIC_API_KEY adds
// Claude as a fallback (or as the primary if no Gemini key).
"GEMINI_API_KEY": "AIza-your-gemini-key",
"ANTHROPIC_API_KEY": "sk-ant-your-key-here"
},
"config": {
"vision_provider": "auto",
"fallback_threshold": 0.60,
"timezone": "Asia/Kolkata",
"confidence_threshold": 0.75,
"actionable_threshold": 0.70,
"type_threshold": 0.70,
"field_threshold": 0.70
}
}
}
}
}
```
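Both keys are optional. With only `GEMINI_API_KEY` set, auto mode runs Gemini with no Claude fallback; with only `ANTHROPIC_API_KEY` set, it runs Claude only; with neither, it uses OpenClaw's configured vision LLM.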
## Permissions Required
- `image:read`: to access the screenshot from the chat surface.
- `network:outbound`: to call the vision model API (OpenClaw's configured vision LLM by default, or Gemini / Claude when their keys are set).
- `chat:reply`: to send confirmation messages back via the user's chat surface.
- Whatever permissions the downstream calendar / task skill needs (handled by that skill).
Scrask parses screenshots sent via any chat surface (Telegram, iMessage, Slack, etc.) for actionable events and tasks using a vision LLM. When the user sends an image, the skill extracts calendar events, reminders, and tasks, then emits structured intent for the agent to route to the user's installed calendar or task management skills. Use Scrask when you want to turn a screenshot of a calendar invite, meeting notes, or to-do list into actual entries in your calendar or task manager without manual re-entry.
**Required:**
**External connections:**
**Optional environment variables:**
- `GEMINI_API_KEY` (string): enables Gemini-first routing in auto mode and is required if you pin `--provider gemini`. Recommended for cost and speed.
- `ANTHROPIC_API_KEY` (string): enables Claude fallback in auto mode when Gemini confidence is low, and is required if you pin `--provider claude`.

**Platform-injected environment variables** (used in auto mode if no Gemini or Anthropic keys are set):
- `OPENCLAW_VISION_PROVIDER` (string): the platform's configured vision model (e.g. openai, vertex, bedrock).
- `OPENCLAW_VISION_KEY` (string): platform vision API credentials.
- `OPENCLAW_VISION_MODEL` (string, optional): specific model name.

**Configuration parameters:**
- `vision_provider` (string, default: `"auto"`): routing strategy. `"auto"` tries Gemini, then Claude, then OpenClaw; `"openclaw"` always uses the platform LLM; `"gemini"` or `"claude"` pins a specific provider.
- `fallback_threshold` (number, default: 0.60): per-field confidence floor for auto mode. If any field drops below this, Claude reruns the parse.
- `timezone` (string, default: `"UTC"`): IANA timezone for date inference when the screenshot contains no explicit timezone.
- `confidence_threshold` (number, default: 0.75): legacy aggregate gate, kept for backward compatibility.
- `actionable_threshold` (number, default: 0.70): top-level gate on whether the screenshot is actually an event or task. Below this, confirmation is triggered.
- `type_threshold` (number, default: 0.70): per-item gate on whether this is a calendar event or a task. Below this confidence, a type clarification is triggered.
- `field_threshold` (number, default: 0.70): per-mandatory-field gate. Below this confidence, or if the value is null, the parser emits a targeted clarification question.
### Step 1: Acknowledge Immediately
On receiving an image, reply on the user's current chat surface within 2 seconds:
> "📸 Got it, analyzing your screenshot..."

Do not wait for parsing to complete. The user needs to know the skill is active.
### Step 2: Extract the Image Path and Validate
### Step 3: Run the Parser Script
Execute the Python 3 Scrask parser:
```bash
python3 {baseDir}/scripts/scrask_bot.py \
  --image-path "<path-to-temp-image>" \
  --provider "$CONFIG_VISION_PROVIDER" \
  --timezone "$CONFIG_TIMEZONE" \
  --confidence-threshold "$CONFIG_CONFIDENCE_THRESHOLD" \
  --actionable-threshold "$CONFIG_ACTIONABLE_THRESHOLD" \
  --type-threshold "$CONFIG_TYPE_THRESHOLD" \
  --field-threshold "$CONFIG_FIELD_THRESHOLD"
```
The script reads all API credentials from environment variables. Never pass keys on the command line.
In auto mode, the script checks credentials in order:
1. If `GEMINI_API_KEY` is set, route to Gemini first. If Gemini returns low confidence (below `fallback_threshold` on any field), retry with Claude (if `ANTHROPIC_API_KEY` is set).
2. Otherwise, if `ANTHROPIC_API_KEY` is set, route to Claude only.
3. Otherwise, use OpenClaw's configured vision LLM, reading `OPENCLAW_VISION_PROVIDER`, `OPENCLAW_VISION_KEY`, and `OPENCLAW_VISION_MODEL` from the platform environment.
### Step 4: Parse the Script Output
The script returns JSON with these top-level fields:
- `success` (boolean): whether parsing completed without error.
- `no_actionable_content` (boolean): true if the screenshot contains no actionable event or task.
- `actionable_confidence` (number, 0.0-1.0): how confident the parser is that there is something actionable.
- `needs_actionable_confirmation` (boolean): true if `actionable_confidence` is in the uncertain band below `actionable_threshold`; user confirmation is required before routing.
- `items` (array): one entry per detected event or task, with:
  - `type` (string): `"event"` or `"task"`.
  - `destination` (string): `"calendar"` or `"task"`.
  - `confidence` (number): legacy aggregate score.
  - `type_confidence` (number, 0.0-1.0): confidence that this is a calendar event vs. a task.
  - `confidences` (object): per-field scores for `title`, `date`, `time`, `location`, `participants`, `description`, `priority`, `end_date`, `end_time`, `recurrence`, `online_link`.
  - `needs_confirmation` (boolean): true if there are outstanding clarifications.
  - `clarifications` (array): objects with `field` (e.g. `"time"`, `"date"`, `"type"`), `question` (user-facing prompt), and `reason` (`"missing"`, `"low_confidence"`, or `"low_type_confidence"`).
  - the extracted fields: `title`, `date`, `time`, `end_date`, `end_time`, `location`, `participants`, `description`, `recurrence`, `recurrence_day`, `online_link`, `priority`, `language`, `already_in_calendar_hint`, `is_rescheduled`.
- `summary_text` (string): chat-ready preview of what was found; send this verbatim to the user.
- `screenshot_summary` (string): internal context (date range, language, etc.).
- `parse_notes` (string): flags and context (e.g. "reschedule detected", "prep task inferred").
### Step 5: Handle No Actionable Content
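**If `no_actionable_content` is true:**
- On implicit invocation, silently ignore the screenshot.
- On explicit invocation, send `summary_text`.
### Step 6: Handle the Top-Level Confirmation Gate
**If `needs_actionable_confirmation` is true (`actionable_confidence` is uncertain):**
- Send `summary_text` and wait for the user's reply. Proceed only on "yes".
### Step 7: Route Each Item to Clarification or Destination
For each item in `items[]`:
**If `needs_confirmation` is false (no clarifications outstanding):**
- Route the item to its destination skill (step 8) without asking the user first.

**If `needs_confirmation` is true:**
- Walk the `clarifications` array in order.
- Send each `question` field to the user and wait for a reply.
- For `field: "type"` (unsure if event or task), the user's reply determines `destination`: update it to `"calendar"` or `"task"`.
### Step 8: Invoke the Destination Skill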
Determine which destination skill is installed and preferred, then pass the item fields to it:
**For `destination: "calendar"`:**
- Pass `title`, `date`, `time`, `end_time`, `end_date`, `location`, `participants`, `description`, `recurrence`, `recurrence_day`, `online_link`.
- If `end_date` is present and differs from `date`, set the event as multi-day (pass an `--all-day` flag if available, or let the skill infer all-day from a missing `end_time`).

**For `destination: "task"`:**
- Pass `title`, `date` (due date), `time`, `description`, `priority`.
### Step 9: Confirm the Save
After the destination skill returns:
**On success:**
**On error from the destination skill:**
### Step 10: Clean Up and Finish
After all items are processed (routed or skipped):
### Implicit vs. Explicit Invocation
When a user sends a message with an image, Scrask is invoked in two steps:
1. If the message begins with an alias (with or without an `@` or `/` prefix): `scrask`, `scrask this`, `screenshot`, or `screenshot to calendar`, route to Scrask immediately.
2. Otherwise, fall back to the implicit trigger conditions.
### Explicit Invocation with No Image
If the user says "scrask" or "screenshot to calendar" but attaches no image, reply "Please attach a screenshot" and stop. Do not proceed.
### Gemini vs. Claude Routing in Auto Mode
If `vision_provider` is `"auto"`:
- If `GEMINI_API_KEY` is set, send the image to Gemini and get back a parse with per-field confidence scores.
- If any per-field score is below `fallback_threshold` and `ANTHROPIC_API_KEY` is also set, immediately retry the entire parse with Claude.
- If only `ANTHROPIC_API_KEY` is set (no Gemini key), skip Gemini and use Claude only.

If `vision_provider` is `"openclaw"`, `"gemini"`, or `"claude"`, use only that provider. If the chosen provider's credentials are missing, reply "Vision provider {provider} not configured. Check your env vars" and stop.
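
The fallback decision itself reduces to a comparison of the worst per-field score against `fallback_threshold`. The sketch below is illustrative only: the function name `needs_claude_rerun` is hypothetical, and it assumes missing (`None`) scores are ignored rather than treated as zero, which the source does not state.

```python
from typing import Iterable, Mapping


def needs_claude_rerun(items: Iterable[Mapping],
                       fallback_threshold: float = 0.60,
                       has_anthropic_key: bool = False) -> bool:
    """True when any per-field score is below the floor and Claude is available."""
    if not has_anthropic_key:
        return False
    scores = [
        score
        for item in items
        for score in item.get("confidences", {}).values()
        if score is not None  # assumption: unscored fields are skipped
    ]
    # Only the single worst score matters for the rerun decision.
    return bool(scores) and min(scores) < fallback_threshold
```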
### No Actionable Content (Implicit vs. Explicit Context)
If `no_actionable_content` is true and the image was sent implicitly (no alias), silently ignore it.
If `no_actionable_content` is true and the user explicitly invoked Scrask (used an alias), send `summary_text` so the user knows the skill ran and found nothing.
### Actionable Confirmation Gate
If `needs_actionable_confirmation` is true, send `summary_text` (which opens with "Is this actually an event or task?") and wait. Only proceed if the user says "yes". Any other response (no, skip, or silence after 30 seconds) means drop the screenshot.
### Type Clarification (Event vs. Task)
If an item has `field: "type"` in `clarifications` (low `type_confidence`), ask the user "Is this a calendar event or a task?" and update `destination` accordingly (`"calendar"` or `"task"`). This determines which skill to route to in step 8.
### Multi-Item Handling
If a single screenshot yields both an event and a task, process them independently and route each to its own destination.
### Date-Has-Passed Check
After parsing, if any item has a `date` field in the past (earlier than the current date in the user's timezone), ask "This date has already passed. Save anyway?" before routing. Allow the user to correct the date, skip, or proceed anyway.
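
A minimal sketch of this check follows, assuming the parser emits `date` as an ISO `YYYY-MM-DD` string (the confirmation examples suggest this, but it is not stated outright). The helper names `today_in` and `date_has_passed` are hypothetical.

```python
from datetime import date, datetime
from zoneinfo import ZoneInfo


def today_in(timezone: str) -> date:
    """Current calendar date in the user's IANA timezone (e.g. "Asia/Kolkata")."""
    return datetime.now(ZoneInfo(timezone)).date()


def date_has_passed(item_date: str, today: date) -> bool:
    """True if an ISO date string is strictly earlier than `today`."""
    return date.fromisoformat(item_date) < today
```

Typical use would be `date_has_passed(item["date"], today_in(config_timezone))`. Computing "today" in the user's timezone matters near midnight, when the server's date and the user's date can differ.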
### Rescheduled Event
If `is_rescheduled` is true (or `parse_notes` flags a reschedule), ask the user "This looks like a rescheduled event. Should I update an existing entry?" before creating. This prevents duplicate entries.
### Already-in-Calendar Flag
If `already_in_calendar_hint` is true, reply "Looks like this is already in your calendar 🗓️" and skip without routing.
### No Destination Skill Installed
If the item routes to `"calendar"` but no calendar skill is installed, reply "No calendar skill found. Install one of: calctl, accli, apple-calendar, brainz-calendar, gcal-pro" and stop.
If the item routes to `"task"` but no task skill is installed, reply "No task skill found. Install one of: apple-reminders, things-mac, notion" and stop.
### Destination Skill Error Handling
If the destination skill returns an error (timeout, auth failure, rate limit, network error), surface the error to the user and ask "Retry with a different destination, or skip?" On retry, try the next skill in the preference order. If all skills fail or are unavailable, ask to skip.
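
One way to picture the retry behaviour is a loop over the preference order that asks the user before each retry. Everything here is a hypothetical sketch: `dispatch_with_failover`, the `create` callback (which stands in for invoking a destination skill) and the `ask_retry` callback (which stands in for the chat prompt) are not part of any real skill API.

```python
from typing import Callable, Iterable, Optional


def dispatch_with_failover(item: dict,
                           candidates: Iterable[str],
                           create: Callable[[str, dict], None],
                           ask_retry: Callable[[str, Exception], bool]) -> Optional[str]:
    """Try skills in preference order; return the one that succeeded, or None."""
    names = list(candidates)
    for position, name in enumerate(names):
        try:
            create(name, item)
            return name
        except Exception as exc:  # timeout, auth failure, rate limit, network error
            is_last = position == len(names) - 1
            # Stop when nothing is left to try or the user chooses to skip.
            if is_last or not ask_retry(name, exc):
                return None
    return None
```

A `None` result corresponds to the "ask to skip" outcome described above.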
### Network Timeout
If the vision API call times out after 30 seconds, reply "Parsing took too long. Please try again" and stop. Do not retry automatically.
### API Rate Limit
If the vision API returns a 429 (rate limit), reply "I'm hitting rate limits. Try again in a few minutes" and stop.
### Image Too Large
If the image is over 10 MB, reply "Screenshot is too large. Please resize and resend" and stop.
### Language Detection
If the parser detects a non-English language in the screenshot (field `language`), the title and description are already translated to English. Save as-is. Include a note in the confirmation, "Note: translated from {language}", if helpful.
### Prep Task Inference
If the parser infers a prep task from an event (e.g. dinner at a restaurant implies "book table"), two items are emitted: the event and the reminder. The prep reminder typically has low field confidence (0.65-0.80 range) and will hit `needs_confirmation: true`. Ask the standard clarifications before routing.
### Recurring Event
If `recurrence` is set (e.g. "every Monday") and `recurrence_day` is populated, pass both to the calendar skill's create command.
### Multi-Day Event
If `end_date` is present and differs from `date`, the event spans multiple days. Pass both `--date` and `--end-date` to the calendar skill, or set the `--all-day` flag if the skill supports it.
### Online Link (Zoom, Meet, etc.)
If `online_link` is detected, pass it to the calendar skill. The skill should set it in both the location and description fields if possible.
**Success path:**
- The script returns `success: true`.
- If `no_actionable_content` is true (and implicit invocation), no further output.
- If `no_actionable_content` is true (and explicit invocation), send `summary_text`.
- If `no_actionable_content` is false and `needs_actionable_confirmation` is true, send `summary_text` and await user confirmation.
- For items with `needs_confirmation: true`, send each clarification question from the `clarifications` array, collect user replies, and patch the item.

**Error path:**
**Data format and retention:**
**User confirms the skill worked:**
**Success without user action required:**
If a screenshot has no clarifications needed (all per-field confidence scores are above `field_threshold` and `type_confidence` is above `type_threshold`), the item is routed to the destination skill immediately after the summary is sent. The user sees the confirmation in their chat and the entry in their calendar/task app, with no extra prompts.
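
The "no extra prompts" condition can be expressed as a small predicate. This is a sketch under assumptions, not the parser's logic: the function name `routes_without_prompt` is hypothetical, and because the source never enumerates the mandatory fields, `title` and `date` are used as placeholders.

```python
from typing import Iterable, Mapping

# Assumption: placeholder list; the manifest does not name the mandatory fields.
MANDATORY_FIELDS = ("title", "date")


def routes_without_prompt(item: Mapping,
                          field_threshold: float = 0.70,
                          type_threshold: float = 0.70,
                          mandatory: Iterable[str] = MANDATORY_FIELDS) -> bool:
    """True when an item clears the type gate and every mandatory-field gate."""
    if item.get("type_confidence", 0.0) < type_threshold:
        return False
    confidences = item.get("confidences", {})
    for field in mandatory:
        # A null value always needs a clarification, whatever its score.
        if item.get(field) is None:
            return False
        if confidences.get(field, 0.0) < field_threshold:
            return False
    return True
```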