---
name: image-generation
description: Single-gateway image generation CLI for async text-to-image and image-to-image, with polling, download handling, and request alignment to the current gateway OpenAPI. Use when the user asks to generate an image, create a picture, draw something, or make a visual from a text prompt.
version: 0.5.0
metadata: { "pattern": ["tool-wrapper"], "openclaw": { "emoji": "🎨", "primaryEnv": "IMAGE_GEN_API_KEY", "requires": { "env": ["IMAGE_GEN_API_KEY"], "anyBins": ["bun", "npx"], "bins": ["node", "npm"] } } }
---
# Image Generation (`image-generation`)
Generate an image, create a picture, draw something, or make a visual from a text prompt — all through a single CLI gateway with async polling and automatic download.
This skill follows a **single aggregated gateway backend** model. One API key is used locally, while multiple models are aggregated behind the gateway. The CLI wraps the full async flow: submit task -> poll lowercase `task_status` -> fetch `data.images`. The backend platform is a **fixed implementation choice**, not a configurable provider switch. If another platform is needed, publish a separate skill for it.
The design goal is smooth automation for common workflows such as single-image generation, multi-file prompts, image-to-image with references, batch jobs, EXTEND defaults, and reusable style presets. Full gateway-alignment notes are in [references/weryai-platform.md](references/weryai-platform.md).
## Safety & Scope
- **Network**: This skill calls the WeryAI gateway over HTTPS (`https://api.weryai.com`).
- **Auth**: Uses `IMAGE_GEN_API_KEY`. The key is never printed. It may be persisted **only** when you explicitly run `npm run setup -- --persist-api-key`.
- **Reference images**: Must be public URLs (`https://` recommended). `http://` may work but is insecure. Local file paths and `data:` URLs are rejected.
- **No arbitrary shell**: The generation runtime does not execute arbitrary shell commands.
- **Files written**: Output images and optional local config under `.image-skills/image-generation/` (project) and/or `~/.image-skills/image-generation/` (home).
## Current Gateway Contract (WeryAI)
| Item | Contract |
| --- | --- |
| Base URL | `https://api.weryai.com` (hard-coded in `scripts/main.ts`) |
| Auth | `Authorization: Bearer` via **`IMAGE_GEN_API_KEY`** ([get a key](https://weryai.com/api/keys)) |
| text-to-image | `POST /v1/generation/text-to-image`; requires `model`, `prompt`, `aspect_ratio` |
| image-to-image | `POST /v1/generation/image-to-image`; also requires `images[]` |
| status lookup | `GET /v1/generation/{taskId}/status`; `task_status` is `waiting`, `processing`, `succeed`, or `failed` |
| business success | **`success: true`** (or `status: 200`); failures return business codes such as `1001`, `1002`, `1003` |
| text length | the script validates `prompt` and `negative_prompt` lengths before request submission |
| result download | after `succeed`, images are fetched by URL; the script uses timeout, retry, backoff, optional Bearer retry, and minimum-payload validation |
See [references/weryai-platform.md](references/weryai-platform.md) for field mapping, model lookup guidance, and troubleshooting flow.
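To make the contract concrete, here is a minimal TypeScript sketch of how a client could choose the endpoint and assemble the JSON body. The `GenerateOptions` shape, the helper names `buildRequest` and `toFetchArgs`, and the `Content-Type` header are assumptions for illustration. This is not the code in `scripts/main.ts`.

```typescript
// Hypothetical caller-side options. Only the JSON field names written into `body`
// (model, prompt, aspect_ratio, images, negative_prompt, image_number) come from the contract.
interface GenerateOptions {
  model: string;
  prompt: string;
  aspectRatio: string;
  refs?: string[]; // public reference image URLs; presence selects image-to-image
  negativePrompt?: string;
  imageNumber?: number;
}

interface GatewayRequest {
  path: string;
  body: Record<string, unknown>;
}

const BASE_URL = "https://api.weryai.com";

function buildRequest(opts: GenerateOptions): GatewayRequest {
  const body: Record<string, unknown> = {
    model: opts.model,
    prompt: opts.prompt,
    aspect_ratio: opts.aspectRatio,
  };
  if (opts.negativePrompt) body["negative_prompt"] = opts.negativePrompt;
  if (opts.imageNumber !== undefined) body["image_number"] = opts.imageNumber;

  // Reference images switch the request to the image-to-image endpoint.
  const refs = opts.refs ?? [];
  if (refs.length > 0) body["images"] = refs;
  return {
    path: refs.length > 0 ? "/v1/generation/image-to-image" : "/v1/generation/text-to-image",
    body,
  };
}

// Assemble arguments for fetch(); the Content-Type header is an assumption.
function toFetchArgs(req: GatewayRequest, apiKey: string) {
  return {
    url: BASE_URL + req.path,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(req.body),
    },
  };
}
```

Handing `url` and `init` to `fetch` would submit the task. Extracting the task ID from the submit response is left out on purpose, because the response envelope is not spelled out here.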
## Step 0: First-Trigger Readiness Gate
When this skill is triggered for the **first time** in a project or environment, the agent must not jump straight to model selection or generation. It must do this first:
1. **Check local readiness silently** with `npm run ensure-ready -- --project . --workflow <workflow>`
2. **If runtime dependencies are missing**, propose to install them on behalf of the user
3. **If `IMAGE_GEN_API_KEY` is missing**, ask the user for permission to configure it now
4. Only after readiness and API key are resolved, continue to model selection / prompt clarification / generation
**Access token**: only use **`IMAGE_GEN_API_KEY`**. It may also live in `.image-skills/image-generation/.env` as `IMAGE_GEN_API_KEY=...`. After the user approves, the agent may persist it there by running `npm run setup -- --project . --workflow <workflow> --persist-api-key` when the key is already in env, or by writing the file locally on the user's behalf instead of asking the user to edit files manually.
**First-use readiness check**: before the first generation run in a new OpenClaw or local instance, the agent must run:
```bash
npm run ensure-ready -- --project . --workflow <workflow>
```
This readiness step is not optional. It checks the local toolchain, reads the local doctor report, and automatically runs `bootstrap` when local script dependencies are missing.
**First-trigger user behavior**:
- If dependencies are missing: ask for approval to install them, then install silently
- If `IMAGE_GEN_API_KEY` is missing: tell the user image generation needs an API key and offer to configure it now by writing `.image-skills/image-generation/.env` on the user's behalf
- Do not ask the user to debug the environment before this readiness gate runs
- Do not ask the user to choose a model before readiness and API key are resolved
- Treat API keys as secrets: prefer writing them locally on the user's behalf, never echo them back, and never include them in normal progress messages
**`EXTEND.md`** is optional and can hold default model, quality, aspect ratio, and batch worker limits.
```bash
test -f .image-skills/image-generation/EXTEND.md && echo project
test -f "$HOME/.image-skills/image-generation/EXTEND.md" && echo user
```
**Default model on initialization**: if no model is configured yet, initialize this skill with **Nano Banana 2** (`GEMINI_3_1_FLASH_IMAGE`) and tell the user that it is now the default. Also remind them they can switch models anytime later.
**Model selection**: after initialization, one of these provides the active default:
- `--model`
- `default_model` in `EXTEND.md`
- `IMAGE_GEN_DEFAULT_MODEL`
If none of them is set yet, the agent should initialize the local default to **Nano Banana 2** first, tell the user that this workspace now defaults to that model, and remind them they can switch anytime if another model fits better. See [references/config/first-time-setup.md](references/config/first-time-setup.md), [references/config/preferences-schema.md](references/config/preferences-schema.md), and [references/config/model-registry-schema.md](references/config/model-registry-schema.md).
**Style presets**: [references/style-presets.md](references/style-presets.md)
Model priority:
`--model` -> `EXTEND.md default_model` -> `IMAGE_GEN_DEFAULT_MODEL`
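The priority chain, plus the first-run fallback to Nano Banana 2, can be sketched as a small pure function. The `ModelSources` shape and the `source` tags are hypothetical. The function only reports `initialized`; writing `EXTEND.md` and telling the user stay with the agent.

```typescript
// Nano Banana 2, the documented initialization default.
const NANO_BANANA_2 = "GEMINI_3_1_FLASH_IMAGE";

interface ModelSources {
  cliModel?: string; // --model / -m
  extendDefault?: string; // default_model from EXTEND.md
  envDefault?: string; // IMAGE_GEN_DEFAULT_MODEL
}

type ModelSource = "cli" | "extend" | "env" | "initialized";

function resolveModel(src: ModelSources): { model: string; source: ModelSource } {
  // Treat empty or whitespace-only values as unset.
  const clean = (v?: string): string | undefined => (v && v.trim() ? v.trim() : undefined);
  const ordered: [string | undefined, ModelSource][] = [
    [clean(src.cliModel), "cli"],
    [clean(src.extendDefault), "extend"],
    [clean(src.envDefault), "env"],
  ];
  for (const [model, source] of ordered) {
    if (model) return { model, source };
  }
  // Nothing configured: the agent should write EXTEND.md and tell the user.
  return { model: NANO_BANANA_2, source: "initialized" };
}
```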
## If No Model Was Specified
When the user wants image generation but no model is configured (`EXTEND.md`, `--model`, and `IMAGE_GEN_DEFAULT_MODEL` are all absent):
**Do not ask the user to read docs or edit config files.** Follow the guided flow in [references/config/first-time-setup.md](references/config/first-time-setup.md) § "Model Selection — Agent-Guided Flow":
1. Start from the bundled starter registry shipped with this skill. If the model list looks stale or a requested model is missing, refresh it silently with `npm run discover-image-models -- --out .image-skills/image-generation/MODELS.json`
2. Initialize local defaults with **Nano Banana 2** (`GEMINI_3_1_FLASH_IMAGE`) by writing `EXTEND.md`
3. Tell the user that the default model is now **Nano Banana 2**, and explicitly remind them they can switch anytime later if they need another model
4. Continue to prompt clarification and generation
5. If the user immediately asks for another model, rank candidates for the workflow → `npm run recommend-model -- --workflow <workflow> --role <role> --json`, then update `EXTEND.md`
If the workflow is more specific than a generic single image, prefer a role-aware recommendation such as `comic-page`, `comic-character-sheet`, `infographic-dense`, or `article-framework` instead of only passing the broad workflow name.
If the user later asks to switch models, update `EXTEND.md` in place — do not ask the user to edit it manually.
If the gateway returns "model does not exist" or a similar model-key error, the agent must first refresh the model list from the WeryAI docs and rerun the model recommendation before asking the user for any platform-side information.
## If The Request Is Underspecified
When the user gives only a rough idea and the visual brief is still weak, do not jump straight into prompt writing. Use the local brief helper first:
```bash
npm run build-visual-brief -- --workflow cover --topic "Habit systems"
```
Use its question menu to ask about at least:
1. Photorealistic or illustration
2. Color temperature / palette direction
3. Shot language (close-up, wide, etc.)
4. Composition pattern
5. Aspect ratio
6. Text density
Then map the answered brief into the final prompt or workflow files.
Do **not** tell the user to go read the docs alone. If the CLI fails because the model is missing, the agent should complete this selection flow and retry.
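The brief format produced by `build-visual-brief` is not described here, so the following is a hypothetical sketch only. It uses an invented `VisualBrief` type covering the six questions above and shows one possible way to fold the answers into a prompt string. The fragment wording is an assumption.

```typescript
// Hypothetical shape for an answered brief; not the helper's real output format.
interface VisualBrief {
  subject: string;
  rendering: "photorealistic" | "illustration";
  palette?: string; // color temperature / palette direction
  shot?: string; // shot language, e.g. "close-up", "wide"
  composition?: string;
  aspectRatio: string;
  textDensity: "none" | "light" | "dense";
}

function briefToPrompt(b: VisualBrief): { prompt: string; ar: string } {
  const parts: string[] = [b.subject, b.rendering === "photorealistic" ? "photorealistic photo" : "illustration"];
  if (b.palette) parts.push(`${b.palette} palette`);
  if (b.shot) parts.push(`${b.shot} shot`);
  if (b.composition) parts.push(`${b.composition} composition`);
  parts.push(b.textDensity === "none" ? "no text in the image" : `${b.textDensity} on-image text`);
  // Aspect ratio goes to --ar rather than into the prompt text.
  return { prompt: parts.join(", "), ar: b.aspectRatio };
}
```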
## Workflow
1. On first trigger, run the readiness gate: check dependencies, bootstrap missing local tooling, and confirm `IMAGE_GEN_API_KEY`.
2. Confirm a default model is configured (via `EXTEND.md` or environment), or enter the guided model-selection flow.
3. Build the prompt: inline text (`--prompt`) or assembled from files (`--promptfiles`).
4. Choose a style preset if applicable (`--style`).
5. Run the CLI to submit the task, poll for completion, and download the result.
6. For multiple images, use batch mode (`--batchfile`) with parallel jobs.
## Script
`{baseDir}` is the directory containing this file. `${BUN_X}` is either `bun` or `npx -y bun`.
| Path | Purpose |
| --- | --- |
| `{baseDir}/scripts/main.ts` | the only execution entrypoint |
## Usage
```bash
# examples only; M should be chosen by the user or resolved by the agent
M=<chosen model key>
# single image
${BUN_X} {baseDir}/scripts/main.ts --prompt "a cat" --image cat.png --ar 1:1 -m "$M"
# prompt assembled from multiple files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md user.md --image out.png --ar 16:9 -m "$M"
# style preset
${BUN_X} {baseDir}/scripts/main.ts --prompt "city nightscape" --style cinematic --image out.png --ar 16:9 -m "$M"
# image-to-image with reference
${BUN_X} {baseDir}/scripts/main.ts --prompt "turn it into cyberpunk" --image out.png --ref src.png --ar 1:1 -m "$M"
# quality / resolution
${BUN_X} {baseDir}/scripts/main.ts --prompt "poster" --image poster.png --ar 16:9 --quality 2k -m "$M"
${BUN_X} {baseDir}/scripts/main.ts --prompt "poster" --image poster.png --ar 16:9 --imageSize 2K -m "$M"
# infer aspect from size when --ar is omitted
${BUN_X} {baseDir}/scripts/main.ts --prompt "scene" --image scene.png --size 1280x720 -m "$M"
# batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json
# do not write downloaded images to disk
${BUN_X} {baseDir}/scripts/main.ts --prompt "abstract" --image dummy.png --ar 1:1 --no-download -m "$M"
# dry-run request preview; --image is optional here
${BUN_X} {baseDir}/scripts/main.ts --prompt "test" --ar 1:1 -m "$M" --dry-run
```
### Batch JSON Example
```json
{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "model": "<model key from gateway docs>",
      "style": "editorial",
      "ar": "16:9",
      "quality": "2k",
      "use_web_search": false
    },
    {
      "id": "edit",
      "prompt": "turn it into watercolor",
      "image": "out/edit.png",
      "ref": ["assets/in.png"],
      "negative_prompt": "blurry",
      "webhook_url": "https://example.com/hook"
    }
  ]
}
```
Paths are resolved relative to the **batch file directory**. If a `provider` field exists, it is ignored in single-gateway mode. Task-level optional fields include `style`, `webhook_url`, `negative_prompt`, `resolution`, `use_web_search`, and `caller_id`.
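A minimal sketch of that path rule, using Node's `path` module. The `BatchTask` type is a hypothetical subset of the fields in the example above, and leaving `http(s)://` reference URLs untouched is an assumption about how the script treats remote refs.

```typescript
import * as path from "node:path";

// Hypothetical task shape mirroring the example batch file.
interface BatchTask {
  id: string;
  prompt?: string;
  promptFiles?: string[];
  image: string;
  ref?: string[];
}

const isRemote = (p: string): boolean => /^https?:\/\//i.test(p);

function resolveTaskPaths(batchFile: string, task: BatchTask): BatchTask {
  // All relative paths are anchored at the directory that holds the batch file.
  const baseDir = path.dirname(path.resolve(batchFile));
  const abs = (p: string): string => path.resolve(baseDir, p);
  return {
    ...task,
    image: abs(task.image),
    promptFiles: task.promptFiles?.map(abs),
    // URLs pass through untouched; only file paths are re-anchored.
    ref: task.ref?.map((r) => (isRemote(r) ? r : abs(r))),
  };
}
```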
## Main Options
| Option | Description |
| --- | --- |
| `--prompt` / `-p` | prompt text |
| `--promptfiles` | concatenate multiple files into the prompt |
| `--image` / `-o` / `--output` | output path in single-task mode; defaults to `.png` if no extension is given |
| `--batchfile` | batch JSON file |
| `--jobs` | worker count |
| `--model` / `-m` | model key; one of CLI / `EXTEND.md` / `IMAGE_GEN_DEFAULT_MODEL` must provide it |
| `--style` | style preset, see [style-presets.md](references/style-presets.md) |
| `--ar` / `--aspect-ratio` | aspect ratio; defaults to `EXTEND.md` or `1:1` |
| `--size` | `WIDTHxHEIGHT`; can infer aspect ratio if `--ar` is omitted |
| `--quality` | `normal` or `2k`, mapped into gateway resolution |
| `--imageSize` | `1K`, `2K`, or `4K` |
| `--resolution` | pass resolution directly to the API |
| `--negative-prompt` | negative prompt |
| `--ref` / `--reference` | reference image, either URL or local file |
| `--n` | `image_number` (default: 1) |
| `--webhook-url` | gateway `webhook_url` |
| `--use-web-search` | set `use_web_search: true` |
| `--caller-id` | gateway `caller_id` |
| `--poll-interval-ms` / `--poll-timeout-ms` | polling controls |
| `--no-download` | skip file writing |
| `--dry-run` | print the final request body instead of calling the API |
| `--json` | JSON summary or batch report |
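`--size` only implies an aspect ratio. One plausible way to derive it is to reduce width and height by their greatest common divisor, as in this sketch. The helper names are hypothetical, and letting `--size` take precedence over the `EXTEND.md` default is an assumption; the real script may also snap to the ratios the gateway accepts.

```typescript
function gcd(a: number, b: number): number {
  // Euclid's algorithm on non-negative integers.
  while (b !== 0) [a, b] = [b, a % b];
  return a;
}

// "1280x720" -> "16:9"
function aspectFromSize(size: string): string {
  const m = /^(\d+)x(\d+)$/i.exec(size.trim());
  if (!m) throw new Error(`--size must be WIDTHxHEIGHT, got "${size}"`);
  const w = Number(m[1]);
  const h = Number(m[2]);
  if (w === 0 || h === 0) throw new Error("--size dimensions must be positive");
  const g = gcd(w, h);
  return `${w / g}:${h / g}`;
}

// Explicit --ar wins, then --size, then the EXTEND.md default, then 1:1.
function resolveAspectRatio(ar?: string, size?: string, extendDefault?: string): string {
  if (ar) return ar;
  if (size) return aspectFromSize(size);
  return extendDefault ?? "1:1";
}
```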
## Parallelism and Retry
- Batch jobs use gated concurrency through `--jobs`, with environment overrides such as `BASE_IMAGE_GEN_CONCURRENCY` and `BASE_IMAGE_GEN_START_INTERVAL_MS`.
- A single task retries up to 3 times, but obvious parameter or auth errors are not retried.
- Batch worker caps come from `batch.max_workers` in `EXTEND.md` or `BASE_IMAGE_MAX_WORKERS`.
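The retry policy can be sketched as a generic wrapper with exponential backoff. The `retryable: false` tag used to mark parameter and auth errors, the one-second base delay, and reading "up to 3 times" as three attempts in total are all assumptions.

```typescript
// Errors tagged `retryable: false` (e.g. parameter or auth errors) fail fast.
function isNonRetryable(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { retryable?: unknown }).retryable === false;
}

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  maxAttempts = 3, // assumption: three attempts in total
  baseDelayMs = 1000, // hypothetical backoff base
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (isNonRetryable(err)) throw err;
      lastError = err;
      // Exponential backoff between attempts: base, 2x base, 4x base, ...
      if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```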
## Environment Variables
| Variable | Description |
| --- | --- |
| `IMAGE_GEN_API_KEY` | the only API key variable |
| `IMAGE_GEN_DEFAULT_MODEL` | default model key |
| `BASE_IMAGE_MAX_WORKERS` | override batch worker limit |
| `BASE_IMAGE_GEN_CONCURRENCY` | batch concurrency |
| `BASE_IMAGE_GEN_START_INTERVAL_MS` | minimum delay between batch task starts |
Supported `.env` locations:
- `<cwd>/.image-skills/.env`
- `$HOME/.image-skills/.env`
Existing environment variables are not overwritten.
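Loading `.env` files without overwriting existing variables might look like this sketch. The parser handles only simple `KEY=value` lines with optional quotes (an assumption about the files this skill writes), and `applyDotenv` returns key names only, so the API key value is never logged.

```typescript
// Parse simple KEY=value lines; blank lines and # comments are skipped.
function parseDotenv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    if (value.length >= 2 && (value[0] === '"' || value[0] === "'") && value[value.length - 1] === value[0]) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// Copy only keys that are not already set; return the applied key names, never values.
function applyDotenv(vars: Record<string, string>, env: Record<string, string | undefined>): string[] {
  const applied: string[] = [];
  for (const [key, value] of Object.entries(vars)) {
    if (env[key] === undefined) {
      env[key] = value;
      applied.push(key);
    }
  }
  return applied;
}
```

Calling `applyDotenv(parseDotenv(text), process.env)` for the project file first and the home file second would let project values win, since later files cannot overwrite keys that are already set. That ordering is an assumption.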
## Agent Notes
1. On first use in a new environment, run `npm run ensure-ready -- --project . --workflow <workflow>` from this skill directory before generation. Do not skip this just because direct API calls appear to work.
2. Confirm that `IMAGE_GEN_API_KEY` is available, directly or through `.image-skills/.env`.
3. If the model is missing, complete the model-selection flow above before running the CLI.
4. Single-task mode usually requires `--image`, but `--dry-run` may omit it. If `--ref` is present, the script uses the image-to-image endpoint.
5. Polling behavior is: `waiting` / `processing` -> keep polling, `succeed` -> use `images`, `failed` -> inspect `msg`.
6. The selected model key is printed to stderr for easier debugging.
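The polling rule in note 5 maps onto a small loop. In this sketch the HTTP call is injected as `getStatus`, which returns a normalized snapshot; how that snapshot is read from the real response JSON (for example whether `images` sits under `data`) is an assumption left to the caller. `pollTask` and `StatusSnapshot` are hypothetical names.

```typescript
type TaskStatus = "waiting" | "processing" | "succeed" | "failed";

// Normalized view of one status response.
interface StatusSnapshot {
  status: TaskStatus;
  images?: string[];
  msg?: string;
}

type PollResult =
  | { kind: "succeed"; images: string[] }
  | { kind: "failed"; msg: string }
  | { kind: "timeout" };

async function pollTask(
  taskId: string,
  getStatus: (taskId: string) => Promise<StatusSnapshot>,
  intervalMs: number,
  timeoutMs: number,
): Promise<PollResult> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const snap = await getStatus(taskId);
    if (snap.status === "succeed") return { kind: "succeed", images: snap.images ?? [] };
    if (snap.status === "failed") return { kind: "failed", msg: snap.msg ?? "unknown error" };
    // waiting / processing: keep polling until the deadline passes.
    if (Date.now() >= deadline) return { kind: "timeout" };
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A real caller would pass the `--poll-interval-ms` and `--poll-timeout-ms` values here and use the `timeout` result to tell the user the task is taking longer than usual.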
## User Feedback During Generation
- **Before generation starts**: tell the user what's about to happen, including the **model being used** (e.g., "Generating your cover with Model X, roughly 30 seconds"). Never start silently.
- **Model disclosure**: always mention the model name/key when generation begins. This helps the user understand quality expectations and makes later model switching easier.
- **Batch generation**: estimate the total count and rough time. As images complete, show them incrementally — do not wait for the entire batch to finish before showing anything.
- **On failure**: do not dump error codes. Translate the failure into a user-level explanation and suggest a fix (e.g., "This model is temporarily unavailable — want to try another one?").
- **On timeout**: if polling exceeds the expected window, proactively inform the user (e.g., "Taking longer than usual, still waiting") rather than staying silent.
- **On completion**: send/display the image immediately. Never output just a filename — the user must see the actual image. If the platform supports file sending, send the file. If it supports inline rendering, render inline. If neither works, provide a download URL.
## Interaction Rules (Mandatory)
These rules apply to every user interaction. They are not optional guidelines.
1. **Never show commands, file paths, config syntax, or schema details to the user.** All tool execution is silent.
2. **Never ask the user to edit config files.** The agent writes `EXTEND.md`, `MODELS.json`, and other configs on behalf of the user.
3. **Ask one question at a time.** Do not present all dimensions at once. Lead with the highest-impact choice.
4. **Offer concrete options with descriptions**, not raw technical names. If presenting styles, describe what each looks like.
5. **Always disclose the model and estimated time** when generation starts.
6. **Show images directly** when delivering results. Never respond with only file paths or file names. If the channel supports sending image files, send the file. If it supports inline display, display inline. If neither is possible, provide a clickable download link. A bare filename like `output.png` is never acceptable delivery.
7. **On iteration**, only re-generate what changed. Do not restart the entire workflow.
8. **If tools are missing**, propose to install them on behalf of the user — never expose tool names, PATH errors, or ask the user to run install commands. See "Missing Tools" below.
9. **Default to Nano Banana 2 unless the user asks otherwise.** When no default model is configured yet, initialize `GEMINI_3_1_FLASH_IMAGE` (Nano Banana 2), tell the user that it is now the default for this skill, and explicitly remind them they can switch any time if another model fits better.
10. **Reply in the user's language.** Match the language the user is using for questions, status updates, explanations, and delivery messages unless they explicitly ask you to switch.
11. **On first trigger, run readiness before anything else.** Check dependencies and `IMAGE_GEN_API_KEY` status before asking the user to choose a model or before starting generation.
12. **Handle API keys as secrets.** Offer to configure `IMAGE_GEN_API_KEY` locally for the user, avoid asking them to paste secrets into normal chat when a local write is possible, and never repeat or quote the key back.
## Re-run Behavior
- Running `main.ts` with the same `--image` path overwrites the existing output file.
- `--batchfile` re-processes all tasks in the batch; completed images are overwritten.
- `--dry-run` never writes files regardless of prior state.
- Polling a previously completed `taskId` returns the cached result without re-generating.
## Definition of Done
- `IMAGE_GEN_API_KEY` is configured and a default model is set in `EXTEND.md`.
- The CLI produces a valid image file (or a correct `--dry-run` request body).
- Batch mode completes all tasks and reports per-task status via `--json`.
- Failed tasks include a user-readable explanation, not just an error code.
- The model name is disclosed to the user before generation starts.
## Missing Tools
If `bun` and `npx` are both unavailable in the current environment:
1. **Do not expose tool names or PATH errors to the user.** Never mention "bun", "npx", "PATH", or installation commands.
2. **Propose installation in user-friendly terms.** Say something like: "I need to set up a small runtime to generate images — OK if I install it now? Takes about 30 seconds."
3. **On user approval**, run `npm install -g bun` or the platform-appropriate installer silently. Report success or failure in plain language.
4. **If the user declines or installation fails**, explain honestly: "I can't generate images in this environment right now. You can ask your admin to set up the image toolchain, or try again in a different environment."
5. **Never ask the user to install tools themselves** or paste terminal commands.
## Troubleshooting
See [references/weryai-platform.md](references/weryai-platform.md). For failed calls, compare the request and response with [call history](https://weryai.com/api/history).
## image-generation-2

Generate images from text prompts or image-to-image transformations through a single CLI gateway. The skill handles async task submission, polling for completion, and automatic file download. Use it when the user asks to generate, create, draw, visualize, or transform images.

This skill uses a fixed WeryAI gateway backend (no configurable provider switching). One API key handles auth; multiple models are aggregated server-side. The design goal is smooth automation for single images, batch jobs, style presets, reference images, and reusable defaults.

Use this skill when the user requests image generation, visual creation, artwork, or visual transformation from text or reference images. It covers text-to-image workflows (simple prompts, multi-file briefs, style presets) and image-to-image workflows (reference-based edits). Do not use it for video generation, 3D models, or non-visual content.
**Configuration and environment**

- `IMAGE_GEN_API_KEY`: required. WeryAI API Bearer token, available at https://weryai.com/api/keys. It may live in the environment, `<cwd>/.image-skills/image-generation/.env`, or `$HOME/.image-skills/image-generation/.env`. The agent handles setup; the user never pastes keys into chat.
- `IMAGE_GEN_DEFAULT_MODEL`: optional. Fallback model key when neither CLI `--model` nor `EXTEND.md` specifies one. Defaults to `GEMINI_3_1_FLASH_IMAGE` (Nano Banana 2) on first init.
- `BASE_IMAGE_MAX_WORKERS`, `BASE_IMAGE_GEN_CONCURRENCY`, `BASE_IMAGE_GEN_START_INTERVAL_MS`: optional concurrency tuning for batch mode.
- `{skill-dir}/.image-skills/image-generation/EXTEND.md`: optional. Holds `default_model`, `batch.max_workers`, `default_ar`, and default quality. The agent writes this on first run.
- `{skill-dir}/.image-skills/image-generation/MODELS.json`: optional bundled model registry, refreshed on demand via `npm run discover-image-models`.
- `{skill-dir}/.image-skills/image-generation/STYLE-PRESETS.md` or the `--style` option: named preset bundles (e.g., cinematic, editorial, watercolor).

**Gateway**

- Base URL: `https://api.weryai.com` (hard-coded).
- `POST /v1/generation/text-to-image`: requires `model`, `prompt`, `aspect_ratio`; optional `negative_prompt`, `quality`, `webhook_url`, `use_web_search`, `caller_id`.
- `POST /v1/generation/image-to-image`: same as above plus `images[]` (array of public https URLs or local paths, converted to URLs before submission).
- `GET /v1/generation/{taskId}/status`: the response includes `task_status` (`waiting`, `processing`, `succeed`, `failed`), `data.images[]` (on `succeed`), and `msg` (on `failed`).
- Auth header: `Authorization: Bearer {IMAGE_GEN_API_KEY}`.

**Runtime**

- Node 16+ and npm 7+.
- `bun` or `npx` (`bun` preferred for speed). If neither is available, the agent proposes installation.
- `{skill-dir}/scripts/main.ts`: the only execution entrypoint (TypeScript, run via `bun` or via `npx` running bun).

**CLI inputs**

- `--prompt` or `--promptfiles`: text source (inline string or assembled from Markdown files).
- `--image` / `-o` / `--output`: destination file path (`.png` inferred if no extension). Required for single-task mode unless `--dry-run` is set.
- `--model` / `-m`: model key (overrides defaults). A model must come from the CLI, `EXTEND.md`, or `IMAGE_GEN_DEFAULT_MODEL`.
- `--style`: named preset (e.g., cinematic, editorial); optional.
- `--ar` / `--aspect-ratio`: W:H ratio (e.g., `16:9`, `1:1`). Defaults to `1:1` or the `EXTEND.md` value.
- `--size`: `WIDTHxHEIGHT` in pixels; infers the aspect ratio if `--ar` is omitted.
- `--quality`: `normal` or `2k`, mapped to the gateway resolution.
- `--negative-prompt`: text to exclude.
- `--ref` / `--reference`: reference image (public https URL or local path); triggers image-to-image.
- `--n`: number of images to generate (default 1).
- `--webhook-url`: optional callback on completion.
- `--use-web-search`: sets `use_web_search: true` to include web search in prompt understanding.
- `--caller-id`: optional task metadata.
- `--batchfile`: JSON file with an array of tasks (parallel mode).
- `--jobs`: worker count for batch mode (default 1).
- `--poll-interval-ms`, `--poll-timeout-ms`: polling rhythm (defaults 2000 ms and 300000 ms).
- `--no-download`: skip the file write after success.
- `--dry-run`: print the request body without an API call.
- `--json`: output the batch report as JSON.

**Network**

- Calls `https://api.weryai.com` over HTTPS; requires outbound internet access.
- `file://` and `data:` URLs are rejected by the gateway.

**Step 1: readiness check**

Input: project path, workflow name (optional).

Process:

- Run `npm run ensure-ready -- --project . --workflow <workflow>` from the skill directory.
- If tools are missing and the user approves, run `npm install -g bun` or the platform equivalent. On decline or failure, inform the user honestly that image generation is not available in this environment.
- If `IMAGE_GEN_API_KEY` is not found in the environment or `.env`, the agent tells the user that image generation needs an API key, offers to configure it now, and writes it to `.image-skills/image-generation/.env` on the user's behalf (never asking the user to edit files manually).

Output: toolchain ready, API key in the environment or local `.env`.

Skip conditions: skip this step on subsequent runs if readiness was already confirmed for this project.
**Step 2: model resolution**

Input: `--model` CLI flag (optional), `EXTEND.md` (optional), `IMAGE_GEN_DEFAULT_MODEL` env (optional).

Process:

- Resolve by priority: `--model` > `EXTEND.md` `default_model` > `IMAGE_GEN_DEFAULT_MODEL`.
- If none is set, write `EXTEND.md` with `default_model: GEMINI_3_1_FLASH_IMAGE` (Nano Banana 2).
- If the user later switches models, update `EXTEND.md` in place and skip the re-initialization message.

Output: active `model_key` confirmed; the user is told which model will be used.
**Step 3: brief clarification**

Input: the user's brief or rough idea.

Process:

- If the brief is underspecified, run `npm run build-visual-brief -- --workflow <workflow> --topic "<topic>"`.

Output: clarified prompt, chosen aspect ratio, any style preference.
**Step 4: request assembly**

Input: prompt (inline or from files), model, style preset (optional), reference image (optional), aspect ratio, quality, batch file (optional).

Process:

- Prompt: if `--promptfiles` is given, concatenate the files in order; otherwise use the `--prompt` string. Validate lengths against gateway limits (usually 2000 characters for `prompt`, 1000 for `negative_prompt`).
- Reference: if `--ref` is given, convert a local file to a public URL (or accept an existing https URL). URLs must be https or they will be rejected. The image-to-image endpoint is used.
- Style: if `--style` is given, look it up in `STYLE-PRESETS.md` and merge the preset attributes (e.g., style prompt suffix, quality boost) into the final request.
- Aspect ratio: use `--ar`, else infer from `--size`, else use the `EXTEND.md` `default_ar`, else default to `1:1`. Validate the format (e.g., `16:9` or `1:1`).
- Resolution: map `--quality normal|2k` or `--imageSize 1K|2K|4K` to the gateway resolution field. If `--resolution` is explicit, use it.
- Batch mode (`--batchfile`): load the JSON, resolve task paths relative to the batch file directory, and apply the job concurrency limit from the CLI or environment.

Output: a validated request body, ready for submission.
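A sketch of the length check, assuming the "usually 2000 / 1000" figures as defaults and counting Unicode code points. Both the limits and the counting rule are assumptions; the gateway may measure length differently or vary limits by model.

```typescript
interface LengthLimits {
  prompt: number;
  negativePrompt: number;
}

// Assumed defaults taken from the "usually" figures; real limits may differ.
const ASSUMED_LIMITS: LengthLimits = { prompt: 2000, negativePrompt: 1000 };

// Assumption: count Unicode code points, not UTF-16 code units.
const charCount = (s: string): number => Array.from(s).length;

// Returns human-readable problems; an empty array means the request may be submitted.
function validateLengths(prompt: string, negativePrompt?: string, limits: LengthLimits = ASSUMED_LIMITS): string[] {
  const errors: string[] = [];
  if (prompt.trim() === "") errors.push("prompt is empty");
  if (charCount(prompt) > limits.prompt) {
    errors.push(`prompt has ${charCount(prompt)} characters (limit ${limits.prompt})`);
  }
  if (negativePrompt !== undefined && charCount(negativePrompt) > limits.negativePrompt) {
    errors.push(`negative_prompt has ${charCount(negativePrompt)} characters (limit ${limits.negativePrompt})`);
  }
  return errors;
}
```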
**Step 5: announce generation**

Input: model name, estimated time.

Output: the user is informed and ready; no silent generation.
**Step 6: submit and poll**

Input: request body, poll interval (default 2000 ms), poll timeout (default 300000 ms = 5 min).

Process:

- Dry run (`--dry-run`): print the request JSON to stdout and exit; do not call the API.
- Send a `POST` request to the WeryAI text-to-image or image-to-image endpoint (determined by the presence of `--ref`).
- Receive a `taskId` with `task_status: waiting`.
- Poll `GET /v1/generation/{taskId}/status` every `poll-interval-ms` (default 2000 ms) and branch on `task_status`:
  - `waiting` or `processing`: continue polling.
  - `succeed`: extract `data.images[]` (array of image URLs) and move to step 7.
  - `failed`: extract `msg` (error description) and move to step 8.
- If `poll-timeout-ms` elapses without completion, time out and report an error.
- Batch mode (`--batchfile`): run tasks in parallel up to the `--jobs` concurrency, and emit a JSON report when `--json` is requested.

Output: either `images[]` URLs on `succeed`, or an error `msg` on failure or timeout.
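Batch mode's gated concurrency can be sketched as a fixed number of worker lanes plus a shared start gate that spaces task starts by a minimum interval. `runGated` and `Settled` are hypothetical names; mapping `jobs` to `--jobs` and `startIntervalMs` to `BASE_IMAGE_GEN_START_INTERVAL_MS` is an assumption about how the script uses them.

```typescript
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runGated<T, R>(
  items: T[],
  jobs: number,
  startIntervalMs: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0;
  let lastStart = -Infinity;
  let startChain: Promise<void> = Promise.resolve();

  // Serialize start slots so that two task starts are at least startIntervalMs apart.
  const acquireStartSlot = (): Promise<void> => {
    startChain = startChain.then(async () => {
      const wait = lastStart + startIntervalMs - Date.now();
      if (wait > 0) await sleep(wait);
      lastStart = Date.now();
    });
    return startChain;
  };

  // Each lane pulls the next unclaimed task until the list is exhausted.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      await acquireStartSlot();
      try {
        results[i] = { ok: true, value: await worker(items[i] as T, i) };
      } catch (error) {
        results[i] = { ok: false, error };
      }
    }
  };

  const laneCount = Math.max(1, Math.min(jobs, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Collecting settled results instead of failing fast lets one failed task surface in the per-task report without cancelling the rest of the batch.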
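**Step 7: download and deliver**

Input: image URLs from `data.images[]` (skipped with `--no-download`).

Process:

- If `--no-download` is set, skip to step 9.
- For each URL in `images[]`, write the image to the `--image` path. If multiple images are generated (e.g., `--n 2`), append an index (e.g., `out_1.png`, `out_2.png`).
- Show or send the image itself rather than a bare filename such as `output.png`; the user must see or access the actual image.

Output: image files written to disk and/or displayed to the user.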
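The output-path rule (default `.png`, plus `_1`, `_2`, ... suffixes when more than one image comes back) can be sketched with Node's `path` module. The helper name `outputPaths` is hypothetical.

```typescript
import * as path from "node:path";

function outputPaths(imagePath: string, count: number): string[] {
  // Default to .png when the requested path has no extension.
  const ext = path.extname(imagePath);
  const base = ext ? imagePath.slice(0, -ext.length) : imagePath;
  const finalExt = ext || ".png";
  if (count <= 1) return [base + finalExt];
  // Multiple images: out_1.png, out_2.png, ...
  return Array.from({ length: count }, (_, i) => `${base}_${i + 1}${finalExt}`);
}
```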
**Step 8: error handling**

Input: error code or message from the gateway or network.

Process:

- "model does not exist" or a similar model-key error: refresh the model registry (`npm run discover-image-models`) and offer to retry with a different model.
- … `IMAGE_GEN_API_KEY`.

Output: a user-facing explanation and suggested action.
**Step 9: report**

Input: final status (success, error, or timeout).

Process:

- Batch report (`--json`): output an array of tasks with per-task `status`, `taskId`, `output_path`, `images[]`, and `msg` (on error).

Output: images displayed, report logged, user informed of the outcome.
**Decision flows**

- If `--model`, `EXTEND.md` `default_model`, and `IMAGE_GEN_DEFAULT_MODEL` are all absent, initialize with Nano Banana 2 (`GEMINI_3_1_FLASH_IMAGE`), write it to `EXTEND.md`, tell the user it is now the default, and remind them they can switch anytime. Otherwise, use the highest-priority model from the three sources.
- If the user's brief is underspecified (no details on style, color, composition, or aspect ratio), run `npm run build-visual-brief` to ask guiding questions and refine the prompt. Otherwise, skip the brief helper and proceed to request assembly.
- If `--ref` is present, use the image-to-image endpoint (`POST /v1/generation/image-to-image`) and validate that the reference is a public https URL or a local file convertible to a URL. Otherwise, use the text-to-image endpoint.
- If polling exceeds `poll-timeout-ms` (default 300000 ms), proactively inform the user ("still waiting, taking longer than expected") and either continue polling or fail with a user-friendly explanation. Otherwise, complete normally when `task_status` is `succeed` or `failed`.
- If `--batchfile` is provided, enter batch mode: load the JSON, parallelize up to `--jobs` concurrency, report per-task status incrementally, and optionally output a JSON report. Otherwise, run in single-task mode: one prompt, one image output, simple poll-and-download flow.
- If `npm run ensure-ready` detects missing tools or no `IMAGE_GEN_API_KEY`, the agent proposes installation or configuration (never asking the user to edit files or run commands), proceeds silently on approval, and on decline explains honestly that image generation is unavailable. Otherwise, readiness is confirmed; proceed to the model check.
- If the gateway returns "model does not exist" or a similar model-key error, the agent refreshes the model registry and offers to recommend and retry with a different model before asking the user for any platform-side information. Otherwise, proceed with an error explanation and next-step guidance.
- If `--style` is provided, look up the preset in `STYLE-PRESETS.md` and merge its attributes (e.g., style suffix, quality boost, aspect ratio override) into the request. Otherwise, use the base request without a preset.
**Output contract**

- Image files: `{output-path}` is specified by `--image` or inferred from the batch task; the directory is created if needed. Multiple images are indexed (`out_1.png`, `out_2.png`, etc.) when `--n` > 1.
- Batch report (`--json`), one entry per task:
  - `id`: task id from the batch JSON.
  - `taskId`: gateway task id (for manual polling if needed).
  - `status`: `succeed`, `failed`, or `timeout`.
  - `output_path`: final file path (on `succeed`).
  - `images`: array of image URLs (on `succeed`).
  - `msg`: error explanation (on `failed` or `timeout`).
  - `timestamp`: Unix ms when polling finished.
- Re-runs: for a previously completed `taskId`, polling that ID again returns the cached result without re-generating.

Original source: ClawHub. Enriched and restructured to implexa standards: explicit decision logic, edge-case handling (timeouts, rate limits, auth expiry, empty result sets), input/output contracts, and outcome signals.