Image Generation

Single-gateway image generation CLI for async text-to-image and image-to-image, with polling, download handling, and request alignment to the current gateway...

SkillRank score: 7.4 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

image-generation-2 wraps async text-to-image and image-to-image requests against the weryai gateway, with polling, batch support, style presets, and first-run readiness gates. covers model selection, prompt assembly, and incremental user feedback during generation.

structure: 9.0
trigger phrases: 8.0
procedure: 7.0
edge cases: 6.0
documentation: 8.0


---
name: image-generation
description: Single-gateway image generation CLI for async text-to-image and image-to-image, with polling, download handling, and request alignment to the current gateway OpenAPI. Use when the user asks to generate an image, create a picture, draw something, or make a visual from a text prompt.
version: 0.5.0
metadata: { "pattern": ["tool-wrapper"], "openclaw": { "emoji": "🎨", "primaryEnv": "IMAGE_GEN_API_KEY", "requires": { "env": ["IMAGE_GEN_API_KEY"], "anyBins": ["bun", "npx"], "bins": ["node", "npm"] } } }
---

# Image Generation (`image-generation`)

Generate an image, create a picture, draw something, or make a visual from a text prompt — all through a single CLI gateway with async polling and automatic download.

This skill follows a **single aggregated gateway backend** model. One API key is used locally, while multiple models are aggregated behind the gateway. The CLI wraps the full async flow: submit task -> poll lowercase `task_status` -> fetch `data.images`. The backend platform is a **fixed implementation choice**, not a configurable provider switch. If another platform is needed, publish a separate skill for it.

The design goal is smooth automation for common workflows such as single-image generation, multi-file prompts, image-to-image with references, batch jobs, EXTEND defaults, and reusable style presets. Full gateway-alignment notes are in [references/weryai-platform.md](references/weryai-platform.md).

## Safety & Scope

- **Network**: This skill calls the WeryAI gateway over HTTPS (`https://api.weryai.com`).
- **Auth**: Uses `IMAGE_GEN_API_KEY`. The key is never printed. It may be persisted **only** when you explicitly run `npm run setup -- --persist-api-key`.
- **Reference images**: Must be public URLs (`https://` recommended). `http://` may work but is insecure. Local file paths and `data:` URLs are rejected.
- **No arbitrary shell**: The generation runtime does not execute arbitrary shell commands.
- **Files written**: Output images and optional local config under `.image-skills/image-generation/` (project) and/or `~/.image-skills/image-generation/` (home).


## Current Gateway Contract (WeryAI)

| Item | Contract |
| --- | --- |
| Base URL | `https://api.weryai.com` (hard-coded in `scripts/main.ts`) |
| Auth | `Authorization: Bearer` via **`IMAGE_GEN_API_KEY`** ([get a key](https://weryai.com/api/keys)) |
| text-to-image | `POST /v1/generation/text-to-image`; requires `model`, `prompt`, `aspect_ratio` |
| image-to-image | `POST /v1/generation/image-to-image`; also requires `images[]` |
| status lookup | `GET /v1/generation/{taskId}/status`; `task_status` is `waiting`, `processing`, `succeed`, or `failed` |
| business success | **`success: true`** (or `status: 200`); failures return business codes such as `1001`, `1002`, `1003` |
| text length | the script validates `prompt` and `negative_prompt` lengths before request submission |
| result download | after `succeed`, images are fetched by URL; the script uses timeout, retry, backoff, optional Bearer retry, and minimum-payload validation |

See [references/weryai-platform.md](references/weryai-platform.md) for field mapping, model lookup guidance, and troubleshooting flow.
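
As a minimal sketch of the contract table above, the following shows how a request could be routed to one of the two endpoints. The function name `buildGenerationRequest` and its input shape are hypothetical, not taken from `scripts/main.ts`; only the base URL, the paths, and the field names `model`, `prompt`, `aspect_ratio`, and `images` come from the table.

```typescript
// Hypothetical helper; the real request assembly lives in scripts/main.ts.
interface GenerationInput {
  model: string;
  prompt: string;
  aspectRatio: string;
  referenceUrls?: string[]; // public URLs of reference images, if any
}

interface GatewayRequest {
  method: "POST";
  url: string;
  body: Record<string, unknown>;
}

const GATEWAY_BASE_URL = "https://api.weryai.com";

function buildGenerationRequest(input: GenerationInput): GatewayRequest {
  // Both endpoints require these three fields per the contract table.
  const required: Array<[string, string]> = [
    ["model", input.model],
    ["prompt", input.prompt],
    ["aspect_ratio", input.aspectRatio],
  ];
  for (const [name, value] of required) {
    if (value.trim() === "") throw new Error(`missing required field: ${name}`);
  }

  const body: Record<string, unknown> = {
    model: input.model,
    prompt: input.prompt,
    aspect_ratio: input.aspectRatio,
  };

  // Any reference image switches the request to image-to-image.
  const refs = input.referenceUrls ?? [];
  if (refs.length > 0) {
    body["images"] = refs;
    return { method: "POST", url: `${GATEWAY_BASE_URL}/v1/generation/image-to-image`, body };
  }
  return { method: "POST", url: `${GATEWAY_BASE_URL}/v1/generation/text-to-image`, body };
}
```

The `Authorization: Bearer` header is left out on purpose so that the sketch never touches the key.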

## Step 0: First-Trigger Readiness Gate

When this skill is triggered for the **first time** in a project or environment, the agent must not jump straight to model selection or generation. It must do this first:

1. **Check local readiness silently** with `npm run ensure-ready -- --project . --workflow <workflow>`
2. **If runtime dependencies are missing**, propose to install them on behalf of the user
3. **If `IMAGE_GEN_API_KEY` is missing**, ask the user for permission to configure it now
4. Only after readiness and API key are resolved, continue to model selection / prompt clarification / generation
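
The gate above can be read as a small decision function. This is only a sketch: the type and function names are hypothetical, and resolving dependencies before the API key is an assumption taken from the order of the list, not a documented rule.

```typescript
// Hypothetical model of the first-trigger readiness gate.
type ReadinessStep =
  | "ask-to-install-dependencies"
  | "ask-to-configure-api-key"
  | "continue-to-model-selection";

interface ReadinessState {
  dependenciesMissing: boolean; // as reported by the ensure-ready check
  apiKeyMissing: boolean;       // IMAGE_GEN_API_KEY not found
}

function nextReadinessStep(state: ReadinessState): ReadinessStep {
  // Assumption: dependencies are handled first, then the key.
  if (state.dependenciesMissing) return "ask-to-install-dependencies";
  if (state.apiKeyMissing) return "ask-to-configure-api-key";
  // Only now may the agent move on to model selection and generation.
  return "continue-to-model-selection";
}
```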

**Access token**: only use **`IMAGE_GEN_API_KEY`**. It may also live in `.image-skills/image-generation/.env` as `IMAGE_GEN_API_KEY=...`. After the user approves, the agent may persist it there by running `npm run setup -- --project . --workflow <workflow> --persist-api-key` when the key is already in env, or by writing the file locally on the user's behalf instead of asking the user to edit files manually.

**First-use readiness check**: before the first generation run in a new OpenClaw or local instance, the agent must run:

```bash
npm run ensure-ready -- --project . --workflow <workflow>
```

This readiness step is not optional. It checks the local toolchain, reads the local doctor report, and automatically runs `bootstrap` when local script dependencies are missing.

**First-trigger user behavior**:

- If dependencies are missing: ask for approval to install them, then install silently
- If `IMAGE_GEN_API_KEY` is missing: tell the user image generation needs an API key and offer to configure it now by writing `.image-skills/image-generation/.env` on the user's behalf
- Do not ask the user to debug the environment before this readiness gate runs
- Do not ask the user to choose a model before readiness and API key are resolved
- Treat API keys as secrets: prefer writing them locally on the user's behalf, never echo them back, and never include them in normal progress messages

**`EXTEND.md`** is optional and can hold default model, quality, aspect ratio, and batch worker limits.

```bash
test -f .image-skills/image-generation/EXTEND.md && echo project
test -f "$HOME/.image-skills/image-generation/EXTEND.md" && echo user
```

**Default model on initialization**: if no model is configured yet, initialize this skill with **Nano Banana 2** (`GEMINI_3_1_FLASH_IMAGE`) and tell the user that it is now the default. Also remind them they can switch models anytime later.

**Model selection**: after initialization, one of these provides the active default:

- `--model`
- `default_model` in `EXTEND.md`
- `IMAGE_GEN_DEFAULT_MODEL`

If none of them is set yet, the agent should initialize the local default to **Nano Banana 2** first, tell the user that this workspace now defaults to that model, and remind them they can switch anytime if another model fits better. See [references/config/first-time-setup.md](references/config/first-time-setup.md), [references/config/preferences-schema.md](references/config/preferences-schema.md), and [references/config/model-registry-schema.md](references/config/model-registry-schema.md).

**Style presets**: [references/style-presets.md](references/style-presets.md)

Model priority:

`--model` -> `EXTEND.md default_model` -> `IMAGE_GEN_DEFAULT_MODEL`
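
The priority chain, together with the Nano Banana 2 initialization rule, might look like the sketch below. `resolveModelKey` and its input shape are illustrative names, not the script's actual API; the model key `GEMINI_3_1_FLASH_IMAGE` is the one named in this document.

```typescript
// Hypothetical resolver for the active model key.
const FALLBACK_MODEL_KEY = "GEMINI_3_1_FLASH_IMAGE"; // Nano Banana 2

type ModelSource = "cli" | "extend" | "env" | "initialized-default";

interface ModelSources {
  cliModel?: string;           // --model / -m
  extendDefaultModel?: string; // default_model in EXTEND.md
  envDefaultModel?: string;    // IMAGE_GEN_DEFAULT_MODEL
}

interface ResolvedModel {
  modelKey: string;
  source: ModelSource;
}

function resolveModelKey(sources: ModelSources): ResolvedModel {
  // Order matters: the first non-empty source wins.
  const candidates: Array<[ModelSource, string | undefined]> = [
    ["cli", sources.cliModel],
    ["extend", sources.extendDefaultModel],
    ["env", sources.envDefaultModel],
  ];
  for (const [source, value] of candidates) {
    if (value !== undefined && value.trim() !== "") {
      return { modelKey: value.trim(), source };
    }
  }
  // Nothing configured: the agent initializes the default and tells the user.
  return { modelKey: FALLBACK_MODEL_KEY, source: "initialized-default" };
}
```

Returning the `source` alongside the key lets the caller know when it has to announce the newly initialized default.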

## If No Model Was Specified

When the user wants image generation but no model is configured (`EXTEND.md`, `--model`, and `IMAGE_GEN_DEFAULT_MODEL` are all absent):

**Do not ask the user to read docs or edit config files.** Follow the guided flow in [references/config/first-time-setup.md](references/config/first-time-setup.md) § "Model Selection — Agent-Guided Flow":

1. Start from the bundled starter registry shipped with this skill. If the model list looks stale or a requested model is missing, refresh it silently with `npm run discover-image-models -- --out .image-skills/image-generation/MODELS.json`
2. Initialize local defaults with **Nano Banana 2** (`GEMINI_3_1_FLASH_IMAGE`) by writing `EXTEND.md`
3. Tell the user that the default model is now **Nano Banana 2**, and explicitly remind them they can switch anytime later if they need another model
4. Continue to prompt clarification and generation
5. If the user immediately asks for another model, rank candidates for the workflow → `npm run recommend-model -- --workflow <workflow> --role <role> --json`, then update `EXTEND.md`

If the workflow is more specific than a generic single image, prefer a role-aware recommendation such as `comic-page`, `comic-character-sheet`, `infographic-dense`, or `article-framework` instead of only passing the broad workflow name.

If the user later asks to switch models, update `EXTEND.md` in place — do not ask the user to edit it manually.

If the gateway returns "model does not exist" or a similar model-key error, the agent must first refresh from WeryAI docs and retry recommendation before asking the user for any platform-side information.

## If The Request Is Underspecified

When the user gives only a rough idea and the visual brief is still weak, do not jump straight into prompt writing. Use the local brief helper first:

```bash
npm run build-visual-brief -- --workflow cover --topic "Habit systems"
```

Use its question menu to ask about at least:

1. Photorealistic or illustration
2. Color temperature / palette direction
3. Shot language (close-up, wide, etc.)
4. Composition pattern
5. Aspect ratio
6. Text density

Then map the answered brief into the final prompt or workflow files.

Do **not** tell the user to go read the docs alone. If the CLI fails because the model is missing, the agent should complete this selection flow and retry.

## Workflow

1. On first trigger, run the readiness gate: check dependencies, bootstrap missing local tooling, and confirm `IMAGE_GEN_API_KEY`.
2. Confirm a default model is configured (via `EXTEND.md` or environment), or enter the guided model-selection flow.
3. Build the prompt: inline text (`--prompt`) or assembled from files (`--promptfiles`).
4. Choose a style preset if applicable (`--style`).
5. Run the CLI to submit the task, poll for completion, and download the result.
6. For multiple images, use batch mode (`--batchfile`) with parallel jobs.

## Script

`{baseDir}` is the directory containing this file. `${BUN_X}` is either `bun` or `npx -y bun`.

| Path | Purpose |
| --- | --- |
| `{baseDir}/scripts/main.ts` | the only execution entrypoint |

## Usage

```bash
# examples only; M should be chosen by the user or resolved by the agent
M=<chosen model key>

# single image
${BUN_X} {baseDir}/scripts/main.ts --prompt "a cat" --image cat.png --ar 1:1 -m "$M"

# prompt assembled from multiple files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md user.md --image out.png --ar 16:9 -m "$M"

# style preset
${BUN_X} {baseDir}/scripts/main.ts --prompt "city nightscape" --style cinematic --image out.png --ar 16:9 -m "$M"

# image-to-image with reference
${BUN_X} {baseDir}/scripts/main.ts --prompt "turn it into cyberpunk" --image out.png --ref https://example.com/src.png --ar 1:1 -m "$M"

# quality / resolution
${BUN_X} {baseDir}/scripts/main.ts --prompt "poster" --image poster.png --ar 16:9 --quality 2k -m "$M"
${BUN_X} {baseDir}/scripts/main.ts --prompt "poster" --image poster.png --ar 16:9 --imageSize 2K -m "$M"

# infer aspect from size when --ar is omitted
${BUN_X} {baseDir}/scripts/main.ts --prompt "scene" --image scene.png --size 1280x720 -m "$M"

# batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json

# do not write downloaded images to disk
${BUN_X} {baseDir}/scripts/main.ts --prompt "abstract" --image dummy.png --ar 1:1 --no-download -m "$M"

# dry-run request preview; --image is optional here
${BUN_X} {baseDir}/scripts/main.ts --prompt "test" --ar 1:1 -m "$M" --dry-run
```

### Batch JSON Example

```json
{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "model": "<model key from gateway docs>",
      "style": "editorial",
      "ar": "16:9",
      "quality": "2k",
      "use_web_search": false
    },
    {
      "id": "edit",
      "prompt": "turn it into watercolor",
      "image": "out/edit.png",
      "ref": ["assets/in.png"],
      "negative_prompt": "blurry",
      "webhook_url": "https://example.com/hook"
    }
  ]
}
```

Paths are resolved relative to the **batch file directory**. If a `provider` field exists, it is ignored in single-gateway mode. Task-level optional fields include `style`, `webhook_url`, `negative_prompt`, `resolution`, `use_web_search`, and `caller_id`.

## Main Options

| Option | Description |
| --- | --- |
| `--prompt` / `-p` | prompt text |
| `--promptfiles` | concatenate multiple files into the prompt |
| `--image` / `-o` / `--output` | output path in single-task mode; defaults to `.png` if no extension is given |
| `--batchfile` | batch JSON file |
| `--jobs` | worker count |
| `--model` / `-m` | model key; one of CLI / `EXTEND.md` / `IMAGE_GEN_DEFAULT_MODEL` must provide it |
| `--style` | style preset, see [style-presets.md](references/style-presets.md) |
| `--ar` / `--aspect-ratio` | aspect ratio; defaults to `EXTEND.md` or `1:1` |
| `--size` | `WIDTHxHEIGHT`; can infer aspect ratio if `--ar` is omitted |
| `--quality` | `normal` or `2k`, mapped into gateway resolution |
| `--imageSize` | `1K`, `2K`, or `4K` |
| `--resolution` | pass resolution directly to the API |
| `--negative-prompt` | negative prompt |
| `--ref` / `--reference` | reference image, either URL or local file |
| `--n` | `image_number` (default: 1) |
| `--webhook-url` | gateway `webhook_url` |
| `--use-web-search` | set `use_web_search: true` |
| `--caller-id` | gateway `caller_id` |
| `--poll-interval-ms` / `--poll-timeout-ms` | polling controls |
| `--no-download` | skip file writing |
| `--dry-run` | print the final request body instead of calling the API |
| `--json` | JSON summary or batch report |

## Parallelism and Retry

- Batch jobs use gated concurrency through `--jobs`, with environment overrides such as `BASE_IMAGE_GEN_CONCURRENCY` and `BASE_IMAGE_GEN_START_INTERVAL_MS`.
- A single task retries up to 3 times, but obvious parameter or auth errors are not retried.
- Batch worker caps come from `batch.max_workers` in `EXTEND.md` or `BASE_IMAGE_MAX_WORKERS`.
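
The retry rule could be sketched as follows. The failure classification (`network`, `server`, `parameter`, `auth`) and both function names are hypothetical, and the sketch assumes that "retries up to 3 times" means one initial attempt plus at most three retries. It is written synchronously for brevity; the real flow is asynchronous.

```typescript
// Hypothetical failure classes; the script's own classification may differ.
type FailureKind = "network" | "server" | "parameter" | "auth";

const MAX_RETRIES = 3;

function shouldRetryTask(kind: FailureKind, retriesSoFar: number): boolean {
  // Parameter and auth errors will fail the same way again, so never retry them.
  if (kind === "parameter" || kind === "auth") return false;
  return retriesSoFar < MAX_RETRIES;
}

function runWithRetry<T>(attempt: () => T, classify: (err: unknown) => FailureKind): T {
  let retries = 0;
  for (;;) {
    try {
      return attempt();
    } catch (err) {
      if (!shouldRetryTask(classify(err), retries)) throw err;
      retries += 1; // a real client would also wait (backoff) before retrying
    }
  }
}
```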

## Environment Variables

| Variable | Description |
| --- | --- |
| `IMAGE_GEN_API_KEY` | the only API key variable |
| `IMAGE_GEN_DEFAULT_MODEL` | default model key |
| `BASE_IMAGE_MAX_WORKERS` | override batch worker limit |
| `BASE_IMAGE_GEN_CONCURRENCY` | batch concurrency |
| `BASE_IMAGE_GEN_START_INTERVAL_MS` | minimum delay between batch task starts |

Supported `.env` locations:

- `<cwd>/.image-skills/.env`
- `$HOME/.image-skills/.env`

Existing environment variables are not overwritten.
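
A minimal sketch of the "not overwritten" rule, assuming a simple `KEY=value` file format. Both function names are hypothetical, and the parser is deliberately simplistic (no multi-line values, no variable expansion), so it may not match what the script actually accepts.

```typescript
// Hypothetical .env parser: KEY=value lines, "#" comments, optional quotes.
function parseDotEnvText(text: string): Record<string, string> {
  const parsed: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines without a key before "="
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching single or double quotes around the value.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    parsed[key] = value;
  }
  return parsed;
}

// Copies parsed values into env only where env has no value yet.
// Returns the keys that were actually applied.
function mergeEnvWithoutOverwrite(
  env: Record<string, string | undefined>,
  parsed: Record<string, string>,
): string[] {
  const applied: string[] = [];
  for (const key of Object.keys(parsed)) {
    if (env[key] === undefined) {
      env[key] = parsed[key];
      applied.push(key);
    }
  }
  return applied;
}
```

With this rule a key exported in the shell always wins over a key stored in a `.env` file.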

## Agent Notes

1. On first use in a new environment, run `npm run ensure-ready -- --project . --workflow <workflow>` from this skill directory before generation. Do not skip this just because direct API calls appear to work.
2. Confirm that `IMAGE_GEN_API_KEY` is available, directly or through `.image-skills/.env`.
3. If the model is missing, complete the model-selection flow above before running the CLI.
4. Single-task mode usually requires `--image`, but `--dry-run` may omit it. If `--ref` is present, the script uses the image-to-image endpoint.
5. Polling behavior is: `waiting` / `processing` -> keep polling, `succeed` -> use `images`, `failed` -> inspect `msg`.
6. The selected model key is printed to stderr for easier debugging.

## User Feedback During Generation

- **Before generation starts**: tell the user what's about to happen, including the **model being used** (e.g., "Generating your cover with Model X, roughly 30 seconds"). Never start silently.
- **Model disclosure**: always mention the model name/key when generation begins. This helps the user understand quality expectations and makes later model switching easier.
- **Batch generation**: estimate the total count and rough time. As images complete, show them incrementally — do not wait for the entire batch to finish before showing anything.
- **On failure**: do not dump error codes. Translate the failure into a user-level explanation and suggest a fix (e.g., "This model is temporarily unavailable — want to try another one?").
- **On timeout**: if polling exceeds the expected window, proactively inform the user (e.g., "Taking longer than usual, still waiting") rather than staying silent.
- **On completion**: send/display the image immediately. Never output just a filename — the user must see the actual image. If the platform supports file sending, send the file. If it supports inline rendering, render inline. If neither works, provide a download URL.

## Interaction Rules (Mandatory)

These rules apply to every user interaction. They are not optional guidelines.

1. **Never show commands, file paths, config syntax, or schema details to the user.** All tool execution is silent.
2. **Never ask the user to edit config files.** The agent writes `EXTEND.md`, `MODELS.json`, and other configs on behalf of the user.
3. **Ask one question at a time.** Do not present all dimensions at once. Lead with the highest-impact choice.
4. **Offer concrete options with descriptions**, not raw technical names. If presenting styles, describe what each looks like.
5. **Always disclose the model and estimated time** when generation starts.
6. **Show images directly** when delivering results. Never respond with only file paths or file names. If the channel supports sending image files, send the file. If it supports inline display, display inline. If neither is possible, provide a clickable download link. A bare filename like `output.png` is never acceptable delivery.
7. **On iteration**, only re-generate what changed. Do not restart the entire workflow.
8. **If tools are missing**, propose to install them on behalf of the user — never expose tool names, PATH errors, or ask the user to run install commands. See "Missing Tools" below.
9. **Default to Nano Banana 2 unless the user asks otherwise.** When no default model is configured yet, initialize `GEMINI_3_1_FLASH_IMAGE` (Nano Banana 2), tell the user that it is now the default for this skill, and explicitly remind them they can switch any time if another model fits better.
10. **Reply in the user's language.** Match the language the user is using for questions, status updates, explanations, and delivery messages unless they explicitly ask you to switch.
11. **On first trigger, run readiness before anything else.** Check dependencies and `IMAGE_GEN_API_KEY` status before asking the user to choose a model or before starting generation.
12. **Handle API keys as secrets.** Offer to configure `IMAGE_GEN_API_KEY` locally for the user, avoid asking them to paste secrets into normal chat when a local write is possible, and never repeat or quote the key back.

## Re-run Behavior

- Running `main.ts` with the same `--image` path overwrites the existing output file.
- `--batchfile` re-processes all tasks in the batch; completed images are overwritten.
- `--dry-run` never writes files regardless of prior state.
- Polling a previously completed `taskId` returns the cached result without re-generating.

## Definition of Done

- `IMAGE_GEN_API_KEY` is configured and a default model is set in `EXTEND.md`.
- The CLI produces a valid image file (or a correct `--dry-run` request body).
- Batch mode completes all tasks and reports per-task status via `--json`.
- Failed tasks include a user-readable explanation, not just an error code.
- The model name is disclosed to the user before generation starts.

## Missing Tools

If `bun` and `npx` are both unavailable in the current environment:

1. **Do not expose tool names or PATH errors to the user.** Never mention "bun", "npx", "PATH", or installation commands.
2. **Propose installation in user-friendly terms.** Say something like: "I need to set up a small runtime to generate images — OK if I install it now? Takes about 30 seconds."
3. **On user approval**, run `npm install -g bun` or the platform-appropriate installer silently. Report success or failure in plain language.
4. **If the user declines or installation fails**, explain honestly: "I can't generate images in this environment right now. You can ask your admin to set up the image toolchain, or try again in a different environment."
5. **Never ask the user to install tools themselves** or paste terminal commands.

## Troubleshooting

See [references/weryai-platform.md](references/weryai-platform.md). For failed calls, compare the request and response with [call history](https://weryai.com/api/history).

extracted implicit decision flows into explicit if-else branches, documented all external connections and env vars, added edge-case handling for timeouts and auth failures, clarified input/output contracts with file paths and data formats, and reformatted into implexa's 6-component structure while preserving original intent and procedure.

Image Generation (`image-generation-2`)

generate images from text prompts or image-to-image transformations through a single CLI gateway. handles async task submission, polling for completion, and automatic file download. use this when the user asks to generate, create, draw, visualize, or transform images.

this skill uses a fixed weryai gateway backend (not configurable provider switching). one api key handles auth; multiple models are aggregated server-side. design goal is smooth automation for single images, batch jobs, style presets, reference images, and reusable defaults.

intent

use this skill when the user requests image generation, visual creation, artwork, or visual transformation from text or reference images. covers text-to-image workflows (simple prompts, multi-file briefs, style presets) and image-to-image workflows (reference-based edits). do not use this for video generation, 3D models, or non-visual content.

inputs

environment & auth

IMAGE_GEN_API_KEY: required. weryai api bearer token. get it at https://weryai.com/api/keys. may live in env, <cwd>/.image-skills/image-generation/.env, or $HOME/.image-skills/image-generation/.env. the agent handles setup; user never pastes keys into chat.
IMAGE_GEN_DEFAULT_MODEL: optional. fallback model key when neither cli --model nor EXTEND.md specifies one. defaults to GEMINI_3_1_FLASH_IMAGE (nano banana 2) on first init.
BASE_IMAGE_MAX_WORKERS, BASE_IMAGE_GEN_CONCURRENCY, BASE_IMAGE_GEN_START_INTERVAL_MS: optional concurrency tuning for batch mode.

local config files

{skill-dir}/.image-skills/image-generation/EXTEND.md: optional. holds default_model, batch.max_workers, default_ar, default quality. agent writes this on first run.
{skill-dir}/.image-skills/image-generation/MODELS.json: optional bundled model registry, refreshed on demand via npm run discover-image-models.
{skill-dir}/.image-skills/image-generation/STYLE-PRESETS.md or via --style option: named preset bundles (e.g., cinematic, editorial, watercolor).

gateway contract

base url: https://api.weryai.com (hardcoded).
text-to-image endpoint: POST /v1/generation/text-to-image. requires model, prompt, aspect_ratio. optional: negative_prompt, quality, webhook_url, use_web_search, caller_id.
image-to-image endpoint: POST /v1/generation/image-to-image. same as above plus images[] (array of public https urls or local paths, converted to urls before submission).
status endpoint: GET /v1/generation/{taskId}/status. response includes task_status (waiting, processing, succeed, failed), data.images[] (on succeed), and msg (on failed).
auth header: Authorization: Bearer {IMAGE_GEN_API_KEY}.

runtime dependencies

node 16+ and npm 7+.
bun or npx (bun preferred for speed). if neither available, agent proposes installation.
{skill-dir}/scripts/main.ts: the only execution entrypoint (typescript, run with bun directly or through npx -y bun).

user inputs (task parameters)

--prompt or --promptfiles: text source (inline string or assembled from markdown files).
--image / -o / --output: destination file path (.png inferred if no extension). required for single-task mode unless --dry-run.
--model / -m: model key (overrides defaults). must be present (cli, EXTEND.md, or IMAGE_GEN_DEFAULT_MODEL).
--style: named preset (e.g., cinematic, editorial); optional.
--ar / --aspect-ratio: W:H ratio (e.g., 16:9, 1:1). defaults to the EXTEND.md value, else 1:1.
--size: WIDTHxHEIGHT pixels; infers ar if --ar omitted.
--quality: normal or 2k, mapped to gateway resolution.
--negative-prompt: text to exclude.
--ref / --reference: reference image (public https url or local path). triggers image-to-image.
--n: number of images to generate (default 1).
--webhook-url: optional callback on completion.
--use-web-search: true to include web search in prompt understanding.
--caller-id: optional task metadata.
--batchfile: json file with array of tasks (parallel mode).
--jobs: worker count for batch (default 1).
--poll-interval-ms, --poll-timeout-ms: control polling rhythm (defaults: 2000ms, 300000ms).
--no-download: skip file write after success.
--dry-run: print request body without api call.
--json: output batch report as json.

external network calls

weryai api calls go to https://api.weryai.com over https. requires outbound internet.
reference images must be public https urls, or local files that are converted to urls before submission. http is insecure and discouraged; file:// and data: urls are rejected by the gateway.
optional webhook callbacks are fired on completion.

procedure

step 1: first-trigger readiness check (one-time per project/environment)

input: project path, workflow name (optional).

process:

run npm run ensure-ready -- --project . --workflow <workflow> from skill directory.
this checks local dependencies (node, npm, bun or npx), reads the doctor report, and auto-runs bootstrap if local script dependencies are missing.
if dependencies are missing, the agent proposes installation ("i need to set up a small runtime to generate images. ok if i install it now? takes about 30 seconds"). on approval, it installs silently via npm install -g bun or the platform equivalent. on decline or failure, it tells the user honestly that image generation is not available in this environment.
if IMAGE_GEN_API_KEY is not found in env or .env, agent tells user that image generation needs an api key, offers to configure it now, and writes it to .image-skills/image-generation/.env on user's behalf (never ask user to edit files manually).

output: toolchain ready, api key in env or local .env.

skip conditions: skip this step on subsequent runs if readiness was already confirmed for this project.

step 2: confirm or initialize default model

input: --model cli flag (optional), EXTEND.md (optional), IMAGE_GEN_DEFAULT_MODEL env (optional).

process:

check model priority: --model > EXTEND.md default_model > IMAGE_GEN_DEFAULT_MODEL.
if a model is already set, use it and move to step 3.
if no model is set anywhere:
- agent initializes EXTEND.md with default_model: GEMINI_3_1_FLASH_IMAGE (nano banana 2).
- tell user: "i've set nano banana 2 as the default model for this workspace. you can switch anytime if another model fits better."
- continue to step 3.
(optional) if the user immediately requests a different model, update EXTEND.md in place and skip the re-initialization message.

output: active model_key confirmed. user told which model will be used.

step 3: understand the visual request

input: user's brief or rough idea.

process:

if the request is clear and specific (e.g., "a cat in a meadow"), skip to step 4.
if the request is vague or underspecified (e.g., "something about nature"), use the brief helper:
- run npm run build-visual-brief -- --workflow <workflow> --topic "<topic>".
- this prompts the user with questions on: photorealistic vs illustration, color palette, shot language (close-up, wide, etc.), composition, aspect ratio, text density.
- gather answers and synthesize into a stronger prompt.
do not tell the user to "go read the docs"; the agent asks the brief questions itself and builds the prompt from the answers.

output: clarified prompt, chosen aspect ratio, any style preference.

step 4: assemble the full generation request

input: prompt (inline or from files), model, style preset (optional), reference image (optional), aspect ratio, quality, batch file (optional).

process:

prompt source: if --promptfiles given, concatenate files in order. else use --prompt string. validate length against gateway limits (usually 2000 chars for prompt, 1000 for negative_prompt).
reference image: if --ref is given, accept an existing https url or convert a local file to a public url. http urls are insecure and may be rejected. the presence of a reference switches the request to the image-to-image endpoint.
style preset: if --style given, look up in STYLE-PRESETS.md and merge preset attributes (e.g., style prompt suffix, quality boost) into final request.
aspect ratio: use --ar, else infer from --size, else use EXTEND.md default_ar, else default 1:1. validate format (e.g., 16:9 or 1:1).
quality & resolution: map --quality normal|2k or --imageSize 1K|2K|4K to gateway resolution field. if --resolution is explicit, use it.
batch mode (if --batchfile): load json, resolve task paths relative to batch file directory, apply job concurrency limit from cli or env.
single task mode: build request object with model, prompt, ar, negative_prompt, quality, webhook_url, use_web_search, caller_id.

output: validated request body, ready for submission.
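
the aspect-ratio fallback in step 4 could be sketched like this. all names are hypothetical, and reducing WIDTHxHEIGHT by the greatest common divisor is an assumption; the real script may instead snap to a list of ratios the gateway supports.

```typescript
// hypothetical helpers for the --ar / --size / EXTEND.md / 1:1 chain.
function gcdInt(a: number, b: number): number {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// "1280x720" -> "16:9"; returns undefined when the size string is malformed.
function aspectFromSize(size: string): string | undefined {
  const match = /^(\d+)x(\d+)$/i.exec(size.trim());
  if (match === null) return undefined;
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width === 0 || height === 0) return undefined;
  const divisor = gcdInt(width, height);
  return `${width / divisor}:${height / divisor}`;
}

interface AspectOptions {
  ar?: string;              // --ar / --aspect-ratio
  size?: string;            // --size WIDTHxHEIGHT
  extendDefaultAr?: string; // default_ar from EXTEND.md
}

function resolveAspectRatio(opts: AspectOptions): string {
  const fromSize = opts.size !== undefined ? aspectFromSize(opts.size) : undefined;
  const chosen = opts.ar ?? fromSize ?? opts.extendDefaultAr ?? "1:1";
  // validate the W:H format before it reaches the gateway.
  if (!/^[1-9]\d*:[1-9]\d*$/.test(chosen)) {
    throw new Error(`invalid aspect ratio: ${chosen}`);
  }
  return chosen;
}
```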

step 5: user disclosure before generation starts

input: model name, estimated time.

process:

before api call, tell user what's about to happen. example: "generating your cover with nano banana 2, roughly 30 seconds."
always disclose the model name (not just the key). helps user understand quality expectations and eases later model switching.
for batch: estimate total count and rough time. example: "generating 5 images in parallel, about 1-2 minutes total."

output: user informed and ready. no silent generation.

step 6: submit task to gateway and poll

input: request body, poll interval (default 2000ms), poll timeout (default 300000ms = 5 min).

process:

dry-run mode (if --dry-run): print request json to stdout and exit. do not call api.
single task:
- POST request to weryai text-to-image or image-to-image endpoint (determined by presence of ref).
- receive taskId and task_status: waiting.
- poll GET /v1/generation/{taskId}/status every poll-interval-ms (default 2000ms).
- on each poll: check task_status.
  - waiting or processing: continue polling.
  - succeed: extract data.images[] (array of image urls), move to step 7.
  - failed: extract msg (error description), move to step 8.
- if polling runs longer than the expected window but is still within poll-timeout-ms, tell the user proactively: "taking longer than usual, still waiting..."
- if polling exceeds poll-timeout-ms without completion, stop polling and report a timeout (handled in step 8).
- retry logic: up to 3 retries on transient network errors (not on auth or param errors).
batch mode (if --batchfile):
- submit all tasks in parallel (up to --jobs concurrency).
- track per-task status in a queue.
- poll each task independently.
- as tasks complete (succeed or fail), report incrementally (do not wait for all to finish).
- after all tasks done, compile json report (if --json requested).

output: either images[] urls on succeed, or error msg on fail/timeout.
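
the single-task polling loop from step 6 can be sketched as below. this is a simplified, synchronous model: `fetchStatus` stands in for the GET status call, the clock is simulated instead of sleeping, and the flattened response shape is an assumption (the gateway nests images under data.images). the names are hypothetical.

```typescript
// hypothetical, flattened view of the status response.
type TaskStatus = "waiting" | "processing" | "succeed" | "failed";

interface StatusSnapshot {
  task_status: TaskStatus;
  images?: string[]; // present on succeed
  msg?: string;      // present on failed
}

type PollOutcome =
  | { kind: "succeed"; images: string[]; elapsedMs: number }
  | { kind: "failed"; msg: string; elapsedMs: number }
  | { kind: "timeout"; elapsedMs: number };

function pollUntilDone(
  fetchStatus: () => StatusSnapshot,
  intervalMs: number = 2000,
  timeoutMs: number = 300000,
): PollOutcome {
  let elapsedMs = 0;
  while (elapsedMs <= timeoutMs) {
    const snapshot = fetchStatus();
    if (snapshot.task_status === "succeed") {
      return { kind: "succeed", images: snapshot.images ?? [], elapsedMs };
    }
    if (snapshot.task_status === "failed") {
      return { kind: "failed", msg: snapshot.msg ?? "unknown failure", elapsedMs };
    }
    // waiting or processing: keep polling. a real client would sleep here.
    elapsedMs += intervalMs;
  }
  return { kind: "timeout", elapsedMs };
}
```

with the stated defaults (2000ms interval, 300000ms timeout) this loop issues at most 151 status calls before giving up.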

step 7: download and display images

input: image urls from data.images[] (or skip if --no-download).

process:

if --no-download, skip to step 9.
for each url in images[]:
- fetch image via https with timeout, backoff retry, and minimum-payload validation.
- retry up to 3 times on network errors.
- write to --image path. if multiple images generated (e.g., --n 2), append index (e.g., out_1.png, out_2.png).
- ensure directory exists.
delivery to user:
- if the platform supports file sending, send the file directly.
- if the platform supports inline rendering, render inline.
- if neither, provide a clickable download url.
- never respond with only a filename like output.png. user must see or access the actual image.

output: image files written to disk and/or displayed to user.
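
a small sketch of the output naming and payload check described in this step. `output_paths` and `looks_like_png` are hypothetical helpers, and the 1024-byte minimum is an assumed threshold (the source only says "minimum-payload validation"). the 8-byte png signature is standard.

```python
from pathlib import Path
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def output_paths(image_path: str, count: int) -> List[Path]:
    """Return one path per generated image, appending _1, _2, ... when count > 1."""
    base = Path(image_path)
    if count <= 1:
        return [base]
    return [base.with_name(f"{base.stem}_{i}{base.suffix}") for i in range(1, count + 1)]


def looks_like_png(payload: bytes, min_bytes: int = 1024) -> bool:
    """Reject truncated downloads and non-image bodies such as html error pages."""
    return len(payload) >= min_bytes and payload.startswith(PNG_SIGNATURE)


def write_image(path: Path, payload: bytes) -> None:
    """Create the target directory if needed, then write the file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
```

for example, `output_paths("out.png", 2)` yields `out_1.png` and `out_2.png`, matching the naming above.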

step 8: handle errors

input: error code or message from gateway or network.

process:

do not dump raw error codes. translate to user-level explanation.
common errors:
- model does not exist or similar model-key error: refresh model registry (npm run discover-image-models) and offer to retry with a different model.
- auth error (invalid key, expired): offer to reconfigure IMAGE_GEN_API_KEY.
- prompt too long: suggest shortening or breaking into batch.
- reference image unreachable: check that url is https and public.
- network timeout: "taking longer than usual" or "temporarily unavailable, want to try again?"
- api rate limit: suggest retrying in a few seconds or batch with longer intervals.
example user-friendly error: "this model is temporarily unavailable. want to try nano banana 2 instead?" (not: "error 1001: model status unknown").

output: user-facing explanation and suggested action.
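
one way to keep raw codes away from the user is a two-stage mapping: classify the gateway response, then look up a plain-language message. the sketch below is illustrative only. the substring rules and the use of http 401/403/429 are assumptions, since the gateway's actual error codes are not listed here, and `classify_error` and `explain` are hypothetical names.

```python
from typing import Optional

# user-facing wording taken from the examples in this document
FRIENDLY = {
    "auth": "your api key may have expired. want to reconfigure it now?",
    "rate_limit": "generation is temporarily limited. trying again in a few seconds...",
    "model": "this model isn't available right now. want to try a different one?",
    "prompt_too_long": "the prompt is too long. want to shorten it or split it into a batch?",
    "reference": "check that your image url is public and starts with https://",
    "network": "temporarily unavailable, want to try again?",
    "unknown": "something went wrong. want to try again?",
}


def classify_error(msg: str, http_status: Optional[int] = None) -> str:
    """Map a raw gateway message to a coarse category (assumed heuristics)."""
    text = msg.lower()
    if http_status in (401, 403) or "api key" in text or "unauthorized" in text:
        return "auth"
    if http_status == 429 or "rate limit" in text:
        return "rate_limit"
    if "model" in text and ("not exist" in text or "unknown" in text):
        return "model"
    if "prompt" in text and "long" in text:
        return "prompt_too_long"
    if "reference" in text or "image url" in text:
        return "reference"
    if "timeout" in text or "timed out" in text:
        return "network"
    return "unknown"


def explain(msg: str, http_status: Optional[int] = None) -> str:
    """Return the plain-language explanation for a raw error."""
    return FRIENDLY[classify_error(msg, http_status)]
```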

step 9: report completion

input: final status (success, error, or timeout).

process:

success: show generated images (step 7). for batch, list all completed images and any failures.
error: explain what went wrong and next step (retry, switch model, check image url, etc.).
batch json report (if --json): output array of tasks with per-task status, taskId, output_path, images[], msg (on error).

output: images displayed, report logged, user informed of outcome.

decision points

decision: model not configured

if --model, EXTEND.md default_model, and IMAGE_GEN_DEFAULT_MODEL are all absent,

then initialize with nano banana 2 (GEMINI_3_1_FLASH_IMAGE), write to EXTEND.md, tell user it's now the default, and remind them they can switch anytime.

else use the highest-priority model from the three sources.
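
the model decision can be sketched as a simple fall-through. the precedence used here (`--model`, then EXTEND.md `default_model`, then `IMAGE_GEN_DEFAULT_MODEL`) is assumed from the order in which the sources are listed, and `resolve_model` is a hypothetical helper.

```python
from typing import Mapping, Optional, Tuple

FALLBACK_MODEL = "GEMINI_3_1_FLASH_IMAGE"  # nano banana 2


def resolve_model(
    cli_model: Optional[str],
    extend_default: Optional[str],
    env: Mapping[str, str],
) -> Tuple[str, bool]:
    """Return (model, needs_init).

    needs_init is True when no source was set, meaning the caller should
    write the fallback to EXTEND.md and tell the user about the new default.
    """
    for candidate in (cli_model, extend_default, env.get("IMAGE_GEN_DEFAULT_MODEL")):
        if candidate:
            return candidate, False
    return FALLBACK_MODEL, True
```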

decision: request is vague

if user's brief is underspecified (no details on style, color, composition, aspect ratio),

then run npm run build-visual-brief to ask guiding questions and refine the prompt.

else skip the brief helper and proceed to request assembly.

decision: reference image provided

if --ref is present,

then use image-to-image endpoint (POST /v1/generation/image-to-image) and validate that ref is a public https url or local file convertible to url.

else use text-to-image endpoint.
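
a minimal sketch of this branch, using hypothetical helpers `choose_mode` and `is_https_url`. it only checks the url scheme and host. whether the url is publicly reachable can only be confirmed by fetching it, and local-file conversion is not covered.

```python
from typing import Optional
from urllib.parse import urlparse


def choose_mode(ref: Optional[str]) -> str:
    """Pick the generation mode from the presence of --ref."""
    return "image-to-image" if ref else "text-to-image"


def is_https_url(ref: str) -> bool:
    """True when ref parses as an https url with a host."""
    parsed = urlparse(ref)
    return parsed.scheme == "https" and bool(parsed.netloc)
```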

decision: polling timeout

if polling exceeds poll-timeout-ms (default 300000ms),

then inform user proactively ("still waiting, taking longer than expected"), and either continue polling or fail with user-friendly explanation.

else complete normally when task_status is succeed or failed.

decision: batch or single task

if --batchfile is provided,

then enter batch mode: load json, parallelize up to --jobs concurrency, report per-task status incrementally, and optionally output json report.

else single task mode: one prompt, one image output, simple poll-and-download flow.
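
batch mode with bounded concurrency and incremental reporting might look like the sketch below. `run_batch`, `run_one`, and `on_result` are hypothetical names, the default of 2 jobs is an assumption, and only the `id` field of each batch task is taken from this document.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def run_batch(
    tasks: List[Dict],
    run_one: Callable[[Dict], Dict],
    jobs: int = 2,
    on_result: Callable[[Dict], None] = lambda result: None,
) -> List[Dict]:
    """Run tasks with at most `jobs` in flight, reporting each as it finishes."""
    results: List[Dict] = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(run_one, task): task for task in tasks}
        # as_completed yields in completion order, so reporting is incremental
        for future in as_completed(futures):
            task = futures[future]
            try:
                result = {"id": task["id"], **future.result()}
            except Exception as exc:  # one failed task must not abort the batch
                result = {"id": task["id"], "status": "failed", "msg": str(exc)}
            on_result(result)
            results.append(result)
    return results
```

`run_one` would submit, poll, and download a single task. results come back in completion order, not submission order.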

decision: dependencies or api key missing on first run

if npm run ensure-ready detects missing tools or no IMAGE_GEN_API_KEY,

then agent proposes installation or configuration and never asks user to edit files or run commands. on approval, agent proceeds silently. on decline, agent explains honestly that image generation is unavailable.

else readiness confirmed, proceed to model check.

decision: model error during generation

if gateway returns "model does not exist" or similar model-key error,

then agent refreshes model registry and offers to recommend and retry with a different model, before asking user for any platform-side info.

else proceed with error explanation and next-step guidance.

decision: style preset applied

if --style is provided,

then look up preset in STYLE-PRESETS.md and merge preset attributes (e.g., style suffix, quality boost, aspect ratio override) into request.

else use base request without preset.
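
a sketch of the preset merge, assuming a preset has already been parsed from STYLE-PRESETS.md into a dict. the field names `prompt_suffix` and `aspect_ratio` are hypothetical, since the real preset schema is not shown here.

```python
from typing import Dict, Optional


def apply_preset(request: Dict, preset: Optional[Dict]) -> Dict:
    """Return a new request with preset attributes merged in.

    The original request is left unmodified.
    """
    merged = dict(request)
    if not preset:
        return merged
    suffix = preset.get("prompt_suffix")
    if suffix:
        merged["prompt"] = f"{request['prompt']}, {suffix}"
    if "aspect_ratio" in preset:
        # the preset overrides the aspect ratio from the base request
        merged["aspect_ratio"] = preset["aspect_ratio"]
    return merged
```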

output contract

single task success

file location: {output-path} specified by --image or inferred from batch task. directory created if needed. multiple images indexed (out_1.png, out_2.png, etc.) if --n > 1.
file format: png (lossless). gateway always returns png.
file size: typically 500kb-5mb depending on resolution and model.
metadata: none persisted in file (gateway does not embed task id or model).

single task failure

no file written.
error message: user-friendly explanation (not raw code). example: "the model you requested isn't available right now. want to try a different one?"

batch mode (with `--json`)

output: json array of task results, each with:
- id: task id from batch json.
- taskId: gateway task id (for manual polling if needed).
- status: succeed, failed, or timeout.
- output_path: final file path (on succeed).
- images: array of image urls (on succeed).
- msg: error explanation (on failed or timeout).
- timestamp: unix ms when polling finished.
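
the per-task record above could be built as follows. the field names come from the list above. `report_entry` and the choice to omit `output_path` and `images` on failure are assumptions.

```python
import json
import time
from typing import Dict, List, Optional


def report_entry(
    batch_id: str,
    task_id: str,
    status: str,
    output_path: Optional[str] = None,
    images: Optional[List[str]] = None,
    msg: Optional[str] = None,
    timestamp_ms: Optional[int] = None,
) -> Dict:
    """Build one entry of the --json batch report."""
    entry: Dict = {
        "id": batch_id,
        "taskId": task_id,
        "status": status,
        "timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
    }
    if status == "succeed":
        entry["output_path"] = output_path
        entry["images"] = images or []
    else:
        # failed or timeout: carry the explanation instead of outputs
        entry["msg"] = msg or ""
    return entry


def render_report(entries: List[Dict]) -> str:
    """Serialize the report as a pretty-printed json array."""
    return json.dumps(entries, indent=2)
```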

dry-run mode

output: json request body (pretty-printed to stdout).
no file written, no api call made.

polling state (intermediate)

cached result: if a prior run already fetched and stored a result for the same taskId, polling that id again returns the cached result without re-generating.

ui delivery

images must be shown to the user, not just filenames. if platform supports file sending, send file. if it supports inline render, render inline. if neither, provide clickable download url. a bare filename output is not acceptable.
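
the delivery rule reduces to a fixed preference order. a tiny sketch, where the capability flags and the `choose_delivery` name are hypothetical:

```python
def choose_delivery(can_send_files: bool, can_render_inline: bool) -> str:
    """Pick a delivery mode. A bare filename is never an option."""
    if can_send_files:
        return "send_file"
    if can_render_inline:
        return "render_inline"
    return "download_url"
```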

outcome signal

user knows generation succeeded when

image is visible in the chat or platform (sent file, inline render, or url).
model name is mentioned in the completion message (e.g., "your image is ready, created with nano banana 2").
for batch: all tasks either completed or explicitly failed, with per-task status visible.

user knows generation failed when

error explanation is plain language, not a code (e.g., "this model isn't available right now" instead of "error 1001").
next step is suggested (e.g., "want to try a different model?" or "check that your image url is public and starts with https://").
no partial images shown if the full task failed.

user knows a retry is needed when

timeout message: "still waiting, taking longer than expected. want to try again?"
rate limit message: "generation is temporarily limited. trying again in a few seconds..."
auth message: "your api key may have expired. want to reconfigure it now?"

workflow is complete when

api key and default model are confirmed.
generated image(s) are visible to the user.
no blocking errors remain.

credits

original source: clawhub. enriched and restructured to implexa standards: explicit decision logic, edge-case handling (timeouts, rate limits, auth expiry, empty result sets), input/output contracts, and outcome signals.
