Baoyu Image Gen

AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream, Replicate and Agnes APIs...

SkillRank score ↗

6.7 / 10 · dry-run tested

evaluated by implexa, claude-haiku-4-5 · 2026-05-30

baoyu-image-gen wraps eleven image generation apis (openai, google, azure, dashscope, replicate, and others), plus an opt-in codex-cli provider, behind a unified cli. supports text-to-image, reference images, batch generation, and multi-provider fallback with persistent preference loading.

structure

7.0

trigger phrases

8.0

procedure

6.0

edge cases

original SKILL.md (from clawhub)

---
name: baoyu-image-gen
description: AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream, Replicate and Agnes APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
version: 2.1.0
metadata:
  openclaw:
    homepage: https://github.com/JimLiu/baoyu-skills#baoyu-image-gen
    requires:
      anyBins:
        - bun
        - npx
---

# Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Z.AI GLM-Image, MiniMax, Jimeng (即梦), Seedream (豆包), Replicate and Agnes.

## User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

1. **Prefer built-in user-input tools** exposed by the current agent runtime — e.g., `AskUserQuestion`, `request_user_input`, `clarify`, `ask_user`, or any equivalent.
2. **Fallback**: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
3. **Batching**: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.

Concrete `AskUserQuestion` references below are examples — substitute the local equivalent in other runtimes.

## Script Directory

`{baseDir}` = this SKILL.md's directory. All `scripts/...` paths below are relative to `{baseDir}`. Main script: `{baseDir}/scripts/main.ts`. Batch payload helper: `{baseDir}/scripts/build-batch.ts`. Resolve `${BUN_X}`: prefer `bun`; else `npx -y bun`; else suggest `brew install oven-sh/bun/bun`.
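
The `${BUN_X}` resolution rule above can be sketched as a small pure function. This is an illustration only: `resolveBunX` and the `hasBin` predicate are hypothetical names, not part of the skill's scripts, and how PATH is actually probed is left to the caller.

```typescript
// Hypothetical predicate: reports whether a binary is available on PATH
// (for example, a thin wrapper around `which`). Injected so the rule itself
// stays free of side effects.
type HasBin = (name: string) => boolean;

// Install hint quoted from the skill text; shown when no runner is found.
const INSTALL_HINT = "brew install oven-sh/bun/bun";

// Returns the command prefix to use as ${BUN_X}, or null when neither
// `bun` nor `npx` is available and the user should see INSTALL_HINT.
function resolveBunX(hasBin: HasBin): string[] | null {
  if (hasBin("bun")) return ["bun"];
  if (hasBin("npx")) return ["npx", "-y", "bun"];
  return null;
}
```

A caller would prepend the returned array to `{baseDir}/scripts/main.ts` and its flags, and print `INSTALL_HINT` when the result is `null`.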

## Step 0: Load Preferences ⛔ BLOCKING

This step MUST complete before any image generation — generation is blocked until EXTEND.md exists.

Check these paths in order; first hit wins:

| Path | Scope |
|------|-------|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project |
| `${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md` | XDG |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |

- **Found** → load, parse, apply. If `default_model.[provider]` is null → ask model only.
- **Not found** → run first-time setup (`references/config/first-time-setup.md`) using AskUserQuestion to collect provider + model + quality + save location. Save EXTEND.md, then continue. Do not generate images before this completes.

Legacy compatibility: if `.baoyu-skills/baoyu-imagine/EXTEND.md` exists and the new path doesn't, the runtime renames it to `baoyu-image-gen`. If both exist, the runtime leaves them alone and uses the new path.

**EXTEND.md keys**: default provider, default quality, default aspect ratio, default image size, OpenAI image API dialect, default models, batch worker cap, provider-specific batch limits. Schema: `references/config/preferences-schema.md`.
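
As a rough sketch of the "first hit wins" lookup in Step 0, the candidate paths can be built in order and probed with a caller-supplied existence check. The names `extendCandidates`, `findExtend`, and `exists` are hypothetical, and resolving the project-scope path against the current working directory is an assumption rather than something the skill text states.

```typescript
interface PrefEnv {
  HOME: string;
  XDG_CONFIG_HOME?: string;
}

// Builds the three EXTEND.md locations in priority order.
// Mirrors ${XDG_CONFIG_HOME:-$HOME/.config}: an unset or empty value
// falls back to $HOME/.config.
function extendCandidates(cwd: string, env: PrefEnv): string[] {
  const xdgBase =
    env.XDG_CONFIG_HOME && env.XDG_CONFIG_HOME.length > 0
      ? env.XDG_CONFIG_HOME
      : `${env.HOME}/.config`;
  return [
    `${cwd}/.baoyu-skills/baoyu-image-gen/EXTEND.md`, // project (assumed relative to cwd)
    `${xdgBase}/baoyu-skills/baoyu-image-gen/EXTEND.md`, // XDG
    `${env.HOME}/.baoyu-skills/baoyu-image-gen/EXTEND.md`, // user home
  ];
}

// Returns the first existing path, or null to signal "run first-time setup".
function findExtend(
  cwd: string,
  env: PrefEnv,
  exists: (path: string) => boolean,
): string | null {
  const hit = extendCandidates(cwd, env).find((p) => exists(p));
  return hit === undefined ? null : hit;
}
```

A `null` result corresponds to the **Not found** branch: generation stays blocked until setup has written an EXTEND.md.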

## Usage

Minimum working examples — see `references/usage-examples.md` for the full set including per-provider invocations and batch mode.

### Identity-preserving reference prompts

When the user wants a real person/character/object preserved from reference images, do **not** replace the reference with a long generic description. Prefer short, hard identity-preservation language:

- "Use the person/object in the reference image(s) as the same identity. Do not redesign it or create a similar-looking new subject."
- "Only change scene, clothing, pose, lighting, rendering style, and composition. Keep the face/proportions/hair/key accessories/overall identity from the references."
- If using multiple references, state that they are the same subject and should jointly define identity.

Pitfall: long descriptions like "young East Asian woman, oval face, clear eyes..." can cause the model to synthesize a new person matching the description instead of preserving the referenced person.

```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio and high quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9 --quality 2k

# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference image
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider dashscope --model qwen-image-2.0-pro

# OpenAI GPT Image 2
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai --model gpt-image-2

# Codex CLI (uses logged-in Codex subscription — no OPENAI_API_KEY required; requires `codex` on PATH)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider codex-cli --ar 16:9

# Batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4

# Build a batch file from outline.md + prompts/ (e.g. baoyu-article-illustrator output)
${BUN_X} {baseDir}/scripts/build-batch.ts --outline outline.md --prompts prompts --output batch.json --images-dir attachments
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4
```

## Reference-Image Identity Preservation

When the user wants a person/object preserved from reference images:

- Prefer a small curated set of existing source references (usually 2–4) over many images; large multi-megabyte refs can destabilize streaming providers.
- Make the prompt say the references are the same subject and the output must use that identity. Avoid long generic facial-feature descriptions that can cause the model to synthesize a new similar-looking person.
- Do not use newly generated outputs as references unless the user explicitly asks; generated refs compound drift.
- If results become too polished or influencer-like, reduce stylized refs and add explicit anti-beautification constraints (no face slimming, eye enlargement, heavy makeup, commercial travel shoot, over-smoothing).
- If the subject should look younger/older, preserve the face and express age through clothing, posture, scene, and styling; do not ask the model to change facial identity.

## Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required in single-image mode) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| `--provider google\|openai\|azure\|openrouter\|dashscope\|zai\|minimax\|jimeng\|seedream\|replicate\|codex-cli\|agnes` | Force provider (default: auto-detect; `codex-cli` is never auto-selected — must be pinned via CLI or EXTEND.md) |
| `--model <id>`, `-m` | Model ID — see provider references for defaults and allowed values |
| `--ar <ratio>` | Aspect ratio (`16:9`, `1:1`, `4:3`, …) |
| `--size <WxH>` | Explicit size (e.g., `1024x1024`; for `gpt-image-2`, width/height must be multiples of 16, max edge 3840px, ratio no wider than 3:1) |
| `--quality normal\|2k` | Quality preset (default: `2k`) |
| `--imageSize 1K\|2K\|4K` | Image size for Google/OpenRouter (default: from quality) |
| `--imageApiDialect openai-native\|ratio-metadata` | OpenAI-compatible endpoint dialect — use `ratio-metadata` for gateways that expect aspect-ratio `size` plus `metadata.resolution` |
| `--ref <files...>` | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0, DashScope `wan2.7-image-pro`/`wan2.7-image`. Not supported by Jimeng, Seedream 3.0, SeedEdit 3.0, or any DashScope model outside the `wan2.7-image*` family |
| `--n <count>` | Number of images. Replicate requires `--n 1` (single-output save semantics) |
| `--json` | JSON output |
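
The `gpt-image-2` constraints listed for `--size` (multiples of 16, max edge 3840px, ratio no wider than 3:1) lend themselves to a pre-flight check. The sketch below is illustrative: `validateGptImage2Size` is a hypothetical helper, and it assumes the 3:1 limit applies to the long edge over the short edge in either orientation.

```typescript
// Returns a list of human-readable problems; an empty list means the size
// satisfies the three constraints quoted in the options table.
function validateGptImage2Size(size: string): string[] {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) return [`expected <W>x<H>, got "${size}"`];

  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width === 0 || height === 0) return ["width and height must be positive"];

  const problems: string[] = [];
  if (width % 16 !== 0 || height % 16 !== 0) {
    problems.push("width and height must be multiples of 16");
  }
  const longEdge = Math.max(width, height);
  const shortEdge = Math.min(width, height);
  if (longEdge > 3840) problems.push("longest edge must not exceed 3840px");
  // Assumption: the limit is symmetric, so 1:3 portrait is treated like 3:1.
  if (longEdge > 3 * shortEdge) problems.push("ratio must not be wider than 3:1");
  return problems;
}
```

For example, `validateGptImage2Size("3840x2160")` returns an empty list, while `"1000x1000"` is rejected because 1000 is not a multiple of 16.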

## Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key |
| `ZAI_API_KEY` (alias `BIGMODEL_API_KEY`) | Z.AI API key |
| `MINIMAX_API_KEY` | MiniMax API key |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `JIMENG_ACCESS_KEY_ID`, `JIMENG_SECRET_ACCESS_KEY` | Jimeng (即梦) Volcengine credentials |
| `ARK_API_KEY` | Seedream (豆包) Volcengine ARK API key |
| `<PROVIDER>_IMAGE_MODEL` | Per-provider model override (`OPENAI_IMAGE_MODEL`, `GOOGLE_IMAGE_MODEL`, `DASHSCOPE_IMAGE_MODEL`, `ZAI_IMAGE_MODEL`/`BIGMODEL_IMAGE_MODEL`, `MINIMAX_IMAGE_MODEL`, `OPENROUTER_IMAGE_MODEL`, `REPLICATE_IMAGE_MODEL`, `JIMENG_IMAGE_MODEL`, `SEEDREAM_IMAGE_MODEL`, `AGNES_IMAGE_MODEL`) |
| `AZURE_OPENAI_DEPLOYMENT` (alias `AZURE_OPENAI_IMAGE_MODEL`) | Azure default deployment |
| `<PROVIDER>_BASE_URL` | Per-provider endpoint override |
| `AZURE_API_VERSION` | Azure image API version (default `2025-04-01-preview`) |
| `JIMENG_REGION` | Jimeng region (default `cn-north-1`) |
| `OPENAI_IMAGE_API_DIALECT` | `openai-native` \| `ratio-metadata` |
| `OPENROUTER_HTTP_REFERER`, `OPENROUTER_TITLE` | Optional OpenRouter attribution |
| `BAOYU_IMAGE_GEN_MAX_WORKERS` | Override batch worker cap |
| `BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY` | Per-provider concurrency (e.g., `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY`; for codex-cli use `BAOYU_IMAGE_GEN_CODEX_CLI_CONCURRENCY`) |
| `BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS` | Per-provider start-gap |
| `BAOYU_CODEX_IMAGEGEN_BIN` | Override the codex-imagegen wrapper path for the `codex-cli` provider (default: bundled `scripts/codex-imagegen/main.ts`; accepts `.ts` or legacy `.sh`/binary) |
| `BAOYU_CODEX_IMAGEGEN_CACHE_DIR` | Enable idempotency cache for the `codex-cli` provider (off by default) |
| `BAOYU_CODEX_IMAGEGEN_TIMEOUT_MS` | Per-attempt `codex exec` timeout for the `codex-cli` provider (default: 300000 ms) |
| `BAOYU_CODEX_IMAGEGEN_RETRIES` | Wrapper-side retry attempts on retryable errors for the `codex-cli` provider (default: 2) |
| `BAOYU_CODEX_IMAGEGEN_LOG_FILE` | Append JSONL diagnostic log for the `codex-cli` provider |

**Load priority**: CLI args > EXTEND.md > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`

### Codex/ChatGPT OAuth is not an OpenAI API key

`--provider openai --model gpt-image-2` uses the standard OpenAI Images API (`/v1/images/generations` or `/v1/images/edits`) and requires `OPENAI_API_KEY`. A Codex or ChatGPT desktop login is a different entitlement and is not a drop-in replacement for `OPENAI_API_KEY`: do not paste a Codex OAuth token into `OPENAI_API_KEY`, and do not try to reach Codex by merely pointing `OPENAI_BASE_URL` at a Codex backend.

If the user wants to use their Codex subscription / GPT Image 2 entitlement without an OpenAI API key, route through a Codex-native backend instead of this skill's `openai` provider:

- In Codex runtime: use the native `imagegen` skill/tool.
- In non-Codex runtimes with `codex` CLI installed and logged in: use `baoyu-image-gen --provider codex-cli` (preferred — it gives you the same retry / cache / batch flow as every other provider). The provider spawns the bundled `scripts/codex-imagegen/main.ts`; the same code lives upstream at `packages/baoyu-codex-imagegen/src/main.ts` for standalone callers.
- In Hermes runtimes with a native `image_generate` tool: use that tool as a fallback, and state whether reference images were passed directly or reconstructed from extracted traits.

Do not modify the existing `openai` provider to silently consume Codex OAuth. The first-class Codex-CLI path is the dedicated `codex-cli` provider, which has its own auth (Codex login), route (`codex exec`), request shape, and tests. See `references/codex-oauth-vs-openai-api-key.md`.

## Model Resolution

Priority (highest → lowest) applies to every provider:

1. CLI flag `--model <id>`
2. EXTEND.md `default_model.[provider]`
3. Env var `<PROVIDER>_IMAGE_MODEL`
4. Built-in default

For OpenAI, the built-in default is `gpt-image-2`. `gpt-image-1.5`, `gpt-image-1`, and GPT Image snapshots remain selectable with `--model` or `OPENAI_IMAGE_MODEL`.

For Azure, `--model` / `default_model.azure` is the Azure deployment name. `AZURE_OPENAI_DEPLOYMENT` is the preferred env var; `AZURE_OPENAI_IMAGE_MODEL` is kept as a backward-compatible alias. If your Azure deployment is named after the underlying model, use `gpt-image-2`; otherwise use the exact custom deployment name.

EXTEND.md overrides env vars: if EXTEND.md sets `default_model.google: "gemini-3-pro-image"` and the env var sets `GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image`, EXTEND.md wins.
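
The four-level priority can be read as a simple first-defined-wins chain. The following is a minimal sketch, not the skill's actual resolver: `resolveModel` and `ModelSources` are hypothetical names, and the special case where `default_model.[provider]` is explicitly `null` (which Step 0 treats as "ask the user") is deliberately left out.

```typescript
interface ModelSources {
  cliModel?: string; // --model <id>
  extendDefault?: string; // EXTEND.md default_model.[provider]
  envModel?: string; // <PROVIDER>_IMAGE_MODEL
  builtinDefault: string; // provider's built-in default
}

// Picks the highest-priority source that is actually set.
function resolveModel(sources: ModelSources): string {
  return (
    sources.cliModel ??
    sources.extendDefault ??
    sources.envModel ??
    sources.builtinDefault
  );
}

// The example from the text: EXTEND.md beats the environment variable.
const picked = resolveModel({
  extendDefault: "gemini-3-pro-image",
  envModel: "gemini-3.1-flash-image",
  builtinDefault: "some-builtin-default", // placeholder, not a real default
});
// picked === "gemini-3-pro-image"
```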

**Display model info before each generation**:

- `Using [provider] / [model]`
- `Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL`

## OpenAI-Compatible Gateway Dialects

`provider=openai` means the auth and routing entrypoint is OpenAI-compatible. It does **not** guarantee the upstream image API uses OpenAI native semantics. When a gateway expects a different wire format, set `default_image_api_dialect` in EXTEND.md, `OPENAI_IMAGE_API_DIALECT`, or `--imageApiDialect`:

- `openai-native`: pixel `size` (`1536x1024`) and native OpenAI quality fields
- `ratio-metadata`: aspect-ratio `size` (`16:9`) plus `metadata.resolution` (`1K|2K|4K`) and `metadata.orientation`

Use `openai-native` for the OpenAI native API or strict clones; try `ratio-metadata` for compatibility gateways in front of Gemini or similar models. Current limitation: `ratio-metadata` applies only to text-to-image; reference-image edits still need `openai-native` or a provider with first-class edit support.
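
To make the difference between the two dialects concrete, here is a hedged sketch of how the size-related request fields might be shaped. `buildSizeFields` is a hypothetical helper; the field names `size`, `metadata.resolution`, and `metadata.orientation` come from the text above, but the orientation values (`"landscape"` / `"portrait"`) are assumptions, since the accepted values are not stated here.

```typescript
type Dialect = "openai-native" | "ratio-metadata";

interface SizeRequest {
  width: number;
  height: number;
  ar: string; // e.g. "16:9"
  resolution: "1K" | "2K" | "4K";
}

// Builds only the size-related part of a text-to-image request body.
function buildSizeFields(
  dialect: Dialect,
  req: SizeRequest,
): Record<string, unknown> {
  if (dialect === "openai-native") {
    // Native semantics: pixel size such as "2048x1152".
    return { size: `${req.width}x${req.height}` };
  }
  // ratio-metadata: aspect-ratio string plus metadata hints.
  return {
    size: req.ar,
    metadata: {
      resolution: req.resolution,
      // Assumed value set; the gateway may expect different strings.
      orientation: req.width >= req.height ? "landscape" : "portrait",
    },
  };
}
```

Per the limitation noted above, a sketch like this would only apply to text-to-image requests; reference-image edits stay on `openai-native`.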

## Provider-Specific Guides

Each provider has its own quirks (model families, size rules, ref support, limits). Read these when the user picks that provider or asks for non-default behavior:

| Provider | Reference |
|----------|-----------|
| DashScope (Qwen-Image families, custom sizes) | `references/providers/dashscope.md` |
| Z.AI (GLM-Image, cogview-4) | `references/providers/zai.md` |
| MiniMax (image-01, subject-reference) | `references/providers/minimax.md` |
| OpenRouter (multimodal models, `/chat/completions` flow) | `references/providers/openrouter.md` |
| Replicate (nano-banana, Seedream, Wan) | `references/providers/replicate.md` |
| Codex CLI (wraps bundled `scripts/codex-imagegen/`; Codex login, no `OPENAI_API_KEY`) | `references/providers/codex-cli.md` |
| Agnes (agnes-image-2.1-flash, reference-image support) | `references/providers/agnes.md` |

## Provider Selection

1. `--ref` provided + no `--provider` → auto-select Google → OpenAI → Azure → OpenRouter → Replicate → Seedream → MiniMax → Agnes (MiniMax's subject reference is more specialized toward character/portrait consistency)
2. `--provider` specified → use it (if `--ref`, must be google/openai/azure/openrouter/replicate/seedream/minimax/codex-cli/agnes)
3. Only one API key present → use that provider
4. Multiple keys → default priority: Google → OpenAI → Azure → OpenRouter → DashScope → Z.AI → MiniMax → Replicate → Jimeng → Seedream → Agnes
5. `codex-cli` is **never auto-selected** — set `default_provider: codex-cli` in EXTEND.md or pass `--provider codex-cli`. It spawns `codex exec` via the bundled `scripts/codex-imagegen/main.ts` TS entrypoint (run with `bun`) and uses the user's Codex subscription (no `OPENAI_API_KEY`). Requires `codex` on `PATH` with an active `codex login`.
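
The selection rules above can be approximated with two ordered lists. This is a sketch under assumptions: `selectProvider` is a hypothetical function, `available` stands for "providers whose credentials were detected", and the behavior when `--ref` is given but no reference-capable provider has credentials (an error here) is not specified by the rules.

```typescript
const REF_ORDER = [
  "google", "openai", "azure", "openrouter",
  "replicate", "seedream", "minimax", "agnes",
];
const DEFAULT_ORDER = [
  "google", "openai", "azure", "openrouter", "dashscope", "zai",
  "minimax", "replicate", "jimeng", "seedream", "agnes",
];
// codex-cli accepts --ref when pinned, but appears in neither auto-select list.
const REF_CAPABLE = new Set([...REF_ORDER, "codex-cli"]);

interface SelectOptions {
  provider?: string; // --provider or EXTEND.md default_provider
  hasRef: boolean; // true when --ref was passed
  available: string[]; // providers with credentials present
}

function selectProvider(opts: SelectOptions): string {
  if (opts.provider) {
    if (opts.hasRef && !REF_CAPABLE.has(opts.provider)) {
      throw new Error(`${opts.provider} does not support --ref`);
    }
    return opts.provider;
  }
  // A single available key is covered by the same ordered search.
  const order = opts.hasRef ? REF_ORDER : DEFAULT_ORDER;
  const pick = order.find((p) => opts.available.includes(p));
  if (!pick) throw new Error("no usable provider: set an API key or pass --provider");
  return pick;
}
```

Note how the same two keys can resolve differently: with Replicate and MiniMax credentials, a plain prompt goes to MiniMax, while a `--ref` request goes to Replicate.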

## Quality Presets

| Preset | Google imageSize | OpenAI size | OpenRouter size | Replicate resolution | Use case |
|--------|------------------|-------------|-----------------|----------------------|----------|
| `normal` | 1K | 1024px target | 1K | 1K | Quick previews |
| `2k` (default) | 2K | 2048px target | 2K | 2K | Covers, illustrations, infographics |

Google/OpenRouter `imageSize` can be overridden with `--imageSize 1K|2K|4K`.

For OpenAI native `gpt-image-2`, `normal` maps to `quality=medium` and a low-latency valid size near the requested aspect ratio; `2k` maps to `quality=high` and 2048px-class sizes such as `2048x2048`, `2048x1152`, or `1152x2048`. Use explicit `--size` for valid custom or 4K outputs, e.g. `3840x2160`.

## Aspect Ratios

Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`.

- Google multimodal: `imageConfig.aspectRatio`
- OpenAI: `gpt-image-2` uses the closest valid custom size for the requested ratio; older GPT Image and DALL·E models use their closest supported fixed size
- OpenRouter: `imageGenerationOptions.aspect_ratio`; if only `--size <WxH>` is given, the ratio is inferred
- Replicate: behavior is model-specific — `google/nano-banana*` uses `aspect_ratio`, `bytedance/seedream-*` uses documented Replicate ratios, Wan 2.7 maps `--ar` to a concrete `size`
- MiniMax: official `aspect_ratio` values; if `--size <WxH>` is given without `--ar`, sends `width`/`height` for `image-01`
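
One plausible way to turn an aspect ratio into a "closest valid custom size" for `gpt-image-2` is to fix the long edge and snap the short edge to a multiple of 16. The sketch below is an assumption about the approach, not the script's actual rounding logic; `parseRatio` and `sizeForRatio` are hypothetical names. It does reproduce the 2048px-class sizes quoted in the Quality Presets section.

```typescript
// Parses "16:9" or "2.35:1" into a width/height quotient.
function parseRatio(ar: string): number {
  const match = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ar);
  if (!match) throw new Error(`invalid aspect ratio: ${ar}`);
  const w = Number(match[1]);
  const h = Number(match[2]);
  if (w <= 0 || h <= 0) throw new Error(`invalid aspect ratio: ${ar}`);
  return w / h;
}

// Snaps a pixel value to the nearest multiple of 16 (minimum 16).
function snap16(value: number): number {
  return Math.max(16, Math.round(value / 16) * 16);
}

// Keeps the long edge at `longEdge` and derives the short edge from the ratio.
function sizeForRatio(ar: string, longEdge = 2048): string {
  const ratio = parseRatio(ar);
  const long = snap16(longEdge);
  const short = snap16(ratio >= 1 ? long / ratio : long * ratio);
  return ratio >= 1 ? `${long}x${short}` : `${short}x${long}`;
}
```

With the default long edge, `16:9`, `9:16`, and `1:1` map to `2048x1152`, `1152x2048`, and `2048x2048` respectively.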

## Generation Mode

**Default**: sequential. **Batch parallel**: enabled automatically when `--batchfile` contains 2+ pending tasks.

| Situation | Prefer | Why |
|-----------|--------|-----|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead, easier debugging |
| Multiple images with saved prompt files | Batch (`--batchfile`) | Reuses finalized prompts, applies shared throttling/retries, predictable throughput |
| Each image still needs its own reasoning / prompt writing / style exploration | Subagents | Work is still exploratory, each needs independent analysis |
| Input is `outline.md` + `prompts/` (e.g. from `baoyu-article-illustrator`) | Batch — use `{baseDir}/scripts/build-batch.ts` to assemble the payload | The outline + prompt files already contain everything needed |

Rule of thumb: once prompt files are saved and the task is "generate all of these", prefer batch over subagents. Use subagents only when generation is coupled with per-image thinking or divergent creative exploration.

**Parallel behavior**:

- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling applies only in batch mode; defaults are tuned for throughput while avoiding RPM bursts
- Override with `--jobs <count>`
- Each image is attempted up to 3 times
- Final output includes success count, failure count, and per-image failure reasons
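
A minimal sketch of the two mechanics described above: choosing a worker count and bounding attempts per image. Both helpers are hypothetical. The sketch assumes "automatic" means one worker per pending task, that the cap also limits an explicit `--jobs` value, and that failed attempts are retried immediately; the real scheduler may differ on all three.

```typescript
// Effective worker count for batch mode.
// `jobs` is the --jobs value (undefined means automatic); `cap` is the
// configured worker cap, with the built-in default of 10.
function resolveWorkerCount(pending: number, jobs?: number, cap = 10): number {
  const requested = jobs ?? pending; // assumption: auto = one worker per task
  return Math.max(1, Math.min(requested, cap, pending));
}

// Runs `task` until it succeeds or `maxAttempts` attempts have been made,
// then rethrows the last error so the caller can record the failure reason.
async function withAttempts<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Under these assumptions, 25 pending tasks with no `--jobs` run on 10 workers, and `--jobs 4` narrows that to 4.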

## Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint

### Codex image2 fallback

If `--provider openai --model gpt-image-2` fails because `OPENAI_API_KEY` is missing but the current runtime has a native image-generation backend or the repo-level `codex-imagegen` wrapper is available, use that path rather than leaving the user waiting. Be explicit about whether the fallback is true reference-image generation or only a text-prompt reconstruction from extracted visual traits. See `references/codex-image2-fallback.md`.

## References

| File | Content |
|------|---------|
| `references/usage-examples.md` | Extended CLI examples across providers and batch mode |
| `references/codex-oauth-vs-openai-api-key.md` | Why Codex/ChatGPT OAuth image2 entitlement is not usable through baoyu-image-gen's standard OpenAI API-key provider |
| `references/codex-image2-fallback.md` | Practical fallback behavior when OpenAI API credentials are absent but Codex/native image generation is available |
| `references/providers/dashscope.md` | DashScope families, sizes, limits |
| `references/providers/zai.md` | Z.AI GLM-image / cogview-4 |
| `references/providers/minimax.md` | MiniMax image-01 + subject reference |
| `references/providers/openrouter.md` | OpenRouter multimodal flow |
| `references/providers/replicate.md` | Replicate supported families + guardrails |
| `references/providers/agnes.md` | Agnes (agnes-image-2.1-flash) sizing, refs, and limits |
| `references/config/preferences-schema.md` | EXTEND.md schema |
| `references/config/first-time-setup.md` | First-time setup flow |

## Extension Support

Custom configurations via EXTEND.md. See Step 0 for paths and schema.

split original into 6 required components (intent, inputs, procedure, decision points, output contract, outcome signal), formalized external connections + env vars, broke out blocking step 0 with explicit load paths, documented reference-image support matrix as decision table, added edge cases (missing keys, rate limits, file validation, batch validation, gateway dialect mismatch, legacy config migration), maintained original author intent + all provider options.

Image Generation (AI SDK)

Item: Baoyu Image Gen
Rating: 6.7
Author: Implexa

intent

generate images via text-to-image or reference-image editing across 10+ providers (OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI, MiniMax, Jimeng, Seedream, Replicate, Agnes, plus the opt-in codex-cli provider). use this when the user asks to "generate", "create", or "draw" images. supports sequential single-image generation, parallel batch mode from saved prompt files, custom aspect ratios, quality presets, and identity-preserving reference workflows. requires a blocking first-time setup to load provider/model config.

inputs

external connections

| Service | Auth | Setup |
|---------|------|-------|
| OpenAI Images API | `OPENAI_API_KEY` | standard OpenAI API key (not Codex/ChatGPT OAuth). scope: image generation + editing. required for `gpt-image-2` |
| Azure OpenAI | `AZURE_OPENAI_API_KEY` | deployment name in `AZURE_OPENAI_DEPLOYMENT` or `AZURE_OPENAI_IMAGE_MODEL`. optional `AZURE_API_VERSION` (default `2025-04-01-preview`) |
| Google Gemini | `GOOGLE_API_KEY` | free tier available. required for multimodal image generation + reference images on Google |
| OpenRouter | `OPENROUTER_API_KEY` | optional `OPENROUTER_HTTP_REFERER` + `OPENROUTER_TITLE` for attribution |
| DashScope (Qwen) | `DASHSCOPE_API_KEY` | Alibaba cloud. supports `qwen-image-2.0-pro`, `wan2.7-image`, `wan2.7-image-pro` with custom sizes |
| Z.AI | `ZAI_API_KEY` or `BIGMODEL_API_KEY` | GLM-Image, cogview-4. bigmodel.cn service |
| MiniMax | `MINIMAX_API_KEY` | `image-01` model. subject-reference support for character consistency |
| Replicate | `REPLICATE_API_TOKEN` | async polling model. supports Seedream, nano-banana, Wan families |
| Jimeng (即梦) | `JIMENG_ACCESS_KEY_ID` + `JIMENG_SECRET_ACCESS_KEY` | Volcengine credentials. optional `JIMENG_REGION` (default `cn-north-1`) |
| Seedream (豆包) | `ARK_API_KEY` | Volcengine ARK platform. reference images supported on Seedream 5.0/4.5/4.0; not supported on Seedream 3.0 or SeedEdit 3.0 |

env var load priority (highest first): CLI flags > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env

configuration files

EXTEND.md (first-time setup): blocking, must exist before generation. search in order (first hit wins):

.baoyu-skills/baoyu-image-gen/EXTEND.md (project scope)
${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md (XDG scope)
$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md (home scope)

if missing, run first-time setup via AskUserQuestion (or fallback to numbered prompts) to collect: default provider, default model, quality preset, output save location. save EXTEND.md then continue.

EXTEND.md schema (keys):

default_provider: provider slug
default_model.[provider]: model ID for each provider (null = ask per generation)
default_quality: normal or 2k (default 2k)
default_aspect_ratio: ratio like 16:9, 1:1 (optional)
default_image_size: 1K|2K|4K (optional)
default_image_api_dialect: openai-native or ratio-metadata (OpenAI-compatible gateways only)
batch_worker_cap: max parallel workers (default 10, capped by provider limits)
provider_concurrency.[provider]: per-provider override, e.g. provider_concurrency.replicate: 5
provider_start_interval_ms.[provider]: gap between request starts in batch mode

legacy compatibility: if .baoyu-skills/baoyu-imagine/EXTEND.md exists and the new path doesn't, rename it to baoyu-image-gen automatically. if both exist, use the new path.

context & environment

{baseDir}: directory containing this SKILL.md. main script is {baseDir}/scripts/main.ts
runtime binary: prefer bun; fallback to npx -y bun; else suggest brew install oven-sh/bun/bun
cwd: user's current working directory (relative paths resolve here)

CLI parameters

| flag | description |
|------|-------------|
| `--prompt <text>`, `-p` | single prompt string |
| `--promptfiles <files...>` | read + concatenate prompt from multiple files |
| `--image <path>` | output image path (required for single-image mode) |
| `--batchfile <path>` | JSON batch file (triggers parallel mode if 2+ tasks) |
| `--jobs <count>` | worker count for batch mode (default: auto, capped by config + provider limits) |
| `--provider <slug>` | force provider (openai, azure, google, openrouter, dashscope, zai, minimax, jimeng, seedream, replicate, codex-cli, agnes). auto-select if omitted; `codex-cli` is never auto-selected |
| `--model <id>`, `-m` | model ID (default: loaded from EXTEND.md or env vars) |
| `--ar <ratio>` | aspect ratio (`16:9`, `1:1`, `4:3`, `3:4`, `9:16`, `2.35:1`) |
| `--size <WxH>` | explicit pixel size (e.g. `1024x1024`). for gpt-image-2: width/height multiples of 16, max edge 3840, ratio no wider than 3:1 |
| `--quality normal\|2k` | quality preset (default from EXTEND.md or `2k`) |
| `--imageSize 1K\|2K\|4K` | Google/OpenRouter image size (overrides quality preset) |
| `--imageApiDialect openai-native\|ratio-metadata` | wire format for OpenAI-compatible gateways |
| `--ref <files...>` | reference images. supported: Google multimodal, OpenAI edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0, DashScope `wan2.7-image*`. not supported: Jimeng, Seedream 3.0, SeedEdit 3.0, DashScope models outside `wan2.7-image*`, Z.AI |
| `--n <count>` | number of images (default 1). Replicate: use `--n 1` (single output semantics) |
| `--json` | output results as JSON (instead of file + human text) |

procedure

step 0: load preferences (blocking)

image generation is blocked until EXTEND.md exists. this step must complete first.

search for EXTEND.md in order: .baoyu-skills/baoyu-image-gen/EXTEND.md, then ${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md, then $HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md.
- input: paths to check (in priority order)
- output: EXTEND.md file content (if found) or "not found" signal
if EXTEND.md found, parse and load:
- input: EXTEND.md file path + content
- output: default_provider, default_model dict, default_quality, batch_worker_cap, per-provider limits
- side effect: if any field is null/missing, mark as "ask user per generation"
if EXTEND.md not found, run first-time setup:
- input: user interaction (AskUserQuestion tool or fallback numbered prompts)
- prompt for: provider choice, model ID (nullable), quality preset, output directory
- output: generated EXTEND.md written to project scope (.baoyu-skills/baoyu-image-gen/EXTEND.md)
- side effect: create parent directories if needed
- do not proceed to step 1 (resolve model) until EXTEND.md is written and readable
legacy compat check:
- input: paths .baoyu-skills/baoyu-imagine/EXTEND.md (old) and .baoyu-skills/baoyu-image-gen/EXTEND.md (new)
- if old exists and new doesn't, rename old to new
- if both exist, do nothing (use new path)
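
The legacy compat check reduces to a two-input decision, sketched here as a pure function. `legacyAction` and its return labels are hypothetical; the actual rename is performed by the runtime and is not shown.

```typescript
type LegacyAction = "rename-legacy" | "use-new" | "none";

// oldExists: .baoyu-skills/baoyu-imagine/EXTEND.md is present
// newExists: .baoyu-skills/baoyu-image-gen/EXTEND.md is present
function legacyAction(oldExists: boolean, newExists: boolean): LegacyAction {
  if (newExists) return "use-new"; // covers "both exist": leave files alone
  return oldExists ? "rename-legacy" : "none";
}
```

A result of `"none"` means neither file exists at the project scope, so the search continues with the remaining paths before falling through to first-time setup.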

step 1: resolve provider + model

apply priority (highest first): CLI --provider / --model > EXTEND.md default_model.[provider] + default_provider > env vars <PROVIDER>_IMAGE_MODEL + OPENAI_BASE_URL etc. > built-in defaults
- input: CLI args, EXTEND.md, environment
- output: chosen provider slug, chosen model ID
if no provider specified and only one API key is set, use that provider (e.g. if only OPENAI_API_KEY exists, default to openai)
- input: environment variables (check for OPENAI_API_KEY, GOOGLE_API_KEY, AZURE_OPENAI_API_KEY, etc.)
- output: inferred provider, or fallback to built-in priority (Google > OpenAI > Azure > OpenRouter > DashScope > Z.AI > MiniMax > Replicate > Jimeng > Seedream > Agnes)
if --ref images provided and no --provider, auto-select from reference-capable providers in order: Google > OpenAI > Azure > OpenRouter > Replicate > Seedream > MiniMax > Agnes
- input: --ref <files> flag presence
- output: provider slug
display chosen provider + model before generation:
- output to user: "Using [provider] / [model]"
- output to user: "Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL"
if model is null (e.g. default_model.openai: null in EXTEND.md), ask user:
- input: user interaction (AskUserQuestion or fallback prompt)
- output: model ID for this generation

step 2: validate + finalize request parameters

parse --prompt, --promptfiles, or detect from prior context
- input: CLI --prompt <text> or --promptfiles <file1> <file2> ... or agent context (e.g. previous prompt in conversation)
- output: final prompt string (concatenate file contents if multiple files)
- edge case: if prompt is empty, error with hint to provide text
resolve output path:
- input: CLI --image <path> (single mode) or --batchfile <path> (batch mode)
- output: absolute output path (relative paths resolved from cwd)
- edge case: if directory doesn't exist, create it with parents
if --ar or --size provided, validate against provider + model rules:
- input: --ar <ratio>, --size <WxH>, provider, model
- lookup provider reference docs for size/ratio constraints (e.g. gpt-image-2: multiples of 16, max edge 3840, ratio no wider than 3:1)
- output: validated width, height, or error + hint
- edge case: invalid ratio (e.g. --ar 99:1) => warn, fall back to default
resolve quality preset to size parameters:
- input: --quality normal|2k (or EXTEND.md / env default)
- output: target size for provider (e.g. OpenAI normal => quality=medium + 1024px target, 2k => quality=high + 2048px target; Google normal => imageSize: 1K, 2k => imageSize: 2K)
- table: see provider-specific docs for exact mapping
if --ref <files> provided, validate reference images:
- input: reference file paths, provider, model
- check: provider + model support references (cross-check table in decision points)
- check: file exists, readable, is PNG or JPG
- output: validated reference paths, or error + hint "this provider doesn't support reference images"
- edge case: reference file > 5MB => warn user about potential streaming instability
if using references for identity preservation, refine prompt language:
- input: user prompt, reference image count
- inject constraint: "Use the person/object in the reference image(s) as the same identity. Do not redesign it or create a similar-looking new subject."
- avoid: long generic descriptions ("young East Asian woman, oval face...") that can synthesize new subjects
- output: augmented prompt string
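
The reference checks and prompt refinement in step 2 might look roughly like the sketch below. All names are hypothetical. The first constraint sentence is quoted from the skill text; the extra sentence for multiple references is illustrative wording, and treating 5MB as 5 × 1024 × 1024 bytes is an assumption.

```typescript
const IDENTITY_CONSTRAINT =
  "Use the person/object in the reference image(s) as the same identity. " +
  "Do not redesign it or create a similar-looking new subject.";

// Prepends the identity constraint once; leaves prompts without refs alone.
function augmentPrompt(prompt: string, refCount: number): string {
  if (refCount === 0 || prompt.includes(IDENTITY_CONSTRAINT)) return prompt;
  const multi =
    refCount > 1
      ? " All reference images show the same subject and jointly define its identity."
      : "";
  return `${IDENTITY_CONSTRAINT}${multi}\n\n${prompt}`;
}

interface RefFile {
  path: string;
  bytes: number; // file size, supplied by the caller
}

const LARGE_REF_BYTES = 5 * 1024 * 1024; // assumed meaning of "5MB"

// Checks file type and size; existence/readability is left to the caller.
function checkRef(ref: RefFile): { errors: string[]; warnings: string[] } {
  const errors: string[] = [];
  const warnings: string[] = [];
  const ext = ref.path.toLowerCase().split(".").pop() ?? "";
  if (!["png", "jpg", "jpeg"].includes(ext)) {
    errors.push(`${ref.path}: reference images must be PNG or JPG`);
  }
  if (ref.bytes > LARGE_REF_BYTES) {
    warnings.push(`${ref.path}: large reference may destabilize streaming providers`);
  }
  return { errors, warnings };
}
```

Keeping the constraint short and prepended, rather than describing facial features, follows the pitfall noted earlier about long generic descriptions.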

step 3: select generation mode

check if batch mode:
- input: CLI --batchfile <path>
- if yes, go to step 4 (batch parallel)
- if no, go to step 5 (sequential)

step 4: batch parallel generation (if `--batchfile` present)

parse batch JSON file:
- input: --batchfile <path>
- schema: array of objects, each with prompt, promptfiles, output, ar, size, ref (optional fields)
- output: task queue
- edge case: malformed JSON => error with file/line hint
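
a sketch of the batch parsing step. the field names come from the schema above, but their types (for example `promptfiles` and `ref` as string arrays) and the rule that each task needs a prompt source plus an output are assumptions based on the error message used elsewhere in this document.

```typescript
// Field names from the schema above; the types are assumptions.
interface BatchTask {
  prompt?: string;
  promptfiles?: string[];
  output?: string;
  ar?: string;
  size?: string;
  ref?: string[];
}

// Parse the batch file contents into a task queue, or throw a readable error.
function parseBatch(jsonText: string): BatchTask[] {
  let data: unknown;
  try {
    data = JSON.parse(jsonText);
  } catch (err) {
    throw new Error(`malformed batch JSON: ${(err as Error).message}`);
  }
  if (!Array.isArray(data) || data.length === 0) {
    throw new Error("batch file must contain array of tasks with prompt + output");
  }
  return data.map((item, index) => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`task ${index + 1}: expected an object`);
    }
    const task = item as BatchTask;
    const hasPrompt = Boolean(task.prompt) || (task.promptfiles ?? []).length > 0;
    if (!hasPrompt) throw new Error(`task ${index + 1}: needs prompt or promptfiles`);
    if (!task.output) throw new Error(`task ${index + 1}: needs output`);
    return task;
  });
}
```
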
resolve each task's prompt, output, parameters (same as step 2, per task):
- input: batch task object
- output: validated prompt, output path, ar/size/quality for this task
initialize worker pool:
- input: --jobs <count> (default: auto, capped by EXTEND.md batch_worker_cap or built-in 10)
- input: provider concurrency limits (e.g. BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY)
- output: worker pool size (e.g. 4)
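
one way to combine these limits is to take the smallest of them. this sketch assumes that "auto" means one worker per task before capping, which the text does not state; the option names are hypothetical.

```typescript
const BUILT_IN_WORKER_CAP = 10; // built-in cap named in the text

interface WorkerOptions {
  taskCount: number;
  jobs?: number;                // --jobs <count>; undefined means "auto"
  batchWorkerCap?: number;      // EXTEND.md batch_worker_cap
  providerConcurrency?: number; // e.g. BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY
}

// Pick the smallest applicable limit, but never fewer than one worker.
function resolveWorkerCount(opts: WorkerOptions): number {
  const cap = opts.batchWorkerCap ?? BUILT_IN_WORKER_CAP;
  const requested = opts.jobs ?? opts.taskCount; // assumption: auto = one worker per task
  const limits = [requested, cap, opts.taskCount];
  if (opts.providerConcurrency !== undefined) limits.push(opts.providerConcurrency);
  return Math.max(1, Math.min(...limits));
}
```
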
apply per-provider throttling (batch mode only):
- input: provider, worker count, EXTEND.md provider_start_interval_ms.[provider]
- default intervals tuned for throughput without RPM bursts: e.g. Replicate 200ms, OpenAI 100ms
- output: request start scheduling (stagger requests to stay under rate limits)
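
the staggering can be sketched as a small scheduler that hands each request a delay so that starts are at least one interval apart. `createStartScheduler` is a hypothetical helper, and the injectable clock exists only to make the behavior easy to check.

```typescript
// Returns a function that reports how long a request should wait before starting.
function createStartScheduler(
  intervalMs: number,
  now: () => number = Date.now,
): () => number {
  let nextStart = 0;
  return function delayBeforeStart(): number {
    const current = now();
    // Start immediately if the slot is free, otherwise wait for the next slot.
    const startAt = Math.max(current, nextStart);
    nextStart = startAt + intervalMs;
    return startAt - current;
  };
}
```

with the 200ms Replicate interval mentioned above, three requests arriving at the same instant would be told to wait 0ms, 200ms, and 400ms.
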
enqueue tasks and start workers:
- input: validated task queue, worker pool
- for each task, spawn a generation request (call step 5 sub-logic for single image)
- output: stream of results + errors
implement per-task retry logic:
- input: task, max retries (3)
- if generation fails: retry up to 3 times, using exponential backoff (optionally with jitter)
- output: success result or final error
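
a sketch of the per-task retry loop. the 1000ms base delay and the 50-100% jitter range are assumptions; the text only fixes the retry count at 3.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, scaled by jitter.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000, // assumption: base delay is not specified in the text
  random: () => number = Math.random,
): number {
  const exponential = baseMs * 2 ** attempt;
  return Math.round(exponential * (0.5 + random() / 2)); // 50%..100% of the exponential delay
}

// Run `task`, retrying up to `maxRetries` times before surfacing the last error.
async function withRetry<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```
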
aggregate results:
- input: all task results
- output: summary (e.g. "8/10 images generated, 2 failed: [task1: rate limit, task2: invalid prompt]")
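
the summary line could be built like this. `TaskResult` and `summarize` are hypothetical names; the output format follows the example above.

```typescript
interface TaskResult {
  id: string;
  ok: boolean;
  error?: string;
}

// Build the one-line batch summary from per-task results.
function summarize(results: TaskResult[]): string {
  const failed = results.filter((r) => !r.ok);
  const done = results.length - failed.length;
  if (failed.length === 0) return `${done}/${results.length} images generated`;
  const details = failed.map((r) => `${r.id}: ${r.error ?? "unknown error"}`).join(", ");
  return `${done}/${results.length} images generated, ${failed.length} failed: [${details}]`;
}
```
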

step 5: sequential single-image generation

resolve OpenAI-compatible gateway dialect (if provider is openai-compatible):
- input: EXTEND.md default_image_api_dialect, env var OPENAI_IMAGE_API_DIALECT, CLI --imageApiDialect
- output: dialect choice (openai-native or ratio-metadata)
- default: openai-native
- if using ratio-metadata: size is expressed as aspect ratio string (e.g. "16:9") + metadata.resolution + metadata.orientation (instead of pixel "1536x1024")
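
a sketch of dialect resolution and the resulting size fields. the precedence CLI > env > EXTEND.md is an assumption (the text lists the three inputs without ranking them), and the function names are hypothetical. the field layout follows the description above; the accepted values for `resolution` and `orientation` depend on the gateway and are left to the caller.

```typescript
type Dialect = "openai-native" | "ratio-metadata";

// Assumed precedence: CLI flag, then env var, then EXTEND.md, then the default.
function resolveDialect(cli?: string, env?: string, extendMd?: string): Dialect {
  const choice = cli ?? env ?? extendMd ?? "openai-native";
  if (choice === "openai-native" || choice === "ratio-metadata") return choice;
  throw new Error(`unknown image API dialect: ${choice}`);
}

interface SizeInput {
  pixels: string;      // e.g. "1536x1024"
  ratio: string;       // e.g. "16:9"
  resolution: string;  // gateway-specific value
  orientation: string; // gateway-specific value
}

// Build the size-related request fields for the chosen dialect.
function sizeFields(dialect: Dialect, input: SizeInput): Record<string, unknown> {
  if (dialect === "openai-native") return { size: input.pixels };
  return {
    size: input.ratio,
    metadata: { resolution: input.resolution, orientation: input.orientation },
  };
}
```
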
call provider API:
- input: provider, model, prompt, size/ar, quality, reference images (if any), dialect
- do: POST to provider endpoint
- include auth headers (Bearer token, API key, etc.) from environment or config
- if --ref provided and provider is openai/azure: call /v1/images/edits (image editing endpoint) instead of /v1/images/generations
- output: raw API response (image URL or binary)
- error cases: see error handling section
save output:
- input: API response (PNG/JPG data or URL)
- if response is URL: download + save to --image <path>
- if response is binary: write directly to --image <path>
- output: file at --image <path>, with confirmation message
- side effect: create parent directories if needed
if --json flag: output structured result instead:
- input: generation metadata (provider, model, prompt, size, file path, duration)
- output: JSON object { "status": "success", "provider": "openai", "model": "gpt-image-2", "image_path": "...", "prompt": "...", "generation_time_ms": 1234 }

decision points

reference image support matrix

provider	text-to-image	reference images (edits)	notes
Google	yes	yes (multimodal prompt)	best for identity preservation
OpenAI	yes	yes (`/v1/images/edits`, PNG/JPG)	gpt-image-2 required
Azure OpenAI	yes	yes (edits, PNG/JPG only)	same as OpenAI
OpenRouter	yes	yes (multimodal models only)	check model docs
DashScope	yes	yes (`wan2.7-image*` only)	not `qwen-image-2.0-pro`
Z.AI	yes	no	no reference support
MiniMax	yes	yes (subject-reference mode)	best for character consistency
Jimeng	yes	no	not supported
Seedream	yes	model-dependent (v5.0/v4.5/v4.0)	not v3.0, not SeedEdit 3.0
Replicate	yes	yes (model-specific)	nano-banana, Seedream families

if user provides --ref but provider doesn't support it:

error message: "provider [X] doesn't support reference images. use --provider [Y] instead, or drop --ref"
suggest best option from table above
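
the matrix and the error message above could be encoded roughly as follows. the provider keys (for example `zai`) are hypothetical identifiers, rows with model restrictions are collapsed to "model-dependent", and suggesting the first entry of the reference-capable priority list is an assumption about what "best option" means.

```typescript
type RefSupport = "yes" | "no" | "model-dependent";

// Condensed from the support matrix above; keys are hypothetical provider ids.
const REFERENCE_SUPPORT: Record<string, RefSupport> = {
  google: "yes",
  openai: "yes",
  azure: "yes",
  openrouter: "model-dependent",
  dashscope: "model-dependent",
  zai: "no",
  minimax: "yes",
  jimeng: "no",
  seedream: "model-dependent",
  replicate: "model-dependent",
};

// Reference-capable providers in the priority order given in this section.
const REFERENCE_PRIORITY = ["google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax"];

// Returns null when references may be used, otherwise the error message to show.
function checkReferenceSupport(provider: string): string | null {
  const support = REFERENCE_SUPPORT[provider] ?? "no";
  if (support !== "no") return null;
  const suggestion = REFERENCE_PRIORITY[0];
  return `provider ${provider} doesn't support reference images. use --provider ${suggestion} instead, or drop --ref`;
}
```

a "model-dependent" result still needs a per-model check against the provider docs before the request is sent.
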

if user wants identity-preserved output but no provider selected:

auto-select provider from reference-capable list (priority: Google > OpenAI > Azure > OpenRouter > Replicate > Seedream > MiniMax)

missing API key

if required key is missing (e.g. OPENAI_API_KEY for openai provider):

error with setup instructions: "set OPENAI_API_KEY env var or add it to EXTEND.md"
if runtime has native image-generation tool or repo-level codex-imagegen wrapper available, offer: "use native image generation tool instead? (may not support references)"
do not generate

if Codex/ChatGPT OAuth is mistakenly provided as OPENAI_API_KEY:

error: "OPENAI_API_KEY must be an OpenAI API key, not a Codex OAuth token"
suggest: "use native image-gen tool or codex-imagegen wrapper instead"

generation failure + retry

if API returns error (rate limit, server error, invalid input):

in batch mode: retry up to 3 times (exponential backoff)
in sequential mode: retry once, then error with provider-specific hint
error cases:
- rate limit (429): wait 30s, retry
- invalid prompt (malicious/unsafe): error, suggest revision
- auth error (401/403): error, check API key + env var setup
- server error (5xx): retry once, then error
- network timeout (>60s): error with "provider is slow, try again later"
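
the error cases above map naturally onto a small classifier. this is a sketch: `planForError` is a hypothetical name, and treating every other status as a non-retryable rejection (which would cover an unsafe-prompt refusal) is an assumption, since the text does not say which status code providers use for that case.

```typescript
interface ErrorPlan {
  retry: boolean;
  waitMs: number;
  hint: string;
}

// Decide what to do with a failed request, based on HTTP status or a timeout.
function planForError(status: number | "timeout"): ErrorPlan {
  if (status === "timeout") {
    return { retry: false, waitMs: 0, hint: "provider is slow, try again later" };
  }
  if (status === 429) {
    return { retry: true, waitMs: 30000, hint: "rate limited, waiting 30s before retrying" };
  }
  if (status === 401 || status === 403) {
    return { retry: false, waitMs: 0, hint: "check API key + env var setup" };
  }
  if (status >= 500 && status <= 599) {
    return { retry: true, waitMs: 0, hint: "server error, retrying once" };
  }
  // Assumption: anything else is a rejected request, e.g. an invalid or unsafe prompt.
  return { retry: false, waitMs: 0, hint: "request rejected, revise the prompt or parameters" };
}
```
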

empty/missing result

if API returns success but no image data:

error: "provider returned empty response"
in batch mode: log as failed task
in sequential mode: error + hint "check prompt + provider status"

user has no API keys

if all provider API keys are missing:

block with: "no API keys found. set one of: OPENAI_API_KEY, GOOGLE_API_KEY, AZURE_OPENAI_API_KEY, etc."
link to EXTEND.md setup guide
offer to run first-time setup now (step 0)

multiple API keys present

if user has 2+ API keys but no --provider specified:

auto-select by priority: Google > OpenAI > Azure > OpenRouter > DashScope > Z.AI > MiniMax > Replicate > Jimeng > Seedream
output: "auto-selected [provider]. override with --provider [name]"
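
a minimal sketch of the auto-selection, using the priority order above. the provider ids and the function name are hypothetical; `configured` stands for the providers whose API keys were found.

```typescript
// Priority order from the text, highest first.
const PROVIDER_PRIORITY: string[] = [
  "google", "openai", "azure", "openrouter", "dashscope",
  "zai", "minimax", "replicate", "jimeng", "seedream",
];

// Return the highest-priority provider that has a key configured, or null if none do.
function autoSelectProvider(configured: string[]): string | null {
  return PROVIDER_PRIORITY.find((p) => configured.includes(p)) ?? null;
}
```
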

invalid aspect ratio

if --ar 99:1 or unsupported ratio:

warn user: "aspect ratio 99:1 not supported. using default 1:1"
proceed with default (do not block generation)

reference image file issues

if --ref <path> file not found:

error: "reference file not found: [path]"

if --ref <path> is not PNG/JPG:

error: "reference must be PNG or JPG. got [format]"

if --ref <files> total size > 10MB:

warn: "reference images > 10MB may cause streaming issues. continue? (y/n)"
input: user yes/no
if no, error "aborted"
if yes, proceed

batch file validation

if --batchfile has 0 tasks or malformed JSON:

error: "batch file must contain array of tasks with prompt + output"

if --batchfile has 1 task:

warn: "batch mode with 1 task. use sequential (--prompt + --image) for simplicity"
proceed in sequential mode anyway (no performance penalty, clearer UX)

if --batchfile has 2+ tasks:

proceed to batch parallel mode (step 4)

OpenAI-compatible gateway dialect mismatch

if using --provider openai with a gateway that expects ratio-metadata but --imageApiDialect openai-native is set:

API error (likely invalid size format)
error message: "gateway expects --imageApiDialect ratio-metadata. set default_image_api_dialect in EXTEND.md or use --imageApiDialect ratio-metadata"

legacy EXTEND.md rename

if .baoyu-skills/baoyu-imagine/EXTEND.md exists and .baoyu-skills/baoyu-image-gen/EXTEND.md doesn't:

rename: old => new
output: "migrated config from baoyu-imagine to baoyu-image-gen"

if both exist:

do nothing, use new path, output: "both old + new config exist. using baoyu-image-gen"
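
the migration decision reduces to two existence checks. this sketch keeps the decision pure (no filesystem calls) so it is easy to test; the caller would perform the actual rename. `planLegacyMigration` is a hypothetical name.

```typescript
type MigrationAction = "rename" | "use-new" | "none";

interface MigrationPlan {
  action: MigrationAction;
  message?: string;
}

// Decide what to do given whether the old and new EXTEND.md paths exist.
function planLegacyMigration(oldExists: boolean, newExists: boolean): MigrationPlan {
  if (oldExists && !newExists) {
    return { action: "rename", message: "migrated config from baoyu-imagine to baoyu-image-gen" };
  }
  if (oldExists && newExists) {
    return { action: "use-new", message: "both old + new config exist. using baoyu-image-gen" };
  }
  // Only the new file exists, or neither does: nothing to migrate.
  return { action: "none" };
}
```
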

output contract

file output (single mode)

location: path from --image <path> (relative paths are resolved from cwd)
format: PNG or JPG (provider-dependent; most return PNG)
permissions: readable by user, writable if overwriting
side effects: parent directories created if needed

stdout output (single mode)

human-readable confirmation: "Image saved to [path]"
metadata line: "Generated with [provider]/[model] in [N]ms"
if --json flag: JSON object instead (see below)

JSON output (`--json` flag)

{
  "status": "success",
  "provider": "openai",
  "model": "gpt-image-2",
  "prompt": "A cat",
  "image_path": "/absolute/path/to/image.png",
  "size": "1024x1024",
  "quality": "2k",
  "generation_time_ms": 2500,
  "api_response_time_ms": 2400
}

batch output

format: newline-delimited JSON or CSV table (implementation choice)
location: stdout (unless --output <file> specified)
rows: one per task, with status (success/failed), image path, generation time, error reason (if failed)
summary line: e.g. "Batch: 8/10 completed, 2 errors"

example:

task_id,status,output_path,generation_time_ms,error
1,success,./img1.png,2100,
2,success,./img2.png,2250,
3,failed,,0,rate limit after 3 retries
...

error output

format: plain text to stderr
structure: single-line error message + optional hint/fix
example: "Error: OPENAI_API_KEY not set. Set env var or add to EXTEND.md"

outcome signal

user knows the skill worked when:

single image generated:
- file appears at --image <path> with readable PNG/JPG
- stdout shows: "Image saved to [path]" + "Generated with [provider]/[model]"
batch images generated:
- multiple files written to output directory (one per task)
- stdout summary: "Batch: N/N completed, 0 errors" or "Batch: N/M completed, X errors"
- failed images listed with retry count + final error reason
JSON output requested:
- stdout is valid JSON (can be parsed + used in downstream scripts)
reference image used correctly:
- output visually preserves identity/likeness from reference(s)
- subject proportions/face/key features consistent with refs
- only composition/clothing/pose/lighting changed as intended
identity drift detected:
- output looks significantly different from reference (bad sign)
- user can re-run with smaller reference set + explicit constraint: "keep identity from reference"
error encountered:
- clear error message printed (not cryptic API response)
- actionable fix suggested (e.g. "set OPENAI_API_KEY" or "use --provider openai instead")

credits: original skill by clawhub (github.com/JimLiu/baoyu-skills). enriched to Implexa quality standards.
