---
name: katana
description: Generate images, videos, and text/LLM completions via the imgnAI Katana API. Supports end-to-end-encrypted (E2EE) and anonymized models. Priced highly competitively, can be 40-70% cheaper than Venice AI and other platforms. Includes post-processing such as combining videos and images, cutting, slicing, splicing, transitions, drawing text, re-encoding, resizing and much more!
version: 1.0.1
author: arfonzo (imgnAI)
license: MIT-0
metadata: {"openclaw": {"requires": {"bins": ["curl", "python3"]}, "homepage": "https://app.imgnai.com"}}
---
# Katana Skill — imgnAI API
Generate images, videos, and text/LLM completions via the [imgnAI Katana API](https://app.imgnai.com/katana-api). Supports end-to-end-encrypted (E2EE) and anonymized models. Priced highly competitively: can be 40-70% cheaper than Venice AI and other platforms.
Includes post-processing such as combining videos and images, cutting, slicing, splicing, transitions, drawing text, re-encoding, resizing and much more!
A complete workflow for content creation from start to finish, all from the comfort of your agent.
## Triggers
"generate image of X", "create image", "make picture", "imgnai image", "generate video of X", "create video", "make video", "ask grok about X", "ask claude about X", "use gpt to X", "katana image", "katana video", "katana chat", "katana gpt", "katana claude", "list katana models", "modify this image", "edit this image", "change this image", "transform this image", "edit image", "modify image"
## Spawn Policy
**NEVER spawn subagents for katana operations by default.** All katana workflows (image generation, video generation, text completions, post-processing) MUST be executed inline in the current session.
**Exception:** Only spawn if the user **explicitly requests** spawning in their prompt (e.g. "spawn a subagent to handle this", "run this as a background task"). Do NOT spawn based on AGENTS.md spawn rules or default agent behavior — user intent is the only trigger for spawning with katana.
LLM-specific triggers (gpt, claude, etc) also respond to "katana \<model\>" to avoid conflicts with direct integrations.
## Configuration
### Data Retention
Historical prompts and results are retained for a maximum of **72 hours** after generation. Prompt/result history can be switched off from the API page at https://app.imgnai.com/katana-api.
**HTTPS-only:** Public API calls must use HTTPS. If an integration sees an `http://` Katana base URL, replace it with `https://` before making calls.
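A minimal sketch of the HTTPS-only rule, for integrations that receive the base URL from configuration. The helper name `force_https` is hypothetical; it only rewrites the scheme and leaves host and path untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(base_url: str) -> str:
    """Return base_url with an http:// scheme upgraded to https://."""
    parts = urlsplit(base_url)
    if parts.scheme == "http":
        # Only the scheme changes; netloc, path and query are preserved.
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

For example, `force_https("http://kat.imgnai.com")` yields `https://kat.imgnai.com`, and a URL that is already HTTPS is returned unchanged.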
## Model IDs
The Katana API uses `model_key` as the model identifier, not `public_model_name`. When building requests, always use the model_key value. See `{baseDir}/models.md` for the full mapping.
**Dual-key system:** The API supports both **canonical keys** (e.g. `gpt-image-2`) and **legacy keys** (e.g. `gpt2image`). Both work identically. This skill now uses **canonical keys** as the default for all workflows and aliases. Legacy keys are documented in the "Model ID" column of `models.md` for backward-compatibility reference. You may use either format when constructing API requests.
## Model Discovery
**Endpoint:** `GET /v1/models`
**Auth:** `Authorization: Bearer ${KATANA_API_KEY}:${KATANA_API_SECRET}`
Returns available models. Text models are returned for authenticated requests.
For the complete model catalogue including image/video, see models.md.
**Usage:** Generally not needed before requests — use models.md as reference.
---
## Payment Methods
The API supports two payment methods:
- **API key + secret** (Bearer auth) — used by this skill, preferred
- **x402 micropayment** — NOT used by this skill
Note: x402 text requests must be non-streaming. This skill only uses API-key auth.
---
- **API Base URL:** `https://kat.imgnai.com`
- **API Reference:** https://kat.imgnai.com/llms.txt
- **Model catalogue:** `{baseDir}/models.md`
- **Skill directory:** Resolve dynamically from this file's location as `{baseDir}`. Most agent frameworks resolve this automatically.
## Credentials
**Secrets file:** Store your API key and secret in a file (default: `~/.openclaw/secrets/katana.env`):
```
KATANA_API_KEY=your_key_here
KATANA_API_SECRET=your_secret_here
```
Create with `chmod 600`. Get your credentials from https://app.imgnai.com/katana-api.
**Loading:** All curl examples in this skill use `.` (dot) source to load credentials into the shell environment:
```bash
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}"
```
Override the default path with the `KATANA_SECRETS_FILE` environment variable.
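For Python-based integrations that cannot source the file in a shell, the same lookup can be sketched as follows. The function names are hypothetical, and the parser assumes plain `KEY=VALUE` lines as in the example file above; nothing here prints a value.

```python
import os
from pathlib import Path


def secrets_path() -> Path:
    """Mirror ${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}."""
    default = Path.home() / ".openclaw" / "secrets" / "katana.env"
    # `or` also covers a variable that is set but empty, like the shell's `:-`.
    return Path(os.environ.get("KATANA_SECRETS_FILE") or default)


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Typical use would be `parse_env_file(secrets_path().read_text())`, keeping the resulting dictionary in memory only and never echoing it to tool output.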
### ⚠️ Credential Security (MANDATORY)
**NEVER display secrets in tool output.** The `.` source command loads credentials into shell variables silently — no output is produced. This is the correct and secure approach.
**Banned patterns:**
- `cat ~/.openclaw/secrets/katana.env`
- `KATANA_API_KEY=kat_live_... curl ...`
- Any form of reading secrets into tool output
**If credential loading fails:** Fix the secrets file path or contents. Do NOT bypass security by hardcoding values.
---
## Optional Dependencies
These are not required for core API usage but enable additional features:
| Binary | Needed for | Install |
|--------|------------|--------|
| `jq` | JSON parsing for API responses | `apt install jq` / `brew install jq` |
| `python3` | Payload building, JSON parsing fallback | Pre-installed on most systems |
| `ffmpeg` | Video post-processing (trim, join, effects) | `apt install ffmpeg` / `brew install ffmpeg` |
`jq` or `python3` is needed for JSON parsing. Post-processing requires `ffmpeg`.
---
## ⚠️ MANDATORY ROUTING — DO NOT SKIP
**Before ANY generation or post-processing request, you MUST load the correct workflow file:**
| Task | Load this file |
|------|---------------|
| Image generation | `{baseDir}/workflows/image.md` |
| Video generation | `{baseDir}/workflows/video.md` |
| Text/LLM generation | `{baseDir}/workflows/text.md` |
| Post-processing (ffmpeg, combine, text overlay, etc) | `{baseDir}/workflows/post-process.md` |
**NEVER attempt a generation without loading the workflow file first.**
**NEVER guess parameters — the workflow file has the exact steps.**
---
## Cost Reporting (ALL Requests)
**After every generation (text, image, video), send a separate follow-up message with a cost summary.** Include all relevant details from the response:
```
📊 Katana Summary
Model: gemma-4-26b-a4b (Anonymized)
Request: bf11cf04-8747-480e-a7f7-7d6cb092c614
Tokens: 42 in / 176 out (text only)
Cost: 0.1 credits (~$0.001)
Privacy: Anonymized
Time: ~3s
```
For image/video, replace tokens with dimensions/duration as relevant. Always compute cost in USD using the current credit rate (see `{baseDir}/models.md`).
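The summary format above can be sketched as a small formatter. This is an illustration only: the function name and parameters are hypothetical, and `usd_per_credit` must come from `{baseDir}/models.md` (the `0.01` used in the usage note is merely what the example figures above imply).

```python
def format_cost_summary(model, request_id, credits, usd_per_credit,
                        privacy, seconds, tokens_in=None, tokens_out=None):
    """Build the follow-up cost summary as a multi-line string."""
    usd = credits * usd_per_credit
    lines = [
        "📊 Katana Summary",
        f"Model: {model} ({privacy})",
        f"Request: {request_id}",
    ]
    # Token counts only apply to text completions.
    if tokens_in is not None and tokens_out is not None:
        lines.append(f"Tokens: {tokens_in} in / {tokens_out} out (text only)")
    lines += [
        f"Cost: {credits:g} credits (~${usd:.3f})",
        f"Privacy: {privacy}",
        f"Time: ~{seconds:.0f}s",
    ]
    return "\n".join(lines)
```

Calling it with `credits=0.1` and `usd_per_credit=0.01` reproduces the `Cost: 0.1 credits (~$0.001)` line shown above. For image/video, omit the token arguments and append dimensions or duration as relevant.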
---
## Model Aliases (Quick Reference)
### Text/LLM
| User says | API model ID |
|---|---|
| grok | `grok-4-3` |
| gpt / gpt-5 | `gpt-5-5` |
| claude / claude-opus | `claude-opus-4-7` |
| claude-sonnet | `claude-sonnet-4-6` |
| claude-haiku | `claude-haiku-4-5` |
### Image
| User says | API model ID |
|---|---|
| default / imgnai | `gen` |
| anime | `ani` |
| gpt-image | `gpt-image-2` |
| nano | `nano-banana-2` |
| flux | `flux-2-pro` |
| pink | `pink-image` |
### Video
| User says | API model ID |
|---|---|
| default / seedance | `seedance-2-0-fast` |
| seedance-hd | `seedance-2-0` |
| ltx | `ltx-2-3` |
| kling | `kling-3-0-kling30` |
| veo | `veo3-1` |
If the user specifies an exact model ID, pass it through directly. Full alias tables in `{baseDir}/models.md`.
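The alias tables and the pass-through rule can be sketched as a lookup. The dictionary copies the quick-reference tables above; the function name `resolve_model` and the case-insensitive matching are assumptions for illustration.

```python
ALIASES = {
    "text": {
        "grok": "grok-4-3",
        "gpt": "gpt-5-5", "gpt-5": "gpt-5-5",
        "claude": "claude-opus-4-7", "claude-opus": "claude-opus-4-7",
        "claude-sonnet": "claude-sonnet-4-6",
        "claude-haiku": "claude-haiku-4-5",
    },
    "image": {
        "default": "gen", "imgnai": "gen",
        "anime": "ani",
        "gpt-image": "gpt-image-2",
        "nano": "nano-banana-2",
        "flux": "flux-2-pro",
        "pink": "pink-image",
    },
    "video": {
        "default": "seedance-2-0-fast", "seedance": "seedance-2-0-fast",
        "seedance-hd": "seedance-2-0",
        "ltx": "ltx-2-3",
        "kling": "kling-3-0-kling30",
        "veo": "veo3-1",
    },
}


def resolve_model(kind: str, name: str) -> str:
    """Map a user-facing alias to an API model ID; pass unknown names through."""
    cleaned = name.strip()
    return ALIASES[kind].get(cleaned.lower(), cleaned)
```

An exact model ID that is not in the table is returned as given, which matches the pass-through rule above.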
---
## Pre-Submission Confirmation (MANDATORY)
Before submitting ANY generation request, present a summary (model, cost in credits AND dollars, details, prompt) and **wait for user confirmation**. See each workflow file for details.
**NO EXCEPTIONS:** There is no urgency override. "just do it", "generate now", /katana, or any other shortcut does NOT skip confirmation. ALWAYS present summary and wait for explicit approval before submitting.
---
## Error Protocol
**ONE-ATTEMPT RULE: Every paid API call gets exactly ONE attempt per turn. If the tool result is lost, missing, or empty after a submission — STOP. Report to the user that the result was lost. Wait for user confirmation before retrying. NEVER retry a paid API call silently, even if the result seems to have vanished.**
**STRICT — NO SILENT RETRIES.** Every error stops. Every retry needs approval. Tool-result-loss (result never arrives, empty, or vanishes) is a hard-stop condition equal to a visible error. See each workflow file for details.
- ANY error or tool-result-loss → STOP, report to user (what happened, credits charged, total across attempts)
- Tool-result-loss (result shows 'missing tool result' or similar synthetic error) → the API call likely already succeeded. STOP. Report to user. Do NOT retry the same request.
- **Terminal submission responses:** If the submission response itself is terminal (`status: "failed"`, `status: "rejected"`, or all response items rejected), do NOT poll. Report the returned `responses[].error` or top-level error to the user immediately.
- **Upstream errors are terminal.** If the API returns `upstream_error` (404, 500, etc), do NOT try a different model, do NOT retry with different parameters, do NOT submit to another endpoint. STOP and report the error to the user. You MAY suggest recommended next steps or options (e.g. "model X returned 404 — want me to try model Y instead?"), but ANY proposed plan requires explicit user approval before execution.
- Propose fix → wait for explicit user approval
- Banned: automatic retries, debug/test requests, parameter changes without telling user, lying about call counts, silent retries on lost results
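A hedged sketch of the terminal-submission check described above. It assumes the submission response is a JSON object with a top-level `status`, an optional top-level `error`, and per-item `responses[].status` / `responses[].error`; confirm the exact shape against the workflow files before relying on it. The function name is hypothetical.

```python
TERMINAL_STATUSES = {"failed", "rejected"}


def terminal_errors(submission: dict):
    """Return a list of errors if the submission is terminal, else None."""
    items = submission.get("responses") or []
    top_level_terminal = submission.get("status") in TERMINAL_STATUSES
    all_rejected = bool(items) and all(
        item.get("status") == "rejected" for item in items
    )
    if not (top_level_terminal or all_rejected):
        return None  # not terminal: polling may proceed
    # Prefer per-item errors, fall back to the top-level error.
    errors = [item["error"] for item in items if item.get("error")]
    return errors or [submission.get("error", "unknown error")]
```

A non-`None` result means: do not poll, report the errors to the user, and wait for explicit approval before any further call.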
---
## Concurrency Guard
**NEVER submit a new request while any previous request is still processing.** One request in flight at a time — no exceptions.
- Before submitting, verify no pending/processing requests exist
- If a previous request is still running (poll returns incomplete), either wait for it, ask the user to cancel, or ask the user to approve submitting a concurrent request
- This applies across ALL endpoints: text, image, and video
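One way an agent might track the one-in-flight rule is a small in-memory guard. This is a sketch under the assumption that the agent keeps state for the session; the class and method names are hypothetical and not part of the Katana API.

```python
class InFlightGuard:
    """Tracks request IDs that have been submitted but not yet completed."""

    def __init__(self):
        self._pending = set()

    def can_submit(self, user_approved_concurrent: bool = False) -> bool:
        # Submitting is allowed when nothing is pending, or when the user
        # has explicitly approved a concurrent request.
        return not self._pending or user_approved_concurrent

    def track(self, request_id: str) -> None:
        self._pending.add(request_id)

    def complete(self, request_id: str) -> None:
        self._pending.discard(request_id)
```

Check `can_submit()` before every text, image, or video submission, call `track()` with the returned `request_id`, and call `complete()` once the poll reports a finished or failed request.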
---
## Immediate Status Updates
After submitting async generations (image/video), deliver a confirmation to the user BEFORE starting the poll loop. Include the model, cost, and request_id.
## Async Polling
Image and video generations are asynchronous. After submitting, poll manually.
**Poll command:**
```bash
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}" && _H=$(mktemp) && chmod 600 "$_H" && printf 'X-API-Key: %s\nX-API-Secret: %s\n' "$KATANA_API_KEY" "$KATANA_API_SECRET" > "$_H" && curl -s "https://kat.imgnai.com/v1/generation-requests/${REQUEST_ID}" -H @"$_H" && rm -f "$_H"
```
**Raw response:** Pipe to `jq '.'`.
**Formatted:** Pipe to:
```bash
python3 -c "import sys,json; d=json.load(sys.stdin); r=d.get('responses',[]); [print(f\"Status: {r[i].get('status','?')}\\nURL: {a.get('original_data_url','')}\\nDims: {a.get('width','?')}x{a.get('height','?')}\\nCredits: {r[i].get('metadata',{}).get('credits_spent','?')}\\nExpires: {a.get('expires_at','')}\") for i in range(len(r)) for a in r[i].get('output_assets',[])]"
```
**`wait` parameter:** `wait=true` is available for convenience (blocks until complete), but production integrations should prefer polling with `wait=false` (the default).
**Polling pattern:** Extract `poll_after_seconds` from the submission response and use it as the initial polling interval. If a poll response includes a new `poll_after_seconds`, use that for the next interval. If `poll_after_seconds` is absent or null, fall back to polling every 30 seconds for the first 5 minutes, then every 60 seconds.
**Agent responsibility:** The agent decides how to schedule polls (intervals, background tasks, etc). Do not use long-running background processes — use single polls at intervals.
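The interval rule can be sketched as a pure function that the agent calls before scheduling each single poll. The name `next_poll_delay` and the `elapsed_seconds` argument (time since submission) are assumptions for illustration.

```python
def next_poll_delay(poll_after_seconds, elapsed_seconds: float) -> float:
    """Return how many seconds to wait before the next poll."""
    usable = (
        isinstance(poll_after_seconds, (int, float))
        and not isinstance(poll_after_seconds, bool)
        and poll_after_seconds > 0
    )
    if usable:
        # The server-provided hint always wins when present.
        return float(poll_after_seconds)
    # Fallback: every 30 s for the first 5 minutes, then every 60 s.
    return 30.0 if elapsed_seconds < 300 else 60.0
```

The function only computes the delay; scheduling stays with the agent, which should issue one poll per interval rather than a long-running loop.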
### ⚠️ Polling Pattern Constraints
**Keep `.` source and `curl` in the same command chain.** Variables loaded by `.` exist only in the shell that sourced them, so running a shell `sleep` or `process poll` as a separate step between the source and the `curl` loses the credentials.
**Correct:** Single exec call containing the full chain (see poll command above).
**Wrong:** Separating `.` + `sleep` + `curl` into different exec calls.
**If your agent cannot chain commands:** Use the agent-native polling mechanism (background exec, process poll, etc) with the full command as one unit.
**Response handling for completed polls:**
- Extract `original_data_url` for delivery (full-resolution)
- Extract dimensions from `responses[].output_assets[].width/height` (NOT from submission response)
- Extract credits from `responses[].metadata.credits_spent`
- Extract expiry from `responses[].output_assets[].expires_at` — display in user's local timezone in delivery summary
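As a more readable alternative to the one-line formatter above, the extraction steps can be sketched like this. Field names are the ones listed above; the function name and the output dictionary keys are hypothetical.

```python
def extract_deliverables(poll: dict) -> list[dict]:
    """Flatten a completed poll response into one record per output asset."""
    records = []
    for item in poll.get("responses", []):
        credits = item.get("metadata", {}).get("credits_spent")
        for asset in item.get("output_assets", []):
            records.append({
                "status": item.get("status"),
                # Full-resolution URL, the one to deliver.
                "url": asset.get("original_data_url"),
                # Actual output dimensions, not the submission preview.
                "width": asset.get("width"),
                "height": asset.get("height"),
                "credits_spent": credits,
                "expires_at": asset.get("expires_at"),
            })
    return records
```

Missing fields come back as `None` rather than raising, so the caller can decide how to report an incomplete response.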
## Response Handling
### Dimensions (IMPORTANT)
1. **Submission response** (`requests[].width/height`) — PREVIEW dimensions, NOT actual output size.
2. **Completed poll response** (`responses[].output_assets[].width/height`) — ACTUAL output dimensions.
**Always report dimensions from the completed poll response, never from the submission acknowledgement.**
### URL Fields (IMPORTANT)
- **`original_data_url`** — full-resolution original. **Always use this for delivery.**
- **`url`** — may be a compressed/reduced version. Do NOT use for delivery.
- **`thumbnail_image_url`** — small thumbnail only.
### CLIP Tag Metadata
`responses[].output_assets[].metadata.tags` contains CLIP-derived tags with confidence scores (e.g. `{"tag": "ceramic_mug", "confidence": 0.94}`). Only available on in-house imgnAI models — external/provider-hosted models return no CLIP-tag metadata.
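A short sketch of reading these tags, assuming each entry has the `tag` and `confidence` keys shown in the example. The function name and the `0.8` default threshold are arbitrary choices for illustration.

```python
def confident_tags(asset: dict, min_confidence: float = 0.8) -> list[str]:
    """Return tag names at or above min_confidence, highest confidence first."""
    tags = (asset.get("metadata") or {}).get("tags") or []
    kept = [t for t in tags if t.get("confidence", 0) >= min_confidence]
    kept.sort(key=lambda t: t["confidence"], reverse=True)
    return [t["tag"] for t in kept]
```

For external/provider-hosted models, where no tag metadata is returned, the function simply yields an empty list.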
### Model Normalization
Completed media responses may normalize `requests[].model` and `responses[].metadata.model` (e.g. legacy key → canonical key). Use `GET /v1/models` for canonical display names.
### Item Timestamps
- `responses[].started_at` — item-level processing start timestamp
- `responses[].completed_at` — item-level processing end timestamp
- `created_at` — top-level request submission timestamp
- `updated_at` — top-level request last-modified timestamp
- Useful for tracking actual generation time per item
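Per-item generation time can be derived from the two item-level timestamps. This sketch assumes they are ISO 8601 strings (for example with a trailing `Z`), which this document does not state explicitly; the function names are hypothetical.

```python
from datetime import datetime


def _parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def generation_seconds(item: dict):
    """Seconds between started_at and completed_at, or None if either is missing."""
    started, completed = item.get("started_at"), item.get("completed_at")
    if not started or not completed:
        return None
    return (_parse_timestamp(completed) - _parse_timestamp(started)).total_seconds()
```

The result can feed the `Time:` line of the cost summary.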
### Asset Type Fields
- `responses[].output_assets[].kind` — asset type (e.g. `"image"`, `"video"`)
- `responses[].output_assets[].mime_type` — MIME type (e.g. `"image/png"`, `"video/mp4"`)
### ⚠️ Anti-Pattern Warning
Data is under `responses[].output_assets[]` — do NOT look for `results[].url`. That is NOT the Katana response shape.
### ⚠️ `output` Object
Do NOT send an `output` object for ordinary integrations. This is for internal/special use only.
## Payload Submission
Build the JSON payload in a temp file (required for large payloads and to avoid secrets in process listings):
```python
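import json, tempfile

# Example video request; replace <prompt> with the user's prompt.
payload = {"requests": [{"type": "video", "model": "seedance-2-0-fast", "prompt": "<prompt>", "duration_seconds": 5, "aspect_ratio": "16:9"}]}
# delete=False keeps the file on disk so curl can read it with -d @"$tmpfile".
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
    json.dump(payload, f)
    tmpfile = f.name
print(tmpfile)  # capture this path in the shell as $tmpfile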
```
### Secure Header Pattern
Write auth headers to a temp file to keep secrets out of `/proc/*/cmdline`. Source credentials at the start of each command chain.
**Image/Video requests** (X-API-Key + X-API-Secret):
```bash
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}" && _H=$(mktemp) && chmod 600 "$_H" && printf 'Content-Type: application/json\nX-API-Key: %s\nX-API-Secret: %s\n' "$KATANA_API_KEY" "$KATANA_API_SECRET" > "$_H" && curl -s -X POST "https://kat.imgnai.com/v1/generation-requests?wait=false" -H @"$_H" -d @"$tmpfile" && rm -f "$_H"
```
**Text/LLM requests** (Bearer auth):
```bash
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}" && _H=$(mktemp) && chmod 600 "$_H" && printf 'Content-Type: application/json\nAuthorization: Bearer %s:%s\n' "$KATANA_API_KEY" "$KATANA_API_SECRET" > "$_H" && curl -s -X POST "https://kat.imgnai.com/v1/chat/completions" -H @"$_H" -d @"$tmpfile" && rm -f "$_H"
```
Parse the JSON response. Extract `request_id`. Deliver confirmation to the user (model, cost, request_id).
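For integrations that build requests in Python instead of curl, the two header schemes above can be sketched as one helper. The function name is hypothetical; the header names and the `key:secret` Bearer format are taken from the commands above.

```python
def auth_headers(kind: str, api_key: str, api_secret: str) -> dict[str, str]:
    """Return request headers for a text, image, or video call."""
    headers = {"Content-Type": "application/json"}
    if kind == "text":
        # Text/LLM endpoints use Bearer auth with key and secret joined by ":".
        headers["Authorization"] = f"Bearer {api_key}:{api_secret}"
    elif kind in ("image", "video"):
        headers["X-API-Key"] = api_key
        headers["X-API-Secret"] = api_secret
    else:
        raise ValueError(f"unknown request kind: {kind!r}")
    return headers
```

As with the shell pattern, keep the returned dictionary out of logs and tool output.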
---
## Credit Balance
```bash
. "${KATANA_SECRETS_FILE:-$HOME/.openclaw/secrets/katana.env}" && _H=$(mktemp) && chmod 600 "$_H" && printf 'Authorization: Bearer %s:%s\n' "$KATANA_API_KEY" "$KATANA_API_SECRET" > "$_H" && curl -s "https://kat.imgnai.com/v1/me/balance" -H @"$_H" && rm -f "$_H"
```
Calls `GET /v1/me/balance`. The API returns `credits` as a decimal string. Convert it to USD using the current credit rate (see `{baseDir}/models.md`).
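Because `credits` arrives as a decimal string, the conversion is best done without floats. A minimal sketch, assuming `credits` is a top-level field of the JSON body and that the rate is passed in from `models.md`; the function name is hypothetical.

```python
import json
from decimal import Decimal, ROUND_HALF_UP


def balance_in_usd(response_body: str, usd_per_credit: str) -> Decimal:
    """Convert the balance response to USD, rounded to whole cents."""
    credits = Decimal(json.loads(response_body)["credits"])
    usd = credits * Decimal(usd_per_credit)
    return usd.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Passing the rate as a string keeps the arithmetic exact; a float rate would reintroduce binary rounding error.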
---
## reference_assets (Typed Asset System)
`reference_assets` is an alternative to `image_urls`/`video_image_data` for providing media inputs with explicit role labels. Each asset has a `kind` and either `url` or `base64_data`.
### Image models
Accepted image-like asset kinds:
- `source_image` — primary source/input image
- `image` — generic image input
- `mask` — mask for inpainting/editing
- `style_reference` — style transfer reference
- `start_frame` — starting frame for animation
Example:
```json
{
  "reference_assets": [
    {"kind": "source_image", "url": "https://example.com/product.png"},
    {"kind": "style_reference", "base64_data": "data:image/jpeg;base64,..."}
  ]
}
```
### Video models
Image kinds for video:
- `style_reference`, `reference_image`, `image` — map to video reference images
Audio kinds for video:
- `audio`, `source_audio`, `reference_audio`, `audio_reference` — map to audio reference inputs
Example:
```json
{
  "reference_assets": [
    {"kind": "reference_image", "url": "https://example.com/person.png"},
    {"kind": "audio", "url": "https://example.com/voice.mp3"}
  ]
}
```
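A sketch of building `reference_assets` entries with a client-side sanity check. It treats the kind lists above as exhaustive and requires exactly one of `url` or `base64_data`; both are assumptions, and the API remains the authority on what it accepts. The function name is hypothetical.

```python
IMAGE_MODEL_KINDS = {"source_image", "image", "mask", "style_reference", "start_frame"}
VIDEO_MODEL_KINDS = {
    "style_reference", "reference_image", "image",                   # image inputs
    "audio", "source_audio", "reference_audio", "audio_reference",   # audio inputs
}


def reference_asset(kind, model_type, url=None, base64_data=None) -> dict:
    """Build one reference_assets entry for an image or video model."""
    allowed = {"image": IMAGE_MODEL_KINDS, "video": VIDEO_MODEL_KINDS}.get(model_type)
    if allowed is None:
        raise ValueError(f"unknown model type: {model_type!r}")
    if kind not in allowed:
        raise ValueError(f"kind {kind!r} is not listed for {model_type} models")
    if (url is None) == (base64_data is None):
        raise ValueError("provide exactly one of url or base64_data")
    if url is not None:
        return {"kind": kind, "url": url}
    return {"kind": kind, "base64_data": base64_data}
```

A list of such entries goes under the `reference_assets` key of the request, as in the JSON examples above.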
---
## llms.txt Freshness
This skill was built from the Katana API llms.txt reference document.
**Last synced:** 2026-05-23
**llms.txt URL:** https://kat.imgnai.com/llms.txt
**Stored checksum:** `9d149806987cf662f44c9a901eea98068359fb1529e757a56d6f593d7e56e33c`
### Pre-generation check
Before submitting ANY generation request, check if the llms.txt checksum has been verified in the last 24 hours. If stale:
1. Fetch: `curl -s https://kat.imgnai.com/llms.txt`
2. Compute SHA256: `sha256sum` (Linux) or `shasum -a 256` (macOS)
3. Compare to stored checksum
4. If CHANGED → tell the user: "The Katana API model list has been updated since this skill was last synced. This may include new models, pricing changes, or removed models. Would you like me to check for changes and update the skill?"
5. If user says YES → parse new llms.txt, update models.md, update checksum and date
6. If user says NO → proceed with current models
7. Update last-checked date regardless
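Steps 2 and 3, plus the 24-hour staleness test, can be sketched in Python with the standard library. The function names are hypothetical; fetching the document and prompting the user stay with the agent.

```python
import hashlib
from datetime import datetime, timedelta


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, equivalent to sha256sum / shasum -a 256 output."""
    return hashlib.sha256(data).hexdigest()


def llms_txt_changed(body: bytes, stored_checksum: str) -> bool:
    """True if the fetched llms.txt no longer matches the stored checksum."""
    return sha256_hex(body) != stored_checksum.strip().lower()


def check_is_stale(last_checked: datetime, now: datetime,
                   max_age: timedelta = timedelta(hours=24)) -> bool:
    """True if the last verification is older than max_age."""
    return now - last_checked > max_age
```

Hash the raw response bytes rather than decoded text, so the digest matches what `sha256sum` computes on the downloaded file.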
### llms.txt update process
When llms.txt changes, compare old vs new **holistically**. Diff the full documents — do not limit the review to a predefined checklist. Document ALL changes found and update all affected skill files accordingly: `models.md`, `SKILL.md`, workflow files.
**Explicit approval rule:** During the llms.txt update process, always summarise ALL changes found and ask the user for explicit permission before updating any skill files (`models.md`, `SKILL.md`, workflow files). **DO NOT auto-update without user confirmation.**
---
## Delivery Patterns
Deliver the generated media to the user via your agent's messaging/file capability. Include: model name, resolution/dimensions, credits, dollar cost, description, and the full-res URL (`original_data_url`).
### ⚠️ URL Display (MANDATORY)
ALL image and video deliveries MUST include the **full download URL** (`original_data_url`) as clickable text in the delivery message — not just the inline media attachment.
Users need the URL to:
- Download the full-resolution file
- Share it externally
- Archive it before expiry
**Include ALL URLs returned** — `original_data_url`, `thumbnail_image_url`, `final_frame_image_url`, `thumbnail_silent_video_mp4_url` — any URL the API returns for the asset. Do not assume the user only wants one.
Example:
```
MEDIA:https://k.imgnai.com/abc123.mp4
🔗 Full-res: https://k.imgnai.com/abc123.mp4
🖼️ Thumbnail: https://k.imgnai.com/def456.jpg
🎞️ Silent preview: https://k.imgnai.com/ghi789.mp4
⏰ Expires: Sat 16 May 2026, 14:00 BST
```
### ⚠️ Expiry Warning (MANDATORY)
ALL image and video generation summaries MUST include:
1. The **expiry timestamp** extracted from `responses[].output_assets[].expires_at` in the completed poll response — convert to user's local timezone for display
2. A clear warning that content must be downloaded before expiry if the user wishes to keep it
Example format:
```
⏰ Expires: Sat 16 May 2026, 14:00 BST (download before expiry if you need it long-term)
```
**Do NOT calculate expiry manually.** The API provides `expires_at` in the poll response. Use it directly. The 72h retention window may change server-side; `expires_at` is always authoritative.
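Displaying `expires_at` in the user's timezone can be sketched as below. It assumes `expires_at` is an ISO 8601 string with an offset or trailing `Z`, and that the caller supplies the user's timezone as a `tzinfo` object (for example from `zoneinfo.ZoneInfo`); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def format_expiry(expires_at: str, local_tz) -> str:
    """Render the API-provided expiry in the user's local timezone."""
    if expires_at.endswith("Z"):
        # Normalise the UTC suffix for datetime.fromisoformat on Python < 3.11.
        expires_at = expires_at[:-1] + "+00:00"
    moment = datetime.fromisoformat(expires_at).astimezone(local_tz)
    return "Expires: " + moment.strftime("%a %d %b %Y, %H:%M %Z")


# Usage: a fixed UTC+1 offset labelled "BST" stands in for the user's zone.
bst = timezone(timedelta(hours=1), "BST")
print(format_expiry("2026-05-16T13:00:00Z", bst))
```

The function only reformats the timestamp the API returned; it never derives an expiry from the submission time.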
For text/LLM: return the model's response verbatim. Then send a separate follow-up message with a cost summary per the "Cost Reporting" section above. Text completions do not require an expiry warning (no media URL to expire).
---
*Last updated: 2026-05-23*