---
name: media-gen
description: 'Generate images and videos with AIsa. Supports four image models (Google Gemini 3 Pro Image, Alibaba Wan 2.7 image + image-pro, ByteDance Seedream) and four Wan video variants (wan2.6/2.7 × t2v/i2v). One API key; the bundled client routes each model to the correct endpoint automatically. Use when: the user needs AI image or video generation workflows.'
author: AIsa
version: 1.0.0
license: MIT
homepage: https://aisa.one
source: https://github.com/baofeng-tech/agent-skills-io/tree/main/targetSkills/media-gen
user-invocable: true
primaryEnv: AISA_API_KEY
requires:
  bins:
    - python3
  env:
    - AISA_API_KEY
metadata:
  aisa:
    emoji: 🎬
    requires:
      bins:
        - python3
      env:
        - AISA_API_KEY
    primaryEnv: AISA_API_KEY
    compatibility:
      - openclaw
      - claude-code
      - hermes
  openclaw:
    emoji: 🎬
    requires:
      bins:
        - python3
      env:
        - AISA_API_KEY
    primaryEnv: AISA_API_KEY
---
# Media Gen 🎬
Generate images and videos with a single AIsa API key.
This skill covers the AIsa media-generation routes exposed across three image endpoints and one async video endpoint. The bundled client in `scripts/media_gen_client.py` picks the correct request shape for each supported model, including the schema differences between Wan video variants.
## Use when
- You want one neutral skill for AIsa image and video generation
- You need to switch between Gemini image, Wan image, Seedream, and Wan video models without rewriting requests
- You want a simple CLI for creating images, submitting async video jobs, polling task status, and downloading finished video output
## Compatibility
Works with any [agentskills.io](https://agentskills.io)-compatible harness, including:
- **Claude Code** and **Claude**
- **OpenAI Codex**
- **Cursor**
- **Gemini CLI**
- **OpenCode**, **Goose**, **OpenClaw**, **Hermes**
- and other tools that implement the [Agent Skills specification](https://agentskills.io/specification)
Requires Python 3, a POSIX shell, and `AISA_API_KEY` from [aisa.one](https://aisa.one).
## What you can do
### Image: Gemini (base64 inline)
```text
"Generate a cyberpunk-style city nightscape, neon lights, rainy night, cinematic feel"
```
### Image: Wan 2.7 (URL in chat response)
```text
"Generate an ultra-detailed product shot of a red panda, studio lighting, sharp focus"
```
### Image: Seedream (OpenAI-compatible, large format)
```text
"Generate a 2048×2048 magazine cover: neo-noir detective portrait, film grain"
```
### Video: text-to-video (Wan t2v)
```text
"Sweeping establishing shot of a neon cyberpunk skyline at dusk, 5 seconds"
```
### Video: image-to-video (Wan i2v)
```text
"Starting from this reference image, gentle camera push-in with parallax"
```
## Supported models
### Image generation: 4 models, 3 endpoints
| Model | Developer | Endpoint | Notes |
|---|---|---|---|
| `gemini-3-pro-image-preview` | Google | `POST /v1/models/{model}:generateContent` | Images returned as base64 in `candidates[].parts[].inline_data` |
| `wan2.7-image` | Alibaba | `POST /v1/chat/completions` | Images returned as URL parts in `choices[].message.content[]` (`type=image`) |
| `wan2.7-image-pro` | Alibaba | `POST /v1/chat/completions` | Higher fidelity |
| `seedream-4-5-251128` | ByteDance | `POST /v1/images/generations` | OpenAI-compatible; minimum 3,686,400 pixels |
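The routing in this table amounts to a lookup from model name to endpoint path. The sketch below illustrates that idea and is not the bundled client's code: `BASE_URL`, `IMAGE_ROUTES` and `image_endpoint` are hypothetical names, and only the base URL and paths are taken from this document.

```python
# Hypothetical sketch of model -> endpoint routing; names are illustrative.
BASE_URL = "https://api.aisa.one"

# Paths copied from the "Image generation" table.
IMAGE_ROUTES = {
    "gemini-3-pro-image-preview": "/v1/models/{model}:generateContent",
    "wan2.7-image": "/v1/chat/completions",
    "wan2.7-image-pro": "/v1/chat/completions",
    "seedream-4-5-251128": "/v1/images/generations",
}


def image_endpoint(model: str) -> str:
    """Return the full request URL for a supported image model."""
    try:
        path = IMAGE_ROUTES[model]
    except KeyError:
        raise ValueError(f"unsupported image model: {model}") from None
    # Only the Gemini path has a {model} placeholder; format() leaves the others unchanged.
    return BASE_URL + path.format(model=model)
```

Failing fast on an unknown model surfaces typos locally instead of producing a confusing upstream error.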
### Video generation: 4 Wan variants, 1 endpoint
| Model | Kind | Image field | Output resolution (SR) |
|---|---|---|---|
| `wan2.6-t2v` | text-to-video | *none* | 1080 |
| `wan2.6-i2v` | image-to-video | `input.img_url` (string) | 720 |
| `wan2.7-t2v` | text-to-video | *none* | 720 |
| `wan2.7-i2v` | image-to-video | **`input.media`** (array) | 720 |
> **Important:** `wan2.7-i2v` expects the reference image in `input.media` as an array of URLs, not in `input.img_url` as `wan2.6-i2v` does. The bundled client handles this automatically when you pass `--img-url`.
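The schema difference can be captured in a small helper that builds the `input` object for each video model. This is a sketch under the field names in the table, not the client's implementation; `VIDEO_MODELS` and `build_video_input` are hypothetical, and rejecting an image for t2v models is a choice made here rather than documented API behavior.

```python
from typing import Optional

VIDEO_MODELS = {"wan2.6-t2v", "wan2.6-i2v", "wan2.7-t2v", "wan2.7-i2v"}


def build_video_input(model: str, prompt: str, img_url: Optional[str] = None) -> dict:
    """Build the request's "input" object for one of the four Wan video models."""
    if model not in VIDEO_MODELS:
        raise ValueError(f"unsupported video model: {model}")
    payload = {"prompt": prompt}
    if model.endswith("-t2v"):
        # Text-to-video takes no reference image; this sketch rejects one.
        if img_url:
            raise ValueError(f"{model} does not take a reference image")
        return payload
    if not img_url:
        raise ValueError(f"{model} requires a reference image URL")
    if model == "wan2.6-i2v":
        payload["img_url"] = img_url   # wan2.6: a single string
    else:
        payload["media"] = [img_url]   # wan2.7: an array of URLs
    return payload
```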
## Quick start
```bash
export AISA_API_KEY="your-key"

# Any image model; the client routes to the right endpoint
python3 scripts/media_gen_client.py image \
  --model gemini-3-pro-image-preview \
  --prompt "A cute red panda, cinematic lighting" \
  --out gemini.png

python3 scripts/media_gen_client.py image \
  --model wan2.7-image-pro \
  --prompt "Ultra-detailed product shot of a red panda" \
  --out wan.png

python3 scripts/media_gen_client.py image \
  --model seedream-4-5-251128 \
  --prompt "Neo-noir detective portrait, film grain" \
  --size 2048x2048 \
  --out seedream.png

# Video: text-to-video
python3 scripts/media_gen_client.py video-create \
  --model wan2.7-t2v \
  --prompt "Sweeping shot of a neon cyberpunk skyline"

# Video: image-to-video on wan2.7-i2v
python3 scripts/media_gen_client.py video-create \
  --model wan2.7-i2v \
  --prompt "gentle zoom with parallax" \
  --img-url "https://example.com/reference.jpg" \
  --duration 5

# Wait and download (replace <task_id> with the ID returned by video-create)
python3 scripts/media_gen_client.py video-wait \
  --task-id <task_id> --download --out out.mp4
```
---
## Image generation: endpoint reference
### Gemini family: `POST /v1/models/{model}:generateContent`
Documentation: [Google Gemini Chat](https://aisa.one/docs/api-reference/chat/generatecontent).
```bash
curl -X POST "https://api.aisa.one/v1/models/gemini-3-pro-image-preview:generateContent" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "A cute red panda, cinematic lighting"}]}
    ]
  }'
```
Response contains `candidates[].parts[].inline_data` with `{mime_type, data}`, where `data` is a base64 PNG.
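Turning that response into image files takes a base64 decode per inline part. The sketch below follows the documented shape; as a defensive assumption it also accepts Google's native REST shape (`candidates[].content.parts[]` with camelCase `inlineData`/`mimeType`), since it is not stated here whether AIsa ever passes that through. `iter_inline_images` is a hypothetical helper.

```python
import base64
from typing import Iterator, Tuple


def iter_inline_images(response: dict) -> Iterator[Tuple[str, bytes]]:
    """Yield (mime_type, raw_bytes) for every inline image part in the response."""
    for candidate in response.get("candidates", []):
        # Documented shape: candidates[].parts[]; native Gemini nests parts under "content".
        parts = candidate.get("parts") or candidate.get("content", {}).get("parts", [])
        for part in parts:
            blob = part.get("inline_data") or part.get("inlineData")
            if not blob:
                continue  # text or other non-image part
            mime = blob.get("mime_type") or blob.get("mimeType", "application/octet-stream")
            yield mime, base64.b64decode(blob["data"])
```

To save the results, write each `raw_bytes` value to a file opened in `"wb"` mode, for example `out_0.png`, `out_1.png`, and so on.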
### Wan 2.7 family: `POST /v1/chat/completions`
Documentation: [Image Generation via Chat](https://aisa.one/docs/api-reference/chat/image-generation).
**Critical rule:** `messages[].content` must be an array of typed parts. A plain string returns HTTP 400 `invalid_parameter_error`.
```bash
curl -X POST "https://api.aisa.one/v1/chat/completions" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan2.7-image",
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "A cute red panda, ultra-detailed, cinematic lighting"}
      ]}
    ],
    "n": 1
  }'
```
Images come back as `{type: "image", image: "<url>"}` parts inside `choices[].message.content[]`.
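Both the typed-parts request rule and the image-part response shape can be sketched as a pair of helpers. These are illustrative, not the bundled client's code; `build_wan_image_request` and `extract_image_urls` are hypothetical names, and skipping a plain-string `content` in the response is an assumption about text-only replies.

```python
from typing import List


def build_wan_image_request(prompt: str, model: str = "wan2.7-image", n: int = 1) -> dict:
    """Chat-completions body for Wan image models."""
    # content must be a list of typed parts; a plain string returns HTTP 400.
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
        "n": n,
    }


def extract_image_urls(response: dict) -> List[str]:
    """Collect URLs from {type: "image", image: "<url>"} parts in every choice."""
    urls = []
    for choice in response.get("choices", []):
        content = (choice.get("message") or {}).get("content") or []
        if isinstance(content, str):
            continue  # text-only reply, no image parts
        for part in content:
            if part.get("type") == "image":
                urls.append(part["image"])
    return urls
```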
### Seedream: `POST /v1/images/generations`
Documentation: [OpenAI-Compatible Image Generations](https://aisa.one/docs/api-reference/chat/openai-image-generations).
```bash
curl -X POST "https://api.aisa.one/v1/images/generations" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedream-4-5-251128",
    "prompt": "A cute red panda, ultra-detailed, cinematic lighting",
    "n": 1,
    "size": "2048x2048"
  }'
```
Response: `data[].url` or `data[].b64_json`. Upstream enforces a minimum of 3,686,400 pixels (exactly 1920×1920), so `1024×1024` and `1536×1536` are rejected. Any aspect ratio works as long as `width × height ≥ 3,686,400`; for example, `2048×2048` and `2560×1440` both qualify.
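A local pre-check against the pixel floor avoids a wasted round trip. This sketch only encodes the minimum stated above; `MIN_PIXELS`, `parse_size` and `check_seedream_size` are hypothetical names, and the `WIDTHxHEIGHT` format mirrors the `size` field in the request.

```python
from typing import Tuple

MIN_PIXELS = 3_686_400  # Seedream minimum stated above (1920 x 1920)


def parse_size(size: str) -> Tuple[int, int]:
    """Parse 'WIDTHxHEIGHT' into two integers."""
    try:
        width, height = (int(v) for v in size.lower().split("x"))
    except ValueError:  # non-numeric parts or the wrong number of parts
        raise ValueError(f"size must look like 2048x2048, got {size!r}") from None
    return width, height


def check_seedream_size(size: str) -> str:
    """Return a normalized size string, or raise if it is below the pixel floor."""
    width, height = parse_size(size)
    if width * height < MIN_PIXELS:
        raise ValueError(
            f"{size} has {width * height:,} pixels; Seedream needs at least {MIN_PIXELS:,}"
        )
    return f"{width}x{height}"
```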
---
## Video generation: endpoint reference
### Create task: `POST /apis/v1/services/aigc/video-generation/video-synthesis`
Documentation: [Create video generation task](https://aisa.one/docs/api-reference/video/post_services-aigc-video-generation-video-synthesis).
Header `X-DashScope-Async: enable` is required.
```bash
# wan2.6-t2v: text-to-video
curl -X POST "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-Async: enable" \
  -d '{
    "model": "wan2.6-t2v",
    "input": {"prompt": "cinematic close-up, slow push-in"},
    "parameters": {"resolution": "720P", "duration": 5}
  }'

# wan2.7-i2v: image-to-video (input.media, not input.img_url)
curl -X POST "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-Async: enable" \
  -d '{
    "model": "wan2.7-i2v",
    "input": {
      "prompt": "gentle zoom with parallax",
      "media": ["https://example.com/reference.jpg"]
    },
    "parameters": {"resolution": "720P", "duration": 5}
  }'
```
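The same create-task call can be assembled with Python's standard library. The sketch builds, but does not send, a `urllib.request.Request` carrying the three headers shown in the curl examples; `VIDEO_URL` and `build_create_task_request` are hypothetical names, and `input_obj` is whatever `input` object your model needs.

```python
import json
import os
import urllib.request
from typing import Optional

VIDEO_URL = "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis"


def build_create_task_request(model: str, input_obj: dict, resolution: str = "720P",
                              duration: int = 5,
                              api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates an async video task."""
    api_key = api_key or os.environ["AISA_API_KEY"]
    body = {
        "model": model,
        "input": input_obj,
        "parameters": {"resolution": resolution, "duration": duration},
    }
    return urllib.request.Request(
        VIDEO_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-DashScope-Async": "enable",  # required by this endpoint
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and parsing the body with `json.load` should return the task record. In DashScope-style responses the ID usually sits at `output.task_id`, but treat that field path as an assumption and confirm it against the linked documentation.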
### Poll task: `GET /apis/v1/services/aigc/tasks/{task_id}`
Documentation: [Get video generation task result](https://aisa.one/docs/api-reference/video/get_services-aigc-tasks).
> `task_id` is a path parameter. The query-string form `?task_id=...` returns HTTP 500 `unsupported uri`.
```bash
curl "https://api.aisa.one/apis/v1/services/aigc/tasks/YOUR_TASK_ID" \
  -H "Authorization: Bearer $AISA_API_KEY"
```
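A polling loop along the lines of `video-wait --poll 10 --timeout 600` might look like the sketch below. It puts `task_id` in the path, as required. The status field `output.task_status` and the terminal state names are assumptions borrowed from DashScope's async task API, not confirmed by this document. `task_url`, `http_fetch` and `wait_for_task` are hypothetical names; `fetch` and `sleep` are injectable so the loop can be exercised without network access.

```python
import json
import os
import time
import urllib.request
from typing import Callable

TASK_URL = "https://api.aisa.one/apis/v1/services/aigc/tasks/{task_id}"

# Assumed DashScope-style terminal states; confirm against the linked docs.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def task_url(task_id: str) -> str:
    # task_id is a path parameter; the ?task_id=... form returns HTTP 500.
    return TASK_URL.format(task_id=task_id)


def http_fetch(url: str) -> dict:
    """GET a task-status URL with the AIsa bearer token and parse the JSON body."""
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {os.environ['AISA_API_KEY']}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_task(task_id: str, fetch: Callable[[str], dict] = http_fetch,
                  poll: float = 10.0, timeout: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task reaches a terminal state or the time budget runs out."""
    waited = 0.0
    while True:
        result = fetch(task_url(task_id))
        status = result.get("output", {}).get("task_status")
        if status in TERMINAL_STATES:
            return result
        if waited + poll > timeout:
            raise TimeoutError(f"task {task_id} still {status!r} after ~{waited:.0f}s")
        sleep(poll)
        waited += poll  # counts sleep time only, not request latency
```

A terminal result is not necessarily a success, so check the status before looking for the finished video. In DashScope-style responses the video URL appears as `output.video_url`; verify that field name before relying on it.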
---
## Python client
The bundled client at `scripts/media_gen_client.py` auto-routes each image model to the correct endpoint and normalizes the response to a saved file.
```bash
# Image: the model selects the endpoint
python3 scripts/media_gen_client.py image \
  --model <gemini-3-pro-image-preview | wan2.7-image | wan2.7-image-pro | seedream-4-5-251128> \
  --prompt "..." \
  --out out.png

# Video: create a task
python3 scripts/media_gen_client.py video-create \
  --model <wan2.6-t2v | wan2.6-i2v | wan2.7-t2v | wan2.7-i2v> \
  --prompt "..." \
  [--img-url https://... (required for -i2v models)] \
  [--duration 5|10] \
  [--resolution 720P|1080P]

# Video: poll / wait / download
python3 scripts/media_gen_client.py video-status --task-id <id>
python3 scripts/media_gen_client.py video-wait --task-id <id> --poll 10 --timeout 600
python3 scripts/media_gen_client.py video-wait --task-id <id> --download --out out.mp4
```
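To drive the client from other Python code, you can assemble its command line from the documented flags and hand it to `subprocess.run`. The sketch below is a hypothetical wrapper, not part of the client. `CLIENT` assumes you run from the skill directory, and the allowed values simply mirror the usage text.

```python
import sys
from typing import List, Optional

CLIENT = "scripts/media_gen_client.py"


def video_create_argv(model: str, prompt: str, img_url: Optional[str] = None,
                      duration: Optional[int] = None,
                      resolution: Optional[str] = None) -> List[str]:
    """Assemble a video-create command line using only the documented flags."""
    if model.endswith("-i2v") and not img_url:
        raise ValueError("--img-url is required for -i2v models")
    argv = [sys.executable, CLIENT, "video-create", "--model", model, "--prompt", prompt]
    if img_url:
        argv += ["--img-url", img_url]
    if duration is not None:
        if duration not in (5, 10):
            raise ValueError("--duration must be 5 or 10")
        argv += ["--duration", str(duration)]
    if resolution is not None:
        if resolution not in ("720P", "1080P"):
            raise ValueError("--resolution must be 720P or 1080P")
        argv += ["--resolution", resolution]
    return argv
```

Using `sys.executable` instead of a literal `python3` keeps the subprocess on the same interpreter as the caller. Run the command with `subprocess.run(argv, check=True, capture_output=True, text=True)`. This document does not specify what the client prints, so check its actual stdout before parsing a task ID from it.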
## API reference
This skill calls the following AIsa endpoints directly:
- [Google Gemini Chat (`generateContent`)](https://aisa.one/docs/api-reference/chat/generatecontent): Gemini image models
- [Image Generation via Chat](https://aisa.one/docs/api-reference/chat/image-generation): Wan 2.7 image family
- [OpenAI-Compatible Image Generations](https://aisa.one/docs/api-reference/chat/openai-image-generations): Seedream
- [Create video generation task](https://aisa.one/docs/api-reference/video/post_services-aigc-video-generation-video-synthesis): all four Wan video variants
- [Get video generation task result](https://aisa.one/docs/api-reference/video/get_services-aigc-tasks): async polling
See the [full AIsa API Reference](https://aisa.one/docs/api-reference) for the complete catalog.
## License
MIT. See [LICENSE](../LICENSE) at the repo root.