clawhubby @vincentlau2046-sudo

Picture Book Video

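Picture book story script → mp4 video (bilingual Chinese and English versions). Fully automated: storyboard image generation → still frames/animation → concatenation → TTS narration → ASS subtitles → final composite. Outputs a Chinese and an English version, plus the title/description/hashtags needed to publish on Douyin. TTS prefers Qwen3-TTS (local GPU, 6-voice character library) and falls back to Edge TTS on failure.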

SkillRank score: 7.3/10 (evaluated by implexa, claude-haiku-4-5 · 2026-05-26)

picture-book-video converts children's story scripts into bilingual mp4 videos with auto-generated storyboards, character animation, tts narration, and ass subtitles, outputting chinese and english versions plus douyin metadata.

structure: 9.0
trigger phrases: 8.0
procedure: 7.0
edge cases: 6.0
documentation: 8.0

original SKILL.md from clawhub:

---
name: picture-book-video
version: 1.1.0
description: >-
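  Picture book story script → mp4 video (bilingual Chinese and English versions).
  Fully automated: storyboard image generation → still frames/animation → concatenation → TTS narration → ASS subtitles → final composite.
  Outputs a Chinese and an English version, plus the title/description/hashtags needed to publish on Douyin.
  TTS prefers Qwen3-TTS (local GPU, 6-voice character library) and falls back to Edge TTS on failure.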

read_when:
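  - User says "生成绘本视频" (generate a picture book video), "做绘本故事" (make a picture book story), or "绘本视频" (picture book video)
  - User provides a picture book story script and asks for a video
  - User mentions "picture-book-video", "琪琪OPC", or "绘本故事视频" (picture book story video)
  - User needs a children's story video for Douyin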

metadata:
  openclaw:
    emoji: 📚
    priority: high
    category: video-generation
    tags:
      - picture-book
      - children
      - douyin
      - bilingual
      - tts
    conflicts_with: []
---

# Picture Book Video (绘本故事视频)

> Turns a picture book story script into a bilingual Chinese/English video (with subtitles and narration)

**Core pipeline**:
```
Parse script → Storyboard images → Compose frames → Concatenate → TTS narration → ASS subtitles → Final video
    (LLM)          (ComfyUI)          (ffmpeg)        (ffmpeg)     (Qwen3/Edge)      (script)       (ffmpeg)
   Phase 0          Phase 1            Phase 2        Phase 2        Phase 3         Phase 3        Phase 4
```

---

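## Execution Rules

1. **Phase split**: Phases 0-1 are LLM-driven (content decisions); Phases 2-4 are script-driven (technical composition).
2. **BLOCKING steps**: Phase 0 (script evaluation) and Phase 1 (storyboard confirmation) ⛔ must wait for the user's response.
3. **No skipping confirmation**: do not call any pipeline script until the user has confirmed in Phase 1.
4. **Scripts do the technical work, the LLM does the content**: scripts do not judge style, rewrite the story script, or make content decisions.
5. **Serial execution**: phases must run in order; no skipping ahead.
6. **Bilingual output**: every story must produce both a Chinese and an English version.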
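Taken together, these rules describe a small state machine: phases advance strictly in order, and Phases 0 and 1 cannot be left without an explicit user confirmation. The sketch below models that in Python; `PhaseGate` and its methods are hypothetical names, not part of the skill's scripts.

```python
from dataclasses import dataclass, field

# Phases that must wait for the user before the next phase may start (rule 2).
BLOCKING_PHASES = {0, 1}
LAST_PHASE = 4


@dataclass
class PhaseGate:
    """Hypothetical helper enforcing serial phases and blocking confirmations."""
    current: int = 0
    confirmed: set = field(default_factory=set)

    def confirm(self, phase: int) -> None:
        # Record the user's explicit confirmation for the phase we are in.
        if phase != self.current:
            raise ValueError(f"cannot confirm phase {phase} while in phase {self.current}")
        self.confirmed.add(phase)

    def advance(self) -> int:
        # Rule 5: phases run strictly in order, one step at a time.
        if self.current >= LAST_PHASE:
            raise RuntimeError("pipeline already finished")
        # Rules 2-3: a blocking phase cannot be left without user confirmation.
        if self.current in BLOCKING_PHASES and self.current not in self.confirmed:
            raise PermissionError(f"phase {self.current} is waiting for user confirmation")
        self.current += 1
        return self.current
```

Usage: `gate = PhaseGate(); gate.confirm(0); gate.advance()` moves to Phase 1, while calling `advance()` without the `confirm(0)` raises `PermissionError`.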

---

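## Phase 0: Script Evaluation

🚧 **GATE**: the user has provided a story script

### 0.1 Scan the input

Check whether the user has provided:
- A story script (required)
- A series name (optional)
- An episode ID (optional, e.g. S02E01)
- A series description (optional)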

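### 0.2 Completeness scoring

| Material | Required | Scoring rule |
|----------|----------|--------------|
| Story script | ✅ Required | If missing, report an error and exit |
| Series name | ❌ Optional | If missing, use the default "琪琪的魔法故事屋" |
| Episode ID | ❌ Optional | If missing, generate one automatically |
| Series description | ❌ Optional | If missing, leave blank |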

### 0.3 Interaction strategy

```
Script complete → go to Phase 1

Script incomplete → ⛔ BLOCKING, ask the user to supply what is missing
```

### 0.4 Create the project directory

```bash
PROJECT_NAME="picture-book-$(date +%Y%m%d)-<short-desc>"
PROJECT_DIR="<workspace>/project/${PROJECT_NAME}"
mkdir -p "${PROJECT_DIR}/input/"
mkdir -p "${PROJECT_DIR}/scenes/"
mkdir -p "${PROJECT_DIR}/output/"
mkdir -p "${PROJECT_DIR}/.temp/"

# Copy the script into the project directory
cp <script_file> "${PROJECT_DIR}/input/"
```

---

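## Phase 1: Storyboard Plan

🚧 **GATE**: script evaluation passed

### 1.1 Parse the story script

The LLM reads the story script and parses it into a storyboard structure. Each scene contains:
- Scene number
- Narration text
- Visual description (used by ComfyUI to generate the image)
- Estimated duration

Output format: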
```markdown
# Storyboard Plan

| Scene | Narration | Visual description | Est. duration |
|-------|-----------|--------------------|---------------|
| 1 | ... | ... | ...s |
...
```

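### 1.2 Choose the style

Default style: **crayon, children's hand-drawn**

The LLM confirms with the user:
```
Based on the story, the recommended visual styles are:
[1] Crayon, children's hand-drawn (default)
[2] Watercolor
[3] Paper cutout
Please confirm or choose your own.
```

### 1.3 Generate ComfyUI prompts

Generate a Flux text-to-image prompt for each scene, following the style guidelines.

### 1.4 Save the storyboard plan

Write `<PROJECT_DIR>/scene_plan.md` and `<PROJECT_DIR>/scene_prompts.json`.

### 1.5 Confirm the plan

⛔ **BLOCKING**: show the storyboard plan to the user and wait for confirmation:

```
Please confirm:
1. Confirm and generate → go to Phase 2
2. Revise scene X → regenerate → confirm again
```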

---

## Phase 2: Image Generation

🚧 **GATE**: the user has confirmed Phase 1

### 2.1 Generate storyboard images

Use the ComfyUI skill to generate an image for each scene:

```bash
python3 <SKILL_DIR>/scripts/generate_scenes.py \
  --prompts "<PROJECT_DIR>/scene_prompts.json" \
  --output-dir "<PROJECT_DIR>/scenes/" \
  --style "crayon"
```

### 2.2 Generate the cover

```bash
python3 <SKILL_DIR>/scripts/stage_cover.py \
  --output "<PROJECT_DIR>/scenes/cover.png" \
  --title "<title>" \
  --subtitle "<series name>" \
  --episode-id "<episode id>" \
  --brand "琪琪的魔法故事屋" \
  --qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
  --width 1920 --height 1080
```

### 2.3 Verify the images

Check that every scene image was generated successfully and that each one is 1920x1080.

---

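## Phase 3: Audio and Subtitles

🚧 **GATE**: Phase 2 complete

### 3.1 Generate Chinese TTS

**Qwen3-TTS first** (local GPU, voice cloning + voice design), **Edge TTS as the fallback on failure**. Only the Edge TTS command also writes an SRT (`--srt`), and step 3.3 needs `narration_cn.srt`: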

```bash
# Option 1: Qwen3-TTS (preferred)
python3 ~/.openclaw/workspace/skills/tts-qwen3/scripts/qwen_tts.py \
  --text "<full Chinese narration text>" \
  --voice narrator_teacher \
  --output "<PROJECT_DIR>/narration_cn.wav" \
  --fallback-edge true

# Option 2: Edge TTS (fallback; also writes an SRT)
python3 <SKILL_DIR>/scripts/tts.py \
  --text "<full Chinese narration text>" \
  --output "<PROJECT_DIR>/narration_cn.mp3" \
  --srt "<PROJECT_DIR>/narration_cn.srt" \
  --voice zh-CN-XiaoyiNeural \
  --rate=-15%
```

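**Character voice mapping** (Qwen3-TTS):

| Script role | --voice value | Notes |
|-------------|---------------|-------|
| Narrator / storytelling | narrator_teacher | Warm female voice |
| Qiqi's dialogue | qiqi_clone | Cloned voice |
| Young boy | boy_child | Lively, 8 years old |
| Young girl | girl_child | Sweet, 7 years old |
| Adult male | adult_male | Calm, steady |
| Adult female | adult_female | Elegant |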

### 3.2 Generate English TTS

```bash
python3 <SKILL_DIR>/scripts/tts.py \
  --text "<full English narration text>" \
  --output "<PROJECT_DIR>/narration_en.mp3" \
  --srt "<PROJECT_DIR>/narration_en.srt" \
  --voice en-US-JennyNeural \
  --rate=-15%
```

### 3.3 Generate Chinese ASS subtitles

```bash
python3 <SKILL_DIR>/scripts/srt_to_ass.py \
  --srt "<PROJECT_DIR>/narration_cn.srt" \
  --output "<PROJECT_DIR>/subtitles_cn.ass" \
  --font-size 80
```

### 3.4 Generate English ASS subtitles

```bash
python3 <SKILL_DIR>/scripts/srt_to_ass.py \
  --srt "<PROJECT_DIR>/narration_en.srt" \
  --output "<PROJECT_DIR>/subtitles_en.ass" \
  --font-size 80
```
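Steps 3.3 and 3.4 delegate to `srt_to_ass.py`, whose internals are not shown here. The central conversion in any SRT to ASS step is the timestamp format: SRT uses `HH:MM:SS,mmm`, while ASS `Dialogue` lines use `H:MM:SS.cc` (centiseconds), and line breaks become the ASS hard break `\N`. A minimal sketch, assuming a style named `Default`; `srt_to_ass_time` and `ass_dialogue` are hypothetical helper names.

```python
import re

# SRT timestamp, e.g. "00:00:01,500"
SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_to_ass_time(ts: str) -> str:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to ASS format (H:MM:SS.cc)."""
    m = SRT_TIME.fullmatch(ts.strip())
    if not m:
        raise ValueError(f"not an SRT timestamp: {ts!r}")
    h, mnt, s, ms = (int(g) for g in m.groups())
    cs = ms // 10  # ASS only has centisecond precision; truncate milliseconds
    return f"{h}:{mnt:02d}:{s:02d}.{cs:02d}"


def ass_dialogue(start: str, end: str, text: str, style: str = "Default") -> str:
    """Build one [Events] line; SRT newlines become ASS hard breaks (\\N).

    Field order: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text.
    """
    body = text.strip().replace("\n", "\\N")
    return f"Dialogue: 0,{srt_to_ass_time(start)},{srt_to_ass_time(end)},{style},,0,0,0,,{body}"
```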

---

## Phase 4: Video Composition

🚧 **GATE**: Phase 3 complete

### 4.1 Run the full pipeline

```bash
# Chinese version
python3 <SKILL_DIR>/scripts/pipeline.py \
  --scenes-dir "<PROJECT_DIR>/scenes/" \
  --audio "<PROJECT_DIR>/narration_cn.mp3" \
  --ass "<PROJECT_DIR>/subtitles_cn.ass" \
  --output "<PROJECT_DIR>/output/<episode_id>_cn.mp4" \
  --title "<title>" \
  --subtitle "<series name>" \
  --episode-id "<episode id>" \
  --brand "琪琪的魔法故事屋" \
  --qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
  --cover-duration 4.0 \
  --fade-duration 0.8

# English version
python3 <SKILL_DIR>/scripts/pipeline.py \
  --scenes-dir "<PROJECT_DIR>/scenes/" \
  --audio "<PROJECT_DIR>/narration_en.mp3" \
  --ass "<PROJECT_DIR>/subtitles_en.ass" \
  --output "<PROJECT_DIR>/output/<episode_id>_en.mp4" \
  --title "<English title>" \
  --subtitle "<English series name>" \
  --episode-id "<episode id>" \
  --brand "琪琪的魔法故事屋" \
  --qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
  --cover-duration 4.0 \
  --fade-duration 0.8
```

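### 4.2 Generate the Douyin publishing description

```markdown
# Douyin publishing description

## Chinese version
- Title: {Chinese title}｜{series name}
- Description: {story synopsis}
- Hashtags: #儿童故事 #{series name} #睡前故事 #绘本动画

## English version
- Title: {English Title}｜{English Series}
- Description: {English synopsis}
- Hashtags: #英语启蒙 #磨耳朵英语 #{series name} #儿童英语
```

### 4.3 Quality check

Check that:
- The video file exists and is larger than 10MB
- Video is H.264 and audio is AAC
- Resolution is 1920×1080
- Duration matches the audio
- Subtitles display in full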

---

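## Douyin Publishing

After the videos are generated, publish them with the `douyin-browser-publish` skill:

```
# Chinese version
Publish with the douyin-browser-publish skill:
- Video: <PROJECT_DIR>/output/<episode_id>_cn.mp4
- Title: {Chinese title}｜{series name}
- Hashtags: #儿童故事 #{series name} #睡前故事 #绘本动画

# English version
Publish with the douyin-browser-publish skill:
- Video: <PROJECT_DIR>/output/<episode_id>_en.mp4
- Title: {English Title}｜{English Series}
- Hashtags: #英语启蒙 #磨耳朵英语 #{series name} #儿童英语
```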

---

## 🛠️ Requirements

```bash
# Required
python3 --version       # 3.8+
ffmpeg -version         # 5.0+
edge-tts --version      # 7.0+

# ComfyUI (text-to-image)
cd ~/ComfyUI && ~/comfyui-venv/bin/python main.py --listen 127.0.0.1 --port 8188

# Qiqi character image: check that it exists
ls -l ~/.openclaw/workspace/characters/qiqi_default.png
```
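The version floors above can also be checked from Python before the pipeline starts. A minimal sketch, assuming the first line of `ffmpeg -version` has the usual `ffmpeg version X.Y...` form; git/nightly builds that report `N-...` instead of a release number are treated as unknown. `parse_ffmpeg_version` and `check_requirements` are hypothetical names.

```python
import re
import shutil
import subprocess
import sys


def parse_ffmpeg_version(first_line: str):
    """Return (major, minor) from e.g. 'ffmpeg version 5.1.2 Copyright ...', or None."""
    m = re.match(r"ffmpeg version n?(\d+)\.(\d+)", first_line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def check_requirements(min_ffmpeg=(5, 0)) -> list:
    """Collect human-readable problems instead of stopping at the first one."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("python 3.8+ required")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    else:
        out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True).stdout
        version = parse_ffmpeg_version(out.splitlines()[0] if out else "")
        if version is None or version < min_ffmpeg:
            problems.append(f"ffmpeg {min_ffmpeg[0]}.{min_ffmpeg[1]}+ required, found {version}")
    if shutil.which("edge-tts") is None:
        problems.append("edge-tts not found on PATH")
    return problems
```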

---

## Troubleshooting

| Problem | Fix |
|---------|-----|
| ComfyUI 未启动 | `cd ~/ComfyUI && LD_LIBRARY_PATH=~/comfyui-venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH ~/comfyui-venv/bin/python main.py --listen 127.0.0.1 --port 8188` |
| TTS fails | Check the network; if the text is very long, synthesize it in segments |
| Subtitles truncated | Check the max_chars setting in srt_to_ass.py (default 24) |
| Video composition fails | Check the ffmpeg version; check that the scene images exist |

---

## 📁 Output Files

After each story is generated, the project directory contains:

```
<PROJECT_DIR>/
├── input/
│   └── <script_file>          # original script
├── scenes/
│   ├── cover.png              # cover (with the Qiqi character)
│   ├── scene_01.png           # scene images
│   ├── scene_02.png
│   └── ...
├── narration_cn.mp3           # Chinese narration
├── narration_cn.srt           # Chinese SRT
├── subtitles_cn.ass           # Chinese ASS subtitles
├── narration_en.mp3           # English narration
├── narration_en.srt           # English SRT
├── subtitles_en.ass           # English ASS subtitles
├── output/
│   ├── <episode_id>_cn.mp4    # Chinese video
│   └── <episode_id>_en.mp4    # English video
├── scene_plan.md              # storyboard plan
├── scene_prompts.json         # ComfyUI prompts
└── douyin_publish.md          # Douyin publishing description
```

---

## 🔗 Related Skills

- **comfyui-image-video**: ComfyUI text-to-image/video generation
- **tts-qwen3**: Qwen3-TTS local speech synthesis (preferred for 琪琪OPC)
- **tts-cosyvoice**: Edge TTS speech synthesis (fallback)
- **douyin-browser-publish**: Douyin video publishing
- **keynote-video**: PPT to video

---

*Version: v1.0 | Based on the 琪琪OPC project pipeline | Architecture modeled on keynote-video*


restructured into 6 implexa components with explicit decision branches, edge cases (rate limits, timeouts, fallbacks, missing files), blocking gates at phases 0 and 1, detailed io specs per step, and clear outcome signals for user confirmation at each phase.

Picture Book Video

convert picture book scripts into bilingual videos (chinese + english) with storyboards, narration, and subtitles. use this when someone asks to generate a picture book video, provides a story script, mentions "picture-book-video" or "琪琪OPC", or needs content for douyin. the skill handles everything from script parsing through final video composition, with user confirmation gates at storyboard and style approval stages.

intent

convert a picture book story script into a finished, bilingual (chinese and english) video file with animated storyboards, character narration, and synchronized subtitles. the skill auto-generates scene illustrations, synthesizes audio in two languages with voice character mapping, creates subtitle files, and composites everything into two mp4 outputs plus douyin publishing metadata. use this when a user provides a story script and wants a complete video production pipeline. the skill enforces approval gates at script validation and storyboard confirmation to prevent wasted compute on unapproved directions.

inputs

story script (required): text file or pasted markdown containing the picture book narrative. must include scene descriptions and dialogue. no script means hard stop at phase 0.
series name (optional): collection title (e.g. "琪琪的魔法故事屋"). defaults to "琪琪的魔法故事屋" if omitted.
episode id (optional): sequence identifier (e.g. "S02E01"). auto-generated if missing.
series description (optional): blurb about the collection. left blank if not provided.
comfyui instance (required): running at http://127.0.0.1:8188. must be live before phase 2 begins. override the url with the COMFYUI_URL env var; defaults to localhost:8188.
qwen3-tts or edge-tts (required for phase 3): qwen3-tts requires a local gpu and the tts-qwen3 skill at ~/.openclaw/workspace/skills/tts-qwen3/. edge-tts requires internet connectivity and the edge-tts package. qwen3-tts is tried first; edge-tts is the fallback.
ffmpeg (required for phase 4): version 5.0+. check with ffmpeg -version.
qiqi character image (recommended): png file at ~/.openclaw/workspace/characters/qiqi_default.png. used in cover generation and video branding; if it is missing, the cover is generated without the character (see step 2.2).
workspace directory (required): base path under which each project (scenes, temp files, outputs) is created as <workspace>/projects/<PROJECT_NAME>. defaults to ~/.openclaw/workspace/.
python 3.8+ (required): for all pipeline scripts and tts synthesis.

procedure

phase 0: script intake and validation

step 0.1: scan input. input: user-provided story script (text, markdown, or file path). output: confirmation that the script exists and is readable (non-empty file or pasted text). check for: story script (required), series name (optional), episode id (optional), series description (optional).

step 0.2: completeness scoring input: script scan results. output: pass/fail verdict on required materials. logic:

if script is empty or missing, fail immediately. return error: "no story script provided. please paste or upload a picture book script to begin."
if script exists and is readable, proceed to step 0.3.
if series name is missing, assign default "琪琪的魔法故事屋".
if episode id is missing, auto-generate as "EP--<hash(first-100-chars-of-script)>".
if series description is missing, leave blank.
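a python sketch of these defaults, including the episode-id rule from the decision points section. the source does not say which hash to use or how long it is, so sha-256 truncated to 8 hex digits is an assumption; `apply_defaults` is a hypothetical name.

```python
import hashlib

DEFAULT_SERIES = "琪琪的魔法故事屋"


def apply_defaults(script: str, series_name=None, episode_id=None, series_desc=None) -> dict:
    """Fail on a missing script, then fill the optional fields as in step 0.2."""
    if not script or not script.strip():
        raise ValueError("no story script provided. please paste or upload a picture book script to begin.")
    if not episode_id:
        # assumption: sha-256 of the first 100 characters, truncated to 8 hex digits
        digest = hashlib.sha256(script[:100].encode("utf-8")).hexdigest()[:8]
        episode_id = f"EP--{digest}"
    return {
        "script": script,
        "series_name": series_name or DEFAULT_SERIES,
        "episode_id": episode_id,
        "series_desc": series_desc or "",
    }
```

because the id is derived from the script text, re-running phase 0 on the same script yields the same episode id.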

step 0.3: create project directory input: series name, episode id, script content. output: project directory structure created and script copied into input folder. actions:

PROJECT_NAME = "picture-book-" + date(YYYYMMDD) + "-" + slugify(first-20-chars-of-script)
PROJECT_DIR = workspace_base + "/projects/" + PROJECT_NAME
create directories: input/, scenes/, output/, .temp/
copy user script to input/<script_filename>
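the same setup as a python sketch. `slugify` is not defined by the skill, so the version here is a stand-in (lowercase, runs of non-word characters collapsed to `-`, cjk characters kept because python's `\w` is unicode-aware); `create_project` is a hypothetical name.

```python
import re
import shutil
from datetime import date
from pathlib import Path

SUBDIRS = ("input", "scenes", "output", ".temp")


def slugify(text: str, limit: int = 20) -> str:
    """Stand-in slugify: first `limit` chars, runs of non-word chars become '-'."""
    slug = re.sub(r"[^\w]+", "-", text[:limit].strip().lower())
    return slug.strip("-") or "story"


def create_project(workspace_base: Path, script_path: Path, today=None) -> Path:
    """Create PROJECT_DIR with its subdirectories and copy the script into input/."""
    today = today or date.today()
    text = script_path.read_text(encoding="utf-8")
    name = f"picture-book-{today:%Y%m%d}-{slugify(text)}"
    project_dir = workspace_base / "projects" / name
    for sub in SUBDIRS:
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    shutil.copy2(script_path, project_dir / "input" / script_path.name)
    return project_dir
```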

step 0.4: blocking gate confirmation input: project directory created, script validated. output: user acknowledges script is ready, or requests changes. action: display a script summary (title, length, character count) and ask "proceed to storyboard generation?". the user must reply yes to continue.

phase 1: storyboard planning and style approval

step 1.1: parse script into storyboard input: validated script from phase 0. output: markdown table with columns: scene number, narration text (chinese), scene description (for image generation), estimated duration in seconds. action: use llm to break script into logical visual scenes. each scene should be 3-8 seconds of video (typical picture book pace). number scenes 001, 002, etc. extract dialogue and narration separately.

step 1.2: confirm visual style input: storyboard scenes. output: user-selected visual style. action: present user with 3 default options:

[1] crayon children's hand-drawn (default)
[2] watercolor
[3] paper cutout

ask the user to confirm the default or pick an alternative.

step 1.3: generate comfyui prompts input: storyboard scenes, selected visual style. output: json file with one prompt per scene, formatted for the flux text-to-image model. each prompt includes: scene number, short narrative context, visual style keywords, character descriptions if any, color palette hints, composition notes. save to PROJECT_DIR/scene_prompts.json. example prompt (crayon style): "illustration in crayon children's hand-drawn style, warm colors, a magical forest with glowing trees, a young girl in red dress looking up at stars, storybook illustration, soft edges, crayon texture, children's book art, whimsical mood"

step 1.4: save and display storyboard plan input: storyboard table, style choice, comfyui prompts. output: markdown file with the full plan, saved to PROJECT_DIR/scene_plan.md. display to the user: the full storyboard table, the selected style, and the total estimated video length (sum of all scene durations).
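a sketch of saving scene_prompts.json and computing the total length shown to the user. the skill does not document the json schema, so the keys used here (`scene`, `narration`, `prompt`, `duration_s`) are assumptions; the 3-8 second range is the pacing from step 1.1, and `save_scene_prompts` is a hypothetical name.

```python
import json
from pathlib import Path

MIN_S, MAX_S = 3.0, 8.0  # per-scene pacing from step 1.1


def save_scene_prompts(scenes: list, project_dir: Path) -> float:
    """Write scene_prompts.json and return the total estimated duration in seconds.

    each scene is a dict with hypothetical keys: scene, narration, prompt, duration_s.
    """
    for s in scenes:
        if not MIN_S <= s["duration_s"] <= MAX_S:
            raise ValueError(f"scene {s['scene']}: {s['duration_s']}s is outside {MIN_S}-{MAX_S}s")
    out = project_dir / "scene_prompts.json"
    # ensure_ascii=False keeps chinese narration readable in the file
    out.write_text(json.dumps(scenes, ensure_ascii=False, indent=2), encoding="utf-8")
    return sum(s["duration_s"] for s in scenes)
```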

step 1.5: blocking approval gate input: scene_plan.md and scene_prompts.json generated. output: user approval or revision request. action: show the user the storyboard and ask "approve storyboard?". if yes, proceed to phase 2. if no, ask which scenes to revise and loop back to step 1.1. do not proceed to phase 2 until the user explicitly approves.

phase 2: scene image generation and assembly

step 2.1: generate scene images via comfyui input: scene_prompts.json, comfyui url. output: png image files (1920x1080) for each scene. action:

for each scene in scene_prompts.json:
  call comfyui api with prompt
  save output to PROJECT_DIR/scenes/scene_<number>.png
  log hash and generation time
  if generation fails, retry up to 2 times with same prompt
  if still fails, log error and skip to next scene

edge case: if comfyui is unreachable, hard stop with error "comfyui at [url] is not responding. start comfyui and retry phase 2."
edge case: if generating a single scene takes longer than 5 minutes, treat it as a timeout: retry up to 2 times, then skip the scene.
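the retry-then-skip loop, combined with the >50% failure rule from the decision points, can be written independently of the comfyui api. `generate_image` is a hypothetical callable standing in for whatever actually talks to comfyui (for example scripts/generate_scenes.py) and is assumed to enforce the 5-minute timeout itself by raising an exception; `generate_all` is also a hypothetical name.

```python
import hashlib
import logging
import time
from pathlib import Path
from typing import Callable

log = logging.getLogger("scenes")


def generate_all(prompts: dict, out_dir: Path,
                 generate_image: Callable[[str], bytes], retries: int = 2) -> list:
    """Generate one png per scene; retry each scene up to `retries` times, then skip it.

    prompts maps a scene number (e.g. "001") to its prompt. returns failed scene numbers.
    """
    failed = []
    for number, prompt in prompts.items():
        for attempt in range(1 + retries):
            start = time.monotonic()
            try:
                data = generate_image(prompt)
            except Exception as exc:  # timeout, oom, api error, ...
                log.warning("scene %s attempt %d failed: %s", number, attempt + 1, exc)
                continue
            (out_dir / f"scene_{number}.png").write_bytes(data)
            log.info("scene %s sha256=%s %.1fs", number,
                     hashlib.sha256(data).hexdigest()[:12], time.monotonic() - start)
            break
        else:
            failed.append(number)  # every attempt failed: log and move on
    # decision point: more than half the scenes failing halts phase 2
    if prompts and len(failed) / len(prompts) > 0.5:
        raise RuntimeError(f"{len(failed)}/{len(prompts)} scenes failed; check comfyui logs")
    return failed
```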

step 2.2: generate cover image input: series name, episode id, title extracted from script, qiqi character image path. output: 1920x1080 png cover image with branding. action:

python3 <SKILL_DIR>/scripts/stage_cover.py \
  --output PROJECT_DIR/scenes/cover.png \
  --title "<title from script>" \
  --subtitle "<series name>" \
  --episode-id "<episode id>" \
  --brand "琪琪的魔法故事屋" \
  --qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
  --width 1920 --height 1080

if qiqi image is missing, generate cover without character image but still include text branding.

step 2.3: validate scene images input: all generated scene pngs in scenes/ directory. output: validation report (passed or failed scenes). action:

check each scene_*.png exists and is readable.
check resolution is exactly 1920x1080.
check file size > 50KB (filters out broken/blank images).
if any scene fails validation, log warning but do not halt (will impact final video but phase 3 can proceed).
report count of valid scenes vs. total expected scenes.
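the resolution check needs no image library: a png stores width and height as big-endian 32-bit integers in its ihdr chunk, at bytes 16-23 of the file. a minimal sketch; 50KB is taken as 50 × 1024 bytes, and `png_size`/`validate_scenes` are hypothetical names.

```python
import struct
from pathlib import Path

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path):
    """Return (width, height) from the IHDR chunk, or None if the file is not a png."""
    with path.open("rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:8] != PNG_SIG or head[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", head[16:24])


def validate_scenes(scenes_dir: Path, expected: int, size=(1920, 1080), min_bytes=50 * 1024):
    """Return a 'valid/expected' summary plus per-file problems (warnings, not errors)."""
    valid, problems = 0, []
    for p in sorted(scenes_dir.glob("scene_*.png")):
        dims = png_size(p)
        if dims != size:
            problems.append(f"{p.name}: resolution {dims}")
        elif p.stat().st_size <= min_bytes:
            problems.append(f"{p.name}: only {p.stat().st_size} bytes")
        else:
            valid += 1
    return f"{valid}/{expected} scenes valid", problems
```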

phase 3: audio and subtitle generation

step 3.1: extract narration text input: storyboard from phase 1. output: full chinese narration text (concatenated from all scene narration fields), full english translation. action: use llm to translate each scene's narration from chinese to english. maintain character voice consistency.

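step 3.2: generate chinese tts input: chinese narration text. output: narration audio, either narration_cn.wav from qwen3-tts or narration_cn.mp3 plus narration_cn.srt from edge-tts (only the edge-tts command writes an srt, which step 3.4 needs). action: attempt qwen3-tts first: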

python3 ~/.openclaw/workspace/skills/tts-qwen3/scripts/qwen_tts.py \
  --text "<chinese narration full text>" \
  --voice "narrator_teacher" \
  --output PROJECT_DIR/narration_cn.wav \
  --fallback-edge true \
  --rate 1.0

if qwen3-tts fails (gpu unavailable, package missing, timeout > 10 mins), fall back to edge-tts:

python3 <SKILL_DIR>/scripts/tts.py \
  --text "<chinese narration full text>" \
  --output PROJECT_DIR/narration_cn.mp3 \
  --srt PROJECT_DIR/narration_cn.srt \
  --voice "zh-CN-XiaoyiNeural" \
  --rate=-15%
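the qwen3-tts → edge-tts chain can be driven with `subprocess`, treating a non-zero exit, a missing executable, or the 10-minute limit as failure. a minimal sketch; `run_tts_with_fallback` is hypothetical, and since qwen_tts.py is already passed `--fallback-edge true` (whose exact behavior is not documented here), running the fallback explicitly from the caller is an assumption about how to make it observable. `--rate=-15%` is passed as a single argument so it cannot be mistaken for a flag.

```python
import subprocess


def run_tts_with_fallback(primary: list, fallback: list, timeout_s: int = 600) -> str:
    """Run the primary tts command; on error or timeout run the fallback. returns the engine used."""
    try:
        subprocess.run(primary, check=True, timeout=timeout_s)
        return "qwen3-tts"
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError) as exc:
        print(f"qwen3-tts failed ({type(exc).__name__}); falling back to edge-tts")
    # a failure here propagates to the caller as CalledProcessError / TimeoutExpired
    subprocess.run(fallback, check=True, timeout=timeout_s)
    return "edge-tts"


# example argv lists, mirroring the commands above (placeholders left as-is):
# primary = ["python3", "<qwen_tts.py>", "--text", text, "--voice", "narrator_teacher",
#            "--output", "<PROJECT_DIR>/narration_cn.wav", "--fallback-edge", "true"]
# fallback = ["python3", "<SKILL_DIR>/scripts/tts.py", "--text", text,
#             "--output", "<PROJECT_DIR>/narration_cn.mp3", "--srt", "<PROJECT_DIR>/narration_cn.srt",
#             "--voice", "zh-CN-XiaoyiNeural", "--rate=-15%"]
```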

voice character mapping (qwen3-tts only):

narrator / stage directions: narrator_teacher
qiqi (main character): qiqi_clone
young boy: boy_child
young girl: girl_child
adult male: adult_male
adult female: adult_female

edge case: if the narration text exceeds 5000 characters, split it into chunks (max 2000 chars per request) and concatenate the audio files with 0.5s of silence between them.
edge case: if the tts service is rate-limited, wait 30s and retry up to 3 times.
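a sketch of the chunking rule: narration over 5000 characters is split into chunks of at most 2000 characters, cutting after sentence-ending punctuation where possible. the punctuation set (chinese 。！？ and english .!?) and the name `chunk_narration` are assumptions; a single sentence longer than 2000 characters is hard-cut.

```python
import re

SPLIT_THRESHOLD = 5000
MAX_CHUNK = 2000
# a sentence = non-enders followed by enders, or a trailing run without an ender
SENTENCE = re.compile(r"[^。！？!?.]*[。！？!?.]+|[^。！？!?.]+$")


def chunk_narration(text: str) -> list:
    """Split long narration into <= MAX_CHUNK pieces that concatenate back to the input."""
    if len(text) <= SPLIT_THRESHOLD:
        return [text]
    chunks, current = [], ""
    for sentence in SENTENCE.findall(text):
        # hard-cut any single sentence that is itself too long
        while len(sentence) > MAX_CHUNK:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:MAX_CHUNK])
            sentence = sentence[MAX_CHUNK:]
        if len(current) + len(sentence) > MAX_CHUNK:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```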

step 3.3: generate english tts input: english narration text (translated in step 3.1). output: mp3 audio file and srt subtitle file with timing. action:

python3 <SKILL_DIR>/scripts/tts.py \
  --text "<english narration full text>" \
  --output PROJECT_DIR/narration_en.mp3 \
  --srt PROJECT_DIR/narration_en.srt \
  --voice "en-US-JennyNeural" \
  --rate=-15%

edge cases: same as step 3.2 (text length, rate limits, retry logic).

step 3.4: generate chinese ass subtitles input: narration_cn.srt file from step 3.2. output: ass subtitle file with styling. action:

python3 <SKILL_DIR>/scripts/srt_to_ass.py \
  --srt PROJECT_DIR/narration_cn.srt \
  --output PROJECT_DIR/subtitles_cn.ass \
  --font-size 80 \
  --font-name "Microsoft YaHei" \
  --color "FFFFFF" \
  --outline-color "000000" \
  --outline-width 2.0

edge case: if srt has lines longer than 24 characters, script automatically splits across two subtitle lines.
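how srt_to_ass.py actually splits long lines is not documented here; the sketch below illustrates one reasonable rule under that assumption. it wraps a line longer than 24 characters into two display lines joined by the ass hard break `\N`, cutting at the space nearest the middle for english and at the midpoint for chinese. text longer than twice the limit would still overflow and needs splitting across subtitle events instead (not shown). `wrap_subtitle` is a hypothetical name.

```python
def wrap_subtitle(text: str, max_chars: int = 24) -> str:
    """Wrap a subtitle into at most two lines joined by the ASS hard break \\N."""
    text = text.strip()
    if len(text) <= max_chars:
        return text
    mid = len(text) // 2
    # prefer the space closest to the middle (english); otherwise cut mid-string (chinese)
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    cut = min(spaces, key=lambda i: abs(i - mid)) if spaces else mid
    return text[:cut].rstrip() + "\\N" + text[cut:].lstrip()
```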

step 3.5: generate english ass subtitles input: narration_en.srt file from step 3.3. output: ass subtitle file with styling. action:

python3 <SKILL_DIR>/scripts/srt_to_ass.py \
  --srt PROJECT_DIR/narration_en.srt \
  --output PROJECT_DIR/subtitles_en.ass \
  --font-size 80 \
  --font-name "Arial" \
  --color "FFFFFF" \
  --outline-color "000000" \
  --outline-width 2.0

phase 4: video composition

step 4.1: composite chinese video input: scene images (cover.png, scene_*.png), narration_cn.mp3, subtitles_cn.ass. output: final mp4 video file (h.264 + aac). action:

python3 <SKILL_DIR>/scripts/pipeline.py \
  --scenes-dir PROJECT_DIR/scenes/ \
  --audio PROJECT_DIR/narration_cn.mp3 \
  --ass PROJECT_DIR/subtitles_cn.ass \
  --output PROJECT_DIR/output/<episode_id>_cn.mp4 \
  --title "<title>" \
  --subtitle "<series_name>" \
  --episode-id "<episode_id>" \
  --brand "琪琪的魔法故事屋" \
  --qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
  --cover-duration 4.0 \
  --fade-duration 0.8 \
  --fps 24 \
  --bitrate "5000k"

edge case: if the final video duration differs from the audio duration by more than 2 seconds, log a warning and check for frame drops.
edge case: if the video file is smaller than 10MB, generation likely failed; check the ffmpeg logs.

step 4.2: composite english video input: scene images (cover.png, scene_*.png), narration_en.mp3, subtitles_en.ass. output: final mp4 video file (h.264 + aac). action:

python3 <SKILL_DIR>/scripts/pipeline.py \
  --scenes-dir PROJECT_DIR/scenes/ \
  --audio PROJECT_DIR/narration_en.mp3 \
  --ass PROJECT_DIR/subtitles_en.ass \
  --output PROJECT_DIR/output/<episode_id>_en.mp4 \
  --title "<english_title>" \
  --subtitle "<english_series_name>" \
  --episode-id "<episode_id>" \
  --brand "琪琪的魔法故事屋" \
  --qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
  --cover-duration 4.0 \
  --fade-duration 0.8 \
  --fps 24 \
  --bitrate "5000k"

(same edge cases as step 4.1)

step 4.3: validate final videos input: both mp4 files in output/ directory. output: validation report. action:

verify both _cn.mp4 and _en.mp4 exist.
check file size > 10MB each.
check duration matches audio duration within 2 seconds.
verify h.264 codec and aac audio using ffprobe.
verify resolution is 1920x1080.
if validation fails on either video, log detailed error and alert user.
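these checks map onto a single `ffprobe` call with json output (`-show_entries` selects the fields, `-of json` sets the format; in that output `format.size` and `format.duration` are strings). the thresholds come from this step, with 10MB taken as 10 × 1024 × 1024 bytes; `probe` and `check_probe` are hypothetical names, split so the checks can be tested on a sample dict.

```python
import json
import subprocess

FFPROBE_ARGS = ["ffprobe", "-v", "error",
                "-show_entries", "stream=codec_type,codec_name,width,height:format=duration,size",
                "-of", "json"]


def probe(path: str) -> dict:
    """Run ffprobe on one file and return its parsed json output."""
    out = subprocess.run(FFPROBE_ARGS + [path], check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def check_probe(info: dict, audio_duration: float) -> list:
    """Return validation errors for one video (an empty list means it passed)."""
    errors = []
    fmt = info.get("format", {})
    streams = info.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), {})
    audio = next((s for s in streams if s.get("codec_type") == "audio"), {})
    if int(fmt.get("size", 0)) <= 10 * 1024 * 1024:
        errors.append(f"file size {fmt.get('size')} bytes <= 10MB")
    if video.get("codec_name") != "h264":
        errors.append(f"video codec {video.get('codec_name')} != h264")
    if audio.get("codec_name") != "aac":
        errors.append(f"audio codec {audio.get('codec_name')} != aac")
    if (video.get("width"), video.get("height")) != (1920, 1080):
        errors.append(f"resolution {video.get('width')}x{video.get('height')} != 1920x1080")
    if abs(float(fmt.get("duration", 0)) - audio_duration) > 2.0:
        errors.append(f"duration {fmt.get('duration')}s differs from audio {audio_duration}s by > 2s")
    return errors
```

usage: `check_probe(probe("<PROJECT_DIR>/output/<episode_id>_cn.mp4"), audio_duration)`, where the audio duration can come from the same `probe` call run on the narration file.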

step 4.4: generate douyin publishing metadata input: script title, series name, episode id. output: markdown file with chinese and english douyin publishing descriptions. save to PROJECT_DIR/douyin_publish.md. format:

# Douyin publishing description

## Chinese version
- Title: <title>｜<series_name>
- Description: <2-3 sentence story summary in chinese>
- Hashtags: #儿童故事 #<series_name> #睡前故事 #绘本动画

## English version
- Title: <english_title>｜<english_series_name>
- Description: <2-3 sentence story summary in english>
- Hashtags: #英语启蒙 #磨耳朵英语 #<series_name> #儿童英语
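a sketch of rendering douyin_publish.md from the fields above. the dictionary keys are hypothetical; the hashtags are kept in chinese because they are literal douyin tags.

```python
def render_douyin_publish(m: dict) -> str:
    """Render douyin_publish.md; m holds hypothetical keys for titles, series names and summaries."""
    series = m["series_name"]
    return "\n".join([
        "# Douyin publishing description",
        "",
        "## Chinese version",
        f"- Title: {m['title']}｜{series}",
        f"- Description: {m['summary_cn']}",
        f"- Hashtags: #儿童故事 #{series} #睡前故事 #绘本动画",
        "",
        "## English version",
        f"- Title: {m['english_title']}｜{m['english_series_name']}",
        f"- Description: {m['summary_en']}",
        f"- Hashtags: #英语启蒙 #磨耳朵英语 #{series} #儿童英语",
        "",
    ])
```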

decision points

if no script provided at phase 0: hard stop. return error "no story script found. please paste or upload a picture book script (markdown or text file) to begin." user must provide script; no defaults exist.

if series name is missing: use default "琪琪的魔法故事屋". no user intervention required.

if episode id is missing: auto-generate as "EP--<hash(first-100-chars-of-script)>". no user intervention required.

if script validation passes but storyboard generation fails: halt at phase 1. ask user to clarify story structure or provide a more detailed script. do not proceed to phase 2.

if the user does not approve the storyboard at step 1.5: loop back to step 1.1. ask which scenes to revise. regenerate those scenes and re-display for approval. do not proceed to phase 2 until explicit user approval is received.

if comfyui is unreachable at step 2.1: hard stop with error "comfyui at [url] not responding. ensure comfyui is running at http://127.0.0.1:8188 and retry." user must start comfyui before phase 2 can retry.

if scene image generation fails for > 50% of scenes: halt phase 2 and alert user. check comfyui logs for errors (oom, invalid prompts, etc.). ask user to retry or provide alternative prompts.

if qwen3-tts is unavailable at step 3.2: automatically fall back to edge-tts without user interaction. log the fallback event. both tts engines produce usable output; there is no blocking gate here.

if tts text exceeds 5000 characters: split into chunks, generate audio per chunk, concatenate with 0.5s silence. this is automatic; no user approval needed.
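one way to do the concatenation is ffmpeg's `apad` filter (its `pad_dur` option appends silence) on every chunk except the last, followed by the `concat` filter. a sketch that only builds the command; it assumes all chunks share one sample rate and channel layout, as the concat filter requires, and `concat_with_silence_cmd` is a hypothetical name.

```python
def concat_with_silence_cmd(chunks: list, output: str, gap_s: float = 0.5) -> list:
    """Build an ffmpeg argv that joins audio chunks with gap_s seconds of silence between them."""
    cmd = ["ffmpeg", "-y"]
    for path in chunks:
        cmd += ["-i", path]
    parts, labels = [], []
    for i in range(len(chunks)):
        if i < len(chunks) - 1:
            # pad every chunk except the last with trailing silence
            parts.append(f"[{i}:a]apad=pad_dur={gap_s}[a{i}]")
            labels.append(f"[a{i}]")
        else:
            labels.append(f"[{i}:a]")
    parts.append(f"{''.join(labels)}concat=n={len(chunks)}:v=0:a=1[out]")
    cmd += ["-filter_complex", ";".join(parts), "-map", "[out]", output]
    return cmd
```

usage: `subprocess.run(concat_with_silence_cmd(["chunk_01.wav", "chunk_02.wav"], "narration_cn.wav"), check=True)`.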

if tts generation times out (> 10 mins per chunk): retry up to 3 times with 30s delay between retries. if still fails after 3 retries, hard stop and ask user to check tts service status or try again later.
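the retry rule (up to 3 retries, 30s apart, 10-minute limit per attempt) as a generic helper; `run_with_retries` is a hypothetical name, and the caller turns the final `RuntimeError` into the hard stop described here.

```python
import subprocess
import time


def run_with_retries(cmd: list, retries: int = 3, delay_s: float = 30.0,
                     timeout_s: float = 600.0) -> subprocess.CompletedProcess:
    """Run cmd; on failure or timeout retry up to `retries` more times, sleeping delay_s between."""
    last_exc = None
    for attempt in range(1 + retries):
        if attempt:
            time.sleep(delay_s)
        try:
            return subprocess.run(cmd, check=True, timeout=timeout_s,
                                  capture_output=True, text=True)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
            last_exc = exc
    # all attempts exhausted: surface the last error so the caller can hard-stop
    raise RuntimeError(f"gave up after {1 + retries} attempts: {last_exc}") from last_exc
```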

if final video validation fails: alert the user with the specific error (file size, duration, codec mismatch, etc.). offer options: retry phase 4, or debug a specific phase (re-run phase 3 audio or phase 2 scenes). the user chooses the next action.

if final video file size < 10MB: likely indicates encoding error. log ffmpeg stderr and halt. ask user to check disk space and ffmpeg installation.

output contract

success means all of the following exist in PROJECT_DIR/output/:

<episode_id>_cn.mp4: chinese version video, h.264 video codec + aac audio, 1920x1080 resolution, duration within 2 seconds of audio duration, file size > 10MB, audio clearly audible, subtitles synchronized and readable.
<episode_id>_en.mp4: english version video, same codec/resolution/duration/size specs as the chinese version.
PROJECT_DIR/scene_plan.md: markdown file with storyboard table (scene number, narration, description, duration), selected visual style, total estimated video length.
PROJECT_DIR/douyin_publish.md: markdown file with chinese and english douyin metadata (title, description, hashtags) for each language version.
PROJECT_DIR/narration_cn.wav or .mp3: chinese audio file, duration matches final video duration, voice quality acceptable (no artifacts, audible speech).
PROJECT_DIR/narration_en.mp3: english audio file, duration matches final video duration, voice quality acceptable.
PROJECT_DIR/subtitles_cn.ass and subtitles_en.ass: ass subtitle files, timing synchronized with audio, all dialogue and narration present, no truncation.
PROJECT_DIR/scenes/: directory containing cover.png and all scene_*.png files (1920x1080 each, > 50KB each).

all file paths are absolute or relative to PROJECT_DIR. metadata in douyin_publish.md is plain text (no special formatting required).

outcome signal

user sees:

at end of phase 0: confirmation message "script validated. project created at [PROJECT_DIR]. proceeding to storyboard generation."
at end of phase 1: storyboard table displayed and the user asked "approve storyboard?". only after explicit user approval ("yes" or "approved") does a success message appear: "storyboard approved. generating scenes..."
at end of phase 2: message "scene generation complete. [N] images created at [PROJECT_DIR]/scenes/. proceeding to audio and subtitles..."
at end of phase 3: message "audio and subtitles generated. chinese: [narration_cn.mp3/wav], english: [narration_en.mp3]. proceeding to video composition..."

at end of phase 4: final success message displayed:

✅ video generation complete!

outputs:
- chinese video: [PROJECT_DIR]/output/<episode_id>_cn.mp4
- english video: [PROJECT_DIR]/output/<episode_id>_en.mp4
- douyin metadata: [PROJECT_DIR]/douyin_publish.md

ready for douyin publishing. use douyin-browser-publish skill to upload both videos.

if any error occurs at any phase, user receives clear error message specifying which phase failed, why, and what action to take next (e.g. "retry phase 2", "check comfyui status", "provide revised script").

both mp4 files are playable in standard video players (vlc, ffplay, browser). user can immediately preview final videos without additional processing. video durations, audio sync, and subtitle timing are visually confirmed in preview.

troubleshooting

problem	check	solution
comfyui not responding at phase 2	is comfyui running? check http://127.0.0.1:8188 in browser.	start comfyui: `cd ~/ComfyUI && LD_LIBRARY_PATH=~/comfyui-venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH ~/comfyui-venv/bin/python main.py --listen 127.0.0.1 --port 8188`
scene generation timeouts	comfyui slow or oom. check comfyui logs for "out of memory".	reduce prompt complexity or wait for comfyui to free vram. retry phase 2.
tts fails at phase 3	network issue or tts service down. check internet connectivity.	retry phase 3 up to 3 times. if edge-tts timeout > 5 mins, check edge-tts package version (`edge-tts --version`).
text split across two subtitle lines incorrectly	srt_to_ass.py max_chars setting (default 24).	check srt_to_ass.py and increase max_chars if needed. regenerate subtitles.
final video has no audio	audio file missing or ffmpeg did not mux audio. check PROJECT_DIR/narration_cn.mp3 exists and is readable.	verify ffmpeg version >= 5.0. retry phase 4.
final video duration does not match audio	frame drops or scene timing issue. check scene images are all 1920x1080 and present.	check ffmpeg logs in .temp/ directory. retry phase 4 with verbose logging.
video file size very small (< 5MB)	encoding failed or ffmpeg crashed. check disk space.	verify ffmpeg installed correctly. check .temp/ for partial files. retry phase 4.
qiqi character image not found	missing file at ~/.openclaw/workspace/characters/qiqi_default.png.	provide valid path or skip character overlay (cover and video will still generate).

related skills

comfyui-image-video: text-to-image and video generation via comfyui api.
tts-qwen3: qwen3-tts local speech synthesis with voice cloning (primary tts for this skill).
tts-cosyvoice or edge-tts: fallback speech synthesis if qwen3-tts unavailable.
douyin-browser-publish: automate douyin video upload (used after phase 4 completes).
keynote-video: ppt to video conversion (referenced for architecture pattern).

version: 1.1.0 | enriched for implexa quality standards | original: clawhub 琪琪OPC project