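---
name: picture-book-video
version: 1.1.0
description: >-
  Picture book story script → MP4 video (Chinese and English bilingual versions).
  Fully automated: storyboard image generation → static frames/animation → concatenation → TTS narration → ASS subtitles → final composite.
  Outputs a Chinese and an English version, plus the title, description and hashtags needed for publishing on Douyin.
  TTS prefers Qwen3-TTS (local GPU, 6-character voice library) and falls back to Edge TTS on failure.
read_when:
  - The user says "生成绘本视频", "做绘本故事", or "绘本视频"
  - The user provides a picture book story script and asks for a video
  - The user mentions "picture-book-video", "琪琪OPC", or "绘本故事视频"
  - The user needs a children's story video for Douyin
metadata:
  openclaw:
    emoji: 📚
    priority: high
    category: video-generation
    tags:
      - picture-book
      - children
      - douyin
      - bilingual
      - tts
    conflicts_with: []
---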
# Picture Book Video
> Turns a picture book story script into bilingual Chinese and English videos (with subtitles and narration)
**Core pipeline**:
```
Script parsing          (LLM)             Phase 0
→ Storyboard images     (ComfyUI)         Phase 1
→ Frame composition     (ffmpeg)          Phase 2
→ Concatenation         (ffmpeg)          Phase 2
→ TTS narration         (Qwen3/Edge TTS)  Phase 3
→ ASS subtitles         (script)          Phase 3
→ Final video           (ffmpeg)          Phase 4
```
---
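## Execution Rules
1. **Phase separation**: Phases 0-1 are LLM-driven (content decisions); Phases 2-4 are script-driven (technical composition).
2. **BLOCKING steps**: Phase 0 (script assessment) and Phase 1 (storyboard plan confirmation) ⛔ must wait for the user's response.
3. **No skipped confirmation**: do not call the pipeline scripts until the user has confirmed Phase 1.
4. **Scripts do the technical work, the LLM does the content**: scripts do not judge style, rewrite the story script, or make content decisions.
5. **Sequential execution**: phases must run in order; no skipping ahead.
6. **Bilingual output**: every story must produce both a Chinese and an English version.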
---
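## Phase 0: Script Assessment
🚧 **GATE**: The user has provided a story script
### 0.1 Scan the input
Check whether the user has provided:
- Story script (required)
- Series name (optional)
- Episode ID (optional, e.g. S02E01)
- Series description (optional)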
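### 0.2 Completeness scoring
| Material | Required | Rule |
|------|------|----------|
| Story script | ✅ Required | If missing, report an error and exit |
| Series name | ❌ Optional | If missing, use the default "琪琪的魔法故事屋" (Qiqi's Magic Story House) |
| Episode ID | ❌ Optional | If missing, generate one automatically |
| Series description | ❌ Optional | If missing, leave blank |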
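The table above amounts to one hard requirement and three defaults. Here is a minimal Python sketch, assuming the inputs arrive as plain strings. `assess_inputs`, `EpisodeInputs`, and `MissingScriptError` are hypothetical names. The episode ID stays `None` because this section does not specify the auto-generation format:

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_SERIES = "琪琪的魔法故事屋"


class MissingScriptError(Exception):
    """The only blocking condition in Phase 0: no story script was provided."""


@dataclass
class EpisodeInputs:
    script: str
    series: str
    episode_id: Optional[str]  # None means "auto-generate later"
    description: str


def assess_inputs(script: Optional[str], series: Optional[str] = None,
                  episode_id: Optional[str] = None,
                  description: Optional[str] = None) -> EpisodeInputs:
    """Apply the Phase 0.2 rules: script required, everything else defaulted."""
    if not script or not script.strip():
        raise MissingScriptError("no story script found")
    return EpisodeInputs(
        script=script,
        series=(series or "").strip() or DEFAULT_SERIES,  # blank counts as missing
        episode_id=(episode_id or "").strip() or None,
        description=(description or "").strip(),  # missing -> empty
    )
```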
### 0.3 Interaction strategy
```
Script complete   → proceed to Phase 1
Script incomplete → ⛔ BLOCKING, ask the user to supply what is missing
```
### 0.4 Create the project directory
```bash
PROJECT_NAME="picture-book-$(date +%Y%m%d)-<short-desc>"
PROJECT_DIR="<workspace>/project/${PROJECT_NAME}"
mkdir -p "${PROJECT_DIR}/input/"
mkdir -p "${PROJECT_DIR}/scenes/"
mkdir -p "${PROJECT_DIR}/output/"
mkdir -p "${PROJECT_DIR}/.temp/"
# Copy the script into the project directory
cp <script_file> "${PROJECT_DIR}/input/"
```
---
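## Phase 1: Storyboard Plan
🚧 **GATE**: Script assessment passed
### 1.1 Parse the story script
The LLM reads the story script and parses it into a storyboard. Each scene contains:
- Scene number
- Narration text
- Visual description (used by ComfyUI to generate the image)
- Estimated duration

Output format:
```markdown
# Storyboard Plan
| Scene | Narration | Visual description | Estimated duration |
|------|------|----------|----------|
| 1 | ... | ... | ...s |
...
```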
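### 1.2 Choose the style
Default style: **crayon children's hand-drawn style**
The LLM confirms with the user:
```
Based on the story, the recommended visual style is:
[1] Crayon children's hand-drawn (default)
[2] Watercolor
[3] Paper-cut
Please confirm, or pick another.
```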
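### 1.3 Generate ComfyUI prompts
For each scene, generate a Flux text-to-image prompt that follows the style guidelines.
### 1.4 Save the storyboard plan
Write `<PROJECT_DIR>/scene_plan.md` and `<PROJECT_DIR>/scene_prompts.json`.
### 1.5 Confirm the plan
⛔ **BLOCKING**: show the storyboard plan to the user and wait for confirmation:
```
Please confirm:
1. Confirm and generate → proceed to Phase 2
2. Revise scene X → regenerate → confirm again
```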
---
## Phase 2: Image Generation
🚧 **GATE**: The user has confirmed Phase 1
### 2.1 Generate the storyboard images
Use the ComfyUI skill to generate an image for each scene:
```bash
python3 <SKILL_DIR>/scripts/generate_scenes.py \
--prompts "<PROJECT_DIR>/scene_prompts.json" \
--output-dir "<PROJECT_DIR>/scenes/" \
--style "crayon"
```
### 2.2 Generate the cover
```bash
python3 <SKILL_DIR>/scripts/stage_cover.py \
--output "<PROJECT_DIR>/scenes/cover.png" \
--title "<title>" \
--subtitle "<series name>" \
--episode-id "<episode ID>" \
--brand "琪琪的魔法故事屋" \
--qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
--width 1920 --height 1080
```
### 2.3 Validate the images
Check that every scene image was generated successfully and is 1920x1080.
---
## Phase 3: Audio and Subtitles
🚧 **GATE**: Phase 2 complete
### 3.1 Generate Chinese TTS
**Qwen3-TTS first** (local GPU, voice cloning and voice design); **fall back to Edge TTS on failure**:
```bash
# Option 1: Qwen3-TTS (preferred)
python3 ~/.openclaw/workspace/skills/tts-qwen3/scripts/qwen_tts.py \
--text "<full Chinese narration text>" \
--voice narrator_teacher \
--output "<PROJECT_DIR>/narration_cn.wav" \
--fallback-edge true
# Option 2: Edge TTS (fallback; also writes an SRT file)
python3 <SKILL_DIR>/scripts/tts.py \
--text "<full Chinese narration text>" \
--output "<PROJECT_DIR>/narration_cn.mp3" \
--srt "<PROJECT_DIR>/narration_cn.srt" \
--voice zh-CN-XiaoyiNeural \
--rate=-15%
```
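**Character voice mapping** (Qwen3-TTS):
| Script role | `--voice` value | Notes |
|---------|-------------|------|
| Narrator / storytelling | narrator_teacher | Warm female voice |
| Qiqi dialogue | qiqi_clone | Cloned voice |
| Little boy | boy_child | Lively, 8 years old |
| Little girl | girl_child | Sweet, 7 years old |
| Adult man | adult_male | Steady |
| Adult woman | adult_female | Elegant |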
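The table above is a lookup from script role to `--voice` value. The sketch below shows one way to route dialogue segments. It assumes the script has already been split into `(role, text)` pairs. The role keys (`narrator`, `qiqi`, ...) and `assign_voices` are hypothetical. Unknown roles fall back to the narrator voice, which is also an assumption. Adjacent segments that share a voice are merged to cut down the number of TTS calls:

```python
from typing import Dict, Iterable, List, Tuple

# Left side: hypothetical role labels. Right side: --voice values from the table above.
VOICE_MAP: Dict[str, str] = {
    "narrator": "narrator_teacher",
    "qiqi": "qiqi_clone",
    "boy": "boy_child",
    "girl": "girl_child",
    "man": "adult_male",
    "woman": "adult_female",
}
DEFAULT_VOICE = "narrator_teacher"  # unknown roles are read by the narrator (assumption)


def assign_voices(segments: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (role, text) pairs to (voice, text), merging neighbours that share a voice."""
    merged: List[Tuple[str, str]] = []
    for role, text in segments:
        voice = VOICE_MAP.get(role, DEFAULT_VOICE)
        if merged and merged[-1][0] == voice:
            merged[-1] = (voice, merged[-1][1] + text)
        else:
            merged.append((voice, text))
    return merged
```

Each resulting `(voice, text)` pair would then become one `qwen_tts.py --voice <voice>` call.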
### 3.2 Generate English TTS
```bash
python3 <SKILL_DIR>/scripts/tts.py \
--text "<full English narration text>" \
--output "<PROJECT_DIR>/narration_en.mp3" \
--srt "<PROJECT_DIR>/narration_en.srt" \
--voice en-US-JennyNeural \
--rate=-15%
```
### 3.3 Generate Chinese ASS subtitles
```bash
python3 <SKILL_DIR>/scripts/srt_to_ass.py \
--srt "<PROJECT_DIR>/narration_cn.srt" \
--output "<PROJECT_DIR>/subtitles_cn.ass" \
--font-size 80
```
### 3.4 Generate English ASS subtitles
```bash
python3 <SKILL_DIR>/scripts/srt_to_ass.py \
--srt "<PROJECT_DIR>/narration_en.srt" \
--output "<PROJECT_DIR>/subtitles_en.ass" \
--font-size 80
```
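This document does not describe how `srt_to_ass.py` works internally. Any SRT-to-ASS conversion has two core jobs. It rewrites `HH:MM:SS,mmm` timestamps as ASS `H:MM:SS.cc` (centiseconds), and it emits one `Dialogue:` event per cue. The minimal sketch below assumes a style named `Default` and the standard v4+ `Format` field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text). The function names are hypothetical:

```python
import re
from typing import List

SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_to_ass_time(ts: str) -> str:
    """'00:01:02,345' -> '0:01:02.34' (ASS keeps centiseconds; extra ms are truncated)."""
    match = SRT_TIME.fullmatch(ts.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {ts!r}")
    h, m, s, ms = map(int, match.groups())
    return f"{h}:{m:02d}:{s:02d}.{ms // 10:02d}"


def srt_to_dialogues(srt_text: str, style: str = "Default") -> List[str]:
    """Turn each SRT cue into an ASS [Events] Dialogue line."""
    events = []
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        rows = block.strip().splitlines()
        if len(rows) < 3 or "-->" not in rows[1]:
            continue  # skip malformed cues
        start, end = (part.strip() for part in rows[1].split("-->"))
        text = r"\N".join(rows[2:])  # multi-line cues become ASS hard line breaks
        events.append(f"Dialogue: 0,{srt_to_ass_time(start)},{srt_to_ass_time(end)},"
                      f"{style},,0,0,0,,{text}")
    return events
```

Font size, colours and outline live in the `[V4+ Styles]` section, which is not shown here.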
---
## Phase 4: Video Composition
🚧 **GATE**: Phase 3 complete
### 4.1 Run the full pipeline
```bash
# Chinese version
python3 <SKILL_DIR>/scripts/pipeline.py \
--scenes-dir "<PROJECT_DIR>/scenes/" \
--audio "<PROJECT_DIR>/narration_cn.mp3" \
--ass "<PROJECT_DIR>/subtitles_cn.ass" \
--output "<PROJECT_DIR>/output/<episode_id>_cn.mp4" \
--title "<title>" \
--subtitle "<series name>" \
--episode-id "<episode ID>" \
--brand "琪琪的魔法故事屋" \
--qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
--cover-duration 4.0 \
--fade-duration 0.8
# English version
python3 <SKILL_DIR>/scripts/pipeline.py \
--scenes-dir "<PROJECT_DIR>/scenes/" \
--audio "<PROJECT_DIR>/narration_en.mp3" \
--ass "<PROJECT_DIR>/subtitles_en.ass" \
--output "<PROJECT_DIR>/output/<episode_id>_en.mp4" \
--title "<English title>" \
--subtitle "<English series name>" \
--episode-id "<episode ID>" \
--brand "琪琪的魔法故事屋" \
--qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
--cover-duration 4.0 \
--fade-duration 0.8
```
### 4.2 Generate the Douyin publishing description
```markdown
# 抖音发布描述
## 中文版
- 标题:{Chinese title}|{series name}
- 描述:{story synopsis}
- 话题:#儿童故事 #{series name} #睡前故事 #绘本动画
## 英文版
- 标题:{English Title}|{English Series}
- 描述:{English synopsis}
- 话题:#英语启蒙 #磨耳朵英语 #{series name} #儿童英语
```
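### 4.3 Quality checks
Check that:
- The video file exists and is larger than 10 MB
- Encoding is H.264 + AAC
- Resolution is 1920×1080
- Duration matches the audio
- Subtitles display in full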
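Most of these checks can be automated with `ffprobe`. The sketch below assumes `ffprobe` is on `PATH` (it ships with ffmpeg). `probe` and `check_video` are hypothetical helpers. "10 MB" is read as 10 MiB. The 2-second duration tolerance comes from the step 4.1 edge case later in this document. Subtitle rendering still has to be checked by eye. If the pipeline plays the cover before the narration starts, pass the audio duration plus the cover duration:

```python
import json
import subprocess
from pathlib import Path
from typing import List

MIN_BYTES = 10 * 1024 * 1024  # "10 MB" read as 10 MiB (assumption)


def probe(path: Path) -> dict:
    """Ask ffprobe for container duration/size and per-stream codec and resolution."""
    cmd = ["ffprobe", "-v", "error",
           "-show_entries", "format=duration,size:stream=codec_type,codec_name,width,height",
           "-of", "json", str(path)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def check_video(info: dict, audio_duration_s: float, tolerance_s: float = 2.0) -> List[str]:
    """Return a list of problems; an empty list means the video passed."""
    problems = []
    fmt = info.get("format", {})
    size = int(fmt.get("size", 0))  # ffprobe reports size and duration as strings
    if size <= MIN_BYTES:
        problems.append(f"file too small: {size} bytes")
    streams = info.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)
    if video is None or video.get("codec_name") != "h264":
        problems.append("video stream missing or not H.264")
    elif (video.get("width"), video.get("height")) != (1920, 1080):
        problems.append(f"resolution {video.get('width')}x{video.get('height')}, expected 1920x1080")
    if audio is None or audio.get("codec_name") != "aac":
        problems.append("audio stream missing or not AAC")
    duration = float(fmt.get("duration", 0))
    if abs(duration - audio_duration_s) > tolerance_s:
        problems.append(f"duration {duration:.1f}s vs audio {audio_duration_s:.1f}s")
    return problems
```

Typical use: `check_video(probe(Path("<PROJECT_DIR>/output/S02E01_cn.mp4")), audio_duration_s=95.0)`.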
---
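## Douyin Publishing
After the videos are generated, publish them with the `douyin-browser-publish` skill:
```text
# Chinese version
Publish with the douyin-browser-publish skill:
- Video: <PROJECT_DIR>/output/<episode_id>_cn.mp4
- Title: {Chinese title}|{series name}
- Hashtags: #儿童故事 #{series name} #睡前故事 #绘本动画
# English version
Publish with the douyin-browser-publish skill:
- Video: <PROJECT_DIR>/output/<episode_id>_en.mp4
- Title: {English Title}|{English Series}
- Hashtags: #英语启蒙 #磨耳朵英语 #{series name} #儿童英语
```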
---
## 🛠️ Requirements
```bash
# Required
python3 --version   # 3.8+
ffmpeg -version     # 5.0+
edge-tts --version  # 7.0+
# ComfyUI (text-to-image)
cd ~/ComfyUI && ~/comfyui-venv/bin/python main.py --listen 127.0.0.1 --port 8188
# Qiqi character image (must exist)
ls -l ~/.openclaw/workspace/characters/qiqi_default.png
```
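The preflight sketch below checks these requirements before Phase 2 starts. It makes several assumptions. The ComfyUI address comes from a `COMFYUI_URL` environment variable, as the detailed prerequisites later in this document describe, and defaults to `http://localhost:8188`. Any HTTP 200 from that address counts as "running". The ffmpeg version is parsed from the first line of `ffmpeg -version`; some builds (for example git snapshots) do not carry a version number, so they report `None`. All function names are hypothetical:

```python
import os
import re
import shutil
import subprocess
import urllib.request
from pathlib import Path
from typing import Dict, Optional

QIQI_IMAGE = Path.home() / ".openclaw/workspace/characters/qiqi_default.png"


def comfyui_url() -> str:
    return os.environ.get("COMFYUI_URL", "http://localhost:8188")


def comfyui_reachable(url: str, timeout: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


def parse_ffmpeg_major(version_output: str) -> Optional[int]:
    """'ffmpeg version 6.1.1-3ubuntu5 ...' -> 6; unnumbered (git) builds -> None."""
    match = re.match(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def preflight() -> Dict[str, bool]:
    results = {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "edge-tts": shutil.which("edge-tts") is not None,
        "comfyui": comfyui_reachable(comfyui_url()),
        "qiqi_image": QIQI_IMAGE.is_file(),
    }
    if results["ffmpeg"]:
        out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True).stdout
        major = parse_ffmpeg_major(out)
        results["ffmpeg>=5.0"] = major is not None and major >= 5
    return results
```

`print(preflight())` gives a quick pass/fail map. A missing Qiqi image is not fatal, because the cover can still be generated without it.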
---
## Troubleshooting
| Problem | Fix |
|------|------|
| ComfyUI not running | `cd ~/ComfyUI && LD_LIBRARY_PATH=~/comfyui-venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH ~/comfyui-venv/bin/python main.py --listen 127.0.0.1 --port 8188` |
| TTS fails | Check the network; for very long text, synthesize in segments |
| Subtitles truncated | Check the `max_chars` setting in `srt_to_ass.py` (default 24) |
| Video composition fails | Check the ffmpeg version and that the scene images exist |
---
## 📁 Output Files
After each story is generated, the project directory contains:
```
<PROJECT_DIR>/
├── input/
│   └── <script_file>         # Original script
├── scenes/
│   ├── cover.png             # Cover (with the Qiqi character)
│   ├── scene_01.png          # Scene images
│   ├── scene_02.png
│   └── ...
├── narration_cn.mp3          # Chinese narration
├── narration_cn.srt          # Chinese SRT
├── subtitles_cn.ass          # Chinese ASS subtitles
├── narration_en.mp3          # English narration
├── narration_en.srt          # English SRT
├── subtitles_en.ass          # English ASS subtitles
├── output/
│   ├── <episode_id>_cn.mp4   # Chinese video
│   └── <episode_id>_en.mp4   # English video
├── scene_plan.md             # Storyboard plan
├── scene_prompts.json        # ComfyUI prompts
└── douyin_publish.md         # Douyin publishing description
```
---
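## 🔗 Related Skills
- **comfyui-image-video**: ComfyUI text-to-image / video generation
- **tts-qwen3**: Qwen3-TTS local speech synthesis (preferred for 琪琪OPC)
- **tts-cosyvoice**: Edge TTS speech synthesis (fallback)
- **douyin-browser-publish**: Douyin video publishing
- **keynote-video**: PPT to video
---
*Version: v1.0 | Based on the 琪琪OPC project pipeline | Modeled on the keynote-video architecture*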
---
## Overview
Convert picture book scripts into bilingual videos (Chinese + English) with storyboards, narration, and subtitles. Use this when someone asks to generate a picture book video, provides a story script, mentions "picture-book-video" or "琪琪OPC", or needs content for Douyin. The skill handles everything from script parsing through final video composition, with user confirmation gates at the storyboard and style approval stages.
Convert a picture book story script into a finished bilingual (Chinese and English) video with animated storyboards, character narration, and synchronized subtitles. The skill auto-generates scene illustrations, synthesizes audio in both languages with per-character voice mapping, creates subtitle files, and composites everything into two MP4 outputs plus Douyin publishing metadata. Use it when a user provides a story script and wants a complete video production pipeline. The skill enforces approval gates at script validation and storyboard confirmation so that no compute is spent on unapproved directions.
## Prerequisites
- ComfyUI: reached through the `COMFYUI_URL` environment variable, defaulting to `localhost:8188`.
- Qwen3-TTS: installed under `~/.openclaw/workspace/skills/tts-qwen3/`.
- Edge TTS: requires internet connectivity and the `edge-tts` package. Qwen3-TTS is tried first; Edge TTS is the fallback.
- ffmpeg: check the installed version with `ffmpeg -version`.
- Qiqi character image: `~/.openclaw/workspace/characters/qiqi_default.png`, used for cover generation and video branding.
- Project workspace: `~/.openclaw/workspace/projects/`.

## Steps
### Step 0.1: Scan input
- Input: user-provided story script (text, markdown, or file path).
- Output: confirmation that the script exists and is readable (a non-empty file or pasted text).
- Check for: story script (required), series name (optional), episode ID (optional), series description (optional).
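### Step 0.2: Completeness scoring
- Input: script scan results.
- Output: pass/fail verdict on the required materials.
- Logic: the story script is required (missing → hard stop); series name, episode ID, and series description are optional and fall back to defaults.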
### Step 0.3: Create project directory
- Input: series name, episode ID, script content.
- Output: project directory structure created and script copied into the input folder.
- Actions:
```
PROJECT_NAME = "picture-book-" + date(YYYYMMDD) + "-" + slugify(first-20-chars-of-script)
PROJECT_DIR  = workspace_base + "/projects/" + PROJECT_NAME
create directories: input/, scenes/, output/, .temp/
copy user script to input/<script_filename>
```
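Here are the actions above as a Python sketch. The source only says `slugify(first-20-chars-of-script)`. The slug rule used here is an assumption: it is Unicode-aware, keeps letters and digits (including Chinese characters), and collapses everything else to `-`. The helper names are also assumptions:

```python
import re
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def slugify(text: str, max_chars: int = 20) -> str:
    """Slug from the first `max_chars` characters; Unicode letters/digits (incl. Chinese) are kept."""
    head = text.strip()[:max_chars].lower()
    slug = re.sub(r"[\W_]+", "-", head).strip("-")
    return slug or "untitled"


def project_name(script_text: str, today: date) -> str:
    return f"picture-book-{today:%Y%m%d}-{slugify(script_text)}"


def create_project(workspace_base: Path, script_path: Path,
                   today: Optional[date] = None) -> Path:
    """Create input/, scenes/, output/, .temp/ and copy the script into input/."""
    today = today or date.today()
    text = script_path.read_text(encoding="utf-8")
    project_dir = workspace_base / "projects" / project_name(text, today)
    for sub in ("input", "scenes", "output", ".temp"):
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    shutil.copy2(script_path, project_dir / "input" / script_path.name)
    return project_dir
```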
### Step 0.4: Blocking gate confirmation
- Input: project directory created, script validated.
- Output: the user confirms the script is ready, or requests changes.
- Action: display a script summary (title, length, character count) and ask "Proceed to storyboard generation?" The user must reply yes to continue.
### Step 1.1: Parse script into storyboard
- Input: validated script from Phase 0.
- Output: markdown table with columns: scene number, narration text (Chinese), scene description (for image generation), estimated duration in seconds.
- Action: use the LLM to break the script into logical visual scenes. Each scene should cover 3-8 seconds of video (typical picture book pace). Number the scenes 001, 002, and so on. Extract dialogue and narration separately.
### Step 1.2: Confirm visual style
- Input: storyboard scenes.
- Output: user-selected visual style.
- Action: present the user with 3 default options: crayon children's hand-drawn (default), watercolor, and paper-cut.
### Step 1.3: Generate ComfyUI prompts
- Input: storyboard scenes, selected visual style.
- Output: a JSON file with one prompt per scene, formatted for the Flux text-to-image model. Each prompt includes the scene number, short narrative context, visual style keywords, character descriptions (if any), color palette hints, and composition notes.
- Save to: `PROJECT_DIR/scene_prompts.json`
- Example prompt: "illustration in crayon children's hand-drawn style, warm colors, a magical forest with glowing trees, a young girl in red dress looking up at stars, storybook illustration, soft edges, watercolor texture, children's book art, whimsical mood"
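Here is a sketch of writing `scene_prompts.json`. This document does not give the JSON schema that `generate_scenes.py` expects. The keys (`scene`, `context`, `prompt`) are therefore hypothetical, as are the `STYLE_KEYWORDS` phrases for the watercolor and paper-cut styles. Only the crayon keywords come from the example prompt above:

```python
import json
from pathlib import Path
from typing import Dict, List

STYLE_KEYWORDS: Dict[str, str] = {
    # crayon phrasing taken from the example prompt; the other two are hypothetical
    "crayon": "illustration in crayon children's hand-drawn style, storybook illustration, soft edges",
    "watercolor": "watercolor children's book illustration, soft washes, storybook illustration",
    "papercut": "layered paper-cut illustration, children's book art, storybook illustration",
}


def build_prompt(scene: dict, style: str) -> str:
    """Style keywords + scene description + optional characters, palette, composition."""
    parts = [STYLE_KEYWORDS[style], scene["description"], *scene.get("characters", [])]
    for key in ("palette", "composition"):
        if scene.get(key):
            parts.append(scene[key])
    return ", ".join(parts)


def write_scene_prompts(scenes: List[dict], style: str, out_path: Path) -> List[dict]:
    entries = [
        {
            "scene": f"{number:03d}",  # 001, 002, ... as in step 1.1
            "context": scene["narration"][:40],  # short narrative context
            "prompt": build_prompt(scene, style),
        }
        for number, scene in enumerate(scenes, start=1)
    ]
    # ensure_ascii=False keeps Chinese narration readable in the file
    out_path.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")
    return entries
```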
### Step 1.4: Save and display storyboard plan
- Input: storyboard table, style choice, ComfyUI prompts.
- Output: a markdown file with the full plan, saved to `PROJECT_DIR/scene_plan.md`.
- Display to the user: the full storyboard table, the selected style, and a note on the total estimated video length (the sum of all scene durations).
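The total shown in step 1.4 is the sum of the scene durations, and step 1.1 expects each scene to run 3-8 seconds. The sketch below computes the total, flags scenes outside that range, and renders the plan table. `Scene`, `plan_summary`, and `render_plan_table` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scene:
    number: int
    narration: str
    description: str
    duration_s: float


def plan_summary(scenes: List[Scene], min_s: float = 3.0,
                 max_s: float = 8.0) -> Tuple[float, List[str]]:
    """Return (total estimated length, warnings for scenes outside min_s..max_s)."""
    warnings = [
        f"scene {s.number:03d}: {s.duration_s:g}s is outside {min_s:g}-{max_s:g}s"
        for s in scenes
        if not min_s <= s.duration_s <= max_s
    ]
    return sum(s.duration_s for s in scenes), warnings


def render_plan_table(scenes: List[Scene]) -> str:
    rows = ["| Scene | Narration | Visual description | Estimated duration |",
            "|---|---|---|---|"]
    rows += [f"| {s.number:03d} | {s.narration} | {s.description} | {s.duration_s:g}s |"
             for s in scenes]
    return "\n".join(rows)
```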
### Step 1.5: Blocking approval gate
- Input: `scene_plan.md` and `scene_prompts.json` generated.
- Output: user approval or a revision request.
- Action: show the user the storyboard and ask "Approve storyboard?" If yes, proceed to Phase 2. If no, ask which scenes to revise and loop back to step 1.1. Do not proceed to Phase 2 until the user explicitly approves.
### Step 2.1: Generate scene images via ComfyUI
- Input: `scene_prompts.json`, ComfyUI URL.
- Output: PNG image files (1920x1080), one per scene.
- Action:
```
for each scene in scene_prompts.json:
    call comfyui api with prompt
    save output to PROJECT_DIR/scenes/scene_<number>.png
    log hash and generation time
    if generation fails, retry up to 2 times with same prompt
    if still fails, log error and skip to next scene
```
- Edge case: if ComfyUI is unreachable, hard stop with the error "comfyui at [url] is not responding. start comfyui and retry phase 2."
- Edge case: if image generation exceeds the 5-minute timeout for a scene, retry up to 2 times, then skip the scene.
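The sketch below implements the retry-then-skip loop above in Python, together with the "more than 50% failed" halt from the edge cases further down. The actual ComfyUI call is abstracted as a `generate(prompt) -> bytes` callable, which is hypothetical. That callable is assumed to enforce the 5-minute timeout itself and to raise `ComfyUIUnreachable` when the server cannot be contacted. `generate_all` and its log format are also hypothetical:

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Dict, List


class ComfyUIUnreachable(Exception):
    """Raised by `generate` when ComfyUI cannot be contacted: a hard stop, not a retry."""


def generate_all(prompts: List[dict], out_dir: Path,
                 generate: Callable[[str], bytes], max_retries: int = 2,
                 log: Callable[[str], None] = print) -> Dict[str, List[str]]:
    done: List[str] = []
    failed: List[str] = []
    for entry in prompts:
        scene = entry["scene"]
        for attempt in range(1 + max_retries):  # first try + up to 2 retries
            started = time.monotonic()
            try:
                png = generate(entry["prompt"])
            except ComfyUIUnreachable:
                raise  # "comfyui ... is not responding": stop the whole phase
            except Exception as exc:  # timeout, OOM, bad prompt, ...
                log(f"scene {scene}: attempt {attempt + 1} failed: {exc}")
                continue
            (out_dir / f"scene_{scene}.png").write_bytes(png)
            digest = hashlib.sha256(png).hexdigest()[:12]
            log(f"scene {scene}: sha256={digest} in {time.monotonic() - started:.1f}s")
            done.append(scene)
            break
        else:  # loop finished without `break`: every attempt failed
            log(f"scene {scene}: skipped after {1 + max_retries} attempts")
            failed.append(scene)
    if prompts and len(failed) / len(prompts) > 0.5:
        raise RuntimeError(f"{len(failed)}/{len(prompts)} scenes failed; halting Phase 2")
    return {"done": done, "failed": failed}
```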
### Step 2.2: Generate cover image
- Input: series name, episode ID, title extracted from the script, Qiqi character image path.
- Output: 1920x1080 PNG cover image with branding.
- Action:
```bash
python3 <SKILL_DIR>/scripts/stage_cover.py \
--output "<PROJECT_DIR>/scenes/cover.png" \
--title "<title from script>" \
--subtitle "<series name>" \
--episode-id "<episode id>" \
--brand "琪琪的魔法故事屋" \
--qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
--width 1920 --height 1080
```
If the Qiqi image is missing, generate the cover without the character image but still include the text branding.
### Step 2.3: Validate scene images
- Input: all generated scene PNGs in the `scenes/` directory.
- Output: validation report (passed or failed scenes).
- Action: check that every expected image exists and is 1920x1080.
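Resolution can be verified without an imaging library. A PNG stores its width and height as big-endian 32-bit integers in the IHDR chunk, at bytes 16-24 of the file. Here is a sketch of the validation report, using hypothetical `png_size` and `validate_scenes` helpers:

```python
import struct
from pathlib import Path
from typing import Dict, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> Tuple[int, int]:
    """Read (width, height) from the IHDR chunk that must follow the PNG signature."""
    with path.open("rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path.name} is not a valid PNG")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def validate_scenes(scenes_dir: Path, expected: Tuple[int, int] = (1920, 1080)) -> Dict[str, str]:
    """Map each PNG in scenes_dir to 'PASS' or a 'FAIL: ...' reason."""
    report = {}
    for png in sorted(scenes_dir.glob("*.png")):
        try:
            size = png_size(png)
        except (OSError, ValueError) as exc:
            report[png.name] = f"FAIL: {exc}"
            continue
        report[png.name] = "PASS" if size == expected else f"FAIL: {size[0]}x{size[1]}"
    return report
```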
### Step 3.1: Extract narration text
- Input: storyboard from Phase 1.
- Output: the full Chinese narration text (concatenated from all scene narration fields) and its full English translation.
- Action: use the LLM to translate each scene's narration from Chinese to English, keeping character voices consistent.
### Step 3.2: Generate Chinese TTS
- Input: Chinese narration text.
- Output: WAV audio file and SRT subtitle file with timing.
- Action: attempt Qwen3-TTS first:
```bash
python3 ~/.openclaw/workspace/skills/tts-qwen3/scripts/qwen_tts.py \
--text "<chinese narration full text>" \
--voice "narrator_teacher" \
--output "<PROJECT_DIR>/narration_cn.wav" \
--fallback-edge true \
--rate 1.0
```
If Qwen3-TTS fails (GPU unavailable, package missing, or timeout > 10 minutes), fall back to Edge TTS:
```bash
# write --rate=-15% (with '=') so the leading '-' is not parsed as an option
python3 <SKILL_DIR>/scripts/tts.py \
--text "<chinese narration full text>" \
--output "<PROJECT_DIR>/narration_cn.mp3" \
--srt "<PROJECT_DIR>/narration_cn.srt" \
--voice "zh-CN-XiaoyiNeural" \
--rate=-15%
```
Voice character mapping (Qwen3-TTS only): the same role-to-voice table as in section 3.1.

- Edge case: if the narration text exceeds 5000 characters, split it into chunks (at most 2000 characters per request) and concatenate the audio files with 0.5 s of silence between them.
- Edge case: if the TTS service is rate-limited, wait 30 s and retry up to 3 times.
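The chunking and retry rules can be sketched as two small helpers. The sketch makes these assumptions:

- Chunks are cut at sentence-ending punctuation (Chinese or English).
- A chunk is hard-split only when a single sentence exceeds 2000 characters.
- "Retry up to 3 times" means up to 4 attempts in total.
- The TTS call is passed in as a callable.

Stitching the chunks together with 0.5 s of silence is left to ffmpeg and not shown. `chunk_text` and `with_retries` are hypothetical names:

```python
import re
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
SENTENCE_END = re.compile(r"(?<=[。!?!?.])")  # split *after* sentence-ending punctuation


def chunk_text(text: str, max_chars: int = 2000, threshold: int = 5000) -> List[str]:
    """Leave short text alone; otherwise pack whole sentences into chunks of <= max_chars."""
    if len(text) <= threshold:
        return [text]
    chunks: List[str] = []
    current = ""
    for sentence in (s for s in SENTENCE_END.split(text) if s):
        while len(sentence) > max_chars:  # one sentence longer than a chunk: hard split
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


def with_retries(call: Callable[[], T], retries: int = 3, delay_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Try once, then retry up to `retries` times with a fixed delay between attempts."""
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception as exc:  # e.g. rate limiting or a timeout raised by `call`
            last_error = exc
            if attempt < retries:
                sleep(delay_s)
    raise RuntimeError(f"TTS failed after {retries} retries") from last_error
```

Injecting `sleep` keeps the 30 s back-off testable without actually waiting.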
### Step 3.3: Generate English TTS
- Input: English narration text (translated in step 3.1).
- Output: MP3 audio file and SRT subtitle file with timing.
- Action:
```bash
python3 <SKILL_DIR>/scripts/tts.py \
--text "<english narration full text>" \
--output "<PROJECT_DIR>/narration_en.mp3" \
--srt "<PROJECT_DIR>/narration_en.srt" \
--voice "en-US-JennyNeural" \
--rate=-15%
```
Edge cases: same as step 3.2 (text length, rate limits, retry logic).
### Step 3.4: Generate Chinese ASS subtitles
- Input: `narration_cn.srt` from step 3.2.
- Output: styled ASS subtitle file.
- Action:
```bash
python3 <SKILL_DIR>/scripts/srt_to_ass.py \
--srt "<PROJECT_DIR>/narration_cn.srt" \
--output "<PROJECT_DIR>/subtitles_cn.ass" \
--font-size 80 \
--font-name "Microsoft YaHei" \
--color "FFFFFF" \
--outline-color "000000" \
--outline-width 2.0
```
Edge case: if an SRT line is longer than 24 characters, the script automatically splits it across two subtitle lines.
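Here is a sketch of the 24-character split. This document does not say how `srt_to_ass.py` chooses the break point, so the rule below is an assumption. Space-separated text breaks at the space nearest the middle. Chinese text breaks after the punctuation mark nearest the middle, or splits evenly if there is none. The two halves are joined with ASS's hard line break `\N`. `wrap_for_ass` is a hypothetical name:

```python
PUNCTUATION = ",、;:,;:"


def wrap_for_ass(text: str, max_chars: int = 24) -> str:
    """Split a cue longer than max_chars into two lines joined by ASS's hard break."""
    if len(text) <= max_chars:
        return text
    if " " in text:  # space-separated text (e.g. English): break at a space
        mid = len(text) // 2
        spaces = [i for i, ch in enumerate(text) if ch == " "]
        cut = min(spaces, key=lambda i: abs(i - mid))
        return text[:cut] + r"\N" + text[cut + 1:]  # drop the space itself
    # Chinese text has no spaces: break after punctuation near the middle if possible
    mid = (len(text) + 1) // 2
    marks = [i + 1 for i, ch in enumerate(text[:-1]) if ch in PUNCTUATION]
    cut = min(marks, key=lambda i: abs(i - mid)) if marks else mid
    return text[:cut] + r"\N" + text[cut:]
```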
### Step 3.5: Generate English ASS subtitles
- Input: `narration_en.srt` from step 3.3.
- Output: styled ASS subtitle file.
- Action:
```bash
python3 <SKILL_DIR>/scripts/srt_to_ass.py \
--srt "<PROJECT_DIR>/narration_en.srt" \
--output "<PROJECT_DIR>/subtitles_en.ass" \
--font-size 80 \
--font-name "Arial" \
--color "FFFFFF" \
--outline-color "000000" \
--outline-width 2.0
```
### Step 4.1: Composite Chinese video
- Input: scene images (`cover.png`, `scene_*.png`), `narration_cn.mp3`, `subtitles_cn.ass`.
- Output: final MP4 video file (H.264 + AAC).
- Action:
```bash
python3 <SKILL_DIR>/scripts/pipeline.py \
--scenes-dir "<PROJECT_DIR>/scenes/" \
--audio "<PROJECT_DIR>/narration_cn.mp3" \
--ass "<PROJECT_DIR>/subtitles_cn.ass" \
--output "<PROJECT_DIR>/output/<episode_id>_cn.mp4" \
--title "<title>" \
--subtitle "<series_name>" \
--episode-id "<episode_id>" \
--brand "琪琪的魔法故事屋" \
--qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
--cover-duration 4.0 \
--fade-duration 0.8 \
--fps 24 \
--bitrate "5000k"
```
- Edge case: if the final video duration differs from the audio duration by more than 2 seconds, log a warning and check for frame drops.
- Edge case: if the video file is smaller than 10 MB, generation probably failed; check the ffmpeg logs.
### Step 4.2: Composite English video
- Input: scene images (`cover.png`, `scene_*.png`), `narration_en.mp3`, `subtitles_en.ass`.
- Output: final MP4 video file (H.264 + AAC).
- Action:
```bash
python3 <SKILL_DIR>/scripts/pipeline.py \
--scenes-dir "<PROJECT_DIR>/scenes/" \
--audio "<PROJECT_DIR>/narration_en.mp3" \
--ass "<PROJECT_DIR>/subtitles_en.ass" \
--output "<PROJECT_DIR>/output/<episode_id>_en.mp4" \
--title "<english_title>" \
--subtitle "<english_series_name>" \
--episode-id "<episode_id>" \
--brand "琪琪的魔法故事屋" \
--qiqi "$HOME/.openclaw/workspace/characters/qiqi_default.png" \
--cover-duration 4.0 \
--fade-duration 0.8 \
--fps 24 \
--bitrate "5000k"
```
(Same edge cases as step 4.1.)
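### Step 4.3: Validate final videos
- Input: both MP4 files in the `output/` directory.
- Output: validation report.
- Action: check that each file exists and is larger than 10 MB, uses H.264 + AAC, is 1920×1080, has a duration matching its audio, and displays its subtitles in full.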
### Step 4.4: Generate Douyin publishing metadata
- Input: script title, series name, episode ID.
- Output: a markdown file with Chinese and English Douyin publishing descriptions.
- Save to: `PROJECT_DIR/douyin_publish.md`
- Format:
```markdown
# 抖音发布描述
## 中文版
- 标题:<title>|<series_name>
- 描述:<2-3 sentence story summary in chinese>
- 话题:#儿童故事 #<series_name> #睡前故事 #绘本动画
## 英文版
- 标题:<english_title>|<english_series_name>
- 描述:<2-3 sentence story summary in english>
- 话题:#英语启蒙 #磨耳朵英语 #<series_name> #儿童英语
```
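Rendering this file is plain string formatting. The sketch below uses hypothetical `render_douyin_publish` and `write_douyin_publish` helpers. The labels and fixed hashtags are copied from the format above, and the summaries are assumed to come from the LLM:

```python
from pathlib import Path


def render_douyin_publish(title: str, series: str, en_title: str, en_series: str,
                          summary_cn: str, summary_en: str) -> str:
    """Fill the douyin_publish.md template; hashtags other than the series are fixed."""
    lines = [
        "# 抖音发布描述",
        "## 中文版",
        f"- 标题:{title}|{series}",
        f"- 描述:{summary_cn}",
        f"- 话题:#儿童故事 #{series} #睡前故事 #绘本动画",
        "## 英文版",
        f"- 标题:{en_title}|{en_series}",
        f"- 描述:{summary_en}",
        f"- 话题:#英语启蒙 #磨耳朵英语 #{series} #儿童英语",
    ]
    return "\n".join(lines) + "\n"


def write_douyin_publish(project_dir: Path, **fields: str) -> Path:
    path = project_dir / "douyin_publish.md"
    path.write_text(render_douyin_publish(**fields), encoding="utf-8")
    return path
```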
## Edge Cases
- **No script at Phase 0**: hard stop and return the error "no story script found. please paste or upload a picture book script (markdown or text file) to begin." The user must provide a script; there is no default.
- **Series name missing**: use the default "琪琪的魔法故事屋". No user intervention is required.
- **Episode ID missing**: auto-generate one starting with `EP-`.
- **Storyboard generation fails after script validation passes**: halt at Phase 1. Ask the user to clarify the story structure or provide a more detailed script. Do not proceed to Phase 2.
- **User does not approve the storyboard at step 1.5**: loop back to step 1.1, ask which scenes to revise, regenerate those scenes, and show them again for approval. Do not proceed to Phase 2 until the user explicitly approves.
- **ComfyUI unreachable at step 2.1**: hard stop with the error "comfyui at [url] not responding. ensure comfyui is running at http://127.0.0.1:8188 and retry." The user must start ComfyUI before Phase 2 can be retried.
- **Image generation fails for more than 50% of scenes**: halt Phase 2 and alert the user. Check the ComfyUI logs for errors (OOM, invalid prompts, etc.) and ask the user to retry or provide alternative prompts.
- **Qwen3-TTS unavailable at step 3.2**: fall back to Edge TTS automatically, without user interaction, and log the fallback. Both TTS engines produce usable output, so there is no blocking gate here.
- **TTS text exceeds 5000 characters**: split it into chunks, generate audio per chunk, and concatenate with 0.5 s of silence. This is automatic; no user approval is needed.
- **TTS generation times out (> 10 minutes per chunk)**: retry up to 3 times with a 30 s delay between retries. If it still fails after 3 retries, hard stop and ask the user to check the TTS service status or try again later.
- **Final video validation fails**: alert the user with the specific error (file size, duration, codec mismatch, etc.) and offer options: retry Phase 4, or debug a specific phase (re-run the Phase 3 audio or the Phase 2 scenes). The user chooses the next action.
- **Final video file is smaller than 10 MB**: this probably indicates an encoding error. Log the ffmpeg stderr and halt, then ask the user to check disk space and the ffmpeg installation.
## Success Criteria
Success means all of the following exist in `PROJECT_DIR/output/`:
- `<episode_id>_cn.mp4`
- `<episode_id>_en.mp4`

All file paths are absolute or relative to `PROJECT_DIR`. The metadata in `douyin_publish.md` is plain text (no special formatting required).
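Here is a sketch of the success check, assuming the layout described above. `missing_outputs` is a hypothetical helper that returns an empty list when the run succeeded. It also checks `douyin_publish.md`, because the completion message lists that file as an output even though it lives in `PROJECT_DIR` rather than `output/`:

```python
from pathlib import Path
from typing import List


def missing_outputs(project_dir: Path, episode_id: str) -> List[Path]:
    """Return the expected deliverables that do not exist yet (relative to project_dir)."""
    expected = [
        Path("output") / f"{episode_id}_cn.mp4",
        Path("output") / f"{episode_id}_en.mp4",
        Path("douyin_publish.md"),
    ]
    return [rel for rel in expected if not (project_dir / rel).is_file()]
```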
The user sees:
```
✅ Video generation complete!
Outputs:
- Chinese video: [PROJECT_DIR]/output/<episode_id>_cn.mp4
- English video: [PROJECT_DIR]/output/<episode_id>_en.mp4
- Douyin metadata: [PROJECT_DIR]/douyin_publish.md
Ready for Douyin publishing. Use the douyin-browser-publish skill to upload both videos.
```
Both MP4 files play in standard video players (VLC, ffplay, a browser). The user can preview the final videos immediately without additional processing; video duration, audio sync, and subtitle timing are confirmed visually in the preview.
## Troubleshooting Reference
| Problem | Check | Solution |
|---|---|---|
| ComfyUI not responding at Phase 2 | Is ComfyUI running? Open http://127.0.0.1:8188 in a browser. | Start ComfyUI: `cd ~/ComfyUI && LD_LIBRARY_PATH=~/comfyui-venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH ~/comfyui-venv/bin/python main.py --listen 127.0.0.1 --port 8188` |
| Scene generation timeouts | ComfyUI is slow or out of memory; check the ComfyUI logs for "out of memory". | Reduce prompt complexity or wait for ComfyUI to free VRAM, then retry Phase 2. |
| TTS fails at Phase 3 | Network issue or TTS service down; check internet connectivity. | Retry Phase 3 up to 3 times. If Edge TTS times out (> 5 minutes), check the edge-tts package version (`edge-tts --version`). |
| Text split across two subtitle lines incorrectly | The `max_chars` setting in `srt_to_ass.py` (default 24). | Increase `max_chars` in the script if needed and regenerate the subtitles. |
| Final video has no audio | Audio file missing, or ffmpeg did not mux the audio; check that `PROJECT_DIR/narration_cn.mp3` exists and is readable. | Verify ffmpeg version >= 5.0 and retry Phase 4. |
| Final video duration does not match audio | Frame drops or a scene timing issue; check that all scene images are present and 1920x1080. | Check the ffmpeg logs in `.temp/` and retry Phase 4 with verbose logging. |
| Video file size very small (< 5 MB) | Encoding failed or ffmpeg crashed; check disk space. | Verify that ffmpeg is installed correctly, check `.temp/` for partial files, and retry Phase 4. |
| Qiqi character image not found | Missing file at `~/.openclaw/workspace/characters/qiqi_default.png`. | Provide a valid path or skip the character overlay (the cover and video still generate). |
*Version: 1.1.0 | Enriched for implexa quality standards | Original: clawhub 琪琪OPC project*