Phy Video Bgm

Analyze a video's mood and add AI-generated BGM. Optionally speed up/slow down. Uses Gemini for video analysis and fal.ai Lyria2 for music generation. Trigge...

SkillRank score: 7.2/10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

phy-video-bgm analyzes video mood via gemini and generates matching background music using fal.ai lyria2, then mixes it with ffmpeg. supports speed adjustment, volume control, and style override.

structure: 8.0
trigger phrases: 8.0
procedure: 8.0
edge cases: 5.0
documentation: 7.0

original SKILL.md from clawhub:

---
name: video-bgm
description: Analyze a video's mood and add AI-generated BGM. Optionally speed up/slow down. Uses Gemini for video analysis and fal.ai Lyria2 for music generation. Triggers on "add bgm", "add music to video", "video bgm", or any request to add background music to a video file.
homepage: https://canlah.ai
metadata: {"openclaw": {"emoji": "🎵", "os": ["darwin", "linux"]}}
---

# Video BGM Skill

Analyze a video's content and mood, generate matching BGM via AI, and mix it in.

---

## Pipeline

```
Input Video → Gemini Analysis → Lyria2 BGM → FFmpeg Mix → Output
                (mood/style)     (generate)    (speed + volume + fade)
```

## Dependencies

- **Python**: `python3` (or activate your project venv if available)
- **Gemini API**: `GOOGLE_GENAI_API_KEY` (video understanding)
- **fal.ai**: `FAL_API_KEY` (Lyria2 music generation)
- **FFmpeg**: system install

## Setup

```bash
# Install required packages
pip install google-generativeai httpx

# Set API keys via environment variables
export GOOGLE_GENAI_API_KEY="your_google_api_key"
export FAL_API_KEY="your_fal_api_key"
```

## Usage

```
/video-bgm <path-to-video>
/video-bgm <path-to-video> --speed 1.1
/video-bgm <path-to-video> --speed 1.1 --volume 5
/video-bgm <path-to-video> --style "lo-fi chill"
```

## Arguments

| Arg | Default | Description |
|-----|---------|-------------|
| `path` | (required) | Path to input video file |
| `--speed` | `1.0` | Speed multiplier (e.g. 1.1 = 10% faster) |
| `--volume` | `5.0` | BGM volume multiplier (Lyria2 output is quiet) |
| `--fade-in` | `1.5` | Fade in duration in seconds |
| `--fade-out` | `3.0` | Fade out duration in seconds |
| `--style` | (auto) | Override music style (skip Gemini analysis) |
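
The argument table above maps naturally onto a command-line parser. The following is a minimal sketch, assuming the skill is driven by a Python script. The function name `build_parser` and the choice of `argparse` are illustrative assumptions, not something this SKILL.md specifies.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the argument table (names are assumptions)."""
    parser = argparse.ArgumentParser(prog="video-bgm")
    parser.add_argument("path", help="Path to input video file")
    parser.add_argument("--speed", type=float, default=1.0,
                        help="Speed multiplier (1.1 = 10%% faster)")
    parser.add_argument("--volume", type=float, default=5.0,
                        help="BGM volume multiplier")
    parser.add_argument("--fade-in", type=float, default=1.5,
                        help="Fade in duration in seconds")
    parser.add_argument("--fade-out", type=float, default=3.0,
                        help="Fade out duration in seconds")
    parser.add_argument("--style", default=None,
                        help="Override music style (skips Gemini analysis)")
    return parser


# Unspecified options fall back to the defaults from the table.
args = build_parser().parse_args(["clip.mp4", "--speed", "1.1"])
print(args.speed, args.volume, args.fade_in, args.fade_out, args.style)
```

Leaving `--style` as `None` by default gives the caller a simple way to decide whether the Gemini analysis step should run.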

## Step-by-Step Process

### Step 1: Analyze Video with Gemini

Upload the video to Gemini 2.0 Flash and request a detailed mood analysis.

```python
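import os

import google.generativeai as genai

# Index the environment directly so a missing key raises KeyError here,
# rather than surfacing later as an opaque authentication failure.
GOOGLE_API_KEY = os.environ["GOOGLE_GENAI_API_KEY"]
genai.configure(api_key=GOOGLE_API_KEY)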
```

Use this analysis prompt, which asks Gemini to act as a music supervisor rather than give a generic description:

```
Watch this video very carefully. You are a music supervisor for commercials.

Tell me:
1. What BRAND POSITIONING does this video convey? (luxury? affordable? aspirational?)
2. What is the EMOTIONAL JOURNEY of the viewer? Be specific at each moment.
3. What REAL commercial music references would fit? Name specific ad styles
   (Four Seasons resort? Apple reveal? Volvo? Pottery Barn? Nike?)
4. What is the ENERGY LEVEL? Contemplative/still or forward momentum?
5. What tempo, instruments, and production style would ACTUALLY work?
   Be honest - classical piano is often too stuffy. Consider modern alternatives.

Then provide a SINGLE music generation prompt (2-3 sentences) that captures
the ideal BGM. Focus on: instruments, tempo BPM, mood adjectives, production style.
Format: MUSIC_PROMPT: <your prompt here>
```

### Step 2: Extract the MUSIC_PROMPT

Parse the Gemini response to find the `MUSIC_PROMPT:` line. This becomes the Lyria2 prompt.

Always APPEND these constraints to any Lyria2 prompt:
```
No vocals, no drums, no percussion hits, no sound effects.
```
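
Steps 1 and 2 can be tied together with a small parsing helper. This is a sketch, assuming Gemini returns the prompt as plain text after `MUSIC_PROMPT:` and ends it with a blank line or the end of the response. The helper names `extract_music_prompt` and `with_constraints` are hypothetical, and a response that wraps the marker in markdown emphasis would need extra cleanup.

```python
import re

# Constraint sentence quoted from the skill; appended to every Lyria2 prompt.
CONSTRAINTS = "No vocals, no drums, no percussion hits, no sound effects."


def extract_music_prompt(response_text: str) -> str:
    """Return the text after 'MUSIC_PROMPT:' up to the next blank line."""
    match = re.search(r"MUSIC_PROMPT:\s*(.+?)(?:\n\s*\n|\Z)",
                      response_text, re.DOTALL)
    if match is None:
        raise ValueError("Gemini response has no MUSIC_PROMPT: line")
    # Collapse line breaks in case the prompt wraps across several lines.
    return " ".join(match.group(1).split())


def with_constraints(prompt: str) -> str:
    """Append the fixed constraints to a prompt."""
    return f"{prompt.rstrip()} {CONSTRAINTS}"


sample = ("Analysis text...\n\n"
          "MUSIC_PROMPT: Warm Rhodes and soft cello,\n"
          "80 BPM, airy and modern.\n")
print(with_constraints(extract_music_prompt(sample)))
```

Raising on a missing marker keeps the failure visible, so the caller can suggest retrying or passing `--style` instead.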

### Step 3: Generate BGM via fal.ai Lyria2

```python
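import os

import httpx

FAL_API_KEY = os.environ["FAL_API_KEY"]

# music_prompt is the string from Step 2 (or the --style override),
# with the constraints above already appended.
resp = httpx.post(
    "https://fal.run/fal-ai/lyria2",
    headers={
        "Authorization": f"Key {FAL_API_KEY}",
        "Content-Type": "application/json",
    },
    json={"prompt": music_prompt},
    timeout=120.0,
)
# Surface HTTP errors such as 401 or 429 instead of a KeyError below.
resp.raise_for_status()
audio_url = resp.json()["audio"]["url"]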
```

Lyria2 generates ~32s of audio. Output is WAV, 48kHz stereo.
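
After downloading the file, a quick sanity check can confirm it matches the description above. The sketch below uses Python's standard `wave` module and assumes the download is a PCM WAV, which this document does not state explicitly; other WAV encodings would be reported as invalid here. Both function names are hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate if rate else 0.0,
        }


def looks_like_valid_bgm(path: str) -> bool:
    """True if the file parses as WAV and holds at least one audio frame."""
    try:
        return describe_wav(path)["seconds"] > 0
    except (wave.Error, EOFError, OSError):
        # Empty, truncated, missing, or non-PCM files all land here.
        return False
```

For a typical Lyria2 result one would expect `describe_wav` to report 2 channels, a sample rate of 48000, and roughly 32 seconds.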

### Step 4: FFmpeg — Speed + Strip Audio + Mix BGM

```bash
# Step 4a: Speed up video (if requested) and strip any existing audio
ffmpeg -y -i INPUT.mp4 \
  -filter:v "setpts=PTS/{speed}" \
  -an \
  -c:v libx264 -preset medium -crf 18 \
  OUTPUT_speedup.mp4

# Step 4b: Get sped-up duration (this value fills {duration} in Step 4c)
DURATION=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 OUTPUT_speedup.mp4)

# Step 4c: Mix BGM with volume boost, fade in/out, trim to video length
ffmpeg -y \
  -i OUTPUT_speedup.mp4 \
  -i bgm.wav \
  -filter_complex "[1:a]volume={volume},atrim=0:{duration},afade=t=in:st=0:d={fade_in},afade=t=out:st={duration-fade_out}:d={fade_out}[a]" \
  -map 0:v -map "[a]" \
  -c:v copy -c:a aac -b:a 192k \
  -shortest \
  OUTPUT_final.mp4
```
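
The `{duration-fade_out}` placeholder in Step 4c has to be computed before the command runs. Here is a minimal sketch of filling in the filter string, with a hypothetical helper `build_audio_filter`. It clamps both fades so they fit inside the clip, which is one possible policy and differs slightly from the `duration - 0.5` rule listed under decision points.

```python
def build_audio_filter(volume: float, duration: float,
                       fade_in: float, fade_out: float) -> str:
    """Fill in the Step 4c filter template with concrete numbers."""
    # Keep both fades inside the clip so the fade-out start is never negative.
    fade_in = max(0.0, min(fade_in, duration))
    fade_out = max(0.0, min(fade_out, duration - fade_in))
    fade_out_start = duration - fade_out
    return (
        f"[1:a]volume={volume:g},"
        f"atrim=0:{duration:.3f},"
        f"afade=t=in:st=0:d={fade_in:.3f},"
        f"afade=t=out:st={fade_out_start:.3f}:d={fade_out:.3f}[a]"
    )


# Defaults from the argument table applied to a 20 second video.
print(build_audio_filter(volume=5.0, duration=20.0, fade_in=1.5, fade_out=3.0))
```

One case the template does not cover: Lyria2 clips run about 32 s, and `-shortest` ends the output when the shorter stream ends, so a longer video would be cut off where the music stops. Looping the BGM input, for example with FFmpeg's `-stream_loop -1` input option, is one possible workaround; this skill does not specify one.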

### Step 5: Open Result

Open the final video for user review. Also open the BGM separately so they can evaluate the music alone.

## Key Learnings

- **Lyria2 output is VERY QUIET** — always use volume multiplier (default 5.0)
- **Don't over-specify the Lyria2 prompt** — "three acts" style prompts produce chaotic results. Keep it to instruments + tempo + mood + style reference.
- **Let Gemini act as music supervisor** — it gives much better style recommendations than generic "relaxing piano" defaults
- **Always strip existing audio first** (`-an`) — some videos have unwanted audio tracks
- **Classical piano is usually wrong** — for luxury/lifestyle, modern minimalist (Rhodes, guitar, cello) works better

## Output Files

All files are saved next to the input video:
```
input_video.mp4          → original
input_video_speedup.mp4  → sped up, no audio
input_video_bgm.wav      → generated BGM
input_video_final.mp4    → final output with BGM
```
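
The naming scheme above can be derived from the input path with `pathlib`. This is a sketch under the assumption that outputs always sit beside the input; `output_paths` is a hypothetical helper name.

```python
from pathlib import Path


def output_paths(input_video: str) -> dict:
    """Derive the sibling output paths listed above from the input path."""
    src = Path(input_video).expanduser()
    stem = src.stem
    return {
        "speedup": src.with_name(f"{stem}_speedup.mp4"),
        "bgm": src.with_name(f"{stem}_bgm.wav"),
        "final": src.with_name(f"{stem}_final.mp4"),
    }


for label, path in output_paths("/videos/input_video.mp4").items():
    print(label, path)
```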

## Examples

```bash
# Basic: analyze and add BGM
/video-bgm ~/Desktop/product_video.mp4

# Speed up 10% and add BGM
/video-bgm ~/Desktop/product_video.mp4 --speed 1.1

# Override style (skip Gemini analysis)
/video-bgm ~/Desktop/ad.mp4 --style "upbeat modern pop, synth pads, 100 BPM"

# Adjust volume
/video-bgm ~/Desktop/quiet_video.mp4 --volume 8
```

---

## Author

**[Canlah AI](https://canlah.ai)** — Run performance marketing without breaking your brand.

- GitHub: [github.com/PHY041](https://github.com/PHY041)
- All Skills: [clawhub.ai/PHY041](https://clawhub.ai/PHY041)


intent

use gemini's video understanding to analyze a video's mood, brand positioning, and emotional journey, then generate matching background music via fal.ai lyria2, mix it in with ffmpeg (with optional speed adjustment), and output a final video with bgm. use this skill when you need to add contextually appropriate background music to a video without manually composing or licensing tracks.

inputs

required parameters:

path: file path to input video (mp4, mov, or other ffmpeg-compatible format)

optional parameters:

--speed: speed multiplier for video playback (default 1.0, e.g. 1.1 = 10% faster, 0.9 = 10% slower)
--volume: bgm volume multiplier (default 5.0, since lyria2 output is quiet)
--fade-in: fade-in duration in seconds (default 1.5)
--fade-out: fade-out duration in seconds (default 3.0)
--style: override music style and skip gemini analysis (optional string, e.g. "upbeat modern pop, synth pads, 100 bpm")

external connections:

gemini api: requires GOOGLE_GENAI_API_KEY env var. needs video understanding capability (gemini 2.0 flash or later). oauth not required, api key auth only.
fal.ai lyria2: requires FAL_API_KEY env var. music generation model. https://fal.run/fal-ai/lyria2. api key auth only.
ffmpeg: system binary for video encoding/mixing. must be installed and in system path.

system dependencies:

python 3.8+
pip packages: google-generativeai, httpx
ffmpeg (system install, check with ffmpeg -version)
ffprobe (comes with ffmpeg)

procedure

step 1: validate inputs and set up

inputs: video file path, optional speed/volume/fade/style parameters.

check that input video file exists and is readable.
verify GOOGLE_GENAI_API_KEY and FAL_API_KEY env vars are set.
verify ffmpeg and ffprobe are in system path.
if any check fails, exit with clear error message.

outputs: validated file path, parsed parameters (speed, volume, fade-in, fade-out, style override).
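
the checks in step 1 could be sketched as follows. the helper name `find_problems` is hypothetical, and collecting every problem before exiting (rather than stopping at the first) is a design choice assumed here, not one the procedure mandates.

```python
import os
import shutil
from pathlib import Path

REQUIRED_KEYS = ("GOOGLE_GENAI_API_KEY", "FAL_API_KEY")
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def find_problems(video_path: str) -> list:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    path = Path(video_path).expanduser()
    if not path.is_file() or not os.access(path, os.R_OK):
        problems.append(f"video not found or unreadable: {path}")
    for key in REQUIRED_KEYS:
        if not os.environ.get(key):
            problems.append(f"environment variable {key} is not set")
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} is not on PATH")
    return problems
```

a caller would typically exit with the joined messages when the returned list is non-empty, so the user sees all missing pieces at once.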

step 2: analyze video with gemini (unless style override provided)

inputs: video file path, gemini api key.

configure gemini client with api key.
upload video to gemini 2.0 flash using file upload api.
send music supervisor prompt: ask gemini to analyze (1) brand positioning, (2) emotional journey at key moments, (3) real commercial music references (e.g. Four Seasons, Apple, Nike style), (4) energy level (contemplative vs forward momentum), (5) tempo/instruments/production style. request a single 2-3 sentence music generation prompt at the end, prefixed with MUSIC_PROMPT:.
wait for gemini response (may take 10-30s depending on video length and api load).
parse response to extract the line starting with MUSIC_PROMPT:. extract everything after that prefix as the music prompt string.

outputs: music prompt string extracted from gemini response. if gemini response is malformed or times out, exit with error.

step 3: append constraints to music prompt

inputs: music prompt string from step 2 (or from style override).

append this constraint string to the prompt: "no vocals, no drums, no percussion hits, no sound effects."

outputs: final music prompt ready for lyria2.

step 4: generate bgm via fal.ai lyria2

inputs: final music prompt, fal api key, timeout 120 seconds.

make http post to https://fal.run/fal-ai/lyria2 with json body {"prompt": music_prompt} and header Authorization: Key {FAL_API_KEY}.
wait for response (may take 30-60s).
extract audio url from response json at response["audio"]["url"].
download audio file from url (lyria2 outputs ~32 seconds of 48khz stereo wav).
save bgm wav to disk next to input video with name pattern {input_stem}_bgm.wav.

outputs: bgm wav file on disk. if lyria2 times out or rate-limits, retry once after 5 second delay; if second attempt fails, exit with error.
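
the retry rule above can be expressed as a small wrapper. this is a sketch with hypothetical names (`retry_once`, `should_retry`); the caller decides which exceptions count as retryable, for example timeouts and 429 responses but not 401.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_once(call: Callable[[], T],
               should_retry: Callable[[Exception], bool] = lambda exc: True,
               delay_seconds: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on a retryable error wait, then try exactly once more."""
    try:
        return call()
    except Exception as exc:
        if not should_retry(exc):
            raise
        sleep(delay_seconds)
        # A failure on the second attempt propagates to the caller.
        return call()
```

passing `sleep` as a parameter keeps the wrapper easy to exercise without actually waiting five seconds.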

step 5: speed up video and strip existing audio (if speed != 1.0)

inputs: input video path, speed multiplier, output path.

if speed == 1.0, skip to step 6.
if speed != 1.0, run ffmpeg: ffmpeg -y -i INPUT.mp4 -filter:v "setpts=PTS/{speed}" -an -c:v libx264 -preset medium -crf 18 OUTPUT_speedup.mp4.
this re-encodes the video at new speed and removes any existing audio track (-an flag).
save output as {input_stem}_speedup.mp4.

outputs: sped-up video file (or original video if speed == 1.0). if ffmpeg encode fails or times out, exit with error.

step 6: get duration of (possibly sped-up) video

inputs: video file path (speedup or original).

run ffprobe -v quiet -show_entries format=duration -of csv=p=0 {video_file} to get duration in seconds.
store as float variable duration.

outputs: duration in seconds. if ffprobe fails, exit with error.
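
a sketch of turning the ffprobe output into a float, plus the arithmetic behind the speed change. both helper names are hypothetical. since setpts=PTS/{speed} divides every timestamp by the speed, the expected length is the original length divided by the speed.

```python
def parse_duration(ffprobe_stdout: str) -> float:
    """Convert ffprobe's csv=p=0 duration output to seconds."""
    text = ffprobe_stdout.strip()
    try:
        seconds = float(text)
    except ValueError:
        raise ValueError(f"unexpected ffprobe output: {text!r}") from None
    if seconds <= 0:
        raise ValueError(f"non-positive duration: {seconds}")
    return seconds


def expected_duration(original_seconds: float, speed: float) -> float:
    """Length after a speed change: timestamps are divided by speed."""
    return original_seconds / speed


print(parse_duration("30.000000\n"))
print(round(expected_duration(33.0, 1.1), 3))
```

comparing the measured duration against the expected one is a cheap way to catch a failed or truncated re-encode.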

step 7: mix bgm with video, apply volume/fade, trim to video length

inputs: video file (speedup or original), bgm wav file, volume multiplier, fade-in duration, fade-out duration, output path, video duration.

run ffmpeg: ffmpeg -y -i {video_file} -i {bgm_wav} -filter_complex "[1:a]volume={volume},atrim=0:{duration},afade=t=in:st=0:d={fade_in},afade=t=out:st={duration-fade_out}:d={fade_out}[a]" -map 0:v -map "[a]" -c:v copy -c:a aac -b:a 192k -shortest {output_file}.
this applies volume boost to bgm, trims bgm to video duration, applies fade-in and fade-out, mixes bgm with video, and encodes audio as aac 192k.
save output as {input_stem}_final.mp4.

outputs: final video with bgm mixed in. if ffmpeg mix fails, exit with error.

step 8: report completion and open files

inputs: final video path, bgm wav path.

print completion message with paths to final video and bgm wav.
open final video in default video player (platform-specific: open on macos, xdg-open on linux).
open bgm wav in default audio player so user can review music separately.

outputs: user can see/hear result. if open fails, log warning but do not exit.
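
the platform-specific open step might look like this. a sketch only: the helper names are hypothetical, and only the two launchers named in step 8 are covered.

```python
import subprocess
import sys


def opener_command(path: str, platform: str = sys.platform) -> list:
    """Pick the launcher named in step 8: open on macos, xdg-open on linux."""
    if platform == "darwin":
        return ["open", path]
    if platform.startswith("linux"):
        return ["xdg-open", path]
    raise RuntimeError(f"unsupported platform: {platform}")


def open_for_review(path: str) -> bool:
    """Try to open a file; log a warning instead of raising on failure."""
    try:
        subprocess.run(opener_command(path), check=True)
        return True
    except (OSError, subprocess.CalledProcessError, RuntimeError) as exc:
        print(f"warning: could not open {path}: {exc}", file=sys.stderr)
        return False
```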

decision points

if --style parameter is provided: skip step 2 (gemini analysis) and use the provided style string directly as the music prompt. jump to step 3 (append constraints).
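if speed == 1.0: skip step 5 (re-encode video) and use original video directly in step 7. this saves encoding time and avoids re-encoding quality loss. any existing audio is still dropped, because step 7 maps only the video stream from the input.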
if video file does not exist or is unreadable: exit with error message stating file path and suggesting user check path.
if GOOGLE_GENAI_API_KEY or FAL_API_KEY env vars are missing: exit with error message listing which keys are missing and instructions to set them.
if ffmpeg or ffprobe are not in system path: exit with error message and suggest user install ffmpeg (brew install ffmpeg on macos, apt-get install ffmpeg on ubuntu).
if gemini response does not contain MUSIC_PROMPT: line: exit with error stating gemini response was malformed. suggest user try again or provide explicit style override.
if lyria2 api returns 429 (rate limit) on first attempt: retry once after a 5 second delay. if second attempt also rate limits, exit with error.
if lyria2 api returns 401 (auth failure): exit with error stating FAL_API_KEY is invalid or expired.
if ffmpeg encode takes > 5 minutes for a single video: log warning and ask user if they want to cancel. continue if user approves.
if video is shorter than fade-out duration: reduce fade-out to (duration - 0.5), or to 0 if that would be negative, so the fade does not exceed video length.
if bgm file is empty or corrupt after download: exit with error and suggest user retry lyria2 generation.

output contract

success is defined by:

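final video file at path {input_stem}_final.mp4, in the same directory as the input. format is mp4 with aac 192k audio. video is h.264 when re-encoded in step 5; when speed == 1.0 the video stream is copied, so it keeps the input's codec. video duration is unchanged (or sped up/slowed down as requested), with bgm mixed in and fade-in/out applied.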
bgm wav file at path {input_stem}_bgm.wav. format is wav, 48khz stereo, ~32 seconds duration (generated by lyria2).
intermediate files (if speed != 1.0): {input_stem}_speedup.mp4 (sped-up video with no audio).
no output if any validation or generation step fails. original video and any partial intermediate files remain untouched.
file naming: all output files saved in same directory as input video.

outcome signal

user sees completion message printed to stdout listing paths to final video and bgm wav file.
final video file opens automatically in default video player and plays with bgm mixed in, volume boosted, and fades applied.
bgm wav file opens automatically in default audio player so user can evaluate music quality separately.
user can verify
