---
name: video-bgm
description: Analyze a video's mood and add AI-generated BGM. Optionally speed up/slow down. Uses Gemini for video analysis and fal.ai Lyria2 for music generation. Triggers on "add bgm", "add music to video", "video bgm", or any request to add background music to a video file.
homepage: https://canlah.ai
metadata: {"openclaw": {"emoji": "🎵", "os": ["darwin", "linux"]}}
---
# Video BGM Skill
Analyze a video's content and mood, generate matching BGM via AI, and mix it in.
---
## Pipeline
```
Input Video → Gemini Analysis → Lyria2 BGM → FFmpeg Mix → Output
              (mood/style)      (generate)   (speed + volume + fade)
```
## Dependencies
- **Python**: `python3` (or activate your project venv if available)
- **Gemini API**: `GOOGLE_GENAI_API_KEY` (video understanding)
- **fal.ai**: `FAL_API_KEY` (Lyria2 music generation)
- **FFmpeg**: system install
## Setup
```bash
# Install required packages
pip install google-generativeai httpx
# Set API keys via environment variables
export GOOGLE_GENAI_API_KEY="your_google_api_key"
export FAL_API_KEY="your_fal_api_key"
```
## Usage
```
/video-bgm <path-to-video>
/video-bgm <path-to-video> --speed 1.1
/video-bgm <path-to-video> --speed 1.1 --volume 5
/video-bgm <path-to-video> --style "lo-fi chill"
```
## Arguments
| Arg | Default | Description |
|-----|---------|-------------|
| `path` | (required) | Path to input video file |
| `--speed` | `1.0` | Speed multiplier (e.g. 1.1 = 10% faster) |
| `--volume` | `5.0` | BGM volume multiplier (Lyria2 output is quiet) |
| `--fade-in` | `1.5` | Fade in duration in seconds |
| `--fade-out` | `3.0` | Fade out duration in seconds |
| `--style` | (auto) | Override music style (skip Gemini analysis) |
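
The arguments table can be mirrored in a small command-line parser. This is a minimal sketch, assuming the skill is driven by a Python script; `build_parser` is a hypothetical helper name, while the flags and defaults come from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the arguments table (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="video-bgm")
    parser.add_argument("path", help="Path to input video file")
    parser.add_argument("--speed", type=float, default=1.0,
                        help="Speed multiplier, e.g. 1.1 for a 10 percent speed-up")
    parser.add_argument("--volume", type=float, default=5.0,
                        help="BGM volume multiplier (Lyria2 output is quiet)")
    parser.add_argument("--fade-in", type=float, default=1.5,
                        help="Fade in duration in seconds")
    parser.add_argument("--fade-out", type=float, default=3.0,
                        help="Fade out duration in seconds")
    # None means "auto": run the Gemini analysis to pick a style.
    parser.add_argument("--style", default=None,
                        help="Override music style (skip Gemini analysis)")
    return parser


args = build_parser().parse_args(["clip.mp4", "--speed", "1.1"])
print(args.speed, args.volume, args.style)
```

`argparse` turns `--fade-in` and `--fade-out` into the attributes `args.fade_in` and `args.fade_out`.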
## Step-by-Step Process
### Step 1: Analyze Video with Gemini
Upload the video to Gemini 2.0 Flash and request a detailed mood analysis.
```python
import google.generativeai as genai
import os
GOOGLE_API_KEY = os.environ.get("GOOGLE_GENAI_API_KEY")
genai.configure(api_key=GOOGLE_API_KEY)
```
Use this analysis prompt, which casts Gemini as a music supervisor rather than a generic analyst:
```
Watch this video very carefully. You are a music supervisor for commercials.
Tell me:
1. What BRAND POSITIONING does this video convey? (luxury? affordable? aspirational?)
2. What is the EMOTIONAL JOURNEY of the viewer? Be specific at each moment.
3. What REAL commercial music references would fit? Name specific ad styles
(Four Seasons resort? Apple reveal? Volvo? Pottery Barn? Nike?)
4. What is the ENERGY LEVEL? Contemplative/still or forward momentum?
5. What tempo, instruments, and production style would ACTUALLY work?
Be honest - classical piano is often too stuffy. Consider modern alternatives.
Then provide a SINGLE music generation prompt (2-3 sentences) that captures
the ideal BGM. Focus on: instruments, tempo BPM, mood adjectives, production style.
Format: MUSIC_PROMPT: <your prompt here>
```
### Step 2: Extract the MUSIC_PROMPT
Parse the Gemini response to find the line starting with `MUSIC_PROMPT:`. The text after that prefix becomes the Lyria2 prompt.
Always append these constraints to every Lyria2 prompt, including one supplied via `--style`:
```
No vocals, no drums, no percussion hits, no sound effects.
```
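
The parsing and constraint steps could look roughly like the sketch below. The function names `extract_music_prompt` and `finalize_prompt` are hypothetical; the marker and the constraint sentence are taken from this document.

```python
MARKER = "MUSIC_PROMPT:"
LYRIA_CONSTRAINTS = "No vocals, no drums, no percussion hits, no sound effects."


def extract_music_prompt(response_text: str) -> str:
    """Return the text following the first non-empty MUSIC_PROMPT: marker."""
    for line in response_text.splitlines():
        _, marker, rest = line.partition(MARKER)
        if marker and rest.strip():
            return rest.strip()
    raise ValueError("Gemini response does not contain a MUSIC_PROMPT: line")


def finalize_prompt(prompt: str) -> str:
    """Append the fixed constraints to a Gemini-derived or --style prompt."""
    return f"{prompt.strip()} {LYRIA_CONSTRAINTS}"


# Illustrative response text, not real Gemini output.
sample = "Energy is calm.\nMUSIC_PROMPT: Warm Rhodes and soft cello, 80 BPM.\n"
print(finalize_prompt(extract_music_prompt(sample)))
```

Raising `ValueError` when the marker is missing lets the caller stop early and suggest a `--style` override.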
### Step 3: Generate BGM via fal.ai Lyria2
```python
import os

import httpx

FAL_API_KEY = os.environ.get("FAL_API_KEY")

# music_prompt is the Step 2 prompt with the constraints appended
resp = httpx.post(
    "https://fal.run/fal-ai/lyria2",
    headers={
        "Authorization": f"Key {FAL_API_KEY}",
        "Content-Type": "application/json",
    },
    json={"prompt": music_prompt},
    timeout=120.0,
)
resp.raise_for_status()  # surface 401/429 responses instead of a KeyError below
audio_url = resp.json()["audio"]["url"]
```
Lyria2 generates ~32s of audio. Output is WAV, 48kHz stereo.
### Step 4: FFmpeg — Speed + Strip Audio + Mix BGM
```bash
# Values in braces ({speed}, {volume}, {duration}, {fade_in}, {fade_out}) are
# placeholders to substitute before running. {duration} is the DURATION value
# from Step 4b, and {duration-fade_out} is the computed fade-out start time.
# Step 4a: Speed up video (if requested) and strip any existing audio
ffmpeg -y -i INPUT.mp4 \
  -filter:v "setpts=PTS/{speed}" \
  -an \
  -c:v libx264 -preset medium -crf 18 \
  OUTPUT_speedup.mp4
# Step 4b: Get sped-up duration
DURATION=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 OUTPUT_speedup.mp4)
# Step 4c: Mix BGM with volume boost, fade in/out, trim to video length
ffmpeg -y \
  -i OUTPUT_speedup.mp4 \
  -i bgm.wav \
  -filter_complex "[1:a]volume={volume},atrim=0:{duration},afade=t=in:st=0:d={fade_in},afade=t=out:st={duration-fade_out}:d={fade_out}[a]" \
  -map 0:v -map "[a]" \
  -c:v copy -c:a aac -b:a 192k \
  -shortest \
  OUTPUT_final.mp4
```
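
The placeholders in the Step 4c command have to be filled in before FFmpeg runs, and the fade-out start is `duration - fade_out`. The sketch below shows one way to do that in Python. `build_mix_command` is a hypothetical helper, and clamping the fade-out start at zero is an assumption rather than something the command above specifies.

```python
def build_mix_command(video: str, bgm: str, output: str, duration: float,
                      volume: float = 5.0, fade_in: float = 1.5,
                      fade_out: float = 3.0) -> list[str]:
    """Build the Step 4c ffmpeg argument list (hypothetical helper)."""
    # Assumption: never let the fade-out start before the beginning of the clip.
    fade_out_start = max(duration - fade_out, 0.0)
    audio_filter = (
        f"[1:a]volume={volume:.3f},"
        f"atrim=0:{duration:.3f},"
        f"afade=t=in:st=0:d={fade_in:.3f},"
        f"afade=t=out:st={fade_out_start:.3f}:d={fade_out:.3f}[a]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", bgm,
        "-filter_complex", audio_filter,
        "-map", "0:v", "-map", "[a]",
        "-c:v", "copy", "-c:a", "aac", "-b:a", "192k",
        "-shortest",
        output,
    ]


cmd = build_mix_command("clip_speedup.mp4", "clip_bgm.wav", "clip_final.mp4", duration=10.0)
print(" ".join(cmd))
```

Returning an argument list rather than a shell string means it can be passed directly to `subprocess.run(cmd, check=True)` without quoting the filter graph.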
### Step 5: Open Result
Open the final video for user review. Also open the BGM separately so they can evaluate the music alone.
## Key Learnings
- **Lyria2 output is VERY QUIET** — always use volume multiplier (default 5.0)
- **Don't over-specify the Lyria2 prompt** — "three acts" style prompts produce chaotic results. Keep it to instruments + tempo + mood + style reference.
- **Let Gemini act as music supervisor** — it gives much better style recommendations than generic "relaxing piano" defaults
- **Always strip existing audio first** (`-an`) — some videos have unwanted audio tracks
- **Classical piano is usually wrong** — for luxury/lifestyle, modern minimalist (Rhodes, guitar, cello) works better
## Output Files
All files are saved next to the input video:
```
input_video.mp4 → original
input_video_speedup.mp4 → sped up, no audio
input_video_bgm.wav → generated BGM
input_video_final.mp4 → final output with BGM
```
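
The naming scheme above can be derived from the input path with `pathlib`. This is a sketch only: `output_paths` is a hypothetical helper, and it assumes the outputs always use the `.mp4` and `.wav` extensions shown in the listing, whatever the input extension is.

```python
from pathlib import Path


def output_paths(input_video: str) -> dict:
    """Derive sibling output paths from the input video path (hypothetical helper)."""
    src = Path(input_video).expanduser()
    stem = src.stem
    return {
        "speedup": src.with_name(f"{stem}_speedup.mp4"),  # sped up, no audio
        "bgm": src.with_name(f"{stem}_bgm.wav"),          # generated BGM
        "final": src.with_name(f"{stem}_final.mp4"),      # final output with BGM
    }


for label, path in output_paths("/tmp/demo/input_video.mp4").items():
    print(label, path)
```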
## Examples
```bash
# Basic: analyze and add BGM
/video-bgm ~/Desktop/product_video.mp4
# Speed up 10% and add BGM
/video-bgm ~/Desktop/product_video.mp4 --speed 1.1
# Override style (skip Gemini analysis)
/video-bgm ~/Desktop/ad.mp4 --style "upbeat modern pop, synth pads, 100 BPM"
# Adjust volume
/video-bgm ~/Desktop/quiet_video.mp4 --volume 8
```
---
## Author
**[Canlah AI](https://canlah.ai)** — Run performance marketing without breaking your brand.
- GitHub: [github.com/PHY041](https://github.com/PHY041)
- All Skills: [clawhub.ai/PHY041](https://clawhub.ai/PHY041)
Use Gemini's video understanding to analyze a video's mood, brand positioning, and emotional journey, then generate matching background music via fal.ai Lyria2, mix it in with FFmpeg (with optional speed adjustment), and output a final video with BGM. Use this skill when you need to add contextually appropriate background music to a video without manually composing or licensing tracks.

**Required parameters:**
- `path`: file path to the input video (MP4, MOV, or another FFmpeg-compatible format)

**Optional parameters:**
- `--speed`: speed multiplier for video playback (default 1.0; e.g. 1.1 = 10% faster, 0.9 = 10% slower)
- `--volume`: BGM volume multiplier (default 5.0, since Lyria2 output is quiet)
- `--fade-in`: fade-in duration in seconds (default 1.5)
- `--fade-out`: fade-out duration in seconds (default 3.0)
- `--style`: override the music style and skip Gemini analysis (optional string, e.g. "upbeat modern pop, synth pads, 100 BPM")

**External connections:**
- Gemini: `GOOGLE_GENAI_API_KEY` env var. Needs video understanding capability (Gemini 2.0 Flash or later). OAuth not required; API key auth only.
- fal.ai Lyria2: `FAL_API_KEY` env var. Music generation model at https://fal.run/fal-ai/lyria2. API key auth only.

**System dependencies:**
- FFmpeg (verify with `ffmpeg -version`)

**Step 1: Validate inputs and set up**
- Inputs: video file path, optional speed/volume/fade/style parameters.
- Check that the `GOOGLE_GENAI_API_KEY` and `FAL_API_KEY` env vars are set.
- Outputs: validated file path, parsed parameters (speed, volume, fade-in, fade-out, style override).


**Step 2: Analyze video with Gemini (unless style override provided)**
- Inputs: video file path, Gemini API key.
- Send the video with the analysis prompt, which asks for a final line starting with `MUSIC_PROMPT:`.
- Find the line starting with `MUSIC_PROMPT:`. Extract everything after that prefix as the music prompt string.
- Outputs: music prompt string extracted from the Gemini response. If the Gemini response is malformed or times out, exit with error.


**Step 3: Append constraints to music prompt**
- Inputs: music prompt string from step 2 (or from the style override).
- Append `No vocals, no drums, no percussion hits, no sound effects.` to the prompt.
- Outputs: final music prompt ready for Lyria2.


**Step 4: Generate BGM via fal.ai Lyria2**
- Inputs: final music prompt, fal API key, timeout 120 seconds.
- POST to `https://fal.run/fal-ai/lyria2` with JSON body `{"prompt": music_prompt}` and header `Authorization: Key {FAL_API_KEY}`.
- Read the audio URL from `response["audio"]["url"]`.
- Download the audio to `{input_stem}_bgm.wav`.
- Outputs: BGM WAV file on disk. If Lyria2 times out or rate-limits, retry once after a 5-second delay; if the second attempt fails, exit with error.

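
The retry-once rule for Lyria2 can be kept separate from the HTTP call itself. The sketch below is one possible shape, using only the standard library; `call_with_one_retry` and its `is_retryable` predicate are hypothetical names, and the caller decides which exceptions count as a timeout or rate limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_one_retry(request: Callable[[], T],
                        is_retryable: Callable[[Exception], bool],
                        delay_seconds: float = 5.0,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Run request(); on a retryable failure wait once, then try a second time."""
    try:
        return request()
    except Exception as exc:
        if not is_retryable(exc):
            raise  # e.g. an auth failure should not be retried
    sleep(delay_seconds)
    return request()  # a second failure propagates to the caller


# Demo with a fake request that times out on the first attempt only.
attempts = []


def fake_request() -> str:
    attempts.append(1)
    if len(attempts) == 1:
        raise TimeoutError("first attempt timed out")
    return "audio-url"


result = call_with_one_retry(fake_request,
                             is_retryable=lambda e: isinstance(e, TimeoutError),
                             sleep=lambda seconds: None)
print(result, len(attempts))
```

With `httpx`, the predicate would presumably check for `httpx.TimeoutException`, or for an `httpx.HTTPStatusError` whose `response.status_code` is 429, while letting a 401 raise immediately.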

**Step 5: Speed up video and strip existing audio (if speed != 1.0)**
- Inputs: input video path, speed multiplier, output path.
- Run `ffmpeg -y -i INPUT.mp4 -filter:v "setpts=PTS/{speed}" -an -c:v libx264 -preset medium -crf 18 OUTPUT_speedup.mp4`.
- Existing audio is stripped (`-an` flag).
- Save the result as `{input_stem}_speedup.mp4`.
- Outputs: sped-up video file (or the original video if speed == 1.0). If the FFmpeg encode fails or times out, exit with error.


**Step 6: Get duration of (possibly sped-up) video**
- Inputs: video file path (speedup or original).
- Run `ffprobe -v quiet -show_entries format=duration -of csv=p=0 {video_file}` to get the duration in seconds.
- Store the result as `duration`.
- Outputs: duration in seconds. If ffprobe fails, exit with error.


**Step 7: Mix BGM with video, apply volume/fade, trim to video length**
- Inputs: video file (speedup or original), BGM WAV file, volume multiplier, fade-in duration, fade-out duration, output path, video duration.
- Run `ffmpeg -y -i {video_file} -i {bgm_wav} -filter_complex "[1:a]volume={volume},atrim=0:{duration},afade=t=in:st=0:d={fade_in},afade=t=out:st={duration-fade_out}:d={fade_out}[a]" -map 0:v -map "[a]" -c:v copy -c:a aac -b:a 192k -shortest {output_file}`.
- Save the result as `{input_stem}_final.mp4`.
- Outputs: final video with BGM mixed in. If the FFmpeg mix fails, exit with error.


**Step 8: Report completion and open files**
- Inputs: final video path, BGM WAV path.
- Open the final video and the BGM file so the user can review them.
- Outputs: user can see/hear the result. If opening fails, log a warning but do not exit.


**Edge cases:**
- If the `--style` parameter is provided: skip step 2 (Gemini analysis) and use the provided style string directly as the music prompt. Jump to step 3 (append constraints).
- If speed == 1.0: skip step 5 (re-encode video) and use the original video directly in step 7. This avoids the encoding time and the quality loss of a re-encode.
- If the video file does not exist or is unreadable: exit with an error message stating the file path and suggesting the user check the path.
- If the `GOOGLE_GENAI_API_KEY` or `FAL_API_KEY` env vars are missing: exit with an error message listing which keys are missing and instructions to set them.
- If `ffmpeg` or `ffprobe` is not in the system path: exit with an error message and suggest the user install FFmpeg (`brew install ffmpeg` on macOS, `apt-get install ffmpeg` on Ubuntu).
- If the Gemini response does not contain a `MUSIC_PROMPT:` line: exit with an error stating the Gemini response was malformed. Suggest the user try again or provide an explicit style override.
- If the Lyria2 API returns 429 (rate limit) on the first attempt: retry once after a 5-second delay. If the second attempt is also rate-limited, exit with error.
- If the Lyria2 API returns 401 (auth failure): exit with an error stating `FAL_API_KEY` is invalid or expired.
- If an FFmpeg encode takes more than 5 minutes for a single video: log a warning and ask the user whether to cancel. Continue if the user approves.
- If the video is shorter than the fade-out duration: adjust fade-out to (duration - 0.5) so the fade does not exceed the video length.
- If the BGM file is empty or corrupt after download: exit with error and suggest the user retry Lyria2 generation.

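
Several of the checks above are pure functions of their inputs and are easy to sketch. The helper names below are hypothetical. The `env` and `which` parameters exist only so the checks can be exercised without touching the real environment, and flooring the adjusted fade-out at zero for clips shorter than 0.5 s is an assumption not stated in this document.

```python
import os
import shutil

REQUIRED_KEYS = ("GOOGLE_GENAI_API_KEY", "FAL_API_KEY")
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def missing_api_keys(env=None) -> list[str]:
    """List required API key variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def missing_tools(which=shutil.which) -> list[str]:
    """List required executables that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def clamp_fade_out(duration: float, fade_out: float) -> float:
    """Shorten the fade-out to (duration - 0.5) when the video is too short."""
    if duration < fade_out:
        return max(duration - 0.5, 0.0)  # assumption: never go negative
    return fade_out


print(missing_api_keys({"FAL_API_KEY": "example"}))
print(clamp_fade_out(duration=2.0, fade_out=3.0))
```

Running these before any API call means a missing key or tool is reported before time is spent on analysis or generation.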

**Success is defined by:**
- Final video file at `{input_stem}_final.mp4`, in the same directory as the input. Format is MP4 with H.264 video and AAC 192k audio; video duration is unchanged (or sped up/slowed down as requested), and the BGM is mixed in with fade-in/out applied.
- BGM WAV file at `{input_stem}_bgm.wav`. Format is WAV, 48kHz stereo, ~32 seconds long (generated by Lyria2).
- Intermediate file (if speed != 1.0): `{input_stem}_speedup.mp4` (sped-up video with no audio).
- No output if any validation or generation step fails. The original video and any partial intermediate files remain untouched.
- File naming: all output files are saved in the same directory as the input video.