creator · auto-generated · unproven

How do I turn a video (or topic) + a headshot into 3-5 on-brand, high-CTR YouTube thumbnails the way a top agency would: study the channel's winning style, pick the sharpest hook, generate a cinematic background, cut out the real person, and composite crisp bold text + badges as true layers?

the agent that answers this

Raw image to viral thumbnail. Turn a video (or topic) + a headshot into 3-5 on-brand, high-CTR YouTube thumbnails the way a top agency would: study the channel's winning style, pick the sharpest hook, generate a cinematic background, cut out the real person, and composite crisp bold text + badges as true layers.

1 on a schedule · Updated Jun 14 · v2

Cost: Free; on your own plan
Runs: On demand; or scheduled
Built from: 11 steps; 2 verified skills
Runs in: Claude or Codex; as you

what you get

3-5 finished 1280x720 thumbnails (background + real face cut-out + razor-sharp Impact text + badges/arrows), each a distinct hook, ready to pick and publish.

example result

illustrative

A finished run delivers 3-5 publish-ready 1280x720 thumbnails (background + real face cut-out + razor-sharp Impact text + badges/arrows), each with a distinct hook.

Example output, to show the shape of the deliverable. It has not run on your data. Build the agent to get this on yours.

the steps

11 steps · 2 from verified skills

Step 1 · tool
Preflight: make sure the tooling is ready. Free CLIs (auto-install if missing): Remotion (via npx), ffmpeg, and rembg via `uvx --from "rembg[cpu,cli]" rembg ...` (the u2net model downloads on first run; the [cli] extra alone fails, so use [cpu,cli]). Paid: a Runway ML API key (dev.runwayml.com). Notes: the gen4_image promptText must be condensed to about 500 characters or the request returns HTTP 400; the Impact font resolves on macOS Chromium for the text layer. If a paid key is missing, stop and tell the user what to get and what the cheaper alternative is.
Your model fills this step
- No integration? npm i remotion
- No integration? uvx --from "rembg[cpu,cli]" rembg
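The preflight in Step 1 can be sketched as a small readiness check. This is a minimal sketch, not the agent's actual code: the function names and the environment variable `RUNWAYML_API_SECRET` are assumptions (the step only says a Runway ML API key is needed), and the lookup function is injectable so the check can be exercised without the tools installed.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# CLIs the free part of the pipeline shells out to, per Step 1:
# npx runs Remotion, uvx runs rembg, ffmpeg grades the cut-out.
REQUIRED_CLIS = ("npx", "ffmpeg", "uvx")

# Assumed variable name; the step only says "Runway ML API key".
RUNWAY_KEY_ENV = "RUNWAYML_API_SECRET"


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> list:
    """Return the required CLIs that are not on PATH, in declaration order."""
    return [name for name in REQUIRED_CLIS if which(name) is None]


def preflight_report(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Summarize what is ready; the caller decides whether to stop or fall back."""
    missing = missing_tools(which)
    has_key = bool(env.get(RUNWAY_KEY_ENV, "").strip())
    return {
        "missing_clis": missing,
        "runway_key_present": has_key,
        "ready": not missing and has_key,
    }
```

A report with `runway_key_present` set to False is the point where the step says to stop and tell the user which key to get and which cheaper alternative exists.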
Step 2 · decision
Confirm the inputs: the source video (or topic) this thumbnail is for, the subject headshot to composite, and 2-3 of the channel's best existing thumbnails to match the house style.
Decision step
Step 3
Study the channel's existing thumbnails + this video's transcript/topic. Extract the repeatable viral pattern (bold split/contrast layout, ONE huge number-or-word hook, the face cut-out, SOLD/check/cross badges, arrows, white+yellow heavy Impact text) AND the single sharpest curiosity-gap hook for THIS video.
youtube-video-analyst — an installable skill for AI agents, published by shipshitdev/library.
Forensic deconstruction of YouTube videos to extract viral formulas, hooks, and retention mechanics. Analyzes video transcripts across 11 systematic sections: hook architecture, structural blueprint, retention mechanics, emotional…
Full skill: youtube-video-analyst
Step 4 · decision
Generate 3-5 distinct thumbnail concepts (each = layout + a 3-6 word hook + accent color), scored for click-through: curiosity gap, contrast/legibility at small size, emotional face, on-brand. Pick the top concepts to produce.
Decision step
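One way to make the Step 4 scoring concrete is a weighted sum over the four named criteria. This is a sketch under assumptions: the weights, the 0-10 rating scale, and the names `Concept`, `ctr_score` and `pick_top` are all hypothetical, since the step names the criteria but not how they are weighted.

```python
from dataclasses import dataclass

# Hypothetical weights; the step lists the criteria but gives no weighting.
WEIGHTS = {"curiosity": 0.4, "legibility": 0.3, "face": 0.2, "on_brand": 0.1}


@dataclass(frozen=True)
class Concept:
    layout: str
    hook: str
    accent_color: str
    scores: dict  # criterion name -> rating from 0 to 10 (assumed scale)


def hook_is_valid(hook: str) -> bool:
    """Step 4 asks for a 3-6 word hook."""
    return 3 <= len(hook.split()) <= 6


def ctr_score(concept: Concept) -> float:
    """Weighted sum of the four ratings; a missing rating counts as 0."""
    return sum(w * concept.scores.get(name, 0) for name, w in WEIGHTS.items())


def pick_top(concepts: list, keep: int = 3) -> list:
    """Drop concepts whose hook is out of range, then keep the highest scorers."""
    valid = [c for c in concepts if hook_is_valid(c.hook)]
    return sorted(valid, key=ctr_score, reverse=True)[:keep]
```

Filtering on hook length before ranking keeps a high-scoring but off-spec concept from crowding out a usable one.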
Step 5 · decision
PROMPT AGENT: for each chosen concept, write (a) the background image prompt — photoreal cinematic scene, NO text, NO faces, brand motif (e.g. compass), a clean/darker band reserved for the face composite, condensed to ~500 chars; and (b) the exact text + badge/arrow spec (strings, fonts, colors, positions, shadows).
Decision step
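The "condensed to ~500 chars" requirement in Step 5 can be enforced mechanically before any request is sent. A minimal sketch, assuming the hard constraints are appended as a fixed suffix; `condense_prompt` and the suffix wording are hypothetical, and trimming at a word boundary is one choice among several (a model-written summary would keep more meaning).

```python
MAX_PROMPT_CHARS = 500  # Step 1 notes that longer gen4_image prompts are rejected

# Constraints from Step 5 that must survive any trimming (assumed wording).
REQUIRED_SUFFIX = "No text, no faces."


def condense_prompt(scene: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Collapse whitespace and trim the scene so that the scene plus the
    required suffix fits within `limit` characters."""
    scene = " ".join(scene.split())
    if not scene:
        raise ValueError("scene description is empty")
    budget = limit - len(REQUIRED_SUFFIX) - 2  # 2 for the joining ". "
    if budget <= 0:
        raise ValueError("limit is too small to hold the required suffix")
    if len(scene) > budget:
        cut = scene[:budget]
        # Drop the last, possibly partial, word.
        if " " in cut:
            cut = cut.rsplit(" ", 1)[0]
        scene = cut
    scene = scene.rstrip(" .,;:")
    return f"{scene}. {REQUIRED_SUFFIX}"
```

Putting the "no text, no faces" constraint in a suffix means truncation can only ever cut scene detail, never the constraints.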
Step 6 · tool
Generate each background image via the Runway image API (gen4_image, 1920x1080, no text/no face) and download locally.
Your model fills this step
- No integration? Nano Banana / gemini_2.5_flash via Runway
- No integration? gen4_image_turbo with a brand reference image
- No integration? hand-built gradient/stock background
Step 7 · tool
Cut the subject headshot to a clean transparent PNG (rembg) and cache it for reuse on future thumbnails.
hyperframes-media: Asset preprocessing for HyperFrames compositions: text-to-speech narration (Kokoro), audio/video transcription (Whisper), and background removal for transparent overlays (u2net). Use when…
Full skill: Hyperframes Media
- No integration? rembg via uvx --from "rembg[cpu,cli]"
- No integration? remove.bg / Photoroom
- No integration? ffmpeg chromakey on a solid backdrop
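The "cut out once, cache for reuse" idea in Step 7 can be sketched by keying the cached PNG on the headshot's contents. This is an illustration, not the skill's implementation: the helper names and the cache layout are hypothetical, and the `i` (single-file) subcommand is taken from rembg's CLI rather than from this page, so check it against the installed version.

```python
import hashlib
from pathlib import Path


def cutout_cache_path(headshot_bytes: bytes, cache_dir: Path) -> Path:
    """Key the cached PNG by a hash of the headshot's contents, so the same
    photo maps to the same cut-out even if the file is renamed."""
    digest = hashlib.sha256(headshot_bytes).hexdigest()[:16]
    return cache_dir / f"cutout-{digest}.png"


def rembg_command(src: Path, dst: Path) -> list:
    """Argument list for rembg run through uvx, using the extras named in Step 1."""
    return ["uvx", "--from", "rembg[cpu,cli]", "rembg", "i", str(src), str(dst)]


def plan_cutout(src: Path, cache_dir: Path):
    """Return (output_path, command); command is None on a cache hit."""
    dst = cutout_cache_path(src.read_bytes(), cache_dir)
    if dst.exists():
        return dst, None
    return dst, rembg_command(src, dst)
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; a `None` command means the cached cut-out is reused as-is.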
Step 8 · tool
FACE TOUCH-UP — always produce BOTH treatments and carry both forward: (a) FREE ffmpeg color-grade on the cut-out (contrast + warmth + saturation + sharpen, preserve alpha) — identity-perfect, $0; (b) Nano Banana (gemini_2.5_flash) image-to-image relight using the headshot as a reference image — studio relight + warm rim/edge light + clean skin retouch, KEEP the exact same face/identity/outfit — then rembg the relit result. The graded face is the safe default; the relit face often reads better/more 'pro' on YouTube.
Your model fills this step
- No integration? if Runway is unavailable or the relit likeness drifts, ship the free-graded face only
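Treatment (a), the free grade that preserves alpha, might look like the ffmpeg invocation built below. Treat it as an untested sketch: the filter choice (split off the alpha, grade the color copy with `colorbalance`, `eq` and `unsharp`, then `alphamerge` the mask back) is one common approach rather than the agent's confirmed method, and every numeric value is a placeholder to tune by eye.

```python
def grade_filtergraph(contrast: float = 1.08, saturation: float = 1.15,
                      warmth: float = 0.04, sharpen: float = 0.6) -> str:
    """Grade the color copy only, then reattach the untouched alpha mask."""
    return (
        # Keep one copy for color work and one as the source of the mask.
        "[0:v]format=rgba,split[color][mask];"
        "[mask]alphaextract[alpha];"
        # Warmth: nudge midtone red up and midtone blue down.
        f"[color]colorbalance=rm={warmth}:bm={-warmth},"
        f"eq=contrast={contrast}:saturation={saturation},"
        f"unsharp=5:5:{sharpen}[graded];"
        "[graded][alpha]alphamerge[out]"
    )


def grade_command(src: str, dst: str, **settings) -> list:
    """Argument list for ffmpeg; dst should be a .png so alpha is kept."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-filter_complex", grade_filtergraph(**settings),
        "-map", "[out]", "-frames:v", "1", dst,
    ]
```

Grading a separate copy and merging the original mask back is what keeps the cut-out edge intact, since some color filters drop the alpha channel when they convert pixel formats.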
Step 9 · tool
Composite in Remotion (1280x720 still): background + face cut-out + bold Impact text (white + yellow, heavy drop shadow) + badges/arrows as crisp layers. Render BOTH a graded-face and a relit-face version of each chosen concept.
Your model fills this step
- No integration? Pillow/Canvas or HTML-to-screenshot if Remotion is unavailable
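Because Step 9 renders every chosen concept in both face treatments, the render queue is a simple cross product. A sketch with hypothetical names (`render_jobs`, `slug`, and the output file pattern are illustrative); the layer order and the 1280x720 size come from the step itself.

```python
from itertools import product

TREATMENTS = ("graded", "relit")  # the two face treatments from Step 8
CANVAS = (1280, 720)              # still size from Step 9

# Bottom-to-top layer order described in Step 9.
LAYER_ORDER = ("background", "face_cutout", "text", "badges_arrows")


def slug(text: str) -> str:
    """Lowercase, hyphen-separated file-name fragment."""
    words = "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()
    return "-".join(words)


def render_jobs(hooks: list, relit_available: bool = True) -> list:
    """One job per (hook, treatment); graded-only when the relit face is unusable."""
    treatments = TREATMENTS if relit_available else TREATMENTS[:1]
    return [
        {
            "hook": hook,
            "treatment": treatment,
            "size": CANVAS,
            "layers": LAYER_ORDER,
            "output": f"{slug(hook)}--{treatment}.png",
        }
        for hook, treatment in product(hooks, treatments)
    ]
```

Passing `relit_available=False` mirrors the Step 8 fallback of shipping only the free-graded face when the relit likeness drifts.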
Step 10 · decision
Quality gate: critique each thumbnail as a domain expert, a skeptic, and the end user — legible at small size, on-brand, real likeness preserved (especially the relit face), text crisp. Fix clear problems in place.
Decision step
Step 11 · decision
Present the variants side by side — each hook in BOTH the free-graded and the AI-relit face treatment — for the creator to pick and tweak (hook/colors/face pose/badges) before publishing. No auto-publish.
Decision step

runs hands-free with

Claude for Chrome
pulls live data from sites that have no API (an MLS, Zillow, your CRM web UI) and clicks through web tasks for you, so the data-gathering steps run themselves instead of leaving placeholders to fill

Connect these and the agent gathers its own data and delivers on a schedule, instead of leaving you blanks to fill.

Run this agent in Implexa, on your own Claude or Codex, free

Get the Implexa app (or connect your Claude Code / Codex), then say "build the Raw image to viral thumbnail agent" and approve the schedule. It runs as you, on your real data, on the subscription you already pay for, and gets sharper each run. Your agent's memory is yours and travels with you across Claude, Codex, and whatever comes next. About 5 minutes to your first real run.


common questions

How do I turn a video (or topic) + a headshot into 3-5 on-brand, high-CTR YouTube thumbnails the way a top agency would: study the channel's winning style, pick the sharpest hook, generate a cinematic background, cut out the real person, and composite crisp bold text + badges as true layers?

Use the Raw image to viral thumbnail agent. It delivers 3-5 finished 1280x720 thumbnails (background + real face cut-out + razor-sharp Impact text + badges/arrows), each with a distinct hook, ready to pick and publish.

Is the Raw image to viral thumbnail agent free?

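Yes. It runs on the Claude or Codex subscription you already pay for, so Implexa adds no extra AI bill and no per-run charge. You can build and run unlimited agents on the free plan. The Runway steps (background generation and the relit face) need your own paid Runway ML API key unless you use the free fallbacks listed in those steps.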

How often does the Raw image to viral thumbnail agent run?

You choose: run it on demand, or put it on a schedule (hourly, daily, weekly). Once scheduled it runs unattended, as you, on your own machine.

What does the Raw image to viral thumbnail agent need to run?

Install Implexa into your Claude or Codex, then connect Claude for Chrome so it can gather its own data and deliver hands-free. Implexa never touches your accounts or credentials.

Does the Raw image to viral thumbnail agent use my data? Is it private?

It runs as you, on your own machine, on your real data. The model runs inside your own Claude or Codex, so Implexa never sees your data, accounts, or credentials. Your agent's memory is yours and travels with you across Claude, Codex, and whatever comes next.

How do I build the Raw image to viral thumbnail agent?

Install Implexa into your Claude or Codex, then say "build the Raw image to viral thumbnail agent" and approve the schedule. Implexa assembles the 11 steps (2 from verified skills) and it runs on its own. About 5 minutes to your first real run.

Can I change what the Raw image to viral thumbnail agent does?

Yes. Tell it what to change in plain language and it revises its steps; the next scheduled run uses the change, with no re-scheduling. Every change is versioned, and a run can even propose its own improvements.

changelog

v2 · Jun 14 · manual
added 1 step; rebound step 1 (skills.sh/firebase-basics -> gap), step 9 (decision -> gap)
v1 · Jun 14 · generated
auto-generated from "Turn a video (or topic) + a headshot into 3-5 on-brand, high-CTR YouTube thumbna"

Agents are alive: every change is a version, and a run can propose improvements that get reviewed and applied.