Extract a clean plain-text transcript from existing YouTube captions — native Node.js, zero npm dependencies. Use when the user asks to summarize, quote, or...
--- name: youtube-transcript-native-node description: Extract a clean plain-text transcript from existing YouTube captions — native Node.js, zero npm dependencies. Use when the user asks to summarize, quote, or extract captions/transcript text from a YouTube URL. Wraps the `yt-dlp` binary on PATH; writes subtitles to a temp dir, parses .vtt captions, strips timestamps/HTML tags, and prints clean text or JSON. No API keys required. --- # YouTube Transcript (Native Node) Version: 1.1.1 / publishable utility. Minimal YouTube caption extractor. Native Node.js, zero npm dependencies, wraps the external `yt-dlp` binary. ## Risk / invocation class Risk class: **external binary wrapper / YouTube network access / third-party content**. Use deliberately. This skill does not call a web API directly, but `yt-dlp` talks to YouTube and the local environment owns the `yt-dlp` PATH/binary supply-chain trust boundary. ## Input packet Required: - `url`: full YouTube URL from the user. - `goal`: raw transcript, summary input, quote extraction, timestamped notes, or JSON handoff. - `privacy_sensitivity`: normal, private/client, or unknown. - `language`: default `en` unless another language is requested. Optional: - `timestamps`: needed or not. - `json`: needed for downstream tool use. - `dedup_preference`: default auto-caption rolling-window dedup, or `--no-dedup` to preserve rolling-window/repeated-phrase artifacts as much as possible. Exact consecutive duplicate cue text may still be collapsed during VTT parsing. - `output_destination`: chat summary, saved file, downstream summarizer, etc. Stop or ask before use if the video/context is private or client-sensitive and sending access to YouTube via `yt-dlp` is not appropriate. ## Output packet Return compactly: - source YouTube URL - language requested and whether timestamps/JSON were used - transcript status: success, no captions, dependency missing, private/blocked/rate-limited, or failed - whether captions appear auto-generated when known - saved path if the transcript was separately written to a file - concise transcript summary or excerpt, unless the user requested raw text - caveats and next safe step ## Security behavior - Accepts only `http(s)` YouTube URLs on `youtube.com`, `www.youtube.com`, `m.youtube.com`, or `youtu.be`. - Validates `--lang` as a simple subtitle language code before invoking `yt-dlp`. - Spawns `yt-dlp` with an argv array and no shell; it does not execute user-provided commands. - Bounds the subprocess with a 120-second timeout. - Creates and removes a temporary subtitle directory under the OS temp path. - Refuses to print transcripts larger than 2,000,000 characters. - Reads no API keys, env secrets, or credential/config files. - Static-analysis `child_process` warnings are expected because this skill intentionally wraps trusted `yt-dlp`. ## When to use Use this when: - the user provides a YouTube URL and wants spoken text/captions; - clean plain text is needed for summarization, search, or quoting; - the video has creator-uploaded subtitles or auto-generated captions. Do not use this when: - the user expects actual audio transcription; this extracts existing captions only; - the platform is not YouTube; - the video is a live stream that has not ended; - the video/content is privacy-sensitive and should not be accessed via YouTube/yt-dlp; - `yt-dlp` is not installed/on PATH and installing it has not been approved. ## Commands Script: `scripts/fetch.mjs` ```powershell node "<skill-dir>\scripts\fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" node "<skill-dir>\scripts\fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --lang es node "<skill-dir>\scripts\fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps node "<skill-dir>\scripts\fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --json node "<skill-dir>\scripts\fetch.mjs" --help ``` For all flags, dedup details, output formats, dependency notes, and troubleshooting, load `references/youtube-transcript-contract.md`. ## Operating guidance - Pass the full user-provided YouTube URL; do not invent/transform URL forms unnecessarily. - Default to `--lang en` unless another language is clear. - Use default plain text for direct human reading and summaries. - Use `--json` as the default structured handoff for research triage, summarization, and downstream tooling. - Use `--timestamps` only when timestamped notes, quote traceability, or debugging are needed; it is an advanced/evidence mode, not the recommended default for reading. - Use `--json --timestamps` only for machine traceability workflows that need timestamp anchors inside JSON; it is not intended as a human-readable inspection format. - Save long transcripts to a file when useful; do not paste giant transcripts unless requested. - Summarize first and quote sparingly by default. - Respect copyright and platform terms; do not republish long/full transcripts unless the user has rights or permission. - Note that captions may be auto-generated and imperfect. ## Required checks before publishing/updating Minimum no-video/no-network checks: ```powershell node --check skills\youtube-transcript-native-node\scripts\fetch.mjs node skills\youtube-transcript-native-node\scripts\fetch.mjs --help node skills\youtube-transcript-native-node\scripts\fetch.mjs --url "https://example.com/watch?v=not-youtube" --json ``` The invalid-host smoke should fail before invoking `yt-dlp`. Optional environment check: ```powershell yt-dlp --version ``` Do not install/update `yt-dlp` as part of this skill without explicit approval. ## Public / ClawHub exposure Classification: **publishable utility with external binary + YouTube access**. Before public update, run sanitizer/static checks and ensure docs clearly disclose: - `yt-dlp` dependency and PATH/binary trust boundary; - YouTube-only URL allowlist; - no API keys/env secrets/config reads; - temp-directory behavior; - no audio/video download and no audio transcription; - expected `child_process` static-analysis warning. Respect copyright and platform terms in examples, docs, and outputs: prefer summaries and brief quotes; do not publish long/full third-party transcripts unless rights or permission are clear. Do not include private/internal/client strategy, operator-specific operational notes, or full third-party transcript samples in a public release. _Last reviewed: 2026-05-25_
don't have the plugin yet? install it then click "run inline in claude" again.
by @clawhub
restructured into implexa six-component format, extracted implicit decision logic for privacy gating and error handling, documented yt-dlp as external dependency with setup notes, added edge cases for rate limits, auth failures, empty captions, and output size limits, preserved original procedure faithfully.
Extract a clean plain-text transcript from existing YouTube captions using native Node.js with zero npm dependencies. Wraps the yt-dlp binary on PATH, writes subtitles to a temp dir, parses .vtt captions, strips timestamps and HTML tags, and outputs clean text or JSON.
Use this skill when the user provides a YouTube URL and needs spoken text or captions extracted for summarization, quoting, searching, or downstream processing. The skill handles creator-uploaded subtitles and auto-generated captions without requiring API keys. It does not perform actual audio transcription; it extracts only existing captions already published on the video. Use it for plain-text summaries, research triage with JSON output, timestamped notes, or quote extraction.
Required:
url: full YouTube URL from user (http(s) only; youtube.com, www.youtube.com, m.youtube.com, or youtu.be hosts only).goal: intended use case (raw transcript, summary input, quote extraction, timestamped notes, or JSON handoff for downstream tools).Optional:
language: subtitle language code (defaults to en). validated as a simple language code before execution.timestamps: boolean; include timecodes in output. default false. use only when timestamped notes, quote traceability, or debugging are needed.json: boolean; output structured JSON instead of plain text. default false. use as default structured handoff for research triage and downstream tooling.dedup_preference: string; auto (default, rolling-window dedup) or --no-dedup (preserve repeated-phrase artifacts). exact consecutive duplicate cue text may still collapse during VTT parsing.output_destination: string; chat summary, saved file, downstream summarizer, etc.privacy_sensitivity: enum; normal, private/client, or unknown. use to gate execution on sensitive content.External dependency:
yt-dlp binary on PATH. not installed or updated by this skill; must be pre-installed with explicit approval. see "Troubleshooting" below if missing.validate and parse the YouTube URL. accept only http(s) scheme. reject non-YouTube hosts (youtube.com, www.youtube.com, m.youtube.com, youtu.be only). extract video ID. output: validated URL string or validation error.
check privacy sensitivity. if privacy_sensitivity is private/client or unknown, ask user explicitly: "this will send the URL to YouTube via yt-dlp; do you approve?" stop if user declines. output: approval signal or rejection.
validate language code. if language provided, check it matches a simple subtitle language pattern (e.g., en, es, fr, zh-CN). reject invalid codes. output: validated language string or error.
create temporary subtitle directory. use OS temp path. output: temp directory path string.
invoke yt-dlp subprocess. construct argv array with --write-auto-subs, --skip-download, --sub-format vtt, --lang {language}, and --output {temp-dir}/%(id)s. spawn with 120-second timeout. do not use shell. output: subprocess exit code, stdout, stderr.
handle subprocess failures. if exit code nonzero, check stderr for known patterns: "private video", "video not found", "rate limited", "network error", or missing yt-dlp binary. map to human-readable status. output: error status string or transcript file path.
parse .vtt file. read generated subtitle file from temp dir. split on WEBVTT header and cue blocks. extract cue text (skip timecode lines and HTML tags like <v>, <c>, </c>). output: array of text strings or parse error.
apply dedup logic. if dedup_preference is auto, use rolling-window dedup to remove consecutive duplicate cue text. if --no-dedup, preserve rolling-window repeats. output: deduplicated or raw cue array.
enforce output size limit. if joined transcript exceeds 2,000,000 characters, truncate and flag in output. output: truncated transcript string or flag.
format output. if json is true, build object with keys: url, language, timestamps, captions_auto_generated (boolean if detectable), transcript (string or array of cue objects with text and optional start_time, end_time), status, saved_path (if written to file). if json is false and timestamps is false, return plain text only. if timestamps is true, prepend timecodes to each line. output: formatted string or JSON object.
clean up temp directory. delete temp subtitle directory and all contents. output: success or cleanup warning.
return result packet. include source URL, language used, transcript status (success, no captions, dependency missing, private/blocked, rate-limited, or failed), whether captions are auto-generated, saved path if applicable, concise summary or excerpt (unless raw text requested), and any caveats. output: result packet object or string.
if privacy_sensitivity is private/client or unknown: ask user for explicit approval to access via yt-dlp before proceeding. if user declines, return early with "user declined privacy gate" status.
if yt-dlp binary is not on PATH: check stderr for "command not found" or "yt-dlp is not recognized". return status "dependency missing" with guidance: "install yt-dlp or add to PATH; do not auto-install." do not invoke skill further.
if subprocess exits nonzero: parse stderr for "private video" (return status "private/blocked"), "video not found" (return status "no captions"), "HTTP Error 429" (return status "rate-limited"), network timeout (return status "network timeout"), or generic error. return status "failed" with brief error detail.
if no .vtt file is generated: return status "no captions available" (video has no subtitles). do not fail hard; report clearly.
if json is true and timestamps is true: output transcript as array of cue objects with text, start_time (HH:MM:SS format), end_time fields. this is for machine traceability workflows only, not human-readable inspection.
if transcript exceeds 2,000,000 characters: truncate at 2,000,000 boundary and include flag "truncated": true in output. prompt user to save to file instead of pasting.
if user goal is "raw transcript": return full plain text (no summary). if goal is "summary" or "quotes", return summary or excerpts only by default, not full transcript.
Success case:
--json flag used) or plain text string.{
"url": "https://www.youtube.com/watch?v=...",
"video_id": "...",
"language": "en",
"timestamps_included": false,
"captions_auto_generated": true,
"transcript": "clean text here...",
"status": "success",
"saved_path": null,
"character_count": 5432,
"truncated": false,
"caveats": "Auto-generated captions may contain errors."
}
--timestamps, transcript field becomes array:"transcript": [
{ "text": "hello world", "start_time": "00:00:05", "end_time": "00:00:07" },
...
]
--json): returns string with optional timecode prefixes per line.Failure cases:
no captions, private/blocked, rate-limited, network timeout, dependency missing, or failed.node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID"
node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --lang es
node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps
node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --json
node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --json --timestamps
node "<skill-dir>/scripts/fetch.mjs" --help
--lang en unless another language is explicit.--json as default structured handoff for research triage, summarization, and downstream tooling.--timestamps only for timestamped notes, quote traceability, or debugging; it is not the recommended default for reading.--json --timestamps only for machine traceability workflows that need timestamp anchors; not intended as human-readable format.yt-dlp as part of this skill without explicit approval.--lang as a simple subtitle language code before invoking yt-dlp.yt-dlp with an argv array and no shell; does not execute user-provided commands.child_process warnings are expected; this skill intentionally wraps trusted yt-dlp."yt-dlp: command not found" or "is not recognized": yt-dlp is not on PATH. install via pip install yt-dlp (or package manager for your OS) and ensure it is in your PATH. this skill does not auto-install.
"private video" or "video not found": video is private, deleted, or does not exist. check URL and user access; there is no fallback.
"HTTP Error 429": rate-limited by YouTube. wait 30 minutes to 1 hour and retry. do not hammer the API.
"network error" or timeout after 120 seconds: network issue or YouTube is unreachable. check internet connection and retry. if persistent, YouTube may be blocking yt-dlp temporarily.
no .vtt file generated: video has no subtitles (creator did not upload, and auto-captions are disabled or unavailable). return status "no captions available" and stop gracefully.
transcript exceeds 2,000,000 characters: transcript is truncated and flagged. prompt user to save to file with --output-file or similar flag if available.
"invalid language code": language parameter does not match expected format. default to en or ask user for clarification.
minimal no-video/no-network checks:
node --check scripts/fetch.mjs
node scripts/fetch.mjs --help
node scripts/fetch.mjs --url "https://example.com/watch?v=not-youtube" --json
the invalid-host smoke should fail before invoking yt-dlp.
optional environment check:
yt-dlp --version
classification: publishable utility with external binary and YouTube access.
before public update, run sanitizer/static checks and ensure docs clearly disclose:
yt-dlp dependency and PATH/binary trust boundary.child_process static-analysis warning.respect copyright and platform terms in examples, docs, and outputs: prefer summaries and brief quotes; do not publish long or full third-party transcripts unless rights or permission are clear.
do not include private, internal, operator-specific strategy, or full third-party transcript samples in a public release.
credits: original skill from clawhub. native node.js wrapper around yt-dlp.