YouTube Transcript Native Node

Extract a clean plain-text transcript from existing YouTube captions - native Node.js, zero npm dependencies. Use when the user asks to summarize, quote, or...

installs: 974

SkillRank score: 8.3 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

youtube-transcript-native-node extracts clean plain-text captions from youtube videos using the yt-dlp binary, with no npm dependencies or api keys. supports multiple languages, optional timestamps, and json output for downstream processing.

structure: 9.0
trigger phrases: 8.0
procedure: 8.0
edge cases: 8.0
documentation: 9.0

original SKILL.md from ClawHub

---
name: youtube-transcript-native-node
description: Extract a clean plain-text transcript from existing YouTube captions - native Node.js, zero npm dependencies. Use when the user asks to summarize, quote, or extract captions/transcript text from a YouTube URL. Wraps the `yt-dlp` binary on PATH; writes subtitles to a temp dir, parses .vtt captions, strips timestamps/HTML tags, and prints clean text or JSON. No API keys required.
version: 1.1.4
risk_class: external-binary-youtube-network-third-party-content
---

# YouTube Transcript (Native Node)

Version: 1.1.4 / publishable utility.

Minimal YouTube caption extractor. Native Node.js, zero npm dependencies, wraps the external `yt-dlp` binary.

## Risk / invocation class

Risk class: **external binary wrapper / YouTube network access / third-party content**.

Use deliberately. This skill does not call a web API directly, but `yt-dlp` talks to YouTube and the local environment owns the `yt-dlp` PATH/binary supply-chain trust boundary.

## Input packet

Required:

- `url`: full YouTube URL from the user.
- `goal`: raw transcript, summary input, quote extraction, timestamped notes, or JSON handoff.
- `privacy_sensitivity`: normal, private/client, or unknown.
- `language`: default `en` unless another language is requested.

Optional:

- `timestamps`: needed or not.
- `json`: needed for downstream tool use.
- `dedup_preference`: default auto-caption rolling-window dedup, or `--no-dedup` to preserve rolling-window/repeated-phrase artifacts as much as possible. Exact consecutive duplicate cue text may still be collapsed during VTT parsing.
- `output_destination`: chat summary, saved file, downstream summarizer, etc.

Stop or ask before use if the video/context is private or client-sensitive and accessing it on YouTube via `yt-dlp` is not appropriate.

## Output packet

Return compactly:

- source YouTube URL
- language requested and whether timestamps/JSON were used
- transcript status: success, no captions, dependency missing, private/blocked/rate-limited, or failed
- whether captions appear auto-generated when known
- saved path if the transcript was separately written to a file
- concise transcript summary or excerpt, unless the user requested raw text
- caveats and next safe step

## Security behavior

- Accepts only `http(s)` YouTube URLs on `youtube.com`, `www.youtube.com`, `m.youtube.com`, or `youtu.be`.
- Validates `--lang` as a simple subtitle language code before invoking `yt-dlp`.
- Spawns `yt-dlp` with an argv array and no shell; it does not execute user-provided commands.
- Bounds the subprocess with a 120-second timeout.
- Creates and removes a temporary subtitle directory under the OS temp path.
- Refuses to print transcripts larger than 2,000,000 characters.
- Reads no API keys, env secrets, or credential/config files. Offline regression hooks are inert unless `YOUTUBE_TRANSCRIPT_SELFTEST=1` is set by `scripts/self-test.mjs`; do not set self-test hooks for normal transcript extraction.
- Passes `--ignore-config` so user-level `yt-dlp` config does not silently alter wrapper behavior.
- Static-analysis `child_process` warnings are expected because this skill intentionally wraps trusted `yt-dlp`.
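
The URL allowlist and language check described above can be sketched as two small pure functions. This is a minimal sketch, not the code in `scripts/fetch.mjs`: the names `isAllowedYouTubeUrl`, `isSimpleLangCode`, and the exact language pattern are assumptions made for illustration.

```javascript
// Hosts copied from the allowlist above. Helper names are hypothetical.
const ALLOWED_HOSTS = new Set([
  'youtube.com',
  'www.youtube.com',
  'm.youtube.com',
  'youtu.be',
]);

function isAllowedYouTubeUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return false; // not a parseable absolute URL
  }
  const schemeOk = parsed.protocol === 'http:' || parsed.protocol === 'https:';
  // URL() lowercases the hostname, so an exact set lookup is enough.
  return schemeOk && ALLOWED_HOSTS.has(parsed.hostname);
}

// Assumed shape of a "simple subtitle language code": 2-3 letters plus
// optional short subtags such as "pt-BR". The real script may differ.
const LANG_RE = /^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$/;

function isSimpleLangCode(lang) {
  return typeof lang === 'string' && LANG_RE.test(lang);
}
```

Matching the parsed hostname exactly, rather than searching the raw string, is what rejects look-alike hosts such as `youtube.com.example.org`.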

## When to use

Use this when:

- the user provides a YouTube URL and wants spoken text/captions;
- clean plain text is needed for summarization, search, or quoting;
- the video has creator-uploaded subtitles or auto-generated captions.

Do not use this when:

- the user expects actual audio transcription; this extracts existing captions only;
- the platform is not YouTube;
- the video is a live stream that has not ended;
- the video/content is privacy-sensitive and should not be accessed via YouTube/yt-dlp;
- `yt-dlp` is not installed/on PATH and installing it has not been approved.

## Commands

Script: `scripts/fetch.mjs`

```powershell
node "<skill-dir>\scripts\fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID"
node "<skill-dir>\scripts\fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --lang es
node "<skill-dir>\scripts\fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps
node "<skill-dir>\scripts\fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --json
node "<skill-dir>\scripts\fetch.mjs" --help
```

For all flags, dedup details, output formats, dependency notes, and troubleshooting, load `references/youtube-transcript-contract.md`.

## Operating guidance

- Pass the full user-provided YouTube URL; do not invent/transform URL forms unnecessarily.
- Default to `--lang en` unless another language is clear.
- Use default plain text for direct human reading and summaries.
- Use `--json` as the default structured handoff for research triage, summarization, and downstream tooling.
- Use `--timestamps` only when timestamped notes, quote traceability, or debugging are needed; it is an advanced/evidence mode, not the recommended default for reading.
- Use `--json --timestamps` only for machine traceability workflows that need timestamp anchors inside JSON; it is not intended as a human-readable inspection format.
- Save long transcripts to a file when useful; do not paste giant transcripts unless requested.
- Summarize first and quote sparingly by default.
- Respect copyright and platform terms; do not republish long/full transcripts unless the user has rights or permission.
- Note that captions may be auto-generated and imperfect.

## Required checks before publishing/updating

Minimum no-video/no-network checks:

```powershell
node --check skills\youtube-transcript-native-node\scripts\fetch.mjs
node skills\youtube-transcript-native-node\scripts\self-test.mjs
node skills\youtube-transcript-native-node\scripts\fetch.mjs --help
node skills\youtube-transcript-native-node\scripts\fetch.mjs --url "https://example.com/watch?v=not-youtube" --json
```

The invalid-host smoke should fail before invoking `yt-dlp`.

Optional environment check:

```powershell
yt-dlp --version
```

Do not install/update `yt-dlp` as part of this skill without explicit approval.

## Public registry exposure

Classification: **publishable utility with external binary + YouTube access**.

Before public update, run sanitizer/static checks and ensure docs clearly disclose:

- `yt-dlp` dependency and PATH/binary trust boundary;
- YouTube-only URL allowlist;
- no API keys/env secrets/config reads;
- temp-directory behavior and stderr temp-path scrubbing;
- no audio/video download and no audio transcription;
- expected `child_process` static-analysis warning;
- best-effort scrub of temp- and home-directory paths from the last lines of `yt-dlp` stderr; unrelated absolute paths emitted by `yt-dlp` itself may remain.
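
The best-effort stderr scrub in the last item could look roughly like the sketch below. The helper name `scrubStderrTail`, the `<path>` placeholder, and the five-line tail are assumptions; the skill's actual scrubbing rules are not shown in this document.

```javascript
// Hypothetical helper: keep only the last few stderr lines and replace
// known directory prefixes with a placeholder.
function scrubStderrTail(stderr, dirs, maxLines = 5) {
  const tail = stderr.trim().split(/\r?\n/).slice(-maxLines).join('\n');
  return dirs
    .filter(Boolean)
    .sort((a, b) => b.length - a.length) // longest first, so nested paths go whole
    .reduce((text, dir) => text.split(dir).join('<path>'), tail);
}
```

A caller would typically pass the temp directory it created plus `os.tmpdir()` and `os.homedir()`. Paths outside those prefixes are left untouched, which matches the "may remain" caveat above.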

Respect copyright and platform terms in examples, docs, and outputs: prefer summaries and brief quotes; do not publish long/full third-party transcripts unless rights or permission are clear.

Do not include private/internal/client strategy, operator-specific operational notes, or full third-party transcript samples in a public release.

## Changelog

- `1.1.4`: ClawHub publication/version refresh after public-readiness review; no runtime behavior change.
- `1.1.3`: Add stubbed offline yt-dlp fixture tests for dependency-missing, nonzero-exit-with-VTT, 429 hint, temp/home path scrubbing, output-size guard, timeout, and output modes; gate self-test hooks behind `YOUTUBE_TRANSCRIPT_SELFTEST=1`; continue when usable VTT subtitles are produced despite nonzero yt-dlp exit; kill active yt-dlp child on SIGINT/SIGTERM; broaden local-path scrubbing and scrub unexpected/read-error paths.
- `1.1.2`: Add offline self-test fixtures, export parser/allowlist helpers for tests, pass `--ignore-config`, remove subtitle conversion postprocessor to avoid ffmpeg ambiguity, scrub temp path from yt-dlp error tails, and surface 429 retry guidance.
- `1.1.1`: Public docs cleanup: normalized input/output packet wording, structured handoff wording, and changelog language; no runtime behavior change.


reorganized into implexa's six required components, expanded decision points to cover all error branches (rate-limit, blocked, privacy confirmation, dependency missing, size guards), documented yt-dlp binary as external dependency with installation guidance, added edge cases (nonzero exit with valid vtt, 429 handling, output size limits, timeout), preserved all original procedure steps faithfully with explicit inputs and outputs, clarified output contract with structured metadata fields, and defined outcome signals for success and failure conditions.

YouTube Transcript Native Node

Item: YouTube Transcript Native Node
Rating: 8.3
Author: Implexa

Extract a clean plain-text transcript from existing YouTube captions. native Node.js, zero npm dependencies, wraps the yt-dlp binary on PATH. writes subtitles to a temp dir, parses .vtt captions, strips timestamps and HTML tags, outputs clean text or JSON. no API keys required.

intent

use this skill when a user provides a YouTube URL and wants the spoken text from existing captions. extracts captions only (not audio transcription). returns clean plain text for summaries, quotes, and extraction tasks. runs natively on Node.js without external npm packages. wraps yt-dlp as a trusted external binary. use when caption text is needed for summarization, search, quoting, or downstream tooling.

inputs

required:

url: full YouTube URL (youtube.com, www.youtube.com, m.youtube.com, or youtu.be domains only). validated before invocation.
language: language code for subtitle request. defaults to en. validated as a simple subtitle language code before spawning yt-dlp.

optional:

timestamps: boolean. include timestamp markers in output. defaults to false. used for timestamped notes, quote traceability, or debugging.
json: boolean. return structured JSON instead of plain text. defaults to false. use for downstream tooling, research triage, and summarization pipelines.
dedup_preference: string. auto (default) collapses rolling-window duplicate caption text. --no-dedup preserves rolling-window artifacts (exact consecutive duplicate cue text may still collapse during VTT parsing).
output_destination: string. hint for where result goes: chat summary, saved file, downstream summarizer, etc. informational only.
privacy_sensitivity: string. normal, private/client, or unknown. if private/client, confirm with user before accessing YouTube.

external dependency:

yt-dlp binary on PATH. required. install via package manager or download from https://github.com/yt-dlp/yt-dlp/releases. this skill does not install it. set PATH environment variable to include the yt-dlp directory.
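
a minimal sketch of the dependency check, assuming it is done by running the binary with `--version` under a 10-second timeout as the procedure describes. the helper name `checkBinary` and its return shape are hypothetical, not taken from the skill's script.

```javascript
import { spawnSync } from 'node:child_process';

// Hypothetical helper. `binary` is resolved via PATH; no shell is involved.
function checkBinary(binary = 'yt-dlp', timeoutMs = 10_000) {
  const result = spawnSync(binary, ['--version'], {
    encoding: 'utf8',
    timeout: timeoutMs,
    shell: false,
    windowsHide: true,
  });
  // result.error is set when the binary cannot be started (e.g. ENOENT)
  // or when the timeout kills it.
  if (result.error || result.status !== 0) {
    return { ok: false, status: 'dependency-missing' };
  }
  return { ok: true, version: result.stdout.trim() };
}
```

calling `checkBinary()` before the main run lets the skill return the dependency-missing status early instead of failing inside the 120-second subprocess.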

environment (optional, for self-test only):

YOUTUBE_TRANSCRIPT_SELFTEST=1: enables offline regression fixtures in scripts/self-test.mjs. do not set for normal transcript extraction.

procedure

validate URL: check that url matches YouTube URL patterns (youtube.com, www.youtube.com, m.youtube.com, youtu.be). reject non-http(s) and non-YouTube hosts. input: url. output: accept/reject signal, error message if invalid.
validate language code: confirm language is a simple subtitle language code (e.g., en, es, fr). reject complex/shell-injection patterns. input: language parameter. output: validated language code or error.
check yt-dlp availability: spawn yt-dlp --version to confirm binary is on PATH and executable. timeout 10 seconds. input: PATH environment. output: version string or dependency-missing error.
create temp directory: create an OS-specific temp subdirectory under system temp path (e.g., /tmp/youtube-transcript-XXXXX or %TEMP%\youtube-transcript-XXXXX). input: none. output: temp directory path.
spawn yt-dlp subprocess: invoke yt-dlp --ignore-config --skip-download --write-subs --sub-lang <language> --sub-format vtt -o <temp-dir>/%(id)s <url>. use child_process.spawnSync with argv array (no shell). timeout: 120 seconds. input: url, language, temp-dir path. output: exit code, stdout, stderr.
handle yt-dlp exit/errors: check exit code and stderr. if exit code nonzero, check for usable .vtt files in temp dir anyway (some videos produce subtitles despite nonzero exit). if stderr contains "429" or rate-limit hint, emit guidance to retry later. if no .vtt files exist, return no-captions error. input: exit code, stderr, temp-dir contents. output: proceed with parsing or error signal (dependency-missing, private/blocked, rate-limited, failed).
find .vtt file: scan temp directory for .vtt subtitle file. expected naming: <video-id>.vtt or <video-id>.<language>.vtt. if multiple .vtt files, prefer exact language match. input: temp-dir path, language. output: .vtt file path or file-not-found error.
read and parse .vtt: read .vtt file as UTF-8 text. parse VTT cue blocks: extract timestamp ranges and cue text. strip inline HTML-style tags from cue text. collapse exact consecutive duplicate cue text (this always applies; --no-dedup only affects rolling-window heuristics). input: .vtt file path, dedup_preference. output: array of cue objects: {timestamp: "HH:MM:SS --> HH:MM:SS", text: "clean text"} (or {text: "clean text"} if no timestamps).
apply formatting: if timestamps is true, join cues with timestamp prefixes (e.g., "[00:00:10] caption text"). if false, join cues with space or newline. input: cue array, timestamps flag. output: formatted string.
apply output mode: if json is true, serialize as JSON object with metadata: {url, language, timestamps_included, auto_generated_hint, transcript: [...]}. if false, return plain text. input: formatted string, cue metadata, json flag. output: string (plain text or JSON).
size guard: if output exceeds 2,000,000 characters, truncate with warning. do not output beyond limit. input: formatted output. output: capped output or truncation error.
scrub paths from stderr: attempt to remove temp-dir and home-directory absolute paths from any error messages for privacy. input: stderr from yt-dlp. output: scrubbed stderr string.
clean up temp directory: remove temp directory and all contents. input: temp-dir path. output: success or warning if cleanup fails.
return result: compile final output packet with transcript, metadata, status, caveats, and next steps. input: formatted transcript, status, temp-dir path (if saved separately), error messages, metadata. output: structured result object or string.
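
the core of the procedure (building the argv, parsing cues, formatting) can be sketched as below. this is an illustrative sketch under stated assumptions, not the skill's parser: the function names are hypothetical, the flags are copied from the spawn step above, and the tag-stripping and timing regexes are simplifications that skip HTML entity decoding.

```javascript
// argv for the spawn step, passed as an array so no shell is involved.
function buildYtDlpArgs(url, lang, tempDir) {
  return [
    '--ignore-config', '--skip-download', '--write-subs',
    '--sub-lang', lang, '--sub-format', 'vtt',
    '-o', `${tempDir}/%(id)s`, url,
  ];
}

// Matches a cue timing line and captures start/end without milliseconds.
const CUE_TIMING =
  /^((?:\d{2,}:)?\d{2}:\d{2})\.\d{3}\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2})\.\d{3}/;

function parseVtt(vttText) {
  const cues = [];
  const blocks = vttText.replace(/\r\n?/g, '\n').split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split('\n');
    const timingIndex = lines.findIndex((line) => CUE_TIMING.test(line));
    if (timingIndex === -1) continue; // header, NOTE, or STYLE block
    const match = lines[timingIndex].match(CUE_TIMING);
    const text = lines
      .slice(timingIndex + 1)
      .join(' ')
      .replace(/<[^>]*>/g, '') // strip inline tags
      .replace(/\s+/g, ' ')
      .trim();
    if (!text) continue;
    const last = cues[cues.length - 1];
    if (last && last.text === text) continue; // exact consecutive duplicate
    cues.push({ timestamp: `${match[1]} --> ${match[2]}`, text });
  }
  return cues;
}

function formatCues(cues, { timestamps = false } = {}) {
  return cues
    .map((cue) =>
      timestamps ? `[${cue.timestamp.split(' --> ')[0]}] ${cue.text}` : cue.text
    )
    .join('\n');
}
```

the cue objects follow the `{timestamp, text}` shape named in the parse step, so the same array can feed either the plain-text or the JSON output mode.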

decision points

if privacy_sensitivity is "private" or "client":

ask user to confirm that accessing this video via YouTube and yt-dlp is appropriate before proceeding. if user declines or is unsure, stop and do not invoke yt-dlp.

if yt-dlp is not on PATH:

return "dependency-missing" error. do not proceed. guide user to install yt-dlp via package manager or direct download.

if yt-dlp exits nonzero but .vtt file exists and is readable:

attempt to parse the .vtt file anyway. many videos produce usable subtitles despite nonzero exit (e.g., due to missing metadata). flag the behavior in output metadata ("partial success, yt-dlp reported error but captions extracted").

if yt-dlp exits nonzero and no .vtt file exists:

check stderr for "429" or rate-limit patterns. if present, return "rate-limited" status with guidance to retry after delay. else return "failed" status with yt-dlp stderr (scrubbed).

if no captions exist for the video:

return "no-captions" status. note that the video may have no creator captions or auto-generated captions. suggest checking YouTube directly or requesting transcript from creator.

if video is private, deleted, or blocked:

yt-dlp stderr will indicate access denied. return "blocked" status with note that user/operator does not have access. do not attempt parsing.
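
the yt-dlp-related branches above can be folded into one pure classifier, sketched here. the function name `classifyRun` and its input shape are hypothetical, and the stderr patterns are assumptions for illustration rather than a list of messages yt-dlp is documented to emit.

```javascript
// Hypothetical classifier over the outcome of one yt-dlp run.
function classifyRun({ spawnError = null, exitCode = null, hasVtt = false, stderr = '' }) {
  const caveats = [];
  if (spawnError && spawnError.code === 'ENOENT') {
    return { status: 'dependency-missing', caveats };
  }
  if (hasVtt) {
    // Usable captions win even when yt-dlp reported an error.
    if (exitCode !== 0) {
      caveats.push('partial success, yt-dlp reported error but captions extracted');
    }
    return { status: 'success', caveats };
  }
  // Assumed patterns; adjust to the messages actually observed.
  if (/\b429\b|too many requests/i.test(stderr)) return { status: 'rate-limited', caveats };
  if (/private video|video unavailable/i.test(stderr)) return { status: 'blocked', caveats };
  if (exitCode === 0) return { status: 'no-captions', caveats };
  return { status: 'failed', caveats };
}
```

checking for a usable .vtt before inspecting stderr keeps the "nonzero exit with captions" case from being misreported as a failure.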

if output exceeds 2,000,000 characters:

truncate to 2,000,000 characters. emit truncation warning. do not exceed limit under any condition.
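
a minimal sketch of the truncating size guard described here. note that the original SKILL.md words this guard as refusing to print oversized transcripts, so truncation is only one of the two described behaviors. the helper name `applySizeGuard` and its return shape are hypothetical.

```javascript
const MAX_CHARS = 2_000_000;

// Hypothetical helper: cap output length, measured in UTF-16 code units
// (what String.prototype.length reports).
function applySizeGuard(text, limit = MAX_CHARS) {
  if (text.length <= limit) return { text, truncated: false };
  let cut = text.slice(0, limit);
  const lastUnit = cut.charCodeAt(cut.length - 1);
  if (lastUnit >= 0xd800 && lastUnit <= 0xdbff) {
    cut = cut.slice(0, -1); // do not end on half of a surrogate pair
  }
  return { text: cut, truncated: true };
}
```

the `truncated` flag is what a caller would use to set the "truncated" status and add a caveat to the result.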

if timestamps is true and json is true:

include timestamps in JSON cue objects. this is an advanced mode for machine traceability workflows. not intended as human-readable output.

if json is true:

default to including auto_generated_hint in metadata (true if captions appear auto-generated from vtt metadata or heuristics, "unknown" otherwise).

if dedup_preference is "--no-dedup":

preserve rolling-window duplicate phrases during formatting. exact consecutive duplicate cue text is still collapsed at the VTT parsing level (unavoidable).
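
the rolling-window heuristic itself is not spelled out in this document. one plausible version, sketched below purely as an assumption, drops the leading words of a cue when they repeat the trailing words of the previous cue. the name `dedupRollingWindow` is hypothetical.

```javascript
// Assumed heuristic: remove the word-level overlap between the end of the
// previous cue and the start of the current one.
function dedupRollingWindow(texts) {
  const out = [];
  let prevWords = [];
  for (const text of texts) {
    const words = text.split(/\s+/).filter(Boolean);
    let overlap = Math.min(prevWords.length, words.length);
    while (
      overlap > 0 &&
      prevWords.slice(-overlap).join(' ') !== words.slice(0, overlap).join(' ')
    ) {
      overlap -= 1; // try a shorter overlap until one matches or none is left
    }
    const fresh = words.slice(overlap);
    if (fresh.length) out.push(fresh.join(' '));
    prevWords = words;
  }
  return out;
}
```

a heuristic like this can also remove genuine repetition in speech, which is one reason a `--no-dedup` escape hatch is useful.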

output contract

return a result object or string with these fields:

url: the input YouTube URL (unchanged).
language: the requested language code (e.g., "en").
status: one of "success", "no-captions", "dependency-missing", "rate-limited", "blocked", "private-declined", "failed", "truncated".
timestamps_included: boolean. whether output includes timestamp markers.
json_format: boolean. whether output is JSON or plain text.
auto_generated: boolean or "unknown". whether captions appear auto-generated.
transcript: the extracted text (plain text string or JSON array of {timestamp, text} or {text} objects).
character_count: integer. length of transcript in characters.
file_saved: string or null. path to transcript file if separately saved; else null.
caveats: array of strings. warnings (auto-generated, truncated, rolling-window artifacts preserved, etc.).
next_steps: array of strings. guidance on retry, alternative approaches, or follow-up actions.

if json flag is false, return plain text transcript as primary output with metadata as a brief summary header.

if output is saved to a file, note the file path in file_saved and indicate in result.
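
the contract above could be assembled by a small builder like the following sketch. the function name `buildResult` is hypothetical, and counting `character_count` on the plain-text rendering in both modes is an assumption, since the contract does not say how it is measured for JSON output.

```javascript
// Hypothetical builder for the result object; field names follow the contract.
function buildResult({
  url, language, cues,
  timestamps = false, json = false,
  status = 'success', autoGenerated = 'unknown',
  fileSaved = null, caveats = [], nextSteps = [],
}) {
  const plainText = cues
    .map((cue) =>
      timestamps ? `[${cue.timestamp.split(' --> ')[0]}] ${cue.text}` : cue.text
    )
    .join('\n');
  const transcript = json
    ? cues.map((cue) =>
        timestamps ? { timestamp: cue.timestamp, text: cue.text } : { text: cue.text }
      )
    : plainText;
  return {
    url,
    language,
    status,
    timestamps_included: timestamps,
    json_format: json,
    auto_generated: autoGenerated,
    transcript,
    character_count: plainText.length, // assumption: measured on the plain rendering
    file_saved: fileSaved,
    caveats,
    next_steps: nextSteps,
  };
}
```

in JSON mode the caller would pass the returned object to `JSON.stringify`; in plain-text mode it would print `transcript` after a brief metadata header.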

outcome signal

the user knows the skill worked when:

a readable plain-text or JSON transcript appears in chat or is saved to a file.
the transcript contains recognizable spoken dialogue or captions from the video.
the status field is "success" or "truncated" (truncation is partial success).
timestamps are present (if requested).
the result includes the original URL, language requested, character count, and any caveats (e.g., "auto-generated", "partial success").
if the user requested a summary or quotes, the text is usable for extraction.

failure signals:

status is "no-captions", "blocked", "failed", "dependency-missing", or "rate-limited".
no transcript text appears; only error messages or "try again later" guidance.
the result notes that yt-dlp is not installed or unreachable.
the user confirms the video is private or client-sensitive and declines to proceed.
