HN Podcast Transcriber

Automatically fetch, transcribe, and archive Hacker News podcast episodes (Hacker News Morning Brief). Use when the user wants to set up a podcast transcript...

SkillRank score: 6.3/10

evaluated by implexa, claude-haiku-4-5 · 2026-07-12

hn-podcast-transcriber fetches hacker news morning brief episodes via rss, transcribes audio with openai whisper, and archives transcripts as searchable markdown. supports custom podcast feeds and model selection.

- structure: 7.0
- trigger phrases: 6.0
- procedure: 7.0
- edge cases: 4.0
- documentation: 7.0

SKILL.md

---
name: hn-podcast-transcriber
description: Automatically fetch, transcribe, and archive Hacker News podcast episodes (Hacker News Morning Brief). Use when the user wants to set up a podcast transcription pipeline, archive HN podcast episodes as searchable text, transcribe podcast audio to markdown, or schedule periodic HN podcast ingestion. Also use for any podcast RSS feed transcription workflow.
---

# HN Podcast Transcriber

Fetch new episodes from the Hacker News Morning Brief podcast RSS feed, transcribe with Whisper, and archive as searchable markdown.

## Prerequisites

- **whisper** CLI installed (`pip install openai-whisper`)
- **ffmpeg** on PATH (required by whisper; download from https://ffmpeg.org)
- **python3** with standard library (no extra deps for the fetch script)
- Disk space for audio files (~5-10 MB per episode)

## Quick Start

Run the main script to fetch and transcribe all new episodes:

```bash
bash scripts/fetch_and_transcribe.sh --archive ~/hn-podcast-archive
```

The first run processes every episode in the feed (pass `--limit N` to cap it). Subsequent runs process only new episodes, which are tracked via `state.json`.
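
The incremental behaviour can be pictured with a short Python sketch. The real `state.json` schema is described in references/archive-layout.md; the `processed` key and both helper names here are assumptions made purely for illustration, not the script's actual code.

```python
import json
from pathlib import Path


def load_processed(state_path: Path) -> set[str]:
    """Return the recorded episode IDs, or an empty set on a first run."""
    if not state_path.exists():
        return set()
    data = json.loads(state_path.read_text(encoding="utf-8"))
    # "processed" is a hypothetical key; see archive-layout.md for the real schema.
    return set(data.get("processed", []))


def save_processed(state_path: Path, processed: set[str]) -> None:
    """Write to a temp file and rename it, so an interrupted run leaves the old state intact."""
    tmp = state_path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"processed": sorted(processed)}, indent=2), encoding="utf-8")
    tmp.replace(state_path)
```

Writing through a temporary file is a design choice of this sketch: a crash midway through a long transcription run then cannot leave a half-written state file behind.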

## Options

| Flag | Default | Description |
|------|---------|-------------|
| `--feed URL` | HN Morning Brief RSS | Podcast RSS feed URL |
| `--archive DIR` | `./hn-podcast-archive` | Archive root directory |
| `--model MODEL` | `turbo` | Whisper model (tiny/base/small/medium/large/turbo) |
| `--limit N` | 0 (all) | Max new episodes to process per run |

## Custom Feeds

Point at any podcast RSS feed:

```bash
bash scripts/fetch_and_transcribe.sh --feed "https://example.com/podcast/feed.xml" --archive ./my-podcast-archive
```

## Scheduling

Schedule a daily check in OpenClaw using either approach:

- Create an isolated cron job that runs the script
- Add a heartbeat check in HEARTBEAT.md

## Archive Structure

See [references/archive-layout.md](references/archive-layout.md) for directory layout and state.json schema.

## Workflow Summary

1. Download RSS feed → parse `<item>` entries
2. Skip already-processed episodes (state.json lookup)
3. Download audio (mp3/m4a) to episode directory
4. Run `whisper` to produce `.txt` transcript
5. Generate cleaned `transcript.md` with title + date header
6. Update state.json with processed episode ID
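
Steps 1, 2, and 5 above can be sketched with the Python standard library alone. This is a minimal illustration, not the script's actual implementation: the function names, the fallback from `<guid>` to the audio URL, and the markdown header layout are all assumptions.

```python
import xml.etree.ElementTree as ET


def parse_episodes(feed_xml: str) -> list[dict]:
    """Extract id, title, date, and audio URL from each <item> in an RSS 2.0 feed."""
    episodes = []
    for item in ET.fromstring(feed_xml).iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # no audio attached, nothing to transcribe
        audio_url = enclosure.get("url")
        episodes.append({
            # Assumption: fall back to the audio URL when the feed omits <guid>.
            "id": (item.findtext("guid") or audio_url).strip(),
            "title": (item.findtext("title") or "Untitled").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
            "audio_url": audio_url,
        })
    return episodes


def select_new(episodes, processed_ids, limit=0):
    """Drop already-processed episodes; limit=0 means no cap, mirroring --limit."""
    new = [ep for ep in episodes if ep["id"] not in processed_ids]
    return new[:limit] if limit > 0 else new


def render_markdown(episode, transcript_text):
    """Build transcript.md content: title heading, date line, then the transcript body."""
    return f"# {episode['title']}\n\n*Published: {episode['date']}*\n\n{transcript_text.strip()}\n"


SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Episode 1</title><guid>ep-1</guid>
<pubDate>Mon, 06 Jan 2025 08:00:00 GMT</pubDate>
<enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/></item>
<item><title>Episode 2</title><guid>ep-2</guid>
<pubDate>Tue, 07 Jan 2025 08:00:00 GMT</pubDate>
<enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/></item>
</channel></rss>"""

new = select_new(parse_episodes(SAMPLE_FEED), processed_ids={"ep-1"})
print([ep["id"] for ep in new])  # ['ep-2']
```

Downloading the audio and running `whisper` (steps 3 and 4) are left out because they depend on the network and on the installed model.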

## Notes

- Whisper models cache to `~/.cache/whisper` after first download
- Use `--model tiny` for speed, `--model large` for best accuracy
- An average episode (~6 min of audio) takes ~1-2 min to transcribe with the `turbo` model on CPU
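- For GPU acceleration, install a CUDA-enabled PyTorch build; Whisper runs on the GPU automatically when PyTorch detects one (ffmpeg is only used to decode audio)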
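
Because a first run processes the whole feed, it can help to size the backlog beforehand. The sketch below simply scales the per-episode figures quoted in this document (~1-2 min of CPU time with `turbo`, ~5-10 MB of audio) and assumes cost grows linearly with episode count. The function name is hypothetical and real numbers will vary with hardware and episode length.

```python
def estimate_backlog(episodes, minutes_range=(1.0, 2.0), mb_range=(5.0, 10.0)):
    """Scale the quoted per-episode ranges to a backlog of `episodes` items."""
    return {
        "transcribe_minutes": (episodes * minutes_range[0], episodes * minutes_range[1]),
        "audio_mb": (episodes * mb_range[0], episodes * mb_range[1]),
    }


print(estimate_backlog(100))
# {'transcribe_minutes': (100.0, 200.0), 'audio_mb': (500.0, 1000.0)}
```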
