Item: site-crawler
Rating: 5.9
Author: Implexa

SKILL.md

Site Crawler Skill

Respectfully crawl documentation sites and web content for RAG ingestion.

Overview

Documentation sites, blogs, and knowledge bases contain valuable structured content. This skill covers:

Respectful crawling (robots.txt, rate limiting)

Structure-preserving extraction

Incremental updates (only fetch changed pages)

Sitemap-based discovery
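The four points above can be sketched with the standard library alone. This is a minimal illustration, not the skill's actual implementation: the `USER_AGENT` value, the helper names (`parse_robots`, `RateLimiter`, `sitemap_entries`, `pages_to_fetch`), and the idea of keeping a `seen` mapping of URL to last-known `lastmod` are all assumptions made for the example.

```python
import time
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-rag-crawler"  # hypothetical crawler name
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_robots(robots_txt):
    """Build a robots.txt parser from already-downloaded text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Enforce a minimum interval (in seconds) between requests."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        # Sleep only for the part of the interval that has not yet elapsed.
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def sitemap_entries(sitemap_xml):
    """Map each <loc> URL to its <lastmod> string ('' when absent)."""
    root = ET.fromstring(sitemap_xml)
    entries = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if loc:
            entries[loc] = lastmod
    return entries


def pages_to_fetch(entries, seen, robots):
    """Keep URLs that robots.txt allows and whose lastmod is new or unknown."""
    return [
        url
        for url, lastmod in entries.items()
        if robots.can_fetch(USER_AGENT, url)
        and (not lastmod or seen.get(url) != lastmod)
    ]
```

A crawl loop would call `RateLimiter.wait()` before each request, using the site's `Crawl-delay` (available through `robots.crawl_delay(USER_AGENT)`) when one is declared. Pages without a `lastmod` are always refetched here, since the sitemap gives no way to tell whether they changed.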

Prerequisites

# HTTP client
pip install httpx
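With an HTTP client available, one way to fetch only changed pages is a conditional GET. The sketch below is an assumption about how that might look, not code from this skill: `conditional_headers` and `fetch_if_changed` are hypothetical helpers, the `cached` dictionary layout is invented for the example, and `client` is expected to behave like an `httpx.Client` (a `get(url, headers=...)` method returning a response with `status_code`, `headers`, `text`, and `raise_for_status()`).

```python
def conditional_headers(cached):
    """Build validators from a previously stored response, if any."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def fetch_if_changed(client, url, cached):
    """Return a fresh cache record, or None when the server answers 304."""
    response = client.get(url, headers=conditional_headers(cached))
    if response.status_code == 304:
        # Not modified: the stored copy is still current.
        return None
    response.raise_for_status()
    return {
        "etag": response.headers.get("ETag"),
        "last_modified": response.headers.get("Last-Modified"),
        "body": response.text,
    }
```

In practice this could be called as `fetch_if_changed(httpx.Client(), url, cache.get(url, {}))`, storing the returned record back into `cache` whenever it is not `None`. Servers that send neither `ETag` nor `Last-Modified` will simply return the full page every time.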
