Web crawling and scraping tool with LLM-optimized output. 网页爬虫爬取工具 | Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scraping.…
Crawl4AI Skill - Web Crawler & Scraper Web Crawling 网页爬虫 | Web Scraping 网页爬取 | LLM 优化输出 智能网页爬虫和爬取工具,支持搜索、全站爬取、动态页面抓取。Free web crawler and scraper with LLM-optimized Markdown output. 核心功能 | Core Features 🔍 Web Search 网页搜索 - DuckDuckGo search, 免 API key 🕷️ Web Crawling 网页爬虫 - Site crawler, spider, sitemap 识别 📝 Web Scraping 网页抓取 - Smart scraper, data extraction 📄 LLM-Optimized Output - Fit Markdown, 省 Token 80% ⚡ Dynamic Page Scraping - JavaScript 渲染页面爬取 快速开始 | Quick Start 安装 | Installation pip install crawl4ai-skill Web Search | 网页搜索 # Search the web with DuckDuckGo crawl4ai-skill search "python web scraping" Web Scraping | 单页爬取 # Scrape a single web page crawl4ai-skill crawl https://example.com Web Crawling | 全站爬虫 # Crawl entire website / spider crawl4ai-skill crawl-site https://docs.python.org --max-pages 50 使用场景 | Use Cases 场景 1:Web Crawler for Documentation | 文档站爬虫 # Crawl documentation site with spider crawl4ai-skill crawl-site https://docs.fastapi.com --max-pages 100 爬虫效果 | Crawler Output: ❌ 移除:导航栏、侧边栏、广告 ✅ 保留:标题、正文、代码块 📊 Token:50,000 → 10,000(-80%) 场景 2:Search + Scrape | 搜索+爬取 # Search and scrape top results crawl4ai-skill search-and-crawl "Vue 3 best practices" --crawl-top 3 场景 3:Dynamic Page Scraping | 动态页面抓取 JavaScript 渲染的页面爬取(雪球、知乎等): # Scrape JavaScript-heavy pages crawl4ai-skill crawl https://xueqiu.com/S/BIDU --wait-until networkidle --delay 2 命令参考 | Commands 命令 Command 说明 Description search <query> Web search 网页搜索 crawl <url> Web scraping 单页爬取 crawl-site <url> Web crawling 全站爬虫 search-and-crawl <query> Search + scrape 搜索并爬取 常用参数 | Common Options # Web Search 搜索 --num-results 10 # Number of results # Web Scraping 爬取 --format fit_markdown # Output format --output result.md # Output file --wait-until networkidle # Wait strategy for dynamic pages --delay 2 # Additional wait time (seconds) --wait-for ".selector" # Wait for specific element # Web Crawling 爬虫 --max-pages 100 # Max pages to crawl --max-depth 3 # Max crawl depth 输出格式 | Output Formats fit_markdown(推荐 Recommended) 智能提取,节省 80% Token。Smart extraction, save 80% tokens. crawl4ai-skill crawl https://example.com --format fit_markdown raw_markdown 保留完整结构。Preserve full structure. crawl4ai-skill crawl https://example.com --format raw_markdown 为什么选择这个爬虫?| Why This Crawler? ✅ 免费爬虫 Free Crawler - 无需 API key,开箱即用 ✅ 智能爬取 Smart Scraper - 自动去噪,提取核心内容 ✅ 全站爬虫 Site Crawler - 支持 sitemap,递归爬取 ✅ 动态爬取 Dynamic Scraping - JavaScript 渲染页面支持 ✅ 搜索集成 Search Integration - DuckDuckGo 搜索内置 链接 | Links 📦 PyPI 💻 GitHub 🦞 ClawHub
don't have the plugin yet? install it then click "run inline in claude" again.