Build a knowledge base from web content with Firecrawl. Use for local reference docs, RAG-ready chunks, fine-tuning datasets, documentation mirrors, topic…
Firecrawl Knowledge Base
Use this to turn URLs or topics into organized LLM-ready content.
Onboarding Interview
Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately.
If blocked, ask at most 1-3 concise questions, for example: the source URL or topic; whether the output is for reference, RAG, training, or a docs mirror; and, if training data is requested, the training format.
Firecrawl Collection Plan
Use Firecrawl map to discover pages on documentation sites and Firecrawl search to gather topic-based corpora, then scrape each page into markdown, preserving code examples and tables.
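A minimal Python sketch of how the map/search/scrape choice might be encoded. The helper names, the `/v1/...` endpoint paths, and the request fields (`url`, `query`, `limit`, `formats`, `onlyMainContent`) are assumptions modeled on Firecrawl's v1 REST API; check them against the current API reference before relying on them. The sketch only builds request descriptions and never touches the network.

```python
from urllib.parse import urlparse

# Assumed Firecrawl v1 REST base URL; verify against the current API reference.
API_BASE = "https://api.firecrawl.dev/v1"


def is_url(source: str) -> bool:
    """Treat anything with an http(s) scheme and a host as a URL; everything else is a topic."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def discovery_request(source: str, limit: int = 10) -> dict:
    """Describe the first call: map a documentation site, or search for a topic."""
    if is_url(source):
        return {"endpoint": f"{API_BASE}/map", "body": {"url": source}}
    return {"endpoint": f"{API_BASE}/search", "body": {"query": source, "limit": limit}}


def scrape_request(page_url: str) -> dict:
    """Describe a scrape call that asks for markdown with page chrome stripped (assumed fields)."""
    return {
        "endpoint": f"{API_BASE}/scrape",
        "body": {"url": page_url, "formats": ["markdown"], "onlyMainContent": True},
    }
```

A docs URL such as `https://docs.example.com` yields a map request, while a phrase such as `vector databases` yields a search request; each discovered page then gets its own scrape request.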
Save scraped pages using the Firecrawl download-style layout, one directory per URL path:
```
.firecrawl/
  <hostname>/
    <path>/
      index.md
```
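One way to map a page URL onto that layout, as a sketch. The function name and the `unknown-host` fallback are hypothetical, and the layout above does not say how query strings or file extensions such as `.html` should be handled, so this version drops the query string and keeps path segments as-is.

```python
from pathlib import Path
from urllib.parse import urlparse


def output_path(url: str, root: str = ".firecrawl") -> Path:
    """Map a page URL to <root>/<hostname>/<path>/index.md."""
    parsed = urlparse(url)
    # Drop empty, "." and ".." segments so "/docs/intro/" and "/docs/intro"
    # land in the same directory and no path can escape the root.
    segments = [s for s in parsed.path.split("/") if s not in ("", ".", "..")]
    # parsed.hostname is lowercased and excludes any port; the query string is ignored.
    return Path(root, parsed.hostname or "unknown-host", *segments, "index.md")
```

For example, `https://docs.example.com/guides/auth/` maps to `.firecrawl/docs.example.com/guides/auth/index.md`, and the site root maps to `.firecrawl/docs.example.com/index.md`.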
Parallel Work
If the job is large enough to benefit, split it across sub-agents or equivalent parallel task runners, for example:
- by docs section: one section per researcher
- by source type: official docs, tutorials, community discussions, and references
- by pipeline stage: source scraping, chunk generation, and manifest generation
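The "one docs section per researcher" split can be sketched with a thread pool standing in for sub-agents. This assumes a URL's first path segment identifies its docs section, and `scrape` is a hypothetical caller-supplied function (for example, a wrapper around a Firecrawl scrape call); both the grouping rule and the function names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
from urllib.parse import urlparse


def group_by_section(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs by first path segment, e.g. /guides/... -> "guides"; bare hosts -> "root"."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else "root"].append(url)
    return dict(groups)


def run_sections(
    urls: List[str],
    scrape: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Hand each section to its own worker; a worker scrapes its section's URLs in order."""
    groups = group_by_section(urls)

    def work(section_urls: List[str]) -> List[str]:
        return [scrape(u) for u in section_urls]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(work, section) for name, section in groups.items()}
        return {name: future.result() for name, future in futures.items()}
```

Keeping a whole section on one worker preserves page order within the section, which makes it easier to build that section's index afterwards.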
Output Modes
Reference: markdown files, index.md, and sources.json.
RAG: markdown files plus chunk files and manifest.json.
Training: scraped source files plus training-data.jsonl and training-metadata.json.
Docs mirror: complete markdown mirror with a table of contents.
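For RAG mode, a heading-aware chunker is one reasonable way to produce chunk files without splitting code examples. This is a sketch: the section-packing strategy, the `max_chars` default, and the manifest fields (`id`, `file`, `source_url`, `chars`) are hypothetical, since no chunk size or manifest schema is specified here.

```python
from typing import Dict, List


def split_sections(markdown: str) -> List[str]:
    """Split markdown at heading lines, ignoring '#' lines inside fenced code blocks."""
    sections: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if line.startswith("#") and not in_fence and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def chunk_markdown(markdown: str, max_chars: int = 1500) -> List[str]:
    """Pack whole sections into chunks of at most max_chars; an oversized section stays whole."""
    chunks: List[str] = []
    buf = ""
    for section in split_sections(markdown):
        candidate = f"{buf}\n\n{section}" if buf else section
        if buf and len(candidate) > max_chars:
            chunks.append(buf)
            buf = section
        else:
            buf = candidate
    if buf:
        chunks.append(buf)
    return chunks


def manifest_entries(source_url: str, chunks: List[str], stem: str) -> List[Dict]:
    """One manifest record per chunk file, each carrying the page's source URL."""
    return [
        {"id": f"{stem}-{i:03d}", "file": f"{stem}-{i:03d}.md", "source_url": source_url, "chars": len(c)}
        for i, c in enumerate(chunks)
    ]
```

The records from `manifest_entries` can be collected across pages and written to manifest.json with `json.dump`.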
Final Deliverable
```markdown
# Knowledge Base: [Source]
## Summary
[What was collected and why]
## Output Structure
[Files/directories created]
## Coverage
[Sections, source types, counts]
## Usage Notes
[How to use in RAG, docs, training, or agent context]
## Sources
[URLs collected]
## Rerun Inputs
workflow: firecrawl-knowledge-base
source: [url/topic]
goal: [reference/rag/train/docs]
depth: [quick/thorough/exhaustive]
output_dir: [.firecrawl/]
```
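The Rerun Inputs block can be generated rather than typed, so invalid values are caught early. The function name and the validation are illustrative; the allowed `goal` and `depth` values are taken from the template.

```python
GOALS = ("reference", "rag", "train", "docs")
DEPTHS = ("quick", "thorough", "exhaustive")


def rerun_inputs(source: str, goal: str, depth: str, output_dir: str = ".firecrawl/") -> str:
    """Render the Rerun Inputs lines, rejecting values outside the template's options."""
    if goal not in GOALS:
        raise ValueError(f"goal must be one of {GOALS}, got {goal!r}")
    if depth not in DEPTHS:
        raise ValueError(f"depth must be one of {DEPTHS}, got {depth!r}")
    lines = [
        "workflow: firecrawl-knowledge-base",
        f"source: {source}",
        f"goal: {goal}",
        f"depth: {depth}",
        f"output_dir: {output_dir}",
    ]
    return "\n".join(lines)
```

For example, `rerun_inputs("https://docs.example.com", "rag", "thorough")` returns the five lines with `goal: rag` and `output_dir: .firecrawl/`.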
Quality Bar
Preserve code examples and formatting.
Remove boilerplate navigation where possible.
Include source URLs in frontmatter or metadata.
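The last two quality checks can be applied as a post-processing pass over each scraped page. This is a sketch: `NAV_PATTERNS` holds hypothetical examples of navigation residue that would need tuning per site, and `with_frontmatter` records the source URL using a `source_url` key chosen for illustration. Lines inside fenced code blocks are never removed, so code examples survive intact.

```python
import json
import re
from typing import List, Optional

# Hypothetical examples of navigation residue; tune these per site.
NAV_PATTERNS = [
    re.compile(r"^\s*skip to (main )?content\s*$", re.IGNORECASE),
    re.compile(r"^\s*edit (this|on github)\b.*$", re.IGNORECASE),
    re.compile(r"^\s*(previous|next)\s*$", re.IGNORECASE),
]


def strip_nav(markdown: str) -> str:
    """Drop lines matching NAV_PATTERNS, but only outside fenced code blocks."""
    kept: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if not in_fence and any(p.match(line) for p in NAV_PATTERNS):
            continue
        kept.append(line)
    return "\n".join(kept)


def with_frontmatter(markdown: str, source_url: str, title: Optional[str] = None) -> str:
    """Prepend YAML frontmatter with the source URL (and title, if known)."""
    # json.dumps yields a double-quoted string, which YAML also accepts.
    fields = [f"source_url: {json.dumps(source_url)}"]
    if title is not None:
        fields.append(f"title: {json.dumps(title)}")
    return "---\n" + "\n".join(fields) + "\n---\n\n" + markdown
```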