Web page and website scraping with Firecrawl API. Use this skill when scraping web articles, blog posts, documentation pages, paywalled content, or…
Firecrawl Scraping
Overview
Scrape individual web pages and convert them to clean, LLM-ready markdown. Handles JavaScript rendering, anti-bot protection, and dynamic content.
Quick Decision Tree
What are you scraping?
│
├── Single page (article, blog, docs)
│   └── references/single-page.md
│       └── Script: scripts/firecrawl_scrape.py
│
└── Entire website (multiple pages, crawling)
    └── references/website-crawler.md
        └── (Use Apify Website Content Crawler for multi-page)
Environment Setup
# Required in .env
FIRECRAWL_API_KEY=fc-your-api-key-here
Get your API key: https://firecrawl.dev/app/api-keys
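As a minimal sketch of how a script might read the key at startup, the helper below fails early with a clear message when FIRECRAWL_API_KEY is missing. The function name `load_api_key` is hypothetical and not taken from scripts/firecrawl_scrape.py. It assumes the variable is already present in the process environment, so a .env file would first need to be loaded by your shell or a tool such as python-dotenv.

```python
import os


def load_api_key(env=None):
    """Return the Firecrawl API key, or raise if it is not configured.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        # Fail early with an actionable message instead of a later 401.
        raise RuntimeError(
            "FIRECRAWL_API_KEY is not set; add it to .env or export it"
        )
    return key
```

Calling `load_api_key()` with no arguments reads from the real environment.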
Common Usage
Simple Scrape
python scripts/firecrawl_scrape.py "https://example.com/article"
With Options
python scripts/firecrawl_scrape.py "https://wsj.com/article" \
--proxy stealth \
--format markdown summary \
--timeout 60000
Proxy Modes
| Mode | Use Case |
|------|----------|
| basic | Standard sites, fastest |
| stealth | Anti-bot protection, premium content (WSJ, NYT) |
| auto | Let Firecrawl decide (recommended) |
Output Formats
markdown - Clean markdown content (default)
html - Raw HTML
summary - AI-generated summary
screenshot - Page screenshot
links - All links on page
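To illustrate how the proxy modes, output formats, and timeout fit together, here is a rough sketch that validates the options and assembles a request body. The function `build_scrape_request`, the field names (`url`, `formats`, `proxy`, `timeout`), and the 30000 ms default are assumptions that mirror the CLI flags above. Check them against the Firecrawl API reference before relying on them. No network call is made.

```python
# Values listed in the Proxy Modes and Output Formats sections above.
PROXY_MODES = {"basic", "stealth", "auto"}
OUTPUT_FORMATS = {"markdown", "html", "summary", "screenshot", "links"}


def build_scrape_request(url, formats=("markdown",), proxy="auto", timeout_ms=30000):
    """Validate options and return a dict that could be sent as a JSON body.

    Field names and the default timeout are assumptions, not confirmed API.
    """
    unknown = set(formats) - OUTPUT_FORMATS
    if unknown:
        raise ValueError(f"unsupported format(s): {sorted(unknown)}")
    if proxy not in PROXY_MODES:
        raise ValueError(f"unsupported proxy mode: {proxy!r}")
    return {
        "url": url,
        "formats": list(formats),
        "proxy": proxy,
        "timeout": timeout_ms,
    }


# Equivalent of the "With Options" command shown earlier.
request_body = build_scrape_request(
    "https://wsj.com/article",
    formats=("markdown", "summary"),
    proxy="stealth",
    timeout_ms=60000,
)
```

Validating locally keeps a typo in a mode or format name from costing a failed request.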
Cost
~1 credit per page. Stealth proxy may use additional credits.
Security Notes
Credential Handling
Store FIRECRAWL_API_KEY in .env file (never commit to git)
API keys can be regenerated at https://firecrawl.dev/app/api-keys
Never log or print API keys in script output
Use environment variables, not hardcoded values
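One way to follow the "never log or print" rule while still being able to tell keys apart in debug output is to mask everything except the last few characters. The helper `redact_key` below is a hypothetical illustration, not part of the skill's scripts.

```python
def redact_key(key, visible=4):
    """Mask an API key for logging, keeping only the last `visible` characters."""
    if len(key) <= visible:
        # Too short to reveal anything safely: mask it entirely.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Example: log a masked form instead of the raw key.
masked = redact_key("fc-abcdef123456")
```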
Data Privacy
Only scrapes publicly accessible web pages
Scraped content is processed by Firecrawl servers temporarily
Markdown output stored locally in .tmp/ directory
Screenshots (if requested) are stored locally
No persistent data retention by Firecrawl after request
Access Scopes
API key provides full access to scraping features
No granular permission scopes available
Monitor usage via Firecrawl dashboard
Compliance Considerations
Robots.txt: Firecrawl respects robots.txt by default
Public Content Only: Only scrape publicly accessible pages
Terms of Service: Respect target site ToS
Rate Limiting: Built-in rate limiting prevents abuse
Stealth Proxy: Use stealth mode only when necessary (for example, anti-bot protection on news sites); never use it to bypass authentication
GDPR: Scraped content may contain PII - handle accordingly
Copyright: Respect intellectual property rights of scraped content
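For callers who want an explicit local check in addition to Firecrawl's own robots.txt handling, the sketch below uses Python's standard `urllib.robotparser` on robots.txt text you have already obtained. The helper name `is_allowed` and the sample rules are illustrative assumptions. It does not describe how Firecrawl itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if `robots_txt` rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only.
sample_rules = "User-agent: *\nDisallow: /private/\n"
public_ok = is_allowed(sample_rules, "https://example.com/article")
private_ok = is_allowed(sample_rules, "https://example.com/private/page")
```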
Troubleshooting
Common Issues
Issue: Credits exhausted
Symptoms: API returns "insufficient credits" or quota exceeded error
Cause: Account credits depleted
Solution:
Check credit balance at https://firecrawl.dev/app
Upgrade plan or purchase additional credits
Reduce scraping frequency
Use basic proxy mode to conserve credits
Issue: Page not rendering correctly
Symptoms: Empty content or partial HTML returned
Cause: JavaScript-heavy page not fully loading
Solution:
Enable JavaScript rendering with --js-render flag
Increase timeout with --timeout 60000 (60 seconds)
Try stealth proxy mode for protected sites
Wait for specific elements with --wait-for selector
Issue: 403 Forbidden error
Symptoms: Script returns 403 status code
Cause: Site blocking automated access
Solution:
Enable stealth proxy mode
Add delay between requests
Try at different times (some sites rate limit by time)
Check whether the site requires login (pages behind authentication are not supported)
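The first two remedies above (switch to stealth, add a delay) can be combined into a small escalation loop. This is a sketch under assumptions: `fetch` is a hypothetical callable you supply that takes `(url, proxy_mode)` and returns `(status_code, body)`, and the 2-second default delay is arbitrary.

```python
import time


def scrape_with_escalation(fetch, url, modes=("basic", "stealth"),
                           delay_s=2.0, sleep=time.sleep):
    """Try each proxy mode in order, pausing between attempts, until not 403.

    Returns (mode, status_code, body) for the last attempt made.
    """
    result = None
    for attempt, mode in enumerate(modes):
        if attempt > 0:
            sleep(delay_s)  # back off before retrying with a stronger proxy
        status, body = fetch(url, mode)
        result = (mode, status, body)
        if status != 403:
            break
    return result
```

Starting with basic and escalating only on 403 keeps the cheaper mode as the default, which also helps conserve credits.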
Issue: Empty markdown output
Symptoms: Scrape succeeds but markdown is empty or malformed
Cause: Dynamic content loaded after page load, or unusual page structure
Solution:
Increase wait time for JavaScript to execute
Use --wait-for to wait for specific content
Try html format to see raw content
Check if content is in an iframe (not always supported)
Issue: Timeout errors
Symptoms: Request times out before completion
Cause: Slow page load or large page content
Solution:
Increase timeout value (up to 120000ms)
Use basic proxy for faster response
Target specific page sections if possible
Check if site is experiencing issues
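When retrying after a timeout, one simple approach is to grow the timeout on each attempt up to the 120000 ms ceiling mentioned above. The helper `timeout_schedule`, the 30000 ms starting value, and the doubling factor are illustrative assumptions.

```python
MAX_TIMEOUT_MS = 120000  # upper bound taken from the troubleshooting note above


def timeout_schedule(start_ms=30000, factor=2, max_ms=MAX_TIMEOUT_MS):
    """Return increasing timeouts in milliseconds, ending at `max_ms`."""
    timeouts = []
    current = start_ms
    while current < max_ms:
        timeouts.append(current)
        current *= factor
    timeouts.append(max_ms)  # always finish with the cap itself
    return timeouts


schedule = timeout_schedule()
```

Each value in `schedule` can be passed as `--timeout` on successive attempts.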
Resources
references/single-page.md - Single page scraping details
references/website-crawler.md - Multi-page website crawling
Integration Patterns
Scrape and Analyze
Skills: firecrawl-scraping → parallel-research
Use case: Scrape competitor pages, then analyze content strategy
Flow:
Scrape competitor website pages with Firecrawl
Convert to clean markdown
Use parallel-research to analyze positioning, messaging, features
Scrape and Document
Skills: firecrawl-scraping → content-generation
Use case: Create summary documents from web research
Flow:
Scrape multiple article pages on a topic
Combine markdown content
Generate summary document via content-generation
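The "combine markdown content" step can be sketched as a small function that joins scraped pages into one document, labelling each section with its source URL. The function name `combine_markdown` and the separator layout are assumptions. The skill does not prescribe a format for the combined file.

```python
def combine_markdown(pages):
    """Join (source_url, markdown_text) pairs into a single markdown string."""
    sections = []
    for url, text in pages:
        # Keep the source URL with each section so the summary can cite it.
        sections.append(f"Source: {url}\n\n{text.strip()}")
    return "\n\n---\n\n".join(sections) + "\n"


combined = combine_markdown([
    ("https://example.com/a", "# First article\n\nBody A.\n"),
    ("https://example.com/b", "# Second article\n\nBody B.\n"),
])
```

The resulting string can then be handed to content-generation as a single input.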
Scrape and Enrich CRM
Skills: firecrawl-scraping → attio-crm
Use case: Enrich company records with website data
Flow:
Scrape company website (about page, team page, product pages)
Extract key information (funding, team size, products)
Update company record in Attio CRM with enriched data