Item: firecrawl-scraping
Rating: 6.9
Author: Implexa

Web page and website scraping with Firecrawl API. Use this skill when scraping web articles, blog posts, documentation pages, paywalled content, or…

SKILL.md

Firecrawl Scraping

Overview

Scrape individual web pages and convert them to clean, LLM-ready markdown. Handles JavaScript rendering, anti-bot protection, and dynamic content.

Quick Decision Tree

What are you scraping?
│
├── Single page (article, blog, docs)
│   ├── references/single-page.md
│   └── Script: scripts/firecrawl_scrape.py
│
└── Entire website (multiple pages, crawling)
    ├── references/website-crawler.md
    └── (Use Apify Website Content Crawler for multi-page)

Environment Setup

# Required in .env
FIRECRAWL_API_KEY=fc-your-api-key-here

Get your API key: https://firecrawl.dev/app/api-keys
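A minimal sketch of how a script could read the key at startup and fail early if it is missing. It assumes the variables in .env have already been loaded into the process environment (for example by the shell or a dotenv loader); `load_api_key` is a hypothetical helper, not a function from scripts/firecrawl_scrape.py.

```python
import os


def load_api_key(env=os.environ):
    """Return the Firecrawl API key, or raise if it is missing or malformed."""
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set; add it to .env")
    if not key.startswith("fc-"):
        # Assumption: real keys carry the "fc-" prefix shown in the example.
        raise RuntimeError("FIRECRAWL_API_KEY does not look like a Firecrawl key")
    return key
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real environment.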

Common Usage

Simple Scrape

python scripts/firecrawl_scrape.py "https://example.com/article"

With Options

python scripts/firecrawl_scrape.py "https://wsj.com/article" \
  --proxy stealth \
  --format markdown summary \
  --timeout 60000

Proxy Modes

| Mode | Use Case |
|------|----------|
| basic | Standard sites, fastest |
| stealth | Anti-bot protection, premium content (WSJ, NYT) |
| auto | Let Firecrawl decide (recommended) |

Output Formats

markdown - Clean markdown content (default)

html - Raw HTML

summary - AI-generated summary

screenshot - Page screenshot

links - All links on page
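The proxy modes and output formats above can be validated before a request is sent. The sketch below assumes the script turns its CLI flags into a JSON body with `url`, `formats`, `proxy`, and `timeout` fields; those field names, the `auto` default, and the 30000 ms default are assumptions for illustration, and `build_scrape_payload` is a hypothetical helper.

```python
PROXY_MODES = {"basic", "stealth", "auto"}
OUTPUT_FORMATS = {"markdown", "html", "summary", "screenshot", "links"}


def build_scrape_payload(url, formats=("markdown",), proxy="auto", timeout_ms=30000):
    """Validate options and return a request body as a plain dict."""
    unknown = sorted(set(formats) - OUTPUT_FORMATS)
    if unknown:
        raise ValueError(f"unsupported format(s): {', '.join(unknown)}")
    if proxy not in PROXY_MODES:
        raise ValueError(f"unsupported proxy mode: {proxy}")
    if timeout_ms <= 0:
        raise ValueError("timeout must be a positive number of milliseconds")
    # Assumed field names, mirroring the --format, --proxy, and --timeout flags.
    return {
        "url": url,
        "formats": list(formats),
        "proxy": proxy,
        "timeout": timeout_ms,
    }


if __name__ == "__main__":
    # Mirrors the "With Options" command line shown earlier.
    print(build_scrape_payload(
        "https://wsj.com/article",
        formats=("markdown", "summary"),
        proxy="stealth",
        timeout_ms=60000,
    ))
```

Rejecting unknown modes and formats locally avoids spending a request on a call that cannot succeed.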

Cost

~1 credit per page. Stealth proxy may use additional credits.

Security Notes

Credential Handling

Store FIRECRAWL_API_KEY in .env file (never commit to git)

API keys can be regenerated at https://firecrawl.dev/app/api-keys

Never log or print API keys in script output

Use environment variables, not hardcoded values
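One way to honor the "never log or print API keys" rule is to mask the key before it can reach any log line. This is a sketch only; `redact_key` is a hypothetical helper and the number of visible characters is an arbitrary choice.

```python
def redact_key(key: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters of a secret."""
    if len(key) <= visible:
        # Too short to show any part of it safely.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


if __name__ == "__main__":
    print("Using key", redact_key("fc-1234567890"))
```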

Data Privacy

Only scrapes publicly accessible web pages

Scraped content is processed by Firecrawl servers temporarily

Markdown output stored locally in .tmp/ directory

Screenshots (if requested) are stored locally

No persistent data retention by Firecrawl after request
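Local storage in .tmp/ could look like the sketch below. The file naming scheme (host name plus a short hash of the URL) is an assumption for illustration; the actual script may name its files differently, and `output_path` and `save_markdown` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def output_path(url: str, out_dir: str = ".tmp") -> Path:
    """Derive a stable, filesystem-safe path for a scraped URL."""
    host = urlparse(url).netloc.replace(":", "_") or "page"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:10]
    return Path(out_dir) / f"{host}-{digest}.md"


def save_markdown(url: str, markdown: str, out_dir: str = ".tmp") -> Path:
    """Write scraped markdown under out_dir and return the file path."""
    path = output_path(url, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path
```

Because scraped content may contain PII, the .tmp/ directory is worth excluding from version control alongside .env.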

Access Scopes

API key provides full access to scraping features

No granular permission scopes available

Monitor usage via Firecrawl dashboard

Compliance Considerations

Robots.txt: Firecrawl respects robots.txt by default

Public Content Only: Only scrape publicly accessible pages

Terms of Service: Respect target site ToS

Rate Limiting: Built-in rate limiting prevents abuse

Stealth Proxy: Use stealth mode only when necessary (for example, news sites with anti-bot protection); never use it to bypass authentication

GDPR: Scraped content may contain PII - handle accordingly

Copyright: Respect intellectual property rights of scraped content
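A local robots.txt pre-check can complement whatever the service does on its side. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text that has already been fetched; `is_allowed` is a hypothetical helper and is independent of Firecrawl's own behavior.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(is_allowed(rules, "https://example.com/blog/post"))
    print(is_allowed(rules, "https://example.com/private/report"))
```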

Troubleshooting

Common Issues

Issue: Credits exhausted

Symptoms: API returns an "insufficient credits" or quota-exceeded error
Cause: Account credits depleted
Solution:

Check credit balance at https://firecrawl.dev/app

Upgrade plan or purchase additional credits

Reduce scraping frequency

Use basic proxy mode to conserve credits

Issue: Page not rendering correctly

Symptoms: Empty content or partial HTML returned
Cause: JavaScript-heavy page not fully loading
Solution:

Enable JavaScript rendering with --js-render flag

Increase timeout with --timeout 60000 (60 seconds)

Try stealth proxy mode for protected sites

Wait for specific elements with --wait-for selector

Issue: 403 Forbidden error

Symptoms: Script returns 403 status code
Cause: Site blocking automated access
Solution:

Enable stealth proxy mode

Add delay between requests

Retry at a different time of day (some sites apply time-based rate limits)

Check whether the site requires login (authenticated pages are not supported)
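The first two steps (stealth proxy, delay between requests) can be combined into a small escalation loop. This is a sketch under assumptions: `scrape` is any callable that accepts `url` and a `proxy` keyword, and `ForbiddenError` is a hypothetical exception that the caller raises on a 403 response.

```python
import time


class ForbiddenError(Exception):
    """Hypothetical error raised by the caller's scrape function on HTTP 403."""


def scrape_with_escalation(scrape, url, proxies=("basic", "stealth"),
                           delay_s=2.0, sleep=time.sleep):
    """Try each proxy mode in order, pausing between attempts."""
    if not proxies:
        raise ValueError("at least one proxy mode is required")
    last_error = None
    for attempt, proxy in enumerate(proxies):
        if attempt > 0:
            sleep(delay_s)  # delay between requests, as suggested above
        try:
            return scrape(url, proxy=proxy)
        except ForbiddenError as exc:
            last_error = exc
    # Every mode was blocked; surface the last 403 to the caller.
    raise last_error
```

Starting with basic and escalating only on a 403 keeps stealth mode, which may cost additional credits, for the pages that need it.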

Issue: Empty markdown output

Symptoms: Scrape succeeds but markdown is empty or malformed
Cause: Dynamic content loaded after page load, or unusual page structure
Solution:

Increase wait time for JavaScript to execute

Use --wait-for to wait for specific content

Try html format to see raw content

Check if content is in an iframe (not always supported)
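The "try html format" step can be automated when both formats were requested. The sketch assumes the scrape result is a dict with optional `markdown` and `html` string fields; that shape and the `pick_content` helper are assumptions for illustration.

```python
def pick_content(data: dict) -> tuple:
    """Return (format_name, content), preferring markdown over raw HTML."""
    markdown = (data.get("markdown") or "").strip()
    if markdown:
        return "markdown", markdown
    html = (data.get("html") or "").strip()
    if html:
        # Markdown was empty; fall back to HTML so the page can be inspected.
        return "html", html
    raise ValueError("scrape returned neither markdown nor html content")
```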

Issue: Timeout errors

Symptoms: Request times out before completion
Cause: Slow page load or large page content
Solution:

Increase timeout value (up to 120000ms)

Use basic proxy for faster response

Target specific page sections if possible

Check if site is experiencing issues
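Increasing the timeout can be done stepwise rather than jumping straight to the maximum. The sketch below builds a retry schedule capped at the 120000 ms limit mentioned above; the 30000 ms starting value and the doubling factor are assumptions, and `timeout_schedule` is a hypothetical helper.

```python
MAX_TIMEOUT_MS = 120_000  # upper bound stated in the troubleshooting steps


def timeout_schedule(start_ms=30_000, factor=2, max_ms=MAX_TIMEOUT_MS):
    """Return increasing timeouts in milliseconds, ending at max_ms."""
    if start_ms <= 0 or factor <= 1:
        raise ValueError("start_ms must be positive and factor greater than 1")
    timeouts = []
    current = start_ms
    while current < max_ms:
        timeouts.append(current)
        current *= factor
    timeouts.append(max_ms)  # the final attempt always uses the cap
    return timeouts


if __name__ == "__main__":
    for timeout in timeout_schedule():
        print(f"--timeout {timeout}")
```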

Resources

references/single-page.md - Single page scraping details

references/website-crawler.md - Multi-page website crawling

Integration Patterns

Scrape and Analyze

Skills: firecrawl-scraping → parallel-research
Use case: Scrape competitor pages, then analyze content strategy
Flow:

Scrape competitor website pages with Firecrawl

Convert to clean markdown

Use parallel-research to analyze positioning, messaging, features

Scrape and Document

Skills: firecrawl-scraping → content-generation
Use case: Create summary documents from web research
Flow:

Scrape multiple article pages on a topic

Combine markdown content

Generate summary document via content-generation
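The "combine markdown content" step might be sketched as follows. The `Source:` line and the `---` separator are illustrative choices rather than a format required by content-generation, and `combine_markdown` is a hypothetical helper.

```python
def combine_markdown(pages):
    """Join (url, markdown) pairs into one document, skipping empty pages."""
    sections = []
    for url, markdown in pages:
        body = markdown.strip()
        if not body:
            continue  # an empty scrape adds nothing to the summary input
        sections.append(f"Source: {url}\n\n{body}")
    if not sections:
        return ""
    return "\n\n---\n\n".join(sections) + "\n"
```

Keeping the source URL next to each section lets the generated summary attribute claims to the page they came from.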

Scrape and Enrich CRM

Skills: firecrawl-scraping → attio-crm
Use case: Enrich company records with website data
Flow:

Scrape company website (about page, team page, product pages)

Extract key information (funding, team size, products)

Update company record in Attio CRM with enriched data
