LinkedIn Scraper

Scrape LinkedIn profiles using the user's Chrome profile. Use when asked to find leads, scrape LinkedIn profiles, extract contact data from LinkedIn, or buil...

SkillRank score: 7.4 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

linkedin-scraper uses authenticated chrome sessions to extract profile data and bulk search results with built-in rate limiting and stealth patterns. stores results as json or duckdb records.

structure: 9.0
trigger phrases: 8.0
procedure: 8.0
edge cases: 7.0
documentation: 7.0

strengths

original SKILL.md from clawhub

---
name: linkedin-scraper
description: Scrape LinkedIn profiles using the user's Chrome profile. Use when asked to find leads, scrape LinkedIn profiles, extract contact data from LinkedIn, or build prospect lists. Triggers include "find founders on LinkedIn", "scrape this LinkedIn profile", "get LinkedIn data for these people", "build a lead list from LinkedIn".
metadata: { "openclaw": { "emoji": "🔍" } }
---

# LinkedIn Scraper — Chrome Profile Web Scraping

Scrape LinkedIn profiles and search results using the user's authenticated Chrome browser session. No API keys needed — uses the browser tool with the Chrome profile relay.

## Prerequisites

- Chrome browser with active LinkedIn login
- Browser relay connected (Chrome extension or openclaw browser profile)
- DuckDB workspace for storing results (optional)

## Core Workflow

### 1. Single Profile Scrape

```
browser → open LinkedIn profile URL
browser → snapshot (extract structured data)
→ Parse: name, headline, title, company, location, education, experience, connections, about
→ Return structured JSON or insert into DuckDB
```

### 2. Search + Bulk Scrape

```
browser → open LinkedIn search URL with filters
browser → snapshot (extract result cards)
→ Parse each result: name, title, company, profile URL
→ For each profile URL: open → snapshot → parse full profile
→ Batch insert into DuckDB
```

### 3. Company Page Scrape

```
browser → open LinkedIn company page
→ Parse: company name, industry, size, description, specialties, employee count
→ Navigate to /people tab for employee list
```

## Implementation Rules

### Rate Limiting (CRITICAL)
- **Minimum 3-second delay** between page loads
- **Maximum 80 profiles per session** (LinkedIn rate limits)
- **Randomize delays** between 3 and 8 seconds (avoid detection)
- After every 20 profiles, take a **60-second break**
- If CAPTCHA or "unusual activity" detected, **stop immediately** and alert user
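
The pacing rules above can be sketched as a pure function that plans the pause before each page load. This is a minimal sketch, not part of the skill: `plan_delays` and the constant names are hypothetical, and it assumes the 60-second break is added on top of the randomized delay.

```python
import random

MAX_PROFILES_PER_SESSION = 80     # session cap from the rules above
BREAK_EVERY = 20                  # profiles between long breaks
BREAK_SECONDS = 60.0
MIN_DELAY, MAX_DELAY = 3.0, 8.0   # randomized delay window, in seconds


def plan_delays(profile_count, rng=None):
    """Return the pause (in seconds) to take before each profile load."""
    rng = rng or random.Random()
    count = min(profile_count, MAX_PROFILES_PER_SESSION)
    delays = []
    for index in range(count):
        delay = rng.uniform(MIN_DELAY, MAX_DELAY)
        # After every 20 completed profiles, add the 60-second break
        # (assumption: the break is in addition to the normal delay).
        if index > 0 and index % BREAK_EVERY == 0:
            delay += BREAK_SECONDS
        delays.append(delay)
    return delays
```

A caller would sleep for `delays[i]` before loading profile `i`, and stop early if a CAPTCHA or "unusual activity" page appears.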

### Stealth Patterns
- Use natural scrolling (scroll down slowly, pause, scroll more)
- Don't scrape the same search results page more than twice
- Vary the order of profile visits (don't go sequentially)
- Close and reopen tabs periodically

### Data Extraction — Profile Page
From a LinkedIn profile snapshot, extract these fields:

| Field | Location | Notes |
|-------|----------|-------|
| name | Main heading h1 | Full name |
| headline | Below name | Title + Company usually |
| location | Location section | City, State/Country |
| current_title | Experience section, first entry | Most recent role |
| current_company | Experience section, first entry | Company name |
| education | Education section | School, degree, dates |
| connections | Connections count | Number or "500+" |
| about | About section | Bio text (may need "see more" click) |
| experience | Experience section | All roles with dates |
| profile_url | Browser URL bar | Canonical LinkedIn URL (emitted as `linkedin_url` in the JSON output) |

### Data Extraction — Search Results
From LinkedIn search results page:

| Field | Location |
|-------|----------|
| name | Result card heading |
| headline | Below name in card |
| location | Card metadata |
| profile_url | Link href on name |
| mutual_connections | Card footer |

## Search URL Patterns

```
# People search
https://www.linkedin.com/search/results/people/?keywords={query}

# With filters
&geoUrn=%5B%22103644278%22%5D          # United States
&network=%5B%22F%22%2C%22S%22%5D        # 1st + 2nd connections
&currentCompany=%5B%22{company_id}%22%5D # Current company
&schoolFilter=%5B%22{school_id}%22%5D    # School filter

# YC founders (common query)
https://www.linkedin.com/search/results/people/?keywords=Y%20Combinator%20founder

# Company employees
https://www.linkedin.com/company/{slug}/people/
```
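
The filter values in the patterns above are percent-encoded JSON arrays (for example, `%5B%22F%22%2C%22S%22%5D` decodes to `["F","S"]`). A small sketch of building such URLs with the standard library follows. `build_search_url` is a hypothetical helper, and it assumes every filter takes a list of ID strings.

```python
import json
from urllib.parse import quote, urlencode

PEOPLE_SEARCH = "https://www.linkedin.com/search/results/people/"


def build_search_url(keywords, **list_filters):
    """Build a people-search URL; each filter value is a list of ID strings."""
    params = {"keywords": keywords}
    for name, values in list_filters.items():
        # Filters are JSON arrays without spaces, e.g. ["F","S"].
        params[name] = json.dumps(list(values), separators=(",", ":"))
    # Use quote (not quote_plus) so spaces become %20, as in the patterns above.
    return PEOPLE_SEARCH + "?" + urlencode(params, quote_via=quote)
```

For example, `build_search_url("Y Combinator founder")` reproduces the YC founders URL shown above, and passing `geoUrn=["103644278"]` appends the United States filter.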

## DuckDB Integration

When storing to DuckDB, use the Ironclaw workspace database:

```sql
-- Check if leads/contacts object exists
SELECT * FROM objects WHERE name = 'leads' OR name = 'contacts';

-- Insert via the EAV pattern or direct pivot view
INSERT INTO v_leads ("Name", "Title", "Company", "LinkedIn URL", "Location", "Source")
VALUES (?, ?, ?, ?, ?, 'LinkedIn Scrape');
```

If no suitable object exists, create one:
```sql
-- Use Ironclaw's object creation pattern from the dench skill
```

## Error Handling

| Error | Action |
|-------|--------|
| "Sign in" page | LinkedIn session expired — alert user to re-login in Chrome |
| CAPTCHA / Security check | Stop immediately, wait 30+ min, alert user |
| "Profile not found" | Skip, log URL as invalid |
| Rate limit (429) | Stop, wait 15 min, retry with longer delays |
| Empty snapshot | Page still loading — wait 3s and re-snapshot |
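
The table above can be read as a lookup from page state to action. The sketch below is one hedged way to express that. `Action` and `classify_snapshot` are hypothetical names, and the plain substring checks are an assumption for illustration (real pages would need more robust detection).

```python
from enum import Enum


class Action(Enum):
    ALERT_RELOGIN = "alert user to re-login in Chrome"
    STOP_AND_ALERT = "stop immediately, wait 30+ min, alert user"
    SKIP_INVALID = "skip, log URL as invalid"
    STOP_AND_RETRY_LATER = "stop, wait 15 min, retry with longer delays"
    WAIT_AND_RESNAPSHOT = "wait 3s and re-snapshot"
    PROCEED = "parse the snapshot"


def classify_snapshot(text, http_status=200):
    """Map a page snapshot to the matching action from the table above."""
    if http_status == 429:
        return Action.STOP_AND_RETRY_LATER
    if not text.strip():
        return Action.WAIT_AND_RESNAPSHOT
    lowered = text.lower()
    # Security signals are checked first so they always win.
    if any(s in lowered for s in ("captcha", "security check", "unusual activity")):
        return Action.STOP_AND_ALERT
    if "profile not found" in lowered:
        return Action.SKIP_INVALID
    if "sign in" in lowered:
        return Action.ALERT_RELOGIN
    return Action.PROCEED
```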

## Output Formats

### JSON (default)
```json
{
  "name": "Jane Doe",
  "headline": "CEO at Acme Corp",
  "current_title": "CEO",
  "current_company": "Acme Corp",
  "location": "San Francisco, CA",
  "linkedin_url": "https://www.linkedin.com/in/janedoe",
  "connections": "500+",
  "education": [{"school": "Stanford", "degree": "BS CS", "years": "2010-2014"}],
  "experience": [{"title": "CEO", "company": "Acme Corp", "duration": "2020-Present"}],
  "scraped_at": "2026-02-17T14:30:00Z"
}
```

### Progress Reporting
For bulk scrapes, report progress:
```
Scraping: 15/50 profiles (30%) — Last: Jane Doe (Acme Corp)
Rate: ~4 profiles/min — ETA: 9 min remaining
```

## Safety
- Never scrape private/restricted profiles
- Respect LinkedIn's robots.txt for public pages
- Store data locally only (DuckDB) — never exfiltrate
- User must have legitimate LinkedIn access
- This tool assists the user's own manual browsing at scale

expanded original into implexa's six-component structure, made decision logic explicit (session expiry, captcha, rate limits, network failures), added edge cases (timeouts, truncated content, missing objects), documented duckdb integration with query examples, clarified rate limiting rules with concrete delays, added outcome signals for verification, kept original author unattributed per source.

LinkedIn Scraper

scrape linkedin profiles and search results using your authenticated Chrome browser session. no api keys needed, just your existing LinkedIn login. use this when you need to find leads, extract profile data, or build prospect lists at scale.

intent

extract structured data from LinkedIn profiles, search results, and company pages using your Chrome session. operates without requiring API keys by leveraging your browser's authenticated state via the Chrome profile relay. run this when you need to find founders, scrape specific profiles, gather contact data for multiple people, or compile lead lists. the skill respects LinkedIn's rate limits and detection patterns to avoid account flags or CAPTCHAs.

inputs

required:

Chrome browser with active LinkedIn login session
Browser relay connected (Chrome extension or openclaw browser profile mode)
Target URL or search query (LinkedIn profile URL, search results page, or company page)

optional but recommended:

DuckDB workspace for persisting scraped data (use ironclaw's contacts or leads object)
Delay preferences (default: randomized 3-8 seconds between requests)
Batch size preference (default: 80 profiles max per session)

edge cases to account for:

LinkedIn session expiry during scrape (user logged out)
Rate limiting (429 errors or implicit throttling)
CAPTCHA or "unusual activity" alerts
Dynamic content requiring scrolling or "see more" clicks
Network timeouts (retry up to 3 times with exponential backoff)
Empty or missing fields on certain profiles

procedure

step 1: validate chrome session and set up

input: browser relay connection, target URL or search query
check if browser is connected and LinkedIn is accessible (load linkedin.com homepage)
verify no "Sign in" page or CAPTCHA is present
output: browser ready signal, session confirmed

step 2: for single profile scrape

input: LinkedIn profile URL (https://www.linkedin.com/in/{slug}/)
open profile URL in browser
wait 3-5 seconds for page to fully load (account for lazy-loaded content)
trigger snapshot to capture full DOM
output: raw HTML snapshot

step 3: extract profile fields from snapshot

input: HTML snapshot from step 2
parse and extract: name (h1), headline (below name), location, current title, current company, education (list), experience (list), connections count, about section, profile URL
handle "see more" expansions by noting if about or experience is truncated
output: structured JSON object with all extracted fields plus scraped_at timestamp

step 4: for search result scrape (bulk)

input: LinkedIn search URL with optional filters (keywords, geography, company, school)
open search URL in browser
wait 4 seconds for results to render
trigger snapshot to capture result cards
output: raw HTML snapshot of search page

step 5: parse search results from snapshot

input: HTML snapshot from step 4
extract from each result card: name, headline, location, profile URL, mutual connections count
compile list of profile URLs for follow-up scrapes
output: array of result objects with profile URLs

step 6: bulk profile scrape loop

input: array of profile URLs from step 5, delay settings
for each URL in list:
- apply randomized delay (3-8 seconds default) to avoid sequential patterns
- open profile URL
- wait for full load
- snapshot and parse (repeat steps 2-3)
- collect result in batch
- every 20 profiles, insert 60-second break
- check for CAPTCHA or rate limit signals (stop if detected)
output: array of parsed profile objects

step 7: optional company page scrape

input: LinkedIn company page URL (https://www.linkedin.com/company/{slug}/)
open company page
wait 4 seconds for load
snapshot and extract: company name, industry, size, description, specialties, employee count
navigate to /people tab if employee list is desired
output: company object + optional employee list snapshot

step 8: persist to duckdb (if workspace available)

input: batch of parsed profiles or company data, workspace connection string
check if "leads" or "contacts" object exists in DuckDB workspace
insert via EAV pattern or direct pivot view: insert into v_leads (or equivalent) with columns "Name", "Title", "Company", "LinkedIn URL", "Location", and "Source" (set to 'LinkedIn Scrape')
if no suitable object exists, log output as JSON and alert user to create object manually
output: row count inserted or JSON file path
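
a minimal sketch of preparing rows for step 8, assuming the v_leads column order from the original SKILL.md and the profile keys from the output contract. `lead_params` is a hypothetical helper, and the mapping of current_title to "Title" and current_company to "Company" is an assumption.

```python
INSERT_SQL = (
    'INSERT INTO v_leads ("Name", "Title", "Company", "LinkedIn URL", "Location", "Source") '
    "VALUES (?, ?, ?, ?, ?, 'LinkedIn Scrape');"
)


def lead_params(profile):
    """Return the five positional parameters for INSERT_SQL, in column order."""
    return (
        profile.get("name"),
        profile.get("current_title"),    # assumed to map to "Title"
        profile.get("current_company"),  # assumed to map to "Company"
        profile.get("linkedin_url"),
        profile.get("location"),
    )
```

with the DuckDB Python client, a batch could then be written with something like `con.executemany(INSERT_SQL, [lead_params(p) for p in profiles])`, assuming `con` is an open connection to the workspace database.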

step 9: return results to user

input: final batch of parsed profiles
format as JSON array or CSV (user preference)
include progress summary if bulk scrape (e.g. "15/50 profiles, 30%, ETA 9min")
output: structured data file or display

decision points

if chrome session is not authenticated (sign-in page detected):

action: stop immediately, alert user "LinkedIn session expired, please re-login in Chrome and retry"
do not proceed with scraping

if CAPTCHA or "unusual activity" modal is detected:

action: stop immediately, do not attempt to solve CAPTCHA or continue
alert user: "LinkedIn detected unusual activity, please wait 30+ minutes before retrying"
note: this is a LinkedIn security measure; continuing triggers account restrictions

if rate limit 429 error occurs or implicit throttling is suspected:

action: stop batch loop, wait 15 minutes minimum before resuming
extend randomized delay to 5-10 seconds for next attempt
reduce batch size (scrape 40-50 profiles instead of 80)

if profile page fails to load or returns 404:

action: skip profile, log URL as invalid, continue to next in batch
do not retry the same profile more than once per session

if snapshot captures incomplete data (e.g. about section truncated, experience list shows "... and X more"):

action: mark the field as truncated in the JSON with a per-field flag (e.g. "about_truncated": true)
if critical to user's goal, attempt one "see more" click and re-snapshot
do not retry more than once per field to avoid rate limit risk

if duckdb workspace is not available or no leads/contacts object exists:

action: return results as JSON file or CSV export
alert user with object creation guidance or location where JSON was saved

if network timeout or browser connection drops:

action: wait 5 seconds, attempt to reconnect browser relay
retry the current URL up to 3 times with exponential backoff (5s, 10s, 20s)
if still fails, log as network error and skip to next profile in batch
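
the retry rule above (one attempt, then up to 3 retries after 5s, 10s, and 20s) can be sketched as follows. `retry_with_backoff` and its `load` callback are hypothetical, and the sketch assumes network failures surface as `TimeoutError` or `ConnectionError`. reconnecting the browser relay is left to the caller's `load`.

```python
import time


def retry_with_backoff(load, url, delays=(5, 10, 20), sleep=time.sleep):
    """Call load(url); on a network error, retry after each delay in turn.

    Returns the loaded result, or None if every retry fails,
    in which case the caller logs a network error and skips the URL.
    """
    try:
        return load(url)
    except (TimeoutError, ConnectionError):
        pass
    for delay in delays:
        sleep(delay)  # waits 5s, then 10s, then 20s
        try:
            return load(url)
        except (TimeoutError, ConnectionError):
            continue
    return None
```

passing `sleep` as a parameter keeps the function testable without real waiting.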

output contract

success state:

JSON array (or CSV table) of profiles, one object per profile, with every contract field present
each profile object contains: name, headline, current_title, current_company, location, linkedin_url, connections, education (array), experience (array), about, scraped_at (ISO 8601 timestamp)
truncated fields carry a per-field flag (e.g. "about_truncated": true) when content was incomplete on the page
missing optional fields (e.g. about section) are null, not omitted
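
a minimal sketch of enforcing this contract, so that missing fields come out as null rather than being omitted. `normalize_profile` and `PROFILE_FIELDS` are hypothetical names, and the field list is taken from the contract above.

```python
from datetime import datetime, timezone

PROFILE_FIELDS = (
    "name", "headline", "current_title", "current_company", "location",
    "linkedin_url", "connections", "education", "experience", "about",
)


def normalize_profile(raw, now=None):
    """Return a profile dict with every contract field present (None if missing)."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    profile = {field: raw.get(field) for field in PROFILE_FIELDS}
    # ISO 8601 UTC timestamp with a trailing Z, e.g. 2026-02-17T14:30:00Z
    profile["scraped_at"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return profile
```

serializing the result with `json.dumps` writes each `None` as `null`, which matches the rule above.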

file location (if persisted):

duckdb: inserted into v_leads (or equivalent contacts table) in ironclaw workspace
local: JSON saved to {workspace}/exports/linkedin_scrape_{timestamp}.json or CSV to {workspace}/exports/linkedin_scrape_{timestamp}.csv

progress metadata (for bulk scrapes):

include summary object with fields: total_requested, total_scraped, total_skipped, success_rate, profiles_per_minute, estimated_time_remaining_minutes, errors

example single profile output:

{
  "name": "Jane Doe",
  "headline": "CEO at Acme Corp",
  "current_title": "CEO",
  "current_company": "Acme Corp",
  "location": "San Francisco, CA",
  "linkedin_url": "https://www.linkedin.com/in/janedoe",
  "connections": "500+",
  "education": [{"school": "Stanford", "degree": "BS Computer Science", "years": "2010-2014"}],
  "experience": [{"title": "CEO", "company": "Acme Corp", "duration": "2020-Present", "description": null}, {"title": "VP Engineering", "company": "Prior Inc", "duration": "2018-2020", "description": null}],
  "about": "Building the future of distributed systems.",
  "about_truncated": false,
  "scraped_at": "2026-02-17T14:30:00Z"
}

example bulk scrape summary:

{
  "summary": {
    "total_requested": 50,
    "total_scraped": 48,
    "total_skipped": 2,
    "success_rate": 0.96,
    "profiles_per_minute": 4.2,
    "estimated_time_remaining_minutes": 0,
    "errors": ["janedoe2 (404 - profile not found)", "johndoe3 (network timeout)"]
  },
  "profiles": [...]
}
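
the summary fields above can be derived from running counts. the sketch below is illustrative only: `build_summary` is a hypothetical helper, and it assumes success_rate is scraped profiles divided by attempted profiles, and that the ETA is the remaining count divided by the observed rate, rounded up.

```python
import math


def build_summary(total_requested, total_scraped, errors, elapsed_minutes):
    """Build the bulk-scrape summary object from running counts."""
    total_skipped = len(errors)
    attempted = total_scraped + total_skipped
    remaining = total_requested - attempted
    rate = total_scraped / elapsed_minutes if elapsed_minutes > 0 else 0.0
    # ETA is unknown (None) until at least one profile has been scraped.
    eta = math.ceil(remaining / rate) if rate > 0 else None
    return {
        "total_requested": total_requested,
        "total_scraped": total_scraped,
        "total_skipped": total_skipped,
        "success_rate": round(total_scraped / attempted, 2) if attempted else 0.0,
        "profiles_per_minute": round(rate, 1),
        "estimated_time_remaining_minutes": eta,
        "errors": list(errors),
    }
```

mid-run, 15 of 50 profiles scraped in 3.75 minutes gives a rate of 4.0 profiles per minute and an ETA of 9 minutes for the remaining 35.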

outcome signal

you know the skill worked when:

single profile: structured JSON object appears in chat or file export with all expected fields (name, headline, company, location, URL, etc.) matching what you see on the LinkedIn page
bulk scrape: progress updates appear every 5-10 profiles showing count, percentage, and ETA; final summary shows success_rate >= 0.90 with no unexpected stops
duckdb persistence: run SELECT COUNT(*) FROM v_leads WHERE source = 'LinkedIn Scrape'; and confirm new rows are present with correct data
error handling: if CAPTCHA or session expires, clear alert message appears before scrape stops (not a silent failure)
no account flags: LinkedIn doesn't show security warnings, CAPTCHA, or "unusual activity" during or immediately after scrape (sign of proper rate limiting)
data quality: extracted fields match what's visible on the actual LinkedIn pages (spot-check 2-3 profiles); no hallucinated or corrupted data