---
name: kol-content-screening
description: Screen and rank Chinese social media KOLs (抖音 Douyin / 小红书 Xiaohongshu / 今日头条 Toutiao / 视频号 / B站) by whether they have published content matching a given keyword (brand, competitor, topic) within a time window, using web search aggregation rather than platform APIs. Use when the user provides a list of KOL accounts (handle / nickname / fan count / homepage URL) and asks to find which ones posted about X (e.g., 比亚迪 / 宁德时代 / DM-i / 王传福) in the past N months, or to rank/sort them by relevance / fan count / interaction. Triggers on phrases like "筛选达人", "KOL 内容查询", "媒体账号比亚迪相关内容", "查这批博主有没有发过 X", "投放筛选". Optimized for marketing / PR / competitive-intel workflows where the platform's own API is unavailable. NOT for: real per-video interaction counts (need 蝉妈妈/灰豚/新红 paid data services), discovering new accounts from scratch, or non-Chinese platforms.
---

# KOL Content Screening (Web-Search Based)

Screen lists of Chinese social media KOLs for keyword-matching content within a time window, and output a ranked, evidence-backed report. This is common in PR, marketing, and competitive-intelligence work, where you receive a known account list, keywords, and a time window ("已知账号清单 + 关键词 + 时间窗") and need to establish who has posted and who has not ("谁发过、谁没发").

## Hard Truths Up Front

Tell the user these before promising anything:

1. **No reliable per-video interaction counts via web search.** Per-video likes, comments, and saves (点赞/评论/收藏) on 抖音/小红书 are not reliably indexed by general search engines. Mark them as "无公开数据" (no public data) rather than guessing. For real numbers, the user must use 蝉妈妈 / 灰豚 / 新红 (抖音), 千瓜 / 新红 (小红书), or the platforms' open APIs.
2. **"未发现" ≠ "没发过" (not found ≠ never posted).** Web search has indexing gaps. Always frame negatives as "公开检索未发现证据 (within N months)", i.e. no evidence found in public search. Never claim a creator definitely hasn't posted X.
3. **Same nickname ≠ same person.** Verify by handle/UID/homepage URL, not by name. 抖音号 / 小红书 user_id / 头条 author UID are the only reliable identifiers.
4. **Time window matters.** State the explicit window (e.g. `2025-05-05 ~ 2026-05-05`) in the report header. Content older than the window is marked separately, not mixed into the "active" set.

If the user asks for ranking by accurate per-video interaction counts (互动量排序), **stop and warn**: this needs paid data services. Get explicit acknowledgement before proceeding with web-only screening.

## Core Workflow

### 1. Intake (clarify before running)

Always confirm 5 parameters before spawning sub-agents:

| Parameter | Example | Notes |
|-----------|---------|-------|
| Platforms | 抖音 + 小红书 + 头条 | Each platform = independent sub-task |
| Account list | (CSV/table from user) | Need: handle/UID + nickname + fan count + homepage URL |
| Keywords | 比亚迪 / BYD / 王传福 / DM-i / 仰望 | Include EN + CN + product lines + key person names |
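| Time window | Last 12 months (近 12 个月), as `YYYY-MM-DD ~ YYYY-MM-DD` | Compute exact dates; don't pass "近一年" (the past year) verbatim |
| Sort dimension | Match tier, then fan count descending (有内容档→粉丝量降序) / interaction count (互动量) / keyword hit count (关键词命中数) | Without an interaction-data source, default to fan count descending within each match tier |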

**Common intake pitfall**: The user's paste begins with a Windows clipboard HTML header (`Version:1.0 StartHTML:...`). That is only the raw clipboard envelope; the actual table data follows it. Parse the account list from the rest of the paste.
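
Two of the intake chores above lend themselves to small helpers: turning "last N months" into explicit dates, and dropping the clipboard header before parsing. The following is a minimal sketch, not part of the skill itself. The function names are hypothetical, and the envelope check is a heuristic that assumes the paste starts with `Version:` and that the real content begins at the first `<`.

```python
import calendar
from datetime import date


def window_for_last_months(months: int, today: date) -> tuple[date, date]:
    """Return (start, end) for 'the last N months', ending on `today`."""
    # Count months from year 0 so the subtraction handles year rollover.
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day for short months (e.g. 31 March minus 1 month).
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day), today


def format_window(start: date, end: date) -> str:
    """Render the window in the `YYYY-MM-DD ~ YYYY-MM-DD` form used in reports."""
    return f"{start.isoformat()} ~ {end.isoformat()}"


def strip_clipboard_envelope(paste: str) -> str:
    """Heuristic: drop a leading 'Version:... StartHTML:...' header, keep the rest."""
    text = paste.lstrip()
    if not text.startswith("Version:"):
        return paste  # no envelope detected, leave the paste untouched
    first_tag = text.find("<")
    return text[first_tag:] if first_tag != -1 else paste
```

For example, `format_window(*window_for_last_months(12, date(2026, 5, 5)))` yields the window string shown in the hard truths above. Passing `today` explicitly keeps the computation reproducible when a report is regenerated later.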

### 2. Group & Parallelize

For more than 15 accounts on one platform, split them into groups of 8–10 and spawn parallel sub-agents. In practice, 36 抖音 accounts split into 4 groups of 9, and 24 小红书 accounts into 3 groups of 8.

```
Per platform → split into N groups → 1 sub-agent per group → run in parallel → each writes its own file → main session merges and ranks
```
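
The split itself can be sketched as a balanced partition. This is one possible reading of the rule, under the assumption that the group count is the smallest one keeping every group at 10 or fewer; `split_into_groups` and its parameters are hypothetical names.

```python
import math


def split_into_groups(accounts: list, max_size: int = 10, threshold: int = 15) -> list[list]:
    """Split accounts into near-equal groups of at most `max_size` items."""
    n = len(accounts)
    if n <= threshold:
        return [list(accounts)]  # small lists run as a single task
    group_count = math.ceil(n / max_size)
    base, extra = divmod(n, group_count)
    groups, start = [], 0
    for i in range(group_count):
        # The first `extra` groups take one additional account.
        size = base + (1 if i < extra else 0)
        groups.append(list(accounts[start:start + size]))
        start += size
    return groups
```

With 36 accounts this gives four groups of 9, and with 24 accounts three groups of 8, matching the figures above. Some counts cannot land in the 8–10 range (21 accounts become three groups of 7), so treat the range as a target rather than a guarantee.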

Each sub-agent writes ONE file. File naming convention:
```
{platform-prefix}-{keyword-slug}-research-group{N}.md
```
where `platform-prefix` is `douyin` / `xhs` / `tt` / `sph` (视频号) / `bilibili`.
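
A small helper keeps the file names consistent across sub-agents. This is a sketch only: the mapping from Chinese platform names to prefixes is an assumption inferred from the order in which the platforms and prefixes are listed, and `group_filename` is a hypothetical name. The keyword slug is taken as given, since the convention does not say how to derive it.

```python
# Assumed mapping from platform name to file prefix.
PLATFORM_PREFIX = {
    "抖音": "douyin",
    "小红书": "xhs",
    "今日头条": "tt",
    "视频号": "sph",
    "B站": "bilibili",
}


def group_filename(platform: str, keyword_slug: str, group_no: int) -> str:
    """Build `{platform-prefix}-{keyword-slug}-research-group{N}.md`."""
    prefix = PLATFORM_PREFIX[platform]  # raises KeyError for unknown platforms
    return f"{prefix}-{keyword_slug}-research-group{group_no}.md"
```

For instance, `group_filename("抖音", "byd", 2)` produces `douyin-byd-research-group2.md`.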

Sub-agent prompt template: see `references/subagent-prompt-template.md`.

### 3. Per-account search procedure

Each sub-agent, for each account, runs **at least two** queries on the chosen web search tool (xiaosu-search or equivalent):

```
Q1: "{nickname}" {handle} {keyword}
Q2: "{nickname}" {keyword} site:{platform-domain}
Q3 (if Q1+Q2 weak): {nickname} {keyword} {YYYY}   # last 12 months explicit
```

Where `{platform-domain}` is `douyin.com` / `xiaohongshu.com` / `toutiao.com` / etc.
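
The three query patterns can be generated mechanically. The sketch below only builds the query strings; it does not call any search tool, and `build_queries` plus the sample values in the usage note are hypothetical.

```python
def build_queries(nickname: str, handle: str, keyword: str, domain: str, year: int) -> dict[str, str]:
    """Return the Q1-Q3 query strings for one account and one keyword."""
    return {
        # Q1: exact nickname plus handle and keyword.
        "Q1": f'"{nickname}" {handle} {keyword}',
        # Q2: exact nickname plus keyword, restricted to the platform domain.
        "Q2": f'"{nickname}" {keyword} site:{domain}',
        # Q3: fallback with an explicit year, used only if Q1 and Q2 are weak.
        "Q3": f"{nickname} {keyword} {year}",
    }
```

Calling `build_queries("示例达人", "demo_handle", "比亚迪", "douyin.com", 2025)` returns a dict whose `"Q2"` entry is `"示例达人" 比亚迪 site:douyin.com`. When the window spans two calendar years, Q3 may need to be issued once per year.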

For each hit, the sub-agent records:
- **Title** (or first line of post)
- **Date** (verify within window — outside-window hits noted separately)
- **URL** (must point to the creator's own post, not third-party reposts/quotes)
- **Stance** (正向 / 中性 / 负向 / 仅提及, i.e. positive / neutral / negative / mention only), which affects PR usability
- **Confidence** (高 / 中 / 低, i.e. high / medium / low), based on evidence strength

For each account, the sub-agent must explicitly check for **ID collision**: search for the nickname alone and check whether the top hits belong to this account's handle. If a collision is detected (e.g. "南希Nancy", a nickname shared by multiple people), flag it.
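
The per-hit fields and the window check can be captured in a small record type. This is an illustrative sketch; the `Hit` class, its field names, and `split_by_window` are hypothetical and not prescribed by the skill.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    title: str        # title or first line of the post
    published: date   # publication date as found in the search result
    url: str          # must point to the creator's own post
    stance: str       # one of 正向 / 中性 / 负向 / 仅提及
    confidence: str   # one of 高 / 中 / 低


def split_by_window(hits: list[Hit], start: date, end: date) -> tuple[list[Hit], list[Hit]]:
    """Separate in-window hits from outside-window hits (both are kept)."""
    inside = [h for h in hits if start <= h.published <= end]
    outside = [h for h in hits if not (start <= h.published <= end)]
    return inside, outside
```

Keeping the outside-window hits in their own list, rather than discarding them, matches the requirement to note them separately.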

See `references/platform-search-tips.md` for platform-specific quirks (site filters, profile URL formats, common false positives).

### 4. Aggregate & Rank

Main session reads all group files and merges into one ranked table. Default ranking:

```
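Tier 1 🟢: clear evidence within the time window (with URL, date, content summary)
Tier 2 🟡: only older content (outside the window) / indirect mention / weak evidence
Tier 3 🔴: no evidence found in public search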
```

Within each tier, sort by fan count descending by default. If the user asked for interaction-based ranking but the data is unavailable, **state this explicitly** in the report and fall back to fan count, with a caveat.
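
The default ordering amounts to a two-key sort: tier ascending, then fan count descending. A minimal sketch, assuming each account is a dict with hypothetical `tier` (1, 2, or 3) and `fans` (integer) keys:

```python
def rank_accounts(accounts: list[dict]) -> list[dict]:
    """Sort by tier (1 first), then by fan count from largest to smallest."""
    return sorted(accounts, key=lambda a: (a["tier"], -a["fans"]))


sample = [
    {"nickname": "A", "tier": 3, "fans": 900_000},
    {"nickname": "B", "tier": 1, "fans": 120_000},
    {"nickname": "C", "tier": 1, "fans": 450_000},
    {"nickname": "D", "tier": 2, "fans": 300_000},
]
ranked = [a["nickname"] for a in rank_accounts(sample)]
```

Here `ranked` is `["C", "B", "D", "A"]`: a large account with no evidence (A) still ranks below every account that has evidence.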

Final report structure: see `references/output-schema.md`.

### 5. Deliver

Write the report to `<workdir>/<keyword-slug>-kol-screening-{YYYYMMDD}.md` (markdown table), alongside the per-platform group files. If the user wants a 飞书 (Feishu) Sheet, build the markdown first, then offer to push it via `lark-cli sheets` (a separate skill).

## Failure Modes Seen In The Wild

Document these in the report so the user can interpret the results correctly:

- **Handle drift** — User pastes "楠姐财经科技头条" but real similar accounts are "楠姐聊财经" / "楠姐科技说" / "楠姐谈股论今". Report all candidates, flag uncertainty, ask user to confirm.
- **Cross-platform leak** — A 视频号 creator's content shows up only via 新浪/百度 reposts. That's still valid evidence the post exists, but mark source as `via 新浪 (转载)`.
- **Brand homonyms** — "比亚迪" matches food brand 拿铁/拿铁酱 etc.; "宁德时代" rarely collides but "宁德" alone matches geography. Use full brand name + a disambiguator keyword (王传福, 刀片电池, 车型名 for BYD; 麒麟电池, 神行, 凝聚态 for CATL).
- **Stale fan counts** — User-supplied fan numbers are snapshots. Don't recompute; record as-given with date if user provided one.
- **Profile-not-found** — Sometimes the homepage URL in the user's list 404s. Report as "主页失效", do NOT skip the account silently.
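
The brand-homonym guard above can also be applied as a post-filter on search hits. The following is a rough sketch under the assumption that a hit counts only when its text contains the full brand name plus at least one disambiguator term; `is_likely_brand_hit` is a hypothetical name, and hits that fail the check are better sent to manual review than silently dropped.

```python
def is_likely_brand_hit(text: str, brand: str, disambiguators: list[str]) -> bool:
    """True if the text names the full brand and at least one disambiguator."""
    return brand in text and any(term in text for term in disambiguators)


# Disambiguator terms taken from the list above.
CATL_TERMS = ["麒麟电池", "神行", "凝聚态"]

on_topic = is_likely_brand_hit("宁德时代发布麒麟电池", "宁德时代", CATL_TERMS)
geography = is_likely_brand_hit("宁德旅游攻略", "宁德时代", CATL_TERMS)
```

In this example `on_topic` is `True` and `geography` is `False`, since the second text mentions only the place name.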

## Honest Reporting Discipline

Every report MUST include a **methodology disclaimer block** at the top:
- Data source (web search, which provider)
- What CAN'T be obtained (per-video interaction counts, follower-only content, etc.)
- Time window (explicit dates)
- Confidence framing ("未发现 ≠ 没发过")

Template in `references/output-schema.md`.
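
For illustration only, the four required items could be rendered as a markdown blockquote like this. The layout and the `methodology_block` name are assumptions; the template in `references/output-schema.md` takes precedence wherever it differs.

```python
def methodology_block(provider: str, start: str, end: str, unavailable: list[str]) -> str:
    """Render the four disclaimer items as a markdown blockquote."""
    lines = [
        "> **Methodology**",
        f"> - Data source: web search ({provider})",
        f"> - Not obtainable: {', '.join(unavailable)}",
        f"> - Time window: {start} ~ {end}",
        "> - Confidence framing: 未发现 ≠ 没发过",
    ]
    return "\n".join(lines)


block = methodology_block(
    "xiaosu-search",
    "2025-05-05",
    "2026-05-05",
    ["per-video interaction counts", "follower-only content"],
)
```

Generating the block from the same parameters used for the search keeps the stated window and provider from drifting away from what was actually run.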

## Quick Reference

- Sub-agent prompt template → `references/subagent-prompt-template.md`
- Platform-specific search tips → `references/platform-search-tips.md`
- Output schema + methodology block → `references/output-schema.md`
- Decision table: when to refuse / when to upgrade to paid data → `references/escalation.md`