Search and download arXiv AI papers with broad or top-tier filtering, sorting by relevance or date, avoiding duplicates via local CSV management.

SKILL.md

---
name: ai-paper-researcher
description: An arXiv paper search engine designed specifically for scientific researchers. Supports dual modes: "Broad Search" and "Top-Tier Strict Filtering", along with relevance and date sorting. Automatically checks for duplicates, downloads PDFs, and maintains a local CSV paper library.
---

# AI Academic Paper Researcher

## 1. Skill Positioning & Core Objective
This skill aims to assist researchers in the AI field by searching for arXiv literature and automating PDF downloads and local file management.
**Core Principle:** All download records must rely on the local `workspace/paper_list/paper_list.csv` for deduplication to prevent repeated downloads.

## 2. Tools & Dependencies
- **Execution Script:** `python arxiv_tool.py`
- **Target Conference List:** The `target.csv` file located in the same directory as this skill (contains the names of top-tier conferences or journals the user follows, e.g., CVPR, NeurIPS, ICLR).

## 3. Sorting Strategy Selection
Before executing any search, you must decide which sorting parameter (`--sort`) to use based on the user's intent:
- **Searching for Classic Theories / Well-known Algorithms (Classic/Influential):** If the user searches for specific well-known algorithms (e.g., "Adam", "ResNet") or foundational papers in core fields, you **MUST use `--sort relevance`**. Otherwise, because arXiv defaults to returning a large number of newly submitted papers, classic older papers will be pushed out of the search results.
- **Tracking Latest Frontiers (Latest Trends):** If the user explicitly requests "latest", "this year", or "recent weeks" papers, please use `--sort date`.

## 4. Two Retrieval Modes
Infer the required mode based on the user's query:

### Mode A: Broad Search (All Relevant Mode)
**Trigger Condition:** The user only provides a research direction without restricting the papers to be published in top-tier conferences.
**Execution Logic:**
1. Run `python arxiv_tool.py search "[query]" --max 15 --sort [selected sorting strategy]`.
2. Ignore the `comment` field in the JSON response.
3. Exclude papers where `is_downloaded: true` in the results.
4. Select the papers most relevant to the user's needs and proceed directly to the download process.

### Mode B: Top-Tier Conference/Journal Strict Filtering (Top-Tier Verification Mode)
**Trigger Condition:** The user explicitly requests "top-tier conferences", "top journals", or specifies certain conferences (e.g., "Help me find Adam-related papers from past ICLR conferences").
**Execution Logic:**
1. **Read Target List:** Use the file reading tool to view the contents of `target.csv` to get the list of target conferences/journals.
2. **Initial Search:** Run `python arxiv_tool.py search "[query]" --max 30 --sort [selected sorting strategy]`. *(Note: The script automatically fetches the latest version of the paper, so if it has been accepted by a top conference, the comment will contain the relevant information.)*
3. **LLM Semantic Verification (CRITICAL):**
- Carefully review the `comment` field in the JSON of each candidate paper.
- Determine whether any conference listed in `target.csv` is present in the `comment`.
- **Note on Variations:** Be tolerant of abbreviations, year suffixes, or non-standard formatting of conference names when matching (e.g., `Accepted to ICLR 2015`, `NeurIPS'23`, `Appears in CVPR`). As long as it semantically refers to the target conference, consider it a successful match.
- If the `comment` is empty, or does not contain a publication statement for the target conference, you **MUST exclude** the paper.
4. Exclude already downloaded papers (`is_downloaded: true`).
5. Proceed to the download process for the successfully verified papers.

## 5. Download & File Persistence
1. For the filtered papers, execute the download command one by one: `python arxiv_tool.py download [arxiv_id]`.
2. Collect the script's return results.

## 6. Reporting Standard
After completing the search and download, report the final results to the user:
- Explicitly state which retrieval mode was used (Mode A/B) and which sorting method (Date/Relevance).
- List the successfully downloaded papers (Format: `[ArXiv ID] Title - (Matched conference, if any)`).

ai-paper-researcher

SKILL.md

related skills