Name: Scientific Inquiry
Availability: InStock
Author: liwenyu2002
Rigorous evidence-based inquiry: decompose fuzzy questions, retrieve & grade evidence (S/A/B/C/D), cross-validate, and output conclusions with confidence int...
SKILL.md

---
name: scientific-inquiry-en
description: "Rigorous evidence-based inquiry: decompose fuzzy questions, retrieve & grade evidence (S/A/B/C/D), cross-validate, and output conclusions with confidence intervals. Includes Step 0 user confirmation to prevent direction drift."
---

# 🧪 Scientific Inquiry

> **Security Notice:** This skill uses self-modification (via `skill_manage`) but ONLY when the user explicitly commands it. See the "Controlled Self-Evolution" section for details. This prevents prompt injection and unintended auto-modification.

## Trigger Conditions

**Activate this skill when** the user asks any of the following:

- **Fact-checking:** "Is X true?" "Is X reliable?"
- **Data research:** "What's the trend/data/distribution of X?" "Look up data on X"
- **Industry research:** "How is market X doing?" "Analyze industry X"
- **Verification:** "I heard X, does that check out?" "Can this conclusion hold?"
- **Comparison:** "Which is better, X or Y?" "Compare X and Y"
- User explicitly says: "research", "investigate", "verify", "look into", "analyze", "check"

> Even simple requests (like "check this stat") activate this skill if they involve systematic information gathering.

## Core Pipeline

### Step 0: Problem Analysis → User Confirmation (Critical! Prevents Direction Drift)

Upon receiving a question, **do NOT start searching yet**. First output a research plan template:

> **📋 Research Plan**
>
> **Question:** [Restate the original question to confirm alignment]
>
> **Research type:** Fact-check / Data research / Industry study / Comparison / Trend analysis
>
> **Sub-questions:**
> 1. [Sub-question A] — Verifiability: High/Medium/Low → [Expected sources]
> 2. [Sub-question B] — Verifiability: High/Medium/Low → [Expected sources]
> 3. [Sub-question C] — Verifiability: High/Medium/Low → [Expected sources]
>
> **Methodology:**
> - Primary search path: [Specific tools/APIs/databases]
> - Keywords: [Search terms]
> - Fallback if key data is unavailable: [Alternative approach]
>
> **Expected output:**
> - Expected confidence: High/Medium/Low
> - Main uncertainties: [Anticipated blind spots]
>
> ---
>
> ✅ Does this direction look good? Let me know and I'll proceed with Step 1-4.

**Do NOT make any retrieval tool calls until the user confirms.**

### Step 1: Decompose Into Sub-questions

Break the fuzzy question into verifiable atomic statements. For each:

- **Verifiability:** High (public data/literature) / Medium (indirect evidence) / Low (little public info)
- **Evidence type:** Quantitative (specific numbers) / Qualitative (trend judgment)
- **Source direction:** E.g., academic papers, official data, industry reports, news articles

### Step 1.5 (Critical Prerequisite): Verify Baseline Facts

**Before formal research, check these prerequisites:**

#### 🔴 Time-Baseline Check
- Search product/event + "launch" "announce" "release" — confirm **if it already happened**
- High-risk categories: consumer electronics, policy changes, earnings calls, product releases
- If results show the event has occurred, **pivot immediately** — don't keep analyzing based on old data

> **Classic failure mode:** User asks "will Huawei phones get more expensive?" You analyze storage cost trends for 30 minutes. Meanwhile, the Pura 90 already launched with published pricing. You're predicting history.

#### 🔴 Search Engine Diagnostic
Before committing to a search tool, quickly test availability:

1. **Try web_search first** — simple query, check if results come back normally
2. **If web_search fails** — use `curl -sL` to Google/Bing/DuckDuckGo; distinguish CAPTCHA from timeout
3. **Three failure modes:**
   - **CAPTCHA block** (Google's "sorry" page / DuckDuckGo checkbox grid / Baidu slider) → switch search engine immediately
     - Do NOT retry the same engine more than 2 times
     - Try a different engine or use the video platform fallback (Step 2b)
   - **Timeout / empty page** (`(empty page)` or `ERR_TIMED_OUT`) → network/proxy issue
     - First confirm basic connectivity with `curl` to a simple HTTP target
     - Bing's `(empty page)` sometimes resolves after pressing Enter/submitting the search form
   - **Login redirect** (site search requiring auth) → abandon, use alternative sources
4. **Choose fallback channel based on failure mode** — see Step 2b below

> This step prevents wasted calls on dead search channels. If all search engines are blocked, video platform titles + vertical media browsing is 10x more productive than retrying Google.


### Step 2: Evidence Retrieval (Classified & Graded)

Every piece of evidence **MUST** be annotated with source and grade. See the "Evidence Classification Discipline" section for detailed definitions.

Prioritize S/A-grade evidence; B/C are supplementary only.

```
S-grade: Primary academic literature / Official statistics / Raw data APIs
A-grade: Authoritative media / Professional reports / Fully cited secondary sources
B-grade: Industry analysis / Forum discussions / Indirect data
C-grade: Social media / Single samples / Non-professional interpretations
D-grade: No source / Rumors / Obvious conflicts of interest
```

Present findings as an evidence table:

| Evidence | Source | URL | Grade | Sub-question |
|----------|--------|-----|-------|-------------|
| ... | ... | ... | ... | ... |

**Source URLs are mandatory.** A bare site name (e.g., "YouTube") is not a valid source. Even search engine results should link to the search page or specific result.

### Step 2b: Fallback Search Strategies

When mainstream search engines are blocked or return empty results:

**1️⃣ Video platform search** — YouTube (for pricing/product info), or local equivalents
   - Video titles often contain structured data (prices, specs, dates)
   - Multiple creator titles covering the same number → higher confidence
   - Upload date ≈ event date, accurate to the day
   - Comments and related recommendations can reveal additional intel
   - Search multiple keyword variants (product + price / product + launch / CEO + statement)

**2️⃣ Direct access to vertical media**
   - Tech news sites, industry publications
   - Note: some require login; try site-specific Google search syntax

**3️⃣ E-commerce platforms**
   - Official brand stores, marketplaces
   - Note: may redirect to login pages

**4️⃣ Social media**
   - Weibo, Twitter/X, Reddit — if accessible

**5️⃣ Text-mode search engines**
   - DuckDuckGo lite, Startpage
   - Note: may still trigger CAPTCHA

> **Priority:** Video platform titles > Vertical media > E-commerce > Social media. Video title info density and timeliness often exceed other sources for consumer products.

### Step 3: Cross-Validation

For each sub-question:
- At least **2 independent sources**
- Label inter-evidence relationship: **Consistent** / **Contradictory** / **Complementary**
- If contradictory, analyze possible causes (methodology differences / vested interests / time window / sample bias)

### Step 4: Conclusion Output (✅/⚠️/❌ Symbol Format)

Two-block output:

**Block A — Claim Verification Report (one line per key finding)**

```
✅ CONFIRMED: 「Pura 90 starts at ¥4,699」→ 5 creator video titles agree + financial media report
⚠️ UNVERIFIABLE: 「Huawei stockpiled 100M NAND chips」→ single comment section post (D-grade), no media confirmation
❌ CONTRADICTED: 「Pura 90 will be more expensive than Pura 80」→ actual launch price ¥4,699, same as predecessor
```

**Block B — Overall Judgment**

```
Proposition: [One-sentence restatement]

Confidence:
  ✅ High (≥80%) — Multiple S/A-grade evidence consistent
  ⚠️ Medium (50-80%) — Key data gaps exist
  ❌ Low (<50%) — Mostly inference

Top-3 Key Evidence (with URLs):
  1. [Evidence A] — S-grade — [Source](URL)
  2. [Evidence B] — A-grade — [Source](URL)
  3. [Evidence C] — B-grade — [Source](URL)

Core Uncertainties:
  - [Uncertainty 1]
  - [Uncertainty 2]
```

## Evidence Classification Discipline (Critical!)

Evidence grades are decoration — they are the LIFEBLOOD of your conclusion.

### Grade Definitions

| Grade | Definition | Examples | Usable? |
|-------|-----------|----------|---------|
| **S** | Primary academic lit / Official stats / Raw data APIs / Authoritative market reports | Peer-reviewed papers, government statistics, exchange data | ✅ Standalone |
| **A** | Respected media / Professional analysis / Fully cited secondary sources | Reuters, Bloomberg, financial analyst reports | ✅ Needs ≥1 corroboration |
| **B** | Industry analysis / Forum discussions / Indirect data / Raw executive quotes | CEO statements (cross-verified across video titles), tech news | ✅ Needs ≥2 cross-references |
| **C** | Social media / Single samples / Non-professional reading / Snippet from search results | Individual blog posts, Reddit answers, single YouTube title | ⚠️ Leads only, cannot conclude |
| **D** | No source / Rumors / Obvious conflict of interest / **User comment section** | YouTube/Reddit comment section, anonymous forum posts | ❌ **Never** use as evidence |

### Core Rules

1. **Video titles = C-grade (weak lead starting point)**
   - Same data point confirmed in 3+ independent creator titles → upgrade to B
   - Combined with professional media coverage → A-

2. **Comment section user posts = D-grade (unreliable by default)**
   - **Never cite as evidence**, no matter how detailed or plausible!
   - Use comment info only as "search suggestions" — take the keyword, find a real source

3. **Source URLs are mandatory, not optional**
   - Every evidence item MUST include a full URL
   - "Found on YouTube" is not a valid source
   - Search engine result page URLs count if you label the search term

4. **Better to say less than to fabricate**
   - When key data is missing, mark "pending collection" or "no reliable source found"
   - Never fill gaps with D-grade material or assumed values
   - Wrong conclusions should be DELETED entirely, not left as "to be verified"

## Controlled Self-Evolution (方案B — Guarded Mode)

> **🔴 Security Constraint:** This skill's self-modification is gated behind explicit user commands.
>
> User provides feedback → default action: update memory only (no skill file change)
> User says "update the skill" / "commit this to the skill" / "add this to the workflow" → then execute skill_manage
>
> This prevents: malicious input injection / accidental trigger during research / unconfirmed auto-modification

### Recording Phase (Default Behavior)

When the user provides improvement feedback:

1. **Store in memory first** — `memory(action='add', ...)` records preferences and lessons
2. No automatic `skill_manage` calls, no SKILL.md modification

### Upgrade Phase (Explicit User Command Required)

Only execute `skill_manage(patch)` when the user explicitly says:

- "Update the skill"
- "Add this to the skill"
- "Commit this to the workflow / to common pitfalls"
- "Add this to evidence grades / trigger conditions / search strategies"
- Any phrase containing "update skill", "commit to skill", "save to skill"

Common trigger scenarios:

| User feedback type | Record to memory | Upgrade to skill |
|-------------------|-----------------|-----------------|
| **Direction correction**: "This sub-question isn't the point" | ✅ Default | When user says "update the skill accordingly" |
| **Evidence standard**: "This source isn't good enough" | ✅ Default | When user says "add this to the evidence discipline" |
| **Format preference**: "Too long / give me a short version first" | ✅ Default | When user says "save this format to the skill" |
| **New scenario**: "This isn't just fact-checking, it's data research" | ✅ Default | When user says "add this to trigger conditions" |
| **Methodology**: "You should plan before executing" | ✅ Default | When user says "add this to the workflow" |
| **Recurring error** (≥2 same class) | ✅ Default | When user says "add this to common pitfalls" |

## Scenario Types

| Scenario | Characteristics | Watch Out For |
|----------|----------------|--------------|
| Fact-check | Verify a specific claim | Find primary source, watch for telephone game distortions |
| Trend analysis | Predict direction of a metric | Separate short-term noise from long-term trends, note data window |
| Comparison | Compare options | Ensure full dimension coverage, avoid survivorship bias |
| Causal analysis | Did A cause B? | Distinguish correlation from causation, watch for confounders |
| Consumer pricing/product research | Product pricing and storage strategy | **First verify if the product is already launched!** Check executive statements; find raw component cost data from market research firms |

## Quality Checklist

- [ ] Step 0 plan output and user confirmation received?
- [ ] Each sub-question has ≥1 evidence source?
- [ ] Every evidence item graded?
- [ ] Contradictory evidence analyzed for probable cause?
- [ ] Conclusion includes confidence level and uncertainties?
- [ ] **Discipline check**: Any comment-section UGC cited as evidence? Source URLs complete? Any "to be verified" speculation left?

## Common Pitfalls

- **Don't skip Step 0:** Even if the direction seems obvious. Wrong direction × fast search = wasted time.
- **Don't search only for supporting evidence:** Actively look for counter-arguments. Avoid confirmation bias.
- **Distinguish "no evidence" from "evidence against":** Not finding something ≠ it doesn't exist. Label as "not found", not "disproven".
- **Watch data timeliness:** Especially for prices and policies. Note the collection date.
- **Keep user updated during long searches:** If retrieval exceeds 5 steps, report progress between steps. No silent running.
- **Verify product/event existence before predicting:** The most common embarrassing mistake — predicting a "soon to launch" product that already launched.
- **Never cite comment-section UGC as evidence:** Default grade D. Use comments only as search leads.
- **Distinguish "search result title" from "comment post":** A YouTube/Reddit video title is C-grade (creator's public info). A comment on that video is D-grade. Different worlds.
- **Source URLs must be complete:** Bare site names don't count. Search result page URLs with labeled search terms count.

---
Scientific Inquiry

SKILL.md

related skills