---
name: rice-phenotype-prediction
description: >-
  Predict rice agronomic traits (yield, plant height, heading date, grain size, etc.)
  from genotype and environmental data using pre-trained MMoE deep learning models.
  Use when the user asks about rice phenotype prediction, crop trait estimation,
  genotype-environment interaction, or environmental stress effects on rice.
  Supports Chinese and English. Trigger terms: 水稻, 表型, 预测, 株高, 产量, 粒长,
  抽穗期, 千粒重, 结实率, rice, phenotype, yield, trait, stress.
---

# Rice Phenotype Prediction

Self-contained skill for predicting 10 rice agronomic traits via pre-trained MMoE models.
All models, data, and scripts are inside this directory — give users this one folder.

## Setup

### First-time check
```bash
python <SKILL_DIR>/scripts/check_env.py
```
This verifies Python dependencies and data integrity. If packages are missing:
```bash
pip install -r <SKILL_DIR>/requirements.txt
```

Required: `torch>=2.0 numpy pandas scikit-learn scipy requests`
GPU is optional — CPU works (just slower). If GPU is present, `cuda:0` is used automatically.

### `<SKILL_DIR>` convention

Throughout this file, `<SKILL_DIR>` means the absolute path to this skill's root directory
(the folder containing this SKILL.md). When running commands, substitute with the actual path.
`--base_dir` is optional; if omitted, scripts auto-detect it from their own location.

## Supported Traits

| Code | Chinese | English | Unit |
|------|---------|---------|------|
| HD | 抽穗期 | Heading Date | days |
| PH | 株高 | Plant Height | cm |
| PL | 穗长 | Panicle Length | cm |
| TN | 分蘖数 | Tiller Number | count |
| GP | 每穗粒数 | Grains Per Panicle | count |
| SSR | 结实率 | Seed Setting Rate | % |
| TGW | 千粒重 | Thousand Grain Weight | g |
| GL | 粒长 | Grain Length | mm |
| GW | 粒宽 | Grain Width | mm |
| Y | 产量 | Yield | kg/ha |

## Supported Locations (7 built-in stations)

| Code | City | Lat | Lon |
|------|------|-----|-----|
| km | 昆明 | 25.02 | 102.68 |
| gzl | 六盘水 | 26.59 | 104.83 |
| nn | 南宁 | 22.82 | 108.37 |
| wh | 武汉 | 30.58 | 114.27 |
| hf | 合肥 | 31.82 | 117.25 |
| hz | 杭州 | 30.25 | 120.17 |
| th | 通化 | 41.73 | 125.94 |

Any input latitude/longitude is matched automatically to the nearest station by Haversine distance.
When internet access is available, daily weather data can also be fetched from the NASA POWER API for the exact coordinates.
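
The nearest-station matching can be illustrated with a minimal sketch. It uses the station coordinates from the table above and the standard Haversine formula; the names `STATIONS`, `haversine_km`, and `nearest_station` are hypothetical and are not taken from `grid_manager.py`, whose actual implementation may differ.

```python
import math

# Station coordinates copied from the table above (code -> (lat, lon)).
STATIONS = {
    "km": (25.02, 102.68),
    "gzl": (26.59, 104.83),
    "nn": (22.82, 108.37),
    "wh": (30.58, 114.27),
    "hf": (31.82, 117.25),
    "hz": (30.25, 120.17),
    "th": (41.73, 125.94),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon):
    """Return (station_code, distance_km) for the closest built-in station."""
    code = min(STATIONS, key=lambda c: haversine_km(lat, lon, *STATIONS[c]))
    return code, haversine_km(lat, lon, *STATIONS[code])
```

For example, `nearest_station(30.5, 114.3)` selects the Wuhan station (`wh`), which lies roughly 10 km from that point.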

## Stress Types

| Type | Chinese | Default effect |
|------|---------|----------------|
| high_temp | 高温胁迫 | +3°C max / +2°C min |
| low_temp | 低温胁迫 | -3°C max / -2°C min |
| drought | 干旱胁迫 | 90% precipitation reduction |
| flood | 涝害胁迫 | 3x precipitation increase |
| low_light | 寡照胁迫 | 60% PAR reduction |
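
As a rough illustration of what the default effects mean for one day of weather, the sketch below applies each stress as an additive shift or a multiplier. The field names `tmax`, `tmin`, `precip`, and `par` and the function `apply_stress` are assumptions made for this example; `stress_simulator.py` may use different variable names and may apply the effects over specific growth windows.

```python
# Default effects from the table above, written as (additive shift, multiplier) per field.
# A 90% reduction keeps 10% of the value; a 60% reduction keeps 40%.
DEFAULT_STRESS = {
    "high_temp": {"tmax": (3.0, 1.0), "tmin": (2.0, 1.0)},
    "low_temp": {"tmax": (-3.0, 1.0), "tmin": (-2.0, 1.0)},
    "drought": {"precip": (0.0, 0.1)},
    "flood": {"precip": (0.0, 3.0)},
    "low_light": {"par": (0.0, 0.4)},
}


def apply_stress(day, stress):
    """Return a copy of one daily weather record with the default stress applied."""
    stressed = dict(day)  # leave the caller's record untouched
    for field, (shift, scale) in DEFAULT_STRESS[stress].items():
        stressed[field] = stressed[field] * scale + shift
    return stressed


# Hypothetical daily record: temperatures in °C, precipitation and PAR in arbitrary units.
normal_day = {"tmax": 30.0, "tmin": 22.0, "precip": 10.0, "par": 100.0}
hot_day = apply_stress(normal_day, "high_temp")
```

Keeping shifts and multipliers in one table makes it straightforward to compare the normal and stressed inputs side by side.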

## Prediction Commands

### Full prediction (recommended)
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1
```

### Genotype-only / environment-only
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --mode gene
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --mode env
```

### Specific traits
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --trait PH,Y
```

### With stress
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --stress high_temp
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --stress high_temp --stress_delta 5.0
```

### Multiple samples
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample "sample1,sample2,sample3"
```

### Custom genotype file
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --genotype_file /path/to/user_vae.csv
```
Format: CSV with 1024 columns (VAE-encoded features), first column = sample index.
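
Before passing a custom file to `predict.py`, it can help to check its shape. The sketch below is a hypothetical helper, not part of this skill. It assumes a header row and assumes that the sample index column comes in addition to the 1024 feature columns; if your file counts the index among the 1024, pass `n_features=1023`.

```python
import csv


def check_genotype_csv(lines, n_features=1024, has_header=True):
    """Validate genotype CSV rows and return the sample IDs found.

    `lines` is any iterable of text lines, such as an open file handle.
    Raises ValueError if a row has the wrong width or a non-numeric feature.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row
    sample_ids = []
    for row_no, row in enumerate(reader, start=1):
        if len(row) != n_features + 1:
            raise ValueError(
                f"data row {row_no}: expected {n_features + 1} columns, got {len(row)}"
            )
        for cell in row[1:]:
            float(cell)  # raises ValueError on a non-numeric feature value
        sample_ids.append(row[0])
    return sample_ids
```

Typical use would be `with open(path, newline="") as fh: ids = check_genotype_csv(fh)`.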

### Force CPU / specific device
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --device cpu
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --device cuda:0
```

### Human-readable table
```bash
python <SKILL_DIR>/scripts/predict.py --lat 30.5 --lon 114.3 --sample sample1 --output table
```

### All CLI arguments
| Arg | Default | Description |
|-----|---------|-------------|
| `--lat` | required | Latitude |
| `--lon` | required | Longitude |
| `--sample` | None | Built-in sample ID(s), comma-separated (sample1..sample3925) |
| `--genotype_file` | None | Custom 1024-dim VAE CSV path |
| `--mode` | full | `gene`, `env`, or `full` |
| `--trait` | all | Comma-separated trait codes or `all` |
| `--stress` | None | Stress type name |
| `--stress_delta` | None | Override temperature delta |
| `--device` | auto | `auto`, `cpu`, or `cuda:0` |
| `--year` | 2024 | Year for environmental data |
| `--output` | json | `json` or `table` |
| `--base_dir` | auto | Override skill directory path |

## Handling User Requests

### 1. Extract location
- "经纬度30.5, 114.3" → `--lat 30.5 --lon 114.3`
- "武汉" → `--lat 30.58 --lon 114.27`
- "北纬25度，东经103度" → `--lat 25 --lon 103`

### 2. Map trait names
- 株高/plant height → PH
- 产量/yield → Y
- 粒长/grain length → GL
- 抽穗期/heading date → HD
- 千粒重/1000-grain weight → TGW
- 穗长/panicle length → PL
- 结实率/seed setting rate → SSR
- 每穗粒数/grains per panicle → GP
- 粒宽/grain width → GW
- 分蘖数/tiller number → TN

### 3. Map stress requests
- 高温/heat → high_temp
- 低温/cold/chilling → low_temp
- 干旱/drought → drought
- 洪涝/flooding → flood
- 阴天/寡照/low light → low_light
- "高温+5度" → `--stress high_temp --stress_delta 5.0`
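
Steps 1 to 3 can be combined into a small argument builder. This is a sketch only: `TRAIT_ALIASES`, `STRESS_ALIASES`, and `build_args` are hypothetical names, the alias tables contain just the terms listed above, and only flags documented in the CLI table are emitted.

```python
# Alias tables built from the mappings listed above (keys are lower-cased user terms).
TRAIT_ALIASES = {
    "株高": "PH", "plant height": "PH",
    "产量": "Y", "yield": "Y",
    "粒长": "GL", "grain length": "GL",
    "抽穗期": "HD", "heading date": "HD",
    "千粒重": "TGW", "1000-grain weight": "TGW",
    "穗长": "PL", "panicle length": "PL",
    "结实率": "SSR", "seed setting rate": "SSR",
    "每穗粒数": "GP", "grains per panicle": "GP",
    "粒宽": "GW", "grain width": "GW",
    "分蘖数": "TN", "tiller number": "TN",
}

STRESS_ALIASES = {
    "高温": "high_temp", "heat": "high_temp",
    "低温": "low_temp", "cold": "low_temp", "chilling": "low_temp",
    "干旱": "drought", "drought": "drought",
    "洪涝": "flood", "flooding": "flood",
    "阴天": "low_light", "寡照": "low_light", "low light": "low_light",
}


def build_args(lat, lon, sample, traits=(), stress=None, stress_delta=None):
    """Translate an already-parsed user request into predict.py arguments."""
    args = ["--lat", str(lat), "--lon", str(lon), "--sample", sample]
    codes = [TRAIT_ALIASES[t.strip().lower()] for t in traits]
    if codes:  # omit --trait to keep the default of all traits
        args += ["--trait", ",".join(codes)]
    if stress is not None:
        args += ["--stress", STRESS_ALIASES[stress.strip().lower()]]
        if stress_delta is not None:
            args += ["--stress_delta", str(stress_delta)]
    return args
```

The returned list can be appended to `["python", "<SKILL_DIR>/scripts/predict.py"]` and passed to `subprocess.run`. An unknown trait or stress term raises `KeyError`, which is a reasonable point to ask the user for clarification.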

### 4. Genotype data
- Built-in samples: `--sample sample1` (3925 available: sample1..sample3925)
- User file: `--genotype_file /path/to/file.csv`

### 5. Interpreting output
JSON contains: `location`, `genotype_prediction`, `environment_prediction`, `stress_prediction`, `trait_info`.

Report `environment_prediction` as the primary result, since it incorporates environmental context.
Use `genotype_prediction` as the baseline for comparison.
When a stress is requested, compare the normal values with the stressed values in `stress_prediction`.

Rounding: HD/TN/GP → integer, PH/PL/TGW/SSR → 1 decimal, GL/GW → 2 decimals, Y → integer.
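
The rounding rule can be written as a lookup table. The helper below is a hypothetical sketch for formatting values before reporting them; it does not assume anything about the nesting of the JSON output.

```python
# Decimal places per trait code, taken from the rounding rule above.
ROUND_DIGITS = {
    "HD": 0, "TN": 0, "GP": 0, "Y": 0,
    "PH": 1, "PL": 1, "TGW": 1, "SSR": 1,
    "GL": 2, "GW": 2,
}


def format_trait(code, value):
    """Round a predicted value to the number of decimals specified for its trait."""
    digits = ROUND_DIGITS[code]
    rounded = round(value, digits)
    # Integer traits are returned as int so they print without a trailing ".0".
    return int(rounded) if digits == 0 else rounded
```

Note that Python's built-in `round` rounds exact halves to the nearest even digit, which rarely matters for model outputs but can differ from schoolbook rounding.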

## Directory Structure
```
rice_prediction/                   ← give users this folder
├── SKILL.md                       ← this file
├── requirements.txt               ← pip dependencies
├── data/
│   ├── grid_points.json           ← 7 station coordinates
│   ├── vae_features.csv           ← 3925 built-in genotype samples (1024-dim VAE)
│   ├── season_history.csv         ← historical season data for normalization
│   ├── env_cache/                 ← cached daily weather (auto-populated)
│   ├── models_env/                ← 10 trait-specific env+gene models (~4.6MB each)
│   └── models_gene/               ← 7 location-specific genotype models (~8MB each)
└── scripts/
    ├── predict.py                 ← main entry point
    ├── check_env.py               ← dependency checker
    ├── model_def.py               ← MMoE model architectures
    ├── grid_manager.py            ← nearest grid point finder
    ├── env_data_fetcher.py        ← NASA POWER API fetcher + cache
    ├── env_processor.py           ← environmental feature engineering
    └── stress_simulator.py        ← stress scenario simulation
```

## Architecture (for reference)
- **Model**: Multi-gate Mixture-of-Experts (MMoE) with ResidualMLP experts
- **Genotype features**: 1024-dim VAE latent encoding of genomic data
- **Environment features**: 53 season-aggregated variables from daily weather
- **Environmental data**: NASA POWER API (auto-fetched and cached locally)