---
name: nutrigenomics
description: Generate a personalised nutrition report from your genetic data (23andMe, AncestryDNA, or VCF). Analyses 40+ genes affecting nutrient metabolism, absorption, and food sensitivities. All processing is local — your genetic data never leaves your device.
metadata: {"openclaw": {"requires": {"bins": ["python3"]}, "emoji": "🧬"}}
---

# Nutrigenomics — Personalised Nutrition from Genetic Data

**Skill ID**: `nutrigenomics`
**Version**: 0.3.2
**Status**: Beta
**Author**: David de Lorenzo
**Requires**: Python 3.11+, pandas, numpy, matplotlib, seaborn, reportlab (optional)

---

## What This Skill Does

The Nutrigenomics skill generates a **personalised nutrition report** from consumer
genetic data (23andMe, AncestryDNA raw files or VCF). It interrogates a curated
set of nutritionally-relevant SNPs drawn from GWAS Catalog, ClinVar, and
peer-reviewed nutrigenomics literature, then translates genotype calls into
actionable dietary and supplementation guidance — all computed locally.

**Key outputs**
- Markdown nutrition report with risk scores and per-SNP genotype calls
- Radar chart of nutrient risk profile
- Gene × nutrient heatmap
- Reproducibility bundle (`README_reproducibility.txt`, `environment.yml`, `checksums.txt`, `provenance.json`)

---

## Trigger Phrases

The Bio Orchestrator should route to this skill when the user says anything like:

- "personalised nutrition", "nutrigenomics", "diet genetics"
- "what should I eat based on my DNA"
- "nutrient metabolism", "vitamin absorption genetics"
- "MTHFR", "APOE", "FTO", "BCMO1", "VDR", "FADS1/2"
- "folate", "omega-3", "vitamin D", "caffeine metabolism", "lactose", "gluten"
- Input files: `.txt` or `.csv` (23andMe), `.csv` (AncestryDNA), `.vcf`

---

## Curated SNP Panel

### Macronutrient Metabolism

| Gene    | SNP        | Nutrient Impact                          | Evidence |
|---------|------------|------------------------------------------|----------|
| FTO     | rs9939609  | Energy balance, fat mass, carb sensitivity | Strong (GWAS) |
| PPARG   | rs1801282  | Fat metabolism, insulin sensitivity      | Moderate |
| APOA5   | rs662799   | Triglyceride response to dietary fat     | Strong |
| TCF7L2  | rs7903146  | Carbohydrate metabolism, T2D risk        | Strong |
| ADRB2   | rs1042713  | Fat oxidation, exercise × diet interaction | Moderate |

### Micronutrient Metabolism

| Gene    | SNP        | Nutrient                | Effect of risk allele            |
|---------|------------|-------------------------|----------------------------------|
| MTHFR   | rs1801133  | Folate / B12            | ↓ 5-MTHF conversion (~70%)       |
| MTHFR   | rs1801131  | Folate / B12            | ↓ enzyme activity (~30%)         |
| MTR     | rs1805087  | B12 / homocysteine      | ↑ homocysteine risk              |
| BCMO1   | rs7501331  | Beta-carotene → Vitamin A | ↓ conversion (~50%)             |
| BCMO1   | rs12934922 | Beta-carotene → Vitamin A | ↓ conversion (compound het)    |
| VDR     | rs2228570  | Vitamin D absorption    | ↓ VDR function                   |
| VDR     | rs731236   | Vitamin D               | ↓ bone mineral density response  |
| GC      | rs4588     | Vitamin D binding       | ↑ deficiency risk                |
| SLC23A1 | rs33972313 | Vitamin C transport     | ↓ renal reabsorption             |
| ALPL    | rs1256335  | Vitamin B6              | ↓ alkaline phosphatase activity  |

### Omega-3 / Fatty Acid Metabolism

| Gene    | SNP        | Nutrient             | Effect                          |
|---------|------------|----------------------|---------------------------------|
| FADS1   | rs174546   | LC-PUFA synthesis    | ↑/↓ EPA/DHA from ALA            |
| FADS2   | rs1535     | LC-PUFA synthesis    | Modulates omega-6:omega-3 ratio |
| ELOVL2  | rs953413   | DHA synthesis        | ↓ elongation of EPA→DHA         |
| APOE    | rs429358   | Saturated fat response | ε4 → ↑ LDL-C on high SFA diet |
| APOE    | rs7412     | Saturated fat response | Combined with rs429358 for ε typing |
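
The combination of the two APOE SNPs into an ε type can be sketched as follows. This is a minimal illustration, not the skill's own code: the function name `apoe_diplotype` is hypothetical, genotypes are assumed to be unphased and on the forward strand, and the standard allele definitions are assumed (ε2 = T at both sites, ε3 = T at rs429358 and C at rs7412, ε4 = C at both sites).

```python
def apoe_diplotype(rs429358: str, rs7412: str) -> str:
    """Infer an APOE diplotype from two unphased forward-strand genotypes.

    Hypothetical helper. Assumes e2 = T/T, e3 = T/C, e4 = C/C at
    rs429358/rs7412 respectively.
    """
    for genotype in (rs429358, rs7412):
        if len(genotype) != 2 or set(genotype) - {"C", "T"}:
            raise ValueError(f"unexpected genotype: {genotype!r}")

    n_e4 = rs429358.count("C")  # each C at rs429358 marks one e4 allele
    n_e2 = rs7412.count("T")    # each T at rs7412 marks one e2 allele
    if n_e4 + n_e2 > 2:
        # Only explainable by the very rare e1 haplotype (C at rs429358, T at rs7412)
        return "ambiguous"
    # A double heterozygote is reported as e2/e4 by convention; without phasing,
    # e1/e3 cannot be excluded.
    alleles = ["e2"] * n_e2 + ["e3"] * (2 - n_e2 - n_e4) + ["e4"] * n_e4
    return "/".join(alleles)


print(apoe_diplotype("CT", "CC"))  # one e4 allele, no e2 allele
```

If either SNP is `NOT_TESTED`, no ε type can be assigned from this pair alone.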

### Caffeine & Alcohol

| Gene    | SNP        | Compound    | Effect                         |
|---------|------------|-------------|--------------------------------|
| CYP1A2  | rs762551   | Caffeine    | Slow/Fast metaboliser          |
| AHR     | rs4410790  | Caffeine    | Modulates CYP1A2 induction     |
| ADH1B   | rs1229984  | Alcohol     | Acetaldehyde accumulation risk |
| ALDH2   | rs671      | Alcohol     | Asian flush / toxicity risk    |

### Food Sensitivities

| Gene    | SNP        | Sensitivity          | Effect                          |
|---------|------------|----------------------|---------------------------------|
| MCM6    | rs4988235  | Lactose intolerance  | Non-persistence of lactase      |
| HLA-DQ2 | Proxy SNPs | Coeliac / gluten     | HLA-DQA1/DQB1 risk haplotypes   |

### Antioxidant & Detoxification

| Gene    | SNP        | Pathway              | Effect                          |
|---------|------------|----------------------|---------------------------------|
| SOD2    | rs4880     | Manganese SOD        | ↓ mitochondrial antioxidant     |
| GPX1    | rs1050450  | Selenium / GSH-Px    | ↓ glutathione peroxidase        |
| GSTT1   | Deletion   | Glutathione S-transferase | Null genotype → ↑ oxidative risk |
| NQO1    | rs1800566  | Coenzyme Q10         | ↓ CoQ10 regeneration            |
| COMT    | rs4680     | Catechol / B vitamins | Met/Val → methylation load     |

---

## Algorithm

### 1. Input Parsing (`parse_input.py`)

Accepts:
- 23andMe `.txt` or `.csv` (tab-separated: rsid, chromosome, position, genotype)
- AncestryDNA `.csv`
- Standard VCF (extracts GT field)

Auto-detects format from header lines. Normalises alleles to forward strand using
a hard-coded reference table (avoids requiring external databases).
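
Header-based format detection might look like the sketch below. It is illustrative only and does not reproduce `parse_input.py`: the function name `detect_format` is hypothetical, and the header markers it checks (a `##fileformat=VCF` first line, vendor names in `#` comment lines, `allele1`/`allele2` columns for AncestryDNA) are assumptions about typical exports.

```python
def detect_format(lines):
    """Guess the input format from the leading lines of a file (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##fileformat=VCF"):
            return "vcf"
        lowered = line.lower()
        if line.startswith("#"):
            # Assumption: vendor exports name themselves in their comment header
            if "ancestrydna" in lowered:
                return "ancestry"
            if "23andme" in lowered:
                return "23andme"
            continue
        # First non-comment line: fall back to the column layout
        columns = [c.strip() for c in lowered.replace(",", "\t").split("\t")]
        if "allele1" in columns and "allele2" in columns:
            return "ancestry"
        if len(columns) == 4:  # rsid, chromosome, position, genotype
            return "23andme"
        break
    return "unknown"


sample = ["# This data file generated by 23andMe", "rs4477212\t1\t82154\tAA"]
print(detect_format(sample))
```

Returning `"unknown"` rather than guessing lets the caller ask the user for an explicit `--format` value.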

### 2. Genotype Extraction (`extract_genotypes.py`)

For each SNP in the panel:
1. Look up rsid in parsed data
2. Return genotype string (e.g. `"AT"`, `"TT"`, `"AA"`)
3. Flag as `"NOT_TESTED"` if absent (common, because genotyping chip versions differ in which SNPs they include)
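
The lookup steps above can be sketched as a small function. This is an assumption-laden illustration rather than the contents of `extract_genotypes.py`: the name `extract_genotypes`, the dict-based input, the treatment of `--` no-calls as untested, and the alphabetical sorting of alleles are all choices made for this example.

```python
NOT_TESTED = "NOT_TESTED"


def extract_genotypes(parsed, panel_rsids):
    """Map each panel rsid to a genotype string, or NOT_TESTED (hypothetical helper).

    `parsed` is assumed to be a dict of rsid -> genotype built by the parser.
    """
    calls = {}
    for rsid in panel_rsids:
        genotype = parsed.get(rsid, "")
        # Assumption: a genotype made only of "-" is a no-call and counts as untested
        if not genotype or set(genotype) <= {"-"}:
            calls[rsid] = NOT_TESTED
        else:
            # Assumption: sort alleles so "TA" and "AT" yield the same call
            calls[rsid] = "".join(sorted(genotype.upper()))
    return calls


parsed = {"rs9939609": "TA", "rs762551": "--"}
print(extract_genotypes(parsed, ["rs9939609", "rs762551", "rs671"]))
```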

### 3. Risk Scoring (`score_variants.py`)

Each SNP is scored on a **0 / 0.5 / 1.0** scale:
- `0.0` — homozygous reference (lowest risk)
- `0.5` — heterozygous
- `1.0` — homozygous risk allele
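
One way to obtain this three-level score is to count copies of the risk allele. The sketch below assumes the non-risk allele is the reference allele and that the risk allele for each SNP is supplied by the panel definition; the function name `score_snp` is hypothetical.

```python
def score_snp(genotype, risk_allele):
    """Return 0.0, 0.5 or 1.0 from the number of risk-allele copies.

    Hypothetical helper. Returns None when the SNP cannot be scored,
    for example when it was flagged as NOT_TESTED.
    """
    if genotype == "NOT_TESTED" or len(genotype) != 2:
        return None
    return genotype.count(risk_allele) / 2


for call in ("AA", "AT", "TT", "NOT_TESTED"):
    print(call, score_snp(call, "T"))
```

Returning `None` instead of `0.0` for untested SNPs keeps a missing call from being read as low risk.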

Composite **Nutrient Risk Scores** (0–10) are computed per nutrient domain by
summing weighted SNP scores. Weights are derived from reported effect sizes
(beta coefficients or OR) in the primary literature.

Risk categories:
- **0–3**: Low risk — standard dietary advice applies
- **3–6**: Moderate risk — dietary optimisation recommended
- **6–10**: Elevated risk — consider testing and targeted supplementation
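
The composite score and its categories can be illustrated as below. The text above does not specify how the weighted sum is rescaled to 0–10 or which category a boundary value falls into, so this sketch makes two assumptions: the score is a weighted mean of the tested SNPs multiplied by 10, and scores of exactly 3 or 6 fall into the higher category. The function names and weights are hypothetical.

```python
def composite_score(snp_scores, weights):
    """Weighted mean of per-SNP scores rescaled to 0-10 (assumed normalisation).

    SNPs with a score of None (not tested) are left out of both sums.
    """
    tested = {rsid: s for rsid, s in snp_scores.items() if s is not None}
    total_weight = sum(weights[rsid] for rsid in tested)
    if total_weight == 0:
        return None
    weighted = sum(weights[rsid] * s for rsid, s in tested.items())
    return round(10 * weighted / total_weight, 1)


def risk_category(score):
    """Map a 0-10 score to a category; boundary handling is an assumption."""
    if score < 3:
        return "Low"
    if score < 6:
        return "Moderate"
    return "Elevated"


scores = {"snp_a": 1.0, "snp_b": 0.5, "snp_c": None}   # illustrative values
weights = {"snp_a": 2.0, "snp_b": 1.0, "snp_c": 1.0}   # illustrative weights
domain = composite_score(scores, weights)
print(domain, risk_category(domain))
```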

> **Important caveat**: These are polygenic risk indicators based on common
> variants. They are not diagnostic. Rare pathogenic variants (e.g. MTHFR
> compound heterozygosity with high homocysteine) require clinical confirmation.

### 4. Report Generation (`generate_report.py`)

Outputs a structured Markdown report with:
- Executive summary (top 3 personalised findings)
- Per-nutrient sections: genotype table → interpretation → recommendation
- Radar chart (matplotlib) of nutrient risk scores
- Gene × nutrient heatmap (seaborn)
- Supplement interactions table
- Disclaimer section
- Reproducibility block
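
Two of the report pieces, the top-3 selection for the executive summary and the per-nutrient genotype table, could be produced along these lines. The function names, the column headings, and the choice to rank findings by domain score are assumptions made for illustration; `generate_report.py` may do this differently.

```python
def top_findings(domain_scores, n=3):
    """Return the n highest-scoring nutrient domains (assumed ranking rule)."""
    ranked = sorted(domain_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def genotype_table(rows):
    """Render (gene, rsid, genotype, score) tuples as a Markdown table."""
    lines = [
        "| Gene | SNP | Genotype | Score |",
        "|------|-----|----------|-------|",
    ]
    for gene, rsid, genotype, score in rows:
        shown = "n/a" if score is None else f"{score:.1f}"
        lines.append(f"| {gene} | {rsid} | {genotype} | {shown} |")
    return "\n".join(lines)


# Illustrative values only, not real results
print(top_findings({"Folate": 7.5, "Vitamin D": 4.0, "Caffeine": 1.0, "Omega-3": 6.2}))
print(genotype_table([("FTO", "rs9939609", "AT", 0.5), ("ALDH2", "rs671", "NOT_TESTED", None)]))
```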

### 5. Reproducibility Bundle (`repro_bundle.py`)

Exports to the output directory (not committed to the repo):
- `README_reproducibility.txt` — step-by-step instructions to reproduce the analysis manually
- `environment.yml` — pinned conda environment
- `checksums.txt` — SHA-256 checksums of the SNP panel and output report (input file intentionally excluded to avoid persisting a fingerprint of genetic data)
- `provenance.json` — timestamp, version, and format arguments (input filename intentionally omitted)

**Note**: No executable scripts are generated. The reproducibility bundle contains
only text files for documentation and integrity verification.
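
As an illustration of how `checksums.txt` and `provenance.json` might be written, the sketch below hashes only the SNP panel and the report and records only a timestamp, version, and format. The function names, the JSON keys, and the `<hash>  <filename>` line layout are assumptions, not the actual behaviour of `repro_bundle.py`.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_bundle(out_dir, panel_path, report_path, version, input_format):
    """Write checksums.txt and provenance.json into out_dir (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Only the panel and the report are hashed; the input file is deliberately excluded
    lines = [f"{sha256_of(p)}  {Path(p).name}" for p in (panel_path, report_path)]
    (out_dir / "checksums.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    provenance = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": version,
        "format": input_format,  # the input filename is deliberately not recorded
    }
    (out_dir / "provenance.json").write_text(json.dumps(provenance, indent=2), encoding="utf-8")
```

A reader can later recompute the digests of the panel and report and compare them with `checksums.txt` to confirm that neither file has changed.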

---

## Execution

To run the analysis on a user-provided genetic file, execute this command directly:

```bash
python3 {baseDir}/openclaw_adapter.py --input <path_to_genetic_file> --format auto
```

To run a demo without real genetic data (synthetic patient file included with the skill):

```bash
python3 {baseDir}/openclaw_adapter.py --input {baseDir}/tests/synthetic_patient.csv --format 23andme
```

`{baseDir}` is replaced by OpenClaw at runtime with the absolute path to this skill's folder. Do not substitute it manually. Output is written to a timestamped directory (`nutrigenomics_output_YYYYMMDD_HHMMSS/`) in the current working directory and persists until manually deleted.

Supported `--format` values: `auto` (default), `23andme`, `ancestry`, `vcf`.
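
For readers curious how such a command line could be handled, the sketch below shows one way to accept the two documented flags and build the timestamped output directory name with the standard library. It is not the code of `openclaw_adapter.py`; the function names and the decision to make `--input` mandatory are assumptions.

```python
import argparse
from datetime import datetime


def build_parser():
    """Build a parser for the two documented flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Nutrigenomics report (CLI sketch)")
    # Assumption: --input is mandatory
    parser.add_argument("--input", required=True, help="path to the genetic data file")
    parser.add_argument(
        "--format",
        default="auto",
        choices=["auto", "23andme", "ancestry", "vcf"],
        help="input format; auto-detected from the header when omitted",
    )
    return parser


def output_dir_name(now):
    """Return a name of the form nutrigenomics_output_YYYYMMDD_HHMMSS."""
    return now.strftime("nutrigenomics_output_%Y%m%d_%H%M%S")


args = build_parser().parse_args(["--input", "genome.txt"])
print(args.input, args.format, output_dir_name(datetime.now()))
```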

## Usage

```bash
# From 23andMe raw data
openclaw "Generate my personalised nutrition report from genome.csv"

# From VCF
openclaw "Run Nutrigenomics analysis on variants.vcf and flag any folate pathway risks"

# Targeted query
openclaw "What does my APOE status mean for my saturated fat intake?"

# Run the demo report (no real genetic data needed)
openclaw "Run a demo nutrigenomics report using the synthetic patient file"
```

---

## File Structure

```
skills/nutrigenomics/
├── SKILL.md                      ← this file (agent instructions)
├── nutrigenomics.py              ← main entry point
├── parse_input.py                ← multi-format parser
├── extract_genotypes.py          ← SNP lookup engine
├── score_variants.py             ← risk scoring algorithm
├── generate_report.py            ← Markdown + figures
├── repro_bundle.py               ← reproducibility export
├── data/
│   └── snp_panel.json            ← curated SNP definitions
├── tests/
│   ├── synthetic_patient.csv     ← fixed 23andMe-format test data (for pytest)
│   └── test_nutrigenomics.py     ← pytest suite
└── examples/
    ├── generate_patient.py       ← random patient generator (demo use)
    ├── data/                     ← generated patient files land here (gitignored)
    └── output/
        ├── nutrigenomics_report.md     ← pre-rendered demo report
        ├── nutrigenomics_radar.png     ← demo radar chart (nutrient risk profile)
        └── nutrigenomics_heatmap.png   ← demo gene × nutrient heatmap
```

> **Note**: Runtime output directories and randomly generated patient files are
> excluded from version control. Only the pre-rendered demo
> report in `examples/output/` is committed.

---

## Privacy

All computation runs **locally** — no genetic data is ever transmitted to external
servers or third-party services.

**What the report contains**: The Markdown report includes per-SNP genotype calls
(e.g. `AT`, `TT`) for each of the 58 panel SNPs analysed. This is intentional:
knowing your specific genotype at each nutrition-related locus is what makes the
report actionable. Full raw genome data from the input file is not reproduced in
the report; only the 58 panel SNPs are included.

**File persistence**: Output files (report, figures, reproducibility bundle) are
written to a timestamped `nutrigenomics_output_YYYYMMDD_HHMMSS/` directory under
the working directory and **persist on disk until manually deleted**. The input
file is read-only and is never copied into the output directory.

If you are running this skill on behalf of others or in a shared environment,
delete the output directory once the user has downloaded their results.

---

## Limitations & Disclaimer

1. **Not a medical device.** This skill provides educational, research-oriented
   nutrigenomics analysis. It does not constitute medical advice.
2. **Common variants only.** The panel covers SNPs with MAF > 1% in at least one
   major population. Rare pathogenic variants are out of scope.
3. **Population context.** Effect sizes are predominantly derived from European
   GWAS cohorts. Risk estimates may not generalise equally across all ancestries.
4. **Gene–environment interaction.** Genetic risk scores interact with baseline
   diet, lifestyle, microbiome, and epigenetic state. A "high risk" score does not
   mean a nutrient deficiency is present — it means the individual may benefit from
   monitoring.
5. **Simpson's Paradox note.** Population-level associations used to derive weights
   may not reflect individual trajectories (see Corpas 2025, *Nutrigenomics and
   the Ecological Fallacy*).

---

## Roadmap

- [ ] **v0.2**: Microbiome × genotype interaction module (16S rRNA input)
- [ ] **v0.3**: Longitudinal tracking — compare reports across time
- [ ] **v0.4**: HLA typing for immune-mediated food reactions (coeliac, gluten sensitivity)
- [ ] **v1.0**: Multi-omics integration (metabolomics + genomics + dietary recall)

---

## References

This skill's SNP panel and methodology are informed by peer-reviewed nutrigenomics research. For verification and additional details, consult:

- **PubMed MEDLINE**: https://pubmed.ncbi.nlm.nih.gov/
- **GWAS Catalog**: https://www.ebi.ac.uk/gwas/ (published genome-wide association studies)
- **ClinVar**: https://www.ncbi.nlm.nih.gov/clinvar/ (variant interpretations)

Users are encouraged to verify specific claims through these authoritative sources and with qualified healthcare providers.

---

## Contributing

The SNP panel (`data/snp_panel.json`) is maintained by the skill author.
To suggest additions or corrections, contact David de Lorenzo directly via
GitHub ([@drdaviddelorenzo](https://github.com/drdaviddelorenzo)) or open
an issue on GitHub.