Model Card Drafter

Use this skill when an ML engineer, data scientist, MLOps team, or responsible-AI lead needs to draft a Model Card for a machine-learning or AI model. Covers...

SkillRank score: 8.2/10

evaluated by implexa, claude-haiku-4-5 · 2026-05-31

model-card-drafter walks through eight sequential steps to collect model metadata, training details, evaluation results, and ethical considerations, then assembles a draft model card aligned to Google's Model Cards standard and EU AI Act requirements. it flags documentation gaps and requires sign-off before publication.

structure: 9.0
trigger phrases: 8.0
procedure: 9.0
edge cases: 7.0
documentation: 8.0

original SKILL.md from clawhub

---
name: model-card-drafter
description: >
  Use this skill when an ML engineer, data scientist, MLOps team, or responsible-AI
  lead needs to draft a Model Card for a machine-learning or AI model. Covers intended
  use, training data, evaluation metrics, disaggregated performance, limitations, and
  ethical considerations. Produces a DRAFT Model Card aligned to Google's Model Cards
  standard and EU AI Act technical documentation requirements for MLOps and governance review.
---

# Model Card Drafter

Converts a model description, training details, and evaluation results into a structured Model Card — the standard responsible-AI artifact for documenting a machine-learning model's intended use, performance, limitations, and ethical risks. Outputs a DRAFT for ML engineer and governance review before publication or regulatory filing.

## Flow

Ask one question at a time. Wait for the user's answer before proceeding to the next step.

### Step 1 — Model Identification

Collect:
- Model name and version
- Model type (e.g., binary classifier, multi-class classifier, regression, generative language model, object detection, embedding model)
- Organization or team responsible
- Release or version date
- License (if applicable)

### Step 2 — Intended Use

Collect:
- Primary intended use case (what task the model is designed to perform)
- Primary intended users (who will use the model and in what context)
- Out-of-scope uses (tasks or contexts for which the model must not be used)

Prompt the user: "Are there any use cases where this model should explicitly NOT be applied?" Record as a separate "Out-of-Scope Use" section.

### Step 3 — Training Data

Collect:
- Data sources (name, origin, collection method)
- Date range of training data
- Preprocessing and filtering steps applied
- Known data gaps, biases, or demographic imbalances in the training set
- Data licensing and consent status (public dataset, proprietary, licensed, synthetic)

If the user cannot describe training data: record as "Not disclosed" and flag as a documentation gap requiring resolution before publication.

### Step 4 — Evaluation Data

Collect:
- Test/evaluation dataset name and source
- Whether the evaluation set is held out from training (must be explicitly confirmed; if the user cannot confirm, record "Not confirmed" and flag it)
- Known differences between evaluation data and real-world deployment data
- Data splits used (e.g., 80/10/10 train/val/test)
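You can spot-check a held-out claim before recording it. The sketch below assumes that each training and evaluation record carries a stable identifier. The function names are hypothetical. It reports any identifiers shared between the two sets and the realized split fractions, so you can compare them with the stated split (e.g., 80/10/10).

```python
from collections.abc import Iterable


def overlapping_ids(train_ids: Iterable[str], eval_ids: Iterable[str]) -> set[str]:
    """Return identifiers that appear in both the training and evaluation sets.

    An empty result is consistent with a held-out evaluation set; a non-empty
    result means "Held out from training" cannot be recorded as "Yes".
    """
    return set(train_ids) & set(eval_ids)


def split_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-split record counts into fractions of the total (3 decimal places)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records in any split")
    return {name: round(n / total, 3) for name, n in counts.items()}


if __name__ == "__main__":
    train = ["a1", "a2", "a3", "a4"]
    test = ["b1", "a3"]
    print(overlapping_ids(train, test))  # {'a3'}
    print(split_fractions({"train": 800, "val": 100, "test": 100}))
    # {'train': 0.8, 'val': 0.1, 'test': 0.1}
```

Exact-identifier matching will not catch near-duplicates, such as the same document re-crawled under a new ID. A clean result here supports the user's confirmation but does not replace it.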

### Step 5 — Performance Metrics

Collect primary and secondary evaluation metrics (e.g., accuracy, F1, AUC-ROC, BLEU, precision, recall, RMSE, calibration).

Then collect disaggregated performance results: prompt the user to provide performance broken down by at least two subgroups relevant to the model's use (e.g., age group, gender, race/ethnicity, geography, language, income bracket, device type). If disaggregated results are not available, record as "Not yet evaluated" and flag as a high-priority gap.
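Disaggregated results apply the same metrics separately to each subgroup. Here is a minimal sketch, assuming a binary classifier with 0/1 labels and evaluation records of the hypothetical form `(subgroup, y_true, y_pred)`. It computes accuracy and recall per subgroup. Other metrics (F1, AUC-ROC, RMSE) would follow the same group-then-score pattern.

```python
from collections import defaultdict


def disaggregated_metrics(records):
    """Compute accuracy and recall (positive class = 1) for each subgroup.

    `records` is an iterable of (subgroup, y_true, y_pred) with 0/1 labels.
    Recall is None for a subgroup with no positive examples, rather than a
    made-up number.
    """
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "tp": 0})
    for group, y_true, y_pred in records:
        s = stats[group]
        s["n"] += 1
        s["correct"] += int(y_true == y_pred)
        if y_true == 1:
            s["pos"] += 1
            s["tp"] += int(y_pred == 1)

    results = {}
    for group, s in stats.items():
        results[group] = {
            "n": s["n"],
            "accuracy": s["correct"] / s["n"],
            "recall": s["tp"] / s["pos"] if s["pos"] else None,
        }
    return results


if __name__ == "__main__":
    # Toy records for illustration only, not real evaluation data.
    data = [
        ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0), ("group_a", 0, 0),
        ("group_b", 1, 1), ("group_b", 0, 1),
    ]
    for group, m in disaggregated_metrics(data).items():
        print(group, m)
```

Reporting `n` alongside each subgroup's scores matters: a metric computed on a handful of examples is unstable and should not be read as evidence of parity.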

### Step 6 — Ethical Considerations

Collect:
- Sensitive attributes the model processes or predicts (e.g., race, gender, health status, financial status)
- Known or anticipated disparate impacts across demographic groups
- Potential for misuse or harm if misapplied
- Privacy risks (does the model process or expose personal data?)
- Any fairness interventions applied during training or post-processing

### Step 7 — Limitations and Recommendations

Collect:
- Known failure modes or edge cases
- Performance degradation conditions (distribution shift, data quality issues, temporal drift)
- Conditions under which the model must not be deployed without additional review
- Recommended human oversight level (none / human-in-the-loop / human-on-the-loop / human-in-command)
- Recommended monitoring and re-evaluation cadence
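The oversight level is a closed set of four values, and the cadence determines when the card must be revisited. Here is a minimal sketch that validates the level and computes the next re-evaluation date. It assumes the cadence is expressed as monthly, quarterly, or annually. The `CADENCE_MONTHS` mapping and the function names are hypothetical.

```python
import calendar
from datetime import date
from enum import Enum


class OversightLevel(Enum):
    NONE = "none"
    HUMAN_IN_THE_LOOP = "human-in-the-loop"
    HUMAN_ON_THE_LOOP = "human-on-the-loop"
    HUMAN_IN_COMMAND = "human-in-command"


# Hypothetical mapping from cadence label to months between re-evaluations.
CADENCE_MONTHS = {"monthly": 1, "quarterly": 3, "annually": 12}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_reevaluation(last_evaluated: date, cadence: str) -> date:
    """Return the date of the next scheduled re-evaluation for a known cadence."""
    try:
        months = CADENCE_MONTHS[cadence.lower()]
    except KeyError:
        raise ValueError(f"unknown cadence: {cadence!r}") from None
    return add_months(last_evaluated, months)


if __name__ == "__main__":
    level = OversightLevel("human-on-the-loop")  # any other string raises ValueError
    print(level.name, next_reevaluation(date(2026, 1, 31), "quarterly"))
    # HUMAN_ON_THE_LOOP 2026-04-30
```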

### Step 8 — DRAFT Model Card Assembly

Assemble the DRAFT using the Output Format below. Label the document clearly:

```
DRAFT — Requires ML Engineer and Governance Review
Model Card Version: [version]
Date: [date]
```

Flag every field marked "Not disclosed" or "Not yet evaluated" with a `[DOCUMENTATION GAP — MUST RESOLVE BEFORE PUBLICATION]` annotation.
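The gap-flagging rule is mechanical enough to sketch. The function below assumes collected answers are held in a flat dict of field name to value, which is a hypothetical representation. It appends the annotation to every "Not disclosed" or "Not yet evaluated" value and returns the list of gap fields, which the publication rules under Key Rules depend on.

```python
# "\u2014" is the em dash used in the annotation defined in Step 8.
GAP_MARKER = "[DOCUMENTATION GAP \u2014 MUST RESOLVE BEFORE PUBLICATION]"
GAP_VALUES = {"not disclosed", "not yet evaluated"}


def annotate_gaps(fields: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Append GAP_MARKER to placeholder values and report which fields are gaps.

    Matching is case-insensitive, so "Not disclosed" and "not disclosed" are
    treated the same. Non-placeholder values are returned unchanged.
    """
    annotated: dict[str, str] = {}
    gaps: list[str] = []
    for name, value in fields.items():
        if value.strip().lower() in GAP_VALUES:
            annotated[name] = f"{value.strip()} {GAP_MARKER}"
            gaps.append(name)
        else:
            annotated[name] = value
    return annotated, gaps


if __name__ == "__main__":
    card_fields = {
        "License": "Apache-2.0",
        "Training data sources": "Not disclosed",
        "Disaggregated performance": "Not yet evaluated",
    }
    annotated, gaps = annotate_gaps(card_fields)
    print(gaps)  # ['Training data sources', 'Disaggregated performance']
    print(annotated["Training data sources"])
```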

## Key Rules

- **Never** fabricate performance numbers, training data descriptions, or evaluation results not provided by the user.
- **Always** include a disaggregated performance section; if data is absent, flag it prominently.
- **Always** include an out-of-scope use section.
- **Always** label the output DRAFT and include a reviewer sign-off block.
- **Never** recommend publication or regulatory submission of a Model Card with unresolved documentation gaps.
- **Never** suggest a model is safe or unbiased without evidence from actual evaluation results.
- **Ask one question at a time**; do not present all fields as a single form unless the user explicitly requests batch input.
- If the model processes sensitive attributes (health, finance, criminal justice, employment), add a bolded **HIGH-SENSITIVITY USE CASE** flag at the top of the Ethical Considerations section.
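Whether the HIGH-SENSITIVITY USE CASE flag applies depends on which domains the sensitive attributes fall into. One simple approach is a set intersection against the four domains named in the rule. This sketch assumes each attribute was already tagged with a domain during Step 6; the tagging scheme and the function names are hypothetical.

```python
HIGH_SENSITIVITY_DOMAINS = frozenset({"health", "finance", "criminal justice", "employment"})


def high_sensitivity_domains(attribute_domains: dict[str, str]) -> set[str]:
    """Return the high-sensitivity domains touched by the model's sensitive attributes.

    `attribute_domains` maps each sensitive attribute (e.g. "health status")
    to the domain it was tagged with (e.g. "health"). A non-empty result means
    the bolded flag belongs at the top of Ethical Considerations.
    """
    return {d.lower() for d in attribute_domains.values()} & HIGH_SENSITIVITY_DOMAINS


def ethical_considerations_header(attribute_domains: dict[str, str]) -> str:
    """Build the opening line(s) of the Ethical Considerations section."""
    header = "## Ethical Considerations"
    if high_sensitivity_domains(attribute_domains):
        header += "\n\n**HIGH-SENSITIVITY USE CASE**"
    return header


if __name__ == "__main__":
    attrs = {"health status": "health", "age": "demographic"}
    print(ethical_considerations_header(attrs))
```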

## Output Format

Produce a structured Markdown document with the following sections in order:

```
# Model Card: [Model Name] v[Version]

**Status:** DRAFT — Requires ML Engineer and Governance Review
**Date:** [date]
**Organization:** [team/org]
**License:** [license or "Not disclosed"]

---

## Model Details

| Field | Value |
|-------|-------|
| Model name | |
| Version | |
| Model type | |
| Organization | |
| Date | |
| License | |

## Intended Use

**Primary intended uses:**
[description]

**Primary intended users:**
[description]

**Out-of-scope uses:**
[description]

## Training Data

**Sources:** [list]
**Date range:** [range]
**Preprocessing:** [description]
**Known biases or gaps:** [description]
**Licensing / consent:** [status]

## Evaluation Data

**Dataset:** [name and source]
**Held-out from training:** [Yes / No / Not confirmed — flag if not confirmed]
**Known distribution gaps:** [description]
**Splits:** [e.g., 80/10/10]

## Performance Metrics

**Primary metric:** [metric] = [value]
**Secondary metrics:** [list with values]

### Disaggregated Performance

| Subgroup | [Metric 1] | [Metric 2] |
|----------|------------|------------|
| [Group A] | | |
| [Group B] | | |

[DOCUMENTATION GAP — MUST RESOLVE BEFORE PUBLICATION] if missing.

## Ethical Considerations

**Sensitive attributes processed:** [list]
**Known disparate impacts:** [description]
**Potential for misuse:** [description]
**Privacy risks:** [description]
**Fairness interventions:** [description]

## Limitations and Recommendations

**Known failure modes:** [list]
**Performance degradation conditions:** [list]
**Deployment restrictions:** [list]
**Recommended human oversight level:** [level]
**Recommended re-evaluation cadence:** [cadence]

---

## Reviewer Sign-Off

| Role | Name | Date | Signature |
|------|------|------|-----------|
| ML Engineer / Model Owner | | | |
| MLOps / Governance Lead | | | |
| Responsible AI Reviewer | | | |

*This Model Card is a DRAFT. It must not be published, filed with regulators, or cited in external communications until all documentation gaps are resolved and all sign-offs are obtained.*
```
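The Disaggregated Performance table can be rendered from per-subgroup results while enforcing the two-subgroup minimum. This sketch assumes results arrive as a dict of subgroup to metric values, which is a hypothetical input shape. It prints values exactly as supplied and leaves missing ones blank rather than inventing them. It appends the gap annotation when fewer than two subgroups are present.

```python
# "\u2014" is the em dash used in the annotation defined in Step 8.
GAP_MARKER = "[DOCUMENTATION GAP \u2014 MUST RESOLVE BEFORE PUBLICATION]"


def disaggregated_table(results: dict[str, dict[str, float]], metrics: list[str]) -> str:
    """Render the Disaggregated Performance table as Markdown.

    Missing metric values are left blank instead of being filled in. If fewer
    than two subgroups are supplied, the gap annotation is appended.
    """
    lines = [
        "| Subgroup | " + " | ".join(metrics) + " |",
        "|" + "---|" * (len(metrics) + 1),
    ]
    for group, values in results.items():
        cells = [str(values[m]) if m in values else "" for m in metrics]
        lines.append(f"| {group} | " + " | ".join(cells) + " |")
    if len(results) < 2:
        lines.append("")
        lines.append(GAP_MARKER)
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative placeholder values only, not real results.
    print(disaggregated_table(
        {"age 18-34": {"F1": 0.81, "Recall": 0.78}, "age 65+": {"F1": 0.72}},
        ["F1", "Recall"],
    ))
```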

## Feedback

If this skill did not meet your needs or you encountered a workflow it does not cover, share your feedback at: https://github.com/archlab-space/Open-Skill-Hub/issues

Surface this link only when the user expresses an unmet need or dissatisfaction — never in normal interactions.


added explicit inputs section with required and optional parameters, expanded decision points to cover all conditional branches (missing training data, missing disaggregated performance, unconfirmed held-out eval set, high-sensitivity use cases, batch input mode), formalized output contract with file location and data format requirements, and clarified outcome signal as an 11-point checklist of concrete success criteria.

Model Card Drafter

converts a model description, training details, and evaluation results into a structured model card. the model card is the standard responsible-AI artifact for documenting a machine-learning model's intended use, performance, limitations, and ethical risks. outputs a draft for ML engineer and governance review before publication or regulatory filing.

intent

use this skill when an ML engineer, data scientist, MLOps team, or responsible-AI lead needs to document a machine-learning or AI model for internal review, governance, or regulatory compliance. the skill guides you through collecting model identity, intended use, training and evaluation data provenance, performance metrics (including disaggregated results across demographic subgroups), ethical risks, and deployment constraints. output is a structured DRAFT model card aligned to Google's Model Cards standard and EU AI Act technical documentation requirements. this skill is most useful before a model ships to production or when preparing for audit, governance review, or regulatory filing.

inputs

required:

model name, version, type (binary classifier, multi-class classifier, regression, generative language model, object detection, embedding model, etc.)
organization or team responsible for the model
model release or version date
primary intended use case and intended users (who uses it, in what context)
training data sources, date range, preprocessing steps applied
evaluation dataset name and source, plus confirmation that the eval set is held out from training
performance metrics (accuracy, F1, AUC-ROC, BLEU, precision, recall, RMSE, calibration, or domain-specific metrics)
disaggregated performance results broken down by at least two demographic or contextual subgroups (age, gender, race/ethnicity, geography, language, income bracket, device type, etc.)
sensitive attributes the model processes or predicts
known failure modes, edge cases, and performance degradation conditions
recommended human oversight level and re-evaluation cadence

optional:

model license (open-source, proprietary, etc.)
data licensing and consent status
known data gaps or imbalances in training set
fairness interventions applied during training or post-processing
privacy risks and data exposure surface area

note: if training data provenance, disaggregated performance, or evaluation methodology cannot be provided, the skill flags these as documentation gaps blocking publication.

procedure

ask one question at a time. wait for the user's answer before proceeding to the next question. do not present all fields as a single form unless the user explicitly requests batch input.

step 1: model identification

ask the user for:

model name and version
model type (e.g., binary classifier, multi-class classifier, regression, generative language model, object detection, embedding model)
organization or team responsible
release or version date
license (if applicable)

record each answer in a structured log.
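one way to keep that structured log is an append-only list of step, field, and answer entries, so a corrected answer supersedes the earlier one without erasing it. this sketch assumes answers live in memory; the `AnswerLog` and `LogEntry` classes are hypothetical, not part of the skill:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    step: int          # procedure step number, 1-8
    field_name: str    # e.g. "model name", "license"
    answer: str        # the user's answer, recorded verbatim
    recorded_at: datetime


@dataclass
class AnswerLog:
    entries: list[LogEntry] = field(default_factory=list)

    def record(self, step: int, field_name: str, answer: str) -> None:
        """Append one answer; earlier answers for the same field are kept as history."""
        self.entries.append(LogEntry(step, field_name, answer, datetime.now(timezone.utc)))

    def latest(self, step: int) -> dict[str, str]:
        """Return the most recent answer for each field collected in a given step."""
        current: dict[str, str] = {}
        for e in self.entries:
            if e.step == step:
                current[e.field_name] = e.answer
        return current


if __name__ == "__main__":
    log = AnswerLog()
    log.record(1, "model name", "churn-classifier")  # hypothetical model name
    log.record(1, "version", "1.0")
    log.record(1, "version", "1.1")  # user corrected the version
    print(log.latest(1))  # {'model name': 'churn-classifier', 'version': '1.1'}
```

because superseded answers are kept rather than overwritten, a reviewer can see what the user originally said.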

step 2: intended use

ask the user for:

primary intended use case: what task is the model designed to perform?
primary intended users: who will use the model and in what context?
out-of-scope uses: explicitly ask "are there any use cases where this model should NOT be applied?" record as a separate out-of-scope section.

step 3: training data

ask the user for:

data sources: name, origin, collection method
date range of training data
preprocessing and filtering steps applied
known data gaps, biases, or demographic imbalances in the training set
data licensing and consent status (public dataset, proprietary, licensed, synthetic)

if the user cannot describe training data, record as "not disclosed" and flag as a documentation gap requiring resolution before publication.

step 4: evaluation data

ask the user for:

test/evaluation dataset name and source
confirmation: is the evaluation set held out from training? (must explicitly confirm yes or no)
known differences between evaluation data and real-world deployment data
data splits used (e.g., 80/10/10 train/val/test)

if the user cannot confirm that the eval set is held out, flag it as a critical documentation gap.

step 5: performance metrics

ask the user for primary and secondary evaluation metrics (e.g., accuracy, F1, AUC-ROC, BLEU, precision, recall, RMSE, calibration). record exact numeric values or performance scores.

then ask for disaggregated performance results: "can you provide performance results broken down by at least two subgroups relevant to your model's use (e.g., age group, gender, race/ethnicity, geography, language, income bracket, device type)?" collect results for each subgroup in a structured table.

if disaggregated results are not available, record as "not yet evaluated" and flag as a high-priority documentation gap.

step 6: ethical considerations

ask the user for:

sensitive attributes the model processes or predicts (race, gender, health status, financial status, criminal justice involvement, employment eligibility, etc.)
known or anticipated disparate impacts across demographic groups
potential for misuse or harm if misapplied
privacy risks: does the model process or expose personal data?
fairness interventions applied during training or post-processing

if the model processes attributes related to health, finance, criminal justice, or employment, flag it as a high-sensitivity use case.

step 7: limitations and recommendations

ask the user for:

known failure modes or edge cases
performance degradation conditions (distribution shift, data quality issues, temporal drift, out-of-distribution inputs, etc.)
conditions under which the model must not be deployed without additional review
recommended human oversight level: none, human-in-the-loop, human-on-the-loop, or human-in-command
recommended monitoring and re-evaluation cadence (e.g., monthly, quarterly, annually)

step 8: draft model card assembly

assemble the draft using the output format section below. label the document clearly:

```
DRAFT — Requires ML Engineer and Governance Review
Model Card Version: [version]
Date: [date]
```

flag every field marked "not disclosed" or "not yet evaluated" with a [DOCUMENTATION GAP — MUST RESOLVE BEFORE PUBLICATION] annotation. do not proceed to publication without user confirmation that all gaps are resolved.

output: structured markdown document ready for review by ML engineers, MLOps leads, and governance teams.

decision points

if training data provenance is unavailable: record as "not disclosed" and flag as a documentation gap blocking publication. do not fabricate or infer training data details.

if disaggregated performance is not available: record as "not yet evaluated" and flag as a high-priority gap. do not suggest the model is fair or unbiased without evidence from actual disaggregated evaluation results.

if the evaluation set is not confirmed to be held out from training data: flag it as a critical documentation gap. do not proceed without confirmation.

if the model processes sensitive attributes related to health, finance, criminal justice, or employment: add a bolded high-sensitivity use case flag at the top of the ethical considerations section.

if the user requests batch input (e.g., a pre-filled form instead of step-by-step questions): allow the user to provide all fields at once, then proceed directly to step 8 (assembly).

if documentation gaps remain after all steps are collected: do not recommend publication or regulatory submission. include a reviewer sign-off block requiring explicit acknowledgment that gaps are accepted or will be resolved in a follow-up iteration.

if the user expresses dissatisfaction or an unmet need: surface the feedback link: https://github.com/archlab-space/Open-Skill-Hub/issues
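the publication-related decision points above reduce to a single gate. this minimal sketch assumes gaps are tracked as a list of field names and sign-offs as a mapping from each reviewer role in the template to a name. the function name and data shapes are hypothetical:

```python
from typing import Optional

# Reviewer roles taken from the sign-off table in the output template.
REQUIRED_ROLES = (
    "ML Engineer / Model Owner",
    "MLOps / Governance Lead",
    "Responsible AI Reviewer",
)


def publication_blockers(gaps: list[str], signoffs: dict[str, Optional[str]]) -> list[str]:
    """List every reason the card must not be recommended for publication.

    An empty list is the only case in which publication or regulatory
    submission may be recommended.
    """
    blockers = [f"unresolved documentation gap: {g}" for g in gaps]
    for role in REQUIRED_ROLES:
        if not signoffs.get(role):
            blockers.append(f"missing sign-off: {role}")
    return blockers


if __name__ == "__main__":
    for reason in publication_blockers(
        ["Disaggregated performance"],
        {"ML Engineer / Model Owner": "A. Reviewer"},  # hypothetical reviewer name
    ):
        print(reason)
```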

output contract

output is a markdown document with the following structure and file format:

file location: return as a markdown (.md) file or markdown-formatted text block suitable for copy-paste into a document management or governance system.

document structure:

frontmatter: model name, version, organization, license, draft status, date
model details table: name, version, type, organization, date, license
intended use section: primary uses, intended users, out-of-scope uses
training data section: sources, date range, preprocessing, known biases/gaps, licensing/consent status
evaluation data section: dataset name/source, held-out confirmation, distribution gaps, data splits
performance metrics section: primary metrics, secondary metrics, disaggregated performance table with at least two subgroup rows
ethical considerations section: sensitive attributes, known disparate impacts, misuse potential, privacy risks, fairness interventions (with high-sensitivity flag if applicable)
limitations and recommendations section: failure modes, degradation conditions, deployment restrictions, oversight level, re-evaluation cadence
reviewer sign-off block: table with roles, names, dates, and signature placeholders

data format: all numeric metrics are reported with exact values. all documentation gaps are annotated with [DOCUMENTATION GAP — MUST RESOLVE BEFORE PUBLICATION]. the document is marked DRAFT at the top and bottom. the reviewer sign-off block includes an explicit disclaimer that the model card must not be published, filed with regulators, or cited externally until all gaps are resolved and all sign-offs are obtained.

outcome signal

the user knows the skill worked when:

all model identification details (name, version, type, org, date, license) are captured in a structured table.
intended use, intended users, and out-of-scope uses are clearly stated in separate subsections.
training data provenance (sources, date range, preprocessing, known biases, licensing/consent) is documented with any "not disclosed" gaps flagged.
evaluation data is confirmed to be held out from training, with known distribution gaps noted. if the eval set cannot be confirmed as held out, this is flagged as a critical gap.
primary and secondary performance metrics are reported with exact numeric values.
disaggregated performance table shows results for at least two demographic or contextual subgroups. if disaggregated results are unavailable, this is flagged as a high-priority gap.
sensitive attributes processed by the model are listed. if the model processes health, finance, criminal justice, or employment data, a high-sensitivity flag is present.
ethical considerations section documents known or anticipated disparate impacts, misuse potential, privacy risks, and any fairness interventions applied.
limitations section lists known failure modes, performance degradation conditions, deployment restrictions, recommended human oversight level, and re-evaluation cadence.
document is labeled DRAFT at top and bottom, with a reviewer sign-off block and clear disclaimer that publication requires resolution of all flagged documentation gaps.
every field marked "not disclosed" or "not yet evaluated" is annotated with [DOCUMENTATION GAP — MUST RESOLVE BEFORE PUBLICATION].
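several of these signals can be checked mechanically against the rendered markdown. the sketch below is a hypothetical helper that checks surface structure only: section order from the template, the DRAFT label at top and bottom, and gap annotations. it does not check whether the content itself is accurate or complete:

```python
# "\u2014" is the em dash used in the annotation defined in step 8.
GAP_MARKER = "[DOCUMENTATION GAP \u2014 MUST RESOLVE BEFORE PUBLICATION]"
REQUIRED_SECTIONS = [
    "## Model Details",
    "## Intended Use",
    "## Training Data",
    "## Evaluation Data",
    "## Performance Metrics",
    "### Disaggregated Performance",
    "## Ethical Considerations",
    "## Limitations and Recommendations",
    "## Reviewer Sign-Off",
]


def check_model_card(markdown: str) -> list[str]:
    """Return a list of failed structural checks (an empty list means all passed)."""
    failures: list[str] = []
    lines = markdown.splitlines()

    # Sections must all be present and appear in template order.
    positions = []
    for heading in REQUIRED_SECTIONS:
        if heading in lines:
            positions.append(lines.index(heading))
        else:
            failures.append(f"missing section: {heading}")
    if positions != sorted(positions):
        failures.append("sections are out of order")

    # DRAFT label at the top (status line) and bottom (closing disclaimer).
    non_empty = [line for line in lines if line.strip()]
    if not any(line.startswith("**Status:** DRAFT") for line in non_empty[:5]):
        failures.append("DRAFT status missing from top of document")
    if not non_empty or "DRAFT" not in non_empty[-1]:
        failures.append("DRAFT disclaimer missing from end of document")

    # Every placeholder value must carry the gap annotation.
    for line in lines:
        lowered = line.lower()
        if ("not disclosed" in lowered or "not yet evaluated" in lowered) and GAP_MARKER not in line:
            failures.append(f"unannotated gap: {line.strip()}")
    return failures


if __name__ == "__main__":
    sample = "# Model Card: Demo v1\n**Status:** DRAFT\n## Model Details\n**License:** Not disclosed"
    for problem in check_model_card(sample):
        print(problem)
```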

output format template

produce a structured markdown document with the following sections in order:

```
# Model Card: [Model Name] v[Version]

**Status:** DRAFT — Requires ML Engineer and Governance Review
**Date:** [date]
**Organization:** [team/org]
**License:** [license or "Not disclosed"]

---

## Model Details

| Field | Value |
|-------|-------|
| Model name | |
| Version | |
| Model type | |
| Organization | |
| Date | |
| License | |

## Intended Use

**Primary intended uses:**
[description]

**Primary intended users:**
[description]

**Out-of-scope uses:**
[description]

## Training Data

**Sources:** [list]
**Date range:** [range]
**Preprocessing:** [description]
**Known biases or gaps:** [description]
**Licensing / consent:** [status]

## Evaluation Data

**Dataset:** [name and source]
**Held-out from training:** [Yes / No / Not confirmed — flag if not confirmed]
**Known distribution gaps:** [description]
**Splits:** [e.g., 80/10/10]

## Performance Metrics

**Primary metric:** [metric] = [value]
**Secondary metrics:** [list with values]

### Disaggregated Performance

| Subgroup | [Metric 1] | [Metric 2] |
|----------|------------|------------|
| [Group A] | | |
| [Group B] | | |

[DOCUMENTATION GAP — MUST RESOLVE BEFORE PUBLICATION] if missing.

## Ethical Considerations

**Sensitive attributes processed:** [list]
**Known disparate impacts:** [description]
**Potential for misuse:** [description]
**Privacy risks:** [description]
**Fairness interventions:** [description]

## Limitations and Recommendations

**Known failure modes:** [list]
**Performance degradation conditions:** [list]
**Deployment restrictions:** [list]
**Recommended human oversight level:** [level]
**Recommended re-evaluation cadence:** [cadence]

---

## Reviewer Sign-Off

| Role | Name | Date | Signature |
|------|------|------|-----------|
| ML Engineer / Model Owner | | | |
| MLOps / Governance Lead | | | |
| Responsible AI Reviewer | | | |

*This Model Card is a DRAFT. It must not be published, filed with regulators, or cited in external communications until all documentation gaps are resolved and all sign-offs are obtained.*
```

credits: original skill authored by archlab-space. enriched and standardized per Implexa quality guidelines.