Item: agent-platform-eval-flywheel
Rating: 5.2
Author: Implexa

agent-platform-eval-flywheel

Measures and improves the quality of AI models and agents on Google Cloud using the Eval Quality Flywheel methodology. Use when evaluating an agent or model,…


SkillRank score ↗

5.2 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-07-04

agent-platform-eval-flywheel guides iterative evaluation and improvement of GenAI models and agents using Google Cloud's evaluation SDK, covering dataset creation, metric selection, and failure analysis with concrete remediation.

structure: 6.0
trigger phrases: 5.0
procedure: 4.0
edge cases: 3.0
documentation: 5.0


SKILL.md

Agent Platform Eval Flywheel Skill

Help users evaluate and iteratively improve GenAI models and agents using
the Agent Platform GenAI Evaluation SDK (google.genai / agentplatform).

When to use this skill

Evaluating GenAI agents or models with the Agent Platform GenAI
Evaluation SDK (client.evals.evaluate()).

Creating evaluation datasets from session traces, pandas DataFrames, or
synthetic generation.

Selecting, configuring, or writing custom evaluation metrics.

Analyzing rubric verdicts, loss patterns, and clustering failures.

Suggesting concrete code/prompt improvements based on eval results.

Evaluating a model served on an Agent Platform endpoint (BYOM) or a
Model-as-a-Service (MaaS) model by ID — including deploying the model
first if needed. For this case, follow
references/deployment.md and use the
endpoint_evaluation.py / maas_evaluation.py scripts.
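
The failure-analysis item in the list above (rubric verdicts, loss patterns, clustering failures) can be illustrated with a minimal sketch. It deliberately does not call the evaluation SDK: the record layout (`case_id`, `rubric`, `passed`) and the helper names `failure_clusters` and `pass_rates` are hypothetical, chosen only for illustration, and the real result schema returned by `client.evals.evaluate()` may look different.

```python
from collections import Counter, defaultdict

# Hypothetical rubric verdicts, one record per (eval case, rubric) pair.
# This layout is an assumption, not the SDK's actual result schema.
verdicts = [
    {"case_id": "c1", "rubric": "cites_source", "passed": False},
    {"case_id": "c1", "rubric": "answers_question", "passed": True},
    {"case_id": "c2", "rubric": "cites_source", "passed": False},
    {"case_id": "c3", "rubric": "uses_correct_tool", "passed": False},
    {"case_id": "c3", "rubric": "answers_question", "passed": True},
]


def failure_clusters(records):
    """Group failing case IDs by rubric, largest group first."""
    by_rubric = defaultdict(list)
    for rec in records:
        if not rec["passed"]:
            by_rubric[rec["rubric"]].append(rec["case_id"])
    # sorted() is stable, so ties keep first-seen order.
    return sorted(by_rubric.items(), key=lambda kv: len(kv[1]), reverse=True)


def pass_rates(records):
    """Return the fraction of passing verdicts for each rubric."""
    total = Counter(rec["rubric"] for rec in records)
    passed = Counter(rec["rubric"] for rec in records if rec["passed"])
    return {rubric: passed[rubric] / total[rubric] for rubric in total}


clusters = failure_clusters(verdicts)
rates = pass_rates(verdicts)

for rubric, case_ids in clusters:
    print(f"{rubric}: {len(case_ids)} failing case(s) -> {case_ids}")
```

Grouping by rubric first is one simple way to decide where to look: the largest group points at the behavior most worth fixing before the next evaluation run.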
