Measures and improves the quality of AI models and agents on Google Cloud using the Eval Quality Flywheel methodology. Use when evaluating an agent or model,…
Agent Platform Eval Flywheel Skill

Help users evaluate and iteratively improve GenAI models and agents using the Agent Platform GenAI Evaluation SDK (google.genai / agentplatform).

When to use this skill

- Evaluating GenAI agents or models with the Agent Platform GenAI Evaluation SDK (client.evals.evaluate()).
- Creating evaluation datasets from session traces, pandas DataFrames, or synthetic generation.
- Selecting, configuring, or writing custom evaluation metrics.
- Analyzing rubric verdicts, loss patterns, and clustering failures.
- Suggesting concrete code/prompt improvements based on eval results.
- Evaluating a model served on an Agent Platform endpoint (BYOM) or a Model-as-a-Service (MaaS) model by ID, including deploying the model first if needed. For this case, follow references/deployment.md and use the endpoint_evaluation.py / maas_evaluation.py scripts.
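As a rough illustration of the "analyzing rubric verdicts" use case above, the sketch below groups failed verdicts by rubric to show where losses concentrate. It uses only the Python standard library and does not call the Evaluation SDK. The record shape (`case_id`, `rubric`, `passed`) and the helper name `summarize_failures` are assumptions made for this example, not the SDK's result schema, and grouping by rubric name is a simple stand-in for real failure clustering.

```python
from collections import Counter, defaultdict

# Hypothetical verdict records for illustration only; the actual SDK
# result schema may look different.
verdicts = [
    {"case_id": "c1", "rubric": "cites_source", "passed": False},
    {"case_id": "c1", "rubric": "follows_format", "passed": True},
    {"case_id": "c2", "rubric": "cites_source", "passed": False},
    {"case_id": "c2", "rubric": "follows_format", "passed": False},
    {"case_id": "c3", "rubric": "cites_source", "passed": True},
    {"case_id": "c3", "rubric": "follows_format", "passed": True},
]


def summarize_failures(records):
    """Group failed verdicts by rubric, most frequently failed rubric first.

    Returns a list of (rubric, failure_count, failing_case_ids) tuples.
    """
    failing_cases = defaultdict(list)
    for record in records:
        if not record["passed"]:
            failing_cases[record["rubric"]].append(record["case_id"])
    counts = Counter({rubric: len(ids) for rubric, ids in failing_cases.items()})
    return [(rubric, n, failing_cases[rubric]) for rubric, n in counts.most_common()]


summary = summarize_failures(verdicts)
for rubric, n, case_ids in summary:
    print(f"{rubric}: {n} failing case(s) -> {case_ids}")
```

A ranking like this can help decide which prompt or code change to try first, since the rubric with the most failures is usually the most productive place to start.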