MUST READ before running any ADK evaluation. ADK evaluation methodology — eval metrics, evalset schema, LLM-as-judge, tool trajectory scoring, and common…
ADK Evaluation Guide

Scaffolded project? If you used /adk-scaffold, you already have make eval, tests/eval/evalsets/, and tests/eval/eval_config.json. Start with make eval and iterate from there.

Non-scaffolded? Use adk eval directly; see Running Evaluations below.

Reference Files

| File | Contents |
| --- | --- |
| references/criteria-guide.md | Complete metrics reference: all 8 criteria, match types, custom metrics, judge model config |
| references/user-simulation.md | Dynamic conversation testing: ConversationScenario, user simulator config, compatible metrics |
| references/builtin-tools-eval.md | google_search and model-internal tools: trajectory behavior, metric compatibility |
| references/multimodal-eval.md | Multimodal inputs: evalset schema, built-in metric limitations, custom evaluator pattern |

The Eval-Fix Loop