Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and…
Arize Experiment Skill

SPACE: All --space flags and the ARIZE_SPACE env var accept a space name (e.g., my-workspace) or a base64 space ID (e.g., U3BhY2U6...). Find yours with ax spaces list.

Concepts

- Experiment = a named evaluation run against a specific dataset version, containing one run per example
- Experiment Run = the result of processing one dataset example; includes the model output, optional evaluations, and optional metadata
- Dataset = a versioned collection of examples; every experiment is tied to a dataset and a specific dataset version
- Evaluation = a named metric attached to a run (e.g., correctness, relevance), with optional label, score, and explanation

The typical flow: export a dataset → process each example → collect outputs and evaluations → create an experiment with the runs.

Prerequisites

Proceed directly with the task: run the ax command you need. Do NOT check versions, env vars, or profiles upfront.
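The middle steps of the typical flow described under Concepts (process each example, collect outputs and evaluations, assemble the runs) can be sketched in plain Python. This is only an illustration of the data shapes, not the Arize API or the ax CLI: the class names, field names, example dictionaries, and the build_runs and exact_match helpers are all hypothetical, chosen to mirror the concept definitions (one run per example; each evaluation has a name plus optional label, score, and explanation).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Evaluation:
    """Hypothetical shape: a named metric with optional label, score, explanation."""
    name: str
    label: Optional[str] = None
    score: Optional[float] = None
    explanation: Optional[str] = None


@dataclass
class ExperimentRun:
    """Hypothetical shape: the result of processing one dataset example."""
    example_id: str
    output: str
    evaluations: list[Evaluation] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def build_runs(
    examples: list[dict],
    task: Callable[[dict], str],
    evaluators: list[Callable[[dict, str], Evaluation]],
) -> list[ExperimentRun]:
    """Process each example once and attach every evaluator's result to its run."""
    runs = []
    for example in examples:
        output = task(example)
        evals = [evaluate(example, output) for evaluate in evaluators]
        runs.append(
            ExperimentRun(example_id=example["id"], output=output, evaluations=evals)
        )
    return runs


def exact_match(example: dict, output: str) -> Evaluation:
    """Toy evaluator: compares the output with the example's expected answer."""
    matched = output == example["expected"]
    return Evaluation(
        name="correctness",
        label="correct" if matched else "incorrect",
        score=1.0 if matched else 0.0,
    )


# Stand-in for an exported dataset; the keys are assumptions for this sketch.
examples = [
    {"id": "ex-1", "input": "2+2", "expected": "4"},
    {"id": "ex-2", "input": "3+3", "expected": "7"},
]

# Stand-in for the model: adds the integers in the input string.
runs = build_runs(
    examples,
    task=lambda ex: str(sum(int(part) for part in ex["input"].split("+"))),
    evaluators=[exact_match],
)
```

The resulting list holds one run per example, which is the shape an experiment expects according to the definitions above. Exporting the dataset and creating the experiment are done with ax commands and are deliberately left out here.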