Name: arize-experiment
Author: arize-ai

Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and…

SKILL.md

Arize Experiment Skill

SPACE — All --space flags and the ARIZE_SPACE env var accept a space name (e.g., my-workspace) or a base64 space ID (e.g., U3BhY2U6...). Find yours with ax spaces list.
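As a rough illustration of the two accepted forms, the sketch below guesses whether a value is a base64 space ID or a plain name. It rests on an assumption taken only from the example above: the prefix U3BhY2U6 decodes to "Space:", so the check treats that decoded prefix as the marker of an ID. Real IDs may not always follow this pattern, and looks_like_space_id is a hypothetical helper, not part of ax.

```python
import base64
import binascii


def looks_like_space_id(value: str) -> bool:
    """Heuristic: True if `value` is valid base64 that decodes to b"Space:...".

    Assumption: space IDs decode to a string starting with "Space:",
    inferred from the documented example prefix, not from any API contract.
    """
    try:
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        # Not valid base64 (e.g. contains "-"), so treat it as a space name
        return False
    return decoded.startswith(b"Space:")


# A made-up ID built for illustration only
sample_id = base64.b64encode(b"Space:123").decode()
```

Either form can be passed as-is to --space or ARIZE_SPACE; a check like this is only useful when a script needs to log or branch on which form it received.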

Concepts

Experiment = a named evaluation run against a specific dataset version, containing one run per example

Experiment Run = the result of processing one dataset example, including the model output, optional evaluations, and optional metadata

Dataset = a versioned collection of examples; every experiment is tied to a dataset and a specific dataset version

Evaluation = a named metric attached to a run (e.g., correctness, relevance), with optional label, score, and explanation

The typical flow: export a dataset → process each example → collect outputs and evaluations → create an experiment with the runs.
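The middle of that flow (process each example, collect outputs and evaluations) can be sketched in plain Python. This is a minimal illustration, not the Arize schema: the class and field names (Evaluation, ExperimentRun, example_id, output), the in-memory examples list, and the toy task and evaluator are all hypothetical stand-ins. Exporting the dataset and creating the experiment are left to the ax CLI and are not shown.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Evaluation:
    """A named metric on a run; label, score, and explanation are optional."""
    name: str
    label: Optional[str] = None
    score: Optional[float] = None
    explanation: Optional[str] = None


@dataclass
class ExperimentRun:
    """Result of processing one dataset example (hypothetical field names)."""
    example_id: str
    output: str
    evaluations: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def build_runs(examples, task, evaluators):
    """Produce exactly one run per example, as an experiment expects."""
    runs = []
    for example in examples:
        output = task(example)
        evals = [evaluate(example, output) for evaluate in evaluators]
        runs.append(ExperimentRun(example["id"], output, evals))
    return runs


# Stand-in for an exported dataset version (made-up data)
examples = [
    {"id": "ex-1", "input": "2 + 2", "expected": "4"},
    {"id": "ex-2", "input": "3 * 3", "expected": "9"},
]


def toy_task(example):
    # Placeholder for a model call; always answers "4"
    return "4"


def correctness(example, output):
    ok = output == example["expected"]
    return Evaluation(
        name="correctness",
        label="correct" if ok else "incorrect",
        score=1.0 if ok else 0.0,
    )


runs = build_runs(examples, toy_task, [correctness])
payload = [asdict(run) for run in runs]  # plain dicts, ready to serialize
```

Keeping one run per example mirrors the definition of an experiment above; the serialized payload would still need to be mapped to whatever format the experiment creation command expects.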

Prerequisites

Proceed directly with the task — run the ax command you need. Do NOT check versions, env vars, or profiles upfront.