INVOKE THIS SKILL when building evaluation pipelines for LangSmith. Covers three core components: (1) Creating Evaluators - LLM-as-Judge, custom code; (2)…
Build evaluation pipelines for LangSmith with LLM-as-Judge and custom code evaluators. Three core components: creating evaluators (LLM-as-Judge or custom code), defining run functions to capture agent outputs and trajectories, and running evaluations locally or auto-running via uploaded evaluators Supports both offline evaluators (comparing run outputs to dataset examples) and online evaluators (real-time quality checks on production runs) Requires LangSmith API key and project configuration; includes Python and TypeScript examples with structured output support for LLM judges Critical workflow: inspect actual agent output structure and dataset schema before writing evaluators; query LangSmith traces to verify trajectory data and field names match LANGSMITH_API_KEY=lsv2_pt_your_api_key_here # REQUIRED LANGSMITH_PROJECT=your-project-name # Check this to know which project has traces LANGSMITH_WORKSPACE_ID=your-workspace-id # Optional: for org-scoped keys OPENAI_API_KEY=your_openai_key # For LLM as Judge Authentication is REQUIRED: either set the LANGSMITH_API_KEY environment variable, or pass the --api-key flag to CLI commands (preferred): langsmith evaluator list --api-key $LANGSMITH_API_KEY
don't have the plugin yet? install it then click "run inline in claude" again.