---
name: examples-auto-run
description: Run python examples in auto mode with logging, rerun helpers, and background control.
---

# examples-auto-run

## What it does

- Runs `uv run examples/run_examples.py` with:
  - Optional dependency extras enabled by default: `litellm`, `any-llm`, `sqlalchemy`, `redis`, `blaxel`, `modal`, `runloop`, and `temporal`.
  - `EXAMPLES_INTERACTIVE_MODE=auto` (auto-input/auto-approve).
  - Per-example logs under `.tmp/examples-start-logs/`.
  - Main summary log path passed via `--main-log` (also under `.tmp/examples-start-logs/`).
- Generates a rerun list of failures at `.tmp/examples-rerun.txt` when `--write-rerun` is set.
- Provides start/stop/status/logs/tail/collect/rerun helpers via `run.sh`.
- The background option keeps the process running and tracks it with a pidfile; `stop` ends the job and removes the pidfile.
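
The pidfile pattern behind the background option can be sketched in plain shell. This is an illustrative sketch only, not the contents of `run.sh`: the pidfile location, the variable names, and the `sleep` stand-in for the real job are all assumptions.

```bash
# Illustrative sketch of pidfile-based background control (not the actual run.sh).
# Assumptions: the pidfile path and the `sleep` stand-in for the real job.
work_dir=$(mktemp -d)
pidfile="$work_dir/job.pid"

# start: launch the job in the background and record its PID
sleep 30 &
echo "$!" > "$pidfile"

# status: signal 0 only checks that the process exists; it does not affect it
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
  job_status="running"
else
  job_status="stopped"
fi
echo "status: $job_status"

# stop: terminate the job, reap it, then remove the pidfile
job_pid=$(cat "$pidfile")
kill "$job_pid" 2>/dev/null || true
wait "$job_pid" 2>/dev/null || true
rm -f "$pidfile"
rmdir "$work_dir"
```

Removing the pidfile as the last step of `stop` means a later `status` check does not mistake a stale PID for a live job.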

## Usage

```bash
# Start (auto mode; interactive examples included by default)
.agents/skills/examples-auto-run/scripts/run.sh start [extra args to run_examples.py]
# Examples:
.agents/skills/examples-auto-run/scripts/run.sh start --filter basic
.agents/skills/examples-auto-run/scripts/run.sh start --include-server --include-audio

# Check status
.agents/skills/examples-auto-run/scripts/run.sh status

# Stop running job
.agents/skills/examples-auto-run/scripts/run.sh stop

# List logs
.agents/skills/examples-auto-run/scripts/run.sh logs

# Tail latest log (or specify one)
.agents/skills/examples-auto-run/scripts/run.sh tail
.agents/skills/examples-auto-run/scripts/run.sh tail main_20260113-123000.log

# Collect rerun list from a main log (defaults to latest main_*.log)
.agents/skills/examples-auto-run/scripts/run.sh collect

# Rerun only failed entries from rerun file (auto mode)
.agents/skills/examples-auto-run/scripts/run.sh rerun
```

## Codex execution environment

- When Codex invokes `start` or `rerun`, run the command outside the Codex sandbox by default (`sandbox_permissions=require_escalated`). Many examples start nested sandboxes, browsers, npm helpers, or cloud/local service processes; running from inside the Codex sandbox can produce environment-only failures such as `sandbox-exec: sandbox_apply: Operation not permitted`, Playwright cache permission errors, or npm cache permission errors.
- Use sandboxed execution only when the user explicitly asks for it or when running a narrow dry-run / log inspection command that does not execute examples.

## Defaults (overridable via env)

- `EXAMPLES_INTERACTIVE_MODE=auto`
- `EXAMPLES_INCLUDE_INTERACTIVE=1`
- `EXAMPLES_INCLUDE_SERVER=0`
- `EXAMPLES_INCLUDE_AUDIO=0`
- `EXAMPLES_INCLUDE_EXTERNAL=0`
- `EXAMPLES_UV_EXTRAS="litellm any-llm sqlalchemy redis blaxel modal runloop temporal"` (set to an empty string to disable extras)
- Auto-approvals in auto mode: `APPLY_PATCH_AUTO_APPROVE=1`, `SHELL_AUTO_APPROVE=1`, `AUTO_APPROVE_MCP=1`
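
One common way for a wrapper script to honor overrides like these is shell parameter expansion. The sketch below is an assumption about the mechanism, not a copy of `run.sh`, and `build_extra_args` is a hypothetical helper. It uses `${VAR-default}` (no colon) for the extras so that an explicitly empty value disables them instead of falling back to the default list.

```bash
# Illustrative sketch only; not the actual run.sh.
# Simulate a caller who overrides two defaults.
EXAMPLES_INCLUDE_SERVER=1
EXAMPLES_UV_EXTRAS=""

# ${VAR:-default}: use the default when VAR is unset or empty.
EXAMPLES_INCLUDE_SERVER="${EXAMPLES_INCLUDE_SERVER:-0}"
EXAMPLES_INCLUDE_AUDIO="${EXAMPLES_INCLUDE_AUDIO:-0}"

# ${VAR-default}: use the default only when VAR is unset, so "" disables extras.
EXAMPLES_UV_EXTRAS="${EXAMPLES_UV_EXTRAS-litellm any-llm sqlalchemy redis blaxel modal runloop temporal}"

# Hypothetical helper: expand a space-separated list into repeated `--extra NAME` flags.
build_extra_args() {
  out=""
  for name in $1; do
    out="${out:+$out }--extra $name"
  done
  printf '%s\n' "$out"
}

echo "server=$EXAMPLES_INCLUDE_SERVER audio=$EXAMPLES_INCLUDE_AUDIO"
echo "extras from env: [$(build_extra_args "$EXAMPLES_UV_EXTRAS")]"
echo "two extras:      [$(build_extra_args "redis modal")]"
```

In real use the override goes on the command line instead, for example by prefixing the `start` invocation with `EXAMPLES_INCLUDE_SERVER=1`.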

## Log locations

- Main logs: `.tmp/examples-start-logs/main_*.log`
- Per-example logs (from `run_examples.py`): `.tmp/examples-start-logs/<module_path>.log`
- Rerun list: `.tmp/examples-rerun.txt`
- Stdout logs: `.tmp/examples-start-logs/stdout_*.log`
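
Several helpers default to the latest `main_*.log`. Assuming main logs carry a sortable timestamp in their names (as in `main_20260113-123000.log` from the usage section), the newest one is the last match of a sorted glob. The sketch below uses a temporary directory and made-up file names so it can run anywhere; `latest_main_log` is a hypothetical helper, and how `run.sh` actually selects the latest log is not shown in this document.

```bash
# Illustrative sketch: pick the newest main log by name (not the actual run.sh logic).
log_dir=$(mktemp -d)   # stands in for .tmp/examples-start-logs
touch "$log_dir/main_20260113-123000.log" "$log_dir/main_20260114-090000.log"

# Hypothetical helper: glob matches expand in sorted order, so the last match
# has the most recent timestamp in its name.
latest_main_log() {
  latest=""
  for f in "$1"/main_*.log; do
    # Skip the unexpanded pattern when no main log exists.
    if [ -e "$f" ]; then
      latest="$f"
    fi
  done
  printf '%s\n' "$latest"
}

latest_name=$(basename "$(latest_main_log "$log_dir")")
echo "latest main log: $latest_name"
rm -rf "$log_dir"
```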

## Notes

- The runner delegates to `uv run --extra ... examples/run_examples.py`, which already writes per-example logs and supports `--collect`, `--rerun-file`, and `--print-auto-skip`.
- `examples/sandbox/extensions/vercel_runner.py` is temporarily excluded from auto runs due to credential issues. Do not force-run it until the credential setup is fixed.
- `start` uses `--write-rerun` so failures are captured automatically.
- If `.tmp/examples-rerun.txt` exists and is non-empty, invoking the skill with no args runs `rerun` by default.
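
The last rule depends on the rerun file both existing and being non-empty, which is what the shell test `[ -s FILE ]` checks. A minimal sketch, using a hypothetical helper name and a temporary file in place of `.tmp/examples-rerun.txt`; the entry written to it is a made-up placeholder, not a real example path:

```bash
# Hypothetical helper: succeeds only if the file exists and has content.
has_pending_reruns() {
  [ -s "$1" ]
}

rerun_file=$(mktemp)   # exists but is empty
if has_pending_reruns "$rerun_file"; then before="rerun"; else before="no rerun"; fi

# Placeholder entry; the real file lists the failed examples.
echo "examples/placeholder_example.py" > "$rerun_file"
if has_pending_reruns "$rerun_file"; then after="rerun"; else after="no rerun"; fi

echo "empty file: $before; non-empty file: $after"
rm -f "$rerun_file"
```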

## Behavioral validation (Codex/LLM responsibility)

The runner does not perform any automated behavioral validation. After every foreground `start` or `rerun`, **Codex must manually validate** all exit-0 entries:

1. Read the example source (and comments) to infer intended flow, tools used, and expected key outputs.
2. Open the matching per-example log under `.tmp/examples-start-logs/`.
3. Confirm the intended actions/results occurred; flag omissions or divergences.
4. Do this for **all passed examples**, not just a sample.
5. Report immediately after the run with concise citations to the exact log lines that justify the validation.
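
For step 5, `grep -n` is one simple way to get line-numbered citations out of a per-example log. The sketch below is only an illustration: the log content and the `EXPECTED_MARKER` string are synthetic placeholders, not output from any real example, and in practice the search pattern would come from reading the example source in step 1.

```bash
# Synthetic stand-in for a per-example log under .tmp/examples-start-logs/.
log_file=$(mktemp)
printf '%s\n' \
  "starting example" \
  "EXPECTED_MARKER: step one done" \
  "unrelated output" \
  "EXPECTED_MARKER: step two done" > "$log_file"

# grep -n prefixes each matching line with its line number, which can be
# quoted directly as a citation in the validation report.
citations=$(grep -n "EXPECTED_MARKER" "$log_file")
match_count=$(grep -c "EXPECTED_MARKER" "$log_file")

echo "$citations"
echo "matches: $match_count"
rm -f "$log_file"
```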
