---
name: strategy-backtester
description: Validates historical behavior of stock ranking, factor, and portfolio-selection strategies using reproducible backtests, benchmark comparison, turnover, drawdown, and bias warnings.
compatibility: Requires historical signal/ranking CSVs and price history CSVs; uses local scripts and does not require network access.
---

# Strategy Backtester

## Purpose

Use this skill to test whether a ranking, factor mix, or portfolio-selection rule had useful historical behavior before treating it as an investment signal.

## Scope

- Equity ranking and selection strategies.
- Periodic rebalance backtests from local CSV inputs.
- Benchmark comparison when benchmark data is available.
- Bias and robustness review.

## Non-goals

- Do not claim that historical performance predicts future returns.
- Do not optimize parameters until a preferred result appears.
- Do not issue absolute buy/sell instructions.
- Do not fetch live market data.

## Input contract

Required inputs:

- `SIGNAL_CSV`: rows with `date`, `ticker`, and `score`.
- `PRICE_CSV`: rows with `date`, `ticker`, and `close`.
- `REBALANCE_FREQUENCY`: `monthly`, `quarterly`, or `yearly`.
- `TOP_N`: number of selected names per rebalance.

Optional inputs:

- `BENCHMARK_CSV`: rows with `date` and `close` or `return`.
- `FEE_BPS`: round-trip fee assumption in basis points.
- `SLIPPAGE_BPS`: slippage assumption in basis points.
- `UNIVERSE_HISTORY`: point-in-time membership if available.

## Execution workflow

1. Validate input files and required columns.
2. Estimate whether the test window and symbol coverage are sufficient.
3. Run `scripts/backtest_strategy.py` with explicit rebalance, fee, slippage, and top-N assumptions.
4. Review performance metrics and benchmark comparison.
5. Identify bias risks and robustness gaps.
6. Return the required output sections.

## Required output format

1. `Backtest Setup`
   - Strategy name, test window, rebalance frequency, top-N, fees, slippage, benchmark.
2. `Performance Summary`
   - Total return, CAGR, volatility, max drawdown, Sharpe, Sortino, turnover, hit rate when available.
3. `Benchmark Comparison`
   - Relative return, relative drawdown, and tracking observations when benchmark data exists.
4. `Robustness and Bias Warnings`
   - Survivorship bias, lookahead bias, data-snooping risk, liquidity assumptions, fee/slippage sensitivity.
5. `Confidence and Data Gaps`
   - Confidence level and missing inputs that could change the conclusion.
6. `Handoff Bundle`
   - Include `strategy_name`, `test_window`, `rebalance_frequency`, `fee_assumption`, `slippage_assumption`, `benchmark`, `metrics`, `bias_warnings`, `confidence`, and `data_gaps`.

## Shared confidence rubric

- `High`: point-in-time signals, adequate price coverage, benchmark available, fees/slippage included, and test window covers multiple market regimes.
- `Medium`: usable history and price coverage, but one major robustness input is missing.
- `Low`: short history, missing benchmark, sparse price coverage, likely survivorship/lookahead risk, or no fee/slippage assumptions.

## Guardrails

- Separate observed backtest results from assumptions and inference.
- Always state that backtests are historical simulations, not forecasts.
- Downgrade confidence if the test appears overfit or data is not point-in-time.
- Treat backtest output as one input to `stock-picker-orchestrator`, not as a trading command.

## Trigger examples

- "Backtest this VN30 value-quality ranking."
- "Check whether this stock ranking strategy beat VNINDEX historically."
- "Validate this screening rule before using it for shortlist selection."
Changes in the expanded version: added explicit inputs with setup guidance; expanded the procedure into granular steps with input/output per step; extracted decision logic for missing data, benchmark absence, survivorship, and sensitivity analysis; detailed the output contract with specific formats and field definitions; and added an outcome signal for validation and failure modes.
Use this skill to test whether a ranking, factor mix, or portfolio-selection rule had useful historical behavior before treating it as an investment signal. Run it when you need to validate a strategy's past performance, measure it against a benchmark, and surface hidden biases (survivorship, lookahead, data-snooping) before committing capital or handing off to live execution.

This skill separates observed backtest results from forecasts. Historical performance does not predict future returns.
Required inputs:

- `SIGNAL_CSV`: CSV file with columns `date` (YYYY-MM-DD), `ticker` (string), and `score` (numeric). One row per signal per rebalance period. Must be sorted by date.
- `PRICE_CSV`: CSV file with columns `date` (YYYY-MM-DD), `ticker` (string), and `close` (numeric). Daily or periodic close prices. Covers the entire backtest window and all tickers in `SIGNAL_CSV`.
- `REBALANCE_FREQUENCY`: string, one of `monthly`, `quarterly`, or `yearly`. Defines portfolio reset frequency.
- `TOP_N`: integer, number of highest-scoring names to select per rebalance. Must be positive and less than total universe size.

Optional inputs:

- `BENCHMARK_CSV`: CSV file with columns `date` (YYYY-MM-DD) and either `close` (numeric) or `return` (numeric). Used for relative performance and tracking error. If absent, benchmark comparison is skipped.
- `FEE_BPS`: integer, round-trip transaction fee in basis points (default: 0). Applied at each rebalance.
- `SLIPPAGE_BPS`: integer, expected slippage in basis points (default: 0). Applied at entry and exit.
- `UNIVERSE_HISTORY`: CSV file with columns `date`, `ticker`, and `in_universe` (boolean). Point-in-time membership to flag survivorship bias. If absent, all tickers are assumed eligible for the entire window.
- `BENCHMARK_NAME`: string, human-readable name of the benchmark (e.g., "VNINDEX"). Optional but recommended for output clarity.

No network access required. All inputs are local files.
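The input contract above can be checked mechanically before any backtest runs. The following is a minimal sketch using only the standard library; the function names `read_rows` and `validate_inputs` are hypothetical and are not taken from `scripts/backtest_strategy.py`. It takes CSV text rather than file paths to stay self-contained.

```python
import csv
import io
from datetime import date

REQUIRED_SIGNAL_COLUMNS = {"date", "ticker", "score"}
REQUIRED_PRICE_COLUMNS = {"date", "ticker", "close"}


def read_rows(text, required_columns):
    """Parse CSV text and fail early if a required column is absent."""
    reader = csv.DictReader(io.StringIO(text))
    missing = required_columns - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def validate_inputs(signal_text, price_text, top_n):
    """Return a list of problems; an empty list means the inputs pass."""
    problems = []
    signals = read_rows(signal_text, REQUIRED_SIGNAL_COLUMNS)
    prices = read_rows(price_text, REQUIRED_PRICE_COLUMNS)

    # Dates must parse as YYYY-MM-DD and appear in ascending order.
    dates = [date.fromisoformat(row["date"]) for row in signals]
    if dates != sorted(dates):
        problems.append("SIGNAL_CSV is not sorted by date ascending")

    # Every ticker that receives a score needs price history.
    signal_tickers = {row["ticker"] for row in signals}
    uncovered = signal_tickers - {row["ticker"] for row in prices}
    if uncovered:
        problems.append(f"tickers missing from PRICE_CSV: {sorted(uncovered)}")

    # TOP_N must be positive and smaller than the universe size.
    universe_size = len(signal_tickers)
    if not 0 < top_n < universe_size:
        problems.append(f"TOP_N={top_n} must be between 1 and {universe_size - 1}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a report show every input defect at once; only a missing column raises immediately, because nothing else can be checked without it.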
1. Validate input files and columns.
   - Confirm `SIGNAL_CSV` has `date`, `ticker`, `score` columns and is sorted by date ascending.
   - Confirm `PRICE_CSV` has `date`, `ticker`, `close` columns. Flag if any ticker in `SIGNAL_CSV` is missing from `PRICE_CSV`.
2. Estimate test window and coverage adequacy.
3. Run backtest engine.
4. Compute performance metrics.
5. Identify bias risks and robustness gaps.
6. Compile output bundle.
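The metrics step can be illustrated with a short sketch that derives total return, CAGR, annualized volatility, max drawdown, and Sharpe from a list of per-period portfolio returns. This is an assumption about how such metrics are commonly computed, not the logic of `scripts/backtest_strategy.py`; the function name `performance_summary` and its parameters are hypothetical.

```python
import math
import statistics


def performance_summary(period_returns, periods_per_year, risk_free_rate=0.0):
    """Compute summary metrics from per-period simple returns (e.g. 0.02 for +2%)."""
    if len(period_returns) < 2:
        raise ValueError("need at least two period returns")

    # Compound the returns into an equity curve starting at 1.0.
    equity = [1.0]
    for r in period_returns:
        equity.append(equity[-1] * (1.0 + r))

    total_return = equity[-1] - 1.0
    years = len(period_returns) / periods_per_year
    cagr = equity[-1] ** (1.0 / years) - 1.0

    # Annualize the sample standard deviation of period returns.
    volatility = statistics.stdev(period_returns) * math.sqrt(periods_per_year)

    # Max drawdown: worst peak-to-trough decline of the equity curve.
    peak, max_drawdown = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_drawdown = min(max_drawdown, value / peak - 1.0)

    # Annualized Sharpe from the mean excess return per period.
    excess = statistics.mean(period_returns) - risk_free_rate / periods_per_year
    sharpe = excess * periods_per_year / volatility if volatility > 0 else math.nan

    return {
        "total_return": total_return,
        "cagr": cagr,
        "volatility": volatility,
        "max_drawdown": max_drawdown,
        "sharpe": sharpe,
    }
```

For monthly rebalancing, `periods_per_year` would be 12; for quarterly, 4. Fees and slippage are assumed to be deducted from `period_returns` before this function is called.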
- If `PRICE_CSV` is missing data for a ticker on a rebalance date:
- If `BENCHMARK_CSV` is not provided:
- If `TOP_N` cannot be achieved on a rebalance date (e.g., only 5 tickers have scores but `TOP_N=10`):
- If `UNIVERSE_HISTORY` shows survivorship:
- If `FEE_BPS` or `SLIPPAGE_BPS` sensitivity analysis shows >50% performance degradation:
- If test window is <2 years or rebalance count is <10:
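Two of the conditions above (a window under 2 years or fewer than 10 rebalances, and a rebalance date with fewer scored tickers than `TOP_N`) can be detected with a few lines of code. The sketch below only detects the conditions and reports them; what to do in response is not specified here and is left to the caller. The function names are hypothetical.

```python
from datetime import date

MIN_WINDOW_YEARS = 2   # threshold stated in the text
MIN_REBALANCES = 10    # threshold stated in the text


def window_flags(rebalance_dates):
    """Flag a test window under 2 years or with fewer than 10 rebalances."""
    ordered = sorted(date.fromisoformat(d) for d in rebalance_dates)
    span_years = (ordered[-1] - ordered[0]).days / 365.25
    flags = []
    if span_years < MIN_WINDOW_YEARS:
        flags.append(f"short window: {span_years:.1f} years")
    if len(ordered) < MIN_REBALANCES:
        flags.append(f"few rebalances: {len(ordered)}")
    return flags


def top_n_shortfalls(scores_by_date, top_n):
    """Map each rebalance date with fewer than top_n scored tickers to its count."""
    return {
        day: len(scores)
        for day, scores in scores_by_date.items()
        if len(scores) < top_n
    }
```

Here `rebalance_dates` is assumed to be a list of YYYY-MM-DD strings and `scores_by_date` a mapping from date string to a `{ticker: score}` dictionary.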
Output must include six sections in this order:

1. Backtest setup
2. Performance summary
3. Benchmark comparison (only if `BENCHMARK_CSV` provided)
4. Robustness and bias warnings
5. Confidence and data gaps
6. Handoff bundle
   - `strategy_name`, `test_window`, `rebalance_frequency`, `top_n`, `fee_assumption`, `slippage_assumption`, `benchmark`, `metrics` (all calculated metrics), `bias_warnings` (list), `confidence_level`, `data_gaps` (list)

All numbers are rounded to 2 decimal places except percentages (1 decimal place). All dates are in YYYY-MM-DD format.
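The handoff bundle and its rounding rules could be assembled as in the sketch below. Which metrics count as percentages is an assumption made for illustration (returns, volatility, drawdown, turnover, and hit rate as percentages; Sharpe and Sortino as plain ratios), as are the function names and the string formats chosen for `test_window`, `fee_assumption`, and `slippage_assumption`.

```python
from datetime import date

# Assumption: these metrics are reported as percentages; all others are plain numbers.
PERCENT_METRICS = {"total_return", "cagr", "volatility", "max_drawdown", "turnover", "hit_rate"}


def format_metrics(raw):
    """Round percentages to 1 decimal place and other numbers to 2."""
    formatted = {}
    for name, value in raw.items():
        if name in PERCENT_METRICS:
            formatted[name] = f"{value * 100:.1f}%"
        else:
            formatted[name] = round(value, 2)
    return formatted


def build_handoff(strategy_name, start, end, rebalance_frequency, top_n,
                  fee_bps, slippage_bps, benchmark, raw_metrics,
                  bias_warnings, confidence_level, data_gaps):
    """Assemble the handoff bundle with the fields listed above."""
    return {
        "strategy_name": strategy_name,
        # date.isoformat() yields YYYY-MM-DD.
        "test_window": f"{start.isoformat()} to {end.isoformat()}",
        "rebalance_frequency": rebalance_frequency,
        "top_n": top_n,
        "fee_assumption": f"{fee_bps} bps",
        "slippage_assumption": f"{slippage_bps} bps",
        "benchmark": benchmark,  # None when BENCHMARK_CSV was not provided
        "metrics": format_metrics(raw_metrics),
        "bias_warnings": list(bias_warnings),
        "confidence_level": confidence_level,
        "data_gaps": list(data_gaps),
    }
```

Keeping the raw metrics as fractions until this final step avoids rounding twice; the bundle is a plain dictionary, so it can be serialized with `json.dumps` if the downstream consumer expects JSON.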
The skill worked if you can answer these questions with confidence:

The skill failed if:

Success is a complete, signed-off backtest report with all six output sections, clear caveats about historical vs. forward performance, and a go/no-go recommendation with explicit assumptions.