Strategy Backtester

Item: Strategy Backtester
Rating: 7.3
Author: Implexa

Validates historical behavior of stock ranking, factor, and portfolio-selection strategies using reproducible backtests, benchmark comparison, turnover, draw...

SkillRank score: 7.3/10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

strategy-backtester validates equity ranking and portfolio selection strategies against historical data, producing reproducible backtests with benchmark comparison, bias detection, and confidence assessment without claiming predictive power.

structure: 9.0
trigger phrases: 8.0
procedure: 7.0
edge cases: 7.0
documentation: 7.0

original SKILL.md from clawhub

---
name: strategy-backtester
description: Validates historical behavior of stock ranking, factor, and portfolio-selection strategies using reproducible backtests, benchmark comparison, turnover, drawdown, and bias warnings.
compatibility: Requires historical signal/ranking CSVs and price history CSVs; uses local scripts and does not require network access.
---

# Strategy Backtester

## Purpose

Use this skill to test whether a ranking, factor mix, or portfolio-selection rule had useful historical behavior before treating it as an investment signal.

## Scope

- Equity ranking and selection strategies.
- Periodic rebalance backtests from local CSV inputs.
- Benchmark comparison when benchmark data is available.
- Bias and robustness review.

## Non-goals

- Do not claim that historical performance predicts future returns.
- Do not optimize parameters until a preferred result appears.
- Do not issue absolute buy/sell instructions.
- Do not fetch live market data.

## Input contract

Required inputs:
- `SIGNAL_CSV`: rows with `date`, `ticker`, and `score`.
- `PRICE_CSV`: rows with `date`, `ticker`, and `close`.
- `REBALANCE_FREQUENCY`: `monthly`, `quarterly`, or `yearly`.
- `TOP_N`: number of selected names per rebalance.

Optional inputs:
- `BENCHMARK_CSV`: rows with `date` and `close` or `return`.
- `FEE_BPS`: round-trip fee assumption in basis points.
- `SLIPPAGE_BPS`: slippage assumption in basis points.
- `UNIVERSE_HISTORY`: point-in-time membership if available.

## Execution workflow

1. Validate input files and required columns.
2. Estimate whether the test window and symbol coverage are sufficient.
3. Run `scripts/backtest_strategy.py` with explicit rebalance, fee, slippage, and top-N assumptions.
4. Review performance metrics and benchmark comparison.
5. Identify bias risks and robustness gaps.
6. Return the required output sections.

## Required output format

1. `Backtest Setup`
- Strategy name, test window, rebalance frequency, top-N, fees, slippage, benchmark.

2. `Performance Summary`
- Total return, CAGR, volatility, max drawdown, Sharpe, Sortino, turnover, hit rate when available.

3. `Benchmark Comparison`
- Relative return, relative drawdown, and tracking observations when benchmark data exists.

4. `Robustness and Bias Warnings`
- Survivorship bias, lookahead bias, data-snooping risk, liquidity assumptions, fee/slippage sensitivity.

5. `Confidence and Data Gaps`
- Confidence level and missing inputs that could change the conclusion.

6. `Handoff Bundle`
- Include `strategy_name`, `test_window`, `rebalance_frequency`, `fee_assumption`, `slippage_assumption`, `benchmark`, `metrics`, `bias_warnings`, `confidence`, and `data_gaps`.

## Shared confidence rubric

- `High`: point-in-time signals, adequate price coverage, benchmark available, fees/slippage included, and test window covers multiple market regimes.
- `Medium`: usable history and price coverage, but one major robustness input is missing.
- `Low`: short history, missing benchmark, sparse price coverage, likely survivorship/lookahead risk, or no fee/slippage assumptions.

## Guardrails

- Separate observed backtest results from assumptions and inference.
- Always state that backtests are historical simulations, not forecasts.
- Downgrade confidence if the test appears overfit or data is not point-in-time.
- Treat backtest output as one input to `stock-picker-orchestrator`, not as a trading command.

## Trigger examples

- "Backtest this VN30 value-quality ranking."
- "Check whether this stock ranking strategy beat VNINDEX historically."
- "Validate this screening rule before using it for shortlist selection."

related skills

semantically similar in the cross-vendor index

skills.sh

74% match

backtest-expert

backtest-expert — an installable skill for AI agents, published by tradermonty/claude-trading-skills.

by @tradermonty


added explicit inputs with setup guidance, expanded procedure into granular steps with input/output per step, extracted decision logic for missing data, benchmark absence, survivorship, and sensitivity analysis, detailed output contract with specific formats and field definitions, and outcome signal for validation and failure modes.

Strategy Backtester

intent

Use this skill to test whether a ranking, factor mix, or portfolio-selection rule had useful historical behavior before treating it as an investment signal. Run it when you need to validate a strategy's past performance, measure it against a benchmark, and surface hidden biases (survivorship, lookahead, data-snooping) before committing capital or handing off to live execution. this skill separates observed backtest results from forecasts. historical performance does not predict future returns.

inputs

required inputs:

SIGNAL_CSV: CSV file with columns date (YYYY-MM-DD), ticker (string), and score (numeric). one row per signal per rebalance period. must be sorted by date.
PRICE_CSV: CSV file with columns date (YYYY-MM-DD), ticker (string), and close (numeric). daily or periodic close prices. covers the entire backtest window and all tickers in SIGNAL_CSV.
REBALANCE_FREQUENCY: string, one of monthly, quarterly, or yearly. defines portfolio reset frequency.
TOP_N: integer, number of highest-scoring names to select per rebalance. must be positive and less than total universe size.

optional inputs:

BENCHMARK_CSV: CSV file with columns date (YYYY-MM-DD) and either close (numeric) or return (numeric). used for relative performance and tracking error. if absent, benchmark comparison is skipped.
FEE_BPS: integer, round-trip transaction fee in basis points (default: 0). applied at each rebalance.
SLIPPAGE_BPS: integer, expected slippage in basis points (default: 0). applied at entry and exit.
UNIVERSE_HISTORY: CSV file with columns date, ticker, and in_universe (boolean). point-in-time membership to flag survivorship bias. if absent, all tickers are assumed eligible for entire window.
BENCHMARK_NAME: string, human-readable name of benchmark (e.g., "VNINDEX"). optional but recommended for output clarity.

no network access required. all inputs are local files.

procedure

validate input files and columns
- input: SIGNAL_CSV, PRICE_CSV, REBALANCE_FREQUENCY, TOP_N, optional BENCHMARK_CSV and UNIVERSE_HISTORY.
- check that SIGNAL_CSV has date, ticker, score columns and is sorted by date ascending.
- check that PRICE_CSV has date, ticker, close columns. flag if any ticker in SIGNAL_CSV is missing from PRICE_CSV.
- check that REBALANCE_FREQUENCY is one of monthly, quarterly, or yearly.
- check that TOP_N is a positive integer and does not exceed the minimum universe size on any rebalance date.
- if BENCHMARK_CSV provided, check columns and date range.
- if UNIVERSE_HISTORY provided, check columns and flag any tickers in SIGNAL_CSV that are never in universe.
- output: validation report. proceed only if all required columns present and TOP_N is achievable.
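The validation checks above could be sketched roughly as follows. This is an illustrative outline only, not the contents of `scripts/backtest_strategy.py`; the function names `validate_inputs` and `read_csv_text` are hypothetical, and rows are assumed to be plain dicts as produced by `csv.DictReader`.

```python
import csv
import io
from datetime import date

REQUIRED_SIGNAL = {"date", "ticker", "score"}
REQUIRED_PRICE = {"date", "ticker", "close"}
VALID_FREQUENCIES = {"monthly", "quarterly", "yearly"}


def read_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def validate_inputs(signal_rows, price_rows, rebalance_frequency, top_n):
    """Return a list of problems; an empty list means the backtest may proceed."""
    problems = []
    for name, rows, required in (
        ("SIGNAL_CSV", signal_rows, REQUIRED_SIGNAL),
        ("PRICE_CSV", price_rows, REQUIRED_PRICE),
    ):
        if not rows:
            problems.append(f"{name} is empty")
            continue
        missing = required - set(rows[0].keys())
        if missing:
            problems.append(f"{name} missing columns: {sorted(missing)}")
    if problems:
        return problems  # later checks need the required columns

    dates = [date.fromisoformat(r["date"]) for r in signal_rows]
    if dates != sorted(dates):
        problems.append("SIGNAL_CSV is not sorted by date ascending")

    unpriced = {r["ticker"] for r in signal_rows} - {r["ticker"] for r in price_rows}
    if unpriced:
        problems.append(f"tickers without prices: {sorted(unpriced)}")

    if rebalance_frequency not in VALID_FREQUENCIES:
        problems.append(f"invalid REBALANCE_FREQUENCY: {rebalance_frequency!r}")

    # Smallest number of scored tickers on any single signal date.
    per_date = {}
    for r in signal_rows:
        per_date.setdefault(r["date"], set()).add(r["ticker"])
    min_universe = min(len(tickers) for tickers in per_date.values())
    if not isinstance(top_n, int) or top_n <= 0 or top_n > min_universe:
        problems.append(f"TOP_N={top_n} not achievable (min universe {min_universe})")
    return problems
```

Returning a list of problems rather than raising on the first one keeps the validation report complete in a single pass.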
estimate test window and coverage adequacy
- input: validated SIGNAL_CSV and PRICE_CSV, UNIVERSE_HISTORY (optional).
- calculate date range (min date to max date in SIGNAL_CSV).
- count distinct tickers in SIGNAL_CSV and PRICE_CSV.
- calculate rebalance dates based on REBALANCE_FREQUENCY.
- flag if test window is shorter than 2 years (high risk of curve-fitting).
- flag if fewer than 10 rebalance events (insufficient sample).
- flag if price coverage is below 90% (sparse data reduces backtest reliability).
- flag if UNIVERSE_HISTORY shows survivorship (tickers disappear or appear mid-window).
- output: coverage summary including window length, rebalance count, missing price data percentage, and survivorship risk level.
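A minimal sketch of the coverage estimate, under two assumptions not stated in the source: the rebalance date is taken to be the last signal date inside each period, and price coverage is measured as the share of required (date, ticker) pairs that have a price. All function names are hypothetical.

```python
from datetime import date


def period_key(d, frequency):
    """Bucket a date into its rebalance period."""
    if frequency == "monthly":
        return (d.year, d.month)
    if frequency == "quarterly":
        return (d.year, (d.month - 1) // 3 + 1)
    if frequency == "yearly":
        return (d.year,)
    raise ValueError(f"unknown frequency: {frequency}")


def rebalance_dates(signal_dates, frequency):
    """Assumption: rebalance on the last signal date seen in each period."""
    last_in_period = {}
    for d in sorted(set(signal_dates)):
        last_in_period[period_key(d, frequency)] = d
    return sorted(last_in_period.values())


def coverage_summary(signal_dates, needed_pairs, priced_pairs, frequency):
    """needed_pairs / priced_pairs: sets of (date, ticker) required vs. actually priced."""
    rebalances = rebalance_dates(signal_dates, frequency)
    window_years = (max(signal_dates) - min(signal_dates)).days / 365.25
    coverage = len(needed_pairs & priced_pairs) / len(needed_pairs) if needed_pairs else 0.0
    flags = []
    if window_years < 2:
        flags.append("window shorter than 2 years")
    if len(rebalances) < 10:
        flags.append("fewer than 10 rebalance events")
    if coverage < 0.90:
        flags.append("price coverage below 90%")
    return {
        "window_years": round(window_years, 2),
        "rebalance_count": len(rebalances),
        "price_coverage": round(coverage, 3),
        "flags": flags,
    }
```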
run backtest engine
- input: validated SIGNAL_CSV, PRICE_CSV, REBALANCE_FREQUENCY, TOP_N, FEE_BPS, SLIPPAGE_BPS, BENCHMARK_CSV (optional).
- for each rebalance date: select TOP_N tickers with highest score on that date.
- for each selected ticker: calculate entry price (close on the rebalance date, or the next available close, since PRICE_CSV carries only close prices) adjusted upward by SLIPPAGE_BPS, and apply FEE_BPS as a round-trip cost.
- hold selected portfolio until next rebalance date. calculate daily returns.
- at rebalance: exit holdings, apply fees and slippage, enter new holdings.
- accumulate daily portfolio returns and calculate equity curve.
- output: daily returns series, equity curve, rebalance log with entry/exit prices and fees.
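The rebalance loop can be illustrated with a simplified, period-level sketch. It assumes equal weighting (the source does not specify a weighting scheme), a full exit and re-entry at every rebalance, and costs charged as a fraction of notional: the round-trip fee plus slippage on entry and on exit. It computes one return per holding period rather than a daily series, and the names `select_top_n`, `run_backtest`, and `equity_curve` are hypothetical.

```python
def select_top_n(scores, top_n):
    """Highest-scoring tickers; ties broken alphabetically for reproducibility."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ticker for ticker, _ in ranked[:top_n]]


def run_backtest(signals, prices, top_n, fee_bps=0.0, slippage_bps=0.0):
    """signals: {rebalance_date: {ticker: score}}; prices: {date: {ticker: close}}.

    Dates are ISO strings, so lexical order equals chronological order.
    Returns one (start, end, net_return, holdings) tuple per holding period.
    """
    dates = sorted(signals)
    # Round-trip fee plus slippage at entry and at exit, as a fraction.
    cost = (fee_bps + 2 * slippage_bps) / 10_000
    log = []
    for start, end in zip(dates, dates[1:]):
        held = select_top_n(signals[start], top_n)
        # Equal-weight average of each holding's close-to-close return.
        gross = sum(prices[end][t] / prices[start][t] - 1 for t in held) / len(held)
        log.append((start, end, gross - cost, held))
    return log


def equity_curve(log, start_value=1.0):
    """Compound the net period returns into an equity curve."""
    curve = [start_value]
    for _, _, net, _ in log:
        curve.append(curve[-1] * (1 + net))
    return curve
```

Because every rebalance is treated as a full exit and re-entry, costs are charged even when holdings do not change. A turnover-aware cost model would be a reasonable refinement.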
compute performance metrics
- input: daily returns series, equity curve, BENCHMARK_CSV (optional).
- calculate total return, CAGR (compound annual growth rate), annualized volatility (standard deviation of daily returns × √252).
- calculate max drawdown (peak-to-trough decline).
- calculate Sharpe ratio (excess return / volatility, assume 0% risk-free rate or use treasury rates if available).
- calculate Sortino ratio (excess return / downside volatility, only negative returns count).
- calculate portfolio turnover as percentage of portfolio value rebalanced per period.
- if signals are available for all holdings on all dates, calculate hit rate (percentage of holdings whose exit price was above their entry price).
- if BENCHMARK_CSV provided: calculate relative return (strategy return - benchmark return), relative max drawdown, tracking error, information ratio (annualized active return / tracking error), and correlation.
- output: metrics dict with all values labeled and timestamped.
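As a rough illustration of the core metrics, the sketch below works from a daily returns series. It assumes 252 trading days per year, a constant daily risk-free rate (0 by default), and a downside deviation computed as the root mean square of negative excess returns. These conventions are one common choice and are not confirmed by the source; `performance_metrics` is a hypothetical name.

```python
import math
import statistics

TRADING_DAYS = 252


def performance_metrics(daily_returns, risk_free_daily=0.0):
    """Compute headline metrics from a list of daily simple returns (>= 2 values)."""
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_dd = min(max_dd, equity / peak - 1)  # most negative peak-to-trough move

    years = len(daily_returns) / TRADING_DAYS
    excess = [r - risk_free_daily for r in daily_returns]
    scale = math.sqrt(TRADING_DAYS)
    vol = statistics.stdev(daily_returns) * scale
    # Only negative excess returns contribute to downside deviation.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess)) * scale
    annual_excess = statistics.mean(excess) * TRADING_DAYS
    return {
        "total_return": equity - 1,
        "cagr": equity ** (1 / years) - 1,
        "volatility": vol,
        "max_drawdown": max_dd,
        "sharpe": annual_excess / vol if vol else float("nan"),
        "sortino": annual_excess / downside if downside else float("nan"),
    }
```

Ratios are reported as NaN rather than infinity when the denominator is zero, so a series with no losing days does not produce a misleadingly large Sortino ratio.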
identify bias risks and robustness gaps
- input: SIGNAL_CSV, PRICE_CSV, UNIVERSE_HISTORY, coverage summary from the coverage step, metrics from the metrics step.
- survivorship bias: flag if UNIVERSE_HISTORY shows tickers removed mid-window, or if PRICE_CSV coverage drops after certain date, or if tickers in SIGNAL_CSV cease trading.
- lookahead bias: flag if a signal's date is later than the rebalance execution date that uses it (e.g., signal generated on date X but portfolio rebalanced on X-1).
- data-snooping bias: flag if TOP_N was tuned to optimize backtest results, or if REBALANCE_FREQUENCY was chosen after observing backtest performance.
- liquidity risk: flag if selected portfolio concentration (% of portfolio in single ticker) or position sizes suggest illiquidity. flag if signal scores show extreme values (>3 std dev from mean).
- fee and slippage sensitivity: re-run backtest with doubled FEE_BPS and SLIPPAGE_BPS. flag if CAGR or Sharpe degrades by >50%.
- regime dependency: flag if backtest does not span both bull and bear markets. where it does, estimate performance in each regime separately (optional).
- output: bias risk report with severity (high/medium/low) for each category.
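Two of these checks are mechanical enough to sketch in code. The lookahead check follows the X versus X-1 example: a signal dated after the rebalance that uses it could not have been known at trade time. The cost-sensitivity check re-runs the backtest with doubled costs and applies the 50% degradation threshold. `run` is a hypothetical callable standing in for the backtest engine; it is assumed to return a dict with `cagr` and `sharpe`.

```python
def lookahead_violations(rebalance_log):
    """rebalance_log: iterable of (signal_date, execution_date) ISO strings.

    Returns the pairs where the signal is dated after its execution.
    """
    return [(s, e) for s, e in rebalance_log if s > e]


def cost_sensitivity(run, fee_bps, slippage_bps, threshold=0.5):
    """Re-run with doubled costs; 'fragile' if CAGR or Sharpe drops by > threshold."""
    base = run(fee_bps, slippage_bps)
    stressed = run(2 * fee_bps, 2 * slippage_bps)
    fragile = False
    for key in ("cagr", "sharpe"):
        # Relative degradation is only meaningful for a positive baseline.
        if base[key] > 0 and (base[key] - stressed[key]) / base[key] > threshold:
            fragile = True
    return {"base": base, "stressed": stressed, "fragile": fragile}
```

Note that doubling a zero cost assumption changes nothing, so this check is only informative when FEE_BPS or SLIPPAGE_BPS is non-zero.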
compile output bundle
- input: validation report, coverage summary, equity curve, metrics, bias risks, confidence level.
- structure output as six sections (see output contract below).
- include raw data (daily returns, rebalance log) as appendix or separate CSV.
- output: complete backtest report (text + data files).

decision points

if PRICE_CSV is missing data for a ticker on a rebalance date:

use last available close price (forward-fill).
flag as data gap in confidence section. lower confidence to medium or low if >10% of rebalances are affected.
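A small sketch of the forward-fill rule, assuming each ticker's price history is held as a date-sorted list of `(iso_date, close)` tuples. The helper name and return shape are hypothetical. The second element of the result records whether a fill happened, so the caller can count affected rebalances against the 10% threshold.

```python
import bisect


def last_close_on_or_before(history, target):
    """history: list of (iso_date, close) sorted by date.

    Returns (close, was_filled), or None if no price exists on or before target.
    """
    dates = [d for d, _ in history]
    i = bisect.bisect_right(dates, target)
    if i == 0:
        return None  # nothing earlier to forward-fill from
    found_date, close = history[i - 1]
    return close, found_date != target
```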

if BENCHMARK_CSV is not provided:

skip benchmark comparison section. do not invent a benchmark.
note in output that relative performance cannot be assessed.

if TOP_N cannot be achieved on a rebalance date (e.g., only 5 tickers have scores but TOP_N=10):

use all available tickers up to TOP_N. flag in rebalance log.
if this occurs on >50% of rebalance dates, mark confidence as low.

if UNIVERSE_HISTORY shows survivorship:

exclude tickers not yet in universe at signal date. backtest only eligible tickers.
clearly mark this as "point-in-time universe constraint" in output.
confidence remains high if survivorship is properly handled; drops to medium if survivorship is ignored.
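The point-in-time constraint might be applied as in the sketch below. It assumes UNIVERSE_HISTORY rows are dicts with string values, that `in_universe` is encoded as text such as `true` or `false`, and that a ticker's status is its most recent row on or before the signal date. The source does not specify this encoding, and the function names are hypothetical.

```python
TRUTHY = {"true", "1", "yes"}


def members_as_of(universe_rows, on_date):
    """Tickers whose latest membership row on or before on_date says in_universe."""
    latest = {}
    for row in sorted(universe_rows, key=lambda r: r["date"]):
        if row["date"] <= on_date:  # ISO dates compare correctly as strings
            latest[row["ticker"]] = str(row["in_universe"]).strip().lower() in TRUTHY
    return {ticker for ticker, inside in latest.items() if inside}


def eligible_scores(scores, universe_rows, on_date):
    """Drop scores for tickers that were not universe members on the signal date."""
    members = members_as_of(universe_rows, on_date)
    return {t: s for t, s in scores.items() if t in members}
```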

if FEE_BPS or SLIPPAGE_BPS sensitivity analysis shows >50% performance degradation:

flag result as "fragile" and recommend sensitivity table in output.
suggest that strategy may not be robust to realistic costs.

if test window is <2 years or rebalance count is <10:

set confidence to low automatically.
note in output that results are preliminary and insufficient for deployment.
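The confidence-related decision points can be combined into one function, sketched here. How the rules interact is an assumption: the hard "low" triggers are checked first, and the softer conditions only downgrade a "high" to "medium". The fee and slippage fragility flag and the missing-benchmark case are left to the report text. `assign_confidence` is a hypothetical name.

```python
def assign_confidence(window_years, rebalance_count, gap_share=0.0,
                      shortfall_share=0.0, survivorship_handled=True):
    """Return (level, reasons) from the decision points.

    gap_share: fraction of rebalances that needed a forward-filled price.
    shortfall_share: fraction of rebalance dates where TOP_N was not achievable.
    """
    level, reasons = "high", []
    if window_years < 2 or rebalance_count < 10:
        level = "low"
        reasons.append("test window under 2 years or fewer than 10 rebalances")
    if shortfall_share > 0.5:
        level = "low"
        reasons.append("TOP_N not achievable on more than 50% of rebalance dates")
    if gap_share > 0.10:
        reasons.append("more than 10% of rebalances used forward-filled prices")
        if level == "high":
            level = "medium"
    if not survivorship_handled:
        reasons.append("survivorship not handled with a point-in-time universe")
        if level == "high":
            level = "medium"
    return level, reasons
```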

output contract

output must include six sections in this order:

1. backtest setup

strategy name (or identifier)
test window (start date to end date)
rebalance frequency and next rebalance date (if applicable)
top-N selection count
fee assumption (basis points, round-trip)
slippage assumption (basis points)
benchmark name and data range (if available)
universe constraint applied (yes/no, and note if point-in-time)

2. performance summary

total return (%) and CAGR (%)
annualized volatility (%)
max drawdown (%)
Sharpe ratio and Sortino ratio
average turnover (%) per rebalance
hit rate (%) if available
number of rebalance events and total holdings across all rebalances

3. benchmark comparison (only if BENCHMARK_CSV provided)

benchmark total return (%) and CAGR (%)
strategy vs. benchmark relative return (basis points)
strategy vs. benchmark max drawdown comparison
tracking error and information ratio (active return / tracking error)
correlation to benchmark
outperformance in bull vs. bear regimes (optional breakdown)
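The benchmark figures can be derived from two aligned return series, as in this sketch. It assumes daily simple returns of equal length, defines tracking error as the annualized standard deviation of active returns, and defines the information ratio as annualized mean active return divided by tracking error. The function name and output keys are hypothetical.

```python
import math
import statistics


def benchmark_comparison(strategy, benchmark, periods_per_year=252):
    """strategy / benchmark: aligned lists of simple returns (>= 2 values each)."""
    if len(strategy) != len(benchmark):
        raise ValueError("return series must be aligned to the same dates")
    active = [s - b for s, b in zip(strategy, benchmark)]
    tracking_error = statistics.stdev(active) * math.sqrt(periods_per_year)
    annual_active = statistics.mean(active) * periods_per_year

    # Pearson correlation computed by hand to avoid depending on Python 3.10+.
    ms, mb = statistics.mean(strategy), statistics.mean(benchmark)
    cov = sum((s - ms) * (b - mb) for s, b in zip(strategy, benchmark))
    var_s = sum((s - ms) ** 2 for s in strategy)
    var_b = sum((b - mb) ** 2 for b in benchmark)
    denom = math.sqrt(var_s * var_b)

    total_s = math.prod(1 + s for s in strategy) - 1
    total_b = math.prod(1 + b for b in benchmark) - 1
    return {
        "relative_return_bps": (total_s - total_b) * 10_000,
        "tracking_error": tracking_error,
        "information_ratio": annual_active / tracking_error if tracking_error else float("nan"),
        "correlation": cov / denom if denom else float("nan"),
    }
```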

4. robustness and bias warnings

survivorship bias: presence, severity, and impact assessment
lookahead bias: flagged if signal dates fall after the execution dates that use them
data-snooping risk: flagged if parameters tuned post-hoc
liquidity assumptions: concentration limits and price impact notes
fee and slippage sensitivity: CAGR and Sharpe under doubled costs
regime dependency: performance notes if data spans multiple market regimes

5. confidence and data gaps

confidence level (high, medium, or low) with justification
list of missing inputs that could change conclusion (e.g., point-in-time universe, live slippage data, sector constraints)
data coverage quality (% of tickers, % of dates, data freshness)
caveats: "this is a historical simulation and does not predict future returns"

6. handoff bundle

JSON or dict containing: strategy_name, test_window, rebalance_frequency, top_n, fee_assumption, slippage_assumption, benchmark, metrics (all calculated metrics), bias_warnings (list), confidence_level, data_gaps (list)
optional: CSV export of daily returns, rebalance log, equity curve
optional: sensitivity table (CAGR and Sharpe under fee/slippage ranges)

all numbers rounded to 2 decimal places except percentages (1 decimal place). all dates in YYYY-MM-DD format.
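One way to assemble the handoff bundle with the rounding rule applied is sketched below. It assumes metrics arrive as fractions (0.1234 for 12.34%), that the listed keys are the percentage-style ones, and that unavailable metrics are passed as `None`. The key set and function names are illustrative and not part of the skill's contract.

```python
import json

# Assumption: these metrics are stored as fractions and reported as percentages.
PERCENT_KEYS = {"total_return", "cagr", "volatility", "max_drawdown", "turnover", "hit_rate"}
SETUP_KEYS = ("strategy_name", "test_window", "rebalance_frequency", "top_n",
              "fee_assumption", "slippage_assumption", "benchmark")


def format_metrics(metrics):
    """Percentages to 1 decimal place, other numbers to 2; None passes through."""
    out = {}
    for key, value in metrics.items():
        if value is None:
            out[key] = None
        elif key in PERCENT_KEYS:
            out[key] = round(value * 100, 1)
        else:
            out[key] = round(value, 2)
    return out


def build_handoff(setup, metrics, bias_warnings, confidence_level, data_gaps):
    """Combine setup fields and results into a JSON-serializable dict."""
    missing = [k for k in SETUP_KEYS if k not in setup]
    if missing:
        raise KeyError(f"setup is missing: {missing}")
    bundle = {k: setup[k] for k in SETUP_KEYS}
    bundle["metrics"] = format_metrics(metrics)
    bundle["bias_warnings"] = list(bias_warnings)
    bundle["confidence_level"] = confidence_level
    bundle["data_gaps"] = list(data_gaps)
    return bundle
```

The result can be serialized with `json.dumps(bundle, indent=2)` for the downstream consumer.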

outcome signal

the skill worked if you can answer these questions with confidence:

did the strategy outperform or underperform the benchmark in the historical period? by how much?
what were the maximum drawdown and volatility? are they acceptable?
what are the key risks to this strategy (survivorship, lookahead, liquidity, curve-fit)?
should this strategy advance to live testing, or does it need refinement?
what are the cost and regime sensitivities?

the skill failed if:

any required input column is missing or malformed.
TOP_N cannot be achieved on >50% of rebalance dates and is not flagged.
backtest results are returned without bias warnings or confidence caveats.
benchmark comparison is missing when BENCHMARK_CSV is provided.
confidence level is high but test window is <2 years or rebalance count is <10.

success is a complete, signed-off backtest report with all six output sections, clear caveats about historical vs. forward performance, and a go/no-go recommendation with explicit assumptions.