How do you walk-forward test a calibration or threshold rule?

Question

Accepted Answer

Walk-forward testing a calibration or threshold rule means refitting it only on the training slice, freezing it for the next forward window, and checking whether reliability, trade count, and net performance remain stable as time moves on.

Short answer: A threshold or calibration rule should be treated like any other learned component. Fit it on the training window, lock it, and then judge it on the next forward slice without letting later information leak backward into the decision boundary.

What the walk-forward loop must include: If the model score is retrained but the threshold is chosen once on the whole sample, the validation is still compromised. The thresholding layer, the calibration map, and any state-dependent entry or exit logic should all follow the same train-then-freeze discipline.

What to measure beyond Sharpe: Watch bucket reliability, trade count stability, turnover near the boundary, and sensitivity to nearby threshold values. A threshold can preserve headline performance while quietly becoming less trustworthy as the market changes.

Why this matters on Alphora: Alphora's workflow already separates local research from verified and forward evidence. Walk-forward threshold testing fits naturally into that approach because it stops a pretty full-sample cutoff from masquerading as a robust trading policy before the strategy reaches paper or shared portfolio deployment.
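
To make the train-then-freeze loop concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Alphora's implementation: the function names (`net_pnl`, `fit_threshold`, `walk_forward`), the rolling fixed-length windows, the per-trade `cost` parameter, and the choice of net PnL as the fitting objective are all hypothetical. Each fold picks the cutoff on the training slice only, applies it unchanged to the next forward slice, and records trade count, net result, and how the forward slice would have looked at the other candidate cutoffs. That last figure is a sensitivity diagnostic only and is never fed back into the choice.

```python
def net_pnl(scores, returns, threshold, cost=0.0):
    """Return (trade_count, net PnL) over bars whose score clears the threshold."""
    traded = [r - cost for s, r in zip(scores, returns) if s >= threshold]
    return len(traded), sum(traded)


def fit_threshold(scores, returns, candidates, cost=0.0):
    """Choose a cutoff from the training slice only.

    Assumed objective: highest net PnL among candidates that trade at least once.
    Falls back to the first candidate if none of them trades.
    """
    best, best_net = candidates[0], float("-inf")
    for c in candidates:
        count, net = net_pnl(scores, returns, c, cost)
        if count > 0 and net > best_net:
            best, best_net = c, net
    return best


def walk_forward(scores, returns, train_len, test_len, candidates, cost=0.0):
    """Roll a fixed-length training window forward one test window at a time."""
    folds = []
    start = 0
    while start + train_len + test_len <= len(scores):
        split = start + train_len
        end = split + test_len

        # Fit on the training slice, then freeze the cutoff.
        threshold = fit_threshold(
            scores[start:split], returns[start:split], candidates, cost
        )

        # Judge the frozen cutoff on the forward slice only.
        trades, net = net_pnl(
            scores[split:end], returns[split:end], threshold, cost
        )

        # Diagnostic: forward net PnL at every candidate cutoff, to see how
        # sensitive the result is to nearby values. Not used for selection.
        nearby = {
            c: net_pnl(scores[split:end], returns[split:end], c, cost)[1]
            for c in candidates
        }

        folds.append({
            "train": (start, split),
            "test": (split, end),
            "threshold": threshold,
            "trades": trades,
            "net": net,
            "nearby": nearby,
        })
        start += test_len
    return folds
```

Calling `walk_forward(scores, returns, train_len, test_len, candidates)` with two equal-length lists returns one dictionary per fold. Comparing `threshold`, `trades`, and `nearby` across folds shows whether the cutoff drifts, whether trade count collapses, and whether the forward result depends on one exact value. A calibration map would slot into the same loop: fit it inside the training slice before `fit_threshold`, then apply it unchanged to the forward scores.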