How do you walk-forward test a calibration or threshold rule?

Walk-forward testing a calibration or threshold rule means refitting it on each training slice only, freezing it for the next forward window, and checking whether reliability, trade count, and net performance stay stable as the windows roll forward.

Short answer

A threshold or calibration rule should be treated like any other learned component. Fit it on the training window, lock it, and then judge it on the next forward slice without letting later information leak backward into the decision boundary.

What the walk-forward loop must include

If the model is retrained on each slice but the threshold is chosen once on the whole sample, the validation is still compromised, because the cutoff has already seen every forward window. The thresholding layer, the calibration map, and any state-dependent entry or exit logic should all follow the same train-then-freeze discipline.

  • Refit the model and calibrator on each training slice.
  • Choose the threshold using only information available in that slice.
  • Evaluate the next window on both forecast reliability and actual trade behavior.
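
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the raw scores are taken as already produced by a model refit on the same slice and lying in [0, 1], the calibrator is a simple histogram-binning map, and the threshold is picked by total training-slice return. All function names (walk_forward_windows, fit_bin_calibrator, choose_threshold, walk_forward) are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def walk_forward_windows(n: int, train_size: int, test_size: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, forward) index ranges; each forward window starts where its training slice ends."""
    start = 0
    while start + train_size + test_size <= n:
        split = start + train_size
        yield range(start, split), range(split, split + test_size)
        start += test_size  # roll forward by one test window


def fit_bin_calibrator(scores: Sequence[float], outcomes: Sequence[int],
                       n_bins: int = 5) -> Callable[[float], float]:
    """Assumed calibrator: map a raw score in [0, 1] to the win rate of its bin in this slice."""
    hits, counts = [0] * n_bins, [0] * n_bins
    for s, y in zip(scores, outcomes):
        b = min(int(s * n_bins), n_bins - 1)
        hits[b] += y
        counts[b] += 1
    base = sum(outcomes) / len(outcomes)  # fallback for bins with no training data
    rates = [hits[b] / counts[b] if counts[b] else base for b in range(n_bins)]
    return lambda s: rates[min(int(s * n_bins), n_bins - 1)]


def choose_threshold(probs: Sequence[float], returns: Sequence[float],
                     candidates: Sequence[float]) -> float:
    """Assumed selection rule: the candidate cutoff with the highest total return on this slice."""
    def total(th: float) -> float:
        return sum(r for p, r in zip(probs, returns) if p >= th)
    return max(candidates, key=total)


def walk_forward(scores: Sequence[float], returns: Sequence[float], candidates: Sequence[float],
                 train_size: int, test_size: int) -> List[Dict[str, float]]:
    """Refit calibrator and threshold on each training slice, freeze both, score the next window."""
    results = []
    for train, forward in walk_forward_windows(len(scores), train_size, test_size):
        tr_scores = [scores[i] for i in train]
        tr_returns = [returns[i] for i in train]
        tr_outcomes = [1 if r > 0 else 0 for r in tr_returns]

        # Fit on the training slice only.
        calibrate = fit_bin_calibrator(tr_scores, tr_outcomes)
        threshold = choose_threshold([calibrate(s) for s in tr_scores], tr_returns, candidates)

        # Frozen from here on: nothing below changes the calibrator or the cutoff.
        taken = [returns[i] for i in forward if calibrate(scores[i]) >= threshold]
        results.append({
            "forward_start": forward.start,
            "threshold": threshold,
            "trades": len(taken),
            "pnl": sum(taken),
        })
    return results
```

Calling walk_forward(scores, returns, [0.3, 0.5, 0.7], train_size=6, test_size=3) on twelve observations yields two forward windows, each scored with a cutoff that was fixed before that window began. Swapping in a different calibrator or selection rule does not change the structure, as long as both are fit inside the loop.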

What to measure beyond Sharpe

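Watch per-bucket reliability (whether realized hit rates match the predicted probabilities in each score bucket), trade count stability across windows, turnover generated by scores hovering near the boundary, and sensitivity of results to nearby threshold values. A threshold can preserve headline performance while quietly becoming less trustworthy as the market changes.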
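
Two of these checks are easy to compute on each forward window. The sketch below is one possible way to do it, assuming calibrated probabilities in [0, 1], binary win/loss outcomes, and per-trade returns; the helper names reliability_table and threshold_sensitivity, the bucket count, and the step size are all illustrative choices rather than anything prescribed here.

```python
from typing import Dict, List, Sequence, Tuple


def reliability_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 5) -> List[Dict[str, float]]:
    """Per-bucket count, mean predicted probability, realized frequency, and the gap between them."""
    buckets: Dict[int, List[Tuple[float, int]]] = {}
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the top bucket
        buckets.setdefault(b, []).append((p, y))

    rows = []
    for b in sorted(buckets):
        members = buckets[b]
        n = len(members)
        mean_pred = sum(p for p, _ in members) / n
        realized = sum(y for _, y in members) / n
        rows.append({"bucket": b, "n": n, "mean_pred": mean_pred,
                     "realized": realized, "gap": realized - mean_pred})
    return rows


def threshold_sensitivity(probs: Sequence[float], returns: Sequence[float],
                          threshold: float, step: float = 0.05) -> List[Tuple[float, int, float]]:
    """Trade count and total return at the frozen threshold and one step on either side."""
    rows = []
    for th in (threshold - step, threshold, threshold + step):
        taken = [r for p, r in zip(probs, returns) if p >= th]
        rows.append((round(th, 4), len(taken), sum(taken)))
    return rows
```

A large gap in a bucket that drives most trades, or a trade count that swings sharply between neighboring cutoffs, is the kind of fragility a Sharpe ratio alone would not reveal. Tracking both tables per forward window makes drift visible over time.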

Why this matters on Alphora

Alphora's workflow already separates local research from verified and forward evidence. Walk-forward threshold testing fits naturally into that approach because it stops an attractive-looking full-sample cutoff from masquerading as a robust trading policy before the strategy reaches paper or shared portfolio deployment.