How do you calibrate a trading probability?

Question

Accepted Answer

Learn how trading probability calibration works, what it should be measured against, and why a sharp-looking score is not enough on its own.

Short answer: A trading probability is calibrated when the probabilities it outputs match the realized frequency of the outcomes that matter downstream. If a model says a setup has a 70 percent chance of success, then roughly 70 percent of the trades in that bucket should succeed over a relevant validation window, not just in a polished in-sample chart.

What you calibrate against: The target depends on what the score is meant to drive. Some teams calibrate against realized hit rate for a binary decision. Others care more about return buckets, expected value, or payoff-weighted outcomes, because a correct tiny trade and a correct large trade should not count the same.

How teams usually do it: A practical workflow is to start with reliability plots and bucket tests, then apply a monotonic calibration layer such as Platt scaling or isotonic regression, fitted on held-out windows. The exact method matters less than the discipline of fitting it on past data only and freezing it before the next evaluation slice.

What to validate before trusting it: Calibration is only useful if it holds up over time. Check whether nearby windows produce similar reliability curves, whether the reliability breaks under higher costs or slower execution, and whether the same confidence bucket still means the same thing once the strategy moves from backtest into walk-forward and paper phases.
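The bucket test and the monotonic calibration step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production recipe: the target is a binary hit/miss outcome, the buckets are equal-width, the isotonic fit is a simple step-function version of pool-adjacent-violators (library implementations typically interpolate between points), and the function names (`reliability_table`, `fit_isotonic`, `apply_isotonic`) and the toy data are all made up for this example.

```python
from bisect import bisect_right


def reliability_table(probs, outcomes, n_buckets=5):
    """Compare mean predicted probability with realized hit rate per bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes in the last bucket
        buckets[idx].append((p, y))
    rows = []
    for idx, items in enumerate(buckets):
        if not items:
            continue  # skip empty buckets instead of reporting a made-up rate
        mean_pred = sum(p for p, _ in items) / len(items)
        hit_rate = sum(y for _, y in items) / len(items)
        rows.append((idx, len(items), mean_pred, hit_rate))
    return rows


def fit_isotonic(probs, outcomes):
    """Pool-adjacent-violators: non-decreasing step map from raw score to hit rate."""
    totals = {}  # pool identical scores first so ties share one value
    for p, y in zip(probs, outcomes):
        s, c = totals.get(p, (0.0, 0))
        totals[p] = (s + y, c + 1)
    blocks = []  # each block is [sum_of_outcomes, count, left_edge]
    for p in sorted(totals):
        s, c = totals[p]
        blocks.append([s, c, p])
        # merge backwards while the previous block's mean exceeds the current one
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s2, c2, _ = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def apply_isotonic(model, p):
    """Look up the calibrated value for a raw score using the frozen step map."""
    thresholds, values = model
    idx = bisect_right(thresholds, p) - 1
    return values[max(idx, 0)]  # scores below the first edge use the first value


# Toy, invented data: raw model scores and hit (1) / miss (0) on a past window.
past_scores = [0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]
past_hits = [0, 1, 0, 1, 0, 1, 1, 1]

# Bucket test on the past window.
for row in reliability_table(past_scores, past_hits):
    print("bucket=%d n=%d mean_pred=%.2f hit_rate=%.2f" % row)

# Fit once on the past window, then freeze before scoring the next slice.
frozen = fit_isotonic(past_scores, past_hits)
next_scores = [0.58, 0.72, 0.88]
calibrated = [apply_isotonic(frozen, p) for p in next_scores]
print(calibrated)
```

The point of the last few lines is the ordering: the calibration map is fitted only on the earlier window and is not refitted when the next slice is scored. Rerunning `reliability_table` on each later window with the frozen map is one way to check whether a given confidence bucket still means the same thing.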