Calibration: Fighting Oversimplification and Sparse Data
This paper addresses a common problem in model calibration. Because the calibration dataset is usually much smaller than the original training set, isotonic regression tends to oversimplify the probability distribution: it collapses many distinct model scores onto a few flat steps, erasing the model's fine-grained distinctions. The paper analyzes this "data-sparsity-induced flattening" and proposes several diagnostics for telling justifiable simplification (the score differences were noise) apart from oversimplification (there was too little data to resolve them). Finally, it introduces the Calibre package, which relaxes the isotonic constraints or fits smooth monotone models in order to keep calibration accurate while preserving as much of the original model's discriminative power as possible.
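The flattening effect is easy to reproduce on a small synthetic calibration set. The sketch below is an illustration only, not the paper's diagnostics and not the Calibre API: `pava` is a plain pool-adjacent-violators fit (the standard algorithm behind isotonic regression), and `flattening_report` is a hypothetical helper that compares the number of distinct raw scores with the number of distinct calibrated values. The synthetic data and the sample size of 200 are assumptions chosen for the demonstration.

```python
import random


def pava(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y (unit weights)."""
    blocks = []  # each block is [sum_of_values, count]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block gets the block mean
    return fitted


def flattening_report(scores, labels):
    """Hypothetical diagnostic: how many distinct values survive isotonic calibration."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    sorted_scores = [scores[i] for i in order]
    sorted_labels = [float(labels[i]) for i in order]
    calibrated = pava(sorted_labels)
    n_in = len(set(sorted_scores))
    n_out = len(set(calibrated))
    return {
        "distinct_scores": n_in,
        "distinct_calibrated": n_out,
        "retention": n_out / n_in,
    }


# Synthetic calibration set (assumption): scores are already well calibrated,
# so any loss of distinct values comes from the small sample, not from the model.
rng = random.Random(0)
scores = [rng.random() for _ in range(200)]
labels = [1 if rng.random() < s else 0 for s in scores]

report = flattening_report(scores, labels)
print(report)
```

A low `retention` value on its own does not say whether the merging was justified. It only flags how many distinctions were lost, which is the point where a diagnostic of the kind the paper describes would be needed.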