Prevalence adjustment

In this notebook, we discuss how prevalence affects the calibration of a model in a binary classification problem and how to adjust for prevalence differences. Prevalence shift happens when the proportion of positive and negative cases in your evaluation data differs from that of the data the model was trained on. This can cause the predicted probabilities to be systematically off, even if the model itself hasn’t changed.

When we discuss calibration, we usually refer to whether the probability output by the model matches the posterior probability of the true outcome.

\[P(D=1|\hat{p} = p) = p ,\forall p \in [0,1]\]

where \(\hat{p}\) is the predicted probability of the true outcome being 1.

However, the posterior probability of the true outcome being 1 depends on the prevalence of the outcome 1. Using Bayes’ theorem, we can derive the following relationship:

\[P(D=1|\hat{p} = p) = \frac{P(\hat{p} = p|D=1)P(D=1)}{P(\hat{p} = p)}\]

The term \(P(\hat{p} = p|D=1)\) is independent of prevalence for a given model. The term \(P(D=1)\) is the prevalence of the outcome 1, which we denote by \(\eta\). The term \(P(\hat{p} = p)\) is the marginal probability of the predicted probability being \(p\) and implicitly depends on the prevalence of the true outcome. We can expand the denominator using the law of total probability, \(P(\hat{p} = p) = P(\hat{p} = p|D=1)\eta + P(\hat{p} = p|D=0)(1-\eta)\). Dividing the numerator and the denominator by \(P(\hat{p} = p|D=0)\) then gives:

\[P(D=1|\hat{p}=p) = \frac{\text{LR}(p) \times \eta}{\text{LR}(p) \times \eta + 1 - \eta}\]

where \(\text{LR}(p) = \frac{P(\hat{p} = p|D=1)}{P(\hat{p} = p|D=0)}\) is the likelihood ratio of the predicted probability being \(p\) given the true outcome being 1 and 0 respectively, and \(\eta\) is the prevalence of the outcome 1.
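The relationship above can be checked numerically. The following is a minimal sketch, not part of calzone; the function name `posterior_from_lr` and the example values are chosen here purely for illustration.

```python
def posterior_from_lr(lr, prevalence):
    """Posterior P(D=1 | p_hat = p) from a likelihood ratio and a prevalence.

    Hypothetical helper for illustration; it implements
    LR * eta / (LR * eta + 1 - eta).
    """
    return lr * prevalence / (lr * prevalence + 1.0 - prevalence)


# The same likelihood ratio gives different posteriors at different prevalences.
lr = 4.0
post_balanced = posterior_from_lr(lr, 0.5)  # 2.0 / (2.0 + 0.5)
post_rare = posterior_from_lr(lr, 0.1)      # 0.4 / (0.4 + 0.9)
print(post_balanced, post_rare)
```

With a likelihood ratio of 4, the posterior is 0.8 at a prevalence of 0.5 but only about 0.31 at a prevalence of 0.1, which is why a probability calibrated for one population can be miscalibrated for another.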

The likelihood ratio is independent of the prevalence, so a model that is calibrated at one prevalence becomes miscalibrated at a different prevalence. We say such a model is “intrinsically calibrated”: its likelihood ratio, combined with a specific prevalence, produces the correct posterior probability of the true outcome being 1.

An intrinsically calibrated model can be adapted to a population with a different prevalence but the same within-class probability distributions. To adjust for prevalence differences, we rely on the fact that the likelihood ratio is independent of the prevalence. Starting from the calibration condition at the original prevalence, we solve for the likelihood ratio and substitute it into the posterior at the new prevalence:

\[P(D=1|\hat{p}=p) = \frac{\eta LR(p)}{\eta LR(p) + (1-\eta)} = p\]

\[LR(p) = \frac{p}{1-p} \cdot \frac{1-\eta}{\eta}\]

\[P'(D=1|\hat{p}=p) = \frac{\eta' LR(p)}{\eta' LR(p) + (1-\eta')} = \frac{\eta'/(1-\eta')}{(1/p-1)(\eta/(1-\eta)) + \eta'/(1-\eta')} = p'\]

where \(\eta\) is the prevalence of the derivation population (aka the population for which the model is calibrated) and \(\eta'\) is the prevalence of the outcome 1 in the new population. We will refer to \(p'\) as the adjusted probability.
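As a rough illustration of these equations, the sketch below recovers the likelihood ratio from a calibrated probability and re-applies Bayes' theorem at the new prevalence. It is an independent sketch, not calzone's `apply_prevalence_adjustment`; the names `likelihood_ratio` and `adjust_probability` are hypothetical, and the input is assumed to satisfy \(0 < p < 1\).

```python
def likelihood_ratio(p, eta):
    """LR(p) implied by a probability p that is calibrated at prevalence eta."""
    return p / (1.0 - p) * (1.0 - eta) / eta


def adjust_probability(p, eta, eta_new):
    """Adjusted probability p' for a population with prevalence eta_new.

    Assumes 0 < p < 1 and that p is calibrated at prevalence eta.
    """
    lr = likelihood_ratio(p, eta)
    return eta_new * lr / (eta_new * lr + 1.0 - eta_new)


# A score of 0.8 calibrated at prevalence 0.5, moved to prevalence 0.1.
p_adjusted = adjust_probability(0.8, eta=0.5, eta_new=0.1)

# Sanity check: no prevalence change leaves the probability unchanged.
p_same = adjust_probability(0.3, eta=0.2, eta_new=0.2)
print(p_adjusted, p_same)
```

Here the adjusted probability is about 0.31, and passing the same prevalence twice returns the input probability, as expected from the first equation.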

In practice, we might have a dataset with the true labels (which we can use to calculate the prevalence \(\eta'\) of that dataset) and the predicted probabilities of the true outcome being 1, without knowing the prevalence for which the model was calibrated. We can search for the derivation prevalence \(\eta\) that minimizes the cross-entropy loss between the adjusted probabilities \(p'\) and the true labels:

\[\min_{\eta} \; -\sum_{i=1}^{N} \left(y_i \log(p_i') + (1-y_i) \log(1-p_i')\right)\]

Notice that minimizing cross-entropy loss with respect to \(\eta\) is equivalent to minimizing the KL divergence since the prevalence adjustment is a monotonic transformation and doesn’t affect the resolution component of the cross-entropy loss.
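The search described above can be sketched with a simple grid search over \(\eta\). This is only an illustration under stated assumptions, not how calzone's `find_optimal_prevalence` is implemented: the synthetic scores, the grid resolution, and all function names are hypothetical choices made here.

```python
import math
import random


def adjust_probability(p, eta, eta_new):
    """Adjusted probability p' (same formula as in the text above)."""
    lr = p / (1.0 - p) * (1.0 - eta) / eta
    return eta_new * lr / (eta_new * lr + 1.0 - eta_new)


def cross_entropy(y, p):
    """Mean binary cross-entropy between labels y and probabilities p."""
    total = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                for yi, pi in zip(y, p))
    return -total / len(y)


# Synthetic scores that are calibrated by construction (prevalence near 0.5).
random.seed(0)
scores = [random.uniform(0.05, 0.95) for _ in range(5000)]
labels = [1 if random.random() < s else 0 for s in scores]

# Drop half of the positive cases at random to create a prevalence shift.
positives = [i for i, yi in enumerate(labels) if yi == 1]
dropped = set(random.sample(positives, len(positives) // 2))
scores = [s for i, s in enumerate(scores) if i not in dropped]
labels = [yi for i, yi in enumerate(labels) if i not in dropped]

eta_new = sum(labels) / len(labels)  # prevalence of the shifted dataset


def loss_at(eta):
    """Cross-entropy after adjusting from candidate eta to eta_new."""
    adjusted = [adjust_probability(s, eta, eta_new) for s in scores]
    return cross_entropy(labels, adjusted)


# Coarse grid search over candidate derivation prevalences.
grid = [k / 100 for k in range(1, 100)]
best_eta = min(grid, key=loss_at)
unadjusted_loss = cross_entropy(labels, scores)
adjusted_loss = loss_at(best_eta)
print("Dataset prevalence:", eta_new)
print("Derived prevalence:", best_eta)
```

A grid search is used only to keep the sketch dependency-free; since the loss depends on a single parameter, a one-dimensional optimizer would do the same job with fewer evaluations.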

Perform prevalence adjustment in calzone

We will demonstrate how to perform prevalence adjustment in calzone. The first method is to find the optimal prevalence first and then apply the adjustment.

[1]:

##

from calzone.utils import find_optimal_prevalence,apply_prevalence_adjustment,data_loader,fake_binary_data_generator
import numpy as np
# Generate data, then lower the prevalence by dropping class-1 samples

np.random.seed(123)
fakedata_generator = fake_binary_data_generator(alpha_val=0.5, beta_val=0.5)
X, y = fakedata_generator.generate_data(5000)
### Drop half of the class-1 samples, which lowers the prevalence of outcome 1
class_1_index = (y==1)
class_1_samples = np.where(class_1_index)[0]
drop_indices = np.random.choice(class_1_samples, size=int(len(class_1_samples)/2), replace=False)

mask = np.ones(len(y), dtype=bool)
mask[drop_indices] = False

y = y[mask]
X = X[mask]
optimal_prevalence,adjusted_p = find_optimal_prevalence(y, X, class_to_calculate=1)
print("Dataset prevalence: ", np.mean(y))
print("Derived prevalence: ", optimal_prevalence)

Dataset prevalence:  0.3300531914893617
Derived prevalence:  0.49863799264980607

The function returns both the derived prevalence and the adjusted probability. We can also use the derived prevalence to perform the adjustment manually.

[2]:

### Prevalence Adjustment
from calzone.metrics import lowess_regression_analysis
proba_adjust = apply_prevalence_adjustment(optimal_prevalence, y, X, class_to_calculate=1)
print('Loess ICI before prevalence adjustment: ', lowess_regression_analysis(y, X, class_to_calculate=1)[0])
print('Loess ICI after prevalence adjustment: ', lowess_regression_analysis(y, proba_adjust, class_to_calculate=1)[0])

Loess ICI before prevalence adjustment:  0.07961758926734244
Loess ICI after prevalence adjustment:  0.008745511902314453

calzone also provides an argument to perform prevalence adjustment directly from the CalibrationMetrics class.

[3]:

### We calculate the Calibration metrics before and after prevalence adjustment
from calzone.metrics import CalibrationMetrics
calmetrics = CalibrationMetrics()
before_prevalence = calmetrics.calculate_metrics(y,X, metrics=['ECE-H','COX','Loess'],perform_pervalance_adjustment=False)
after_prevalence = calmetrics.calculate_metrics(y,X, metrics=['ECE-H','COX','Loess'],perform_pervalance_adjustment=True)

[4]:

for key in before_prevalence.keys():
    print(key)
    print('before adjustment:',before_prevalence[key],', after adjustment:',after_prevalence[key])

ECE-H topclass
before adjustment: 0.014081013182402267 , after adjustment: 0.010355911839501922
ECE-H
before adjustment: 0.0841517729106883 , after adjustment: 0.013671230516636386
COX coef
before adjustment: 0.9400481147756811 , after adjustment: 0.9400481147756811
COX intercept
before adjustment: -0.6897839569176842 , after adjustment: -0.029403495083063648
COX coef lowerci
before adjustment: 0.8754203499121679 , after adjustment: 0.8754203499121678
COX coef upperci
before adjustment: 1.0046758796391944 , after adjustment: 1.0046758796391944
COX intercept lowerci
before adjustment: -0.7837388214288888 , after adjustment: -0.12775157222121533
COX intercept upperci
before adjustment: -0.5958290924064796 , after adjustment: 0.06894458205508802
COX ICI
before adjustment: 0.0841517733462589 , after adjustment: 0.007508966220374058
Loess ICI
before adjustment: 0.07961758926734244 , after adjustment: 0.008745511902314453

Prevalence adjustment and constant shift in logit of class-of-interest

In this section, we will prove that a prevalence shift is equivalent to a constant shift in the logit of the class-of-interest. In other words, prevalence adjustment can be done by adding a constant to the logit of the class-of-interest. For the calibrated case, the likelihood ratio of the two classes is:

\[LR(p) = \frac{\frac{e^{x_2}}{e^{x_1} + e^{x_2}}}{\frac{e^{x_1}}{e^{x_1} + e^{x_2}}} \cdot \frac{1-\eta}{\eta} = e^{x_2 - x_1} \cdot \frac{1-\eta}{\eta}\]

Assume we add a constant \(c\) to the logit of the class-of-interest (\(x_2\) here). The likelihood ratio becomes:

\[LR'(p) = e^{x_2 - x_1 + c} \cdot \frac{1-\eta}{\eta}\]

And the posterior probability becomes:

\[P'(D=1|\hat{p}=p) = \frac{\eta LR'(p)}{\eta LR'(p) + (1-\eta)} = \frac{\eta LR(p) \cdot e^c}{\eta LR(p) \cdot e^c + (1-\eta)}\]

This is equivalent to the posterior probability after prevalence adjustment:

\[\frac{\eta' LR(p)}{\eta' LR(p) + (1-\eta')}\]

if we set \(\frac{\eta'}{1-\eta'} = e^{c} \cdot \frac{\eta}{1-\eta}\), that is,

\[\eta' = \frac{1}{1 + e^{-c} \left(\frac{1-\eta}{\eta}\right)}\]

Therefore, prevalence adjustment is equivalent to a constant shift in logit of class-of-interest.
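The equivalence can also be verified numerically. In the sketch below, which is illustrative only and uses hypothetical helper names, the constant is taken as the difference of the prevalence log-odds, \(c = \text{logit}(\eta') - \text{logit}(\eta)\), and the shifted-logit probability is compared with the prevalence-adjusted probability.

```python
import math


def logit(q):
    """Log-odds of a probability q in (0, 1)."""
    return math.log(q / (1.0 - q))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def adjust_probability(p, eta, eta_new):
    """Prevalence-adjusted probability via the likelihood ratio."""
    lr = p / (1.0 - p) * (1.0 - eta) / eta
    return eta_new * lr / (eta_new * lr + 1.0 - eta_new)


eta, eta_new = 0.5, 0.1
c = logit(eta_new) - logit(eta)  # constant added to the logit

# Largest disagreement between the two routes over a few probabilities.
max_gap = max(
    abs(sigmoid(logit(p) + c) - adjust_probability(p, eta, eta_new))
    for p in (0.05, 0.3, 0.5, 0.8, 0.95)
)
print(c, max_gap)
```

The two routes agree up to floating-point error, so a prevalence adjustment changes only the intercept on the logit scale and leaves the slope untouched.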

References

Chen, W., Sahiner, B., Samuelson, F., Pezeshk, A., & Petrick, N. (2018). Calibration of medical diagnostic classifier scores to the probability of disease. Statistical Methods in Medical Research, 27(5), 1394–1409. https://doi.org/10.1177/0962280216661371

Gu, W., & Pepe, M. S. (2011). Estimating the diagnostic likelihood ratio of a continuous marker. Biostatistics, 12(1), 87–101. https://doi.org/10.1093/biostatistics/kxq045

Tian, J., Liu, Y.-C., Glaser, N., Hsu, Y.-C., & Kira, Z. (2020). Posterior Re-calibration for Imbalanced Datasets (No. arXiv:2010.11820). arXiv. http://arxiv.org/abs/2010.11820

Horsch, K., Giger, M. L., & Metz, C. E. (2008). Prevalence scaling: Applications to an intelligent workstation for the diagnosis of breast cancer. Academic Radiology, 15(11), 1446–1457. https://doi.org/10.1016/j.acra.2008.04.022