{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Prevalence adjustment\n",
    "\n",
    "In this notebook, we will discuss how prevalence will affect the calibration of the model in a binary classification problem and how to adjust for prevalence differences.Prevalence shift happens when the proportion of positive and negative cases in your evaluation data is different from what the model was trained on. This can cause the predicted probabilities to be systematically off, even if the model itself hasn’t changed.\n",
    "\n",
    "When we discuss calibration, we usually refer to whether the probability output by the model matches the posterior probability of the true outcome. \n",
    "\n",
    "$$\n",
    "P(D=1|\\hat{p} = p) = p ,\\forall p \\in [0,1]\n",
    "$$\n",
    "where $\\hat{p}$ is the predicted probability of the true outcome being 1.\n",
    "\n",
    "However, the posterior probability of the true outcome being 1 depends on the prevalence of the outcome 1. Using Bayes' theorem, we can derive the following relationship:\n",
    "$$\n",
    "P(D=1|\\hat{p} = p) = \\frac{P(\\hat{p} = p|D=1)P(D=1)}{P(\\hat{p} = p)}\n",
    "$$\n",
    "\n",
    "The term $P(\\hat{p} = p|D=1)$ is independent of prevalence for a given model. The term $P(D=1)$ is the prevalence of the outcome 1. The term $P(\\hat{p} = p)$ is the marginal probability of the predicted probability being $p$ and implicitly depends on the prevalence of the true outcome. We can expand the denominator using the fact that $P(\\hat{p} = p) = P(\\hat{p} = p|D=1)\\eta + P(\\hat{p} = p|D=0)(1-\\eta)$. Further rearranging the above equation will lead to the following equation:\n",
    "\n",
    "$$\n",
    "P(D=1|\\hat{p}=p) = \\frac{\\text{LR}(p) \\times \\eta}{\\text{LR}(p) \\times \\eta + 1 - \\eta}\n",
    "$$\n",
    "where $\\text{LR}(p) = \\frac{P(\\hat{p} = p|D=1)}{P(\\hat{p} = p|D=0)}$ is the likelihood ratio of the predicted probability being $p$ given the true outcome being 1 and 0 respectively,  and $\\eta$ is the prevalence of the outcome 1.\n",
    "\n",
    "The likelihood ratio is independent of the prevalence, so that the model can be calibrated for a specific prevalence but will become mis-calibrated for a different prevalence. We can say such a model is \"intrinsically calibrated\", meaning that the likelihood ratio of the model with a specific prevalence produced a correct posterior probability of the true outcome being 1.\n",
    "\n",
    "An intrinsically calibrated model can be adapted to a population with a different prevalence but the same probability distribution within class. To adjust for prevalence differences, we rely on the fact that the likelihood ratio is independent of the prevalence. We can use the following equation to adjust the predicted probability of the true outcome being 1 for a different prevalence:\n",
    "\n",
    "$$\n",
    "P(D=1|\\hat{p}=p) = \\frac{\\eta LR(p)}{\\eta LR(p) + (1-\\eta)} = p\n",
    "$$\n",
    "\n",
    "$$\n",
    "LR(p) = \\frac{p}{1-p} \\cdot \\frac{1-\\eta}{\\eta}\n",
    "$$\n",
    "\n",
    "$$\n",
    "P'(D=1|\\hat{p}=p) = \\frac{\\eta' LR(p)}{\\eta' LR(p) + (1-\\eta')} = \\frac{\\eta'/(1-\\eta')}{(1/p-1)(\\eta/(1-\\eta))} = p'\n",
    "$$\n",
    "\n",
    "where $\\eta$ is the prevalence of the derivation population (aka the population for which the model is calibrated) and $\\eta'$ is the prevalence of the outcome 1 in the new population. We will refer to $p'$ as the adjusted probability.\n",
    "\n",
    "In practice, we might have a dataset with the true label (which we can use to calculate the prevalence $\\eta$) and predicted probability of the true outcome being 1. We can search for the derivation prevalence $\\eta$ that minimizes cross-entropy loss between the adjusted probability $p'$ and the posterior probability of the true outcome being 1.\n",
    "\n",
    "$$\n",
    "\\min_{\\eta} \\sum_{i=1}^{N} \\left(y_i \\log(p_i') + (1-y_i) \\log(1-p_i')\\right)\n",
    "$$\n",
    "\n",
    "Notice that minimizing cross-entropy loss with respect to $\\eta$ is equivalent to minimizing the KL divergence since the prevalence adjustment is a monotonic transformation and doesn't affect the resolution component of the cross-entropy loss."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preform prevalence adjustment in calzone\n",
    "\n",
    "We will demonstrate how to perform prevalence adjustment in calzone. The first method is to find optimal prevalence first and apply the adjustment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset prevalence:  0.3300531914893617\n",
      "Derived prevalence:  0.49863799264980607\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "\n",
    "from calzone.utils import find_optimal_prevalence,apply_prevalence_adjustment,data_loader,fake_binary_data_generator\n",
    "import numpy as np\n",
    "# We generate data and drop the prevalence \n",
    "\n",
    "np.random.seed(123)\n",
    "fakedata_generator = fake_binary_data_generator(alpha_val=0.5, beta_val=0.5)\n",
    "X, y = fakedata_generator.generate_data(5000)\n",
    "### drop half the outcome 1 prevalence\n",
    "class_1_index = (y==1)\n",
    "class_1_samples = np.where(class_1_index)[0]\n",
    "drop_indices = np.random.choice(class_1_samples, size=int(len(class_1_samples)/2), replace=False)\n",
    "\n",
    "mask = np.ones(len(y), dtype=bool)\n",
    "mask[drop_indices] = False\n",
    "\n",
    "y = y[mask]\n",
    "X = X[mask]\n",
    "optimal_prevalence,adjusted_p = find_optimal_prevalence(y, X, class_to_calculate=1)\n",
    "print(\"Dataset prevalence: \", np.mean(y))\n",
    "print(\"Derived prevalence: \", optimal_prevalence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function return both the derived prevalence and the adjusted probability. We can also use the derived prevalence adjustment factor to perform the adjustment mannually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loess ICI before prevalence adjustment:  0.07961758926734244\n",
      "Loess ICI after prevalence adjustment:  0.008745511902314453\n"
     ]
    }
   ],
   "source": [
    "### Prevalence Adjustment\n",
    "from calzone.metrics import lowess_regression_analysis\n",
    "proba_adjust = apply_prevalence_adjustment(optimal_prevalence, y, X, class_to_calculate=1)\n",
    "print('Loess ICI before prevalence adjustment: ', lowess_regression_analysis(y, X, class_to_calculate=1)[0])\n",
    "print('Loess ICI after prevalence adjustment: ', lowess_regression_analysis(y, proba_adjust, class_to_calculate=1)[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "calzone also provides a argument to perform prevalence adjustment directly from the CalibrationMetrics class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "### We calculate the Calibration metrics before and after prevalence adjustment\n",
    "from calzone.metrics import CalibrationMetrics\n",
    "calmetrics = CalibrationMetrics()\n",
    "before_prevalence = calmetrics.calculate_metrics(y,X, metrics=['ECE-H','COX','Loess'],perform_pervalance_adjustment=False)\n",
    "after_prevalence = calmetrics.calculate_metrics(y,X, metrics=['ECE-H','COX','Loess'],perform_pervalance_adjustment=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ECE-H topclass\n",
      "before adjustment: 0.014081013182402267 , after adjustment: 0.010355911839501922\n",
      "ECE-H\n",
      "before adjustment: 0.0841517729106883 , after adjustment: 0.013671230516636386\n",
      "COX coef\n",
      "before adjustment: 0.9400481147756811 , after adjustment: 0.9400481147756811\n",
      "COX intercept\n",
      "before adjustment: -0.6897839569176842 , after adjustment: -0.029403495083063648\n",
      "COX coef lowerci\n",
      "before adjustment: 0.8754203499121679 , after adjustment: 0.8754203499121678\n",
      "COX coef upperci\n",
      "before adjustment: 1.0046758796391944 , after adjustment: 1.0046758796391944\n",
      "COX intercept lowerci\n",
      "before adjustment: -0.7837388214288888 , after adjustment: -0.12775157222121533\n",
      "COX intercept upperci\n",
      "before adjustment: -0.5958290924064796 , after adjustment: 0.06894458205508802\n",
      "COX ICI\n",
      "before adjustment: 0.0841517733462589 , after adjustment: 0.007508966220374058\n",
      "Loess ICI\n",
      "before adjustment: 0.07961758926734244 , after adjustment: 0.008745511902314453\n"
     ]
    }
   ],
   "source": [
    "for key in before_prevalence.keys():\n",
    "    print(key)\n",
    "    print('before adjustment:',before_prevalence[key],', after adjustment:',after_prevalence[key])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prevalence adjustment and constant shift in logit of class-of-interest\n",
    "\n",
    "In the section, we will prove that the prevalence shift is equivalent to a constant shift in logit of class-of-interest. In other words, prevalence adjustment can be done by addint a constant to the logit of class-of-interest. For the calibrated case, the likelihood ratio of the two classes is:\n",
    "$$\n",
    "LR(p) = \\frac{\\frac{e^{x_2}}{e^{x_1} + e^{x_2}}}{\\frac{e^{x_1}}{e^{x_1} + e^{x_2}}} \\cdot \\frac{1-\\eta}{\\eta} = e^{x_2 - x_1} \\cdot \\frac{1-\\eta}{\\eta}\n",
    "$$\n",
    "\n",
    "Assumer we add a constant $c$ to the logit of class-of-interest ($x_2$ here), the likelihood ratio becomes:\n",
    "$$\n",
    "LR'(p) = e^{x_2 - x_1 + c} \\cdot \\frac{1-\\eta}{\\eta}\n",
    "$$\n",
    "\n",
    "And the posterior probability becomes:\n",
    "$$\n",
    "P'(D=1|\\hat{p}=p) = \\frac{\\eta LR'(p)}{\\eta LR'(p) + (1-\\eta)} = \\frac{\\eta LR(p) \\cdot e^c}{\\eta LR(p) \\cdot e^c + (1-\\eta)}\n",
    "$$\n",
    "\n",
    "Which is equivalent to the posterior probability after prevalence adjustment:\n",
    "$$\n",
    "\\frac{\\eta' LR(p)}{\\eta' LR(p) + (1-\\eta')}\n",
    "$$\n",
    "By setting \n",
    "$$\n",
    "\\eta' = \\frac{1}{1 + e^a \\left(\\frac{1-\\eta}{\\eta}\\right)}\n",
    "$$\n",
    "\n",
    "Therefore, prevalence adjustment is equivalent to a constant shift in logit of class-of-interest."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "Chen, W., Sahiner, B., Samuelson, F., Pezeshk, A., & Petrick, N. (2018). Calibration of medical diagnostic classifier scores to the probability of disease. Statistical Methods in Medical Research, 27(5), 1394–1409. https://doi.org/10.1177/0962280216661371\n",
    "\n",
    "\n",
    "Gu, W., & Pepe, M. S. (2011). Estimating the diagnostic likelihood ratio of a continuous marker. Biostatistics, 12(1), 87–101. https://doi.org/10.1093/biostatistics/kxq045\n",
    "\n",
    "\n",
    "Tian, J., Liu, Y.-C., Glaser, N., Hsu, Y.-C., & Kira, Z. (2020). Posterior Re-calibration for Imbalanced Datasets (No. arXiv:2010.11820). arXiv. http://arxiv.org/abs/2010.11820\n",
    "\n",
    "Horsch, K., Giger, M. L., & Metz, C. E. (2008). Prevalence scaling: applications to an intelligent workstation for the diagnosis of breast cancer. Academic radiology, 15(11), 1446–1457. https://doi.org/10.1016/j.acra.2008.04.022\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "uq",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}