{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# COX calibration analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Theoretical Background\n", "\n", "Cox calibration analysis is both a logistic recalibration technique and a method to examine the current calibration of a model. To perform the analysis, we first to fit a logistic regression model using logit (log odds, aka $\\log\\frac{\\hat{p}}{1-\\hat{p}}$) as the predictor variable and the outcome as the target variable. \n", "\n", "$$\n", "p_{new} = \\frac{1}{1+e^{-(a + b \\cdot \\log\\frac{\\hat{p}}{1-\\hat{p}})}}\n", "$$\n", "\n", "In the case of perfect calibration, $P(Y=1|p=\\hat{p}) = \\hat{p}$ and the new probability $p_{new}$ is equal to the original probability $\\hat{p}$. That means $a=0$ and $b=1$. If $b>1$, the model is under-confident at high probabilities and over-confident at low probabilities for the class-of-interest. If $b<1$, the model is over-confident at high probabilities and under-confident at low probabilities for the class-of-interest. If $a>0$, the model is over-confident at all probabilities for the class-of-interest. If $a<0$, the model is under-confident at all probabilities for the class-of-interest. The confidence interval of $a$ and $b$ can be used to guide the calibration of the model. The user can also choose to fix $a=0$ and fit for $b$ only and vice versa, then there will be no interaction between $a$ and $b$ and the confidence interval can be used as a statistical test to test for perfect calibration. \n", "\n", "\n", "## Pros of Cox calibration analysis\n", "\n", "Cox calibration analysis doesn't depend on binning of data, which is a big advantage since common metrics such as ECE/MCE and HL test all depend on binning and we have shown that changing binning can lead to different results. We can also use it to perform statistical tests by fixing $a$ to 0 and test whether $b=1$ and the other way around to test for perfect calibration. Also, the fitted values of $a$ and $b$ can tell us how the model is miscalibrated, whether it is an overall under- or over-confidence or if it is over-confident in some ranges and under-confident in others. For example, if $a$ is not close to 0 while $b$ is close to 1, it likely indicates a prevalence shift. See more details in the prevalence adjustment notebook.\n", "\n", "## Cons of Cox calibration analysis\n", "\n", "Cox Calibration analysis can only assess weak calibration , meaning whether $P(Y=1|\\hat{p}=p) = p$ for all $p$. It only captures certain types of miscalibration (general over/under-confidence). A model can have $a=0$ and $b=1$ and still be miscalibrated. For example, a model is over-confident at low and high probabilities and under-confident at intermediate probabilities could has $a=0$ and $b=1$ but still be miscalibrated.\n", "\n", "## Calculating Cox slope and intercept with calzone\n", "There are two ways to calculate the Cox slope and intercept. Calling the Cox function gives you more control over the calculation, including fixing $a=0$ or $b=1$." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: y No. Observations: 5000\n", "Model: Logit Df Residuals: 4998\n", "Method: MLE Df Model: 1\n", "Date: Wed, 02 Oct 2024 Pseudo R-squ.: 0.4438\n", "Time: 15:53:27 Log-Likelihood: -1927.5\n", "converged: True LL-Null: -3465.6\n", "Covariance Type: nonrobust LLR p-value: 0.000\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -0.0450 0.040 -1.123 0.262 -0.123 0.034\n", "x1 0.9942 0.029 34.212 0.000 0.937 1.051\n", "==============================================================================\n" ] } ], "source": [ "from calzone.utils import reliability_diagram,data_loader\n", "from calzone.metrics import cox_regression_analysis\n", "import numpy as np\n", "\n", "### loading the data\n", "wellcal_dataloader = data_loader(data_path=\"../../../example_data/simulated_welldata.csv\")\n", "\n", "### calculating cox slope and intercept\n", "cox_slope, cox_intercept,cox_slope_ci,cox_intercept_ci = cox_regression_analysis(wellcal_dataloader.labels, wellcal_dataloader.probs,class_to_calculate=1,print_results=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also fix the intercept $a=0$ by using the `fix_intercept=True` option. Similarly, you can fix the slope $b=1$ by using the `fix_slope=True` option." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: Maximum number of iterations has been exceeded.\n", " Current function value: 0.385628\n", " Iterations: 0\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: y No. Observations: 5000\n", "Model: Logit Df Residuals: 4999\n", "Method: MLE Df Model: 0\n", "Date: Wed, 02 Oct 2024 Pseudo R-squ.: 0.4436\n", "Time: 15:53:27 Log-Likelihood: -1928.1\n", "converged: False LL-Null: -3465.6\n", "Covariance Type: nonrobust LLR p-value: nan\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const 0 0 nan nan 0 0\n", "x1 0.9939 0.029 34.210 0.000 0.937 1.051\n", "==============================================================================\n", "\n", "Model has been estimated subject to linear equality constraints.\n" ] } ], "source": [ "### fixing intercept and calculating cox slope \n", "cox_slope, cox_intercept,cox_slope_ci,cox_intercept_ci = cox_regression_analysis(wellcal_dataloader.labels, wellcal_dataloader.probs,class_to_calculate=1, fix_intercept=True,print_results=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can use the CalibrationMetrics class to compute the COX slope and intercept." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: y No. Observations: 5000\n", "Model: Logit Df Residuals: 4998\n", "Method: MLE Df Model: 1\n", "Date: Wed, 02 Oct 2024 Pseudo R-squ.: 0.4438\n", "Time: 15:53:27 Log-Likelihood: -1927.5\n", "converged: True LL-Null: -3465.6\n", "Covariance Type: nonrobust LLR p-value: 0.000\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -0.0450 0.040 -1.123 0.262 -0.123 0.034\n", "x1 0.9942 0.029 34.212 0.000 0.937 1.051\n", "==============================================================================\n" ] }, { "data": { "text/plain": [ "{'COX coef': 0.9942499557748269,\n", " 'COX intercept': -0.04497652296600376,\n", " 'COX coef lowerci': 0.9372902801721911,\n", " 'COX coef upperci': 1.0512096313774626,\n", " 'COX intercept lowerci': -0.12348577118577644,\n", " 'COX intercept upperci': 0.03353272525376893,\n", " 'COX ICI': 0.005610391483826338}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from calzone.metrics import CalibrationMetrics\n", "calmetrics = CalibrationMetrics()\n", "calmetrics.calculate_metrics(wellcal_dataloader.labels, wellcal_dataloader.probs, metrics=['COX'],print_results=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting COX slope and intercept can be used to calibrate the model but it is beyond the scope of this package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## P-value of COX slope and intecept test\n", "\n", "Although Cox calibration analysis is usually only used to estimate the overall calibration trend, the resulting estimates of the slope and intercept can also be used to test whether the model is well calibrated (moderate calibration). We will do a demostrate on the statistical size of the slope and intercept test below" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### The size of slope test\n", "from calzone.utils import fake_binary_data_generator\n", "np.random.seed(123)\n", "fakedata_generator = fake_binary_data_generator(alpha_val=0.5, beta_val=0.5)\n", "cal_metrics = CalibrationMetrics()\n", "sample_size = 1000\n", "simulation_size = 1000\n", "results = []\n", "# generate data\n", "for i in range(simulation_size):\n", " X, y = fakedata_generator.generate_data(sample_size)\n", " if i == 0:\n", " tempresult = cal_metrics.calculate_metrics(y, X, ['COX'],return_numpy=False,fix_intercept=True) #we need to fix the intercept to be 0\n", " keys = list(tempresult.keys())\n", " results.append(np.array(list(tempresult.values())))\n", " else:\n", " tempresult = cal_metrics.calculate_metrics(y, X, ['COX'],return_numpy=True,fix_intercept=True) #we need to fix the intercept to be 0\n", " results.append(tempresult)\n", "results = np.array(results)\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The size of the Cox slope test is: 0.039000000000000035\n" ] } ], "source": [ "Cox_slope = results[:,0]\n", "Cox_slope_lowerci = results[:,2]\n", "Cox_slope_upperci = results[:,3]\n", "chance = np.logical_and(Cox_slope_lowerci<=1, Cox_slope_upperci>=1)\n", "print('The size of the Cox slope test is: ', 1-np.mean(chance))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also do the intercept test:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### The size of intercept test\n", "from calzone.utils import fake_binary_data_generator\n", "np.random.seed(123)\n", "fakedata_generator = fake_binary_data_generator(alpha_val=0.5, beta_val=0.5)\n", "cal_metrics = CalibrationMetrics()\n", "sample_size = 1000\n", "simulation_size = 1000\n", "results = []\n", "# generate data\n", "for i in range(simulation_size):\n", " X, y = fakedata_generator.generate_data(sample_size)\n", " if i == 0:\n", " tempresult = cal_metrics.calculate_metrics(y, X, ['COX'],return_numpy=False,fix_slope=True) #we need to fix the slope to be 1\n", " keys = list(tempresult.keys())\n", " results.append(np.array(list(tempresult.values())))\n", " else:\n", " tempresult = cal_metrics.calculate_metrics(y, X, ['COX'],return_numpy=True,fix_slope=True) #we need to fix the slope to be 1\n", " results.append(tempresult)\n", "results = np.array(results)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The size of the Cox intercept test is: 0.05600000000000005\n" ] } ], "source": [ "Cox_intercept = results[:,1]\n", "Cox_intercept_lowerci = results[:,4]\n", "Cox_intercept_upperci = results[:,5]\n", "chance = np.logical_and(Cox_intercept_lowerci<=0, Cox_intercept_upperci>=0)\n", "print('The size of the Cox intercept test is: ', 1-np.mean(chance))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The test results in different size values but they are closer than with the HL test." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "Cox, D. R. (1958). Two Further Applications of a Model for Binary Regression.\n", "\n", "Calster, B. V., & Steyerberg, E. W. (2018). Calibration of Prognostic Risk Scores. In R. S. Kenett, N. T. Longford, W. W. Piegorsch, & F. Ruggeri (Eds.), Wiley StatsRef: Statistics Reference Online (1st ed., pp. 1–10). Wiley. https://doi.org/10.1002/9781118445112.stat08078\n", "\n", "Huang, Y., Li, W., Macheret, F., Gabriel, R. A., & Ohno-Machado, L. (2020). A tutorial on calibration measurements and calibration models for clinical prediction models. Journal of the American Medical Informatics Association, 27(4), 621–633. https://doi.org/10.1093/jamia/ocz228\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "uq", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 2 }