calzone package

Submodules

calzone.metrics module

Metrics calculation functions for the Calibration Measure package.

class calzone.metrics.CalibrationMetrics(class_to_calculate=1, num_bins=10)[source]

Bases: object

A class for calculating calibration metrics for classification models.

__init__(class_to_calculate=1, num_bins=10)[source]

Initialize the CalibrationMetrics class.

Parameters:

class_to_calculate (int, optional) – The class index to calculate the metrics for. Defaults to 1.
num_bins (int, optional) – Number of bins to use for the ECE/MCE/HL calculations. Defaults to 10.

bootstrap(y_true, y_proba, metrics, perform_pervalance_adjustment=False, n_samples=1000, **kwargs)[source]

Run bootstrap and return a numpy structured array with correct field names.

This function performs bootstrap resampling to estimate the distribution of calibration metrics. It generates multiple samples with replacement from the input data and calculates the specified metrics for each sample.

Parameters:

y_true (array-like) – True labels.
y_proba (array-like) – Predicted probabilities.
metrics (list of str) – List of metric names to calculate.
perform_pervalance_adjustment (bool, optional) – Whether to perform prevalence adjustment for each bootstrap sample. Defaults to False.
n_samples (int, optional) – Number of bootstrap samples to generate. Defaults to 1000.
**kwargs – Additional keyword arguments to pass to the metric calculation functions.

Returns:

numpy.ndarray – A structured array containing the bootstrapped metrics. Each field in the array corresponds to a metric, and each row represents a bootstrap sample.
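
The resampling step described above can be illustrated with a small standard-library sketch. It is not calzone's implementation: `bootstrap_metric` and `mean_abs_error` are hypothetical names, and the inputs are plain lists of positive-class probabilities rather than (n_samples, n_classes) arrays.

```python
import random

def mean_abs_error(y_true, p_pos):
    """Toy metric: mean absolute gap between label and predicted probability."""
    return sum(abs(y - p) for y, p in zip(y_true, p_pos)) / len(y_true)

def bootstrap_metric(y_true, p_pos, metric, n_samples=1000, seed=0):
    """Return one metric value per bootstrap resample (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_samples):
        # Draw n indices with replacement, then evaluate the metric on that sample.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([y_true[i] for i in idx], [p_pos[i] for i in idx]))
    return values

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
p_pos = [0.2, 0.7, 0.9, 0.4, 0.6, 0.1, 0.8, 0.5]
samples = bootstrap_metric(y_true, p_pos, mean_abs_error, n_samples=200)
```

Each entry of `samples` corresponds to one row of the structured array that the method returns, here for a single metric only.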

calculate_metrics(y_true, y_proba, metrics, perform_pervalance_adjustment=False, return_numpy=False, **kwargs)[source]

Calculate the specified calibration metrics.

This function computes various calibration metrics for binary classification models. It supports multiple metrics and can perform prevalence adjustment if needed.

List of available metrics:

SpiegelhalterZ: Spiegelhalter’s Z-test for calibration
ECE-H: Expected Calibration Error with equal-space binning
MCE-H: Maximum Calibration Error with equal-space binning
HL-H: Hosmer-Lemeshow test with equal-space binning
ECE-C: Expected Calibration Error with equal-count binning
MCE-C: Maximum Calibration Error with equal-count binning
HL-C: Hosmer-Lemeshow test with equal-count binning
COX: Cox regression analysis
Loess: Locally Estimated Scatterplot Smoothing regression analysis

Parameters:

y_true (numpy.ndarray) – True labels.
y_proba (numpy.ndarray) – Predicted probabilities.
metrics (list of str or 'all') – List of metric names to calculate. If ‘all’, calculates all available metrics.
perform_pervalance_adjustment (bool, optional) – Whether to perform prevalence adjustment. Defaults to False.
return_numpy (bool, optional) – Whether to return the results as a numpy array. Defaults to False.
**kwargs – Additional keyword arguments to pass to the metric calculation functions.

Returns:

dict or numpy.ndarray – A dictionary containing the calculated metrics, or a numpy array if return_numpy is True.

optimal_prevalence_adjustment(y_true, y_proba)[source]

Perform optimal prevalence adjustment and return adjusted probabilities.

This function finds the optimal prevalence value that minimizes the difference between the predicted and actual positive rates, and then adjusts the input probabilities accordingly.

Parameters:

y_true (array-like) – True labels.
y_proba (array-like) – Predicted probabilities.

Returns:

optimal_prevalence (float) – Optimal prevalence value.
adjusted_proba (array-like) – Adjusted probabilities. First column is the adjusted probabilities for the other class, second column is the adjusted probabilities for the class of interest.

calzone.metrics.cal_ICI(y_adjust, y_proba)[source]

Calculate the Integrated Calibration Index (ICI) for given adjusted probabilities.

Parameters:

y_adjust (array-like) – Adjusted probabilities. Shape (n_samples,).
y_proba (array-like) – Original predicted probabilities. Shape (n_samples,).

Returns:

float – The Integrated Calibration Index (ICI) value.

Note

The ICI is the mean absolute difference between the predicted probabilities and the adjusted probabilities.
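
A minimal sketch of that mean absolute difference, written with plain lists. The name `ici` is illustrative and is not the library function.

```python
def ici(y_adjust, y_proba):
    """Mean absolute difference between adjusted and original probabilities."""
    if len(y_adjust) != len(y_proba):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - p) for a, p in zip(y_adjust, y_proba)) / len(y_proba)

# Gaps are 0.1, 0.0 and 0.2, so the mean is about 0.1.
value = ici([0.1, 0.5, 0.9], [0.2, 0.5, 0.7])
```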

calzone.metrics.cal_ICI_cox(coef, intercept, y_proba, class_to_calculate=1, epsilon=1e-07, **kwargs)[source]

Calculate the Integrated Calibration Index (ICI) for a given Cox regression model.

The ICI measures the average absolute difference between the predicted probabilities and the probabilities transformed by the fitted Cox regression model.

Parameters:

coef (float) – The coefficient (slope) from the Cox regression.
intercept (float) – The intercept from the Cox regression.
y_proba (array-like) – Predicted probabilities. Should be of shape (n_samples, n_classes).
class_to_calculate (int) – The class to calculate the ICI for in multi-class problems. Default is 1.
epsilon (float) – Small value to avoid numerical instability when clipping probabilities. Default is 1e-7.

Returns:

ICI (float) – The Integrated Calibration Index.

Note

Lower ICI values indicate better calibration.
The function maps the predicted probabilities to logits, applies the slope and intercept from the Cox regression, and converts the result back to probabilities with the inverse logit.
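
The calculation can be sketched as follows, assuming a 1-D list of positive-class probabilities instead of the (n_samples, n_classes) array the library expects. `ici_cox` is a hypothetical name used only for this illustration.

```python
import math

def ici_cox(coef, intercept, p_pos, epsilon=1e-7):
    """ICI between probabilities and their Cox-recalibrated versions (sketch)."""
    total = 0.0
    for p in p_pos:
        # Clip to avoid log(0) and division by zero.
        p = min(max(p, epsilon), 1.0 - epsilon)
        logit = math.log(p / (1.0 - p))
        recalibrated = 1.0 / (1.0 + math.exp(-(coef * logit + intercept)))
        total += abs(recalibrated - p)
    return total / len(p_pos)

perfect = ici_cox(1.0, 0.0, [0.2, 0.5, 0.9])   # slope 1, intercept 0: no change
steeper = ici_cox(2.0, 0.0, [0.2, 0.8])        # slope 2 pushes values apart
```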

calzone.metrics.cal_ICI_func(func, y_proba, class_to_calculate=1)[source]

Calculate the Integrated Calibration Index (ICI) for a given calibration function.

Parameters:

func (callable) – The calibration function to evaluate.
y_proba (array-like) – Predicted probabilities for each class. Shape (n_samples, n_classes).
class_to_calculate (int, optional) – The class index to calculate the ICI for. Defaults to 1.

Returns:

float – The Integrated Calibration Index (ICI) value.

Note

The ICI is the mean absolute difference between the predicted probabilities and the calibration function evaluated at those probabilities.

calzone.metrics.calculate_ece_mce(reliability, confindence, bin_counts)[source]

Calculate Expected Calibration Error (ECE) and Maximum Calibration Error (MCE).

These metrics assess the calibration of a classification model by comparing predicted probabilities to observed frequencies.

Parameters:

reliability (array-like) – Array of observed frequencies for each bin.
confindence (array-like) – Array of predicted probabilities for each bin.
bin_counts (array-like) – Array of sample counts in each bin.

Returns:

ece (float) – The Expected Calibration Error.
mce (float) – The Maximum Calibration Error.

Note

ECE is a weighted average of the absolute differences between confidence and reliability.
MCE is the maximum absolute difference between confidence and reliability across all bins.
Lower values of both ECE and MCE indicate better calibration.
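
The two definitions in the note can be written out directly. This is a sketch with plain lists; the name `ece_mce` is illustrative, and skipping empty bins is an assumption rather than documented library behavior.

```python
def ece_mce(reliability, confidence, bin_counts):
    """Count-weighted mean gap (ECE) and largest gap (MCE) across bins."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("all bins are empty")
    # Keep only non-empty bins (assumption made for this sketch).
    gaps = [(abs(r - c), n)
            for r, c, n in zip(reliability, confidence, bin_counts) if n > 0]
    ece = sum(gap * n for gap, n in gaps) / total
    mce = max(gap for gap, _ in gaps)
    return ece, mce

ece, mce = ece_mce([0.1, 0.5, 0.8], [0.2, 0.5, 0.9], [10, 20, 10])
```

With these inputs, two bins of 10 samples each are off by 0.1 and the bin of 20 is exact, so the ECE is about 0.05 and the MCE is about 0.1.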

calzone.metrics.cox_regression_analysis(y_true, y_proba, epsilon=1e-07, class_to_calculate=1, print_results=False, fix_intercept=False, fix_slope=False, **kwargs)[source]

Perform Cox regression analysis for classification calibration.

This function fits a logistic regression model to the logit of predicted probabilities to assess the calibration of classification predictions.

Parameters:

y_true (array-like) – True binary labels. If multi-class, will be converted to binary.
y_proba (array-like) – Predicted probabilities. Should be of shape (n_samples, n_classes).
epsilon (float) – Small value to avoid log(0) errors. Default is 1e-7.
class_to_calculate (int) – The class to treat as the positive class in binary classification. Default is 1.
print_results (bool) – If True, prints the summary of the logistic regression results. Default is False.
fix_intercept (bool) – If True, fixes the intercept to 0. Can’t be used with fix_slope. Default is False.
fix_slope (bool) – If True, fixes the coefficient to 1. Can’t be used with fix_intercept. Default is False.

Returns:

coef (float) – The coefficient (slope) of the Cox regression.
intercept (float) – The intercept of the Cox regression.
coef_ci (tuple) – The confidence interval for the coefficient.
intercept_ci (tuple) – The confidence interval for the intercept.

Note

A well-calibrated model should have a coefficient close to 1 and an intercept close to 0.
The function clips probabilities to avoid numerical instability.
For multi-class problems, the function converts the problem to binary classification based on the specified class_to_calculate.
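
To make the idea concrete, the sketch below fits the slope and intercept of a logistic model on the logit of the predicted probabilities using Newton-Raphson. It is a standard-library illustration under stated assumptions, not the library's fitting routine: `fit_calibration_line` is a hypothetical name, the input is a 1-D list of positive-class probabilities, and no confidence intervals are computed.

```python
import math

def fit_calibration_line(y_true, p_pos, epsilon=1e-7, n_iter=25):
    """Fit y ~ sigmoid(slope * logit(p) + intercept) by Newton-Raphson."""
    x = []
    for p in p_pos:
        p = min(max(p, epsilon), 1.0 - epsilon)   # clip to avoid log(0)
        x.append(math.log(p / (1.0 - p)))
    slope, intercept = 1.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(slope * xi + intercept)))
            w = mu * (1.0 - mu)
            g0 += yi - mu              # gradient w.r.t. the intercept
            g1 += (yi - mu) * xi       # gradient w.r.t. the slope
            h00 += w                   # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("singular information matrix")
        # Newton step: solve the 2x2 system H * delta = g.
        d_intercept = (h11 * g0 - h01 * g1) / det
        d_slope = (h00 * g1 - h01 * g0) / det
        intercept += d_intercept
        slope += d_slope
        if abs(d_intercept) + abs(d_slope) < 1e-12:
            break
    return slope, intercept
```

Calling `fit_calibration_line(y_true, p_pos)` returns `(slope, intercept)`; values near `(1, 0)` correspond to the well-calibrated case described in the note.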

calzone.metrics.get_CI(result, alpha=0.05)[source]

Calculate confidence intervals for each field in the result.

Parameters:

result (numpy.ndarray) – Structured array containing the results for which to calculate confidence intervals.
alpha (float, optional) – The significance level for the confidence interval calculation. Defaults to 0.05.

Returns:

dict – A dictionary where keys are field names from the input array and values are tuples containing the lower and upper bounds of the confidence interval.

Note

This function calculates percentile-based confidence intervals for each field in the input structured array. It’s useful for bootstrap or Monte Carlo simulations.
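
A percentile interval of this kind can be sketched with the standard library. `percentile_ci` is a hypothetical helper, and the linear interpolation between order statistics is an assumption; the library may use a different percentile rule.

```python
def percentile_ci(samples, alpha=0.05):
    """Percentile interval with linear interpolation between order statistics."""
    ordered = sorted(samples)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return quantile(alpha / 2), quantile(1 - alpha / 2)

# One list of bootstrap values per metric, mirroring the fields of the array.
boot = {"ECE": [0.01 * i for i in range(101)]}
intervals = {name: percentile_ci(values) for name, values in boot.items()}
```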

calzone.metrics.hosmer_lemeshow_test(reliability, confidence, bin_count, df=None, **kwargs)[source]

Compute the Hosmer-Lemeshow test for goodness of fit.

This test is used to assess the calibration of binary classification models with full probability outputs. It compares observed and expected frequencies of events in groups of the data.

Parameters:

reliability (array-like) – Observed proportion of positive samples in each bin.
confidence (array-like) – Predicted probabilities for each bin.
bin_count (array-like) – Number of samples in each bin.
df (int, optional) – Degrees of freedom for the test. Defaults to nbins - 2.

Returns:

chi_squared (float) – The chi-squared statistic of the Hosmer-Lemeshow test.
p_value (float) – The p-value associated with the chi-squared statistic.
df (int) – The degrees of freedom for the test.

Note

The Hosmer-Lemeshow test is widely used for assessing calibration in probabilistic models.
A small p-value (typically < 0.05) suggests that the model is a poor fit to the data.
This test can be sensitive to the number of groups and sample size.
It is recommended to use the Hosmer-Lemeshow test in conjunction with other metrics.
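
For reference, the usual form of the statistic compares observed and expected proportions per bin. The sketch below is written with the standard library and is not the library's code: both function names are hypothetical, predicted probabilities are assumed to lie strictly between 0 and 1, and the p-value helper uses a closed form that only holds for an even number of degrees of freedom.

```python
import math

def hosmer_lemeshow(reliability, confidence, bin_count, df=None):
    """Chi-squared statistic summed over non-empty bins, plus its df."""
    stat = 0.0
    for observed, expected, n in zip(reliability, confidence, bin_count):
        if n == 0:
            continue
        stat += n * (observed - expected) ** 2 / (expected * (1.0 - expected))
    if df is None:
        df = len(bin_count) - 2
    return stat, df

def chi2_sf_even_df(x, df):
    """Chi-squared survival function; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

stat, df = hosmer_lemeshow([0.1, 0.3, 0.6, 0.9], [0.1, 0.3, 0.6, 0.9], [25, 25, 25, 25])
p_value = chi2_sf_even_df(stat, df)
```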

calzone.metrics.logit_func(coef, intercept)[source]

Create a logistic function with given coefficient and intercept.

Parameters:

coef (float) – The coefficient (slope) of the logistic function.
intercept (float) – The intercept of the logistic function.

Returns:

callable – A function that takes an input x and returns the logistic function value.

Note

The returned function applies the logistic transformation: f(x) = 1 / (1 + exp(-(coef * log(x / (1 - x)) + intercept)))
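
The formula in the note translates directly into a closure. `make_logit_func` is an illustrative stand-in for the library function and expects x strictly between 0 and 1.

```python
import math

def make_logit_func(coef, intercept):
    """Return f(x) = 1 / (1 + exp(-(coef * log(x / (1 - x)) + intercept)))."""
    def f(x):
        return 1.0 / (1.0 + math.exp(-(coef * math.log(x / (1.0 - x)) + intercept)))
    return f

identity = make_logit_func(1.0, 0.0)   # slope 1, intercept 0 leaves x unchanged
```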

calzone.metrics.lowess_regression_analysis(y_true, y_proba, epsilon=1e-07, class_to_calculate=1, span=0.5, delta=0.001, it=0, **kwargs)[source]

Perform Lowess regression analysis for classification calibration.

This function applies Locally Weighted Scatterplot Smoothing (LOWESS) to assess the calibration of classification predictions.

Parameters:

y_true (array-like) – True binary labels. If multi-class, will be converted to binary.
y_proba (array-like) – Predicted probabilities. Should be of shape (n_samples, n_classes).
epsilon (float, optional) – Small value to avoid numerical instability when clipping probabilities. Defaults to 1e-7.
class_to_calculate (int, optional) – The class to treat as the positive class in binary classification. Defaults to 1.
span (float, optional) – The fraction of the data used when estimating each y-value. Defaults to 0.5.
delta (float, optional) – Distance within which to use linear-interpolation instead of weighted regression. Defaults to 0.001.
it (int, optional) – The number of residual-based reweightings to perform. Defaults to 0.

Returns:

ICI (float) – The Integrated Calibration Index.
sorted_proba (array-like) – Sorted predicted probabilities.
smoothed_proba (array-like) – Corresponding LOWESS-smoothed actual probabilities.

Note

The function clips probabilities to avoid numerical instability.
For multi-class problems, the function converts the problem to binary classification based on the specified class_to_calculate.
The Integrated Calibration Index (ICI) provides a measure of calibration error, with lower values indicating better calibration.

calzone.metrics.spiegelhalter_z_test(y_true, y_proba, class_to_calculate=1)[source]

Perform Spiegelhalter’s Z-test for calibration of probabilistic predictions.

This test assesses whether predicted probabilities are well-calibrated by comparing them to observed outcomes.

Parameters:

y_true (array-like) – True labels of the samples.
y_proba (array-like) – Predicted probabilities for each class. Shape should be (n_samples, n_classes).
class_to_calculate (int) – Index of the class to calculate the test for. Default is 1.

Returns:

z_score (float) – The z-score of the Spiegelhalter’s Z-test.
p_value (float) – The p-value associated with the z-score.

Note

This test is used to assess the calibration of a classification model.
A small p-value (typically < 0.05) suggests that the model is poorly calibrated.
The test assumes that predictions are independent and identically distributed.
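
The textbook form of the statistic is short enough to write out. This sketch assumes a 1-D list of positive-class probabilities and a two-sided p-value from the standard normal distribution; whether the library uses a one-sided or two-sided p-value is not stated here, and `spiegelhalter_z` is a hypothetical name.

```python
import math

def spiegelhalter_z(y_true, p_pos):
    """Spiegelhalter's Z statistic and a two-sided normal p-value (sketch)."""
    numerator = sum((y - p) * (1.0 - 2.0 * p) for y, p in zip(y_true, p_pos))
    variance = sum((1.0 - 2.0 * p) ** 2 * p * (1.0 - p) for p in p_pos)
    if variance == 0.0:
        raise ValueError("statistic undefined when every p is 0, 0.5 or 1")
    z = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

z_score, p_value = spiegelhalter_z([0, 1, 1, 0, 1], [0.2, 0.8, 0.7, 0.1, 0.6])
```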

calzone.utils module

Utility functions for the Calibration Measure package.

calzone.utils.apply_prevalence_adjustment(adjusted_prevalence, y_true, y_proba, class_to_calculate=1)[source]

Apply the prevalence adjustment method.

Parameters:

adjusted_prevalence (float) – The adjusted prevalence to test.
y_true (array-like) – True labels.
y_proba (array-like) – Predicted probabilities.
class_to_calculate (int) – The class index to adjust. Default is 1.

Returns:

numpy.ndarray – Adjusted probabilities.
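
The exact adjustment formula is not given in this reference. One common choice is the Bayes prior-shift correction shown below; treating it as what the library does is an assumption, and `adjust_prevalence` is a hypothetical name operating on a 1-D list of positive-class probabilities.

```python
def adjust_prevalence(p_pos, original_prevalence, target_prevalence):
    """Prior-shift adjustment of positive-class probabilities (illustrative)."""
    adjusted = []
    for p in p_pos:
        # Reweight each class by the ratio of new to old class prior.
        pos = p * target_prevalence / original_prevalence
        neg = (1.0 - p) * (1.0 - target_prevalence) / (1.0 - original_prevalence)
        adjusted.append(pos / (pos + neg))
    return adjusted

shifted = adjust_prevalence([0.5, 0.9], original_prevalence=0.5, target_prevalence=0.2)
```

When the target prevalence equals the original one, the probabilities come back unchanged.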

class calzone.utils.data_loader(data_path)[source]

Bases: object

A class for loading and preprocessing data from a CSV file.

This class handles various data formats, including those with or without subgroup information and headers.

data_path

Path to the CSV file containing the data.

Type: str

Header

Array of column headers from the CSV file.

Type: numpy.ndarray

subgroups

List of subgroup column names, if present.

Type: list

subgroup_indices

List of indices for subgroup columns, if present.

Type: list

have_subgroup

Flag indicating whether subgroup information is present.

Type: bool

data

Raw data loaded from the CSV file.

Type: numpy.ndarray

probs

Probability values extracted from the data.

Type: numpy.ndarray

labels

Label values extracted from the data.

Type: numpy.ndarray

subgroups_class

List of unique subgroup classes for each subgroup, if present.

Type: list

subgroups_index

List of indices for each subgroup class, if present.

Type: list

__init__(self, data_path)[source]: Initializes the data_loader object and loads data from a CSV file.

transform_topclass(self)[source]: Transforms the data into a top-class binary problem.

__init__(data_path)[source]

Initializes the data_loader object and loads data from the specified file.

Parameters: data_path (str) – Path to the CSV file containing the data.

The method performs the following steps:

1. Loads the header from the CSV file.
2. Checks for the presence of subgroup information.
3. Loads the data based on the presence or absence of subgroup information.
4. Extracts probability values and labels from the loaded data.
5. If subgroups are present, extracts subgroup classes and their indices.

Note:

- If there is a header, it must be in the format: proba_0,proba_1,…,subgroup_1(optional),subgroup_2(optional),…,label
- If there is no header, the columns must be in the order of proba_0,proba_1,…,label
- Raises ValueError if the format is wrong.
- There must be at least two probability columns.
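
To illustrate the expected layout, the sketch below parses a small in-memory CSV with a header in the documented column order. It uses only the standard library and does not call `data_loader`; `parse_rows` is a hypothetical helper, and matching columns by the `proba_` and `subgroup` name prefixes is an assumption made for this example.

```python
import csv
import io

sample = """proba_0,proba_1,subgroup_1,label
0.8,0.2,A,0
0.3,0.7,B,1
"""

def parse_rows(text):
    """Split rows into probabilities, subgroup values and labels (sketch)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    prob_idx = [i for i, name in enumerate(header) if name.startswith("proba_")]
    sub_idx = [i for i, name in enumerate(header) if name.startswith("subgroup")]
    label_idx = len(header) - 1          # the label is the last column
    if len(prob_idx) < 2:
        raise ValueError("need at least two probability columns")
    probs, subgroups, labels = [], [], []
    for row in reader:
        probs.append([float(row[i]) for i in prob_idx])
        subgroups.append([row[i] for i in sub_idx])
        labels.append(int(row[label_idx]))
    return probs, subgroups, labels

probs, subgroups, labels = parse_rows(sample)
```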

transform_topclass()[source]

Transforms the data into a top-class binary problem.

Returns: data_loader – A new data_loader object with transformed data

class calzone.utils.fake_binary_data_generator(alpha_val, beta_val)[source]

Bases: object

A class for generating fake binary data and applying miscalibration.

This class provides methods to generate binary classification data and apply different types of miscalibration to the probabilities.

alpha_val

Alpha parameter for the beta distribution.

Type: float

beta_val

Beta parameter for the beta distribution.

Type: float

__init__(alpha_val, beta_val)[source]

Initialize the fake binary data generator.

Parameters:

alpha_val (float) – Alpha parameter for the beta distribution.
beta_val (float) – Beta parameter for the beta distribution.

abraitary_miscal(logits, miscal_function)[source]

Apply arbitrary miscalibration to the input logits.

This function allows for the application of any custom miscalibration function to the input logits.

Parameters:

logits (numpy.ndarray) – Input logits of shape (n_samples, 2).
miscal_function (callable) – Function to apply miscalibration to the logits.

Returns:

numpy.ndarray – Miscalibrated probabilities of shape (n_samples, 2).

generate_data(sample_size)[source]

Generate fake binary classification data.

Parameters:

sample_size (int) – Number of samples to generate.

Returns:

X (numpy.ndarray) – Array of shape (sample_size, 2) containing probabilities for each class.
y_true (numpy.ndarray) – Array of shape (sample_size,) containing true binary labels.

linear_miscal(X, miscal_scale)[source]

Apply linear miscalibration to the input probabilities.

This function transforms the input probabilities to logits, applies a linear scaling, and then converts back to probabilities.

Parameters:

X (numpy.ndarray) – Input probabilities of shape (n_samples, 2).
miscal_scale (float) – Scale factor for miscalibration.

Returns:

numpy.ndarray – Miscalibrated probabilities of shape (n_samples, 2).
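
The three steps described above (probabilities to logits, linear scaling, back to probabilities) can be sketched with plain lists. `linear_miscal_sketch` is a hypothetical name, and both the `epsilon` clipping and the use of log-probabilities as logits are assumptions of this illustration.

```python
import math

def linear_miscal_sketch(X, miscal_scale, epsilon=1e-7):
    """Scale log-probabilities by miscal_scale, then renormalize with softmax."""
    out = []
    for row in X:
        logits = [miscal_scale * math.log(max(p, epsilon)) for p in row]
        top = max(logits)                          # subtract max for stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

unchanged = linear_miscal_sketch([[0.3, 0.7]], 1.0)   # scale 1 is a no-op
sharper = linear_miscal_sketch([[0.3, 0.7]], 2.0)     # scale > 1 is more extreme
```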

calzone.utils.find_optimal_prevalence(y_true, y_proba, class_to_calculate=1, epsilon=1e-07)[source]

Find the optimal adjustment prevalence using scipy.optimize.

Parameters:

y_true (array-like) – True labels.
y_proba (array-like) – Predicted probabilities.
class_to_calculate (int) – The class index to optimize for. Default is 1.
epsilon (float) – Small value to avoid numerical instability. Default is 1e-7.

Returns:

optimal_prevalence (float) – The optimal prevalence.
adjusted_probabilities (numpy.ndarray) – The adjusted probabilities using the optimal prevalence.

calzone.utils.loss(adjusted_prevalence, y_true, y_proba, class_to_calculate=1)[source]

Calculate the loss function for prevalence adjustment.

Parameters:

adjusted_prevalence (float) – The adjusted prevalence.
y_true (array-like) – True labels.
y_proba (array-like) – Predicted probabilities.
class_to_calculate (int) – The class index to calculate loss for. Default is 1.

Returns:

float – Calculated loss value.

calzone.utils.make_roc_curve(y_true, y_proba, class_to_plot=None)[source]

Compute the Receiver Operating Characteristic (ROC) curve for binary or multiclass classification.

Parameters:

y_true (array-like) – True labels of the data. Shape (n_samples,).
y_proba (array-like) – Predicted probabilities for each class. Shape (n_samples, n_classes).
class_to_plot (int, optional) – The class to plot the ROC curve for. If None, plots the ROC curve for each class. Default is None.

Returns:

fpr (array) – False Positive Rate for the selected class or each class. Shape (n_points,).
tpr (array) – True Positive Rate for the selected class or each class. Shape (n_points,).
roc_auc (float or array) – Area Under the ROC Curve (AUC) for the selected class or each class. If class_to_plot is not None, returns a float. If class_to_plot is None, returns an array of shape (n_classes,).

Note

The input arrays y_true and y_proba must have the same number of samples.
The input array y_proba must have probabilities for each class in a multiclass problem.
The input array y_proba must not contain any NaN values.

Example

>>> from calzone.utils import make_roc_curve
>>> y_true = [0, 1, 1, 0, 1]
>>> y_proba = [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1], [0.4, 0.6]]
>>> fpr, tpr, roc_auc = make_roc_curve(y_true, y_proba, class_to_plot=1)

calzone.utils.reliability_diagram(y_true, y_proba, num_bins=10, class_to_plot=None, is_equal_freq=False, save_path=None)[source]

Compute the reliability diagram for a binary or multi-class classification model.

Parameters:

y_true (array-like) – True labels of the samples. Can be a binary array or a one-hot encoded array.
y_proba (array-like) – Predicted probabilities for each class. Shape should be (n_samples, n_classes).
num_bins (int) – Number of bins to divide the predicted probabilities into. Default is 10.
class_to_plot (int or None) – Index of the class to plot the reliability diagram for. If None, the diagram will be computed for all classes. Default is None.
is_equal_freq (bool) – If True, the bins will be equally frequent. If False, the bins will be equally spaced. Default is False.

Returns:

reliabilities (array-like) – Array of accuracies for each bin. Shape depends on the value of class_to_plot.
confidences (array-like) – Array of average confidences for each bin. Shape depends on the value of class_to_plot.
bin_edges (array-like) – Array of bin edges.
bin_counts (array-like) – Array of counts for each bin.

Note

The reliability diagram is a graphical tool to assess the calibration of a classification model. It plots the average predicted probability against the observed accuracy for each bin of predicted probabilities.
If y_true is a binary array, it will be converted to a one-hot encoded array internally.
If class_to_plot is not None, the reliability diagram will be computed only for the specified class. Otherwise, it will be computed for all classes.
The number of bins determines the granularity of the reliability diagram. Higher values result in more bins and a more detailed diagram.
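
A simplified version of the equal-width binning for a single class might look like the sketch below. It is not the library code: `reliability_bins` is a hypothetical name, the input is a 1-D list of probabilities for the class of interest, and reporting 0.0 for empty bins is an assumption.

```python
def reliability_bins(y_true, p_pos, num_bins=10):
    """Per-bin observed frequency, mean confidence, edges and counts (sketch)."""
    sum_y = [0.0] * num_bins
    sum_p = [0.0] * num_bins
    counts = [0] * num_bins
    for y, p in zip(y_true, p_pos):
        k = min(int(p * num_bins), num_bins - 1)   # p == 1.0 goes in the last bin
        sum_y[k] += y
        sum_p[k] += p
        counts[k] += 1
    reliabilities = [sum_y[k] / counts[k] if counts[k] else 0.0 for k in range(num_bins)]
    confidences = [sum_p[k] / counts[k] if counts[k] else 0.0 for k in range(num_bins)]
    bin_edges = [k / num_bins for k in range(num_bins + 1)]
    return reliabilities, confidences, bin_edges, counts

rel, conf, edges, counts = reliability_bins(
    [0, 0, 1, 1, 1, 0], [0.12, 0.18, 0.81, 0.88, 0.55, 0.52], num_bins=5)
```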

calzone.utils.removing_nan(y_true, y_predict, y_proba)[source]

Remove rows containing NaN values from input arrays.

Parameters:

y_true (array-like) – True labels.
y_predict (array-like) – Predicted labels.
y_proba (array-like) – Predicted probabilities.

Returns:

y_true (array-like) – Cleaned version of y_true with NaN rows removed.
y_predict (array-like) – Cleaned version of y_predict with NaN rows removed.
y_proba (array-like) – Cleaned version of y_proba with NaN rows removed.
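
As an illustration, the sketch below drops every sample for which any of the three inputs contains a NaN. That row-wise rule is an assumption about the intended behavior, and `remove_nan_rows` is a hypothetical name working on plain lists.

```python
import math

def remove_nan_rows(y_true, y_predict, y_proba):
    """Keep only samples with no NaN in the label, prediction or probabilities."""
    kept = [
        (t, y, row)
        for t, y, row in zip(y_true, y_predict, y_proba)
        if not (math.isnan(t) or math.isnan(y) or any(math.isnan(p) for p in row))
    ]
    return [k[0] for k in kept], [k[1] for k in kept], [k[2] for k in kept]

nan = float("nan")
clean_true, clean_pred, clean_proba = remove_nan_rows(
    [0, 1, 1], [0, nan, 1], [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
```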

calzone.utils.softmax_to_logits(probabilities, epsilon=1e-07)[source]

Convert softmax probabilities to logits.

Parameters:

probabilities (array-like) – Input probabilities.
epsilon (float) – Small value to avoid log(0). Default is 1e-7.

Returns:

numpy.ndarray – Computed logits.

calzone.utils.transform_topclass(probs, labels)[source]

Transforms the data into a top-class binary problem.

Parameters:

probs (numpy.ndarray) – Array of probability values
labels (numpy.ndarray) – Array of label values

Returns:

tuple – (transformed_probs, transformed_labels)
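
One common reading of a top-class transformation is sketched below: the new positive-class probability is the largest predicted probability, and the new label records whether that top class was correct. The output layout is an assumption about the library's behavior, and `top_class_sketch` is a hypothetical name.

```python
def top_class_sketch(probs, labels):
    """Reduce multi-class predictions to 'was the top class correct?' (sketch)."""
    new_probs, new_labels = [], []
    for row, label in zip(probs, labels):
        top = max(range(len(row)), key=row.__getitem__)   # index of the top class
        confidence = row[top]
        new_probs.append([1.0 - confidence, confidence])
        new_labels.append(1 if top == label else 0)
    return new_probs, new_labels

binary_probs, binary_labels = top_class_sketch(
    [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], [1, 2])
```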

calzone.vis module

Visualization functions for the Calibration Measure package.

calzone.vis.plot_reliability_diagram(reliabilities, confidences, bin_counts, bin_edges=None, line=True, error_bar=False, z=1.96, title='Reliability Diagram', save_path=None, return_fig=False, custom_colors=None, dpi=150)[source]

Plot a reliability diagram to visualize the calibration of a model.

Parameters:

reliabilities (array-like) – Empirical frequencies for each bin.
confidences (array-like) – Mean predicted probabilities for each bin.
bin_counts (array-like) – Number of samples in each bin.
bin_edges (array-like, optional) – Edges of the bins. If None, assumes equal-spaced bins.
line (bool, optional) – If True, plot lines connecting points. If False, plot as a bar chart. Defaults to True.
error_bar (bool, optional) – If True, add error bars to the plot. Defaults to False.
z (float, optional) – Z-score for calculating Wilson score interval. Defaults to 1.96.
title (str, optional) – Title of the plot. Defaults to ‘Reliability Diagram’.
save_path (str, optional) – Path to save the figure. If None, figure is not saved. Defaults to None.
return_fig (bool, optional) – If True, return the figure object. Defaults to False.
custom_colors (list, optional) – List of custom colors for multi-class plots. Defaults to None.
dpi (int, optional) – DPI for saving the figure. Defaults to 150.

Returns:

matplotlib.figure.Figure, optional – The figure object if return_fig is True.
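
The z parameter refers to the Wilson score interval for a binomial proportion. A minimal sketch of that interval for one bin follows; `wilson_interval` is a hypothetical helper, and how the plotting function turns the interval into error bars is not shown here.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for an observed proportion p_hat from n samples."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

lower, upper = wilson_interval(0.5, 10)   # bin with 5 positives out of 10
```

Unlike the normal approximation, the interval stays inside [0, 1] even for bins whose observed frequency is 0 or 1.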

calzone.vis.plot_roc_curve(fpr, tpr, roc_auc, class_to_plot=None, title='ROC Curve', save_path=None, dpi=150, return_fig=False)[source]

Plots the Receiver Operating Characteristic (ROC) curve.

Parameters:

fpr (array-like) – False Positive Rate values.
tpr (array-like) – True Positive Rate values.
roc_auc (float or array-like) – Area Under the ROC Curve (AUC) value(s).
class_to_plot (int, optional) – The class to plot. If None, plots all classes. Defaults to None.
title (str, optional) – Title of the plot. Defaults to ‘ROC Curve’.
save_path (str, optional) – Path to save the figure. If None, the figure is not saved. Defaults to None.
dpi (int, optional) – The resolution in dots per inch for saving the figure. Defaults to 150.
return_fig (bool, optional) – If True, returns the figure object instead of displaying it. Defaults to False.

Returns:

matplotlib.figure.Figure or None – The figure object if return_fig is True, otherwise None.

This function creates a matplotlib figure showing the ROC curve(s).
