Nonconformist

Conformal Prediction in Python

SweDS BTH 2015


Henrik Linusson, henrik.linusson@hb.se

Dept. of Information Technology, University of Borås

Overview

Conformal Prediction

  • What?
  • How?

Nonconformist

  • About
  • Classification
  • Regression

Conformal Prediction - What?

Framework for producing confidence predictions for test patterns \(x_1, ..., x_n\).

Predictions are multi-valued prediction regions, \(\Gamma_1^\epsilon, ..., \Gamma_n^\epsilon\).

Each prediction region contains true output \(y_i\) with probability \(1-\epsilon\).

Classification

Traditional classifier: \(\hat{y}_i = iris\_setosa\)

Conformal classifier: \(\Gamma_i^\epsilon = \left\{ iris\_setosa, iris\_versicolor \right\}\)

Regression

Traditional regressor: \(\hat{y}_i = 0.3\)

Conformal regressor: \(\Gamma_i^\epsilon = \left[ 0.1, 0.4 \right]\)

Conformal Prediction - What?

  • Requires exchangeability (identically distributed examples, not necessarily independent)
  • Confidence is calculated per instance (as opposed to per model)
  • Don't need to know priors
  • Can be applied to any machine learning algorithm

Applications

  • Classification
  • Regression
  • Anomaly detection
  • Change detection
  • ...

Nonconformist (1.2.5)

A Python implementation of the conformal prediction framework.

Vision

Provide a scikit-learn-compatible implementation of conformal prediction that covers the core framework as well as recent scientific advancements.

Available on GitHub and PyPi:

http://github.com/donlnz/nonconformist

% pip install nonconformist

API documentation:

http://donlnz.github.io/nonconformist/

Compatibility

Tested primarily with Python 2.7.x, but should be compatible with Python 3.x.

Requirements: numpy, scikit-learn

In [1]:
%matplotlib inline
%pylab inline
pylab.rcParams['figure.figsize'] = (10, 6)
import pandas
Populating the interactive namespace from numpy and matplotlib

In [2]:
# Normalize class enumeration; classes are 0, 1, 2, ...
def normalize_classes(ys):
    for i, c in enumerate(np.unique(ys)):
        ys[ys == c] = i
    return np.array(ys, dtype=int)

# Divide data set into training, calibration and test sets.
def get_indices(n):
    idx = np.random.permutation(n)
    train = idx[:int(n / 3)]
    cal = idx[int(n / 3):int(2 * n / 3)]
    test = idx[int(2 * n / 3):]
    return idx, train, cal, test
In [3]:
# Load data set, set up indices, normalize classes.
data_set = 'iris.csv'
ds = pandas.read_csv('dataset/csv_class/multi/' + data_set, header=None)
ds_x, ds_y = ds.values[:, :-1], ds.values[:, -1]

class_names = np.unique(ds_y)

# Set up indices for training, calibration and test (1/3 each)
idx, train, cal, test = get_indices(ds_y.size)

# Nonconformist only handles numerical classes, 0, 1, ...
ds_y = normalize_classes(ds_y)                 

Conformal Prediction - How?

We need three things:

  • Example data
  • A nonconformity function
  • A statistical test
In [4]:
'{}: {} instances, {} features, {} classes'.format(data_set,
                                                   ds_y.size,
                                                   ds_x.shape[1],
                                                   np.unique(ds_y).size)
Out[4]:
'iris.csv: 150 instances, 4 features, 3 classes'

Nonconformity function

Real-valued function that can rank examples according to how nonconforming (strange, atypical) they are.

Typical example:

  • Measure the error, E, of a classification model on the pattern \(x_i, y_i\)
  • If the error is small, then \(x_i, y_i\) is not strange
  • If the error is large, then \(x_i, y_i\) is strange
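The error-based score above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not nonconformist's own implementation: `inverse_probability_nc` is a hypothetical helper that assumes a probabilistic classifier's class-probability output.

```python
import numpy as np

def inverse_probability_nc(probs, y):
    """A simple nonconformity score: the model's error on (x_i, y_i),
    here 1 - P_hat(y_i | x_i). A high score means a strange example.

    `probs` is an (n, n_classes) array of class probabilities from any
    probabilistic classifier; `y` holds integer labels 0, 1, ...
    """
    return 1 - probs[np.arange(len(y)), y]
```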
In [5]:
from nonconformist.icp import IcpClassifier
from nonconformist.nc import ProbEstClassifierNc, margin
from sklearn.ensemble import RandomForestClassifier

RandomForestClassifier

is the underlying model, \(h\)

margin

is the error function, \[\Delta\left[y_i, h\left(x_i\right)\right] = 1 - \left(\hat{P}_h\left(y_i \mid x_i \right) - \max_{\tilde{y}\neq y_i} \hat{P}_h\left( \tilde{y} \mid x_i \right) \right)\]

ProbEstClassifierNc

is the nonconformity function (combines RF + margin)

Training Procedure - ICP

  1. Divide training data \(Z\) into
    1. Proper training set \(Z_t\)
    2. Calibration set \(Z_c\) where \(|Z_c| = q\)
  2. Define nonconformity function, \(\alpha_i = \Delta\left[ y_i, h(x_i) \right]\)
  3. Train underlying predictive model, \(h\), on \(Z_t\)
  4. Measure nonconformity of calibration examples, \[\left\{ \alpha_1, ..., \alpha_q \right\} = \left\{ \Delta\left[ y_j, h(x_j) \right], (x_j, y_j) \in Z_c \right\}\]
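The four steps above can be sketched from scratch with scikit-learn alone. This is a toy sketch on synthetic data, using the simple inverse-probability error rather than nonconformist's own error functions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy ICP training procedure (steps 1-4 above), without nonconformist.
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)

# Step 1: divide Z into proper training set Z_t and calibration set Z_c.
idx = rng.permutation(len(y))
train_idx, cal_idx = idx[:50], idx[50:]

# Step 3: train the underlying model h on Z_t.
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X[train_idx], y[train_idx])

# Steps 2 + 4: define the nonconformity function (here 1 - P_hat(y|x))
# and measure the nonconformity of the calibration examples.
probs = model.predict_proba(X[cal_idx])
alphas = 1 - probs[np.arange(len(cal_idx)), y[cal_idx]]
```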
In [6]:
train.size, cal.size, test.size
Out[6]:
(50, 50, 50)
In [7]:
# Define a nonconformity function (underlying model + error measure)
# and construct an ICP.

nc = ProbEstClassifierNc(RandomForestClassifier,
                         margin,
                         model_params={'n_estimators': 100})

icp = IcpClassifier(nc)
In [8]:
# Fit and calibrate ICP.

icp.fit(ds_x[train, :], ds_y[train])
icp.calibrate(ds_x[cal, :], ds_y[cal])

Prediction Procedure - ICP

  1. Obtain prediction, \(\hat{y}_i = h(x_i)\)
  2. Apply statistical test

\[p_i^{\tilde{y}} = \frac{\left|\left\{ \alpha_j \in \left\{\alpha_1, ..., \alpha_q\right\} : \alpha_j > \alpha_i^{\tilde{y}} \right\}\right|}{q + 1} + \theta \, \frac{\left|\left\{ \alpha_j \in \left\{\alpha_1, ..., \alpha_q\right\} : \alpha_j = \alpha_i^{\tilde{y}} \right\}\right| + 1}{q + 1}, \quad \theta \sim U[0,1]\]

\(\Gamma_i^\epsilon = \left\{ \tilde{y} \in Y : p_i^{\tilde{y}} > \epsilon \right\}\)

Classification

  • Let \(\alpha_i^\tilde{y} = \Delta\left[\tilde{y}, h(x_i)\right]\), where \(\tilde{y} \in Y\)
  • Find \(p_i^\tilde{y}\)

Regression

  • Let \(p_i^\tilde{y} = \epsilon\)
  • Find \(\tilde{y} = \hat{y}_i \pm \Delta'\left[\alpha_i^\tilde{y}\right]\)
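The statistical test for the classification case can be sketched from scratch as follows. These are hypothetical helpers for illustration; nonconformist's `IcpClassifier` performs the equivalent computation internally:

```python
import numpy as np

def p_value(cal_alphas, test_alpha, rng=np.random):
    """Smoothed conformal p-value for one (x_i, y~) pair (formula above)."""
    theta = rng.uniform()
    q = cal_alphas.size
    greater = np.sum(cal_alphas > test_alpha)
    equal = np.sum(cal_alphas == test_alpha)
    return (greater + theta * (equal + 1)) / (q + 1)

def prediction_set(cal_alphas, test_alphas, epsilon):
    """Gamma_i^epsilon: keep every label whose p-value exceeds epsilon.

    `test_alphas[k]` is the nonconformity of the test pattern under
    tentative label k."""
    return [k for k, a in enumerate(test_alphas)
            if p_value(cal_alphas, a) > epsilon]
```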
In [9]:
# Output p-values for 10 first test examples.
print(icp.predict(ds_x[test, :])[:10, :])
[[ 0.83143547  0.0123933   0.00104294]
 [ 0.01174884  0.32859891  0.01721744]
 [ 0.00332319  0.13980131  0.02827003]
 [ 0.01234043  0.00448942  0.78979513]
 [ 0.00576789  0.37220153  0.01853502]
 [ 0.02709644  0.05802177  0.07618165]
 [ 0.00929468  0.7973417   0.00971181]
 [ 0.00992861  0.0021417   0.35937582]
 [ 0.94852019  0.00299132  0.01757711]
 [ 0.0134875   0.2112158   0.00295814]]

In [10]:
# Output prediction sets for 10 first test examples.
print(icp.predict(ds_x[test, :], significance=0.05)[:10, :])
[[ True False False]
 [False  True False]
 [False  True False]
 [False False  True]
 [False  True False]
 [False  True  True]
 [False  True False]
 [False False  True]
 [ True False False]
 [False  True False]]

In [11]:
def plot_predictions_class(significance):
    fig, ax = plt.subplots(1, significance.size, sharey=True)
    for i in range(significance.size):
        predictions = icp.predict(ds_x[test, :], significance=significance[i])
        n, c = predictions.shape
        
        colors = ['c', 'm', 'b', 'y', 'r', 'g', 'b', 'k']
        
        for c_ in range(c):
            idx_y = np.array([x for x in range(n) if predictions[x, c_]])
            idx_x = [c_] * idx_y.size
            ax[i].scatter(idx_x, idx_y, color=colors[c_])

        idx_multi = np.array([x for x in range(n) if np.sum(predictions[x, :]) > 1])
        for idx_multi_ in idx_multi:
            ax[i].plot(np.linspace(-1, c, 100), [idx_multi_] * 100, '--', color='k')
        
        ax[i].axis([-.5, c - 0.5, -.5, n + .5])
        ax[i].set_xlabel('Class')
        ax[i].set_ylabel('Test instance #')
        ax[i].set_xticks(range(c))
        ax[i].set_title(r'$\epsilon = {}$'.format(significance[i]))
        ax[i].set_xticklabels(class_names)
    plt.show()
In [12]:
plot_predictions_class(significance=np.array([0.1, 0.05, 0.01]))
In [13]:
from nonconformist.evaluation import ClassIcpCvHelper
from nonconformist.evaluation import cross_val_score
from nonconformist.evaluation import class_one_c, class_avg_c, class_mean_errors
In [14]:
# Construct a cross-validation helper object that wraps the ICP,
# making it compatible with nonconformist's evaluation tools.
cv = ClassIcpCvHelper(icp)
In [15]:
# Perform cross-validation using the cross_val_score method.
score = cross_val_score(cv,
                        ds_x[idx, :],
                        ds_y[idx],
                        scoring_funcs=[class_mean_errors,
                                       class_avg_c,
                                       class_one_c])

score = score.drop(['fold', 'iter'], axis=1)
score[score.significance <= 0.1].groupby(['significance']).mean()
Out[15]:
class_avg_c class_mean_errors class_one_c
significance
0.01 2.329333 0.016667 0.118667
0.02 1.681333 0.028000 0.442000
0.03 1.116000 0.035333 0.902667
0.04 1.064667 0.045333 0.925333
0.05 1.024667 0.054667 0.936667
0.06 0.990000 0.062667 0.953333
0.07 0.974000 0.070000 0.948000
0.08 0.952667 0.082000 0.942000
0.09 0.936000 0.091333 0.930667
0.10 0.924000 0.100000 0.924000

10 rows × 3 columns

In [16]:
avgc = score.groupby(['significance']).mean()['class_avg_c'].values
errs = score.groupby(['significance']).mean()['class_mean_errors'].values
In [17]:
# Plot errors.
def plot_errors_class():   
    lin = np.linspace(0.01, 0.99, 99)
    fig, axs = plt.subplots(1, 2, figsize=(16,6))
    axs[0].plot(lin, errs, color='r')
    axs[0].plot(lin, lin, '--', color='k')
    
    axs[1].plot(lin[:25], errs[:25], color='r')
    axs[1].plot(lin[:25], lin[:25], '--', color='k')
    
    for ax in axs:
        ax.legend([data_set, 'Expected Errors'], loc=2)
        ax.set_xlabel('Significance')
        ax.set_ylabel('Error rate')
        
    axs[0].set_title(r'{} error rates, $\epsilon \in (0, 1)$'.format(data_set))
    axs[1].set_title(r'{} error rates, $\epsilon \in (0, 0.25]$'.format(data_set))
    plt.show()
In [18]:
plot_errors_class()
In [19]:
# Plot prediction region sizes.
def plot_sizes_class():
    lin = np.linspace(0.01, 0.99, 99)
    fig, axs = plt.subplots(1, 2, figsize=(16,6))
    axs[0].plot(lin, avgc, color='r')
    axs[0].set_ybound(0, np.unique(ds_y).size + 0.5)
    
    axs[1].plot(lin[:25], avgc[:25], color='r')
    axs[1].set_xbound(0, 0.25)
    axs[1].set_ybound(0, np.unique(ds_y).size + 0.5)
    
    for ax in axs:
        ax.set_xlabel('Significance')
        ax.legend([data_set])
        ax.set_ylabel('Mean #classes')
    
    axs[0].set_title(r'{} mean prediction set size, $\epsilon \in (0, 1)$'.format(data_set))
    axs[1].set_title(r'{} mean prediction set size, $\epsilon \in (0, 0.25]$'.format(data_set))
    plt.show()
In [20]:
plot_sizes_class()
In [21]:
# Load data set, set up indices, normalize classes.
data_set = 'stock.csv'
ds = pandas.read_csv('dataset/csv_reg/' + data_set, header=None)
ds_x, ds_y = ds.values[:, :-1], ds.values[:, -1]

# Set up indices for training, calibration and test (1/3 each)
idx, train, cal, test = get_indices(ds_y.size)
In [22]:
'{}: {} instances, {} features'.format(data_set,
                                       ds_y.size,
                                       ds_x.shape[1])
Out[22]:
'stock.csv: 950 instances, 9 features'
In [23]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from nonconformist.icp import IcpRegressor
from nonconformist.nc import RegressorNc, NormalizedRegressorNc
from nonconformist.nc import abs_error, abs_error_inv
In [24]:
# Default IcpRegressor, same interval size for each test pattern
nc = RegressorNc(RandomForestRegressor,
                 abs_error,
                 abs_error_inv,
                 model_params={'n_estimators': 100})
icp = IcpRegressor(nc)
In [25]:
# Normalized IcpRegressor, estimates difficulty of each test pattern
nc_norm = NormalizedRegressorNc(RandomForestRegressor,
                                KNeighborsRegressor,
                                abs_error,
                                abs_error_inv,
                                model_params={'n_estimators': 100})
icp_norm = IcpRegressor(nc_norm)
In [26]:
icp.fit(ds_x[train, :], ds_y[train])
icp_norm.fit(ds_x[train, :], ds_y[train])

icp.calibrate(ds_x[cal, :], ds_y[cal])
icp_norm.calibrate(ds_x[cal, :], ds_y[cal])
In [27]:
def plot_reg_predictions(significance):
    fig, ax = plt.subplots(2, 1, sharey=True, sharex=True)
    
    pred = icp.predict(ds_x[test, :], significance=significance)
    pred_norm = icp_norm.predict(ds_x[test, :], significance=significance)
    
    underlying_pred = np.mean(pred, axis=1)
    underlying_pred_norm = np.mean(pred_norm, axis=1)
    
    pred_size = (pred[:, 1] - pred[:, 0]) / 2
    pred_size_norm = (pred_norm[:, 1] - pred_norm[:, 0]) / 2
    
    idx_ = np.argsort(underlying_pred)
    idx_norm_ = np.argsort(underlying_pred_norm)
    
    underlying_pred = underlying_pred[idx_]
    underlying_pred_norm = underlying_pred_norm[idx_norm_]
    
    pred_size = pred_size[idx_]
    pred_size_norm = pred_size_norm[idx_norm_]
    
    ax[0].plot(underlying_pred - pred_size, color='c')
    ax[0].plot(underlying_pred + pred_size, color='m')
    ax[0].plot(underlying_pred, color='b')
    ax[0].scatter(range(test.size), ds_y[test[idx_]], color='k', s=1)
    
    ax[1].plot(underlying_pred_norm - pred_size_norm, color='c')
    ax[1].plot(underlying_pred_norm + pred_size_norm, color='m')
    ax[1].scatter(range(test.size), ds_y[test[idx_]], color='k', s=1)
    ax[1].plot(underlying_pred_norm, color='b')
    
    ax[0].axis([0, test.size, pred.min(), pred.max()])
    ax[1].axis([0, test.size, pred_norm.min(), pred_norm.max()])
    
    ax[0].legend([r'$min$', r'$max$', r'$\hat{y}$', r'$y$'], loc=2)
    ax[1].legend([r'$min$', r'$max$', r'$\hat{y}$', r'$y$'], loc=2)
    ax[0].set_title(r'{}, Non-normalized, $\epsilon = {}$'.format(data_set,
                                                                  significance))
    ax[1].set_title(r'{}, Normalized, $\epsilon = {}$'.format(data_set,
                                                              significance))
    plt.show()
In [28]:
plot_reg_predictions(0.01)
In [35]:
from nonconformist.evaluation import RegIcpCvHelper, run_experiment
from nonconformist.evaluation import reg_mean_errors, reg_mean_size
from os import listdir

csv_folder = 'dataset/csv_reg/'
csv_files = [csv_folder + f for f in listdir(csv_folder)]

result = run_experiment(RegIcpCvHelper(icp_norm),
                        csv_files,
                        folds=2,
                        iterations=1,
                        scoring_funcs=[reg_mean_errors,
                                       reg_mean_size])

print(result.groupby(['data_set', 'significance']).mean())
result.to_csv('results.csv')

# Dev: result = run_experiment([model_1, model_2, ...], .. )
                          fold  iter  reg_mean_errors  reg_mean_size
data_set    significance                                            
cooling.csv 0.01           0.5     0         0.026052       6.427263
            0.02           0.5     0         0.026052       6.427263
            0.03           0.5     0         0.031274       6.114432
            0.04           0.5     0         0.032579       5.988957
            0.05           0.5     0         0.054728       5.794245
            0.06           0.5     0         0.070391       5.345505
            0.07           0.5     0         0.070391       5.345505
            0.08           0.5     0         0.105605       4.997145
            0.09           0.5     0         0.114723       4.925896
            0.10           0.5     0         0.132972       4.780487
            0.11           0.5     0         0.156464       4.605108
            0.12           0.5     0         0.178606       4.454894
            0.13           0.5     0         0.189037       4.383452
            0.14           0.5     0         0.196853       4.310160
            0.15           0.5     0         0.208582       4.227798
            0.16           0.5     0         0.211189       4.183726
            0.17           0.5     0         0.224234       4.090384
            0.18           0.5     0         0.254222       3.895567
            0.19           0.5     0         0.264666       3.849651
            0.20           0.5     0         0.273798       3.796729
            0.21           0.5     0         0.276402       3.778029
            0.22           0.5     0         0.277708       3.769471
            0.23           0.5     0         0.285540       3.731766
            0.24           0.5     0         0.322029       3.495056
            0.25           0.5     0         0.329842       3.448385
            0.26           0.5     0         0.336362       3.422681
            0.27           0.5     0         0.337665       3.397482
            0.28           0.5     0         0.358549       3.289697
            0.29           0.5     0         0.362465       3.261847
            0.30           0.5     0         0.375486       3.192264
            0.31           0.5     0         0.378094       3.160819
            0.32           0.5     0         0.400277       3.094361
            0.33           0.5     0         0.415912       2.994736
            0.34           0.5     0         0.415912       2.994736
            0.35           0.5     0         0.417214       2.971319
            0.36           0.5     0         0.422422       2.933933
            0.37           0.5     0         0.430255       2.864927
            0.38           0.5     0         0.438085       2.813662
            0.39           0.5     0         0.439387       2.806111
            0.40           0.5     0         0.470640       2.667253
            0.41           0.5     0         0.473251       2.645087
            0.42           0.5     0         0.508489       2.478000
            0.43           0.5     0         0.518913       2.442657
            0.44           0.5     0         0.526739       2.396870
            0.45           0.5     0         0.543693       2.336573
            0.46           0.5     0         0.551512       2.288062
            0.47           0.5     0         0.552814       2.263000
            0.48           0.5     0         0.555422       2.223962
            0.49           0.5     0         0.568443       2.149684
            0.50           0.5     0         0.584078       2.105297
            0.51           0.5     0         0.604935       1.969819
            0.52           0.5     0         0.611459       1.930066
            0.53           0.5     0         0.621893       1.839068
            0.54           0.5     0         0.628410       1.804753
            0.55           0.5     0         0.641431       1.749919
            0.56           0.5     0         0.650566       1.695733
            0.57           0.5     0         0.657080       1.649113
            0.58           0.5     0         0.660986       1.630941
            0.59           0.5     0         0.674030       1.569752
            0.60           0.5     0         0.679239       1.552703
                           ...   ...              ...            ...

[297 rows x 4 columns]

Other Features

  • Conditional Conformal Prediction
  • Aggregated Conformal Prediction

Future Work

  • Improve experimentation toolbox
  • Interpolated \(p\)-values
  • Exchangeability testing (change detection)
  • Transductive Conformal Prediction

Questions?