Evaluation 11 min read

Conformal Prediction: Rigorous Uncertainty Quantification Without Bayesian Assumptions

A distribution-free framework that gives a coverage guarantee: the true label will be in the prediction set at least 100(1-α)% of the time, for any model and any data distribution, as long as calibration and test data are exchangeable. Split conformal from scratch, Mondrian CP for group-conditional coverage, and how to use prediction set size as a real-time uncertainty signal in production.

Bayesian uncertainty requires a prior and a likelihood model. Calibration, as measured by ECE, is a frequentist check that predicted probabilities match observed frequencies on average. Conformal prediction is neither: it is a distribution-free framework that gives you a coverage guarantee. The true label will be in the prediction set with probability at least 1-α, regardless of the model or data distribution. The only assumption is that calibration and test data are exchangeable.

The Core Idea

Instead of outputting a single prediction, output a set of predictions that is guaranteed to contain the true label with probability at least 1-α. The set is built from a threshold computed on a calibration dataset that the model was not trained on.
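Stated a little more formally, as a sketch in notation that the article itself does not use: let s(x, y) be the nonconformity score, let s_1, ..., s_n be the scores of the n calibration examples at their true labels, and let (X_{n+1}, Y_{n+1}) be a new test point. The upper bound in the last line additionally assumes the scores have no ties.

```latex
\hat{q} \;=\; \text{the } \lceil (n+1)(1-\alpha) \rceil\text{-th smallest value among } s_1, \dots, s_n
\quad (\hat{q} = +\infty \text{ if } \lceil (n+1)(1-\alpha) \rceil > n)

C(x) \;=\; \{\, y \;:\; s(x, y) \le \hat{q} \,\}

1 - \alpha \;\le\; \mathbb{P}\big( Y_{n+1} \in C(X_{n+1}) \big) \;\le\; 1 - \alpha + \frac{1}{n+1}
```

The lower bound is the coverage guarantee. The upper bound says the sets are not needlessly large: with a few thousand calibration points, coverage sits within a fraction of a percent of the target.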

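# Split Conformal Prediction (the standard approach)
# Step 1: fit the model on training data
# Step 2: compute nonconformity scores on a held-out calibration set
#         nonconformity score = 1 - softmax_prob[true_label]  (for classification)
# Step 3: take a finite-sample-corrected quantile of those scores as the threshold

import numpy as np

def split_conformal(cal_scores, alpha=0.1):
    n = len(cal_scores)
    # Finite-sample-corrected level: ceil((n + 1) * (1 - alpha)) / n
    q_level = np.ceil((1 - alpha) * (n + 1)) / n
    if q_level > 1:
        # Too few calibration points for this alpha: every class must be included
        return np.inf
    # method="higher" (NumPy >= 1.22) returns an actual calibration score rather
    # than an interpolated value below it, so the threshold stays conservative
    q_hat = np.quantile(cal_scores, q_level, method="higher")
    return q_hat

# At test time: include all classes where 1 - softmax_prob[class] <= q_hat
def predict_set(softmax_probs, q_hat):
    scores = 1 - np.asarray(softmax_probs)
    return np.where(scores <= q_hat)[0]  # indices of included classes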

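The coverage guarantee: if calibration data and test data are exchangeable (for example, i.i.d. draws from the same distribution), the prediction set contains the true label with probability >= 1-α. This is a finite-sample guarantee, not an asymptotic one. It holds for any model, any loss function, any data distribution. The probability is marginal: it averages over the random draw of both the calibration set and the test point.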
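One way to build trust in the guarantee is to check it empirically on synthetic data. The sketch below is an illustration, not a benchmark: `conformal_threshold` restates the split conformal threshold so the block is self-contained, and `simulate_coverage`, the noisy "model", and all parameter values are assumptions made up for this example. The model is deliberately miscalibrated, since the guarantee should not depend on model quality.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample-corrected quantile of the calibration scores."""
    n = len(cal_scores)
    q_level = np.ceil((1 - alpha) * (n + 1)) / n
    if q_level > 1:
        return np.inf  # too few calibration points: include every class
    return np.quantile(cal_scores, q_level, method="higher")

def simulate_coverage(n_cal=1000, n_test=5000, n_classes=5, alpha=0.1, seed=0):
    """Return (empirical coverage, mean set size) on synthetic i.i.d. data."""
    rng = np.random.default_rng(seed)
    n = n_cal + n_test

    # Ground truth: labels sampled from softmax(2 * true_logits) (Gumbel-max trick)
    true_logits = rng.normal(size=(n, n_classes))
    labels = np.argmax(2.0 * true_logits + rng.gumbel(size=(n, n_classes)), axis=1)

    # Hypothetical imperfect model: it only sees a noisy copy of the logits
    model_logits = true_logits + rng.normal(scale=1.0, size=(n, n_classes))
    exp = np.exp(model_logits - model_logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)

    scores = 1.0 - probs                          # score of every candidate class
    true_scores = scores[np.arange(n), labels]    # score at the true label
    q_hat = conformal_threshold(true_scores[:n_cal], alpha)

    in_set = scores[n_cal:] <= q_hat              # boolean prediction sets
    coverage = in_set[np.arange(n_test), labels[n_cal:]].mean()
    mean_set_size = in_set.sum(axis=1).mean()
    return coverage, mean_set_size

if __name__ == "__main__":
    coverage, mean_set_size = simulate_coverage()
    print(f"empirical coverage: {coverage:.3f}, mean set size: {mean_set_size:.2f}")
```

Empirical coverage on a finite test set fluctuates around the target, so a single run landing slightly under 1-α does not contradict the guarantee. Making the model noisier should leave coverage near 1-α and show up as larger sets instead.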

Why This Beats Softmax for Uncertainty

Mondrian Conformal Prediction: Conditional Coverage

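Standard conformal gives marginal coverage: averaged over all inputs, coverage is at least 1-α. But you might want coverage to hold separately for each class, demographic group, or input type. Mondrian conformal splits the calibration set by group and computes a separate threshold per group, so the guarantee holds within each group. The cost is that every group needs enough calibration examples of its own, and small groups get more conservative thresholds.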

This matters in production when coverage needs to be uniform across groups: a medical diagnosis system must reach 95% coverage for every patient demographic, not just on average. Standard conformal might give 99% coverage for the majority group and 88% for the minority group while averaging 95%.
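The per-group thresholding can be sketched in a few lines. This is a minimal illustration under stated assumptions: `conformal_threshold`, `mondrian_thresholds`, and `predict_set_for_group` are hypothetical helper names, the group of each example is assumed to be known at both calibration and test time, and falling back to the full label set for an unseen group is one possible policy among several.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample-corrected quantile of the calibration scores."""
    n = len(cal_scores)
    q_level = np.ceil((1 - alpha) * (n + 1)) / n
    if q_level > 1:
        return np.inf  # too few calibration points: include every class
    return np.quantile(cal_scores, q_level, method="higher")

def mondrian_thresholds(cal_scores, cal_groups, alpha=0.1):
    """One threshold per group, each computed from that group's scores only."""
    cal_scores = np.asarray(cal_scores)
    cal_groups = np.asarray(cal_groups)
    return {
        g: conformal_threshold(cal_scores[cal_groups == g], alpha)
        for g in np.unique(cal_groups).tolist()
    }

def predict_set_for_group(softmax_probs, group, thresholds):
    """Prediction set for one input, using the threshold of its group."""
    # Assumed policy: a group never seen in calibration gets the full label set
    q_hat = thresholds.get(group, np.inf)
    scores = 1 - np.asarray(softmax_probs)
    return np.where(scores <= q_hat)[0]

if __name__ == "__main__":
    # Toy calibration data: the model is far less confident on group "b"
    cal_scores = np.concatenate([np.linspace(0.01, 0.20, 20),
                                 np.linspace(0.41, 0.60, 20)])
    cal_groups = np.array(["a"] * 20 + ["b"] * 20)
    thresholds = mondrian_thresholds(cal_scores, cal_groups, alpha=0.1)
    probs = [0.50, 0.45, 0.05]
    for g in ("a", "b"):
        print(g, thresholds[g], predict_set_for_group(probs, g, thresholds))
```

In the toy data, the same softmax output produces different sets depending on the group, because the threshold for the group with higher calibration scores is looser. A single pooled threshold would land somewhere between the two, which is how the over-coverage and under-coverage pattern described above arises.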

When to Use It in Production
