pyuncertainnumber.calibration

Submodules

Classes

TMCMC

Class for TMCMC implementation

Stage

Container for one TMCMC stage.

MCMCCalibrator

Calibration via Bayesian MCMC (e.g. Metropolis-Hastings, HMC, NUTS).

KNNCalibrator

Unified kNN-based calibrator for black-box models or precomputed simulations.

EpistemicFilter

The EpistemicFilter method to reduce the epistemic uncertainty space based on discrepancy scores.

Package Contents

class pyuncertainnumber.calibration.TMCMC(N: int = None, parameters: list = None, names: list[str] = None, log_likelihood: callable = None, mutation_steps: int = 5, status_file_name: str = None, trace=None)

Class for TMCMC implementation

Parameters:
  • N (int) – number of particles to be sampled from posterior

  • parameters (list) – list of (size Nop) prior distributions instances

  • log_likelihood (callable) – log likelihood function on \(\theta\) to be defined which is problem specific

  • status_file_name (str) – name of the status file to store status of the tmcmc sampling

Returns:

returns trace file of all samples of all tmcmc stages.

At stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]

Return type:

mytrace

Example

>>> from pyuncertainnumber.calibration.tmcmc import TMCMC
>>> from pyuncertainnumber import pba
>>> # create an instance of TMCMC class and run tmcmc
>>> parameters = [pba.D("uniform", (0.8, 2.2)), pba.D("uniform", (0.4, 1.2))] # dummy prior distributions
>>> t = TMCMC(
>>>     N=1000,                                             # number of particles
>>>     parameters=parameters,                              # prior distributions
>>>     names =['k1', 'k2'],                                # parameter names
>>>     log_likelihood=log_likelihood_function,             # user defined log likelihood function
>>>     mutation_steps=1,                                   # number of MCMC steps for perturbation
>>>     status_file_name='tmcmc_progressing_status.txt',    # status log file
>>> )
>>> mytrace = t.run()
>>> # for loading trace file
>>> import pickle
>>> with open('tmcmc_trace.pkl', 'rb') as f:
>>>     mytrace = pickle.load(f)
>>> re = TMCMC(trace=mytrace, names=names)                  # create TMCMC instance with loaded trace and names
>>> re.plot_updated_distribution(save=False)                # plot prior and posterior distribution
>>> prior_samples = re.load_prior_samples()                 # load prior samples as DataFrame
>>> posterior_samples = re.load_posterior_samples()         # load posterior samples as DataFrame
2DOF TMCMC example

Example of TMCMC calibration of a 2-DOF system

N = None
parameters = None
names = None
log_likelihood = None
mutation_steps = 5
status_file_name = None
trace = None
check_names()
run() list[Stage]

Run the TMCMC algorithm

Returns:

returns trace file of all samples of all tmcmc stages.

Return type:

mytrace

check_trace()
save_trace(mytrace, file_name: str, save_dir=None)

Save the trace to a file

Parameters:
  • mytrace – trace file of all samples of all tmcmc stages.

  • file_name (str) – name of the file to save the trace

load_prior_samples(trace=None) pandas.DataFrame
load_posterior_samples(trace=None) pandas.DataFrame
plot_updated_distribution(trace=None, save=False)

Plot the prior and posterior distribution of the parameters

Parameters:
  • trace – trace file of all samples of all tmcmc stages.

  • save (bool) – if True, save the figure

class pyuncertainnumber.calibration.Stage

Container for one TMCMC stage.

Parameters:
  • Sm (NDArray) – samples before resampling (N x Np)

  • Lm (NDArray) – log-likelihoods at Sm (N,)

  • Wm_n (NDArray) – normalized incremental weights (N,)

  • ESS (Optional[float]) – effective sample size the next stage (can be None in terminal stage)

  • beta (float) – tempering parameter next stage beta

  • Smcap (Optional[NDArray]) – resampled samples (N x Np), None in terminal stage

Sm: numpy.typing.NDArray
Lm: numpy.typing.NDArray
Wm_n: numpy.typing.NDArray
ESS: float | None
beta: float
Smcap: numpy.typing.NDArray | None
__repr__() str
property samples: numpy.typing.NDArray

Samples before resampling (from previous stage).

property log_likelihoods: numpy.typing.NDArray

Log-likelihood evaluated at samples.

property weights: numpy.typing.NDArray

Normalized incremental importance weights.

property effective_sample_size: float | None

Effective sample size after reweighting the next stage.

property tempering_parameter: float

Tempering parameter beta next stage beta.

property resampled_samples: numpy.typing.NDArray | None

Samples after importance resampling (before MH mutation).

class pyuncertainnumber.calibration.MCMCCalibrator(n_chains: int = 4, n_samples: int = 1000, burn_in: int = 200)

Bases: Calibrator

Calibration via Bayesian MCMC (e.g. Metropolis-Hastings, HMC, NUTS).

n_chains = 4
n_samples = 1000
burn_in = 200
_prior = None
_likelihood = None
_posterior_chain = None
setup(prior=None, likelihood=None, model=None)

Define priors and likelihood (or simulator-based likelihood).

calibrate(observations: Any, resample_n: int | None = None) Any

Run MCMC to sample posterior given observations.

get_posterior() Any

Return MCMC chain or posterior samples.

class pyuncertainnumber.calibration.KNNCalibrator(knn: int = 100, a_tol: float = 0.05, evaluate_model: bool = False)

Bases: Calibrator

Unified kNN-based calibrator for black-box models or precomputed simulations.

Parameters:
  • knn (int) – Number of neighbors per observed row. Default: 100.

  • a_tol (float) – Tolerance for matching simulated \(\xi\) to a requested \(\xi^*\) (when reusing). A simulation is kept if \(\|\xi_{\text{sim}} - \xi^*\|_\infty < a_{\text{tol}}\). Default: 0.05.

  • evaluate_model (bool) – If True, call the black-box model for each \(\xi\) in xi_list on a shared \(\theta\) grid. If False, reuse simulated_data (requires y/theta/xi arrays). Default: False.

  • random_state (int) – Seed for reproducibility (affects theta_sampler and resampling). Default: 42.

Note

Setup (unified approach):

  • If evaluate_model=False and simulated_data is provided:

    • Reuse pre-computed simulations

    • Build a per-design kNN index by filtering rows with \(\|\xi - \xi^*\|_\infty < a_{\text{tol}}\) for each \(\xi^*\) in xi_list

  • If evaluate_model=True:

    • Simulate \(y = \text{model}(\theta, \xi)\) for each \(\xi\) in xi_list

    • Use a shared \(\theta\) grid drawn once from theta_sampler(n_samples)

    • Build per-design kNN indices on this shared grid

Calibration workflow (single/multi-design):

For each observation pair \((y_{\text{obs}}, \xi)\):

  1. Standardize \(y_{\text{obs}}\) using the per-design scaler

  2. Find k nearest neighbors in y-space

  3. Map neighbor indices to \(\theta\) values for that design

  4. Stack \(\theta\) samples across all observations/designs (or apply voting/intersection)

knn_illustration

KNN calibration illustration.

knn = 100
a_tol
evaluate_model = False
random_state = 42
_theta_grid: numpy.ndarray | None = None
_theta_by_xi: Dict[Tuple[float, Ellipsis], numpy.ndarray]
_y_by_xi: Dict[Tuple[float, Ellipsis], numpy.ndarray]
_scaler_by_xi: Dict[Tuple[float, Ellipsis], sklearn.preprocessing.StandardScaler]
_neigh_by_xi: Dict[Tuple[float, Ellipsis], sklearn.neighbors.NearestNeighbors]
_grid_idx_by_xi: Dict[Tuple[float, Ellipsis], numpy.ndarray]
_posterior: Dict[str, Any] | None = None
_sim_y: numpy.ndarray | None = None
_sim_theta: numpy.ndarray | None = None
_sim_xi: numpy.ndarray | None = None
static _key_from_xi(xi) Tuple[float, Ellipsis]

Stable tuple key for a scalar/vector design ξ.

setup(model: Callable[[numpy.ndarray, float | numpy.ndarray], numpy.ndarray] | None = None, theta_sampler: Callable[[int], numpy.ndarray] | None = None, simulated_data: Dict[str, numpy.ndarray] | None = None, xi_list: List[float | numpy.ndarray] | None = None, n_samples: int = 10000)

Prepare per-design kNN structures by either reusing simulated_data or simulating for each design.

Parameters:
  • model (callable, optional) – Black-box simulator with signature model(theta, xi) -> y (vectorized over theta).

  • theta_sampler (callable, optional) – Sampler for \(\theta\); required when evaluate_model=True.

  • simulated_data (dict, optional) – Dict with keys {“y”: (n, dy), “theta”: (n, dθ), “xi”: (n, dξ)} when reusing sims.

  • xi_list (list, optional) – List of designs; each item can be scalar or array-like. If None, defaults to [0.0].

  • n_samples (int) – Number of \(\theta\) samples to draw when evaluate_model=True. Default: 10000.

nearest(y: numpy.ndarray | List[float], xi: float | numpy.ndarray, k: int | None = None, return_dist: bool = False)

Return k nearest neighbors for y at design xi.

Parameters:
  • y (array-like) – Query outputs, shape (m, d_y) or (d_y,).

  • xi (scalar or array-like) – Design key to select the per-design index.

  • k (int, optional) – Number of neighbors; defaults to self.knn.

  • return_dist (bool) – If True, also return distances and raw indices. Default: False.

Returns:

Shape (m*k, dθ) stacked \(\theta\) for all query rows. distances (ndarray, optional): Returned if return_dist=True. indices (ndarray, optional): Returned if return_dist=True.

Return type:

theta_neighbors (ndarray)

calibrate(observations, resample_n: int | None = None, combine: str = 'stack', combine_params: dict | None = None)

Run kNN calibration and aggregate posterior θ across neighbor-hit blocks.

Parameters:
  • observations – Observed simulator or model outputs to calibrate against.

  • resample_n (int | None) – If set, resample posterior θ samples to this size. If None, return all aggregated θ without resampling.

  • combine (str) –

    Aggregation mode. One of:

    • ’stack’: concatenate all kNN θ; optional de-duplication.

    • ’intersect’: retain θ hit at least min_count times across neighbor blocks.

  • combine_params (dict | None) –

    Optional parameters controlling aggregation and KDE weighting.

    Supported keys:

    • dedup (bool): Default False. Remove duplicate θ (only for ‘stack’).

    • theta_match_tol (float): Default 1e-9. Tolerance or rounding quantum for comparing/merging θ values.

    • min_count (int | None): Minimum occurrences for ‘intersect’. Default is max(1, ceil(0.5 * total_blocks)), meaning θ must appear in about half of neighbor lists.

    • use_kde (bool): Default False. If True, fit KDE on aggregated θ to compute log-scores and normalized weights.

    • kde_bandwidth (float | None): Bandwidth for KDE. If None (default), use Scott’s rule.

Tip

Two aggregation modes are supported:

  • stack: Concatenate all kNN θ into a single array. Supports optional de-duplication of nearly identical θ values.

  • intersect: Keep θ values that occur in at least min_count neighbor blocks across all observations/design points (default ≈ half of all blocks).

Optional density weighting via KDE can be applied after aggregation to compute normalized posterior weights.

Returns:

A dictionary with keys:
  • ’mode’ (str): Always ‘knn’.

  • ’theta’ (ndarray): Posterior samples of shape (N, dθ); resampled if resample_n is provided.

  • ’weights’ (ndarray | None): None for stack/intersect, or a length-N array of KDE weights if use_kde=True.

  • ’meta’ (dict): Aggregation info; may include KDE bandwidth if density weighting is used.

Return type:

dict

_round_rows(A: numpy.ndarray, tol: float) tuple[numpy.ndarray, numpy.ndarray]

Round rows of A to multiples of tol and return (unique_rows, counts).

Parameters:
  • A (ndarray) – Input array to process.

  • tol (float) – Tolerance for rounding. If tol <= 0, exact matching is used.

Returns:

(unique_rows, counts) where unique_rows are the deduplicated rows

and counts are the occurrence counts.

Return type:

tuple

_kde_logweights(X, bw=0.5, n_max_exact=5000)

Compute KDE-based log-weights for posterior samples X.

Parameters:
  • X (ndarray) – Posterior samples, shape (n, d).

  • bw (float) – Bandwidth for Gaussian kernel. Default: 0.5.

  • n_max_exact (int) – Max n for exact pairwise KDE. Above this, fall back to sklearn.KernelDensity. Default: 5000.

Returns:

  • logp (ndarray): Log-density values at X, shape (n,).

  • w (ndarray): Normalized weights, shape (n,).

Return type:

tuple

get_posterior() Any

Return the last computed posterior dict; raises if calibrate() hasn’t been called.

class pyuncertainnumber.calibration.EpistemicFilter(xe_samples: numpy.typing.NDArray, discrepancy_scores: numpy.typing.NDArray = None, sets_of_discrepancy: list = None)

The EpistemicFilter method to reduce the epistemic uncertainty space based on discrepancy scores.

Parameters:
  • xe_samples (NDArray) – Proposed Samples of epistemic parameters, shape (ne, n_dimensions). Typically samples from a bounded set of some epistemic parameters.

  • discrepancy_scores (NDArray, optional) – Discrepancy scores between the model simulations and the observation. Associated with each xe sample, shape (ne,). Defaults to None.

  • sets_of_discrepancy (list, optional) – List of sets of discrepancy scores for multiple datasets. Each element should be an NDArray of shape (ne,). Defaults to None.

Tip

For performance functions that output multiple responses, some aggregation of discrepancy scores may be used. Depending on the number of observation, either a single set of discrepancy scores or multiple sets can be provided.

convex hull with bounds

Convex hull with bounds illustration.

xe_samples
discrepancy_scores = None
sets_of_discrepancy
filter(threshold: float | list)

Filter the epistemic samples based on a discrepancy threshold.

Parameters:

threshold (float | list) – The discrepancy threshold(s) for filtering data points.

Returns:

  • the filtered xe samples;

  • hull;

  • lower bounds;

  • upper bounds of the bounding box, or (None, None) if unsuccessful.

Return type:

tuple

iterative_filter(thresholds: list) list

Iteratively filter the epistemic samples based on multiple thresholds.

Parameters:
  • thresholds (list) – List of discrepancy thresholds for filtering data points.

  • simulation_model (callable, optional) – A function that takes xe_samples as input and returns discrepancy scores. Required if re_sample is True. Defaults to None.

Note

thresholds are assumed to be sorted in ascending order.

iterative_filter_w_simulation_model(thresholds: list, re_sample: bool = False, simulation_model=None)

Iteratively filter the epistemic samples based on multiple thresholds.

Parameters:
  • thresholds (list) – List of discrepancy thresholds for filtering data points.

  • re_sample (bool, optional) – If True, re-evaluate discrepancy scores at each iteration using the simulation_model. Defaults to False.

  • simulation_model (callable, optional) – A function that takes xe_samples as input and returns discrepancy scores. Required if re_sample is True. Defaults to None.

Note

if re_sample is True, the simulation_model should be provided to re-evaluate discrepancy scores.

filter_on_sets(plot_hulls: bool = True)

Filter the epistemic samples based on multiple sets of discrepancy scores.

Parameters:

plot_hulls (bool, optional) – If True, plots the convex hulls of the first five sets. Defaults to True.

Returns:

  • lower bounds;

  • upper bounds of the intersected bounding box, or (None, None) if unsuccessful

Return type:

tuple

Note

self.sets_of_discrepancy must exist.

plot_hull_with_bounds(filtered_xe, hull=None, ax=None, show=True, x_title=None, y_title=None, hull_alpha=0.25)

Plot the convex hull and bounding box of the epistemic samples.

Parameters:
  • hull (ConvexHull, optional) – Precomputed convex hull. If None, it is computed.

  • ax (matplotlib Axes, optional) – Existing axes to draw on. If None, a new figure/axes is created.

  • show (bool, optional) – If True, calls plt.show() at the end. Defaults to True.

  • x_title (str, optional) – Label for the x-axis. Defaults to None.

  • y_title (str, optional) – Label for the y-axis. Defaults to None.

  • hull_alpha (float, optional) – Transparency of the hull surface. Defaults to 0.25.

Returns:

The axes containing the plot.

Return type:

ax (matplotlib Axes)