pyuncertainnumber.calibration¶

Submodules¶

Classes¶

`TMCMC`	Class for TMCMC implementation
`Stage`	Container for one TMCMC stage.
`MCMCCalibrator`	Calibration via Bayesian MCMC (e.g. Metropolis-Hastings, HMC, NUTS).
`KNNCalibrator`	Unified kNN-based calibrator for black-box models or precomputed simulations.
`EpistemicFilter`	The EpistemicFilter method to reduce the epistemic uncertainty space based on discrepancy scores.

Package Contents¶

class pyuncertainnumber.calibration.TMCMC(N: int = None, parameters: list = None, names: list[str] = None, log_likelihood: callable = None, mutation_steps: int = 5, status_file_name: str = None, trace=None)¶

Class for TMCMC implementation

Parameters:

N (int) – number of particles to be sampled from posterior
parameters (list) – list of (size Nop) prior distributions instances
log_likelihood (callable) – log likelihood function on \(\theta\) to be defined which is problem specific
status_file_name (str) – name of the status file to store status of the tmcmc sampling

Returns:

returns trace file of all samples of all tmcmc stages.: At stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]

Return type:

mytrace

Example

>>> from pyuncertainnumber.calibration.tmcmc import TMCMC
>>> from pyuncertainnumber import pba
>>> # create an instance of TMCMC class and run tmcmc
>>> parameters = [pba.D("uniform", (0.8, 2.2)), pba.D("uniform", (0.4, 1.2))] # dummy prior distributions

>>> t = TMCMC(
>>>     N=1000,                                             # number of particles
>>>     parameters=parameters,                              # prior distributions
>>>     names =['k1', 'k2'],                                # parameter names
>>>     log_likelihood=log_likelihood_function,             # user defined log likelihood function
>>>     mutation_steps=1,                                   # number of MCMC steps for perturbation
>>>     status_file_name='tmcmc_progressing_status.txt',    # status log file
>>> )
>>> mytrace = t.run()

>>> # for loading trace file
>>> import pickle
>>> with open('tmcmc_trace.pkl', 'rb') as f:
>>>     mytrace = pickle.load(f)
>>> re = TMCMC(trace=mytrace, names=names)                  # create TMCMC instance with loaded trace and names

>>> re.plot_updated_distribution(save=False)                # plot prior and posterior distribution
>>> prior_samples = re.load_prior_samples()                 # load prior samples as DataFrame
>>> posterior_samples = re.load_posterior_samples()         # load posterior samples as DataFrame

2DOF TMCMC example — Example of TMCMC calibration of a 2-DOF system¶

N = None¶

parameters = None¶

names = None¶

log_likelihood = None¶

mutation_steps = 5¶

status_file_name = None¶

trace = None¶

check_names()¶

run() → list[Stage]¶

Run the TMCMC algorithm

Returns:: returns trace file of all samples of all tmcmc stages.
Return type:: mytrace

check_trace()¶

save_trace(mytrace, file_name: str, save_dir=None)¶

Save the trace to a file

Parameters:

mytrace – trace file of all samples of all tmcmc stages.
file_name (str) – name of the file to save the trace

load_prior_samples(trace=None) → pandas.DataFrame¶

load_posterior_samples(trace=None) → pandas.DataFrame¶

plot_updated_distribution(trace=None, save=False)¶

Plot the prior and posterior distribution of the parameters

Parameters:

trace – trace file of all samples of all tmcmc stages.
save (bool) – if True, save the figure

class pyuncertainnumber.calibration.Stage¶

Container for one TMCMC stage.

Parameters:

Sm (NDArray) – samples before resampling (N x Np)
Lm (NDArray) – log-likelihoods at Sm (N,)
Wm_n (NDArray) – normalized incremental weights (N,)
ESS (Optional[float]) – effective sample size the next stage (can be None in terminal stage)
beta (float) – tempering parameter next stage beta
Smcap (Optional[NDArray]) – resampled samples (N x Np), None in terminal stage

Sm: numpy.typing.NDArray¶

Lm: numpy.typing.NDArray¶

Wm_n: numpy.typing.NDArray¶

ESS: float | None¶

beta: float¶

Smcap: numpy.typing.NDArray | None¶

__repr__() → str¶

property samples: numpy.typing.NDArray¶: Samples before resampling (from previous stage).

property log_likelihoods: numpy.typing.NDArray¶: Log-likelihood evaluated at samples.

property weights: numpy.typing.NDArray¶: Normalized incremental importance weights.

property effective_sample_size: float | None¶: Effective sample size after reweighting the next stage.

property tempering_parameter: float¶: Tempering parameter beta next stage beta.

property resampled_samples: numpy.typing.NDArray | None¶: Samples after importance resampling (before MH mutation).

class pyuncertainnumber.calibration.MCMCCalibrator(n_chains: int = 4, n_samples: int = 1000, burn_in: int = 200)¶

Bases: Calibrator

Calibration via Bayesian MCMC (e.g. Metropolis-Hastings, HMC, NUTS).

n_chains = 4¶

n_samples = 1000¶

burn_in = 200¶

_prior = None¶

_likelihood = None¶

_posterior_chain = None¶

setup(prior=None, likelihood=None, model=None)¶: Define priors and likelihood (or simulator-based likelihood).

calibrate(observations: Any, resample_n: int | None = None) → Any¶: Run MCMC to sample posterior given observations.

get_posterior() → Any¶: Return MCMC chain or posterior samples.

class pyuncertainnumber.calibration.KNNCalibrator(knn: int = 100, a_tol: float = 0.05, evaluate_model: bool = False)¶

Bases: Calibrator

Unified kNN-based calibrator for black-box models or precomputed simulations.

Parameters:

knn (int) – Number of neighbors per observed row. Default: 100.
a_tol (float) – Tolerance for matching simulated \(\xi\) to a requested \(\xi^*\) (when reusing). A simulation is kept if \(\|\xi_{\text{sim}} - \xi^*\|_\infty < a_{\text{tol}}\). Default: 0.05.
evaluate_model (bool) – If True, call the black-box model for each \(\xi\) in xi_list on a shared \(\theta\) grid. If False, reuse simulated_data (requires y/theta/xi arrays). Default: False.
random_state (int) – Seed for reproducibility (affects theta_sampler and resampling). Default: 42.

Note

Setup (unified approach):

If evaluate_model=False and simulated_data is provided:
- Reuse pre-computed simulations
- Build a per-design kNN index by filtering rows with \(\|\xi - \xi^*\|_\infty < a_{\text{tol}}\) for each \(\xi^*\) in xi_list
If evaluate_model=True:
- Simulate \(y = \text{model}(\theta, \xi)\) for each \(\xi\) in xi_list
- Use a shared \(\theta\) grid drawn once from theta_sampler(n_samples)
- Build per-design kNN indices on this shared grid

Calibration workflow (single/multi-design):

For each observation pair \((y_{\text{obs}}, \xi)\):

Standardize \(y_{\text{obs}}\) using the per-design scaler
Find k nearest neighbors in y-space
Map neighbor indices to \(\theta\) values for that design
Stack \(\theta\) samples across all observations/designs (or apply voting/intersection)

knn_illustration — KNN calibration illustration.¶

knn = 100¶

a_tol¶

evaluate_model = False¶

random_state = 42¶

_theta_grid: numpy.ndarray | None = None¶

_theta_by_xi: Dict[Tuple[float, Ellipsis], numpy.ndarray]¶

_y_by_xi: Dict[Tuple[float, Ellipsis], numpy.ndarray]¶

_scaler_by_xi: Dict[Tuple[float, Ellipsis], sklearn.preprocessing.StandardScaler]¶

_neigh_by_xi: Dict[Tuple[float, Ellipsis], sklearn.neighbors.NearestNeighbors]¶

_grid_idx_by_xi: Dict[Tuple[float, Ellipsis], numpy.ndarray]¶

_posterior: Dict[str, Any] | None = None¶

_sim_y: numpy.ndarray | None = None¶

_sim_theta: numpy.ndarray | None = None¶

_sim_xi: numpy.ndarray | None = None¶

static _key_from_xi(xi) → Tuple[float, Ellipsis]¶: Stable tuple key for a scalar/vector design ξ.

Prepare per-design kNN structures by either reusing simulated_data or simulating for each design.

Parameters:

model (callable, optional) – Black-box simulator with signature model(theta, xi) -> y (vectorized over theta).
theta_sampler (callable, optional) – Sampler for \(\theta\); required when evaluate_model=True.
simulated_data (dict, optional) – Dict with keys {“y”: (n, dy), “theta”: (n, dθ), “xi”: (n, dξ)} when reusing sims.
xi_list (list, optional) – List of designs; each item can be scalar or array-like. If None, defaults to [0.0].
n_samples (int) – Number of \(\theta\) samples to draw when evaluate_model=True. Default: 10000.

nearest(y: numpy.ndarray | List[float], xi: float | numpy.ndarray, k: int | None = None, return_dist: bool = False)¶

Return k nearest neighbors for y at design xi.

Parameters:

y (array-like) – Query outputs, shape (m, d_y) or (d_y,).
xi (scalar or array-like) – Design key to select the per-design index.
k (int, optional) – Number of neighbors; defaults to self.knn.
return_dist (bool) – If True, also return distances and raw indices. Default: False.

Returns:

Shape (m*k, dθ) stacked \(\theta\) for all query rows. distances (ndarray, optional): Returned if return_dist=True. indices (ndarray, optional): Returned if return_dist=True.

Return type:

theta_neighbors (ndarray)

calibrate(observations, resample_n: int | None = None, combine: str = 'stack', combine_params: dict | None = None)¶

Run kNN calibration and aggregate posterior θ across neighbor-hit blocks.

Parameters:

observations – Observed simulator or model outputs to calibrate against.
resample_n (int | None) – If set, resample posterior θ samples to this size. If None, return all aggregated θ without resampling.
combine (str) –
Aggregation mode. One of:
- ’stack’: concatenate all kNN θ; optional de-duplication.
- ’intersect’: retain θ hit at least min_count times across neighbor blocks.
combine_params (dict | None) –
Optional parameters controlling aggregation and KDE weighting.

Supported keys:
- dedup (bool): Default False. Remove duplicate θ (only for ‘stack’).
- theta_match_tol (float): Default 1e-9. Tolerance or rounding quantum for comparing/merging θ values.
- min_count (int | None): Minimum occurrences for ‘intersect’. Default is max(1, ceil(0.5 * total_blocks)), meaning θ must appear in about half of neighbor lists.
- use_kde (bool): Default False. If True, fit KDE on aggregated θ to compute log-scores and normalized weights.
- kde_bandwidth (float | None): Bandwidth for KDE. If None (default), use Scott’s rule.

Tip

Two aggregation modes are supported:

stack: Concatenate all kNN θ into a single array. Supports optional de-duplication of nearly identical θ values.
intersect: Keep θ values that occur in at least min_count neighbor blocks across all observations/design points (default ≈ half of all blocks).

Optional density weighting via KDE can be applied after aggregation to compute normalized posterior weights.

Returns:

A dictionary with keys:

’mode’ (str): Always ‘knn’.
’theta’ (ndarray): Posterior samples of shape (N, dθ); resampled if resample_n is provided.
’weights’ (ndarray | None): None for stack/intersect, or a length-N array of KDE weights if use_kde=True.
’meta’ (dict): Aggregation info; may include KDE bandwidth if density weighting is used.

Return type:

dict

_round_rows(A: numpy.ndarray, tol: float) → tuple[numpy.ndarray, numpy.ndarray]¶

Round rows of A to multiples of tol and return (unique_rows, counts).

Parameters:

A (ndarray) – Input array to process.
tol (float) – Tolerance for rounding. If tol <= 0, exact matching is used.

Returns:

(unique_rows, counts) where unique_rows are the deduplicated rows: and counts are the occurrence counts.

Return type:

tuple

_kde_logweights(X, bw=0.5, n_max_exact=5000)¶

Compute KDE-based log-weights for posterior samples X.

Parameters:

X (ndarray) – Posterior samples, shape (n, d).
bw (float) – Bandwidth for Gaussian kernel. Default: 0.5.
n_max_exact (int) – Max n for exact pairwise KDE. Above this, fall back to sklearn.KernelDensity. Default: 5000.

Returns:

logp (ndarray): Log-density values at X, shape (n,).
w (ndarray): Normalized weights, shape (n,).

Return type:

tuple

get_posterior() → Any¶: Return the last computed posterior dict; raises if calibrate() hasn’t been called.

class pyuncertainnumber.calibration.EpistemicFilter(xe_samples: numpy.typing.NDArray, discrepancy_scores: numpy.typing.NDArray = None, sets_of_discrepancy: list = None)¶

The EpistemicFilter method to reduce the epistemic uncertainty space based on discrepancy scores.

Parameters:

xe_samples (NDArray) – Proposed Samples of epistemic parameters, shape (ne, n_dimensions). Typically samples from a bounded set of some epistemic parameters.
discrepancy_scores (NDArray, optional) – Discrepancy scores between the model simulations and the observation. Associated with each xe sample, shape (ne,). Defaults to None.
sets_of_discrepancy (list, optional) – List of sets of discrepancy scores for multiple datasets. Each element should be an NDArray of shape (ne,). Defaults to None.

Tip

For performance functions that output multiple responses, some aggregation of discrepancy scores may be used. Depending on the number of observation, either a single set of discrepancy scores or multiple sets can be provided.

convex hull with bounds — Convex hull with bounds illustration.¶

xe_samples¶

discrepancy_scores = None¶

sets_of_discrepancy¶

filter(threshold: float | list)¶

Filter the epistemic samples based on a discrepancy threshold.

Parameters:

threshold (float | list) – The discrepancy threshold(s) for filtering data points.

Returns:

the filtered xe samples;
hull;
lower bounds;
upper bounds of the bounding box, or (None, None) if unsuccessful.

Return type:

tuple

iterative_filter(thresholds: list) → list¶

Iteratively filter the epistemic samples based on multiple thresholds.

Parameters:

thresholds (list) – List of discrepancy thresholds for filtering data points.
simulation_model (callable, optional) – A function that takes xe_samples as input and returns discrepancy scores. Required if re_sample is True. Defaults to None.

Note

thresholds are assumed to be sorted in ascending order.

iterative_filter_w_simulation_model(thresholds: list, re_sample: bool = False, simulation_model=None)¶

Iteratively filter the epistemic samples based on multiple thresholds.

Parameters:

thresholds (list) – List of discrepancy thresholds for filtering data points.
re_sample (bool, optional) – If True, re-evaluate discrepancy scores at each iteration using the simulation_model. Defaults to False.
simulation_model (callable, optional) – A function that takes xe_samples as input and returns discrepancy scores. Required if re_sample is True. Defaults to None.

Note

if re_sample is True, the simulation_model should be provided to re-evaluate discrepancy scores.

filter_on_sets(plot_hulls: bool = True)¶

Filter the epistemic samples based on multiple sets of discrepancy scores.

Parameters:

plot_hulls (bool, optional) – If True, plots the convex hulls of the first five sets. Defaults to True.

Returns:

lower bounds;
upper bounds of the intersected bounding box, or (None, None) if unsuccessful

Return type:

tuple

Note

self.sets_of_discrepancy must exist.

plot_hull_with_bounds(filtered_xe, hull=None, ax=None, show=True, x_title=None, y_title=None, hull_alpha=0.25)¶

Plot the convex hull and bounding box of the epistemic samples.

Parameters:

hull (ConvexHull, optional) – Precomputed convex hull. If None, it is computed.
ax (matplotlib Axes, optional) – Existing axes to draw on. If None, a new figure/axes is created.
show (bool, optional) – If True, calls plt.show() at the end. Defaults to True.
x_title (str, optional) – Label for the x-axis. Defaults to None.
y_title (str, optional) – Label for the y-axis. Defaults to None.
hull_alpha (float, optional) – Transparency of the hull surface. Defaults to 0.25.

Returns:

The axes containing the plot.

Return type:

ax (matplotlib Axes)