pyuncertainnumber.calibration ============================= .. py:module:: pyuncertainnumber.calibration Submodules ---------- .. toctree:: :maxdepth: 1 /autoapi/pyuncertainnumber/calibration/calibration/index /autoapi/pyuncertainnumber/calibration/data_peeling/index /autoapi/pyuncertainnumber/calibration/epistemic_filter/index /autoapi/pyuncertainnumber/calibration/iterative_filtering/index /autoapi/pyuncertainnumber/calibration/knn/index /autoapi/pyuncertainnumber/calibration/mcmc/index /autoapi/pyuncertainnumber/calibration/pdfs/index /autoapi/pyuncertainnumber/calibration/tmcmc/index /autoapi/pyuncertainnumber/calibration/utils_bayesian/index Classes ------- .. autoapisummary:: pyuncertainnumber.calibration.TMCMC pyuncertainnumber.calibration.Stage pyuncertainnumber.calibration.MCMCCalibrator pyuncertainnumber.calibration.KNNCalibrator pyuncertainnumber.calibration.EpistemicFilter Package Contents ---------------- .. py:class:: TMCMC(N: int = None, parameters: list = None, names: list[str] = None, log_likelihood: callable = None, mutation_steps: int = 5, status_file_name: str = None, trace=None) Class for TMCMC implementation :param N: number of particles to be sampled from posterior :type N: int :param parameters: list of (size Nop) prior distributions instances :type parameters: list :param log_likelihood: log likelihood function on :math:`\theta` to be defined which is problem specific :type log_likelihood: callable :param status_file_name: name of the status file to store status of the tmcmc sampling :type status_file_name: str :returns: returns trace file of all samples of all tmcmc stages. At stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap] :rtype: mytrace .. rubric:: Example >>> from pyuncertainnumber.calibration.tmcmc import TMCMC >>> from pyuncertainnumber import pba >>> # create an instance of TMCMC class and run tmcmc >>> parameters = [pba.D("uniform", (0.8, 2.2)), pba.D("uniform", (0.4, 1.2))] # dummy prior distributions >>> t = TMCMC( >>> N=1000, # number of particles >>> parameters=parameters, # prior distributions >>> names =['k1', 'k2'], # parameter names >>> log_likelihood=log_likelihood_function, # user defined log likelihood function >>> mutation_steps=1, # number of MCMC steps for perturbation >>> status_file_name='tmcmc_progressing_status.txt', # status log file >>> ) >>> mytrace = t.run() >>> # for loading trace file >>> import pickle >>> with open('tmcmc_trace.pkl', 'rb') as f: >>> mytrace = pickle.load(f) >>> re = TMCMC(trace=mytrace, names=names) # create TMCMC instance with loaded trace and names >>> re.plot_updated_distribution(save=False) # plot prior and posterior distribution >>> prior_samples = re.load_prior_samples() # load prior samples as DataFrame >>> posterior_samples = re.load_posterior_samples() # load posterior samples as DataFrame .. figure:: /_static/tmcmc_2dof.png :alt: 2DOF TMCMC example :align: center :width: 70% Example of TMCMC calibration of a 2-DOF system .. py:attribute:: N :value: None .. py:attribute:: parameters :value: None .. py:attribute:: names :value: None .. py:attribute:: log_likelihood :value: None .. py:attribute:: mutation_steps :value: 5 .. py:attribute:: status_file_name :value: None .. py:attribute:: trace :value: None .. py:method:: check_names() .. py:method:: run() -> list[Stage] Run the TMCMC algorithm :returns: returns trace file of all samples of all tmcmc stages. :rtype: mytrace .. py:method:: check_trace() .. py:method:: save_trace(mytrace, file_name: str, save_dir=None) Save the trace to a file :param mytrace: trace file of all samples of all tmcmc stages. :param file_name: name of the file to save the trace :type file_name: str .. py:method:: load_prior_samples(trace=None) -> pandas.DataFrame .. py:method:: load_posterior_samples(trace=None) -> pandas.DataFrame .. py:method:: plot_updated_distribution(trace=None, save=False) Plot the prior and posterior distribution of the parameters :param trace: trace file of all samples of all tmcmc stages. :param save: if True, save the figure :type save: bool .. py:class:: Stage Container for one TMCMC stage. :param Sm: samples before resampling (N x Np) :type Sm: NDArray :param Lm: log-likelihoods at Sm (N,) :type Lm: NDArray :param Wm_n: normalized incremental weights (N,) :type Wm_n: NDArray :param ESS: effective sample size the next stage (can be None in terminal stage) :type ESS: Optional[float] :param beta: tempering parameter next stage beta :type beta: float :param Smcap: resampled samples (N x Np), None in terminal stage :type Smcap: Optional[NDArray] .. py:attribute:: Sm :type: numpy.typing.NDArray .. py:attribute:: Lm :type: numpy.typing.NDArray .. py:attribute:: Wm_n :type: numpy.typing.NDArray .. py:attribute:: ESS :type: Optional[float] .. py:attribute:: beta :type: float .. py:attribute:: Smcap :type: Optional[numpy.typing.NDArray] .. py:method:: __repr__() -> str .. py:property:: samples :type: numpy.typing.NDArray Samples before resampling (from previous stage). .. py:property:: log_likelihoods :type: numpy.typing.NDArray Log-likelihood evaluated at `samples`. .. py:property:: weights :type: numpy.typing.NDArray Normalized incremental importance weights. .. py:property:: effective_sample_size :type: Optional[float] Effective sample size after reweighting the next stage. .. py:property:: tempering_parameter :type: float Tempering parameter beta next stage beta. .. py:property:: resampled_samples :type: Optional[numpy.typing.NDArray] Samples after importance resampling (before MH mutation). .. py:class:: MCMCCalibrator(n_chains: int = 4, n_samples: int = 1000, burn_in: int = 200) Bases: :py:obj:`Calibrator` Calibration via Bayesian MCMC (e.g. Metropolis-Hastings, HMC, NUTS). .. py:attribute:: n_chains :value: 4 .. py:attribute:: n_samples :value: 1000 .. py:attribute:: burn_in :value: 200 .. py:attribute:: _prior :value: None .. py:attribute:: _likelihood :value: None .. py:attribute:: _posterior_chain :value: None .. py:method:: setup(prior=None, likelihood=None, model=None) Define priors and likelihood (or simulator-based likelihood). .. py:method:: calibrate(observations: Any, resample_n: Optional[int] = None) -> Any Run MCMC to sample posterior given observations. .. py:method:: get_posterior() -> Any Return MCMC chain or posterior samples. .. py:class:: KNNCalibrator(knn: int = 100, a_tol: float = 0.05, evaluate_model: bool = False) Bases: :py:obj:`Calibrator` Unified kNN-based calibrator for black-box models or precomputed simulations. :param knn: Number of neighbors per observed row. Default: 100. :type knn: int :param a_tol: Tolerance for matching simulated :math:`\xi` to a requested :math:`\xi^*` (when reusing). A simulation is kept if :math:`\|\xi_{\text{sim}} - \xi^*\|_\infty < a_{\text{tol}}`. Default: 0.05. :type a_tol: float :param evaluate_model: If True, call the black-box model for each :math:`\xi` in ``xi_list`` on a shared :math:`\theta` grid. If False, reuse ``simulated_data`` (requires y/theta/xi arrays). Default: False. :type evaluate_model: bool :param random_state: Seed for reproducibility (affects theta_sampler and resampling). Default: 42. :type random_state: int .. note:: **Setup (unified approach)**: - If ``evaluate_model=False`` and ``simulated_data`` is provided: * Reuse pre-computed simulations * Build a per-design kNN index by filtering rows with :math:`\|\xi - \xi^*\|_\infty < a_{\text{tol}}` for each :math:`\xi^*` in ``xi_list`` - If ``evaluate_model=True``: * Simulate :math:`y = \text{model}(\theta, \xi)` for each :math:`\xi` in ``xi_list`` * Use a shared :math:`\theta` grid drawn once from ``theta_sampler(n_samples)`` * Build per-design kNN indices on this shared grid **Calibration workflow (single/multi-design)**: For each observation pair :math:`(y_{\text{obs}}, \xi)`: 1. Standardize :math:`y_{\text{obs}}` using the per-design scaler 2. Find k nearest neighbors in y-space 3. Map neighbor indices to :math:`\theta` values for that design 4. Stack :math:`\theta` samples across all observations/designs (or apply voting/intersection) .. figure:: /_static/knn_illustration.png :alt: knn_illustration :align: center :width: 50% KNN calibration illustration. .. py:attribute:: knn :value: 100 .. py:attribute:: a_tol .. py:attribute:: evaluate_model :value: False .. py:attribute:: random_state :value: 42 .. py:attribute:: _theta_grid :type: Optional[numpy.ndarray] :value: None .. py:attribute:: _theta_by_xi :type: Dict[Tuple[float, Ellipsis], numpy.ndarray] .. py:attribute:: _y_by_xi :type: Dict[Tuple[float, Ellipsis], numpy.ndarray] .. py:attribute:: _scaler_by_xi :type: Dict[Tuple[float, Ellipsis], sklearn.preprocessing.StandardScaler] .. py:attribute:: _neigh_by_xi :type: Dict[Tuple[float, Ellipsis], sklearn.neighbors.NearestNeighbors] .. py:attribute:: _grid_idx_by_xi :type: Dict[Tuple[float, Ellipsis], numpy.ndarray] .. py:attribute:: _posterior :type: Optional[Dict[str, Any]] :value: None .. py:attribute:: _sim_y :type: Optional[numpy.ndarray] :value: None .. py:attribute:: _sim_theta :type: Optional[numpy.ndarray] :value: None .. py:attribute:: _sim_xi :type: Optional[numpy.ndarray] :value: None .. py:method:: _key_from_xi(xi) -> Tuple[float, Ellipsis] :staticmethod: Stable tuple key for a scalar/vector design ξ. .. py:method:: setup(model: Optional[Callable[[numpy.ndarray, Union[float, numpy.ndarray]], numpy.ndarray]] = None, theta_sampler: Optional[Callable[[int], numpy.ndarray]] = None, simulated_data: Optional[Dict[str, numpy.ndarray]] = None, xi_list: Optional[List[Union[float, numpy.ndarray]]] = None, n_samples: int = 10000) Prepare per-design kNN structures by either reusing simulated_data or simulating for each design. :param model: Black-box simulator with signature ``model(theta, xi) -> y`` (vectorized over theta). :type model: callable, optional :param theta_sampler: Sampler for :math:`\theta`; required when ``evaluate_model=True``. :type theta_sampler: callable, optional :param simulated_data: Dict with keys {"y": (n, dy), "theta": (n, dθ), "xi": (n, dξ)} when reusing sims. :type simulated_data: dict, optional :param xi_list: List of designs; each item can be scalar or array-like. If None, defaults to [0.0]. :type xi_list: list, optional :param n_samples: Number of :math:`\theta` samples to draw when ``evaluate_model=True``. Default: 10000. :type n_samples: int .. py:method:: nearest(y: Union[numpy.ndarray, List[float]], xi: Union[float, numpy.ndarray], k: Optional[int] = None, return_dist: bool = False) Return k nearest neighbors for y at design xi. :param y: Query outputs, shape (m, d_y) or (d_y,). :type y: array-like :param xi: Design key to select the per-design index. :type xi: scalar or array-like :param k: Number of neighbors; defaults to ``self.knn``. :type k: int, optional :param return_dist: If True, also return distances and raw indices. Default: False. :type return_dist: bool :returns: Shape (m*k, dθ) stacked :math:`\theta` for all query rows. distances (ndarray, optional): Returned if ``return_dist=True``. indices (ndarray, optional): Returned if ``return_dist=True``. :rtype: theta_neighbors (ndarray) .. py:method:: calibrate(observations, resample_n: int | None = None, combine: str = 'stack', combine_params: dict | None = None) Run kNN calibration and aggregate posterior θ across neighbor-hit blocks. :param observations: Observed simulator or model outputs to calibrate against. :param resample_n: If set, resample posterior θ samples to this size. If `None`, return all aggregated θ without resampling. :type resample_n: int | None :param combine: Aggregation mode. One of: - **'stack'**: concatenate all kNN θ; optional de-duplication. - **'intersect'**: retain θ hit at least `min_count` times across neighbor blocks. :type combine: str :param combine_params: Optional parameters controlling aggregation and KDE weighting. Supported keys: - **dedup** (bool): Default `False`. Remove duplicate θ (only for 'stack'). - **theta_match_tol** (float): Default `1e-9`. Tolerance or rounding quantum for comparing/merging θ values. - **min_count** (int | None): Minimum occurrences for 'intersect'. Default is `max(1, ceil(0.5 * total_blocks))`, meaning θ must appear in about half of neighbor lists. - **use_kde** (bool): Default `False`. If `True`, fit KDE on aggregated θ to compute log-scores and normalized weights. - **kde_bandwidth** (float | None): Bandwidth for KDE. If `None` (default), use Scott's rule. :type combine_params: dict | None .. tip:: Two aggregation modes are supported: - **stack**: Concatenate all kNN θ into a single array. Supports optional de-duplication of nearly identical θ values. - **intersect**: Keep θ values that occur in at least `min_count` neighbor blocks across all observations/design points (default ≈ half of all blocks). Optional density weighting via KDE can be applied after aggregation to compute normalized posterior weights. :returns: A dictionary with keys: - **'mode'** (str): Always `'knn'`. - **'theta'** (ndarray): Posterior samples of shape `(N, dθ)`; resampled if `resample_n` is provided. - **'weights'** (ndarray | None): `None` for stack/intersect, or a length-`N` array of KDE weights if `use_kde=True`. - **'meta'** (dict): Aggregation info; may include KDE bandwidth if density weighting is used. :rtype: dict .. py:method:: _round_rows(A: numpy.ndarray, tol: float) -> tuple[numpy.ndarray, numpy.ndarray] Round rows of A to multiples of tol and return (unique_rows, counts). :param A: Input array to process. :type A: ndarray :param tol: Tolerance for rounding. If tol <= 0, exact matching is used. :type tol: float :returns: (unique_rows, counts) where unique_rows are the deduplicated rows and counts are the occurrence counts. :rtype: tuple .. py:method:: _kde_logweights(X, bw=0.5, n_max_exact=5000) Compute KDE-based log-weights for posterior samples X. :param X: Posterior samples, shape (n, d). :type X: ndarray :param bw: Bandwidth for Gaussian kernel. Default: 0.5. :type bw: float :param n_max_exact: Max n for exact pairwise KDE. Above this, fall back to sklearn.KernelDensity. Default: 5000. :type n_max_exact: int :returns: - **logp** (ndarray): Log-density values at X, shape (n,). - **w** (ndarray): Normalized weights, shape (n,). :rtype: tuple .. py:method:: get_posterior() -> Any Return the last computed posterior dict; raises if calibrate() hasn't been called. .. py:class:: EpistemicFilter(xe_samples: numpy.typing.NDArray, discrepancy_scores: numpy.typing.NDArray = None, sets_of_discrepancy: list = None) The EpistemicFilter method to reduce the epistemic uncertainty space based on discrepancy scores. :param xe_samples: Proposed Samples of epistemic parameters, shape (ne, n_dimensions). Typically samples from a bounded set of some epistemic parameters. :type xe_samples: NDArray :param discrepancy_scores: Discrepancy scores between the model simulations and the observation. Associated with each xe sample, shape (ne,). Defaults to None. :type discrepancy_scores: NDArray, optional :param sets_of_discrepancy: List of sets of discrepancy scores for multiple datasets. Each element should be an NDArray of shape (ne,). Defaults to None. :type sets_of_discrepancy: list, optional .. tip:: For performance functions that output multiple responses, some aggregation of discrepancy scores may be used. Depending on the number of observation, either a single set of discrepancy scores or multiple sets can be provided. .. figure:: /_static/convex_hull.png :alt: convex hull with bounds :align: center :width: 50% Convex hull with bounds illustration. .. py:attribute:: xe_samples .. py:attribute:: discrepancy_scores :value: None .. py:attribute:: sets_of_discrepancy .. py:method:: filter(threshold: float | list) Filter the epistemic samples based on a discrepancy threshold. :param threshold: The discrepancy threshold(s) for filtering data points. :type threshold: float | list :returns: - the filtered xe samples; - hull; - lower bounds; - upper bounds of the bounding box, or (None, None) if unsuccessful. :rtype: tuple .. py:method:: iterative_filter(thresholds: list) -> list Iteratively filter the epistemic samples based on multiple thresholds. :param thresholds: List of discrepancy thresholds for filtering data points. :type thresholds: list :param simulation_model: A function that takes xe_samples as input and returns discrepancy scores. Required if re_sample is True. Defaults to None. :type simulation_model: callable, optional .. note:: thresholds are assumed to be sorted in ascending order. .. py:method:: iterative_filter_w_simulation_model(thresholds: list, re_sample: bool = False, simulation_model=None) Iteratively filter the epistemic samples based on multiple thresholds. :param thresholds: List of discrepancy thresholds for filtering data points. :type thresholds: list :param re_sample: If True, re-evaluate discrepancy scores at each iteration using the simulation_model. Defaults to False. :type re_sample: bool, optional :param simulation_model: A function that takes xe_samples as input and returns discrepancy scores. Required if re_sample is True. Defaults to None. :type simulation_model: callable, optional .. note:: if re_sample is True, the simulation_model should be provided to re-evaluate discrepancy scores. .. py:method:: filter_on_sets(plot_hulls: bool = True) Filter the epistemic samples based on multiple sets of discrepancy scores. :param plot_hulls: If True, plots the convex hulls of the first five sets. Defaults to True. :type plot_hulls: bool, optional :returns: - lower bounds; - upper bounds of the intersected bounding box, or (None, None) if unsuccessful :rtype: tuple .. note:: `self.sets_of_discrepancy` must exist. .. py:method:: plot_hull_with_bounds(filtered_xe, hull=None, ax=None, show=True, x_title=None, y_title=None, hull_alpha=0.25) Plot the convex hull and bounding box of the epistemic samples. :param hull: Precomputed convex hull. If None, it is computed. :type hull: ConvexHull, optional :param ax: Existing axes to draw on. If None, a new figure/axes is created. :type ax: matplotlib Axes, optional :param show: If True, calls plt.show() at the end. Defaults to True. :type show: bool, optional :param x_title: Label for the x-axis. Defaults to None. :type x_title: str, optional :param y_title: Label for the y-axis. Defaults to None. :type y_title: str, optional :param hull_alpha: Transparency of the hull surface. Defaults to 0.25. :type hull_alpha: float, optional :returns: The axes containing the plot. :rtype: ax (matplotlib Axes)