pyuncertainnumber.calibration
=============================

.. py:module:: pyuncertainnumber.calibration


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/pyuncertainnumber/calibration/calibration/index
   /autoapi/pyuncertainnumber/calibration/data_peeling/index
   /autoapi/pyuncertainnumber/calibration/epistemic_filter/index
   /autoapi/pyuncertainnumber/calibration/iterative_filtering/index
   /autoapi/pyuncertainnumber/calibration/knn/index
   /autoapi/pyuncertainnumber/calibration/mcmc/index
   /autoapi/pyuncertainnumber/calibration/pdfs/index
   /autoapi/pyuncertainnumber/calibration/tmcmc/index
   /autoapi/pyuncertainnumber/calibration/utils_bayesian/index


Classes
-------

.. autoapisummary::

   pyuncertainnumber.calibration.TMCMC
   pyuncertainnumber.calibration.Stage
   pyuncertainnumber.calibration.MCMCCalibrator
   pyuncertainnumber.calibration.KNNCalibrator
   pyuncertainnumber.calibration.EpistemicFilter


Package Contents
----------------

.. py:class:: TMCMC(N: int = None, parameters: list = None, names: list[str] = None, log_likelihood: callable = None, mutation_steps: int = 5, status_file_name: str = None, trace=None)

   Class for TMCMC implementation

   :param N: number of particles to be sampled from posterior
   :type N: int
   :param parameters: list of (size Nop) prior distributions instances
   :type parameters: list
   :param log_likelihood: log likelihood function on :math:`\theta` to be defined which is problem specific
   :type log_likelihood: callable
   :param status_file_name: name of the status file to store status of the tmcmc sampling
   :type status_file_name: str

   :returns:

             returns trace file of all samples of all tmcmc stages.
                 At stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]
   :rtype: mytrace

   .. rubric:: Example

   >>> from pyuncertainnumber.calibration.tmcmc import TMCMC
   >>> from pyuncertainnumber import pba
   >>> # create an instance of TMCMC class and run tmcmc
   >>> parameters = [pba.D("uniform", (0.8, 2.2)), pba.D("uniform", (0.4, 1.2))] # dummy prior distributions

   >>> t = TMCMC(
   >>>     N=1000,                                             # number of particles
   >>>     parameters=parameters,                              # prior distributions
   >>>     names =['k1', 'k2'],                                # parameter names
   >>>     log_likelihood=log_likelihood_function,             # user defined log likelihood function
   >>>     mutation_steps=1,                                   # number of MCMC steps for perturbation
   >>>     status_file_name='tmcmc_progressing_status.txt',    # status log file
   >>> )
   >>> mytrace = t.run()

   >>> # for loading trace file
   >>> import pickle
   >>> with open('tmcmc_trace.pkl', 'rb') as f:
   >>>     mytrace = pickle.load(f)
   >>> re = TMCMC(trace=mytrace, names=names)                  # create TMCMC instance with loaded trace and names

   >>> re.plot_updated_distribution(save=False)                # plot prior and posterior distribution
   >>> prior_samples = re.load_prior_samples()                 # load prior samples as DataFrame
   >>> posterior_samples = re.load_posterior_samples()         # load posterior samples as DataFrame

   .. figure:: /_static/tmcmc_2dof.png
       :alt: 2DOF TMCMC example
       :align: center
       :width: 70%

       Example of TMCMC calibration of a 2-DOF system


   .. py:attribute:: N
      :value: None


   .. py:attribute:: parameters
      :value: None


   .. py:attribute:: names
      :value: None


   .. py:attribute:: log_likelihood
      :value: None


   .. py:attribute:: mutation_steps
      :value: 5


   .. py:attribute:: status_file_name
      :value: None


   .. py:attribute:: trace
      :value: None


   .. py:method:: check_names()


   .. py:method:: run() -> list[Stage]

      Run the TMCMC algorithm

      :returns: returns trace file of all samples of all tmcmc stages.
      :rtype: mytrace


   .. py:method:: check_trace()


   .. py:method:: save_trace(mytrace, file_name: str, save_dir=None)

      Save the trace to a file

      :param mytrace: trace file of all samples of all tmcmc stages.
      :param file_name: name of the file to save the trace
      :type file_name: str


   .. py:method:: load_prior_samples(trace=None) -> pandas.DataFrame


   .. py:method:: load_posterior_samples(trace=None) -> pandas.DataFrame


   .. py:method:: plot_updated_distribution(trace=None, save=False)

      Plot the prior and posterior distribution of the parameters

      :param trace: trace file of all samples of all tmcmc stages.
      :param save: if True, save the figure
      :type save: bool


.. py:class:: Stage

   Container for one TMCMC stage.

   :param Sm: samples before resampling (N x Np)
   :type Sm: NDArray
   :param Lm: log-likelihoods at Sm (N,)
   :type Lm: NDArray
   :param Wm_n: normalized incremental weights (N,)
   :type Wm_n: NDArray
   :param ESS: effective sample size the next stage (can be None in terminal stage)
   :type ESS: Optional[float]
   :param beta: tempering parameter next stage beta
   :type beta: float
   :param Smcap: resampled samples (N x Np), None in terminal stage
   :type Smcap: Optional[NDArray]


   .. py:attribute:: Sm
      :type:  numpy.typing.NDArray


   .. py:attribute:: Lm
      :type:  numpy.typing.NDArray


   .. py:attribute:: Wm_n
      :type:  numpy.typing.NDArray


   .. py:attribute:: ESS
      :type:  Optional[float]


   .. py:attribute:: beta
      :type:  float


   .. py:attribute:: Smcap
      :type:  Optional[numpy.typing.NDArray]


   .. py:method:: __repr__() -> str


   .. py:property:: samples
      :type: numpy.typing.NDArray


      Samples before resampling (from previous stage).


   .. py:property:: log_likelihoods
      :type: numpy.typing.NDArray


      Log-likelihood evaluated at `samples`.


   .. py:property:: weights
      :type: numpy.typing.NDArray


      Normalized incremental importance weights.


   .. py:property:: effective_sample_size
      :type: Optional[float]


      Effective sample size after reweighting the next stage.


   .. py:property:: tempering_parameter
      :type: float


      Tempering parameter beta next stage beta.


   .. py:property:: resampled_samples
      :type: Optional[numpy.typing.NDArray]


      Samples after importance resampling (before MH mutation).


.. py:class:: MCMCCalibrator(n_chains: int = 4, n_samples: int = 1000, burn_in: int = 200)

   Bases: :py:obj:`Calibrator`


   Calibration via Bayesian MCMC (e.g. Metropolis-Hastings, HMC, NUTS).


   .. py:attribute:: n_chains
      :value: 4


   .. py:attribute:: n_samples
      :value: 1000


   .. py:attribute:: burn_in
      :value: 200


   .. py:attribute:: _prior
      :value: None


   .. py:attribute:: _likelihood
      :value: None


   .. py:attribute:: _posterior_chain
      :value: None


   .. py:method:: setup(prior=None, likelihood=None, model=None)

      Define priors and likelihood (or simulator-based likelihood).


   .. py:method:: calibrate(observations: Any, resample_n: Optional[int] = None) -> Any

      Run MCMC to sample posterior given observations.


   .. py:method:: get_posterior() -> Any

      Return MCMC chain or posterior samples.


.. py:class:: KNNCalibrator(knn: int = 100, a_tol: float = 0.05, evaluate_model: bool = False)

   Bases: :py:obj:`Calibrator`


   Unified kNN-based calibrator for black-box models or precomputed simulations.

   :param knn: Number of neighbors per observed row. Default: 100.
   :type knn: int
   :param a_tol: Tolerance for matching simulated :math:`\xi` to a requested :math:`\xi^*`
                 (when reusing). A simulation is kept if :math:`\|\xi_{\text{sim}} - \xi^*\|_\infty < a_{\text{tol}}`.
                 Default: 0.05.
   :type a_tol: float
   :param evaluate_model: If True, call the black-box model for each :math:`\xi` in ``xi_list``
                          on a shared :math:`\theta` grid. If False, reuse ``simulated_data`` (requires y/theta/xi arrays).
                          Default: False.
   :type evaluate_model: bool
   :param random_state: Seed for reproducibility (affects theta_sampler and resampling). Default: 42.
   :type random_state: int

   .. note::

      **Setup (unified approach)**:
      
      - If ``evaluate_model=False`` and ``simulated_data`` is provided:
      
        * Reuse pre-computed simulations
        * Build a per-design kNN index by filtering rows with :math:`\|\xi - \xi^*\|_\infty < a_{\text{tol}}`
          for each :math:`\xi^*` in ``xi_list``
      
      - If ``evaluate_model=True``:
      
        * Simulate :math:`y = \text{model}(\theta, \xi)` for each :math:`\xi` in ``xi_list``
        * Use a shared :math:`\theta` grid drawn once from ``theta_sampler(n_samples)``
        * Build per-design kNN indices on this shared grid
      
      **Calibration workflow (single/multi-design)**:
      
      For each observation pair :math:`(y_{\text{obs}}, \xi)`:
      
      1. Standardize :math:`y_{\text{obs}}` using the per-design scaler
      2. Find k nearest neighbors in y-space
      3. Map neighbor indices to :math:`\theta` values for that design
      4. Stack :math:`\theta` samples across all observations/designs (or apply voting/intersection)

   .. figure:: /_static/knn_illustration.png
       :alt: knn_illustration
       :align: center
       :width: 50%

       KNN calibration illustration.


   .. py:attribute:: knn
      :value: 100


   .. py:attribute:: a_tol


   .. py:attribute:: evaluate_model
      :value: False


   .. py:attribute:: random_state
      :value: 42


   .. py:attribute:: _theta_grid
      :type:  Optional[numpy.ndarray]
      :value: None


   .. py:attribute:: _theta_by_xi
      :type:  Dict[Tuple[float, Ellipsis], numpy.ndarray]


   .. py:attribute:: _y_by_xi
      :type:  Dict[Tuple[float, Ellipsis], numpy.ndarray]


   .. py:attribute:: _scaler_by_xi
      :type:  Dict[Tuple[float, Ellipsis], sklearn.preprocessing.StandardScaler]


   .. py:attribute:: _neigh_by_xi
      :type:  Dict[Tuple[float, Ellipsis], sklearn.neighbors.NearestNeighbors]


   .. py:attribute:: _grid_idx_by_xi
      :type:  Dict[Tuple[float, Ellipsis], numpy.ndarray]


   .. py:attribute:: _posterior
      :type:  Optional[Dict[str, Any]]
      :value: None


   .. py:attribute:: _sim_y
      :type:  Optional[numpy.ndarray]
      :value: None


   .. py:attribute:: _sim_theta
      :type:  Optional[numpy.ndarray]
      :value: None


   .. py:attribute:: _sim_xi
      :type:  Optional[numpy.ndarray]
      :value: None


   .. py:method:: _key_from_xi(xi) -> Tuple[float, Ellipsis]
      :staticmethod:


      Stable tuple key for a scalar/vector design ξ.


   .. py:method:: setup(model: Optional[Callable[[numpy.ndarray, Union[float, numpy.ndarray]], numpy.ndarray]] = None, theta_sampler: Optional[Callable[[int], numpy.ndarray]] = None, simulated_data: Optional[Dict[str, numpy.ndarray]] = None, xi_list: Optional[List[Union[float, numpy.ndarray]]] = None, n_samples: int = 10000)

      Prepare per-design kNN structures by either reusing simulated_data or simulating for each design.

      :param model: Black-box simulator with signature ``model(theta, xi) -> y``
                    (vectorized over theta).
      :type model: callable, optional
      :param theta_sampler: Sampler for :math:`\theta`; required when ``evaluate_model=True``.
      :type theta_sampler: callable, optional
      :param simulated_data: Dict with keys {"y": (n, dy), "theta": (n, dθ), "xi": (n, dξ)}
                             when reusing sims.
      :type simulated_data: dict, optional
      :param xi_list: List of designs; each item can be scalar or array-like.
                      If None, defaults to [0.0].
      :type xi_list: list, optional
      :param n_samples: Number of :math:`\theta` samples to draw when ``evaluate_model=True``. Default: 10000.
      :type n_samples: int


   .. py:method:: nearest(y: Union[numpy.ndarray, List[float]], xi: Union[float, numpy.ndarray], k: Optional[int] = None, return_dist: bool = False)

      Return k nearest neighbors for y at design xi.

      :param y: Query outputs, shape (m, d_y) or (d_y,).
      :type y: array-like
      :param xi: Design key to select the per-design index.
      :type xi: scalar or array-like
      :param k: Number of neighbors; defaults to ``self.knn``.
      :type k: int, optional
      :param return_dist: If True, also return distances and raw indices. Default: False.
      :type return_dist: bool

      :returns: Shape (m*k, dθ) stacked :math:`\theta` for all query rows.
                distances (ndarray, optional): Returned if ``return_dist=True``.
                indices (ndarray, optional): Returned if ``return_dist=True``.
      :rtype: theta_neighbors (ndarray)


   .. py:method:: calibrate(observations, resample_n: int | None = None, combine: str = 'stack', combine_params: dict | None = None)

      Run kNN calibration and aggregate posterior θ across neighbor-hit blocks.

      :param observations: Observed simulator or model outputs to calibrate against.
      :param resample_n: If set, resample posterior θ samples to this size.
                         If `None`, return all aggregated θ without resampling.
      :type resample_n: int | None
      :param combine: Aggregation mode. One of:

                      - **'stack'**: concatenate all kNN θ; optional de-duplication.

                      - **'intersect'**: retain θ hit at least `min_count` times across neighbor blocks.
      :type combine: str
      :param combine_params: Optional parameters controlling aggregation and KDE weighting.

                             Supported keys:

                             - **dedup** (bool): Default `False`. Remove duplicate θ (only for 'stack').

                             - **theta_match_tol** (float): Default `1e-9`. Tolerance or rounding quantum for comparing/merging θ values.

                             - **min_count** (int | None): Minimum occurrences for 'intersect'. Default is `max(1, ceil(0.5 * total_blocks))`, meaning θ must appear in about half of neighbor lists.

                             - **use_kde** (bool): Default `False`. If `True`, fit KDE on aggregated θ to compute log-scores and normalized weights.

                             - **kde_bandwidth** (float | None): Bandwidth for KDE. If `None` (default), use Scott's rule.
      :type combine_params: dict | None

      .. tip::

         Two aggregation modes are supported:
         
         - **stack**: Concatenate all kNN θ into a single array. Supports optional de-duplication of nearly identical θ values.
         
         - **intersect**: Keep θ values that occur in at least `min_count` neighbor blocks across all observations/design points (default ≈ half of all blocks).
         
         Optional density weighting via KDE can be applied after aggregation to compute normalized posterior weights.

      :returns:

                A dictionary with keys:
                    - **'mode'** (str): Always `'knn'`.
                    - **'theta'** (ndarray): Posterior samples of shape `(N, dθ)`;
                      resampled if `resample_n` is provided.
                    - **'weights'** (ndarray | None): `None` for stack/intersect,
                      or a length-`N` array of KDE weights if `use_kde=True`.
                    - **'meta'** (dict): Aggregation info; may include KDE bandwidth
                      if density weighting is used.
      :rtype: dict


   .. py:method:: _round_rows(A: numpy.ndarray, tol: float) -> tuple[numpy.ndarray, numpy.ndarray]

      Round rows of A to multiples of tol and return (unique_rows, counts).

      :param A: Input array to process.
      :type A: ndarray
      :param tol: Tolerance for rounding. If tol <= 0, exact matching is used.
      :type tol: float

      :returns:

                (unique_rows, counts) where unique_rows are the deduplicated rows
                    and counts are the occurrence counts.
      :rtype: tuple


   .. py:method:: _kde_logweights(X, bw=0.5, n_max_exact=5000)

      Compute KDE-based log-weights for posterior samples X.

      :param X: Posterior samples, shape (n, d).
      :type X: ndarray
      :param bw: Bandwidth for Gaussian kernel. Default: 0.5.
      :type bw: float
      :param n_max_exact: Max n for exact pairwise KDE. Above this, fall back to
                          sklearn.KernelDensity. Default: 5000.
      :type n_max_exact: int

      :returns:     - **logp** (ndarray): Log-density values at X, shape (n,).
                    - **w** (ndarray): Normalized weights, shape (n,).
      :rtype: tuple


   .. py:method:: get_posterior() -> Any

      Return the last computed posterior dict; raises if calibrate() hasn't been called.


.. py:class:: EpistemicFilter(xe_samples: numpy.typing.NDArray, discrepancy_scores: numpy.typing.NDArray = None, sets_of_discrepancy: list = None)

   The EpistemicFilter method to reduce the epistemic uncertainty space based on discrepancy scores.

   :param xe_samples: Proposed Samples of epistemic parameters, shape (ne, n_dimensions).
                      Typically samples from a bounded set of some epistemic parameters.
   :type xe_samples: NDArray
   :param discrepancy_scores: Discrepancy scores between the model simulations and the observation.
                              Associated with each xe sample, shape (ne,). Defaults to None.
   :type discrepancy_scores: NDArray, optional
   :param sets_of_discrepancy: List of sets of discrepancy scores for multiple datasets.
                               Each element should be an NDArray of shape (ne,). Defaults to None.
   :type sets_of_discrepancy: list, optional

   .. tip::

      For performance functions that output multiple responses, some aggregation of discrepancy scores may be used.
      Depending on the number of observation, either a single set of discrepancy scores or multiple sets can be provided.

   .. figure:: /_static/convex_hull.png
       :alt: convex hull with bounds
       :align: center
       :width: 50%

       Convex hull with bounds illustration.


   .. py:attribute:: xe_samples


   .. py:attribute:: discrepancy_scores
      :value: None


   .. py:attribute:: sets_of_discrepancy


   .. py:method:: filter(threshold: float | list)

      Filter the epistemic samples based on a discrepancy threshold.

      :param threshold: The discrepancy threshold(s) for filtering data points.
      :type threshold: float | list

      :returns:     - the filtered xe samples;
                    - hull;
                    - lower bounds;
                    - upper bounds of the bounding box, or (None, None) if unsuccessful.
      :rtype: tuple


   .. py:method:: iterative_filter(thresholds: list) -> list

      Iteratively filter the epistemic samples based on multiple thresholds.

      :param thresholds: List of discrepancy thresholds for filtering data points.
      :type thresholds: list
      :param simulation_model: A function that takes xe_samples as input and returns discrepancy scores.
                               Required if re_sample is True. Defaults to None.
      :type simulation_model: callable, optional

      .. note:: thresholds are assumed to be sorted in ascending order.


   .. py:method:: iterative_filter_w_simulation_model(thresholds: list, re_sample: bool = False, simulation_model=None)

      Iteratively filter the epistemic samples based on multiple thresholds.

      :param thresholds: List of discrepancy thresholds for filtering data points.
      :type thresholds: list
      :param re_sample: If True, re-evaluate discrepancy scores at each iteration using the simulation_model.
                        Defaults to False.
      :type re_sample: bool, optional
      :param simulation_model: A function that takes xe_samples as input and returns discrepancy scores.
                               Required if re_sample is True. Defaults to None.
      :type simulation_model: callable, optional

      .. note:: if re_sample is True, the simulation_model should be provided to re-evaluate discrepancy scores.


   .. py:method:: filter_on_sets(plot_hulls: bool = True)

      Filter the epistemic samples based on multiple sets of discrepancy scores.

      :param plot_hulls: If True, plots the convex hulls of the first five sets. Defaults to True.
      :type plot_hulls: bool, optional

      :returns:     - lower bounds;
                    - upper bounds of the intersected bounding box, or (None, None) if unsuccessful
      :rtype: tuple

      .. note:: `self.sets_of_discrepancy` must exist.


   .. py:method:: plot_hull_with_bounds(filtered_xe, hull=None, ax=None, show=True, x_title=None, y_title=None, hull_alpha=0.25)

      Plot the convex hull and bounding box of the epistemic samples.

      :param hull: Precomputed convex hull. If None, it is computed.
      :type hull: ConvexHull, optional
      :param ax: Existing axes to draw on. If None, a new figure/axes is created.
      :type ax: matplotlib Axes, optional
      :param show: If True, calls plt.show() at the end. Defaults to True.
      :type show: bool, optional
      :param x_title: Label for the x-axis. Defaults to None.
      :type x_title: str, optional
      :param y_title: Label for the y-axis. Defaults to None.
      :type y_title: str, optional
      :param hull_alpha: Transparency of the hull surface. Defaults to 0.25.
      :type hull_alpha: float, optional

      :returns: The axes containing the plot.
      :rtype: ax (matplotlib Axes)