pyuncertainnumber.calibration.tmcmc

Classes

TMCMC

Class for TMCMC implementation

Stage

Container for one TMCMC stage.

Functions

recover_trace_samples(→ pandas.DataFrame)

Load back the Prior and Posterior samples in the form of a DataFrame

plot_updated_distribution(mytrace, names[, save])

Plot the prior and posterior distribution of the parameters

initial_population(→ numpy.typing.NDArray)

Generates initial population from prior distribution

log_prior(→ float)

Computes log_prior value at all particles

compute_beta_update_evidence(beta, log_likelihoods, ...)

Computes beta for the next stage and updated model evidence

propose(→ numpy.typing.NDArray)

Proposal distribution for MCMC in pertubation stage

MCMC_MH(→ tuple[numpy.typing.NDArray, float, float, int])

Markov chain Monte Carlo using Metropolis-Hastings which "perturbs" each particle using MCMC-MH

run_tmcmc_vec()

vectorised version of run_tmcmc to be implemented later

run_tmcmc_updated(→ list[Stage])

Updated main workflow of running Transitional MCMC

run_tmcmc(N, all_pars, log_likelihood, status_file_name)

Main workflow of running Transitional MCMC

log_likelihood(→ numpy.ndarray)

Picklable function at module level.

gaussian_likelihood_fun(params, data, simulator, ...)

Computes the Gaussian log-likelihood of the data given the model and parameters.

get_top_samples(trace[, top_num])

Get the top posterior samples based on likelihood values.

get_posterior_samples(→ numpy.typing.NDArray)

Get the posterior particles of the last stage from the trace.

hdi_1d(→ tuple[float, float])

Obtain the highest density interval (HDI) from particles

get_hdi_bounds() → pandas.DataFrame)

Compute HDI bounds for each column in data

filter_hdi_bounds(→ pandas.DataFrame)

Extract HDI bounds for a given credibility level.

transform_old_trace_to_new(→ list[Stage])

Transform old trace format to new Stage class format.

Module Contents

class pyuncertainnumber.calibration.tmcmc.TMCMC(N: int = None, parameters: list = None, names: list[str] = None, log_likelihood: callable = None, mutation_steps: int = 5, status_file_name: str = None, trace=None)

Class for TMCMC implementation

Parameters:
  • N (int) – number of particles to be sampled from posterior

  • parameters (list) – list of (size Nop) prior distributions instances

  • log_likelihood (callable) – log likelihood function on \(\theta\) to be defined which is problem specific

  • status_file_name (str) – name of the status file to store status of the tmcmc sampling

Returns:

returns trace file of all samples of all tmcmc stages.

At stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]

Return type:

mytrace

Example

>>> from pyuncertainnumber.calibration.tmcmc import TMCMC
>>> from pyuncertainnumber import pba
>>> # create an instance of TMCMC class and run tmcmc
>>> parameters = [pba.D("uniform", (0.8, 2.2)), pba.D("uniform", (0.4, 1.2))] # dummy prior distributions
>>> t = TMCMC(
>>>     N=1000,                                             # number of particles
>>>     parameters=parameters,                              # prior distributions
>>>     names =['k1', 'k2'],                                # parameter names
>>>     log_likelihood=log_likelihood_function,             # user defined log likelihood function
>>>     mutation_steps=1,                                   # number of MCMC steps for perturbation
>>>     status_file_name='tmcmc_progressing_status.txt',    # status log file
>>> )
>>> mytrace = t.run()
>>> # for loading trace file
>>> import pickle
>>> with open('tmcmc_trace.pkl', 'rb') as f:
>>>     mytrace = pickle.load(f)
>>> re = TMCMC(trace=mytrace, names=names)                  # create TMCMC instance with loaded trace and names
>>> re.plot_updated_distribution(save=False)                # plot prior and posterior distribution
>>> prior_samples = re.load_prior_samples()                 # load prior samples as DataFrame
>>> posterior_samples = re.load_posterior_samples()         # load posterior samples as DataFrame
2DOF TMCMC example

Example of TMCMC calibration of a 2-DOF system

N = None
parameters = None
names = None
log_likelihood = None
mutation_steps = 5
status_file_name = None
trace = None
check_names()
run() list[Stage]

Run the TMCMC algorithm

Returns:

returns trace file of all samples of all tmcmc stages.

Return type:

mytrace

check_trace()
save_trace(mytrace, file_name: str, save_dir=None)

Save the trace to a file

Parameters:
  • mytrace – trace file of all samples of all tmcmc stages.

  • file_name (str) – name of the file to save the trace

load_prior_samples(trace=None) pandas.DataFrame
load_posterior_samples(trace=None) pandas.DataFrame
plot_updated_distribution(trace=None, save=False)

Plot the prior and posterior distribution of the parameters

Parameters:
  • trace – trace file of all samples of all tmcmc stages.

  • save (bool) – if True, save the figure

class pyuncertainnumber.calibration.tmcmc.Stage

Container for one TMCMC stage.

Parameters:
  • Sm (NDArray) – samples before resampling (N x Np)

  • Lm (NDArray) – log-likelihoods at Sm (N,)

  • Wm_n (NDArray) – normalized incremental weights (N,)

  • ESS (Optional[float]) – effective sample size the next stage (can be None in terminal stage)

  • beta (float) – tempering parameter next stage beta

  • Smcap (Optional[NDArray]) – resampled samples (N x Np), None in terminal stage

Sm: numpy.typing.NDArray
Lm: numpy.typing.NDArray
Wm_n: numpy.typing.NDArray
ESS: float | None
beta: float
Smcap: numpy.typing.NDArray | None
__repr__() str
property samples: numpy.typing.NDArray

Samples before resampling (from previous stage).

property log_likelihoods: numpy.typing.NDArray

Log-likelihood evaluated at samples.

property weights: numpy.typing.NDArray

Normalized incremental importance weights.

property effective_sample_size: float | None

Effective sample size after reweighting the next stage.

property tempering_parameter: float

Tempering parameter beta next stage beta.

property resampled_samples: numpy.typing.NDArray | None

Samples after importance resampling (before MH mutation).

pyuncertainnumber.calibration.tmcmc.recover_trace_samples(mytrace: list[Stage], names: list[str]) pandas.DataFrame

Load back the Prior and Posterior samples in the form of a DataFrame

Parameters:
  • mytrace (list[Stage]) – trace file of all samples of all tmcmc stages.

  • names (list[str]) – list of parameter names

Returns:

Prior and Posterior in the trace as a DataFrame

Return type:

(pd.DataFrame)

pyuncertainnumber.calibration.tmcmc.plot_updated_distribution(mytrace, names: list[str], save=False)

Plot the prior and posterior distribution of the parameters

Parameters:
  • mytrace – trace file of all samples of all tmcmc stages.

  • names (list[str]) – list of parameter names

  • save (bool) – if True, save the figure

pyuncertainnumber.calibration.tmcmc.initial_population(N: float, all_pars: list) numpy.typing.NDArray

Generates initial population from prior distribution

Parameters:
  • N (float) – number of particles.

  • all_pars (list) – All the parameters, of size Np, to be updated. all_pars[i] is object of type pdfs.

Returns

ini_pop (NDArray): the initial population which is a numpy array of size N x Np.

Note

Np is number of parameters

pyuncertainnumber.calibration.tmcmc.log_prior(s: numpy.typing.NDArray, all_pars: list[pyuncertainnumber.calibration.pdfs.ProbabilityDensityFun | pyuncertainnumber.Distribution]) float

Computes log_prior value at all particles

Parameters:
  • s (NDArray) – numpy array of size (N x Np) of all particles.

  • all_pars (list[ProbabilityDensityFun | Distribution]) – All the parameters, of size Np, to be updated. all_pars[i] is object of type pdfs.

Returns

log_p (NDArray): log prior at all N particles which is a numpy array of size N

Note

Np denotes number of parameters

pyuncertainnumber.calibration.tmcmc.compute_beta_update_evidence(beta: float, log_likelihoods: numpy.typing.NDArray, log_evidence: float, prev_ESS: int)

Computes beta for the next stage and updated model evidence

Parameters:
  • beta (float) – stage parameter.

  • log_likelihoods (NDArray) – log likelihood values at all particles which is numpy array of size N

  • log_evidence (float) – log of evidence.

  • prev_ESS (int) – effective sample size of previous stage

Returns:

A 4-tuple (new_beta, log_evidence, Wm_n, ESS) where: - new_beta (float): stage parameter for next stage. - log_evidence (float): updated log evidence. - Wm_n (NDArray): weights of particles for the next stage. - ESS (float): effective sample size of new stage.

Return type:

tuple[float, float, NDArray, float]

pyuncertainnumber.calibration.tmcmc.propose(current: numpy.typing.NDArray, covariance: numpy.typing.NDArray, n: int) numpy.typing.NDArray

Proposal distribution for MCMC in pertubation stage

Parameters:
  • current (NDArray) – numpy array of size Np current particle location

  • covariance (NDArray) – numpy array of size Np x Np proposal covariance matrix

  • n (int) – number of proposals.

Returns:

n proposals

Return type:

NDArray

pyuncertainnumber.calibration.tmcmc.MCMC_MH(particle_num: int, Em: numpy.typing.NDArray, Nm_steps: int, current: numpy.typing.NDArray, likelihood_current: float, posterior_current: float, beta: float, numAccepts: int, all_pars: list, log_likelihood: callable) tuple[numpy.typing.NDArray, float, float, int]

Markov chain Monte Carlo using Metropolis-Hastings which “perturbs” each particle using MCMC-MH

Conduct Nm_steps steps of MH for each particle.

Parameters:
  • particle_num (int) – The index of the current particle/parameter vector being evaluated.

  • Em (NDArray) – proposal covarince matrix which is numpy array of size Nop x Nop

  • Nm_steps (int) – number of perturbation steps.

  • current (NDArray) – numpy array of size Nop. current particle location

  • likelihood_current (float) – log likelihood value at current particle

  • posterior_current (float) – log posterior value at current particle

  • beta (float) – stage parameter.

  • numAccepts (int) – total number of accepts

  • all_pars – list of size Nop Nop is number of parameters all_pars[i] is object of type pdfs all parameters to be inferred.

  • log_likelihood (callable) – log likelihood function to be defined in main.py.

Returns:

A 4-tuple (current, likelihood_current, posterior_current, numAccepts) where:

  • current (NDArray): Perturbed particle location of shape (Nop,).

  • likelihood_current (float): Log-likelihood at the perturbed particle.

  • posterior_current (float): Log-posterior at the perturbed particle.

  • numAccepts (int): Updated total number of accepts.

Return type:

tuple[NDArray, float, float, int]

pyuncertainnumber.calibration.tmcmc.run_tmcmc_vec()

vectorised version of run_tmcmc to be implemented later

pyuncertainnumber.calibration.tmcmc.run_tmcmc_updated(N: int, all_pars: list, log_likelihood: callable, status_file_name: str, Nm_steps_max: int = 5, Nm_steps_maxmax: int = 5) list[Stage]

Updated main workflow of running Transitional MCMC

Solves the “At this point, beta has been updated to the new value, but Postm is still the log-target from the previous beta (except at the very first stage). So Postmcap is inconsistent with the beta you pass into MCMC_MH.”

Parameters:
  • N (int) – int number of particles to be sampled from posterior

  • all_pars (list) – All the parameters, of size Nop, to be updated. all_pars[i] is object of type pdfs.

  • log_likelihood (callable) – log likelihood function to be defined in main.py as is problem specific

  • status_file_name (str) – name of the status file to store status of the tmcmc sampling

  • Nm_steps_max (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.

  • Nm_steps_maxmax (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.

Returns:

the trace that contains all iterations of the Stage instances.

Return type:

list[Stage]

pyuncertainnumber.calibration.tmcmc.run_tmcmc(N: int, all_pars: list, log_likelihood: callable, status_file_name: str, Nm_steps_max: int = 5, Nm_steps_maxmax: int = 5)

Main workflow of running Transitional MCMC

Parameters:
  • N (int) – int number of particles to be sampled from posterior

  • all_pars (list) – All the parameters, of size Nop, to be updated. all_pars[i] is object of type pdfs.

  • log_likelihood (callable) – log likelihood function to be defined in main.py as is problem specific

  • status_file_name (str) – name of the status file to store status of the tmcmc sampling

  • Nm_steps_max (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.

  • Nm_steps_maxmax (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.

  • parallel_processing (str) – should be either ‘multiprocessing’ or ‘mpi’

Returns:

mytrace: returns trace file of all samples of all tmcmc stages.

at stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]

Return type:

a tuple containing

pyuncertainnumber.calibration.tmcmc.log_likelihood(sam_id: int, param_vec: numpy.typing.NDArray, data_empirical, n_xa_samples, simulator: callable, which_likelihood_calculator='gaussian') numpy.ndarray

Picklable function at module level.

Parameters:
  • sam_id – (int) a dummy sample index which is required in the main run_tmcmc workflow

  • param_vec (array-like) – indeed a 1d epistemic_param_vec, the parameter vector, i.e. epistemic_domain

Note

Gaussian approximate likelihood function is implemented here. Other likelihood functions such as “pseudo” and “vae” are underway. sam_id is not used here, but it is required in the main run_tmcmc workflow

pyuncertainnumber.calibration.tmcmc.gaussian_likelihood_fun(params, data, simulator, n_xa_samples, dissimilarity_metric='wasserstein')

Computes the Gaussian log-likelihood of the data given the model and parameters.

Parameters:
  • params – Parameters for the black-box model.

  • data – Empirical data (numpy array of shape (time_steps, features, samples)).

  • simulator – A function that generates model response samples given parameters. Here response may be time series data of shape (time_steps, features, n_xa_samples), or maybe some performance metrics extracted from the time series data.

  • n_xa_samples – Number of samples to generate from the simulator.

Returns:

The Gaussian log-likelihood of the data given the model.

Return type:

log_likelihood

pyuncertainnumber.calibration.tmcmc.get_top_samples(trace, top_num=20)

Get the top posterior samples based on likelihood values.

pyuncertainnumber.calibration.tmcmc.get_posterior_samples(trace: list[Stage]) numpy.typing.NDArray

Get the posterior particles of the last stage from the trace.

Parameters:

trace (list[Stage]) – trace object.

pyuncertainnumber.calibration.tmcmc.hdi_1d(x: numpy.typing.NDArray, cred_mass=0.95) tuple[float, float]

Obtain the highest density interval (HDI) from particles

Parameters:
  • x (NDArray) – 1D array of samples.

  • cred_mass (float) – Credible mass for the HDI.

Returns:

Lower and upper bounds of the HDI.

Return type:

tuple

pyuncertainnumber.calibration.tmcmc.get_hdi_bounds(data: pandas.DataFrame | numpy.typing.NDArray, levels=(0.99, 0.95, 0.2, 0.1, 0.05)) pandas.DataFrame

Compute HDI bounds for each column in data

Parameters:
  • data (pd.DataFrame | NDArray) – posterior samples (shape: n_samples x n_vars).

  • levels (tuple) – A tuple of high density intervalc levels.

Returns:

DataFrame with HDI bounds for each column and level.

Return type:

pd.DataFrame

pyuncertainnumber.calibration.tmcmc.filter_hdi_bounds(df, level: int = 95) pandas.DataFrame

Extract HDI bounds for a given credibility level.

Parameters:
  • df (pd.DataFrame) – HDI dataframe containing columns like hdi_95_low, hdi_95_high

  • level (int | str) – significance level (e.g. 95, 90, 20)

Returns:

DataFrame with columns [hdi_<level>_low, hdi_<level>_high]

Return type:

(pd.DataFrame)

pyuncertainnumber.calibration.tmcmc.transform_old_trace_to_new(trace: list[list]) list[Stage]

Transform old trace format to new Stage class format.

Parameters:

trace (list[list]) – Old trace instance which is list of lists, where each inner list contains: [Sm, Lm, Wm_n, ESS, beta, Smcap]

Returns:

New trace instance which is list of Stage class instances.

Return type:

list[Stage]