pyuncertainnumber.calibration.tmcmc¶

Classes¶

`TMCMC`	Class for TMCMC implementation
`Stage`	Container for one TMCMC stage.

Functions¶

`recover_trace_samples`(→ pandas.DataFrame)	Load back the Prior and Posterior samples in the form of a DataFrame
`plot_updated_distribution`(mytrace, names[, save])	Plot the prior and posterior distribution of the parameters
`initial_population`(→ numpy.typing.NDArray)	Generates initial population from prior distribution
`log_prior`(→ float)	Computes log_prior value at all particles
`compute_beta_update_evidence`(beta, log_likelihoods, ...)	Computes beta for the next stage and updated model evidence
`propose`(→ numpy.typing.NDArray)	Proposal distribution for MCMC in pertubation stage
`MCMC_MH`(→ tuple[numpy.typing.NDArray, float, float, int])	Markov chain Monte Carlo using Metropolis-Hastings which "perturbs" each particle using MCMC-MH
`run_tmcmc_vec`()	vectorised version of run_tmcmc to be implemented later
`run_tmcmc_updated`(→ list[Stage])	Updated main workflow of running Transitional MCMC
`run_tmcmc`(N, all_pars, log_likelihood, status_file_name)	Main workflow of running Transitional MCMC
`log_likelihood`(→ numpy.ndarray)	Picklable function at module level.
`gaussian_likelihood_fun`(params, data, simulator, ...)	Computes the Gaussian log-likelihood of the data given the model and parameters.
`get_top_samples`(trace[, top_num])	Get the top posterior samples based on likelihood values.
`get_posterior_samples`(→ numpy.typing.NDArray)	Get the posterior particles of the last stage from the trace.
`hdi_1d`(→ tuple[float, float])	Obtain the highest density interval (HDI) from particles
`get_hdi_bounds`() → pandas.DataFrame)	Compute HDI bounds for each column in data
`filter_hdi_bounds`(→ pandas.DataFrame)	Extract HDI bounds for a given credibility level.
`transform_old_trace_to_new`(→ list[Stage])	Transform old trace format to new Stage class format.

Module Contents¶

class pyuncertainnumber.calibration.tmcmc.TMCMC(N: int = None, parameters: list = None, names: list[str] = None, log_likelihood: callable = None, mutation_steps: int = 5, status_file_name: str = None, trace=None)¶

Class for TMCMC implementation

Parameters:

N (int) – number of particles to be sampled from posterior
parameters (list) – list of (size Nop) prior distributions instances
log_likelihood (callable) – log likelihood function on \(\theta\) to be defined which is problem specific
status_file_name (str) – name of the status file to store status of the tmcmc sampling

Returns:

returns trace file of all samples of all tmcmc stages.: At stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]

Return type:

mytrace

Example

>>> from pyuncertainnumber.calibration.tmcmc import TMCMC
>>> from pyuncertainnumber import pba
>>> # create an instance of TMCMC class and run tmcmc
>>> parameters = [pba.D("uniform", (0.8, 2.2)), pba.D("uniform", (0.4, 1.2))] # dummy prior distributions

>>> t = TMCMC(
>>>     N=1000,                                             # number of particles
>>>     parameters=parameters,                              # prior distributions
>>>     names =['k1', 'k2'],                                # parameter names
>>>     log_likelihood=log_likelihood_function,             # user defined log likelihood function
>>>     mutation_steps=1,                                   # number of MCMC steps for perturbation
>>>     status_file_name='tmcmc_progressing_status.txt',    # status log file
>>> )
>>> mytrace = t.run()

>>> # for loading trace file
>>> import pickle
>>> with open('tmcmc_trace.pkl', 'rb') as f:
>>>     mytrace = pickle.load(f)
>>> re = TMCMC(trace=mytrace, names=names)                  # create TMCMC instance with loaded trace and names

>>> re.plot_updated_distribution(save=False)                # plot prior and posterior distribution
>>> prior_samples = re.load_prior_samples()                 # load prior samples as DataFrame
>>> posterior_samples = re.load_posterior_samples()         # load posterior samples as DataFrame

2DOF TMCMC example — Example of TMCMC calibration of a 2-DOF system¶

N = None¶

parameters = None¶

names = None¶

log_likelihood = None¶

mutation_steps = 5¶

status_file_name = None¶

trace = None¶

check_names()¶

run() → list[Stage]¶

Run the TMCMC algorithm

Returns:: returns trace file of all samples of all tmcmc stages.
Return type:: mytrace

check_trace()¶

save_trace(mytrace, file_name: str, save_dir=None)¶

Save the trace to a file

Parameters:

mytrace – trace file of all samples of all tmcmc stages.
file_name (str) – name of the file to save the trace

load_prior_samples(trace=None) → pandas.DataFrame¶

load_posterior_samples(trace=None) → pandas.DataFrame¶

plot_updated_distribution(trace=None, save=False)¶

Plot the prior and posterior distribution of the parameters

Parameters:

trace – trace file of all samples of all tmcmc stages.
save (bool) – if True, save the figure

class pyuncertainnumber.calibration.tmcmc.Stage¶

Container for one TMCMC stage.

Parameters:

Sm (NDArray) – samples before resampling (N x Np)
Lm (NDArray) – log-likelihoods at Sm (N,)
Wm_n (NDArray) – normalized incremental weights (N,)
ESS (Optional[float]) – effective sample size the next stage (can be None in terminal stage)
beta (float) – tempering parameter next stage beta
Smcap (Optional[NDArray]) – resampled samples (N x Np), None in terminal stage

Sm: numpy.typing.NDArray¶

Lm: numpy.typing.NDArray¶

Wm_n: numpy.typing.NDArray¶

ESS: float | None¶

beta: float¶

Smcap: numpy.typing.NDArray | None¶

__repr__() → str¶

property samples: numpy.typing.NDArray¶: Samples before resampling (from previous stage).

property log_likelihoods: numpy.typing.NDArray¶: Log-likelihood evaluated at samples.

property weights: numpy.typing.NDArray¶: Normalized incremental importance weights.

property effective_sample_size: float | None¶: Effective sample size after reweighting the next stage.

property tempering_parameter: float¶: Tempering parameter beta next stage beta.

property resampled_samples: numpy.typing.NDArray | None¶: Samples after importance resampling (before MH mutation).

pyuncertainnumber.calibration.tmcmc.recover_trace_samples(mytrace: list[Stage], names: list[str]) → pandas.DataFrame¶

Load back the Prior and Posterior samples in the form of a DataFrame

Parameters:

mytrace (list[Stage]) – trace file of all samples of all tmcmc stages.
names (list[str]) – list of parameter names

Returns:

Prior and Posterior in the trace as a DataFrame

Return type:

(pd.DataFrame)

pyuncertainnumber.calibration.tmcmc.plot_updated_distribution(mytrace, names: list[str], save=False)¶

Plot the prior and posterior distribution of the parameters

Parameters:

mytrace – trace file of all samples of all tmcmc stages.
names (list[str]) – list of parameter names
save (bool) – if True, save the figure

pyuncertainnumber.calibration.tmcmc.initial_population(N: float, all_pars: list) → numpy.typing.NDArray¶

Generates initial population from prior distribution

Parameters:

N (float) – number of particles.
all_pars (list) – All the parameters, of size Np, to be updated. all_pars[i] is object of type pdfs.

Returns: ini_pop (NDArray): the initial population which is a numpy array of size N x Np.

Note

Np is number of parameters

pyuncertainnumber.calibration.tmcmc.log_prior(s: numpy.typing.NDArray, all_pars: list[pyuncertainnumber.calibration.pdfs.ProbabilityDensityFun | pyuncertainnumber.Distribution]) → float¶

Computes log_prior value at all particles

Parameters:

s (NDArray) – numpy array of size (N x Np) of all particles.
all_pars (list[ProbabilityDensityFun | Distribution]) – All the parameters, of size Np, to be updated. all_pars[i] is object of type pdfs.

Returns: log_p (NDArray): log prior at all N particles which is a numpy array of size N

Note

Np denotes number of parameters

pyuncertainnumber.calibration.tmcmc.compute_beta_update_evidence(beta: float, log_likelihoods: numpy.typing.NDArray, log_evidence: float, prev_ESS: int)¶

Computes beta for the next stage and updated model evidence

Parameters:

beta (float) – stage parameter.
log_likelihoods (NDArray) – log likelihood values at all particles which is numpy array of size N
log_evidence (float) – log of evidence.
prev_ESS (int) – effective sample size of previous stage

Returns:

A 4-tuple (new_beta, log_evidence, Wm_n, ESS) where: - new_beta (float): stage parameter for next stage. - log_evidence (float): updated log evidence. - Wm_n (NDArray): weights of particles for the next stage. - ESS (float): effective sample size of new stage.

Return type:

tuple[float, float, NDArray, float]

pyuncertainnumber.calibration.tmcmc.propose(current: numpy.typing.NDArray, covariance: numpy.typing.NDArray, n: int) → numpy.typing.NDArray¶

Proposal distribution for MCMC in pertubation stage

Parameters:

current (NDArray) – numpy array of size Np current particle location
covariance (NDArray) – numpy array of size Np x Np proposal covariance matrix
n (int) – number of proposals.

Returns:

n proposals

Return type:

NDArray

pyuncertainnumber.calibration.tmcmc.MCMC_MH(particle_num: int, Em: numpy.typing.NDArray, Nm_steps: int, current: numpy.typing.NDArray, likelihood_current: float, posterior_current: float, beta: float, numAccepts: int, all_pars: list, log_likelihood: callable) → tuple[numpy.typing.NDArray, float, float, int]¶

Markov chain Monte Carlo using Metropolis-Hastings which “perturbs” each particle using MCMC-MH

Conduct Nm_steps steps of MH for each particle.

Parameters:

particle_num (int) – The index of the current particle/parameter vector being evaluated.
Em (NDArray) – proposal covarince matrix which is numpy array of size Nop x Nop
Nm_steps (int) – number of perturbation steps.
current (NDArray) – numpy array of size Nop. current particle location
likelihood_current (float) – log likelihood value at current particle
posterior_current (float) – log posterior value at current particle
beta (float) – stage parameter.
numAccepts (int) – total number of accepts
all_pars – list of size Nop Nop is number of parameters all_pars[i] is object of type pdfs all parameters to be inferred.
log_likelihood (callable) – log likelihood function to be defined in main.py.

Returns:

A 4-tuple (current, likelihood_current, posterior_current, numAccepts) where:

current (NDArray): Perturbed particle location of shape (Nop,).
likelihood_current (float): Log-likelihood at the perturbed particle.
posterior_current (float): Log-posterior at the perturbed particle.
numAccepts (int): Updated total number of accepts.

Return type:

tuple[NDArray, float, float, int]

pyuncertainnumber.calibration.tmcmc.run_tmcmc_vec()¶: vectorised version of run_tmcmc to be implemented later

pyuncertainnumber.calibration.tmcmc.run_tmcmc_updated(N: int, all_pars: list, log_likelihood: callable, status_file_name: str, Nm_steps_max: int = 5, Nm_steps_maxmax: int = 5) → list[Stage]¶

Updated main workflow of running Transitional MCMC

Solves the “At this point, beta has been updated to the new value, but Postm is still the log-target from the previous beta (except at the very first stage). So Postmcap is inconsistent with the beta you pass into MCMC_MH.”

Parameters:

N (int) – int number of particles to be sampled from posterior
all_pars (list) – All the parameters, of size Nop, to be updated. all_pars[i] is object of type pdfs.
log_likelihood (callable) – log likelihood function to be defined in main.py as is problem specific
status_file_name (str) – name of the status file to store status of the tmcmc sampling
Nm_steps_max (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.
Nm_steps_maxmax (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.

Returns:

the trace that contains all iterations of the Stage instances.

Return type:

list[Stage]

pyuncertainnumber.calibration.tmcmc.run_tmcmc(N: int, all_pars: list, log_likelihood: callable, status_file_name: str, Nm_steps_max: int = 5, Nm_steps_maxmax: int = 5)¶

Main workflow of running Transitional MCMC

Parameters:

N (int) – int number of particles to be sampled from posterior
all_pars (list) – All the parameters, of size Nop, to be updated. all_pars[i] is object of type pdfs.
log_likelihood (callable) – log likelihood function to be defined in main.py as is problem specific
status_file_name (str) – name of the status file to store status of the tmcmc sampling
Nm_steps_max (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.
Nm_steps_maxmax (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.
parallel_processing (str) – should be either ‘multiprocessing’ or ‘mpi’

Returns:

mytrace: returns trace file of all samples of all tmcmc stages.: at stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]

Return type:

a tuple containing

pyuncertainnumber.calibration.tmcmc.log_likelihood(sam_id: int, param_vec: numpy.typing.NDArray, data_empirical, n_xa_samples, simulator: callable, which_likelihood_calculator='gaussian') → numpy.ndarray¶

Picklable function at module level.

Parameters:

sam_id – (int) a dummy sample index which is required in the main run_tmcmc workflow
param_vec (array-like) – indeed a 1d epistemic_param_vec, the parameter vector, i.e. epistemic_domain

Note

Gaussian approximate likelihood function is implemented here. Other likelihood functions such as “pseudo” and “vae” are underway. sam_id is not used here, but it is required in the main run_tmcmc workflow

pyuncertainnumber.calibration.tmcmc.gaussian_likelihood_fun(params, data, simulator, n_xa_samples, dissimilarity_metric='wasserstein')¶

Computes the Gaussian log-likelihood of the data given the model and parameters.

Parameters:

params – Parameters for the black-box model.
data – Empirical data (numpy array of shape (time_steps, features, samples)).
simulator – A function that generates model response samples given parameters. Here response may be time series data of shape (time_steps, features, n_xa_samples), or maybe some performance metrics extracted from the time series data.
n_xa_samples – Number of samples to generate from the simulator.

Returns:

The Gaussian log-likelihood of the data given the model.

Return type:

log_likelihood

pyuncertainnumber.calibration.tmcmc.get_top_samples(trace, top_num=20)¶: Get the top posterior samples based on likelihood values.

pyuncertainnumber.calibration.tmcmc.get_posterior_samples(trace: list[Stage]) → numpy.typing.NDArray¶

Get the posterior particles of the last stage from the trace.

Parameters:: trace (list[Stage]) – trace object.

pyuncertainnumber.calibration.tmcmc.hdi_1d(x: numpy.typing.NDArray, cred_mass=0.95) → tuple[float, float]¶

Obtain the highest density interval (HDI) from particles

Parameters:

x (NDArray) – 1D array of samples.
cred_mass (float) – Credible mass for the HDI.

Returns:

Lower and upper bounds of the HDI.

Return type:

tuple

pyuncertainnumber.calibration.tmcmc.get_hdi_bounds(data: pandas.DataFrame | numpy.typing.NDArray, levels=(0.99, 0.95, 0.2, 0.1, 0.05)) → pandas.DataFrame¶

Compute HDI bounds for each column in data

Parameters:

data (pd.DataFrame | NDArray) – posterior samples (shape: n_samples x n_vars).
levels (tuple) – A tuple of high density intervalc levels.

Returns:

DataFrame with HDI bounds for each column and level.

Return type:

pd.DataFrame

pyuncertainnumber.calibration.tmcmc.filter_hdi_bounds(df, level: int = 95) → pandas.DataFrame¶

Extract HDI bounds for a given credibility level.

Parameters:

df (pd.DataFrame) – HDI dataframe containing columns like hdi_95_low, hdi_95_high
level (int | str) – significance level (e.g. 95, 90, 20)

Returns:

DataFrame with columns [hdi_<level>_low, hdi_<level>_high]

Return type:

(pd.DataFrame)

pyuncertainnumber.calibration.tmcmc.transform_old_trace_to_new(trace: list[list]) → list[Stage]¶

Transform old trace format to new Stage class format.

Parameters:: trace (list[list]) – Old trace instance which is list of lists, where each inner list contains: [Sm, Lm, Wm_n, ESS, beta, Smcap]
Returns:: New trace instance which is list of Stage class instances.
Return type:: list[Stage]