pyuncertainnumber.calibration.tmcmc¶
Classes¶
Functions¶
|
Load back the Prior and Posterior samples in the form of a DataFrame |
|
Plot the prior and posterior distribution of the parameters |
|
Generates initial population from prior distribution |
|
Computes log_prior value at all particles |
|
Computes beta for the next stage and updated model evidence |
|
Proposal distribution for MCMC in pertubation stage |
|
Markov chain Monte Carlo using Metropolis-Hastings which "perturbs" each particle using MCMC-MH |
vectorised version of run_tmcmc to be implemented later |
|
|
Updated main workflow of running Transitional MCMC |
|
Main workflow of running Transitional MCMC |
|
Picklable function at module level. |
|
Computes the Gaussian log-likelihood of the data given the model and parameters. |
|
Get the top posterior samples based on likelihood values. |
|
Get the posterior particles of the last stage from the trace. |
|
Obtain the highest density interval (HDI) from particles |
|
Compute HDI bounds for each column in data |
|
Extract HDI bounds for a given credibility level. |
|
Transform old trace format to new Stage class format. |
Module Contents¶
- class pyuncertainnumber.calibration.tmcmc.TMCMC(N: int = None, parameters: list = None, names: list[str] = None, log_likelihood: callable = None, mutation_steps: int = 5, status_file_name: str = None, trace=None)¶
Class for TMCMC implementation
- Parameters:
N (int) – number of particles to be sampled from posterior
parameters (list) – list of (size Nop) prior distributions instances
log_likelihood (callable) – log likelihood function on \(\theta\) to be defined which is problem specific
status_file_name (str) – name of the status file to store status of the tmcmc sampling
- Returns:
- returns trace file of all samples of all tmcmc stages.
At stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]
- Return type:
mytrace
Example
>>> from pyuncertainnumber.calibration.tmcmc import TMCMC >>> from pyuncertainnumber import pba >>> # create an instance of TMCMC class and run tmcmc >>> parameters = [pba.D("uniform", (0.8, 2.2)), pba.D("uniform", (0.4, 1.2))] # dummy prior distributions
>>> t = TMCMC( >>> N=1000, # number of particles >>> parameters=parameters, # prior distributions >>> names =['k1', 'k2'], # parameter names >>> log_likelihood=log_likelihood_function, # user defined log likelihood function >>> mutation_steps=1, # number of MCMC steps for perturbation >>> status_file_name='tmcmc_progressing_status.txt', # status log file >>> ) >>> mytrace = t.run()
>>> # for loading trace file >>> import pickle >>> with open('tmcmc_trace.pkl', 'rb') as f: >>> mytrace = pickle.load(f) >>> re = TMCMC(trace=mytrace, names=names) # create TMCMC instance with loaded trace and names
>>> re.plot_updated_distribution(save=False) # plot prior and posterior distribution >>> prior_samples = re.load_prior_samples() # load prior samples as DataFrame >>> posterior_samples = re.load_posterior_samples() # load posterior samples as DataFrame
Example of TMCMC calibration of a 2-DOF system¶
- N = None¶
- parameters = None¶
- names = None¶
- log_likelihood = None¶
- mutation_steps = 5¶
- status_file_name = None¶
- trace = None¶
- check_names()¶
- run() list[Stage]¶
Run the TMCMC algorithm
- Returns:
returns trace file of all samples of all tmcmc stages.
- Return type:
mytrace
- check_trace()¶
- save_trace(mytrace, file_name: str, save_dir=None)¶
Save the trace to a file
- Parameters:
mytrace – trace file of all samples of all tmcmc stages.
file_name (str) – name of the file to save the trace
- load_prior_samples(trace=None) pandas.DataFrame¶
- load_posterior_samples(trace=None) pandas.DataFrame¶
- plot_updated_distribution(trace=None, save=False)¶
Plot the prior and posterior distribution of the parameters
- Parameters:
trace – trace file of all samples of all tmcmc stages.
save (bool) – if True, save the figure
- class pyuncertainnumber.calibration.tmcmc.Stage¶
Container for one TMCMC stage.
- Parameters:
Sm (NDArray) – samples before resampling (N x Np)
Lm (NDArray) – log-likelihoods at Sm (N,)
Wm_n (NDArray) – normalized incremental weights (N,)
ESS (Optional[float]) – effective sample size the next stage (can be None in terminal stage)
beta (float) – tempering parameter next stage beta
Smcap (Optional[NDArray]) – resampled samples (N x Np), None in terminal stage
- Sm: numpy.typing.NDArray¶
- Lm: numpy.typing.NDArray¶
- Wm_n: numpy.typing.NDArray¶
- ESS: float | None¶
- beta: float¶
- Smcap: numpy.typing.NDArray | None¶
- __repr__() str¶
- property samples: numpy.typing.NDArray¶
Samples before resampling (from previous stage).
- property log_likelihoods: numpy.typing.NDArray¶
Log-likelihood evaluated at samples.
- property weights: numpy.typing.NDArray¶
Normalized incremental importance weights.
- property effective_sample_size: float | None¶
Effective sample size after reweighting the next stage.
- property tempering_parameter: float¶
Tempering parameter beta next stage beta.
- property resampled_samples: numpy.typing.NDArray | None¶
Samples after importance resampling (before MH mutation).
- pyuncertainnumber.calibration.tmcmc.recover_trace_samples(mytrace: list[Stage], names: list[str]) pandas.DataFrame¶
Load back the Prior and Posterior samples in the form of a DataFrame
- Parameters:
mytrace (list[Stage]) – trace file of all samples of all tmcmc stages.
names (list[str]) – list of parameter names
- Returns:
Prior and Posterior in the trace as a DataFrame
- Return type:
(pd.DataFrame)
- pyuncertainnumber.calibration.tmcmc.plot_updated_distribution(mytrace, names: list[str], save=False)¶
Plot the prior and posterior distribution of the parameters
- Parameters:
mytrace – trace file of all samples of all tmcmc stages.
names (list[str]) – list of parameter names
save (bool) – if True, save the figure
- pyuncertainnumber.calibration.tmcmc.initial_population(N: float, all_pars: list) numpy.typing.NDArray¶
Generates initial population from prior distribution
- Parameters:
N (float) – number of particles.
all_pars (list) – All the parameters, of size Np, to be updated. all_pars[i] is object of type pdfs.
- Returns
ini_pop (NDArray): the initial population which is a numpy array of size N x Np.
Note
Np is number of parameters
- pyuncertainnumber.calibration.tmcmc.log_prior(s: numpy.typing.NDArray, all_pars: list[pyuncertainnumber.calibration.pdfs.ProbabilityDensityFun | pyuncertainnumber.Distribution]) float¶
Computes log_prior value at all particles
- Parameters:
s (NDArray) – numpy array of size (N x Np) of all particles.
all_pars (list[ProbabilityDensityFun | Distribution]) – All the parameters, of size Np, to be updated. all_pars[i] is object of type pdfs.
- Returns
log_p (NDArray): log prior at all N particles which is a numpy array of size N
Note
Np denotes number of parameters
- pyuncertainnumber.calibration.tmcmc.compute_beta_update_evidence(beta: float, log_likelihoods: numpy.typing.NDArray, log_evidence: float, prev_ESS: int)¶
Computes beta for the next stage and updated model evidence
- Parameters:
beta (float) – stage parameter.
log_likelihoods (NDArray) – log likelihood values at all particles which is numpy array of size N
log_evidence (float) – log of evidence.
prev_ESS (int) – effective sample size of previous stage
- Returns:
A 4-tuple
(new_beta, log_evidence, Wm_n, ESS)where: - new_beta (float): stage parameter for next stage. - log_evidence (float): updated log evidence. - Wm_n (NDArray): weights of particles for the next stage. - ESS (float): effective sample size of new stage.- Return type:
tuple[float, float, NDArray, float]
- pyuncertainnumber.calibration.tmcmc.propose(current: numpy.typing.NDArray, covariance: numpy.typing.NDArray, n: int) numpy.typing.NDArray¶
Proposal distribution for MCMC in pertubation stage
- Parameters:
current (NDArray) – numpy array of size Np current particle location
covariance (NDArray) – numpy array of size Np x Np proposal covariance matrix
n (int) – number of proposals.
- Returns:
n proposals
- Return type:
NDArray
- pyuncertainnumber.calibration.tmcmc.MCMC_MH(particle_num: int, Em: numpy.typing.NDArray, Nm_steps: int, current: numpy.typing.NDArray, likelihood_current: float, posterior_current: float, beta: float, numAccepts: int, all_pars: list, log_likelihood: callable) tuple[numpy.typing.NDArray, float, float, int]¶
Markov chain Monte Carlo using Metropolis-Hastings which “perturbs” each particle using MCMC-MH
Conduct Nm_steps steps of MH for each particle.
- Parameters:
particle_num (int) – The index of the current particle/parameter vector being evaluated.
Em (NDArray) – proposal covarince matrix which is numpy array of size Nop x Nop
Nm_steps (int) – number of perturbation steps.
current (NDArray) – numpy array of size Nop. current particle location
likelihood_current (float) – log likelihood value at current particle
posterior_current (float) – log posterior value at current particle
beta (float) – stage parameter.
numAccepts (int) – total number of accepts
all_pars – list of size Nop Nop is number of parameters all_pars[i] is object of type pdfs all parameters to be inferred.
log_likelihood (callable) – log likelihood function to be defined in main.py.
- Returns:
A 4-tuple
(current, likelihood_current, posterior_current, numAccepts)where:current (NDArray): Perturbed particle location of shape (Nop,).
likelihood_current (float): Log-likelihood at the perturbed particle.
posterior_current (float): Log-posterior at the perturbed particle.
numAccepts (int): Updated total number of accepts.
- Return type:
tuple[NDArray, float, float, int]
- pyuncertainnumber.calibration.tmcmc.run_tmcmc_vec()¶
vectorised version of run_tmcmc to be implemented later
- pyuncertainnumber.calibration.tmcmc.run_tmcmc_updated(N: int, all_pars: list, log_likelihood: callable, status_file_name: str, Nm_steps_max: int = 5, Nm_steps_maxmax: int = 5) list[Stage]¶
Updated main workflow of running Transitional MCMC
Solves the “At this point, beta has been updated to the new value, but Postm is still the log-target from the previous beta (except at the very first stage). So Postmcap is inconsistent with the beta you pass into MCMC_MH.”
- Parameters:
N (int) – int number of particles to be sampled from posterior
all_pars (list) – All the parameters, of size Nop, to be updated. all_pars[i] is object of type pdfs.
log_likelihood (callable) – log likelihood function to be defined in main.py as is problem specific
status_file_name (str) – name of the status file to store status of the tmcmc sampling
Nm_steps_max (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.
Nm_steps_maxmax (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.
- Returns:
the trace that contains all iterations of the Stage instances.
- Return type:
list[Stage]
- pyuncertainnumber.calibration.tmcmc.run_tmcmc(N: int, all_pars: list, log_likelihood: callable, status_file_name: str, Nm_steps_max: int = 5, Nm_steps_maxmax: int = 5)¶
Main workflow of running Transitional MCMC
- Parameters:
N (int) – int number of particles to be sampled from posterior
all_pars (list) – All the parameters, of size Nop, to be updated. all_pars[i] is object of type pdfs.
log_likelihood (callable) – log likelihood function to be defined in main.py as is problem specific
status_file_name (str) – name of the status file to store status of the tmcmc sampling
Nm_steps_max (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.
Nm_steps_maxmax (int, optional) – Numbers of MCMC steps for perturbation. The default is 5.
parallel_processing (str) – should be either ‘multiprocessing’ or ‘mpi’
- Returns:
- mytrace: returns trace file of all samples of all tmcmc stages.
at stage m: it contains [Sm, Lm, Wm_n, ESS, beta, Smcap]
- Return type:
a tuple containing
- pyuncertainnumber.calibration.tmcmc.log_likelihood(sam_id: int, param_vec: numpy.typing.NDArray, data_empirical, n_xa_samples, simulator: callable, which_likelihood_calculator='gaussian') numpy.ndarray¶
Picklable function at module level.
- Parameters:
sam_id – (int) a dummy sample index which is required in the main run_tmcmc workflow
param_vec (array-like) – indeed a 1d epistemic_param_vec, the parameter vector, i.e. epistemic_domain
Note
Gaussian approximate likelihood function is implemented here. Other likelihood functions such as “pseudo” and “vae” are underway. sam_id is not used here, but it is required in the main run_tmcmc workflow
- pyuncertainnumber.calibration.tmcmc.gaussian_likelihood_fun(params, data, simulator, n_xa_samples, dissimilarity_metric='wasserstein')¶
Computes the Gaussian log-likelihood of the data given the model and parameters.
- Parameters:
params – Parameters for the black-box model.
data – Empirical data (numpy array of shape (time_steps, features, samples)).
simulator – A function that generates model response samples given parameters. Here response may be time series data of shape (time_steps, features, n_xa_samples), or maybe some performance metrics extracted from the time series data.
n_xa_samples – Number of samples to generate from the simulator.
- Returns:
The Gaussian log-likelihood of the data given the model.
- Return type:
- pyuncertainnumber.calibration.tmcmc.get_top_samples(trace, top_num=20)¶
Get the top posterior samples based on likelihood values.
- pyuncertainnumber.calibration.tmcmc.get_posterior_samples(trace: list[Stage]) numpy.typing.NDArray¶
Get the posterior particles of the last stage from the trace.
- Parameters:
trace (list[Stage]) – trace object.
- pyuncertainnumber.calibration.tmcmc.hdi_1d(x: numpy.typing.NDArray, cred_mass=0.95) tuple[float, float]¶
Obtain the highest density interval (HDI) from particles
- Parameters:
x (NDArray) – 1D array of samples.
cred_mass (float) – Credible mass for the HDI.
- Returns:
Lower and upper bounds of the HDI.
- Return type:
tuple
- pyuncertainnumber.calibration.tmcmc.get_hdi_bounds(data: pandas.DataFrame | numpy.typing.NDArray, levels=(0.99, 0.95, 0.2, 0.1, 0.05)) pandas.DataFrame¶
Compute HDI bounds for each column in data
- Parameters:
data (pd.DataFrame | NDArray) – posterior samples (shape: n_samples x n_vars).
levels (tuple) – A tuple of high density intervalc levels.
- Returns:
DataFrame with HDI bounds for each column and level.
- Return type:
pd.DataFrame
- pyuncertainnumber.calibration.tmcmc.filter_hdi_bounds(df, level: int = 95) pandas.DataFrame¶
Extract HDI bounds for a given credibility level.
- Parameters:
df (pd.DataFrame) – HDI dataframe containing columns like hdi_95_low, hdi_95_high
level (int | str) – significance level (e.g. 95, 90, 20)
- Returns:
DataFrame with columns [hdi_<level>_low, hdi_<level>_high]
- Return type:
(pd.DataFrame)
- pyuncertainnumber.calibration.tmcmc.transform_old_trace_to_new(trace: list[list]) list[Stage]¶
Transform old trace format to new Stage class format.
- Parameters:
trace (list[list]) – Old trace instance which is list of lists, where each inner list contains: [Sm, Lm, Wm_n, ESS, beta, Smcap]
- Returns:
New trace instance which is list of Stage class instances.
- Return type:
list[Stage]