Markov chain Monte Carlo

Monte Carlo

Monte Carlo (a term coined by S. Ulam and N. Metropolis) refers to methods based on generating random numbers from a given distribution.

When dealing with temporal uncertainty, for example, a Monte Carlo method consists of creating artificial “potential” time series within the boundaries of existence, and then assessing the likelihood of the variable’s character state over time.
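A minimal sketch of this idea in Python with NumPy, in which the boundaries of existence and the durations are hypothetical values chosen purely for illustration:

    import numpy as np

    rng = np.random.default_rng(42)
    n_sims = 10_000

    # Hypothetical boundaries: the phenomenon began between 1200 and 1400 CE
    # and lasted between 50 and 150 years
    starts = rng.uniform(1200, 1400, n_sims)
    ends = starts + rng.uniform(50, 150, n_sims)

    # Monte Carlo estimate: the fraction of artificial "potential" time series
    # in which the variable is active in a given year
    year = 1350
    p_active = np.mean((starts <= year) & (year <= ends))
    print(f"Estimated probability of being active in {year}: {p_active:.3f}")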

Markov chain

A Markov chain (after A. Markov) is a sequence of random variables in which each value depends only on the immediately preceding one (the Markov property).
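For instance, a two-state Markov chain can be simulated as follows (a Python sketch with arbitrary transition probabilities):

    import numpy as np

    rng = np.random.default_rng(0)

    # Transition matrix: P[i, j] = probability of moving from state i to state j
    P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])

    state = 0
    states = [state]
    for _ in range(10_000):
        # The next state depends only on the current state (the Markov property)
        state = rng.choice(2, p=P[state])
        states.append(state)

    # The long-run fraction of time spent in state 0 approaches the chain's
    # equilibrium distribution (here 5/6)
    print(np.mean(np.array(states) == 0))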

To simulate a Markov chain whose equilibrium distribution is a posterior density \(p(x)\) defined on the set of possible configurations of a system, we use Markov chain Monte Carlo (MCMC) methods.

Some MCMC algorithms:

  • Metropolis-Hastings

  • Gibbs sampling

  • Random walk Metropolis

Alternative to MCMC:

  • Importance/rejection sampling (used to evaluate the probability of rare events)
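For example, rejection sampling draws candidates from a simple proposal and accepts them with a probability proportional to the target density. A minimal Python sketch, using an unnormalised Beta(2, 5) kernel as a hypothetical target and a uniform proposal:

    import numpy as np

    rng = np.random.default_rng(1)

    def p(x):
        # Unnormalised target density: a Beta(2, 5) kernel on [0, 1]
        return x * (1 - x) ** 4

    M = p(1 / 5)    # envelope constant: the kernel's maximum, reached at x = 1/5
    samples = []
    while len(samples) < 1000:
        x = rng.uniform(0, 1)       # candidate from the uniform proposal q
        u = rng.uniform(0, 1)
        if u < p(x) / M:            # accept with probability p(x) / (M * q(x))
            samples.append(x)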


MCMC is used to fit a model and to draw samples from the joint posterior distribution of the model parameters.


Metropolis-Hastings

The Metropolis-Hastings algorithm generates samples from the posterior distribution of \(\theta\), the parameter of the distribution assumed to have generated the data.


Algorithm

  1. Initialise \(x_0\)

  2. For \(t=0\) to \(N-1\)

    Sample \(u \sim U(0, 1)\) from the uniform distribution

    Sample a candidate point \(x^\star \sim q(x^\star \mid x_t)\) from a proposal distribution

    If \(u < A(x_t, x^\star) = \min\left\{1, \frac{p(x^\star) \; q(x_t \mid x^\star)}{p(x_t) \; q(x^\star \mid x_t)}\right\}\)

    \(x_{t+1} = x^\star\) (accept the candidate)

    Else

    \(x_{t+1} = x_t\) (reject the candidate; increment \(t\) and sample a new \(x^\star\) until the last iteration is reached)
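A minimal Python implementation of the algorithm above (a sketch, assuming a symmetric Gaussian random-walk proposal, in which case the \(q\) terms of the acceptance ratio cancel; the target density, step size, and iteration count are illustrative):

    import numpy as np

    def metropolis_hastings(p, x0, n_iter, step=1.0, seed=0):
        # Random walk Metropolis-Hastings targeting the (unnormalised) density p
        rng = np.random.default_rng(seed)
        x = x0
        samples = np.empty(n_iter)
        for t in range(n_iter):
            u = rng.uniform(0, 1)
            x_star = rng.normal(x, step)    # candidate from a symmetric proposal
            if u < min(1.0, p(x_star) / p(x)):
                x = x_star                  # accept the candidate
            samples[t] = x                  # on rejection, keep the current state
        return samples

    # Example target: an unnormalised standard normal kernel
    target = lambda x: np.exp(-0.5 * x ** 2)
    chain = metropolis_hastings(target, x0=0.0, n_iter=5000)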


Diagnostics

Diagnostics are used to assess the performance of the MCMC sampler.


MCMC Simulation

First, generate a series of random numbers from a normal distribution with a given mean and an arbitrary variance.

Then use the algorithm above for the MCMC iterations, throwing away the early samples, a process called burn-in.

The trace plot (or time-series plot) is used to judge how quickly the MCMC procedure converges in distribution.

When the variability is not the same over all iterations, the trace plot shows a “random walk” pattern, a sign that the chain has not yet converged.
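Continuing the sketch above, a trace plot of the chain can be drawn with matplotlib after discarding a hypothetical burn-in of 500 iterations:

    import matplotlib.pyplot as plt

    burn_in = 500
    kept = chain[burn_in:]      # discard the early samples (burn-in)

    plt.plot(kept)              # trace plot: sampled value against iteration
    plt.xlabel("Iteration")
    plt.ylabel("Sampled value")
    plt.title("Trace plot")
    plt.show()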


Convergence

Convergence in MCMC is reached when the following quantities become minimal as \(n\) increases (see the sketch after the list):

  • the change in the variance

  • the standard error
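One simple way to monitor both quantities, reusing kept from the sketch above (note that the naive standard error below ignores autocorrelation):

    import numpy as np

    n = np.arange(1, len(kept) + 1)
    running_mean = np.cumsum(kept) / n
    # Running variance of the first n draws: E[x^2] - (E[x])^2
    running_var = np.cumsum(kept ** 2) / n - running_mean ** 2
    # Naive Monte Carlo standard error; should shrink as n grows
    std_error = np.sqrt(running_var / n)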


Note

A thinning interval reduces the autocorrelation between the retained samples by keeping only every \(k\)-th draw of the chain.
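For instance, with a hypothetical thinning interval of \(k = 10\):

    import numpy as np

    k = 10
    thinned = kept[::k]         # keep every k-th draw of the chain

    def lag1_autocorr(x):
        # Lag-1 sample autocorrelation
        x = x - x.mean()
        return np.dot(x[:-1], x[1:]) / np.dot(x, x)

    # Autocorrelation should drop after thinning
    print(lag1_autocorr(kept), lag1_autocorr(thinned))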


Summed probability distribution


Todo

Summed probability distributions (SPD) of dates, the summation of the posterior probability distributions of calibrated dates, serve as a proxy for population levels.
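The construction can be sketched in Python; here Gaussian densities stand in for the calibrated posteriors purely for illustration (a real SPD would use the output of radiocarbon calibration, and the dates below are hypothetical):

    import numpy as np

    grid = np.arange(4000, 6001)    # calendar years BP

    def date_density(mean, sd):
        # Stand-in for the posterior density of one calibrated date
        d = np.exp(-0.5 * ((grid - mean) / sd) ** 2)
        return d / d.sum()          # normalise so each date contributes 1 in total

    dates = [(4500, 40), (4700, 60), (5200, 50)]
    spd = sum(date_density(m, s) for m, s in dates)   # sum the distributions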


Bayesian statistics

Markov chain Monte Carlo is a tool for sampling from distributions of high dimensionality, a need typically found in Bayesian statistics contexts.

Bayesian analysis involves Bayesian inference, model fitting, and inferential and predictive summaries with predictive checks.

As mentioned in temporal uncertainty, the subjective approach to assigning probabilities attaches a certain degree of belief or knowledge to the probability; the aim is to arrive at a posterior distribution that updates the state of knowledge about the parameters.


Bayes theorem

Key concepts in Bayesian statistics are related through Bayes theorem, a rule that gives the conditional probability of \(\theta\) (a model or hypothesis) given that \(x\) (data or an observation) has already occurred.

\(P(\theta \mid x) \;=\; \frac{ P(\theta)~P(x \mid \theta) } { P(x) }\)

That is, the posterior probability of \(\theta\) given \(x\) where

  • \(P(\theta \mid x)\) is the posterior probability of \(\theta\)

  • \(P(\theta)\) is the prior probability of \(\theta\)

  • \(P(x \mid \theta)\) is the likelihood of \(\theta\)

  • \(P(x)\) is the unconditional probability of \(x\)

Hence, the posterior is an updated degree of belief or knowledge based on the prior and the known probabilities. This is our target posterior distribution, sometimes written \(P_{\text{target}}(\theta)\).

The likelihood, on the other hand, is the probability that the hypothesis confers upon the observation, while the unconditional probability of the observation is its probability irrespective of any hypothesis.
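As a concrete illustration of the theorem, a grid approximation of the posterior for a coin’s bias \(\theta\) after observing 7 heads in 10 flips (a hypothetical dataset, with a uniform prior):

    import numpy as np

    theta = np.linspace(0, 1, 101)              # grid of parameter values
    prior = np.ones_like(theta) / len(theta)    # uniform prior P(theta)

    # Binomial likelihood P(x | theta) for 7 heads in 10 flips
    heads, n = 7, 10
    likelihood = theta ** heads * (1 - theta) ** (n - heads)

    # Bayes theorem: posterior = prior * likelihood / P(x),
    # with P(x) approximated by the sum over the grid
    posterior = prior * likelihood
    posterior /= posterior.sum()

    print(theta[np.argmax(posterior)])          # posterior mode, 0.7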