Generalities#
Markov Chains#
A Markov process is a stockastic process in which events at the step \(N\) only depends on events at the step \(N-1\). It can be represented by a trasition matrix
Sarting from a single point \(\hat{x}\) the probability distribution \(p^{(n)}(x)\) after \(n\) steps is
If we consider the chain, i.e. the union of all points after \(n\) steps, its probability distribution is
which under transition behaves like
Thus the limiting chain \(C(x) \equiv C^{(\infty)}(x)\) satisfy the stationarity equation
Metropolis Hastings#
Metropolis Hastings [1] is an algorithm that uses Markov chains to provide sampling of a given distribution \(P(x)\).
Given a proposal transition matrix \(Q\)
Let’s define the acceptance as
Let’s consider the following Markov transition matrix
One can show that \(P(x)\) satisfies the stationarity equation for \(T\). Thus, if a large number of steps is performed, the chain resulting from \(T\) provides a good sample of \(P(x)\).
The resulting algorithm is as follows:
# Algorithm 1
X[n] # Point at the n-th markov iteration
Y, Fwd, Bkd = Proposal(X[n]) # Propose a point Y sampling Q(X(n) -> Y)
# Fwd = Q(X[n] -> Y)
# Bkd = Q(Y -> X[n])
Z = Uniform(0, 1) # Get a random Z between 0 and 1
A = P(Y) / P(X[n]) * Bkd / Fwd # Acceptance
if Z > A:
X[n + 1] = X[n] # Refused
else:
X[n + 1] = Y # Accepted
# Loop to get X[n + 2]
Multistep MCMC#
Consider a Markov process which transition matrix \(T\) is the composition of \(M\) matrices \(T^{(i)}, i=0,\dots,M-1\). This can be seen as a multi step process in which the transition matrices cycle during the Markov chain steps.
Sarting from a single point \(\hat{x}\) the probability distribution after \(n\) cycles and additional \(k=0, \dots, M-1\) steps is
and the partial chain, i.e. the union of all points at the ::k step of the cycle, after \(n\) cycles
The analogous of stationarity equation above is the following set of stationarity equations
The case we are interested in is when performing a multi step Metropolis Hasting MCMC sampling where all steps transition matrices correspond to the same posterior \(P\) and different proposals \(Q^{(i)}\). In this case it is easy to see that all \(C^{(k)}\) in the above equation sample P. This will be underlying mechanism of most of the algorithms described below (proposal mixing, parallel tempering, ensemble sampling).
Mixing Proposals#
While performing MCMC sampling, in order to better explore target posterior, one often alternates different proposals according to some “weights”.
To see how this works, suppose you want to sample a given posterior \(P(x)\) on some \(d\)-dimensional space. And suppose you have a set of \(N\) proposals \(Q_i\) that you want to alternate according to the discrete probability distribution \(p_i\)
In order to do this consider sampling the posterior \(\tilde{P}\) on the \((x, i)\) space
and perform a 2-steps MCMC as follows
Step1: starting from \((x, i)\) point propose a new point with proposal
\[\tilde{Q}\left(x, i \rightarrow y, j \right) = p_j~\delta(x - y)\]in this case, obviously, \({x=y}\) and the acceptance is
\[A = \frac{\tilde{P}(x, j)}{\tilde{P}(x, i)}\frac{p_i}{p_j} = 1\]thus the new point \((x, j)\) is always accepted.
Step2: propose a new point with proposal
\[\hat{Q}\left(x, j \rightarrow y, k \right) = \delta_{jk}~Q_j\left(x \rightarrow y\right)\]in this cas, of course, \({j=k}\) and the acceptance is
\[A = \frac{\tilde{P}(x, j)}{\tilde{P}(x, j)} \frac{Q_j\left(y \rightarrow x\right)}{Q_j\left(x \rightarrow y\right)} = \frac{P(y)}{P(x)}\frac{Q_j\left(y \rightarrow x\right)}{Q_j\left(x \rightarrow y\right)}\]
In this way we sample \(\tilde{P}\) and, by considering only the \(x\) variable in the chain, we marginalize over \(i\) obtaining \(P\).
Here is how the algorithm goes in practice:
# Algorithm 2
X[n] # Point at the n-th markov iteration
w # Weights of the proposals
Proposals # Set of proposal
i = choose(1, N, weights=w) # Choose one random index according to weights
Proposal = Proposals[1]
# Run Algorithm 1 to get X[n + 1]
# Loop to get X[n + 2]
Parallel MCMC#
When performing a multistep MCMC one can apply a mechanism of reordering and grouping the steps that allows us to partially parallelize the execution.
Suppose you have a \(N\)-steps MCMC sampling on a \(d\)-dimensional space with variable \(\mathbf{x}=(x_1,\dots,x_d)\) and consider the acceptance \(A^{(i)}, i=1,\dots,N\) for each steps.
Consider having consecutive steps that are mutually independent e.g.
\[T^{(i)}\left(x_1, \dots, x_k, x_i \rightarrow y_i\right)\Pi_{j \neq i}\delta(x_j - y_j)~~~~~~~i=k, \dots, d\]
They can be executed in parallel.
Note that the result of the sampling, the target posterior, is not affected by changing the order of steps in the cycle or by duplicating some of the steps. So one can rearrange the multi-steps MCMC so to have sub sequencies of steps that are independent and can thus be executed in parallel. We can see how this mechanism is the basis of parallel tempering and some walker mixing proposals (i.e. stretch) in ensemble sampling.