Parallel Tempering

Parallel Tempering#

Consider the problem of sampling a posterior, in some \(d\)-dimensional space, in the form

\[P(x) = L(x) \pi(x)\]

where \(L\) is the likelihood and \(\pi\) is some prior distribution.

A temperated posterior, for some \(T>1\) is defined as

\[P(x, \beta) = L(x)^{\beta} \pi(x) ~~~~~~~ \beta = \frac1T\]

For \(\beta > 1\) this is a shallower distribution in which it is easier to explore tails. Parallel tempering is a MCMC technique that consists in sampling different temperated posteriors (including the \(\beta=1\) target posterior) and exchange points from their chains so to improve the effectiveness of the sampling (in particular in the case of multi-modal posteriors).

The idea is to define a ladder of \(t\) temperatures \((\beta_1, \dots, \beta_t)\) with \(\beta_1=1\), and \(\beta_i < \beta_{i+1}\). We then sample, on a \(td\)-dimensional space, the posterior

\[\tilde{P}\left(x_1, \dots, x_d\right) \equiv \Pi_{i=1}^t~P(x_i, \beta_i)\]

Than we can perform a multi step as follows

  • t steps involve the above posterior and proposals that sequentially act only on 1 temperature chain

    \[Q^{(k)}\left(x_1, \dots, x_t \rightarrow x_1, \dots, y_t \right) = q^{(k)}\left(x_k \rightarrow y_k \right)\Pi_{j\neq k}\delta\left(x_j - y_j\right)~~~~~~k=1, \dots, t\]

    according to what we said in the parallel MCMC section, these steps can be performed in parallel. The corresponding acceptance is

    \[A^{(k)} = \frac{L(y_k)^{\beta_k}~\pi(y_k)}{L(x_k)^{\beta_k}~\pi(x_k)} \frac{q^{(k)}\left(y_k \rightarrow x_k\right)}{q^{(k)}\left(x_k \rightarrow y_k\right)}\]

    Thus essentially the acceptance of a simple MCMC chain sampling a \(P(x, \beta)\) posterior. Note that each \(q^{(k)}\) can be a mix of multiple proposals as described in the mixing proposals section.

  • t / 2 steps in which we propose the swap of the point of each chain with odd index \(k\) with the point of the next one, i.e.

    \[Q^{(k)}\left(x_1, \dots, x_t \rightarrow x_1, \dots, y_t \right) = \delta \left(x_k - y_{k+1}\right) \delta \left(y_k - x_{k+1}\right)~ \Pi_{j\neq k, j\neq k + 1}~\delta \left(x_j - y_j\right)\]

    for each \(k\) so that

    \[k = 1, \dots, t~~~~~ k \mod 2 = 1 ~~~~~ k + 1 \leq t\]

    Again we can see that all the resulting steps can be performed in parallel. The acceptance is (note that the proposal is symmetric)

    \[A^{(k)} = \frac{P(x_{k+1}, \beta_k)~P(x_{k}, \beta_{k+1})}{P(x_{k}, \beta_k)~P(x_{k+1}, \beta_{k+1})} = = \left(\frac{L(x_{k})}{L(x_{k+1})}\right)^{\beta_{k+1} - \beta_k}\]
  • t / 2 steps in which we propose the swap of the point of each chain with even index \(k\) with the point of the next one. That is the same proposal defined above with \(k\) so that

    \[k = 1, \dots, t~~~~~ k \mod 2 = 0 ~~~~~ k + 1 \leq t\]

    These steps can also be performed in parallel and the acceptance expression is the same as above.

In practice the algorithm goes as follows:

# Algorithm 3

X[n]                           # Point at the n-th markov iteration
                               # This is a (N. temperatures, dim) matrix

Beta[t]                        # temperature ladder

Proposal[t]                    # single chain proposal for all temperatures
                               # adding mixing is strightforward.

L, Pi                          # likelihood and prior

loop on temperatures t:
\\ this can be performed in parallel

    Y, Fwd, Bkd = Proposal[t](X[n][t])   # Propose a point Y sampling Q[t](X[n][t] -> Y)
                                         # Fwd = Q[t](X[n][t] -> Y)
                                         # Bkd = Q[t](Y -> X[n][t])

    Z = Uniform(0, 1)                    # Get a random Z between 0 and 1

    A = (L(Y) / L(X[n][t]))^Beta[t] * Pi(y) / Pi(x) * Bkd / Fwd # Acceptance

   if Z > A:

        X[n + 1][t] = X[n][t]            # Refused

   else:

        X[n + 1][t] = Y                  # Accepted

loop on odd temperatures t so that temperature t + 1 exists:
\\ this can be performed in parallel

    Z = Uniform(0, 1)                    # Get a random Z between 0 and 1

    A = (L(X[n + 1][t]) / L(X[n + 1][t + 1])^(Beta[t+1] - Beta[t])

    if Z > A:                                     # Accepted

        Y = X[n + 1][t]
        X[n + 1][t] = X[n + 1][t + 1]            # Swap
        X[n + 1][t + 1] = Y

loop on even temperatures t so that temperature t + 1 exists:
\\ this can be performed in parallel

    # Same as previous loop

# Loop to get X[n + 2]