Parallel Tempering
==================

Consider the problem of sampling a posterior, in some :math:`d`-dimensional space,  in the form

.. math::
   P(x) = L(x) \pi(x)

where :math:`L` is the likelihood and :math:`\pi` is some prior distribution.

A temperated posterior, for some :math:`T>1` is defined as

.. math::
   P(x, \beta) = L(x)^{\beta} \pi(x) ~~~~~~~ \beta = \frac1T

For :math:`\beta > 1` this is a shallower distribution in which it is easier to explore tails.
Parallel tempering is a MCMC technique that consists in sampling different temperated posteriors (including the :math:`\beta=1` target posterior) and exchange points from their chains so to improve the effectiveness of the sampling (in particular in the case of multi-modal posteriors).

The idea is to define a ladder of :math:`t` temperatures :math:`(\beta_1, \dots, \beta_t)` with :math:`\beta_1=1`, and :math:`\beta_i < \beta_{i+1}`. We then sample, on a :math:`td`-dimensional space, the posterior

.. math::
   \tilde{P}\left(x_1, \dots, x_d\right) \equiv \Pi_{i=1}^t~P(x_i, \beta_i)

Than we can perform a :ref:`multi step <multimcmc>` as follows

* **t steps** involve the above posterior and proposals that sequentially act only on 1 temperature chain

  .. math::
      Q^{(k)}\left(x_1, \dots, x_t \rightarrow x_1, \dots, y_t \right) = q^{(k)}\left(x_k \rightarrow y_k \right)\Pi_{j\neq k}\delta\left(x_j - y_j\right)~~~~~~k=1, \dots, t

  according to what we said in the :ref:`parallel MCMC section <parallelmcmc>`, these steps can be performed in parallel. The corresponding acceptance is

  .. math::
      A^{(k)} = \frac{L(y_k)^{\beta_k}~\pi(y_k)}{L(x_k)^{\beta_k}~\pi(x_k)}
                \frac{q^{(k)}\left(y_k \rightarrow x_k\right)}{q^{(k)}\left(x_k \rightarrow y_k\right)}

  Thus essentially the acceptance of a simple MCMC chain sampling a :math:`P(x, \beta)` posterior. Note that each :math:`q^{(k)}` can be a mix of multiple proposals as described in the :ref:`mixing proposals section <mixing>`.

* **t / 2 steps** in which we propose the swap of the point of each chain with odd index :math:`k` with the point of the next one, i.e.

  .. math::
     Q^{(k)}\left(x_1, \dots, x_t \rightarrow x_1, \dots, y_t \right) = \delta \left(x_k - y_{k+1}\right) \delta \left(y_k - x_{k+1}\right)~
                 \Pi_{j\neq k, j\neq k + 1}~\delta \left(x_j - y_j\right)

  for each :math:`k` so that

  .. math::
     k = 1, \dots, t~~~~~ k \mod 2 = 1 ~~~~~ k + 1 \leq t

  Again we can see that all the resulting steps can be :ref:`performed in parallel <parallelmcmc>`. The acceptance is (note that the proposal is symmetric)

  .. math::
     A^{(k)} = \frac{P(x_{k+1}, \beta_k)~P(x_{k}, \beta_{k+1})}{P(x_{k}, \beta_k)~P(x_{k+1}, \beta_{k+1})} =
             = \left(\frac{L(x_{k})}{L(x_{k+1})}\right)^{\beta_{k+1} - \beta_k}

* **t / 2 steps** in which we propose the swap of the point of each chain with even index :math:`k` with the point of the next one. That is the same
  proposal defined above with :math:`k` so that

  .. math::
     k = 1, \dots, t~~~~~ k \mod 2 = 0 ~~~~~ k + 1 \leq t

  These steps can also be :ref:`performed in parallel <parallelmcmc>` and the acceptance expression is the same as above.

.. _algo3:
In practice the algorithm goes as follows::

    # Algorithm 3

    X[n]                           # Point at the n-th markov iteration
                                   # This is a (N. temperatures, dim) matrix

    Beta[t]                        # temperature ladder

    Proposal[t]                    # single chain proposal for all temperatures
                                   # adding mixing is strightforward.

    L, Pi                          # likelihood and prior

    loop on temperatures t:
    \\ this can be performed in parallel

        Y, Fwd, Bkd = Proposal[t](X[n][t])   # Propose a point Y sampling Q[t](X[n][t] -> Y)
                                             # Fwd = Q[t](X[n][t] -> Y)
                                             # Bkd = Q[t](Y -> X[n][t])

        Z = Uniform(0, 1)                    # Get a random Z between 0 and 1

        A = (L(Y) / L(X[n][t]))^Beta[t] * Pi(y) / Pi(x) * Bkd / Fwd # Acceptance

       if Z > A:

            X[n + 1][t] = X[n][t]            # Refused

       else:

            X[n + 1][t] = Y                  # Accepted

    loop on odd temperatures t so that temperature t + 1 exists:
    \\ this can be performed in parallel

        Z = Uniform(0, 1)                    # Get a random Z between 0 and 1

        A = (L(X[n + 1][t]) / L(X[n + 1][t + 1])^(Beta[t+1] - Beta[t])

        if Z > A:                                     # Accepted

            Y = X[n + 1][t]
            X[n + 1][t] = X[n + 1][t + 1]            # Swap
            X[n + 1][t + 1] = Y

    loop on even temperatures t so that temperature t + 1 exists:
    \\ this can be performed in parallel

        # Same as previous loop

    # Loop to get X[n + 2]