# Model fitting using MCMC - The basic framework

In this tutorial we show how Bayesian model fitting using Markov Chain Monte Carlo can be done in Scalismo. To be able to focus on the main components of the framework instead of technical details, we start in this tutorial with a simple toy example, namely 1D Bayesian linear regression. The application to 3D shape modelling is discussed in depth in the next tutorial.

##### Related resources

Week 2 of our online course on shape model fitting may provide some helpful context for this tutorial.

### Problem setting

In a Bayesian linear regression an outcome variable $y$ is modelled a linear function of the explanatory variable $x$. The normal linear model assumes that the distribution of $y$ is a normal distribution with a mean $a \cdot x + b$ and variance $\sigma^2$.

In the following we will denote the unknown parameters $a$, $b$ and $\sigma^2$ by $\theta$; I.e. $\theta = (a, b, \sigma^2)$. The inference problem is to estimate the parameters $\theta$, given observations $X=(x_1, \ldots, x_n)$ and $Y=(y_1, \ldots, y_n)$. This is done by computing the posterior distribution:

The likelihood term $p(Y | \theta, X)$ is given by the normal distribution $N(a \cdot x + b,\sigma^2)$ define above. Hence the likelihood of observing the data $X, Y$ is

As prior distribution $p(\theta)$ we define

### Metropolis Hastings Algorithm

The way we approach such an inference problem in Scalismo is by using the Metropolis-Hastings algorithm. The Metropolis-Hastings algorithm allows us to draw samples from any distribution, given that the unnormalized distribution can be evaluated point-wise. This requirement is easy to fulfill for all shape modelling applications.

For setting up the Metropolis-Hastings algorithm, we need two things:

- The (unnormalized) target distribution, from which we want to sample. In our case this is the posterior distribution $p(\theta \mid Y, X))$. In Scalismo
the corresponding class is called the
`DistributionEvaluator`

. - A proposal distribution $Q(\theta' \mid \theta)$, which generates for a given sample $\theta$ a new sample $\theta'$.

The Metropolis Hastings algorithm introduces an ingenious scheme for accepting and rejecting the samples from this proposal distribution, based on their probability under the target density, such that the resulting sequence of samples is guaranteed to be distributed according to the target distribution. In practice, the algorithm works as follows: It uses the proposal generator to perturb a given sample $\theta$ to obtain a new sample $\theta'$. Then it checks, using the evaluator, which of the two samples, $\theta$ or $\theta'$ is more likely and uses this ratio as a basis for rejecting or accepting the new sample.

### Implementation in Scalismo

##### Preparation

As in the previous tutorials, we start by importing some commonly used objects and initializing the system.

To make the setup simple, we generate artificial data, which follows exactly our assumptions. In this way we will be able to see how well we estimated the parameters.

Before we discuss the two main components, the *Evaluator* and *Proposal generator* in detail, we first define a class for representing
the parameters $\theta = (a, b, \sigma^2)$:

We introduce a further class to represent a sample from the chain. A sample is simply a set of parameters together with a tag, which helps us to keep track later on, which proposal generator generated the sample:

#### Evaluators: Modelling the target density

In Scalismo, the target density is represented by classes, which we will refer to
as *Evaluators*. Any Evaluator is a subclass of the class `DistributionEvalutor`

,
which is defined in Scalismo as follows:

*Note: This trait is already defined in Scalismo, don't paste it into your code.*

We see that the only thing we need to define is the log probability of a sample.

In our case, we will define separate evaluators for the prior distribution $p(\theta)$ and the likelihood $p(Y | \theta, X)$.

The likelihood function, defined above, can be implemented as follows:

Notice that we work in Scalismo with log probabilities, and hence the product in above formula becomes a sum.

In a similar way, we encode the prior distribution:

The target density (i.e. the posterior distribution) can be computed by taking the product of the prior and the likelihood.

Note that the posteriorEvaluator represents the unnormalized posterior, as we did not normalize by the probability of the data $p(y)$.

#### The proposal generator

In Scalismo, a proposal generator is defined by extending the trait
*ProposalGenerator*, which is defined as follows

In order to be able to use a proposal generator in the Metropolis-Hastings algorithm,
we also need to implement the trait `TransitionProbability`

:

*Note: The above traits are already defined in Scalismo, don't paste them into your code.*

We use here one of the simples possible proposals, namely a *random walk proposal*. This is a proposal
which updates the current state by taking a step of random length in a random direction. For simplicity,
we update all three parameters together:

*Remark: the second constructor argument implicit rng : scalismo.utils.Random
is used to automatically pass the globally defined random generator object to the
class. If we always use this random generator to generate our random numbers, we can obtain reproducible runs,
by seeding this random generator at the beginning of our program.*

Let's define two random walk proposals with different step length:

Varying the step length allow us to sometimes take large step, in order to explore the global
landscape, and sometimes smaller steps, to explore a local environment. We can combine these proposal into a
`MixtureProposal`

, which chooses the individual proposals with a given
probability. Here We choose to take the large step 20% of the time, and the smaller
steps 80% of the time:

#### Building the Markov Chain

Now that we have all the components set up, we can assemble the Markov Chain.

To run the chain, we obtain an iterator, which we then consume to drive the sampling generation. To obtain the iterator, we need to specify the initial sample:

Our initial parameters might be far away from a high-probability area of our target density. Therefore it might take a few hundred or even a few thousand iterations before the produced samples start to follow the required distribution. We therefore have to drop the samples in this burn-in phase, before we use the samples:

As we have generated synthetic data, we can check if the expected value, computed from this samples, really corresponds to the parameters from which we sampled our data:

In the next tutorial, we see an example of how the exact same mechanism can be used for fitting shape models. Before we discuss this, we should, however, spend some time to discuss how the chain can be debugged in case something goes wrong. You can safely skip this section and come back to it later if you first want to see a practical example.

#### Debugging the Markov chain

Sometimes a chain does not work as expected. The reason is usually that our proposals
are not suitable for the target distribution. To diagnose the
behaviour of the chain we can introduce a logger. To write a logger, we need to extend
the trait `AcceptRejectLogger`

, which is defined as follows:

*Note: This trait is already defined in Scalismo, don't paste it into your code.*

The two methods, `accept`

and `reject`

are called whenever a sample is
accepted or rejected. We can overwrite these methods to implement our debugging code.

The following, very simple logger counts all the accepted and rejected samples and computes the acceptance ratio. This acceptance ratio is a simple, but already useful indicator to diagnose if all proposal generators function as expected.

To use the logger, we simply rerun the chain, but pass the logger now as
a second argument to the `iterator`

method:

We can now check how often the individual samples got accepted.

We see that the acceptance ratio of the random walk proposal, which takes the smaller step is quite high, but that the larger step is often rejected. We might therefore want to reduce this step size slightly, as a proposal that is so often rejected is not very efficient.

In more complicated applications, this type of debugging is crucial for obtaining efficient fitting algorithms.