Logarithmic posterior probability: How to implement

Consider the data and model from Simple example:

using FittingObjectiveFunctions

X = collect(1:10)
Y = [1.0, 1.78, 3.64, 3.72, 5.33, 2.73, 7.52, 9.19, 6.73, 8.95]
ΔY = [0.38, 0.86, 0.29, 0.45, 0.66, 2.46, 0.39, 0.45, 1.62, 1.54]
model = ModelFunctions((x,λ)-> λ*x)

Log-likelihood and log-posterior

The general procedure to obtain the log-likelihood and log-posterior functions is the same as described in Posterior probability: How to implement. However, the distributions and the prior need to be provided in logarithmic form, which means that the default distributions of the FittingData constructor cannot be used. Instead, a FittingData object with logarithmic distributions needs to be created:

using Distributions
data_log_dists = FittingData(X,Y,ΔY,distributions = (y,m,Δy)-> logpdf(Normal(m,Δy),y))
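
Since the distributions keyword now expects the logarithm of the probability density, a quick sanity check (not part of the original example) is that logpdf agrees with the logarithm of pdf:

# logpdf returns the logarithm of the normal probability density.
logpdf(Normal(1.0, 0.38), 1.0) ≈ log(pdf(Normal(1.0, 0.38), 1.0))    # true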

The log-likelihood function can be obtained by using the log_posterior_objective function:

log_likelihood = log_posterior_objective(data_log_dists,model)
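
The returned log-likelihood is an ordinary Julia function of the parameter λ and can be evaluated directly; the call below is a usage sketch, not output from the original documentation:

# Evaluate the log-likelihood for a given parameter value.
log_likelihood(1.0)
# Larger values correspond to a better fit (up to a constant normalization offset).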

As described in Using priors, the likelihood corresponds to the trivial prior λ-> 1 (or λ-> 0 in the logarithmic case), and this is again what happens in the background. To use a non-trivial prior, it just needs to be passed as the third argument:

log_posterior = log_posterior_objective(data_log_dists,model, λ-> logpdf(Normal(1,0.1),λ))
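
Since λ-> 0 is the trivial logarithmic prior, passing it explicitly should reproduce the log-likelihood. The following comparison is a hedged consistency check, not part of the original example:

# The trivial logarithmic prior λ-> 0 adds nothing to the log-likelihood.
log_posterior_objective(data_log_dists, model, λ-> 0)(1.0) ≈ log_likelihood(1.0)    # expected: true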

The resulting functions can be compared by adjusting the constant offset (see Logarithmic scale).

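At this point the original documentation shows a plot comparing the two functions. A minimal sketch of such a comparison, shifting each curve by its maximum over a parameter grid, could look as follows (the grid and the plotting call are assumptions, not part of the original page):

λ_range = 0.5:0.01:1.5
shifted_log_likelihood = [log_likelihood(λ) for λ in λ_range] .- maximum(log_likelihood(λ) for λ in λ_range)
shifted_log_posterior  = [log_posterior(λ) for λ in λ_range] .- maximum(log_posterior(λ) for λ in λ_range)
# Plot, e.g. with Plots.jl:  plot(λ_range, [shifted_log_likelihood shifted_log_posterior], label = ["log-likelihood" "log-posterior"])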

Application: Regularized least squares

Consider the weighted least squares objective from LSQ: How to implement:

lsq = lsq_objective(data_log_dists, model)

Note that the distributions field of data_log_dists has no effect on least squares objectives.
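
This can be verified by comparing against a FittingData object that keeps the default distributions; the comparison below is a hedged check, not part of the original example:

# lsq_objective ignores the distributions field, so both objectives agree.
lsq_default = lsq_objective(FittingData(X,Y,ΔY), model)
lsq(1.0) ≈ lsq_default(1.0)    # expected: true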

To replicate the least squares objective, unnormalized logarithmic normal distributions can be used:

data_lsq_dists = FittingData(X,Y,ΔY, distributions = (y,m,Δy)-> -(y-m)^2/Δy^2)
lsq_likelihood = log_posterior_objective(data_lsq_dists,model)

The lsq_likelihood is the same function as lsq, but with the opposite sign (because it is a logarithmic unnormalized posterior probability density). This can be fixed by using λ -> -lsq_likelihood(λ).
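
The sign relation can be checked point-wise; the snippet below is a sketch illustrating the statement above, not part of the original example:

# lsq_likelihood is the negative of the weighted least squares objective.
lsq_likelihood(1.0) ≈ -lsq(1.0)    # expected: true
# Sign-corrected version, e.g. for use with minimization routines:
neg_lsq_likelihood = λ -> -lsq_likelihood(λ)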

Now, a regularization can be implemented by using a corresponding logarithmic prior:

lsq_posterior = log_posterior_objective(data_lsq_dists,model, λ -> - λ^2)
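
Negating the resulting log-posterior yields a Tikhonov/ridge-regularized least squares objective. The identity below spells out this correspondence; it is a sketch, not taken from the original page:

# -lsq_posterior(λ) = Σᵢ (yᵢ - λxᵢ)²/Δyᵢ² + λ², i.e. weighted least squares plus an L2 penalty.
regularized_lsq = λ -> -lsq_posterior(λ)
regularized_lsq(1.0) ≈ lsq(1.0) + 1.0^2    # expected: true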

Partial derivatives and gradients

Analytical derivatives can be obtained in almost the same way as the least squares objective derivatives, using log_posterior_partials and log_posterior_gradient. However, two additional arguments need to be provided: the partial derivative of the logarithmic $y$-uncertainty distributions (w.r.t. the model function value) and the gradient of the prior distribution.

To illustrate the process, we consider an unnormalized logarithmic normal distribution as the $y$-uncertainty distribution:

data = FittingData(X,Y,ΔY,distributions = (y,m,Δy)-> -(m-y)^2/(2Δy^2))

As for the least squares objective derivatives, the partial derivatives of the model need to be added to the ModelFunctions object:

model = ModelFunctions((x,λ)-> λ*x, partials = [(x,λ)-> x])

The partial derivative of the logarithmic $y$-uncertainty distribution (w.r.t. the model function value) is

\[\frac{\partial}{\partial m} L(y,m,\Delta y) = \frac{\partial}{\partial m} \left(- \frac{(m-y)^2}{2\Delta y^2}\right) = - \frac{m-y}{\Delta y^2}\]

∂_mL(y,m,Δy) = -(m-y)/(Δy^2)

Since the parameter in this example is one-dimensional, the gradient of the prior distribution is just the derivative. Thus, if the logarithmic prior distribution is $p_0(\lambda) = - (\lambda-1)^2$, the gradient is

\[\nabla p_0(\lambda)= \frac{d}{d\lambda} \left(- (\lambda-1)^2\right) = -2(\lambda-1)\]

# Define the gradient of the prior as a vector of functions,
# even in the 1-dimensional case.
∇p_0 = [λ -> -2*(λ-1)]

The partial derivatives of the log-posterior can now be obtained with log_posterior_partials:

log_partials = log_posterior_partials(data,model, ∂_mL, ∇p_0)
1-element Vector{Function}:
 #36 (generic function with 1 method)
log_partials[1](1.078)
-0.06327403301740508
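
As a plausibility check (not part of the original documentation), the analytical partial derivative can be compared with a finite-difference approximation of the corresponding log-posterior, using the logarithmic prior $p_0(\lambda) = -(\lambda-1)^2$ from above:

# Finite-difference approximation of the log-posterior derivative at λ = 1.078.
log_posterior_check = log_posterior_objective(data, model, λ -> -(λ-1)^2)
h = 1e-6
finite_difference = (log_posterior_check(1.078 + h) - log_posterior_check(1.078 - h)) / (2h)
isapprox(finite_difference, log_partials[1](1.078), atol = 1e-6)    # expected: true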