
LSE Social Statistics Seminar
February 2, 2026
\[ \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\N}{\mathrm{N}} \newcommand{\SN}{\mathrm{SN}} \]
Bayesian Computational Statistics and Modeling Research Group (BAYESCOMP)
Led by Prof. Håvard Rue
Integrated Nested Laplace Approximation (INLA, Rue et al., 2009)




The Ingredients 🥕🍅🧅🧄
\[ \begin{align} y \mid \mathcal X, \vartheta_1 &\sim \prod_{i=1}^n \pi(y_i \mid \mathcal X, \vartheta_1) \\ \mathcal X \mid \vartheta_2 &\sim \N(0, Q(\vartheta_2)^{-1}) \\ \vartheta := \{\vartheta_1, \vartheta_2\} &\sim \pi(\vartheta) \end{align} \]
Hierarchical Latent Gaussian Models (LGMs), encompassing
The Recipe 🧑🍳 (avail. in R/INLA)

Let \(y_s \in \mathbb R^p\) be the observation vector. For \(s=1,\dots,n\), \[ \definecolor{kaustorg}{RGB}{241, 143, 0} \definecolor{kausttur}{RGB}{0, 166, 170} \definecolor{kaustmer}{RGB}{177, 15, 46} \definecolor{kaustgrn}{RGB}{173, 191, 4} \definecolor{kaustblu}{RGB}{82, 132, 196} \definecolor{kaustpur}{RGB}{156, 111, 174} \begin{gathered} y_s = {\color{Gray}\nu +\,} {\color{kaustpur}\Lambda} \eta_s + \epsilon_s \\ \eta_s = {\color{Gray}\alpha +\,} {\color{kaustgrn} B} \eta_s + \zeta_s \end{gathered} \]
Assumptions: \(\epsilon_s \sim \N(0,{\color{kaustorg}\Theta})\), \(\zeta_s \sim \N(0,{\color{kausttur}\Psi})\), \(\operatorname{Cov}(\epsilon_s,\zeta_s)=0\).
{blavaan} (Merkle et al., 2021):
SEM as LGMs \[ \begin{align} y \mid \eta, \vartheta_1 &\sim \prod_{s=1}^n \N_p(y_s \mid \nu + \Lambda \eta_s, \Theta ) \\[.1em] (I-B)\eta_s \mid \vartheta_2 &\sim \N_q(\alpha, \Psi ) \\[.4em] \vartheta := \{\vartheta_1, \vartheta_2\} &\sim \pi(\vartheta) \end{align} \]

✔ Finding posterior mode. [41ms]
✔ Computing the Hessian. [97ms]
✔ VB correction; mean |δ| = 0.048σ. [83ms]
✔ Fitting skew normal to 30/30 marginals. [588ms]
✔ Sampling covariances and defined parameters. [61ms]
✔ Computing ppp and DIC. [215ms]
Custom implementation that distils INLA ideas for SEM. Scales with \(m\) only!
Notes and assumptions
Works only for continuous \(y\) assumed normally distributed (for now).
Log posterior densities and gradients are “cheap” to compute.
For numerical stability, consider transformations \(\vartheta^\natural \xrightarrow{\ g \ } \vartheta\) to work in an unrestricted parameter space. E.g. \(\vartheta = \log \psi\) and \(\vartheta = \operatorname{artanh}(\rho)\).
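As a concrete sketch of such reparameterisations (function and variable names here are illustrative, not from {INLAvaan}):

```r
# Map constrained parameters to an unconstrained space and back:
# a variance psi > 0 via log, a correlation rho in (-1, 1) via artanh.
to_unconstrained <- function(psi, rho) c(log(psi), atanh(rho))
to_constrained   <- function(theta) list(psi = exp(theta[1]), rho = tanh(theta[2]))

# Log-Jacobian of the inverse map g, needed to express the posterior
# density correctly in the unconstrained space.
log_jacobian <- function(theta) {
  theta[1] +                    # d/d theta1 of exp(theta1) is exp(theta1)
    log(1 - tanh(theta[2])^2)   # d/d theta2 of tanh(theta2) is 1 - tanh(theta2)^2
}

par <- to_constrained(to_unconstrained(psi = 2.5, rho = -0.3))
```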
For MVN-SEM, the (marginal) likelihood \(\pi(y \mid \vartheta) = \int \pi(y\mid\eta,\vartheta)\,\pi(\eta \mid \vartheta) \, d\eta\) can be computed in closed form: \(y_s \sim \mathcal{N}_p(\mu(\vartheta), \Sigma(\vartheta))\), where \[ \begin{aligned} \mu(\vartheta) &= \nu + \Lambda (I - B)^{-1} \alpha \\ \Sigma(\vartheta) &= \Lambda (I - B)^{-1} \Psi (I - B)^{-\top} \Lambda^\top + \Theta \end{aligned} \]
The log-likelihood is evaluated simply, given sufficient statistics \(\bar y\) and \(S\):
\[ \log \pi(y \mid \vartheta) = -\frac{n}{2}\Bigl[ \log \bigl|\Sigma(\vartheta)\bigr| + \operatorname{tr} \bigl(\Sigma(\vartheta)^{-1} S\bigr) {\color{Gray}+ \bigl(\bar { y} - \mu(\vartheta)\bigr)^{\top} \Sigma(\vartheta)^{-1} \bigl(\bar { y} - \mu(\vartheta)\bigr)} \Bigr] \]
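A minimal base-R sketch of these two formulas (matrix names follow the slide; the additive \(-\tfrac{np}{2}\log 2\pi\) constant, dropped above, is included here):

```r
# Model-implied moments mu(theta) and Sigma(theta) of an MVN-SEM.
implied_moments <- function(nu, Lambda, B, alpha, Psi, Theta) {
  IBinv <- solve(diag(nrow(B)) - B)                 # (I - B)^{-1}
  list(mu    = drop(nu + Lambda %*% IBinv %*% alpha),
       Sigma = Lambda %*% IBinv %*% Psi %*% t(IBinv) %*% t(Lambda) + Theta)
}

# Log-likelihood from sufficient statistics ybar and S (ML convention:
# S is the crossproduct of centred data divided by n, not n - 1).
loglik_suffstat <- function(ybar, S, n, mu, Sigma) {
  d  <- ybar - mu
  Si <- solve(Sigma)
  -n / 2 * (length(mu) * log(2 * pi) +
            as.numeric(determinant(Sigma)$modulus) +
            sum(Si * S) +                           # tr(Sigma^{-1} S)
            drop(t(d) %*% Si %*% d))
}
```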
An old technique for approximating integrals that fell out of fashion with the rise of MCMC.
Consider the (unnormalised) posterior density function \(\pi(\vartheta \mid y) \propto \pi(y \mid \vartheta) \, \pi(\vartheta)\). From a 2nd order Taylor expansion about \(\vartheta^* = \argmax_\vartheta \log \pi(\vartheta \mid y)\), \[ \log \pi(\vartheta \mid y) \approx \log \pi(\vartheta^* \mid y) + \frac12 (\vartheta - \vartheta^*)^\top H (\vartheta - \vartheta^*), \] it is clear we can approximate the posterior by \(\N(\vartheta^*, -H^{-1})\), where \[ H = \nabla_\vartheta^2 \log \pi(\vartheta \mid y) \big|_{\vartheta=\vartheta^*}. \]
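A one-dimensional illustration in base R (a toy target, not the {INLAvaan} internals): a Gamma(3, 2) posterior on \(\psi\), worked on the unconstrained \(\vartheta = \log\psi\) scale.

```r
# Log posterior of theta = log(psi), psi ~ Gamma(3, 2); the "+ theta"
# term is the log-Jacobian of the transformation.
log_post <- function(theta) dgamma(exp(theta), 3, 2, log = TRUE) + theta

# Mode theta* and curvature via optim(); N(theta*, -H^{-1}) is then
# the Laplace approximation to the posterior.
fit <- optim(0, function(th) -log_post(th), method = "BFGS", hessian = TRUE)
theta_star <- fit$par                # analytic mode: log(3/2)
post_var   <- drop(1 / fit$hessian)  # -H^{-1}; analytic value: 1/3
```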
Laplace approximations achieve a relative error of \(O(n^{-1})\), outperforming the additive \(O(n^{-1/2})\) error typical of simulation-based methods (Rue et al., 2017).
As \(n\to\infty\), the approximation becomes exact by the Bernstein-von Mises theorem (van der Vaart, 1998).
\[ \pi(\vartheta_j \mid y) = \int_{\mathbb R^{m-1}} \pi(\vartheta_j,\vartheta_{-j} \mid y) \ d\vartheta_{-j}. \tag{1}\]
Conceptually,


\[ \log\pi(\vartheta_j \mid y) \approx \overbrace{\log \pi(\vartheta_j, \hat\vartheta_{-j} \mid y)}^{\text{log height}} - \overbrace{\frac{1}{2} \log \left| H_{-j} \right|}^{\text{log width}} + \frac{m-1}{2}\log(2\pi). \]
This is, in fact, the standard Laplace approximation (Tierney et al., 1989). It is expensive to compute, as for every \(\vartheta_j\) evaluation, we need
We propose solutions to these problems next.
Lemma 1 Let \(\vartheta\sim\N_m(\vartheta^*, \Sigma)\), and consider the conditional expectation \[ \operatorname{E}[\vartheta \mid \vartheta_j = x] = \vartheta^* + (x - \vartheta^*_j) \Sigma_{jj}^{-1} \cdot \Sigma_{\cdot j}. \]
The set \(\mathcal{C}_j = \big\{ \vartheta \in \mathbb{R}^m \mid \vartheta_{-j} = \operatorname{E}_{{\pi_G}}[\vartheta_{-j} \mid \vartheta_j] \big\}\) is the sufficient integration path for the marginal, i.e. \(\pi(\vartheta_j) \propto \pi(\vartheta)\big|_{\vartheta\in\mathcal C_j}\).
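Lemma 1 translates into a one-liner; a sketch with illustrative names:

```r
# Conditional mean path (CMP) for component j under a N(theta_star, Sigma)
# approximation: returns the full vector with the j-th entry pinned at x
# and the rest set to their Gaussian conditional expectations.
cmp <- function(x, j, theta_star, Sigma) {
  theta_star + (x - theta_star[j]) / Sigma[j, j] * Sigma[, j]
}
```

Evaluating the joint log posterior along `cmp(x, j, ...)` over a grid of `x` traces the (approximate) profile ridge without any inner optimisation.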
Implementation
Scanning along the CMP is cheap: no re-optimisation needed.
For Gaussian posteriors, this is exact; otherwise it provides a linearised approximation of the ridge.
CMP for \(\vartheta_1\) \[\vartheta_2 = 0.5 \vartheta_1\]
\[ \small \begin{gathered} f(\vartheta_1, \vartheta_2) = \N(\vartheta_2 \mid 0, e^{\vartheta_1}) \, \N(\vartheta_1 \mid 0, 1) \\ \vartheta^* = (-0.5, 0)^\top, \Sigma_\vartheta = \operatorname{diag}(1, e^{-1/2}) \\ \end{gathered} \]
CMP for \(\vartheta_1\) \[\vartheta_2 = 0\]
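The quoted mode of this toy density is easy to confirm numerically:

```r
# Toy density from the slide: theta2 | theta1 ~ N(0, exp(theta1)),
# theta1 ~ N(0, 1); note dnorm() takes the standard deviation.
logf <- function(th) dnorm(th[2], 0, sqrt(exp(th[1])), log = TRUE) +
                     dnorm(th[1], log = TRUE)
opt <- optim(c(0, 0), function(th) -logf(th), method = "BFGS")
opt$par  # close to the analytic mode (-0.5, 0)
```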
The Laplace approximation tells us that the volume of the conditional slice is formally weighted by the ‘width’ term, \(|-H_{-j}|^{-1/2}\).
Along the CMP \(\vartheta(t) = \vartheta^* + t v_j\), define the log-volume correction function \[ C_j(t) = -\frac12 \log \left|-H_{-j}(\vartheta(t)) \right|, \] so that \(\log \pi(\vartheta_j(t) \mid y) \approx \log \pi(\vartheta(t) \mid y) + C_j(t)\).
Lemma 2 Let \(L\) be a whitening matrix for \(-H\). For the \(j\)th component,
\[ \gamma_j := -\frac{d}{dt} \log |H_{-j}(\vartheta(t))| \bigg|_{t=0} = \sum_{k\neq j} L_{\cdot k}^\top \, \frac{d}{dt} \left( \frac{d}{dL_{\cdot k}} \Big[ \nabla_{\vartheta} \log \pi(\vartheta(t) \mid y) \Big] \right) \Bigg|_{t=0} \ . \]
Implementation
[inner \(\small\nabla^2\)] Centrally perturb \(\vartheta\) along \(L_{\cdot k}\) and evaluate gradients. The dot product of \(L_{\cdot k}\) with the resulting gradient difference isolates the \(k\)th diagonal curvature term. This establishes
[outer \(\small\nabla^3\)] Forward difference of (b) and (a) to compute the diagonals of \(dH\).
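For intuition, the quantity being approximated can also be computed by brute force; a base-R sketch that forms the full numerical Hessian (i.e. exactly the cost the perturbation scheme above avoids):

```r
# Central-difference Hessian of a log-density f at x.
num_hess <- function(f, x, h = 1e-4) {
  m <- length(x)
  H <- matrix(0, m, m)
  for (k in 1:m) for (l in 1:m) {
    ek <- el <- numeric(m); ek[k] <- h; el[l] <- h
    H[k, l] <- (f(x + ek + el) - f(x + ek - el) -
                f(x - ek + el) + f(x - ek - el)) / (4 * h^2)
  }
  (H + t(H)) / 2   # symmetrise
}

# Log-volume correction C_j = -0.5 log|-H_{-j}| at a point on the path.
Cj <- function(f, theta, j) {
  Hmj <- num_hess(f, theta)[-j, -j, drop = FALSE]
  -0.5 * as.numeric(determinant(-Hmj)$modulus)
}
```

For a Gaussian log density the Hessian is constant, so \(C_j(t)\) is flat along any path, as expected.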
The posterior mode is often far from the “typical set”, especially for skewed distributions.
Laplace may misrepresent the bulk of the distribution—correction needed!

\[ \underbrace{z \sim N(0, R)}_{\text{Gaussian correlation}} \xrightarrow{\ \Phi(\cdot) \ } \underbrace{u}_{\ \text{Uniform} \ } \xrightarrow{ \ F_{\text{SN}}^{-1}(\cdot) \ } \underbrace{\vartheta^{(b)}}_{\text{SN Marginals}} \xrightarrow{ \ h(\cdot) \ } \text{trans. params.} \]
Reconstruct joint posterior samples \(h(\vartheta^{(b)})\) using a copula (Nelsen, 2006):
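The chain above in a few lines of base R; Student-\(t\) quantiles stand in for the fitted skew-normal \(F_{\text{SN}}^{-1}\) (which would need, e.g., the {sn} package) so the sketch stays dependency-free:

```r
set.seed(42)
R <- matrix(c(1, 0.6, 0.6, 1), 2, 2)            # Gaussian copula correlation
z <- matrix(rnorm(2000), ncol = 2) %*% chol(R)  # z ~ N(0, R), 1000 draws
u <- pnorm(z)                                   # uniform marginals, dependence kept
theta <- qt(u, df = 5)                          # plug in each marginal's quantile fn
```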
With the posterior samples \(\vartheta^{\natural(b)}\) generated via the copula, we can compute standard Bayesian fit indices.
For MVN-SEM, the conditional posterior distribution \(\pi(\eta \mid y,\vartheta^\natural)\) is multivariate normal with parameters \[ \begin{aligned} \operatorname{E}(\eta_s \mid y, \vartheta^\natural) &= \mu_\eta + \Phi \Lambda^\top \Sigma^{-1} \bigl(y_s - \mu(\vartheta)\bigr) \\ \operatorname{Var}(\eta_s \mid y, \vartheta^\natural) &= \Phi - \Phi \Lambda^\top \Sigma^{-1} \Lambda \Phi \end{aligned} \] where \(\mu_\eta = (I - B)^{-1} \alpha\) and \(\Phi = (I - B)^{-1} \Psi (I - B)^{-\top}\).
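A sketch for a single-factor special case (\(B = 0\), \(\alpha = 0\), \(\nu = 0\), so \(\Phi = \Psi\) and the model-implied mean is zero; names illustrative):

```r
Lambda <- matrix(c(1, 0.8, 0.6), 3, 1)    # loadings
Psi    <- matrix(1)                       # latent variance (= Phi here)
Theta  <- diag(c(0.4, 0.5, 0.6))          # residual variances
Sigma  <- Lambda %*% Psi %*% t(Lambda) + Theta

A     <- Psi %*% t(Lambda) %*% solve(Sigma)  # regression weights
y_s   <- c(0.9, 1.1, 0.2)
E_eta <- drop(A %*% y_s)                     # posterior mean of eta_s
V_eta <- drop(Psi - A %*% Lambda %*% Psi)    # posterior variance
```

By the Woodbury identity, `V_eta` equals \((\Psi^{-1} + \Lambda^\top \Theta^{-1} \Lambda)^{-1}\), a handy numerical check.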
Plug-in estimate, e.g. \(\hat\eta_s = \operatorname{E}(\eta_s \mid y, \hat\vartheta^\natural) = (I - \hat B)^{-1} \hat\alpha + \hat\Phi \hat\Lambda^\top \hat\Sigma^{-1} (y_s - \hat\mu)\) (Thurstone, 1935).
Better to account for parameter uncertainty:
Prediction to a new observation \(y_{\text{new}}\) is similar.

INLAvaan 0.2.3.9003 ended normally after 82 iterations
Estimator BAYES
Optimization method NLMINB
Number of model parameters 30
Number of observations 75
Model Test (User Model):
Marginal log-likelihood -1688.740
PPP (Chi-square) 0.002
[1] 0.984
ind60=~x2 ind60=~x3 dem60=~y2 dem60=~y3 dem65=~y6 dem65=~y7
2.2191369 1.8512281 0.6560777 1.0206567 1.1631579 1.2813651
...Output Truncated...
Information Criteria:
Deviance (DIC) 3339.054
Effective parameters (pD) 84.787
Parameter Estimates:
Marginalisation method SKEWNORM
VB correction TRUE
Latent Variables:
Estimate SD 2.5% 97.5% NMAD Prior
ind60 =~
x1 1.000
x2 2.219 0.146 1.950 2.524 0.005 normal(0,10)
x3 1.851 0.157 1.556 2.172 0.006 normal(0,10)
dem60 =~
y1 1.000
y2 0.656 0.263 0.164 1.198 0.052 normal(0,10)
y3 1.021 0.147 0.748 1.326 0.012 normal(0,10)
dem65 =~
y5 1.000
y6 1.163 0.177 0.842 1.538 0.008 normal(0,10)
y7 1.281 0.172 0.968 1.642 0.016 normal(0,10)
y8 1.086 0.201 0.702 1.493 0.022 normal(0,10)
Regressions:
Estimate SD 2.5% 97.5% NMAD Prior
dem60 ~
ind60 1.310 0.430 0.477 2.162 0.001 normal(0,10)
dem65 ~
ind60 0.823 0.260 0.325 1.346 0.001 normal(0,10)
dem60 0.768 0.108 0.570 0.994 0.056 normal(0,10)
Covariances:
Estimate SD 2.5% 97.5% NMAD Prior
.y1 ~~
.y5 0.224 0.398 -0.189 1.352 0.035 beta(1,1)
.y2 ~~
y4 0.444 1.918 1.622 9.151 0.072 beta(1,1)
.y6 0.311 0.805 0.606 3.771 0.037 beta(1,1)
.y3 ~~
.y7 0.207 0.714 -0.246 2.560 0.009 beta(1,1)
.y8 ~~
y4 0.253 1.098 -0.766 3.538 0.080 beta(1,1)
.y6 ~~
.y8 0.305 0.680 0.160 2.831 0.014 beta(1,1)
Variances:
Estimate SD 2.5% 97.5% NMAD Prior
.x1 0.088 0.021 0.053 0.198 0.007 gamma(1,.5)[sd]
.x2 0.123 0.065 0.019 1.486 0.044 gamma(1,.5)[sd]
.x3 0.498 0.098 0.336 0.717 0.003 gamma(1,.5)[sd]
.y1 1.395 0.550 0.415 8.362 0.009 gamma(1,.5)[sd]
.y2 9.914 2.199 6.379 14.956 0.023 gamma(1,.5)[sd]
.y3 5.334 1.053 3.567 7.676 0.005 gamma(1,.5)[sd]
.y5 2.298 0.549 1.364 5.190 0.007 gamma(1,.5)[sd]
.y6 5.303 0.975 3.648 7.457 0.003 gamma(1,.5)[sd]
.y7 3.712 0.848 2.285 5.597 0.004 gamma(1,.5)[sd]
.y8 4.009 0.889 2.525 5.994 0.004 gamma(1,.5)[sd]
y4 10.887 1.803 7.904 14.958 0.008 gamma(1,.5)[sd]
ind60 0.451 0.088 0.305 0.650 0.003 gamma(1,.5)[sd]
.dem60 4.680 1.078 2.845 7.057 0.021 gamma(1,.5)[sd]
.dem65 0.434 0.308 0.039 10.637 0.119 gamma(1,.5)[sd]




While MCMC methods remain the gold standard for exact Bayesian inference, they can be prohibitively slow.
🚀 INLA methodology offers a rapid alternative for SEM, delivering Bayesian results at near the speed of frequentist estimators.
{INLAvaan} v0.2.3
acfa(), asem(), agrowth() + “usual” methods (summary, predict, plot, etc.)