Approximate Bayesian inference for Structural Equation Models (SEM): The INLA approach

LSE Social Statistics Seminar

Haziq Jamil

Research Specialist
BAYESCOMP @ CEMSE-KAUST

https://haziqj.ml/lse-feb26 | 📥 PDF

Håvard Rue

Professor of Statistics
BAYESCOMP @ CEMSE-KAUST

February 2, 2026

\[ \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\N}{\mathrm{N}} \newcommand{\SN}{\mathrm{SN}} \]

Bayesian Computational Statistics and Modeling Research Group (BAYESCOMP)

Led by Prof. Håvard Rue

Integrated Nested Laplace Approximation (INLA, Rue et al., 2009)

(right) Granular spatio-temporal mapping of HIV prevalence in sub-Saharan Africa for a 17-year period (Dwyer-Lindgren et al., 2019)

(top) Spatial modelling of cortical surface fMRI data: 2.5M data points, 8.7k mesh triangles, fitted in 148 seconds (Mejia et al., 2020).

Latent Gaussian Models and INLA

The Ingredients 🥕🍅🧅🧄

\[ \begin{align} y \mid \mathcal X, \vartheta_1 &\sim \prod_{i=1}^n \pi(y_i \mid \mathcal X, \vartheta_1) \\ \mathcal X \mid \vartheta_2 &\sim \N(0, Q(\vartheta_2)^{-1}) \\ \vartheta := \{\vartheta_1, \vartheta_2\} &\sim \pi(\vartheta) \end{align} \]

Hierarchical Latent Gaussian Models (LGMs), encompassing

  • Generalized linear mixed models
  • Generalized additive models
  • Spatial/spatio-temporal models
  • Survival models
  • etc.

The Recipe 🧑‍🍳 (avail. in R/INLA)

  1. Form the latent field \(\mathcal X\)
  2. Mode finding for Laplace approx. \[\pi(\mathcal X \mid \vartheta,y)\approx \N(\mu(\vartheta), P(\vartheta)^{-1})\]
  3. Conditional posterior refinement
    • Classic: Laplace approx. \(x_i \in \mathcal X\)
    • Modern: VB mean correction
  4. Integrate over \(\vartheta\) to get marginals \(\pi(x_i \mid y)\) and \(\pi(\vartheta_j \mid y)\)
  5. (Modern) Post hoc formation of \(\eta=A\mathcal X\)

Structural Equation Models (SEM)

  • Let \(y_s \in \mathbb R^p\) be the observation vector. For \(s=1,\dots,n\), \[ \definecolor{kaustorg}{RGB}{241, 143, 0} \definecolor{kausttur}{RGB}{0, 166, 170} \definecolor{kaustmer}{RGB}{177, 15, 46} \definecolor{kaustgrn}{RGB}{173, 191, 4} \definecolor{kaustblu}{RGB}{82, 132, 196} \definecolor{kaustpur}{RGB}{156, 111, 174} \begin{gathered} y_s = {\color{Gray}\nu +\,} {\color{kaustpur}\Lambda} \eta_s + \epsilon_s \\ \eta_s = {\color{Gray}\alpha +\,} {\color{kaustgrn} B} \eta_s + \zeta_s \end{gathered} \]

  • Assumptions: \(\epsilon_s \sim \N(0,{\color{kaustorg}\Theta})\), \(\zeta_s \sim \N(0,{\color{kausttur}\Psi})\), \(\operatorname{Cov}(\epsilon_s,\zeta_s)=0\).

  • Collect the free entries of \(\color{Gray}\nu\), \(\color{kaustpur}\Lambda\), \(\color{kaustorg}\Theta\), \(\color{Gray}\alpha\), \(\color{kaustgrn} B\), and \(\color{kausttur}\Psi\) into \(\vartheta^\natural \in \mathbb R^m\), with \(m<p(p+1)/2 + p\).
  • Bayesian analysis puts priors on \(\vartheta^\natural\). Use defaults from {blavaan} (Merkle et al., 2021):
    • \(\color{Gray}\nu,\color{kaustpur}\Lambda,\color{kaustgrn} B \sim\) normal.
    • \(\color{kaustorg}\Theta,\color{kausttur}\Psi \sim\) gamma.
    • SRS decomposition for covariances; \(\rho\sim\) beta.

SEM + INLA?

SEM as LGMs \[ \begin{align} y \mid \eta, \vartheta_1 &\sim \prod_{s=1}^n \N_p(y_s \mid \nu + \Lambda \eta_s, \Theta ) \\[.1em] (I-B)\eta_s \mid \vartheta_2 &\sim \N_q(\alpha, \Psi ) \\[.4em] \vartheta := \{\vartheta_1, \vartheta_2\} &\sim \pi(\vartheta) \end{align} \]

  • SEM has all the ingredients to cook with INLA. So can we?
  • Yes! But…
    • INLA likes sparse precision matrices.
    • Complexity scales with \(n\) and \(m\).
    • Complicated R interface for SEM.

✨New recipe for SEM

mod <- "
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3
  dem65 =~ y5 + y6 + y7 + y8

  dem60 ~ ind60
  dem65 ~ ind60 + dem60

  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
"
fit <- INLAvaan::asem(
  model = mod, 
  data = PoliticalDemocracy
)
ℹ Finding posterior mode.
✔ Finding posterior mode. [41ms]
ℹ Computing the Hessian.
✔ Computing the Hessian. [97ms]
ℹ Performing VB correction.
✔ VB correction; mean |δ| = 0.048σ. [83ms]
✔ Fitting skew normal to 30/30 marginals. [588ms]
ℹ Sampling covariances and defined parameters.
✔ Sampling covariances and defined parameters. [61ms]
✔ Computing ppp and DIC. [215ms]

Custom implementation that distils INLA ideas for SEM. Scales with \(m\) only!

✨New recipe for SEM (cont.)

  1. Mode finding and initial global Laplace approximation
  2. Integration-free marginalisation of the posterior
    1. Marginal profiling via axis scanning
    2. Volume correction of profile densities
    3. Skew normal curve fitting
  3. Variational Bayes correction to the mean
  4. Gaussian copula sampling for quantities of interest

Notes and assumptions

  • Works only for continuous \(y\) assumed normally distributed (for now).

  • Log posterior densities and gradients are “cheap” to compute.

  • For numerical stability, consider transformations \(\vartheta^\natural \xrightarrow{\ g \ } \vartheta\) to work in an unrestricted parameter space. E.g. \(\vartheta = \log \psi\) and \(\vartheta = \operatorname{artanh}(\rho)\).
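These reparameterisations can be sketched in a few lines. A Python illustration (the package itself is implemented in R; the function names here are made up):

```python
import numpy as np

# Unrestricted reparameterisations for numerical stability: variances via log,
# correlations via artanh (Fisher z). Round-trips exactly (up to floating point).
def to_unrestricted(psi, rho):
    return np.log(psi), np.arctanh(rho)

def to_natural(theta_psi, theta_rho):
    return np.exp(theta_psi), np.tanh(theta_rho)

theta = to_unrestricted(2.5, 0.8)   # lives in R^2, no constraints
psi, rho = to_natural(*theta)       # recovers (2.5, 0.8)
```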

Detailed recipe

Efficient likelihood evaluation

  • For MVN-SEM, the (marginal) likelihood \(\pi(y \mid \vartheta) = \int \pi(y\mid\eta,\vartheta)\,\pi(\eta \mid \vartheta) \, d\eta\) can be computed in closed form: \(y_s \sim \mathcal{N}_p(\mu(\vartheta), \Sigma(\vartheta))\), where \[ \begin{aligned} \mu(\vartheta) &= \nu + \Lambda (I - B)^{-1} \alpha \\ \Sigma(\vartheta) &= \Lambda (I - B)^{-1} \Psi (I - B)^{-\top} \Lambda^\top + \Theta \end{aligned} \]

  • The log-likelihood is evaluated simply, given sufficient statistics \(\bar y\) and \(S\):

    \[ \log \pi(y \mid \vartheta) = -\frac{n}{2}\Bigl[ \log \bigl|\Sigma(\vartheta)\bigr| + \operatorname{tr} \bigl(\Sigma(\vartheta)^{-1} S\bigr) {\color{Gray}+ \bigl(\bar { y} - \mu(\vartheta)\bigr)^{\top} \Sigma(\vartheta)^{-1} \bigl(\bar { y} - \mu(\vartheta)\bigr)} \Bigr] \]

  • Variations of this likelihood include:
    1. Multigroup analysis: \(\sum_g \log \N(\mu_g, \Sigma_g)\)
    2. Multilevel structure: \(\Sigma_j = I_{n_j} \otimes \Sigma_w + J_{n_j} \otimes \Sigma_b\)
    3. Missing data: FIML \(\sum_r \log \N(\mu_{r}, \Sigma_r)\)
  • Note: Analytical expressions for the gradient function \(\nabla_\vartheta \log \pi(y \mid \vartheta)\) are also available (e.g., see Jamil et al., 2026, Appendix A).
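As a sanity check on the formulas above, a Python sketch (the package itself is in R; the toy one-factor parameter values are illustrative) that evaluates the sufficient-statistics form of the log-likelihood and compares it with a direct MVN evaluation. The constant \(-\frac{np}{2}\log 2\pi\), omitted on the slide, is included here so the two agree:

```python
import numpy as np
from scipy.stats import multivariate_normal

def sem_implied(nu, Lam, Theta, alpha, B, Psi):
    # Model-implied moments: mu = nu + Lam (I-B)^{-1} alpha,
    # Sigma = Lam (I-B)^{-1} Psi (I-B)^{-T} Lam' + Theta
    IB = np.linalg.inv(np.eye(B.shape[0]) - B)
    mu = nu + Lam @ IB @ alpha
    Sigma = Lam @ IB @ Psi @ IB.T @ Lam.T + Theta
    return mu, Sigma

def loglik(ybar, S, n, mu, Sigma):
    # log pi(y | theta) from sufficient statistics (ybar, S); includes the
    # additive constant -np/2 log(2 pi) dropped on the slide
    p = len(mu)
    Sinv = np.linalg.inv(Sigma)
    dev = ybar - mu
    return -0.5 * n * (np.linalg.slogdet(Sigma)[1] + np.trace(Sinv @ S)
                       + dev @ Sinv @ dev + p * np.log(2 * np.pi))

# Toy one-factor model (p = 3, q = 1); values purely illustrative
nu, alpha = np.zeros(3), np.zeros(1)
Lam = np.array([[1.0], [0.8], [0.6]])
Theta, B, Psi = 0.3 * np.eye(3), np.zeros((1, 1)), np.eye(1)
mu, Sigma = sem_implied(nu, Lam, Theta, alpha, B, Psi)

rng = np.random.default_rng(0)
Y = rng.multivariate_normal(mu, Sigma, size=50)
ybar, S = Y.mean(axis=0), np.cov(Y.T, bias=True)   # biased S, divisor n
ll = loglik(ybar, S, 50, mu, Sigma)
direct = multivariate_normal.logpdf(Y, mu, Sigma).sum()  # agrees with ll
```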

Global Laplace approximation

  • An old technique for approximating integrals; it fell out of fashion with the rise of MCMC.

  • Consider the (unnormalised) posterior density function \(\pi(\vartheta \mid y) \propto \pi(y \mid \vartheta) \, \pi(\vartheta)\). From a 2nd order Taylor expansion about \(\vartheta^* = \argmax_\vartheta \log \pi(\vartheta \mid y)\), \[ \log \pi(\vartheta \mid y) \approx \log \pi(\vartheta^* \mid y) + \frac12 (\vartheta - \vartheta^*)^\top H (\vartheta - \vartheta^*), \] it is clear we can approximate the posterior by \(\N(\vartheta^*, -H^{-1})\), where \[ H = \nabla_\vartheta^2 \log \pi(\vartheta \mid y) \big|_{\vartheta=\vartheta^*}. \]

  • Laplace approximations achieve a relative error of \(O(n^{-1})\), outperforming the slower, additive \(O(n^{-1/2})\) error typical of simulation-based methods (Rue et al., 2017).

  • As \(n\to\infty\), the approximation becomes exact by the Bernstein-von Mises theorem (van der Vaart, 1998).
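A minimal 1-D Python sketch of these mechanics, on a toy skewed target (a logit-transformed Beta(5, 3); all choices illustrative): find the mode numerically, estimate the curvature by finite differences, and read off \(\N(\vartheta^*, -H^{-1})\).

```python
import numpy as np
from scipy.optimize import minimize

# Toy unnormalised log posterior: theta = logit(p), p ~ Beta-like with a = 5,
# b = 3 (Jacobian absorbed). Mode is log(a/b); curvature is -(a+b) p (1-p).
a, b = 5.0, 3.0

def log_post(theta):
    p = 1 / (1 + np.exp(-theta))
    return a * np.log(p) + b * np.log1p(-p)

res = minimize(lambda th: -log_post(th[0]), x0=[0.0])
theta_star = res.x[0]                       # posterior mode

# Central second difference at the mode -> H = d^2/dtheta^2 log pi
h = 1e-4
H = (log_post(theta_star + h) - 2 * log_post(theta_star)
     + log_post(theta_star - h)) / h**2
laplace_sd = np.sqrt(-1 / H)                # Laplace: N(theta_star, -1/H)
```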

What is marginalisation anyway?

  • For every \(j\), we want

\[ \pi(\vartheta_j \mid y) = \int_{\mathbb R^{m-1}} \pi(\vartheta_j,\vartheta_{-j} \mid y) \ d\vartheta_{-j}. \tag{1}\]

  • Conceptually,

    1. Slice the joint density along \(\vartheta_j\).
    2. Compute the “area” of these \((m-1)\) dimensional slices \[\begin{align}\text{Area } \approx &\text{ Joint peak (height)}\\ &\ \times \text{Local spread (width)}\end{align}\]
      • If width is the same everywhere (constant conditional variance), then \(\text{Area} \propto \text{Height}\).
    3. Total area should equal 1 (normalisation).

What is marginalisation anyway? (cont.)

Bivariate normal \(\vartheta_1,\vartheta_2 \sim \N(0,1)\) with \(\rho_{12}=0.5\)

Approximate marginalisation

  • Let \(f(\vartheta_{-j}) = \log \pi(\vartheta_j, \vartheta_{-j} \mid y)\). Taylor expand about mode \(\hat\vartheta_{-j}\) to get \[ f(\vartheta_{-j}) \approx \log \pi(\vartheta_j, \hat\vartheta_{-j} \mid y) + \frac{1}{2} (\vartheta_{-j} - \hat\vartheta_{-j})^\top H_{-j} (\vartheta_{-j} - \hat\vartheta_{-j}). \]
  • Substituting back into marginalisation integral \(\int e^{f(\vartheta_{-j})} \ d\vartheta_{-j}\) in (1) gives

\[ \log\pi(\vartheta_j \mid y) \approx \overbrace{\log \pi(\vartheta_j, \hat\vartheta_{-j} \mid y)}^{\text{log height}} - \overbrace{\frac{1}{2} \log \left| -H_{-j} \right|}^{\text{log width}} + \frac{m-1}{2}\log(2\pi). \]

  • This is, in fact, the standard Laplace approximation (Tierney et al., 1989). It is expensive to compute, as for every \(\vartheta_j\) evaluation, we need

    1. An optimisation of \(\vartheta_{-j}\) at this location; and
    2. The log determinant of the Hessian there.
  • We propose solutions to these problems next.

(Solution 1) Conditional mean path

Lemma 1 Let \(\vartheta\sim\N_m(\vartheta^*, \Sigma)\), and consider the conditional expectation \[ \operatorname{E}[\vartheta \mid \vartheta_j = x] = \vartheta^* + (x - \vartheta^*_j) \Sigma_{jj}^{-1} \cdot \Sigma_{\cdot j}. \]

The set \(\mathcal{C}_j = \big\{ \vartheta \in \mathbb{R}^m \mid \vartheta_{-j} = \operatorname{E}_{{\pi_G}}[\vartheta_{-j} \mid \vartheta_j] \big\}\) is the sufficient integration path for the marginal, i.e. \(\pi(\vartheta_j) \propto \pi(\vartheta)\big|_{\vartheta\in\mathcal C_j}\).

Implementation

  1. Lay out a grid \(\mathcal G_j = \big\{ \vartheta^* + t \, v_j \mid t \in [-4,4] \big\}\), \(v_j=\Sigma_{jj}^{-1/2} \Sigma_{\cdot j}\)
  2. Evaluate \(\log \pi(\vartheta \mid y)\) at each grid point.
  • Scanning along the CMP is cheap: no re-optimisation needed.

  • For Gaussian posteriors, this is exact; otherwise it provides a linearised approximation of the ridge.
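The "exact for Gaussians" claim is easy to check numerically. A Python sketch for the bivariate normal with \(\rho = 0.5\) from the next slide: the joint density evaluated along the CMP \(\vartheta_2 = \rho\,\vartheta_1\) is exactly proportional to the \(\N(0,1)\) marginal of \(\vartheta_1\).

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# CMP for theta_1 in a standard bivariate normal with rho = 0.5: with constant
# conditional variance, the joint height along theta_2 = rho * theta_1 is
# proportional to the marginal of theta_1 (area ∝ height).
rho = 0.5
joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

t = np.linspace(-4, 4, 9)                    # standardised grid points
path = np.column_stack([t, rho * t])         # conditional mean path
height = joint.pdf(path)                     # joint density along the path

ratio = height / norm.pdf(t)                 # constant in t for a Gaussian
```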

Height tracking along CMP

Bivariate normal \(\vartheta_1,\vartheta_2 \sim \N(0,1)\) with \(\rho_{12}=0.5\)

CMP for \(\vartheta_1\) \[\vartheta_2 = 0.5 \vartheta_1\]

Height tracking along CMP

Funnel-shaped, heteroscedastic posterior

\[ \small \begin{gathered} f(\vartheta_1, \vartheta_2) = \N(\vartheta_2 \mid 0, e^{\vartheta_1}) \, \N(\vartheta_1 \mid 0, 1) \\ \vartheta^* = (-0.5, 0)^\top, \Sigma_\vartheta = \operatorname{diag}(1, e^{0.5}) \\ \end{gathered} \]

CMP for \(\vartheta_1\) \[\vartheta_2 = 0\]

Volume correction

  • The Laplace approximation tells us that the volume of the conditional slice is formally weighted by the ‘width’ term, \(|-H_{-j}|^{-1/2}\).

  • Along the CMP \(\vartheta(t) = \vartheta^* + t v_j\), define the log-volume correction function \[ C_j(t) = -\frac12 \log \left|-H_{-j}(\vartheta(t)) \right|, \] so that \(\log \pi(\vartheta_j(t) \mid y) \approx \log \pi(\vartheta(t) \mid y) + C_j(t)\).

  • A linearisation argument about \(t=0\) (the mode) gives \[ C_j(t) \approx \overbrace{C_j(0)}^{\textrm{const.}} + t \cdot \overbrace{C_j'(0)}^{\gamma_j}, \]
    • The constant term \(C_j(0)\) is irrelevant for density shape (only normalisation).
    • Hence, only the derivative term \(\gamma_j := C'_j(0)\) is needed.
  • Unfortunately, this takes \(O(m^3)\) time when computed naïvely.
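For the 2-D funnel example above, the naïve correction can be computed directly in a few lines (a Python sketch; finite differences stand in for analytic derivatives, and here \(C_1(t)\) is exactly linear with slope \(\gamma_1 = 1/2\)):

```python
import numpy as np

# Funnel posterior from the earlier slide:
# f(th1, th2) = log N(th2 | 0, e^{th1}) + log N(th1 | 0, 1).
# For the marginal of theta_1, H_{-1} is the 1x1 Hessian in theta_2; along the
# CMP theta(t) = (-0.5 + t, 0) we have -H_{-1} = e^{-th1}, so C_1(t) = th1 / 2.
def logpost(th1, th2):
    return (-0.5 * th2**2 * np.exp(-th1) - 0.5 * th1   # log N(th2 | 0, e^{th1})
            - 0.5 * th1**2)                            # log N(th1 | 0, 1)

def C(t, h=1e-4):
    th1, th2 = -0.5 + t, 0.0
    # second difference in theta_2 gives the (1x1) conditional Hessian
    H = (logpost(th1, th2 + h) - 2 * logpost(th1, th2)
         + logpost(th1, th2 - h)) / h**2
    return -0.5 * np.log(-H)

gamma = (C(0.1) - C(0.0)) / 0.1     # slope of the log-volume correction
```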

(Solution 2) Efficient volume correction

Lemma 2 Let \(L\) be a whitening matrix for \(-H\). For the \(j\)th component,

\[ \gamma_j := -\frac{d}{dt} \log |H_{-j}(\vartheta(t))| \bigg|_{t=0} = \sum_{k\neq j} L_{\cdot k}^\top \, \frac{d}{dt} \left( \frac{d}{dL_{\cdot k}} \Big[ \nabla_{\vartheta} \log \pi(\vartheta(t) \mid y) \Big] \right) \Bigg|_{t=0} \ . \]

Implementation

  1. [inner \(\small\nabla^2\)] Centrally perturb \(\vartheta\) along \(L_{\cdot k}\) and evaluate gradients. The dot product of \(L_{\cdot k}\) with the resulting gradient difference isolates the \(k\)th diagonal curvature term. This establishes
    1. Diagonals of \(H\) at the mode (baseline).
    2. Diagonals of \(H\) at a perturbed location along CMP.
  2. [outer \(\small\nabla^3\)] Forward difference of (b) and (a) to compute diagonals of \(dH\).
  • The implementation requires \(2m\) gradient evaluations per parameter \(\vartheta_j\), and is computed without ever forming/factoring a Hessian matrix.

Skew normal fitting

  • So now we have paired evaluations \((t_k, d_{kj})\), where \(t_k\in[-4,4]\) are std. grid points, and \(d_{kj} \approx \log \pi(\vartheta^* + t_k v_j \mid y ) + \gamma_j \, t_k\) (unnormalised) for each marginal \(j\).

  • Fit a skew normal pdf to these points via weighted least squares, i.e. solve \[(\hat\xi, \hat\omega, \hat\alpha, \hat c) = \argmin_{\xi, \omega, \alpha, c} \sum_{k=1}^{K} w_k \big( d_{kj} - \left\{ \log f_{\text{SN}}(t_k; \xi, \omega, \alpha) + c \right\}\big)^2,\] then transform back to natural parameters via \(\vartheta_j \mapsto g^{-1}\big(\vartheta^*_j + \Sigma_{jj}^{1/2} \vartheta_j\big)\).
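The curve-fitting step can be sketched with scipy, here on synthetic noiseless grid evaluations from a known skew normal (the choice of weights \(w_k\) as standard normal densities, downweighting the tails, is illustrative and not prescribed by the slide):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import skewnorm, norm

# Weighted least-squares fit of a log skew-normal curve (+ constant c) to
# unnormalised log-density evaluations d_k on a standardised grid t_k.
t = np.linspace(-4, 4, 21)
d = skewnorm.logpdf(t, a=3, loc=-0.5, scale=1.2) + 1.7   # synthetic "data"
w = norm.pdf(t)                                          # illustrative weights

def resid(par):
    xi, log_omega, alpha, c = par      # log(omega) keeps the scale positive
    fit = skewnorm.logpdf(t, a=alpha, loc=xi, scale=np.exp(log_omega)) + c
    return np.sqrt(w) * (d - fit)

sol = least_squares(resid, x0=[0.0, 0.0, 0.0, 0.0])
xi_hat, omega_hat, alpha_hat = sol.x[0], np.exp(sol.x[1]), sol.x[2]
```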

Variational Bayes correction

  • The posterior mode is often far from the “typical set”, especially for skewed distributions.

  • Laplace may misrepresent the bulk of the distribution—correction needed!

  • van Niekerk & Rue (2024) propose a Variational Bayes (VB) correction to the mean of global Laplace: \[ \hat\delta = \argmin_\delta D_{\text{KL}}\big( \N(\vartheta^* + \delta, -H^{-1}) \;\Vert\; \pi(\vartheta \mid y) \big). \]
    • Distribution that minimises information loss (Zellner, 1988).
    • Laplace gets the shape (curvature) right, avoiding the massive cost of learning a high-dimensional covariance matrix.
    • Shift the entire Gaussian distribution to center over the true posterior mean.
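A 1-D illustration of the idea (a toy sketch, not van Niekerk & Rue's low-rank implementation): for the log-gamma target \(\log\pi(\vartheta) = a\vartheta - e^\vartheta\), Laplace gives \(\N(\log a,\, 1/a)\); minimising the KL divergence over a mean shift \(\delta\) (with the covariance held fixed, so the entropy of \(q\) is constant) moves the Gaussian towards the true posterior mean \(\psi(a)\).

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import minimize_scalar

# Target: log pi(theta) = a*theta - exp(theta), mode log(a), curvature -a.
a = 2.0
mode, var = np.log(a), 1 / a

# E_q[-log pi] under q = N(m, var) via Gauss-Hermite quadrature
nodes, weights = np.polynomial.hermite_e.hermegauss(30)  # weight exp(-x^2/2)
def neg_cross_entropy(m):
    th = m + np.sqrt(var) * nodes
    return np.sum(weights * (np.exp(th) - a * th)) / np.sqrt(2 * np.pi)

delta = minimize_scalar(lambda d: neg_cross_entropy(mode + d)).x
vb_mean = mode + delta    # closed form here: log(a) - 1/(2a)
```

The corrected mean lies between the mode \(\log a\) and the exact posterior mean \(\psi(a)\), and is much closer to the latter.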

Gaussian copula sampling

Transformed parameters

  • Marginal approximations are accurate (skew normal), but derived quantities (e.g., indirect effects, covariances) require the joint dependency structure.

\[ \underbrace{z \sim N(0, R)}_{\text{Gaussian correlation}} \xrightarrow{\ \Phi(\cdot) \ } \underbrace{u}_{\ \text{Uniform} \ } \xrightarrow{ \ F_{\text{SN}}^{-1}(\cdot) \ } \underbrace{\vartheta^{(b)}}_{\text{SN Marginals}} \xrightarrow{ \ h(\cdot) \ } \text{trans. params.} \]

  • Reconstruct joint posterior samples \(h(\vartheta^{(b)})\) using a Copula (Nelsen, 2006):

    1. Generate Latent Dependence. Draw \(z \sim N(0, R)\), where \(R\) is the correlation matrix from the inverse Hessian at the mode.
    2. Probability Integral Transform. Get correlated uniform variates via \(u = \Phi(z)\).
    3. Inverse CDF Mapping. Apply the inverse CDF of the fitted skew normal marginals.
    4. Parameter Transformation. Map to desired transformation using \(h(\cdot)\).
    5. Inference. Use samples to compute estimates, credible intervals, etc.
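Steps 1–3 in Python (a sketch; the copula correlation and skew-normal marginal parameters are made up for illustration):

```python
import numpy as np
from scipy.stats import norm, skewnorm

rng = np.random.default_rng(1)
R = np.array([[1.0, 0.6], [0.6, 1.0]])     # copula correlation (from -H^{-1})
marginals = [skewnorm(a=4, loc=0.0, scale=1.0),   # fitted SN marginals
             skewnorm(a=-2, loc=1.0, scale=0.5)]  # (illustrative parameters)

z = rng.multivariate_normal(np.zeros(2), R, size=5_000)  # 1. latent dependence
u = norm.cdf(z)                                          # 2. PIT -> uniforms
theta = np.column_stack([m.ppf(u[:, k])                  # 3. inverse SN CDFs
                         for k, m in enumerate(marginals)])
```

The resulting draws have the fitted skew-normal marginals while inheriting the Gaussian dependence structure.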

Gaussian copula sampling

Fit indices

With the posterior samples \(\vartheta^{\natural(b)}\) generated via the copula, we can compute standard Bayesian fit indices.

  1. Posterior Predictive P-value (PPP)
    • Compares the likelihood ratio discrepancy of observed against replicated data.
    • Replicated sufficient statistics are drawn from a Wishart distribution: \[S_{\text{rep}} \sim \operatorname{Wish}(n-1, \Sigma(\vartheta^{\natural(b)}))\]
    • Track the proportion of draws for which \(T(S_{\text{rep}}, \Sigma(\vartheta^{\natural(b)}))\) exceeds \(T(S_{\text{obs}}, \Sigma(\vartheta^{\natural(b)}))\).
  2. Deviance Information Criterion (DIC)
    • Models with smaller deviance, \(D(\vartheta)=-2 \log \pi(y \mid \vartheta)\), are preferred. Compute using the samples via: \[\text{DIC} = \bar{D} + p_D\] where \(p_D = \bar{D} - D(\hat\vartheta^\natural)\) and \(\bar D = B^{-1}\sum_b D(\vartheta^{\natural(b)})\).

Gaussian copula sampling

Factor scores

  • For MVN-SEM, the conditional posterior distribution \(\pi(\eta \mid y,\vartheta^\natural)\) is multivariate normal with parameters \[ \begin{aligned} \operatorname{E}(\eta_s \mid y, \vartheta^\natural) &= \alpha + \Phi \Lambda^\top \Sigma^{-1} y_s \\ \operatorname{Var}(\eta_s \mid y, \vartheta^\natural) &= \Phi - \Phi \Lambda^\top \Sigma^{-1} \Lambda \Phi \end{aligned} \] where \(\Phi = (I - B)^{-1} \Psi (I - B)^{-\top}\).

  • Plug-in estimate, e.g. \(\hat\eta_s = \operatorname{E}(\eta_s \mid y, \vartheta^\natural) = \hat\alpha + \hat\Phi \hat\Lambda^\top \hat\Sigma^{-1} y_s\) (Thurstone, 1935).

  • Better to account for parameter uncertainty:

    • Draw \(\eta_s^{(b)}\sim \N \big(\operatorname{E}(\eta_s \mid y, \vartheta^{\natural(b)}), \operatorname{Var}(\eta_s \mid y, \vartheta^{\natural(b)}) \big)\)
    • Compute posterior summaries across \(b=1,\dots,B\), effectively getting \[ \operatorname{E}(h(\eta_s)\mid y) = \int h(\eta_s) \, \pi(\eta_s \mid y, \vartheta^\natural) \, \pi(\vartheta^\natural \mid y) \ d\vartheta^\natural \]
  • Prediction to a new observation \(y_{\text{new}}\) is similar.
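The factor-score formulas specialise neatly to a one-factor model. A Python sketch (toy parameter values; with \(B=0\) and \(\alpha=0\) we have \(\Phi = \Psi\)):

```python
import numpy as np

# Toy one-factor measurement model; all parameter values illustrative.
Lam = np.array([[1.0], [0.8], [0.6]])
Theta = np.diag([0.3, 0.4, 0.5])
Psi = np.array([[1.0]])                  # Phi = Psi since B = 0
Sigma = Lam @ Psi @ Lam.T + Theta        # model-implied Cov(y_s)

def eta_posterior(y):
    # E(eta | y, theta) = Phi Lam' Sigma^{-1} y
    # Var(eta | y, theta) = Phi - Phi Lam' Sigma^{-1} Lam Phi
    K = Psi @ Lam.T @ np.linalg.inv(Sigma)
    return K @ y, Psi - K @ Lam @ Psi

mean, var = eta_posterior(np.array([1.0, 0.5, 0.2]))
```

By the Woodbury identity, `var` equals the precision-form posterior variance \((\Psi^{-1} + \Lambda^\top\Theta^{-1}\Lambda)^{-1}\), a useful numerical cross-check.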

R/INLAvaan

Political democracy example

mod <- "
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3
  dem65 =~ y5 + y6 + y7 + y8

  dem60 ~ ind60
  dem65 ~ ind60 + dem60

  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
"
fit <- INLAvaan::asem(
  model = mod, 
  data = PoliticalDemocracy
)

Vignettes and tutorials available on https://inlavaan.haziqj.ml/

Inspect fit

print(fit)
INLAvaan 0.2.3.9003 ended normally after 82 iterations

  Estimator                                      BAYES
  Optimization method                           NLMINB
  Number of model parameters                        30

  Number of observations                            75

Model Test (User Model):

   Marginal log-likelihood                   -1688.740 
   PPP (Chi-square)                              0.002 
print(fit@timing$total)  # seconds
[1] 0.984
head(coef(fit))
ind60=~x2 ind60=~x3 dem60=~y2 dem60=~y3 dem65=~y6 dem65=~y7 
2.2191369 1.8512281 0.6560777 1.0206567 1.1631579 1.2813651 

Inspect fit (cont.)

summary(fit)
...Output Truncated...

Information Criteria:

   Deviance (DIC)                             3339.054 
   Effective parameters (pD)                    84.787 

Parameter Estimates:

   Marginalisation method                     SKEWNORM
   VB correction                                  TRUE

Latent Variables:
                   Estimate       SD     2.5%    97.5%     NMAD    Prior       
  ind60 =~                                                                     
    x1                1.000                                                    
    x2                2.219    0.146    1.950    2.524    0.005    normal(0,10)
    x3                1.851    0.157    1.556    2.172    0.006    normal(0,10)
  dem60 =~                                                                     
    y1                1.000                                                    
    y2                0.656    0.263    0.164    1.198    0.052    normal(0,10)
    y3                1.021    0.147    0.748    1.326    0.012    normal(0,10)
  dem65 =~                                                                     
    y5                1.000                                                    
    y6                1.163    0.177    0.842    1.538    0.008    normal(0,10)
    y7                1.281    0.172    0.968    1.642    0.016    normal(0,10)
    y8                1.086    0.201    0.702    1.493    0.022    normal(0,10)

Regressions:
                   Estimate       SD     2.5%    97.5%     NMAD    Prior       
  dem60 ~                                                                      
    ind60             1.310    0.430    0.477    2.162    0.001    normal(0,10)
  dem65 ~                                                                      
    ind60             0.823    0.260    0.325    1.346    0.001    normal(0,10)
    dem60             0.768    0.108    0.570    0.994    0.056    normal(0,10)

Covariances:
                   Estimate       SD     2.5%    97.5%     NMAD    Prior       
 .y1 ~~                                                                        
   .y5                0.224    0.398   -0.189    1.352    0.035       beta(1,1)
 .y2 ~~                                                                        
    y4                0.444    1.918    1.622    9.151    0.072       beta(1,1)
   .y6                0.311    0.805    0.606    3.771    0.037       beta(1,1)
 .y3 ~~                                                                        
   .y7                0.207    0.714   -0.246    2.560    0.009       beta(1,1)
 .y8 ~~                                                                        
    y4                0.253    1.098   -0.766    3.538    0.080       beta(1,1)
 .y6 ~~                                                                        
   .y8                0.305    0.680    0.160    2.831    0.014       beta(1,1)

Variances:
                   Estimate       SD     2.5%    97.5%     NMAD    Prior       
   .x1                0.088    0.021    0.198    0.053    0.007 gamma(1,.5)[sd]
   .x2                0.123    0.065    1.486    0.019    0.044 gamma(1,.5)[sd]
   .x3                0.498    0.098    0.336    0.717    0.003 gamma(1,.5)[sd]
   .y1                1.395    0.550    8.362    0.415    0.009 gamma(1,.5)[sd]
   .y2                9.914    2.199    6.379   14.956    0.023 gamma(1,.5)[sd]
   .y3                5.334    1.053    7.676    3.567    0.005 gamma(1,.5)[sd]
   .y5                2.298    0.549    5.190    1.364    0.007 gamma(1,.5)[sd]
   .y6                5.303    0.975    7.457    3.648    0.003 gamma(1,.5)[sd]
   .y7                3.712    0.848    5.597    2.285    0.004 gamma(1,.5)[sd]
   .y8                4.009    0.889    5.994    2.525    0.004 gamma(1,.5)[sd]
    y4               10.887    1.803    7.904   14.958    0.008 gamma(1,.5)[sd]
    ind60             0.451    0.088    0.305    0.650    0.003 gamma(1,.5)[sd]
   .dem60             4.680    1.078    7.057    2.845    0.021 gamma(1,.5)[sd]
   .dem65             0.434    0.308   10.637    0.039    0.119 gamma(1,.5)[sd]

Comparison with MCMC

Parameter recovery from simulated data

Effects of prior distributions

  • Loadings: \(\lambda \sim \N(0,10^2) \rightsquigarrow \lambda \sim \N(3,0.1^2)\)

  • Latent variances: \(\psi^{1/2} \sim \Gamma(1,0.5) \rightsquigarrow \psi \sim \Gamma(100,50)\)

Discussion

Summary & limitations

  • While MCMC methods remain the gold standard for exact Bayesian inference, they can be prohibitively slow.

  • 🚀 INLA methodology offers a rapid alternative for SEM, delivering Bayesian results at near the speed of frequentist estimators.

    • Not aiming to replace MCMC; use case is model development, testing, and exploratory analysis.
  • Limitations
    • Mode-Mean Discrepancy: the approximation is anchored at the posterior mode (MAP), which can be unrepresentative of the centre of probability mass in small samples.
    • Complex Geometries: the skew normal family cannot capture multimodality or heavy tails (diagnosed via high NMAD values).
    • Dimensions: optimised for moderate parameter counts (\(m < 100\)).
    • Gaussian Likelihoods: currently restricted to continuous outcomes where the likelihood and analytic gradients are computationally cheap.

Current and future state of SEM + INLA

  • Available features ({INLAvaan} v0.2.3)
    • acfa(), asem(), agrowth() + “usual” methods (summary, predict, plot, etc.)
    • Prior specification
    • Equality constraints & defined parameters
    • Multigroup analysis
    • Multilevel SEM
    • FIML missing data support
  • Forward looking

Thank you very much! (شكراً جزيلاً)

https://haziqj.ml/lse-feb26

References

Bollen, K. A. (1989). Structural equations with latent variables (pp. xiv, 514). John Wiley & Sons. https://doi.org/10.1002/9781118619179
Dwyer-Lindgren, L., Cork, M. A., Sligar, A., Steuben, K. M., Wilson, K. F., Provost, N. R., Mayala, B. K., VanderHeide, J. D., Collison, M. L., Hall, J. B., Biehl, M. H., Carter, A., Frank, T., Douwes-Schultz, D., Burstein, R., Casey, D. C., Deshpande, A., Earl, L., El Bcheraoui, C., … Hay, S. I. (2019). Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017. Nature, 570(7760), 189–193. https://doi.org/10.1038/s41586-019-1200-9
Freni-Sterrantino, A., Rustand, D., Van Niekerk, J., Krainski, E., & Rue, H. (2025). A graphical framework for interpretable correlation matrix models for multivariate regression. Statistical Methods & Applications, 34(3), 409–447. https://doi.org/10.1007/s10260-025-00788-y
Jamil, H., Rosseel, Y., Kemp, O., & Kosmidis, I. (2026). Bias-Reduced Estimation of Structural Equation Models. Structural Equation Modeling: A Multidisciplinary Journal, To appear. https://doi.org/10.1080/10705511.2025.2610462
Kass, R. E., & Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical Association, 90(430), 773–795. https://doi.org/10.1080/01621459.1995.10476572
Krainski, E., Gómez-Rubio, V., Bakka, H., Lenzi, A., Castro-Camilo, D., Simpson, D., Lindgren, F., & Rue, H. (2018). Advanced spatial modeling with stochastic partial differential equations using R and INLA. Chapman and Hall/CRC.
Mejia, A. F., Yue, Y. (Ryan)., Bolin, D., Lindgren, F., & Lindquist, M. A. (2020). A Bayesian General Linear Modeling Approach to Cortical Surface fMRI Data Analysis. Journal of the American Statistical Association, 115(530), 501–520. https://doi.org/10.1080/01621459.2019.1611582
Merkle, E. C., Fitzsimmons, E., Uanhoro, J., & Goodrich, B. (2021). Efficient Bayesian Structural Equation Modeling in Stan. Journal of Statistical Software, 100, 1–22. https://doi.org/10.18637/jss.v100.i06
Nelsen, R. B. (2006). An introduction to copulas (2nd ed). Springer.
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian Inference for Latent Gaussian models by using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society Series B: Statistical Methodology, 71(2), 319–392. https://doi.org/10.1111/j.1467-9868.2008.00700.x
Rue, H., Riebler, A., Sørbye, S. H., Illian, J. B., Simpson, D. P., & Lindgren, F. K. (2017). Bayesian Computing with INLA: A Review. Annual Review of Statistics and Its Application, 4(1), 395–421. https://doi.org/10.1146/annurev-statistics-060116-054045
Simpson, D., Rue, H., Riebler, A., Martins, T. G., & Sørbye, S. H. (2017). Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors. Statistical Science, 32(1), 1–28. https://doi.org/10.1214/16-STS576
Thurstone, L. L. (1935). The vectors of mind: Multiple-factor analysis for the isolation of primary traits. University of Chicago Press. https://doi.org/10.1037/10018-000
Tierney, L., Kass, R. E., & Kadane, J. B. (1989). Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions. Journal of the American Statistical Association, 84(407), 710–716. https://doi.org/10.1080/01621459.1989.10478824
van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. https://doi.org/10.1017/CBO9780511802256
van Niekerk, J., Krainski, E., Rustand, D., & Rue, H. (2023). A new avenue for Bayesian inference with INLA. Computational Statistics & Data Analysis, 181, 107692. https://doi.org/10.1016/j.csda.2023.107692
van Niekerk, J., & Rue, H. (2024). Low-rank Variational Bayes correction to the Laplace method. Journal of Machine Learning Research.
Zellner, A. (1988). Optimal Information Processing and Bayes’s Theorem. The American Statistician, 42(4), 278–280. https://doi.org/10.2307/2685143