LIGOF code

Haziq Jamil

In [1]:

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lavaan)

This is lavaan 0.6-19
lavaan is FREE software! Please report any bugs.

library(lavaan.bingof)  
options(width = 80)

Simulate data

Data used for testing is a 1-factor model with 5 binary items.

In [2]:

simulate_data <- function(n = 1000) {
  popmod <- "
    f1 =~ 0.8*y1 + 0.7*y2 + 0.5*y3 + 0.4*y4 + 0.3*y5
    f1 ~~ 1*f1
  
    y1| -1.43*t1
    y2| -0.55*t2
    y3| -0.13*t3
    y4| -0.72*t4
    y5| -1.13*t5
  " 
  
  lavaan::simulateData(popmod, sample.nobs = n) |>
    lapply(ordered) |>
    as_tibble()
}

dat <- simulate_data(n = 1000)
glimpse(dat)

Rows: 1,000
Columns: 5
$ y1 <ord> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ y2 <ord> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2…
$ y3 <ord> 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2…
$ y4 <ord> 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2…
$ y5 <ord> 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2…

Fit a model

By default, the DWLS estimator is used (with mean and variance adjustment for standard errors).

In [3]:

fit <- cfa("f1 =~ y1 + y2 + y3 + y4 + y5", dat, std.lv = TRUE)
summary(fit)

lavaan 0.6-19 ended normally after 15 iterations

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                        10

  Number of observations                          1000

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                 1.949       2.547
  Degrees of freedom                                 5           5
  P-value (Chi-square)                           0.856       0.769
  Scaling correction factor                                  0.779
  Shift parameter                                            0.046
    simple second-order correction                                

Parameter Estimates:

  Parameterization                               Delta
  Standard errors                           Robust.sem
  Information                                 Expected
  Information saturated (h1) model        Unstructured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  f1 =~                                               
    y1                0.830    0.083   10.034    0.000
    y2                0.688    0.064   10.744    0.000
    y3                0.413    0.057    7.208    0.000
    y4                0.333    0.061    5.447    0.000
    y5                0.333    0.070    4.790    0.000

Thresholds:
                   Estimate  Std.Err  z-value  P(>|z|)
    y1|t1            -1.426    0.058  -24.409    0.000
    y2|t1            -0.598    0.042  -14.119    0.000
    y3|t1            -0.154    0.040   -3.855    0.000
    y4|t1            -0.650    0.043  -15.159    0.000
    y5|t1            -1.063    0.049  -21.700    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.311                           
   .y2                0.527                           
   .y3                0.829                           
   .y4                0.889                           
   .y5                0.889                           
    f1                1.000

Extract proportions and fitted probabilities

This function extracts the univariate and bivariate margins of the sample and fitted probabilities from a lavaan object.

In [4]:

lavaan.bingof::get_uni_bi_moments

function (.lavobject, wtd = TRUE) 
{
    list2env(extract_lavaan_info(.lavobject), environment())
    if (!isTRUE(wtd)) 
        wt <- 1
    N <- sum(wt)
    pdot1 <- pidot1 <- rep(NA, p)
    for (i in seq_along(pidot1)) {
        pdot1[i] <- sum(wt[dat[, i] == 2])/N
        pidot1[i] <- pnorm(TH[i], mean = mu_ystar[i], sd = sqrt(Var_ystar[i, 
            i]), lower.tail = FALSE)
    }
    id <- combn(p, 2)
    pdot2 <- pidot2 <- rep(NA, ncol(id))
    for (k in seq_along(pidot2)) {
        i <- id[1, k]
        j <- id[2, k]
        pdot2[k] <- sum(wt[dat[, i] == 2 & dat[, j] == 2])/N
        pidot2[k] <- mnormt::sadmvn(lower = c(TH[i], TH[j]), 
            upper = c(Inf, Inf), mean = mu_ystar[c(i, j)], varcov = Var_ystar[c(i, 
                j), c(i, j)])
    }
    list(pdot1 = pdot1, pidot1 = pidot1, pdot2 = pdot2, pidot2 = pidot2)
}
<bytecode: 0x128cf9020>
<environment: namespace:lavaan.bingof>

get_uni_bi_moments(fit)

$pdot1
[1] 0.923 0.725 0.561 0.742 0.856

$pidot1
[1] 0.923 0.725 0.561 0.742 0.856

$pdot2
 [1] 0.703 0.538 0.701 0.798 0.444 0.563 0.640 0.430 0.497 0.645

$pidot2
 [1] 0.7026938 0.5376652 0.6993516 0.8010992 0.4449536 0.5638761 0.6392395
 [8] 0.4339455 0.4926660 0.6436034

e2_hat <- with(get_uni_bi_moments(fit), {
    p2_hat <- c(pdot1, pdot2)
    pi2_hat <- c(pidot1, pidot2)
    p2_hat - pi2_hat
  })
str(e2_hat)

 num [1:15] 0.00 0.00 1.11e-16 0.00 -1.11e-16 ...

\(\boldsymbol\Sigma_2\) matrix

This is the estimate of the asymptotic covariance matrix of the sample univariate and bivariate moments. Most pertinent is the option method = "theoretical", which actually calculates \[ \hat{\boldsymbol\Sigma}_2 = \operatorname{Var}(\mathbf y_2) = \operatorname{E}(\mathbf y_2\mathbf y_2^\top) - \operatorname{E}(\mathbf y_2)\operatorname{E}(\mathbf y_2)^\top. \] The expectation of the first part involves trivariate and tetravariate moments, calculated using standard normal integrals via mnormt::sadmvn().

In [5]:

lavaan.bingof:::create_Sigma2_matrix

function (.lavobject, method = c("theoretical", "weighted", "force_unweighted", 
    "strat", "strat2", "multinomial")) 
{
    list2env(extract_lavaan_info(.lavobject), environment())
    list2env(get_uni_bi_moments(.lavobject), environment())
    p2_hat <- c(pdot1, pdot2)
    pi2_hat <- c(pidot1, pidot2)
    method <- match.arg(method, c("theoretical", "weighted", 
        "force_unweighted", "strat", "strat2", "multinomial"))
    if (method == "theoretical") {
        S <- p * (p + 1)/2
        Eysq <- matrix(NA, S, S)
        id <- c(1:p, asplit(combn(p, 2), 2))
        idS <- gtools::combinations(S, 2, repeats = TRUE)
        colnames(idS) <- c("i", "j")
        idy <- as_tibble(idS) %>% mutate(var1 = id[i], var2 = id[j], 
            y = mapply(c, var1, var2, SIMPLIFY = FALSE))
        for (s in seq_len(nrow(idS))) {
            i <- idy$i[s]
            j <- idy$j[s]
            yy <- unique(idy$y[[s]])
            dimy <- length(yy)
            Eysq[i, j] <- mnormt::sadmvn(lower = TH[yy], upper = rep(Inf, 
                dimy), mean = mu_ystar[yy], varcov = Var_ystar[yy, 
                yy])
        }
        Eysq[lower.tri(Eysq)] <- t(Eysq)[lower.tri(Eysq)]
        res <- Eysq - tcrossprod(pi2_hat)
    }
    else if (method == "multinomial") {
        res <- diag(pi2_hat) - tcrossprod(pi2_hat)
    }
    else {
        dat <- dat %>% mutate(across(everything(), function(y) as.numeric(y) - 
            1))
        idx <- combn(p, 2)
        varnames <- colnames(dat)
        for (k in seq_len(ncol(idx))) {
            i <- idx[1, k]
            j <- idx[2, k]
            varname <- paste0(varnames[i], varnames[j])
            yi <- dat[, i, drop = TRUE]
            yj <- dat[, j, drop = TRUE]
            yij <- (yi == 1) * (yj == 1)
            dat[[varname]] <- yij
        }
        if (method == "strat") {
            strat_wt <- unique(wt)
            nstrat <- length(strat_wt)
            Eysq_strat <- Esqy_strat <- list()
            for (k in seq_along(strat_wt)) {
                idxl <- wt == strat_wt[k]
                dats <- dat[idxl, ]
                wts <- wt[idxl]
                Esqy_strat[[k]] <- (strat_wt[k]/sum(strat_wt)) * 
                  apply(dats, 2, mean)
                Eysq_strat[[k]] <- (strat_wt[k]/sum(strat_wt)) * 
                  (cov.wt(dats, center = FALSE)$cov)
            }
            Eysq <- Reduce("+", Eysq_strat)
            Esqy <- tcrossprod(Reduce("+", Esqy_strat))
            res <- Eysq - Esqy
        }
        else if (method == "strat2") {
            strat_wt <- unique(wt)
            nstrat <- length(strat_wt)
            res <- list()
            for (k in seq_along(strat_wt)) {
                idxl <- wt == strat_wt[k]
                dats <- dat[idxl, ]
                wts <- wt[idxl]
                dats <- scale(dats, center = pi2_hat, scale = FALSE)
                na <- length(wts)
                res[[k]] <- cov.wt(dats, wt = wts/sum(wt))$cov/((na - 
                  1)) * na^2
            }
            res <- Reduce("+", res) * nrow(dat)
        }
        else {
            if (method == "force_unweighted") 
                wt[] <- 1
            res <- cov.wt(dat, wt)$cov
        }
    }
    res
}
<bytecode: 0x10ba3a0c8>
<environment: namespace:lavaan.bingof>

lavaan.bingof:::create_Sigma2_matrix(fit) |>
  Matrix::Matrix() |>
  round(3)

15 x 15 Matrix of class "dsyMatrix"
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
 [1,] 0.071 0.034 0.020 0.014 0.011 0.054 0.041 0.054 0.062 0.026 0.030 0.032
 [2,] 0.034 0.199 0.038 0.026 0.019 0.193 0.047 0.043 0.042 0.122 0.155 0.176
 [3,] 0.020 0.038 0.246 0.018 0.012 0.043 0.236 0.028 0.026 0.195 0.039 0.040
 [4,] 0.014 0.026 0.018 0.191 0.008 0.029 0.021 0.180 0.018 0.025 0.145 0.027
 [5,] 0.011 0.019 0.012 0.008 0.123 0.021 0.015 0.014 0.115 0.017 0.018 0.092
 [6,] 0.054 0.193 0.043 0.029 0.021 0.209 0.059 0.059 0.059 0.124 0.154 0.173
 [7,] 0.041 0.047 0.236 0.021 0.015 0.059 0.249 0.044 0.045 0.198 0.047 0.048
 [8,] 0.054 0.043 0.028 0.180 0.014 0.059 0.044 0.210 0.052 0.039 0.156 0.044
 [9,] 0.062 0.042 0.026 0.018 0.115 0.059 0.045 0.052 0.159 0.036 0.039 0.110
[10,] 0.026 0.122 0.195 0.025 0.017 0.124 0.198 0.039 0.036 0.247 0.104 0.114
[11,] 0.030 0.155 0.039 0.145 0.018 0.154 0.047 0.156 0.039 0.104 0.246 0.140
[12,] 0.032 0.176 0.040 0.027 0.092 0.173 0.048 0.044 0.110 0.114 0.140 0.231
[13,] 0.020 0.040 0.191 0.112 0.013 0.045 0.187 0.117 0.027 0.162 0.110 0.042
[14,] 0.021 0.041 0.216 0.019 0.071 0.046 0.211 0.030 0.081 0.179 0.042 0.083
[15,] 0.019 0.034 0.024 0.166 0.093 0.038 0.028 0.163 0.097 0.033 0.138 0.090
      [,13] [,14] [,15]
 [1,] 0.020 0.021 0.019
 [2,] 0.040 0.041 0.034
 [3,] 0.191 0.216 0.024
 [4,] 0.112 0.019 0.166
 [5,] 0.013 0.071 0.093
 [6,] 0.045 0.046 0.038
 [7,] 0.187 0.211 0.028
 [8,] 0.117 0.030 0.163
 [9,] 0.027 0.081 0.097
[10,] 0.162 0.179 0.033
[11,] 0.110 0.042 0.138
[12,] 0.042 0.083 0.090
[13,] 0.246 0.171 0.105
[14,] 0.171 0.250 0.068
[15,] 0.105 0.068 0.229

Jacobian matrix

This is the \(\boldsymbol\Delta_{2,\pi} \in \mathbb R^{S \times T}\) matrix, which is the Jacobian of the probability density function of the multivariate normal distribution with respect to the parameters of the model. Its \((s, t)\)-th entry is \(\partial \boldsymbol[\pi_2(\theta)]_s / \partial\theta_t\) evaluated at the parameter estimates \(\hat\theta\) of the model.

In [6]:

create_Delta2_matrix <- function(lavobject) {
  p <- lavobject@Model@nvar
  all_thresh <- inspect(lavobject, "est")$tau

  if(ncol(all_thresh) != 1L) {
    stop("This simplified function only handles purely binary indicators (1 threshold per variable).")
  }
  tau <- as.numeric(all_thresh)
  Sigma_hat <- inspect(lavobject, "implied")$cov
  rho_ij <- Sigma_hat[lower.tri(Sigma_hat)]

  Delta_full <- lavaan:::computeDelta(lavobject@Model)[[1]]
  derTauToTheta <- Delta_full[1:p, , drop = FALSE]
  derRhoToTheta <- Delta_full[-(1:p), , drop = FALSE]

  pair_idx <- which(lower.tri(Sigma_hat), arr.ind = TRUE)
  npairs <- nrow(pair_idx)

  # Precompute all necessary values
  dnorm_tau <- dnorm(tau)
  tau_i <- tau[pair_idx[, 2]]
  tau_j <- tau[pair_idx[, 1]]
  dnorm_tau_i <- dnorm_tau[pair_idx[, 2]]
  dnorm_tau_j <- dnorm_tau[pair_idx[, 1]]
  rho_values <- rho_ij

  # Common terms for vectorized calculations
  denominator_sq <- 1 - rho_values^2
  sqrt_denominator_sq <- sqrt(denominator_sq)
  z1 <- (rho_values * tau_i - tau_j) / sqrt_denominator_sq
  z2 <- (rho_values * tau_j - tau_i) / sqrt_denominator_sq

  # Vectorized derivatives calculations
  dP.dTaui <- -dnorm_tau_i * pnorm(z1)
  dP.dTauj <- -dnorm_tau_j * pnorm(z2)
  exponent <- -0.5 * (tau_i^2 - 2 * rho_values * tau_i * tau_j + tau_j^2) / denominator_sq
  dP.dRho <- exp(exponent) / (2 * pi * sqrt_denominator_sq)

  # Vectorized matrix operations
  i_indices <- pair_idx[, 2]
  j_indices <- pair_idx[, 1]

  dP_taui_theta <- dP.dTaui * derTauToTheta[i_indices, , drop = FALSE]
  dP_tauj_theta <- dP.dTauj * derTauToTheta[j_indices, , drop = FALSE]
  dP_rho_theta <- dP.dRho * derRhoToTheta

  derBiv_11_wrtTheta <- dP_taui_theta + dP_tauj_theta + dP_rho_theta

  # Combine results
  rbind(
    -dnorm_tau * derTauToTheta,
    derBiv_11_wrtTheta
  )
}

The arrangement for {lavaan} parameters are in this order: loadings, thresholds, and then factor correlations. As we can see below in the first row, the first 5 entries are zero since the derivative of the univariate probabilities with respect to the loadings is zero (it only depends on thresholds).

In [7]:

create_Delta2_matrix(fit) |> 
  Matrix::Matrix(sparse = TRUE) |>
  round(3)

15 x 10 sparse Matrix of class "dgCMatrix"
                                                                      
 [1,] .     .     .     .     .     -0.144  .      .      .      .    
 [2,] .     .     .     .     .      .     -0.334  .      .      .    
 [3,] .     .     .     .     .      .      .     -0.394  .      .    
 [4,] .     .     .     .     .      .      .      .     -0.323  .    
 [5,] .     .     .     .     .      .      .      .      .     -0.227
 [6,] 0.047 0.056 .     .     .     -0.057 -0.303  .      .      .    
 [7,] 0.024 .     0.048 .     .     -0.052  .     -0.366  .      .    
 [8,] 0.019 .     .     0.048 .     -0.087  .      .     -0.292  .    
 [9,] 0.016 .     .     .     0.039 -0.109  .      .      .     -0.200
[10,] .     0.057 0.096 .     .      .     -0.165 -0.283  .      .    
[11,] .     0.040 .     0.082 .      .     -0.234  .     -0.219  .    
[12,] .     0.029 .     .     0.060  .     -0.277  .      .     -0.146
[13,] .     .     0.043 0.054 .      .      .     -0.291 -0.170  .    
[14,] .     .     0.030 .     0.038  .      .     -0.336  .     -0.114
[15,] .     .     .     0.026 0.026  .      .      .     -0.272 -0.160

Extract code

For use in other .qmd documents.

In [8]:

# knitr::purl("ligof.qmd", output = "ligof.R", quiet = TRUE)