Suppose that there are G groups, that the gth group consists of ng observations for g = 1, …, G, and that there are M categorical manifest items, where the mth item has rm categories for m = 1, …, M. Let Yig = (Yig1, …, YigM)⊤ and ${\bf y}_{ig} = (y_{ig1}, \ldots, y_{igM})^\top$ denote the set of item variables and the responses given by the ith individual within the gth group, respectively. The number of possible response patterns of Yig is $\prod_{m = 1}^{M}r_m$, and most of these response patterns are likely to be sparse. The multiple-group LCA assumes that associations among the manifest items can be explained by the latent class variable Lig, which has C categories, for the ith individual within the gth group. To reflect multiple-group data structures, we discuss two different LCA approaches, namely fixed-effect and random-effect LCA.
The fixed-effect LCA can reflect group differences in latent structure by specifying an LCR model for a given subgroup. We extend the simultaneous LCA (Clogg & Goodman, 1985) by incorporating logistic regression in the class prevalence and refer to it as multiple-group latent class regression (mgLCR). Let ${\bf x}_{ig}=(x_{ig1}, \ldots, x_{igp})^\top$ be a subject-specific p × 1 vector of covariates, either discrete or continuous, for the ith individual within the gth group. Then, the observed-data likelihood of mgLCR for the ith individual within the gth group can be specified as
\begin{equation}
\mathcal{L}_{ig} = \sum_{c = 1}^{C} \left[ P(L_{ig} = c \mid {\bf x}_{ig}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid cg}^{I(y_{igm} = k)} \right],
(\#eq:likeli)
\end{equation}
where I(yigm = k) is an indicator function that equals 1 when the response to the mth item from the ith individual within the gth group is k and equals 0 otherwise. The likelihood given in @ref(eq:likeli) contains two types of parameters: the class prevalence P(Lig = c ∣ xig) and the item-response probability ρmk ∣ cg.
The ρ-parameter is the measurement parameter in mgLCR (i.e., the item-response probability), describing the tendency of individuals in latent class c to respond to the mth item for m = 1, …, M. Comparing estimated item-response probabilities across groups is a valuable strategy for assessing measurement invariance because these parameters solely determine the meaning of the latent classes. By comparing the model fit with the parameter held constant across groups (i.e., ρmk ∣ c = ρmk ∣ c1 = … = ρmk ∣ cG for k = 1, …, rm, m = 1, …, M, and c = 1, …, C) against an alternative model with freely varying parameters, we obtain evidence on whether measurement invariance across groups can be assumed. As given in @ref(eq:likeli), the subject-specific covariates ${\bf x}_{ig}$ may influence the probability that an individual belongs to a specific class through the logistic regression
\begin{equation}
P(L_{ig} = c \mid {\bf x}_{ig}) = \frac{\exp(\alpha_{c \mid g} + {\bf x}_{ig}^\top \boldsymbol{\beta}_{c \mid g})}{\sum_{c' = 1}^{C} \exp(\alpha_{c' \mid g} + {\bf x}_{ig}^\top \boldsymbol{\beta}_{c' \mid g})},
(\#eq:mgLCA-reg)
\end{equation}
where the coefficient vector βc ∣ g = (β1c ∣ g, …, βpc ∣ g)⊤ can be interpreted as the expected change in the log odds of belonging to class c versus the referent class C (i.e., αC ∣ g = 0 and $\boldsymbol{\beta}_{C \mid g} = {\bf 0}$ for g = 1, …, G). Then, the observed log-likelihood function for the mgLCR model can be specified as
\begin{equation}
\ell_{\text{mgLCR}} = \sum_{g = 1}^{G} \sum_{i = 1}^{n_g} \log \mathcal{L}_{ig}.
(\#eq:loglik-mgLCA)
\end{equation}
It should be noted that, similar to the item-response probabilities, the coefficients of the logistic regression can be constrained to be equal across subgroups (i.e., βc = βc ∣ 1 = ⋯ = βc ∣ G for c = 1, …, C) to test whether the effects of the covariates are identical across groups.
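As a concrete illustration, the following R sketch fits mgLCR with the glca package on simulated data; the data frame and variable names (Y1–Y4, SEX, GROUP) are hypothetical, and the call relies on arguments documented for `glca()` (`group`, `nclass`, `measure.inv`, `n.init`, `seed`). Toggling `measure.inv` gives the constrained and freely varying measurement models discussed above.

```r
library(glca)

## Hypothetical data: four binary items, a grouping variable, and one covariate
set.seed(123)
n   <- 600
sim <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  GROUP = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

## mgLCR with two classes: item-response probabilities held equal across groups
## (measurement invariance) and the class prevalence regressed on SEX
mg_inv <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = sim,
               nclass = 2, measure.inv = TRUE, n.init = 5, seed = 1,
               verbose = FALSE)

## The same model with item-response probabilities varying freely across groups
mg_free <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = sim,
                nclass = 2, measure.inv = FALSE, n.init = 5, seed = 1,
                verbose = FALSE)

summary(mg_inv)
coef(mg_inv)   # logistic regression coefficients for the class prevalence
```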
The random-effect LCA accounts for group variation in the latent class prevalence using random coefficients, for example, $$ P(L_{ig} = c) = \frac{\exp(\alpha_{c} + \sigma_{c} \lambda_{g})}{\sum_{c' = 1}^{C}\exp(\alpha_{c'} + \sigma_{c'} \lambda_{g})}, $$ where λ = (λ1, …, λG)⊤ represents the group variation in the class prevalence. In the parametric random-effect LCA, the random coefficients are assumed to follow a parametric distribution such as the standard normal distribution. The nonparametric approach, in contrast, assumes no specific distribution; it only assumes that the random coefficients follow a probability mass function with a small number of mass points. In other words, the nonparametric approach employs a categorical level-2 latent variable (i.e., latent cluster) Ug whose probability mass function is P(Ug = w) = δw for w = 1, …, W. Using the classification mechanics of LCA, the latent cluster membership of level-2 units can be identified by a small number of representative patterns of class prevalence across groups. Therefore, the meaning of the wth level of the latent cluster variable is determined by the prevalence of the latent classes P(Lig = c ∣ Ug = w) for c = 1, …, C. Treating the latent cluster variable as a group variable, the nonparametric approach provides more meaningful interpretations for group comparison than the parametric approach; we can examine whether the latent class structure differs across latent cluster memberships. Therefore, we focus on the nonparametric random-effect LCA, hereafter referred to as nonparametric latent class regression (npLCR).
The observed-data likelihood of npLCR for the gth group can be expressed as
\begin{equation}
\mathcal{L}_{g} = \sum_{w = 1}^{W} \delta_{w} \prod_{i = 1}^{n_g} \left[ \sum_{c = 1}^{C} P(L_{ig} = c \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c}^{I(y_{igm} = k)} \right],
(\#eq:grouploglik-mLCA)
\end{equation}
where ${\bf x}_{ig} = (x_{ig1}, \ldots, x_{igp})^\top$ and ${\bf z}_g=(z_{g1}, \ldots, z_{gq})^\top$ denote vectors of subject-specific (i.e., level-1) and group-specific (i.e., level-2) covariates for i = 1, …, ng and g = 1, …, G, respectively. The likelihood given in @ref(eq:grouploglik-mLCA) contains three types of parameters: the cluster prevalence δw, the class prevalence P(Lig = c ∣ Ug = w, xig, zg), and the item-response probability ρmk ∣ c.
The class prevalence can be modeled using logistic regression as
\begin{equation}
P(L_{ig} = c \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) = \frac{\exp(\alpha_{c \mid w} + {\bf x}_{ig}^\top \boldsymbol{\beta}_{1c \mid w} + {\bf z}_{g}^\top \boldsymbol{\beta}_{2c})}{\sum_{c' = 1}^{C} \exp(\alpha_{c' \mid w} + {\bf x}_{ig}^\top \boldsymbol{\beta}_{1c' \mid w} + {\bf z}_{g}^\top \boldsymbol{\beta}_{2c'})},
(\#eq:MLCA-reg)
\end{equation}
where the vectors β1c ∣ w = (β11c ∣ w, …, β1pc ∣ w)⊤ and β2c = (β21c, …, β2qc)⊤ are the logistic regression coefficients for the level-1 and level-2 covariates, respectively. Then, the observed log-likelihood of npLCR is specified as
\begin{equation}
\ell_{\text{npLCR}} = \sum_{g = 1}^{G} \log \mathcal{L}_{g}.
(\#eq:loglik-mLCA)
\end{equation}
Note that the coefficients for the level-1 covariates depend on both latent classes and clusters, while the coefficients for the level-2 covariates depend only on latent class membership. The model in @ref(eq:MLCA-reg) may therefore be referred to as a random-slope model, as the coefficients for the level-1 covariates differ across latent clusters. These coefficients can be constrained to be equal across clusters (i.e., β1c = β1c ∣ 1 = ⋯ = β1c ∣ W for c = 1, …, C) to test whether the effects of the level-1 covariates are identical across all latent cluster memberships. It should also be noted that measurement invariance is assumed across latent cluster memberships in npLCR (i.e., ρmk ∣ c = ρmk ∣ c1 = … = ρmk ∣ cW for k = 1, …, rm, m = 1, …, M, and c = 1, …, C). Otherwise, the item-response probabilities may vary across latent cluster memberships, suggesting that the latent class structure itself differs between latent clusters; it would then no longer make sense to use the latent class prevalences as identifiers for latent cluster membership.
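A minimal npLCR sketch under the same conventions is given below; the simulated data and variable names are hypothetical, and the assumption that the group-level covariate SCHOOL (constant within each group) is treated as a level-2 covariate reflects my reading of the package interface rather than a documented guarantee.

```r
library(glca)

## Hypothetical two-level data: 30 groups of 25 individuals each
set.seed(45)
G <- 30; n_g <- 25; N <- G * n_g
sim <- data.frame(
  GROUP = factor(rep(paste0("g", 1:G), each = n_g)),
  Y1 = sample(1:2, N, replace = TRUE), Y2 = sample(1:2, N, replace = TRUE),
  Y3 = sample(1:2, N, replace = TRUE), Y4 = sample(1:2, N, replace = TRUE),
  SEX    = factor(sample(c("F", "M"), N, replace = TRUE)),            # level-1 covariate
  SCHOOL = factor(rep(sample(c("middle", "high"), G, replace = TRUE),
                      each = n_g))                                    # constant within group
)

## npLCR: 2 latent classes nested within 2 latent clusters (ncluster = 2);
## coeff.inv = FALSE lets the level-1 coefficients differ across latent clusters
## (the random-slope specification), while coeff.inv = TRUE would constrain them
np_fit <- glca(item(Y1, Y2, Y3, Y4) ~ SEX + SCHOOL, group = GROUP, data = sim,
               nclass = 2, ncluster = 2, coeff.inv = FALSE,
               n.init = 5, seed = 1, verbose = FALSE)
summary(np_fit)
```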
The package glca finds the maximum-likelihood (ML) estimates for mgLCR and npLCR using the expectation-maximization (EM) algorithm (Dempster et al., 1977). The EM algorithm iterates between two steps, the expectation step (E-step) and the maximization step (M-step), to find the solution maximizing the log-likelihood functions given in @ref(eq:loglik-mgLCA) and @ref(eq:loglik-mLCA).
For mgLCR, the E-step computes the posterior probabilities
$$
\theta_{ig(c)} = P(L_{ig} = c \mid {\bf Y}_{ig} = {\bf y}_{ig}, {\bf x}_{ig}) = \frac{P(L_{ig} = c \mid {\bf x}_{ig}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid cg}^{I(y_{igm} = k)}}{\sum_{c' = 1}^{C} P(L_{ig} = c' \mid {\bf x}_{ig}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c'g}^{I(y_{igm} = k)}}
$$
with current estimates for i = 1, …, ng, g = 1, …, G, and c = 1, …, C. The M-step
maximizes the complete-data likelihood (i.e., the likelihood for the
cross-classification by Lig and
yig)
with respect to β- and ρ-parameters. In particular, when
all values of θig(c)
are known, updated estimates for β-parameters can be calculated by
the Newton-Raphson algorithm for multinomial logistic regression given
in @ref(eq:mgLCA-reg), provided that the computational routine allows
fractional responses rather than integer counts (Bandeen-Roche et al.,
1997). Therefore, the package glca conducts one cycle of the Newton-Raphson algorithm to update the β-parameters at every M-step iteration. If there is no covariate in the model, the class prevalence can
be updated directly without estimating β-parameters as $\hat{\gamma}_{c \mid g} = P(L_{ig}=c) =
\sum_{i=1}^{n_g} \theta_{ig(c)}/n_g$ for c = 1, …, C and g = 1, …, G. The
item-response probabilities, ρmk ∣ cg
can be interpreted as parameters in a multinomial distribution when
θig(c)
is known, so we have $$
\hat{\rho}_{mk \mid cg} = \frac{\sum_{i=1}^{n_g} \theta_{ig(c)}I(y_{igm}
= k)}{\sum_{i=1}^{n_g} \theta_{ig(c)}}
$$ for k = 1, …, rm,
m = 1, …, M, c = 1, …, C, and g = 1, …, G. Under the
measurement invariance assumption (i.e., ρmk ∣ c = ρmk ∣ c1 = … = ρmk ∣ cG),
the ρ-parameter will be
updated as $$
\hat{\rho}_{mk \mid c} = \frac{\sum_{g = 1}^{G} \sum_{i = 1}^{n_g}
\theta_{ig(c)}I(y_{igm} = k)}{\sum_{g = 1}^{G} \sum_{i = 1}^{n_g}
\theta_{ig(c)}}
$$ for k = 1, …, rm,
m = 1, …, M, and
c = 1, …, C.
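To make the E- and M-step updates concrete, the following base-R sketch (an illustrative prototype, not the internal routine of glca) iterates the two steps for a single-group LCA with binary items and no covariates, using the closed-form γ- and ρ-updates above.

```r
## Minimal EM prototype for a one-group LCA with binary items and no covariates;
## illustrative only, not the internal routine of glca
set.seed(7)
n <- 300; M <- 4; C <- 2
Y <- matrix(sample(1:2, n * M, replace = TRUE), n, M)   # items coded 1/2

gamma <- rep(1 / C, C)                  # class prevalence
rho   <- matrix(runif(M * C), M, C)     # rho[m, c] = P(Y_m = 1 | class c)

for (iter in 1:200) {
  ## E-step: posterior theta[i, c] given current estimates
  like  <- sapply(1:C, function(c)
    gamma[c] * apply(ifelse(Y == 1, rho[, c][col(Y)], 1 - rho[, c][col(Y)]), 1, prod))
  theta <- like / rowSums(like)

  ## M-step: closed-form updates based on fractional counts
  gamma <- colMeans(theta)                               # gamma_c = sum_i theta_i(c) / n
  rho   <- t(sapply(1:M, function(m)
    colSums(theta * (Y[, m] == 1)) / colSums(theta)))    # rho_{m1|c}
}
round(gamma, 3); round(rho, 3)
```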
For npLCR, the E-step involves the joint posterior probability
\begin{equation}
P(U_g = w, L_{1g} = c_1, \ldots, L_{n_g g} = c_{n_g} \mid {\bf Y}_{g} = {\bf y}_{g}, {\bf x}_{g}, {\bf z}_{g}) = \frac{\delta_{w} \prod_{i = 1}^{n_g} \left[ P(L_{ig} = c_i \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c_i}^{I(y_{igm} = k)} \right]}{\mathcal{L}_{g}},
(\#eq:full-post)
\end{equation}
where yg = (y1g⊤, …, yngg⊤)⊤ and xg = (x1g⊤, …, xngg⊤)⊤ are all observations for the M manifest items and the p subject-specific covariates from the gth group, respectively. As shown in
@ref(eq:full-post), the computational complexity increases exponentially with the number of individuals per group ng in the E-step for npLCR; when the model has W latent clusters and C latent classes, the implementation of the E-step would yield dimensional complexity proportional to $W \times C^{n_g}$ for each group. This is not computationally feasible even with a moderate number of individuals per group. Moreover, the M-step for npLCR requires only marginal versions of the posterior probabilities rather than the joint posterior probability given in @ref(eq:full-post). Therefore, the E-step
can be modified to alleviate the computational complexity by
accommodating the hierarchical structure of npLCR. Vermunt (2003) proposed the upward-downward (UD) algorithm for calculating the marginal
posterior probability directly in the E-step. The UD algorithm is
similar to the forward-backward (FB) algorithm for handling hidden
Markov models (Baum et al., 1970; Juang & Rabiner, 1991). In the UD
algorithm, the marginal posterior probabilities θig(w, c) can be calculated as
\begin{equation}
\theta_{ig(w,c)} = P(U_g = w, L_{ig} = c \mid {\bf Y}_{g} = {\bf y}_{g}, {\bf x}_{g}, {\bf z}_{g}) = \theta_{g(w)} \, \theta_{ig(c \mid w)}
(\#eq:UD)
\end{equation}
for i = 1, …, ng, g = 1, …, G, c = 1, …, C, and w = 1, …, W. The marginal posterior probability θig(w, c) is the product of the upward probability θg(w) and the downward probability θig(c ∣ w). In @ref(eq:UD), it should be noted that an individual's class membership is assumed to depend only on his/her own observations, which can be presented as P(Lig = c ∣ Ug = w, Yg = yg, xg, zg) = P(Lig = c ∣ Ug = w, Yig = yig, xig, zg) for i = 1, …, ng. Then, the upward and downward probabilities are easily calculated as
$$
\theta_{g(w)} = P(U_g = w \mid {\bf Y}_{g} = {\bf y}_{g}, {\bf x}_{g}, {\bf z}_{g}) = \frac{\delta_{w} \prod_{i = 1}^{n_g} \left[ \sum_{c = 1}^{C} P(L_{ig} = c \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c}^{I(y_{igm} = k)} \right]}{\sum_{w' = 1}^{W} \delta_{w'} \prod_{i = 1}^{n_g} \left[ \sum_{c = 1}^{C} P(L_{ig} = c \mid U_g = w', {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c}^{I(y_{igm} = k)} \right]}
$$
and
$$
\theta_{ig(c \mid w)} = P(L_{ig} = c \mid U_g = w, {\bf Y}_{ig} = {\bf y}_{ig}, {\bf x}_{ig}, {\bf z}_{g}) = \frac{P(L_{ig} = c \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c}^{I(y_{igm} = k)}}{\sum_{c' = 1}^{C} P(L_{ig} = c' \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c'}^{I(y_{igm} = k)}}
$$
with current estimates, respectively.
The M-step maximizes the complete-data
likelihood (i.e., the likelihood for the cross-classification by Ug, Lig and
yig)
with respect to β1c ∣ w,
β2c, and
ρmk ∣ c.
In particular, when θig(w, c)
is known, updated estimates for β-parameters can be calculated by
Newton-Raphson algorithm for multinomial logistic regression given in
@ref(eq:MLCA-reg), provided that the computational routine allows
fractional responses rather than integer counts (Bandeen-Roche et al.,
1997). Therefore, the package glca conducts one cycle of the Newton-Raphson algorithm to update the β-parameters at every M-step iteration. If there is no covariate in the model, the class prevalence can be updated directly without estimating β-parameters as $\hat{\gamma}_{c \mid w} = P(L_{ig}=c \mid U_g =w) = \sum_{g=1}^G \sum_{i=1}^{n_g} \theta_{ig(w,c)}/\sum_{g=1}^G n_g\, \theta_{g(w)}$ for c = 1, …, C and w = 1, …, W. The cluster
prevalence δw and the item-response probabilities ρmk ∣ c can be interpreted as parameters in multinomial distributions, so we have
\begin{equation}
\hat{\delta}_{w} = \frac{\sum_{g = 1}^{G} \theta_{g(w)}}{G}
\quad \text{and} \quad
\hat{\rho}_{mk \mid c} = \frac{\sum_{g = 1}^{G} \sum_{i = 1}^{n_g} \theta_{ig(c)} I(y_{igm} = k)}{\sum_{g = 1}^{G} \sum_{i = 1}^{n_g} \theta_{ig(c)}}
(\#eq:m-nplcr)
\end{equation}
for w = 1, …, W, c = 1, …, C, k = 1, …, rm, and m = 1, …, M. The marginal posterior probability used in @ref(eq:m-nplcr) is easily obtained by $\theta_{ig(c)} = \sum_{w = 1}^{W} \theta_{ig(w,c)}$.
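The following base-R sketch illustrates the upward-downward decomposition for one group of a covariate-free npLCR; the parameter values are arbitrary and the code is only a toy restatement of the formulas above, not the package's implementation.

```r
## Toy illustration of the upward-downward E-step for one group (no covariates);
## delta: cluster prevalence, gam[w, c] = P(L = c | U = w), rho[m, c] = P(Y_m = 1 | L = c)
set.seed(11)
W <- 2; C <- 2; M <- 4; n_g <- 20
delta <- c(0.6, 0.4)
gam   <- matrix(c(0.7, 0.3, 0.2, 0.8), W, C, byrow = TRUE)
rho   <- matrix(runif(M * C), M, C)
Y     <- matrix(sample(1:2, n_g * M, replace = TRUE), n_g, M)   # one group's responses

## Class-conditional densities f[i, c] = P(Y_ig = y_ig | L_ig = c)
f <- sapply(1:C, function(c)
  apply(ifelse(Y == 1, rho[, c][col(Y)], 1 - rho[, c][col(Y)]), 1, prod))

## Upward probability theta_g(w), computed on the log scale for stability
log_up  <- sapply(1:W, function(w) log(delta[w]) + sum(log(f %*% gam[w, ])))
theta_w <- exp(log_up - max(log_up)); theta_w <- theta_w / sum(theta_w)

## Downward probability theta_ig(c | w): one n_g x C matrix per cluster
theta_c_w <- lapply(1:W, function(w) {
  num <- sweep(f, 2, gam[w, ], "*")
  num / rowSums(num)
})

## Marginal posterior theta_ig(w, c) = theta_g(w) * theta_ig(c | w)
theta_wc <- lapply(1:W, function(w) theta_w[w] * theta_c_w[[w]])
```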
Missing data occur in nearly all empirical data, despite the vigorous efforts of researchers to prevent them. Missing data cause two general problems. First, if subjects with any missing values on the variables are removed from the dataset, the sample can become very small, especially when the number of missing values is large; this leads to a great loss of information and poor statistical power. Second, the subjects who provide incomplete data frequently differ from those who provide complete data. If adjustments are not made for these differences, the results may be biased.
In the case of random missing-data mechanisms (i.e., ignorable missing data) such as missing completely at random (MCAR) and missing at random (MAR) (Little & Rubin, 2019), two methods for dealing with missing data are typically available: full-information maximum likelihood (FIML) and multiple imputation (MI; Schafer, 1997). In MI, plausible values are imputed multiple times in place of the missing values to create multiple complete datasets. The use of MI for multiple-group LCA has the advantage that missing data on the covariates and the group variable can be handled. The disadvantage, however, is that the LCA must be fitted separately to each imputed complete dataset, and the results must be combined to obtain the final estimates. FIML is a model-based missing-data procedure in which the model estimates are adjusted on the basis of all the information provided by subjects with complete and partially complete data. Most software packages for LCA employ a FIML approach because it requires no additional input from the user other than specifying which code is used to denote missing data. However, this approach cannot handle missingness in the group variable or the covariates in multiple-group LCA.
The package glca estimates model parameters using a FIML approach when some responses are missing on the manifest items: in the E-step the missing responses are excluded from the computation of the posterior probability, and in the M-step the indicator I(yigm = k) for a missing response is replaced with the updated ρ-parameter from the previous iteration. In short, the package glca can handle any ignorable missing data on the manifest items, but individuals with missing data on the group variable or any covariate are deleted from the analysis.
However, missing values on the group variable and the covariates can be treated using multivariate imputation by chained equations (MICE; van Buuren et al., 2006), which is implemented in the R package mice (van Buuren & Groothuis-Oudshoorn, 2011). MICE imputation can be used to create multiple sets of complete group-variable and covariate values for multiple-group LCA. Each completed dataset can then be analyzed using the package glca, and the results can easily be combined across the imputed datasets.
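A possible workflow combining mice and glca is sketched below; the data, the amount of missingness, and the variable names are hypothetical, and the pooling step (Rubin's rules) is only indicated in a comment.

```r
library(mice)
library(glca)

## Hypothetical data with missing values on the covariate and the group variable
set.seed(3)
n   <- 400
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  GROUP = factor(sample(c("A", "B"), n, replace = TRUE))
)
dat$SEX[sample(n, 30)]   <- NA    # missing covariate values
dat$GROUP[sample(n, 20)] <- NA    # missing group memberships

## Impute the incomplete variables with MICE (m = 5 completed datasets)
imp <- mice(dat, m = 5, printFlag = FALSE, seed = 10)

## Fit mgLCR to each completed dataset; Rubin's rules should then be applied
## to pool the estimates and standard errors across the five fits
fits <- lapply(1:5, function(i)
  glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = complete(imp, i),
       nclass = 2, n.init = 3, seed = 1, verbose = FALSE))
```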
Since the log-likelihood of LCA may have several local maxima, the parameters estimated by the EM algorithm can deviate from the globally optimal solution. To cope with this problem, we recommend starting the algorithm with several different initial sets of random values and ascertaining whether they consistently converge to the same solution. If they do, the solution can be considered the ML estimate. If not, we recommend examining the distribution of the likelihood values and selecting the largest likelihood value, which usually corresponds to the ML solution. The package glca allows investigators to try different starting values, either by using random starting values or by providing their own. An investigator can select the number of initial sets of random values (default is 10), and the package then iterates the EM algorithm a small number of times (default is 50) for each set. Among the initial sets of model parameters, the one producing the largest likelihood value is chosen for the main iteration.
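The sketch below illustrates how the number of random initial sets and the number of preliminary EM iterations can be controlled through the `n.init` and `testiter` arguments, and how refitting with different seeds helps detect local maxima; the data are simulated and the argument names follow the package documentation.

```r
library(glca)

## Hypothetical data: four binary items and a grouping variable
set.seed(5)
n   <- 500
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  GROUP = factor(sample(c("A", "B"), n, replace = TRUE))
)

## Try 20 random initial sets, running 100 preliminary EM iterations for each;
## the set with the largest likelihood is carried into the main iterations
fit <- glca(item(Y1, Y2, Y3, Y4) ~ 1, group = GROUP, data = dat,
            nclass = 2, n.init = 20, testiter = 100, seed = 1, verbose = FALSE)

## Refitting with different seeds and comparing the maximized log-likelihood
## values reported by summary() helps detect convergence to local maxima
refits <- lapply(1:3, function(s)
  glca(item(Y1, Y2, Y3, Y4) ~ 1, group = GROUP, data = dat,
       nclass = 2, n.init = 20, seed = s, verbose = FALSE))
```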
The standard error of the estimated parameters can be calculated using the observed empirical information matrix (McLachlan & Krishnan, 2007, p. 114), $$ \boldsymbol{I}_{e}(\hat{\boldsymbol{\Psi}};\mathbf{Y}) = \sum_{g = 1}^{G}\mathbf{s}(\mathbf{Y}_{g}; \hat{\boldsymbol{\Psi}})\mathbf{s}^\top(\mathbf{Y}_{g}; \hat{\boldsymbol{\Psi}}), $$ where $\mathbf{s}(\mathbf{Y}_{g}; \hat{\boldsymbol{\Psi}})$ is the score function of the parameter vector Ψ for the gth group, evaluated at the MLE $\hat{\boldsymbol{\Psi}}$. In the parameter vector Ψ, all probability parameters such as the δ- and ρ-parameters are transformed into free parameters using the baseline logit function. The variance of $\hat{\boldsymbol{\Psi}}$ can be obtained as the inverse of $\boldsymbol{I}_{e}(\hat{\boldsymbol{\Psi}};\mathbf{Y})$. However, as our target parameters are a re-parameterized version of Ψ, we apply the delta method to the variance of $\hat{\boldsymbol{\Psi}}$. Let q(Ψ) denote the original parameters of the multiple-group latent class models. Then, the variance-covariance matrix of the estimates is $$ \textsf{Var}\left(q(\hat{\boldsymbol{\Psi}})\right) = J_q(\hat{\boldsymbol{\Psi}})\textsf{Var}(\hat{\boldsymbol{\Psi}})J_q(\hat{\boldsymbol{\Psi}})^\top, $$ where $J_q(\hat{\boldsymbol{\Psi}})$ is the Jacobian matrix of the function q() evaluated at the MLE of Ψ. Details of the score functions and the Jacobian matrices are provided in the Appendix.
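The following base-R sketch illustrates the delta-method step for the baseline-logit parameterization of a probability vector such as δ; the covariance matrix V is an arbitrary stand-in for the inverse of the observed empirical information matrix, and the code is a numerical illustration rather than the package's internal routine.

```r
## Delta-method sketch: map free baseline-logit parameters zeta to probabilities
## delta and propagate an assumed covariance matrix V
softmax_full <- function(zeta) {        # append the reference category (zeta_W = 0)
  e <- exp(c(zeta, 0))
  e / sum(e)
}

zeta  <- c(0.4, -0.2)                   # free parameters for W = 3 clusters
V     <- diag(0.05, length(zeta))       # assumed Var(zeta-hat)
delta <- softmax_full(zeta)
W     <- length(delta)

## Analytic Jacobian: d delta_w / d zeta_w' = delta_w * (I(w = w') - delta_w')
J <- outer(1:W, 1:(W - 1),
           function(w, wp) delta[w] * ((w == wp) - delta[wp]))

var_delta <- J %*% V %*% t(J)           # Var(delta-hat) by the delta method
sqrt(diag(var_delta))                   # standard errors of the cluster prevalences
```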
Absolute model fit refers to whether a specified multiple-group
latent class model provides an adequate representation of the data.
Typically, the analyst assesses absolute model fit by fitting a
particular model to the observed data and testing the null hypothesis
that the observed data has been produced by the fitted model. Thus, one
usually hopes to find a model for which the null hypothesis is not
rejected. This hypothesis test for LCA is based on a contingency table;
the expected cell counts are estimated according to the specified model
and its estimated parameters, then compared to the observed cell counts.
The likelihood-ratio statistic G2 (Agresti, 2013) is used to assess absolute model fit in the package glca. The G2 test statistic is derived from the difference in the log-likelihood values between the fitted model and the saturated model (i.e., expected and observed cell counts, respectively), where the residual degrees of freedom are calculated by subtracting the number of parameters in the fitted model from that in the saturated model. The number of parameters for the saturated model is the smaller of the number of possible combinations of the categorical variables and the number of cases in the model.
It should be noted that the contingency table for the LCA type of
model is commonly sparse. When there are many cells containing very few
observations in the cross-classification table, the large-sample
approximation to the chi-square distribution for the G2 statistic is not
appropriate. In such cases, the package glca allows us to conduct a goodness-of-fit test using the bootstrap likelihood-ratio test
(BLRT) statistic (Langeheine et al., 1996). This approach generates
random datasets multiple times using the estimated parameters and
calculates the G2
statistic for each generated dataset. In BLRT, the resulting distribution
of the G2 statistic
across the random datasets is used as the reference distribution. The
relative position of the G2 statistic obtained
from the original dataset within the reference distribution can be used
as a measure of absolute model fit. In fact, the right-tail probability of the observed G2 value is regarded as a bootstrap p-value. For example, if the observed G2 value falls in the uppermost tail of the reference distribution, we may conclude that this test statistic is unlikely to be observed under the model corresponding to the null hypothesis. Such a finding would provide evidence to reject the null hypothesis.
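A bootstrap goodness-of-fit test can be requested through `gofglca()`; in the sketch below the data are simulated, and the arguments `test = "boot"`, `nboot`, and `seed` are assumed to follow the documented interface of `gofglca()`.

```r
library(glca)

## Hypothetical data: five binary items, single group
set.seed(21)
n   <- 500
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  Y5 = sample(1:2, n, replace = TRUE)
)

fit2 <- glca(item(Y1, Y2, Y3, Y4, Y5) ~ 1, data = dat, nclass = 2,
             n.init = 5, seed = 1, verbose = FALSE)

## Absolute fit: G-squared against the saturated model, with a bootstrap
## reference distribution for sparse tables
gofglca(fit2, test = "boot", nboot = 100, seed = 1)
```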
When comparing two or more groups in a multiple-group latent class model, it should be checked whether the latent structure is identical across groups. Relative model fit refers to deciding which of two or more models represents a better fit to a particular dataset. Measurement invariance in multiple-group LCA can be tested by comparing the fits of constrained versus unconstrained models; the unconstrained (full) model allows all parameters to vary across groups, while the constrained (reduced) model allows only the class prevalences to vary and holds the item-response probabilities equal across groups. The
package glca
conducts the chi-square likelihood-ratio test
(LRT) to assess relative model fit by comparing two competing models for
testing measurement invariance.
Similar to the item-response probabilities, the coefficients for the level-1 covariates can also be tested for equality across groups using the chi-square LRT in the package glca. By comparing the fit of the reduced model with the coefficients held constant across groups (i.e., βc = βc ∣ 1 = ⋯ = βc ∣ G in mgLCR and β1c = β1c ∣ 1 = ⋯ = β1c ∣ W in npLCR for c = 1, …, C) against the full model with freely varying coefficients, we obtain evidence on whether the effects of the level-1 covariates on the latent class prevalences can be assumed to be identical across groups. For the group comparison of coefficients to be valid, the assumption of measurement invariance must be met to ensure a consistent meaning of the latent classes across groups.
The deviance statistic, the test statistic of the LRT for relative model fit, is obtained as twice the difference in the log-likelihood values of the two competing models. The degrees of freedom for the deviance statistic are the difference in the number of free parameters of the two multiple-group latent class models. For example, the validity of the measurement invariance assumption can be tested by calculating the log-likelihood from the model in which the item-response probabilities are constrained to be equal across subgroups and comparing it with the log-likelihood from the freely estimated model.
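The deviance-based tests for measurement and coefficient invariance can be carried out by fitting nested models with and without the equality constraints and passing them to `gofglca()`; the sketch below uses simulated data, and the `measure.inv` and `coeff.inv` arguments are assumed to impose the constraints described above.

```r
library(glca)

## Hypothetical multiple-group data with a level-1 covariate
set.seed(33)
n   <- 800
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  GROUP = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

## Full model: item-response probabilities and coefficients vary across groups
full <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = dat, nclass = 2,
             measure.inv = FALSE, coeff.inv = FALSE, n.init = 5, seed = 1,
             verbose = FALSE)

## Reduced models: measurement invariance only, then equal coefficients as well
minv <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = dat, nclass = 2,
             measure.inv = TRUE, coeff.inv = FALSE, n.init = 5, seed = 1,
             verbose = FALSE)
cinv <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = dat, nclass = 2,
             measure.inv = TRUE, coeff.inv = TRUE, n.init = 5, seed = 1,
             verbose = FALSE)

## Deviance (chi-square LRT) comparisons of the nested models
gofglca(full, minv, cinv, test = "chisq")
```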
The chi-square LRT cannot be used to compare latent class models with
a different number of latent classes or clusters because these two
models are not nested. Thus, the package glca
provides
several information criteria commonly used in LCA, such as Akaike’s information criterion (AIC), Bozdogan’s consistent AIC (CAIC), and Schwarz’s Bayesian information criterion (BIC) (Akaike, 1974; Bozdogan, 1987; Schwarz, 1978), to compare the fit of non-nested competing models; the model with a smaller AIC (or BIC) value is preferred. Another model-fit index provided by the package is entropy, which is widely used in research practice although it can be a poor measure for model selection as it often depends on the number of classes (Collins & Lanza, 2009). A model with a relatively higher entropy value is preferred.
The package glca
also generates the empirical
distribution of the deviance statistic to help select a better model
between two non-nested competing models with a different number of
latent classes or clusters using BLRT (Langeheine et al., 1996). The
null hypothesis is that the simpler model is adequate; thus, the bootstrap samples are drawn from the simpler model. For each generated bootstrap sample, both competing models are estimated and the deviance between them is calculated. By repeating this procedure many times, we can construct the reference distribution of the deviance. Similar to the bootstrap p-value for the G2 statistic, the relative position of the observed deviance within the reference distribution provides a bootstrap p-value; the null (simpler) model is preferred when the bootstrap p-value exceeds 0.05 at a significance level of α = 0.05. An important advantage of BLRT is that it can also be applied to comparisons of two nested latent class models when the conditions for the large-sample approximation are not satisfied. It should be noted that the optimal model should be selected by comprehensively considering the conceptual and analytical implications as well as the quantitative goodness-of-fit statistics.
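The sketch below compares models with different numbers of latent classes on simulated data; `gofglca()` is assumed to report AIC, CAIC, BIC, and entropy for the supplied fits and to provide the bootstrap deviance test via `test = "boot"`.

```r
library(glca)

## Hypothetical data: five binary items
set.seed(55)
n   <- 600
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  Y5 = sample(1:2, n, replace = TRUE)
)

## Candidate models with two and three latent classes
fit2 <- glca(item(Y1, Y2, Y3, Y4, Y5) ~ 1, data = dat, nclass = 2,
             n.init = 5, seed = 1, verbose = FALSE)
fit3 <- glca(item(Y1, Y2, Y3, Y4, Y5) ~ 1, data = dat, nclass = 3,
             n.init = 5, seed = 1, verbose = FALSE)

gofglca(fit2, fit3)                                         # AIC, CAIC, BIC, entropy
gofglca(fit2, fit3, test = "boot", nboot = 100, seed = 1)   # BLRT for the deviance
```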
Fixed-effect latent class analysis:
Let αg and βg be vectorized parameters containing all coefficients of αc ∣ g and βc ∣ g given in @ref(eq:mgLCA-reg) for c = 1, …, C − 1 and g = 1, …, G, respectively. Further, let s(Yg; αg) and s(Yg; βg) denote score functions of αg and βg, respectively. Then, the element of s(Yg; αg) and the p × 1 sub-vector of s(Yg; βg) are obtained as for c = 1, …, C − 1 and g = 1, …, G, respectively. Note that ℒg in @ref(eq:score-reg-mglcr) is the product of observed-data likelihoods given in @ref(eq:likeli) for all observations in the gth group (i.e., $\mathcal{L}_{g}=\prod_{i=1}^{n_g}\mathcal{L}_{ig}$).
Let ρg denote vectorized item-response probabilities containing all ρm ∣ cg = (ρm1 ∣ cg, …, ρmrm ∣ cg)⊤ for m = 1, …, M, c = 1, …, C and g = 1, …, G. Each of ρ-parameters in ρm ∣ cg is re-parameterized by the baseline logit function πmk ∣ cg = ln (ρmk ∣ cg/ρmrm ∣ cg) for k = 1, …, rm − 1, and let πg denote vectorized free parameters containing all πm ∣ cg = (πm1 ∣ cg, …, πmrm − 1 ∣ cg)⊤ for m = 1, …, M, c = 1, …, C and g = 1, …, G. Further, let s(Yg; πg) denote score function of all free parameters πg. Then, the element of score function s(Yg; πg) is obtained as for k = 1, …, rm − 1, m = 1, …, M, c = 1, …, C, and g = 1, …, G.
Let Ψ denote the vector of all free parameters α = (α1⊤, …, αG⊤)⊤, β = (β1⊤, …, βG⊤)⊤, and π = (π1⊤, …, πG⊤)⊤, and let q(Ψ) be the function that transforms Ψ back to the original parameters of mgLCR. Then, the Jacobian matrix for the function q(Ψ) is
\begin{equation}
J_q(\boldsymbol{\Psi}) = \begin{bmatrix} J_q(\boldsymbol{\alpha}, \boldsymbol{\beta}) & \mathbf{0} \\ \mathbf{0} & J_q(\boldsymbol{\pi}) \end{bmatrix},
(\#eq:Jacobian-mglcr)
\end{equation}
where Jq(α, β) is an identity matrix of size equal to the number of regression coefficients α and β. The sub-matrix of Jq(π) given in @ref(eq:Jacobian-mglcr) can be specified by ∂ρm ∣ cg/∂πm′ ∣ c′g′, which is a matrix of size rm × (rm′ − 1) for m, m′ = 1, …, M; c, c′ = 1, …, C; and g, g′ = 1, …, G. The element of this sub-matrix in the kth row and the k′th column can be obtained as
\begin{equation}
\frac{\partial \rho_{mk \mid cg}}{\partial \pi_{m'k' \mid c'g'}} = \begin{cases} \rho_{mk \mid cg}\left(I(k = k') - \rho_{mk' \mid cg}\right) & \text{if } m = m',\, c = c',\, g = g', \\ 0 & \text{otherwise}, \end{cases}
(\#eq:Jacobian-rho)
\end{equation}
for k = 1, …, rm and k′ = 1, …, rm − 1.
Random-effect latent class analysis:
The prevalences for latent clusters, δ = (δ1, …, δW)⊤ are re-parametrized by the baseline logit function ζw = ln (δw/δW) for w = 1, …, W − 1. Let s(Yg; ζ) denote score function of ζ = (ζ1, …, ζW − 1)⊤ for the gth group. Then, the wth element of s(Yg; ζ) is obtained as for w = 1, …, W − 1, where ℒg is the observed-data likelihood of npLCR for the gth group given in @ref(eq:grouploglik-mLCA).
Let α and β1 be vectorized parameters containing all coefficients of level-1 covariates, αc ∣ w and β1c ∣ w given in @ref(eq:MLCA-reg) for c = 1, …, C − 1 and w = 1, …, W, respectively. Further, let s(Yg; α) and s(Yg; β1) denote score functions of α and β1 for the gth group, respectively. Then, the element of s(Yg; α) and the p × 1 sub-vector of s(Yg; β1) are obtained as for c = 1, …, C − 1, w = 1, …, W, and g = 1, …, G, respectively. Let β2 denote vectorized parameters containing all coefficients of level-2 covariates, β2c given in @ref(eq:MLCA-reg) for c = 1, …, C − 1. The q × 1 sub-vector of score function for β2, s(Yg; β2) can be obtained as for c = 1, …, C − 1 and g = 1, …, G. The score functions for the free parameters of item-response probabilities are identical to mgLCR given in @ref(eq:score-pi).
Let Ψ denote the vector of all free parameters ζ, α, β1, β2, and π, and let q(Ψ) be the function that transforms Ψ back to the original parameters of npLCR. Then, the Jacobian matrix for the function q(Ψ) is $$ J_q(\boldsymbol{\Psi}) = \begin{bmatrix} J_q(\boldsymbol{\zeta}) & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & J_q(\boldsymbol{\alpha}, \boldsymbol{\beta}_1, \boldsymbol{\beta}_2) & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & J_q(\boldsymbol{\pi}) \end{bmatrix}, $$ where Jq(α, β1, β2) is an identity matrix of size equal to the number of regression coefficients given in @ref(eq:MLCA-reg). The sub-matrices of Jq(π) are identical to those given in @ref(eq:Jacobian-rho), and the elements of Jq(ζ) can be obtained by $$ \begin{aligned} \frac{\partial \delta_{w}}{\partial \zeta_{w'}} = \delta_{w} \left(I(w = w') - \delta_{w'}\right) \end{aligned} $$ for w = 1, …, W and w′ = 1, …, W − 1.
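As a quick sanity check of the Jacobian formula above, the following base-R snippet compares the analytic elements ∂δw/∂ζw′ with forward finite differences; all names and values are illustrative.

```r
## Numerical check of d delta_w / d zeta_w' = delta_w * (I(w = w') - delta_w')
softmax_full <- function(zeta) { e <- exp(c(zeta, 0)); e / sum(e) }

zeta  <- c(0.5, -1.0, 0.3)              # W = 4 clusters, W - 1 free parameters
delta <- softmax_full(zeta)
W     <- length(delta)

## Analytic Jacobian from the formula above
J_analytic <- outer(1:W, 1:(W - 1),
                    function(w, wp) delta[w] * ((w == wp) - delta[wp]))

## Forward finite differences
h <- 1e-6
J_numeric <- sapply(1:(W - 1), function(wp) {
  z <- zeta; z[wp] <- z[wp] + h
  (softmax_full(z) - delta) / h
})

max(abs(J_analytic - J_numeric))        # should be near zero
```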