Suppose that there are G groups, that the gth group consists of ng observations for g = 1, …, G, and that there are M categorical manifest items, where the mth item has rm categories for m = 1, …, M. Let Yig = (Yig1, …, YigM)⊤ and ${\bf y}_{ig} = (y_{ig1}, \ldots, y_{igM})^\top$ denote the set of item variables and the responses given by the ith individual within the gth group, respectively. The number of possible response patterns of Yig is $\prod_{m = 1}^{M}r_m$, and most of these response patterns are likely to be sparse. The multiple-group LCA assumes that associations among the manifest items can be explained by the latent class variable Lig, which has C categories, for the ith individual within the gth group. To reflect multiple-group data structures, we discuss two different LCA approaches, namely fixed-effect and random-effect LCA.
The fixed-effect LCA can reflect group differences in latent structure by specifying an LCR model for a given subgroup. We extend the simultaneous LCA (Clogg & Goodman, 1985) by incorporating logistic regression in the class prevalence and refer to it as multiple-group latent class regression (mgLCR). Let ${\bf x}_{ig}=(x_{ig1}, \ldots, x_{igp})^\top$ be a subject-specific p × 1 vector of covariates, either discrete or continuous, for the ith individual within the gth group. Then, the observed-data likelihood of mgLCR for the ith individual within the gth group can be specified as
\begin{equation}
\mathcal{L}_{ig} = \sum_{c = 1}^{C} \left[ P(L_{ig} = c \mid {\bf x}_{ig}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid cg}^{I(y_{igm} = k)} \right],
(\#eq:likeli)
\end{equation}
where I(yigm = k) is an indicator function that equals 1 when the response to the mth item from the ith individual within the gth group is k and equals 0 otherwise. The likelihood given in @ref(eq:likeli) contains two types of parameters: the class prevalence P(Lig = c ∣ xig) and the item-response probability ρmk ∣ cg.
The ρ-parameter is the measurement parameter in mgLCR (i.e., the item-response probability), describing the tendency of individuals in latent class c to respond to the mth item for m = 1, …, M. Comparing estimated item-response probabilities across groups is a valuable strategy for assessing measurement invariance because these parameters solely determine the meaning of the latent classes. By comparing the model fit with the parameter held constant across groups (i.e., ρmk ∣ c = ρmk ∣ c1 = … = ρmk ∣ cG for k = 1, …, rm, m = 1, …, M, and c = 1, …, C) against an alternative model with freely varying parameters, we obtain evidence on whether measurement invariance across groups can be assumed. As given in @ref(eq:likeli), the subject-specific covariates ${\bf x}_{ig}$ may influence the probability that an individual belongs to a specific class through the logistic regression
\begin{equation}
P(L_{ig} = c \mid {\bf x}_{ig}) = \frac{\exp(\alpha_{c \mid g} + {\bf x}_{ig}^\top \boldsymbol{\beta}_{c \mid g})}{\sum_{c' = 1}^{C} \exp(\alpha_{c' \mid g} + {\bf x}_{ig}^\top \boldsymbol{\beta}_{c' \mid g})},
(\#eq:mgLCA-reg)
\end{equation}
where the coefficient vector βc ∣ g = (β1c ∣ g, …, βpc ∣ g)⊤ can be interpreted as the expected change in the log odds of belonging to class c versus the referent class C (i.e., αC ∣ g = 0 and $\boldsymbol{\beta}_{C \mid g} = {\bf 0}$ for g = 1, …, G). Then, the observed log-likelihood function for the mgLCR model can be specified as
\begin{equation}
\ell_{\text{mgLCR}} = \sum_{g = 1}^{G} \sum_{i = 1}^{n_g} \log \mathcal{L}_{ig}.
(\#eq:loglik-mgLCA)
\end{equation}
It should be noted that, similar to the item-response probabilities, the coefficients of the logistic regression can be constrained to be equal across subgroups (i.e., βc = βc ∣ 1 = ⋯ = βc ∣ G for c = 1, …, C) to test whether the effects of the covariates are identical across groups.
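As a concrete illustration, the following R sketch fits mgLCR with the glca package on simulated data; the data frame and variable names (Y1–Y4, SEX, GROUP) are hypothetical, and the call relies on arguments documented for `glca()` (`group`, `nclass`, `measure.inv`, `n.init`, `seed`). Toggling `measure.inv` gives the constrained and freely varying measurement models discussed above.

```r
library(glca)

## Hypothetical data: four binary items, a grouping variable, and one covariate
set.seed(123)
n   <- 600
sim <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  GROUP = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

## mgLCR with two classes: item-response probabilities held equal across groups
## (measurement invariance) and the class prevalence regressed on SEX
mg_inv <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = sim,
               nclass = 2, measure.inv = TRUE, n.init = 5, seed = 1,
               verbose = FALSE)

## The same model with item-response probabilities varying freely across groups
mg_free <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = sim,
                nclass = 2, measure.inv = FALSE, n.init = 5, seed = 1,
                verbose = FALSE)

summary(mg_inv)
coef(mg_inv)   # logistic regression coefficients for the class prevalence
```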
The random-effect LCA accounts for group variation in the latent class prevalence using random coefficients, for example, $$ P(L_{ig} = c) = \frac{\exp(\alpha_{c} + \sigma_{c} \lambda_{g})}{\sum_{c' = 1}^{C}\exp(\alpha_{c'} + \sigma_{c'} \lambda_{g})}, $$ where λ = (λ1, …, λG)⊤ represents the group variation in the class prevalence. In the parametric random-effect LCA, the random coefficients are assumed to follow a parametric distribution such as the standard normal distribution. The nonparametric approach, in contrast, assumes no specific distribution; it only assumes that the random coefficients follow a probability mass function with a small number of mass points. In other words, the nonparametric approach employs a categorical level-2 latent variable (i.e., latent cluster) Ug whose probability mass function is P(Ug = w) = δw for w = 1, …, W. Using the classification mechanics of LCA, the latent cluster membership of level-2 units can be identified by a small number of representative patterns of class prevalence across groups. Therefore, the meaning of the wth level of the latent cluster variable is determined by the prevalence of the latent classes P(Lig = c ∣ Ug = w) for c = 1, …, C. Treating the latent cluster variable as a group variable, the nonparametric approach provides more meaningful interpretations for group comparison than the parametric approach; we can examine whether the latent class structure differs across latent cluster memberships. Therefore, we focus on the nonparametric random-effect LCA, hereafter referred to as nonparametric latent class regression (npLCR).
The observed-data likelihood of npLCR for the gth group can be expressed as
\begin{equation}
\mathcal{L}_{g} = \sum_{w = 1}^{W} \delta_{w} \prod_{i = 1}^{n_g} \left[ \sum_{c = 1}^{C} P(L_{ig} = c \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c}^{I(y_{igm} = k)} \right],
(\#eq:grouploglik-mLCA)
\end{equation}
where ${\bf x}_{ig} = (x_{ig1}, \ldots, x_{igp})^\top$ and ${\bf z}_g=(z_{g1}, \ldots, z_{gq})^\top$ denote vectors of subject-specific (i.e., level-1) and group-specific (i.e., level-2) covariates for i = 1, …, ng and g = 1, …, G, respectively. The likelihood given in @ref(eq:grouploglik-mLCA) contains three types of parameters: the cluster prevalence δw, the class prevalence P(Lig = c ∣ Ug = w, xig, zg), and the item-response probability ρmk ∣ c.
The class prevalence can be modeled using logistic regression as
\begin{equation}
P(L_{ig} = c \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) = \frac{\exp(\alpha_{c \mid w} + {\bf x}_{ig}^\top \boldsymbol{\beta}_{1c \mid w} + {\bf z}_{g}^\top \boldsymbol{\beta}_{2c})}{\sum_{c' = 1}^{C} \exp(\alpha_{c' \mid w} + {\bf x}_{ig}^\top \boldsymbol{\beta}_{1c' \mid w} + {\bf z}_{g}^\top \boldsymbol{\beta}_{2c'})},
(\#eq:MLCA-reg)
\end{equation}
where the vectors β1c ∣ w = (β11c ∣ w, …, β1pc ∣ w)⊤ and β2c = (β21c, …, β2qc)⊤ are the logistic regression coefficients for the level-1 and level-2 covariates, respectively. Then, the observed log-likelihood of npLCR is specified as
\begin{equation}
\ell_{\text{npLCR}} = \sum_{g = 1}^{G} \log \mathcal{L}_{g}.
(\#eq:loglik-mLCA)
\end{equation}
Note that the coefficients for the level-1 covariates depend on both latent classes and clusters, while the coefficients for the level-2 covariates depend only on latent class membership. The model in @ref(eq:MLCA-reg) may therefore be referred to as a random-slope model, as the coefficients for the level-1 covariates differ across latent clusters. These coefficients can be constrained to be equal across clusters (i.e., β1c = β1c ∣ 1 = ⋯ = β1c ∣ W for c = 1, …, C) to test whether the effects of the level-1 covariates are identical across all latent cluster memberships. It should also be noted that measurement invariance is assumed across latent cluster memberships in npLCR (i.e., ρmk ∣ c = ρmk ∣ c1 = … = ρmk ∣ cW for k = 1, …, rm, m = 1, …, M, and c = 1, …, C). Otherwise, the item-response probabilities may vary across latent cluster memberships, suggesting that the latent class structure itself differs between latent clusters; it would then no longer make sense to use the latent class prevalences as identifiers for latent cluster membership.
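A minimal npLCR sketch under the same conventions is given below; the simulated data and variable names are hypothetical, and the assumption that the group-level covariate SCHOOL (constant within each group) is treated as a level-2 covariate reflects my reading of the package interface rather than a documented guarantee.

```r
library(glca)

## Hypothetical two-level data: 30 groups of 25 individuals each
set.seed(45)
G <- 30; n_g <- 25; N <- G * n_g
sim <- data.frame(
  GROUP = factor(rep(paste0("g", 1:G), each = n_g)),
  Y1 = sample(1:2, N, replace = TRUE), Y2 = sample(1:2, N, replace = TRUE),
  Y3 = sample(1:2, N, replace = TRUE), Y4 = sample(1:2, N, replace = TRUE),
  SEX    = factor(sample(c("F", "M"), N, replace = TRUE)),            # level-1 covariate
  SCHOOL = factor(rep(sample(c("middle", "high"), G, replace = TRUE),
                      each = n_g))                                    # constant within group
)

## npLCR: 2 latent classes nested within 2 latent clusters (ncluster = 2);
## coeff.inv = FALSE lets the level-1 coefficients differ across latent clusters
## (the random-slope specification), while coeff.inv = TRUE would constrain them
np_fit <- glca(item(Y1, Y2, Y3, Y4) ~ SEX + SCHOOL, group = GROUP, data = sim,
               nclass = 2, ncluster = 2, coeff.inv = FALSE,
               n.init = 5, seed = 1, verbose = FALSE)
summary(np_fit)
```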
The package glca finds the maximum-likelihood (ML) estimates for mgLCR and npLCR using the expectation-maximization (EM) algorithm (Dempster et al., 1977). The EM algorithm iterates between two steps, the expectation step (E-step) and the maximization step (M-step), to find the solution maximizing the log-likelihood functions given in @ref(eq:loglik-mgLCA) and @ref(eq:loglik-mLCA).
For mgLCR, the E-step computes the posterior probabilities
$$
\theta_{ig(c)} = P(L_{ig} = c \mid {\bf Y}_{ig} = {\bf y}_{ig}, {\bf x}_{ig}) = \frac{P(L_{ig} = c \mid {\bf x}_{ig}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid cg}^{I(y_{igm} = k)}}{\sum_{c' = 1}^{C} P(L_{ig} = c' \mid {\bf x}_{ig}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c'g}^{I(y_{igm} = k)}}
$$
with current estimates for i = 1, …, ng, g = 1, …, G, and c = 1, …, C. The M-step
maximizes the complete-data likelihood (i.e., the likelihood for the
cross-classification by Lig and
yig)
with respect to β- and ρ-parameters. In particular, when
all values of θig(c)
are known, updated estimates for β-parameters can be calculated by
the Newton-Raphson algorithm for multinomial logistic regression given
in @ref(eq:mgLCA-reg), provided that the computational routine allows
fractional responses rather than integer counts (Bandeen-Roche et al.,
1997). Therefore, the package glca conducts one cycle of the Newton-Raphson algorithm to update the β-parameters at every M-step iteration. If there is no covariate in the model, the class prevalence can
be updated directly without estimating β-parameters as $\hat{\gamma}_{c \mid g} = P(L_{ig}=c) =
\sum_{i=1}^{n_g} \theta_{ig(c)}/n_g$ for c = 1, …, C and g = 1, …, G. The
item-response probabilities, ρmk ∣ cg
can be interpreted as parameters in a multinomial distribution when
θig(c)
is known, so we have $$
\hat{\rho}_{mk \mid cg} = \frac{\sum_{i=1}^{n_g} \theta_{ig(c)}I(y_{igm}
= k)}{\sum_{i=1}^{n_g} \theta_{ig(c)}}
$$ for k = 1, …, rm,
m = 1, …, M, c = 1, …, C, and g = 1, …, G. Under the
measurement invariance assumption (i.e., ρmk ∣ c = ρmk ∣ c1 = … = ρmk ∣ cG),
the ρ-parameter will be
updated as $$
\hat{\rho}_{mk \mid c} = \frac{\sum_{g = 1}^{G} \sum_{i = 1}^{n_g}
\theta_{ig(c)}I(y_{igm} = k)}{\sum_{g = 1}^{G} \sum_{i = 1}^{n_g}
\theta_{ig(c)}}
$$ for k = 1, …, rm,
m = 1, …, M, and
c = 1, …, C.
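To make the E- and M-step updates concrete, the following base-R sketch (an illustrative prototype, not the internal routine of glca) iterates the two steps for a single-group LCA with binary items and no covariates, using the closed-form γ- and ρ-updates above.

```r
## Minimal EM prototype for a one-group LCA with binary items and no covariates;
## illustrative only, not the internal routine of glca
set.seed(7)
n <- 300; M <- 4; C <- 2
Y <- matrix(sample(1:2, n * M, replace = TRUE), n, M)   # items coded 1/2

gamma <- rep(1 / C, C)                  # class prevalence
rho   <- matrix(runif(M * C), M, C)     # rho[m, c] = P(Y_m = 1 | class c)

for (iter in 1:200) {
  ## E-step: posterior theta[i, c] given current estimates
  like  <- sapply(1:C, function(c)
    gamma[c] * apply(ifelse(Y == 1, rho[, c][col(Y)], 1 - rho[, c][col(Y)]), 1, prod))
  theta <- like / rowSums(like)

  ## M-step: closed-form updates based on fractional counts
  gamma <- colMeans(theta)                               # gamma_c = sum_i theta_i(c) / n
  rho   <- t(sapply(1:M, function(m)
    colSums(theta * (Y[, m] == 1)) / colSums(theta)))    # rho_{m1|c}
}
round(gamma, 3); round(rho, 3)
```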
For npLCR, the E-step involves the joint posterior probability
\begin{equation}
P(U_g = w, L_{1g} = c_1, \ldots, L_{n_g g} = c_{n_g} \mid {\bf Y}_{g} = {\bf y}_{g}, {\bf x}_{g}, {\bf z}_{g}) = \frac{\delta_{w} \prod_{i = 1}^{n_g} \left[ P(L_{ig} = c_i \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c_i}^{I(y_{igm} = k)} \right]}{\mathcal{L}_{g}},
(\#eq:full-post)
\end{equation}
where yg = (y1g⊤, …, yngg⊤)⊤ and xg = (x1g⊤, …, xngg⊤)⊤ are all observations for the M manifest items and the p subject-specific covariates from the gth group, respectively. As shown in
@ref(eq:full-post), the computational complexity increases exponentially with the number of individuals per group ng in the E-step for npLCR; when the model has W latent clusters and C latent classes, the implementation of the E-step would yield dimensional complexity proportional to $W \times C^{n_g}$ for each group. This is not computationally feasible even with a moderate number of individuals per group. Moreover, the M-step for npLCR requires only marginal versions of the posterior probabilities rather than the joint posterior probability given in @ref(eq:full-post). Therefore, the E-step
can be modified to alleviate the computational complexity by
accommodating the hierarchical structure of npLCR. Vermunt (2003) proposed the upward-downward (UD) algorithm for calculating the marginal
posterior probability directly in the E-step. The UD algorithm is
similar to the forward-backward (FB) algorithm for handling hidden
Markov models (Baum et al., 1970; Juang & Rabiner, 1991). In the UD
algorithm, the marginal posterior probabilities θig(w, c) can be calculated as
\begin{equation}
\theta_{ig(w,c)} = P(U_g = w, L_{ig} = c \mid {\bf Y}_{g} = {\bf y}_{g}, {\bf x}_{g}, {\bf z}_{g}) = \theta_{g(w)} \, \theta_{ig(c \mid w)}
(\#eq:UD)
\end{equation}
for i = 1, …, ng, g = 1, …, G, c = 1, …, C, and w = 1, …, W. The marginal posterior probability θig(w, c) is the product of the upward probability θg(w) and the downward probability θig(c ∣ w). In @ref(eq:UD), it should be noted that an individual's class membership is assumed to depend only on his/her own observations, which can be presented as P(Lig = c ∣ Ug = w, Yg = yg, xg, zg) = P(Lig = c ∣ Ug = w, Yig = yig, xig, zg) for i = 1, …, ng. Then, the upward and downward probabilities are easily calculated as
$$
\theta_{g(w)} = P(U_g = w \mid {\bf Y}_{g} = {\bf y}_{g}, {\bf x}_{g}, {\bf z}_{g}) = \frac{\delta_{w} \prod_{i = 1}^{n_g} \left[ \sum_{c = 1}^{C} P(L_{ig} = c \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c}^{I(y_{igm} = k)} \right]}{\sum_{w' = 1}^{W} \delta_{w'} \prod_{i = 1}^{n_g} \left[ \sum_{c = 1}^{C} P(L_{ig} = c \mid U_g = w', {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c}^{I(y_{igm} = k)} \right]}
$$
and
$$
\theta_{ig(c \mid w)} = P(L_{ig} = c \mid U_g = w, {\bf Y}_{ig} = {\bf y}_{ig}, {\bf x}_{ig}, {\bf z}_{g}) = \frac{P(L_{ig} = c \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c}^{I(y_{igm} = k)}}{\sum_{c' = 1}^{C} P(L_{ig} = c' \mid U_g = w, {\bf x}_{ig}, {\bf z}_{g}) \prod_{m = 1}^{M} \prod_{k = 1}^{r_m} \rho_{mk \mid c'}^{I(y_{igm} = k)}}
$$
with current estimates, respectively.
The M-step maximizes the complete-data
likelihood (i.e., the likelihood for the cross-classification by Ug, Lig and
yig)
with respect to β1c ∣ w,
β2c, and
ρmk ∣ c.
In particular, when θig(w, c)
is known, updated estimates for β-parameters can be calculated by
Newton-Raphson algorithm for multinomial logistic regression given in
@ref(eq:MLCA-reg), provided that the computational routine allows
fractional responses rather than integer counts (Bandeen-Roche et al.,
1997). Therefore, the package glca conducts one cycle of the Newton-Raphson algorithm to update the β-parameters at every M-step iteration. If there is no covariate in the model, the class prevalence can be updated directly without estimating β-parameters as $\hat{\gamma}_{c \mid w} = P(L_{ig}=c \mid U_g =w) = \sum_{g=1}^G \sum_{i=1}^{n_g} \theta_{ig(w,c)}/\sum_{g=1}^G n_g\, \theta_{g(w)}$ for c = 1, …, C and w = 1, …, W. The cluster
prevalence δw and the item-response probabilities ρmk ∣ c can be interpreted as parameters in multinomial distributions, so we have
\begin{equation}
\hat{\delta}_{w} = \frac{\sum_{g = 1}^{G} \theta_{g(w)}}{G}
\quad \text{and} \quad
\hat{\rho}_{mk \mid c} = \frac{\sum_{g = 1}^{G} \sum_{i = 1}^{n_g} \theta_{ig(c)} I(y_{igm} = k)}{\sum_{g = 1}^{G} \sum_{i = 1}^{n_g} \theta_{ig(c)}}
(\#eq:m-nplcr)
\end{equation}
for w = 1, …, W, c = 1, …, C, k = 1, …, rm, and m = 1, …, M. The marginal posterior probability used in @ref(eq:m-nplcr) is easily obtained by $\theta_{ig(c)} = \sum_{w = 1}^{W} \theta_{ig(w,c)}$.
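The following base-R sketch illustrates the upward-downward decomposition for one group of a covariate-free npLCR; the parameter values are arbitrary and the code is only a toy restatement of the formulas above, not the package's implementation.

```r
## Toy illustration of the upward-downward E-step for one group (no covariates);
## delta: cluster prevalence, gam[w, c] = P(L = c | U = w), rho[m, c] = P(Y_m = 1 | L = c)
set.seed(11)
W <- 2; C <- 2; M <- 4; n_g <- 20
delta <- c(0.6, 0.4)
gam   <- matrix(c(0.7, 0.3, 0.2, 0.8), W, C, byrow = TRUE)
rho   <- matrix(runif(M * C), M, C)
Y     <- matrix(sample(1:2, n_g * M, replace = TRUE), n_g, M)   # one group's responses

## Class-conditional densities f[i, c] = P(Y_ig = y_ig | L_ig = c)
f <- sapply(1:C, function(c)
  apply(ifelse(Y == 1, rho[, c][col(Y)], 1 - rho[, c][col(Y)]), 1, prod))

## Upward probability theta_g(w), computed on the log scale for stability
log_up  <- sapply(1:W, function(w) log(delta[w]) + sum(log(f %*% gam[w, ])))
theta_w <- exp(log_up - max(log_up)); theta_w <- theta_w / sum(theta_w)

## Downward probability theta_ig(c | w): one n_g x C matrix per cluster
theta_c_w <- lapply(1:W, function(w) {
  num <- sweep(f, 2, gam[w, ], "*")
  num / rowSums(num)
})

## Marginal posterior theta_ig(w, c) = theta_g(w) * theta_ig(c | w)
theta_wc <- lapply(1:W, function(w) theta_w[w] * theta_c_w[[w]])
```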
Missing data occur in nearly all empirical data, despite the vigorous efforts of researchers to prevent them. Missing data cause two general problems. First, if subjects with any missing values on the variables are removed from the dataset, the sample can become very small, especially when the number of missing values is large; this leads to a great loss of information and poor statistical power. Second, the subjects who provide incomplete data frequently differ from those who provide complete data. If adjustments are not made for these differences, the results may be biased.
In the case of random missing-data mechanisms (i.e., ignorable missing data) such as missing completely at random (MCAR) and missing at random (MAR) (Little & Rubin, 2019), two methods for dealing with missing data are typically available: full-information maximum likelihood (FIML) and multiple imputation (MI; Schafer, 1997). In MI, plausible values are imputed multiple times in place of the missing values to create multiple complete datasets. The use of MI for multiple-group LCA has the advantage that missing data on the covariates and the group variable can be handled. The disadvantage, however, is that the LCA must be fitted separately to each imputed complete dataset, and the results must be combined to obtain the final estimates. FIML is a model-based missing-data procedure in which the model estimates are adjusted on the basis of all the information provided by subjects with complete and partially complete data. Most software packages for LCA employ a FIML approach because it requires no additional input from the user other than specifying which code is used to denote missing data. However, this approach cannot handle missingness in the group variable or the covariates in multiple-group LCA.
The package glca estimates model parameters using a FIML approach when some responses are missing on the manifest items: in the E-step the missing responses are excluded from the computation of the posterior probability, and in the M-step the indicator I(yigm = k) for a missing response is replaced with the updated ρ-parameter from the previous iteration. In short, the package glca can handle any ignorable missing data on the manifest items, but individuals with missing data on the group variable or any covariate are deleted from the analysis.
However, missing values on the group variable and the covariates can be treated using multivariate imputation by chained equations (MICE; van Buuren et al., 2006), which is implemented in the R package mice (van Buuren & Groothuis-Oudshoorn, 2011). MICE imputation can be used to create multiple sets of complete group-variable and covariate values for multiple-group LCA. Each completed dataset can then be analyzed using the package glca, and the results can easily be combined across the imputed datasets.
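A possible workflow combining mice and glca is sketched below; the data, the amount of missingness, and the variable names are hypothetical, and the pooling step (Rubin's rules) is only indicated in a comment.

```r
library(mice)
library(glca)

## Hypothetical data with missing values on the covariate and the group variable
set.seed(3)
n   <- 400
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  GROUP = factor(sample(c("A", "B"), n, replace = TRUE))
)
dat$SEX[sample(n, 30)]   <- NA    # missing covariate values
dat$GROUP[sample(n, 20)] <- NA    # missing group memberships

## Impute the incomplete variables with MICE (m = 5 completed datasets)
imp <- mice(dat, m = 5, printFlag = FALSE, seed = 10)

## Fit mgLCR to each completed dataset; Rubin's rules should then be applied
## to pool the estimates and standard errors across the five fits
fits <- lapply(1:5, function(i)
  glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = complete(imp, i),
       nclass = 2, n.init = 3, seed = 1, verbose = FALSE))
```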
Since the log-likelihood of LCA may have several local maxima, the parameters estimated by the EM algorithm can deviate from the globally optimal solution. To cope with this problem, we recommend starting the algorithm with several different initial sets of random values and ascertaining whether they consistently converge to the same solution. If they do, the solution can be considered the ML estimate. If not, we recommend examining the distribution of the likelihood values and selecting the largest likelihood value, which usually corresponds to the ML solution. The package glca allows investigators to try different starting values, either by using random starting values or by providing their own. An investigator can select the number of initial sets of random values (default is 10), and the package then iterates the EM algorithm a small number of times (default is 50) for each set. Among the initial sets of model parameters, the one producing the largest likelihood value is chosen for the main iteration.
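The sketch below illustrates how the number of random initial sets and the number of preliminary EM iterations can be controlled through the `n.init` and `testiter` arguments, and how refitting with different seeds helps detect local maxima; the data are simulated and the argument names follow the package documentation.

```r
library(glca)

## Hypothetical data: four binary items and a grouping variable
set.seed(5)
n   <- 500
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  GROUP = factor(sample(c("A", "B"), n, replace = TRUE))
)

## Try 20 random initial sets, running 100 preliminary EM iterations for each;
## the set with the largest likelihood is carried into the main iterations
fit <- glca(item(Y1, Y2, Y3, Y4) ~ 1, group = GROUP, data = dat,
            nclass = 2, n.init = 20, testiter = 100, seed = 1, verbose = FALSE)

## Refitting with different seeds and comparing the maximized log-likelihood
## values reported by summary() helps detect convergence to local maxima
refits <- lapply(1:3, function(s)
  glca(item(Y1, Y2, Y3, Y4) ~ 1, group = GROUP, data = dat,
       nclass = 2, n.init = 20, seed = s, verbose = FALSE))
```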
The standard error of the estimated parameters can be calculated using the observed empirical information matrix (McLachlan & Krishnan, 2007, p. 114), $$ \boldsymbol{I}_{e}(\hat{\boldsymbol{\Psi}};\mathbf{Y}) = \sum_{g = 1}^{G}\mathbf{s}(\mathbf{Y}_{g}; \hat{\boldsymbol{\Psi}})\mathbf{s}^\top(\mathbf{Y}_{g}; \hat{\boldsymbol{\Psi}}), $$ where $\mathbf{s}(\mathbf{Y}_{g}; \hat{\boldsymbol{\Psi}})$ is the score function of the parameter vector Ψ for the gth group, evaluated at the MLE $\hat{\boldsymbol{\Psi}}$. In the parameter vector Ψ, all probability parameters such as the δ- and ρ-parameters are transformed into free parameters using the baseline logit function. The variance of $\hat{\boldsymbol{\Psi}}$ can be obtained as the inverse of $\boldsymbol{I}_{e}(\hat{\boldsymbol{\Psi}};\mathbf{Y})$. However, as our target parameters are a re-parameterized version of Ψ, we apply the delta method to the variance of $\hat{\boldsymbol{\Psi}}$. Let q(Ψ) denote the original parameters of the multiple-group latent class models. Then, the variance-covariance matrix of the estimates is $$ \textsf{Var}\left(q(\hat{\boldsymbol{\Psi}})\right) = J_q(\hat{\boldsymbol{\Psi}})\textsf{Var}(\hat{\boldsymbol{\Psi}})J_q(\hat{\boldsymbol{\Psi}})^\top, $$ where $J_q(\hat{\boldsymbol{\Psi}})$ is the Jacobian matrix of the function q() evaluated at the MLE of Ψ. Details of the score functions and the Jacobian matrices are provided in the Appendix.
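The following base-R sketch illustrates the delta-method step for the baseline-logit parameterization of a probability vector such as δ; the covariance matrix V is an arbitrary stand-in for the inverse of the observed empirical information matrix, and the code is a numerical illustration rather than the package's internal routine.

```r
## Delta-method sketch: map free baseline-logit parameters zeta to probabilities
## delta and propagate an assumed covariance matrix V
softmax_full <- function(zeta) {        # append the reference category (zeta_W = 0)
  e <- exp(c(zeta, 0))
  e / sum(e)
}

zeta  <- c(0.4, -0.2)                   # free parameters for W = 3 clusters
V     <- diag(0.05, length(zeta))       # assumed Var(zeta-hat)
delta <- softmax_full(zeta)
W     <- length(delta)

## Analytic Jacobian: d delta_w / d zeta_w' = delta_w * (I(w = w') - delta_w')
J <- outer(1:W, 1:(W - 1),
           function(w, wp) delta[w] * ((w == wp) - delta[wp]))

var_delta <- J %*% V %*% t(J)           # Var(delta-hat) by the delta method
sqrt(diag(var_delta))                   # standard errors of the cluster prevalences
```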
Absolute model fit refers to whether a specified multiple-group
latent class model provides an adequate representation of the data.
Typically, the analyst assesses absolute model fit by fitting a
particular model to the observed data and testing the null hypothesis
that the observed data has been produced by the fitted model. Thus, one
usually hopes to find a model for which the null hypothesis is not
rejected. This hypothesis test for LCA is based on a contingency table;
the expected cell counts are estimated according to the specified model
and its estimated parameters, then compared to the observed cell counts.
The likelihood-ratio statistic G2 (Agresti, 2013) is used to assess absolute model fit in the package glca. The G2 test statistic is derived from the difference in the log-likelihood values between the fitted model and the saturated model (i.e., expected and observed cell counts, respectively), where the residual degrees of freedom are calculated by subtracting the number of parameters in the fitted model from that in the saturated model. The number of parameters for the saturated model is the smaller of the number of possible combinations of the categorical variables and the number of cases in the model.
It should be noted that the contingency table for the LCA type of
model is commonly sparse. When there are many cells containing very few
observations in the cross-classification table, the large-sample
approximation to the chi-square distribution for the G2 statistic is not
appropriate. In such cases, the package glca allows us to conduct a goodness-of-fit test using the bootstrap likelihood-ratio test
(BLRT) statistic (Langeheine et al., 1996). This approach generates
random datasets multiple times using the estimated parameters and
calculates the G2
statistic for each generated dataset. In BLRT, the resulting distribution
of the G2 statistic
across the random datasets is used as the reference distribution. The
relative position of the G2 statistic obtained
from the original dataset within the reference distribution can be used
as a measure of absolute model fit. In fact, the right-tail probability of the observed G2 value is regarded as a bootstrap p-value. For example, if the observed G2 value falls in the uppermost tail of the reference distribution, we may conclude that this test statistic is unlikely to be observed under the model corresponding to the null hypothesis. Such a finding would provide evidence to reject the null hypothesis.
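A bootstrap goodness-of-fit test can be requested through `gofglca()`; in the sketch below the data are simulated, and the arguments `test = "boot"`, `nboot`, and `seed` are assumed to follow the documented interface of `gofglca()`.

```r
library(glca)

## Hypothetical data: five binary items, single group
set.seed(21)
n   <- 500
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  Y5 = sample(1:2, n, replace = TRUE)
)

fit2 <- glca(item(Y1, Y2, Y3, Y4, Y5) ~ 1, data = dat, nclass = 2,
             n.init = 5, seed = 1, verbose = FALSE)

## Absolute fit: G-squared against the saturated model, with a bootstrap
## reference distribution for sparse tables
gofglca(fit2, test = "boot", nboot = 100, seed = 1)
```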
When comparing two or more groups in a multiple-group latent class model, it should be checked whether the latent structure is identical across groups. Relative model fit refers to deciding which of two or more models represents a better fit to a particular dataset. Measurement invariance in multiple-group LCA can be tested by comparing the fits of constrained versus unconstrained models; the unconstrained (full) model allows all parameters to vary across groups, while the constrained (reduced) model allows only the class prevalences to vary and holds the item-response probabilities equal across groups. The
package glca
conducts the chi-square likelihood-ratio test
(LRT) to assess relative model fit by comparing two competing models for
testing measurement invariance.
Similar to the item-response probabilities, the coefficients for the level-1 covariates can also be tested for equality across groups using the chi-square LRT in the package glca. By comparing the fit of the reduced model with the coefficients held constant across groups (i.e., βc = βc ∣ 1 = ⋯ = βc ∣ G in mgLCR and β1c = β1c ∣ 1 = ⋯ = β1c ∣ W in npLCR for c = 1, …, C) against the full model with freely varying coefficients, we obtain evidence on whether the effects of the level-1 covariates on the latent class prevalences can be assumed to be identical across groups. For the group comparison of coefficients to be valid, the assumption of measurement invariance must be met to ensure a consistent meaning of the latent classes across groups.
The deviance statistic, the test statistic of the LRT for relative model fit, is obtained as twice the difference in the log-likelihood values of the two competing models. The degrees of freedom for the deviance statistic are the difference in the number of free parameters of the two multiple-group latent class models. For example, the validity of the measurement invariance assumption can be tested by calculating the log-likelihood from the model in which the item-response probabilities are constrained to be equal across subgroups and comparing it with the log-likelihood from the freely estimated model.
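The deviance-based tests for measurement and coefficient invariance can be carried out by fitting nested models with and without the equality constraints and passing them to `gofglca()`; the sketch below uses simulated data, and the `measure.inv` and `coeff.inv` arguments are assumed to impose the constraints described above.

```r
library(glca)

## Hypothetical multiple-group data with a level-1 covariate
set.seed(33)
n   <- 800
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  GROUP = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

## Full model: item-response probabilities and coefficients vary across groups
full <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = dat, nclass = 2,
             measure.inv = FALSE, coeff.inv = FALSE, n.init = 5, seed = 1,
             verbose = FALSE)

## Reduced models: measurement invariance only, then equal coefficients as well
minv <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = dat, nclass = 2,
             measure.inv = TRUE, coeff.inv = FALSE, n.init = 5, seed = 1,
             verbose = FALSE)
cinv <- glca(item(Y1, Y2, Y3, Y4) ~ SEX, group = GROUP, data = dat, nclass = 2,
             measure.inv = TRUE, coeff.inv = TRUE, n.init = 5, seed = 1,
             verbose = FALSE)

## Deviance (chi-square LRT) comparisons of the nested models
gofglca(full, minv, cinv, test = "chisq")
```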
The chi-square LRT cannot be used to compare latent class models with
a different number of latent classes or clusters because these two
models are not nested. Thus, the package glca
provides
several information criteria commonly used in LCA, such as Akaike’s information criterion (AIC), Bozdogan’s consistent AIC (CAIC), and Schwarz’s Bayesian information criterion (BIC) (Akaike, 1974; Bozdogan, 1987; Schwarz, 1978), to compare the fit of non-nested competing models; the model with a smaller AIC (or BIC) value is preferred. Another model-fit index provided by the package is entropy, which is widely used in research practice although it can be a poor measure for model selection as it often depends on the number of classes (Collins & Lanza, 2009). A model with a relatively higher entropy value is preferred.
The package glca
also generates the empirical
distribution of the deviance statistic to help select a better model
between two non-nested competing models with a different number of
latent classes or clusters using BLRT (Langeheine et al., 1996). The
null hypothesis is that the simpler model is adequate; thus, the bootstrap samples are drawn from the simpler model. For each generated bootstrap sample, both competing models are estimated and the deviance between them is calculated. By repeating this procedure many times, we can construct the reference distribution of the deviance. Similar to the bootstrap p-value for the G2 statistic, the relative position of the observed deviance within the reference distribution provides a bootstrap p-value; the null (simpler) model is preferred when the bootstrap p-value exceeds 0.05 at a significance level of α = 0.05. An important advantage of BLRT is that it can also be applied to comparisons of two nested latent class models when the conditions for the large-sample approximation are not satisfied. It should be noted that the optimal model should be selected by comprehensively considering the conceptual and analytical implications as well as the quantitative goodness-of-fit statistics.
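The sketch below compares models with different numbers of latent classes on simulated data; `gofglca()` is assumed to report AIC, CAIC, BIC, and entropy for the supplied fits and to provide the bootstrap deviance test via `test = "boot"`.

```r
library(glca)

## Hypothetical data: five binary items
set.seed(55)
n   <- 600
dat <- data.frame(
  Y1 = sample(1:2, n, replace = TRUE), Y2 = sample(1:2, n, replace = TRUE),
  Y3 = sample(1:2, n, replace = TRUE), Y4 = sample(1:2, n, replace = TRUE),
  Y5 = sample(1:2, n, replace = TRUE)
)

## Candidate models with two and three latent classes
fit2 <- glca(item(Y1, Y2, Y3, Y4, Y5) ~ 1, data = dat, nclass = 2,
             n.init = 5, seed = 1, verbose = FALSE)
fit3 <- glca(item(Y1, Y2, Y3, Y4, Y5) ~ 1, data = dat, nclass = 3,
             n.init = 5, seed = 1, verbose = FALSE)

gofglca(fit2, fit3)                                         # AIC, CAIC, BIC, entropy
gofglca(fit2, fit3, test = "boot", nboot = 100, seed = 1)   # BLRT for the deviance
```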
Fixed-effect latent class analysis:
Let αg and βg be vectorized parameters containing all coefficients of αc ∣ g and βc ∣ g given in @ref(eq:mgLCA-reg) for c = 1, …, C − 1 and g = 1, …, G, respectively. Further, let s(Yg; αg) and s(Yg; βg) denote score functions of αg and βg, respectively. Then, the element of s(Yg; αg) and the p × 1 sub-vector of s(Yg; βg) are obtained as for c = 1, …, C − 1 and g = 1, …, G, respectively. Note that ℒg in @ref(eq:score-reg-mglcr) is the product of observed-data likelihoods given in @ref(eq:likeli) for all observations in the gth group (i.e., $\mathcal{L}_{g}=\prod_{i=1}^{n_g}\mathcal{L}_{ig}$).
Let ρg denote vectorized item-response probabilities containing all ρm ∣ cg = (ρm1 ∣ cg, …, ρmrm ∣ cg)⊤ for m = 1, …, M, c = 1, …, C and g = 1, …, G. Each of ρ-parameters in ρm ∣ cg is re-parameterized by the baseline logit function πmk ∣ cg = ln (ρmk ∣ cg/ρmrm ∣ cg) for k = 1, …, rm − 1, and let πg denote vectorized free parameters containing all πm ∣ cg = (πm1 ∣ cg, …, πmrm − 1 ∣ cg)⊤ for m = 1, …, M, c = 1, …, C and g = 1, …, G. Further, let s(Yg; πg) denote score function of all free parameters πg. Then, the element of score function s(Yg; πg) is obtained as for k = 1, …, rm − 1, m = 1, …, M, c = 1, …, C, and g = 1, …, G.
Let Ψ denote the vector of all free parameters α = (α1⊤, …, αG⊤)⊤, β = (β1⊤, …, βG⊤)⊤, and π = (π1⊤, …, πG⊤)⊤, and let q(Ψ) be the function that transforms Ψ back to the original parameters of mgLCR. Then, the Jacobian matrix for the function q(Ψ) is
\begin{equation}
J_q(\boldsymbol{\Psi}) = \begin{bmatrix} J_q(\boldsymbol{\alpha}, \boldsymbol{\beta}) & \mathbf{0} \\ \mathbf{0} & J_q(\boldsymbol{\pi}) \end{bmatrix},
(\#eq:Jacobian-mglcr)
\end{equation}
where Jq(α, β) is an identity matrix of size equal to the number of regression coefficients α and β. The sub-matrix of Jq(π) given in @ref(eq:Jacobian-mglcr) can be specified by ∂ρm ∣ cg/∂πm′ ∣ c′g′, which is a matrix of size rm × (rm′ − 1) for m, m′ = 1, …, M; c, c′ = 1, …, C; and g, g′ = 1, …, G. The element of this sub-matrix in the kth row and the k′th column can be obtained as
\begin{equation}
\frac{\partial \rho_{mk \mid cg}}{\partial \pi_{m'k' \mid c'g'}} = \begin{cases} \rho_{mk \mid cg}\left(I(k = k') - \rho_{mk' \mid cg}\right) & \text{if } m = m',\, c = c',\, g = g', \\ 0 & \text{otherwise}, \end{cases}
(\#eq:Jacobian-rho)
\end{equation}
for k = 1, …, rm and k′ = 1, …, rm − 1.
Random-effect latent class analysis:
The prevalences for latent clusters, δ = (δ1, …, δW)⊤ are re-parametrized by the baseline logit function ζw = ln (δw/δW) for w = 1, …, W − 1. Let s(Yg; ζ) denote score function of ζ = (ζ1, …, ζW − 1)⊤ for the gth group. Then, the wth element of s(Yg; ζ) is obtained as for w = 1, …, W − 1, where ℒg is the observed-data likelihood of npLCR for the gth group given in @ref(eq:grouploglik-mLCA).
Let α and β1 be vectorized parameters containing all coefficients of level-1 covariates, αc ∣ w and β1c ∣ w given in @ref(eq:MLCA-reg) for c = 1, …, C − 1 and w = 1, …, W, respectively. Further, let s(Yg; α) and s(Yg; β1) denote score functions of α and β1 for the gth group, respectively. Then, the element of s(Yg; α) and the p × 1 sub-vector of s(Yg; β1) are obtained as for c = 1, …, C − 1, w = 1, …, W, and g = 1, …, G, respectively. Let β2 denote vectorized parameters containing all coefficients of level-2 covariates, β2c given in @ref(eq:MLCA-reg) for c = 1, …, C − 1. The q × 1 sub-vector of score function for β2, s(Yg; β2) can be obtained as for c = 1, …, C − 1 and g = 1, …, G. The score functions for the free parameters of item-response probabilities are identical to mgLCR given in @ref(eq:score-pi).
Let Ψ denote the vector of all free parameters ζ, α, β1, β2, and π, and let q(Ψ) be the function that transforms Ψ back to the original parameters of npLCR. Then, the Jacobian matrix for the function q(Ψ) is $$ J_q(\boldsymbol{\Psi}) = \begin{bmatrix} J_q(\boldsymbol{\zeta}) & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & J_q(\boldsymbol{\alpha}, \boldsymbol{\beta}_1, \boldsymbol{\beta}_2) & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & J_q(\boldsymbol{\pi}) \end{bmatrix}, $$ where Jq(α, β1, β2) is an identity matrix of size equal to the number of regression coefficients given in @ref(eq:MLCA-reg). The sub-matrices of Jq(π) are identical to those given in @ref(eq:Jacobian-rho), and the elements of Jq(ζ) can be obtained by $$ \begin{aligned} \frac{\partial \delta_{w}}{\partial \zeta_{w'}} = \delta_{w} \left(I(w = w') - \delta_{w'}\right) \end{aligned} $$ for w = 1, …, W and w′ = 1, …, W − 1.
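As a quick sanity check of the Jacobian formula above, the following base-R snippet compares the analytic elements ∂δw/∂ζw′ with forward finite differences; all names and values are illustrative.

```r
## Numerical check of d delta_w / d zeta_w' = delta_w * (I(w = w') - delta_w')
softmax_full <- function(zeta) { e <- exp(c(zeta, 0)); e / sum(e) }

zeta  <- c(0.5, -1.0, 0.3)              # W = 4 clusters, W - 1 free parameters
delta <- softmax_full(zeta)
W     <- length(delta)

## Analytic Jacobian from the formula above
J_analytic <- outer(1:W, 1:(W - 1),
                    function(w, wp) delta[w] * ((w == wp) - delta[wp]))

## Forward finite differences
h <- 1e-6
J_numeric <- sapply(1:(W - 1), function(wp) {
  z <- zeta; z[wp] <- z[wp] + h
  (softmax_full(z) - delta) / h
})

max(abs(J_analytic - J_numeric))        # should be near zero
```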