ASR025. LMM with correlated residuals and model comparison using the LRT - Twins’ IQ

The complete script for this example can be downloaded here:

Dataset

In this example we will use the D002 dataset, and the first few rows are presented below:

pair twin iq
112 A 113
112 B 109
114 A 94
114 B 100
126 A 99
126 B 86


Models

First, we want to fit a LMM with correlated residuals that follows the following definitions:

\[ y = \mu + twin + e\ \] where,

    \(y\) is the IQ score,

    \(\mu\) is the overall mean,

    \(twin\) is the fixed effect of twin,

    \(e\) is the random residual effect, with \(e \sim \mathcal{N}(0,\mathbf{R})\).

Here, we consider the definition of the residual matrix as \(\mathbf{R} = \mathbf{I_p} \otimes \mathbf{G}\), where \(\mathbf{I_p}\) is an identity matrix of dimension \(p\) (the number of levels in the factor pair), and \(\mathbf{G}\) is a variance-covariance matrix of dimension 2 \(\times\) 2, relating to a single variance and a common correlation between the two levels of the factor twin.

Note that this model is an alternative of the ASR003 model by including the correlated residual structure between pairs, rather than consider this as a random effect.


Furthermore, we are also interested to find out if this model is significantly better than the simpler model below that assumes residuals to be independent:

\[ y = \mu + twin + e\ \] where,

    \(y\) is the IQ score,

    \(\mu\) is the overall mean,

    \(twin\) is the fixed effect of twin, and

    \(e\) is the random residual effect, with \(e \sim \mathcal{N}(0,\mathbf{R})\).

Here, the residual matrix \(\mathbf{R} = \mathbf{I_n}\), where \(\mathbf{I_n}\) is an identity matrix of dimension \(n \times n\) (the number of observations in the dataset).


Now, let’s take a look at how to write both of the above models with ASReml-R. Note that before model fitting, both twin and pair need to be set as factors.

Correlated residuals model (MC)

asr025_MC <- asreml(
  fixed = iq ~ twin,
  residual = ~id(pair):corv(twin),
  data = d002
)
ASReml Version 4.2 24/12/2024 07:08:17
          LogLik        Sigma2     DF     wall
 1     -205.2791           1.0     62   07:08:17  (  1 restrained)
 2     -193.6949           1.0     62   07:08:17  (  1 restrained)
 3     -186.4571           1.0     62   07:08:17
 4     -184.0584           1.0     62   07:08:17
 5     -183.6593           1.0     62   07:08:17
 6     -183.6503           1.0     62   07:08:17

As described before, the residuals variance structure is described by the Kronecker product \(\otimes\) (or direct product) of two variance structures: one associated with independence between pair, defined with id(), and the other modelling a correlation (and a variance) between residuals of the same twin pair, defined with corv(). It is important to note that some of these variance components have natural constraints: correlations are bounded between -1 and 1, and variances can only be positive.

Independent residuals model (MI)

asr025_MI <- asreml(
  fixed = iq ~ twin,
  residual = ~id(pair):idv(twin),
  data = d002
)
ASReml Version 4.2 24/12/2024 07:08:17
          LogLik        Sigma2     DF     wall
 1     -699.9008           1.0     62   07:08:17
 2     -564.0352           1.0     62   07:08:17
 3     -411.8756           1.0     62   07:08:17
 4     -301.1331           1.0     62   07:08:17
 5     -234.6199           1.0     62   07:08:17
 6     -209.0335           1.0     62   07:08:17
 7     -201.6120           1.0     62   07:08:17
 8     -200.4367           1.0     62   07:08:17
 9     -200.3853           1.0     62   07:08:17
10     -200.3851           1.0     62   07:08:17

This model defines the independence as the direct product of two identity matrices, id() and idv(), but this is equivalent to use residual = ~idv(units), where units corresponds to an indicator from 1 to \(n\) for all observations in the dataset.

Exploring output

To investigate weather the correlated model (MC) is a statistically significant improvement over the independent model (MI), we can use the likelihood ratio test (LRT) that will compare the REML log-likelihood values of the two model using the expression:

\[ d = -2 (logL_{MI} - logL_{MC}) \]

where,

    \(d\) is the deviance statistic that follows an approximated \({\chi}^2\) distribution with degrees of freedom equal to the difference in the number of estimated variance components between both models (in this example, 1 degree of freedom),

    \(logL_{MI}\) is the log-likelihood of the simpler model (independent), and

    \(logL_{MC}\) is the log-likelihood of the more complex model (correlated).

Note that the LRT is only valid for nested models that have the same fixed effect terms.


Here is how we perform the LRT directly using ASReml-R:

lrt(asr025_MI, asr025_MC, boundary = FALSE)
Likelihood ratio test(s) assuming nested random models.

                    df LR-statistic Pr(Chisq)    
asr025_MC/asr025_MI  1        33.47 7.238e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to this test, the \(d\) statistic is 33.47 with a p-value < 0.0001, indicating that the correlated model is a statistically significant different from the independent model, and therefore an improved model to be used to describe this data. Note that the option boundary = FALSE is indicating that this is a two-sided test, hence, no boundaries on the variance components to be tested with this hypothesis.


The variance component estimates associated with the correlated model are:

summary(asr025_MC)$varcomp
                     component   std.error   z.ratio bound %ch
pair:twin!R          1.0000000          NA        NA     F 0.0
pair:twin!twin!cor   0.8125831  0.06100087 13.320844     U 0.0
pair:twin!twin!var 211.0800432 48.79872053  4.325524     P 0.1

According to these results, the error variance is 211.08 and the correlation between residual errors of the same twin pair is 0.81. Both reported z-ratios are much larger than 1.96, indicating that the variances are highly likely to be significant.


More information

A more detailed version of this example with farther discussions is available in this blog