ASR005. Hierarchical LMM with nested factors and heterogeneous variances - Silicon wafers

The complete script for this example can be downloaded here:

Dataset

This example uses the D003 dataset. The first few rows are shown below:

source lot wafer site thick
1 1 1 1 2006
1 1 2 2 1988
1 1 3 3 2007
1 2 2 1 1987
1 2 3 2 1983
1 3 1 3 2004


Model

The model fitted in this example extends the hierarchical LMM with nested random factors presented in ASR004 by allowing the variance of lot to differ between the levels of source. The model of interest is:

\[ y = \mu + source + lot + lot:wafer + e \] where,

    \(y\) is the thickness of oxide layer on silicon wafers,

    \(\mu\) is the population mean,

    \(source\) is the fixed effect of source,

    \(lot\) is the random effect of lot, with \(lot \sim \mathcal{N}(0,\,\sigma^{2}_{l_s})\), where a separate variance is estimated for each level of source, \(s \in \{1,2\}\),

    \(lot:wafer\) is the random effect of wafer within lot, with \(lot:wafer \sim \mathcal{N}(0,\,\sigma^{2}_{w})\),

    \(e\) is the random residual effect, with \(e \sim \mathcal{N}(0,\,\sigma^{2}_{e})\).
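To make the data-generating process concrete, the sketch below simulates data from this model in base R. The balanced layout (4 lots per source, 3 wafers per lot, 3 sites per wafer) and all parameter values are hypothetical choices for illustration only. The key feature is that each lot effect is drawn with the standard deviation of its own source, while the wafer and residual effects share a common variance.

```r
set.seed(1)

# Hypothetical parameter values (illustration only, not estimates)
mu       <- 2000       # population mean
src_eff  <- c(0, 10)   # fixed effects of source 1 and source 2
sd_lot   <- c(4, 15)   # lot SD for source 1 and source 2 (heterogeneous)
sd_wafer <- 6          # common SD of wafer within lot
sd_e     <- 3.5        # residual SD

# Hypothetical balanced layout: 2 sources x 4 lots x 3 wafers x 3 sites
design <- expand.grid(site = 1:3, wafer = 1:3, lot_in_src = 1:4, source = 1:2)
design$lot <- (design$source - 1) * 4 + design$lot_in_src  # lots 1-8, unique

# One effect per lot, drawn with the SD of the source that lot belongs to
lot_src <- rep(1:2, each = 4)
lot_eff <- rnorm(8, mean = 0, sd = sd_lot[lot_src])

# One effect per lot:wafer combination (8 x 3 = 24), common SD
wafer_eff <- rnorm(24, mean = 0, sd = sd_wafer)
wafer_idx <- (design$lot - 1) * 3 + design$wafer

sim <- data.frame(
  source = factor(design$source),
  lot    = factor(design$lot),
  wafer  = factor(design$wafer),
  site   = design$site
)
sim$thick <- mu + src_eff[design$source] + lot_eff[design$lot] +
  wafer_eff[wafer_idx] + rnorm(nrow(design), mean = 0, sd = sd_e)
```

The simulated data frame `sim` has the same columns as d003, so it can be used to try out the asreml() call on data whose true variance structure is known.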


Now, let’s take a look at how to write the model with ASReml-R. Note that before fitting the model, source, lot, and wafer need to be set as factors.

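library(asreml)
# source, lot and wafer must be factors before fitting
d003$source <- factor(d003$source)
d003$lot <- factor(d003$lot)
d003$wafer <- factor(d003$wafer)
asr005 <- asreml(
  fixed = thick ~ source,
  random = ~at(source):lot + lot:wafer,
  residual = ~units,
  data = d003
)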

The at() function splits a fixed or random term into conditional subsets. Here, the variance component of the conditioned factor (lot) is estimated separately within each level of the conditioning factor (source). The conditioning variable given to at() must be a factor, and the term it conditions can be either fixed or random.

The levels used by at() can also be given explicitly, as in at(source, c(1, 2)), where \(1\) and \(2\) are the labels of the levels of source. When the conditional term is part of the random model, an equivalent alternative is to specify a variance structure directly: diag(source):lot assumes a diagonal variance structure for lot, with a separate variance estimated for each level of source.
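A minimal base-R sketch of what the conditional term does, assuming a hypothetical layout with lots 1-4 in source 1 and lots 5-8 in source 2 (one row per lot for brevity). Conditioning on source s keeps the lot incidence only for rows in that level, so each source gets its own random term and its own variance. The implied covariance matrix of the lot effects is the diagonal matrix that diag(source):lot describes; the variances plugged in are the estimates reported in the variance-component table of this example.

```r
# Hypothetical layout: 8 lots, lots 1-4 in source 1, lots 5-8 in source 2
d <- data.frame(source = factor(rep(1:2, each = 4)), lot = factor(1:8))

# Full lot incidence matrix (one column per lot)
Z <- model.matrix(~ lot - 1, data = d)

# Conditioning on source level s zeroes the rows of the other source,
# giving one random term (and one variance) per source level
Z_at <- lapply(levels(d$source), function(s) Z * (d$source == s))
names(Z_at) <- paste0("at(source, '", levels(d$source), "'):lot")

# Implied covariance of the lot effects: diagonal, each lot taking the
# variance of its own source (lot variance estimates from this example)
sigma2_lot <- c(`1` = 17.07636, `2` = 222.71180)
G <- diag(unname(sigma2_lot[as.character(d$source)]))
```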


Exploring outputs

The statistical significance of the fixed effect of source can be evaluated with:

wald(asr005, denDF = 'numeric')$Wald
            Df denDF     F.inc           Pr
(Intercept)  1   3.8 5.914e+05 6.516776e-11
source       1   3.8 1.526e+00 2.882553e-01
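The Pr column can be reproduced from the other columns: it is the upper-tail probability of an F distribution with Df numerator and denDF denominator degrees of freedom. A quick check in base R, using the rounded values from the source row of the table:

```r
# Rounded values copied from the source row of the Wald table
F_inc  <- 1.526
num_df <- 1
den_df <- 3.8

# Upper-tail F probability; matches the reported Pr up to rounding
p_source <- pf(F_inc, df1 = num_df, df2 = den_df, lower.tail = FALSE)
print(p_source)
```

With p of roughly 0.29, the fixed effect of source is not significant at conventional levels.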


The predicted means for each source, based on this model, are:

predict(asr005, classify = 'source')$pvals
  source predicted.value std.error    status
1      1        1995.111  2.758078 Estimable
2      2        2005.194  7.682133 Estimable
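The much larger standard error for source 2 follows from its larger lot variance. The sketch below assumes a balanced layout of 4 lots per source, 3 wafers per lot and 3 sites per wafer (an assumption, since the full dataset is not shown here). Under that assumption the variance of a source mean is \(\sigma^{2}_{l_s}/4 + \sigma^{2}_{w}/12 + \sigma^{2}_{e}/36\), and plugging in the estimated variance components reproduces the reported standard errors.

```r
# ASSUMED balanced layout (not confirmed by the rows shown above)
n_lot <- 4; n_wafer <- 3; n_site <- 3

# Variance components from summary(asr005)$varcomp
s2_lot <- c(17.07636, 222.71180)  # at(source, '1'):lot, at(source, '2'):lot
s2_waf <- 35.86622                # lot:wafer
s2_e   <- 12.56961                # units!R

# Variance of each source mean in a balanced nested design
var_mean <- s2_lot / n_lot + s2_waf / (n_lot * n_wafer) +
  s2_e / (n_lot * n_wafer * n_site)
se_mean <- sqrt(var_mean)

# The two source means share no random effects, so they are independent:
# the squared ratio of their difference to its SE gives the Wald F
diff_means <- 2005.194 - 1995.111
F_check <- diff_means^2 / sum(var_mean)
```

Under the same assumption, `se_mean` comes out at about 2.758 and 7.682, and `F_check` at about 1.526, matching the Wald F for source.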


The variance components estimated from this model are:

summary(asr005)$varcomp
                    component  std.error   z.ratio bound %ch
at(source, '1'):lot  17.07636  25.289347 0.6752393     P   0
at(source, '2'):lot 222.71180 192.714464 1.1556569     P   0
lot:wafer            35.86622  14.187862 2.5279509     P   0
units!R              12.56961   2.565778 4.8989468     P   0
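Two quantities help read this table. The z.ratio column is the component divided by its standard error (a rough guide only, not a formal test for variance components), and the proportion of total variance due to lot can be computed separately for each source. A base-R sketch using the values above:

```r
# Variance components copied from summary(asr005)$varcomp
vc <- data.frame(
  term      = c("at(source, '1'):lot", "at(source, '2'):lot", "lot:wafer", "units!R"),
  component = c(17.07636, 222.71180, 35.86622, 12.56961),
  std.error = c(25.289347, 192.714464, 14.187862, 2.565778)
)

# z.ratio is simply component / std.error
vc$z.ratio <- vc$component / vc$std.error

# Share of the total variance attributable to lot, within each source
s2_other  <- vc$component[3] + vc$component[4]  # wafer + residual
lot_share <- vc$component[1:2] / (vc$component[1:2] + s2_other)
names(lot_share) <- c("source 1", "source 2")
round(lot_share, 2)  # about 0.26 for source 1 and 0.82 for source 2
```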

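Notice that the estimated lot variance differs markedly between sources (17.1 for source 1 versus 222.7 for source 2), although both estimates have large standard errors. Whether this difference is significant can be tested with a likelihood ratio test that compares the residual (REML) log-likelihood of the current model with that of the equivalent model from ASR004, which assumes a common lot variance.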
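A minimal sketch of that test as a hypothetical helper, `reml_lrt()`, written in base R. It assumes both models share the same fixed effects, which REML likelihood comparisons require. The difference in parameters is 1, because the current model replaces the single lot variance with two; since the null hypothesis \(\sigma^{2}_{l_1} = \sigma^{2}_{l_2}\) does not lie on the boundary of the parameter space, a \(\chi^2_1\) reference distribution is appropriate.

```r
# Hypothetical helper: REML likelihood ratio test for nested models that
# share the same fixed effects and differ only in their variance structure
reml_lrt <- function(loglik_null, loglik_alt, df_diff = 1) {
  stat <- 2 * (loglik_alt - loglik_null)            # LR statistic
  p    <- pchisq(stat, df = df_diff, lower.tail = FALSE)
  c(LR.statistic = stat, df = df_diff, Pr.chisq = p)
}
```

The REML log-likelihoods are stored in the `loglik` component of each fitted model, so the comparison would be `reml_lrt(asr004$loglik, asr005$loglik, df_diff = 1)`, assuming the ASR004 fit is available as `asr004`.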