ASR018. LMM for a multi-environmental trial - Pine clones

The complete script for this example can be downloaded here:

Dataset

In this example, we will use the D021 and the D021ped datasets, which correspond to phenotypic and pedigree data of pine clones from a multi-environmental trial. A few lines of each data frame are shown below:

d021
clone	location	block	height_8	dia_8
18	Hill	2	35.9	7.00
18	Hill	4	38.3	6.60
18	Hill	6	27.7	5.85
18	Hill	8	37.0	6.40
18	Orchard	2	35.7	5.40
18	Orchard	3	37.1	6.05

d021ped
clone	dam	sire
1	0	0
2	0	0
3	0	0
4	0	0
5	0	0
6	0	0

The example asr017 illustrates the procedure to investigate each site separately, something that is required before fitting and multi-environmental trial (MET) analysis.

Model

In this example we will fit an MET model. The specification of this model is: \[ y = \mu + location + location:block + clone + location:clone + e\ \]

where,

\(y\) is the pine clones’ height at age 8,

\(\mu\) is the overall mean,

\(location\) is the fixed effect of site

\(location:block\) is the nested random effect of block within site, with \(block \sim \mathcal{N}(0,\,\sigma^{2}_{bi})\),

\(clone\) is the random effect of clone, with \(clone \sim \mathcal{N}(0,\,\sigma^{2}_{c})\)

\(location:clone\) is the random effect of the interaction of location with clone, with \(location:clone \sim \mathcal{N}(0,\,\sigma^{2}_{lc})\),

\(e\) is the random residual effect, with \(e \sim \mathcal{N}(0,\,\sigma^{2}_{ei})\).

The factors \(clone\) and \(location:clone\) are associated with the numerator relationship matrix (NRM) \(\mathbf{A}\). The letter \(i\) in \(\sigma^{2}_{bi}\) and \(\sigma^{2}_{ei}\) represent a different block effect and a different residual variance component for each location, respectively.

Now, let’s take a look at how to write the model with ASReml-R. Note that before fitting the model, location, gen, and clone need to be set as factors and the inverse of the NRM needs to be created using the pedigree information (see ASR013 for more details).

ainv<- ainverse(d021ped)

d021 <- d021[order(d021$location),]
asr018 <- asreml(
  fixed = height_8 ~ location,
  random = ~at(location):block + vm(clone, ainv) + idv(location):vm(clone, ainv),
  residual = ~dsum(~units|location),
  data = d021
)

Note that in the above model we are using at(location):block to allow for the estimation of a different block variance for each of the locations. In addition, the use of dsum() is required to request a different error variance for each of the locations. Furthermore, the above model is including the inverse of the NRM by the use of the expression vm() that stands for variance model and that links a factor to the inverse matrix.

Exploring output

The variance components are:

summary(asr018)$varcomp

                                     component std.error   z.ratio bound %ch
at(location, 'Hill'):block        1.269777e+00 0.7465103  1.700951     P 0.0
at(location, 'Orchard'):block     1.212249e+00 0.7138307  1.698230     P 0.0
at(location, 'Rail'):block        2.237795e+00 1.9372264  1.155154     P 0.0
vm(clone, ainv)                   5.739307e+00 0.9317013  6.160030     P 0.1
location:vm(clone, ainv)!location 4.536677e-07        NA        NA     B  NA
location_Hill!R                   1.227888e+01 0.6556501 18.727798     P 0.0
location_Orchard!R                1.368415e+01 0.6752724 20.264636     P 0.0
location_Rail!R                   1.312105e+01 0.9797690 13.391987     P 0.0

Note that there are differences on the variance components between the locations (particularly for the residual term). In addition, the term id(location):vm(clone, ainv) is almost zero, and therefore for this model and with this dataset we do not find any genotype-by-environment.

The heritability can be calculated by site, but in the following expression this is done across all the sites, as follow:

vpredict(asr018, h2 ~ V4/((V1+V2+V3)/3+V4+V5+(V6+V7+V8)/3))

   Estimate         SE
h2  0.28216 0.03533191

These results indicate a heritability estimate of 0.28 which signifies that the studied trait is under important genetic control. The small standard error of 0.035 associated with the heritability estimate is indicating reliable results.

The statistical significance of fixed effects can be evaluated such as:

wald(asr018, denDF = 'numeric')$Wald

            Df denDF    F.inc        Pr
(Intercept)  1  91.3 1958.000 0.0000000
location     2   7.0    1.622 0.2633317

According to these results, the fixed effects of location are not significantly different.

The random effects are:

BLUP <- as.data.frame(summary(asr018, coef = TRUE)$coef.random)
head(BLUP)

                               solution std.error    z.ratio
at(location, 'Hill'):block_1  1.1324195 0.5239748  2.1612097
at(location, 'Hill'):block_2  0.2533747 0.5135070  0.4934202
at(location, 'Hill'):block_3 -0.2799310 0.5125597 -0.5461431
at(location, 'Hill'):block_4  1.0394901 0.5101069  2.0377888
at(location, 'Hill'):block_5  0.9094417 0.5084001  1.7888309
at(location, 'Hill'):block_6 -0.6220001 0.5099794 -1.2196575

tail(BLUP)

                                       solution    std.error       z.ratio
location_Rail:vm(clone, ainv)_148 -4.283946e-08 0.0006735484 -6.360264e-05
location_Rail:vm(clone, ainv)_149 -4.746008e-08 0.0006735484 -7.046276e-05
location_Rail:vm(clone, ainv)_150 -1.665287e-07 0.0006735484 -2.472408e-04
location_Rail:vm(clone, ainv)_151 -7.264737e-08 0.0006735484 -1.078577e-04
location_Rail:vm(clone, ainv)_152 -1.338444e-07 0.0006735484 -1.987154e-04
location_Rail:vm(clone, ainv)_153  4.880225e-08 0.0006735484  7.245544e-05

Here, we can see that there are a large number of solutions, but the ones of more interest are the ones with the prefix vm(clone, ainv)_, which correspond to the breeding values.