idsort | indiv | dam | sire | hatch_date | cohort | generation | sex | morph | p_orange | p_yellow | p_grey |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 203 | 0 | 0 | NA | 1 | 1 | F | OY | 13.96 | 9.69 | 76.35 |
2 | 205 | 0 | 0 | NA | 1 | 1 | F | G | 0.40 | 0.24 | 99.37 |
3 | 206 | 0 | 0 | NA | 1 | 1 | F | Y | 1.13 | 34.73 | 64.14 |
4 | 209 | 0 | 0 | NA | 1 | 1 | F | O | 57.11 | 4.20 | 38.69 |
5 | 211 | 0 | 0 | NA | 1 | 1 | F | OY | 2.66 | 6.23 | 91.11 |
6 | 218 | 0 | 0 | NA | 1 | 1 | F | O | 35.78 | 0.03 | 64.19 |
ASR011. LMM with log transformation of the response and pedigree - Tawny dragon lizards
The complete script for this example can be downloaded here:
Dataset
The dataset we will use in this example is D007, and the first few rows are presented below:
The pedigree information for this analysis is contained within the above data frame, corresponding to columns 2, 3, and 4, with indiv
representing the individuals, and dam
and sire
the female and male parents, respectively.
Model
In this example will be fitting a LMM with pedigree information using a log-transformed response variable. The specification of the model is:
\[ y = \mu + cohort + sex + gen + e \]
where:\(y\) is the proportion of yellow,
\(\mu\) is the overall mean,
\(cohort\) is the fixed effect of cohort,
\(sex\) is the fixed effect of sex,
\(gen\) is the random effect of genotype, with \(gen \sim \mathcal{N}(0,\,\sigma^{2}_{g})\),
\(e\) is the random residual effect, with \(e \sim \mathcal{N}(0,\,\sigma^{2}_{e})\).The factor \(gen\) is associated with the numerator relationship matrix (NRM) \(\mathbf{A}\).
First, let’s fit the model with ASReml-R using the original (un-transformed) response variable. Note that before fitting the model, the inverse of NRM needs to be obtained using the pedigree information with the function ainverse()
(see ASR013 for more details), and indiv
, cohort
, and sex
need to be set as factors.
<- asreml(
asr011 fixed = p_yellow ~ cohort + sex,
random = ~vm(indiv, ainv),
residual = ~units,
data = d007
)
Note that the above model is including the inverse of the NRM by the use of the expression vm()
that indicates variance model and that links a factor to the inverse matrix.
The visual diagnostics of the residuals can be assessed with:
plot(asr011)
These plots look relatively reasonable; however, several issues are visible: the distribution of the residuals is skewed to the left; there is a trend on the top-left plot, indicating that the assumption of linearity might not be entirely valid; and several potential outliers are identified.
Therefore, it is possible to consider a transformation of the response variable to improve the model fit. The logarithmic transformation will be used here according to the following expression:
-
\(y\) is the transformed response variable, and \(pyellow\) is the proportion of yellow.
This transformation is implemented in R using the following code:
$p_yellow_log <- log((d007$p_yellow + 1)/(100 - d007$p_yellow + 1))) (d007
[1] -2.14495159 -4.39763007 -0.61774663 -2.92398837 -2.57321387 -4.58526464
[7] -4.61512052 -2.58218016 -2.73599740 -1.83993632 -2.65972470 -4.56583518
[13] -4.50967080 -4.29663968 -2.64696251 -2.11296423 -2.26551431 -0.49954731
[19] -3.53497384 -4.50967080 -0.54992266 -3.10642181 -1.64226674 -0.16665919
[25] -0.63892230 1.10227578 -2.50385695 -0.69933004 -4.56583518 -3.94834994
[31] -4.31154456 -4.20469262 0.54654371 -0.69314718 -3.72143501 -0.05805552
[37] -4.59511985 -4.54676856 -0.91835046 -4.61512052 -0.60399002 -2.68732407
[43] -2.71224088 -4.55625737 -4.54676856 -4.61512052 -4.58526464 -2.56579431
[49] -1.10752103 -4.58526464 -3.19348147 -0.90397025 -2.38512914 -4.47387232
[55] -4.23909156 -2.65492228 -2.04354862 -4.30406509 -4.52805133 -4.38949865
[61] -1.94054281 -3.07577498 -3.17295958 -1.13023548 -0.43373294 -4.61512052
[67] -3.89683435 -2.89597401 -4.42241878 -4.61512052 -4.61512052 -3.99700334
[73] -1.80555279 -2.04065552 -4.59511985 -4.15838535 -4.61512052 -4.61512052
[79] -4.60507117 -2.49546844 -4.61512052 -4.51881975 -4.48270515 -4.61512052
[85] -2.17776871 -3.22501081 -4.61512052 -4.61512052 -4.55625737 -4.55625737
[91] -2.99295635 -4.61512052 -2.97371752 -4.31154456 -4.23909156 -4.60507117
[97] -4.52805133 -2.62030287
Let’s now re-fit the model using the transformed response variable with:
<- asreml(
asr011_transfo fixed = p_yellow_log ~ cohort + sex,
random = ~vm(indiv, ainv),
residual = ~idv(units),
data=d007
)
Now, the diagnostic residuals plots (below) look better: they are more closely normally distributed and there seems to be no visible outliers.
Exploring output
The statistical significance of fixed effects can be evaluated with:
wald(asr011_transfo, denDF = 'numeric')$Wald
Df denDF F.inc Pr
(Intercept) 1 30.5 204.900 4.440892e-15
cohort 2 47.7 11.840 6.670866e-05
sex 2 87.0 2.534 8.519154e-02
Here we can see that both fixed effect factors are significant.
The variance components estimated by the model are:
summary(asr011_transfo)$varcomp
component std.error z.ratio bound %ch
vm(indiv, ainv) 1.3436165 0.5465975 2.458146 P 0
units!units 0.8116911 0.3531848 2.298205 P 0
units!R 1.0000000 NA NA F 0
And, the heritability for this analysis can be obtained as:
vpredict(asr011_transfo, h2 ~ V1/(V1+V2))
Estimate SE
h2 0.623399 0.1852652
These results indicate that we have a heritability estimate of 0.62, indicating that 62% of the phenotypic variability of the throat color expression is under additive genetic control. The standard error is quite high, but this is expected given that the small dataset.
The additive genetic effects (BLUP/breeding values) are:
<- summary(asr011_transfo, coef = TRUE)$coef.random
BLUP head(BLUP)
solution std.error z.ratio
vm(indiv, ainv)_203 0.5830478 0.6491449 0.8981781
vm(indiv, ainv)_205 -0.9781981 0.6704375 -1.4590443
vm(indiv, ainv)_206 1.3325487 0.6190151 2.1526915
vm(indiv, ainv)_209 -0.1563091 0.6718978 -0.2326382
vm(indiv, ainv)_211 0.4684198 0.6579716 0.7119149
vm(indiv, ainv)_218 -1.0958538 0.7124058 -1.5382438
The predictions (means) for each individual can be calculated as shown below, but note that here the factors cohort
and sex
are ignored. Hence, in these predictions we only considered \(\hat{\mu} + \hat{BLUP}\).
$Mean <- mean(d007$p_yellow_log)
BLUP$Preds <- BLUP$Mean + BLUP$solution
BLUPhead(BLUP)
solution std.error z.ratio Mean Preds
vm(indiv, ainv)_203 0.5830478 0.6491449 0.8981781 -3.170818 -2.587770
vm(indiv, ainv)_205 -0.9781981 0.6704375 -1.4590443 -3.170818 -4.149016
vm(indiv, ainv)_206 1.3325487 0.6190151 2.1526915 -3.170818 -1.838269
vm(indiv, ainv)_209 -0.1563091 0.6718978 -0.2326382 -3.170818 -3.327127
vm(indiv, ainv)_211 0.4684198 0.6579716 0.7119149 -3.170818 -2.702398
vm(indiv, ainv)_218 -1.0958538 0.7124058 -1.5382438 -3.170818 -4.266672
In our example, these predictions are based on the log-transformed response, which are difficult to interpret. However, we can use the following equation to do the back-transformation and get the predictions on their original scale:
-
\(y\) is the back transformed prediction, and \(P\) is the prediction of the transformed data.
This back-transformation is implemented using the following code:
$BT_Preds <- ((100 + 1)*exp(BLUP$Preds) - 1)/(1 + exp(BLUP$Preds))
BLUPhead(BLUP)
tail(BLUP)
solution std.error z.ratio Mean Preds BT_Preds
vm(indiv, ainv)_203 0.5830478 0.6491449 0.8981781 -3.170818 -2.587770 6.1328250
vm(indiv, ainv)_205 -0.9781981 0.6704375 -1.4590443 -3.170818 -4.149016 0.5845491
vm(indiv, ainv)_206 1.3325487 0.6190151 2.1526915 -3.170818 -1.838269 13.0001208
vm(indiv, ainv)_209 -0.1563091 0.6718978 -0.2326382 -3.170818 -3.327127 2.5345243
vm(indiv, ainv)_211 0.4684198 0.6579716 0.7119149 -3.170818 -2.702398 5.4088621
vm(indiv, ainv)_218 -1.0958538 0.7124058 -1.5382438 -3.170818 -4.266672 0.4111005
solution std.error z.ratio Mean Preds BT_Preds
vm(indiv, ainv)_251.5 -0.01168204 0.7010365 -0.01666396 -3.170818 -3.182500 3.0626201
vm(indiv, ainv)_251.6 -0.83804347 0.6974068 -1.20165655 -3.170818 -4.008862 0.8186964
vm(indiv, ainv)_251.7 -0.80524185 0.6974068 -1.15462286 -3.170818 -3.976060 0.8782250
vm(indiv, ainv)_261.3 -0.81711852 0.6857410 -1.19158477 -3.170818 -3.987937 0.8564535
vm(indiv, ainv)_261.6 -0.08687452 0.7030779 -0.12356315 -3.170818 -3.257693 2.7792473
vm(indiv, ainv)_206.1.1 0.38657605 0.7050686 0.54828145 -3.170818 -2.784242 4.9345300
These results can be used to rank individuals and perform some selections.