indiv | sire | dam | f_gen | grand_sire | sex |
---|---|---|---|---|---|
ID1 | 0 | 0 | 7 | 0 | 1 |
ID2 | 0 | 0 | 3 | NA | 2 |
ID3 | 0 | 0 | 2 | 0 | 1 |
ID4 | * | 0 | 2 | 0 | 2 |
ID5 | 0 | ID15 | 0 | 0 | 1 |
ID6 | ID1 | ID2 | 0 | 0 | 1 |
ID7 | ID3 | ID4 | 0 | 0 | 2 |
ID8 | ID3 | ID4 | 0 | 0 | 1 |
ID9 | ID8 | ID5 | 0 | 0 | 1 |
ID10 | ID6 | ID7 | 0 | ID3 | 1 |
ID11 | ID6 | ID7 | 0 | ID3 | 2 |
ID12 | ID8 | ID7 | 0 | ID3 | 2 |
ID13 | ID9 | ID11 | 0 | ID6 | 2 |
ID14 | ID15 | ID15 | 0 | 0 | 1 |
ASR013. Exploring the options to obtain a relationship matrix based on a pedigree
The complete script for this example can be downloaded here:
Dataset
The dataset we will use here is a the synthetic pedigree D009 that contains a total of 14 dummy records:
The ainverse()
function
Within ASReml-R, when we want to fit a linear model that includes pedigree information, we need to first obtain the inverse of the numerator relationship matrix (\(\mathbf{A^{-1}}\)). This is done using the ainverse()
function that will read the pedigree information and internally create the matrix inverse (in a sparse form). Importantly, the first column of a pedigree file should contain the identifiers for the individual, followed by the male and female parents in the next two columns (in any order the parents). In the above data frame, this corresponds to the columns indiv
, sire
, and dam
.
The ainverse()
function has several available options, including fgen
, gender
, groups
, groupOffset
, selfing
, inBreed
, mgs
, mv
, psort
and core_version
. In this example we will describe how to use the options: fgen
, selfing
, mgs
, mv
and psort
.
Let’s take a look at the first few rows of the \(\mathbf{A^{-1}}\) matrix created using our pedigree:
<- ainverse(d009ped) ainv
Row Column Ainverse
[1,] 1 1 1.333333
[2,] 2 2 3.333333
[3,] 3 3 1.500000
[4,] 4 3 0.500000
[5,] 4 4 1.500000
[6,] 5 5 2.000000
Note that this function recognizes from the pedigree file the defaults 0
and/or NA
as missing value. But for the first generation, individuals with these 0
are assumed to be the base population. Hence, in our example, the individuals included in the matrix are:
[1] "*" "ID15" "ID1" "ID2" "ID3" "ID4" "ID5" "ID6" "ID7" "ID8" "ID9" "ID10" "ID11" "ID12" "ID13" "ID14"
Using the library ASRgenomics
(that can be loaded using: library(ASRgenomics)
), we can also reconstruct the \(\mathbf{A}\) matrix of this pedigree. In the following code, we will first transformed the \(\mathbf{A^{-1}}\) matrix into its full form, and then obtain the NRM by obtaining the inverse:
<- ASRgenomics::sparse2full(K = ainv)
ainv.f <- G.inverse(G = ainv.f)$Ginv A
* ID15 ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10 ID11 ID12 ID13 ID14
* 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.25 0.25 0.12 0.12 0.12 0.25 0.12 0.00
ID15 0.00 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.00 0.00 0.00 0.12 1.00
ID1 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.00
ID2 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.00
ID3 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.25 0.00
ID4 0.50 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.25 0.00
ID5 0.00 0.50 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.50
ID6 0.00 0.00 0.50 0.50 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.00
ID7 0.25 0.00 0.00 0.00 0.50 0.50 0.00 0.00 1.00 0.50 0.25 0.50 0.50 0.75 0.38 0.00
ID8 0.25 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.50 1.00 0.50 0.25 0.25 0.75 0.38 0.00
ID9 0.12 0.25 0.00 0.00 0.25 0.25 0.50 0.00 0.25 0.50 1.00 0.12 0.12 0.38 0.56 0.25
ID10 0.12 0.00 0.25 0.25 0.25 0.25 0.00 0.50 0.50 0.25 0.12 1.00 0.50 0.38 0.31 0.00
ID11 0.12 0.00 0.25 0.25 0.25 0.25 0.00 0.50 0.50 0.25 0.12 0.50 1.00 0.38 0.56 0.00
ID12 0.25 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.75 0.75 0.38 0.38 0.38 1.25 0.38 0.00
ID13 0.12 0.12 0.12 0.12 0.25 0.25 0.25 0.25 0.38 0.38 0.56 0.31 0.56 0.38 1.06 0.12
ID14 0.00 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.00 0.00 0.00 0.12 1.50
This \(\mathbf{A}\) matrix describes the relationship of the 16 individuals in the pedigree, including ID15
and *
, which are considered as unspecified parents, and they are been added to the pedigree, as they were used as parents of other individuals.
On the off-diagonal, we can see the proportion of shared alleles among individuals. For example, individuals ID7
shares 50% of its alleles with both ID3
and ID4
as they are its parents. Additionally, ID7
and ID8
also share 50% of their alleles together as they are full-sibs. Furthermore, we can see that ID12
shares 75% of its alleles with both its parents, ID7
and ID8
. This higher percentage is due to presence of inbreeding or presence of relatedness between the its parents. Furthermore, we can see that ID14
shares 100% of its alleles with ID15
due the the fact that it is self-pollination of ID15
.
On the diagonal, values usually equal \(1\) as an individual shares with itself 100% of the alleles. However, in case of inbreeding, the value will correspond to \(1 +F\), with \(F\) an inbreeding coefficient ranging between 0 and 1. This inbreeding coefficient indicates the probability of receiving the same allele from both parents at a randomly chosen locus. In the case of ID12
, we can see that it has an inbreeding coefficient of 0.25, which indicates a probability of 25%; ID14
has an inbreeding coefficient of 0.5 (50%).
The mv
option
It is possible to define other strings rather than the defaults of 0
and NA
as missing value using the mv
option.
The mv
option is used such as:
<- ainverse(d009ped, mv = c("ID15", "*")) ainv
Here is the updated list of individuals from the above inverse:
[1] "ID1" "ID2" "ID3" "ID4" "ID5" "ID6" "ID7" "ID8" "ID9" "ID10" "ID11" "ID12" "ID13" "ID14"
The fgen
option
This option is used to specify the level of inbreeding or selfing of the founder individuals. This requires a list of length two. In fgen[[1]]
is name of the column of the pedigree data frame that contains the level of selfing or inbreeding. A value of 0 indicates a simple cross, a value of 1 indicates the individual is selfed once, 2 indicates it is selfed twice, and so on. If continuous values between 0 and 1 are considered here, then these will be considered as the level of inbreeding F
.
In fgen[[2]]
is indicated then this is level of inbreeding or selfing of implicit (not defined) individuals. By default, the function ainverse()
assumes that this set of base individuals are not inbred.
This is how to used this option:
<- ainverse(d009ped, mv = c( "*"), fgen = list("f_gen", 0.5)) ainv
And here is the new \(\mathbf{A}\) matrix obtained:
ID15 ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10 ID11 ID12 ID13 ID14
ID15 1.50 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00 0.38 0.00 0.00 0.00 0.19 1.50
ID1 0.00 1.99 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.00
ID2 0.00 0.00 1.88 0.00 0.00 0.00 0.94 0.00 0.00 0.00 0.47 0.47 0.00 0.23 0.00
ID3 0.00 0.00 0.00 1.75 0.00 0.00 0.00 0.88 0.88 0.44 0.44 0.44 0.88 0.44 0.00
ID4 0.00 0.00 0.00 0.00 1.75 0.00 0.00 0.88 0.88 0.44 0.44 0.44 0.88 0.44 0.00
ID5 0.75 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.75
ID6 0.00 1.00 0.94 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.00
ID7 0.00 0.00 0.00 0.88 0.88 0.00 0.00 1.00 0.88 0.44 0.50 0.50 0.94 0.47 0.00
ID8 0.00 0.00 0.00 0.88 0.88 0.00 0.00 0.88 1.00 0.50 0.44 0.44 0.94 0.47 0.00
ID9 0.38 0.00 0.00 0.44 0.44 0.50 0.00 0.44 0.50 1.00 0.22 0.22 0.47 0.61 0.38
ID10 0.00 0.50 0.47 0.44 0.44 0.00 0.50 0.50 0.44 0.22 1.00 0.50 0.47 0.36 0.00
ID11 0.00 0.50 0.47 0.44 0.44 0.00 0.50 0.50 0.44 0.22 0.50 1.00 0.47 0.61 0.00
ID12 0.00 0.00 0.00 0.88 0.88 0.00 0.00 0.94 0.94 0.47 0.47 0.47 1.44 0.47 0.00
ID13 0.19 0.25 0.23 0.44 0.44 0.25 0.25 0.47 0.47 0.61 0.36 0.61 0.47 1.11 0.19
ID14 1.50 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00 0.38 0.00 0.00 0.00 0.19 1.75
We can see that ID1
, ID2
, ID3
, and ID4
have now different levels of inbreeding due to the assigned number of selfing from the f_gen
column (fgen[[1]]
). Additionally, we can see that individual ID15
, that was not defined in the pedigree, has now an inbreeding coefficient of 0.5 as defined by fgen[[2]]
.
Tip
The level of selfing of individual D14 in the
f_gen
column should be defined as 0 as its selfing is already been described in the pedigree.
The selfing
option
This option is used to assign the probability of selfing of an individual for which the male parent is unknown. This option assumes the order of three columns corresponds to identifiers for the individual, female and male parents.
<- d009ped[, c(1, 3, 2, 4, 5, 6)] d009ped_selfing
This is how to use this option:
<- ainverse(d009ped_selfing, mv = c("*"), selfing = 0.3) ainv
Now, let’s take a look at the new \(\mathbf{A}\) matrix:
ID15 ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10 ID11 ID12 ID13 ID14
ID15 1.00 0.00 0.00 0.00 0.00 0.65 0.00 0.00 0.00 0.32 0.00 0.00 0.00 0.16 1.00
ID1 0.00 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.00
ID2 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.00
ID3 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.25 0.00
ID4 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.25 0.00
ID5 0.65 0.00 0.00 0.00 0.00 1.15 0.00 0.00 0.00 0.58 0.00 0.00 0.00 0.29 0.65
ID6 0.00 0.50 0.50 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.00
ID7 0.00 0.00 0.00 0.50 0.50 0.00 0.00 1.00 0.50 0.25 0.50 0.50 0.75 0.38 0.00
ID8 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.50 1.00 0.50 0.25 0.25 0.75 0.38 0.00
ID9 0.32 0.00 0.00 0.25 0.25 0.58 0.00 0.25 0.50 1.00 0.12 0.12 0.38 0.56 0.32
ID10 0.00 0.25 0.25 0.25 0.25 0.00 0.50 0.50 0.25 0.12 1.00 0.50 0.38 0.31 0.00
ID11 0.00 0.25 0.25 0.25 0.25 0.00 0.50 0.50 0.25 0.12 0.50 1.00 0.38 0.56 0.00
ID12 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.75 0.75 0.38 0.38 0.38 1.25 0.38 0.00
ID13 0.16 0.12 0.12 0.25 0.25 0.29 0.25 0.38 0.38 0.56 0.31 0.56 0.38 1.06 0.16
ID14 1.00 0.00 0.00 0.00 0.00 0.65 0.00 0.00 0.00 0.32 0.00 0.00 0.00 0.16 1.50
In this example, individual ID5
has only the female parent known. Now, we can see that the inbreeding level of this individual changed from 0 to 0.15.
The mgs
option
This option allows to use the maternal grand-sire of the individual instead of the female parent when creating to \(\mathbf{A^{-1}}\) matrix. This option assumes the order of three columns corresponds to identifiers for the individual, male parent, and maternal grand-sire, respectively.
<- d009ped[, c(1, 2, 5, 3, 4, 6)] d009ped_mgs
Here is how to use this option:
<- ainverse(d009ped_mgs, mv = "*", mgs = TRUE) ainv
The \(\mathbf{A}\) matrix is now:
ID15 ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10 ID11 ID12 ID13 ID14
ID15 1.0 0.00 0 0.00 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.5
ID1 0.0 1.00 0 0.00 0 0 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.0
ID2 0.0 0.00 1 0.00 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
ID3 0.0 0.00 0 1.00 0 0 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.12 0.0
ID4 0.0 0.00 0 0.00 1 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
ID5 0.0 0.00 0 0.00 0 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
ID6 0.0 0.50 0 0.00 0 0 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.0
ID7 0.0 0.00 0 0.50 0 0 0.00 1.00 0.25 0.12 0.12 0.12 0.25 0.06 0.0
ID8 0.0 0.00 0 0.50 0 0 0.00 0.25 1.00 0.50 0.12 0.12 0.62 0.25 0.0
ID9 0.0 0.00 0 0.25 0 0 0.00 0.12 0.50 1.00 0.06 0.06 0.31 0.50 0.0
ID10 0.0 0.25 0 0.25 0 0 0.50 0.12 0.12 0.06 1.00 0.31 0.12 0.16 0.0
ID11 0.0 0.25 0 0.25 0 0 0.50 0.12 0.12 0.06 0.31 1.00 0.12 0.16 0.0
ID12 0.0 0.00 0 0.50 0 0 0.00 0.25 0.62 0.31 0.12 0.12 1.12 0.16 0.0
ID13 0.0 0.12 0 0.12 0 0 0.25 0.06 0.25 0.50 0.16 0.16 0.16 1.00 0.0
ID14 0.5 0.00 0 0.00 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.0
The relationship matrix obtained with the mgs
option will result on breeding values that have lower accuracy than with a pedigree where the female parent is known, but it will have higher accuracy than if only the male parent is known.
The psort
option
When calculating the \(\mathbf{A^{-1}}\) matrix, if necessary, ainverse()
will first internally re-arrange the pedigree so that all individuals are defined and parents are listed before their offspring. The psort
option returns the pedigree data frame that is in fact used by ainverse()
. To illustrate this option, let’s first change the rows order of our data frame:
<- d009ped[c(1, 8, 2, 3, 12, 5, 6, 14, 7, 9, 10, 11, 13),] d009ped_psort
And then proceed to use the psort
option as:
<- ainverse(d009ped_psort, mv = "*", psort = TRUE) ainv_ped
Pedigree order (P1): Individual "ID3" moved to record 1
Pedigree order (P2): Individual "ID7" moved to record 2
Pedigree order (P2): Individual "ID4" moved to record 1
ainv_ped
indiv sire dam
1 ID4 0 0
2 ID3 0 0
3 ID7 ID3 ID4
4 ID15 0 0
5 ID1 0 0
6 ID8 ID3 ID4
7 ID2 0 0
8 ID12 ID8 ID7
9 ID5 0 ID15
10 ID6 ID1 ID2
11 ID14 ID15 ID15
12 ID9 ID8 ID5
13 ID10 ID6 ID7
14 ID11 ID6 ID7
15 ID13 ID9 ID11
Note that here, the option ainverse()
does not generate the inverse of a NRM, but generates a sorted and completed pedigree file. Here we can see that ID15
has been added to the pedigree and that the row order has been re-organized so that parents are described before their offspring.