ASR013. Exploring the options to obtain a relationship matrix based on a pedigree

The complete script for this example can be downloaded here:

Dataset

The dataset we will use here is a the synthetic pedigree D009 that contains a total of 14 dummy records:

indiv sire dam f_gen grand_sire sex
ID1 0 0 7 0 1
ID2 0 0 3 NA 2
ID3 0 0 2 0 1
ID4 * 0 2 0 2
ID5 0 ID15 0 0 1
ID6 ID1 ID2 0 0 1
ID7 ID3 ID4 0 0 2
ID8 ID3 ID4 0 0 1
ID9 ID8 ID5 0 0 1
ID10 ID6 ID7 0 ID3 1
ID11 ID6 ID7 0 ID3 2
ID12 ID8 ID7 0 ID3 2
ID13 ID9 ID11 0 ID6 2
ID14 ID15 ID15 0 0 1


The ainverse() function

Within ASReml-R, when we want to fit a linear model that includes pedigree information, we need to first obtain the inverse of the numerator relationship matrix (\(\mathbf{A^{-1}}\)). This is done using the ainverse() function that will read the pedigree information and internally create the matrix inverse (in a sparse form). Importantly, the first column of a pedigree file should contain the identifiers for the individual, followed by the male and female parents in the next two columns (in any order the parents). In the above data frame, this corresponds to the columns indiv, sire, and dam.

The ainverse() function has several available options, including fgen, gender, groups, groupOffset, selfing, inBreed, mgs, mv, psort and core_version. In this example we will describe how to use the options: fgen, selfing, mgs, mv and psort.

Let’s take a look at the first few rows of the \(\mathbf{A^{-1}}\) matrix created using our pedigree:

ainv <- ainverse(d009ped)
     Row Column Ainverse
[1,]   1      1 1.333333
[2,]   2      2 3.333333
[3,]   3      3 1.500000
[4,]   4      3 0.500000
[5,]   4      4 1.500000
[6,]   5      5 2.000000

Note that this function recognizes from the pedigree file the defaults 0 and/or NA as missing value. But for the first generation, individuals with these 0 are assumed to be the base population. Hence, in our example, the individuals included in the matrix are:

 [1] "*"    "ID15" "ID1"  "ID2"  "ID3"  "ID4"  "ID5"  "ID6"  "ID7"  "ID8"  "ID9"  "ID10" "ID11" "ID12" "ID13" "ID14"

Using the library ASRgenomics (that can be loaded using: library(ASRgenomics)), we can also reconstruct the \(\mathbf{A}\) matrix of this pedigree. In the following code, we will first transformed the \(\mathbf{A^{-1}}\) matrix into its full form, and then obtain the NRM by obtaining the inverse:

ainv.f <- ASRgenomics::sparse2full(K = ainv)
A <- G.inverse(G = ainv.f)$Ginv
        * ID15  ID1  ID2  ID3  ID4  ID5  ID6  ID7  ID8  ID9 ID10 ID11 ID12 ID13 ID14
*    1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.25 0.25 0.12 0.12 0.12 0.25 0.12 0.00
ID15 0.00 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.00 0.00 0.00 0.12 1.00
ID1  0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.00
ID2  0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.00
ID3  0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.25 0.00
ID4  0.50 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.25 0.00
ID5  0.00 0.50 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.50
ID6  0.00 0.00 0.50 0.50 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.00
ID7  0.25 0.00 0.00 0.00 0.50 0.50 0.00 0.00 1.00 0.50 0.25 0.50 0.50 0.75 0.38 0.00
ID8  0.25 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.50 1.00 0.50 0.25 0.25 0.75 0.38 0.00
ID9  0.12 0.25 0.00 0.00 0.25 0.25 0.50 0.00 0.25 0.50 1.00 0.12 0.12 0.38 0.56 0.25
ID10 0.12 0.00 0.25 0.25 0.25 0.25 0.00 0.50 0.50 0.25 0.12 1.00 0.50 0.38 0.31 0.00
ID11 0.12 0.00 0.25 0.25 0.25 0.25 0.00 0.50 0.50 0.25 0.12 0.50 1.00 0.38 0.56 0.00
ID12 0.25 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.75 0.75 0.38 0.38 0.38 1.25 0.38 0.00
ID13 0.12 0.12 0.12 0.12 0.25 0.25 0.25 0.25 0.38 0.38 0.56 0.31 0.56 0.38 1.06 0.12
ID14 0.00 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.00 0.00 0.00 0.12 1.50

This \(\mathbf{A}\) matrix describes the relationship of the 16 individuals in the pedigree, including ID15 and *, which are considered as unspecified parents, and they are been added to the pedigree, as they were used as parents of other individuals.

On the off-diagonal, we can see the proportion of shared alleles among individuals. For example, individuals ID7 shares 50% of its alleles with both ID3 and ID4 as they are its parents. Additionally, ID7 and ID8 also share 50% of their alleles together as they are full-sibs. Furthermore, we can see that ID12 shares 75% of its alleles with both its parents, ID7 and ID8. This higher percentage is due to presence of inbreeding or presence of relatedness between the its parents. Furthermore, we can see that ID14 shares 100% of its alleles with ID15 due the the fact that it is self-pollination of ID15.

On the diagonal, values usually equal \(1\) as an individual shares with itself 100% of the alleles. However, in case of inbreeding, the value will correspond to \(1 +F\), with \(F\) an inbreeding coefficient ranging between 0 and 1. This inbreeding coefficient indicates the probability of receiving the same allele from both parents at a randomly chosen locus. In the case of ID12, we can see that it has an inbreeding coefficient of 0.25, which indicates a probability of 25%; ID14 has an inbreeding coefficient of 0.5 (50%).


The mv option

It is possible to define other strings rather than the defaults of 0 and NA as missing value using the mv option.

The mv option is used such as:

ainv <- ainverse(d009ped, mv = c("ID15", "*"))

Here is the updated list of individuals from the above inverse:

 [1] "ID1"  "ID2"  "ID3"  "ID4"  "ID5"  "ID6"  "ID7"  "ID8"  "ID9"  "ID10" "ID11" "ID12" "ID13" "ID14"


The fgen option

This option is used to specify the level of inbreeding or selfing of the founder individuals. This requires a list of length two. In fgen[[1]] is name of the column of the pedigree data frame that contains the level of selfing or inbreeding. A value of 0 indicates a simple cross, a value of 1 indicates the individual is selfed once, 2 indicates it is selfed twice, and so on. If continuous values between 0 and 1 are considered here, then these will be considered as the level of inbreeding F.

In fgen[[2]] is indicated then this is level of inbreeding or selfing of implicit (not defined) individuals. By default, the function ainverse() assumes that this set of base individuals are not inbred.

This is how to used this option:

ainv <- ainverse(d009ped, mv = c( "*"), fgen = list("f_gen", 0.5)) 

And here is the new \(\mathbf{A}\) matrix obtained:

     ID15  ID1  ID2  ID3  ID4  ID5  ID6  ID7  ID8  ID9 ID10 ID11 ID12 ID13 ID14
ID15 1.50 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00 0.38 0.00 0.00 0.00 0.19 1.50
ID1  0.00 1.99 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.00
ID2  0.00 0.00 1.88 0.00 0.00 0.00 0.94 0.00 0.00 0.00 0.47 0.47 0.00 0.23 0.00
ID3  0.00 0.00 0.00 1.75 0.00 0.00 0.00 0.88 0.88 0.44 0.44 0.44 0.88 0.44 0.00
ID4  0.00 0.00 0.00 0.00 1.75 0.00 0.00 0.88 0.88 0.44 0.44 0.44 0.88 0.44 0.00
ID5  0.75 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.75
ID6  0.00 1.00 0.94 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.00
ID7  0.00 0.00 0.00 0.88 0.88 0.00 0.00 1.00 0.88 0.44 0.50 0.50 0.94 0.47 0.00
ID8  0.00 0.00 0.00 0.88 0.88 0.00 0.00 0.88 1.00 0.50 0.44 0.44 0.94 0.47 0.00
ID9  0.38 0.00 0.00 0.44 0.44 0.50 0.00 0.44 0.50 1.00 0.22 0.22 0.47 0.61 0.38
ID10 0.00 0.50 0.47 0.44 0.44 0.00 0.50 0.50 0.44 0.22 1.00 0.50 0.47 0.36 0.00
ID11 0.00 0.50 0.47 0.44 0.44 0.00 0.50 0.50 0.44 0.22 0.50 1.00 0.47 0.61 0.00
ID12 0.00 0.00 0.00 0.88 0.88 0.00 0.00 0.94 0.94 0.47 0.47 0.47 1.44 0.47 0.00
ID13 0.19 0.25 0.23 0.44 0.44 0.25 0.25 0.47 0.47 0.61 0.36 0.61 0.47 1.11 0.19
ID14 1.50 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00 0.38 0.00 0.00 0.00 0.19 1.75

We can see that ID1, ID2, ID3, and ID4 have now different levels of inbreeding due to the assigned number of selfing from the f_gen column (fgen[[1]]). Additionally, we can see that individual ID15, that was not defined in the pedigree, has now an inbreeding coefficient of 0.5 as defined by fgen[[2]].

Tip

The level of selfing of individual D14 in the f_gen column should be defined as 0 as its selfing is already been described in the pedigree.


The selfing option

This option is used to assign the probability of selfing of an individual for which the male parent is unknown. This option assumes the order of three columns corresponds to identifiers for the individual, female and male parents.

d009ped_selfing <- d009ped[, c(1, 3, 2, 4, 5, 6)]

This is how to use this option:

ainv<- ainverse(d009ped_selfing, mv = c("*"), selfing = 0.3)

Now, let’s take a look at the new \(\mathbf{A}\) matrix:

     ID15  ID1  ID2  ID3  ID4  ID5  ID6  ID7  ID8  ID9 ID10 ID11 ID12 ID13 ID14
ID15 1.00 0.00 0.00 0.00 0.00 0.65 0.00 0.00 0.00 0.32 0.00 0.00 0.00 0.16 1.00
ID1  0.00 1.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.00
ID2  0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12 0.00
ID3  0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.25 0.00
ID4  0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.25 0.00
ID5  0.65 0.00 0.00 0.00 0.00 1.15 0.00 0.00 0.00 0.58 0.00 0.00 0.00 0.29 0.65
ID6  0.00 0.50 0.50 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25 0.00
ID7  0.00 0.00 0.00 0.50 0.50 0.00 0.00 1.00 0.50 0.25 0.50 0.50 0.75 0.38 0.00
ID8  0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.50 1.00 0.50 0.25 0.25 0.75 0.38 0.00
ID9  0.32 0.00 0.00 0.25 0.25 0.58 0.00 0.25 0.50 1.00 0.12 0.12 0.38 0.56 0.32
ID10 0.00 0.25 0.25 0.25 0.25 0.00 0.50 0.50 0.25 0.12 1.00 0.50 0.38 0.31 0.00
ID11 0.00 0.25 0.25 0.25 0.25 0.00 0.50 0.50 0.25 0.12 0.50 1.00 0.38 0.56 0.00
ID12 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.75 0.75 0.38 0.38 0.38 1.25 0.38 0.00
ID13 0.16 0.12 0.12 0.25 0.25 0.29 0.25 0.38 0.38 0.56 0.31 0.56 0.38 1.06 0.16
ID14 1.00 0.00 0.00 0.00 0.00 0.65 0.00 0.00 0.00 0.32 0.00 0.00 0.00 0.16 1.50

In this example, individual ID5 has only the female parent known. Now, we can see that the inbreeding level of this individual changed from 0 to 0.15.


The mgs option

This option allows to use the maternal grand-sire of the individual instead of the female parent when creating to \(\mathbf{A^{-1}}\) matrix. This option assumes the order of three columns corresponds to identifiers for the individual, male parent, and maternal grand-sire, respectively.

d009ped_mgs <- d009ped[, c(1, 2, 5, 3, 4, 6)]

Here is how to use this option:

ainv <- ainverse(d009ped_mgs, mv = "*", mgs = TRUE)

The \(\mathbf{A}\) matrix is now:

     ID15  ID1 ID2  ID3 ID4 ID5  ID6  ID7  ID8  ID9 ID10 ID11 ID12 ID13 ID14
ID15  1.0 0.00   0 0.00   0   0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.5
ID1   0.0 1.00   0 0.00   0   0 0.50 0.00 0.00 0.00 0.25 0.25 0.00 0.12  0.0
ID2   0.0 0.00   1 0.00   0   0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.0
ID3   0.0 0.00   0 1.00   0   0 0.00 0.50 0.50 0.25 0.25 0.25 0.50 0.12  0.0
ID4   0.0 0.00   0 0.00   1   0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.0
ID5   0.0 0.00   0 0.00   0   1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00  0.0
ID6   0.0 0.50   0 0.00   0   0 1.00 0.00 0.00 0.00 0.50 0.50 0.00 0.25  0.0
ID7   0.0 0.00   0 0.50   0   0 0.00 1.00 0.25 0.12 0.12 0.12 0.25 0.06  0.0
ID8   0.0 0.00   0 0.50   0   0 0.00 0.25 1.00 0.50 0.12 0.12 0.62 0.25  0.0
ID9   0.0 0.00   0 0.25   0   0 0.00 0.12 0.50 1.00 0.06 0.06 0.31 0.50  0.0
ID10  0.0 0.25   0 0.25   0   0 0.50 0.12 0.12 0.06 1.00 0.31 0.12 0.16  0.0
ID11  0.0 0.25   0 0.25   0   0 0.50 0.12 0.12 0.06 0.31 1.00 0.12 0.16  0.0
ID12  0.0 0.00   0 0.50   0   0 0.00 0.25 0.62 0.31 0.12 0.12 1.12 0.16  0.0
ID13  0.0 0.12   0 0.12   0   0 0.25 0.06 0.25 0.50 0.16 0.16 0.16 1.00  0.0
ID14  0.5 0.00   0 0.00   0   0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00  1.0

The relationship matrix obtained with the mgs option will result on breeding values that have lower accuracy than with a pedigree where the female parent is known, but it will have higher accuracy than if only the male parent is known.


The psortoption

When calculating the \(\mathbf{A^{-1}}\) matrix, if necessary, ainverse() will first internally re-arrange the pedigree so that all individuals are defined and parents are listed before their offspring. The psort option returns the pedigree data frame that is in fact used by ainverse(). To illustrate this option, let’s first change the rows order of our data frame:

d009ped_psort <- d009ped[c(1, 8, 2, 3, 12, 5, 6, 14, 7, 9, 10, 11, 13),]

And then proceed to use the psort option as:

ainv_ped <- ainverse(d009ped_psort, mv = "*", psort = TRUE)
Pedigree  order (P1): Individual "ID3" moved to record 1
Pedigree  order (P2): Individual "ID7" moved to record 2
Pedigree  order (P2): Individual "ID4" moved to record 1
ainv_ped
   indiv sire  dam
1    ID4    0    0
2    ID3    0    0
3    ID7  ID3  ID4
4   ID15    0    0
5    ID1    0    0
6    ID8  ID3  ID4
7    ID2    0    0
8   ID12  ID8  ID7
9    ID5    0 ID15
10   ID6  ID1  ID2
11  ID14 ID15 ID15
12   ID9  ID8  ID5
13  ID10  ID6  ID7
14  ID11  ID6  ID7
15  ID13  ID9 ID11

Note that here, the option ainverse() does not generate the inverse of a NRM, but generates a sorted and completed pedigree file. Here we can see that ID15 has been added to the pedigree and that the row order has been re-organized so that parents are described before their offspring.