ASR026. Using SNP markers to obtain a genomic relationship matrix - Spring Barley

The complete script for this example can be downloaded here:

Dataset

The example that we will present here is based on the D022genomic dataset, and the first few rows are presented below:

lines i_11_10006 i_11_10030 i_11_10041 i_11_10043 i_11_10075 i_11_10176 i_11_10186
201157_A 0 0 0 2 2 2 2
AAPO 2 0 2 0 0 0 0
ABACUS 2 2 2 2 2 2 0
ABAVA 2 0 2 0 2 2 0
ACAPELLA 2 2 2 2 2 2 0
ACROBAT 2 2 2 2 2 2 2


Genomic relationship matrix (GRM)

In this example, we will reconstruct a genomic relationship matrix (\(\mathbf{G}\)), run some diagnostics, blend it, and finally obtain its inverse (\(\mathbf{G^{-1}}\)) using several functions of the ASRgenomics R library.


First, let’s transform the SNPs dataset into a matrix with the lines as row names, and we can see the first records as shown below.

d022genomic_matrix <- data.matrix(d022genomic[, c(2:3490)])
row.names(d022genomic_matrix) <- d022genomic$lines 
d022genomic_matrix[1:6, 1:6]
         i_11_10006 i_11_10030 i_11_10041 i_11_10043 i_11_10075 i_11_10176
201157_A          0          0          0          2          2          2
AAPO              2          0          2          0          0          0
ABACUS            2          2          2          2          2          2
ABAVA             2          0          2          0          2          2
ACAPELLA          2          2          2          2          2          2
ACROBAT           2          2          2          2          2          2


Then, we can apply some filters to the data using the qc.filtering() function:

d022genomic_matrix <- qc.filtering(M = d022genomic_matrix, maf = 0.05, marker.callrate = 0.2, 
                                   ind.callrate = 0.20, impute = FALSE, plots = FALSE)

In the above instance, the option maf = 0.05 removes SNPs with minor allele frequency inferior or equal to 0.05, the option marker.callrate = 0.2 removes SNPs with missing values superior or equal to 0.2, and the option ind.callrate = 0.20 removes individuals with a rate of missing values superior or equal to to 0.2. Here we set impute = FALSE because we do not want to impute missing values.

Initial marker matrix M contains 478 individuals and 3489 markers.
A total of 0 markers were removed because their proportion of missing values was equal or larger than 0.2.
A total of 0 individuals were removed because their proportion of missing values was equal or larger than 0.2.
A total of 0 markers were removed because their MAF was smaller than 0.05.
A total of 0 markers were removed because their heterozygosity was larger than 1.
A total of 0 markers were removed because their |F| was larger than 1.
Final cleaned marker matrix M contains 0% of missing SNPs.
Final cleaned marker matrix M contains 478 individuals and 3489 markers.

These results indicate that, in this particular case, no modification/eliminations were required on the matrix. Note that this function automatically creates the element M.clean which corresponds to the cleaned matrix.


Now, we are ready to create our \(\mathbf{G}\) matrix using the G.matrix() function:

d022_Gmatrix <- G.matrix(M = d022genomic_matrix$M.clean, method = "VanRaden")$G
d022_Gmatrix[1:6,1:6]

In this case, we used the method VanRaden which provides with the additive GRM. There are other methods available for both additive and dominance genomic matrices.


From this point, we can do a quality control of the \(\mathbf{G}\) matrix, and potentially remove some problematic data. An example of how to use the kinship.diagnostics() function is available here asr027.


To obtain the \(\mathbf{G^{-1}}\) matrix, we can use the G.inverse() function:

Ginv <- G.inverse(G = d022_Gmatrix, sparseform = TRUE)
Reciprocal conditional number for original matrix is: 2.7768272706449e-11
Reciprocal conditional number for inverted matrix is: 2.77682703953928e-11
Inverse of matrix G appears to be ill-conditioned.

In this case, the matrix is reported as ill-conditioned, so we can consider bending or blending the matrix. In this example, we will proceed to blend the matrix.


This is how we can use the G.tuneup() function to blend the matrix:

d022_Gmatrix_bl <- G.tuneup(G = d022_Gmatrix, blend = TRUE, pblend = 0.02)$Gb
Reciprocal conditional number for original matrix is: 2.7768272706449e-11
Determinant for original matrix is: 5.3821371077352e-125
Matrix was BLENDED using an identity matrix.
Reciprocal conditional number for tune-up matrix is: 6.05083845860486e-05

Here, we blended the \(\mathbf{G}\) matrix with an identity matrix. The proportion of identity matrix to blend for was set to 0.02.


Now, let’s obtain the \(\mathbf{G^{-1}}\) once more but this time using the blended matrix:

Ginv_bl <- G.inverse(G = d022_Gmatrix_bl, sparseform = TRUE)$Ginv
Reciprocal conditional number for original matrix is: 6.05083845860486e-05
Reciprocal conditional number for inverted matrix is: 5.85469923614297e-05
Inverse of matrix G does not appear to be ill-conditioned.

Here you can see that \(\mathbf{G^{-1}}\) does not appear to be ill-conditioned, so we can proceed to use this matrix for downstream analyses.