lines | i_11_10006 | i_11_10030 | i_11_10041 | i_11_10043 | i_11_10075 | i_11_10176 | i_11_10186 |
---|---|---|---|---|---|---|---|

201157_A | 0 | 0 | 0 | 2 | 2 | 2 | 2 |

AAPO | 2 | 0 | 2 | 0 | 0 | 0 | 0 |

ABACUS | 2 | 2 | 2 | 2 | 2 | 2 | 0 |

ABAVA | 2 | 0 | 2 | 0 | 2 | 2 | 0 |

ACAPELLA | 2 | 2 | 2 | 2 | 2 | 2 | 0 |

ACROBAT | 2 | 2 | 2 | 2 | 2 | 2 | 2 |

# ASR026. Using SNP markers to obtain a genomic relationship matrix - Spring Barley

The complete script for this example can be downloaded here:

### Dataset

The example that we will present here is based on the D022genomic dataset, and the first few rows are presented below:

### Genomic relationship matrix (GRM)

In this example, we will reconstruct a genomic relationship matrix (\(\mathbf{G}\)), run some diagnostics, blend it, and finally obtain its inverse (\(\mathbf{G^{-1}}\)) using several functions of the `ASRgenomics`

R library.

First, let’s transform the SNPs dataset into a matrix with the lines as row names, and we can see the first records as shown below.

```
<- data.matrix(d022genomic[, c(2:3490)])
d022genomic_matrix row.names(d022genomic_matrix) <- d022genomic$lines
1:6, 1:6] d022genomic_matrix[
```

```
i_11_10006 i_11_10030 i_11_10041 i_11_10043 i_11_10075 i_11_10176
201157_A 0 0 0 2 2 2
AAPO 2 0 2 0 0 0
ABACUS 2 2 2 2 2 2
ABAVA 2 0 2 0 2 2
ACAPELLA 2 2 2 2 2 2
ACROBAT 2 2 2 2 2 2
```

Then, we can apply some filters to the data using the `qc.filtering()`

function:

```
<- qc.filtering(M = d022genomic_matrix, maf = 0.05, marker.callrate = 0.2,
d022genomic_matrix ind.callrate = 0.20, impute = FALSE, plots = FALSE)
```

In the above instance, the option `maf = 0.05`

removes SNPs with minor allele frequency inferior or equal to 0.05, the option `marker.callrate = 0.2`

removes SNPs with missing values superior or equal to 0.2, and the option `ind.callrate = 0.20`

removes individuals with a rate of missing values superior or equal to to 0.2. Here we set `impute = FALSE`

because we do not want to impute missing values.

`Initial marker matrix M contains 478 individuals and 3489 markers.`

`A total of 0 markers were removed because their proportion of missing values was equal or larger than 0.2.`

`A total of 0 individuals were removed because their proportion of missing values was equal or larger than 0.2.`

`A total of 0 markers were removed because their MAF was smaller than 0.05.`

`A total of 0 markers were removed because their heterozygosity was larger than 1.`

`A total of 0 markers were removed because their |F| was larger than 1.`

`Final cleaned marker matrix M contains 0% of missing SNPs.`

`Final cleaned marker matrix M contains 478 individuals and 3489 markers.`

These results indicate that, in this particular case, no modification/eliminations were required on the matrix. Note that this function automatically creates the element `M.clean`

which corresponds to the cleaned matrix.

Now, we are ready to create our \(\mathbf{G}\) matrix using the `G.matrix()`

function:

```
<- G.matrix(M = d022genomic_matrix$M.clean, method = "VanRaden")$G
d022_Gmatrix 1:6,1:6] d022_Gmatrix[
```

In this case, we used the method `VanRaden`

which provides with the additive GRM. There are other methods available for both additive and dominance genomic matrices.

From this point, we can do a quality control of the \(\mathbf{G}\) matrix, and potentially remove some problematic data. An example of how to use the `kinship.diagnostics()`

function is available here asr027.

To obtain the \(\mathbf{G^{-1}}\) matrix, we can use the `G.inverse()`

function:

`<- G.inverse(G = d022_Gmatrix, sparseform = TRUE) Ginv `

`Reciprocal conditional number for original matrix is: 2.7768272706449e-11`

`Reciprocal conditional number for inverted matrix is: 2.77682703953928e-11`

`Inverse of matrix G appears to be ill-conditioned.`

In this case, the matrix is reported as ill-conditioned, so we can consider bending or blending the matrix. In this example, we will proceed to blend the matrix.

This is how we can use the `G.tuneup()`

function to blend the matrix:

`<- G.tuneup(G = d022_Gmatrix, blend = TRUE, pblend = 0.02)$Gb d022_Gmatrix_bl `

`Reciprocal conditional number for original matrix is: 2.7768272706449e-11`

`Determinant for original matrix is: 5.3821371077352e-125`

`Matrix was BLENDED using an identity matrix.`

`Reciprocal conditional number for tune-up matrix is: 6.05083845860486e-05`

Here, we blended the \(\mathbf{G}\) matrix with an identity matrix. The proportion of identity matrix to blend for was set to 0.02.

Now, let’s obtain the \(\mathbf{G^{-1}}\) once more but this time using the blended matrix:

`<- G.inverse(G = d022_Gmatrix_bl, sparseform = TRUE)$Ginv Ginv_bl `

`Reciprocal conditional number for original matrix is: 6.05083845860486e-05`

`Reciprocal conditional number for inverted matrix is: 5.85469923614297e-05`

`Inverse of matrix G does not appear to be ill-conditioned.`

Here you can see that \(\mathbf{G^{-1}}\) does not appear to be ill-conditioned, so we can proceed to use this matrix for downstream analyses.